EALTA Special Interest Group: Artificial Intelligence for Language Assessment

 

News

SIG Meeting

  • Date: 05.03.2025 | 13:00 to 16:00 (UK time)
  • Programme: To be published soon
  • Registration (free): Click here.

Rationale

Recent years have seen an exponential increase in the use of Artificial Intelligence (AI) for language assessment. These uses include, but are not limited to, automated scoring enabled by natural language processing, AI-driven test proctoring, and, most recently, item writing using generative AI. Because the field of AI is developing so rapidly, its uses have not yet been widely researched, and little legislation is in place to regulate them. In the field of language testing and assessment, although there have been some studies into the use of AI (e.g., Attali et al., 2022; Khademi, 2023; Mizumoto & Eguchi, 2023; O’Sullivan et al., 2023; Román, 2023; Su, Lin, & Lai, 2023), there is still a need for a better understanding of how AI works, as well as for monitoring of its uses. In parallel, it is equally important to acknowledge and navigate the uncertainties that come with such rapid technological advancement. A number of academics, while appreciating the extraordinary potential of AI-powered tools, have nevertheless expressed concern over their impact on academic integrity (Barrot, 2023) and the risk of psychological dependence (Alharbi, 2023). These concerns are far from insignificant and underscore the urgent need for an ongoing dialogue about the responsible, ethical, and thoughtful application of AI in language assessment.

Aims

  • Share professional expertise in the use of AI for language testing and assessment;
  • Work to improve the use of AI in language testing and assessment systems in Europe;
  • Make the expertise in using AI for language testing and assessment readily available;
  • Engage with EU regulatory bodies to develop standard frameworks for improving the use of AI for language testing and assessment within Europe;
  • Provide a forum for sharing and solving problems related to using AI for language testing and assessment.

Focus

We expect the SIG to be a space for all those interested in the use of AI for language testing and assessment, and particularly for:

  • Test developers
  • Test administrators
  • Score users
  • Educators
  • Researchers

Organisation

The aim of the SIG is to provide a dedicated forum for discussing and sharing knowledge and practices in the use of AI in language testing and assessment. It seeks to enhance transparency in the use of AI for language assessment, explore current and future developments in the field, and foster meaningful discussion on the subject. In particular, the SIG aims to engage in:

  • Analysing existing policies and practices concerning the implementation of AI in language testing and assessment;
  • Establishing good practices for valid, meaningful, and fair uses of AI for language testing and assessment;
  • Collecting information from key stakeholders on their expectations, needs, attitudes, and concerns regarding the use of AI for language testing and assessment;
  • Disseminating knowledge and sharing best practices in the use of AI for language testing and assessment;
  • Encouraging research into the use of AI in language testing and assessment;
  • Promoting a constructive dialogue with European regulatory bodies regarding the use of AI for language testing and assessment;
  • Producing regulatory documents, including guidelines, to assist various stakeholders, such as language testing organisations, policy makers, and score users, in implementing AI responsibly for language assessment;
  • Increasing assessment literacy of score users regarding the use of AI in language testing.

About us

Convenor

Dr Olena Rossi, independent language testing consultant and researcher. Olena’s interests within language testing lie in the area of item writing and, in particular, in using AI for item generation. She previously served as chief of ILTA’s Graduate Student Assembly and was also a founding member and co-convenor of UKALTA’s Postgraduate Researcher Network.

Convenor

Dr Sha Liu recently completed her PhD in Language Assessment at the University of Bristol. Her research is deeply embedded in the domain of technology-assisted language learning and assessment, with a strong focus on automated writing evaluation using AI algorithms and on investigating learner engagement through eye-tracking. She currently serves as the Communications Officer on the committee of the British Association for Applied Linguistics Testing, Evaluation, and Assessment Special Interest Group (BAAL TEASIG).

Content specialist

Dr Nazlınur Göktürk is a language assessment specialist working for the Board of Education at the Republic of Türkiye Ministry of National Education. She obtained her PhD in Applied Linguistics and Technology from Iowa State University. Her research interests encompass L2 oral communication assessment, spoken dialogue systems, and the use of AI for language test development. Within her role at the Ministry, she oversees various project activities aimed at aligning foreign language education with the CEFR Companion Volume (CEFR CV).

Website manager

Mr Josep Maria Montcada, Senior Assessment Expert, is currently in charge of language test design and language evaluation at the Ministry of Education of the Generalitat de Catalunya. He coordinates the development of high-stakes certification tests for the Official Language Schools (EOI) in Catalonia in eight languages at levels B1, B2, C1, and C2. He trained in language assessment at Lancaster University, and his main interests in the field are test validation and item writing.

Publication project lead

Dr Stefan O’Grady, associate lecturer in academic English and TESOL at the University of St Andrews. Stefan obtained his PhD from the Centre for Research in English Language Learning and Assessment (CRELLA) at the University of Bedfordshire. His research interests are in language test development and validation, primarily for English-medium university admissions, and in the potential impact of AI on the assessment process.

Content specialist

Mr Darren Perrett, Senior Assessment Services Manager, Cambridge University Press & Assessment. Darren is in the final year of his PhD at the University of Leeds, which focuses on validating Cambridge reading texts using NLP features and on training a machine learning algorithm for CEFR classification of previously unseen texts. His main interests centre on Automated Item Generation (AIG) for high-stakes assessment.

 

The following EALTA expert members act as AI SIG advisors:

  • Prof. Tony Green, University of Bedfordshire
  • Prof. Barry O’Sullivan, British Council
  • Dr Veronika Timpe-Laughlin, Educational Testing Service



Upcoming events

SIG Meeting

  • Date: 05.03.2025 | 13:00 to 16:00 (UK time)
  • Programme: To be published soon
  • Registration (free): Click here.



Past Meetings

SIG Inaugural Meeting | EALTA Annual Conference Belfast 2024
6th June, 2 to 5:30 pm | Stranmillis University College (Belfast)
Programme



Resources

Guidelines | Fairness | Ethics
  • Australian Framework for Generative Artificial Intelligence (AI) in Schools
  • The Ethical Framework for AI in Education | The University of Buckingham
  • Ethical guidelines on the use of artificial intelligence (AI) and data in teaching and learning for educators | The European Commission
  • Artificial intelligence and English language teaching: Preparing for the future | British Council
  • Artificial Intelligence and the Future of Teaching and Learning | US Department of Education
  • ChatGPT and artificial intelligence in higher education: Quick start guide | ETICO, UNESCO International Institute for Educational Planning
  • Guía sobre el uso de la inteligencia artificial en el ámbito educativo (Guide on the use of artificial intelligence in education) | Spanish Ministry of Education

 

Generative AI and Item Writing
Item Writing for Language Testing | Generative AI and Item Writing

 

Courses
Generative AI for beginners | Microsoft



Digital Library

Tags: listening, task and item generation

Tags: reading, task and item generation 

Tags: fairness, bias, differential item functioning, task and item generation

Tags: text generation, reading

Tags: task and item generation, test development

Tags: text generation, listening

Tags: AIG

Tags: writing, scoring, validity

Tags: scoring, speaking, young learners

Tags: criterion related evidence, predictive validity

Tags: test taker engagement, validation, test development 

Tags: writing, task and item generation

Tags: task design, listening 

Tags: scoring, writing

Tags: listening, task and item generation 

Tags: validity, validation

Tags: scoring

Tags: task design, speaking 

Tags: item generation, reading

Tags: feedback

Tags: item generation, reading

Tags: scoring

Tags: construct definition, scoring and rubric design, validity, fairness, equity, bias, copyright

Tags: validity

Tags: scoring, validity

Tags: vocabulary, task and item generation

Tags: writing, scoring



Media

Current applications of Artificial Intelligence in Language Assessment.
Voss, E. (2024, April 8).

Tags: AI, current applications, future directions

The role of AI in educational assessment: What every assessment team should know for 2024.
Scharaschkin, A. (2024, February 14). Webinar delivered by AQA Global Assessment Services.

Tags: scoring, test development, feedback, bias, fairness

Responsible AI standards in assessment.
Burstein, J. (2024, January 19). Webinar delivered as part of the Duolingo webinar series.

Tags: test development, fairness, bias

Automated scoring of writing: Comparing deep learning and feature-based approaches.
Wei, J., van Moere, A., & Lattanzio, S. (2023, October 6). Webinar delivered as part of the Duolingo webinar series.

Tags: scoring

Using AI for test item generation: Opportunities and challenges.
Rossi, O. (2023, May 30). Webinar delivered as part of the EALTA Webinar series.

Tags: test development, item generation

A new paradigm for test development.
Attali, Y., LaFlair, G., & Runge, A. (2023, March 31). Webinar delivered as part of the Duolingo webinar series.

Tags: listening / reading, task and item generation

Generative AI for test development.
von Davier, A. (2023, February 27). Talk given for the Department of Education, University of Oxford.

Tags: test development, item generation

Artificial Intelligence, Neuroscience, and Validity | Interface 1.
Aryadoust, V. (2021, December 11).

Tags: test validity



Contact

ealtasigai@gmail.com
