EALTA Special Interest Group: Artificial Intelligence for Language Assessment
Rationale
Recent years have seen a rapid increase in the use of Artificial Intelligence (AI) for language assessment. These uses include, but are not limited to, automated scoring enabled by natural language processing, AI-driven test proctoring, and, most recently, item writing using generative AI. Because the field of AI is developing so quickly, its uses have not yet been widely researched, and little legislation is in place to regulate them. In language testing and assessment, although there have been some studies of the use of AI (e.g., Attali et al., 2022; Khademi, 2023; Mizumoto & Eguchi, 2023; O’Sullivan et al., 2023; Román, 2023; Su, Lin, & Lai, 2023), there is still a need for a better understanding of how AI works and for monitoring of its uses. It is equally important to acknowledge and navigate the uncertainties that accompany such rapid technological change. A number of academics, while appreciating the extraordinary potential of AI-powered tools, have expressed concerns about their impact on academic integrity (Barrot, 2023) and the risk of psychological dependence (Alharbi, 2023). These concerns are far from insignificant and underscore the urgent need for ongoing dialogue on the responsible, ethical, and thoughtful application of AI in language assessment.
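As a concrete illustration of the last of these uses, the minimal sketch below prompts a large language model to draft a multiple-choice vocabulary item. It is an illustration only, assuming the OpenAI Python client: the model name and prompt wording are placeholders, and any machine-drafted item would still require expert review, bias screening, and pretesting before operational use.

```python
# Illustrative sketch of generative-AI item writing (not an endorsed workflow).
# Assumes the OpenAI Python client; model name and prompt are placeholders.
from openai import OpenAI

client = OpenAI()  # reads the OPENAI_API_KEY environment variable

prompt = (
    "Write one four-option multiple-choice vocabulary item targeting "
    "CEFR level B1. Give the stem, options A-D, and indicate the key."
)

response = client.chat.completions.create(
    model="gpt-4o",  # placeholder; any capable chat model could be used
    messages=[
        {"role": "system",
         "content": "You are an experienced language test item writer."},
        {"role": "user", "content": prompt},
    ],
)

# The draft item still needs human review and trialling before use.
print(response.choices[0].message.content)
```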
Aims
- Share professional expertise in the use of AI for language testing and assessment;
- Work to improve the use of AI in language testing and assessment systems in Europe;
- Make expertise in the use of AI for language testing and assessment readily available;
- Engage with EU regulatory bodies to produce standard frameworks for improving the use of AI in language testing and assessment within Europe;
- Provide a forum for sharing and solving problems related to using AI for language testing and assessment.
Focus
We expect the SIG to be a space for all interested in AI uses for language testing and assessment, but particularly for:
- Test developers
- Test administrators
- Score users
- Educators
- Researchers
Organisation
The aim of the SIG is to provide a dedicated forum for discussing and sharing knowledge and practices in the use of AI in language testing and assessment. It seeks to enhance transparency in the utilisation of AI for language assessment, explore present and future developments in the field, and foster meaningful discussion on this subject. In particular, the SIG aims to engage in:
- Analysing existing policies and practices concerning the implementation of AI in language testing and assessment;
- Establishing good practices for valid, meaningful, and fair uses of AI for language testing and assessment;
- Collecting information from key stakeholders on their expectations, needs, attitudes, and concerns regarding the use of AI for language testing and assessment;
- Disseminating knowledge and sharing best practices in the use of AI for language testing and assessment;
- Encouraging research into the use of AI in language testing and assessment;
- Promoting a constructive dialogue with European regulatory bodies regarding the use of AI for language testing and assessment;
- Producing regulatory documents, including guidelines, to assist various stakeholders, such as language testing organisations, policy makers, and score users, in implementing AI responsibly for language assessment;
- Increasing the assessment literacy of score users regarding the use of AI in language testing.
About us
Convenor: Dr Olena Rossi, independent language testing consultant and researcher. Olena’s interests within language testing lie in the area of item writing and, in particular, in using AI for item generation. She previously served as the chief of ILTA’s Graduate Student Assembly and was a founding member and co-convenor of UKALTA’s Postgraduate Researcher Network.
Convenor: Dr Sha Liu recently completed her PhD in Language Assessment at the University of Bristol. Her research is deeply embedded in the domain of technology-assisted language learning and assessment, with a strong focus on automated writing evaluation leveraging AI algorithms and on investigating learner engagement through eye-tracking. She presently serves as Communications Officer on the committee of the British Association for Applied Linguistics Testing, Evaluation, and Assessment Special Interest Group (BAAL TEASIG).
Content specialist: Dr Nazlınur Göktürk is a language assessment specialist working for the Board of Education at the Republic of Türkiye Ministry of National Education. She obtained her PhD in Applied Linguistics and Technology from Iowa State University. Her research interests encompass L2 oral communication assessment, spoken dialog systems, and the use of AI for language test development. Within her role at the Ministry, she oversees various project activities aimed at aligning foreign language education with the CEFR Companion Volume (CEFR CV).
Website manager: Mr Josep Maria Montcada, Senior Assessment Expert, is currently in charge of language test design and language evaluation at the Ministry of Education of the Generalitat de Catalunya. He coordinates the development of high-stakes certification tests for the Official Language Schools (EOI) in Catalonia in eight languages and at levels B1, B2, C1 and C2. He trained in language assessment at Lancaster University, and his main interests in this field are test validation and item writing.
Publication project lead: Dr Stefan O’Grady is an associate lecturer in academic English and TESOL at the University of St Andrews. Stefan obtained his PhD from the Centre for Research in English Language Learning and Assessment at the University of Bedfordshire. His research interests are in language test development and validation, primarily for English-medium university admissions, and in the potential for AI to impact the assessment process.
Content specialist: Mr Darren Perrett, Senior Assessment Services Manager at Cambridge University Press & Assessment. Darren is in the final year of his PhD at the University of Leeds, which focuses on validating Cambridge reading texts using NLP features and on training a machine-learning (ML) algorithm for CEFR classification of previously unseen texts. His main interests centre on Automated Item Generation (AIG) for high-stakes assessment.
The following EALTA expert members act as AI SIG advisors:
- Prof. Tony Green, University of Bedfordshire
- Prof. Barry O’Sullivan, British Council
- Dr Veronika Timpe-Laughlin, Educational Testing Service
Upcoming events
SIG Meeting
Past Meetings
SIG Inaugural Meeting | EALTA Annual Conference Belfast 2024 | 6 June, 2:00 to 5:30 pm | Stranmillis University College (Belfast) | Programme
Resources
Generative AI and Item Writing
- Item Writing for Language Testing
- Generative AI and Item Writing

Courses
- Generative AI for beginners (Microsoft)
Digital Library
- Aryadoust, V., Zakaria, A., & Jia, Y. (2024). Investigating the affordances of OpenAI’s large language model in developing listening assessments. Computers and Education: Artificial Intelligence, 6. https://doi.org/10.1016/j.caeai.2024.100204
  Tags: listening, task and item generation
- Attali, Y., Runge, A., LaFlair, G. T., Yancey, K., Goodwin, S., Park, Y., & von Davier, A. A. (2022). The interactive reading task: Transformer-based automatic item generation. Frontiers in Artificial Intelligence, 5. https://doi.org/10.3389/frai.2022.903077
  Tags: reading, task and item generation
- Belzak, W. C. M., Naismith, B., & Burstein, J. (2023). Ensuring fairness of human- and AI-generated test items. In N. Wang, G. Rebolledo-Mendez, V. Dimitrova, N. Matsuda, & O. C. Santos (Eds.), Artificial intelligence in education. Posters and late breaking results, workshops and tutorials, industry and innovation tracks, practitioners, doctoral consortium and blue sky (AIED 2023). Communications in Computer and Information Science, 1831. Springer, Cham. https://doi.org/10.1007/978-3-031-36336-8_108
  Tags: fairness, bias, differential item functioning, task and item generation
- Bezirhan, U., & von Davier, M. (2023). Automated reading passage generation with OpenAI’s large language model. Computers and Education: Artificial Intelligence, 5, 100161. https://doi.org/10.1016/j.caeai.2023.100161
  Tags: text generation, reading
- Bolender, B., Foster, C., & Vispoel, S. (2023). The criticality of implementing principled design when using AI technologies in test development. Language Assessment Quarterly, 20(4–5), 512–519. https://doi.org/10.1080/15434303.2023.2288266
  Tags: task and item generation, test development
- Choi, I., & Zu, J. (2022). The impact of using synthetically generated listening stimuli on test-taker performance: A case study with multiple-choice, single-selection items. ETS Research Report Series, 2022(1), 1–14. https://doi.org/10.1002/ets2.12347
  Tags: text generation, listening
- Felice, M., Taslimipoor, S., & Buttery, P. (2022). Constructing open cloze tests using generation and discrimination capabilities of transformers. In Findings of the Association for Computational Linguistics: ACL 2022. Association for Computational Linguistics. https://arxiv.org/pdf/2204.07237.pdf
  Tags: AIG
- Ferrara, S., & Qunbar, S. (2022). Validity arguments for AI-based automated scores: Essay scoring as an illustration. Journal of Educational Measurement, 59(3), 288–313. https://doi.org/10.1111/jedm.12333
  Tags: writing, scoring, validity
- Hannah, L., Kim, H., & Jang, E. (2022). Investigating the effects of task type and linguistic background on accuracy in automated speech recognition systems: Implications for use in language assessment of young learners. Language Assessment Quarterly, 19(3), 289–313. https://doi.org/10.1080/15434303.2022.2038172
  Tags: scoring, speaking, young learners
- Isaacs, T., Hu, R., Trenkic, D., & Varga, J. (2023). Examining the predictive validity of the Duolingo English Test: Evidence from a major UK university. Language Testing, 40(3), 748–770. https://doi.org/10.1177/02655322231158550
  Tags: criterion-related evidence, predictive validity
- Jin, Y., & Fan, J. (2023). Test-taker engagement in AI technology-mediated language assessment. Language Assessment Quarterly, 20(4–5), 488–500. https://doi.org/10.1080/15434303.2023.2291731
  Tags: test-taker engagement, validation, test development
- Khademi, A. (2023). Can ChatGPT and Bard generate aligned assessment items? A reliability analysis against human performance. Journal of Applied Learning and Teaching, 6(1), 75–80. https://journals.sfu.ca/jalt/index.php/jalt/article/view/783
  Tags: writing, task and item generation
- LaFlair, G., Runge, A., Attali, Y., Park, Y., Church, J., & Goodwin, S. (2023). Interactive listening – The Duolingo English Test (Duolingo Research Report DRR-23-01; pp. 1–17). Duolingo. https://go.duolingo.com/interactive-listening-whitepaper
  Tags: task design, listening
- Mizumoto, A., & Eguchi, M. (2023). Exploring the potential of using an AI language model for automated essay scoring. Research Methods in Applied Linguistics, 2(2), 100050. https://doi.org/10.1016/j.rmal.2023.100050
  Tags: scoring, writing
- O’Grady, S. (2023). An AI generated test of pragmatic competence and connected speech. Language Teaching Research Quarterly, 37, 188–203. https://doi.org/10.32038/ltrq.2023.37.10
  Tags: listening, task and item generation
- O’Sullivan, B. (2023). Reflections on the application and validation of technology in language testing. Language Assessment Quarterly, 20(4–5), 501–511. https://doi.org/10.1080/15434303.2023.2291486
  Tags: validity, validation
- O’Sullivan, B., Breakspear, T., & Bayliss, W. (2023). Validating an AI-driven scoring system: The Model Card approach. In K. Sadeghi & D. Douglas (Eds.), Fundamental considerations in technology mediated language assessment (pp. 115–134). Routledge.
  Tags: scoring
- Park, Y., Cardwell, R., Goodwin, S., Naismith, B., LaFlair, G., Lo, K., & Yancey, K. (2023). Assessing speaking on the Duolingo English Test (Duolingo Research Report DRR-23-03; pp. 1–15). Duolingo. https://duolingo-testcenter.s3.amazonaws.com/media/resources/speaking-whitepaper.pdf
  Tags: task design, speaking
- Sayin, A., & Gierl, M. (2024). Using OpenAI GPT to generate reading comprehension items. Educational Measurement: Issues and Practice. Advance online publication. https://doi.org/10.1111/emip.12590
  Tags: item generation, reading
- Shi, H., & Aryadoust, V. (2024). A systematic review of AI-based automated written feedback research. ReCALL, 1–23. https://www.cambridge.org/core/journals/recall/article/systematic-review-of-aibased-automated-written-feedback-research/28A670C4C7F2F1F30C7EA36EC489F867
  Tags: feedback
- Shin, I., & Gierl, M. (2022). Generating reading comprehension items using automated processes. International Journal of Testing, 22(3–4), 289–311. https://doi.org/10.1080/15305058.2022.2070755
  Tags: item generation, reading
- Van Moere, A., & Downey, R. (2016). Technology and artificial intelligence in language assessment. In Handbook of second language assessment (pp. 341–358).
  Tags: scoring
- Voss, E., Cushing, S., Ockey, G., & Yan, X. (2023). The use of assistive technologies including generative AI by test takers in language assessment: A debate of theory and practice. Language Assessment Quarterly, 20(4–5), 520–532. https://doi.org/10.1080/15434303.2023.2288256
  Tags: construct definition, scoring and rubric design, validity, fairness, equity, bias, copyright
- Xi, X. (2022). Validity and the automated scoring of performance tests. In G. Fulcher & L. Harding (Eds.), The Routledge handbook of language testing (pp. 513–529). Routledge.
  Tags: scoring, validity
- Xi, X. (2023). Advancing language assessment with AI and ML – Leaning into AI is inevitable, but can theory keep up? Language Assessment Quarterly, 20(4–5), 357–376. https://doi.org/10.1080/15434303.2023.2291488
  Tags: validity
- Yunjiu, L., Wei, W., & Zheng, Y. (2022). Artificial intelligence-generated and human expert-designed vocabulary tests: A comparative study. SAGE Open, 12(1). https://doi.org/10.1177/21582440221082130
  Tags: vocabulary, task and item generation
- Zhao, R., Zhuang, Y., Zou, D., et al. (2023). AI-assisted automated scoring of picture-cued writing tasks for language assessment. Education and Information Technologies, 28, 7031–7063. https://doi.org/10.1007/s10639-022-11473-y
  Tags: writing, scoring
Media
- Current applications of Artificial Intelligence in Language Assessment
- The role of AI in educational assessment: What every assessment team should know for 2024
- Responsible AI standards in assessment
- Automated scoring of writing: Comparing deep learning and feature-based approaches
- Using AI for test item generation: Opportunities and challenges
- A new paradigm for test development
- Generative AI for test development
- Artificial Intelligence, Neuroscience, and Validity | Interface 1
Contact