Good grades: How Canadian researchers are using AI to improve language testing
Matt A.V. Chaban
Senior Editor, Transform
A major tool in applying for jobs, schools, and immigration, language tests in Canada can be an impediment to skilled individuals. University of Calgary researchers hope to change that.
With healthcare workers in great demand, standardized tests to enter the field must be both rigorous and fair to candidates.
In the course of his work aimed at improving such tests, Dr. Gregory Tweedie, an associate professor at the Werklund School of Education at the University of Calgary, has encountered many nurses and other high-skill workers who are exemplars in their field but can stumble when it comes to filling in test bubbles.
Tweedie often gives a specific example from his research in Applied Linguistics. He describes the case of a man he calls Guy, an internationally educated nurse who wanted to come to Canada to practice medicine. Canada relies on several standardized tests, including the International English Language Testing System (IELTS), to test immigrants’ language proficiency. One test typically costs around $300 Canadian and has oral and written components on randomly assigned topics.
“Guy took the test nine times without passing,” Tweedie said in an interview, “and he never got any feedback about what he was getting wrong. Now, no one disputes that if he’s going to function as a nurse in English, then his English language ability has to be assessed. But the test needs to be relevant to his medical work.”
Guy’s story, and others like it, made Tweedie question some of the practices of traditional international tests to assess language ability.
Tests like IELTS and the Test of English as a Foreign Language (TOEFL) generate billion-dollar industries, including prep courses and exam fees. But do they accurately assess language abilities? Tweedie wanted to find out. By identifying key predictors of language test scores, he wanted to make language tests more equitable and accessible.
It’s an issue many organizations are wrestling with today, as the competition for talent is fierce and the labor pool remains tight. Tweedie’s research is an important part of helping to make all kinds of skills tests more accurate and reliable for the enterprises, governments, and educators using them.
Predicting language test scores with 90% accuracy
Tweedie’s first studies showed that test scores varied greatly from day to day or week to week, which undermined their validity as a standardized tool.
Looking for a better way, Tweedie and his graduate students decided to try using machine learning to predict scores. They applied for Google Research Credits to run the tests on VertexAI, Google Cloud's open and integrated AI platform with tools for training, tuning, and deploying ML models.
With VertexAI they were able to predict student scores with 90% accuracy, using factors that had little direct relationship with language skills.
“Nine out of 10 times we could predict student scores with machine learning through specific demographics — for me this is a social justice issue,” Tweedie said. “These tests are not educational. By and large, they are doing little to improve anyone’s language ability or even demonstrating their ability to function in a future role. In their current state, they are more like hoops for candidates to jump through.”
Tweedie and his team first addressed this problem by developing an app called TestPredikt, which lets students enter data about themselves to generate their predicted score on multiple international standardized language tests. That helps students decide whether to take a test now or do more preparation first.
The team has other benefits in mind, too: They want to add feedback so test-takers know their strengths and weaknesses, as well as resources for improvement, like human or AI tutors. For now, they are focussing on Canadian languages and tests, but they would like to gather more data and expand internationally.
At first, Tweedie was intimidated by the idea of working with AI, but the team had lots of help from Google’s engineers and, through their research credits, access to computing resources they otherwise wouldn’t have had.
“I was wondering how we could figure this out with old school linear regression models,” Tweedie explained. “Vertex opened a new world for me. I don't think I will ever understand the deep mathematical theory underlying ML, and I guess I don't need to. I can still do whatever I need to do in VertexAI.”
The work has been so impressive and impactful that Tweedie now teaches his own students how to use ML in linguistics research.
Expanding equity and access for international test-takers
In March 2022, Tweedie joined the Google Cloud Research Innovators program to collaborate with other researchers using Google’s cloud technology tools to solve real-world problems in their fields. He’s hopeful about the change research like his can bring.
“For Guy, instead of taking the test nine times, he could take it once and find out where he’s likely to do well,” Tweedie said. “That’s a much better outcome for him and for society.”
Such tests are especially important in Canada, which has high levels of immigration and is officially a bilingual society. Ensuring workers will be proficient at their jobs is a delicate balance of testing knowledge and demonstrating skills in a fair and accurate manner.
“There's a heavy emphasis on these tests: for immigration, for employment, for admission,” Tweedie said. “If they aren’t predictive of success for work or study, this represents a significant and unnecessary barrier. Socially just principles of educational measurement require that individuals have both the right to benefit from assessments of their ability that directly involve them, and the right to demand objectivity in those measurements.”
With the help of AI and cloud, those assessments can get better at ensuring meaningful measurements for all parties involved.