It’s back to school for Artificial Intelligence (AI)
The recent outcry over the use of AI-predicted student grades for the United Kingdom’s ‘A Levels’ highlights the difficulties of adopting AI in social decision-making, and the importance of fairness, explainability and transparency in its use.
In the UK, final-year high school students would normally sit ‘A Level’ exams, the results of which are used to obtain offers to universities. With the COVID-19 pandemic preventing high school students from sitting their final examinations in person, the UK’s assessment regulator, Ofqual, looked for a way to standardise the teacher-estimated grades (referred to as centre assessment grades or CAGs), because of what it saw as an ‘unprecedented’ 12.5% year-on-year increase in A grades in 2020. Ofqual’s solution was to adopt a mathematical algorithm. The results, however, have caused a national uproar.
How did the algorithm work?
At a high level, the algorithm relied on two key pieces of information: the exam results of each school or college over the previous three years, and the rank order of pupils based on the teacher-estimated grades. The algorithm worked out a distribution of grades for students at each school based on the school’s historical data (with some other minor adjustments). Each student’s teacher-assigned ranking relative to other students at the school was then used to place students across the algorithm-determined distribution of grades. Only a student’s ranking was used: the teacher-estimated grade itself played no role in determining the final grade.
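The mechanism described above can be sketched in a few lines of Python. This is an illustrative simplification, not Ofqual’s actual model: the function name `allocate_grades`, the rounding rule and the grade shares used in the example are all assumptions made for the sketch, and the ‘minor adjustments’ mentioned above are left out.

```python
from typing import Dict, List


def allocate_grades(ranked_students: List[str],
                    historical_shares: Dict[str, float]) -> Dict[str, str]:
    """Assign grades by rank so the cohort matches a historical distribution.

    ranked_students: students ordered best-first by the teacher's ranking.
    historical_shares: grade -> share of past students at the school, listed
        best grade first (hypothetical figures; shares should sum to 1).
    Note that teacher-estimated grades are not an input at all.
    """
    n = len(ranked_students)
    grades = list(historical_shares)
    result: Dict[str, str] = {}
    cumulative = 0.0
    start = 0
    for i, grade in enumerate(grades):
        cumulative += historical_shares[grade]
        # The last grade absorbs any rounding remainder.
        end = n if i == len(grades) - 1 else round(cumulative * n)
        for student in ranked_students[start:end]:
            result[student] = grade
        start = end
    return result


# Invented example: ten students and an assumed historical distribution.
ranked = ["Asha", "Ben", "Chloe", "Dev", "Ella",
          "Finn", "Gia", "Hugo", "Ivy", "Jack"]
shares = {"A": 0.1, "B": 0.3, "C": 0.4, "D": 0.2}
awarded = allocate_grades(ranked, shares)
```

In this invented cohort only one A is available, so the second-ranked student receives a B regardless of the grade their teacher estimated. That is the capping effect discussed later in this article.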
What were the results?
On results day, almost 40% of A Level grades were marked down by the algorithm from teachers’ predictions, with many students missing out on their conditional university places (which had been offered based on the teacher-estimated grades).
Much of the criticism has been directed at the bias and disparity in the way the algorithm calculated the results. Most notably, the downgraded results affected state schools disproportionately compared with private schools.
What can we learn?
While the Ofqual algorithm relied on a simple mathematical model, its effects hold important lessons for other artificial intelligence solutions. The incident serves as a reminder of the role of fairness and transparency as part of broader considerations of ‘ethics by design’ in AI systems, particularly in influencing user acceptance and uptake. As in the UK, these principles have been highlighted in Australia by the Commonwealth Government as key considerations when designing and developing AI systems. They have also recently been reinforced in the NSW Government’s AI strategy, with its renewed focus on building public trust in AI systems.
1. Algorithms must be fair
Ofqual’s primary rationale for the algorithm was to ensure a fairer system: it cited previous studies that raised concerns about potential bias in teacher assessments, and sought to keep national results broadly similar to those of previous years. In calculating results in line with previous years, the algorithm did, for the most part, what it was supposed to do. However, in doing so, the model demonstrated a number of forms of algorithmic discrimination. For example:
- Latent bias (algorithmic decision-making based on historical biases or stereotypes in datasets): The algorithm’s reliance on historical data meant that ‘talented outliers’, such as the bright child in a historically lower-achieving school or a school that was rapidly improving, were less likely to achieve grades that reflected their current performance. However well a student was performing in the current year, they could only achieve what the algorithm determined, based on historical data, to be the top mark available to a student at that school. Consequently, students in lower-performing schools potentially had their grades capped by the results of previous years (while students at higher-performing schools benefited from the higher grades achieved by students in previous years).
- Socio-economic bias: The way the algorithm was applied also amplified socio-economic bias by giving more weight (or entire weight) to the teacher-estimated grades (rather than the algorithm-distributed grades) where a school had smaller class sizes. This favoured the higher, ‘teacher-inflated’ estimated grades in schools attended by students from higher socio-economic backgrounds (as these schools tend to have smaller class sizes), reinforcing existing social inequalities.
How can you mitigate the risk of bias?
As the Commonwealth Government notes, an AI system should be inclusive, accessible and not result in unfair discrimination against individuals or groups. Because AI systems rely on data, they have the potential to amplify existing biases in those datasets on a much larger scale than when humans apply that bias.
In order to minimise the risk of bias in AI systems, designers and developers should:
- increase engagement with the system’s user base, to ensure that its objectives, its accuracy and the way it addresses bias are well understood;
- identify potential or actual bias in the way an algorithm works, or in the datasets it is trained on or relies on;
- implement risk mitigations or use fair representation techniques to address that bias; and
- take the time to test the operation and study the outcomes of an algorithm.
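As one way of studying the outcomes of an algorithm, the share of results that were downgraded can be compared across groups before the results are released. The sketch below is a minimal illustration only: the function name `downgrade_rates`, the group labels and the sample records are invented for the example and are not real results data.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple

# A Level grades, best first; a higher index means a lower grade.
GRADE_ORDER = ["A*", "A", "B", "C", "D", "E", "U"]


def downgrade_rates(records: Iterable[Tuple[str, str, str]]) -> Dict[str, float]:
    """Return, per group, the share of results awarded below the teacher estimate.

    Each record is (group, teacher_grade, awarded_grade).
    """
    totals: Dict[str, int] = defaultdict(int)
    downgraded: Dict[str, int] = defaultdict(int)
    for group, teacher, awarded in records:
        totals[group] += 1
        if GRADE_ORDER.index(awarded) > GRADE_ORDER.index(teacher):
            downgraded[group] += 1
    return {group: downgraded[group] / totals[group] for group in totals}


# Invented sample records, for illustration only.
sample = [
    ("state", "A", "B"), ("state", "B", "B"),
    ("state", "B", "C"), ("state", "C", "D"),
    ("private", "A", "A"), ("private", "A", "B"),
    ("private", "B", "B"), ("private", "A", "A"),
]
rates = downgrade_rates(sample)
```

A large gap between groups (here 75% against 25% in the invented sample) does not by itself prove unfair discrimination, but it is the kind of signal that should prompt further investigation before a system goes live.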
2. Algorithms must be explainable
For any AI system, transparency and explainability are crucial to fostering trust in its model. If a user can understand how and why an AI system decides or acts the way it does, the user is more likely to accept it, which in turn influences consumer uptake. The level of explainability required will depend on the type of AI system being deployed and the importance of the decision being made. Explainability is more difficult to achieve for so-called ‘black-box’ or deep learning algorithms, or where a designer or manufacturer is looking to protect trade secrets in the algorithm. However, the Ofqual algorithm was a relatively straightforward mathematical model using public data sets.
Ofqual attempted to offer transparency by publishing technical documentation on the operation of the algorithm. However, data experts have criticised this as too little, too late. The lack of consistency in the application of the algorithm (as set out above in relation to small class sizes) further highlighted transparency issues.
Further, Ofqual, along with other education institutions using algorithms to predict grades, has been criticised for having unclear appeals processes, offering a limited scope of appeal, or creating confusion as to how a student could challenge the grade (for instance, if a student could not understand how their grade was calculated, they would not know which component of the algorithm they should be challenging).
How can you enhance transparency and build effective AI systems?
What is clear from the Ofqual example is that the more important the consequences of the decision being made by an AI system (in this case, influencing the university courses a student is accepted into), the more significant the explanation and transparency requirements should be.
To build effective AI systems, their design and implementation should involve:
- engaging independent experts to test the system and its algorithm, to determine the soundness of its model;
- being able to explain what information the algorithm is processing and how it does so in a manner that users can understand;
- for more complicated machine learning systems, building reporting mechanisms into the AI system to be able to audit the outputs; and
- ensuring the results can be appropriately challenged or scrutinised, with an accessible appeals process involving human review.
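As a rough illustration of what a reporting mechanism might record, the sketch below stores, for each decision, the inputs actually relied on, the outcome and a plain-language reason, so that the decision can later be audited or challenged. The `DecisionRecord` structure, its field names and the sample values are hypothetical and are not drawn from any real system.

```python
import json
from dataclasses import asdict, dataclass


@dataclass(frozen=True)
class DecisionRecord:
    """One automated decision, captured in a form a reviewer can inspect."""
    subject_id: str
    inputs_used: dict        # the data the system actually relied on
    outcome: str
    reason: str              # plain-language explanation of the outcome
    reviewable: bool = True  # whether human review can be requested


def to_audit_line(record: DecisionRecord) -> str:
    """Serialise a record as one JSON line for an append-only audit log."""
    return json.dumps(asdict(record), sort_keys=True)


# Invented example values, for illustration only.
record = DecisionRecord(
    subject_id="S-001",
    inputs_used={"rank_in_cohort": 2, "cohort_size": 10},
    outcome="B",
    reason="Ranked 2nd of 10; only one A was available in the school's "
           "historical distribution.",
)
audit_line = to_audit_line(record)
```

Recording the reason alongside the inputs means a person seeking review can see which component of the decision to challenge, which speaks directly to the appeals difficulty described above.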
Ofqual has since backtracked on its decision to use the algorithm, informing students that their results will be based on the prediction from either their teachers or the algorithm — whichever is higher. The head of Ofqual has also resigned in the wake of the widespread criticism.
While the incident should not deter businesses from leveraging AI decision-making, it nonetheless highlights the importance of considering community expectations in the implementation of AI (that is, just because something ‘can’ be done, ‘should’ it be done?), together with ensuring that certain minimum ethical standards are met.
Authors: Melissa Fai, Jen Bradley and Erin Kirker