Among the many things the pandemic disrupted last year were exams. In the UK, the A-level exams are make-or-break moments for many students, with university admissions conditional on the results. They are the culmination of a student's high school career, influencing their trajectory for years ahead. As most students will confirm, an exam's outcome remains uncertain until that final chance to prove oneself passes and the results arrive. It is a chance to make up for missed assignments and lazy days in one final, important test; or a chance to fail to demonstrate consistent hard work. Either way, it is in the student's hands. At least it was until last year, when the exams were cancelled due to the pandemic and their results were delegated to an algorithm.
Ofqual, the UK government body that regulates exams in England, picked an algorithm that downgraded nearly 40% of students' marks. It robbed them of their last chance to prove themselves that academic year, along with, for some, their places at university. Students from disadvantaged areas were hit particularly hard, likely victims of bias embedded in the algorithm's data, which reflected the historical academic divide between rich and poor. They lost the ability to buck the trend through their efforts in the final stretch of the school year. By definition, those up-and-coming students were outliers; instead they were forced into neat, automated distributions and rankings that ignored their individual effort.
Much of modern life has been shaped to fit rigid machines. Machines have largely improved the human condition, but we also pay a price in the ways we adapt our lives to suit them. That is not by the will of the machines. It is partly by the limits of our own engineering, but more so by the makers and owners of the machines, who use them as an abstraction of their power, with few, if any, checks from others.
Why wasn't an obviously bad choice caught before being applied? Indeed, the minister responsible, Gavin Williamson, was warned, very publicly, of the precise mistakes that relying on the algorithm would produce. One student's parent even predicted the exact error rate prior to deployment. Those mistakes are at least indicative of negligent over-reliance on algorithms, or worse, a scapegoating of the algorithm to perpetuate unfair bias.
Two months after the A-levels fiasco, just before the holiday break in the US, leaders at Stanford Hospital also faced an angry crowd after they allowed an automated system to assign priority for covid-19 vaccines to administrators and doctors working from home ahead of frontline hospital staff. In both the UK and at Stanford, the most documented chant was "fuck the algorithm", a phrase we'll likely hear more and more.
As I said in Part I: The only counter-balance is government intervention, influenced by civic action. That piece reviewed The AI Act (AIA) out of Europe, a significant step by government. As a quick reminder: the AIA is the most significant regulation of AI in the world to date. Its standards will be made precise over the next 3-4 years, and will likely set the bar throughout most of the world via the “Brussels Effect”. For AI, it may feel early; for digital tech, quite late. Either way, it is certainly the result of consistent and persistent calls by civil society to regulate as digital technology becomes more powerful.
Protesting more than just algorithms
The problems are not purely technical; they go to our very agency as humans in the face of fallible decision makers and the systems we live in. Many of the students in the UK who took to the streets in protest weren't even downgraded. Students fail crucial tests every year; they weren't protesting bad marks, they were protesting for their agency to prove themselves.
This lack of agency has built up over time. While information technology has brought a great deal of new economic and political freedom, we have at the same time embedded an assumption (or goal) of user passivity. We have created many products that function as "competitive cognitive artefacts", as dubbed by David Krakauer, President of the Santa Fe Institute: tools that substitute for our thinking rather than support and expand it. It's the difference between GPS worsening our ability to read a map versus improving it. In the name of "user friendliness", we have limited choice and personalization in order to obscure and protect messy, rigid, and fragile back ends.
AI's ability to learn patterns from huge amounts of both structured and unstructured information brings a lot of new flexibility to software. It presents a great opportunity to navigate and make sense of all the information available to us, and to personalize that synthesis to our needs. However, there is a real danger in automating too much of our own critical thought through the design of AI tools: once we begin to trust, we over-trust, losing sight of our values and purposes. We would be ceding still more of our agency to the machines' owners.
With the added flexibility comes additional opacity about how, precisely, the machines work, and that too is problematic. When we are presented with a supposedly omniscient and omnipotent algorithm whose decisions cannot be understood, we have few footholds from which to push back.
The handful of controls we do have are indeed thanks to government requirements and public pressure. However, those controls suffer from the same "user friendly" design principles, buried several layers down in the user settings, and still only marginally change the experience: would you like to see this ad or another one?
Our overall technology landscape has been quietly reducing individual agency in favor of extracting value from stakeholders. We are nudged by platforms and monitored by sensors, with few means of adjusting our experience. The choice is to play the game or opt out entirely.
AI is the opportunity to flip the script
Paradoxically, I think this moment with AI is an opportunity to flip the script, to fit machines to us and reclaim our human agency. AI is not omnipotent or omniscient, but neither is it a rigid machine. AI can learn and adapt to far more nuance, if given the right context. With nuanced objectives, AI can serve highly specialized needs. The explosion of software capabilities thanks to deep learning could lead to an even greater explosion of applications and digital experiences that fit the lives we want to live. There are still labyrinthine back ends for engineers to contend with, but there is no longer the same fragile justification for limiting customization by end users.
Right now those objectives are largely controlled and shaped by very few people, which limits how effective AI can be. It is missing the localized, personal context that could enable it to achieve far more than we already do with software. The promised potential of this new generation of software (and even the previous one) depends on a dynamic collaboration with users. As of now, users have little to no agency in that collaboration.
Already, we are programming our digital experiences with our behavior. If that kind of programming could be made explicit and transparent, we could see a real democratization of software capabilities. More than just a nice term, democratization could enable the fair and just adaptation of software to local contexts in a way rigid, centrally controlled systems never could.
The risks emanating from AI systems can also be seen as an opportunity. When AI picks up on bias in a data set, we have to ask where the bias came from. Viewed pragmatically, the system is merely picking up and making visible the bias already present in our society's institutions. Carelessly automating those institutions is a huge problem and should be prevented, but we shouldn't miss the chance to dig into these deeper problems once they are made visible. Solving the technological risks requires addressing the pre-existing societal problems the tech amplifies. And again, making that change at the scale of AI requires personalized, contextual understanding.
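To make the "surfacing" idea concrete, here is a minimal sketch of using a system's outputs as a mirror rather than a decision maker: tally outcome rates per group and flag the gap for human scrutiny. The `outcome_rates` helper, the groups, and the records are all invented for illustration.

```python
# A hedged sketch: audit decisions by group instead of automating them.
# All data below is made up for illustration.

def outcome_rates(records):
    """Rate of positive outcomes per group, from (group, outcome) pairs."""
    totals, positives = {}, {}
    for group, outcome in records:
        totals[group] = totals.get(group, 0) + 1
        positives[group] = positives.get(group, 0) + (1 if outcome else 0)
    return {g: positives[g] / totals[g] for g in totals}

# Hypothetical downgrade decisions from a grading run, tagged by school type:
decisions = [
    ("state", True), ("state", True), ("state", False),
    ("private", False), ("private", False), ("private", True),
]
print(outcome_rates(decisions))  # a large gap between groups is a prompt
                                 # for human scrutiny, not a verdict
```

The point of a tool like this is not to prove discrimination on its own; it is to make a disparity visible early enough that people can ask where it came from.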
These opportunities, fixing deeper societal problems and applying tech democratically, can only be achieved by regulation, technology, and citizen engagement that is informed by the tech’s dynamics and universal human rights and values. The new AIA regulation from Europe is a chance to achieve this; with it, we can start to rethink and repair our relationship with technology. But we need civil society armed with knowledge of the leverage points in AI technology. They will be needed to keep the pressure on for real and practical change that doesn’t end up throwing out tools that can do a lot of good. More so, they will be needed to build the right systems that understand local contexts and needs. If we play our cards right, we can fix much more than just our relationship with technology.
This is a really hard problem. It is made harder by the riddle of how we can have agency over machines whose decisions we cannot understand, and how bad actors can be kept from abusing the tech. It is unsurprising to hear growing calls to burn it all down, followed by the majority compromising in favor of convenience over rights. The public first needs to be shown that good governance is possible; that we can prevent gross harms from AI and even build better tech applications. Let's look at how the AIA could at least catch and punish bad design of mainstream1 algorithmic systems, using the UK A-levels example.
Can regulation actually catch bad algorithms?
In a court of law, it is challenging to prove intent. We cannot be inside someone's head to know exactly what they were thinking. We can try to infer it from circumstances and behaviors, but cognitive dissonance suggests that even they cannot be sure of what exactly drove them. With AI, there is no intent to prove. We can only pragmatically look at inputs and outcomes, scrutinizing the system the algorithm sits within and all its possible influences, in order to root out the problem.
The basic premise of a learning algorithm is that it learns patterns from data in order to make predictions. The algorithm may also be linked to a decision-making system to achieve a larger outcome. As humans, we determine the objectives of the system, which then dictate whether our data of choice is fit for purpose and whether the overall system achieves the outcome we want. Not only can the design of the system come with many implicit assumptions that affect the outcome, but algorithms are also "designed" by the data they are fed, which may come with biases baked in. When those systems are also actively learning from their use, from data in their living environment, it becomes very hard to account for all of the inputs.
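As a toy illustration of that premise, here is a deliberately crude, hypothetical learner: it "learns" one average per label from example data and predicts by proximity. Every name and number is invented; the point is only that the model is entirely a product of whatever data we feed it.

```python
# A toy "learning algorithm": learn a pattern (a mean per label) from data,
# then use it to make predictions. All example data is invented.

def train(examples):
    """Learn one mean feature value per label from (feature, label) pairs."""
    sums, counts = {}, {}
    for x, label in examples:
        sums[label] = sums.get(label, 0.0) + x
        counts[label] = counts.get(label, 0) + 1
    return {label: sums[label] / counts[label] for label in sums}

def predict(model, x):
    """Predict the label whose learned mean is closest to x."""
    return min(model, key=lambda label: abs(model[label] - x))

# Whatever skew exists in the training data becomes the learned "pattern":
data = [(85, "pass"), (80, "pass"), (45, "fail"), (50, "fail")]
model = train(data)
print(predict(model, 70))  # classified purely by proximity to learned means
```

If the historical examples systematically under-mark one group, this model will faithfully reproduce that skew; nothing in the code can tell the difference between a pattern and a prejudice.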
What the AIA does is impose objectives on certain types of systems, so that there are universal principles for determining a bad AI system. It explicitly calls out exam scoring as a "high-risk" application that must be subject to strict obligations. Ofqual failed those obligations miserably.
The algorithm in the UK was not a sophisticated one; in fact it was not even a learning algorithm, but rather a set of rules for fitting a statistical distribution. Data from the previous three years of cohorts at each school was used to create a distribution of grades, like what you would see on a bell curve. Teachers were asked to suggest a grade for each student, along with a rank order of the whole class, with no ties allowed. The rankings were then used to fit the grades onto the bell curve, adjusting students' grades as needed. The simplicity of the algorithm makes it easier to analyze, but a more sophisticated algorithm could just as easily be swapped in.
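To see how mechanical this kind of standardisation is, here is a minimal sketch of rank-based grade assignment. It is a hypothetical simplification, not Ofqual's actual code: the `assign_grades` function, class size, and grade proportions are all invented, and the teachers' suggested grades are omitted entirely to highlight how dominant rank and the historical curve are in such a design.

```python
# A hypothetical, simplified sketch of rank-based grade standardisation,
# loosely modelled on the reported approach. All numbers are illustrative.

def assign_grades(ranked_students, historical_distribution):
    """Fit a teacher-ranked class onto a fixed historical grade distribution.

    ranked_students: list of names, best first (no ties allowed).
    historical_distribution: {grade: fraction of past cohorts}, summing to 1.
    """
    n = len(ranked_students)
    results = {}
    cutoff = 0.0
    position = 0
    for grade, fraction in historical_distribution.items():
        cutoff += fraction
        # Everyone whose rank falls inside this slice of the curve gets this
        # grade, regardless of their individual performance this year.
        while position < n and (position + 1) / n <= cutoff + 1e-9:
            results[ranked_students[position]] = grade
            position += 1
    # Any remainder from rounding lands in the last grade band.
    for name in ranked_students[position:]:
        results[name] = grade
    return results

# A class of 10, ranked by their teacher; the school's past results
# allowed only 20% A, 40% B, 30% C, 10% D.
ranking = ["s%d" % i for i in range(1, 11)]
history = {"A": 0.2, "B": 0.4, "C": 0.3, "D": 0.1}
print(assign_grades(ranking, history))
```

Note what falls out of the design: a student's grade is fixed by their rank and their school's history. No amount of individual effort this year can move the curve itself.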
The 7 principles of trustworthy AI make up the objectives imposed by the AIA: Human Agency and Oversight; Technical Robustness and Safety; Privacy and Data Governance; Transparency; Diversity, Non-discrimination and Fairness; Environmental and Societal Well-being; and Accountability. These principles were themselves derived from technical principles and human rights, and the obligations below are all designed to ensure the principles make their way into the tech.
High quality of the datasets to minimise risks and discriminatory outcomes.
The data used was not fit for purpose. Students were subjected to the performance of their peers in previous cohorts. In just three years a school can turn its performance around, but these students were forced into a "flat" year. The data ignored individual achievements, including teachers' own predictions of performance. With no ties allowed in the rankings, teachers frequently had to make arbitrary choices about who was "better", opening the door to bias and unfair outcomes. The use of the bell curve itself forced discriminatory outcomes.
High level of robustness, security and accuracy.
Teacher rankings had never been recorded before, so their inputs could not be validated historically, nor could the broader system be checked to determine whether its outcomes were reasonable. With such low-fidelity data and no historical precedent, there could be no robust test of accuracy.
Clear and adequate information to the user.
As with most systems, there are many classes of user. Ofqual did not publish the details of the algorithm until after the results were public. That means teachers were not informed of how their rankings would be used, and students, schools, and society as a whole were left in the dark about how this new system would work. The outcome was that many of the results were tossed out by universities when it became clear the system was flawed, not to mention the heavy protests.
Detailed documentation of the system and its purpose for authorities to assess its compliance.
It's not clear that administrators and policy makers were informed ahead of time either. The lack of an apparent appeal mechanism, and the clear discriminatory impact, potentially even broke UK law. Ignoring individual circumstances outside the statistical distribution also failed Ofqual's own mandate of providing a grading system that gives a "reliable indication of the knowledge, skills and understanding of the student."
Adequate risk assessment and mitigation systems.
Interestingly, the Royal Statistical Society (RSS) offered to support Ofqual in developing a fair algorithm, but withdrew the offer when required to sign an NDA that would keep the system secret. The RSS was careful to call this out publicly, and later noted Ofqual's missed opportunities to account for different types of uncertainty, like how to rank middle-ground students or how to address variability within the three-year history. As for mitigation, that was only an afterthought in response to public outrage.
Appropriate human oversight measures to minimise risk.
About ten days before the Ofqual fiasco, a similar algorithmic approach had been applied in Scotland by its own education authority. The results were clear: the algorithm downgraded a large portion of students, to public outrage. England announced it would allow schools (not students) to appeal certain results. Beyond that, however, nothing else appears to have been done to address the actual results of the algorithm and prevent further outrage.
Logging of activity to ensure traceability of results.
At least in this case, the algorithm was unsophisticated (to a fault), making it easy to identify where and how things went wrong. These requirements are useful and are robust enough to evolve with more sophisticated algorithms, but this last requirement becomes more crucial as systems become more embedded at scale and learn from a much greater number of inputs. Ideally, the other requirements prevent a flawed system from even being deployed so that logging activity is not the only safeguard.
These obligations serve more to prevent the 7 principles from being violated than to actively promote them. In short, it is not really about "bad algorithms"; it is about irresponsible uses of them. The math itself wasn't necessarily bad: the UK algorithm was part of a very badly designed system, and there were other tools, human and computational, that could have achieved an outcome better fit to society's objectives. Actively supporting and promoting the principles requires a more proactive approach.
Flipping the script
I’ll start by saying I am not an education expert, and the real opportunities will be best identified by those well versed in the space. My aim here is to show how we might take advantage of the dynamics of the technology to fit it to society, and use its risks as an opportunity to ask deeper questions of our institutions and social contract. This is what I mean by flipping the script.
Deploying systems like the UK algorithm is not only getting easier; the systems themselves are able to incorporate more data and handle more complexity. On its face this can seem like a good thing: more accurate algorithms mean fewer problems. But if we maintain a lack of critical thought, we will only further entrench our society's ills.
The opportunities of fitting tech to society and of questioning the problems in our society are linked and constrain each other. We should take the adaptability of these tools as a means to open up our systems to more input from more stakeholders. Most of the systems we use were designed as compromises between our goals and our technical capabilities. Exams feed algorithmic systems that assign people to roles and opportunities and support social planning, and those systems almost by definition remove agency. That isn't all bad: the paradox of choice is real, and mechanisms for narrowing down options are important for social cohesion at scale. The compromise has been that, at the social-planning level, decision makers could not incorporate the additional context of course grades, recommendation letters, extracurriculars, and personal statements to build a nuanced picture of each individual. AI software could start to incorporate those kinds of information, but that doesn't mean we should simply plug it into an algorithmic system designed around the old limits.
Another possibility is to use the tools as a starting point, to model our current systems but stop short of automating them. Much of the criticism levelled at algorithms is for their amplification of bias, but if they are cut off from automated decisions they can become ways of surfacing issues that have been hard to show clearly.
Running exams during a pandemic was an unprecedented situation with no easy solutions. The result now serves as a model of the opacity of decision making inside a government department. Increasing involvement in the process of finding new solutions could have gone a long way toward preventing the fallout. An open process would have created engaged, bought-in students receiving (and believing in) fairer results. It would also have given them a lesson in how to participate and advocate for themselves, especially in an algorithmically modified world.2 Engaging teachers and institutions like the RSS in an open process is how we can use this opportunity to fix the problems that go deeper than the tech, without throwing out useful options along the way.
With aligned goals and expert input, we might be able to make better use of the tools available. Both administrators and teachers should want to make better decisions with good data and greater nuance.
Fixing the deeper problems
These conversations can happen before or after the harm is done. Ideally before, but human nature being what it is, they often won't. Important red lines have been drawn, such as for autonomous weapons and the manipulation of young people, but the new governance standards in development won't be perfect from the outset. What matters is that we are ready to learn the lessons surfaced by these moments, to call out irresponsible uses sooner, and to use the resulting willingness to make deeper fixes. This experience should further open up discussion about where we have agency, and whether automated decisions like this should even be legal. Gavin Williamson seems at least to have figured out not to repeat algorithmic decision making this summer, opting instead for teacher assessments, teacher training to make those assessments fairer, and a "robust" appeals process. Still, many of the problems remain under stress from covid.
My hope is that we don’t completely throw out algorithmic tools. Their misuse is killing optimism. What is missing is an orientation around human-machine collaboration as opposed to automation. A combination of diverse models, strong governance guidelines, and engaged debate could achieve much more. The key ingredient is human agency, and the deeper problem to fix is the reduction of agency we have designed into our machines.
The governance framework is coming into place to put up guardrails. The last player to get on board is the one that has been really holding back the power of tech: industry itself. Continued in Part III….
1. Defending against bad actors requires its own piece, but I like to think that looking at the mainstream covers a good chunk of the problem. It's like the difference between looking at the nations that tolerate the worst-polluting manufacturing processes versus the nations stimulating all the demand for what's being manufactured.
2. These discussions are happening now with the AI Act, but getting the public involved is crucial to making that deeper fix. If you're interested, the National Institute of Standards and Technology (NIST), part of the U.S. Department of Commerce, is looking for public comment, and in Canada you can apply to join the International Organization for Standardization's group on AI for Canada here.