debiasSING: finding and fixing biases in the human–AI complex
Name of Recipient | Prof Lorenz Goette
Project Title | debiasSING: finding and fixing biases in the human–AI complex
Project Status | Ongoing
Year Awarded | 2025
Type of Grant | Social Science & Humanities Research Thematic Grant
Funding Type | B
This project aims to address biases in AI and in its interaction with potentially biased human users, which together constitute an urgent and dynamic policy challenge.
The project is structured into three work packages to conceptualise, measure, and counteract these biases.
- In the concepts work package, the team will develop the theoretical backdrop for identifying potential biases in human–AI collaboration, examining two broad categories of bias: cognitive bias and moral bias. The team will define the relevant biases, chart possible sources of bias, and develop experimental designs to identify and mitigate biases in both AI and humans.
- In the measurement work package, the team will implement these concepts, using techniques that generate large numbers of test cases algorithmically and deploying them in large-scale digital experiments with Large Language Models (LLMs) and human participants (a sketch of such a test-case generator follows this list). The team will also characterise how humans, who bring their own biases to the table, use AI and how this affects workplace interactions, and will extend the analysis to newer models, including Large Reasoning Models (LRMs).
- In the debiasing work package, the team will develop policy and interventional tools to restore trust or vigilance in interactions with AI. To debias AI, the team will focus on LLMs and develop an explainability framework for detecting and mitigating the behavioural biases uncovered in the research. The team will analyse the models’ internal activation patterns and map these signatures across different layers and neurons, allowing it to pinpoint “hotspots” in the network that are most responsible for a given bias (see the activation-scan sketch after this list). Once these regions are isolated, targeted interventions can be applied to them. The team will then assess whether the debiased LLMs lead to better decision making by humans and develop interventions aimed at correcting human perceptions of AI.
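As an illustration of the algorithmic test-case generation described in the measurement work package, the following is a minimal Python sketch, not the project's actual pipeline: the anchoring-bias template, the item and anchor grids, and the query_model stub are all hypothetical placeholders for a real LLM harness.

```python
import itertools
import random

# Illustrative sketch of algorithmic test-case generation for an anchoring-bias
# probe: a prompt template is instantiated over a parameter grid, yielding
# matched prompt pairs that differ only in an explicitly irrelevant anchor.

TEMPLATE = (
    "A second-hand {item} was first listed at ${anchor}. "
    "Ignoring the listing price, what is a fair market value in dollars? "
    "Answer with a number only."
)

ITEMS = ["laptop", "bicycle", "camera"]          # hypothetical grid
ANCHOR_PAIRS = [(50, 5000)]                      # (low anchor, high anchor)

def generate_cases():
    """Expand the template grid into matched low/high-anchor prompt pairs."""
    for item, (low, high) in itertools.product(ITEMS, ANCHOR_PAIRS):
        yield {
            "item": item,
            "low": TEMPLATE.format(item=item, anchor=low),
            "high": TEMPLATE.format(item=item, anchor=high),
        }

def query_model(prompt: str) -> float:
    """Stand-in for a real LLM call (API client or local model).
    Returns a noisy placeholder so the script runs end to end."""
    return random.uniform(100, 1000)

if __name__ == "__main__":
    for case in generate_cases():
        low_est = query_model(case["low"])
        high_est = query_model(case["high"])
        # An anchoring effect would show up as a systematic gap between the
        # two estimates, even though the anchor is explicitly irrelevant.
        print(f"{case['item']}: low-anchor {low_est:.0f}, "
              f"high-anchor {high_est:.0f}, gap {high_est - low_est:+.0f}")
```

In a real deployment, query_model would call the LLM under test, and the per-case gaps would be aggregated across many templates and parameter grids into an overall bias estimate.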
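Similarly, the activation analysis in the debiasing work package might begin with a layer-by-neuron scan along the lines of the sketch below. This is a hypothetical illustration using the Hugging Face transformers library: gpt2 stands in for whatever model the project studies, and the contrastive prompt pair and top-k hotspot criterion are assumptions, not the team's actual method.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Hypothetical sketch: locate "hotspot" neurons whose activations differ most
# between a contrastive prompt pair (one likely to trigger the bias under
# study, one neutral). gpt2 is a small stand-in model.

model_name = "gpt2"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name, output_hidden_states=True)
model.eval()

def mean_activations(prompt: str) -> torch.Tensor:
    """Return per-layer, per-neuron activations averaged over token positions."""
    inputs = tokenizer(prompt, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # hidden_states: one (batch, seq_len, hidden) tensor per layer
    return torch.stack([h.mean(dim=1).squeeze(0) for h in outputs.hidden_states])

# Illustrative contrastive pair for a gender-association probe
biased = mean_activations("The nurse said she would check the chart.")
neutral = mean_activations("The nurse said they would check the chart.")

# Difference map across layers x neurons; large entries mark candidate hotspots.
diff = (biased - neutral).abs()
top = torch.topk(diff.flatten(), k=10).indices
for idx in top:
    layer, neuron = divmod(idx.item(), diff.shape[1])
    print(f"layer {layer:2d}, neuron {neuron:4d}: "
          f"|delta| = {diff[layer, neuron].item():.4f}")
```

A targeted intervention could then, for example, zero or patch the top-ranked activations during generation and re-measure the bias, testing whether the candidate hotspot is causally responsible.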