Countering bias in AI methods in the social sciences

"Countering bias in AI methods in the social sciences" is a collaboration project between Chalmers and the Institute for Analytical Sociology at Linköping University. The project is financed by WASP-HS (The Wallenberg AI, Autonomous Systems and Software Program – Humanities and Society), which offers a graduate school with research visits, partner universities, and visiting lecturers.

We are now seeking one PhD student and one postdoc. The PhD student will be employed at Chalmers and supervised jointly by Richard Johansson, Moa Johansson, and Adel Daoud.

Follow this link to read more and apply for the position at Chalmers. Please apply before February 28, 2023.

Project description

Methods developed in artificial intelligence (AI) and machine learning (ML) are increasingly seen as research tools in the humanities and social sciences. In contrast to the statistical approaches traditionally applied in these fields, AI/ML methods allow researchers to work with more complex models and to integrate unstructured types of data (e.g. text, images, signals) more seamlessly, letting them harness the recent rapid developments in deep learning for such data. However, while these new approaches open up new research avenues, it is critical to understand their applicability and limitations.

In the social sciences, it is important to ground analysis in theoretical constructs and frameworks with explanatory power. To this end, researchers need to distinguish causal relationships from confounding effects. New causal inference methods are being developed that allow researchers to draw causal conclusions from observational data in the absence of randomized trials [Pearl, 2009]. Scholars then need to collect information about potential confounders in order to estimate the causal effect (𝜏) of a treatment on an outcome. Yet, because some confounders are difficult to measure directly, scholars are turning to alternative data sources, such as medical records or policy documents, to measure confounders indirectly (as proxies) [Kino et al., 2021]. This is an example of where AI-based methods make a difference: recent methodological frameworks provide ways of integrating text into causal estimation [Mozer et al., 2020; Roberts et al., 2020; Feder et al., 2021], often relying on deep learning methods for text processing [Veitch et al., 2020]. However, causal inference methods using text-based AI can suffer from biases that lead to incorrect conclusions, potentially calling into question the whole machinery of causal inference.
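As a rough illustration of this setup, the sketch below uses fully simulated data and a simple TF-IDF text representation to adjust for a confounder that is only observed through text when estimating 𝜏. It is not the project's actual pipeline; all data, model choices, and numbers are invented for demonstration.

```python
"""Illustrative sketch only: text as a proxy confounder in an
outcome-regression estimate of a treatment effect (tau).
All data are simulated; this is not the project's actual method."""
import numpy as np
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import Ridge

rng = np.random.default_rng(0)
n = 200

# Simulated corpus: each document loosely reflects a latent confounder
# (e.g. economic distress) that drives both treatment and outcome.
confounder = rng.binomial(1, 0.5, size=n)
docs = [
    "severe fiscal crisis and rising debt" if c else "stable growth and balanced budget"
    for c in confounder
]
treatment = rng.binomial(1, 0.3 + 0.4 * confounder)                 # confounded assignment
outcome = 2.0 * treatment - 3.0 * confounder + rng.normal(size=n)   # true tau = 2.0

# Represent the text; in practice this could be a deep-learning encoder.
X = TfidfVectorizer().fit_transform(docs).toarray()

# Outcome regression: fit E[Y | T, text] and contrast T = 1 vs T = 0.
model = Ridge(alpha=1.0).fit(np.hstack([treatment[:, None], X]), outcome)
tau_hat = (
    model.predict(np.hstack([np.ones((n, 1)), X]))
    - model.predict(np.hstack([np.zeros((n, 1)), X]))
).mean()
print(f"Estimated tau: {tau_hat:.2f} (true effect is 2.0)")
```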

Different types of biases may occur in AI-based methods in the social sciences generally and in causal inference specifically. In previous work, we pointed to an Achilles heel of AI-based causal inference methods that we termed treatment leakage [Daoud et al., 2022]. This problem occurs when text is used as a proxy for confounders, but the text itself is influenced by the treatment. We showed that when this effect is present, text-based causal inference methods break down. Other types of biases are caused by the way AI methods compute representations of text, images, and other unstructured data. For instance, it is well known in the AI community that text representation methods inadvertently encode various demographic factors (e.g. gender, race) [Bolukbasi et al., 2016], so using such AI-based text representations risks introducing unintended factors into a causal analysis. Finally, there are measurement biases that arise when potentially unreliable AI methods are used to predict variables that are not directly observable.
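To make the treatment-leakage problem concrete, here is a toy numerical simulation (entirely invented, not taken from Daoud et al. [2022]): when the proxy used for adjustment is itself contaminated by the treatment, the estimated effect is badly biased, even though the same estimator recovers the true effect with a clean proxy.

```python
"""Toy simulation of treatment leakage (illustrative only).
The setup and coefficients are invented for demonstration."""
import numpy as np
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(1)
n = 5000

confounder = rng.normal(size=n)
treatment = rng.binomial(1, 1 / (1 + np.exp(-confounder)))        # confounded assignment
outcome = 2.0 * treatment + confounder + rng.normal(size=n)       # true tau = 2.0

# "Clean" proxy: reflects only the confounder.
proxy_clean = confounder + 0.1 * rng.normal(size=n)
# "Leaky" proxy: also contaminated by the treatment itself.
proxy_leaky = confounder + 1.5 * treatment + 0.1 * rng.normal(size=n)

for name, proxy in [("clean proxy", proxy_clean), ("leaky proxy", proxy_leaky)]:
    X = np.column_stack([treatment, proxy])
    tau_hat = LinearRegression().fit(X, outcome).coef_[0]
    print(f"{name}: estimated tau = {tau_hat:.2f} (true tau = 2.0)")
```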

This project takes a two-pronged approach: it will simultaneously investigate substantive research questions in the social and political sciences and methodological questions about the pitfalls of applying text-based causal inference methods in these areas. On the substantive side, we will (1) examine the effect of International Monetary Fund (IMF) macroeconomic programs on population health indicators, extending our previous work [Daoud et al., 2019], and (2) investigate changes in support for political parties in opinion polls following a change in party leadership.

On the methodological side, we will investigate how biases such as treatment leakage, representation bias, and measurement bias affect causal estimates. Furthermore, we will develop debiasing methods that researchers in the social sciences can apply to counter these effects when such biases are present, and we will analyse these methods theoretically to understand in which contexts they are applicable. We have already carried out exploratory experiments on removing treatment-leakage bias from text representations: in particular, we explored methods that transform texts by removing passages, as well as adversarial methods that debias a representation [Ganin et al., 2015; Ravfogel et al., 2020].
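As a hint of what such debiasing can look like, the sketch below removes a single linear "treatment direction" from a toy representation by projecting onto its orthogonal complement. This is a simplified, one-step variant in the spirit of the iterative nullspace projection of Ravfogel et al. [2020], not the full method; the data, dimensions, and probe are invented for illustration.

```python
"""Simplified one-step linear concept removal (illustrative sketch)."""
import numpy as np
from sklearn.linear_model import LogisticRegression

def remove_treatment_direction(Z, treatment):
    """Project representations Z onto the subspace orthogonal to the
    direction a linear probe uses to predict the treatment."""
    probe = LogisticRegression(max_iter=1000).fit(Z, treatment)
    w = probe.coef_[0]
    w = w / np.linalg.norm(w)
    return Z - np.outer(Z @ w, w)           # rank-1 nullspace projection

# Toy representations in which one dimension encodes the treatment.
rng = np.random.default_rng(2)
treatment = rng.binomial(1, 0.5, size=500)
Z = rng.normal(size=(500, 16))
Z[:, 0] += 2.0 * treatment                   # leak the treatment into dim 0

Z_debiased = remove_treatment_direction(Z, treatment)
for name, rep in [("before", Z), ("after", Z_debiased)]:
    acc = LogisticRegression(max_iter=1000).fit(rep, treatment).score(rep, treatment)
    print(f"Probe accuracy {name} debiasing: {acc:.2f}")
```

In practice, a single projection rarely removes all of the unwanted signal, which is why adversarial training [Ganin et al., 2015] and iterative projection [Ravfogel et al., 2020] repeat this kind of step until a probe can no longer recover the protected attribute.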