Social Complexity and Fairness in Synthetic Medical Data
Medical research increasingly relies on big data and powerful computational methods. However, the same methods that make large data sets useful also make it easy to re-identify individuals within them, which is especially problematic for sensitive medical data. One solution is to use machine learning to generate synthetic data from the raw data: a fabricated data set that preserves important statistical properties of the original and can be used for research purposes instead.
While this works well in theory, early results indicate that machine-learning-generated data sets tend to over-represent majority groups and diminish the representation of minorities. Applied to medical data, this means synthetic data sets would likely over-represent 'standard' patients, i.e. white, middle-class, 35-year-old men, despite decades of regulation and research practice aimed at including other patients and bodies in medical research.
This project, centered on a two-year postdoc and conducted as a collaboration between WASP-HS and WASP researchers, will develop fairness metrics to evaluate the production of synthetic data from a specific medical data set, with the hypothesis that intersectionality can contribute to better data. Additionally, we will closely examine existing synthetic medical data to see what lessons social science can draw from it to inform theoretical work on intersectional power dynamics in society.
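The project description does not specify which fairness metrics will be developed. As one illustrative sketch of the general idea, the snippet below compares each subgroup's share of a synthetic data set to its share of the real data, so a ratio below 1 flags under-representation of the kind described above. The function name and the toy data are hypothetical, not part of the project.

```python
from collections import Counter

def representation_ratio(real_labels, synthetic_labels):
    """For each subgroup in the real data, return the ratio of its share
    of the synthetic data to its share of the real data.
    A ratio < 1 indicates the subgroup is under-represented synthetically."""
    real_counts = Counter(real_labels)
    synth_counts = Counter(synthetic_labels)
    n_real = len(real_labels)
    n_synth = len(synthetic_labels)
    return {
        group: (synth_counts.get(group, 0) / n_synth) / (count / n_real)
        for group, count in real_counts.items()
    }

# Toy example: a minority group shrinks from 20% of the real data
# to 10% of the synthetic data.
real = ["majority"] * 80 + ["minority"] * 20
synthetic = ["majority"] * 90 + ["minority"] * 10
ratios = representation_ratio(real, synthetic)
# ratios["minority"] is 0.5: the group's share was halved.
```

Real fairness metrics for synthetic data would go further, e.g. by considering intersections of attributes rather than single group labels, but the basic comparison of subgroup shares follows this pattern.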
Start: 1 January 2023
End: 31 December 2024
fairness metrics, ML, synthetic data, medical data, intersectionality
Universities and institutes
Linköping University
Chalmers University of Technology
Project members
Ericka Johnson
Professor
Linköping University
Francis Lee
Associate Professor
Chalmers University of Technology
Gabriel Eilertsen
Linköping University
Saghi Hajisharif
Linköping University