7. The Ethics of Data

Context: So far, we’ve been given data from our collaborators at the IHH and we’ve done our best to help answer their scientific questions. Without much reflection, we could easily conclude that we’ve done “good” in the world; after all, we’ve been doing our best to improve patient care for beings across the galaxy. But is this the limit of our role at the IHH? Or are we also responsible for asking the “bigger questions” and examining our professional role from an ethical perspective?

Challenge: One challenge is that we may not even know what to ask ourselves. It’s therefore important that we work with a diverse team that can help us examine our blind spots, e.g. colleagues with different identities and backgrounds, people affected by our AI/ML systems, and experts on gender and critical race theory, anthropology, philosophy, science and technology studies, history of science, and more.

Outline: We’ll focus on three questions:

  • What’s lost when we represent people as data?

  • Is data neutral?

  • How do we responsibly collect and use data?

All three questions are fundamentally concerned with representation. Data offers us a representation of people; models offer us a representation of people, as well as of our assumptions. Representation is exactly what we’ve focused on so far! And as the readings below will argue, these representations are shaped by implicit and explicit societal values. To act responsibly and ethically, we must therefore interrogate them.

7.1. What’s lost when we represent people as data?

Exercise: Ethics of Representation

Part 1: From the book, Data Feminism, read Chapter 4: “What Gets Counted Counts”.

  • According to the author, what differentiates “data” from “information”?

  • What do we gain, and what do we lose when we turn information into data?

  • What is the “paradox of exposure”?

  • What does the author mean by “counting as healing/accountability”?

Part 2: Read Dehumanized Data Points.

  • What’s the central claim this author makes?

  • Pick one example from Chapter 4 of Data Feminism. In what way(s) would this author argue that the example dehumanized people through data?

Part 3: Read Dehumanization in Medicine: Causes, Solutions, and Functions.

  • In the process of developing ML models for healthcare contexts, are we at risk of dehumanizing patients? Which of the six causes identified by the author apply to us, and why? Are there causes that apply to us that the author doesn’t mention?

  • Read Words, Do No Harm. What are the ways in which dehumanizing language can affect patient care?

  • Is there language we use when talking about ML that’s implicitly dehumanizing?

7.2. Is data neutral?

Exercise: Data Neutrality

Part 1: From the book, Invisible Women: Data Bias in a World Designed for Men, read Chapter 10: The Drugs Don’t Work. Content Warning: The 2nd paragraph of the reading describes invasive, gendered, medical violence—please skip it. Additionally, the language in the book often conflates gender with sex (e.g. “female” with “women”).

Then answer the following questions:

  • What factors lead to the exclusion of female health data?

  • What are the consequences of excluding female participants from medical research?

  • Given what you read, are there any other groups of people you now suspect are excluded from medical research? What makes you suspect this?

  • What can you, as an individual, do to address the problems presented in the chapter?

  • What can we, as a society, do to address the problems presented in the chapter?

Part 2: Read Data is never a raw, truthful input—and it is never neutral.

  • When the author says “data is never neutral,” what do they mean?

  • What are “who questions”? Why does the author advocate for using them?

  • Whose responsibility is it to ask the “who questions”? The person who funds the research? The researcher who collected the data? The data scientist who analyzed it?

7.3. How do we responsibly collect and use data?

Exercise: Responsible and Ethical Practices

Part 1: What are the pillars of responsible and ethical data collection? What practices will set us up for success?

Part 2: What are the pillars of responsible and ethical data usage? What practices will set us up for success?


Acknowledgements. This chapter draws on FASPE 2024.