When data knows us better than we know ourselves

A few months ago, I was enticed by a special offer from 23andMe. I'd been on the fence about genetic testing for some time. Like the millions of people who have taken these tests (Regalado, 2019), I was curious about what my genetic profile had to say about me, from both an ancestry and a health perspective. However, I had heard several stories about how this data can be shared with third parties. I was also worried about how the findings might affect my eligibility for insurance or its cost, despite "pledges" from the insurance industry to protect consumers from genetic discrimination (Weeks, 2018).

Ultimately, I didn't go through with the test.

I cancelled my order for the reasons above, but also because, after giving it a lot of thought, I realized I really didn't want to know my genetic health risk profile. It is the kind of information that could cause me needless anxiety. I know some people want to know their risks and even make major life choices based on that information. Angelina Jolie's decision to have a preventative double mastectomy is one of the more famous examples of such a choice (Jolie, 2013).

Fast forward to today. I've just finished Seth Stephens-Davidowitz's book Everybody Lies: Big Data, New Data, and What the Internet Can Tell Us About Who We Really Are. In the final section of the book, entitled "Big Data: Handle With Care", Stephens-Davidowitz unpacks the limitations of large data sets and the ethical concerns surrounding their use. This section really resonated with me. It made me wonder about the pros and cons of how we use information, and about who decides what we should know.

On the one hand, corporations are using our data to shape new AI-enabled products and services. This data might be gathered through online searches (the primary focus of the book), social media posts, crowd-sourced data collection projects like 23andMe, GPS data, Fitbits, or myriad other ways in which we track ourselves or are tracked. In some cases, this data could be used against us in ways that are not obvious. For example, the book cites how liking Harleys and the country music group Lady Antebellum on Facebook correlates with low IQ, and speculates about how that information might be used in an employment decision (Stephens-Davidowitz, p. 261).

On the other hand, what if an AI system gleans information that could be used to intervene and protect someone? Is there an ethical duty to try to help? The example in the book centres on a man who searched for ways to murder someone just weeks before actually killing his ex-girlfriend (Stephens-Davidowitz, p. 266). Closer to home, researchers at my own university are using Twitter data to train AI to recognize depressive language, which might then be used to intervene and offer help to those in need (Folio, 2019). It seems like a good and helpful thing, but as one article points out, it has a downside (Spak, 2019).

Intervening in crimes before they happen is reminiscent of the movie Minority Report and raises a range of ethical concerns about intent versus action. However, if you were the prospective victim, would you want to know? Intervening over a potential health concern feels like a version of the same problem. There seems to be a need to proactively explore the ethical balance we should strike in these situations, particularly as we begin to scale them up with AI systems. What if the data isn't accurate? What about our right to privacy? What about the right not to know? In the case of my genetic health data and 23andMe, I was able to make an active choice. Even with good intentions, how do we ensure that people are given appropriate levels of choice, especially when it comes to information about their safety, health and well-being?


Folio. (2019, October 4). Retrieved from

Jolie, A. (2013, May 14). My medical choice. Retrieved from

Regalado, A. (2019, February 18). More than 26 million people have taken an at-home ancestry test. Retrieved from

Spak, D. (2019, October 8). New mental health AI is a violation of privacy. Retrieved from

Stephens-Davidowitz, S. (2018). Everybody lies: Big data, new data, and what the Internet reveals about who we really are. New York: Dey St./William Morrow.

Weeks, C. (2018, March 21). Canadian insurance industry pens rules on use of genetic test results. Retrieved from

©2019 by Ethically Aligned AI.