“We shape our tools and then our tools shape us.” -Marshall McLuhan/Father John Culkin
My work often looks at the technological affordances of our tools and what they enable or make impossible, especially as these elements relate to ethical considerations. Lately I’ve been thinking about the tools that shape the work of data scientists. What are the tools used in data science and how might those tools have ethical impacts? That was the question that led me down the internet rabbit hole of Jupyter Notebooks.
For those of us who are not data scientists, Jupyter Notebooks are a “free, open-source, interactive web tool known as a computational notebook, which researchers can use to combine software code, computational output, explanatory text and multimedia resources in a single document.” (Nature, 2018) According to Wikipedia, between 2015 and 2021 there was an explosion of Jupyter Notebooks on GitHub, going from 200,000 (2015) to 2.5 million (2018) to over 10 million by 2021. This tracks with the rise of data science itself. However, Jupyter Notebooks might also have become the go-to choice for data science because the tool is open source and has been adopted by various cloud computing companies as a frontend interface. It’s also relatively easy to use, making it popular in various 'learn data science' tutorials and classes. It has become the de facto standard in data science.
A history of notebooks
A DataCamp article chronicles the history of notebooks in data science, which dates back to the 1980s. Products like Wolfram Mathematica and Maple pioneered the notebook format but lacked widespread adoption due to their cost. As the demand for tools to support data science grew in the 1990s and early 2000s, open source solutions began to emerge, such as IPython (command shell for interactive computing), SciPy (library for scientific calculations) and Matplotlib (visualization). In 2005, these discrete functions came together in a single tool called SageMath, the open source challenger to Mathematica and Maple. A few years later, in 2011, the first IPython notebook was released, and in 2014 Jupyter Notebook was spun out of IPython.
What work does a Jupyter Notebook support?
Fernando Perez, who was behind Project Jupyter, shared that while working on his graduate thesis he wanted to use programming languages, such as Python, in an interactive discovery process and needed tools to support this exploration. That, and a slight desire to procrastinate on finishing his PhD studies (which he did eventually complete!), led to his focusing on the creation of this tool.
Notebooks support data science work with a toolset for exploration and prototyping. They enable a form of interactive computing that provides the ability to run code. Aside from being easy to use for beginners and opening up access to data science, some of the other areas where they shine include:
Collaboration: This was a key driver for Perez who wanted a tool that would enable scientists to easily share their work with everything “all in one place”.
Transparency: Related to collaboration, since it is so easily shareable, it's a simple way to document and publish work.
Flexibility: Like an actual notebook, a Jupyter Notebook is flexible, allowing it to accommodate a number of different functions. It’s also easy to customize.
Yet, the features that make Jupyter Notebooks so useful for data science research and exploration are not necessarily what is needed when it comes to scaling code into a production environment.
Hidden States, Bad Code and Technical Debt
The decision to scale a process is an ethical inflection point. While notebooks are useful for research experimentation, their design has downsides when it comes to producing good-quality, production-level code. Joel Grus is an author, software engineer and data scientist whose infamous presentation “I don’t like notebooks” explains the problems. To paraphrase a few of the issues which are also described in this blog post:
Hidden States: Cells in a notebook can be run out of order (not top to bottom), and the values they create stay in memory between runs. As a result, the state a notebook is actually in may not match what the code on the page suggests. This makes it hard to troubleshoot and reason through the code, and hard for people to correct problems in it.
Replicability: Reproducing another person’s results is difficult without reproducing their entire environment. Jupyter Notebooks aren’t designed to be modular. They don’t easily support robust, modular and repeatable code, though they can be awkwardly hacked to do this.
Bad Code: Grus believes that the bad tutorials that Jupyter enables lead to bad code and to beginners thinking they understand code when they really don’t.
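To make the hidden state problem concrete, here is a minimal sketch in plain Python. It is not taken from Grus's talk. The three "cells" and their names (`load`, `scale`, `report`) are made up for illustration, and a shared dictionary stands in for the memory a real notebook session keeps between cell runs.

```python
# Each string stands in for one notebook cell. The cell names and contents
# are hypothetical, chosen only to illustrate the problem.
cells = {
    "load": "prices = [10, 20, 30]",
    "scale": "prices = [p * 2 for p in prices]",
    "report": "total = sum(prices)",
}


def run_cells(order):
    """Run the named cells in the given order and return the final total."""
    namespace = {}  # plays the role of the notebook's in-memory state
    for name in order:
        exec(cells[name], namespace)
    return namespace["total"]


# Running top to bottom, once each, as a reader of the notebook would assume.
top_to_bottom = run_cells(["load", "scale", "report"])

# Same cells, but the "scale" cell was accidentally run twice.
scale_rerun = run_cells(["load", "scale", "scale", "report"])

print(top_to_bottom)  # 120
print(scale_rerun)    # 240
```

The code on the page is identical in both cases, yet the answer differs depending on which cells were run and how often. Nothing in the saved notebook makes that history obvious to the next person who opens it.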
Essentially, Jupyter Notebooks' design affordances - the things that make them great for data science exploration - make them unsuitable for supporting software engineering best practices for production code. This is exacerbated by a culture within data science that doesn’t put emphasis on things like unit testing or standardized annotation. All of this results in code that is highly susceptible to errors and a lack of quality.
Grus is not alone. In a blog post entitled “Jupyter Notebook is the cancer of ML Engineering”, Or Hiltch writes about how notebook code, written as an easy way to craft a data story for research, leads to a lack of rigour and standards. As he explains:
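For readers unfamiliar with unit testing, the sketch below shows roughly what that practice looks like. It uses Python's built-in `unittest` module. The `normalize` function is a hypothetical example of the kind of logic that often lives in a notebook cell, not code from any of the sources cited here.

```python
import unittest


def normalize(values):
    """Scale a list of numbers linearly into the range 0 to 1."""
    lowest, highest = min(values), max(values)
    if lowest == highest:
        # Every value is identical, so avoid dividing by zero.
        return [0.0 for _ in values]
    return [(v - lowest) / (highest - lowest) for v in values]


class NormalizeTest(unittest.TestCase):
    def test_scales_to_unit_range(self):
        self.assertEqual(normalize([2, 4, 6]), [0.0, 0.5, 1.0])

    def test_constant_input(self):
        self.assertEqual(normalize([3, 3]), [0.0, 0.0])


# Load and run the tests explicitly so the example also works outside a
# command-line test runner.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTest)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

Because the logic sits in a named function rather than loose in a cell, it can be checked automatically every time the code changes, including for edge cases like a list of identical values.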
“The problems begin when this story needs to interact with a production application. The fun & easy platform used by the data scientist needs to be integrated into a production-grade data pipeline. That’s where nearly all of the benefits of Jupyter become drawbacks, turning the life of the ML Engineer into a living hell.”
And…
“A production-grade ML pipeline needs to be composed out of debuggable, reproducible, easy-to-deploy, high-performance code components that could be orchestrated and scheduled. In its default version, it’s everything research code on Jupyter isn’t.”
All of this can add up to technical debt, particularly if organizations do not have ML engineers essentially redoing the work in ways that support appropriate production level coding practices.
Fixing Notebooks? Or moving on.
Notebooks are not development environments. They were not designed to be development environments. That was not the objective Fernando Perez had in mind. However, through mass adoption by the data science community, coupled with the imperative to move data science projects from the world of research into commercializable products, we see how the mismatch between tool sets arises. This in turn can lead to technical issues such as poor-quality, unreliable, unstable and unverifiable code, which in turn can have ethical implications for building responsible AI.
There are ways to address these issues. Hiltch notes PyCharm and VS Code as technical solutions for fixing notebooks. There are tutorials on how to convert notebooks to make them suitable for testing. Others, such as Eric Kahuha (and Grus), make the case to leave notebooks altogether. Irrespective of the fix, from a sociological perspective, it’s interesting to examine the work cultures of data science and software development and to consider how this incongruence in mindsets and toolsets can impact the ability to support responsible development for AI systems.
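As a rough illustration of what converting a notebook can involve, the sketch below pulls the code cells out of a notebook and joins them into an ordinary Python script. It relies on the fact that a notebook file is stored as JSON with a list of cells. The function name `notebook_to_script` and the tiny example notebook are assumptions made for this illustration, not the method used in the tutorials mentioned above. Jupyter's own nbconvert tool offers a script export that serves a similar purpose.

```python
import json


def notebook_to_script(notebook_json):
    """Join the source of every code cell into one Python script string."""
    notebook = json.loads(notebook_json)
    chunks = []
    for cell in notebook.get("cells", []):
        if cell.get("cell_type") != "code":
            continue  # skip markdown and raw cells
        source = cell.get("source", "")
        # The notebook format stores source as a string or a list of lines.
        if isinstance(source, list):
            source = "".join(source)
        chunks.append(source.rstrip())
    return "\n\n".join(chunks) + "\n"


# A tiny, made-up notebook with one markdown cell and two code cells.
example = json.dumps({
    "nbformat": 4,
    "nbformat_minor": 5,
    "metadata": {},
    "cells": [
        {"cell_type": "markdown", "metadata": {}, "source": ["# Demo"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 1,
         "source": ["def double(x):\n", "    return x * 2\n"]},
        {"cell_type": "code", "metadata": {}, "outputs": [],
         "execution_count": 2,
         "source": "answer = double(21)"},
    ],
})

script = notebook_to_script(example)
print(script)
```

The resulting script always runs top to bottom, which removes the out-of-order problem, and it can be imported, tested and version controlled like any other code file.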
By Katrina Ingram, CEO, Ethically Aligned AI
________
Ethically Aligned AI is a social enterprise aimed at helping organizations make better choices about designing and deploying technology. Find out more at ethicallyalignedai.com
© 2023 Ethically Aligned AI Inc. All rights reserved.