Image generated by DALL·E 3
Data scientists are in an interesting position: the job requires programming, yet it also demands constant attention to the business side. That’s why the Python code data scientists write usually tells a story about how a business problem is solved. The working environment is distinctive too; we often use Jupyter Notebook, which makes it easy to experiment with data manipulation and model development.
Because the way we code is different, data scientists also approach some programming habits differently. One of them is commenting, the practice of explaining your code. For data scientists, who face constantly changing requirements and work collaboratively, adequate comments are crucial.
This article discusses how to comment Python code as a data scientist. We will go through several points that improve your comments and bring value to anyone who reads your code. Let’s get into it.
Before we go further, let’s look at the two types of commenting. The first is the single-line comment, which starts with the ‘#’ character. It’s usually used for a short explanation of the code. For example, the code below uses a single-line comment.
# The code is to import the Pandas package and call it pd
import pandas as pd
The other way to comment is the multi-line method, which uses triple quotes. Technically, these are not comments but string literals; when they are not assigned to a variable, Python evaluates them and discards the result, so they behave like comments. We can see them in action in the following example.
"""
The code below imports the Pandas package, which we refer to as pd throughout the working environment.
"""
import pandas as pd
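To see that a triple-quoted string really is a string object rather than a comment, here is a minimal sketch. The function name `load_message` is hypothetical and used only for illustration. When such a string is the first statement of a function, Python keeps it as the function's docstring; anywhere else it is evaluated and thrown away.

```python
def load_message():
    """Return a short greeting used for illustration."""
    # A single-line comment: ignored by Python and never stored at runtime.
    """
    A triple-quoted string that is not the first statement:
    Python evaluates it as an expression and discards the result.
    """
    return "hello"


# The first string literal in the function body is kept as its docstring.
docstring = load_message.__doc__
print(docstring)
```

This is why triple-quoted strings are usually reserved for docstrings, while `#` is used for ordinary comments.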
In this section, we discuss some general tips for commenting. They are not specific to data scientists, as they are best practices for any programmer, but they are good to remember. The tips are:
- Place the comment on a separate line directly above the code it explains to improve readability.
- Keep the commenting style consistent throughout the code you are working on.
- Avoid hard-to-understand jargon and technical terms if you know the audience will not understand them.
- Comment only when it adds value; avoid explaining something that is obvious.
- Update or remove comments that are no longer relevant.
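As a rough illustration of the tips above, the sketch below contrasts a comment that only restates the code with one that adds value. The variable names, the sample scores, and the 0.8 cut-off are all hypothetical.

```python
scores = [0.91, 0.42, 0.87, 0.65]  # hypothetical model confidence scores

# Less helpful: the comment only repeats what the code already says.
# Filter scores greater than 0.8
filtered = [s for s in scores if s > 0.8]

# More helpful: the comment sits above the code and explains the reason.
# Keep only predictions confident enough to show to end users
# (0.8 is an assumed cut-off for this example).
CONFIDENCE_THRESHOLD = 0.8
confident_scores = [s for s in scores if s > CONFIDENCE_THRESHOLD]

print(confident_scores)
```

Both list comprehensions produce the same result; only the second comment tells the reader something the code does not.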
These are general guidelines for better commenting. Now, let’s move on to tips that are more specific to data scientists.
For a data scientist, coding is different from that of a software engineer or web developer, so commenting differs as well. Here are some tips that are specific to us data scientists.
1. Use Comments to Clarify Complex Processes or Activities
Data science work involves many experimental steps that might confuse readers, or our future selves, if we don’t explain them. Comments help us convey the intention, especially when many steps are involved. For example, the code below shows how we normalize the data with Min-Max scaling and then remove outliers.
import numpy as np
from scipy import stats

# data is assumed to be a 1-D NumPy array defined earlier
# Perform data normalization (Min-Max scaling)
normalized_data = (data - np.min(data)) / (np.max(data) - np.min(data))
# Remove outliers by using the three-sigma rule (keep values within 3 standard deviations)
removed_outlier_data = normalized_data[np.abs(stats.zscore(normalized_data)) < 3]
The comments above explain what each step does and the concept behind it. Naming the concepts used in the code is essential for understanding what was done.
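For readers who want to run the two steps end to end, here is a minimal sketch that uses only the standard library instead of NumPy and SciPy. The sample values are made up for illustration, and the z-score uses the population standard deviation, which matches the default of `scipy.stats.zscore`.

```python
import statistics

# Made-up sample: twenty ordinary readings plus one extreme value.
data = [10.0, 11.0, 9.0, 10.5, 9.5] * 4 + [100.0]

# Perform data normalization (Min-Max scaling to the range [0, 1])
low, high = min(data), max(data)
normalized_data = [(x - low) / (high - low) for x in data]

# Compute z-scores with the population standard deviation
mean = statistics.fmean(normalized_data)
std = statistics.pstdev(normalized_data)
z_scores = [(x - mean) / std for x in normalized_data]

# Remove outliers with the three-sigma rule (keep values with |z| < 3)
removed_outlier_data = [x for x, z in zip(normalized_data, z_scores) if abs(z) < 3]

print(len(data), len(removed_outlier_data))
```

Here the extreme value is the only point dropped, so the cleaned list has one element fewer than the input.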
Comments like these are not limited to preprocessing; they belong in every data science step. From data retrieval to model monitoring, it is good practice to comment so that anybody can understand the work. Remember that for a data scientist, comments can become the bridge between the code and the analytical insight.
2. Have a Commenting Standard
Data science is a collaborative process, so it helps to have a standard structure that everyone understands. It’s helpful even if you work solo, as you will always know what to expect from your own comments. For example, you could standardize the comment for every function you write.
# Function: name of the function
# Usage: description of how to use the function
# Parameters: list the parameters and explain them
# Output: explain the output
The above is only one example of a standard; you can create your own. Once you have a standard like this, don’t forget to use the same style, language, and abbreviations throughout.
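As one possible application of this template, the sketch below documents a small helper. The function `fill_missing_with_median` and its behavior are hypothetical, chosen only to illustrate the layout.

```python
import statistics

# Function: fill_missing_with_median
# Usage: call it on a list of numbers in which missing entries are None
# Parameters: values (list of float or None), the raw measurements
# Output: a new list where every None is replaced by the median of the
#         non-missing values
def fill_missing_with_median(values):
    observed = [v for v in values if v is not None]
    median = statistics.median(observed)
    return [median if v is None else v for v in values]


filled = fill_missing_with_median([3.0, None, 1.0, 2.0])
print(filled)
```

The same information could also live in a docstring; what matters is that every function in the project follows one layout.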
3. Use Comments to Help the Workflow
In a collaborative environment, comments are essential to help the team understand the workflow. We can use them to flag new code updates or to note what needs to be done next. For example, suppose an update in another function causes bugs in our process, and we need to fix them next.
# TODO: Fix this function ASAP
some_function_to_fix()
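A TODO tends to be more useful when it says what is wrong and what should happen next. The sketch below is one way to write it; the functions `parse_amount` and `total_sales`, and the bug they describe, are hypothetical.

```python
def parse_amount(raw):
    # Hypothetical upstream helper that recently started returning strings
    # such as "1,200" instead of numbers.
    return raw


def total_sales(raw_amounts):
    # TODO: parse_amount() now returns strings with thousands separators,
    # which breaks the sum below. Remove this workaround once the helper
    # returns floats again.
    cleaned = [float(str(parse_amount(a)).replace(",", "")) for a in raw_amounts]
    return sum(cleaned)


print(total_sales(["1,200", "300", 50]))
```

A teammate reading this comment knows why the workaround exists and when it can be deleted.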
4. Implement the Markdown Notebook Cells
The data scientist’s IDE is distinctive because we use the notebook for experimentation. With notebook cells, we can isolate each piece of code so that it can be run on its own without rerunning the whole notebook. A notebook cell is not limited to code; it can also be turned into a Markdown cell.
Markdown is a lightweight markup language that describes how text should look. In a Markdown cell, we can further explain the code in the cells below it. The advantage of Markdown is that we can comment in more detail than with standard code comments. You can even add tables, images, LaTeX, and more.
For example, the image below shows how we use Markdown to explain our project, the aim and the steps.
You can read the Jupyter documentation on Markdown cells to learn more about what you can do.
Commenting is an integral part of a data scientist’s work, as it helps the reader understand what happens in the code. For a data scientist, commenting differs slightly from how a software engineer or web developer does it, as our work process is different. That’s why this article gives some tips that you can use for commenting as a data scientist. The tips are:
- Use Comments to Clarify Complex Processes or Activities
- Have a Commenting Standard
- Use Comments to Help the Workflow
- Implement the Markdown Notebook Cells
I hope it helps.
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and Data tips via social media and writing media.