Image by Freepik
SQL, or Standard Query Language, is a programming language for managing and manipulating data within a relational database management system (RDBMS). It’s a standard language used in many companies to help businesses access data smoothly. Because it’s widely used, the employment usually cites SQL as one of the necessary skills. That’s why it’s essential to learn about SQL.
One of the common problems for people when learning SQL is understanding the queries, mainly when another person writes them. We would work as a team in the companies and often need to read and understand their SQL queries. So, we need to practice deconstructing the SQL queries and understand them.
This article will walk through a step-by-step process to read and understand SQL queries. How do we do that? Let’s get into it.
The first thing we need to do when encountering an SQL query is to understand the general intention of the SQL queries. The general intention doesn’t mean we know entirely about the query’s structure; it’s more about the overall flows.
We should understand the standard SQL queries to understand the general SQL query. Most SQL queries start with the SELECT clause and follow with the FROM clause. Continuing from there, the queries are most often followed by JOIN, WHERE, GROUP BY, ORDER BY, and HAVING clauses.
The clauses above are standard in the SQL queries we need to understand. For each clause, their functions are:
- SELECT: What columns are to be taken from the table
- FROM: Which table the data came from
- JOIN: Combine tables with the specified identifier
- WHERE: Data filter based on the condition
- GROUP BY: Organize the data based on the column’s value and allow aggregation function to be performed.
- ORDER BY: Arrange the data result order based on the specific column
- HAVING: The filter condition for aggregation function that cannot be specified with WHERE
These are the standard clauses and what you should find in the beginning when understanding the general SQL query structure. Let’s use the example code to learn further.
SELECT
customers.name,
purchases.product,
SUM(price) as total_price
FROM
purchases
JOIN customers ON purchases.cust_id = customers.id
WHERE
purchases.category = 'kitchen'
GROUP BY
customers.name,
purchases.product
HAVING
total_price > 10000
ORDER BY
total_price DESC;
When you look at the query above, try to identify the standard clauses. The clause would provide you with an understanding of what data are selected (SELECT), where it is from (FROM and JOIN), and the condition (WHERE, GROUP BY, ORDER BY, and HAVING).
For example, reading the query above would provide you understanding of the following:
- we try to get three different kinds of data: the Name from a table called customers, the Product from a table called purchases, and the aggregation of price columns that do not identify where the table is from and with the alias total_price (Information from clause SELECT).
- The overall data would come from purchases and customers tables that join together using the cust_id column from purchases and the id column from the customer’s table (Information from clause FROM) and JOIN).
- We would only select the data with the category column value in the purchases table as ‘kitchen’ (Information from clause WHERE),
- group for the aggregation function with the name and product column that came from the respective table (Information from clause GROUP BY),
- filtered as well from the aggregation function result sum where the total_price is more than 10000 (information from clause HAVING), and
- order the data descending according to the total_price (information from clause ORDER BY).
That’s the general SQL query structure you need to know and identify. From there, we can explore further from the advanced query. Let’s get on with the next step.
There would be a time when you would encounter a complex query where so many SELECT clauses exist in the same query. In this case, we should understand the final result of the query or the final (first) SELECT you see in the query. The key is to know what the query output wants to be.
Let’s use a more complex code like the one below.
WITH customerspending AS (
SELECT
customers.id,
SUM(purchases.price) as total_spending
FROM
purchases
JOIN customers ON purchases.cust_id = customers.id
GROUP BY
customers.id
)
SELECT
c.name,
pd.product,
pd.total_product_price,
cs.total_spending
FROM
(
SELECT
purchases.cust_id,
purchases.product,
SUM(purchases.price) as total_product_price
FROM
purchases
WHERE
purchases.category = 'kitchen'
GROUP BY
purchases.cust_id,
purchases.product
HAVING
SUM(purchases.price) > 10000
) AS pd
JOIN customers c ON pd.cust_id = c.id
JOIN customerspending cs ON c.id = cs.id
ORDER BY
pd.total_product_price DESC;
The query seems more complex and longer now, but the initial focus should be on the final SELECT, which seems to try to produce the customer’s total spending and purchase history. Try to assess what the final result wants to be and break it down from there.
We have the insight of the result should be from the queries. The next part is to see what the conditions for the final SELECT are. The conditions clause, including WHERE, GROUP BY, ORDER BY, and HAVING were the ones that controlled the overall data result.
Try to read and understand our query’s conditions, and we will better understand our query’s final result. For example, in our previous SQL query, the final condition is only the ORDER BY. This means the final result would be ordered by the total product price in descending order.
Knowing the final conditions would help you understand a significant part of the query and the overall query intention.
Lastly, we need to understand where the data comes from. After we know about the data to select and the conditions to get them, we need to understand the source. The final JOIN clause would give us an understand of how the tables interact and the data flow.
For example, the previous complex query shows that we performed Join twice. It means we used at least three data sources for the final result. This information would be necessary in the later steps to understand further how each data source comes by, especially when the data source is coming from the subquery.
After understanding how the final result should be and where it comes from, we need to look closer at the details. From here, we would backtrack into each subqueries and understand why they are structured like that.
However, we don’t try to look at them in the top-down structures. Instead, we should try to look at the subqueries that are closer to the final result, and we move up to the one furthest from the final result. From the code example above, we should try to understand this code first:
SELECT
purchases.cust_id,
purchases.product,
SUM(purchases.price) as total_product_price
FROM
purchases
WHERE
purchases.category = 'kitchen'
GROUP BY
purchases.cust_id,
purchases.product
HAVING
SUM(purchases.price) > 10000
Then, we are moving to the furthest code which is this one:
WITH customerspending AS (
SELECT
customers.id,
SUM(purchases.price) as total_spending
FROM
purchases
JOIN customers ON purchases.cust_id = customers.id
GROUP BY
customers.id
)
We can track down the author’s thought process clearly when we break down each subquery from the one closer to the result to the furthest.
Try to repeat the process above if you need help understanding each subquery. With some practice, you will have a better experience in reading and understanding the query.
Reading and understanding SQL queries is a skill that everyone should have in the modern era, as every company deals with them. By using the following step-by-step guide, you would have a better time to understand the complex SQL queries. The steps include:
- Understand the General SQL Query Structure
- Understand the Final Select
- Understand the Final Conditions Clause
- Understand the Final Join
- Reverse Order Reading and Repeat
Cornellius Yudha Wijaya is a data science assistant manager and data writer. While working full-time at Allianz Indonesia, he loves to share Python and Data tips via social media and writing media.