In this story, I will try to shed some light on the benefits of modern data warehouse solutions (DWH) compared to other data platform architecture types. I would dare to say that DWH is the most popular platform among data engineers at the moment. It offers invaluable benefits compared to other solution types but also has some well-known limitations. Want to learn data engineering? This story is a good place to start because it explains data engineering at its core — the DWH solution at the centre of the architecture diagram. We will see how data can be ingested and transformed in different DWHs available in the market.
I’d like to open the discussion with experienced users too. It would be great to know your opinion and see what you have to say on this topic.
Key characteristics of a data warehouse
A serverless, distributed SQL engine (BigQuery, Snowflake, Redshift, Microsoft Azure Synapse, Teradata.) is what we call a modern data warehouse (DWH). It is a SQL-first data architecture [1] where data is stored in a data warehouse, and we can use all the advantages of using denormalized star schema [2] datasets because most of the modern data warehouses are distributed and scale well, which means there is no need to worry about table keys and indices. It suits well for ad-hoc analytical queries on Big Data.
Most of the modern data warehouse solutions can process structured and unstructured data and are very convenient for data analysts with good SQL skills.
Modern data warehouses integrate easily with business intelligence solutions like Looker, Tableau, Sisense, and Mode, which use ANSI-SQL to process data. In the diagram below I tried to map a common data transformation journey and tools used (not a complete list of course). We can see that…