Updated: May 1, 2020
Before going in, I will just give a brief on what exactly you are learning.
Well, we all know that the growth of data is rapidly increasing and this data has to be processed/analyzed to take better decisions w.r.t organizations.
To analyze this data over time and take actions business needs a data warehouse.
So what is a data warehouse?
To put it simply, a data warehouse is a relational database designed to store data from different sources for reporting and analysis.
For more information about data warehouse and data warehousing, please see here
What is Snowflake and how is it different?
Snowflake is an analytic data warehouse provided as Software-as-a-Service (SaaS).
Now, we have data warehouses available on market which are on premise or cloud-based or both.
Even though we have control on an on premise data warehouse we need to spend more on maintenance and hardware, on the other hand a cloud-base data warehouse provides us with much more in other aspects like
System up time and availability
Snowflake is a true SaaS offering.
There is no hardware (virtual or physical) for you to select, install, configure, or manage.
There is no software for you to install, configure, or manage.
Ongoing maintenance, management, and tuning is handled by Snowflake.
Snowflake’s architecture is a hybrid of traditional shared-disk and shared-nothing database architectures. Similar to shared-disk architectures, Snowflake uses a central data repository for persisted data that is accessible from all compute nodes in the data warehouse. But similar to shared-nothing architectures, Snowflake processes queries using MPP (massively parallel processing) compute clusters where each node in the cluster stores a portion of the entire data set locally. This approach offers the data management simplicity of a shared-disk architecture, but with the performance and scale-out benefits of a shared-nothing architecture.
We can see there are three major components:
1. Cloud Services: This component handles multiple services like
Query parsing and optimization
2. Query Processing: This component performs query execution. Query processing happens using virtual warehouses, which are like a separate MPP cluster that can be instantiated on demand. These virtual warehouses are composed of multiple compute nodes allocated by Snowflake from a cloud provider. Each virtual warehouse is independent and has no impact on the performance of other virtual warehouses.
3. Database Storage: when data is loaded into Snowflake it automatically reorganizes the data into its internal optimized, compressed, columnar format and stores in cloud storage like S3 or Azure blob storage. Snowflake manages all the aspects like how it is stored, where it is stored. The data objects stored are accessible only via Snowflake SQL queries.
Data Lifecycle in Snowflake
All user data is logically represented as tables, nothing but an object. As mentioned above all the data objects can be accessed using SQL queries.
The entire lifecycle can be categorized into five stages as shown below.
1. Organizing Data: We can organize our data into databases, schemas, and tables. Snowflake does not limit the number of databases we can create or the number of schemas we can create within a database. Snowflake also does not limit the number of tables we can create in a schema.
2. Storing Data: We can insert data directly into tables. In addition, Snowflake provides DML for loading data into Snowflake tables from external, formatted files.
3. Querying Data: Once data is stored in a table, we can issue SELECT statements to query the data.
4. Working with Data: Once data is stored in a table, we can perform all standard DML operations on the data. In addition, Snowflake supports DDL actions such as cloning entire databases, schemas, and tables.
5. Removing Data: In addition to using the DML command, DELETE, to remove data from a table, we can truncate or drop an entire table. We can also drop entire schemas and databases.
For more info on all Snowflake SQL Commands, Please refer here.
There are other interesting aspects in Snowflake like
Scaling, Auto-suspension and Auto-resumption.
Automatic Query Optimization.
Multi-cloud support for Database Storage.
Micro-partitions & Data Clustering.
Snowpipe, a continuous data ingestion service.
We can delve into these aspects some other day.
Snowflake offers a 30-Day free trial with $400 credits to try all its features.
Thanks for reading! Check out my other posts here and share, if you liked it.