Snowflake’s architecture is a hybrid of the traditional shared-disk and shared-nothing database architectures. Let’s look at each of these architectures and then see how Snowflake combines them.
An Overview of the Shared-Disk Architecture
A shared-disk architecture is a database architecture in which multiple compute nodes (or servers) share access to a common, centralized storage system. All data lives on that shared storage, and every compute node can read and query it concurrently.
An Overview of the Shared-Nothing Architecture
A shared-nothing architecture is a distributed computing design in which each processing node or server in a cluster operates independently and does not share memory or disk storage with other nodes. It is the design most commonly used by “Massively Parallel Processing” (MPP) systems.
In this architecture, data is partitioned and distributed across multiple nodes, and each node is responsible for processing a portion of the data and executing queries independently.
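To make the partitioning idea concrete, here is a minimal sketch in plain Python. It is not how any particular database implements it: the function names, the sample `orders` data, and the choice of CRC32 hashing are all assumptions made for illustration. Each row is assigned to exactly one "node", every node aggregates only its own rows, and the partial results are merged at the end.

```python
import zlib
from collections import Counter


def partition_rows(rows, num_nodes, key):
    """Assign each row to exactly one node by hashing its partition key."""
    nodes = [[] for _ in range(num_nodes)]
    for row in rows:
        # CRC32 is used only because it is stable across runs (illustrative choice).
        node_id = zlib.crc32(str(row[key]).encode("utf-8")) % num_nodes
        nodes[node_id].append(row)
    return nodes


def local_sum(node_rows, group_key, value_key):
    """Work done by a single node: aggregate only the rows it owns."""
    totals = Counter()
    for row in node_rows:
        totals[row[group_key]] += row[value_key]
    return totals


def merge_partials(partials):
    """Coordinator step: combine the per-node partial results."""
    merged = Counter()
    for partial in partials:
        merged.update(partial)  # Counter.update adds values key by key
    return dict(merged)


# Hypothetical sample data.
orders = [
    {"customer": "alice", "amount": 30},
    {"customer": "bob", "amount": 20},
    {"customer": "alice", "amount": 15},
    {"customer": "carol", "amount": 50},
]

nodes = partition_rows(orders, num_nodes=3, key="customer")
partials = [local_sum(rows, "customer", "amount") for rows in nodes]
totals = merge_partials(partials)
print(totals)
```

Because rows are partitioned on the same key they are grouped by, all rows for one customer land on the same node, so no node needs to see another node's data to compute its part of the answer.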
Snowflake Architecture – A Hybrid Model
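Snowflake combines elements of both models. Like a shared-disk architecture, it keeps persisted data in a central repository that every compute node can access. Like a shared-nothing architecture, it processes queries with MPP compute clusters in which each node works on a portion of the data. The design aims to be highly scalable, reliable, and secure.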
The three main layers of Snowflake’s architecture are:
1- Database Storage (or Storage layer)
- Snowflake uses cloud-based object storage (e.g., Amazon S3, Azure Blob Storage, Google Cloud Storage) as its storage layer. This storage is highly scalable, durable, and cost-effective.
- When data is loaded, Snowflake reorganizes it into its own internal compressed, columnar format (rather than an open file format such as Apache Parquet), which optimizes compression and query performance.
- Data is organized into databases, schemas, tables, and stages for efficient management.
- The data objects stored by Snowflake are neither directly visible to nor directly accessible by customers; they can only be accessed through SQL query operations executed within the Snowflake environment.
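As a rough illustration of why a columnar layout helps, the sketch below pivots a few rows into columns in plain Python. Snowflake's actual storage format is internal and not shown here; the helper names, the sample rows, and the use of run-length encoding as the compression example are assumptions chosen only to show the general idea.

```python
from itertools import groupby


def to_columns(rows):
    """Pivot a list of row dicts into one list of values per column."""
    columns = {}
    for row in rows:
        for name, value in row.items():
            columns.setdefault(name, []).append(value)
    return columns


def run_length_encode(values):
    """Collapse consecutive repeated values into [value, count] pairs."""
    return [[value, len(list(group))] for value, group in groupby(values)]


# Hypothetical sample data.
rows = [
    {"region": "EU", "status": "shipped", "amount": 10},
    {"region": "EU", "status": "shipped", "amount": 25},
    {"region": "EU", "status": "pending", "amount": 5},
    {"region": "US", "status": "pending", "amount": 40},
]

columns = to_columns(rows)

# A query such as SUM(amount) only has to touch the "amount" column.
total_amount = sum(columns["amount"])

# Values within one column tend to repeat, so they compress well.
encoded_region = run_length_encode(columns["region"])

print(total_amount)
print(encoded_region)
```

The two benefits on display are the ones the text mentions: a query reads only the columns it needs, and similar values stored next to each other compress more effectively than mixed row data.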
2- Query Processing (or Compute layer)
- Query execution is performed in the processing layer. Snowflake processes queries using “virtual warehouses”.
- Each virtual warehouse is an MPP compute cluster composed of multiple compute nodes allocated by Snowflake from a cloud provider.
- Each virtual warehouse is an independent compute cluster that does not share compute resources with other virtual warehouses, so the workload on one warehouse does not affect the performance of the others.
- Users create and manage virtual warehouses according to their workload requirements.
- Virtual warehouses can be resized dynamically to handle varying workloads. This separation of compute and storage allows for scalability and cost optimization.
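The points above can be sketched with the SQL commands Snowflake provides for managing warehouses. The Python helpers below only build the statement text; the warehouse name `reporting_wh`, the helper names, and the shortened list of sizes are hypothetical, and actually running the statements would require a Snowflake session (for example through a database connector), which is left out here.

```python
# Subset of warehouse sizes, kept short for illustration (assumption: larger sizes omitted).
VALID_SIZES = ("XSMALL", "SMALL", "MEDIUM", "LARGE", "XLARGE")


def _check_size(size):
    if size not in VALID_SIZES:
        raise ValueError(f"unsupported warehouse size: {size!r}")


def create_warehouse_sql(name, size="XSMALL", auto_suspend_seconds=60):
    """Build a CREATE WAREHOUSE statement that suspends itself when idle."""
    _check_size(size)
    # Note: `name` is interpolated as-is, so it must come from a trusted source.
    return (
        f"CREATE WAREHOUSE IF NOT EXISTS {name} "
        f"WITH WAREHOUSE_SIZE = '{size}' "
        f"AUTO_SUSPEND = {auto_suspend_seconds} "
        f"AUTO_RESUME = TRUE"
    )


def resize_warehouse_sql(name, size):
    """Build an ALTER WAREHOUSE statement that changes the compute size."""
    _check_size(size)
    return f"ALTER WAREHOUSE {name} SET WAREHOUSE_SIZE = '{size}'"


# Hypothetical warehouse: start small, scale up for a heavier workload.
create_stmt = create_warehouse_sql("reporting_wh")
resize_stmt = resize_warehouse_sql("reporting_wh", "LARGE")

print(create_stmt)
print(resize_stmt)
```

Neither statement touches any stored data. Resizing changes only the compute attached to queries, which is what the separation of compute and storage makes possible.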
3- Cloud Services Layer
- The cloud services layer is a collection of services that coordinate activities across Snowflake.
- These services tie together all of the different components of Snowflake in order to process user requests, from login to query dispatch.
- The cloud services layer also runs on compute instances provisioned by Snowflake from the cloud provider.
Services managed in this layer include:
- Infrastructure management
- Metadata management
- Query parsing and optimization
- Access control
Thank you for taking the time to read this post!