1. What is a Data Warehouse?
A Data Warehouse (DW) is a centralized repository that stores historical, integrated, and structured data from multiple sources to support reporting, analysis, and decision-making.
Simple definition for interview: A data warehouse is a subject-oriented, integrated, time-variant, and non-volatile system used for analytical reporting and business intelligence.
2. Key Characteristics of Data Warehouse
Interviewers often ask this exactly.
-
Subject-Oriented
Data is organized by business subjects like Sales, Finance, Customers. -
Integrated
Data from different sources is standardized and cleaned. -
Time-Variant
Historical data is stored with a time dimension. -
Non-Volatile
Data is not updated or deleted frequently. Mostly append-only.
3. Data Warehouse vs Database (OLAP vs OLTP)
| Feature | OLTP (Database) | OLAP (Data Warehouse) |
|---|---|---|
| Purpose | Day-to-day operations | Analytics & reporting |
| Data | Current | Historical |
| Design | Normalized | Denormalized |
| Queries | Simple, fast | Complex, heavy |
| Examples | Order entry, billing | Sales analysis, trends |
4. Data Warehouse Architecture (Very Important)
3-Tier Architecture
-
Bottom Tier (Data Sources)
-
ERP, CRM, Files, APIs, Logs
-
-
Middle Tier (ETL / ELT Layer)
-
Extract
-
Transform
-
Load
-
-
Top Tier (Presentation Layer)
-
Reports, Dashboards
-
Power BI, Tableau, Excel
-
5. ETL vs ELT
| ETL | ELT |
|---|---|
| Transform before load | Transform after load |
| Used in traditional DW | Used in cloud DW |
| Slower for big data | Faster & scalable |
| Example: SSIS | Example: Azure Synapse, Fabric |
6. Fact Table and Dimension Table
Fact Table
-
Stores measurable data
-
Numeric values
-
Foreign keys to dimensions
Examples:
-
SalesAmount
-
Quantity
-
Profit
Dimension Table
-
Stores descriptive attributes
-
Used for slicing and filtering
Examples:
-
Customer
-
Product
-
Date
-
Region
7. Star Schema vs Snowflake Schema
Star Schema
-
Fact table in center
-
Directly connected dimensions
-
Simple, fast queries
-
Preferred in Power BI
Snowflake Schema
-
Normalized dimensions
-
More joins
-
Complex but saves space
8. Slowly Changing Dimensions (SCD)
Very common interview question.
Types of SCD
| Type | Description |
|---|---|
| Type 0 | No changes allowed |
| Type 1 | Overwrite old data |
| Type 2 | Maintain history (StartDate, EndDate, IsCurrent) |
| Type 3 | Limited history (Old value column) |
9. Data Warehouse Layers
-
Staging Layer
-
Raw data
-
No business logic
-
-
Data Warehouse Layer
-
Cleaned and transformed data
-
-
Data Mart Layer
-
Subject-specific
-
Optimized for reporting
-
10. Data Mart
A Data Mart is a subset of a data warehouse focused on a specific business area.
Examples:
-
Sales Mart
-
Finance Mart
-
HR Mart
11. Data Warehouse Modeling Approaches
-
Kimball Approach
-
Bottom-up
-
Dimensional modeling
-
Star schema
-
-
Inmon Approach
-
Top-down
-
Normalized (3NF)
-
Enterprise DW first
-
12. Data Warehouse Indexing & Performance
-
Indexing (Clustered / Non-Clustered)
-
Partitioning (Date-based)
-
Aggregation tables
-
Columnar storage
-
Materialized views
13. Data Quality in Data Warehouse
-
Duplicate handling
-
Null value handling
-
Data validation rules
-
Data profiling
-
Reconciliation reports
14. Data Warehouse Security
-
Row Level Security (RLS)
-
Column Level Security
-
Data masking
-
Encryption (at rest & in transit)
-
Role-based access
15. Common Data Warehouse Tools
On-Prem
-
SQL Server
-
Oracle
-
Teradata
Cloud
-
Azure Synapse
-
Snowflake
-
BigQuery
-
Amazon Redshift
-
Microsoft Fabric Warehouse
16. Data Warehouse vs Data Lake vs Lakehouse
| Feature | Data Warehouse | Data Lake | Lakehouse |
|---|---|---|---|
| Data Type | Structured | All types | All types |
| Schema | Schema-on-write | Schema-on-read | Hybrid |
| Cost | High | Low | Medium |
| Use Case | BI & reporting | Big data & ML | BI + ML |
Thank you for reading this post, and I hope you enjoyed it!
![]()