ClickMasters

Data Science & Analytics

Data Warehousing Services FAQs

What is a data warehouse and how is it different from an operational database?

An operational database (PostgreSQL, MySQL, SQL Server) is optimised for OLTP (Online Transaction Processing): many concurrent small reads and writes, row-level operations, and ACID transactions. It powers your application: it is the database that stores your users, orders, and products. A data warehouse is optimised for OLAP (Online Analytical Processing): fewer, larger queries that scan millions of rows to produce aggregated results. It powers your analytics: it is the database that answers "what was our revenue by region last quarter?" Running analytical queries in your operational database creates performance problems: a complex analytical query that scans millions of rows blocks the application queries that need to complete in milliseconds. A data warehouse separates the analytics workload from the operational workload, storing denormalised data in column-oriented storage (optimised for scanning many rows of a few columns) rather than row-oriented storage.
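The workload difference can be sketched with two queries against a toy SQLite table. The `orders` schema here is purely illustrative, not a real deployment:

```python
import sqlite3

# Illustrative schema: one "orders" table in an in-memory database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (id INTEGER PRIMARY KEY, region TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders (region, amount) VALUES (?, ?)",
    [("EU", 100.0), ("EU", 50.0), ("US", 200.0), ("US", 25.0)],
)

# OLTP-style query: fetch a single row by primary key (an index lookup,
# the kind of millisecond operation an application issues constantly).
one_order = conn.execute("SELECT amount FROM orders WHERE id = 1").fetchone()

# OLAP-style query: scan every row to aggregate revenue by region
# (on real data this touches millions of rows).
revenue_by_region = conn.execute(
    "SELECT region, SUM(amount) FROM orders GROUP BY region ORDER BY region"
).fetchall()
print(revenue_by_region)  # [('EU', 150.0), ('US', 225.0)]
```

At toy scale both run instantly; the point is the access pattern: the first touches one row via an index, the second must read every row, which is what column-oriented storage accelerates.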

How much does it cost to run Snowflake?

Snowflake costs have two components: compute (virtual warehouse credits, charged per second of warehouse activity) and storage (compressed data storage at approximately $23/TB/month). A typical startup with one X-Small virtual warehouse running 8 hours/day, 250GB of data, and ELT pipelines costs approximately $150-400/month. A growing B2B SaaS company with multiple warehouses, 1TB of data, and active BI usage costs $500-2,000/month. Enterprise deployments with multiple teams, heavy ML workloads, and terabytes of data can cost $5,000-50,000+/month. The most common Snowflake overspending pattern is virtual warehouses that do not auto-suspend (running 24/7 when queries only run for 2 hours/day). ClickMasters configures auto-suspend of 1-5 minutes on all warehouses, typically reducing Snowflake spend by 40-60% on new deployments where auto-suspend was not configured.
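The cost model above can be sketched as simple arithmetic. The credit price and the figure of 1 credit per hour for an X-Small warehouse are approximations (actual rates depend on Snowflake edition, cloud region, and contract), and billed hours count only the time the warehouse is actually running, which with auto-suspend is typically far below business hours:

```python
# Back-of-envelope Snowflake cost estimate. All rates are assumptions:
# credit price varies by edition and region; X-Small ~ 1 credit/hour.
CREDIT_PRICE_USD = 3.0           # assumed; check your actual contract
STORAGE_USD_PER_TB_MONTH = 23.0  # compressed storage rate cited above

def monthly_cost(credits_per_hour: float, active_hours_per_day: float,
                 storage_tb: float) -> float:
    """Estimate monthly spend = compute (billed while active) + storage."""
    compute = credits_per_hour * active_hours_per_day * 30 * CREDIT_PRICE_USD
    storage = storage_tb * STORAGE_USD_PER_TB_MONTH
    return compute + storage

# Startup scenario from the text: one X-Small warehouse, 250 GB of data,
# auto-suspend keeping actual billed activity to ~2 hours/day.
print(round(monthly_cost(1, 2, 0.25), 2))  # 185.75
```

The same function shows why missing auto-suspend is expensive: at 24 billed hours/day the compute term alone is 12x the 2-hour figure.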

What is the star schema and why is it used in data warehouses?

The star schema is a data modelling approach for analytical databases: a central fact table (which records events or transactions: each row represents one order, one session, one payment) surrounded by dimension tables (which describe the entities involved: each customer, each product, each date). The "star" shape comes from the fact table at the centre with dimension tables radiating outward. The star schema is optimised for analytical queries because joins are simple (fact-to-dimension, never dimension-to-dimension in a properly normalised star schema), query optimisers can efficiently prune irrelevant data using dimension filters, and the model is intuitive for business users and BI tools. An alternative: the medallion architecture (Bronze/Silver/Gold), increasingly used for modern data lakehouses, describes data quality tiers rather than a specific table structure, and can incorporate a star schema at the Gold (business-ready) layer.
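A minimal star schema can be sketched in SQLite. The table and column names below are illustrative; note that the analytical query joins fact-to-dimension only:

```python
import sqlite3

# Illustrative star schema: fact_orders at the centre,
# dim_customer and dim_date radiating outward.
conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE dim_customer (customer_key INTEGER PRIMARY KEY, name TEXT, region TEXT);
CREATE TABLE dim_date     (date_key INTEGER PRIMARY KEY, day TEXT, quarter TEXT);
CREATE TABLE fact_orders  (order_id INTEGER PRIMARY KEY,
                           customer_key INTEGER REFERENCES dim_customer,
                           date_key INTEGER REFERENCES dim_date,
                           amount REAL);
INSERT INTO dim_customer VALUES (1, 'Acme', 'EU'), (2, 'Globex', 'US');
INSERT INTO dim_date VALUES (10, '2024-01-15', 'Q1'), (11, '2024-04-02', 'Q2');
INSERT INTO fact_orders VALUES (100, 1, 10, 500.0), (101, 2, 10, 300.0),
                               (102, 1, 11, 250.0);
""")

# "Revenue by region for Q1": filter on a dimension attribute,
# aggregate the fact table, join only fact-to-dimension.
rows = conn.execute("""
    SELECT c.region, SUM(f.amount)
    FROM fact_orders f
    JOIN dim_customer c ON f.customer_key = c.customer_key
    JOIN dim_date d     ON f.date_key = d.date_key
    WHERE d.quarter = 'Q1'
    GROUP BY c.region ORDER BY c.region
""").fetchall()
print(rows)  # [('EU', 500.0), ('US', 300.0)]
```

The `WHERE d.quarter = 'Q1'` filter is the pruning the text describes: the optimiser restricts the fact-table scan using a small dimension table rather than filtering raw timestamps in the fact rows.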

When should I migrate from my current database to a data warehouse?

Signs you need a data warehouse: your operational database is slow because analytical queries run alongside application queries; you are joining data from multiple source systems (CRM + product database + billing) in ad-hoc Python scripts rather than a single queryable store; different analysts produce different numbers for the "same" metric because they each write their own SQL with slightly different logic; dashboard queries take minutes to run; or your data team spends more time extracting and joining data than analysing it. The data warehouse does not replace your operational databases; it supplements them by providing a separate, optimised store for analytics that does not compete with application workloads.