Snowflake SQL Optimization
Optimize Snowflake queries for performance and cost savings with clustering and caching.
Prompt
Analyze and optimize Snowflake SQL queries for maximum performance and cost efficiency with the following requirements:
Current Environment
Snowflake Setup
- Account Region: [US-EAST-1 / EU-WEST-1 / etc.]
- Warehouse Sizes: [X-Small / Small / Medium / Large / X-Large]
- Current Monthly Cost: [Approximate spend]
- Performance Issues: [Slow queries / High costs / Both / Unknown]
Database Schema
Provide table schemas to optimize:
Table 1: [TableName]
- Row Count: [Approximate number of rows]
- Size: [GB/TB]
- Columns: [List key columns with types]
- Current Clustering: [Clustered by X / No clustering]
- Query Patterns: [How this table is typically queried]
- Growth Rate: [Rows per day/month]
Table 2: [TableName]
- Row Count: [Number of rows]
- Size: [Size]
- Columns: [Column list]
- Current Clustering: [Clustering status]
- Query Patterns: [Query patterns]
- Growth Rate: [Growth rate]
[Define 3-10 tables]
Queries to Optimize
Query 1: [QueryName/Purpose]
[Paste current query here]
- Execution Time: [Current runtime]
- Frequency: [Runs X times per day/hour]
- Cost Impact: [High / Medium / Low]
- Performance Issues: [Full table scan / Slow joins / Large result set / etc.]
Query 2: [QueryName/Purpose]
[Paste current query here]
- Execution Time: [Current runtime]
- Frequency: [How often it runs]
- Cost Impact: [Impact level]
- Performance Issues: [Issues observed]
Query 3: [QueryName/Purpose]
[Paste current query here]
- Execution Time: [Runtime]
- Frequency: [Frequency]
- Cost Impact: [Impact]
- Performance Issues: [Issues]
[Provide 3-15 queries to optimize]
Optimization Goals
Performance Targets
- Query Response Time: [Target: < 5 seconds / < 1 minute / Custom]
- Warehouse Utilization: [Target: < 80% / Custom]
- Cache Hit Rate: [Target: > 70% / Custom]
- Spill to Disk: [Minimize / Eliminate / Current acceptable]
Cost Reduction Targets
- Monthly Cost Reduction: [Target: 20% / 30% / 50% / Custom]
- Warehouse Auto-Suspend: [After X minutes / Currently disabled]
- Result Caching: [Enable / Already enabled / Optimize]
- Query Consolidation: [Combine similar queries / Not applicable]
Optimization Strategies
Clustering Strategy
For each large table (> 1 TB) or any table behind slow queries:
- Clustering Keys: [Which columns to cluster by]
- Clustering Rationale: [Why these columns - filter patterns, join keys, etc.]
- Reclustering: [Automatic Clustering enabled / Suspended to control credit usage]
- Expected Improvement: [Estimated query speedup]
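A clustering recommendation might look roughly like the sketch below. It assumes a hypothetical `sales.orders` table that is mostly filtered by `order_date` and `region`; both the table and the column names are placeholders, not taken from the schemas above.

```sql
-- Hypothetical table and columns; substitute the real ones.
-- Define a clustering key on the columns most often used in filters.
ALTER TABLE sales.orders CLUSTER BY (order_date, region);

-- Inspect clustering depth and overlap for the chosen key.
SELECT SYSTEM$CLUSTERING_INFORMATION('sales.orders', '(order_date, region)');

-- Automatic Clustering consumes credits; it can be paused and resumed.
ALTER TABLE sales.orders SUSPEND RECLUSTER;
ALTER TABLE sales.orders RESUME RECLUSTER;
```

Comparing the `SYSTEM$CLUSTERING_INFORMATION` output before and after the change is one way to fill in the "Expected Improvement" field with something measurable.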
Partitioning Strategy
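For time-series or large tables (Snowflake manages micro-partitions automatically and has no user-defined partitions, so pruning depends on load order and clustering):
- Pruning Column: [Date/timestamp column used in filters]
- Pruning Granularity: [Day / Month / Year]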
- Partition Pruning: [Expected reduction in scanned data]
- Retention Policy: [Keep data for X months/years]
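To estimate how well pruning works today, one option is to compare scanned and total micro-partitions in the query history. This is a sketch only: the `%orders%` text filter is a hypothetical way to find queries against one table, and `ACCOUNT_USAGE` views can lag behind real time.

```sql
-- Share of micro-partitions scanned per query over the last 7 days.
-- A value near 100% on a large table suggests little or no pruning.
SELECT query_id,
       partitions_scanned,
       partitions_total,
       ROUND(100 * partitions_scanned / NULLIF(partitions_total, 0), 1)
           AS pct_partitions_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND query_text ILIKE '%orders%'   -- hypothetical table name filter
ORDER BY partitions_total DESC
LIMIT 20;
```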
Materialized Views
For frequently run aggregations:
View 1: [Name and purpose]
- Base Query: [Aggregation to materialize]
- Refresh: [Automatic background maintenance for materialized views / Hourly or Daily if a task-refreshed summary table is used instead]
- Expected Savings: [Query time reduction]
View 2: [Name and purpose]
- Base Query: [Query to materialize]
- Refresh: [Automatic / Task schedule]
- Expected Savings: [Savings estimate]
[Define 2-5 materialized views]
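A materialized view for a common aggregation could be sketched as follows. The `sales.orders` table and its columns are hypothetical. Snowflake materialized views require Enterprise Edition or higher, can reference only a single table, and are kept up to date by a background service rather than by a user-defined schedule.

```sql
-- Hypothetical daily revenue rollup over a single base table.
CREATE MATERIALIZED VIEW sales.daily_revenue_mv AS
SELECT order_date,
       region,
       SUM(amount) AS total_amount,
       COUNT(*)    AS order_count
FROM sales.orders
GROUP BY order_date, region;

-- Queries can read the view directly.
SELECT region, total_amount
FROM sales.daily_revenue_mv
WHERE order_date = '2024-01-15';
```

Because maintenance consumes credits whenever the base table changes, the expected savings should be weighed against how frequently `sales.orders` is updated.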
Warehouse Optimization
- Warehouse Sizing: [Right-size warehouses based on workload]
- Multi-Cluster Warehouses: [Enable for concurrent queries / Not needed]
- Warehouse Separation: [Separate ETL / BI / Ad-hoc workloads]
- Auto-Suspend Time: [Recommended suspend time in minutes]
- Auto-Resume: [Enable / Disable]
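Warehouse separation and auto-suspend settings might be expressed as in this sketch. The warehouse names and sizes are illustrative assumptions, multi-cluster settings require Enterprise Edition or higher, and `AUTO_SUSPEND` is specified in seconds rather than minutes.

```sql
-- ETL workload: larger warehouse, suspends after 60 seconds idle.
CREATE WAREHOUSE IF NOT EXISTS etl_wh
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND = 60
  AUTO_RESUME = TRUE
  INITIALLY_SUSPENDED = TRUE;

-- BI workload: small multi-cluster warehouse for concurrent dashboards.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE = 'SMALL'
  MIN_CLUSTER_COUNT = 1
  MAX_CLUSTER_COUNT = 3
  SCALING_POLICY = 'STANDARD'
  AUTO_SUSPEND = 300
  AUTO_RESUME = TRUE;

-- Right-size an existing ad-hoc warehouse (hypothetical name).
ALTER WAREHOUSE adhoc_wh SET WAREHOUSE_SIZE = 'XSMALL' AUTO_SUSPEND = 60;
```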
Query Rewrite Requirements
Join Optimization
- Join Order: [Optimize join order based on table sizes]
- Join Types: [Use appropriate join types: INNER, LEFT, etc.]
- Join Conditions: [Ensure proper join keys and filters]
- Broadcast Joins: [The optimizer decides when to broadcast; keep small dimension tables filtered and narrow so it can]
Filter Optimization
- Early Filtering: [Push filters as early as possible]
- Partition Pruning: [Ensure filters enable micro-partition pruning; avoid wrapping filtered columns in functions]
- Cluster Key Filters: [Use cluster keys in WHERE clauses]
- Selective Filters: [Apply most selective filters first]
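A before/after rewrite illustrating the filter guidance might look like this. The table and columns are hypothetical; the point is that a predicate on the bare column is generally easier for Snowflake to prune on than one that wraps the column in a function.

```sql
-- Before: the function on order_date can prevent micro-partition pruning,
-- and SELECT * reads every column.
SELECT *
FROM sales.orders
WHERE TO_CHAR(order_date, 'YYYY-MM') = '2024-01';

-- After: a range predicate on the bare column and an explicit column list.
SELECT order_id, customer_id, region, amount
FROM sales.orders
WHERE order_date >= '2024-01-01'
  AND order_date <  '2024-02-01';
```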
Projection Optimization
- Column Selection: [Select only needed columns, avoid SELECT *]
- Computed Columns: [Pre-compute expensive calculations]
- Data Type Optimization: [Use appropriate data types]
Aggregation Optimization
- Pre-Aggregation: [Use materialized views for common aggregations]
- GROUP BY Optimization: [Optimize GROUP BY column order]
- DISTINCT Optimization: [Avoid DISTINCT when possible, use GROUP BY]
- Window Functions: [Optimize window function partitions]
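Two aggregation patterns that often apply are sketched below, again against a hypothetical `sales.orders` table. `QUALIFY` filters on a window function without a wrapping subquery, and `APPROX_COUNT_DISTINCT` trades exactness for speed, so it only fits where an estimate is acceptable.

```sql
-- Latest order per customer, without a nested subquery.
SELECT customer_id, order_id, order_date, amount
FROM sales.orders
QUALIFY ROW_NUMBER() OVER (
          PARTITION BY customer_id
          ORDER BY order_date DESC, order_id DESC
        ) = 1;

-- Approximate distinct customers per day (an estimate, not an exact count).
SELECT order_date,
       APPROX_COUNT_DISTINCT(customer_id) AS approx_customers
FROM sales.orders
GROUP BY order_date;
```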
Caching Strategy
Result Caching
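- Enable Result Cache: [USE_CACHED_RESULT = TRUE (the default) / Disabled for benchmarking]
- Cache Invalidation: [Automatic when underlying data changes; note which loads or updates reset cached results]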
- Identical Query Detection: [Ensure queries are identical for cache hits]
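The sketch below shows the session setting and a pair of queries that differ in how reusable their results are likely to be. The table is hypothetical, and the exact reuse conditions should be checked against the Snowflake documentation on persisted query results.

```sql
-- Result reuse is on by default; set it explicitly if it was disabled.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;

-- Less likely to be reused: CURRENT_TIMESTAMP() is evaluated at run time.
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE order_date >= DATEADD('day', -7, CURRENT_TIMESTAMP())
GROUP BY region;

-- More likely to be reused: the text is identical on every run and
-- contains only literals, as long as the underlying data is unchanged.
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE order_date >= '2024-01-01'
GROUP BY region;
```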
Warehouse Caching
- Warehouse Cache Utilization: [Monitor and optimize]
- Cache Warming: [Pre-run queries to warm cache]
- Longer Auto-Suspend: [For warehouses serving frequently accessed data, since suspending a warehouse drops its local cache]
Monitoring and Profiling
Query Profiling
- Query Profile Review: [All queries / Slow queries only]
- Profile Analysis: [Identify bottlenecks: scans, joins, sorts, spills]
- Query History Analysis: [Review QUERY_HISTORY for patterns]
- Execution Plan Review: [Analyze execution plans]
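One possible starting point for the profiling items above is a query over `SNOWFLAKE.ACCOUNT_USAGE.QUERY_HISTORY` that surfaces slow or spilling queries, followed by `EXPLAIN` on a candidate. The 60-second threshold and the sample query are illustrative assumptions.

```sql
-- Slow or spilling queries from the last 7 days.
SELECT query_id,
       warehouse_name,
       total_elapsed_time / 1000 AS elapsed_seconds,
       bytes_spilled_to_local_storage,
       bytes_spilled_to_remote_storage,
       percentage_scanned_from_cache,
       LEFT(query_text, 200) AS query_preview
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND execution_status = 'SUCCESS'
  AND (total_elapsed_time > 60000            -- longer than 60 seconds
       OR bytes_spilled_to_remote_storage > 0)
ORDER BY total_elapsed_time DESC
LIMIT 50;

-- Inspect the plan of a candidate query without running it.
EXPLAIN USING TEXT
SELECT region, SUM(amount)
FROM sales.orders           -- hypothetical table
WHERE order_date >= '2024-01-01'
GROUP BY region;
```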
Cost Monitoring
- Warehouse Credit Usage: [Monitor per warehouse]
- Query Cost Attribution: [Track costs by user/team/query type]
- Storage Costs: [Monitor table sizes and Time Travel]
- Data Transfer Costs: [Monitor cross-region transfers]
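Credit and storage monitoring could be sketched with the `ACCOUNT_USAGE` views shown below. The 30-day window and the top-20 limit are arbitrary choices for illustration.

```sql
-- Daily credits per warehouse over the last 30 days.
SELECT warehouse_name,
       DATE_TRUNC('day', start_time) AS usage_day,
       SUM(credits_used)             AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day, credits DESC;

-- Tables carrying the most Time Travel and Fail-safe storage.
SELECT table_catalog,
       table_schema,
       table_name,
       active_bytes      / POWER(1024, 3) AS active_gb,
       time_travel_bytes / POWER(1024, 3) AS time_travel_gb,
       failsafe_bytes    / POWER(1024, 3) AS failsafe_gb
FROM snowflake.account_usage.table_storage_metrics
ORDER BY time_travel_bytes + failsafe_bytes DESC
LIMIT 20;
```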
Performance Metrics
- Query Execution Time: [Track P50, P95, P99]
- Queue Time: [Monitor warehouse queuing]
- Compilation Time: [Track query compilation overhead]
- Bytes Scanned: [Monitor data scanned per query]
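The metrics listed above can be pulled together per warehouse in a single query, as in this sketch. `APPROX_PERCENTILE` is used here as a cheaper stand-in for exact percentiles; times in `QUERY_HISTORY` are reported in milliseconds.

```sql
-- Latency percentiles, queue time, compile time, and scan volume
-- per warehouse over the last 7 days.
SELECT warehouse_name,
       COUNT(*)                                           AS query_count,
       APPROX_PERCENTILE(total_elapsed_time, 0.50) / 1000 AS p50_seconds,
       APPROX_PERCENTILE(total_elapsed_time, 0.95) / 1000 AS p95_seconds,
       APPROX_PERCENTILE(total_elapsed_time, 0.99) / 1000 AS p99_seconds,
       AVG(queued_overload_time) / 1000                   AS avg_queue_seconds,
       AVG(compilation_time) / 1000                       AS avg_compile_seconds,
       SUM(bytes_scanned) / POWER(1024, 3)                AS gb_scanned
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name
ORDER BY p95_seconds DESC;
```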
Advanced Optimizations
Search Optimization Service
- Enable for Tables: [List tables that would benefit]
- Search Columns: [Columns frequently used in point lookups]
- Maintenance Cost: [Acceptable overhead]
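For point-lookup columns, a search optimization rollout might proceed as in this sketch. The table and the `customer_id` column are hypothetical, and the feature requires Enterprise Edition or higher.

```sql
-- Estimate build and maintenance costs before enabling anything.
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('sales.orders');

-- Enable search optimization only for equality lookups on one column.
ALTER TABLE sales.orders ADD SEARCH OPTIMIZATION ON EQUALITY(customer_id);

-- Review which columns and methods are currently configured.
DESCRIBE SEARCH OPTIMIZATION ON sales.orders;
```

Scoping the feature to specific columns, rather than enabling it for the whole table, tends to keep the maintenance overhead easier to predict.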
Time Travel Optimization
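- Data Retention: [Current DATA_RETENTION_TIME_IN_DAYS (default 1 day, up to 90 on Enterprise Edition); reduce to X days if not needed]
- Fail-safe: [Account for 7 days of Fail-safe storage on permanent tables; consider transient tables for data that can be reloaded]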
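Retention changes might be applied as in the following sketch. The object names are hypothetical, and the 7-day and 0-day values are examples rather than recommendations.

```sql
-- Check the current retention (see the retention_time column in the output).
SHOW TABLES LIKE 'orders' IN SCHEMA sales;

-- Shorten Time Travel retention on a large, frequently updated table.
ALTER TABLE sales.orders SET DATA_RETENTION_TIME_IN_DAYS = 7;

-- Transient tables have no Fail-safe period; suitable for reloadable staging data.
CREATE TRANSIENT TABLE sales.orders_staging (
  order_id NUMBER,
  payload  VARIANT
)
DATA_RETENTION_TIME_IN_DAYS = 0;
```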
Zero-Copy Cloning
- Development Environments: [Use clones instead of copies]
- Testing: [Clone production for testing]
- Cost Savings: [Estimated savings from cloning]
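Zero-copy clones for development and testing can be created as sketched below, using hypothetical database and table names. A clone shares storage with its source until either side changes, so the cost grows only with divergence.

```sql
-- Clone a whole database for a development environment.
CREATE DATABASE dev_db CLONE prod_db;

-- Clone a single table for a test run.
CREATE TABLE sales.orders_test CLONE sales.orders;

-- Clone the table as it was 24 hours ago; this needs at least
-- one day of Time Travel retention on the source table.
CREATE TABLE sales.orders_yesterday CLONE sales.orders AT (OFFSET => -86400);
```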
Deliverables
Generate a comprehensive Snowflake optimization package including:
Query Analysis Report:
- Detailed analysis of each provided query
- Performance bottlenecks identified
- Cost impact assessment
- Optimization opportunities ranked by impact
Optimized SQL Queries:
- Rewritten queries with optimizations applied
- Before/after comparison
- Expected performance improvements
- Explanation of each optimization
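Schema Optimization Scripts:
- ALTER TABLE statements for clustering keys
- CREATE TABLE statements with optimal clustering
- Load-ordering and clustering changes that improve micro-partition pruning
- Constraint recommendations (informational on standard tables) and search optimization in place of indexes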
Materialized View Definitions:
- CREATE MATERIALIZED VIEW statements
- Refresh behavior and maintenance cost estimates
- Maintenance procedures
- Cost-benefit analysis
Warehouse Configuration:
- Recommended warehouse sizes per workload
- Multi-cluster warehouse configurations
- Auto-suspend/resume settings
- Warehouse separation strategy
Monitoring Queries:
- Query performance monitoring SQL
- Cost tracking queries
- Cache hit rate analysis
- Warehouse utilization queries
- Long-running query detection
Best Practices Guide:
- Query writing guidelines
- Clustering key selection criteria
- When to use materialized views
- Warehouse sizing recommendations
- Cost optimization checklist
Implementation Plan:
- Prioritized optimization steps
- Expected impact of each optimization
- Implementation effort estimates
- Rollback procedures
- Testing strategy
Cost Projection:
- Current cost breakdown
- Projected costs after optimization
- ROI calculation
- Monthly savings estimate
Output production-ready Snowflake optimizations following best practices with:
- Proper clustering key selection based on query patterns
- Efficient partition pruning strategies
- Materialized views for expensive aggregations
- Right-sized warehouses with auto-suspend
- Result caching optimization
- Minimal data scanning through filters
- Optimized join orders and types
- Elimination of SELECT * anti-patterns
- Search optimization where beneficial
- Comprehensive monitoring and alerting
- Clear documentation and rationale for each optimization