Snowflake SQL Optimization


Optimize Snowflake queries for performance and cost savings with clustering and caching.

12/8/2025

Prompt

Analyze and optimize Snowflake SQL queries for maximum performance and cost efficiency with the following requirements:

Current Environment

Snowflake Setup

  • Account Region: [US-EAST-1 / EU-WEST-1 / etc.]
  • Warehouse Sizes: [X-Small / Small / Medium / Large / X-Large]
  • Current Monthly Cost: [Approximate spend]
  • Performance Issues: [Slow queries / High costs / Both / Unknown]

Database Schema

Provide table schemas to optimize:

Table 1: [TableName]

  • Row Count: [Approximate number of rows]
  • Size: [GB/TB]
  • Columns: [List key columns with types]
  • Current Clustering: [Clustered by X / No clustering]
  • Query Patterns: [How this table is typically queried]
  • Growth Rate: [Rows per day/month]

Table 2: [TableName]

  • Row Count: [Number of rows]
  • Size: [Size]
  • Columns: [Column list]
  • Current Clustering: [Clustering status]
  • Query Patterns: [Query patterns]
  • Growth Rate: [Growth rate]

[Define 3-10 tables]

Queries to Optimize

Query 1: [QueryName/Purpose]

[Paste current query here]
  • Execution Time: [Current runtime]
  • Frequency: [Runs X times per day/hour]
  • Cost Impact: [High / Medium / Low]
  • Performance Issues: [Full table scan / Slow joins / Large result set / etc.]

Query 2: [QueryName/Purpose]

[Paste current query here]
  • Execution Time: [Current runtime]
  • Frequency: [How often it runs]
  • Cost Impact: [Impact level]
  • Performance Issues: [Issues observed]

Query 3: [QueryName/Purpose]

[Paste current query here]
  • Execution Time: [Runtime]
  • Frequency: [Frequency]
  • Cost Impact: [Impact]
  • Performance Issues: [Issues]

[Provide 3-15 queries to optimize]

Optimization Goals

Performance Targets

  • Query Response Time: [Target: < 5 seconds / < 1 minute / Custom]
  • Warehouse Utilization: [Target: < 80% / Custom]
  • Cache Hit Rate: [Target: > 70% / Custom]
  • Spill to Disk: [Minimize / Eliminate / Current level acceptable]

Cost Reduction Targets

  • Monthly Cost Reduction: [Target: 20% / 30% / 50% / Custom]
  • Warehouse Auto-Suspend: [After X minutes / Currently disabled]
  • Result Caching: [Enable / Already enabled / Optimize]
  • Query Consolidation: [Combine similar queries / Not applicable]

Optimization Strategies

Clustering Strategy

For each large table (> 1 TB) or any table whose queries are slow because they prune poorly:

  • Clustering Keys: [Which columns to cluster by]
  • Clustering Rationale: [Why these columns - filter patterns, join keys, etc.]
  • Reclustering: [Automatic Clustering enabled / Suspended; manual reclustering is deprecated]
  • Expected Improvement: [Estimated query speedup]
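
As an illustration of the clustering fields above, a clustering key could be defined and checked roughly as follows. The table `sales.orders` and its columns are hypothetical placeholders, not part of the prompt inputs; substitute the tables described in this prompt.

```sql
-- Hypothetical table and columns, for illustration only.
-- Define a clustering key on the columns most often used in filters.
ALTER TABLE sales.orders CLUSTER BY (order_date, customer_id);

-- Inspect how well the table is clustered on those columns
-- (returns a JSON document with depth and overlap statistics).
SELECT SYSTEM$CLUSTERING_INFORMATION('sales.orders', '(order_date, customer_id)');

-- Automatic Clustering can be paused and resumed to control credit usage.
ALTER TABLE sales.orders SUSPEND RECLUSTER;
ALTER TABLE sales.orders RESUME RECLUSTER;
```

Lower-cardinality columns usually go first in the key, which is why `order_date` precedes `customer_id` in this sketch.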

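Partitioning Strategy

For time-series or large tables (Snowflake micro-partitions standard tables automatically, so this section covers data ordering and pruning rather than user-defined partitions):

  • Pruning Column: [Date/timestamp column used in most filters]
  • Ordering Granularity: [Load or cluster by Day / Month / Year]
  • Partition Pruning: [Expected reduction in micro-partitions scanned]
  • Retention Policy: [Keep data for X months/years]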
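
One possible way to act on these fields is to rewrite a table in date order and then trim old rows. This is only a sketch: `analytics.events`, its columns, and the 24-month retention window are assumptions chosen for illustration.

```sql
-- Hypothetical table and columns, for illustration only.
-- Rewrite the table sorted by the date column so that each micro-partition
-- covers a narrow date range and date filters can skip most of them.
CREATE OR REPLACE TABLE analytics.events_sorted AS
SELECT event_id, event_date, user_id, payload
FROM analytics.events
ORDER BY event_date;

-- Apply a retention policy by deleting rows older than 24 months.
DELETE FROM analytics.events_sorted
WHERE event_date < DATEADD('month', -24, CURRENT_DATE());
```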

Materialized Views

For frequently run aggregations:

  • View 1: [Name and purpose]

    • Base Query: [Aggregation to materialize]
    • Refresh Behavior: [Automatic background maintenance / Dynamic table or task if a fixed schedule is needed]
    • Expected Savings: [Query time reduction]
  • View 2: [Name and purpose]

    • Base Query: [Query to materialize]
    • Refresh Behavior: [Automatic / Scheduled alternative]
    • Expected Savings: [Savings estimate]

[Define 2-5 materialized views]
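
For reference, a materialized view for a daily aggregation might look like the sketch below. The names `sales.orders` and `sales.daily_order_totals` are hypothetical, and the example assumes an edition of Snowflake that supports materialized views.

```sql
-- Hypothetical table and columns, for illustration only.
-- Materialized views require Enterprise Edition or higher and can
-- reference only a single table (no joins).
CREATE MATERIALIZED VIEW sales.daily_order_totals AS
SELECT
    order_date,
    COUNT(*)    AS order_count,
    SUM(amount) AS total_amount
FROM sales.orders
GROUP BY order_date;

-- Queries read the view like a table; Snowflake maintains it in the
-- background, which consumes credits whenever the base table changes.
SELECT order_date, total_amount
FROM sales.daily_order_totals
WHERE order_date >= '2025-01-01';
```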

Warehouse Optimization

  • Warehouse Sizing: [Right-size warehouses based on workload]
  • Multi-Cluster Warehouses: [Enable for concurrent queries / Not needed]
  • Warehouse Separation: [Separate ETL / BI / Ad-hoc workloads]
  • Auto-Suspend Time: [Recommended suspend time in minutes]
  • Auto-Resume: [Enable / Disable]
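
The warehouse settings above map to statements along these lines. The warehouse names `bi_wh` and `etl_wh` and the chosen sizes are assumptions, not recommendations. Note that `AUTO_SUSPEND` is expressed in seconds in SQL, and multi-cluster settings need Enterprise Edition or higher.

```sql
-- Hypothetical warehouse for BI dashboards, kept separate from ETL.
CREATE WAREHOUSE IF NOT EXISTS bi_wh
  WAREHOUSE_SIZE      = 'SMALL'
  AUTO_SUSPEND        = 60        -- seconds of inactivity before suspending
  AUTO_RESUME         = TRUE
  MIN_CLUSTER_COUNT   = 1
  MAX_CLUSTER_COUNT   = 3         -- scale out under concurrent load
  SCALING_POLICY      = 'STANDARD'
  INITIALLY_SUSPENDED = TRUE;

-- Resize an existing (hypothetical) ETL warehouse and shorten its idle time.
ALTER WAREHOUSE etl_wh SET
  WAREHOUSE_SIZE = 'MEDIUM'
  AUTO_SUSPEND   = 120;
```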

Query Rewrite Requirements

Join Optimization

  • Join Order: [Optimize join order based on table sizes]
  • Join Types: [Use appropriate join types: INNER, LEFT, etc.]
  • Join Conditions: [Ensure proper join keys and filters]
  • Broadcast Joins: [Confirm in the query profile that small dimension tables are broadcast; the optimizer chooses the join distribution]

Filter Optimization

  • Early Filtering: [Push filters as early as possible]
  • Partition Pruning: [Ensure filters enable partition pruning]
  • Cluster Key Filters: [Use cluster keys in WHERE clauses]
  • Selective Filters: [Apply most selective filters first]

Projection Optimization

  • Column Selection: [Select only needed columns, avoid SELECT *]
  • Computed Columns: [Pre-compute expensive calculations]
  • Data Type Optimization: [Use appropriate data types]
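
A before/after pair can make the filter and projection requirements concrete. The sketch below uses hypothetical `sales.orders` and `sales.customers` tables; the actual gain depends on how the data is clustered.

```sql
-- Before: SELECT * reads every column, and wrapping order_date in YEAR()
-- may prevent micro-partition pruning on that column.
SELECT *
FROM sales.orders o
JOIN sales.customers c ON o.customer_id = c.customer_id
WHERE YEAR(o.order_date) = 2025;

-- After: only the needed columns are projected, and the filter compares
-- the raw column to constants so pruning can apply.
SELECT o.order_id, o.order_date, o.amount, c.customer_name
FROM sales.orders o
JOIN sales.customers c ON o.customer_id = c.customer_id
WHERE o.order_date >= '2025-01-01'
  AND o.order_date <  '2026-01-01';
```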

Aggregation Optimization

  • Pre-Aggregation: [Use materialized views for common aggregations]
  • GROUP BY Optimization: [Filter before grouping and reduce the number and cardinality of grouping columns]
  • DISTINCT Optimization: [Remove unnecessary DISTINCT; consider APPROX_COUNT_DISTINCT where an estimate is acceptable]
  • Window Functions: [Optimize window function partitions]

Caching Strategy

Result Caching

  • Enable Result Cache: [Yes / Already enabled]
  • Cache Invalidation: [Strategy for cache invalidation]
  • Identical Query Detection: [Ensure queries are identical for cache hits]
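
To illustrate the identical-query requirement: result reuse is controlled by the `USE_CACHED_RESULT` parameter, and a query that calls a function evaluated at execution time generally cannot reuse a cached result. The table and columns below are hypothetical.

```sql
-- Result reuse is on by default; this makes the setting explicit.
ALTER SESSION SET USE_CACHED_RESULT = TRUE;

-- Can reuse a cached result when the same text is run again and the
-- underlying data has not changed.
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE order_date = '2025-12-01'
GROUP BY region;

-- Unlikely to reuse a cached result: CURRENT_TIMESTAMP() is evaluated
-- at execution time, so each run is treated as a new query.
SELECT region, SUM(amount) AS total_amount
FROM sales.orders
WHERE created_at >= DATEADD('hour', -24, CURRENT_TIMESTAMP())
GROUP BY region;
```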

Warehouse Caching

  • Warehouse Cache Utilization: [Monitor and optimize]
  • Cache Warming: [Pre-run queries to warm cache]
  • Longer Auto-Suspend: [Keep warehouses running for frequently accessed data so the local cache is retained]

Monitoring and Profiling

Query Profiling

  • Enable Query Profiling: [For all queries / Slow queries only]
  • Profile Analysis: [Identify bottlenecks: scans, joins, sorts, spills]
  • Query History Analysis: [Review QUERY_HISTORY for patterns]
  • Execution Plan Review: [Analyze execution plans]

Cost Monitoring

  • Warehouse Credit Usage: [Monitor per warehouse]
  • Query Cost Attribution: [Track costs by user/team/query type]
  • Storage Costs: [Monitor table sizes and Time Travel]
  • Data Transfer Costs: [Monitor cross-region transfers]
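
A cost-tracking query for the warehouse credit item could be sketched as follows. It assumes access to the shared `SNOWFLAKE.ACCOUNT_USAGE` schema, whose views lag real time by up to a few hours; the 30-day window is an arbitrary choice.

```sql
-- Daily credits per warehouse over the last 30 days.
SELECT
    warehouse_name,
    DATE_TRUNC('day', start_time) AS usage_day,
    SUM(credits_used)             AS credits
FROM snowflake.account_usage.warehouse_metering_history
WHERE start_time >= DATEADD('day', -30, CURRENT_TIMESTAMP())
GROUP BY warehouse_name, usage_day
ORDER BY usage_day, credits DESC;
```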

Performance Metrics

  • Query Execution Time: [Track P50, P95, P99]
  • Queue Time: [Monitor warehouse queuing]
  • Compilation Time: [Track query compilation overhead]
  • Bytes Scanned: [Monitor data scanned per query]
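
The metrics above can be pulled from `QUERY_HISTORY` in one pass. This is a sketch assuming access to `SNOWFLAKE.ACCOUNT_USAGE`; the 7-day window and the per-warehouse grouping are illustrative choices.

```sql
-- Per-warehouse latency percentiles, queuing, compilation, scan volume,
-- cache usage, and remote spilling over the last 7 days.
-- Time columns in QUERY_HISTORY are in milliseconds.
SELECT
    warehouse_name,
    COUNT(*)                                           AS query_count,
    APPROX_PERCENTILE(total_elapsed_time, 0.50) / 1000 AS p50_seconds,
    APPROX_PERCENTILE(total_elapsed_time, 0.95) / 1000 AS p95_seconds,
    APPROX_PERCENTILE(total_elapsed_time, 0.99) / 1000 AS p99_seconds,
    AVG(queued_overload_time) / 1000                   AS avg_queue_seconds,
    AVG(compilation_time) / 1000                       AS avg_compile_seconds,
    SUM(bytes_scanned) / POWER(1024, 3)                AS gb_scanned,
    AVG(percentage_scanned_from_cache)                 AS avg_cache_fraction,
    SUM(bytes_spilled_to_remote_storage)               AS bytes_spilled_remote
FROM snowflake.account_usage.query_history
WHERE start_time >= DATEADD('day', -7, CURRENT_TIMESTAMP())
  AND warehouse_name IS NOT NULL
GROUP BY warehouse_name
ORDER BY p95_seconds DESC;
```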

Advanced Optimizations

Search Optimization Service

  • Enable for Tables: [List tables that would benefit]
  • Search Columns: [Columns frequently used in point lookups]
  • Maintenance Cost: [Acceptable overhead]
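
For a point-lookup column, search optimization might be estimated and enabled as in this sketch. The table `crm.customers` and the column `email` are hypothetical, and the feature requires Enterprise Edition or higher.

```sql
-- Estimate build and maintenance costs before enabling the service.
SELECT SYSTEM$ESTIMATE_SEARCH_OPTIMIZATION_COSTS('crm.customers');

-- Enable search optimization only for equality lookups on one column,
-- which limits the storage and maintenance overhead.
ALTER TABLE crm.customers ADD SEARCH OPTIMIZATION ON EQUALITY(email);
```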

Time Travel Optimization

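  • Data Retention: [Current DATA_RETENTION_TIME_IN_DAYS (default 1, up to 90 on Enterprise Edition); reduce to X days if not needed]
  • Fail-Safe: [Understand 7-day fail-safe costs for permanent tables; transient and temporary tables have no fail-safe]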
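
The retention settings and their storage impact can be examined with statements like the following. The table names are hypothetical, and the storage query assumes access to `SNOWFLAKE.ACCOUNT_USAGE`.

```sql
-- Shorten Time Travel retention on a (hypothetical) staging table.
ALTER TABLE staging.raw_events SET DATA_RETENTION_TIME_IN_DAYS = 1;

-- Transient tables have no fail-safe period, which suits reloadable data.
CREATE TRANSIENT TABLE staging.tmp_events (
    event_id   NUMBER,
    event_date DATE
);

-- Find the tables that hold the most Time Travel and fail-safe storage.
SELECT table_catalog, table_schema, table_name,
       active_bytes, time_travel_bytes, failsafe_bytes
FROM snowflake.account_usage.table_storage_metrics
WHERE deleted = FALSE
ORDER BY time_travel_bytes + failsafe_bytes DESC
LIMIT 20;
```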

Zero-Copy Cloning

  • Development Environments: [Use clones instead of copies]
  • Testing: [Clone production for testing]
  • Cost Savings: [Estimated savings from cloning]
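
Cloning for development and testing could look like this sketch. The database and table names are placeholders. A clone shares storage with its source until either side is modified.

```sql
-- Clone a whole (hypothetical) production database for development.
CREATE DATABASE dev_db CLONE prod_db;

-- Clone a single table as it existed one hour ago, using Time Travel.
CREATE TABLE sales.orders_test CLONE sales.orders AT (OFFSET => -3600);
```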

Deliverables

Generate a comprehensive Snowflake optimization package including:

  1. Query Analysis Report:

    • Detailed analysis of each provided query
    • Performance bottlenecks identified
    • Cost impact assessment
    • Optimization opportunities ranked by impact
  2. Optimized SQL Queries:

    • Rewritten queries with optimizations applied
    • Before/after comparison
    • Expected performance improvements
    • Explanation of each optimization
  3. Schema Optimization Scripts:

    • ALTER TABLE statements for clustering keys
    • CREATE TABLE statements with optimal clustering
    • Data ordering and micro-partition pruning implementations
    • Search optimization and constraint recommendations (standard Snowflake tables do not support indexes)
  4. Materialized View Definitions:

    • CREATE MATERIALIZED VIEW statements
    • Refresh behavior and maintenance cost estimates
    • Maintenance procedures
    • Cost-benefit analysis
  5. Warehouse Configuration:

    • Recommended warehouse sizes per workload
    • Multi-cluster warehouse configurations
    • Auto-suspend/resume settings
    • Warehouse separation strategy
  6. Monitoring Queries:

    • Query performance monitoring SQL
    • Cost tracking queries
    • Cache hit rate analysis
    • Warehouse utilization queries
    • Long-running query detection
  7. Best Practices Guide:

    • Query writing guidelines
    • Clustering key selection criteria
    • When to use materialized views
    • Warehouse sizing recommendations
    • Cost optimization checklist
  8. Implementation Plan:

    • Prioritized optimization steps
    • Expected impact of each optimization
    • Implementation effort estimates
    • Rollback procedures
    • Testing strategy
  9. Cost Projection:

    • Current cost breakdown
    • Projected costs after optimization
    • ROI calculation
    • Monthly savings estimate

Output production-ready Snowflake optimizations following best practices with:

  • Proper clustering key selection based on query patterns
  • Efficient partition pruning strategies
  • Materialized views for expensive aggregations
  • Right-sized warehouses with auto-suspend
  • Result caching optimization
  • Minimal data scanning through filters
  • Optimized join orders and types
  • Elimination of SELECT * anti-patterns
  • Search optimization where beneficial
  • Comprehensive monitoring and alerting
  • Clear documentation and rationale for each optimization

Tags

snowflake
sql
optimization
data-warehouse

Tested Models

gpt-4
claude-3-opus
