Ova

What is a Scratch schema?

Published in Looker PDTs 5 mins read

A Scratch schema is a dedicated logical container within your database that Looker utilizes specifically for the management and storage of Persistent Derived Tables (PDTs). It serves as a temporary, yet critical, staging area where Looker materializes the results of complex queries as physical tables, enabling efficient data access and updates.

Understanding the Scratch Schema

At its core, a scratch schema is a schema in your database that Looker uses for PDTs. A database schema itself is a logical grouping of database objects such as tables, views, indexes, and stored procedures, often used to organize and manage access to data. The scratch schema is a specialized version of this container, purpose-built to support Looker's data transformation and persistence processes.

Role in Persistent Derived Tables (PDTs)

Looker's Persistent Derived Tables (PDTs) are a powerful feature designed to pre-compute the results of complex SQL queries and store them as actual tables in the database. This significantly improves query performance for end-users and reduces the load on the underlying data source. The scratch schema is fundamental to the operation of PDTs:

  • Storage of Query Results: When a PDT is built or rebuilt, Looker executes the underlying SQL query defined in your LookML. The crucial function of the scratch schema is to store the results of these PDT queries as tables on that schema. These tables are physically present in your database within the designated scratch schema.
  • Dynamic Management: Looker actively manages the lifecycle of these PDTs within the scratch schema. This dynamic management includes:
    • Creation: Generating new PDTs based on defined triggers (e.g., specific time intervals, data group changes).
    • Updates: Refreshing existing PDTs when new data becomes available or their definition changes.
    • Rebuilding: Looker will drop/rebuild those PDTs from there when necessary, ensuring data freshness and structural integrity. This often involves building a new version of the table in the scratch schema, then atomically swapping it with the old version.
    • Cleanup: Automated removal of old, unused, or failed PDT versions to manage database space.

Why is a Scratch Schema Essential?

The strategic use of a dedicated scratch schema provides several critical advantages for data platforms integrating with Looker:

  • Performance Optimization: By pre-computing and storing data, PDTs within the scratch schema dramatically reduce query times for users. Instead of running complex, resource-intensive queries on the fly, users query an already materialized, optimized table.
  • Reduced Database Load: It effectively offloads heavy computational tasks from your primary transactional or analytical database, allowing those systems to focus on their core operations without being bogged down by complex Looker queries.
  • Data Consistency: Ensures that all users querying a specific PDT are accessing the same, consistently processed data, eliminating discrepancies that might arise from ad-hoc querying.
  • Isolation of Operations: The scratch schema acts as a sandboxed environment for PDT creation and maintenance. This prevents these potentially resource-intensive operations from interfering with other critical database activities or user queries on live data.
  • Reliability during Updates: Looker typically builds a new version of a PDT in the scratch schema before making it live. This ensures that users always have access to a stable version of the data, even while an update is in progress.

How it Works: A Practical Insight

Consider a scenario where you have a complex SQL query that calculates Monthly Active Users (MAU) by joining several large tables and performing aggregations. Instead of running this query every time a user requests an MAU report, you can define it as a PDT in Looker.

  1. PDT Definition: You define your MAU calculation within a LookML view file, specifying it as a derived_table with a persistence strategy (e.g., persist_for: "24 hours" or datagroup_trigger: my_datagroup).
  2. Looker's Role: Based on the persistence strategy, Looker identifies when the MAU PDT needs to be built or rebuilt.
  3. Scratch Schema Interaction:
    • Looker generates a unique, temporary table name (e.g., LR_MAU_2023_10_26_123456) within the designated scratch schema.
    • It then executes your MAU query and populates this newly created temporary table with the results.
    • Once the new table is successfully built and validated, Looker atomically swaps the old MAU PDT table with this new, updated one.
    • The old version of the MAU PDT is then typically dropped by Looker during subsequent maintenance cycles to free up space.

This seamless process ensures that your reporting remains uninterrupted while the underlying data is being refreshed and optimized.

Key Considerations for Scratch Schema Configuration

Proper configuration and management of the scratch schema are vital for optimal Looker performance and database health:

  • Permissions: The database user account that Looker connects with must possess the necessary permissions to perform operations (e.g., create, drop, select, insert, update) on tables within the scratch schema.
  • Storage Capacity: While PDTs are "temporary" in their dynamic lifecycle, they can consume significant storage space, especially for large datasets. Ensure your database instance has adequate disk space allocated for the scratch schema.
  • Naming Convention: Looker typically assigns unique, often randomized, names to PDT tables within the scratch schema (e.g., _looker_tmp_table_xxxxx, LKR_PDT_12345). Avoid manually interfering with these tables.
  • Database Specifics: The exact implementation and configuration details for a scratch schema might vary slightly depending on the specific database dialect you are using (e.g., Snowflake, Google BigQuery, Amazon Redshift, PostgreSQL, MySQL).

Further Reading

For a deeper dive into Persistent Derived Tables and their management within Looker, explore the official Looker documentation on PDTs.