Skip to content

quantumiodb/pg_tpcds

 
 

Folders and files

NameName
Last commit message
Last commit date

Latest commit

 

History

25 Commits
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 
 

Repository files navigation

pg_tpcds - TPC-DS for Apache Cloudberry / PostgreSQL

A plug-and-play extension for TPC-DS benchmarking. Inspired by duckdb and hyrise, it eliminates the cumbersome TPC-DS setup — install the extension and start benchmarking immediately.

Supports Apache Cloudberry (MPP) distributed architecture with two distribution modes:

  • original (default): all 24 tables hash-distributed
  • replicated: 14 dimension tables replicated, 10 fact tables hash-distributed

Requirements

  • Apache Cloudberry (PostgreSQL 14 kernel) or PostgreSQL
  • CMake 3.12+
  • C++17 compiler (GCC 7+ or Clang 5+)

Build & Install

Make sure pg_config is in your PATH.

cd contrib/pg_tpcds
cmake -Bbuild
cmake --build build --target install

Quick Start

-- Install the extension (creates metadata schema only, no data tables yet)
CREATE EXTENSION pg_tpcds;

-- Create tables with hash distribution (default)
SELECT tpcds_prepare();

-- Or create tables with replicated dimension tables
SELECT tpcds_prepare('replicated');

-- Generate data for scale factor 1
SELECT * FROM dsdgen(1);

-- Collect answer files for result validation (uses optimizer=off)
SELECT tpcds_collect_answers();

-- Run all 99 queries
SELECT * FROM tpcds;

Distribution Modes

-- Hash distribution (all tables hash-distributed by primary key)
SELECT tpcds_prepare();
-- or explicitly:
SELECT tpcds_prepare('original');

-- Replicated distribution (dimension tables replicated, fact tables hash-distributed)
SELECT tpcds_prepare('replicated');

Switching modes drops and recreates all 24 data tables (and clears query stats):

-- Switch from hash to replicated (data must be reloaded)
SELECT tpcds_prepare('replicated');
SELECT * FROM dsdgen(1);
SELECT tpcds_collect_answers();

Running TPC-DS Queries

-- Run all 99 queries and show benchmark results
SELECT * FROM tpcds;

 Qid  | Stable(ms)  | Current(ms) | Diff(%) | Result
------+-------------+-------------+---------+--------
  01  |      243.99 |      251.74 |   +3.18 | true
  02  |     1297.74 |     1525.93 |  +17.58 | true
  ...
  99  |      984.72 |     1023.06 |   +3.89 | true
 ---- | ----------- | ----------- | ------- |
 Sum  |   134660.86 |   139667.93 |   +3.72 |

-- Run specific queries by ID
SELECT * FROM tpcds(1, 2, 3);

-- Run a single query directly
SELECT query FROM tpcds_queries(1); \gset
:query

Column descriptions:

Column Description
Stable(ms) Best historical duration (from tpcds.tpcds_query_stats)
Current(ms) Duration of the current run
Diff(%) Percentage change: (Current - Stable) / Stable * 100
Result Whether the query result matches the expected answer set
  • On the first run, Stable(ms) is empty — results are recorded as the baseline.
  • On subsequent runs, if the current duration is faster, it replaces the stored baseline.
  • Result validates query output against answer files generated by tpcds_collect_answers().

Result Validation

Answer files are generated per distribution mode:

-- Generate answer files (auto-detects current distribution mode)
SELECT tpcds_collect_answers();
-- Returns 99 (number of answer files generated)
  • Runs all 99 queries with optimizer=off for deterministic results
  • Saves sorted, pipe-delimited output to source tree:
    • src/tpcds/answers/sf1/ for hash distribution
    • src/tpcds/answers/sf1_replicated/ for replicated distribution
  • RunTPCDS auto-detects the current distribution mode and picks the correct answer directory

Other Functions

-- List all 99 TPC-DS queries
SELECT * FROM tpcds_queries();

-- Get a specific query by ID (1-99)
SELECT * FROM tpcds_queries(42);

-- Truncate all TPC-DS table data
SELECT tpcds_cleanup();

-- Remove the extension (drops all tables, functions, and schema)
DROP EXTENSION pg_tpcds CASCADE;

Customization

  • Queries: 99 SQL statements are provided in src/tpcds/queries/. Modify them before building to customize the benchmark.
  • Schema: Table DDL files are in src/tpcds/schema/. Distribution is applied dynamically at runtime from distribution.txt and distribution_original.txt.
  • Post-setup: Add index creation or other post-load statements to src/post_prepare.sql, which runs after table creation.
  • Distribution: Edit src/tpcds/distribution.txt (replicated mode) or src/tpcds/distribution_original.txt (hash mode) to customize table distribution.

About

A plug-and-play postgres extension for testing tpcds, inspired by duckdb

Resources

License

Stars

Watchers

Forks

Releases

No releases published

Packages

 
 
 

Contributors

Languages

  • C++ 93.9%
  • C 5.6%
  • Other 0.5%