TPC-H (1999)
A popular benchmark which models the internal data warehouse of a wholesale supplier. The data is stored into a 3rd normal form representation, requiring lots of joins at query runtime. Despite its age and its unrealistic assumption that the data is uniformly and independently distributed, TPC-H remains the most popular OLAP benchmark to date.
References
- TPC-H
- New TPC Benchmarks for Decision Support and Web Commerce (Poess et. al., 2000)
- TPC-H Analyzed: Hidden Messages and Lessons Learned from an Influential Benchmark (Boncz et. al.), 2013
- Quantifying TPC-H Choke Points and Their Optimizations (Dresseler et. al.), 2020
Data Generation and Import
First, checkout the TPC-H repository and compile the data generator:
Then, generate the data. Parameter -s specifies the scale factor. For example, with -s 100, 600 million rows are generated for table 'lineitem'.
Detailed table sizes with scale factor 100:
| Table | size (in rows) | size (compressed in ClickHouse) | 
|---|---|---|
| nation | 25 | 2 kB | 
| region | 5 | 1 kB | 
| part | 20.000.000 | 895 MB | 
| supplier | 1.000.000 | 75 MB | 
| partsupp | 80.000.000 | 4.37 GB | 
| customer | 15.000.000 | 1.19 GB | 
| orders | 150.000.000 | 6.15 GB | 
| lineitem | 600.00.00 | 26.69 GB | 
(Compressed sizes in ClickHouse are taken from system.tables.total_bytes and based on below table definitions.)
Now create tables in ClickHouse.
We stick as closely as possible to the rules of the TPC-H specification:
- Primary keys are created only for the columns mentioned in section 1.4.2.2 of the specification.
- Substitution parameters were replaced by the values for query validation in sections 2.1.x.4 of the specification.
- As per section 1.4.2.1, the table definitions do not use the optional NOT NULLconstraints, even ifdbgengenerates them by default. The performance ofSELECTqueries in ClickHouse is not affected by the presence or absence ofNOT NULLconstraints.
- As per section 1.3.1, we use ClickHouse's native datatypes (e.g. Int32,String) to implement the abstract datatypes mentioned in the specification (e.g.Identifier,Variable text, size N). The only effect of this is better readability, the SQL-92 datatypes generated bydbgen(e.g.INTEGER,VARCHAR(40)) would also work in ClickHouse.
The data can be imported as follows:
Instead of using tpch-kit and generating the tables by yourself, you can alternatively import the data from a public S3 bucket. Make sure
to create empty tables first using above CREATE statements.
Queries
Setting join_use_nulls should be enabled to produce correct results according to SQL standard.
The queries are generated by ./qgen -s <scaling_factor>. Example queries for s = 100:
Correctness
The result of the queries agrees with the official results unless mentioned otherwise. To verify, generate a TPC-H database with scale
factor = 1 (dbgen, see above) and compare with the expected results in tpch-kit.
Q1
Q2
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
This alternative formulation works and was verified to return the reference results.
Q3
Q4
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
This alternative formulation works and was verified to return the reference results.
Q5
Q6
As of February 2025, the query does not work out-of-the box due to a bug with Decimal addition. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/70136
This alternative formulation works and was verified to return the reference results.
Q7
Q8
Q9
Q10
Q11
Q12
Q13
Q14
Q15
Q16
Q17
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
This alternative formulation works and was verified to return the reference results.
Q18
Q19
Q20
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
Q21
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
Q22
Until v25.5, the query did not work out-of-the box due to correlated subqueries. Corresponding issue: https://github.com/ClickHouse/ClickHouse/issues/6697
