In this post we will review some of the key concepts in ClickHouse.
Projection
A projection is a live physically stored query result for a specific table.
The goal of the projection is to get the query results ~zero time while adding more computation time upon the inserts to the table.
The projection is updated upon insert the the source table.
Example for a projection creation:
CREATE TABLE IF NOT EXISTS my_table (
application_id String,
event_id String,
event_type String,
event_time DateTime,
event_text String
PROJECTION events_per_type_and_day (
SELECT
application_id,
event_type,
event_day as toYYYYMMDD(event_time),
count(*)
GROUP BY
application_id,
event_type,
event_day as toYYYYMMDD(event_time)
)
)
The projection can be created even after the table is created, but then it will contain only the data added after the projection creation. To fully sync the projection use the materialize command:
ALTER TABLE table MATERIALIZE PROJECTION events_per_type_and_dayNotice the client can still fetch from the original table, but then the ClickHouse for efficiency will decide using the projection.
Explain Plan
Explain plan is a command that requests the ClickHouse to describe the method that is about to run for a specific query. For example, we can run the query related the the projection above and verify that the ClickHouse will indeed use the projection. A nice thing is that we can use the ClickHouse HTTP 8123 port to ask for this, for example:
curl -u username:password 'http://localhost:8123/' \
-d "
EXPLAIN PLAN
SELECT
application_id,
event_type,
event_day as toYYYYMMDD(event_time),
count(*)
FROM my_table
GROUP BY
application_id,
event_type,
event_day as toYYYYMMDD(event_time);
"
Materialized Column
CREATE TABLE IF NOT EXISTS my_table (
application_id String,
event_id String,
event_type String,
event_time DateTime,
event_text String,
event_hour String MATERIALIZED formatDateTime(EVENT_TIME, '%Y-%m-%dT%H:00:00'),
)
TTL
CREATE TABLE IF NOT EXISTS my_table (
application_id String,
event_id String,
event_type String,
event_time DateTime,
event_text String,
)
PARTITION BY event_time
TTL event_time + INTERVAL 7 DAY DELETE
Skip Index
Each granule stores 8K records.
ALTER TABLE my_table ADD INDEX application_id_index application_id TYPE bloom_filter(0.01) GRANULARITY 1
Merge Tree Variants
CREATE TABLE IF NOT EXISTS my_parts (
item_id String,
part_id String,
price int32,
update_time DateTime
)
ENGINE=ReplacingMergeTree(update_time)
ORDER BY (
item_id,
part_id
)
select * from my_parts FINAL
Materialized View
CREATE TABLE IF NOT EXISTS my_events (
application_id String,
event_id String,
event_type String,
event_time DateTime,
event_text String
);
CREATE MATERIALIZED VIEW my_events_by_day
ENGINE = SummingMergeTree
ORDER BY (
application_id,
event_type,
event_day
)
AS
SELECT
application_id,
event_type,
toDate(event_time) as event_day,
count() AS count
FROM my_events
GROUP BY
application_id,
event_type,
event_day
;
No comments:
Post a Comment