System Tables Overview
System tables provide information about:
- Server states, processes, and environment.
- Server's internal processes.
- Options used when the ClickHouse binary was built.
System tables:
- Located in the `system` database.
- Available only for reading data.
- Can't be dropped or altered, but can be detached.
Most system tables store their data in RAM. The ClickHouse server creates such system tables at startup.
Unlike other system tables, the system log tables `metric_log`, `query_log`, `query_thread_log`, `trace_log`, `part_log`, `crash_log`, `text_log`, and `backup_log` use the MergeTree table engine and store their data on the filesystem by default. If you remove one of these tables from the filesystem, the ClickHouse server creates a new empty one at the time of the next data write. If the schema of a system table changed in a new release, ClickHouse renames the current table and creates a new one.
System log tables can be customized by creating a config file with the same name as the table under `/etc/clickhouse-server/config.d/`, or by setting the corresponding elements in `/etc/clickhouse-server/config.xml`. The elements that can be customized are:

- `database`: database the system log table belongs to. This option is deprecated; all system log tables are under the `system` database.
- `table`: table to insert data into.
- `partition_by`: specify a PARTITION BY expression.
- `ttl`: specify a table TTL expression.
- `flush_interval_milliseconds`: interval for flushing data to disk.
- `engine`: provide a full engine expression (starting with `ENGINE =`) with parameters. This option conflicts with `partition_by` and `ttl`. If they are set together, the server raises an exception and exits.
An example:
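A minimal sketch, assuming the `query_log` table is being customized; the engine expression and flush interval below are illustrative values, not defaults:

```xml
<clickhouse>
    <query_log>
        <database>system</database>
        <table>query_log</table>
        <!-- engine conflicts with partition_by and ttl: provide only one approach -->
        <engine>ENGINE = MergeTree PARTITION BY event_date ORDER BY event_time
                TTL event_date + INTERVAL 30 DAY</engine>
        <flush_interval_milliseconds>7500</flush_interval_milliseconds>
    </query_log>
</clickhouse>
```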
By default, table growth is unlimited. To control the size of a table, you can use TTL settings to remove outdated log records. You can also use the partitioning feature of MergeTree-engine tables.
Sources of System Metrics
To collect system metrics, the ClickHouse server uses:

- `CAP_NET_ADMIN` capability.
- procfs (only in Linux).
procfs
If the ClickHouse server does not have the `CAP_NET_ADMIN` capability, it tries to fall back to `ProcfsMetricsProvider`. `ProcfsMetricsProvider` allows collecting per-query system metrics (for CPU and I/O).
If procfs is supported and enabled on the system, the ClickHouse server collects these metrics:

- `OSCPUVirtualTimeMicroseconds`
- `OSCPUWaitMicroseconds`
- `OSIOWaitMicroseconds`
- `OSReadChars`
- `OSWriteChars`
- `OSReadBytes`
- `OSWriteBytes`
`OSIOWaitMicroseconds` is disabled by default in Linux kernels starting from 5.14.x. You can enable it using `sudo sysctl kernel.task_delayacct=1` or by creating a `.conf` file in `/etc/sysctl.d/` with `kernel.task_delayacct = 1`.
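For example (a sketch; the `.conf` file name here is arbitrary):

```bash
# Enable task delay accounting immediately (does not survive a reboot):
sudo sysctl kernel.task_delayacct=1

# Persist the setting across reboots; the file name is illustrative:
echo 'kernel.task_delayacct = 1' | sudo tee /etc/sysctl.d/task_delayacct.conf
sudo sysctl --system
```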
System tables in ClickHouse Cloud
In ClickHouse Cloud, system tables provide critical insights into the state and performance of the service, just as they do in self-managed deployments. Some system tables operate at the cluster-wide level, especially those that derive their data from Keeper nodes, which manage distributed metadata. These tables reflect the collective state of the cluster and should be consistent when queried on individual nodes. For example, `system.parts` should be consistent irrespective of the node it is queried from:
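A sketch, assuming a database named `default`; the totals come from cluster-wide metadata, so they should match on every node:

```sql
-- Count active data parts; the result should be identical regardless of
-- which replica executes the query.
SELECT count() AS part_count
FROM system.parts
WHERE database = 'default' AND active;
```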
Conversely, other system tables are node-specific, e.g. held in memory or persisting their data using the MergeTree table engine. This is typical for data such as logs and metrics. This persistence ensures that historical data remains available for analysis. However, these node-specific tables are inherently unique to each node.
In general, the following rules can be applied when determining if a system table is node-specific:

- System tables with a `_log` suffix.
- System tables that expose metrics, e.g. `metrics`, `asynchronous_metrics`, `events`.
- System tables that expose ongoing processes, e.g. `processes`, `merges`.
Additionally, new versions of system tables may be created as a result of upgrades or changes to their schema. These versions are named using a numerical suffix.
For example, consider the `system.query_log` tables, which contain a row for each query executed by the node:
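A sketch of how to see the versions present; the numeric suffixes (if any) depend on how often the schema has changed on the service:

```sql
-- Versioned variants carry a numeric suffix, e.g. query_log_1.
SHOW TABLES FROM system LIKE 'query_log%'
```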
Querying multiple versions
We can query across these tables using the `merge` table function. For example, the query below identifies the latest query issued to the target node in each `query_log` table:
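A sketch of such a query, using the `merge` table function's `_table` virtual column to name each underlying table:

```sql
-- Most recent query recorded in each query_log version on this node.
SELECT
    _table,
    max(event_time) AS most_recent
FROM merge('system', '^query_log')
GROUP BY _table
ORDER BY most_recent DESC;
```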
While the numeric suffix on tables can suggest the order of data, it should never be relied upon. For this reason, always use the `merge` table function combined with a date filter when targeting specific date ranges.
Importantly, these tables are still local to each node.
Querying across nodes
To comprehensively view the entire cluster, users can leverage the `clusterAllReplicas` function in combination with the `merge` function. The `clusterAllReplicas` function allows querying system tables across all replicas within the "default" cluster, consolidating node-specific data into a unified result. When combined with the `merge` function, this can be used to target all system data for a specific table in a cluster.
This approach is particularly valuable for monitoring and debugging cluster-wide operations, ensuring users can effectively analyze the health and performance of their ClickHouse Cloud deployment.
ClickHouse Cloud provides clusters of multiple replicas for redundancy and failover. This enables features such as dynamic autoscaling and zero-downtime upgrades. At any given moment, new nodes could be in the process of being added to or removed from the cluster. To skip these nodes, add `SETTINGS skip_unavailable_shards = 1` to queries using `clusterAllReplicas`, as shown below.
For example, consider the difference when querying the `query_log` table, which is often essential to analysis:
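A sketch, assuming the cluster name `default` and an illustrative three-day window:

```sql
-- Per-replica query counts; querying system.query_log directly would only
-- return rows local to the connected node.
SELECT
    hostName() AS node,
    count() AS query_count
FROM clusterAllReplicas('default', system.query_log)
WHERE event_date >= today() - 3
GROUP BY node
SETTINGS skip_unavailable_shards = 1;
```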
Querying across nodes and versions
Due to system table versioning, this still does not represent the full data in the cluster. Combining the above with the `merge` function gives an accurate result for our date range:
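A sketch combining both functions; the cluster name and date window are illustrative:

```sql
-- Read every version of query_log on every replica, bounded by a date filter.
SELECT
    hostName() AS node,
    count() AS query_count
FROM clusterAllReplicas('default', merge('system', '^query_log'))
WHERE event_date >= today() - 3
GROUP BY node
SETTINGS skip_unavailable_shards = 1;
```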