Alternative backup methods
ClickHouse stores data on disk, and there are many ways to back up disks. These are some alternatives that have been used in the past, and that may fit your use case.
Duplicating source data somewhere else
Often data ingested into ClickHouse is delivered through some sort of persistent queue, such as Apache Kafka. In this case, it is possible to configure an additional set of subscribers that will read the same data stream while it is being written to ClickHouse and store it in cold storage somewhere. Most companies already have some default recommended cold storage, which could be an object store or a distributed filesystem like HDFS.
Filesystem Snapshots
Some local filesystems provide snapshot functionality (for example, ZFS),
but they might not be the best choice for serving live queries. A possible solution
is to create additional replicas with this kind of filesystem and exclude them
from the Distributed tables that are used for SELECT queries.
Snapshots on such replicas will be out of reach of any queries that modify data.
As a bonus, these replicas might have special hardware configurations with more
disks attached per server, which would be cost-effective.
For smaller volumes of data, a simple INSERT INTO ... SELECT ... to remote tables
might work as well.
Manipulations with Parts
ClickHouse allows using the ALTER TABLE ... FREEZE PARTITION ... query to create
a local copy of table partitions. This is implemented using hardlinks to the /var/lib/clickhouse/shadow/
folder, so it usually does not consume extra disk space for old data. The created
copies of files are not handled by ClickHouse server, so you can just leave them there:
you will have a simple backup that does not require any additional external system,
but it will still be prone to hardware issues. For this reason, it's better to
remotely copy them to another location and then remove the local copies.
Distributed filesystems and object stores are still a good options for this,
but normal attached file servers with a large enough capacity might work as well
(in this case the transfer will occur via the network filesystem or maybe rsync).
Data can be restored from backup using the ALTER TABLE ... ATTACH PARTITION ...
For more information about queries related to partition manipulations, see the
ALTER documentation.
A third-party tool is available to automate this approach: clickhouse-backup.