BACKUP / RESTORE to or from an S3 endpoint

This article covers backing up or restoring backups to/from an S3 bucket via an S3 endpoint.

Syntax

-- core commands
BACKUP | RESTORE [ASYNC]
--- what to backup/restore (or exclude)
TABLE [db.]table_name           [AS [db.]table_name_in_backup] |
DICTIONARY [db.]dictionary_name [AS [db.]name_in_backup] |
DATABASE database_name          [AS database_name_in_backup] |
TEMPORARY TABLE table_name      [AS table_name_in_backup] |
VIEW view_name                  [AS view_name_in_backup] |
[EXCEPT TABLES ...] |
ALL [EXCEPT {TABLES|DATABASES}...] } [,...]
--- 
[ON CLUSTER 'cluster_name']
--- where to backup or restore to or from
TO|FROM 
File('<path>/<filename>') | 
Disk('<disk_name>', '<path>/') | 
S3('<S3 endpoint>/<path>', '<Access key ID>', '<Secret access key>', '<extra_credentials>') |
AzureBlobStorage('<connection string>/<url>', '<container>', '<path>', '<account name>', '<account key>')
--- additional settings
[SETTINGS ...]

See "command summary" for more details of each command.

Usage example

Incremental backup to an S3 endpoint

In this example, we will create a backup to an S3 endpoint and then restore from it again.

Note

For an explanation of the differences between a full backup and an incremental backup, see "Backup types"

You will need the following information to use this method:

Parameter	Example
An S3 endpoint	`https://backup-ch-docs.s3.us-east-1.amazonaws.com/backups/`
Access key ID	`BKIOZLE2VYN3VXXTP9RC`
Secret access key	`40bwYnbqN7xU8bVePaUCh3+YEyGXu8UOMV9ANpwL`

Tip

Creating an S3 bucket is covered in section "use S3 Object Storage as a ClickHouse disk"

The destination for a backup is specified as:

S3('<s3 endpoint>/<directory>', '<access key id>', '<secret access key>', '<extra_credentials>')

Setup

Create the following database and table and insert some random data into it:

CREATE DATABASE IF NOT EXISTS test_db;
CREATE TABLE test_db.test_table
(
    `key` Int,
    `value` String,
    `array` Array(String)
)
ENGINE = MergeTree
ORDER BY tuple()

INSERT INTO test_db.test_table SELECT *
FROM generateRandom('key Int, value String, array Array(String)')
LIMIT 1000

Create a base backup

Incremental backups require a base backup to start from. The first parameter of the S3 destination is the S3 endpoint followed by the directory within the bucket to use for this backup. In this example the directory is named my_backup.

Run the following command to create the base backup:

BACKUP TABLE test_db.test_table TO S3(
'https://backup-ch-docs.s3.us-east-1.amazonaws.com/backups/base_backup',
'<access key id>',
'<secret access key>'
)

┌─id───────────────────────────────────┬─status─────────┐
│ de442b75-a66c-4a3c-a193-f76f278c70f3 │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘

Add more data

Incremental backups are populated with the difference between the base backup and the current content of the table being backed up. Add more data before taking the incremental backup:

INSERT INTO test_db.test_table SELECT *
FROM generateRandom('key Int, value String, array Array(String)')
LIMIT 100

Take an incremental backup

This backup command is similar to the base backup, but adds SETTINGS base_backup and the location of the base backup. Note that the destination for the incremental backup isn't the same directory as the base, it is the same endpoint with a different target directory within the bucket. The base backup is in my_backup, and the incremental will be written to my_incremental:

BACKUP TABLE test_db.test_table TO S3(
'https://backup-ch-docs.s3.us-east-1.amazonaws.com/backups/incremental_backup',
'<access key id>',
'<secret access key>'
)
SETTINGS base_backup = S3(
'https://backup-ch-docs.s3.us-east-1.amazonaws.com/backups/base_backup',
'<access key id>',
'<secret access key>'
)

┌─id───────────────────────────────────┬─status─────────┐
│ f6cd3900-850f-41c9-94f1-0c4df33ea528 │ BACKUP_CREATED │
└──────────────────────────────────────┴────────────────┘

Restore from the incremental backup

This command restores the incremental backup into a new table, test_table_restored.
Note that when an incremental backup is restored, the base backup is also included. Specify only the incremental backup when restoring:

RESTORE TABLE data AS test_db.test_table_restored FROM S3(
'https://backup-ch-docs.s3.us-east-1.amazonaws.com/backups/incremental_backup',
'<access key id>',
'<secret access key>'
)

┌─id───────────────────────────────────┬─status───┐
│ ff0c8c39-7dff-4324-a241-000796de11ca │ RESTORED │
└──────────────────────────────────────┴──────────┘

Verify the count

There were two inserts into the original table data, one with 1,000 rows and one with 100 rows, for a total of 1,100. Verify that the restored table has 1,100 rows:

SELECT count()
FROM test_db.test_table_restored

┌─count()─┐
│    1100 │
└─────────┘

Verify the content

This compares the content of the original table, test_table with the restored table test_table_restored:

SELECT throwIf((
   SELECT groupArray(tuple(*))
   FROM test_db.test_table
   ) != (
   SELECT groupArray(tuple(*))
   FROM test_db.test_table_restored
), 'Data does not match after BACKUP/RESTORE')

Syntax​

Usage example​

Incremental backup to an S3 endpoint​

Setup​

Create a base backup​

Add more data​

Take an incremental backup​

Restore from the incremental backup​

Verify the count​

Verify the content​