Encoding functions

bech32Decode

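Introduced in: v25.6.0

Decodes a Bech32 address string generated by either the bech32 or bech32m algorithm.

Note

Unlike the encode function, bech32Decode automatically handles padded FixedStrings.

Syntax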

bech32Decode(address[, 'raw'])

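Arguments

address: A Bech32 string to decode. String or FixedString
mode: Optional. Pass 'raw' to decode without stripping the first byte as a witness version. Use this for non-SegWit addresses (for example, Cosmos SDK). String

Returned value

Returns a tuple (hrp, data) holding the values that were used to encode the string. The data is in binary format. Tuple(String, String)

Examples

Decode address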

SELECT tup.1 AS hrp, hex(tup.2) AS data FROM (SELECT bech32Decode('bc1w508d6qejxtdg4y5r3zarvary0c5xw7kj7gz7z') AS tup)

bc   751E76E8199196D454941C45D1B3A323F1433BD6

Testnet address

SELECT tup.1 AS hrp, hex(tup.2) AS data FROM (SELECT bech32Decode('tb1w508d6qejxtdg4y5r3zarvary0c5xw7kzp034v') AS tup)

tb   751E76E8199196D454941C45D1B3A323F1433BD6
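
To make the layout of such an address concrete, the following Python sketch splits an address at its last '1' separator and unpacks the 5-bit characters of the data part back into bytes. It is only a model, not ClickHouse's implementation: it assumes the standard Bech32 alphabet and a 6-character checksum, it skips checksum verification entirely, and the helper name split_and_unpack is hypothetical.

```python
# Standard Bech32 alphabet: each character stands for a 5-bit value.
CHARSET = "qpzry9x8gf2tvdw0s3jn54khce6mua7l"


def split_and_unpack(address: str):
    """Split an address into (hrp, payload bytes) without checking the checksum."""
    # The separator is the last '1' in the string.
    hrp, _, data_part = address.rpartition("1")
    # Assumption: the final 6 characters are the checksum and are simply dropped.
    payload = data_part[:-6]
    # Concatenate the 5-bit groups and regroup them into whole bytes.
    bits = "".join(format(CHARSET.index(c), "05b") for c in payload)
    usable = len(bits) - len(bits) % 8  # ignore trailing padding bits, if any
    raw = bytes(int(bits[i:i + 8], 2) for i in range(0, usable, 8))
    return hrp, raw


hrp, raw = split_and_unpack("bc1w508d6qejxtdg4y5r3zarvary0c5xw7kj7gz7z")
print(hrp, raw.hex().upper())
```

Applied to the mainnet address from the example above, the sketch yields 'bc' and the same 20 bytes that hex(tup.2) shows.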

bech32Encode

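Introduced in: v25.6.0

Encodes a binary data string, together with a human-readable part (HRP), using either the Bech32 or Bech32m algorithm.

Note

When using the FixedString data type, a value that does not fully fill the row is padded with null characters. bech32Encode handles this padding automatically for the hrp argument, but the data argument must not contain padding characters. For this reason, using FixedString for data values is not recommended unless you are certain that they all have the same length and that the FixedString column is set to that length.

Syntax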

bech32Encode(hrp, data[, witver | 'bech32' | 'bech32m'])

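Arguments

hrp: A string of 1 to 83 lowercase characters specifying the "human-readable part" of the code. Usually 'bc' or 'tb'. String or FixedString
data: A string of binary data to encode. String or FixedString
witver_or_variant: Optional. Either a witness version as UInt* (default = 1; Bech32 uses 0, Bech32m uses 1 or greater), or an encoding variant as String: 'bech32' (BIP173) or 'bech32m' (BIP350). With a string variant, no witness version byte is prepended, which is required for non-SegWit addresses such as Cosmos SDK. UInt* or String

Returned value

Returns a Bech32 address string consisting of the human-readable part, a separator character that is always '1', and a data part. The string is never longer than 90 characters. If the algorithm cannot generate a valid address from the input, an empty string is returned. String

Examples

Default Bech32m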

-- When no witness version is supplied, the default is 1, the updated Bech32m algorithm.
SELECT bech32Encode('bc', unhex('751e76e8199196d454941c45d1b3a323f1433bd6'))

bc1w508d6qejxtdg4y5r3zarvary0c5xw7k8zcwmq

Bech32 algorithm

-- A witness version of 0 will result in a different address string.
SELECT bech32Encode('bc', unhex('751e76e8199196d454941c45d1b3a323f1433bd6'), 0)

bc1w508d6qejxtdg4y5r3zarvary0c5xw7kj7gz7z

Custom HRP

-- While 'bc' (Mainnet) and 'tb' (Testnet) are the only allowed hrp values for the
-- SegWit address format, Bech32 allows any hrp that satisfies the above requirements.
SELECT bech32Encode('abcdefg', unhex('751e76e8199196d454941c45d1b3a323f1433bd6'), 10)

abcdefg1w508d6qejxtdg4y5r3zarvary0c5xw7k9rp8r4

Cosmos SDK address (BIP173, no witness version)

-- Using 'bech32' variant encodes raw data without a witness version byte,
-- compatible with Cosmos SDK, Injective, Osmosis, and other non-SegWit chains.
SELECT bech32Encode('inj', unhex('751e76e8199196d454941c45d1b3a323f1433bd6'), 'bech32')

inj1w508d6qejxtdg4y5r3zarvary0c5xw7kgj5aqs

bin

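Introduced in: v21.8.0

Returns a string containing the argument's binary representation, according to the following logic for different types:

Type	Description
`(U)Int*`	Prints binary digits from the most significant to the least significant (big-endian or "human-readable" order). Output starts with the most significant non-zero byte (leading zero bytes are omitted), but always prints eight digits for every byte, even if the leading digits are zero.
`Date` and `DateTime`	Formatted as the corresponding integers (the number of days since the epoch for `Date`, and the Unix timestamp for `DateTime`).
`String` and `FixedString`	All bytes are simply encoded as eight binary digits. Zero bytes are not omitted.
`Float*` and `Decimal`	Encoded as their representation in memory. Only little-endian architectures are supported, so they are encoded in little-endian order. Leading and trailing zero bytes are not omitted.
`UUID`	Encoded as a big-endian order string.

Syntax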

bin(arg)

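Arguments

arg: A value to convert to binary. String or FixedString or (U)Int* or Float* or Decimal or Date or DateTime or UUID

Returned value

Returns a string with the binary representation of the argument. String

Examples

Simple integer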

SELECT bin(14)

┌─bin(14)──┐
│ 00001110 │
└──────────┘
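
The integer rule from the table can be modelled in a few lines of Python. This is an illustrative sketch rather than ClickHouse code: the helper name bin_uint is hypothetical, and printing a single zero byte for the value 0 is an assumption of the sketch.

```python
def bin_uint(value: int) -> str:
    """Model of the documented bin() rule for unsigned integers."""
    if value < 0:
        raise ValueError("this sketch only covers unsigned values")
    # Count bytes from the most significant non-zero byte downwards.
    # Assumption: the value 0 is still printed as one byte.
    n_bytes = max(1, (value.bit_length() + 7) // 8)
    # Every byte contributes exactly eight digits, leading zeros included.
    return format(value, "0{}b".format(n_bytes * 8))


print(bin_uint(14))   # one byte, eight digits
print(bin_uint(256))  # two bytes, sixteen digits
```

For 14 the sketch prints the same eight digits as the query above; for 256 the second byte is kept in full even though it is zero.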

Float32 numbers

SELECT bin(toFloat32(number)) AS bin_presentation FROM numbers(15, 2)

┌─bin_presentation─────────────────┐
│ 00000000000000000111000001000001 │
│ 00000000000000001000000001000001 │
└──────────────────────────────────┘

Float64 numbers

SELECT bin(toFloat64(number)) AS bin_presentation FROM numbers(15, 2)

┌─bin_presentation─────────────────────────────────────────────────┐
│ 0000000000000000000000000000000000000000000000000010111001000000 │
│ 0000000000000000000000000000000000000000000000000011000001000000 │
└──────────────────────────────────────────────────────────────────┘

UUID conversion

SELECT bin(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0')) AS bin_uuid

┌─bin_uuid─────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┐
│ 01100001111100001100010000000100010111001011001100010001111001111001000001111011101001100000000001101010110100111101101110100000 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────────┘

bitPositionsToArray

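Introduced in: v21.7.0

Returns the positions (in ascending order) of all bits that are set to 1 in the binary representation of an unsigned integer. Signed input integers are first cast to an unsigned integer.

Syntax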

bitPositionsToArray(arg)

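Arguments

arg: An integer value. (U)Int*

Returned value

Returns an array with the positions of the 1 bits in the binary representation of the input, in ascending order. Array(UInt64)

Examples

Single bit set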

SELECT bitPositionsToArray(toInt8(1)) AS bit_positions

┌─bit_positions─┐
│ [0]           │
└───────────────┘

All bits set

SELECT bitPositionsToArray(toInt8(-1)) AS bit_positions

┌─bit_positions─────────────┐
│ [0, 1, 2, 3, 4, 5, 6, 7]  │
└───────────────────────────┘
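
The cast from signed to unsigned explains why -1 as an Int8 reports all eight positions. A minimal Python model of that behaviour, assuming a two's complement reinterpretation at the given bit width (the name bit_positions and the width parameter are hypothetical):

```python
from typing import List


def bit_positions(value: int, width: int = 8) -> List[int]:
    """Positions of the 1 bits of value, viewed as an unsigned integer of width bits."""
    # Reinterpret a signed value as unsigned of the same width (two's complement).
    unsigned = value & ((1 << width) - 1)
    # Position 0 is the least significant bit.
    return [pos for pos in range(width) if (unsigned >> pos) & 1]


print(bit_positions(1))    # only bit 0 is set
print(bit_positions(-1))   # every bit of the 8-bit value is set
```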

bitmaskToArray

Introduced in: v1.1.0

Decomposes an integer into a sum of powers of two. The powers of two are returned as an array in ascending order.

Syntax

bitmaskToArray(num)

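Arguments

num: An integer value. (U)Int*

Returned value

Returns an array with the powers of two that sum up to the input number, in ascending order. Array(UInt64)

Examples

Basic example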

SELECT bitmaskToArray(50) AS powers_of_two

┌─powers_of_two───┐
│ [2, 16, 32]     │
└─────────────────┘

Single power of two

SELECT bitmaskToArray(8) AS powers_of_two

┌─powers_of_two─┐
│ [8]           │
└───────────────┘
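
The decomposition amounts to emitting one power of two per set bit. The following Python sketch models this for non-negative inputs; the name bitmask_to_array is hypothetical and the sketch says nothing about how ClickHouse implements the function.

```python
from typing import List


def bitmask_to_array(num: int) -> List[int]:
    """Powers of two, in ascending order, whose sum equals num."""
    if num < 0:
        raise ValueError("this sketch only covers non-negative values")
    # One entry per set bit: bit position pos contributes 2 ** pos.
    return [1 << pos for pos in range(num.bit_length()) if (num >> pos) & 1]


print(bitmask_to_array(50))  # 50 = 2 + 16 + 32
print(bitmask_to_array(8))   # a single power of two
```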

bitmaskToList

Introduced in: v1.1.0

Like bitmaskToArray, but returns the powers of two as a comma-separated string.

Syntax

bitmaskToList(num)

Arguments

num: An integer value. (U)Int*

Returned value

Returns a string containing comma-separated powers of two. String

Examples

Basic example

SELECT bitmaskToList(50) AS powers_list

┌─powers_list───┐
│ 2, 16, 32     │
└───────────────┘

char

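Introduced in: v20.1.0

Returns a string whose length equals the number of arguments passed, where each byte has the value of the corresponding argument. Accepts multiple arguments of numeric types.

If the value of an argument is outside the range of the UInt8 data type, it is converted to UInt8 with possible rounding and overflow.

Syntax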

char(num1[, num2[, ...]])

Arguments

num1[, num2[, num3 ...]]: Numerical arguments interpreted as integers. (U)Int8/16/32/64 or Float*

Returned value

Returns a string of the given bytes. String

Examples

Basic example

SELECT char(104.1, 101, 108.9, 108.9, 111) AS hello;

┌─hello─┐
│ hello │
└───────┘

Constructing arbitrary encodings

-- You can construct a string of arbitrary encoding by passing the corresponding bytes.
-- for example UTF8
SELECT char(0xD0, 0xBF, 0xD1, 0x80, 0xD0, 0xB8, 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82) AS hello;

┌─hello──┐
│ привет │
└────────┘
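
Both examples can be reproduced with a small Python model that turns each argument into one byte. The conversion rule used here (drop the fractional part, then wrap modulo 256) is an assumption that matches the examples above; it is not a statement about ClickHouse's exact conversion for out-of-range values. The name char_model is hypothetical.

```python
def char_model(*nums: float) -> bytes:
    """Build a byte string with one byte per numeric argument."""
    # Assumption: fractions are truncated and integers outside 0..255 wrap modulo 256.
    return bytes(int(n) % 256 for n in nums)


print(char_model(104.1, 101, 108.9, 108.9, 111).decode("ascii"))
print(char_model(0xD0, 0xBF, 0xD1, 0x80, 0xD0, 0xB8,
                 0xD0, 0xB2, 0xD0, 0xB5, 0xD1, 0x82).decode("utf-8"))
```

The second call shows that the function itself is encoding-agnostic: the bytes only read as text because the caller chose a valid UTF-8 sequence.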

hex

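Introduced in: v1.1.0

Returns a string containing the argument's hexadecimal representation, according to the following logic for different types:

Type	Description
`(U)Int*`	Prints hex digits ("nibbles") from the most significant to the least significant (big-endian or "human-readable" order). Output starts with the most significant non-zero byte (leading zero bytes are omitted), but always prints both digits of every byte, even if the leading digit is zero.
`Date` and `DateTime`	Formatted as the corresponding integers (the number of days since the epoch for `Date`, and the Unix timestamp for `DateTime`).
`String` and `FixedString`	All bytes are simply encoded as two hexadecimal digits. Zero bytes are not omitted.
`Float*` and `Decimal`	Encoded as their representation in memory. ClickHouse always represents these values internally in little-endian order, so they are encoded in little-endian order. Leading and trailing zero bytes are not omitted.
`UUID`	Encoded as a big-endian order string.

The function uses uppercase letters A-F and no prefixes (like 0x) or suffixes (like h).

Syntax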

hex(arg)

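Arguments

arg: A value to convert to hexadecimal. String or FixedString or (U)Int* or Float* or Decimal or Date or DateTime or UUID

Returned value

Returns a string with the hexadecimal representation of the argument. String

Examples

Simple integer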

SELECT hex(1)

01

Float32 numbers

SELECT hex(toFloat32(number)) AS hex_presentation FROM numbers(15, 2)

┌─hex_presentation─┐
│ 00007041         │
│ 00008041         │
└──────────────────┘

Float64 numbers

SELECT hex(toFloat64(number)) AS hex_presentation FROM numbers(15, 2)

┌─hex_presentation─┐
│ 0000000000002E40 │
│ 0000000000003040 │
└──────────────────┘
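
The little-endian byte order of the Float* results can be reproduced with Python's struct module. This is a sketch for illustration, assuming IEEE 754 single and double precision values; the helper names hex_float32 and hex_float64 are hypothetical.

```python
import struct


def hex_float32(x: float) -> str:
    """Uppercase hex of the little-endian in-memory bytes of a 32-bit float."""
    return struct.pack("<f", x).hex().upper()


def hex_float64(x: float) -> str:
    """Uppercase hex of the little-endian in-memory bytes of a 64-bit float."""
    return struct.pack("<d", x).hex().upper()


for n in (15, 16):
    print(hex_float32(n), hex_float64(n))
```

The zero bytes come first because the least significant bytes of the mantissa are stored first, and none of them is dropped.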

UUID conversion

SELECT lower(hex(toUUID('61f0c404-5cb3-11e7-907b-a6006ad3dba0'))) AS uuid_hex

┌─uuid_hex─────────────────────────┐
│ 61f0c4045cb311e7907ba6006ad3dba0 │
└──────────────────────────────────┘

hilbertDecode

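Introduced in: v24.6.0

Decodes a Hilbert curve index back into a tuple of unsigned integers representing coordinates in multi-dimensional space.

As with the hilbertEncode function, this function has two modes of operation:

Simple mode
Expanded mode

Simple mode

Accepts the size of the resulting tuple (at most 2) as the first argument and the code as the second argument.

Expanded mode

Accepts a range mask (tuple) as the first argument and the code as the second argument. Each number in the mask configures the number of bits by which the corresponding argument is shifted left during encoding, which scales the argument within its range; decoding reverses that shift.

Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality). For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF). As with the encode function, this is limited to at most 2 numbers.

Syntax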

hilbertDecode(tuple_size, code)

Arguments

tuple_size: An integer value of no more than 2. UInt8/16/32/64 or Tuple(UInt8/16/32/64)
code: A UInt64 code. UInt64

Returned value

Returns a tuple of the specified size. Tuple(UInt64)

Examples

Simple mode

SELECT hilbertDecode(2, 31)

["3", "4"]

Single argument

-- Hilbert code for one argument is always the argument itself (as a tuple).
SELECT hilbertDecode(1, 1)

["1"]

Expanded mode

-- A single argument with a tuple specifying bit shifts will be right-shifted accordingly.
SELECT hilbertDecode(tuple(2), 32768)

["128"]

Column usage

-- First create the table and insert some data
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE=MergeTree()
ORDER BY n1 SETTINGS index_granularity_bytes = '10Mi';
insert into hilbert_numbers (*) values(1,2);

-- Use column names instead of constants as function arguments
SELECT untuple(hilbertDecode(2, hilbertEncode(n1, n2))) FROM hilbert_numbers;

1    2

hilbertEncode

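Introduced in: v24.6.0

Calculates the code of the Hilbert curve for a list of unsigned integers.

The function has two modes of operation:

Simple mode
Expanded mode

Simple mode

Accepts up to 2 unsigned integers as arguments and produces a UInt64 code.

Expanded mode

Accepts a range mask (Tuple) as the first argument and up to 2 unsigned integers as the other arguments.

Each number in the mask configures the number of bits by which the corresponding argument is shifted left, which scales the argument within its range.

Syntax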

-- Simple mode
hilbertEncode(args)

-- Expanded mode
hilbertEncode(range_mask, args)

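Arguments

args: Up to two UInt values or columns of type UInt. UInt8/16/32/64
range_mask: For the expanded mode, a tuple of up to two UInt values, one bit shift per argument. Tuple(UInt8/16/32/64)

Returned value

Returns a UInt64 code. UInt64

Examples

Simple mode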

SELECT hilbertEncode(3, 4)

31

Expanded mode

-- Range expansion can be beneficial when you need a similar distribution for
-- arguments with wildly different ranges (or cardinality).
-- For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
-- Note: tuple size must be equal to the number of the other arguments.
SELECT hilbertEncode((10, 6), 1024, 16)

4031541586602

Single argument

-- For a single argument without a tuple, the function returns the argument
-- itself as the Hilbert index, since no dimensional mapping is needed.
SELECT hilbertEncode(1)

1

Single argument (expanded mode)

-- If a single argument is provided with a tuple specifying bit shifts, the function
-- shifts the argument left by the specified number of bits.
SELECT hilbertEncode(tuple(2), 128)

Column usage

-- First create the table and insert some data
CREATE TABLE hilbert_numbers(
    n1 UInt32,
    n2 UInt32
)
ENGINE=MergeTree()
ORDER BY n1;
insert into hilbert_numbers (*) values(1, 2);

-- Use column names instead of constants as function arguments
SELECT hilbertEncode(n1, n2) FROM hilbert_numbers;

mortonDecode

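Introduced in: v24.6.0

Decodes a Morton encoding (ZCurve) into the corresponding tuple of unsigned integers.

As with the mortonEncode function, this function has two modes of operation:

Simple mode
Expanded mode

Simple mode

Accepts the size of the resulting tuple as the first argument and the code as the second argument.

Expanded mode

Accepts a range mask (tuple) as the first argument and the code as the second argument. Each number in the mask configures the amount of range shrink:

1 - no shrink
2 - 2x shrink
3 - 3x shrink
...
Up to 8x shrink.

Range expansion can be beneficial when you need a similar distribution for arguments with wildly different ranges (or cardinality). For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF). As with the encode function, this is limited to at most 8 numbers.

Syntax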

-- Simple mode
mortonDecode(tuple_size, code)

-- Expanded mode
mortonDecode(range_mask, code)

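Arguments

tuple_size: An integer value of no more than 8. UInt8/16/32/64
range_mask: For the expanded mode, the mask for each argument. The mask is a tuple of unsigned integers. Each number in the mask configures the amount of range shrink. Tuple(UInt8/16/32/64)
code: A UInt64 code. UInt64

Returned value

Returns a tuple of the specified size. Tuple(UInt64)

Examples

Simple mode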

SELECT mortonDecode(3, 53)

["1", "2", "3"]

Single argument

SELECT mortonDecode(1, 1)

["1"]

Expanded mode, shrinking one argument

SELECT mortonDecode(tuple(2), 32768)

["128"]

Column usage

-- First create the table and insert some data
CREATE TABLE morton_numbers(
    n1 UInt32,
    n2 UInt32,
    n3 UInt16,
    n4 UInt16,
    n5 UInt8,
    n6 UInt8,
    n7 UInt8,
    n8 UInt8
)
ENGINE=MergeTree()
ORDER BY n1;
INSERT INTO morton_numbers (*) values(1, 2, 3, 4, 5, 6, 7, 8);

-- Use column names instead of constants as function arguments
SELECT untuple(mortonDecode(8, mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8))) FROM morton_numbers;

1 2 3 4 5 6 7 8

mortonEncode

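Introduced in: v24.6.0

Calculates the Morton encoding (ZCurve) for a list of unsigned integers.

The function has two modes of operation:

Simple mode
Expanded mode

Simple mode

Accepts up to 8 unsigned integers as arguments and produces a UInt64 code.

Expanded mode

Accepts a range mask (Tuple) as the first argument and up to 8 unsigned integers as the other arguments.

Each number in the mask configures the amount of range expansion:

1 - no expansion
2 - 2x expansion
3 - 3x expansion
...
Up to 8x expansion.

Syntax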

-- Simple mode
mortonEncode(args)

-- Expanded mode
mortonEncode(range_mask, args)

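Arguments

args: Up to 8 unsigned integers or columns of the aforementioned type. UInt8/16/32/64
range_mask: For the expanded mode, the mask for each argument. The mask is a tuple of unsigned integers from 1 to 8. Each number in the mask configures the amount of range expansion. Tuple(UInt8/16/32/64)

Returned value

Returns a UInt64 code. UInt64

Examples

Simple mode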

SELECT mortonEncode(1, 2, 3)

53

Expanded mode

-- Range expansion can be beneficial when you need a similar distribution for
-- arguments with wildly different ranges (or cardinality)
-- For example: 'IP Address' (0...FFFFFFFF) and 'Country code' (0...FF).
-- Note: the Tuple size must be equal to the number of the other arguments.
SELECT mortonEncode((1,2), 1024, 16)

Single argument

-- Morton encoding for one argument is always the argument itself
SELECT mortonEncode(1)

1

Expanded mode with a single argument

SELECT mortonEncode(tuple(2), 128)

Column usage

-- First create the table and insert some data
CREATE TABLE morton_numbers(
    n1 UInt32,
    n2 UInt32,
    n3 UInt16,
    n4 UInt16,
    n5 UInt8,
    n6 UInt8,
    n7 UInt8,
    n8 UInt8
)
ENGINE=MergeTree()
ORDER BY n1;
INSERT INTO morton_numbers (*) values(1, 2, 3, 4, 5, 6, 7, 8);

-- Use column names instead of constants as function arguments
SELECT mortonEncode(n1, n2, n3, n4, n5, n6, n7, n8) FROM morton_numbers;

2155374165
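
The simple-mode results in this section are consistent with plain bit interleaving, which the following Python sketch models. It is not ClickHouse's implementation: the bit order (bit j of the i-th argument, counting from 0, lands at position j * n + i for n arguments) is inferred from the documented outputs, expanded mode is not covered, and the names morton_encode and morton_decode are hypothetical.

```python
from typing import List, Sequence


def morton_encode(args: Sequence[int]) -> int:
    """Interleave the bits of up to 8 unsigned integers into one code."""
    n = len(args)
    if not 1 <= n <= 8:
        raise ValueError("expected between 1 and 8 arguments")
    bits_per_arg = 64 // n  # higher bits are ignored so the code fits in 64 bits
    code = 0
    for i, value in enumerate(args):
        for j in range(bits_per_arg):
            if (value >> j) & 1:
                # Assumed layout: the first argument takes the lowest bit of each group.
                code |= 1 << (j * n + i)
    return code


def morton_decode(tuple_size: int, code: int) -> List[int]:
    """Undo morton_encode for a tuple of the given size."""
    values = [0] * tuple_size
    for pos in range(64):
        if (code >> pos) & 1:
            values[pos % tuple_size] |= 1 << (pos // tuple_size)
    return values


print(morton_encode([1, 2, 3]))                 # 53
print(morton_decode(3, 53))                     # [1, 2, 3]
print(morton_encode([1, 2, 3, 4, 5, 6, 7, 8]))  # 2155374165
```

Because neighbouring bits of the code come from different arguments, values that are close in every dimension tend to receive close codes, which is what makes the encoding useful as a sorting key.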

sqidDecode

Introduced in: v24.1.0

Transforms a sqid back into an array of numbers.

Syntax

sqidDecode(sqid)

Arguments

sqid: The sqid to decode. String

Returned value

Returns an array of numbers decoded from the sqid. Array(UInt64)

Examples

Usage example

SELECT sqidDecode('gXHfJ1C6dN');

┌─sqidDecode('gXHfJ1C6dN')─────┐
│ [1, 2, 3, 4, 5]              │
└──────────────────────────────┘

sqidEncode

Introduced in: v24.1.0

Transforms numbers into a sqid, a YouTube-like ID string.

Syntax

sqidEncode(n1[, n2, ...])

Alias: sqid

Arguments

n1[, n2, ...]: Arbitrarily many numbers. UInt8/16/32/64

Returned value

Returns a hash ID. String

Examples

Usage example

SELECT sqidEncode(1, 2, 3, 4, 5);

┌─sqidEncode(1, 2, 3, 4, 5)─┐
│ gXHfJ1C6dN                │
└───────────────────────────┘

unbin

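Introduced in: v21.8.0

Interprets each group of eight binary digits in the argument as a number and converts it to the byte represented by that number. The function performs the opposite operation to bin.

For a numeric argument, unbin() does not return the inverse of bin(). If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.

Note

If unbin is invoked from within clickhouse-client, binary strings are displayed using UTF-8.

Supports the binary digits 0 and 1. The number of binary digits does not have to be a multiple of eight. If the argument string contains anything other than binary digits, the result is undefined (no exception is thrown).

Syntax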

unbin(arg)

Arguments

arg: A string containing any number of binary digits. String

Returned value

Returns a binary string (BLOB). String

Examples

Basic usage

SELECT UNBIN('001100000011000100110010'), UNBIN('0100110101111001010100110101000101001100')

┌─unbin('001100000011000100110010')─┬─unbin('0100110101111001010100110101000101001100')─┐
│ 012                               │ MySQL                                             │
└───────────────────────────────────┴───────────────────────────────────────────────────┘

Conversion to number

SELECT reinterpretAsUInt64(reverse(unbin('1110'))) AS num

┌─num─┐
│  14 │
└─────┘
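
The grouping of digits into bytes can be modelled in Python as follows. The sketch assumes that an input whose length is not a multiple of eight is padded with zeros on the left, which is consistent with '1110' converting to 14 above; it also rejects non-binary characters, whereas the function described here returns an undefined result. The name unbin_model is hypothetical.

```python
def unbin_model(bits: str) -> bytes:
    """Turn a string of binary digits into the bytes it spells out."""
    if set(bits) - {"0", "1"}:
        raise ValueError("this sketch only accepts the digits 0 and 1")
    # Assumption: pad on the left to a whole number of bytes.
    padded = bits.zfill((len(bits) + 7) // 8 * 8)
    # Each group of eight digits becomes one byte.
    return bytes(int(padded[i:i + 8], 2) for i in range(0, len(padded), 8))


print(unbin_model("001100000011000100110010"))
print(unbin_model("0100110101111001010100110101000101001100"))
# Mirrors reinterpretAsUInt64(reverse(unbin('1110'))): reverse, then read little-endian.
print(int.from_bytes(unbin_model("1110")[::-1], "little"))
```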

unhex

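Introduced in: v1.1.0

Performs the opposite operation of hex. It interprets each pair of hexadecimal digits in the argument as a number and converts it to the byte represented by that number. The returned value is a binary string (BLOB).

If you want to convert the result to a number, you can use the reverse and reinterpretAs<Type> functions.

Note

clickhouse-client interprets strings as UTF-8. This may cause values returned by unhex to be displayed in unexpected ways.

Supports both uppercase and lowercase letters A-F. The number of hexadecimal digits does not have to be even. If it is odd, the last digit is interpreted as the least significant half of the 00-0F byte. If the argument string contains anything other than hexadecimal digits, an implementation-defined result is returned (no exception is thrown). For a numeric argument, unhex() does not perform the inverse of hex(N).

Syntax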

unhex(arg)

Arguments

arg: A string containing any number of hexadecimal digits. String or FixedString

Returned value

Returns a binary string (BLOB). String

Examples

Basic usage

SELECT unhex('303132'), UNHEX('4D7953514C')

┌─unhex('303132')─┬─unhex('4D7953514C')─┐
│ 012             │ MySQL               │
└─────────────────┴─────────────────────┘

Conversion to number

SELECT reinterpretAsUInt64(reverse(unhex('FFF'))) AS num

┌──num─┐
│ 4095 │
└──────┘
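
The reverse and reinterpretAs<Type> step is needed because the decoded bytes are in the order they were written (most significant first), while the reinterpretation reads them as a little-endian integer. A Python sketch of both steps, assuming that an odd number of digits is handled by treating the input as if it had one leading zero digit (the names unhex_model and to_uint are hypothetical):

```python
def unhex_model(digits: str) -> bytes:
    """Turn a string of hexadecimal digits into the bytes it spells out."""
    if len(digits) % 2:
        # Assumption: the unpaired digit forms a byte in the range 00-0F.
        digits = "0" + digits
    return bytes.fromhex(digits)


def to_uint(raw: bytes) -> int:
    """Mirror reinterpretAsUInt64(reverse(raw)) for inputs of up to 8 bytes."""
    return int.from_bytes(raw[::-1], "little")


print(unhex_model("303132"), unhex_model("4D7953514C"))
print(to_uint(unhex_model("FFF")))
```

Without the reversal, a multi-byte result such as the two bytes of 'FFF' would be read with its bytes swapped.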
