Skip to main content
Skip to main content

Functions for Searching in Strings

All functions in this section search case-sensitively by default. Case-insensitive search is usually provided by separate function variants.

Note

Case-insensitive search follows the lowercase-uppercase rules of the English language. E.g. Uppercased i in the English language is I whereas in the Turkish language it is İ - results for languages other than English may be unexpected.

Functions in this section also assume that the searched string (referred to in this section as haystack) and the search string (referred to in this section as needle) are single-byte encoded text. If this assumption is violated, no exception is thrown and results are undefined. Search with UTF-8 encoded strings is usually provided by separate function variants. Likewise, if a UTF-8 function variant is used and the input strings are not UTF-8 encoded text, no exception is thrown and the results are undefined. Note that no automatic Unicode normalization is performed, however you can use the normalizeUTF8*() functions for that.

General strings functions and functions for replacing in strings are described separately.

position

Returns the position (in bytes, starting at 1) of a substring needle in a string haystack.

Syntax

position(haystack, needle[, start_pos])

Alias:

  • position(needle IN haystack)

Arguments

  • haystack — String in which the search is performed. String or Enum.
  • needle — Substring to be searched. String.
  • start_pos – Position (1-based) in haystack at which the search starts. UInt. Optional.

Returned value

  • Starting position in bytes and counting from 1, if the substring was found. UInt64.
  • 0, if the substring was not found. UInt64.

If substring needle is empty, these rules apply:

  • if no start_pos was specified: return 1
  • if start_pos = 0: return 1
  • if start_pos >= 1 and start_pos <= length(haystack) + 1: return start_pos
  • otherwise: return 0

The same rules also apply to functions locate, positionCaseInsensitive, positionUTF8 and positionCaseInsensitiveUTF8.

Examples

Query:

SELECT position('Hello, world!', '!');

Result:

┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘

Example with start_pos argument:

Query:

SELECT
    position('Hello, world!', 'o', 1),
    position('Hello, world!', 'o', 7)

Result:

┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘

Example for needle IN haystack syntax:

Query:

SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s);

Result:

┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘

Examples with empty needle substring:

Query:

SELECT
    position('abc', ''),
    position('abc', '', 0),
    position('abc', '', 1),
    position('abc', '', 2),
    position('abc', '', 3),
    position('abc', '', 4),
    position('abc', '', 5)

Result:

┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│                   1 │                      1 │                      1 │                      2 │                      3 │                      4 │                      0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘

locate

Like position but with arguments haystack and locate switched.

The behavior of this function depends on the ClickHouse version:

  • in versions < v24.3, locate was an alias of function position and accepted arguments (haystack, needle[, start_pos]).
  • in versions >= 24.3,, locate is an individual function (for better compatibility with MySQL) and accepts arguments (needle, haystack[, start_pos]). The previous behavior can be restored using setting function_locate_has_mysql_compatible_argument_order = false;

Syntax

locate(needle, haystack[, start_pos])

positionCaseInsensitive

A case insensitive invariant of position.

Example

Query:

SELECT positionCaseInsensitive('Hello, world!', 'hello');

Result:

┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘

positionUTF8

Like position but assumes haystack and needle are UTF-8 encoded strings.

Examples

Function positionUTF8 correctly counts character ö (represented by two points) as a single Unicode codepoint:

Query:

SELECT positionUTF8('Motörhead', 'r');

Result:

┌─position('Motörhead', 'r')─┐
│                          5 │
└────────────────────────────┘

positionCaseInsensitiveUTF8

Like positionUTF8 but searches case-insensitively.

multiSearchAllPositions

Like position but returns an array of positions (in bytes, starting at 1) for multiple needle substrings in a haystack string.

Note

All multiSearch*() functions only support up to 28 needles.

Syntax

multiSearchAllPositions(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • Array of the starting position in bytes and counting from 1, if the substring was found.
  • 0, if the substring was not found.

Example

Query:

SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world']);

Result:

┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0]                                                          │
└───────────────────────────────────────────────────────────────────┘

multiSearchAllPositionsCaseInsensitive

Like multiSearchAllPositions but ignores case.

Syntax

multiSearchAllPositionsCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • Array of the starting position in bytes and counting from 1 (if the substring was found).
  • 0 if the substring was not found.

Example

Query:

SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h']);

Result:

["1","6"]

multiSearchAllPositionsUTF8

Like multiSearchAllPositions but assumes haystack and the needle substrings are UTF-8 encoded strings.

Syntax

multiSearchAllPositionsUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 encoded string in which the search is performed. String.
  • needle — UTF-8 encoded substrings to be searched. Array.

Returned value

  • Array of the starting position in bytes and counting from 1 (if the substring was found).
  • 0 if the substring was not found.

Example

Given ClickHouse as a UTF-8 string, find the positions of C (\x43) and H (\x48).

Query:

SELECT multiSearchAllPositionsUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);

Result:

["1","6"]

multiSearchAllPositionsCaseInsensitiveUTF8

Like multiSearchAllPositionsUTF8 but ignores case.

Syntax

multiSearchAllPositionsCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 encoded string in which the search is performed. String.
  • needle — UTF-8 encoded substrings to be searched. Array.

Returned value

  • Array of the starting position in bytes and counting from 1 (if the substring was found).
  • 0 if the substring was not found.

Example

Given ClickHouse as a UTF-8 string, find the positions of c (\x63) and h (\x68).

Query:

SELECT multiSearchAllPositionsCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x63','\x68']);

Result:

["1","6"]

multiSearchFirstPosition

Like position but returns the leftmost offset in a haystack string which matches any of multiple needle strings.

Functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8 and multiSearchFirstPositionCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

multiSearchFirstPosition(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • Leftmost offset in a haystack string which matches any of multiple needle strings.
  • 0, if there was no match.

Example

Query:

SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld']);

Result:

3

multiSearchFirstPositionCaseInsensitive

Like multiSearchFirstPosition but ignores case.

Syntax

multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Array of substrings to be searched. Array.

Returned value

  • Leftmost offset in a haystack string which matches any of multiple needle strings.
  • 0, if there was no match.

Example

Query:

SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello']);

Result:

2

multiSearchFirstPositionUTF8

Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings.

Syntax

multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — Array of UTF-8 substrings to be searched. Array.

Returned value

  • Leftmost offset in a haystack string which matches any of multiple needle strings.
  • 0, if there was no match.

Example

Find the leftmost offset in UTF-8 string hello world which matches any of the given needles.

Query:

SELECT multiSearchFirstPositionUTF8('\x68\x65\x6c\x6c\x6f\x20\x77\x6f\x72\x6c\x64',['wor', 'ld', 'ello']);

Result:

2

multiSearchFirstPositionCaseInsensitiveUTF8

Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings and ignores case.

Syntax

multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — Array of UTF-8 substrings to be searched. Array

Returned value

  • Leftmost offset in a haystack string which matches any of multiple needle strings, ignoring case.
  • 0, if there was no match.

Example

Find the leftmost offset in UTF-8 string HELLO WORLD which matches any of the given needles.

Query:

SELECT multiSearchFirstPositionCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['wor', 'ld', 'ello']);

Result:

2

multiSearchFirstIndex

Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise.

Functions multiSearchFirstIndexCaseInsensitive, multiSearchFirstIndexUTF8 and multiSearchFirstIndexCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

multiSearchFirstIndex(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. UInt8.

Example

Query:

SELECT multiSearchFirstIndex('Hello World',['World','Hello']);

Result:

1

multiSearchFirstIndexCaseInsensitive

Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise. Ignores case.

Syntax

multiSearchFirstIndexCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. UInt8.

Example

Query:

SELECT multiSearchFirstIndexCaseInsensitive('hElLo WoRlD',['World','Hello']);

Result:

1

multiSearchFirstIndexUTF8

Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise. Assumes haystack and needle are UTF-8 encoded strings.

Syntax

multiSearchFirstIndexUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — Array of UTF-8 substrings to be searched. Array

Returned value

  • index (starting from 1) of the leftmost found needle, Otherwise 0, if there was no match. UInt8.

Example

Given Hello World as a UTF-8 string, find the first index of UTF-8 strings Hello and World.

Query:

SELECT multiSearchFirstIndexUTF8('\x48\x65\x6c\x6c\x6f\x20\x57\x6f\x72\x6c\x64',['\x57\x6f\x72\x6c\x64','\x48\x65\x6c\x6c\x6f']);

Result:

1

multiSearchFirstIndexCaseInsensitiveUTF8

Returns the index i (starting from 1) of the leftmost found needlei in the string haystack and 0 otherwise. Assumes haystack and needle are UTF-8 encoded strings. Ignores case.

Syntax

multiSearchFirstIndexCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — Array of UTF-8 substrings to be searched. Array.

Returned value

  • index (starting from 1) of the leftmost found needle. Otherwise 0, if there was no match. UInt8.

Example

Given HELLO WORLD as a UTF-8 string, find the first index of UTF-8 strings hello and world.

Query:

SELECT multiSearchFirstIndexCaseInsensitiveUTF8('\x48\x45\x4c\x4c\x4f\x20\x57\x4f\x52\x4c\x44',['\x68\x65\x6c\x6c\x6f','\x77\x6f\x72\x6c\x64']);

Result:

1

multiSearchAny

Returns 1, if at least one string needlei matches the string haystack and 0 otherwise.

Functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8 and multiSearchAnyCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

multiSearchAny(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array.

Returned value

  • 1, if there was at least one match.
  • 0, if there was not at least one match.

Example

Query:

SELECT multiSearchAny('ClickHouse',['C','H']);

Result:

1

multiSearchAnyCaseInsensitive

Like multiSearchAny but ignores case.

Syntax

multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — String in which the search is performed. String.
  • needle — Substrings to be searched. Array

Returned value

  • 1, if there was at least one case-insensitive match.
  • 0, if there was not at least one case-insensitive match.

Example

Query:

SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h']);

Result:

1

multiSearchAnyUTF8

Like multiSearchAny but assumes haystack and the needle substrings are UTF-8 encoded strings.

Syntax*

multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — UTF-8 substrings to be searched. Array.

Returned value

  • 1, if there was at least one match.
  • 0, if there was not at least one match.

Example

Given ClickHouse as a UTF-8 string, check if there are any C ('\x43') or H ('\x48') letters in the word.

Query:

SELECT multiSearchAnyUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x43','\x48']);

Result:

1

multiSearchAnyCaseInsensitiveUTF8

Like multiSearchAnyUTF8 but ignores case.

Syntax*

multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Parameters

  • haystack — UTF-8 string in which the search is performed. String.
  • needle — UTF-8 substrings to be searched. Array

Returned value

  • 1, if there was at least one case-insensitive match.
  • 0, if there was not at least one case-insensitive match.

Example

Given ClickHouse as a UTF-8 string, check if there is any letter h(\x68) in the word, ignoring case.

Query:

SELECT multiSearchAnyCaseInsensitiveUTF8('\x43\x6c\x69\x63\x6b\x48\x6f\x75\x73\x65',['\x68']);

Result:

1

hasAnyTokens

Returns 1, if at least one token in the needle string or array matches the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.

Note

Column input should have a text index defined for optimal performance. If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.

Prior to searching, the function tokenizes

  • the input argument (always), and
  • the needle argument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, the splitByNonAlpha tokenizer is used instead. If the needle argument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored. For example, ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].

Syntax

hasAnyTokens(input, needles)

Parameters

Returned value

Returns 1, if there was at least one match. 0, otherwise.

Example

Query:

CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAnyTokens(msg, 'a\\d()');

Result:

┌─count()─┐
│       3 │
└─────────┘

Specify needles to be searched for AS-IS (no tokenization) in an array:

SELECT count() FROM table WHERE hasAnyTokens(msg, ['a', 'd']);

Result:

┌─count()─┐
│       3 │
└─────────┘

Generate needles using the tokens function:

SELECT count() FROM table WHERE hasAnyTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));

Result:

┌─count()─┐
│       3 │
└─────────┘

hasAllTokens

Like hasAnyTokens, but returns 1, if all tokens in the needle string or array match the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.

Note

Column input should have a text index defined for optimal performance. If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.

Prior to searching, the function tokenizes

  • the input argument (always), and
  • the needle argument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, the splitByNonAlpha tokenizer is used instead. If the needle argument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored. For example, needles = ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].

Syntax

hasAllTokens(input, needles)

Parameters

Returned value

Returns 1, if all needles match. 0, otherwise.

Example

Query:

CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAllTokens(msg, 'a\\d()');

Result:

┌─count()─┐
│       1 │
└─────────┘

Specify needles to be searched for AS-IS (no tokenization) in an array:

SELECT count() FROM table WHERE hasAllTokens(msg, ['a', 'd']);

Result:

┌─count()─┐
│       1 │
└─────────┘

Generate needles using the tokens function:

SELECT count() FROM table WHERE hasAllTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));

Result:

┌─count()─┐
│       1 │
└─────────┘

match

Returns whether string haystack matches the regular expression pattern in re2 regular expression syntax.

Matching is based on UTF-8, e.g. . matches the Unicode code point ¥ which is represented in UTF-8 using two bytes. The regular expression must not contain null bytes. If the haystack or the pattern are not valid UTF-8, then the behavior is undefined.

Unlike re2's default behavior, . matches line breaks. To disable this, prepend the pattern with (?-s).

If you only want to search substrings in a string, you can use functions like or position instead - they work much faster than this function.

Syntax

match(haystack, pattern)

Alias: haystack REGEXP pattern operator

multiMatchAny

Like match but returns 1 if at least one of the patterns match and 0 otherwise.

Note

Functions in the multi[Fuzzy]Match*() family use the the (Vectorscan)[https://github.com/VectorCamp/vectorscan] library. As such, they are only enabled if ClickHouse is compiled with support for vectorscan.

To turn off all functions that use hyperscan, use setting SET allow_hyperscan = 0;.

Due to restrictions of vectorscan, the length of the haystack string must be less than 232 bytes.

Hyperscan is generally vulnerable to regular expression denial of service (ReDoS) attacks (e.g. see (here)[https://www.usenix.org/conference/usenixsecurity22/presentation/turonova], (here)[https://doi.org/10.1007/s10664-021-10033-1] and (here)[https://doi.org/10.1145/3236024.3236027]. Users are adviced to check the provided patterns carefully.

If you only want to search multiple substrings in a string, you can use function multiSearchAny instead - it works much faster than this function.

Syntax

multiMatchAny(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

multiMatchAnyIndex

Like multiMatchAny but returns any index that matches the haystack.

Syntax

multiMatchAnyIndex(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

multiMatchAllIndices

Like multiMatchAny but returns the array of all indices that match the haystack in any order.

Syntax

multiMatchAllIndices(haystack, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

multiFuzzyMatchAny

Like multiMatchAny but returns 1 if any pattern matches the haystack within a constant edit distance. This function relies on the experimental feature of hyperscan library, and can be slow for some corner cases. The performance depends on the edit distance value and patterns used, but it's always more expensive compared to a non-fuzzy variants.

Note

multiFuzzyMatch*() function family do not support UTF-8 regular expressions (it threats them as a sequence of bytes) due to restrictions of hyperscan.

Syntax

multiFuzzyMatchAny(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

multiFuzzyMatchAnyIndex

Like multiFuzzyMatchAny but returns any index that matches the haystack within a constant edit distance.

Syntax

multiFuzzyMatchAnyIndex(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

multiFuzzyMatchAllIndices

Like multiFuzzyMatchAny but returns the array of all indices in any order that match the haystack within a constant edit distance.

Syntax

multiFuzzyMatchAllIndices(haystack, distance, \[pattern<sub>1</sub>, pattern<sub>2</sub>, ..., pattern<sub>n</sub>\])

extract

Returns the first match of a regular expression in a string. If haystack does not match the pattern regex, an empty string is returned.

If the regular expression has capturing groups, the function matches the input string against the first capturing group.

Syntax

extract(haystack, pattern)

Arguments*

Returned value

  • The first match of the regular expression in the haystack string. String.

Example

Query:

SELECT extract('number: 1, number: 2, number: 3', '\\d+') AS result;

Result:

┌─result─┐
│ 1      │
└────────┘

extractAll

Returns an array of all matches of a regular expression in a string. If haystack does not match the pattern regex, an empty string is returned.

The behavior with respect to sub-patterns is the same as in function extract.

Syntax

extractAll(haystack, pattern)

Arguments*

Returned value

  • Array of matches of the regular expression in the haystack string. Array(String).

Example

Query:

SELECT extractAll('number: 1, number: 2, number: 3', '\\d+') AS result;

Result:

┌─result────────┐
│ ['1','2','3'] │
└───────────────┘

extractAllGroupsHorizontal

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where the first array includes all fragments matching the first group, the second array - matching the second group, etc.

This function is slower than extractAllGroupsVertical.

Syntax

extractAllGroupsHorizontal(haystack, pattern)

Arguments

  • haystack — Input string. String.
  • pattern — Regular expression with re2 regular expression syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. String.

Returned value

  • Array of arrays of matches. Array.
Note

If haystack does not match the pattern regex, an array of empty arrays is returned.

Example

SELECT extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroupsHorizontal('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','def','ghi'],['111','222','333']]                                                │
└──────────────────────────────────────────────────────────────────────────────────────────┘

extractGroups

Match all groups of given input string with a given regular expression, returns an array of arrays of matches.

Syntax

extractGroups(haystack, pattern)

Arguments

  • haystack — Input string. String.
  • pattern — Regular expression with re2 regular expression syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. String.

Returned value

  • Array of arrays of matches. Array.

Example

SELECT extractGroups('hello abc=111 world', '("[^"]+"|\\w+)=("[^"]+"|\\w+)') AS result;

Result:

┌─result────────┐
│ ['abc','111'] │
└───────────────┘

extractAllGroupsVertical

Matches all groups of the haystack string using the pattern regular expression. Returns an array of arrays, where each array includes matching fragments from every group. Fragments are grouped in order of appearance in the haystack.

Syntax

extractAllGroupsVertical(haystack, pattern)

Arguments

  • haystack — Input string. String.
  • pattern — Regular expression with re2 regular expression syntax. Must contain groups, each group enclosed in parentheses. If pattern contains no groups, an exception is thrown. String.

Returned value

  • Array of arrays of matches. Array.
Note

If haystack does not match the pattern regex, an empty array is returned.

Example

SELECT extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)');

Result:

┌─extractAllGroupsVertical('abc=111, def=222, ghi=333', '("[^"]+"|\\w+)=("[^"]+"|\\w+)')─┐
│ [['abc','111'],['def','222'],['ghi','333']]                                            │
└────────────────────────────────────────────────────────────────────────────────────────┘

like

Returns whether string haystack matches the LIKE expression pattern.

A LIKE expression can contain normal characters and the following metasymbols:

  • % indicates an arbitrary number of arbitrary characters (including zero characters).
  • _ indicates a single arbitrary character.
  • \ is for escaping literals %, _ and \.

Matching is based on UTF-8, e.g. _ matches the Unicode code point ¥ which is represented in UTF-8 using two bytes.

If the haystack or the LIKE expression are not valid UTF-8, the behavior is undefined.

No automatic Unicode normalization is performed, you can use the normalizeUTF8*() functions for that.

To match against literal %, _ and \ (which are LIKE metacharacters), prepend them with a backslash: \%, \_ and \\. The backslash loses its special meaning (i.e. is interpreted literally) if it prepends a character different than %, _ or \. Note that ClickHouse requires backslashes in strings to be quoted as well, so you would actually need to write \\%, \\_ and \\\\.

For LIKE expressions of the form %needle%, the function is as fast as the position function. All other LIKE expressions are internally converted to a regular expression and executed with a performance similar to function match.

Syntax

like(haystack, pattern)

Alias: haystack LIKE pattern (operator)

notLike

Like like but negates the result.

Alias: haystack NOT LIKE pattern (operator)

ilike

Like like but searches case-insensitively.

Alias: haystack ILIKE pattern (operator)

notILike

Like ilike but negates the result.

Alias: haystack NOT ILIKE pattern (operator)

ngramDistance

Calculates the 4-gram distance between a haystack string and a needle string. For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. Returns a Float32 between 0 and 1. The smaller the result is, the more similar the strings are to each other.

Functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

ngramDistance(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the similarity between the two strings. Float32

Implementation details

This function will throw an exception if constant needle or haystack arguments are more than 32Kb in size. If any non-constant haystack or needle arguments are more than 32Kb in size, then the distance is always 1.

Examples

The more similar two strings are to each other, the closer the result will be to 0 (identical).

Query:

SELECT ngramDistance('ClickHouse','ClickHouse!');

Result:

0.06666667

The less similar two strings are to each, the larger the result will be. Query:

SELECT ngramDistance('ClickHouse','House');

Result:

0.5555556

ngramDistanceCaseInsensitive

Provides a case-insensitive variant of ngramDistance.

Syntax

ngramDistanceCaseInsensitive(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the similarity between the two strings. Float32

Examples

With ngramDistance differences in case will affect the similarity value:

Query:

SELECT ngramDistance('ClickHouse','clickhouse');

Result:

0.71428573

With ngramDistanceCaseInsensitive case is ignored so two identical strings differing only in case will now return a low similarity value:

Query:

SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse');

Result:

0

ngramDistanceUTF8

Provides a UTF-8 variant of ngramDistance. Assumes that needle and haystack strings are UTF-8 encoded strings.

Syntax

ngramDistanceUTF8(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the similarity between the two strings. Float32

Example

Query:

SELECT ngramDistanceUTF8('abcde','cde');

Result:

0.5

ngramDistanceCaseInsensitiveUTF8

Provides a case-insensitive variant of ngramDistanceUTF8.

Syntax

ngramDistanceCaseInsensitiveUTF8(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the similarity between the two strings. Float32

Example

Query:

SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE');

Result:

0.5

ngramSearch

Like ngramDistance but calculates the non-symmetric difference between a needle string and a haystack string, i.e. the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams. Returns a Float32 between 0 and 1. The bigger the result is, the more likely needle is in the haystack. This function is useful for fuzzy string search. Also see function soundex.

Functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

ngramSearch(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the likelihood of the needle being in the haystack. Float32

Implementation details

Note

The UTF-8 variants use the 3-gram distance. These are not perfectly fair n-gram distances. We use 2-byte hashes to hash n-grams and then calculate the (non-)symmetric difference between these hash tables – collisions may occur. With UTF-8 case-insensitive format we do not use fair tolower function – we zero the 5-th bit (starting from zero) of each codepoint byte and first bit of zeroth byte if bytes more than one – this works for Latin and mostly for all Cyrillic letters.

Example

Query:

SELECT ngramSearch('Hello World','World Hello');

Result:

0.5

ngramSearchCaseInsensitive

Provides a case-insensitive variant of ngramSearch.

Syntax

ngramSearchCaseInsensitive(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the likelihood of the needle being in the haystack. Float32

The bigger the result is, the more likely needle is in the haystack.

Example

Query:

SELECT ngramSearchCaseInsensitive('Hello World','hello');

Result:

1

ngramSearchUTF8

Provides a UTF-8 variant of ngramSearch in which needle and haystack are assumed to be UTF-8 encoded strings.

Syntax

ngramSearchUTF8(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the likelihood of the needle being in the haystack. Float32

The bigger the result is, the more likely needle is in the haystack.

Example

Query:

SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд');

Result:

0.5

ngramSearchCaseInsensitiveUTF8

Provides a case-insensitive variant of ngramSearchUTF8 in which needle and haystack.

Syntax

ngramSearchCaseInsensitiveUTF8(haystack, needle)

Parameters

Returned value

  • Value between 0 and 1 representing the likelihood of the needle being in the haystack. Float32

The bigger the result is, the more likely needle is in the haystack.

Example

Query:

SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз');

Result:

0.57142854

countSubstrings

Returns how often a substring needle occurs in a string haystack.

Functions countSubstringsCaseInsensitive and countSubstringsCaseInsensitiveUTF8 provide case-insensitive and case-insensitive + UTF-8 variants of this function respectively.

Syntax

countSubstrings(haystack, needle[, start_pos])

Arguments

  • haystack — String in which the search is performed. String or Enum.
  • needle — Substring to be searched. String.
  • start_pos – Position (1-based) in haystack at which the search starts. UInt. Optional.

Returned value

  • The number of occurrences. UInt64.

Examples

SELECT countSubstrings('aaaa', 'aa');

Result:

┌─countSubstrings('aaaa', 'aa')─┐
│                             2 │
└───────────────────────────────┘

Example with start_pos argument:

SELECT countSubstrings('abc___abc', 'abc', 4);

Result:

┌─countSubstrings('abc___abc', 'abc', 4)─┐
│                                      1 │
└────────────────────────────────────────┘

countSubstringsCaseInsensitive

Returns how often a substring needle occurs in a string haystack. Ignores case.

Syntax

countSubstringsCaseInsensitive(haystack, needle[, start_pos])

Arguments

  • haystack — String in which the search is performed. String or Enum.
  • needle — Substring to be searched. String.
  • start_pos – Position (1-based) in haystack at which the search starts. UInt. Optional.

Returned value

  • The number of occurrences. UInt64.

Examples

Query:

SELECT countSubstringsCaseInsensitive('AAAA', 'aa');

Result:

┌─countSubstringsCaseInsensitive('AAAA', 'aa')─┐
│                                            2 │
└──────────────────────────────────────────────┘

Example with start_pos argument:

Query:

SELECT countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4);

Result:

┌─countSubstringsCaseInsensitive('abc___ABC___abc', 'abc', 4)─┐
│                                                           2 │
└─────────────────────────────────────────────────────────────┘

countSubstringsCaseInsensitiveUTF8

Returns how often a substring needle occurs in a string haystack. Ignores case and assumes that haystack is a UTF8 string.

Syntax

countSubstringsCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — UTF-8 string in which the search is performed. String or Enum.
  • needle — Substring to be searched. String.
  • start_pos – Position (1-based) in haystack at which the search starts. UInt. Optional.

Returned value

  • The number of occurrences. UInt64.

Examples

Query:

SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА');

Result:

┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА')─┐
│                                                                  4 │
└────────────────────────────────────────────────────────────────────┘

Example with start_pos argument:

Query:

SELECT countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13);

Result:

┌─countSubstringsCaseInsensitiveUTF8('ложка, кошка, картошка', 'КА', 13)─┐
│                                                                      2 │
└────────────────────────────────────────────────────────────────────────┘

countMatches

Returns the number of regular expression matches for a pattern in a haystack.

The behavior of this function depends on the ClickHouse version:

  • in versions < v25.6, countMatches would stop counting at the first empty match even if a pattern accepts.
  • in versions >= 25.6, countMatches would continue its execution when an empty match occurs. The legacy behavior can be restored using setting count_matches_stop_at_empty_match = true;

Syntax

countMatches(haystack, pattern)

Arguments

Returned value

  • The number of matches. UInt64.

Examples

SELECT countMatches('foobar.com', 'o+');

Result:

┌─countMatches('foobar.com', 'o+')─┐
│                                2 │
└──────────────────────────────────┘
SELECT countMatches('aaaa', 'aa');

Result:

┌─countMatches('aaaa', 'aa')────┐
│                             2 │
└───────────────────────────────┘

countMatchesCaseInsensitive

Returns the number of regular expression matches for a pattern in a haystack like countMatches but matching ignores the case.

Syntax

countMatchesCaseInsensitive(haystack, pattern)

Arguments

Returned value

  • The number of matches. UInt64.

Examples

Query:

SELECT countMatchesCaseInsensitive('AAAA', 'aa');

Result:

┌─countMatchesCaseInsensitive('AAAA', 'aa')────┐
│                                            2 │
└──────────────────────────────────────────────┘

regexpExtract

Extracts the first string in haystack that matches the regexp pattern and corresponds to the regex group index.

Syntax

regexpExtract(haystack, pattern[, index])

Alias: REGEXP_EXTRACT(haystack, pattern[, index]).

Arguments

  • haystack — String, in which regexp pattern will to be matched. String.
  • pattern — String, regexp expression, must be constant. String.
  • index – An integer number greater or equal 0 with default 1. It represents which regex group to extract. UInt or Int. Optional.

Returned value

pattern may contain multiple regexp groups, index indicates which regex group to extract. An index of 0 means matching the entire regular expression. String.

Examples

SELECT
    regexpExtract('100-200', '(\\d+)-(\\d+)', 1),
    regexpExtract('100-200', '(\\d+)-(\\d+)', 2),
    regexpExtract('100-200', '(\\d+)-(\\d+)', 0),
    regexpExtract('100-200', '(\\d+)-(\\d+)');

Result:

┌─regexpExtract('100-200', '(\\d+)-(\\d+)', 1)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 2)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)', 0)─┬─regexpExtract('100-200', '(\\d+)-(\\d+)')─┐
│ 100                                          │ 200                                          │ 100-200                                      │ 100                                       │
└──────────────────────────────────────────────┴──────────────────────────────────────────────┴──────────────────────────────────────────────┴───────────────────────────────────────────┘

hasSubsequence

Returns 1 if needle is a subsequence of haystack, or 0 otherwise. A subsequence of a string is a sequence that can be derived from the given string by deleting zero or more elements without changing the order of the remaining elements. Syntax

hasSubsequence(haystack, needle)

Arguments

  • haystack — String in which the search is performed. String.
  • needle — Subsequence to be searched. String.

Returned value

  • 1, if needle is a subsequence of haystack, 0 otherwise. UInt8.

Examples

Query:

SELECT hasSubsequence('garbage', 'arg');

Result:

┌─hasSubsequence('garbage', 'arg')─┐
│                                1 │
└──────────────────────────────────┘

hasSubsequenceCaseInsensitive

Like hasSubsequence but searches case-insensitively.

Syntax

hasSubsequenceCaseInsensitive(haystack, needle)

Arguments

  • haystack — String in which the search is performed. String.
  • needle — Subsequence to be searched. String.

Returned value

  • 1, if needle is a subsequence of haystack, 0 otherwise UInt8.

Examples

Query:

SELECT hasSubsequenceCaseInsensitive('garbage', 'ARG');

Result:

┌─hasSubsequenceCaseInsensitive('garbage', 'ARG')─┐
│                                               1 │
└─────────────────────────────────────────────────┘

hasSubsequenceUTF8

Like hasSubsequence but assumes haystack and needle are UTF-8 encoded strings.

Syntax

hasSubsequenceUTF8(haystack, needle)

Arguments

  • haystack — String in which the search is performed. UTF-8 encoded String.
  • needle — Subsequence to be searched. UTF-8 encoded String.

Returned value

  • 1, if needle is a subsequence of haystack, 0, otherwise. UInt8.

Query:

Examples

SELECT hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система');

Result:

┌─hasSubsequenceUTF8('ClickHouse - столбцовая система управления базами данных', 'система')─┐
│                                                                                         1 │
└───────────────────────────────────────────────────────────────────────────────────────────┘

hasSubsequenceCaseInsensitiveUTF8

Like hasSubsequenceUTF8 but searches case-insensitively.

Syntax

hasSubsequenceCaseInsensitiveUTF8(haystack, needle)

Arguments

  • haystack — String in which the search is performed. UTF-8 encoded String.
  • needle — Subsequence to be searched. UTF-8 encoded String.

Returned value

  • 1, if needle is a subsequence of haystack, 0 otherwise. UInt8.

Examples

Query:

SELECT hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА');

Result:

┌─hasSubsequenceCaseInsensitiveUTF8('ClickHouse - столбцовая система управления базами данных', 'СИСТЕМА')─┐
│                                                                                                        1 │
└──────────────────────────────────────────────────────────────────────────────────────────────────────────┘

hasToken

Returns 1 if a given token is present in a haystack, or 0 otherwise.

Syntax

hasToken(haystack, token)

Parameters

  • haystack: String in which the search is performed. String or Enum.
  • token: Maximal length substring between two non alphanumeric ASCII characters (or boundaries of haystack).

Returned value

  • 1, if the token is present in the haystack, 0 otherwise. UInt8.

Implementation details

Token must be a constant string. Supported by tokenbf_v1 index specialization.

Example

Query:

SELECT hasToken('Hello World','Hello');
1

hasTokenOrNull

Returns 1 if a given token is present, 0 if not present, and null if the token is ill-formed.

Syntax

hasTokenOrNull(haystack, token)

Parameters

  • haystack: String in which the search is performed. String or Enum.
  • token: Maximal length substring between two non alphanumeric ASCII characters (or boundaries of haystack).

Returned value

  • 1, if the token is present in the haystack, 0 if it is not present, and null if the token is ill formed.

Implementation details

Token must be a constant string. Supported by tokenbf_v1 index specialization.

Example

Where hasToken would throw an error for an ill-formed token, hasTokenOrNull returns null for an ill-formed token.

Query:

SELECT hasTokenOrNull('Hello World','Hello,World');
null

hasTokenCaseInsensitive

Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case.

Syntax

hasTokenCaseInsensitive(haystack, token)

Parameters

  • haystack: String in which the search is performed. String or Enum.
  • token: Maximal length substring between two non alphanumeric ASCII characters (or boundaries of haystack).

Returned value

  • 1, if the token is present in the haystack, 0 otherwise. UInt8.

Implementation details

Token must be a constant string. Supported by tokenbf_v1 index specialization.

Example

Query:

SELECT hasTokenCaseInsensitive('Hello World','hello');
1

hasTokenCaseInsensitiveOrNull

Returns 1 if a given token is present in a haystack, 0 otherwise. Ignores case and returns null if the token is ill-formed.

Syntax

hasTokenCaseInsensitiveOrNull(haystack, token)

Parameters

  • haystack: String in which the search is performed. String or Enum.
  • token: Maximal length substring between two non alphanumeric ASCII characters (or boundaries of haystack).

Returned value

  • 1, if the token is present in the haystack, 0 if the token is not present, otherwise null if the token is ill-formed. UInt8.

Implementation details

Token must be a constant string. Supported by tokenbf_v1 index specialization.

Example Where hasTokenCaseInsensitive would throw an error for an ill-formed token, hasTokenCaseInsensitiveOrNull returns null for an ill-formed token.

Query:

SELECT hasTokenCaseInsensitiveOrNull('Hello World','hello,world');
null

countMatches

Introduced in: v21.1

Returns number of matches of a regular expression in a string.

Version dependent behavior

The behavior of this function depends on the ClickHouse version:

  • in versions < v25.6, the function stops counting at the first empty match even if a pattern accepts.
  • in versions >= 25.6, the function continues execution when an empty match occurs. The legacy behavior can be restored using setting count_matches_stop_at_empty_match = true;

Syntax

countMatches(haystack, pattern)

Arguments

  • haystack — The string to search in. String
  • pattern — Regular expression pattern. String

Returned value

Returns the number of matches found. UInt64

Examples

Count digit sequences

SELECT countMatches('hello 123 world 456 test', '[0-9]+')
┌─countMatches('hello 123 world 456 test', '[0-9]+')─┐
│                                                   2 │
└─────────────────────────────────────────────────────┘

countMatchesCaseInsensitive

Introduced in: v21.1

Like countMatches but performs case-insensitive matching.

Syntax

countMatchesCaseInsensitive(haystack, pattern)

Arguments

  • haystack — The string to search in. String
  • pattern — Regular expression pattern. const String

Returned value

Returns the number of matches found. UInt64

Examples

Case insensitive count

SELECT countMatchesCaseInsensitive('Hello HELLO world', 'hello')
┌─countMatchesCaseInsensitive('Hello HELLO world', 'hello')─┐
│                                                         2 │
└───────────────────────────────────────────────────────────┘

extract

Introduced in: v1.1

Extracts the first match of a regular expression in a string. If 'haystack' doesn't match 'pattern', an empty string is returned.

This function uses the RE2 regular expression library. Please refer to re2 for supported syntax.

If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.

Syntax

extract(haystack, pattern)

Arguments

  • haystack — String from which to extract. String
  • pattern — Regular expression, typically containing a capturing group. const String

Returned value

Returns extracted fragment as a string. String

Examples

Extract domain from email

SELECT extract('[email protected]', '.*@(.*)$')
┌─extract('[email protected]', '.*@(.*)$')─┐
│ clickhouse.com                            │
└───────────────────────────────────────────┘

No match returns empty string

SELECT extract('[email protected]', 'no_match')
┌─extract('[email protected]', 'no_match')─┐
│                                            │
└────────────────────────────────────────────┘

extractAll

Introduced in: v1.1

Like extract, but returns an array of all matches of a regular expression in a string. If 'haystack' doesn't match the 'pattern' regex, an empty array is returned.

If the regular expression has capturing groups (sub-patterns), the function matches the input string against the first capturing group.

Syntax

extractAll(haystack, pattern)

Arguments

  • haystack — String from which to extract fragments. String
  • pattern — Regular expression, optionally containing capturing groups. const String

Returned value

Returns array of extracted fragments. Array(String)

Examples

Extract all numbers

SELECT extractAll('hello 123 world 456', '[0-9]+')
┌─extractAll('hello 123 world 456', '[0-9]+')─┐
│ ['123','456']                               │
└─────────────────────────────────────────────┘

Extract using capturing group

SELECT extractAll('[email protected], [email protected]', '([a-zA-Z0-9]+)@')
┌─extractAll('[email protected], [email protected]', '([a-zA-Z0-9]+)@')─┐
│ ['test','user']                                                    │
└────────────────────────────────────────────────────────────────────┘

hasAllTokens

Introduced in: v25.7

Like hasAnyTokens, but returns 1, if all tokens in the needle string or array match the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.

Note

Column input should have a text index defined for optimal performance. If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.

Prior to searching, the function tokenizes

  • the input argument (always), and
  • the needle argument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, the splitByNonAlpha tokenizer is used instead. If the needle argument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored. For example, needles = ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].

Syntax

hasAllTokens(input, needles)

Aliases: hasAllToken

Arguments

Returned value

Returns 1, if all needles match. 0, otherwise. UInt8

Examples

Usage example

CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAllTokens(msg, 'a\\d()');
┌─count()─┐
│       1 │
└─────────┘

Specify needles to be searched for AS-IS (no tokenization) in an array

SELECT count() FROM table WHERE hasAllTokens(msg, ['a', 'd']);
┌─count()─┐
│       1 │
└─────────┘

Generate needles using the tokens function

SELECT count() FROM table WHERE hasAllTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
┌─count()─┐
│       1 │
└─────────┘

hasAnyTokens

Introduced in: v25.7

Returns 1, if at least one token in the needle string or array matches the input string, and 0 otherwise. If input is a column, returns all rows that satisfy this condition.

Note

Column input should have a text index defined for optimal performance. If no text index is defined, the function performs a brute-force column scan which is orders of magnitude slower than an index lookup.

Prior to searching, the function tokenizes

  • the input argument (always), and
  • the needle argument (if given as a String) using the tokenizer specified for the text index. If the column has no text index defined, the splitByNonAlpha tokenizer is used instead. If the needle argument is of type Array(String), each array element is treated as a token — no additional tokenization takes place.

Duplicate tokens are ignored. For example, ['ClickHouse', 'ClickHouse'] is treated the same as ['ClickHouse'].

Syntax

hasAnyTokens(input, needles)

Aliases: hasAnyToken

Arguments

Returned value

Returns 1, if there was at least one match. 0, otherwise. UInt8

Examples

Usage example

CREATE TABLE table (
    id UInt32,
    msg String,
    INDEX idx(msg) TYPE text(tokenizer = splitByString(['()', '\\']))
)
ENGINE = MergeTree
ORDER BY id;

INSERT INTO table VALUES (1, '()a,\\bc()d'), (2, '()\\a()bc\\d'), (3, ',()a\\,bc,(),d,');

SELECT count() FROM table WHERE hasAnyTokens(msg, 'a\\d()');
┌─count()─┐
│       3 │
└─────────┘

Specify needles to be searched for AS-IS (no tokenization) in an array

SELECT count() FROM table WHERE hasAnyTokens(msg, ['a', 'd']);
┌─count()─┐
│       3 │
└─────────┘

Generate needles using the tokens function

SELECT count() FROM table WHERE hasAnyTokens(msg, tokens('a()d', 'splitByString', ['()', '\\']));
┌─count()─┐
│       3 │
└─────────┘

hasSubsequence

Introduced in: v23.7

Checks if a needle is a subsequence of a haystack. A subsequence of a string is a sequence that can be derived from another string by deleting some or no characters without changing the order of the remaining characters.

Syntax

hasSubsequence(haystack, needle)

Arguments

  • haystack — String in which to search for the subsequence. String
  • needle — Subsequence to be searched. String

Returned value

Returns 1 if needle is a subsequence of haystack, 0 otherwise. UInt8

Examples

Basic subsequence check

SELECT hasSubsequence('Hello World', 'HlWrd')
┌─hasSubsequence('Hello World', 'HlWrd')─┐
│                                      1 │
└────────────────────────────────────────┘

No subsequence found

SELECT hasSubsequence('Hello World', 'xyz')
┌─hasSubsequence('Hello World', 'xyz')─┐
│                                    0 │
└──────────────────────────────────────┘

hasToken

Introduced in: v20.1

Checks if the given token is present in the haystack.

A token is defined as the longest possible sub-sequence of consecutive characters [0-9A-Za-z_], i.e. numbers, ASCII letters and underscore.

Syntax

hasToken(haystack, token)

Arguments

Returned value

Returns 1 if the token is found, 0 otherwise. UInt8

Examples

Token search

SELECT hasToken('clickhouse test', 'test')
┌─hasToken('clickhouse test', 'test')─┐
│                                   1 │
└─────────────────────────────────────┘

hasTokenCaseInsensitive

Introduced in: v

Performs case insensitive lookup of needle in haystack using tokenbf_v1 index.

Syntax

Arguments

  • None.

Returned value

Examples

hasTokenCaseInsensitiveOrNull

Introduced in: v

Performs case insensitive lookup of needle in haystack using tokenbf_v1 index. Returns null if needle is ill-formed.

Syntax

Arguments

  • None.

Returned value

Examples

hasTokenOrNull

Introduced in: v20.1

Like hasToken but returns null if token is ill-formed.

Syntax

hasTokenOrNull(haystack, token)

Arguments

  • haystack — String to be searched. Must be constant. String
  • token — Token to search for. const String

Returned value

Returns 1 if the token is found, 0 otherwise, null if token is ill-formed. Nullable(UInt8)

Examples

Usage example

SELECT hasTokenOrNull('apple banana cherry', 'ban ana');
┌─hasTokenOrNu⋯ 'ban ana')─┐
│                     ᴺᵁᴸᴸ │
└──────────────────────────┘

locate

Introduced in: v18.16

Like position but with arguments haystack and locate switched.

Version dependent behavior

The behavior of this function depends on the ClickHouse version:

  • in versions < v24.3, locate was an alias of function position and accepted arguments (haystack, needle[, start_pos]).
  • in versions >= 24.3, locate is an individual function (for better compatibility with MySQL) and accepts arguments (needle, haystack[, start_pos]). The previous behavior can be restored using setting function_locate_has_mysql_compatible_argument_order = false.

Syntax

locate(needle, haystack[, start_pos])

Arguments

  • needle — Substring to be searched. String
  • haystack — String in which the search is performed. String or Enum
  • start_pos — Optional. Position (1-based) in haystack at which the search starts. UInt

Returned value

Returns starting position in bytes and counting from 1, if the substring was found, 0, if the substring was not found. UInt64

Examples

Basic usage

SELECT locate('ca', 'abcabc')
┌─locate('ca', 'abcabc')─┐
│                      3 │
└────────────────────────┘

match

Introduced in: v1.1

Checks if a provided string matches the provided regular expression pattern.

This function uses the RE2 regular expression library. Please refer to re2 for supported syntax.

Matching works under UTF-8 assumptions, e.g. ¥ uses two bytes internally but matching treats it as a single codepoint. The regular expression must not contain NULL bytes. If the haystack or the pattern are not valid UTF-8, the behavior is undefined.

Unlike re2's default behavior, . matches line breaks. To disable this, prepend the pattern with (?-s).

The pattern is automatically anchored at both ends (as if the pattern started with '^' and ended with '$').

If you only like to find substrings, you can use functions like or position instead - they work much faster than this function.

Alternative operator syntax: haystack REGEXP pattern.

Syntax

match(haystack, pattern)

Aliases: REGEXP_MATCHES

Arguments

  • haystack — String in which the pattern is searched. String
  • pattern — Regular expression pattern. const String

Returned value

Returns 1 if the pattern matches, 0 otherwise. UInt8

Examples

Basic pattern matching

SELECT match('Hello World', 'Hello.*')
┌─match('Hello World', 'Hello.*')─┐
│                               1 │
└─────────────────────────────────┘

Pattern not matching

SELECT match('Hello World', 'goodbye.*')
┌─match('Hello World', 'goodbye.*')─┐
│                                 0 │
└───────────────────────────────────┘

multiMatchAny

Introduced in: v20.1

Check if at least one of multiple regular expression patterns matches a haystack.

If you only want to search multiple substrings in a string, you can use function multiSearchAny instead - it works much faster than this function.

Syntax

multiMatchAny(haystack, pattern1[, pattern2, ...])

Arguments

  • haystack — String in which patterns are searched. String
  • pattern1[, pattern2, ...] — An array of one or more regular expression patterns. Array(String)

Returned value

Returns 1 if any pattern matches, 0 otherwise. UInt8

Examples

Multiple pattern matching

SELECT multiMatchAny('Hello World', ['Hello.*', 'foo.*'])
┌─multiMatchAny('Hello World', ['Hello.*', 'foo.*'])─┐
│                                                  1 │
└────────────────────────────────────────────────────┘

No patterns match

SELECT multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])
┌─multiMatchAny('Hello World', ['goodbye.*', 'foo.*'])─┐
│                                                    0 │
└──────────────────────────────────────────────────────┘

multiSearchAllPositions

Introduced in: v20.1

Like position but returns an array of positions (in bytes, starting at 1) for multiple needle substrings in a haystack string.

All multiSearch*() functions only support up to 2^8 needles.

Syntax

multiSearchAllPositions(haystack, needle1[, needle2, ...])

Arguments

  • haystack — String in which the search is performed. String
  • needle1[, needle2, ...] — An array of one or more substrings to be searched. Array(String)

Returned value

Returns array of the starting position in bytes and counting from 1, if the substring was found, 0, if the substring was not found. Array(UInt64)

Examples

Multiple needle search

SELECT multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])
┌─multiSearchAllPositions('Hello, World!', ['hello', '!', 'world'])─┐
│ [0,13,0]                                                          │
└───────────────────────────────────────────────────────────────────┘

multiSearchAllPositionsCaseInsensitive

Introduced in: v20.1

Like multiSearchAllPositions but ignores case.

Syntax

multiSearchAllPositionsCaseInsensitive(haystack, needle1[, needle2, ...])

Arguments

  • haystack — String in which the search is performed. String
  • needle1[, needle2, ...] — An array of one or more substrings to be searched. Array(String)

Returned value

Returns array of the starting position in bytes and counting from 1 (if the substring was found), 0 if the substring was not found. Array(UInt64)

Examples

Case insensitive multi-search

SELECT multiSearchAllPositionsCaseInsensitive('ClickHouse',['c','h'])
┌─multiSearchA⋯['c', 'h'])─┐
│ [1,6]                    │
└──────────────────────────┘

multiSearchAllPositionsUTF8

Introduced in: v20.1

Like multiSearchAllPositions but assumes haystack and the needle substrings are UTF-8 encoded strings.

Syntax

multiSearchAllPositionsUTF8(haystack, needle1[, needle2, ...])

Arguments

  • haystack — UTF-8 encoded string in which the search is performed. String
  • needle1[, needle2, ...] — An array of UTF-8 encoded substrings to be searched. Array(String)

Returned value

Returns array of the starting position in bytes and counting from 1 (if the substring was found), 0 if the substring was not found. Array

Examples

UTF-8 multi-search

SELECT multiSearchAllPositionsUTF8('ClickHouse',['C','H'])
┌─multiSearchAllPositionsUTF8('ClickHouse', ['C', 'H'])─┐
│ [1,6]                                                 │
└───────────────────────────────────────────────────────┘

multiSearchAny

Introduced in: v20.1

Checks if at least one of a number of needle strings matches the haystack string.

Functions multiSearchAnyCaseInsensitive, multiSearchAnyUTF8 and multiSearchAnyCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

multiSearchAny(haystack, needle1[, needle2, ...])

Arguments

  • haystack — String in which the search is performed. String
  • needle1[, needle2, ...] — An array of substrings to be searched. Array(String)

Returned value

Returns 1, if there was at least one match, otherwise 0, if there was not at least one match. UInt8

Examples

Any match search

SELECT multiSearchAny('ClickHouse',['C','H'])
┌─multiSearchAny('ClickHouse', ['C', 'H'])─┐
│                                        1 │
└──────────────────────────────────────────┘

multiSearchAnyCaseInsensitive

Introduced in: v20.1

Like multiSearchAny but ignores case.

Syntax

multiSearchAnyCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — String in which the search is performed. String
  • needle — Substrings to be searched. Array(String)

Returned value

Returns 1, if there was at least one case-insensitive match, otherwise 0, if there was not at least one case-insensitive match. UInt8

Examples

Case insensitive search

SELECT multiSearchAnyCaseInsensitive('ClickHouse',['c','h'])
┌─multiSearchAnyCaseInsensitive('ClickHouse', ['c', 'h'])─┐
│                                                       1 │
└─────────────────────────────────────────────────────────┘

multiSearchAnyCaseInsensitiveUTF8

Introduced in: v20.1

Like multiSearchAnyUTF8 but ignores case.

Syntax

multiSearchAnyCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — UTF-8 string in which the search is performed. String
  • needle — UTF-8 substrings to be searched. Array(String)

Returned value

Returns 1, if there was at least one case-insensitive match, otherwise 0, if there was not at least one case-insensitive match. UInt8

Examples

Given a UTF-8 string 'Здравствуйте', check if character 'з' (lowercase) is present

SELECT multiSearchAnyCaseInsensitiveUTF8('Здравствуйте',['з'])
┌─multiSearchA⋯те', ['з'])─┐
│                        1 │
└──────────────────────────┘

multiSearchAnyUTF8

Introduced in: v20.1

Like multiSearchAny but assumes haystack and the needle substrings are UTF-8 encoded strings.

Syntax

multiSearchAnyUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — UTF-8 string in which the search is performed. String
  • needle — UTF-8 substrings to be searched. Array(String)

Returned value

Returns 1, if there was at least one match, otherwise 0, if there was not at least one match. UInt8

Examples

Given '你好,世界' ('Hello, world') as a UTF-8 string, check if there are any 你 or 界 characters in the string

SELECT multiSearchAnyUTF8('你好,世界', ['你', '界'])
┌─multiSearchA⋯你', '界'])─┐
│                        1 │
└──────────────────────────┘

multiSearchFirstPosition

Introduced in: v20.1

Like position but returns the leftmost offset in a haystack string which matches any of multiple needle strings.

Functions multiSearchFirstPositionCaseInsensitive, multiSearchFirstPositionUTF8 and multiSearchFirstPositionCaseInsensitiveUTF8 provide case-insensitive and/or UTF-8 variants of this function.

Syntax

multiSearchFirstPosition(haystack, needle1[, needle2, ...])

Arguments

  • haystack — String in which the search is performed. String
  • needle1[, needle2, ...] — An array of one or more substrings to be searched. Array(String)

Returned value

Returns the leftmost offset in a haystack string which matches any of multiple needle strings, otherwise 0, if there was no match. UInt64

Examples

First position search

SELECT multiSearchFirstPosition('Hello World',['llo', 'Wor', 'ld'])
┌─multiSearchFirstPosition('Hello World', ['llo', 'Wor', 'ld'])─┐
│                                                             3 │
└───────────────────────────────────────────────────────────────┘

multiSearchFirstPositionCaseInsensitive

Introduced in: v20.1

Like multiSearchFirstPosition but ignores case.

Syntax

multiSearchFirstPositionCaseInsensitive(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — String in which the search is performed. String
  • needle — Array of substrings to be searched. Array(String)

Returned value

Returns the leftmost offset in a haystack string which matches any of multiple needle strings. Returns 0, if there was no match. UInt64

Examples

Case insensitive first position

SELECT multiSearchFirstPositionCaseInsensitive('HELLO WORLD',['wor', 'ld', 'ello'])
┌─multiSearchFirstPositionCaseInsensitive('HELLO WORLD', ['wor', 'ld', 'ello'])─┐
│                                                                             2 │
└───────────────────────────────────────────────────────────────────────────────┘

multiSearchFirstPositionCaseInsensitiveUTF8

Introduced in: v20.1

Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings and ignores case.

Syntax

multiSearchFirstPositionCaseInsensitiveUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — UTF-8 string in which the search is performed. String
  • needle — Array of UTF-8 substrings to be searched. Array(String)

Returned value

Returns the leftmost offset in a haystack string which matches any of multiple needle strings, ignoring case. Returns 0, if there was no match. UInt64

Examples

Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles

SELECT multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['МИР', 'вст', 'Здра'])
┌─multiSearchFirstPositionCaseInsensitiveUTF8('Здравствуй, мир', ['мир', 'вст', 'Здра'])─┐
│                                                                                      3 │
└────────────────────────────────────────────────────────────────────────────────────────┘

multiSearchFirstPositionUTF8

Introduced in: v20.1

Like multiSearchFirstPosition but assumes haystack and needle to be UTF-8 strings.

Syntax

multiSearchFirstPositionUTF8(haystack, [needle1, needle2, ..., needleN])

Arguments

  • haystack — UTF-8 string in which the search is performed. String
  • needle — Array of UTF-8 substrings to be searched. Array(String)

Returned value

Leftmost offset in a haystack string which matches any of multiple needle strings. Returns 0, if there was no match. UInt64

Examples

Find the leftmost offset in UTF-8 string 'Здравствуй, мир' ('Hello, world') which matches any of the given needles

SELECT multiSearchFirstPositionUTF8('Здравствуй, мир',['мир', 'вст', 'авст'])
┌─multiSearchFirstPositionUTF8('Здравствуй, мир', ['мир', 'вст', 'авст'])─┐
│                                                                       3 │
└─────────────────────────────────────────────────────────────────────────┘

ngramDistance

Introduced in: v20.1

Calculates the 4-gram distance between two strings. For this, it counts the symmetric difference between two multisets of 4-grams and normalizes it by the sum of their cardinalities. The smaller the returned value, the more similar the strings are.

For case-insensitive search or/and in UTF8 format use functions ngramDistanceCaseInsensitive, ngramDistanceUTF8, ngramDistanceCaseInsensitiveUTF8.

Syntax

ngramDistance(haystack, needle)

Arguments

  • haystack — String for comparison. String
  • needle — String for comparison. String

Returned value

Returns a Float32 number between 0 and 1. The smaller the returned value, the more similar the strings are. Float32

Examples

Calculate 4-gram distance

SELECT ngramDistance('ClickHouse', 'ClickHouses')
┌─ngramDistance('ClickHouse', 'ClickHouses')─┐
│                                        0.1 │
└────────────────────────────────────────────┘

ngramDistanceCaseInsensitive

Introduced in: v20.1

Provides a case-insensitive variant of ngramDistance. Calculates the 4-gram distance between two strings, ignoring case. The smaller the returned value, the more similar the strings are.

Syntax

ngramDistanceCaseInsensitive(haystack, needle)

Arguments

  • haystack — First comparison string. String
  • needle — Second comparison string. String

Returned value

Returns a Float32 number between 0 and 1. Float32

Examples

Case-insensitive 4-gram distance

SELECT ngramDistanceCaseInsensitive('ClickHouse','clickhouse')
┌─ngramDistanceCaseInsensitive('ClickHouse','clickhouse')─┐
│                                                       0 │
└─────────────────────────────────────────────────────────┘

ngramDistanceCaseInsensitiveUTF8

Introduced in: v20.1

Provides a case-insensitive UTF-8 variant of ngramDistance. Assumes that needle and haystack strings are UTF-8 encoded strings and ignores case. Calculates the 3-gram distance between two UTF-8 strings, ignoring case. The smaller the returned value, the more similar the strings are.

Syntax

ngramDistanceCaseInsensitiveUTF8(haystack, needle)

Arguments

  • haystack — First UTF-8 encoded comparison string. String
  • needle — Second UTF-8 encoded comparison string. String

Returned value

Returns a Float32 number between 0 and 1. Float32

Examples

Case-insensitive UTF-8 3-gram distance

SELECT ngramDistanceCaseInsensitiveUTF8('abcde','CDE')
┌─ngramDistanceCaseInsensitiveUTF8('abcde','CDE')─┐
│                                             0.5 │
└─────────────────────────────────────────────────┘

ngramDistanceUTF8

Introduced in: v20.1

Provides a UTF-8 variant of ngramDistance. Assumes that needle and haystack strings are UTF-8 encoded strings. Calculates the 3-gram distance between two UTF-8 strings. The smaller the returned value, the more similar the strings are.

Syntax

ngramDistanceUTF8(haystack, needle)

Arguments

  • haystack — First UTF-8 encoded comparison string. String
  • needle — Second UTF-8 encoded comparison string. String

Returned value

Returns a Float32 number between 0 and 1. Float32

Examples

UTF-8 3-gram distance

SELECT ngramDistanceUTF8('abcde','cde')
┌─ngramDistanceUTF8('abcde','cde')─┐
│                               0.5 │
└───────────────────────────────────┘

ngramSearch

Introduced in: v20.1

Checks if the 4-gram distance between two strings is less than or equal to a given threshold.

For case-insensitive search or/and in UTF8 format use functions ngramSearchCaseInsensitive, ngramSearchUTF8, ngramSearchCaseInsensitiveUTF8.

Syntax

ngramSearch(haystack, needle)

Arguments

  • haystack — String for comparison. String
  • needle — String for comparison. String

Returned value

Returns 1 if the 4-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8

Examples

Search using 4-grams

SELECT ngramSearch('ClickHouse', 'Click')
┌─ngramSearch('ClickHouse', 'Click')─┐
│                                  1 │
└────────────────────────────────────┘

ngramSearchCaseInsensitive

Introduced in: v20.1

Provides a case-insensitive variant of ngramSearch. Calculates the non-symmetric difference between a needle string and a haystack string, i.e. the number of n-grams from the needle minus the common number of n-grams normalized by the number of needle n-grams. Checks if the 4-gram distance between two strings is less than or equal to a given threshold, ignoring case.

Syntax

ngramSearchCaseInsensitive(haystack, needle)

Arguments

  • haystack — String for comparison. String
  • needle — String for comparison. String

Returned value

Returns 1 if the 4-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8

Examples

Case-insensitive search using 4-grams

SELECT ngramSearchCaseInsensitive('Hello World','hello')
┌─ngramSearchCaseInsensitive('Hello World','hello')─┐
│                                                  1 │
└────────────────────────────────────────────────────┘

ngramSearchCaseInsensitiveUTF8

Introduced in: v20.1

Provides a case-insensitive UTF-8 variant of ngramSearch. Assumes haystack and needle to be UTF-8 strings and ignores case. Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold, ignoring case.

Syntax

ngramSearchCaseInsensitiveUTF8(haystack, needle)

Arguments

  • haystack — UTF-8 string for comparison. String
  • needle — UTF-8 string for comparison. String

Returned value

Returns 1 if the 3-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8

Examples

Case-insensitive UTF-8 search using 3-grams

SELECT ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')
┌─ngramSearchCaseInsensitiveUTF8('абвГДЕёжз', 'АбвгдЕЁжз')─┐
│                                                        1 │
└──────────────────────────────────────────────────────────┘

ngramSearchUTF8

Introduced in: v20.1

Provides a UTF-8 variant of ngramSearch. Assumes haystack and needle to be UTF-8 strings. Checks if the 3-gram distance between two UTF-8 strings is less than or equal to a given threshold.

Syntax

ngramSearchUTF8(haystack, needle)

Arguments

  • haystack — UTF-8 string for comparison. String
  • needle — UTF-8 string for comparison. String

Returned value

Returns 1 if the 3-gram distance between the strings is less than or equal to a threshold (1.0 by default), 0 otherwise. UInt8

Examples

UTF-8 search using 3-grams

SELECT ngramSearchUTF8('абвгдеёжз', 'гдеёзд')
┌─ngramSearchUTF8('абвгдеёжз', 'гдеёзд')─┐
│                                      1 │
└────────────────────────────────────────┘

position

Introduced in: v1.1

Returns the position (in bytes, starting at 1) of a substring needle in a string haystack.

If substring needle is empty, these rules apply:

  • if no start_pos was specified: return 1
  • if start_pos = 0: return 1
  • if start_pos >= 1 and start_pos <= length(haystack) + 1: return start_pos
  • otherwise: return 0

The same rules also apply to functions locate, positionCaseInsensitive, positionUTF8 and positionCaseInsensitiveUTF8.

Syntax

position(haystack, needle[, start_pos])

Arguments

  • haystack — String in which the search is performed. String or Enum
  • needle — Substring to be searched. String
  • start_pos — Position (1-based) in haystack at which the search starts. Optional. UInt

Returned value

Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64

Examples

Basic usage

SELECT position('Hello, world!', '!')
┌─position('Hello, world!', '!')─┐
│                             13 │
└────────────────────────────────┘

With start_pos argument

SELECT position('Hello, world!', 'o', 1), position('Hello, world!', 'o', 7)
┌─position('Hello, world!', 'o', 1)─┬─position('Hello, world!', 'o', 7)─┐
│                                 5 │                                 9 │
└───────────────────────────────────┴───────────────────────────────────┘

Needle IN haystack syntax

SELECT 6 = position('/' IN s) FROM (SELECT 'Hello/World' AS s)
┌─equals(6, position(s, '/'))─┐
│                           1 │
└─────────────────────────────┘

Empty needle substring

SELECT position('abc', ''), position('abc', '', 0), position('abc', '', 1), position('abc', '', 2), position('abc', '', 3), position('abc', '', 4), position('abc', '', 5)
┌─position('abc', '')─┬─position('abc', '', 0)─┬─position('abc', '', 1)─┬─position('abc', '', 2)─┬─position('abc', '', 3)─┬─position('abc', '', 4)─┬─position('abc', '', 5)─┐
│                   1 │                      1 │                      1 │                      2 │                      3 │                      4 │                      0 │
└─────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┴────────────────────────┘

positionCaseInsensitive

Introduced in: v1.1

Like position but case-insensitive.

Syntax

positionCaseInsensitive(haystack, needle[, start_pos])

Aliases: instr

Arguments

  • haystack — String in which the search is performed. String or Enum
  • needle — Substring to be searched. String
  • start_pos — Optional. Position (1-based) in haystack at which the search starts. UInt*

Returned value

Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64

Examples

Case insensitive search

SELECT positionCaseInsensitive('Hello, world!', 'hello')
┌─positionCaseInsensitive('Hello, world!', 'hello')─┐
│                                                 1 │
└───────────────────────────────────────────────────┘

positionCaseInsensitiveUTF8

Introduced in: v1.1

Like positionUTF8 but searches case-insensitively.

Syntax

positionCaseInsensitiveUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — String in which the search is performed. String or Enum
  • needle — Substring to be searched. String
  • start_pos — Optional. Position (1-based) in haystack at which the search starts. UInt*

Returned value

Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64

Examples

Case insensitive UTF-8 search

SELECT positionCaseInsensitiveUTF8('Привет мир', 'МИР')
┌─positionCaseInsensitiveUTF8('Привет мир', 'МИР')─┐
│                                                8 │
└──────────────────────────────────────────────────┘

positionUTF8

Introduced in: v1.1

Like position but assumes haystack and needle are UTF-8 encoded strings.

Syntax

positionUTF8(haystack, needle[, start_pos])

Arguments

  • haystack — String in which the search is performed. String or Enum
  • needle — Substring to be searched. String
  • start_pos — Optional. Position (1-based) in haystack at which the search starts. UInt*

Returned value

Returns starting position in bytes and counting from 1, if the substring was found, otherwise 0, if the substring was not found. UInt64

Examples

UTF-8 character counting

SELECT positionUTF8('Motörhead', 'r')
┌─position('Motörhead', 'r')─┐
│                          5 │
└────────────────────────────┘

regexpExtract

Introduced in: v

Extracts the first string in haystack that matches the regexp pattern and corresponds to the regex group index.

Syntax

Aliases: REGEXP_EXTRACT

Arguments

  • None.

Returned value

Examples