Approximate vector search in ClickHouse

Mark Needham

Looking to speed up your vector similarity searches in ClickHouse? This video walks through approximate vector search using HNSW indexes, which can dramatically reduce query times compared to linear scans. We'll work with a dataset of nearly a million Wikipedia entries, showing you how to set up vector similarity indexes and when to use different distance functions.

We start with a basic linear scan approach using L2 distance, then introduce vector similarity indexes that bring query times down from seconds to milliseconds. You'll see practical examples using both L2 and cosine distance functions, and learn when each one makes sense for your data. We also cover filtering strategies—pre-filter versus post-filter—and how ClickHouse's query engine decides which approach to use based on heuristics.
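To make the difference between the two distance functions concrete, here is a minimal sketch of an exact linear scan in plain Python. It is not ClickHouse code: the function names, the toy rows, and the query vector are all made up for illustration. The functions compute the standard Euclidean (L2) distance and cosine distance (one minus cosine similarity).

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance (1 - cosine similarity): depends on direction only."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def linear_scan(rows, query, distance, k):
    """Exact top-k: score every row against the query, then sort by distance."""
    scored = sorted((distance(vec, query), row_id) for row_id, vec in rows)
    return [row_id for _, row_id in scored[:k]]

# Hypothetical two-dimensional "embeddings"; real ones have hundreds of dimensions.
rows = [
    ("a", [1.0, 0.0]),
    ("b", [10.0, 0.0]),   # same direction as "a", much larger magnitude
    ("c", [0.0, 1.0]),
    ("d", [0.9, 0.9]),
]
query = [2.0, 0.0]

nearest_l2 = linear_scan(rows, query, l2_distance, k=2)
nearest_cosine = linear_scan(rows, query, cosine_distance, k=2)
```

In this toy data, row "b" points in exactly the same direction as the query but is far away in absolute terms, so it ranks near the top under cosine distance and near the bottom under L2. That is the usual rule of thumb: cosine distance fits cases where vector length carries no meaning, L2 fits cases where it does. The scan touches every row, which is the cost an HNSW index avoids by returning approximate neighbours instead.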

Setting up vector similarity indexes with HNSW on embedding columns
Comparing L2 distance (Euclidean) vs cosine distance for similarity calculations
Performance differences between linear scan and approximate search methods
Adding WHERE conditions to vector searches and understanding filter strategies
Controlling pre-filter vs post-filter behavior with the vector_search_filter_strategy setting
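The pre-filter versus post-filter distinction listed above can be sketched conceptually in plain Python. This models only the order of operations and is not how ClickHouse implements either strategy: the `lang` column, the toy rows, and all function names are hypothetical, and the nearest-neighbour step is an exact sort standing in for an index lookup.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def top_k(rows, query, k):
    """Stand-in for the nearest-neighbour step (exact here, for simplicity)."""
    return sorted(rows, key=lambda r: l2_distance(r["vec"], query))[:k]

def post_filter(rows, query, k, predicate):
    """Search first, then drop rows that fail the WHERE-style predicate.
    Can return fewer than k rows if close neighbours are filtered out."""
    return [r["id"] for r in top_k(rows, query, k) if predicate(r)]

def pre_filter(rows, query, k, predicate):
    """Apply the predicate first, then search only the surviving rows."""
    survivors = [r for r in rows if predicate(r)]
    return [r["id"] for r in top_k(survivors, query, k)]

# Hypothetical rows: an id, a filterable column, and a tiny embedding.
rows = [
    {"id": 1, "lang": "en", "vec": [0.0, 0.0]},
    {"id": 2, "lang": "de", "vec": [0.1, 0.0]},
    {"id": 3, "lang": "de", "vec": [0.2, 0.0]},
    {"id": 4, "lang": "en", "vec": [5.0, 5.0]},
]
query = [0.0, 0.0]

def is_english(row):
    return row["lang"] == "en"

post_ids = post_filter(rows, query, 2, is_english)
pre_ids = pre_filter(rows, query, 2, is_english)
```

With `k=2`, the post-filter path finds rows 1 and 2 as nearest neighbours and then discards row 2, leaving a single result. The pre-filter path narrows to the English rows first and returns two results, at the cost of searching without the benefit of the index-style shortcut. Which order is cheaper depends on how selective the filter is, which is the trade-off the heuristics are weighing.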
