Video
Approx vector search in ClickHouse
Mark Needham
Looking to speed up your vector similarity searches in ClickHouse? This video walks through approximate vector search using HNSW indexes, which can dramatically reduce query times compared to linear scans. We'll work with a dataset of nearly a million Wikipedia entries, showing you how to set up vector similarity indexes and when to use different distance functions.
We start with a basic linear scan approach using L2 distance, then introduce vector similarity indexes that bring query times down from seconds to milliseconds. You'll see practical examples using both L2 and cosine distance functions, and learn when each one makes sense for your data. We also cover the two filtering strategies, pre-filter and post-filter, and how ClickHouse's query engine uses heuristics to decide between them.
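To make the difference between the two distance functions concrete, here is a minimal pure-Python sketch of a linear scan. It is an illustration only, not how ClickHouse implements the search; the function names and the toy rows are hypothetical. L2 distance is sensitive to vector magnitude, while cosine distance compares direction only, so the two can rank the same rows differently.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """Cosine distance: 1 - cosine similarity. Ignores vector magnitude."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def linear_scan(rows, query, distance, k):
    """Brute-force search: score every row, then keep the k closest."""
    scored = sorted(rows, key=lambda row: distance(row[1], query))
    return [row[0] for row in scored[:k]]

# Hypothetical toy data: (title, embedding) pairs.
rows = [
    ("a", [1.0, 0.0]),
    ("b", [10.0, 0.0]),   # same direction as the query, much larger magnitude
    ("c", [0.0, 1.0]),
    ("d", [1.0, 1.0]),
]
query = [2.0, 0.0]

top_l2 = linear_scan(rows, query, l2_distance, 2)
top_cosine = linear_scan(rows, query, cosine_distance, 2)
```

With this data, L2 distance ranks "b" last because it is far from the query in absolute terms, whereas cosine distance ranks it alongside "a" because both point in the same direction as the query. The scan touches every row, which is the cost an approximate index avoids.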
- Setting up vector similarity indexes with HNSW on embedding columns
- Comparing L2 distance (Euclidean) vs cosine distance for similarity calculations
- Performance differences between linear scan and approximate search methods
- Adding WHERE conditions to vector searches and understanding filter strategies
- Controlling pre-filter vs post-filter behavior with the vector_search_filter_strategy setting
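The difference between the two filter strategies can be sketched in a few lines of Python. This is a conceptual illustration under stated assumptions, not ClickHouse's implementation: an exact top-k search stands in for the index lookup, and the rows, the lang column, and the helper names are all hypothetical.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query, k):
    """Exact top-k by L2 distance; stands in for an index lookup here."""
    return sorted(rows, key=lambda r: l2_distance(r["vec"], query))[:k]

def post_filter(rows, query, k, predicate):
    """Search first, then drop candidates that fail the predicate."""
    return [r for r in nearest(rows, query, k) if predicate(r)]

def pre_filter(rows, query, k, predicate):
    """Filter first, then search only the surviving rows."""
    return nearest([r for r in rows if predicate(r)], query, k)

# Hypothetical rows with a filterable column and a 2-d embedding.
rows = [
    {"id": 1, "lang": "en", "vec": [0.0, 0.0]},
    {"id": 2, "lang": "de", "vec": [0.1, 0.0]},
    {"id": 3, "lang": "de", "vec": [0.2, 0.0]},
    {"id": 4, "lang": "en", "vec": [5.0, 5.0]},
]
query = [0.0, 0.0]

def is_english(row):
    return row["lang"] == "en"

post_ids = [r["id"] for r in post_filter(rows, query, 2, is_english)]
pre_ids = [r["id"] for r in pre_filter(rows, query, 2, is_english)]
```

Here post-filtering returns only one row, because one of the two nearest candidates fails the condition, while pre-filtering returns two rows at the cost of scoring every row that passes the filter. That trade-off is why the choice between the strategies depends on how selective the WHERE condition is.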
