Video

Approx vector search in ClickHouse

Mark Needham

Looking to speed up your vector similarity searches in ClickHouse? This video walks through approximate vector search using HNSW indexes, which can dramatically reduce query times compared to linear scans. We'll work with a dataset of nearly a million Wikipedia entries, showing you how to set up vector similarity indexes and when to use different distance functions.

We start with a basic linear scan approach using L2 distance, then introduce vector similarity indexes that bring query times down from seconds to milliseconds. You'll see practical examples using both L2 and cosine distance functions, and learn when each one makes sense for your data. We also cover filtering strategies (pre-filter versus post-filter) and how ClickHouse's query engine uses heuristics to decide which approach to apply.
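To make the difference between the two distance functions concrete, here is a minimal Python sketch of an exact linear scan. It is an illustration only, not ClickHouse code: the helper names (`l2_distance`, `cosine_distance`, `linear_scan`) and the two-dimensional toy vectors are assumptions chosen for readability. Cosine distance is taken as 1 minus cosine similarity.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance: sensitive to both direction and magnitude."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def cosine_distance(a, b):
    """1 - cosine similarity: depends only on the angle between the vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def linear_scan(rows, query, distance, k):
    """Exact search: score every row, then keep the ids of the k closest."""
    scored = sorted(rows, key=lambda row: distance(row[1], query))
    return [row_id for row_id, _ in scored[:k]]

# Hypothetical toy data: (id, embedding) pairs.
rows = [
    ("same_direction_far", [10.0, 0.0]),
    ("different_direction_near", [1.0, 1.0]),
]
query = [1.0, 0.0]

nearest_l2 = linear_scan(rows, query, l2_distance, k=1)
nearest_cos = linear_scan(rows, query, cosine_distance, k=1)
```

With this toy data the two functions disagree: L2 distance picks the vector that is close in space, while cosine distance picks the one pointing in the same direction regardless of its length. A linear scan like this touches every row, which is the cost an approximate index is meant to avoid.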

  • Setting up vector similarity indexes with HNSW on embedding columns
  • Comparing L2 distance (Euclidean) vs cosine distance for similarity calculations
  • Performance differences between linear scan and approximate search methods
  • Adding WHERE conditions to vector searches and understanding filter strategies
  • Controlling pre-filter vs post-filter behavior with the vector_search_filter_strategy setting
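The pre-filter versus post-filter distinction can be sketched in a few lines of Python. This is a conceptual model under stated assumptions, not how ClickHouse implements either strategy: a brute-force `nearest` function stands in for the index lookup, and the `lang` column, the row values, and all function names are hypothetical.

```python
import math

def l2_distance(a, b):
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def nearest(rows, query, k):
    """Stand-in for the vector search step: the k rows closest to the query."""
    return sorted(rows, key=lambda row: l2_distance(row["vec"], query))[:k]

def post_filter(rows, query, k, predicate):
    """Search first, then apply the WHERE-style predicate to the k candidates."""
    candidates = nearest(rows, query, k)
    return [row["id"] for row in candidates if predicate(row)]

def pre_filter(rows, query, k, predicate):
    """Apply the predicate first, then search only among the surviving rows."""
    survivors = [row for row in rows if predicate(row)]
    return [row["id"] for row in nearest(survivors, query, k)]

def is_english(row):
    return row["lang"] == "en"

# Hypothetical rows with a filterable column and a 2-d embedding.
rows = [
    {"id": 1, "lang": "en", "vec": [0.0, 0.1]},
    {"id": 2, "lang": "de", "vec": [0.0, 0.2]},
    {"id": 3, "lang": "de", "vec": [0.0, 0.3]},
    {"id": 4, "lang": "en", "vec": [0.0, 0.9]},
]
query = [0.0, 0.0]

post_ids = post_filter(rows, query, 2, is_english)
pre_ids = pre_filter(rows, query, 2, is_english)
```

Here the post-filter path returns only one row, because one of the two nearest candidates fails the predicate, while the pre-filter path returns the full two rows at the cost of evaluating the predicate over the whole table first. That trade-off between result completeness and work done is why the choice of strategy matters.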