본문으로 바로가기
본문으로 바로가기

Scaling recommendations

Introduction

Auto-scaling database resources requires careful balance: scaling up too slowly can risk performance degradation while scaling down too aggressively can trigger constant oscillations.

ClickHouse Cloud enables faster scale-downs, minimized scaling oscillations, and substantial infrastructure cost reduction for variable workloads, while maintaining the stability needed for production databases by pairing a two-window recommendation framework with a target-tracking CPU recommendation system.

CPU-based Scaling

CPU Scaling is based on target tracking which calculates the exact CPU allocation needed to keep utilization at a target level. A scaling action is only triggered if current CPU utilization falls outside a defined band:

ParameterValueMeaning
Target utilization53%The utilization level ClickHouse aims to maintain
High watermark75%Triggers scale-up when CPU exceeds this threshold
Low watermark37.5%Triggers scale-down when CPU falls below this threshold

The recommender evaluates CPU utilization based on historical usage, and determines a recommended CPU size using this formula:

recommended_cpu = max_cpu_usage / target_utilization

If the CPU utilization is between 37.5%–75% of allocated capacity, no scaling action is taken. Outside that band, the recommender computes the exact size needed to land back at 53% utilization, and the service is scaled accordingly.

Example

A service allocated 4 vCPU experiences a spike to 3.8 vCPU usage (~95% utilization), crossing the 75% high watermark. The recommender calculates: 3.8 / 0.53 ≈ 7.2 vCPU, and rounds up to the next available size (8 vCPU). Once load subsides and usage drops below 37.5% (1.5 vCPU), the recommender scales back down proportionally.

Memory-based recommendations

ClickHouse Cloud automatically recommends memory sizes based on your service's actual usage patterns. The recommender analyzes usage over a lookback window and adds headroom to handle spikes and prevent out-of-memory (OOM) errors.

The recommender looks at three signals:

  • Query memory: The peak memory used during query execution
  • Resident memory: The peak memory held by the process overall
  • OOM events: Whether queries or replicas have recently run out of memory

How headroom is calculated

For query and resident memory, the amount of headroom added depends on how predictable your usage is:

  • Stable usage (low variation): 1.25x multiplier — more headroom, since usage is consistent and unlikely to spike unexpectedly
  • Spiky usage (high variation): 1.1x multiplier — less headroom, to avoid over-provisioning for workloads that already vary widely

If OOM events are detected, the recommender applies a more aggressive 1.5x multiplier to ensure the service has enough memory to recover.

Final recommendation

The system takes the highest value across all signals:

desired_memory = max(
  query_memory × skew_multiplier,
  resident_memory × skew_multiplier,
  resident_memory × 1.5,   // if query OOMs detected
  rss_at_crash × 1.5       // if pod OOMs detected
)

Two-window recommender

Instead of using a single window, ClickHouse Cloud uses two lookback windows with different time ranges:

  • Small Window (3 hours): Captures recent usage patterns, enables faster scale-down
  • Large Window (30 hours): Ensures we scale up in a single step to the maximum usage seen in the longer lookback window, rather than multiple gradual scale-ups. This is critical because scaling takes time and invalidates local caches; so it is safer to scale up in a single step.

Each window independently generates a recommendation using both memory and CPU analysis. The system then merges these recommendations based on the scaling direction each window suggests, as shown in the figure below:

Two-window recommender merging logic

For a deep dive into the design decisions of the recommender, see "Smarter Auto-Scaling for ClickHouse: The Two-Window Approach "