Add tiered caching blog #3376

Merged
merged 10 commits into from
Oct 24, 2024
92 changes: 92 additions & 0 deletions _posts/2024-10-11-tiered-cache.md
@@ -0,0 +1,92 @@
---
layout: post
title: "Tiered caching in OpenSearch"
authors:
- upasagar
- akjain
- kkhatua
- kolchfa
date: 2024-10-11
Review comment: Update blog date to 2024-10-24

categories:
- technical-posts
has_science_table: true
meta_keywords: OpenSearch, tiered caching, query performance, on-heap cache, disk-based cache, request cache, cache hit ratio, OpenSearch Benchmark, performance optimization, caching
meta_description: Learn how tiered caching in OpenSearch enhances query performance by combining on-heap and disk-based caching. Explore its benefits, limitations, and practical use cases in this detailed blog.
---
Review comment: Please update the meta with the following:

meta_keywords: tiered caching, on-heap cache, disk-based caching, how tiered caching works, OpenSearch cache optimization

meta_description: Explore the benefits of combining on-heap and disk-based caching in OpenSearch to manage large datasets. Learn how tiered caching works, when to use it, and the performance results of our testing.


For performance-intensive applications like OpenSearch, caching is an essential optimization. Caching stores data so that future requests can be served faster, improving query response times and application performance. OpenSearch uses two main cache types: request cache and query cache. Both are on-heap caches, meaning their size is determined by the amount of available heap memory on a node.

## On-heap cache: A good start, but is it enough?

On-heap caching in OpenSearch provides a quick, simple, and efficient way to cache data locally on the node. It offers low-latency data retrieval, thereby providing significant performance gains. However, these advantages come with trade-offs, especially as the cache grows, which may lead to performance challenges.

The size of an on-heap cache is directly tied to the amount of heap memory available on a node, which is both finite and costly. This limitation creates a challenge when trying to store large datasets or handle numerous queries. When the cache reaches its capacity, older queries must often be evicted to make room for new ones. This frequent eviction can lead to cache churn, negatively impacting performance, as evicted queries may need to be recomputed later.

## Is there a better way?

As discussed, on-heap caches have limitations when handling larger datasets. A more effective caching mechanism is *tiered caching*, which uses multiple cache layers, starting with on-heap caching and extending to a disk-based tier. This approach balances performance and capacity, allowing you to store larger datasets without consuming valuable heap memory.

In the past, using a disk for caching raised concerns because traditional spinning hard drives were slower. However, advancements in storage technology, like modern SSD and NVMe drives, now deliver much faster performance. Although disk access is still slower than memory, the speed gap has narrowed enough that the performance trade-off is minimal and often outweighed by the advantage of increased storage capacity.

Check failure on line 29 in _posts/2024-10-11-tiered-cache.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: NVMe. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. (line 29, column 174, severity ERROR)
Review comment (Collaborator): "advantage" => "benefit"?


## How tiered caching works

Tiered caching combines multiple cache layers, stacked by performance and size. For example, Tier 1 can be an on-heap cache, which is most performant but smaller in size. Tier 2 can be a disk-based cache, which is slower but offers significantly more storage. The following image shows a tiered caching model.

Check failure on line 33 in _posts/2024-10-11-tiered-cache.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: performant. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. (line 33, column 140, severity ERROR)

![Tiered cache](/assets/media/blog-images/2024-10-11-tiered-cache/tiered_Cache_2.png){:class="img-centered" style="width:300px;"}

OpenSearch currently uses an on-heap tier and a disk tier in its tiered caching model. When an item is evicted from the on-heap cache, it's moved to the disk cache. For each incoming query, OpenSearch first checks whether the data exists in either the on-heap or disk cache. If it’s found, the response is returned immediately. If not, the query is recomputed, and the result is stored in the on-heap cache. The following diagram depicts the caching algorithm.

![Tiered cache algorithm](/assets/media/blog-images/2024-10-11-tiered-cache/tc_df_2.png){:class="img-centered"}
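
The lookup and spillover flow described above can be sketched in a few lines of Python. This is a toy model rather than the OpenSearch implementation: the `TieredCache` class, its LRU eviction policy, and the in-memory dictionary standing in for the disk tier are all assumptions made for illustration.

```python
from collections import OrderedDict


class TieredCache:
    """Toy two-tier cache: a small on-heap tier that spills evictions to a larger disk tier."""

    def __init__(self, heap_capacity, disk_capacity):
        self.heap = OrderedDict()  # stands in for the on-heap tier (kept in LRU order)
        self.disk = OrderedDict()  # stands in for the disk tier (a real one writes to disk)
        self.heap_capacity = heap_capacity
        self.disk_capacity = disk_capacity

    def get(self, key, compute):
        """Return (value, outcome), checking the on-heap tier first and then the disk tier."""
        if key in self.heap:
            self.heap.move_to_end(key)  # mark as most recently used
            return self.heap[key], "heap_hit"
        if key in self.disk:
            return self.disk[key], "disk_hit"  # served from disk, not promoted to heap
        value = compute(key)  # cache miss: compute the result
        self._put_on_heap(key, value)
        return value, "miss"

    def _put_on_heap(self, key, value):
        """New results always enter the on-heap tier; evicted entries move to disk."""
        self.heap[key] = value
        if len(self.heap) > self.heap_capacity:
            old_key, old_value = self.heap.popitem(last=False)
            self.disk[old_key] = old_value
            if len(self.disk) > self.disk_capacity:
                self.disk.popitem(last=False)  # disk tier is full: drop its oldest entry


cache = TieredCache(heap_capacity=2, disk_capacity=4)
outcomes = [cache.get(q, compute=lambda q: f"result for {q}")[1] for q in ["a", "b", "c", "a"]]
print(outcomes)  # ['miss', 'miss', 'miss', 'disk_hit']
```

In this run, query `a` is evicted from the two-entry on-heap tier when `c` arrives, so the repeated `a` is answered from the disk tier instead of being recomputed.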

Currently, OpenSearch supports the tiered caching model only for the **request cache**. By default, the request cache uses the on-heap cache tier. The cache size is configurable and defaults to 1% of the heap memory on a node. You can enable tiered caching to add a disk-based cache tier, which stores larger datasets that don't fit in memory. This offloads the on-heap cache, improving overall performance.

Tiered caching is also designed to be pluggable. You can seamlessly integrate different types of on-heap and disk cache implementations or libraries using tiered cache settings. For more information, see [Tiered cache](https://opensearch.org/docs/latest/search-plugins/caching/tiered-cache/).

## When to use tiered caching

Because tiered caching currently only applies to the request cache, it’s useful when the existing on-heap request cache isn't large enough to store your datasets and you see frequent evictions. You can check request cache stats using the `GET /_nodes/stats/indices/request_cache` endpoint to monitor evictions, hits, and misses. If you notice frequent evictions along with some hits, enabling tiered caching could provide a significant performance boost.
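
As a rough illustration of how to read those stats, the following Python sketch computes a per-node hit ratio and eviction count from a response shaped like the output of `GET /_nodes/stats/indices/request_cache`. The sample values and the node ID `node-1` are made up, and `summarize_request_cache` is a hypothetical helper, not part of any OpenSearch client.

```python
import json

# Illustrative response body; the numbers and the node ID are invented for this example.
SAMPLE_RESPONSE = """
{
  "nodes": {
    "node-1": {
      "indices": {
        "request_cache": {
          "memory_size_in_bytes": 41943040,
          "evictions": 1200,
          "hit_count": 300,
          "miss_count": 1500
        }
      }
    }
  }
}
"""


def summarize_request_cache(stats):
    """Return the hit ratio and eviction count for each node in a node stats response."""
    summary = {}
    for node_id, node in stats["nodes"].items():
        request_cache = node["indices"]["request_cache"]
        lookups = request_cache["hit_count"] + request_cache["miss_count"]
        summary[node_id] = {
            "hit_ratio": request_cache["hit_count"] / lookups if lookups else 0.0,
            "evictions": request_cache["evictions"],
        }
    return summary


summary = summarize_request_cache(json.loads(SAMPLE_RESPONSE))
print(summary)
```

A node that reports a steadily growing eviction count alongside a nonzero hit ratio, as in this sample, matches the pattern described above.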

Tiered caching is especially beneficial in these situations:

- Your domain experiences many cache evictions and has repeatable queries. You can confirm this by using request cache stats.
- You're working with log analytics or read-only indexes, in which data doesn't change often, and you're seeing frequent evictions.

By default, the request cache stores only the results of requests with `size=0`, such as aggregation queries. You can enable caching for specific requests by using the `?request_cache=true` query parameter.
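
For example, the following Python sketch builds (but does not send) a search request that opts in to the request cache. The `build_cached_search` helper, the `http://localhost:9200` address, and the request body are assumptions for illustration; only the `request_cache=true` parameter comes from the text above.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request


def build_cached_search(base_url, index, body):
    """Build a search request with request caching explicitly enabled."""
    query_string = urlencode({"request_cache": "true"})
    return Request(
        url=f"{base_url}/{index}/_search?{query_string}",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Placeholder cluster address and query body; adjust both for your own cluster.
request = build_cached_search(
    "http://localhost:9200",
    "nyc_taxis",
    {"query": {"match_all": {}}},
)
print(request.get_method(), request.full_url)
```

Passing the resulting object to `urllib.request.urlopen` would send it to a running cluster.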

## How to enable tiered caching

To enable tiered caching, you'll need to configure node settings. This includes installing the disk cache plugin, enabling tiered caching, and adjusting other settings as needed. For detailed instructions, see the [tiered cache documentation](https://opensearch.org/docs/latest/search-plugins/caching/tiered-cache/).

## Performance results

This feature is currently experimental and isn’t recommended for production use. To assess its performance, we conducted several tests.

In our performance tests, we compared tiered caching with the default on-heap cache across various query types and cache hit ratios. We also measured different latency percentiles (p25, p50, p75, p90, and p99).

#### Cluster setup

* **Instance type**: c5.4xl
* **Node count**: 1
* **Total heap size**: 16 GB
* **Default cache settings**: On-heap cache size: 40 MB
* **Tiered cache settings**:
  * On-heap cache size: 40 MB
  * Disk cache size: 1 GB

We used the `nyc_taxis` workload in the OpenSearch Benchmark (OSB) but needed to add support for issuing statistically repeatable queries. The original benchmark always runs with caching disabled, which doesn’t allow for variation in query repetition. By adding this support, we can better simulate real-world use cases, test for a target cache hit ratio, and account for query repetition variability.
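
One way to picture statistically repeatable queries is the following Python sketch. It is not the code added to OSB: the `repeatable_queries` function, its seeding, and the uniform choice among earlier queries are assumptions that only illustrate the idea of repeating a query with a target probability.

```python
import random


def repeatable_queries(count, repeat_probability, seed=0):
    """Return a list of query IDs in which each query repeats an earlier one with the given probability."""
    rng = random.Random(seed)  # fixed seed so that a run can be reproduced
    issued = []  # distinct query IDs generated so far
    stream = []
    for _ in range(count):
        if issued and rng.random() < repeat_probability:
            query_id = rng.choice(issued)  # repeat a previously issued query
        else:
            query_id = len(issued)  # issue a query that has not been seen before
            issued.append(query_id)
        stream.append(query_id)
    return stream


stream = repeatable_queries(count=10_000, repeat_probability=0.7)
repeat_fraction = 1 - len(set(stream)) / len(stream)
print(f"Fraction of repeated queries: {repeat_fraction:.2f}")
```

With an unbounded cache, the fraction of repeated queries is the upper limit on the achievable cache hit ratio, which is why query repeatability maps to a target hit ratio.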

The workload consisted of a mix of queries, categorized by their shard-level latencies:

* **Expensive**: >150 ms
* **Medium**: 10–150 ms
* **Cheap**: <10 ms

The following diagram presents the performance test results. A red vertical line in the results denotes the baseline percentiles, where the default on-heap cache is enabled and tiered caching is disabled. We tested with 0%, 30%, and 70% query repeatability, corresponding to different cache hit ratios.

![Performance test results](/assets/media/blog-images/2024-10-11-tiered-cache/performance_results.png){:class="img-centered"}

Initial results show that tiered caching performs well, especially with higher cache hit ratios and latencies below p75. The gains are particularly notable for computationally expensive queries because tiered caching reduces the need to recompute them, fetching results directly from the cache instead.

## What’s next?

While tiered caching is a promising feature, we're actively working on further improvements. We're currently exploring ways to make tiered caching more performant. Future enhancements may include promoting frequently accessed items from the disk cache to the on-heap cache, persisting disk cache data between restarts, or integrating tiered caching with other OpenSearch cache types, such as the query cache. You can follow our progress in [this issue](link). We encourage you to try tiered caching in a non-production environment and share any feedback to help us make this feature more robust.

Check failure on line 92 in _posts/2024-10-11-tiered-cache.md

GitHub Actions / style-job

[vale] reported by reviewdog 🐶 [OpenSearch.Spelling] Error: performant. If you are referencing a setting, variable, format, function, or repository, surround it with tic marks. (line 92, column 153, severity ERROR)