Caching
sheaf.cache
In-process LRU response cache for Sheaf deployments.
Each deployment can opt in to caching via ModelSpec.cache. When enabled,
the cache key is the SHA-256 digest of the canonical request JSON, with
request_id and any fields listed in CacheConfig.exclude_fields removed
before hashing. Cache hits bypass inference entirely: no batching, no
backend call.
Usage::

    from sheaf.spec import ModelSpec, CacheConfig

    spec = ModelSpec(
        name="chronos-small",
        ...
        cache=CacheConfig(enabled=True, max_size=512, ttl_s=300.0),
    )

Good candidates for caching:
- Embedding models (same image/text → same vector)
- Time-series forecasts with fixed history
- Tabular models with fixed feature rows
Poor candidates (non-deterministic or privacy-sensitive):
- Diffusion with random seeds, unless the seed is part of the request and therefore of the cache key (the same seed yields the same image)
- Any model where the caller explicitly needs fresh output each time
SHEAF_CACHE_DISABLED=1 disables all caches regardless of config (useful in smoke tests and integration runs where you want to exercise the backend).
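The override described above can be pictured as a simple gate in front of the per-deployment setting. The following is a minimal sketch, not Sheaf's code: the helper name cache_is_active is hypothetical, and it assumes that only the literal value 1 disables caching (how other values are treated is not stated here).

```python
import os
from typing import Mapping


def cache_is_active(config_enabled: bool, environ: Mapping[str, str] = os.environ) -> bool:
    """Hypothetical helper: the environment override wins over per-deployment config."""
    # Assumption: only the exact string "1" switches caching off.
    if environ.get("SHEAF_CACHE_DISABLED") == "1":
        return False
    # Otherwise fall back to the deployment's own CacheConfig.enabled value.
    return config_enabled


# The override beats an enabled config; without it the config decides.
print(cache_is_active(True, {"SHEAF_CACHE_DISABLED": "1"}))
print(cache_is_active(True, {}))
```

Passing the environment as a parameter keeps the sketch easy to exercise in tests without mutating the real process environment.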
CacheConfig
Bases: BaseModel
Cache configuration for a single deployment.
Attributes:
| Name | Type | Description |
|---|---|---|
| enabled | bool | Enable the cache (default …). |
| max_size | int | Maximum number of LRU entries (default 1024). |
| ttl_s | float \| None | Time-to-live in seconds. |
| exclude_fields | list[str] | Request field names to omit from the cache key. |
ResponseCache
ResponseCache(config: CacheConfig)
Thread-safe in-process LRU cache for predict responses.
Keys are SHA-256 hex digests of the canonical request JSON.
Values are the serialised response dicts returned by model_dump.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| config | CacheConfig | | required |
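To make the combination of LRU eviction, TTL expiry, and thread safety concrete, here is a minimal sketch built on collections.OrderedDict and a lock. It is an illustration under stated assumptions, not the ResponseCache implementation: the class name SketchLRUCache and the injectable clock parameter are hypothetical, and it assumes that ttl_s=None means entries never expire.

```python
import threading
import time
from collections import OrderedDict
from typing import Any, Callable, Dict, Optional


class SketchLRUCache:
    """Hypothetical stand-in for illustration; not Sheaf's ResponseCache."""

    def __init__(
        self,
        max_size: int = 1024,
        ttl_s: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        self._max_size = max_size
        self._ttl_s = ttl_s  # Assumption: None disables expiry.
        self._clock = clock
        self._lock = threading.Lock()
        # key -> (stored_at, value); least-recently-used entries sit at the front.
        self._entries: OrderedDict = OrderedDict()

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        with self._lock:
            entry = self._entries.get(key)
            if entry is None:
                return None  # miss
            stored_at, value = entry
            if self._ttl_s is not None and self._clock() - stored_at > self._ttl_s:
                del self._entries[key]  # expired: drop it and report a miss
                return None
            self._entries.move_to_end(key)  # mark as most recently used
            return value

    def set(self, key: str, value: Dict[str, Any]) -> None:
        with self._lock:
            self._entries[key] = (self._clock(), value)
            self._entries.move_to_end(key)
            while len(self._entries) > self._max_size:
                self._entries.popitem(last=False)  # evict least recently used
```

A single lock around both operations is the simplest way to keep the ordering and the size check consistent under concurrent access; a real implementation may choose something finer-grained.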
make_key
Return a SHA-256 cache key for (deployment, request).
request_id is always excluded (it is unique per call and must not
affect the key). Fields listed in config.exclude_fields are also
dropped before hashing.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| deployment | str | Deployment name (…) | required |
| request | Any | Any Pydantic … | required |
Returns:
| Type | Description |
|---|---|
| str | 64-character lowercase hex digest. |
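The key derivation described above can be sketched with the standard library. This is an approximation under stated assumptions, not the actual make_key: the function name sketch_make_key is hypothetical, the request is taken as a plain dict rather than a Pydantic model, and "canonical" is assumed to mean sorted keys with compact separators.

```python
import hashlib
import json
from typing import Any, Dict, Iterable


def sketch_make_key(
    deployment: str,
    request: Dict[str, Any],
    exclude_fields: Iterable[str] = (),
) -> str:
    """Hypothetical sketch of a (deployment, request) cache key."""
    # request_id is always dropped; caller-configured fields are dropped too.
    dropped = {"request_id", *exclude_fields}
    payload = {k: v for k, v in request.items() if k not in dropped}
    # Assumption: sorted keys and compact separators make the JSON canonical,
    # so field order in the incoming request does not change the digest.
    canonical = json.dumps(
        {"deployment": deployment, "request": payload},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Two calls that differ only in request_id and field order share a key.
a = sketch_make_key("chronos-small", {"request_id": "r1", "history": [1, 2, 3]})
b = sketch_make_key("chronos-small", {"history": [1, 2, 3], "request_id": "r2"})
print(a == b)
```

Including the deployment name in the hashed payload keeps identical requests to different deployments from colliding on the same key.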
get
Return the cached response dict, or None on miss / expiry.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key from :meth: … | required |
set
Store a response dict under key.
If the cache is at capacity the least-recently-used entry is evicted.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| key | str | Cache key from :meth: … | required |
| value | dict[str, Any] | Serialised response dict (…) | required |
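Taken together, get and set support a look-aside pattern in which a hit skips the backend call entirely. The sketch below shows that flow under stated assumptions: DictCache is a hypothetical stand-in with the same get/set shape (no LRU or TTL), and predict_with_cache and run_inference are illustrative names, not part of Sheaf.

```python
from typing import Any, Callable, Dict, Optional


class DictCache:
    """Hypothetical stand-in exposing only get/set; no eviction or expiry."""

    def __init__(self) -> None:
        self._store: Dict[str, Dict[str, Any]] = {}

    def get(self, key: str) -> Optional[Dict[str, Any]]:
        return self._store.get(key)

    def set(self, key: str, value: Dict[str, Any]) -> None:
        self._store[key] = value


def predict_with_cache(
    cache: DictCache,
    key: str,
    run_inference: Callable[[], Dict[str, Any]],
) -> Dict[str, Any]:
    """Return the cached response on a hit; otherwise compute and store it."""
    cached = cache.get(key)
    if cached is not None:
        return cached  # hit: the backend is never invoked
    response = run_inference()  # miss: pay for inference once
    cache.set(key, response)
    return response


# Count backend invocations to show that the second call is served from cache.
calls = []


def fake_backend() -> Dict[str, Any]:
    calls.append(1)
    return {"embedding": [0.1, 0.2]}


cache = DictCache()
first = predict_with_cache(cache, "k", fake_backend)
second = predict_with_cache(cache, "k", fake_backend)
print(len(calls))
```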