Concepts

The runtime ideas that hold sheaf-serve together. Each page is the short version of one design decision.

  • Batching — @serve.batch, BatchPolicy, and bucket_by for length-variable inputs.
  • Caching — opt-in per-deployment LRU response cache with optional TTL.
  • Streaming — POST /{name}/stream SSE endpoint for incremental output (FLUX progress, chunked transcripts).
  • Feast integration — send a feature_ref instead of raw history; sheaf-serve resolves features online.
  • Offline batch jobs — BatchRunner: JSONL in, JSONL out, via Ray Data, with stateless-task or warm-actor execution.
  • Async-job worker — SheafWorker: Redis Streams consumer with at-least-once delivery and webhook callbacks.
  • LoRA multiplexing — adapter-aware sub-batching for diffusion backends; per-request adapter selection on a single deployment.
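
To make one of these ideas concrete, here is a minimal sketch of the general technique behind the Caching entry: an LRU cache with an optional TTL. It is not sheaf-serve's implementation or API. The class name `LRUResponseCache` and the parameters `max_size`, `ttl_s`, and `clock` are hypothetical names chosen for illustration.

```python
import time
from collections import OrderedDict
from typing import Any, Callable, Hashable, Optional


class LRUResponseCache:
    """Illustrative LRU cache with optional TTL (hypothetical, not sheaf-serve's API)."""

    def __init__(
        self,
        max_size: int,
        ttl_s: Optional[float] = None,
        clock: Callable[[], float] = time.monotonic,
    ) -> None:
        if max_size < 1:
            raise ValueError("max_size must be >= 1")
        self._max_size = max_size
        self._ttl_s = ttl_s      # None disables expiry
        self._clock = clock      # injectable so expiry can be tested
        # Maps key -> (stored_at, value); order runs from least to most recently used.
        self._entries: "OrderedDict[Hashable, tuple]" = OrderedDict()

    def get(self, key: Hashable) -> Any:
        """Return the cached value, or None on a miss or an expired entry."""
        item = self._entries.get(key)
        if item is None:
            return None
        stored_at, value = item
        if self._ttl_s is not None and self._clock() - stored_at >= self._ttl_s:
            del self._entries[key]  # expired: drop it and report a miss
            return None
        self._entries.move_to_end(key)  # mark as most recently used
        return value

    def put(self, key: Hashable, value: Any) -> None:
        """Store a value, evicting least recently used entries beyond max_size."""
        self._entries[key] = (self._clock(), value)
        self._entries.move_to_end(key)
        while len(self._entries) > self._max_size:
            self._entries.popitem(last=False)
```

In this sketch a lookup refreshes recency but not the TTL, so an entry expires a fixed time after it was written no matter how often it is read. Whether sheaf-serve makes the same choice is not stated here.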