Scheduling
sheaf.scheduling.batch
Batching policies for model-type-aware request scheduling.
BatchPolicy
Bases: BaseModel
Controls how requests are batched before hitting the model backend.
max_batch_size: hard cap on requests per batch.
timeout_ms: maximum time to wait for a full batch before flushing.
bucket_by: field name to group on before calling the backend. Requests with the same value of this field are sent to the backend together; requests with different values are dispatched in separate batch_predict calls within the same Ray Serve batch window. Useful when sequences of different lengths would otherwise force padding across the whole batch, e.g. bucket_by="horizon" for time series, bucket_by="n_frames" for video.
``None`` (default): all requests in a batch window are sent to the backend in a single call.
When the deployment's ``ModelSpec.lora`` is set, requests are *additionally* grouped by their resolved LoRA adapter selection. That grouping is automatic and cannot be disabled: ``pipeline.set_adapters`` is process-global state, so concurrent requests with different adapters must dispatch separately. ``bucket_by`` and ``ModelSpec.lora`` are mutually exclusive in v1; the spec validator rejects the combination.
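The interaction between max_batch_size and timeout_ms can be illustrated with a small stand-in. This is only a sketch: `PolicySketch` and `should_flush` are hypothetical names, the stand-in is a dataclass rather than the real BaseModel, and the field values are arbitrary. The actual batch window is managed by Ray Serve, not by code like this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class PolicySketch:
    """Hypothetical stand-in mirroring the three documented BatchPolicy fields."""

    max_batch_size: int
    timeout_ms: int
    bucket_by: Optional[str] = None  # None: one backend call per batch window


def should_flush(policy: PolicySketch, pending: int, waited_ms: float) -> bool:
    """Flush when the batch is full or the wait budget has been spent."""
    if pending == 0:
        return False  # nothing to send yet
    return pending >= policy.max_batch_size or waited_ms >= policy.timeout_ms


# Arbitrary illustrative values, not library defaults.
policy = PolicySketch(max_batch_size=8, timeout_ms=50, bucket_by="horizon")

print(should_flush(policy, pending=8, waited_ms=3.0))   # batch is full
print(should_flush(policy, pending=3, waited_ms=50.0))  # timeout reached
print(should_flush(policy, pending=3, waited_ms=10.0))  # keep waiting
```

A larger max_batch_size improves backend utilisation under load, while a smaller timeout_ms bounds the extra latency a lone request can incur while waiting for the batch to fill.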
bucket_requests
Group requests by the value of field bucket_by.
Returns a list of (indices, sub_requests) pairs, one per unique
bucket value, in the order each bucket value was first seen. When bucket_by
is None, returns a single group containing all requests.
Relative order within each bucket matches the original request list.
Requests that lack the bucket_by attribute (getattr returns
None) are grouped together under the None bucket.
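The behaviour described above can be approximated in a few lines. This is a minimal sketch written from the description only, not the library's source: `bucket_requests_sketch` is a hypothetical name, it assumes bucket values are hashable, and it relies on Python dicts preserving insertion order to get the first-seen ordering.

```python
from __future__ import annotations

from types import SimpleNamespace
from typing import Any, Optional


def bucket_requests_sketch(
    requests: list[Any], bucket_by: Optional[str]
) -> list[tuple[list[int], list[Any]]]:
    """Group requests by an attribute, keeping first-seen bucket order."""
    if bucket_by is None:
        # No bucketing: a single group holding every request.
        return [(list(range(len(requests))), list(requests))]

    groups: dict[Any, tuple[list[int], list[Any]]] = {}
    for i, request in enumerate(requests):
        # A missing attribute falls into the None bucket.
        key = getattr(request, bucket_by, None)
        indices, subs = groups.setdefault(key, ([], []))
        indices.append(i)
        subs.append(request)
    return list(groups.values())


reqs = [
    SimpleNamespace(horizon=6),
    SimpleNamespace(horizon=12),
    SimpleNamespace(horizon=6),
    SimpleNamespace(),  # no horizon attribute
]
groups = bucket_requests_sketch(reqs, "horizon")
print([indices for indices, _ in groups])  # [[0, 2], [1], [3]]
```

Returning the original indices alongside each sub-list lets the caller put per-bucket results back into the original request order.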
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| requests | list[Any] | Ordered list of request objects (any type with attrs). | required |
| bucket_by | str \| None | Name of the field to bucket on, or None. | required |
Returns:
| Type | Description |
|---|---|
| list[tuple[list[int], list[Any]]] | List of (indices, sub_requests) pairs. |
Example::
    reqs = [req(horizon=6), req(horizon=12), req(horizon=6)]
    groups = bucket_requests(reqs, "horizon")
    # groups == [([0, 2], [reqs[0], reqs[2]]),
    #            ([1], [reqs[1]])]
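One plausible way to consume groups of this shape is to call the backend once per bucket and scatter the outputs back by index. The sketch below assumes a `batch_predict` callable that returns one output per request, in order; that signature, along with `dispatch_grouped` and `fake_batch_predict`, is hypothetical and only serves the illustration.

```python
from types import SimpleNamespace
from typing import Any, Callable


def dispatch_grouped(
    groups: list,
    batch_predict: Callable[[list], list],
    n_requests: int,
) -> list:
    """Run one backend call per bucket and restore original request order."""
    results: list = [None] * n_requests
    for indices, sub_requests in groups:
        outputs = batch_predict(sub_requests)  # one call per bucket
        for i, output in zip(indices, outputs):
            results[i] = output
    return results


call_sizes = []


def fake_batch_predict(batch: list) -> list:
    """Toy backend: records the batch size and echoes each horizon."""
    call_sizes.append(len(batch))
    return [f"h{request.horizon}" for request in batch]


reqs = [SimpleNamespace(horizon=h) for h in (6, 12, 6)]
# Groups in the same shape as the example above.
groups = [([0, 2], [reqs[0], reqs[2]]), ([1], [reqs[1]])]

results = dispatch_grouped(groups, fake_batch_predict, len(reqs))
print(results)     # ['h6', 'h12', 'h6']
print(call_sizes)  # [2, 1]
```

Each backend call sees only requests with the same horizon, so no padding is needed across buckets, while callers still receive results in the order they submitted requests.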