Scheduling

sheaf.scheduling.batch

Batching policies for model-type-aware request scheduling.

BatchPolicy

Bases: BaseModel

Controls how requests are batched before hitting the model backend.

max_batch_size: hard cap on requests per batch.

timeout_ms: max time to wait for a full batch before flushing.

bucket_by: field name to group on before calling the backend. Requests with the same value of this field are sent to the backend together; requests with different values are dispatched in separate batch_predict calls within the same Ray Serve batch window. Useful when sequences of different lengths would otherwise force padding across the whole batch, e.g. bucket_by="horizon" for time series, bucket_by="n_frames" for video.

    ``None`` (default): all requests in a batch window are sent to the backend in a single call.

    When the deployment's ``ModelSpec.lora`` is set, requests are *additionally* grouped by their resolved LoRA adapter selection. That grouping is automatic and cannot be disabled (``pipeline.set_adapters`` is process-global state, so concurrent requests with different adapters must dispatch separately). ``bucket_by`` and ``ModelSpec.lora`` are mutually exclusive in v1; the spec validator rejects the combination.
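The padding argument for bucket_by can be made concrete with a small calculation. The sketch below is illustrative only: it assumes the backend pads every sequence in a call to the longest one in that call, and the request objects are hypothetical stand-ins that carry nothing but a horizon attribute.

```python
from collections import defaultdict
from types import SimpleNamespace


def padded_slots(lengths):
    """Slots a padded call occupies when every row is padded to the longest."""
    return max(lengths) * len(lengths) if lengths else 0


# Hypothetical requests; only the `horizon` attribute matters here.
requests = [SimpleNamespace(horizon=h) for h in (6, 96, 6, 96, 6)]

# bucket_by=None: one call for the whole window, everything padded to 96.
single_call = padded_slots([r.horizon for r in requests])

# bucket_by="horizon": one call per distinct horizon, no padding in a bucket.
buckets = defaultdict(list)
for r in requests:
    buckets[r.horizon].append(r.horizon)
bucketed = sum(padded_slots(lengths) for lengths in buckets.values())

print(single_call, bucketed)  # 480 210
```

Under that assumption the bucketed dispatch does less than half the work, at the cost of two backend calls instead of one.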

bucket_requests

bucket_requests(requests: list[Any], bucket_by: str | None) -> list[tuple[list[int], list[Any]]]

Group requests by the value of field bucket_by.

Returns a list of (indices, sub_requests) pairs — one per unique bucket value, in the order the bucket was first seen. When bucket_by is None, returns a single group containing all requests.

Relative order within each bucket matches the original request list. Requests that lack the bucket_by attribute (getattr returns None) are grouped together under the None bucket.
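The grouping rules described above can be sketched in a few lines. This is not the library's source, only a minimal re-implementation of the documented behavior under the name bucket_requests_sketch (hypothetical); it additionally assumes that bucket values are hashable.

```python
from types import SimpleNamespace
from typing import Any, Optional


def bucket_requests_sketch(
    requests: list[Any], bucket_by: Optional[str]
) -> list[tuple[list[int], list[Any]]]:
    """Illustrative grouping that follows the documented semantics."""
    if bucket_by is None:
        # No bucketing: a single group holding every request.
        return [(list(range(len(requests))), list(requests))]
    # dicts preserve insertion order, so groups come out first-seen first.
    groups: dict[Any, tuple[list[int], list[Any]]] = {}
    for i, req in enumerate(requests):
        key = getattr(req, bucket_by, None)  # missing attribute -> None bucket
        indices, subs = groups.setdefault(key, ([], []))
        indices.append(i)
        subs.append(req)
    return list(groups.values())


reqs = [
    SimpleNamespace(horizon=6),
    SimpleNamespace(horizon=12),
    SimpleNamespace(),  # no `horizon` attribute
    SimpleNamespace(horizon=6),
]
groups = bucket_requests_sketch(reqs, "horizon")
print([indices for indices, _ in groups])  # [[0, 3], [1], [2]]
```

The request without a horizon attribute lands in its own None bucket, and the two horizon=6 requests keep their original relative order.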

Parameters:

Name       Type        Description                                             Default
requests   list[Any]   Ordered list of request objects (any type with attrs).  required
bucket_by  str | None  Name of the field to bucket on, or None to skip.        required

Returns:

Type                               Description
list[tuple[list[int], list[Any]]]  List of (original_indices, sub_requests) tuples.

Example::

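    from types import SimpleNamespace as req  # stand-in for a real request type
    from sheaf.scheduling.batch import bucket_requests

    reqs = [req(horizon=6), req(horizon=12), req(horizon=6)]
    groups = bucket_requests(reqs, "horizon")
    # groups == [([0, 2], [reqs[0], reqs[2]]),
    #            ([1],    [reqs[1]])]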
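The returned indices make it possible to put per-bucket results back into the original request order. The sketch below shows one way a caller might do that; fake_batch_predict is a hypothetical stand-in for the backend call, and the groups value is written out by hand in the shape documented above.

```python
from types import SimpleNamespace

reqs = [
    SimpleNamespace(horizon=6),
    SimpleNamespace(horizon=12),
    SimpleNamespace(horizon=6),
]
# Same shape as the documented return value for bucket_by="horizon".
groups = [([0, 2], [reqs[0], reqs[2]]), ([1], [reqs[1]])]


def fake_batch_predict(batch):
    """Hypothetical backend call: one result per request, in input order."""
    return [f"forecast(h={r.horizon})" for r in batch]


# Scatter each bucket's outputs back to the slots of the original list.
results = [None] * len(reqs)
for indices, sub_requests in groups:
    outputs = fake_batch_predict(sub_requests)
    for i, out in zip(indices, outputs):
        results[i] = out

print(results)  # ['forecast(h=6)', 'forecast(h=12)', 'forecast(h=6)']
```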