Client¶

sheaf.client ¶

Typed HTTP client for Sheaf deployments.

Hits the same /<deployment>/predict, /health, /ready, and /stream endpoints that ModelServer (Ray Serve) and ModalServer expose. Decodes responses into the correct Pydantic class via the AnyResponse discriminated union, so callers get a typed response object back instead of a raw dict.
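The decoding step can be pictured with a stdlib-only sketch of a discriminated union, where a tag field in the JSON body selects which class to build. Everything below is hypothetical: the use of `model_type` as the tag, the tag values, and both stand-in response classes. Sheaf's actual `AnyResponse` is a Pydantic union whose schema is not shown on this page.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TimeSeriesResponseSketch:
    # Hypothetical stand-in; the real TimeSeriesResponse is a Pydantic model.
    mean: list[float]


@dataclass
class TabularResponseSketch:
    # Hypothetical stand-in for a second member of the union.
    predictions: list[float]


# Assumed tag values; the real discriminator values may differ.
_DECODERS = {
    "time_series": lambda body: TimeSeriesResponseSketch(mean=body["mean"]),
    "tabular": lambda body: TabularResponseSketch(predictions=body["predictions"]),
}


def decode_response(body: dict[str, Any]):
    """Pick the response class from the tag field, then build it."""
    tag = body.get("model_type")
    try:
        decoder = _DECODERS[tag]
    except KeyError:
        raise ValueError(f"unknown model_type: {tag!r}") from None
    return decoder(body)


decoded = decode_response({"model_type": "time_series", "mean": [1.0, 2.0]})
```

The caller receives an object with attributes (`decoded.mean`) rather than a dict it has to index by string key, which is the benefit the paragraph above describes.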

Usage (sync):

from sheaf.client import SheafClient
from sheaf.api.time_series import Frequency, TimeSeriesRequest

with SheafClient(base_url="http://localhost:8000") as client:
    resp = client.predict(
        "chronos",
        TimeSeriesRequest(
            model_name="chronos",
            history=[1.0, 2.0, 3.0],
            horizon=3,
            frequency=Frequency.HOURLY,
        ),
    )
# resp is a TimeSeriesResponse
print(resp.mean)

Usage (async):

from sheaf.client import AsyncSheafClient

async with AsyncSheafClient(base_url="http://localhost:8000") as client:
    resp = await client.predict("chronos", req)

Retry config (opt-in, exponential backoff):

from sheaf.client import RetryConfig, SheafClient

retry = RetryConfig(
    max_attempts=3,
    backoff_factor=0.5,                       # sleeps 0.5s, then 1.0s between the 3 attempts
    retry_on_status=(502, 503, 504),
    retry_on_connection_errors=True,
)
client = SheafClient(base_url="...", retry=retry)

Errors

ValidationError: server returned 422 (the request shape didn't match the deployment's expected model_type, or a field was malformed).
ServerError: server returned 5xx (backend exception).
SheafError: base class; also raised for unexpected status codes.
ClientError: transport / decode failures.

All raised errors carry request_id (the UUID the client minted on the BaseRequest) so callers can correlate a failed call with server-side log lines and metrics without holding onto the original request.

The client uses httpx under the hood; a custom transport can be injected for tests or for hitting an in-process FastAPI app.

RetryConfig `dataclass` ¶

RetryConfig(max_attempts: int = 1, backoff_factor: float = 0.5, retry_on_status: tuple[int, ...] = (502, 503, 504), retry_on_connection_errors: bool = True)

Retry policy for client-side requests.

The default (max_attempts=1) is no retry — same behavior as a client constructed without a retry config. Opt in by passing a RetryConfig with max_attempts > 1.

Attributes:

Name	Type	Description
`max_attempts`	`int`	Total number of attempts including the first. `1` disables retrying entirely. Must be at least 1.
`backoff_factor`	`float`	Base for exponential backoff between attempts, in seconds. Sleep before attempt `n` (`n >= 1`) is `backoff_factor * 2**(n - 1)` — i.e. for `backoff_factor=0.5` the gaps are 0.5s, 1.0s, 2.0s, …
`retry_on_status`	`tuple[int, ...]`	HTTP status codes that should be retried. Default is the standard transient-failure set `(502, 503, 504)`. Most 4xx codes are not worth retrying, since the same request will fail again (a 422, for example).
`retry_on_connection_errors`	`bool`	When `True` (default), retry on any `httpx.HTTPError` raised from transport (connection refused, read timeout, etc.).
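How these four fields interact can be sketched with a small retry loop. This is an illustration under stated assumptions, not Sheaf's implementation: `RetryPolicySketch` is a local stand-in that copies the documented field names and defaults, and `send` is a hypothetical callable that returns an HTTP status code or raises `ConnectionError` on a transport failure.

```python
import time
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class RetryPolicySketch:
    # Stand-in mirroring the documented RetryConfig fields and defaults.
    max_attempts: int = 1
    backoff_factor: float = 0.5
    retry_on_status: tuple = (502, 503, 504)
    retry_on_connection_errors: bool = True


def call_with_retry(
    send: Callable[[], int],
    policy: RetryPolicySketch,
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() up to policy.max_attempts times and return the last status."""
    for attempt in range(policy.max_attempts):
        if attempt > 0:
            # Documented gap: backoff_factor * 2**(n - 1) before attempt n.
            sleep(policy.backoff_factor * 2 ** (attempt - 1))
        is_last = attempt == policy.max_attempts - 1
        try:
            status = send()
        except ConnectionError:
            if policy.retry_on_connection_errors and not is_last:
                continue
            raise
        if status in policy.retry_on_status and not is_last:
            continue
        return status
    raise ValueError("max_attempts must be at least 1")


# Two transient failures, then success; sleeps are recorded instead of slept.
statuses = iter([503, 502, 200])
recorded_sleeps: list = []
final_status = call_with_retry(
    lambda: next(statuses),
    RetryPolicySketch(max_attempts=3),
    sleep=recorded_sleeps.append,
)
```

Passing `sleep` as a parameter is a choice made for this sketch so the loop can be exercised without real delays.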

sleep_seconds ¶

sleep_seconds(attempt_index: int) -> float

Return the backoff sleep, in seconds, before the attempt at index attempt_index.

attempt_index=0 is the first attempt — never sleeps. For 1, 2, 3, … the gap is backoff_factor * 2**(attempt_index - 1).
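The formula above can be checked with a standalone function. This is a sketch of the documented arithmetic rather than the library method itself; treating negative indices like the first attempt is an assumption made here.

```python
def sleep_seconds(attempt_index: int, backoff_factor: float = 0.5) -> float:
    """Backoff before the attempt at attempt_index, per the documented formula."""
    if attempt_index <= 0:
        return 0.0  # the first attempt never sleeps
    return backoff_factor * 2 ** (attempt_index - 1)


# Gaps for a policy with max_attempts=4 and backoff_factor=0.5.
schedule = [sleep_seconds(i) for i in range(4)]
print(schedule)  # [0.0, 0.5, 1.0, 2.0]
```

The total time spent sleeping is the sum of the schedule, so four attempts at `backoff_factor=0.5` add at most 3.5 seconds on top of the request time.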

SheafError ¶

SheafError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: Exception

Base class for all sheaf-client errors.

Attributes:

Name	Type	Description
`status_code`	`int \| None`	HTTP status code returned by the server, or `None` for transport-level failures.
`detail`	`str`	Server-supplied error detail (the FastAPI `detail` field on the JSON response body), or the transport error message.
`request_id`	`UUID \| None`	UUID of the request that triggered this error, when known. Lifted from the calling `BaseRequest` so callers can log-correlate a failure without holding onto the request object.

ValidationError ¶

ValidationError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised when the server returns 422 (Unprocessable Entity).

Common causes: model_type mismatch (e.g. sending a TabularRequest to a TIME_SERIES deployment), unknown LoRA adapter name, malformed payload.

ServerError ¶

ServerError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised when the server returns a 5xx status code.

The backend raised an exception during inference; the server caught it and returned a structured 500 with the exception type and message in detail.

ClientError ¶

ClientError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised for transport-level failures: connection refused, timeout, JSON decode failure on a 200 response, etc.

Distinct from SheafError only by intent: the error originated on the client side or in transit, not from a server-supplied response.
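Because all three subclasses derive from SheafError, the order of `except` clauses matters when handling them. The sketch below uses local stand-in classes that follow the documented constructor signature (they are not imported from `sheaf.client`), and `fail_with` and `classify` are hypothetical helpers added for illustration.

```python
import uuid


class SheafError(Exception):
    """Local stand-in following the documented constructor signature."""

    def __init__(self, detail, *, status_code=None, request_id=None):
        super().__init__(detail)
        self.detail = detail
        self.status_code = status_code
        self.request_id = request_id


class ValidationError(SheafError):
    """Stand-in: server returned 422."""


class ServerError(SheafError):
    """Stand-in: server returned 5xx."""


class ClientError(SheafError):
    """Stand-in: transport or decode failure."""


def fail_with(exc):
    # Hypothetical helper so a lambda can raise.
    raise exc


def classify(call):
    """Run call() and map any failure to a (label, request_id) pair."""
    try:
        call()
    except ValidationError as e:
        return ("invalid_request", e.request_id)
    except ServerError as e:
        return ("backend_failure", e.request_id)
    except ClientError as e:
        return ("transport_failure", e.request_id)
    except SheafError as e:  # base class last, or it would shadow the others
        return ("unexpected_status", e.request_id)
    return ("ok", None)


rid = uuid.uuid4()
outcome = classify(
    lambda: fail_with(ServerError("boom", status_code=500, request_id=rid))
)
```

Here `outcome` pairs the failure label with the same UUID that was attached to the error, which is what makes log correlation possible without keeping the request object around.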

SheafClient ¶

SheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: BaseTransport | None = None)

Synchronous HTTP client for sheaf deployments.

Parameters:

Name	Type	Description	Default
`base_url`	`str`	Root of the sheaf server, e.g. `"http://localhost:8000"` or a Modal app URL. Per-deployment paths (`/<name>/predict`, etc.) are appended automatically.	required
`timeout`	`float`	Per-request timeout in seconds. Default 30.	`30.0`
`headers`	`dict[str, str] \| None`	Optional headers to send with every request (auth, etc.).	`None`
`retry`	`RetryConfig \| None`	Optional `RetryConfig`. Default is no retry.	`None`
`transport`	`BaseTransport \| None`	Optional `httpx.BaseTransport` override. Set this to `httpx.MockTransport(...)` in tests.	`None`

Use as a context manager so the underlying connection pool is closed cleanly:

with SheafClient(base_url="...") as client:
    resp = client.predict("my-model", req)
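As a rough picture of what "appended automatically" means for `base_url`, the per-deployment URL can be assembled as below. `deployment_url` is a hypothetical helper, and the slash normalization it performs is an assumption; the page does not say how the real client handles a trailing slash.

```python
def deployment_url(base_url: str, deployment: str, endpoint: str) -> str:
    """Join the server root, deployment name, and endpoint into one URL."""
    # Stripping slashes avoids a doubled "/" when base_url ends with one;
    # whether the real client normalizes this way is an assumption.
    return f"{base_url.rstrip('/')}/{deployment.strip('/')}/{endpoint.strip('/')}"


print(deployment_url("http://localhost:8000/", "chronos", "predict"))
# http://localhost:8000/chronos/predict
```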

close ¶

close() -> None

Close the underlying httpx connection pool.

predict ¶

predict(deployment: str, request: BaseRequest) -> BaseResponse

POST a request to /<deployment>/predict and decode the response.

Parameters:

Name	Type	Description	Default
`deployment`	`str`	Name of the target deployment (matches `ModelSpec.name`).	required
`request`	`BaseRequest`	Any subclass of `BaseRequest` — typically the typed request class for the deployment's model type.	required

Returns:

Type	Description
`BaseResponse`	The decoded response, as the correct Pydantic class for the request's model type.

Every raised exception carries e.request_id set to request.request_id so the caller can correlate to server logs.

Raises:

Type	Description
`ValidationError`	422 from server.
`ServerError`	5xx from server.
`SheafError`	Other non-2xx status codes.
`ClientError`	Transport / JSON decode failures.
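The status-code rows of the table above amount to a small mapping, sketched here with local stand-in classes. `error_for_status` is a hypothetical helper, not part of `sheaf.client`; ClientError is left out because it covers transport and decode failures, which have no status code to map.

```python
class SheafError(Exception):
    """Local stand-in following the documented constructor signature."""

    def __init__(self, detail, *, status_code=None, request_id=None):
        super().__init__(detail)
        self.detail = detail
        self.status_code = status_code
        self.request_id = request_id


class ValidationError(SheafError):
    """Stand-in: 422 from server."""


class ServerError(SheafError):
    """Stand-in: 5xx from server."""


def error_for_status(status_code: int, detail: str, request_id=None):
    """Return None for 2xx, otherwise the exception the table names."""
    if 200 <= status_code < 300:
        return None
    if status_code == 422:
        cls = ValidationError
    elif 500 <= status_code < 600:
        cls = ServerError
    else:
        cls = SheafError  # other non-2xx codes, e.g. 404
    return cls(detail, status_code=status_code, request_id=request_id)


err = error_for_status(503, "backend unavailable", request_id="hypothetical-id")
```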

health ¶

health(deployment: str) -> dict[str, Any]

GET /<deployment>/health. Returns the parsed JSON body.

ready ¶

ready(deployment: str) -> dict[str, Any]

GET /<deployment>/ready. Returns the parsed JSON body.

AsyncSheafClient ¶

AsyncSheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: AsyncBaseTransport | None = None)

Async HTTP client for sheaf deployments.

Mirror of SheafClient with async methods on top of httpx.AsyncClient. Use as an async context manager so the connection pool closes cleanly:

async with AsyncSheafClient(base_url="...") as client:
    resp = await client.predict("my-model", req)

See SheafClient for argument and error semantics. Streaming (client.stream(...)) does NOT retry: streams are stateful, and re-running one mid-flight would yield interleaved progress events.

aclose `async` ¶

aclose() -> None

Close the underlying httpx async connection pool.

predict `async` ¶

predict(deployment: str, request: BaseRequest) -> BaseResponse

POST a request to /<deployment>/predict and decode the response.

health `async` ¶

health(deployment: str) -> dict[str, Any]

GET /<deployment>/health.

ready `async` ¶

ready(deployment: str) -> dict[str, Any]

GET /<deployment>/ready.

stream `async` ¶

stream(deployment: str, request: BaseRequest) -> AsyncIterator[dict[str, Any]]

POST to /<deployment>/stream and yield SSE events as dicts.

Each event is a parsed JSON object from a data: {...}\n\n line. The server may emit two event shapes:

{"type": "progress", "step": N, "total_steps": N, "done": False}
{"type": "result", "done": True, ...response_fields}

Backend exceptions raised mid-stream become an in-band error event ({"type": "error", "error": "..."}) — the HTTP status is still 200 in that case, so callers must check event["type"] to distinguish.
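The wire format described above can be decoded with a few lines of stdlib code. This is a simplified sketch, not the client's parser: `iter_sse_events` is a hypothetical helper, it assumes each event fits on a single `data:` line, and the sample body is made up to match the documented shapes.

```python
import json
from typing import Any, Iterable, Iterator


def iter_sse_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per `data:` line, skipping blank separators."""
    for raw in lines:
        line = raw.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and any other SSE fields
        yield json.loads(line[len("data:"):].strip())


# Made-up response body: one progress event, then an in-band error.
body = (
    'data: {"type": "progress", "step": 1, "total_steps": 2, "done": false}\n\n'
    'data: {"type": "error", "error": "backend exploded"}\n\n'
)
events = list(iter_sse_events(body.splitlines()))
```

Note that JSON `false` arrives as Python `False`, which is why the event shapes in this section are written with Python literals.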

Pre-stream HTTP errors (422, 5xx, 404, …) raise the usual SheafError subclass before any events are yielded.

Streaming bypasses RetryConfig: re-running a partial stream would yield interleaved progress events from two backend invocations. Configure timeouts on the client itself if you need a stream-level upper bound.

Parameters:

Name	Type	Description	Default
`deployment`	`str`	Name of the target deployment.	required
`request`	`BaseRequest`	A request whose backend supports `stream_predict` (FLUX is the canonical example).	required

Yields:

Type	Description
`AsyncIterator[dict[str, Any]]`	Event dicts in arrival order.
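A consumer of the event stream usually dispatches on `event["type"]`. The sketch below shows one way to do that, with a stand-in async generator in place of `client.stream(...)` so it runs without a server. `fake_stream`, `collect_result`, the `image` field, and the choice of `RuntimeError` for in-band errors are all assumptions made for illustration.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    # Hypothetical stand-in for client.stream(...), using the documented shapes.
    yield {"type": "progress", "step": 1, "total_steps": 2, "done": False}
    yield {"type": "progress", "step": 2, "total_steps": 2, "done": False}
    yield {"type": "result", "done": True, "image": "<payload>"}


async def collect_result(events: AsyncIterator[dict[str, Any]]) -> dict[str, Any]:
    """Drain a stream and return the result event, raising on an in-band error."""
    async for event in events:
        kind = event.get("type")
        if kind == "error":
            # The HTTP status was 200, so the failure is only visible here.
            raise RuntimeError(event.get("error", "unknown stream error"))
        if kind == "result":
            return event
        # Progress events fall through; report or ignore them as needed.
    raise RuntimeError("stream ended without a result event")


result = asyncio.run(collect_result(fake_stream()))
```

With the real client, the argument to `collect_result` would be the iterator returned by `client.stream(deployment, request)` inside an `async with AsyncSheafClient(...)` block.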
