
Client

sheaf.client

Typed HTTP client for Sheaf deployments.

Hits the same /<deployment>/predict, /health, /ready, and /stream endpoints that ModelServer (Ray Serve) and ModalServer expose. Decodes responses into the correct Pydantic class via the AnyResponse discriminated union, so callers get a typed response object back instead of a raw dict.
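Decoding through a discriminated union means reading one tag field from the JSON body and using it to choose the model class. The stdlib sketch below illustrates only that idea. It is not pydantic's implementation or sheaf's code. The discriminator name model_type, its values, and both dataclasses are assumptions made for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Hypothetical stand-ins for two response models; fields are illustrative only.
@dataclass
class TimeSeriesResponse:
    model_type: str
    mean: list[float]


@dataclass
class TabularResponse:
    model_type: str
    predictions: list[float]


# Assumed discriminator values; the real tags may differ.
RESPONSE_TYPES: dict[str, type] = {
    "time_series": TimeSeriesResponse,
    "tabular": TabularResponse,
}


def decode_response(body: dict[str, Any]) -> Any:
    """Choose the response class from the discriminator field, then build it."""
    tag = body.get("model_type")
    cls = RESPONSE_TYPES.get(tag)
    if cls is None:
        raise ValueError(f"unknown model_type: {tag!r}")
    return cls(**body)


resp = decode_response({"model_type": "time_series", "mean": [1.0, 2.0]})
```

pydantic performs this lookup itself for a discriminated union, and it also validates every field. The dict lookup above only shows why callers receive a concrete class rather than a raw dict.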

Usage (sync)::

    from sheaf.client import SheafClient
    from sheaf.api.time_series import Frequency, TimeSeriesRequest

    with SheafClient(base_url="http://localhost:8000") as client:
        resp = client.predict(
            "chronos",
            TimeSeriesRequest(
                model_name="chronos",
                history=[1.0, 2.0, 3.0],
                horizon=3,
                frequency=Frequency.HOURLY,
            ),
        )
    # resp is a TimeSeriesResponse
    print(resp.mean)

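Usage (async)::

    import asyncio
    from sheaf.client import AsyncSheafClient

    async def main(req):  # req: e.g. the TimeSeriesRequest from the sync example
        async with AsyncSheafClient(base_url="http://localhost:8000") as client:
            return await client.predict("chronos", req)

    resp = asyncio.run(main(req))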

Retry config (opt-in, exponential backoff)::

    from sheaf.client import RetryConfig, SheafClient

    retry = RetryConfig(
        max_attempts=3,
        backoff_factor=0.5,                       # sleeps 0.5s, then 1.0s
        retry_on_status=(502, 503, 504),
        retry_on_connection_errors=True,
    )
    client = SheafClient(base_url="...", retry=retry)

Errors

  • ValidationError: server returned 422 (request shape didn't match the deployment's expected model_type or had a malformed field).
  • ServerError: server returned 5xx (backend exception).
  • SheafError: base class; also raised for unexpected status codes.
  • ClientError: transport or decode failures.
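Here is a minimal sketch of the status-to-exception mapping described in the list. It uses stand-in classes that mirror the documented constructor signatures. The helper error_for_status is hypothetical, and the real client's internals may differ, for example in how decode failures become ClientError.

```python
from __future__ import annotations

from uuid import UUID, uuid4


class SheafError(Exception):
    """Stand-in mirroring the documented base class and its attributes."""

    def __init__(self, detail: str, *, status_code: int | None = None,
                 request_id: UUID | None = None) -> None:
        super().__init__(detail)
        self.detail = detail
        self.status_code = status_code
        self.request_id = request_id


class ValidationError(SheafError):
    """Stand-in: 422 from the server."""


class ServerError(SheafError):
    """Stand-in: 5xx from the server."""


class ClientError(SheafError):
    """Stand-in: transport / decode failure (not produced by the mapping below)."""


def error_for_status(status: int, detail: str,
                     request_id: UUID | None = None) -> SheafError | None:
    """Hypothetical helper: pick the documented exception type for a status code.

    Returns None for 2xx, since those responses are decoded rather than raised.
    """
    if 200 <= status < 300:
        return None
    if status == 422:
        cls: type[SheafError] = ValidationError
    elif 500 <= status < 600:
        cls = ServerError
    else:
        cls = SheafError  # any other unexpected status, e.g. 404
    return cls(detail, status_code=status, request_id=request_id)


# The mapped error keeps the request_id for log correlation.
err = error_for_status(422, "model_type mismatch", request_id=uuid4())
```

ValidationError, ServerError, and ClientError all subclass SheafError, so a single `except SheafError` clause catches every client error. List the subclasses first when they need different handling.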

All raised errors carry request_id (the UUID the client minted on the BaseRequest) so callers can correlate a failed call with server-side log lines and metrics without holding onto the original request.

The client uses httpx under the hood; a custom transport can be injected for tests or for hitting an in-process FastAPI app.

RetryConfig dataclass

RetryConfig(max_attempts: int = 1, backoff_factor: float = 0.5, retry_on_status: tuple[int, ...] = (502, 503, 504), retry_on_connection_errors: bool = True)

Retry policy for client-side requests.

The default (max_attempts=1) is no retry — same behavior as a client constructed without a retry config. Opt in by passing a RetryConfig with max_attempts > 1.

Attributes:

  • max_attempts (int): Total number of attempts, including the first. 1 disables retrying entirely. Must be at least 1.
  • backoff_factor (float): Base for exponential backoff between attempts, in seconds. The sleep before attempt n (zero-based, n >= 1) is backoff_factor * 2**(n - 1); for backoff_factor=0.5 the gaps are 0.5s, 1.0s, 2.0s, …
  • retry_on_status (tuple[int, ...]): HTTP status codes that should be retried. The default is the standard transient-failure set (502, 503, 504). Retrying a 4xx never makes sense: a 422 will fail again.
  • retry_on_connection_errors (bool): When True (the default), retry on any httpx.HTTPError raised from the transport (connection refused, read timeout, etc.).
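The interaction of the four fields can be sketched as a retry loop. This is an illustrative reimplementation under stated assumptions, not sheaf's code. RetryPolicy copies RetryConfig's documented fields, send is a hypothetical callable returning an HTTP status code, and TransportError stands in for httpx.HTTPError. The sketch also assumes that the outcome of the final attempt is returned or re-raised unchanged.

```python
from __future__ import annotations

import time
from dataclasses import dataclass
from typing import Callable


class TransportError(Exception):
    """Hypothetical stand-in for httpx.HTTPError (connection refused, timeout)."""


@dataclass
class RetryPolicy:
    # Same fields and defaults as the documented RetryConfig.
    max_attempts: int = 1
    backoff_factor: float = 0.5
    retry_on_status: tuple[int, ...] = (502, 503, 504)
    retry_on_connection_errors: bool = True


def send_with_retry(send: Callable[[], int], policy: RetryPolicy,
                    sleep: Callable[[float], None] = time.sleep) -> int:
    """Call send() until it returns a non-retryable status or attempts run out.

    send() returns an HTTP status code or raises TransportError.
    """
    for attempt in range(policy.max_attempts):
        if attempt > 0:
            # Exponential backoff: backoff_factor * 2**(attempt - 1).
            sleep(policy.backoff_factor * 2 ** (attempt - 1))
        last_attempt = attempt == policy.max_attempts - 1
        try:
            status = send()
        except TransportError:
            if policy.retry_on_connection_errors and not last_attempt:
                continue
            raise
        if status in policy.retry_on_status and not last_attempt:
            continue
        return status
    raise ValueError("max_attempts must be at least 1")
```

Injecting sleep keeps the loop testable without real delays. With the default max_attempts=1, the first status is returned as-is, which matches "no retry".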

sleep_seconds

sleep_seconds(attempt_index: int) -> float

Return the backoff sleep, in seconds, before the attempt with zero-based index attempt_index.

attempt_index=0 is the first attempt and never sleeps. For 1, 2, 3, … the gap is backoff_factor * 2**(attempt_index - 1).
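The formula can be checked with a standalone re-derivation. The code below mirrors the documented arithmetic rather than importing sheaf. schedule is a hypothetical helper that lists the worst-case sleeps for a given max_attempts.

```python
from __future__ import annotations


def sleep_seconds(attempt_index: int, backoff_factor: float = 0.5) -> float:
    """Backoff before the zero-based attempt `attempt_index`; 0 never sleeps."""
    if attempt_index == 0:
        return 0.0
    return backoff_factor * 2 ** (attempt_index - 1)


def schedule(max_attempts: int, backoff_factor: float = 0.5) -> list[float]:
    """Hypothetical helper: every sleep a request incurs if all attempts fail."""
    return [sleep_seconds(i, backoff_factor) for i in range(1, max_attempts)]


print(schedule(4))  # [0.5, 1.0, 2.0]
```

With max_attempts=4, a request that fails every time sleeps 3.5s in total. This is in addition to the time the four attempts themselves take.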

SheafError

SheafError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: Exception

Base class for all sheaf-client errors.

Attributes:

  • status_code: HTTP status code returned by the server, or None for transport-level failures.
  • detail: Server-supplied error detail (the FastAPI detail field of the JSON response body), or the transport error message.
  • request_id: UUID of the request that triggered this error, when known. Lifted from the calling BaseRequest so callers can correlate a failure with logs without holding onto the request object.

ValidationError

ValidationError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised when the server returns 422 (Unprocessable Entity).

Common causes: model_type mismatch (e.g. sending a TabularRequest to a TIME_SERIES deployment), unknown LoRA adapter name, malformed payload.

ServerError

ServerError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised when the server returns a 5xx status code.

Typically the backend raised an exception during inference, and the server caught it and returned a structured 500 with the exception type and message in detail.

ClientError

ClientError(detail: str, *, status_code: int | None = None, request_id: UUID | None = None)

Bases: SheafError

Raised for transport-level failures: connection refused, timeout, JSON decode failure on a 200 response, etc.

Distinct from SheafError only by intent: the error originated on the client side or in transit, not from a server-supplied response.

SheafClient

SheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: BaseTransport | None = None)

Synchronous HTTP client for sheaf deployments.

Parameters:

  • base_url (str, required): Root of the sheaf server, e.g. "http://localhost:8000" or a Modal app URL. Per-deployment paths (/<name>/predict, etc.) are appended automatically.
  • timeout (float, default 30.0): Per-request timeout in seconds.
  • headers (dict[str, str] | None, default None): Optional headers to send with every request (auth, etc.).
  • retry (RetryConfig | None, default None): Optional RetryConfig. The default is no retry.
  • transport (BaseTransport | None, default None): Optional httpx.BaseTransport override. Set this to httpx.MockTransport(...) in tests.
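The path convention from the base_url parameter can be sketched as a simple join. endpoint_url is a hypothetical helper, and its trailing-slash handling is an assumption. The real client may instead rely on httpx's own base_url support.

```python
def endpoint_url(base_url: str, deployment: str, route: str) -> str:
    """Hypothetical helper: build /<deployment>/<route> under base_url.

    Assumes a trailing slash on base_url should be ignored.
    """
    if route not in {"predict", "health", "ready", "stream"}:
        raise ValueError(f"unknown route: {route!r}")
    return f"{base_url.rstrip('/')}/{deployment}/{route}"


print(endpoint_url("http://localhost:8000/", "my-model", "predict"))
# http://localhost:8000/my-model/predict
```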

Use as a context manager so the underlying connection pool is closed cleanly::

    with SheafClient(base_url="...") as client:
        resp = client.predict("my-model", req)

close

close() -> None

Close the underlying httpx connection pool.

predict

predict(deployment: str, request: BaseRequest) -> BaseResponse

POST a request to /<deployment>/predict and decode the response.

Parameters:

  • deployment (str, required): Name of the target deployment (matches ModelSpec.name).
  • request (BaseRequest, required): Any subclass of BaseRequest, typically the typed request class for the deployment's model type.

Returns:

  • BaseResponse: The decoded response, as the correct Pydantic class for the request's model type.

Every raised exception carries e.request_id set to request.request_id so the caller can correlate to server logs.

Raises:

  • ValidationError: 422 from the server.
  • ServerError: 5xx from the server.
  • SheafError: Other non-2xx status codes.
  • ClientError: Transport or JSON decode failures.

health

health(deployment: str) -> dict[str, Any]

GET /<deployment>/health. Returns the parsed JSON body.

ready

ready(deployment: str) -> dict[str, Any]

GET /<deployment>/ready. Returns the parsed JSON body.
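A common pattern around these probes is to poll ready() at startup until the deployment answers. The helper below is hypothetical and rests on an assumption the docs do not state: a deployment that is not yet ready answers /ready with a non-2xx status, which the client raises as a SheafError. With the real client, you would pass errors=(SheafError,).

```python
from __future__ import annotations

import time
from typing import Any, Callable


def wait_until_ready(
    probe: Callable[[], dict[str, Any]],
    *,
    errors: tuple[type[BaseException], ...] = (Exception,),
    timeout: float = 60.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict[str, Any]:
    """Call probe() until it returns, re-raising its last error after `timeout` s.

    probe is meant to wrap something like lambda: client.ready("chronos");
    exceptions listed in `errors` count as "not ready yet".
    """
    deadline = clock() + timeout
    while True:
        try:
            return probe()
        except errors:
            if clock() >= deadline:
                raise
            sleep(interval)
```

Injecting sleep and clock lets the polling logic be exercised without waiting in real time.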

AsyncSheafClient

AsyncSheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: AsyncBaseTransport | None = None)

Async HTTP client for sheaf deployments.

Mirror of SheafClient with async methods on top of httpx.AsyncClient. Use as an async context manager so the connection pool closes cleanly::

    async with AsyncSheafClient(base_url="...") as client:
        resp = await client.predict("my-model", req)

See SheafClient for argument and error semantics. Streaming (client.stream(...)) does NOT retry: streams are stateful, and re-running one mid-flight would yield interleaved progress events.

aclose async

aclose() -> None

Close the underlying httpx async connection pool.

predict async

predict(deployment: str, request: BaseRequest) -> BaseResponse

POST a request to /<deployment>/predict and decode the response.

health async

health(deployment: str) -> dict[str, Any]

GET /<deployment>/health.

ready async

ready(deployment: str) -> dict[str, Any]

GET /<deployment>/ready.

stream async

stream(deployment: str, request: BaseRequest) -> AsyncIterator[dict[str, Any]]

POST to /<deployment>/stream and yield SSE events as dicts.

Each event is the JSON object parsed from one data: {...}\n\n frame. Normal operation produces two event shapes:

  • {"type": "progress", "step": N, "total_steps": N, "done": False}
  • {"type": "result", "done": True, ...response_fields}
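A minimal parser for this framing might look like the following. It assumes the response body is consumed as text lines (httpx offers Response.iter_lines() and, for async clients, Response.aiter_lines()). It is a simplified SSE reader written for illustration, not sheaf's decoder.

```python
from __future__ import annotations

import json
from typing import Any, Iterable, Iterator


def parse_sse(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one decoded JSON dict per SSE frame built from `data:` lines.

    Simplified from the SSE format: only data fields are kept, a blank line
    ends a frame, and a frame left unterminated at end of input is dropped.
    """
    data: list[str] = []
    for raw in lines:
        line = raw.rstrip("\r\n")
        if not line:
            if data:
                yield json.loads("\n".join(data))
                data = []
        elif line.startswith("data:"):
            value = line[len("data:"):]
            # The SSE format strips a single leading space after the colon.
            data.append(value[1:] if value.startswith(" ") else value)
        # event:, id:, retry: fields and ":" comment lines are ignored here.
```

JSON false and true decode to Python False and True, which is why the event dicts above show "done": False.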

Backend exceptions raised mid-stream become an in-band error event ({"type": "error", "error": "..."}) — the HTTP status is still 200 in that case, so callers must check event["type"] to distinguish.
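Errors arrive in-band with a 200 status, so a consumer has to branch on event["type"]. The sketch below shows one way to do that. StreamFailed and consume are hypothetical names, and fake_stream stands in for AsyncSheafClient.stream(...) so the example runs without a server. With the real client, you would pass client.stream("<deployment>", request) instead.

```python
from __future__ import annotations

import asyncio
from typing import Any, AsyncIterator


class StreamFailed(RuntimeError):
    """Hypothetical exception for an in-band {"type": "error"} event."""


async def consume(events: AsyncIterator[dict[str, Any]]) -> dict[str, Any]:
    """Print progress, return the result event, raise on an error event."""
    async for event in events:
        kind = event.get("type")
        if kind == "progress":
            print(f"step {event['step']}/{event['total_steps']}")
        elif kind == "error":
            # The HTTP status was 200; the failure is only visible in-band.
            raise StreamFailed(event.get("error", "unknown error"))
        elif kind == "result":
            return event
    raise StreamFailed("stream ended without a result event")


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    # Stand-in for client.stream(deployment, request) on AsyncSheafClient.
    yield {"type": "progress", "step": 1, "total_steps": 2, "done": False}
    yield {"type": "result", "done": True}


if __name__ == "__main__":
    print(asyncio.run(consume(fake_stream())))
```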

Pre-stream HTTP errors (422, 5xx, 404, …) raise the usual SheafError subclass before any events are yielded.

Streaming bypasses RetryConfig: re-running a partial stream would yield interleaved progress events from two backend invocations. Configure timeouts on the client itself if you need a stream-level upper bound.

Parameters:

  • deployment (str, required): Name of the target deployment.
  • request (BaseRequest, required): A request whose backend supports stream_predict (FLUX is the canonical example).

Yields:

  • AsyncIterator[dict[str, Any]]: Event dicts in arrival order.