Client¶
sheaf.client ¶
Typed HTTP client for Sheaf deployments.
Hits the same /<deployment>/predict, /health, /ready, and
/stream endpoints that ModelServer (Ray Serve) and ModalServer
expose. Decodes responses into the correct Pydantic class via the
AnyResponse discriminated union, so callers get a typed response object
back instead of a raw dict.
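To make "decoding through a discriminated union" concrete, here is a minimal standard-library sketch. It is not sheaf's implementation (which relies on Pydantic and AnyResponse): the `model_type` discriminator key, the `Sketch`-suffixed classes, the `predictions` field, and `decode_response` are all assumptions invented for illustration.

```python
from dataclasses import dataclass
from typing import Any


@dataclass
class TimeSeriesResponseSketch:
    mean: list[float]


@dataclass
class TabularResponseSketch:
    predictions: list[float]  # hypothetical field, for illustration only


# Hypothetical registry: discriminator value -> response class.
RESPONSE_CLASSES: dict[str, type] = {
    "time_series": TimeSeriesResponseSketch,
    "tabular": TabularResponseSketch,
}


def decode_response(payload: dict[str, Any]) -> Any:
    """Pick the response class from the discriminator and build it."""
    fields = dict(payload)  # copy so the caller's dict is not mutated
    kind = fields.pop("model_type", None)
    try:
        cls = RESPONSE_CLASSES[kind]
    except KeyError:
        raise ValueError(f"unknown model_type: {kind!r}") from None
    return cls(**fields)


resp = decode_response({"model_type": "time_series", "mean": [1.0, 2.0, 3.0]})
print(type(resp).__name__, resp.mean)
```

The caller receives an object with attribute access (`resp.mean`) rather than a dict lookup, which is the benefit the typed client is described as providing.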
Usage (sync)::

    from sheaf.client import SheafClient
    from sheaf.api.time_series import Frequency, TimeSeriesRequest

    with SheafClient(base_url="http://localhost:8000") as client:
        resp = client.predict(
            "chronos",
            TimeSeriesRequest(
                model_name="chronos",
                history=[1.0, 2.0, 3.0],
                horizon=3,
                frequency=Frequency.HOURLY,
            ),
        )
        # resp is a TimeSeriesResponse
        print(resp.mean)

Usage (async)::

    from sheaf.client import AsyncSheafClient

    # req is a request object, e.g. the TimeSeriesRequest from the sync usage
    async with AsyncSheafClient(base_url="http://localhost:8000") as client:
        resp = await client.predict("chronos", req)

Retry config (opt-in, exponential backoff)::

    from sheaf.client import RetryConfig, SheafClient

    retry = RetryConfig(
        max_attempts=3,
        backoff_factor=0.5,  # 0.5s, 1.0s, 2.0s, ...
        retry_on_status=(502, 503, 504),
        retry_on_connection_errors=True,
    )
    client = SheafClient(base_url="...", retry=retry)

Errors

- ValidationError: server returned 422 (request shape didn't match the deployment's expected model_type or had a malformed field).
- ServerError: server returned 5xx (backend exception).
- SheafError: base class; also raised for unexpected status codes.
- ClientError: transport / decode failures.

All raised errors carry request_id (the UUID the client minted on the
BaseRequest) so callers can correlate a failed call with server-side
log lines and metrics without holding onto the original request.
The client uses httpx under the hood; a custom transport can be
injected for tests or for hitting an in-process FastAPI app.
RetryConfig dataclass ¶
RetryConfig(max_attempts: int = 1, backoff_factor: float = 0.5, retry_on_status: tuple[int, ...] = (502, 503, 504), retry_on_connection_errors: bool = True)
Retry policy for client-side requests.
The default (max_attempts=1) is no retry — same behavior as a client
constructed without a retry config. Opt in by passing a RetryConfig
with max_attempts > 1.
Attributes:
| Name | Type | Description |
|---|---|---|
| max_attempts | int | Total number of attempts including the first. |
| backoff_factor | float | Base for exponential backoff between attempts, in seconds. The sleep before the attempt with zero-based index n (n >= 1) is backoff_factor * 2**(n - 1). |
| retry_on_status | tuple[int, ...] | HTTP status codes that should be retried. Default is the standard transient-failure set (502, 503, 504). |
| retry_on_connection_errors | bool | When True (the default), connection errors are retried as well. |
sleep_seconds ¶
Return the backoff sleep before the (attempt_index)th attempt.
attempt_index=0 is the first attempt — never sleeps. For 1, 2, 3,
… the gap is backoff_factor * 2**(attempt_index - 1).
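The backoff rule above can be sketched in plain Python. This only illustrates the documented formula and is not sheaf's code: `sleep_seconds` is written here as a free function rather than the RetryConfig method, and `call_with_retry`, `send`, and the injected `sleep` callable are hypothetical names.

```python
import time
from typing import Callable


def sleep_seconds(attempt_index: int, backoff_factor: float = 0.5) -> float:
    """Backoff before the attempt with the given zero-based index."""
    if attempt_index <= 0:
        return 0.0  # the first attempt never sleeps
    return backoff_factor * 2 ** (attempt_index - 1)


def call_with_retry(
    send: Callable[[], int],
    max_attempts: int = 3,
    backoff_factor: float = 0.5,
    retry_on_status: tuple[int, ...] = (502, 503, 504),
    sleep: Callable[[float], None] = time.sleep,
) -> int:
    """Call send() until it returns a non-retryable status or attempts run out."""
    status = -1
    for attempt_index in range(max_attempts):
        delay = sleep_seconds(attempt_index, backoff_factor)
        if delay > 0:
            sleep(delay)
        status = send()
        if status not in retry_on_status:
            break
    return status


# Simulated server: two transient failures, then success.
responses = iter([503, 502, 200])
slept: list[float] = []  # record delays instead of actually sleeping
final = call_with_retry(lambda: next(responses), sleep=slept.append)
print(final, slept)
```

Injecting `sleep` keeps the example fast and makes the schedule observable; with `max_attempts=3` there are at most two sleeps, one before each retry.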
SheafError ¶
Bases: Exception
Base class for all sheaf-client errors.
Attributes:
| Name | Type | Description |
|---|---|---|
| status_code | | HTTP status code returned by the server, when one was received. |
| detail | | Server-supplied error detail (the FastAPI detail field). |
| request_id | | UUID of the request that triggered this error, when known. Lifted from the calling request. |
ValidationError ¶
Bases: SheafError
Raised when the server returns 422 (Unprocessable Entity).
Common causes: model_type mismatch (e.g. sending a TabularRequest to
a TIME_SERIES deployment), unknown LoRA adapter name, malformed payload.
ServerError ¶
Bases: SheafError
Raised when the server returns a 5xx status code.
The backend raised an exception during inference; the server caught it
and returned a structured 500 with the exception type + message in
detail.
ClientError ¶
Bases: SheafError
Raised for transport-level failures: connection refused, timeout, JSON decode failure on a 200 response, etc.
Distinct from SheafError only by intent: the error originated
on the client side or in transit, not from a server-supplied response.
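The mapping from HTTP status to exception class described in this section can be sketched with stand-in classes. These only mirror the documented hierarchy and attributes; they are not the sheaf classes, and the `Sketch`-suffixed names and the `error_for_status` helper are hypothetical.

```python
from typing import Optional


class SheafErrorSketch(Exception):
    """Stand-in base class carrying the three documented attributes."""

    def __init__(
        self,
        message: str,
        *,
        status_code: Optional[int] = None,
        detail: Optional[str] = None,
        request_id: Optional[str] = None,
    ) -> None:
        super().__init__(message)
        self.status_code = status_code
        self.detail = detail
        self.request_id = request_id


class ValidationErrorSketch(SheafErrorSketch):
    """422 from the server."""


class ServerErrorSketch(SheafErrorSketch):
    """5xx from the server."""


def error_for_status(status_code: int, detail: str, request_id: str) -> SheafErrorSketch:
    """Choose an exception class for a non-2xx response."""
    if status_code == 422:
        cls = ValidationErrorSketch
    elif 500 <= status_code < 600:
        cls = ServerErrorSketch
    else:
        cls = SheafErrorSketch  # unexpected status codes fall back to the base
    return cls(
        f"HTTP {status_code}: {detail}",
        status_code=status_code,
        detail=detail,
        request_id=request_id,
    )


try:
    raise error_for_status(503, "backend unavailable", "req-123")
except SheafErrorSketch as exc:
    # request_id lets the caller correlate with server-side logs.
    print(type(exc).__name__, exc.status_code, exc.request_id)
```

Catching the base class handles every server-reported failure in one place, while the subclasses let callers treat a bad request differently from a backend fault.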
SheafClient ¶
SheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: BaseTransport | None = None)
Synchronous HTTP client for sheaf deployments.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| base_url | str | Root of the sheaf server, e.g. http://localhost:8000. | required |
| timeout | float | Per-request timeout in seconds. Default 30. | 30.0 |
| headers | dict[str, str] \| None | Optional headers to send with every request (auth, etc.). | None |
| retry | RetryConfig \| None | Optional RetryConfig. When omitted, requests are not retried. | None |
| transport | BaseTransport \| None | Optional httpx transport, e.g. for tests or for hitting an in-process FastAPI app. | None |
Use as a context manager so the underlying connection pool is closed cleanly::

    with SheafClient(base_url="...") as client:
        resp = client.predict("my-model", req)

predict ¶
POST a request to /<deployment>/predict and decode the response.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| deployment | str | Name of the target deployment. | required |
| request | BaseRequest | Any subclass of BaseRequest. | required |
Returns:
| Type | Description |
|---|---|
| BaseResponse | The decoded response, as the correct Pydantic class for the request's model type. |
Every raised exception carries e.request_id set to
request.request_id so the caller can correlate to server logs.
Raises:
| Type | Description |
|---|---|
| ValidationError | 422 from server. |
| ServerError | 5xx from server. |
| SheafError | Other non-2xx status codes. |
| ClientError | Transport / JSON decode failures. |
health ¶
GET /<deployment>/health. Returns the parsed JSON body.
ready ¶
GET /<deployment>/ready. Returns the parsed JSON body.
AsyncSheafClient ¶
AsyncSheafClient(base_url: str, *, timeout: float = 30.0, headers: dict[str, str] | None = None, retry: RetryConfig | None = None, transport: AsyncBaseTransport | None = None)
Async HTTP client for sheaf deployments.
Mirror of SheafClient with async methods on top of
httpx.AsyncClient. Use as an async context manager so the
connection pool closes cleanly::

    async with AsyncSheafClient(base_url="...") as client:
        resp = await client.predict("my-model", req)

See SheafClient for argument and error semantics. Streaming
(client.stream(...)) does NOT retry: streams are stateful and
re-running them mid-flight would yield interleaved progress events.
predict async ¶
POST a request to /<deployment>/predict and decode the response.
stream async ¶
POST to /<deployment>/stream and yield SSE events as dicts.
Each event is a parsed JSON object from a data: {...}\n\n line.
Two event shapes the server may emit:

- {"type": "progress", "step": N, "total_steps": N, "done": False}
- {"type": "result", "done": True, ...response_fields}

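As an illustration of the wire format described above, the following sketch turns `data: {...}` lines into dicts using only the standard library. It is a simplification rather than the client's actual parser: `parse_sse_events` is a hypothetical helper, the sample payload fields are made up, and other SSE fields (`event:`, `id:`) and multi-line data are ignored.

```python
import json
from typing import Any, Iterable, Iterator


def parse_sse_events(lines: Iterable[str]) -> Iterator[dict[str, Any]]:
    """Yield one dict per 'data: {...}' line; skip everything else."""
    for line in lines:
        line = line.strip()
        if not line.startswith("data:"):
            continue  # blank separator lines and non-data fields
        yield json.loads(line[len("data:"):].strip())


# Two events as they might appear on the wire, each terminated by a blank line.
raw = (
    'data: {"type": "progress", "step": 1, "total_steps": 2, "done": false}\n\n'
    'data: {"type": "result", "done": true, "mean": [1.0, 2.0]}\n\n'
)
events = list(parse_sse_events(raw.splitlines()))
for event in events:
    print(event["type"], event["done"])
```

On the wire the booleans are JSON `false`/`true`; after parsing they become Python `False`/`True`, matching the shapes listed above.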
Backend exceptions raised mid-stream become an in-band error event
({"type": "error", "error": "..."}) — the HTTP status is still 200
in that case, so callers must check event["type"] to distinguish.
Pre-stream HTTP errors (422, 5xx, 404, …) raise the usual
SheafError subclass before any events are yielded.
Streaming bypasses RetryConfig, because re-running a partial stream
would yield interleaved progress events from two backend invocations.
Configure timeouts on the client itself if you need a stream-level
upper bound.
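Because mid-stream failures arrive as in-band events, a consumer usually branches on `event["type"]`. The sketch below substitutes a fake async generator for `client.stream(...)` so that it runs standalone; `fake_stream` and `consume` are hypothetical names, and raising `RuntimeError` on an error event is one possible policy, not something the client is documented to do.

```python
import asyncio
from typing import Any, AsyncIterator


async def fake_stream() -> AsyncIterator[dict[str, Any]]:
    """Stand-in for client.stream(...): two progress events, then a result."""
    yield {"type": "progress", "step": 1, "total_steps": 2, "done": False}
    yield {"type": "progress", "step": 2, "total_steps": 2, "done": False}
    yield {"type": "result", "done": True, "mean": [1.0, 2.0]}


async def consume(events: AsyncIterator[dict[str, Any]]) -> dict[str, Any]:
    """Count progress events and return the final result event."""
    steps_seen = 0
    async for event in events:
        kind = event.get("type")
        if kind == "progress":
            steps_seen += 1
        elif kind == "error":
            # HTTP status was 200, so the failure is only visible in-band.
            raise RuntimeError(event.get("error", "unknown stream error"))
        elif kind == "result":
            return {"steps_seen": steps_seen, "result": event}
    raise RuntimeError("stream ended without a result event")


outcome = asyncio.run(consume(fake_stream()))
print(outcome["steps_seen"], outcome["result"]["mean"])
```

With a real client, the same loop body would sit inside `async for event in client.stream(deployment, request)`, assuming `stream` is iterated directly as the Yields section suggests.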
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| deployment | str | Name of the target deployment. | required |
| request | BaseRequest | A request whose backend supports streaming. | required |
Yields:
| Type | Description |
|---|---|
| AsyncIterator[dict[str, Any]] | Event dicts in arrival order. |