Client SDK¶
The typed Python client lives in the same package as the server —
pip install sheaf-serve gets you both. No separate sheaf-client
package; one source of truth for the request/response schemas.
Async variant: AsyncSheafClient with the same surface but async def
methods. Streaming is async-only; everything else is available in both
forms.
Predict¶
client.predict(deployment, request) decodes the response into a typed
Pydantic object — TimeSeriesResponse, EmbeddingResponse,
DiffusionResponse, etc.
from sheaf.api.time_series import TimeSeriesRequest, Frequency, OutputMode
from sheaf.client import SheafClient
client = SheafClient(base_url="http://127.0.0.1:8000")
response = client.predict(
"chronos",
TimeSeriesRequest(
model_name="chronos",
history=[100, 110, 105, 120, 125, 130, 135],
horizon=6,
frequency=Frequency.HOURLY,
output_mode=OutputMode.QUANTILES,
quantile_levels=[0.1, 0.5, 0.9],
),
)
print(response.mean)
print(response.quantiles["0.5"])
Validation runs before the wire call — a malformed request raises
pydantic.ValidationError locally, not after a server round-trip.
Health + readiness¶
client.health("chronos") # liveness — Ray Serve replica is up
client.ready("chronos") # readiness — backend.load() finished
Both raise on non-200; the underlying SheafError subclass tells you
whether it was a 4xx (ClientError) or 5xx (ServerError).
Streaming (async only)¶
import asyncio
from sheaf.api.diffusion import DiffusionRequest
from sheaf.client import AsyncSheafClient
async def main():
async with AsyncSheafClient(base_url="https://flux.example.com") as client:
async for event in client.stream(
"flux",
DiffusionRequest(
model_name="flux",
prompt="a sheaf, mathematical concept, line drawing",
num_inference_steps=20,
),
):
if event["type"] == "progress":
print(f"step {event['step']}/{event['total_steps']}")
elif event["type"] == "result":
save_image(event["image_b64"])
break
asyncio.run(main())
Retry config¶
from sheaf.client import RetryConfig, SheafClient
client = SheafClient(
base_url="http://127.0.0.1:8000",
retry=RetryConfig(
max_attempts=3,
backoff_factor=0.5, # exponential: 0.5s, 1s, 2s
retry_on_status=(502, 503, 504),
retry_on_connection_errors=True,
),
)
RetryConfig is a frozen dataclass — pass it once at client
construction. Streaming bypasses retry by design (replaying a stream
mid-flight is more often wrong than right).
request_id propagation¶
Every BaseRequest carries a request_id (UUID, auto-generated if
not set). The server logs and metrics include it; the client
decorates every raised SheafError with it.
try:
response = client.predict("chronos", req)
except SheafError as e:
print(f"server error for {e.request_id}: {e}")
# → "server error for 7d4e8a... — 503 Service Unavailable"
Use it to correlate a client-side failure to a server log line without grepping by timestamp.
Reference¶
Full schema in the Client API reference.