API contracts¶

Typed request / response schemas — one module per model type. Validation runs at the request boundary; backends receive validated objects.

Base + discriminated unions¶

sheaf.api.base ¶

Base request/response contracts and model type registry.

sheaf.api.union ¶

Discriminated unions over every supported request and response type.

Used by sheaf.server (FastAPI body parsing for /predict and /stream) and by sheaf.batch.runner (per-row validation before Ray Data map_batches). Kept in its own module so batch workloads can validate input rows without importing sheaf.server, which pulls in the full Ray Serve deployment surface.

AnyResponse mirrors AnyRequest and is what sheaf.client.SheafClient decodes predict() responses into so callers get the correctly-typed response object back.

Time series¶

sheaf.api.time_series ¶

API contract for time series foundation models (Chronos2, TimesFM, etc.).

FeatureRef ¶

Bases: BaseModel

Reference to a Feast online feature used as model input history.

The referenced feature must store the complete input sequence as a list[float] (or list[list[float]] for multivariate history). FeastResolver calls get_online_features with these parameters and returns the resolved list as the history field for the backend.

Example::

feature_ref=FeatureRef(
    feature_view="asset_prices",
    feature_name="close_history_30d",
    entity_key="ticker",
    entity_value="AAPL",
)

TimeSeriesRequest ¶

Bases: BaseRequest

Request contract for time series foundation models.

Either history (raw values) or feature_ref (Feast entity reference) must be provided, not both. When feature_ref is given, the serving layer resolves it via FeastResolver before passing the request to the backend — the backend always sees history populated.
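The either/or rule above can be pictured as a small check. This is a minimal sketch, not sheaf's own validator: `check_history_source` is a hypothetical helper, and a plain dict stands in for FeatureRef.

```python
from typing import Optional, Sequence


def check_history_source(
    history: Optional[Sequence] = None,
    feature_ref: Optional[dict] = None,
) -> str:
    """Return which input source is set, enforcing the exactly-one rule."""
    has_history = history is not None
    has_ref = feature_ref is not None
    # Both set or both missing violates the contract described above.
    if has_history == has_ref:
        raise ValueError("provide exactly one of history or feature_ref")
    return "history" if has_history else "feature_ref"
```

A request carrying only feature_ref passes this check; under the contract above, the serving layer would then fill in history before the backend sees it.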

Univariate: history=[1.0, 2.0, 3.0, ...]
Multivariate: history=[[1.0, 10.0], [2.0, 11.0], ...] (shape: [time, variates]); target_index selects which variate to forecast (default 0).

n_variates `property` ¶

n_variates: int

Number of variates in history (1 for univariate).

target_history `property` ¶

target_history: list[float]

Univariate target series extracted from (possibly multivariate) history.

For univariate history, returns history as-is. For multivariate history (shape [time, variates]), extracts the variate at target_index.
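The two properties can be approximated with plain functions. This is a sketch of the behaviour described above, assuming history is a Python list; the free functions here are illustrative and are not the actual property implementations.

```python
from typing import List, Union

History = Union[List[float], List[List[float]]]


def n_variates(history: History) -> int:
    """1 for a flat series, otherwise the row width of [time, variates]."""
    if history and isinstance(history[0], list):
        return len(history[0])
    return 1


def target_history(history: History, target_index: int = 0) -> List[float]:
    """Extract the univariate target series from a possibly multivariate history."""
    if history and isinstance(history[0], list):
        return [row[target_index] for row in history]
    return list(history)
```

For history=[[1.0, 10.0], [2.0, 11.0]] and target_index=1, the extracted series is the second column.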

TimeSeriesResponse ¶

Bases: BaseResponse

Response contract for time series foundation models.

Tabular¶

sheaf.api.tabular ¶

API contract for tabular foundation models (TabPFN, etc.).

TabularRequest ¶

Bases: BaseRequest

Request contract for tabular foundation models.

TabPFN is an in-context learner: context_X/context_y are the "training" examples passed at inference time. query_X contains the rows to predict. No separate training step — everything happens in a single forward pass.

Attributes:

Name	Type	Description
`context_X`	`list[list[float]]`	Feature matrix for in-context examples, shape [n_context, n_features]
`context_y`	`list[float \| int]`	Labels for in-context examples, shape [n_context]
`query_X`	`list[list[float]]`	Feature rows to predict, shape [n_query, n_features]
`task`	`Literal['classification', 'regression']`	"classification" or "regression"
`feature_names`	`list[str] \| None`	Optional column names — used for logging and debugging
`categorical_feature_indices`	`list[int] \| None`	Indices of categorical columns
`output_mode`	`Literal['predictions', 'probabilities', 'quantiles']`	"predictions" for point predictions only, "probabilities" for classification probabilities, "quantiles" for regression quantile estimates
`quantile_levels`	`list[float]`	Quantile levels — only used when task=regression and output_mode=quantiles
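A request body for a small classification task might look like the following. This is a sketch with made-up numbers; any fields inherited from BaseRequest are omitted because they are not documented in this section.

```python
import json

payload = {
    "context_X": [[5.1, 3.5], [6.2, 2.9], [4.9, 3.0]],  # [n_context, n_features]
    "context_y": [0, 1, 0],                              # [n_context]
    "query_X": [[5.0, 3.4], [6.0, 3.0]],                 # [n_query, n_features]
    "task": "classification",
    "output_mode": "probabilities",
}

# Client-side shape checks that mirror the documented shapes.
n_features = len(payload["context_X"][0])
assert len(payload["context_X"]) == len(payload["context_y"])
assert all(
    len(row) == n_features
    for row in payload["context_X"] + payload["query_X"]
)

body = json.dumps(payload)
```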

TabularResponse ¶

Bases: BaseResponse

Response contract for tabular foundation models.

Audio (ASR + TTS + audio generation)¶

sheaf.api.audio ¶

API contract for audio foundation models (Whisper, Bark, etc.).

WordTimestamp ¶

Bases: BaseModel

Word-level timestamp from Whisper when word_timestamps=True.

AudioSegment ¶

Bases: BaseModel

A single transcription segment from Whisper.

AudioRequest ¶

Bases: BaseRequest

Request contract for audio transcription / translation.

Audio is passed as base64-encoded bytes. Any format that ffmpeg can decode is accepted (wav, mp3, m4a, ogg, flac, etc.).

Attributes:

Name	Type	Description
`audio_b64`	`str`	Base64-encoded audio file bytes.
`language`	`str \| None`	BCP-47 language code (e.g. "en", "fr") or None for auto-detection. English-only model variants (e.g. "tiny.en") ignore this field.
`task`	`Literal['transcribe', 'translate']`	"transcribe" returns text in the source language. "translate" transcribes and translates to English.
`word_timestamps`	`bool`	If True, each segment includes word-level start/end times and per-word probabilities.
`temperature`	`float`	Sampling temperature for decoding. A tuple triggers fallback through successive values on failure.
`initial_prompt`	`str \| None`	Optional text prepended to the first window to condition the model (e.g. vocabulary hints, speaker context).
`vad_filter`	`bool`	Filter out silence before transcription using Silero VAD. Supported by faster-whisper; ignored by openai-whisper.
`beam_size`	`int`	Beam search width for decoding. Higher = more accurate, slower. Supported by faster-whisper; ignored by openai-whisper.
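To illustrate the audio_b64 encoding, the sketch below builds a tiny WAV file in memory with the standard library and wraps it in a request-shaped dict. The silent clip and the chosen field values are purely illustrative; fields inherited from BaseRequest are not shown.

```python
import base64
import io
import wave

# Write 0.1 s of 16 kHz mono silence into an in-memory WAV file.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)        # 16-bit PCM
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 1600)

payload = {
    "audio_b64": base64.b64encode(buf.getvalue()).decode("ascii"),
    "language": None,          # None requests auto-detection
    "task": "transcribe",
    "word_timestamps": True,
}
```

For a file on disk, the same encoding applies to the raw bytes read from it.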

AudioResponse ¶

Bases: BaseResponse

Response contract for audio transcription / translation.

TTSRequest ¶

Bases: BaseRequest

Request contract for text-to-speech synthesis.

Attributes:

Name	Type	Description
`text`	`str`	Input text to synthesize.
`voice_preset`	`str \| None`	Optional speaker voice preset. Bark: "v2/en_speaker_6" etc. Kokoro: "af_heart", "af_bella", "am_adam", "bf_emma", "bm_george", etc. None uses the backend's default voice.
`speed`	`float`	Playback speed multiplier [0.5, 2.0]. Supported by Kokoro; ignored by Bark. Default 1.0 (normal speed).

TTSResponse ¶

Bases: BaseResponse

Response contract for text-to-speech synthesis.

sheaf.api.audio_generation ¶

API contract for audio generation models (MusicGen, etc.).

AudioGenerationRequest ¶

Bases: BaseRequest

Request contract for text-conditioned audio/music generation.

Attributes:

Name	Type	Description
`prompt`	`str`	Text description of the audio to generate (e.g. "happy jazz with piano and drums").
`duration_s`	`float`	Target duration in seconds. Converted to max_new_tokens via model.config.audio_encoder.frame_rate (50 tokens/sec for MusicGen).
`guidance_scale`	`float \| None`	Classifier-free guidance scale. Higher values steer generation closer to the prompt at the cost of diversity. Typical range: 1.0–10.0. None disables CFG.
`temperature`	`float`	Sampling temperature. Higher values increase randomness.
`top_k`	`int`	Top-k sampling parameter: at each step, sample only from the k most likely tokens.
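The duration_s conversion is simple arithmetic. A rough sketch, assuming the 50 tokens/sec frame rate quoted above; rounding up is an assumption here, and the backend may round differently.

```python
import math

FRAME_RATE_HZ = 50  # MusicGen audio-encoder frame rate quoted in the table


def duration_to_max_new_tokens(duration_s: float, frame_rate: int = FRAME_RATE_HZ) -> int:
    """Convert a target duration in seconds to a token budget."""
    # Rounding up is an assumption; the real backend may truncate instead.
    return math.ceil(duration_s * frame_rate)
```

Under these assumptions a 10-second clip corresponds to 500 new tokens.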

AudioGenerationResponse ¶

Bases: BaseResponse

Response contract for audio generation.

Vision¶

sheaf.api.embedding ¶

API contract for embedding / representation models (CLIP, DINOv2, etc.).

EmbeddingRequest ¶

Bases: BaseRequest

Request contract for embedding models.

Exactly one of texts or images_b64 must be provided per request. Both fields accept a batch — pass multiple items to embed them in a single forward pass.

Attributes:

Name	Type	Description
`texts`	`list[str] \| None`	List of strings to embed (text modality).
`images_b64`	`list[str] \| None`	List of base64-encoded image files to embed (vision modality). Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).
`normalize`	`bool`	If True (default), L2-normalize the output embeddings so that cosine similarity equals dot product.
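The claim that L2 normalization makes cosine similarity equal to the dot product can be checked directly. A self-contained sketch using plain Python lists (the helper names are illustrative):

```python
import math
from typing import List


def l2_normalize(vec: List[float]) -> List[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def dot(a: List[float], b: List[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def cosine(a: List[float], b: List[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]
same = math.isclose(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
```

With normalize=True, a caller can therefore rank results with a plain dot product.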

EmbeddingResponse ¶

Bases: BaseResponse

Response contract for embedding models.

sheaf.api.segmentation ¶

API contract for image segmentation models (SAM2, etc.).

SegmentationRequest ¶

Bases: BaseRequest

Request contract for prompted image segmentation.

Exactly one image is segmented per request. At least one prompt must be provided — either point_coords (with matching point_labels) or box, or both.

Attributes:

Name	Type	Description
`image_b64`	`str`	Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).
`point_coords`	`list[list[float]] \| None`	List of [x, y] points (pixel coordinates).
`point_labels`	`list[int] \| None`	Foreground (1) / background (0) label for each point. Must have the same length as `point_coords`.
`box`	`list[float] \| None`	Bounding-box prompt as [x1, y1, x2, y2] in pixel coordinates.
`multimask_output`	`bool`	If True (default), return three candidate masks ranked by score. Set to False to get a single best mask.
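The prompt rules above (at least one prompt, labels parallel to points) can be expressed as a client-side check. This is an illustrative sketch; `check_prompts` is a hypothetical helper, not part of sheaf.

```python
from typing import List, Optional


def check_prompts(
    point_coords: Optional[List[List[float]]] = None,
    point_labels: Optional[List[int]] = None,
    box: Optional[List[float]] = None,
) -> None:
    """Raise ValueError if the prompt fields break the documented rules."""
    if point_coords is None and box is None:
        raise ValueError("provide point_coords, box, or both")
    if point_coords is not None:
        if point_labels is None or len(point_labels) != len(point_coords):
            raise ValueError("point_labels must match point_coords in length")
        if any(len(p) != 2 for p in point_coords):
            raise ValueError("each point must be [x, y]")
        if any(label not in (0, 1) for label in point_labels):
            raise ValueError("labels must be 1 (foreground) or 0 (background)")
    if box is not None and len(box) != 4:
        raise ValueError("box must be [x1, y1, x2, y2]")
```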

SegmentationResponse ¶

Bases: BaseResponse

Response contract for image segmentation models.

Each mask is a base64-encoded flat uint8 byte array. To reconstruct::

import base64, numpy as np
mask = np.frombuffer(
    base64.b64decode(masks_b64[i]), dtype=np.uint8
).reshape(height, width).astype(bool)

sheaf.api.depth ¶

API contract for monocular depth estimation models (Depth Anything v2, etc.).

DepthRequest ¶

Bases: BaseRequest

Request contract for monocular depth estimation.

Attributes:

Name	Type	Description
`image_b64`	`str`	Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).
`normalize`	`bool`	If True (default), the depth map is linearly rescaled to [0, 1] where 0 = nearest and 1 = furthest point in the scene. If False, raw relative depth values from the model are returned.

DepthResponse ¶

Bases: BaseResponse

Response contract for monocular depth estimation.

The depth map is a base64-encoded flat float32 byte array at the model's native output resolution. To reconstruct::

import base64, numpy as np
depth = np.frombuffer(
    base64.b64decode(depth_b64), dtype=np.float32
).reshape(height, width)

If normalize=True was requested, values are in [0, 1]. min_depth and max_depth are the raw (pre-normalization) bounds, useful for recovering metric-relative scale.
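One way to use min_depth and max_depth is to undo the rescaling. The sketch below assumes plain min-max scaling with no inversion, which this section does not confirm; treat it as an illustration of the idea rather than the backend's exact transform.

```python
from typing import List


def denormalize_depth(norm_values: List[float], min_depth: float, max_depth: float) -> List[float]:
    """Map values in [0, 1] back onto [min_depth, max_depth].

    Assumes normalized = (raw - min_depth) / (max_depth - min_depth).
    """
    span = max_depth - min_depth
    return [min_depth + v * span for v in norm_values]
```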

sheaf.api.detection ¶

API contract for object detection models (DETR, RT-DETR, etc.).

DetectionRequest ¶

Bases: BaseRequest

Request contract for object detection.

Attributes:

Name	Type	Description
`image_b64`	`str`	Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).
`threshold`	`float`	Minimum confidence score for a detection to be included in the response. Defaults to 0.5.

DetectionResponse ¶

Bases: BaseResponse

Response contract for object detection.

Boxes are in absolute pixel coordinates: [x_min, y_min, x_max, y_max]. Lists are parallel — boxes[i], scores[i], and labels[i] all describe the same detection, sorted by descending confidence score.
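Because the lists are parallel, they can be zipped into per-detection records. A small sketch with made-up values; the label strings are illustrative, since the label type is not specified here.

```python
boxes = [
    [10.0, 20.0, 110.0, 220.0],
    [50.0, 60.0, 90.0, 120.0],
    [0.0, 0.0, 30.0, 30.0],
]
scores = [0.97, 0.81, 0.55]  # already sorted by descending confidence
labels = ["person", "dog", "ball"]


def keep_confident(boxes, scores, labels, min_score):
    """Zip the parallel lists and keep detections scoring at least min_score."""
    return [
        {"label": label, "score": score, "box": box}
        for box, score, label in zip(boxes, scores, labels)
        if score >= min_score
    ]


detections = keep_confident(boxes, scores, labels, 0.8)
```

This lets a caller apply a stricter cut-off than the request threshold without re-running the model.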

sheaf.api.pose ¶

API contract for pose estimation models (ViTPose, etc.).

PoseRequest ¶

Bases: BaseRequest

Request contract for human pose estimation.

ViTPose is a top-down model: it estimates keypoints within person crops. If bboxes is provided, each box is used as a person crop. If omitted, the full image is treated as a single-person crop.

Attributes:

Name	Type	Description
`image_b64`	`str`	Base64-encoded image (JPEG, PNG, or any PIL-readable format).
`bboxes`	`list[list[float]] \| None`	Optional list of person bounding boxes in pixel coordinates, each `[x_min, y_min, x_max, y_max]`. If None, defaults to the full image as one person crop.
`threshold`	`float`	Keypoint confidence threshold. Keypoints scoring below it are still returned, identifiable by their low score; filtering is left to the caller.

PoseResponse ¶

Bases: BaseResponse

Response contract for human pose estimation.

poses[i][j] is [x, y, score] for the j-th keypoint of the i-th detected person, in absolute pixel coordinates. keypoint_names[j] gives the semantic label for keypoint j (e.g. "nose", "left_eye").

Decode example::

for person in resp.poses:
    for (x, y, score), name in zip(person, resp.keypoint_names):
        print(f"{name}: ({x:.1f}, {y:.1f})  conf={score:.2f}")
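Since filtering is left to the caller, a response can be post-processed as follows. A minimal sketch with made-up keypoints; `confident_keypoints` is a hypothetical helper.

```python
poses = [[[120.0, 80.0, 0.95], [118.0, 75.0, 0.20]]]  # one person, two keypoints
keypoint_names = ["nose", "left_eye"]


def confident_keypoints(poses, keypoint_names, threshold):
    """Return one {name: (x, y)} dict per person, dropping low-score keypoints."""
    result = []
    for person in poses:
        kept = {
            name: (x, y)
            for (x, y, score), name in zip(person, keypoint_names)
            if score >= threshold
        }
        result.append(kept)
    return result


filtered = confident_keypoints(poses, keypoint_names, 0.3)
```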

sheaf.api.optical_flow ¶

API contract for optical flow models (RAFT, UniMatch, etc.).

OpticalFlowRequest ¶

Bases: BaseRequest

Request contract for optical flow estimation.

Accepts two consecutive video frames and returns the dense per-pixel displacement field between them.

Attributes:

Name	Type	Description
`frame1_b64`	`str`	Base64-encoded first frame (JPEG, PNG, or any PIL-readable format). Both frames must have the same spatial dimensions.
`frame2_b64`	`str`	Base64-encoded second frame.

OpticalFlowResponse ¶

Bases: BaseResponse

Response contract for optical flow estimation.

flow_b64 is a base64-encoded flat float32 byte array of shape (height, width, 2), where the last dimension is (dx, dy) — the horizontal and vertical pixel displacement from frame1 to frame2.

Decode example::

import base64, numpy as np
flow = np.frombuffer(
    base64.b64decode(flow_b64), dtype=np.float32
).reshape(height, width, 2)
dx, dy = flow[..., 0], flow[..., 1]

sheaf.api.video ¶

API contract for video understanding models (VideoMAE, TimeSformer, etc.).

VideoRequest ¶

Bases: BaseRequest

Request contract for video understanding models.

Frames are passed as a list of base64-encoded images (JPEG or PNG). The number of frames expected depends on the model:

VideoMAE-base: 16 frames (default, tubelet_size=2, 224×224)
TimeSformer: 8 frames (224×224)

Pass exactly the number of frames the model was pretrained on; otherwise the processor pads or truncates the clip automatically.

Attributes:

Name	Type	Description
`frames_b64`	`list[str]`	Ordered list of base64-encoded video frames.
`task`	`Literal['embedding', 'classification']`	"embedding" returns a single fixed-size vector per video clip; "classification" returns class labels and softmax scores.
`pooling`	`Literal['cls', 'mean']`	Pooling strategy for embeddings. "cls" — CLS token at position 0 of last_hidden_state (default). "mean" — Mean of all non-CLS patch tokens.
`normalize`	`bool`	If True (default), L2-normalize the output embedding. Ignored for classification.
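The two pooling strategies differ only in which token vectors are reduced. A sketch over a plain list of token vectors, assuming the CLS token sits at position 0 as described above; `pool_tokens` is an illustrative name.

```python
from typing import List


def pool_tokens(hidden: List[List[float]], pooling: str = "cls") -> List[float]:
    """Reduce [n_tokens, dim] hidden states to a single [dim] vector."""
    if pooling == "cls":
        return list(hidden[0])
    if pooling == "mean":
        patches = hidden[1:]  # every token except the CLS token
        dim = len(hidden[0])
        return [sum(tok[d] for tok in patches) / len(patches) for d in range(dim)]
    raise ValueError(f"unknown pooling: {pooling}")
```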

VideoResponse ¶

Bases: BaseResponse

Response contract for video understanding models.

For task="embedding": embedding and dim are populated. For task="classification": labels and scores are populated.

Diffusion / multimodal generation¶

sheaf.api.diffusion ¶

API contract for diffusion image generation models (FLUX, etc.).

DiffusionRequest ¶

Bases: BaseRequest

Request contract for text-to-image diffusion models.

Attributes:

Name	Type	Description
`prompt`	`str`	Text description of the image to generate.
`negative_prompt`	`str`	Text description of what to avoid. Not supported by all models (FLUX.1-schnell ignores it).
`height`	`int`	Output image height in pixels. Must be a multiple of 8. Defaults to 1024.
`width`	`int`	Output image width in pixels. Must be a multiple of 8. Defaults to 1024.
`num_inference_steps`	`int`	Number of denoising steps. FLUX.1-schnell is optimized for 1–4 steps; FLUX.1-dev typically uses 20–50.
`guidance_scale`	`float`	Classifier-free guidance scale. Higher values steer generation closer to the prompt. FLUX.1-schnell uses 0.0 (it is a distilled model that runs without guidance); FLUX.1-dev typically uses 3.5–7.0.
`seed`	`int \| None`	Random seed for reproducibility. None = random.
`adapters`	`list[str]`	Names of LoRA adapters to apply, in order of application. Each name must be registered on the deployment's `ModelSpec.lora.adapters`. Empty (default) means the deployment default adapter is used (or no LoRA if no default is set).
`adapter_weights`	`list[float] \| None`	Per-adapter weights, parallel to `adapters`. When `None` (default), the per-adapter `weight` from `LoRAConfig.adapters[name]` is used. When provided, the length must match `adapters`.
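The interaction between adapters and adapter_weights can be sketched as a small resolution step. This is not sheaf's implementation: `resolve_adapter_weights` is hypothetical, and a plain dict of name to default weight stands in for `LoRAConfig.adapters`.

```python
from typing import Dict, List, Optional


def resolve_adapter_weights(
    adapters: List[str],
    adapter_weights: Optional[List[float]],
    registered: Dict[str, float],
) -> List[float]:
    """Return one weight per requested adapter, following the rules above."""
    unknown = [name for name in adapters if name not in registered]
    if unknown:
        raise ValueError(f"unregistered adapters: {unknown}")
    if adapter_weights is None:
        # Fall back to the per-adapter default weight.
        return [registered[name] for name in adapters]
    if len(adapter_weights) != len(adapters):
        raise ValueError("adapter_weights must match adapters in length")
    return list(adapter_weights)
```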

DiffusionResponse ¶

Bases: BaseResponse

Response contract for text-to-image diffusion models.

The generated image is returned as a base64-encoded PNG. To decode::

import base64
from PIL import Image
import io

img = Image.open(io.BytesIO(base64.b64decode(image_b64)))

Attributes:

Name	Type	Description
`image_b64`	`str`	Base64-encoded PNG image.
`height`	`int`	Output image height in pixels.
`width`	`int`	Output image width in pixels.
`seed`	`int`	Seed actually used for generation (useful when the request seed was None and you want to reproduce the result).

sheaf.api.multimodal_generation ¶

API contract for text+image-conditioned generation models (SDXL, etc.).

MultimodalGenerationRequest ¶

Bases: BaseRequest

Request contract for text+image-conditioned image generation.

Distinct from pure text-to-image (DiffusionRequest/FLUX): the input image conditions the generation. When mask_b64 is omitted the backend runs img2img (style/content transfer); when provided it runs inpainting.

Attributes:

Name	Type	Description
`prompt`	`str`	Text description guiding the generated image.
`image_b64`	`str`	Base64-encoded input image (JPEG, PNG, or any PIL-readable format). Acts as the conditioning source for img2img / inpainting.
`mask_b64`	`str \| None`	Optional base64-encoded mask image (same spatial size as `image_b64`). White pixels are re-generated; black pixels are preserved. Only used when the backend is in `inpaint` mode.
`strength`	`float`	How much to transform the input image. 0.0 = no change, 1.0 = ignore the original image entirely. Default 0.8.
`num_inference_steps`	`int`	Total denoising steps. Actual steps run = `round(strength * num_inference_steps)`. Default 50.
`guidance_scale`	`float`	Classifier-free guidance scale. Higher values steer generation closer to the prompt. Default 7.5.
`negative_prompt`	`str`	Text description of what to avoid in the output.
`seed`	`int \| None`	Random seed for reproducibility. None = random.
`adapters`	`list[str]`	Names of LoRA adapters to apply, in order. Each name must be registered on the deployment's `ModelSpec.lora.adapters`. Empty (default) means the deployment default is used (or no LoRA if no default is set).
`adapter_weights`	`list[float] \| None`	Per-adapter weights, parallel to `adapters`. When `None` (default), the per-adapter `weight` from `LoRAConfig.adapters[name]` is used. Length must match `adapters` when provided.
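The relationship between strength and the number of denoising steps actually run is a one-line calculation. A sketch that follows the formula in the table literally; the underlying pipeline may clamp or truncate differently.

```python
def effective_steps(strength: float, num_inference_steps: int = 50) -> int:
    """Steps actually run, per round(strength * num_inference_steps)."""
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0.0, 1.0]")
    return round(strength * num_inference_steps)
```

With the documented defaults (strength 0.8, 50 steps) this gives 40 steps, so lowering strength also shortens generation.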

MultimodalGenerationResponse ¶

Bases: BaseResponse

Response contract for text+image-conditioned image generation.

The generated image is returned as a base64-encoded PNG. To decode::

import base64, io
from PIL import Image
img = Image.open(io.BytesIO(base64.b64decode(image_b64)))

Cross-modal embedding¶

sheaf.api.multimodal_embedding ¶

API contract for cross-modal embedding models (ImageBind, etc.).

MultimodalEmbeddingRequest ¶

Bases: BaseRequest

Request contract for cross-modal embedding models (e.g. ImageBind).

Exactly one modality field must be set per request. All items in the chosen field are embedded in a single forward pass and returned in the shared embedding space.

Modalities

texts: List of strings (text modality).
images_b64: List of base64-encoded image files (vision modality).
audios_b64: List of base64-encoded audio files (audio modality).
depth_images_b64: List of base64-encoded depth images (depth modality).
thermal_images_b64: List of base64-encoded thermal images (thermal modality).

For image/audio inputs any format the underlying model accepts is valid (JPEG/PNG for vision; WAV/MP3 for audio). The backend writes temporary files as needed — the model loaders read paths, not raw bytes.

Attributes:

Name	Type	Description
`normalize`	`bool`	If True (default), L2-normalize output embeddings so that cosine similarity equals dot product.

modality `property` ¶

modality: str

Return the canonical modality name for the active field.

n_items `property` ¶

n_items: int

Number of items in the active modality field.
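The exactly-one-modality rule and the two properties can be pictured with a dict-based sketch. The mapping from field name to modality name below is inferred from the list above and is an assumption, as is the helper name `active_modality`.

```python
from typing import Dict, Tuple

# Field name -> modality name (assumed from the Modalities list above).
MODALITY_FIELDS: Dict[str, str] = {
    "texts": "text",
    "images_b64": "vision",
    "audios_b64": "audio",
    "depth_images_b64": "depth",
    "thermal_images_b64": "thermal",
}


def active_modality(request: dict) -> Tuple[str, int]:
    """Return (modality, n_items) for the single populated modality field."""
    set_fields = [f for f in MODALITY_FIELDS if request.get(f) is not None]
    if len(set_fields) != 1:
        raise ValueError("exactly one modality field must be set")
    field = set_fields[0]
    return MODALITY_FIELDS[field], len(request[field])
```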

MultimodalEmbeddingResponse ¶

Bases: BaseResponse

Response contract for cross-modal embedding models.

Molecular / genomics / materials¶

sheaf.api.molecular ¶

API contract for molecular / protein language models (ESM-3, etc.).

MolecularRequest ¶

Bases: BaseRequest

Request contract for protein sequence embedding.

A single request embeds a batch of protein sequences. Sequences should use standard single-letter amino acid codes (ACDEFGHIKLMNPQRSTVWY plus ambiguity codes accepted by ESM tokenizers).

Attributes:

Name	Type	Description
`sequences`	`list[str]`	List of amino acid sequences to embed.
`pooling`	`Literal['mean', 'cls']`	How to reduce the per-residue hidden states to a single vector per sequence. `"mean"` (default) — mean over residue positions (excludes BOS/EOS special tokens at positions 0 and -1). `"cls"` — BOS token at position 0 (analogous to [CLS] in BERT).
`normalize`	`bool`	If True (default), L2-normalize each embedding so that cosine similarity equals dot product.

MolecularResponse ¶

Bases: BaseResponse

Response contract for protein sequence embedding.

sheaf.api.genomic ¶

API contract for DNA/genomic foundation models (Nucleotide Transformer, etc.).

GenomicRequest ¶

Bases: BaseRequest

Request contract for DNA/RNA sequence embedding.

A single request embeds a batch of nucleotide sequences. Sequences should use standard nucleotide codes (A, C, G, T for DNA; A, C, G, U for RNA; N for unknown bases). All are accepted by Nucleotide Transformer tokenizers.

Attributes:

Name	Type	Description
`sequences`	`list[str]`	List of nucleotide sequences to embed.
`pooling`	`Literal['mean', 'cls']`	How to reduce per-token hidden states to a single vector. `"mean"` (default) — mean of non-special tokens (excludes CLS at position 0 and EOS/SEP at position -1). `"cls"` — CLS token at position 0 (analogous to [CLS] in BERT).
`normalize`	`bool`	If True (default), L2-normalize each embedding so that cosine similarity equals dot product.

GenomicResponse ¶

Bases: BaseResponse

Response contract for DNA/RNA sequence embedding.

sheaf.api.small_molecule ¶

API contract for small molecule / chemical foundation models (MolFormer, etc.).

SmallMoleculeRequest ¶

Bases: BaseRequest

Request contract for small molecule embedding.

A single request embeds a batch of chemical compounds given as SMILES strings. SMILES (Simplified Molecular-Input Line-Entry System) is the standard text representation of molecular structure.

Attributes:

Name	Type	Description
`smiles`	`list[str]`	List of SMILES strings to embed. Each string represents one molecule (e.g. "CC(=O)OC1=CC=CC=C1C(=O)O" for aspirin).
`pooling`	`Literal['mean', 'cls']`	How to reduce per-token hidden states to a fixed-size vector. `"mean"` (default) — attention-masked mean over all tokens, excluding padding. Best for molecular property prediction. `"cls"` — CLS token at position 0. Useful for models with a dedicated classification token.
`normalize`	`bool`	If True, L2-normalize each embedding (cosine similarity == dot product). Defaults to `False` — raw embeddings are more natural for regression tasks such as property prediction.

SmallMoleculeResponse ¶

Bases: BaseResponse

Response contract for small molecule embedding.

sheaf.api.materials ¶

API contract for materials / interatomic potential models (MACE-MP, etc.).

MaterialsRequest ¶

Bases: BaseRequest

Request contract for atomistic energy/force/stress prediction.

Describes a single atomic structure: a set of atoms at given positions, optionally in a periodic simulation cell. The model predicts the potential energy surface and its derivatives.

Attributes:

Name	Type	Description
`atomic_numbers`	`list[int]`	Atomic numbers (Z) for each atom. Length N.
`positions_b64`	`str`	Base64-encoded float32 array of shape (N, 3) giving Cartesian coordinates in Angstroms.
`cell`	`list[list[float]] \| None`	3x3 lattice vectors in Angstroms for periodic boundary conditions. Required when `pbc` is True.
`pbc`	`bool \| list[bool]`	Periodic boundary conditions. `False` (default) for isolated molecules/clusters; `True` or `[True, True, True]` for bulk crystals; `[True, True, False]` for slabs.
`compute_forces`	`bool`	If True (default), return forces in eV/Å.
`compute_stress`	`bool`	If True, return the stress tensor in eV/Å³ (Voigt). Only meaningful for periodic systems (`pbc=True`).
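Building positions_b64 without numpy is possible with the standard library. The sketch below packs an approximate water geometry (illustrative coordinates) as row-major float32; little-endian byte order is an assumption, since this section only states float32.

```python
import base64
import struct

# Water molecule: O, H, H with approximate Cartesian coordinates in Angstroms.
atomic_numbers = [8, 1, 1]
positions = [
    (0.000, 0.000, 0.117),
    (0.000, 0.757, -0.469),
    (0.000, -0.757, -0.469),
]

# Flatten (N, 3) row by row and pack as little-endian float32 (assumed order).
flat = [coord for xyz in positions for coord in xyz]
raw = struct.pack(f"<{len(flat)}f", *flat)

payload = {
    "atomic_numbers": atomic_numbers,
    "positions_b64": base64.b64encode(raw).decode("ascii"),
    "pbc": False,             # isolated molecule
    "compute_forces": True,
}
```

The decoded byte length should always equal 12 * N (three float32 values per atom), which makes a cheap sanity check.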

MaterialsResponse ¶

Bases: BaseResponse

Response contract for atomistic energy/force/stress prediction.

Earth / weather¶

sheaf.api.weather ¶

API contract for weather / atmospheric-state foundation models.

Supports GraphCast, Aurora, Pangu-Weather, and similar architectures.

Encoding convention¶

All array fields are base64-encoded little-endian float32 byte strings.

Surface variable shape: (n_lat, n_lon)
Atmospheric variable shape: (n_levels, n_lat, n_lon)

Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()
Decode (surface): np.frombuffer(base64.b64decode(s), dtype=np.float32).reshape(n_lat, n_lon)
Decode (atmospheric): np.frombuffer(base64.b64decode(s), dtype=np.float32).reshape(n_levels, n_lat, n_lon)

Grid conventions (GraphCast / ERA5)¶

lat: descending, e.g. [90.0, 89.75, …, -90.0] for 0.25° global
lon: ascending, e.g. [0.0, 0.25, …, 359.75]
pressure_levels: descending hPa, e.g. [1000, 925, 850, …, 1]
current_time: ISO-8601 string, e.g. "2023-01-01T06:00:00"

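GraphCast requires two consecutive time steps (t-6h and t) as input, so both the current fields (surface_vars, atmospheric_vars) and the previous-step fields (prev_surface_vars, prev_atmospheric_vars) are required, and each pair must contain the same variable names. n_steps controls how many 6-hour steps are predicted autoregressively.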

WeatherRequest ¶

Bases: BaseRequest

Request contract for atmospheric-state foundation models.

Attributes:

Name	Type	Description
`surface_vars`	`dict[str, str]`	Surface variable fields at time t. Keys are variable names (ERA5-style for GraphCast, e.g. "2m_temperature", "10m_u_component_of_wind"). Values are base64 float32 arrays of shape (n_lat, n_lon).
`atmospheric_vars`	`dict[str, str]`	Atmospheric (pressure-level) fields at time t. Values are base64 float32 arrays of shape (n_levels, n_lat, n_lon).
`prev_surface_vars`	`dict[str, str]`	Same variables at time t - step_hours (t-6h for GraphCast).
`prev_atmospheric_vars`	`dict[str, str]`	Same variables at time t - step_hours.
`lat`	`list[float]`	Latitude grid, length n_lat, descending degrees.
`lon`	`list[float]`	Longitude grid, length n_lon, ascending degrees.
`pressure_levels`	`list[int]`	Pressure levels in hPa, length n_levels, descending.
`current_time`	`str`	ISO-8601 timestamp for the current state (t).
`n_steps`	`int`	Number of autoregressive 6-hour steps to predict.

WeatherResponse ¶

Bases: BaseResponse

Response contract for atmospheric-state foundation models.

surface_forecasts[i]: dict of {var_name: base64_float32} for step i+1. Each array has shape (n_lat, n_lon).
atmospheric_forecasts[i]: same for atmospheric (pressure-level) variables. Each array has shape (n_levels, n_lat, n_lon).
forecast_times[i]: ISO-8601 timestamp for step i+1.
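The timestamps a caller should expect in forecast_times follow from current_time and the step length. A sketch assuming 6-hour steps and naive ISO-8601 strings without a timezone suffix; `forecast_times` here is an illustrative function, not the response field itself.

```python
from datetime import datetime, timedelta
from typing import List


def forecast_times(current_time: str, n_steps: int, step_hours: int = 6) -> List[str]:
    """ISO-8601 timestamps for steps 1..n_steps after current_time."""
    t0 = datetime.fromisoformat(current_time)
    return [
        (t0 + timedelta(hours=step_hours * (i + 1))).isoformat()
        for i in range(n_steps)
    ]
```

For current_time="2023-01-01T06:00:00" and n_steps=3, the last entry falls at midnight on 2 January.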

sheaf.api.satellite ¶

API contract for Earth observation / satellite imagery foundation models.

Supports Prithvi (IBM/NASA), Clay, SatMAE, and similar architectures.

Encoding convention¶

pixels_b64 is a base64-encoded little-endian float32 byte string.

Shape: (n_time, n_bands, height, width)

For single-time input set n_time=1. Values are typically surface reflectance in [0, 1] (after dividing sensor DN by 10 000 for Landsat/Sentinel-2) or raw DN if normalize=False.

Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()
Decode: np.frombuffer(base64.b64decode(s), dtype=np.float32).reshape(n_time, n_bands, height, width)
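A standard-library variant of the encoding, together with a size check, is sketched below. The digital-number values are invented, the division by 10 000 follows the convention described above, and `check_pixels` is a hypothetical helper.

```python
import base64
import struct

n_time, n_bands, height, width = 1, 6, 2, 2

# Made-up sensor digital numbers, already ordered as (time, band, row, col).
dn = [1200] * (n_time * n_bands * height * width)
reflectance = [value / 10000 for value in dn]

# Pack as little-endian float32, matching the encoding convention above.
raw = struct.pack(f"<{len(reflectance)}f", *reflectance)
pixels_b64 = base64.b64encode(raw).decode("ascii")


def check_pixels(pixels_b64: str, n_time: int, n_bands: int, height: int, width: int) -> None:
    """Raise ValueError if the payload size disagrees with the declared shape."""
    expected = n_time * n_bands * height * width * 4  # 4 bytes per float32
    actual = len(base64.b64decode(pixels_b64))
    if actual != expected:
        raise ValueError(f"expected {expected} bytes, got {actual}")
```

Checking the byte count before sending catches the most common mistake, a shape field that does not match the array.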

Band names (examples)¶

HLS (Harmonized Landsat-Sentinel) 6-band subset used by Prithvi: ["blue", "green", "red", "nir08", "swir16", "swir22"]

Sentinel-2 L2A 11-band (used by Clay): ["coastal", "blue", "green", "red", "rededge1", "rededge2", "rededge3", "nir08", "nir09", "swir16", "swir22"]

Wavelengths (μm) for Clay (examples, Sentinel-2): [0.443, 0.490, 0.560, 0.665, 0.704, 0.740, 0.783, 0.842, 0.865, 1.610, 2.190]

SatelliteRequest ¶

Bases: BaseRequest

Request contract for Earth observation foundation models.

Attributes:

Name	Type	Description
`pixels_b64`	`str`	Base64 float32 pixel array of shape (n_time, n_bands, height, width).
`n_time`	`int`	Number of time steps in the input stack.
`n_bands`	`int`	Number of spectral bands.
`height`	`int`	Spatial height in pixels.
`width`	`int`	Spatial width in pixels.
`band_names`	`list[str]`	Human-readable band labels, length n_bands.
`wavelengths`	`list[float] \| None`	Center wavelengths in micrometers, length n_bands. Required by Clay; ignored by Prithvi.
`gsd`	`float`	Ground sample distance in metres (default 10 m for Sentinel-2).
`lat`	`float \| None`	Centre latitude in degrees (optional metadata).
`lon`	`float \| None`	Centre longitude in degrees (optional metadata).
`timestamps`	`list[str] \| None`	ISO-8601 timestamp per time step (optional metadata).
`pooling`	`Literal['mean', 'cls']`	"mean" pools all output tokens; "cls" uses the first token (CLS or register token).
`normalize`	`bool`	Apply the model's per-band mean/std normalization using statistics stored in the image processor config. Disable if you have already normalized your data.

SatelliteResponse ¶

Bases: BaseResponse

Response contract for Earth observation foundation models.

embedding: scene-level float vector of length dim. For multi-temporal input, the tokens from all time steps are pooled together into a single vector.
dim: embedding dimensionality.
n_time: number of time steps from the request (passed through for bookkeeping).

LiDAR / point cloud¶

sheaf.api.point_cloud ¶

API contract for 3D point cloud models (PointNet, etc.).

PointCloudRequest ¶

Bases: BaseRequest

Request contract for 3D point cloud processing.

Point clouds are passed as base64-encoded flat float32 byte arrays of shape (n_points, 3) containing XYZ coordinates. Points are expected to be pre-normalised to a unit sphere centred at the origin (subtract centroid, divide by max radius).
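The pre-normalisation step (subtract centroid, divide by max radius) can be written in a few lines. A pure-Python sketch; `normalize_to_unit_sphere` is an illustrative name, and with numpy the same steps are a mean, a subtraction, and a norm.

```python
import math
from typing import List


def normalize_to_unit_sphere(points: List[List[float]]) -> List[List[float]]:
    """Centre an (n_points, 3) cloud at the origin and scale it to max radius 1."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centered = [[p[i] - centroid[i] for i in range(3)] for p in points]
    max_radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if max_radius == 0:
        return centered  # all points coincide; nothing to scale
    return [[c / max_radius for c in p] for p in centered]
```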

Attributes:

Name	Type	Description
`points_b64`	`str`	Base64-encoded flat float32 byte array, shape `(n_points, 3)`. Decode with: `pts = np.frombuffer(base64.b64decode(points_b64), dtype=np.float32).reshape(n_points, 3)`
`n_points`	`int`	Number of points in the cloud. Required to reshape the flat byte array. Typical values: 1024, 2048, 4096.
`task`	`Literal['embed', 'classify']`	"embed" — return the 1024-dim PointNet global feature vector. "classify" — return class label + per-class softmax scores.

PointCloudResponse ¶

Bases: BaseResponse

Response contract for 3D point cloud processing.

Exactly one of embedding or label is populated, depending on the requested task.

For task="embed":
embedding: 1024-dim global PointNet feature (L2-normalised).

For task="classify":
label: top predicted class name (e.g. "airplane").
scores: per-class softmax probabilities, parallel to label_names.
label_names: class names in score order (model's id2label mapping).
