API contracts

Typed request / response schemas — one module per model type. Validation runs at the request boundary; backends receive validated objects.

Base + discriminated unions

sheaf.api.base

Base request/response contracts and model type registry.

sheaf.api.union

Discriminated unions over every supported request and response type.

Used by sheaf.server (FastAPI body parsing for /predict and /stream) and by sheaf.batch.runner (per-row validation before Ray Data map_batches). Kept in its own module so batch workloads can validate input rows without importing sheaf.server, which pulls in the full Ray Serve deployment surface.

AnyResponse mirrors AnyRequest and is what sheaf.client.SheafClient decodes predict() responses into so callers get the correctly-typed response object back.

Time series

sheaf.api.time_series

API contract for time series foundation models (Chronos2, TimesFM, etc.).

FeatureRef

Bases: BaseModel

Reference to a Feast online feature used as model input history.

The referenced feature must store the complete input sequence as a list[float] (or list[list[float]] for multivariate history). FeastResolver calls get_online_features with these parameters and returns the resolved list as the history field for the backend.

Example::

feature_ref=FeatureRef(
    feature_view="asset_prices",
    feature_name="close_history_30d",
    entity_key="ticker",
    entity_value="AAPL",
)

TimeSeriesRequest

Bases: BaseRequest

Request contract for time series foundation models.

Exactly one of history (raw values) or feature_ref (Feast entity reference) must be provided. When feature_ref is given, the serving layer resolves it via FeastResolver before passing the request to the backend, so the backend always sees history populated.

Univariate: history=[1.0, 2.0, 3.0, ...]

Multivariate: history=[[1.0, 10.0], [2.0, 11.0], ...] (shape: [time, variates]). target_index selects which variate to forecast (default 0).

n_variates property

n_variates: int

Number of variates in history (1 for univariate).

target_history property

target_history: list[float]

Univariate target series extracted from (possibly multivariate) history.

For univariate history, returns history as-is. For multivariate history (shape [time, variates]), extracts the variate at target_index.
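
The behaviour of the two properties above can be sketched in plain Python. This is an illustration of the described semantics only, not the library's implementation: the helper functions are hypothetical and operate on a bare history list rather than on a TimeSeriesRequest.

```python
from typing import Union

History = Union[list[float], list[list[float]]]


def n_variates(history: History) -> int:
    """Return 1 for a flat series, otherwise the width of each time step."""
    first = history[0]
    return len(first) if isinstance(first, list) else 1


def target_history(history: History, target_index: int = 0) -> list[float]:
    """Pick one variate from [time, variates] data; pass flat data through."""
    if not isinstance(history[0], list):
        return list(history)
    return [step[target_index] for step in history]


multivariate = [[1.0, 10.0], [2.0, 11.0], [3.0, 12.0]]
print(n_variates(multivariate))         # 2
print(target_history(multivariate, 1))  # [10.0, 11.0, 12.0]
```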

TimeSeriesResponse

Bases: BaseResponse

Response contract for time series foundation models.

Tabular

sheaf.api.tabular

API contract for tabular foundation models (TabPFN, etc.).

TabularRequest

Bases: BaseRequest

Request contract for tabular foundation models.

TabPFN is an in-context learner: context_X/context_y are the "training" examples passed at inference time. query_X contains the rows to predict. No separate training step — everything happens in a single forward pass.

Attributes:

Name Type Description
context_X list[list[float]]

Feature matrix for in-context examples, shape [n_context, n_features]

context_y list[float | int]

Labels for in-context examples, shape [n_context]

query_X list[list[float]]

Feature rows to predict, shape [n_query, n_features]

task Literal['classification', 'regression']

"classification" or "regression"

feature_names list[str] | None

Optional column names — used for logging and debugging

categorical_feature_indices list[int] | None

Indices of categorical columns

output_mode Literal['predictions', 'probabilities', 'quantiles']

"predictions" for point predictions only, "probabilities" for classification probabilities, "quantiles" for regression quantile estimates

quantile_levels list[float]

Quantile levels — only used when task=regression and output_mode=quantiles
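
As a rough illustration of the shapes listed above, the sketch below builds a request payload as a plain dict and checks that the three arrays agree. The helper tabular_shape_errors is hypothetical; the schema's own validation may be stricter or differ in detail.

```python
def tabular_shape_errors(payload: dict) -> list[str]:
    """Collect shape mismatches between context_X, context_y and query_X."""
    errors = []
    context_X, context_y = payload["context_X"], payload["context_y"]
    query_X = payload["query_X"]
    if len(context_X) != len(context_y):
        errors.append("context_X and context_y must have the same length")
    widths = {len(row) for row in context_X} | {len(row) for row in query_X}
    if len(widths) != 1:
        errors.append("all rows must have the same number of features")
    return errors


payload = {
    "context_X": [[5.1, 3.5], [6.2, 2.9], [4.9, 3.0]],  # [n_context, n_features]
    "context_y": [0, 1, 0],                             # [n_context]
    "query_X": [[5.0, 3.4]],                            # [n_query, n_features]
    "task": "classification",
    "output_mode": "probabilities",
}
print(tabular_shape_errors(payload))  # []
```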

TabularResponse

Bases: BaseResponse

Response contract for tabular foundation models.

Audio (ASR + TTS + audio generation)

sheaf.api.audio

API contract for audio foundation models (Whisper, Bark, etc.).

WordTimestamp

Bases: BaseModel

Word-level timestamp from Whisper when word_timestamps=True.

AudioSegment

Bases: BaseModel

A single transcription segment from Whisper.

AudioRequest

Bases: BaseRequest

Request contract for audio transcription / translation.

Audio is passed as base64-encoded bytes. Any format that ffmpeg can decode is accepted (wav, mp3, m4a, ogg, flac, etc.).

Attributes:

Name Type Description
audio_b64 str

Base64-encoded audio file bytes.

language str | None

BCP-47 language code (e.g. "en", "fr") or None for auto-detection. English-only model variants (e.g. "tiny.en") ignore this field.

task Literal['transcribe', 'translate']

"transcribe" returns text in the source language. "translate" transcribes and translates to English.

word_timestamps bool

If True, each segment includes word-level start/end times and per-word probabilities.

temperature float

Sampling temperature for decoding. A tuple triggers fallback through successive values on failure.

initial_prompt str | None

Optional text prepended to the first window to condition the model (e.g. vocabulary hints, speaker context).

vad_filter bool

Filter out silence before transcription using Silero VAD. Supported by faster-whisper; ignored by openai-whisper.

beam_size int

Beam search width for decoding. Higher = more accurate, slower. Supported by faster-whisper; ignored by openai-whisper.
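
A minimal sketch of preparing audio_b64, assuming the request is sent as a JSON-style dict. To stay self-contained it synthesizes one second of silence with the standard-library wave module; in practice the bytes would come from reading an audio file.

```python
import base64
import io
import wave

# Build one second of 16 kHz mono silence in memory as a stand-in for a real file.
buffer = io.BytesIO()
with wave.open(buffer, "wb") as wav:
    wav.setnchannels(1)
    wav.setsampwidth(2)       # 16-bit samples
    wav.setframerate(16000)
    wav.writeframes(b"\x00\x00" * 16000)

payload = {
    "audio_b64": base64.b64encode(buffer.getvalue()).decode("ascii"),
    "language": None,         # None requests auto-detection
    "task": "transcribe",
    "word_timestamps": True,
}
```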

AudioResponse

Bases: BaseResponse

Response contract for audio transcription / translation.

TTSRequest

Bases: BaseRequest

Request contract for text-to-speech synthesis.

Attributes:

Name Type Description
text str

Input text to synthesize.

voice_preset str | None

Optional speaker voice preset. Bark: "v2/en_speaker_6" etc. Kokoro: "af_heart", "af_bella", "am_adam", "bf_emma", "bm_george", etc. None uses the backend's default voice.

speed float

Playback speed multiplier [0.5, 2.0]. Supported by Kokoro; ignored by Bark. Default 1.0 (normal speed).

TTSResponse

Bases: BaseResponse

Response contract for text-to-speech synthesis.

sheaf.api.audio_generation

API contract for audio generation models (MusicGen, etc.).

AudioGenerationRequest

Bases: BaseRequest

Request contract for text-conditioned audio/music generation.

Attributes:

Name Type Description
prompt str

Text description of the audio to generate (e.g. "happy jazz with piano and drums").

duration_s float

Target duration in seconds. Converted to max_new_tokens via model.config.audio_encoder.frame_rate (50 tokens/sec for MusicGen).

guidance_scale float | None

Classifier-free guidance scale. Higher values steer generation closer to the prompt at the cost of diversity. Typical range: 1.0–10.0. None disables CFG.

temperature float

Sampling temperature. Higher values increase randomness.

top_k int

Top-k sampling parameter: at each step, sample only from the k most likely tokens.
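
The duration_s conversion described above is simple arithmetic. The sketch below assumes the product is rounded to the nearest integer; the rounding mode actually used by the backend is not stated here, and the helper name is hypothetical.

```python
def duration_to_max_new_tokens(duration_s: float, frame_rate: int = 50) -> int:
    """Convert seconds of audio to a token budget (50 tokens/sec for MusicGen)."""
    return round(duration_s * frame_rate)


print(duration_to_max_new_tokens(8.0))  # 400
```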

AudioGenerationResponse

Bases: BaseResponse

Response contract for audio generation.

Vision

sheaf.api.embedding

API contract for embedding / representation models (CLIP, DINOv2, etc.).

EmbeddingRequest

Bases: BaseRequest

Request contract for embedding models.

Exactly one of texts or images_b64 must be provided per request. Both fields accept a batch — pass multiple items to embed them in a single forward pass.

Attributes:

Name Type Description
texts list[str] | None

List of strings to embed (text modality).

images_b64 list[str] | None

List of base64-encoded image files to embed (vision modality). Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).

normalize bool

If True (default), L2-normalize the output embeddings so that cosine similarity equals dot product.
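
The claim that cosine similarity equals dot product after L2 normalization can be checked with a few lines of plain Python. The helpers are illustrative and independent of the library.

```python
import math


def dot(a: list[float], b: list[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec: list[float]) -> list[float]:
    """Scale a vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a: list[float], b: list[float]) -> float:
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]
# After normalization the plain dot product already is the cosine similarity.
same = math.isclose(cosine(a, b), dot(l2_normalize(a), l2_normalize(b)))
print(same)  # True
```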

EmbeddingResponse

Bases: BaseResponse

Response contract for embedding models.

sheaf.api.segmentation

API contract for image segmentation models (SAM2, etc.).

SegmentationRequest

Bases: BaseRequest

Request contract for prompted image segmentation.

Exactly one image is segmented per request. At least one prompt must be provided — either point_coords (with matching point_labels) or box, or both.

Attributes:

Name Type Description
image_b64 str

Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).

point_coords list[list[float]] | None

List of [x, y] points (pixel coordinates).

point_labels list[int] | None

Foreground (1) / background (0) label for each point. Must have the same length as point_coords.

box list[float] | None

Bounding-box prompt as [x1, y1, x2, y2] in pixel coordinates.

multimask_output bool

If True (default), return three candidate masks ranked by score. Set to False to get a single best mask.
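
The prompt rules above (at least one prompt, labels parallel to points) can be sketched as a client-side check. The helper segmentation_prompt_errors is hypothetical and only mirrors the constraints stated in this section.

```python
def segmentation_prompt_errors(payload: dict) -> list[str]:
    """Collect violations of the prompt rules for a segmentation payload."""
    errors = []
    points = payload.get("point_coords")
    labels = payload.get("point_labels")
    box = payload.get("box")
    if points is None and box is None:
        errors.append("provide point_coords, box, or both")
    if points is not None:
        if labels is None or len(labels) != len(points):
            errors.append("point_labels must have the same length as point_coords")
        elif any(label not in (0, 1) for label in labels):
            errors.append("point_labels must be 1 (foreground) or 0 (background)")
    if box is not None and len(box) != 4:
        errors.append("box must be [x1, y1, x2, y2]")
    return errors


payload = {
    "image_b64": "<base64 image bytes>",        # placeholder, not a real image
    "point_coords": [[120.0, 80.0], [40.0, 200.0]],
    "point_labels": [1, 0],                     # one foreground, one background
    "multimask_output": False,
}
print(segmentation_prompt_errors(payload))  # []
```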

SegmentationResponse

Bases: BaseResponse

Response contract for image segmentation models.

Each mask is a base64-encoded flat uint8 byte array. To reconstruct::

import base64, numpy as np
mask = np.frombuffer(
    base64.b64decode(masks_b64[i]), dtype=np.uint8
).reshape(height, width).astype(bool)

sheaf.api.depth

API contract for monocular depth estimation models (Depth Anything v2, etc.).

DepthRequest

Bases: BaseRequest

Request contract for monocular depth estimation.

Attributes:

Name Type Description
image_b64 str

Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).

normalize bool

If True (default), the depth map is linearly rescaled to [0, 1] where 0 = nearest and 1 = furthest point in the scene. If False, raw relative depth values from the model are returned.

DepthResponse

Bases: BaseResponse

Response contract for monocular depth estimation.

The depth map is a base64-encoded flat float32 byte array at the model's native output resolution. To reconstruct::

import base64, numpy as np
depth = np.frombuffer(
    base64.b64decode(depth_b64), dtype=np.float32
).reshape(height, width)

If normalize=True was requested, values are in [0, 1]. min_depth and max_depth are the raw (pre-normalization) bounds, useful for recovering metric-relative scale.
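
One way to use min_depth and max_depth is to undo the rescaling. The sketch below assumes the normalization is the plain linear map (raw - min_depth) / (max_depth - min_depth) and that the bytes are little-endian; neither detail is confirmed here, and the sample values are made up.

```python
import base64
import struct


def decode_depth(depth_b64: str, height: int, width: int) -> list[list[float]]:
    """Decode a flat float32 buffer into a height x width nested list."""
    flat = struct.unpack(f"<{height * width}f", base64.b64decode(depth_b64))
    return [list(flat[r * width:(r + 1) * width]) for r in range(height)]


def denormalize(depth, min_depth: float, max_depth: float):
    """Invert an assumed linear [0, 1] rescaling."""
    span = max_depth - min_depth
    return [[min_depth + value * span for value in row] for row in depth]


# Tiny 2x2 normalized map standing in for a real response.
depth_b64 = base64.b64encode(struct.pack("<4f", 0.0, 0.5, 0.5, 1.0)).decode("ascii")
depth = decode_depth(depth_b64, height=2, width=2)
print(denormalize(depth, min_depth=2.0, max_depth=6.0))  # [[2.0, 4.0], [4.0, 6.0]]
```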

sheaf.api.detection

API contract for object detection models (DETR, RT-DETR, etc.).

DetectionRequest

Bases: BaseRequest

Request contract for object detection.

Attributes:

Name Type Description
image_b64 str

Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.).

threshold float

Minimum confidence score for a detection to be included in the response. Defaults to 0.5.

DetectionResponse

Bases: BaseResponse

Response contract for object detection.

Boxes are in absolute pixel coordinates: [x_min, y_min, x_max, y_max]. Lists are parallel — boxes[i], scores[i], and labels[i] all describe the same detection, sorted by descending confidence score.
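
Because the three lists are parallel, they can be walked together with zip. The response below is a hand-written dict with made-up values; the label type (strings here) is an assumption.

```python
resp = {
    "boxes": [[10.0, 20.0, 110.0, 220.0], [5.0, 5.0, 50.0, 50.0]],
    "scores": [0.92, 0.61],
    "labels": ["person", "dog"],
}

# Apply a stricter client-side cut than the request threshold.
confident = [
    (label, score, box)
    for box, score, label in zip(resp["boxes"], resp["scores"], resp["labels"])
    if score >= 0.8
]
for label, score, (x_min, y_min, x_max, y_max) in confident:
    print(f"{label}: {score:.2f}, {x_max - x_min:.0f}x{y_max - y_min:.0f} px")
```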

sheaf.api.pose

API contract for pose estimation models (ViTPose, etc.).

PoseRequest

Bases: BaseRequest

Request contract for human pose estimation.

ViTPose is a top-down model: it estimates keypoints within person crops. If bboxes is provided, each box is used as a person crop. If omitted, the full image is treated as a single-person crop.

Attributes:

Name Type Description
image_b64 str

Base64-encoded image (JPEG, PNG, or any PIL-readable format).

bboxes list[list[float]] | None

Optional list of person bounding boxes in pixel coordinates, each [x_min, y_min, x_max, y_max]. If None, defaults to the full image as one person crop.

threshold float

Keypoint confidence threshold. Keypoints scoring below it are still returned and can be recognized by their low score; filtering is left to the caller.

PoseResponse

Bases: BaseResponse

Response contract for human pose estimation.

poses[i][j] is [x, y, score] for the j-th keypoint of the i-th detected person, in absolute pixel coordinates. keypoint_names[j] gives the semantic label for keypoint j (e.g. "nose", "left_eye").

Decode example::

for person in resp.poses:
    for (x, y, score), name in zip(person, resp.keypoint_names):
        print(f"{name}: ({x:.1f}, {y:.1f})  conf={score:.2f}")
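
Since low-confidence keypoints are not dropped by the server, a caller would typically filter them. A minimal sketch, using hand-written values in place of a real PoseResponse (the helper name is hypothetical):

```python
def confident_keypoints(poses, keypoint_names, threshold: float):
    """Return one {name: (x, y)} dict per person, keeping scores >= threshold."""
    return [
        {
            name: (x, y)
            for (x, y, score), name in zip(person, keypoint_names)
            if score >= threshold
        }
        for person in poses
    ]


poses = [[[100.0, 50.0, 0.95], [110.0, 45.0, 0.20]]]  # one person, two keypoints
keypoint_names = ["nose", "left_eye"]
print(confident_keypoints(poses, keypoint_names, threshold=0.5))
# [{'nose': (100.0, 50.0)}]
```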

sheaf.api.optical_flow

API contract for optical flow models (RAFT, UniMatch, etc.).

OpticalFlowRequest

Bases: BaseRequest

Request contract for optical flow estimation.

Accepts two consecutive video frames and returns the dense per-pixel displacement field between them.

Attributes:

Name Type Description
frame1_b64 str

Base64-encoded first frame (JPEG, PNG, or any PIL-readable format). Both frames must have the same spatial dimensions.

frame2_b64 str

Base64-encoded second frame.

OpticalFlowResponse

Bases: BaseResponse

Response contract for optical flow estimation.

flow_b64 is a base64-encoded flat float32 byte array of shape (height, width, 2), where the last dimension is (dx, dy) — the horizontal and vertical pixel displacement from frame1 to frame2.

Decode example::

import base64, numpy as np
flow = np.frombuffer(
    base64.b64decode(flow_b64), dtype=np.float32
).reshape(height, width, 2)
dx, dy = flow[..., 0], flow[..., 1]
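
A common next step is the per-pixel motion magnitude sqrt(dx² + dy²). The sketch below uses only the standard library and assumes little-endian float32 with (dx, dy) interleaved per pixel, as the shape (height, width, 2) implies; the sample field is made up.

```python
import base64
import math
import struct


def flow_magnitude(flow_b64: str, height: int, width: int) -> list[list[float]]:
    """Decode a (height, width, 2) flow field and return per-pixel magnitudes."""
    flat = struct.unpack(f"<{height * width * 2}f", base64.b64decode(flow_b64))
    return [
        [
            math.hypot(flat[2 * (r * width + c)], flat[2 * (r * width + c) + 1])
            for c in range(width)
        ]
        for r in range(height)
    ]


# 1x2 field: first pixel moves (3, 4), second pixel is static.
flow_b64 = base64.b64encode(struct.pack("<4f", 3.0, 4.0, 0.0, 0.0)).decode("ascii")
print(flow_magnitude(flow_b64, height=1, width=2))  # [[5.0, 0.0]]
```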

sheaf.api.video

API contract for video understanding models (VideoMAE, TimeSformer, etc.).

VideoRequest

Bases: BaseRequest

Request contract for video understanding models.

Frames are passed as a list of base64-encoded images (JPEG or PNG). The number of frames expected depends on the model:

  • VideoMAE-base: 16 frames (default, tubelet_size=2, 224×224)
  • TimeSformer: 8 frames (224×224)

Pass exactly the number of frames the model was pretrained on; otherwise the processor pads or truncates the clip automatically.

Attributes:

Name Type Description
frames_b64 list[str]

Ordered list of base64-encoded video frames.

task Literal['embedding', 'classification']

"embedding" returns a single fixed-size vector per video clip; "classification" returns class labels and softmax scores.

pooling Literal['cls', 'mean']

Pooling strategy for embeddings. "cls" — CLS token at position 0 of last_hidden_state (default). "mean" — Mean of all non-CLS patch tokens.

normalize bool

If True (default), L2-normalize the output embedding. Ignored for classification.
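
The two pooling strategies can be illustrated on a toy last_hidden_state given as a list of token vectors, with the CLS token at position 0. This is a sketch of the described behaviour, not the backend code, and the function name is hypothetical.

```python
def pool(hidden: list[list[float]], strategy: str) -> list[float]:
    """Reduce [tokens, dim] hidden states to a single vector."""
    if strategy == "cls":
        return list(hidden[0])                # CLS token only
    if strategy == "mean":
        patches = hidden[1:]                  # every token except CLS
        return [sum(column) / len(patches) for column in zip(*patches)]
    raise ValueError(f"unknown pooling strategy: {strategy}")


hidden = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]  # CLS followed by two patch tokens
print(pool(hidden, "cls"))   # [9.0, 9.0]
print(pool(hidden, "mean"))  # [2.0, 3.0]
```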

VideoResponse

Bases: BaseResponse

Response contract for video understanding models.

For task="embedding": embedding and dim are populated. For task="classification": labels and scores are populated.

Diffusion / multimodal generation

sheaf.api.diffusion

API contract for diffusion image generation models (FLUX, etc.).

DiffusionRequest

Bases: BaseRequest

Request contract for text-to-image diffusion models.

Attributes:

Name Type Description
prompt str

Text description of the image to generate.

negative_prompt str

Text description of what to avoid. Not supported by all models (FLUX.1-schnell ignores it).

height int

Output image height in pixels. Must be a multiple of 8. Defaults to 1024.

width int

Output image width in pixels. Must be a multiple of 8. Defaults to 1024.

num_inference_steps int

Number of denoising steps. FLUX.1-schnell is optimized for 1–4 steps; FLUX.1-dev typically uses 20–50.

guidance_scale float

Classifier-free guidance scale. Higher values steer generation closer to the prompt. FLUX.1-schnell uses 0.0 (guidance-distilled); FLUX.1-dev typically uses 3.5–7.0.

seed int | None

Random seed for reproducibility. None = random.

adapters list[str]

Names of LoRA adapters to apply, in order of application. Each name must be registered on the deployment's ModelSpec.lora.adapters. Empty (default) means the deployment default adapter is used (or no LoRA if no default is set).

adapter_weights list[float] | None

Per-adapter weights, parallel to adapters. When None (default), the per-adapter weight from LoRAConfig.adapters[name] is used. When provided, the length must match adapters.
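
The constraints above (dimensions divisible by 8, weights parallel to adapters) can be sketched as a pre-flight check on a dict payload. The helper and the adapter names are hypothetical; real adapter names must match those registered on the deployment.

```python
def diffusion_payload_errors(payload: dict) -> list[str]:
    """Collect violations of the size and adapter rules described above."""
    errors = []
    for side in ("height", "width"):
        if payload.get(side, 1024) % 8 != 0:   # 1024 is the documented default
            errors.append(f"{side} must be a multiple of 8")
    adapters = payload.get("adapters", [])
    weights = payload.get("adapter_weights")
    if weights is not None and len(weights) != len(adapters):
        errors.append("adapter_weights must have one entry per adapter")
    return errors


payload = {
    "prompt": "a lighthouse at dusk, oil painting",
    "height": 768,
    "width": 1024,
    "num_inference_steps": 4,
    "guidance_scale": 0.0,
    "seed": 1234,
    "adapters": ["oil-paint", "warm-tones"],   # hypothetical adapter names
    "adapter_weights": [0.8, 0.4],
}
print(diffusion_payload_errors(payload))  # []
```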

DiffusionResponse

Bases: BaseResponse

Response contract for text-to-image diffusion models.

The generated image is returned as a base64-encoded PNG. To decode::

import base64
from PIL import Image
import io

img = Image.open(io.BytesIO(base64.b64decode(image_b64)))

Attributes:

Name Type Description
image_b64 str

Base64-encoded PNG image.

height int

Output image height in pixels.

width int

Output image width in pixels.

seed int

Seed actually used for generation (useful when the request seed was None and you want to reproduce the result).

sheaf.api.multimodal_generation

API contract for text+image-conditioned generation models (SDXL, etc.).

MultimodalGenerationRequest

Bases: BaseRequest

Request contract for text+image-conditioned image generation.

Distinct from pure text-to-image (DiffusionRequest/FLUX): the input image conditions the generation. When mask_b64 is omitted the backend runs img2img (style/content transfer); when provided it runs inpainting.

Attributes:

Name Type Description
prompt str

Text description guiding the generated image.

image_b64 str

Base64-encoded input image (JPEG, PNG, or any PIL-readable format). Acts as the conditioning source for img2img / inpainting.

mask_b64 str | None

Optional base64-encoded mask image (same spatial size as image_b64). White pixels are re-generated; black pixels are preserved. Only used when the backend is in inpaint mode.

strength float

How much to transform the input image. 0.0 = no change, 1.0 = ignore the original image entirely. Default 0.8.

num_inference_steps int

Total denoising steps. Actual steps run = round(strength * num_inference_steps). Default 50.

guidance_scale float

Classifier-free guidance scale. Higher values steer generation closer to the prompt. Default 7.5.

negative_prompt str

Text description of what to avoid in the output.

seed int | None

Random seed for reproducibility. None = random.

adapters list[str]

Names of LoRA adapters to apply, in order. Each name must be registered on the deployment's ModelSpec.lora.adapters. Empty (default) means the deployment default is used (or no LoRA if no default is set).

adapter_weights list[float] | None

Per-adapter weights, parallel to adapters. When None (default), the per-adapter weight from LoRAConfig.adapters[name] is used. Length must match adapters when provided.
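
The interaction between strength and num_inference_steps is easiest to see with numbers. The sketch applies the formula quoted above using Python's built-in round, which rounds exact halves to the nearest even integer; whether the backend treats halves the same way is not stated here.

```python
def effective_steps(strength: float, num_inference_steps: int = 50) -> int:
    """Denoising steps actually run, per round(strength * num_inference_steps)."""
    return round(strength * num_inference_steps)


print(effective_steps(0.8))      # 40
print(effective_steps(0.3, 30))  # 9
```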

MultimodalGenerationResponse

Bases: BaseResponse

Response contract for text+image-conditioned image generation.

The generated image is returned as a base64-encoded PNG. To decode::

import base64, io
from PIL import Image
img = Image.open(io.BytesIO(base64.b64decode(image_b64)))

Cross-modal embedding

sheaf.api.multimodal_embedding

API contract for cross-modal embedding models (ImageBind, etc.).

MultimodalEmbeddingRequest

Bases: BaseRequest

Request contract for cross-modal embedding models (e.g. ImageBind).

Exactly one modality field must be set per request. All items in the chosen field are embedded in a single forward pass and returned in the shared embedding space.

Modalities

  • texts: List of strings (text modality).
  • images_b64: List of base64-encoded image files (vision modality).
  • audios_b64: List of base64-encoded audio files (audio modality).
  • depth_images_b64: List of base64-encoded depth images (depth modality).
  • thermal_images_b64: List of base64-encoded thermal images (thermal modality).

For image/audio inputs any format the underlying model accepts is valid (JPEG/PNG for vision; WAV/MP3 for audio). The backend writes temporary files as needed — the model loaders read paths, not raw bytes.

Attributes:

Name Type Description
normalize bool

If True (default), L2-normalize output embeddings so that cosine similarity equals dot product.

modality property

modality: str

Return the canonical modality name for the active field.

n_items property

n_items: int

Number of items in the active modality field.
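
The "exactly one modality field" rule and the two properties can be sketched on a dict payload. The helper below is hypothetical and returns the field name itself, since the canonical modality names returned by the real modality property are not listed here.

```python
MODALITY_FIELDS = (
    "texts",
    "images_b64",
    "audios_b64",
    "depth_images_b64",
    "thermal_images_b64",
)


def active_modality(payload: dict) -> tuple[str, int]:
    """Return (active field name, item count); fail unless exactly one is set."""
    active = [name for name in MODALITY_FIELDS if payload.get(name) is not None]
    if len(active) != 1:
        raise ValueError(f"exactly one modality field must be set, got {active}")
    return active[0], len(payload[active[0]])


payload = {"texts": ["a dog barking", "rain on a roof"], "normalize": True}
print(active_modality(payload))  # ('texts', 2)
```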

MultimodalEmbeddingResponse

Bases: BaseResponse

Response contract for cross-modal embedding models.

Molecular / genomics / materials

sheaf.api.molecular

API contract for molecular / protein language models (ESM-3, etc.).

MolecularRequest

Bases: BaseRequest

Request contract for protein sequence embedding.

A single request embeds a batch of protein sequences. Sequences should use standard single-letter amino acid codes (ACDEFGHIKLMNPQRSTVWY plus ambiguity codes accepted by ESM tokenizers).

Attributes:

Name Type Description
sequences list[str]

List of amino acid sequences to embed.

pooling Literal['mean', 'cls']

How to reduce the per-residue hidden states to a single vector per sequence. "mean" (default) — mean over residue positions (excludes BOS/EOS special tokens at positions 0 and -1). "cls" — BOS token at position 0 (analogous to [CLS] in BERT).

normalize bool

If True (default), L2-normalize each embedding so that cosine similarity equals dot product.

MolecularResponse

Bases: BaseResponse

Response contract for protein sequence embedding.

sheaf.api.genomic

API contract for DNA/genomic foundation models (Nucleotide Transformer, etc.).

GenomicRequest

Bases: BaseRequest

Request contract for DNA/RNA sequence embedding.

A single request embeds a batch of nucleotide sequences. Sequences should use standard nucleotide codes (A, C, G, T for DNA; A, C, G, U for RNA; N for unknown bases). All are accepted by Nucleotide Transformer tokenizers.

Attributes:

Name Type Description
sequences list[str]

List of nucleotide sequences to embed.

pooling Literal['mean', 'cls']

How to reduce per-token hidden states to a single vector. "mean" (default) — mean of non-special tokens (excludes CLS at position 0 and EOS/SEP at position -1). "cls" — CLS token at position 0 (analogous to [CLS] in BERT).

normalize bool

If True (default), L2-normalize each embedding so that cosine similarity equals dot product.

GenomicResponse

Bases: BaseResponse

Response contract for DNA/RNA sequence embedding.

sheaf.api.small_molecule

API contract for small molecule / chemical foundation models (MolFormer, etc.).

SmallMoleculeRequest

Bases: BaseRequest

Request contract for small molecule embedding.

A single request embeds a batch of chemical compounds given as SMILES strings. SMILES (Simplified Molecular-Input Line-Entry System) is the standard text representation of molecular structure.

Attributes:

Name Type Description
smiles list[str]

List of SMILES strings to embed. Each string represents one molecule (e.g. "CC(=O)OC1=CC=CC=C1C(=O)O" for aspirin).

pooling Literal['mean', 'cls']

How to reduce per-token hidden states to a fixed-size vector. "mean" (default) — attention-masked mean over all tokens, excluding padding. Best for molecular property prediction. "cls" — CLS token at position 0. Useful for models with a dedicated classification token.

normalize bool

If True, L2-normalize each embedding (cosine similarity == dot product). Defaults to False — raw embeddings are more natural for regression tasks such as property prediction.
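
"Attention-masked mean" means padding positions are left out of the average. A toy sketch with hand-written hidden states, where mask value 1 marks a real token and 0 marks padding (the function name is hypothetical):

```python
def masked_mean(hidden: list[list[float]], attention_mask: list[int]) -> list[float]:
    """Average token vectors whose mask entry is 1, ignoring padding."""
    kept = [vector for vector, keep in zip(hidden, attention_mask) if keep]
    return [sum(column) / len(kept) for column in zip(*kept)]


hidden = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]  # last row is padding
attention_mask = [1, 1, 0]
print(masked_mean(hidden, attention_mask))  # [2.0, 3.0]
```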

SmallMoleculeResponse

Bases: BaseResponse

Response contract for small molecule embedding.

sheaf.api.materials

API contract for materials / interatomic potential models (MACE-MP, etc.).

MaterialsRequest

Bases: BaseRequest

Request contract for atomistic energy/force/stress prediction.

Describes a single atomic structure: a set of atoms at given positions, optionally in a periodic simulation cell. The model predicts the potential energy surface and its derivatives.

Attributes:

Name Type Description
atomic_numbers list[int]

Atomic numbers (Z) for each atom. Length N.

positions_b64 str

Base64-encoded float32 array of shape (N, 3) giving Cartesian coordinates in Angstroms.

cell list[list[float]] | None

3x3 lattice vectors in Angstroms for periodic boundary conditions. Required when pbc is True.

pbc bool | list[bool]

Periodic boundary conditions. False (default) for isolated molecules/clusters; True or [True, True, True] for bulk crystals; [True, True, False] for slabs.

compute_forces bool

If True (default), return forces in eV/Å.

compute_stress bool

If True, return the stress tensor in eV/ų (Voigt). Only meaningful for periodic systems (pbc=True).
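
A minimal sketch of building a request payload for an isolated water molecule, using the standard library only. The coordinates are approximate, little-endian byte order is an assumption, and the helpers are hypothetical; the check treats any True entry in a pbc list as periodic, which is also an assumption.

```python
import base64
import struct


def encode_positions(positions: list[list[float]]) -> str:
    """Flatten (N, 3) coordinates row by row and pack them as float32."""
    flat = [coord for atom in positions for coord in atom]
    return base64.b64encode(struct.pack(f"<{len(flat)}f", *flat)).decode("ascii")


def materials_payload_errors(payload: dict) -> list[str]:
    errors = []
    n_atoms = len(payload["atomic_numbers"])
    if len(base64.b64decode(payload["positions_b64"])) != n_atoms * 3 * 4:
        errors.append("positions_b64 must hold N x 3 float32 values")
    pbc = payload.get("pbc", False)
    periodic = any(pbc) if isinstance(pbc, list) else bool(pbc)
    if periodic and payload.get("cell") is None:
        errors.append("cell is required for periodic structures")
    return errors


payload = {
    "atomic_numbers": [8, 1, 1],                      # O, H, H
    "positions_b64": encode_positions(
        [[0.0, 0.0, 0.0], [0.96, 0.0, 0.0], [-0.24, 0.93, 0.0]]
    ),
    "pbc": False,
    "compute_forces": True,
}
print(materials_payload_errors(payload))  # []
```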

MaterialsResponse

Bases: BaseResponse

Response contract for atomistic energy/force/stress prediction.

Earth / weather

sheaf.api.weather

API contract for weather / atmospheric-state foundation models.

Supports GraphCast, Aurora, Pangu-Weather, and similar architectures.

Encoding convention

All array fields are base64-encoded little-endian float32 byte strings.

Surface variable shape: (n_lat, n_lon)

Atmospheric variable shape: (n_levels, n_lat, n_lon)

Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()

Decode: np.frombuffer(base64.b64decode(s), dtype=np.float32), followed by .reshape(n_lat, n_lon) for surface variables or .reshape(n_levels, n_lat, n_lon) for atmospheric variables.

Grid conventions (GraphCast / ERA5)

  • lat: descending, e.g. [90.0, 89.75, …, -90.0] for 0.25° global
  • lon: ascending, e.g. [0.0, 0.25, …, 359.75]
  • pressure_levels: descending hPa, e.g. [1000, 925, 850, …, 1]
  • current_time: ISO-8601 string, e.g. "2023-01-01T06:00:00"

GraphCast requires two consecutive time steps (t-6h and t) as input, so both the current fields (surface_vars, atmospheric_vars) and the previous fields (prev_surface_vars, prev_atmospheric_vars) are required and must contain the same variable names. n_steps controls how many 6-hour steps are predicted autoregressively.
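
The grid conventions above can be made concrete for the 0.25° global case. The sketch builds the lat/lon axes and encodes one placeholder surface field with the standard library; the constant temperature value is made up and encode_field is a hypothetical helper.

```python
import base64
import struct

# 0.25-degree global grid; multiples of 0.25 are exact in binary floating point.
lat = [90.0 - 0.25 * i for i in range(721)]   # 90.0 down to -90.0 (descending)
lon = [0.25 * j for j in range(1440)]         # 0.0 up to 359.75 (ascending)


def encode_field(values: list[float]) -> str:
    """Pack a flattened (row-major) field as little-endian float32, then base64."""
    return base64.b64encode(struct.pack(f"<{len(values)}f", *values)).decode("ascii")


# Placeholder constant field in kelvin, flattened from shape (n_lat, n_lon).
t2m_b64 = encode_field([288.0] * (len(lat) * len(lon)))
print(len(lat), len(lon), len(base64.b64decode(t2m_b64)))  # 721 1440 4152960
```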

WeatherRequest

Bases: BaseRequest

Request contract for atmospheric-state foundation models.

Attributes:

Name Type Description
surface_vars dict[str, str]

Surface variable fields at time t. Keys are variable names (ERA5-style for GraphCast, e.g. "2m_temperature", "10m_u_component_of_wind"). Values are base64 float32 arrays of shape (n_lat, n_lon).

atmospheric_vars dict[str, str]

Atmospheric (pressure-level) fields at time t. Values are base64 float32 arrays of shape (n_levels, n_lat, n_lon).

prev_surface_vars dict[str, str]

Same variables at time t - step_hours (t-6h for GraphCast).

prev_atmospheric_vars dict[str, str]

Same variables at time t - step_hours.

lat list[float]

Latitude grid, length n_lat, descending degrees.

lon list[float]

Longitude grid, length n_lon, ascending degrees.

pressure_levels list[int]

Pressure levels in hPa, length n_levels, descending.

current_time str

ISO-8601 timestamp for the current state (t).

n_steps int

Number of autoregressive 6-hour steps to predict.

WeatherResponse

Bases: BaseResponse

Response contract for atmospheric-state foundation models.

  • surface_forecasts[i]: dict of {var_name: base64_float32} for step i+1. Each array has shape (n_lat, n_lon).
  • atmospheric_forecasts[i]: same for atmospheric (pressure-level) variables. Each array has shape (n_levels, n_lat, n_lon).
  • forecast_times[i]: ISO-8601 timestamp for step i+1.
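
The index-to-time mapping (entry i describes step i+1) can be sketched as follows, assuming each step advances the clock by 6 hours from current_time. The helper is hypothetical and not part of the response schema.

```python
from datetime import datetime, timedelta


def expected_forecast_times(current_time: str, n_steps: int, step_hours: int = 6) -> list[str]:
    """ISO-8601 timestamps for steps 1..n_steps after current_time."""
    start = datetime.fromisoformat(current_time)
    return [
        (start + timedelta(hours=step_hours * (i + 1))).isoformat()
        for i in range(n_steps)
    ]


print(expected_forecast_times("2023-01-01T06:00:00", 3))
# ['2023-01-01T12:00:00', '2023-01-01T18:00:00', '2023-01-02T00:00:00']
```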

sheaf.api.satellite

API contract for Earth observation / satellite imagery foundation models.

Supports Prithvi (IBM/NASA), Clay, SatMAE, and similar architectures.

Encoding convention

pixels_b64 is a base64-encoded little-endian float32 byte string.

Shape: (n_time, n_bands, height, width)

For single-time input set n_time=1. Values are typically surface reflectance in [0, 1] (after dividing sensor DN by 10 000 for Landsat/Sentinel-2) or raw DN if normalize=False.

Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()

Decode: np.frombuffer(base64.b64decode(s), dtype=np.float32).reshape(n_time, n_bands, height, width)

Band names (examples)

HLS (Harmonized Landsat-Sentinel) 6-band subset used by Prithvi: ["blue", "green", "red", "nir08", "swir16", "swir22"]

Sentinel-2 L2A 11-band (used by Clay): ["coastal", "blue", "green", "red", "rededge1", "rededge2", "rededge3", "nir08", "nir09", "swir16", "swir22"]

Wavelengths (μm) for Clay (examples, Sentinel-2): [0.443, 0.490, 0.560, 0.665, 0.704, 0.740, 0.783, 0.842, 0.865, 1.610, 2.190]

SatelliteRequest

Bases: BaseRequest

Request contract for Earth observation foundation models.

Attributes:

Name Type Description
pixels_b64 str

Base64 float32 pixel array of shape (n_time, n_bands, height, width).

n_time int

Number of time steps in the input stack.

n_bands int

Number of spectral bands.

height int

Spatial height in pixels.

width int

Spatial width in pixels.

band_names list[str]

Human-readable band labels, length n_bands.

wavelengths list[float] | None

Center wavelengths in micrometers, length n_bands. Required by Clay; ignored by Prithvi.

gsd float

Ground sample distance in metres (default 10 m for Sentinel-2).

lat float | None

Centre latitude in degrees (optional metadata).

lon float | None

Centre longitude in degrees (optional metadata).

timestamps list[str] | None

ISO-8601 timestamp per time step (optional metadata).

pooling Literal['mean', 'cls']

"mean" pools all output tokens; "cls" uses the first token (CLS or register token).

normalize bool

Apply the model's per-band mean/std normalization using statistics stored in the image processor config. Disable if you have already normalized your data.
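
The dimension fields must agree with the byte length of pixels_b64 and with the per-band lists. A hedged sketch of that consistency check for a single 1×1-pixel HLS acquisition follows; the digital numbers are made up and satellite_payload_errors is a hypothetical helper.

```python
import base64
import struct


def satellite_payload_errors(payload: dict) -> list[str]:
    """Collect mismatches between pixels_b64 and the declared dimensions."""
    errors = []
    n_values = (
        payload["n_time"] * payload["n_bands"] * payload["height"] * payload["width"]
    )
    raw = base64.b64decode(payload["pixels_b64"])
    if len(raw) != 4 * n_values:               # float32 is 4 bytes per value
        errors.append(f"expected {4 * n_values} bytes, got {len(raw)}")
    if len(payload["band_names"]) != payload["n_bands"]:
        errors.append("band_names length must equal n_bands")
    wavelengths = payload.get("wavelengths")
    if wavelengths is not None and len(wavelengths) != payload["n_bands"]:
        errors.append("wavelengths length must equal n_bands")
    return errors


digital_numbers = [412, 655, 587, 2890, 1730, 980]        # made-up sensor DN
reflectance = [dn / 10_000 for dn in digital_numbers]      # scale to [0, 1]
payload = {
    "pixels_b64": base64.b64encode(struct.pack("<6f", *reflectance)).decode("ascii"),
    "n_time": 1,
    "n_bands": 6,
    "height": 1,
    "width": 1,
    "band_names": ["blue", "green", "red", "nir08", "swir16", "swir22"],
    "pooling": "mean",
}
print(satellite_payload_errors(payload))  # []
```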

SatelliteResponse

Bases: BaseResponse

Response contract for Earth observation foundation models.

  • embedding: scene-level float vector of length dim. For multi-temporal input the tokens from all time steps are pooled together into a single vector.
  • dim: embedding dimensionality.
  • n_time: number of time steps from the request (passed through for bookkeeping).

LiDAR / point cloud

sheaf.api.point_cloud

API contract for 3D point cloud models (PointNet, etc.).

PointCloudRequest

Bases: BaseRequest

Request contract for 3D point cloud processing.

Point clouds are passed as base64-encoded flat float32 byte arrays of shape (n_points, 3) containing XYZ coordinates. Points are expected to be pre-normalised to a unit sphere centred at the origin (subtract centroid, divide by max radius).
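
The pre-normalisation step (subtract centroid, divide by max radius) can be sketched in plain Python. The function name is hypothetical, and the sample points are arbitrary.

```python
import math


def normalize_to_unit_sphere(points: list[list[float]]) -> list[list[float]]:
    """Centre a point cloud at the origin and scale it to fit the unit sphere."""
    n = len(points)
    centroid = [sum(p[axis] for p in points) / n for axis in range(3)]
    centered = [[p[axis] - centroid[axis] for axis in range(3)] for p in points]
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centered)
    if radius == 0.0:                      # all points coincide
        return centered
    return [[c / radius for c in p] for p in centered]


points = [[0.0, 0.0, 0.0], [2.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 2.0]]
unit = normalize_to_unit_sphere(points)
# The farthest point now lies on the unit sphere.
farthest = max(math.sqrt(sum(c * c for c in p)) for p in unit)
```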

Attributes:

Name Type Description
points_b64 str

Base64-encoded flat float32 byte array, shape (n_points, 3). Decode with::

import base64, numpy as np
pts = np.frombuffer(
    base64.b64decode(points_b64), dtype=np.float32
).reshape(n_points, 3)

n_points int

Number of points in the cloud. Required to reshape the flat byte array. Typical values: 1024, 2048, 4096.

task Literal['embed', 'classify']

"embed" — return the 1024-dim PointNet global feature vector. "classify" — return class label + per-class softmax scores.

PointCloudResponse

Bases: BaseResponse

Response contract for 3D point cloud processing.

Either embedding or the classification fields (label, scores, label_names) are populated, depending on the requested task.

For task="embed":

  • embedding: 1024-dim global PointNet feature (L2-normalised).

For task="classify":

  • label: top predicted class name (e.g. "airplane").
  • scores: per-class softmax probabilities, parallel to label_names.
  • label_names: class names in score order (model's id2label mapping).