API contracts¶
Typed request / response schemas — one module per model type. Validation runs at the request boundary; backends receive validated objects.
Base + discriminated unions¶
sheaf.api.base ¶
Base request/response contracts and model type registry.
sheaf.api.union ¶
Discriminated unions over every supported request and response type.
Used by sheaf.server (FastAPI body parsing for /predict and /stream) and by
sheaf.batch.runner (per-row validation before Ray Data map_batches). Kept in
its own module so batch workloads can validate input rows without importing
sheaf.server, which pulls in the full Ray Serve deployment surface.
AnyResponse mirrors AnyRequest and is what sheaf.client.SheafClient decodes
predict() responses into so callers get the correctly-typed response object back.
Time series¶
sheaf.api.time_series ¶
API contract for time series foundation models (Chronos2, TimesFM, etc.).
FeatureRef ¶
Bases: BaseModel
Reference to a Feast online feature used as model input history.
The referenced feature must store the complete input sequence as a
list[float] (or list[list[float]] for multivariate history).
FeastResolver calls get_online_features with these parameters and
returns the resolved list as the history field for the backend.
Example::
    feature_ref=FeatureRef(
        feature_view="asset_prices",
        feature_name="close_history_30d",
        entity_key="ticker",
        entity_value="AAPL",
    )
TimeSeriesRequest ¶
Bases: BaseRequest
Request contract for time series foundation models.
Exactly one of history (raw values) or feature_ref (Feast entity reference)
must be provided. When feature_ref is given, the serving
layer resolves it via FeastResolver before passing the request to the
backend, so the backend always sees history populated.
- Univariate: history=[1.0, 2.0, 3.0, ...]
- Multivariate: history=[[1.0, 10.0], [2.0, 11.0], ...] (shape: [time, variates])

target_index selects which variate to forecast (default 0).
target_history property ¶
Univariate target series extracted from (possibly multivariate) history.
For univariate history, returns history as-is. For multivariate history (shape [time, variates]), extracts the variate at target_index.
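The extraction rule described for target_history can be sketched in plain Python. This is an illustration only, not the library's implementation; the function name extract_target is hypothetical.

```python
def extract_target(history, target_index=0):
    """Return the univariate target series from univariate or [time, variates] history.

    Hypothetical helper mirroring the documented behaviour of target_history.
    """
    if history and isinstance(history[0], (list, tuple)):
        # Multivariate: pick the chosen variate from every time step.
        return [row[target_index] for row in history]
    # Univariate: the history already is the target series.
    return list(history)


univariate = extract_target([1.0, 2.0, 3.0])
second_variate = extract_target([[1.0, 10.0], [2.0, 11.0]], target_index=1)
```

With the multivariate input above, target_index=1 selects the second column of each time step.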
TimeSeriesResponse ¶
Bases: BaseResponse
Response contract for time series foundation models.
Tabular¶
sheaf.api.tabular ¶
API contract for tabular foundation models (TabPFN, etc.).
TabularRequest ¶
Bases: BaseRequest
Request contract for tabular foundation models.
TabPFN is an in-context learner: context_X/context_y are the "training" examples passed at inference time. query_X contains the rows to predict. No separate training step — everything happens in a single forward pass.
Attributes:
| Name | Type | Description |
|---|---|---|
| context_X | list[list[float]] | Feature matrix for in-context examples, shape [n_context, n_features] |
| context_y | list[float \| int] | Labels for in-context examples, shape [n_context] |
| query_X | list[list[float]] | Feature rows to predict, shape [n_query, n_features] |
| task | Literal['classification', 'regression'] | "classification" or "regression" |
| feature_names | list[str] \| None | Optional column names, used for logging and debugging |
| categorical_feature_indices | list[int] \| None | Indices of categorical columns |
| output_mode | Literal['predictions', 'probabilities', 'quantiles'] | "predictions" for point predictions only, "probabilities" for classification probabilities, "quantiles" for regression quantile estimates |
| quantile_levels | list[float] | Quantile levels; only used when task=regression and output_mode=quantiles |
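The shape constraints listed for context_X, context_y, and query_X can be checked on the client before sending a request. The sketch below is a hypothetical pre-flight check (check_tabular_shapes is not part of sheaf); the real validation happens in the request schema.

```python
def check_tabular_shapes(context_X, context_y, query_X):
    """Return (n_context, n_query, n_features) or raise ValueError.

    Hypothetical client-side check of the documented shapes.
    """
    if len(context_X) != len(context_y):
        raise ValueError("context_X and context_y must have the same number of rows")
    # Every row in both matrices must have the same width (n_features).
    widths = {len(row) for row in context_X} | {len(row) for row in query_X}
    if len(widths) != 1:
        raise ValueError("all rows in context_X and query_X must share n_features")
    return len(context_X), len(query_X), widths.pop()


payload = {
    "context_X": [[0.0, 1.0], [1.0, 0.0]],
    "context_y": [0, 1],
    "query_X": [[0.5, 0.5]],
    "task": "classification",
    "output_mode": "probabilities",
}
shapes = check_tabular_shapes(payload["context_X"], payload["context_y"], payload["query_X"])
```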
TabularResponse ¶
Bases: BaseResponse
Response contract for tabular foundation models.
Audio (ASR + TTS + audio generation)¶
sheaf.api.audio ¶
API contract for audio foundation models (Whisper, Bark, etc.).
WordTimestamp ¶
Bases: BaseModel
Word-level timestamp from Whisper when word_timestamps=True.
AudioSegment ¶
Bases: BaseModel
A single transcription segment from Whisper.
AudioRequest ¶
Bases: BaseRequest
Request contract for audio transcription / translation.
Audio is passed as base64-encoded bytes. Any format that ffmpeg can decode is accepted (wav, mp3, m4a, ogg, flac, etc.).
Attributes:
| Name | Type | Description |
|---|---|---|
| audio_b64 | str | Base64-encoded audio file bytes. |
| language | str \| None | BCP-47 language code (e.g. "en", "fr") or None for auto-detection. English-only model variants (e.g. "tiny.en") ignore this field. |
| task | Literal['transcribe', 'translate'] | "transcribe" returns text in the source language. "translate" transcribes and translates to English. |
| word_timestamps | bool | If True, each segment includes word-level start/end times and per-word probabilities. |
| temperature | float | Sampling temperature for decoding. A tuple triggers fallback through successive values on failure. |
| initial_prompt | str \| None | Optional text prepended to the first window to condition the model (e.g. vocabulary hints, speaker context). |
| vad_filter | bool | Filter out silence before transcription using Silero VAD. Supported by faster-whisper; ignored by openai-whisper. |
| beam_size | int | Beam search width for decoding. Higher = more accurate, slower. Supported by faster-whisper; ignored by openai-whisper. |
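As a rough illustration of how audio_b64 is produced, the sketch below builds a short silent WAV in memory with the standard library and wraps it in a request body. The helper silent_wav is hypothetical, and only fields from the table above are set; any additional fields required by BaseRequest are not shown here.

```python
import base64
import io
import json
import wave


def silent_wav(seconds=0.1, rate=16000):
    """Build a mono 16-bit PCM WAV of silence in memory (illustrative input only)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as w:
        w.setnchannels(1)
        w.setsampwidth(2)  # 2 bytes per sample = 16-bit PCM
        w.setframerate(rate)
        w.writeframes(b"\x00\x00" * int(seconds * rate))
    return buf.getvalue()


wav_bytes = silent_wav()
payload = {
    # The whole file (header included) is base64-encoded, not just the samples.
    "audio_b64": base64.b64encode(wav_bytes).decode("ascii"),
    "language": None,  # None requests auto-detection
    "task": "transcribe",
    "word_timestamps": True,
}
body = json.dumps(payload)
```

In practice wav_bytes would come from reading an existing audio file in binary mode.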
AudioResponse ¶
Bases: BaseResponse
Response contract for audio transcription / translation.
TTSRequest ¶
Bases: BaseRequest
Request contract for text-to-speech synthesis.
Attributes:
| Name | Type | Description |
|---|---|---|
| text | str | Input text to synthesize. |
| voice_preset | str \| None | Optional speaker voice preset. Bark: "v2/en_speaker_6" etc. Kokoro: "af_heart", "af_bella", "am_adam", "bf_emma", "bm_george", etc. None uses the backend's default voice. |
| speed | float | Playback speed multiplier [0.5, 2.0]. Supported by Kokoro; ignored by Bark. Default 1.0 (normal speed). |
TTSResponse ¶
Bases: BaseResponse
Response contract for text-to-speech synthesis.
sheaf.api.audio_generation ¶
API contract for audio generation models (MusicGen, etc.).
AudioGenerationRequest ¶
Bases: BaseRequest
Request contract for text-conditioned audio/music generation.
Attributes:
| Name | Type | Description |
|---|---|---|
| prompt | str | Text description of the audio to generate (e.g. "happy jazz with piano and drums"). |
| duration_s | float | Target duration in seconds. Converted to max_new_tokens via model.config.audio_encoder.frame_rate (50 tokens/sec for MusicGen). |
| guidance_scale | float \| None | Classifier-free guidance scale. Higher values steer generation closer to the prompt at the cost of diversity. Typical range: 1.0–10.0. None disables CFG. |
| temperature | float | Sampling temperature. Higher values increase randomness. |
| top_k | int | Top-k sampling parameter: only the k highest-probability tokens are considered at each step. |
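The duration_s to max_new_tokens conversion is a multiplication by the audio encoder frame rate. A minimal sketch, assuming the result is rounded up (the rounding mode is an assumption, and the function name is hypothetical):

```python
import math


def duration_to_max_new_tokens(duration_s: float, frame_rate: int = 50) -> int:
    """Convert a target duration to a token budget.

    frame_rate defaults to 50 tokens/sec, the value the table quotes for MusicGen.
    Rounding up is an assumption made for this sketch.
    """
    return math.ceil(duration_s * frame_rate)


tokens_for_10s = duration_to_max_new_tokens(10.0)
```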
AudioGenerationResponse ¶
Bases: BaseResponse
Response contract for audio generation.
Vision¶
sheaf.api.embedding ¶
API contract for embedding / representation models (CLIP, DINOv2, etc.).
EmbeddingRequest ¶
Bases: BaseRequest
Request contract for embedding models.
Exactly one of texts or images_b64 must be provided per request.
Both fields accept a batch — pass multiple items to embed them in a single
forward pass.
Attributes:
| Name | Type | Description |
|---|---|---|
| texts | list[str] \| None | List of strings to embed (text modality). |
| images_b64 | list[str] \| None | List of base64-encoded image files to embed (vision modality). Any format PIL can open is accepted (JPEG, PNG, WebP, etc.). |
| normalize | bool | If True (default), L2-normalize the output embeddings so that cosine similarity equals dot product. |
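The claim that L2-normalized embeddings make cosine similarity equal to the dot product can be checked with a few lines of plain Python. The helper names below are hypothetical and the vectors are toy values.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def l2_normalize(vec):
    """Scale a vector to unit L2 norm (zero vectors are returned unchanged)."""
    norm = math.sqrt(dot(vec, vec))
    return [x / norm for x in vec] if norm > 0 else list(vec)


def cosine(a, b):
    return dot(a, b) / (math.sqrt(dot(a, a)) * math.sqrt(dot(b, b)))


a, b = [3.0, 4.0], [1.0, 0.0]
# After normalization the plain dot product is the cosine similarity.
dot_of_normalized = dot(l2_normalize(a), l2_normalize(b))
cosine_of_raw = cosine(a, b)
```

This is why callers that request normalize=True can rank results with a dot product alone.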
EmbeddingResponse ¶
Bases: BaseResponse
Response contract for embedding models.
sheaf.api.segmentation ¶
API contract for image segmentation models (SAM2, etc.).
SegmentationRequest ¶
Bases: BaseRequest
Request contract for prompted image segmentation.
Exactly one image is segmented per request. At least one prompt must be
provided — either point_coords (with matching point_labels) or
box, or both.
Attributes:
| Name | Type | Description |
|---|---|---|
| image_b64 | str | Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.). |
| point_coords | list[list[float]] \| None | List of [x, y] points (pixel coordinates). |
| point_labels | list[int] \| None | Foreground (1) / background (0) label for each point. Must have the same length as |
| box | list[float] \| None | Bounding-box prompt as [x1, y1, x2, y2] in pixel coordinates. |
| multimask_output | bool | If True (default), return three candidate masks ranked by score. Set to False to get a single best mask. |
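The prompt rules for SegmentationRequest (at least one of point_coords or box, with point_labels matching point_coords) can be sketched as a client-side check. check_prompts is a hypothetical helper, not the schema's own validator.

```python
def check_prompts(point_coords=None, point_labels=None, box=None):
    """Raise ValueError if the prompt combination breaks the documented rules."""
    if point_coords is None and box is None:
        raise ValueError("provide point_coords, box, or both")
    if point_coords is not None:
        # Every point needs exactly one foreground/background label.
        if point_labels is None or len(point_labels) != len(point_coords):
            raise ValueError("point_labels must match point_coords in length")
        if any(label not in (0, 1) for label in point_labels):
            raise ValueError("point_labels must be 0 (background) or 1 (foreground)")
    if box is not None and len(box) != 4:
        raise ValueError("box must be [x1, y1, x2, y2]")


check_prompts(point_coords=[[120.0, 80.0]], point_labels=[1])
check_prompts(box=[10.0, 10.0, 200.0, 150.0])
```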
SegmentationResponse ¶
Bases: BaseResponse
Response contract for image segmentation models.
Each mask is a base64-encoded flat uint8 byte array. To reconstruct::
    import base64, numpy as np
    mask = np.frombuffer(
        base64.b64decode(masks_b64[i]), dtype=np.uint8
    ).reshape(height, width).astype(bool)
sheaf.api.depth ¶
API contract for monocular depth estimation models (Depth Anything v2, etc.).
DepthRequest ¶
Bases: BaseRequest
Request contract for monocular depth estimation.
Attributes:
| Name | Type | Description |
|---|---|---|
| image_b64 | str | Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.). |
| normalize | bool | If True (default), the depth map is linearly rescaled to [0, 1] where 0 = nearest and 1 = furthest point in the scene. If False, raw relative depth values from the model are returned. |
DepthResponse ¶
Bases: BaseResponse
Response contract for monocular depth estimation.
The depth map is a base64-encoded flat float32 byte array at the model's native output resolution. To reconstruct::
    import base64, numpy as np
    depth = np.frombuffer(
        base64.b64decode(depth_b64), dtype=np.float32
    ).reshape(height, width)
If normalize=True was requested, values are in [0, 1].
min_depth and max_depth are the raw (pre-normalization) bounds,
useful for recovering metric-relative scale.
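To illustrate how min_depth and max_depth relate to a normalized map, the sketch below assumes the linear rescaling is a plain min-max transform, which is what the description suggests but does not spell out. Both function names are hypothetical.

```python
def normalize_depth(raw):
    """Min-max rescale raw depth values to [0, 1]; also return the raw bounds."""
    lo, hi = min(raw), max(raw)
    span = hi - lo
    norm = [(d - lo) / span if span else 0.0 for d in raw]
    return norm, lo, hi


def denormalize_depth(norm, min_depth, max_depth):
    """Invert the min-max rescale using the bounds reported in the response."""
    return [min_depth + v * (max_depth - min_depth) for v in norm]


raw = [2.0, 4.0, 6.0]
norm, min_depth, max_depth = normalize_depth(raw)
recovered = denormalize_depth(norm, min_depth, max_depth)
```

Under this assumption a caller holding a normalized map and the two bounds can recover the model's raw relative depth values.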
sheaf.api.detection ¶
API contract for object detection models (DETR, RT-DETR, etc.).
DetectionRequest ¶
Bases: BaseRequest
Request contract for object detection.
Attributes:
| Name | Type | Description |
|---|---|---|
| image_b64 | str | Base64-encoded image file. Any format PIL can open is accepted (JPEG, PNG, WebP, etc.). |
| threshold | float | Minimum confidence score for a detection to be included in the response. Defaults to 0.5. |
DetectionResponse ¶
Bases: BaseResponse
Response contract for object detection.
Boxes are in absolute pixel coordinates: [x_min, y_min, x_max, y_max].
Lists are parallel — boxes[i], scores[i], and labels[i] all
describe the same detection, sorted by descending confidence score.
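The parallel-list layout and the threshold rule can be illustrated with a small pure-Python sketch. It mimics the documented behaviour (keep detections at or above threshold, sort by descending score); filter_detections is a hypothetical name and the data is made up.

```python
def filter_detections(boxes, scores, labels, threshold=0.5):
    """Keep detections scoring at least `threshold`, sorted by descending score."""
    kept = [(b, s, l) for b, s, l in zip(boxes, scores, labels) if s >= threshold]
    kept.sort(key=lambda item: item[1], reverse=True)
    if not kept:
        return [], [], []
    # Unzip back into three parallel lists.
    out_boxes, out_scores, out_labels = zip(*kept)
    return list(out_boxes), list(out_scores), list(out_labels)


boxes = [[0, 0, 10, 10], [5, 5, 20, 20], [8, 8, 30, 30]]
scores = [0.4, 0.9, 0.6]
labels = ["cat", "dog", "bird"]
kept_boxes, kept_scores, kept_labels = filter_detections(boxes, scores, labels)
```

Index i in each returned list still refers to the same detection, which is the invariant the response relies on.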
sheaf.api.pose ¶
API contract for pose estimation models (ViTPose, etc.).
PoseRequest ¶
Bases: BaseRequest
Request contract for human pose estimation.
ViTPose is a top-down model: it estimates keypoints within person crops.
If bboxes is provided, each box is used as a person crop. If omitted,
the full image is treated as a single-person crop.
Attributes:
| Name | Type | Description |
|---|---|---|
| image_b64 | str | Base64-encoded image (JPEG, PNG, or any PIL-readable format). |
| bboxes | list[list[float]] \| None | Optional list of person bounding boxes in pixel coordinates, each |
| threshold | float | Keypoint confidence threshold. Keypoints scoring below it are still returned and can be recognized by their low score; filtering is left to the caller. |
PoseResponse ¶
Bases: BaseResponse
Response contract for human pose estimation.
poses[i][j] is [x, y, score] for the j-th keypoint of the i-th
detected person, in absolute pixel coordinates. keypoint_names[j]
gives the semantic label for keypoint j (e.g. "nose", "left_eye").
Decode example::
    for person in resp.poses:
        for (x, y, score), name in zip(person, resp.keypoint_names):
            print(f"{name}: ({x:.1f}, {y:.1f}) conf={score:.2f}")
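Because low-confidence keypoints are still returned, callers usually filter them. A minimal caller-side sketch, using the poses[i][j] = [x, y, score] layout described above (the function name and sample values are hypothetical):

```python
def confident_keypoints(poses, keypoint_names, threshold):
    """Return one {name: (x, y)} dict per person, dropping low-score keypoints."""
    return [
        {
            name: (x, y)
            for (x, y, score), name in zip(person, keypoint_names)
            if score >= threshold
        }
        for person in poses
    ]


poses = [[[10.0, 20.0, 0.9], [30.0, 40.0, 0.2]]]
keypoint_names = ["nose", "left_eye"]
filtered = confident_keypoints(poses, keypoint_names, threshold=0.5)
```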
sheaf.api.optical_flow ¶
API contract for optical flow models (RAFT, UniMatch, etc.).
OpticalFlowRequest ¶
Bases: BaseRequest
Request contract for optical flow estimation.
Accepts two consecutive video frames and returns the dense per-pixel displacement field between them.
Attributes:
| Name | Type | Description |
|---|---|---|
| frame1_b64 | str | Base64-encoded first frame (JPEG, PNG, or any PIL-readable format). Both frames must have the same spatial dimensions. |
| frame2_b64 | str | Base64-encoded second frame. |
OpticalFlowResponse ¶
Bases: BaseResponse
Response contract for optical flow estimation.
flow_b64 is a base64-encoded flat float32 byte array of shape
(height, width, 2), where the last dimension is (dx, dy) —
the horizontal and vertical pixel displacement from frame1 to frame2.
Decode example::
    import base64, numpy as np
    flow = np.frombuffer(
        base64.b64decode(flow_b64), dtype=np.float32
    ).reshape(height, width, 2)
    dx, dy = flow[..., 0], flow[..., 1]
sheaf.api.video ¶
API contract for video understanding models (VideoMAE, TimeSformer, etc.).
VideoRequest ¶
Bases: BaseRequest
Request contract for video understanding models.
Frames are passed as a list of base64-encoded images (JPEG or PNG). The number of frames expected depends on the model:
- VideoMAE-base: 16 frames (default, tubelet_size=2, 224×224)
- TimeSformer: 8 frames (224×224)
Pass exactly the number of frames the model was pretrained on; otherwise the processor pads or truncates the clip automatically.
Attributes:
| Name | Type | Description |
|---|---|---|
| frames_b64 | list[str] | Ordered list of base64-encoded video frames. |
| task | Literal['embedding', 'classification'] | "embedding" returns a single fixed-size vector per video clip; "classification" returns class labels and softmax scores. |
| pooling | Literal['cls', 'mean'] | Pooling strategy for embeddings. "cls": CLS token at position 0 of last_hidden_state (default). "mean": mean of all non-CLS patch tokens. |
| normalize | bool | If True (default), L2-normalize the output embedding. Ignored for classification. |
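The two pooling strategies differ only in which rows of last_hidden_state are reduced. The sketch below uses nested lists in place of tensors and follows the table's convention that the CLS token sits at position 0; pool_tokens is a hypothetical name.

```python
def pool_tokens(last_hidden_state, pooling="cls"):
    """Reduce a [n_tokens, dim] list of token vectors to a single vector.

    "cls"  -> the token at position 0.
    "mean" -> element-wise mean over all tokens except position 0.
    """
    if pooling == "cls":
        return list(last_hidden_state[0])
    if pooling != "mean":
        raise ValueError("pooling must be 'cls' or 'mean'")
    patches = last_hidden_state[1:]
    dim = len(patches[0])
    return [sum(tok[i] for tok in patches) / len(patches) for i in range(dim)]


tokens = [[9.0, 9.0], [1.0, 2.0], [3.0, 4.0]]  # CLS token followed by two patch tokens
cls_vec = pool_tokens(tokens, "cls")
mean_vec = pool_tokens(tokens, "mean")
```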
VideoResponse ¶
Bases: BaseResponse
Response contract for video understanding models.
For task="embedding": embedding and dim are populated.
For task="classification": labels and scores are populated.
Diffusion / multimodal generation¶
sheaf.api.diffusion ¶
API contract for diffusion image generation models (FLUX, etc.).
DiffusionRequest ¶
Bases: BaseRequest
Request contract for text-to-image diffusion models.
Attributes:
| Name | Type | Description |
|---|---|---|
| prompt | str | Text description of the image to generate. |
| negative_prompt | str | Text description of what to avoid. Not supported by all models (FLUX.1-schnell ignores it). |
| height | int | Output image height in pixels. Must be a multiple of 8. Defaults to 1024. |
| width | int | Output image width in pixels. Must be a multiple of 8. Defaults to 1024. |
| num_inference_steps | int | Number of denoising steps. FLUX.1-schnell is optimized for 1–4 steps; FLUX.1-dev typically uses 20–50. |
| guidance_scale | float | Classifier-free guidance scale. Higher values steer generation closer to the prompt. FLUX.1-schnell uses 0.0 (guidance-distilled); FLUX.1-dev typically uses 3.5–7.0. |
| seed | int \| None | Random seed for reproducibility. None = random. |
| adapters | list[str] | Names of LoRA adapters to apply, in order of application. Each name must be registered on the deployment's |
| adapter_weights | list[float] \| None | Per-adapter weights, parallel to |
DiffusionResponse ¶
Bases: BaseResponse
Response contract for text-to-image diffusion models.
The generated image is returned as a base64-encoded PNG. To decode::
    import base64
    from PIL import Image
    import io
    img = Image.open(io.BytesIO(base64.b64decode(image_b64)))
Attributes:
| Name | Type | Description |
|---|---|---|
| image_b64 | str | Base64-encoded PNG image. |
| height | int | Output image height in pixels. |
| width | int | Output image width in pixels. |
| seed | int | Seed actually used for generation (useful when the request seed was None and you want to reproduce the result). |
sheaf.api.multimodal_generation ¶
API contract for text+image-conditioned generation models (SDXL, etc.).
MultimodalGenerationRequest ¶
Bases: BaseRequest
Request contract for text+image-conditioned image generation.
Distinct from pure text-to-image (DiffusionRequest/FLUX): the input image
conditions the generation. When mask_b64 is omitted the backend runs
img2img (style/content transfer); when provided it runs inpainting.
Attributes:
| Name | Type | Description |
|---|---|---|
| prompt | str | Text description guiding the generated image. |
| image_b64 | str | Base64-encoded input image (JPEG, PNG, or any PIL-readable format). Acts as the conditioning source for img2img / inpainting. |
| mask_b64 | str \| None | Optional base64-encoded mask image (same spatial size as |
| strength | float | How much to transform the input image. 0.0 = no change, 1.0 = ignore the original image entirely. Default 0.8. |
| num_inference_steps | int | Total denoising steps. Actual steps run = |
| guidance_scale | float | Classifier-free guidance scale. Higher values steer generation closer to the prompt. Default 7.5. |
| negative_prompt | str | Text description of what to avoid in the output. |
| seed | int \| None | Random seed for reproducibility. None = random. |
| adapters | list[str] | Names of LoRA adapters to apply, in order. Each name must be registered on the deployment's |
| adapter_weights | list[float] \| None | Per-adapter weights, parallel to |
MultimodalGenerationResponse ¶
Bases: BaseResponse
Response contract for text+image-conditioned image generation.
The generated image is returned as a base64-encoded PNG. To decode::
    import base64, io
    from PIL import Image
    img = Image.open(io.BytesIO(base64.b64decode(image_b64)))
Cross-modal embedding¶
sheaf.api.multimodal_embedding ¶
API contract for cross-modal embedding models (ImageBind, etc.).
MultimodalEmbeddingRequest ¶
Bases: BaseRequest
Request contract for cross-modal embedding models (e.g. ImageBind).
Exactly one modality field must be set per request. All items in the chosen field are embedded in a single forward pass and returned in the shared embedding space.
Modalities
- texts: List of strings (text modality).
- images_b64: List of base64-encoded image files (vision modality).
- audios_b64: List of base64-encoded audio files (audio modality).
- depth_images_b64: List of base64-encoded depth images (depth modality).
- thermal_images_b64: List of base64-encoded thermal images (thermal modality).
For image/audio inputs any format the underlying model accepts is valid (JPEG/PNG for vision; WAV/MP3 for audio). The backend writes temporary files as needed — the model loaders read paths, not raw bytes.
Attributes:
| Name | Type | Description |
|---|---|---|
| normalize | bool | If True (default), L2-normalize output embeddings so that cosine similarity equals dot product. |
MultimodalEmbeddingResponse ¶
Bases: BaseResponse
Response contract for cross-modal embedding models.
Molecular / genomics / materials¶
sheaf.api.molecular ¶
API contract for molecular / protein language models (ESM-3, etc.).
MolecularRequest ¶
Bases: BaseRequest
Request contract for protein sequence embedding.
A single request embeds a batch of protein sequences. Sequences should use standard single-letter amino acid codes (ACDEFGHIKLMNPQRSTVWY plus ambiguity codes accepted by ESM tokenizers).
Attributes:
| Name | Type | Description |
|---|---|---|
| sequences | list[str] | List of amino acid sequences to embed. |
| pooling | Literal['mean', 'cls'] | How to reduce the per-residue hidden states to a single vector per sequence. |
| normalize | bool | If True (default), L2-normalize each embedding so that cosine similarity equals dot product. |
MolecularResponse ¶
Bases: BaseResponse
Response contract for protein sequence embedding.
sheaf.api.genomic ¶
API contract for DNA/genomic foundation models (Nucleotide Transformer, etc.).
GenomicRequest ¶
Bases: BaseRequest
Request contract for DNA/RNA sequence embedding.
A single request embeds a batch of nucleotide sequences. Sequences should use standard nucleotide codes (A, C, G, T for DNA; A, C, G, U for RNA; N for unknown bases). All are accepted by Nucleotide Transformer tokenizers.
Attributes:
| Name | Type | Description |
|---|---|---|
| sequences | list[str] | List of nucleotide sequences to embed. |
| pooling | Literal['mean', 'cls'] | How to reduce per-token hidden states to a single vector. |
| normalize | bool | If True (default), L2-normalize each embedding so that cosine similarity equals dot product. |
GenomicResponse ¶
Bases: BaseResponse
Response contract for DNA/RNA sequence embedding.
sheaf.api.small_molecule ¶
API contract for small molecule / chemical foundation models (MolFormer, etc.).
SmallMoleculeRequest ¶
Bases: BaseRequest
Request contract for small molecule embedding.
A single request embeds a batch of chemical compounds given as SMILES strings. SMILES (Simplified Molecular-Input Line-Entry System) is the standard text representation of molecular structure.
Attributes:
| Name | Type | Description |
|---|---|---|
| smiles | list[str] | List of SMILES strings to embed. Each string represents one molecule (e.g. "CC(=O)OC1=CC=CC=C1C(=O)O" for aspirin). |
| pooling | Literal['mean', 'cls'] | How to reduce per-token hidden states to a fixed-size vector. |
| normalize | bool | If True, L2-normalize each embedding (cosine similarity == dot product). Defaults to |
SmallMoleculeResponse ¶
Bases: BaseResponse
Response contract for small molecule embedding.
sheaf.api.materials ¶
API contract for materials / interatomic potential models (MACE-MP, etc.).
MaterialsRequest ¶
Bases: BaseRequest
Request contract for atomistic energy/force/stress prediction.
Describes a single atomic structure: a set of atoms at given positions, optionally in a periodic simulation cell. The model predicts the potential energy surface and its derivatives.
Attributes:
| Name | Type | Description |
|---|---|---|
| atomic_numbers | list[int] | Atomic numbers (Z) for each atom. Length N. |
| positions_b64 | str | Base64-encoded float32 array of shape (N, 3) giving Cartesian coordinates in Angstroms. |
| cell | list[list[float]] \| None | 3x3 lattice vectors in Angstroms for periodic boundary conditions. Required when |
| pbc | bool \| list[bool] | Periodic boundary conditions. |
| compute_forces | bool | If True (default), return forces in eV/Å. |
| compute_stress | bool | If True, return the stress tensor in eV/ų (Voigt). Only meaningful for periodic systems ( |
MaterialsResponse ¶
Bases: BaseResponse
Response contract for atomistic energy/force/stress prediction.
Earth / weather¶
sheaf.api.weather ¶
API contract for weather / atmospheric-state foundation models.
Supports GraphCast, Aurora, Pangu-Weather, and similar architectures.
Encoding convention¶
All array fields are base64-encoded little-endian float32 byte strings.
- Surface variable shape: (n_lat, n_lon)
- Atmospheric var shape: (n_levels, n_lat, n_lon)

    Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()
    Decode: np.frombuffer(base64.b64decode(s), dtype=np.float32)
                .reshape(n_lat, n_lon)             # surface
                .reshape(n_levels, n_lat, n_lon)   # atmospheric
Grid conventions (GraphCast / ERA5)¶
- lat: descending, e.g. [90.0, 89.75, …, -90.0] for 0.25° global
- lon: ascending, e.g. [0.0, 0.25, …, 359.75]
- pressure_levels: descending hPa, e.g. [1000, 925, 850, …, 1]
- current_time: ISO-8601 string, e.g. "2023-01-01T06:00:00"
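GraphCast requires two consecutive time steps (t-6h and t) as input, so both the current fields (surface_vars, atmospheric_vars) and the previous-step fields (prev_surface_vars, prev_atmospheric_vars) are required, and each pair must contain the same variable names. n_steps controls how many 6-hour steps are predicted autoregressively.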
WeatherRequest ¶
Bases: BaseRequest
Request contract for atmospheric-state foundation models.
Attributes:
| Name | Type | Description |
|---|---|---|
| surface_vars | dict[str, str] | Surface variable fields at time t. Keys are variable names (ERA5-style for GraphCast, e.g. "2m_temperature", "10m_u_component_of_wind"). Values are base64 float32 arrays of shape (n_lat, n_lon). |
| atmospheric_vars | dict[str, str] | Atmospheric (pressure-level) fields at time t. Values are base64 float32 arrays of shape (n_levels, n_lat, n_lon). |
| prev_surface_vars | dict[str, str] | Same variables at time t - step_hours (t-6h for GraphCast). |
| prev_atmospheric_vars | dict[str, str] | Same variables at time t - step_hours. |
| lat | list[float] | Latitude grid, length n_lat, descending degrees. |
| lon | list[float] | Longitude grid, length n_lon, ascending degrees. |
| pressure_levels | list[int] | Pressure levels in hPa, length n_levels, descending. |
| current_time | str | ISO-8601 timestamp for the current state (t). |
| n_steps | int | Number of autoregressive 6-hour steps to predict. |
WeatherResponse ¶
Bases: BaseResponse
Response contract for atmospheric-state foundation models.
- surface_forecasts[i]: dict of {var_name: base64_float32} for step i+1. Each array has shape (n_lat, n_lon).
- atmospheric_forecasts[i]: same for atmospheric (pressure-level) variables. Each array has shape (n_levels, n_lat, n_lon).
- forecast_times[i]: ISO-8601 timestamp for step i+1.
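The relationship between current_time, n_steps, and forecast_times can be sketched with the standard library, assuming each autoregressive step advances the clock by 6 hours as described for GraphCast. The function name expected_forecast_times is hypothetical.

```python
from datetime import datetime, timedelta


def expected_forecast_times(current_time: str, n_steps: int, step_hours: int = 6):
    """Return the ISO-8601 timestamp of step i+1 for i in range(n_steps)."""
    t0 = datetime.fromisoformat(current_time)
    return [
        (t0 + timedelta(hours=step_hours * (i + 1))).isoformat()
        for i in range(n_steps)
    ]


times = expected_forecast_times("2023-01-01T06:00:00", n_steps=3)
```

A client could compare this list with the forecast_times field of a response as a sanity check on step alignment.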
sheaf.api.satellite ¶
API contract for Earth observation / satellite imagery foundation models.
Supports Prithvi (IBM/NASA), Clay, SatMAE, and similar architectures.
Encoding convention¶
pixels_b64 is a base64-encoded little-endian float32 byte string.
Shape: (n_time, n_bands, height, width)
For single-time input set n_time=1. Values are typically surface reflectance in [0, 1] (after dividing sensor DN by 10 000 for Landsat/Sentinel-2) or raw DN if normalize=False.
    Encode: base64.b64encode(arr.astype(np.float32).tobytes()).decode()
    Decode: np.frombuffer(base64.b64decode(s), dtype=np.float32)
                .reshape(n_time, n_bands, height, width)
Band names (examples)¶
HLS (Harmonized Landsat-Sentinel) 6-band subset used by Prithvi: ["blue", "green", "red", "nir08", "swir16", "swir22"]
Sentinel-2 L2A 11-band (used by Clay): ["coastal", "blue", "green", "red", "rededge1", "rededge2", "rededge3", "nir08", "nir09", "swir16", "swir22"]
Wavelengths (μm) for Clay (examples, Sentinel-2): [0.443, 0.490, 0.560, 0.665, 0.704, 0.740, 0.783, 0.842, 0.865, 1.610, 2.190]
SatelliteRequest ¶
Bases: BaseRequest
Request contract for Earth observation foundation models.
Attributes:
| Name | Type | Description |
|---|---|---|
| pixels_b64 | str | Base64 float32 pixel array of shape (n_time, n_bands, height, width). |
| n_time | int | Number of time steps in the input stack. |
| n_bands | int | Number of spectral bands. |
| height | int | Spatial height in pixels. |
| width | int | Spatial width in pixels. |
| band_names | list[str] | Human-readable band labels, length n_bands. |
| wavelengths | list[float] \| None | Center wavelengths in micrometers, length n_bands. Required by Clay; ignored by Prithvi. |
| gsd | float | Ground sample distance in metres (default 10 m for Sentinel-2). |
| lat | float \| None | Centre latitude in degrees (optional metadata). |
| lon | float \| None | Centre longitude in degrees (optional metadata). |
| timestamps | list[str] \| None | ISO-8601 timestamp per time step (optional metadata). |
| pooling | Literal['mean', 'cls'] | "mean" pools all output tokens; "cls" uses the first token (CLS or register token). |
| normalize | bool | Apply the model's per-band mean/std normalization using statistics stored in the image processor config. Disable if you have already normalized your data. |
SatelliteResponse ¶
Bases: BaseResponse
Response contract for Earth observation foundation models.
- embedding: scene-level float vector of length dim. For multi-temporal input the tokens from all time steps are pooled together into a single vector.
- dim: embedding dimensionality.
- n_time: number of time steps from the request (passed through for bookkeeping).
LiDAR / point cloud¶
sheaf.api.point_cloud ¶
API contract for 3D point cloud models (PointNet, etc.).
PointCloudRequest ¶
Bases: BaseRequest
Request contract for 3D point cloud processing.
Point clouds are passed as base64-encoded flat float32 byte arrays of shape
(n_points, 3) containing XYZ coordinates. Points are expected to be
pre-normalised to a unit sphere centred at the origin (subtract centroid,
divide by max radius).
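The pre-normalisation and encoding steps described above can be sketched with the standard library alone. Both helper names are hypothetical, and little-endian byte order is an assumption carried over from the other float32 conventions in this reference.

```python
import base64
import math
import struct


def normalize_to_unit_sphere(points):
    """Subtract the centroid, then divide by the largest distance from the origin."""
    n = len(points)
    centroid = [sum(p[i] for p in points) / n for i in range(3)]
    centred = [[p[i] - centroid[i] for i in range(3)] for p in points]
    radius = max(math.sqrt(sum(c * c for c in p)) for p in centred)
    return [[c / radius for c in p] for p in centred] if radius else centred


def encode_points(points):
    """Pack XYZ rows as flat little-endian float32 and base64-encode them."""
    flat = [c for p in points for c in p]
    return base64.b64encode(struct.pack(f"<{len(flat)}f", *flat)).decode("ascii")


cloud = normalize_to_unit_sphere([[0, 0, 0], [4, 0, 0]])
payload = {"points_b64": encode_points(cloud), "n_points": len(cloud), "task": "embed"}
```

With NumPy available, the same encoding is usually written as a single astype/tobytes call, as in the other decode examples on this page.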
Attributes:
| Name | Type | Description |
|---|---|---|
| points_b64 | str | Base64-encoded flat float32 byte array, shape |
| n_points | int | Number of points in the cloud. Required to reshape the flat byte array. Typical values: 1024, 2048, 4096. |
| task | Literal['embed', 'classify'] | "embed": return the 1024-dim PointNet global feature vector. "classify": return class label + per-class softmax scores. |
PointCloudResponse ¶
Bases: BaseResponse
Response contract for 3D point cloud processing.
Either embedding or the classification fields (label, scores, label_names) are populated, depending on the requested task.
For task="embed":
- embedding: 1024-dim global PointNet feature (L2-normalised).

For task="classify":
- label: top predicted class name (e.g. "airplane").
- scores: per-class softmax probabilities, parallel to label_names.
- label_names: class names in score order (model's id2label mapping).