LoRA
sheaf.lora
LoRA adapter configuration for Sheaf deployments.
Each ModelSpec may declare a set of named LoRA adapters via
ModelSpec.lora = LoRAConfig(...). Adapters are loaded once at deploy time
and selected per-request via the adapters / adapter_weights fields on
the request.
Adapter sources
LoRAAdapter.source accepts two forms:
- A local filesystem path: /models/loras/sketch.safetensors or a directory containing the adapter weights file.
- A HuggingFace Hub reference: hf:org/repo or hf:org/repo:weight_file.safetensors when the repo contains multiple adapter weight files.
Backends parse hf: themselves; the spec only validates that the source
string is non-empty.
Bucketing
pipeline.set_adapters(...) (Diffusers / PEFT) sets process-global state on
the pipeline, so the active adapter set is shared by every request the
pipeline serves. Two concurrent requests inside the same Ray Serve batch
window that select different adapters would race. BatchPolicy.bucket_by_adapter
sub-batches by the (adapters, weights) tuple so each set_active_adapters
call inside a batch window applies to a homogeneous sub-batch.
LoRAAdapter
Bases: BaseModel
A single named LoRA adapter declared on a ModelSpec.
Attributes:
| Name | Type | Description |
|---|---|---|
| source | str | Local path or hf: Hub reference (see Adapter sources). |
| weight | float | Default weight applied when this adapter is selected without an explicit weight in the request's adapter_weights. |
LoRAConfig
Bases: BaseModel
Adapter registry for a single deployment.
Attributes:
| Name | Type | Description |
|---|---|---|
| adapters | dict[str, LoRAAdapter] | Mapping of adapter name → LoRAAdapter. |
| default | str \| None | Optional name from adapters, applied when a request selects no adapters. |
Raises:
| Type | Description |
|---|---|
| ValueError | If |
resolve_active_adapters
resolve_active_adapters(request: Any, lora: LoRAConfig | None) -> tuple[list[str], list[float]]
Resolve the (names, weights) to activate for request.
Resolution rules
- When lora is None: returns ([], []), so no LoRA is applied. The caller is responsible for raising if the request itself specified adapters but the deployment has none configured.
- When request.adapters is non-empty: those names are used. If request.adapter_weights is set, those weights win; otherwise each name's per-adapter weight from lora.adapters is used.
- When request.adapters is empty and lora.default is set: the default adapter (with its configured weight) is used.
- Otherwise: ([], []).
Raises:
| Type | Description |
|---|---|
| ValueError | If a name in |
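The resolution rules above can be sketched as follows. This is a minimal illustration, not Sheaf's implementation: Adapter, Config, and Request are hypothetical dataclass stand-ins for LoRAAdapter, LoRAConfig, and the request object, and rejecting unknown adapter names with ValueError is an assumption about the truncated Raises entry.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class Adapter:  # hypothetical stand-in for LoRAAdapter
    source: str
    weight: float = 1.0


@dataclass
class Config:  # hypothetical stand-in for LoRAConfig
    adapters: dict
    default: Optional[str] = None


@dataclass
class Request:  # hypothetical request shape with the two adapter fields
    adapters: list = field(default_factory=list)
    adapter_weights: Optional[list] = None


def resolve(request, lora):
    """Return (names, weights) following the four resolution rules."""
    # Rule 1: no LoRA configured on the deployment.
    if lora is None:
        return [], []
    names = list(request.adapters or [])
    # Rule 2: the request names adapters explicitly.
    if names:
        unknown = [n for n in names if n not in lora.adapters]
        if unknown:
            # Assumption: undeclared names are rejected.
            raise ValueError(f"unknown adapters: {unknown}")
        if request.adapter_weights is not None:
            weights = list(request.adapter_weights)  # request weights win
        else:
            weights = [lora.adapters[n].weight for n in names]
        return names, weights
    # Rule 3: fall back to the configured default adapter.
    if lora.default is not None:
        return [lora.default], [lora.adapters[lora.default].weight]
    # Rule 4: nothing selected.
    return [], []


cfg = Config(
    adapters={
        "sketch": Adapter("/models/loras/sketch.safetensors", weight=0.8),
        "ink": Adapter("hf:org/repo"),
    },
    default="sketch",
)
default_selection = resolve(Request(), cfg)
explicit_selection = resolve(Request(adapters=["ink"]), cfg)
```

The sketch does not check that adapter_weights has the same length as adapters; whether the real function validates that is not stated here.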
bucket_with_adapter_resolution
bucket_with_adapter_resolution(requests: list[Any], lora: LoRAConfig | None) -> list[tuple[list[int], list[Any], list[str], list[float]]]
Group requests by their resolved (names, weights) adapter selection.
Returns a list of (indices, sub_requests, active_names, active_weights)
tuples — one per unique resolved selection, in first-seen order. Within
each group, requests are listed in original arrival order. active_names
and active_weights are the resolved adapter set to activate before
dispatching sub_requests to the backend.
Two requests collide on the same key only when they resolve to the
same adapter set in the same order with the same weights, which
is the homogeneity guarantee set_active_adapters requires.
Raises:
| Type | Description |
|---|---|
| ValueError | Propagated from resolve_active_adapters. |
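The grouping behaviour described above can be sketched with an insertion-ordered dict keyed by the resolved selection. This is an illustration under assumptions: bucket_by_selection is a hypothetical name, and it takes a resolve callable (request -> (names, weights)) instead of the lora argument the real function accepts, so the example stays self-contained.

```python
def bucket_by_selection(requests, resolve):
    """Group requests by resolved (names, weights), in first-seen order.

    `resolve` is a hypothetical callable: request -> (names, weights).
    Returns (indices, sub_requests, active_names, active_weights) tuples.
    """
    groups = {}  # dicts keep insertion order, which gives first-seen order
    for index, request in enumerate(requests):
        names, weights = resolve(request)
        # Tuples are hashable and order-sensitive, so ["a", "b"] and
        # ["b", "a"] land in different buckets.
        key = (tuple(names), tuple(weights))
        if key not in groups:
            groups[key] = ([], [], list(names), list(weights))
        groups[key][0].append(index)    # original positions
        groups[key][1].append(request)  # arrival order within the bucket
    return list(groups.values())


# Hypothetical requests as plain dicts; every adapter gets weight 1.0.
reqs = [
    {"adapters": ["a"]},
    {"adapters": ["b"]},
    {"adapters": ["a"]},
    {"adapters": ["a", "b"]},
    {"adapters": ["b", "a"]},
]
buckets = bucket_by_selection(
    reqs, lambda r: (r["adapters"], [1.0] * len(r["adapters"]))
)
```

Returning the original indices alongside each sub-batch lets the caller write results back into arrival order after the sub-batches have been dispatched.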
parse_source
Parse a LoRAAdapter source into diffusers load arguments.
Returns (path_or_repo, weight_name). weight_name is None for
local paths and for HF references that don't pin a specific weight file.
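A minimal sketch of this parsing, assuming the hf: prefix is stripped and the remainder is split at its first colon. The name parse_source_sketch is hypothetical, and the real function may handle edge cases differently.

```python
def parse_source_sketch(source):
    """Split an adapter source into (path_or_repo, weight_name)."""
    prefix = "hf:"
    if source.startswith(prefix):
        reference = source[len(prefix):]
        # partition splits at the first ":" only; weight_name is "" when
        # the reference does not pin a weight file.
        repo, _, weight_name = reference.partition(":")
        return repo, (weight_name or None)
    # Anything without the hf: prefix is treated as a local path.
    return source, None


pinned = parse_source_sketch("hf:org/repo:weight.safetensors")
local = parse_source_sketch("/abs/path/file.safetensors")
```

Only the hf: prefix triggers the split, so colons inside a local path are left untouched.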
Forms
"hf:org/repo" → ("org/repo", None)
"hf:org/repo:weight.safetensors" → ("org/repo", "weight.safetensors")
"/abs/path/file.safetensors" → ("/abs/path/file.safetensors", None)
"./relative/path" → ("./relative/path", None)