Observation and action schema
The JSON shapes flowing between your robot and inference.
Every Reflex inference path — whether you use the ActionStream, the
@connect decorator, or a shell observe/act subprocess wired into
reflex connect — exchanges the same two payloads: observations (you to the
server) and action chunks (server to you).
Observation
An observation describes one snapshot of the robot's world.
{
  "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
  "images": {
    "wrist": "<base64-encoded JPEG>",
    "overhead": "<base64-encoded JPEG>"
  },
  "prompt": "pick up the red block",
  "seq": 17,
  "capture_time_ns": 1737830400000000000
}
| Field | Type | Required | Description |
|---|---|---|---|
| state | list[float] | yes | Robot proprioception vector. Length depends on the embodiment. |
| images | dict[str, ...] | no | Map of camera name to image. Values can be base64 JPEG strings or raw bytes; the SDK normalizes for you. |
| prompt | string | no | Per-step override of the task prompt. Falls back to the session prompt. |
| seq | int | no | Caller-supplied sequence number. The SDK fills one in if you omit it. |
| capture_time_ns | int | no | Wallclock nanoseconds of when the observation was captured. Useful for latency tracing. |
| request_id | string | no | Idempotency key for the inference request. |
| max_gpu_seconds | float | no | Per-request GPU time cap. Overrides the session default. |
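If you assemble the payload yourself rather than through the SDK, the fields above can be put together with the standard library alone. The sketch below is only an illustration: `build_observation` is a hypothetical helper, not part of the Reflex SDK, and the placeholder bytes stand in for a real JPEG frame.

```python
import base64
import json
import time


def build_observation(state, jpeg_frames=None, prompt=None, seq=None):
    """Hypothetical helper: assemble an observation dict from the documented fields."""
    obs = {"state": [float(x) for x in state]}
    if jpeg_frames:
        # JSON cannot carry raw bytes, so each JPEG is base64-encoded to ASCII text.
        obs["images"] = {
            name: base64.b64encode(data).decode("ascii")
            for name, data in jpeg_frames.items()
        }
    if prompt is not None:
        obs["prompt"] = prompt
    if seq is not None:
        obs["seq"] = seq
    # Wall-clock capture time in nanoseconds.
    obs["capture_time_ns"] = time.time_ns()
    return obs


# Placeholder bytes, not a decodable image.
fake_jpeg = b"\xff\xd8\xff\xe0placeholder\xff\xd9"
obs = build_observation(
    [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
    jpeg_frames={"wrist": fake_jpeg},
    prompt="pick up the red block",
    seq=17,
)
line = json.dumps(obs)  # single-line JSON text
```

Optional fields are left out entirely when unset, which matches the table's description of `prompt` and `seq` falling back to defaults.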
Python SDK
seq = stream.send_observation(
    state=read_joints(),
    images={"wrist": jpeg_bytes},
    prompt="pick up the cube",  # optional per-step override
)
Shell observe-command (used by reflex connect)
Print exactly one JSON object per line. reflex connect reads stdout line by
line, one observation per control step.
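One way to satisfy the one-object-per-line contract is sketched here. It assumes the observe command is a long-running process that emits one line per control step, which the text implies but does not state outright. `read_joints` is a hypothetical stand-in for your own sensor code, and the demo writes to an in-memory buffer so the result can be inspected; a real script would use the default `sys.stdout`.

```python
import io
import json
import sys


def read_joints():
    """Hypothetical stand-in for a real joint-state reader."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def emit_observation(state, out=None):
    out = out if out is not None else sys.stdout
    # json.dumps without an indent argument never emits newlines,
    # so each observation occupies exactly one line.
    out.write(json.dumps({"state": state}) + "\n")
    # Flush so the reader sees the line immediately instead of
    # waiting for a buffer to fill.
    out.flush()


# Demo: three control steps written to a buffer instead of stdout.
buf = io.StringIO()
for _ in range(3):
    emit_observation(read_joints(), out=buf)
lines = buf.getvalue().splitlines()
```

Flushing after every line matters when stdout is a pipe, since Python block-buffers piped output by default.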
python observe.py
# {"state": [0.1, ...], "images": {"wrist": "<b64-jpeg>"}, "prompt": "pick up the cube"}
Action chunk
The server returns an action chunk — a batch of consecutive action targets the robot should apply.
{
  "type": "action_chunk",
  "seq": 17,
  "actions": [
    [0.0, 0.01, 0.02, 0.03, 0.04, 0.05],
    [0.0, 0.02, 0.04, 0.06, 0.08, 0.10],
    [0.0, 0.03, 0.06, 0.09, 0.12, 0.15]
  ],
  "metadata": {
    "model": "pi0.5",
    "inference_ms": 0
  }
}
| Field | Type | Description |
|---|---|---|
| type | "action_chunk" | Frame type discriminator. |
| seq | int | The observation sequence this chunk responds to. |
| actions | list[list[float]] | List of action vectors. Each inner list is one step's worth of targets. |
| metadata | dict | Server metadata (model name, timing, etc.). Key names are not guaranteed to be stable. |
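Before acting on a chunk it can help to check the fields described above. The following is a minimal sketch, not the SDK's own validation: `parse_action_chunk`, `expected_seq`, and `action_dim` are hypothetical names, and the checks are one reasonable reading of the table.

```python
import json


def parse_action_chunk(raw, expected_seq=None, action_dim=None):
    """Hypothetical validator: return (actions, model) or raise ValueError."""
    chunk = json.loads(raw)
    if not isinstance(chunk, dict) or chunk.get("type") != "action_chunk":
        raise ValueError("frame is not an action_chunk")
    if expected_seq is not None and chunk.get("seq") != expected_seq:
        raise ValueError(f"chunk answers seq {chunk.get('seq')}, expected {expected_seq}")
    actions = chunk.get("actions")
    if not isinstance(actions, list):
        raise ValueError("actions is missing or not a list")
    for i, step in enumerate(actions):
        if not isinstance(step, list):
            raise ValueError(f"action {i} is not a list")
        if action_dim is not None and len(step) != action_dim:
            raise ValueError(f"action {i} has {len(step)} values, expected {action_dim}")
    # Metadata keys may change, so read them defensively.
    model = chunk.get("metadata", {}).get("model")
    return actions, model


SAMPLE = json.dumps({
    "type": "action_chunk",
    "seq": 17,
    "actions": [
        [0.0, 0.01, 0.02, 0.03, 0.04, 0.05],
        [0.0, 0.02, 0.04, 0.06, 0.08, 0.10],
        [0.0, 0.03, 0.06, 0.09, 0.12, 0.15],
    ],
    "metadata": {"model": "pi0.5", "inference_ms": 0},
})
actions, model = parse_action_chunk(SAMPLE, expected_seq=17, action_dim=6)
```

Matching `seq` against the observation you sent is a cheap guard against applying a chunk computed from a stale observation.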
Python SDK
action = stream.recv_action()
for target in action["actions"]:
    apply_to_hardware(target)
Shell action-command (used by reflex connect)
Receives the chunk JSON on stdin. Apply it and exit 0. Any non-zero exit
triggers the safe-stop path.
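The exit-code contract could be implemented along these lines. This is a sketch under assumptions: `handle_chunk` and `apply_target` are hypothetical names, and the demo passes the chunk as a string so it runs without a pipe. In a real act script the last step would be something like `sys.exit(handle_chunk(sys.stdin.read(), apply_to_hardware))`.

```python
import json


def handle_chunk(raw, apply_target):
    """Apply every action in the chunk; return a process exit code."""
    try:
        chunk = json.loads(raw)
        for target in chunk["actions"]:
            apply_target(target)
    except Exception:
        # Any failure (bad JSON, missing key, hardware error) maps to a
        # non-zero exit, which triggers the safe-stop path.
        return 1
    return 0


# Demo: record targets in a list instead of driving hardware.
applied = []
ok_code = handle_chunk(
    '{"type": "action_chunk", "actions": [[0.0, 0.1], [0.0, 0.2]]}',
    applied.append,
)
bad_code = handle_chunk("not json", applied.append)
```

Catching every exception is deliberate in this sketch: letting the connector run safe-stop is preferable to exiting 0 after a partially applied chunk.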
python act.py < /dev/stdin
# stdin: {"type":"action_chunk","actions":[[...],[...]],...}
Safe-stop frame
When reflex connect hits an unrecoverable error (network failure, command
timeout, server-reported error), it invokes the safe-stop-command with a
JSON payload describing the cause:
{
  "reason": "transport_error",
  "message": "websocket closed unexpectedly",
  "session_id": "session_..."
}
Your safe-stop command should bring the robot to a known safe state and exit.
It must complete within the time set by the connector's safe_stop_timeout_s config field.
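A safe-stop handler might be structured as below. The text does not say how the payload is delivered; this sketch assumes it arrives as a JSON string (for example on stdin), and `handle_safe_stop` and `stop_robot` are hypothetical names.

```python
import json


def handle_safe_stop(raw, stop_robot):
    """Halt the robot, then summarize the cause for logging."""
    # Stop first: halting must not depend on the payload being parseable.
    stop_robot()
    try:
        payload = json.loads(raw)
    except ValueError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    reason = payload.get("reason", "unknown")
    message = payload.get("message", "")
    return f"safe-stop: reason={reason} message={message}"


# Demo: count stop calls instead of commanding hardware.
calls = []
summary = handle_safe_stop(
    '{"reason": "transport_error", "message": "websocket closed unexpectedly"}',
    lambda: calls.append("stopped"),
)
fallback = handle_safe_stop("", lambda: calls.append("stopped"))
```

Putting the stop action ahead of any parsing or logging keeps the time-critical step well inside the timeout even if the payload is malformed.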
Tips
- Keep observations small. Images dominate; downsample on the robot side rather than shipping full-resolution frames.
- Set capture_time_ns if you care about end-to-end latency budgets; the server returns timings keyed to it.
- The server rejects observations whose state length doesn't match the embodiment's expected dim. Check your schema if you see validation errors.
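A client-side pre-flight check can surface two of these issues before a request is sent. The sketch below is illustrative only: `check_observation` is a hypothetical helper, and `EXPECTED_STATE_DIM = 6` is an assumed value that you would replace with your embodiment's actual dimension.

```python
import json

EXPECTED_STATE_DIM = 6  # assumption for illustration; depends on the embodiment


def check_observation(obs, expected_dim=EXPECTED_STATE_DIM):
    """Return (problems, payload_bytes) for an observation dict."""
    problems = []
    state = obs.get("state")
    if not isinstance(state, list):
        problems.append("state is missing or not a list")
    elif len(state) != expected_dim:
        problems.append(f"state has {len(state)} values, expected {expected_dim}")
    # Size of the serialized payload, a rough proxy for upload cost.
    payload_bytes = len(json.dumps(obs).encode("utf-8"))
    return problems, payload_bytes


good_problems, good_size = check_observation({"state": [0.0] * 6})
short_problems, _ = check_observation({"state": [0.0] * 5})
```

Logging `payload_bytes` over a few steps makes it easy to see how much of each request is image data.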