Observation and action schema

Every Reflex inference path — whether you use the ActionStream, the @connect decorator, or a shell observe/act subprocess wired into reflex connect — exchanges the same two payloads: observations (you to the server) and action chunks (server to you).

Observation

An observation describes one snapshot of the robot's world.

{
  "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
  "images": {
    "wrist": "<base64-encoded JPEG>",
    "overhead": "<base64-encoded JPEG>"
  },
  "prompt": "pick up the red block",
  "seq": 17,
  "capture_time_ns": 1737830400000000000
}

Field	Type	Required	Description
`state`	`list[float]`	yes	Robot proprioception vector. Length depends on the embodiment.
`images`	`dict[str, ...]`	no	Map of camera name to image. Values can be base64 JPEG strings or raw bytes; the SDK normalizes them for you.
`prompt`	`string`	no	Per-step override of the task prompt. Falls back to the session prompt.
`seq`	`int`	no	Caller-supplied sequence number. The SDK fills one in if you omit it.
`capture_time_ns`	`int`	no	Wall-clock time, in nanoseconds, at which the observation was captured. Useful for latency tracing.
`request_id`	`string`	no	Idempotency key for the inference request.
`max_gpu_seconds`	`float`	no	Per-request GPU time cap. Overrides the session default.
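If you build observations yourself rather than through the SDK (for example inside a shell observe-command), the table maps directly onto a JSON object. The following is a minimal sketch using only the Python standard library. `build_observation` is a hypothetical helper, not part of the Reflex SDK, and it assumes images arrive as raw JPEG bytes that must be base64-encoded before serialization.

```python
import base64
import json
import time


def build_observation(state, images=None, prompt=None, seq=None, capture_time_ns=None):
    """Assemble an observation dict using the field names from the schema table.

    Hypothetical helper, not part of the Reflex SDK. `images` maps camera name to
    raw JPEG bytes, which are base64-encoded so the result is JSON-serializable.
    Optional fields are left out entirely when not supplied.
    """
    if len(state) == 0:
        raise ValueError("state is required and must be non-empty")
    obs = {"state": [float(x) for x in state]}
    if images:
        obs["images"] = {
            name: base64.b64encode(jpeg).decode("ascii") for name, jpeg in images.items()
        }
    if prompt is not None:
        obs["prompt"] = prompt  # per-step override of the session prompt
    if seq is not None:
        obs["seq"] = int(seq)
    # Prefer the real capture timestamp; fall back to "now" if the caller has none.
    obs["capture_time_ns"] = time.time_ns() if capture_time_ns is None else int(capture_time_ns)
    return obs


if __name__ == "__main__":
    obs = build_observation(
        [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
        images={"wrist": b"\xff\xd8 placeholder jpeg bytes"},
        prompt="pick up the red block",
        seq=17,
    )
    print(json.dumps(obs))
```

Passing the timestamp in explicitly is deliberate: stamping at build time is later than the actual camera capture, so it would hide part of the latency you are trying to trace.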

Python SDK

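# `stream` is an open ActionStream; read_joints() and jpeg_bytes stand in for your own hardware code.
seq = stream.send_observation(
    state=read_joints(),
    images={"wrist": jpeg_bytes},  # raw bytes or base64 JPEG strings
    prompt="pick up the cube",  # optional per-step override
)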

Shell observe-command (used by `reflex connect`)

Print exactly one JSON object per line. reflex connect reads stdout line by line, one observation per control step.

python observe.py
# {"state": [0.1, ...], "images": {"wrist": "<b64-jpeg>"}, "prompt": "pick up the cube"}
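A minimal observe.py might look like the sketch below. `read_joints` and `grab_wrist_jpeg` are hypothetical placeholders for your own sensor code. The sketch prints one observation per invocation and flushes stdout; this page does not say whether reflex connect re-runs the command every step or keeps it running and reads successive lines, so if it is the latter, wrap `main()` in your control loop.

```python
import base64
import json
import time


def read_joints():
    # Hypothetical stand-in for your robot's joint-state reader.
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def grab_wrist_jpeg():
    # Hypothetical stand-in for a camera capture that returns JPEG bytes.
    return b"\xff\xd8\xff\xe0 placeholder"


def observation_line(state, wrist_jpeg, prompt):
    """Serialize one observation as a single line of JSON."""
    obs = {
        "state": state,
        "images": {"wrist": base64.b64encode(wrist_jpeg).decode("ascii")},
        "prompt": prompt,
        "capture_time_ns": time.time_ns(),
    }
    # json.dumps without `indent` emits no newlines, so the object stays on one line.
    return json.dumps(obs, separators=(",", ":"))


def main():
    line = observation_line(read_joints(), grab_wrist_jpeg(), "pick up the cube")
    # Flush so the reader sees the line immediately rather than when the buffer fills.
    print(line, flush=True)


if __name__ == "__main__":
    main()
```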

Action chunk

The server returns an action chunk — a batch of consecutive action targets the robot should apply.

{
  "type": "action_chunk",
  "seq": 17,
  "actions": [
    [0.0, 0.01, 0.02, 0.03, 0.04, 0.05],
    [0.0, 0.02, 0.04, 0.06, 0.08, 0.10],
    [0.0, 0.03, 0.06, 0.09, 0.12, 0.15]
  ],
  "metadata": {
    "model": "pi0.5",
    "inference_ms": 0
  }
}

Field	Type	Description
`type`	`"action_chunk"`	Frame type discriminator.
`seq`	`int`	The observation sequence this chunk responds to.
`actions`	`list[list[float]]`	List of action vectors. Each inner list is one step's worth of targets.
`metadata`	`dict`	Server metadata (model name, timing, etc.). Its keys are not guaranteed to be stable.
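Before applying a chunk you may want to check it against the table above. The following is a minimal stdlib sketch; `parse_action_chunk`, `expected_seq`, and `action_dim` are hypothetical names, and the expected action length is something you supply for your embodiment. It deliberately ignores `metadata`, since those keys may change.

```python
import json


def parse_action_chunk(raw, expected_seq=None, action_dim=None):
    """Decode an action-chunk frame and sanity-check it against the schema table.

    `expected_seq` is the seq of the observation you sent; `action_dim` is the
    action-vector length your embodiment expects. Both checks are optional.
    """
    frame = json.loads(raw) if isinstance(raw, (str, bytes)) else raw
    if not isinstance(frame, dict) or frame.get("type") != "action_chunk":
        raise ValueError("not an action_chunk frame")
    if expected_seq is not None and frame.get("seq") != expected_seq:
        # The chunk answers a different observation than the one you expected.
        raise ValueError(f"chunk seq {frame.get('seq')} != expected {expected_seq}")
    actions = frame.get("actions")
    if not isinstance(actions, list):
        raise ValueError("actions must be a list of action vectors")
    for i, step in enumerate(actions):
        if action_dim is not None and len(step) != action_dim:
            raise ValueError(f"step {i} has {len(step)} values, expected {action_dim}")
    # metadata is ignored on purpose: its keys are not guaranteed to be stable.
    return [[float(v) for v in step] for step in actions]
```

Raising on a seq mismatch is a choice made for this sketch; depending on your controller, you might instead drop the mismatched chunk and keep executing the previous one.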

Python SDK

chunk = stream.recv_action()  # one action-chunk frame, as a dict
for target in chunk["actions"]:  # apply targets in order
    apply_to_hardware(target)  # your own robot-side function

Shell action-command (used by `reflex connect`)

The command receives the chunk JSON on stdin. It should apply the actions and exit 0; any non-zero exit triggers the safe-stop path.

python act.py
# stdin: {"type":"action_chunk","actions":[[...],[...]],...}
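A minimal act.py might look like the sketch below. `apply_to_hardware` is a hypothetical placeholder for your robot interface. The sketch applies the targets back to back; how to pace them (for example, one target per control period) is up to your controller and is not specified on this page. Any error becomes exit code 1, which sends reflex connect down the safe-stop path described above.

```python
import json
import sys


def apply_to_hardware(target):
    # Hypothetical stand-in for sending one action vector to the robot.
    pass


def run(stdin_text, apply=apply_to_hardware):
    """Apply every action in the chunk and return the process exit code."""
    try:
        chunk = json.loads(stdin_text)
        if chunk.get("type") != "action_chunk":
            raise ValueError("not an action_chunk frame")
        for target in chunk["actions"]:
            apply(target)
    except Exception as exc:
        # A non-zero exit tells reflex connect to take the safe-stop path.
        print(f"act.py: {exc}", file=sys.stderr)
        return 1
    return 0


if __name__ == "__main__":
    sys.exit(run(sys.stdin.read()))
```

Catching every exception is intentional here: an unexpected crash should still surface as a clean non-zero exit rather than leaving the robot mid-chunk with no signal to the connector.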

Safe-stop frame

When reflex connect hits an unrecoverable error (network failure, command timeout, server-reported error), it invokes the safe-stop-command with a JSON payload describing the cause:

{
  "reason": "transport_error",
  "message": "websocket closed unexpectedly",
  "session_id": "session_..."
}

Your safe-stop command should bring the robot to a known safe state and exit. It must finish within the time limit set by the connector's safe_stop_timeout_s config field.
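A safe-stop command can be sketched the same way. This page does not say how the JSON payload reaches the command; the sketch below assumes it arrives on stdin like the action chunk, so confirm that first (if it is delivered some other way, reading stdin could block until the timeout). `hold_position` is a hypothetical placeholder. The part worth copying is the ordering: stop the robot first, then parse and log, so a malformed payload can never delay the stop.

```python
import json
import sys


def hold_position():
    # Hypothetical stand-in: command the robot to stop and hold its current pose.
    pass


def handle_safe_stop(payload_text, stop=hold_position):
    """Stop the robot, then report the cause from the (assumed stdin) payload."""
    # Stop first, so nothing below (parsing, logging) can delay or prevent it.
    stop()
    try:
        payload = json.loads(payload_text) if payload_text.strip() else {}
    except json.JSONDecodeError:
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    reason = payload.get("reason", "unknown")
    print(f"safe-stop ({reason}): {payload.get('message', '')}", file=sys.stderr)
    return reason


if __name__ == "__main__":
    handle_safe_stop(sys.stdin.read())
    sys.exit(0)
```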

Tips

Keep observations small. Images dominate the payload size, so downsample on the robot side rather than shipping full-resolution frames.
Set capture_time_ns if you care about end-to-end latency budgets — the server returns timings keyed to it.
The server rejects observations whose state length doesn't match the embodiment's expected dimension. If you see validation errors, check your state vector against the embodiment's schema.
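The last two tips are easy to act on robot-side. The sketch below checks the state length before sending, so a mismatch fails locally instead of as a server validation error, and computes how long ago an observation was captured from its capture_time_ns. `EXPECTED_STATE_DIM = 6` is an assumption matching the sample observation at the top of this page; use your embodiment's real dimension. Because capture_time_ns is wall-clock time, comparing it across machines is only meaningful if their clocks are synchronized.

```python
import time

# Assumed for illustration: matches the 6-value state in the sample observation.
EXPECTED_STATE_DIM = 6


def check_state(state, expected_dim=EXPECTED_STATE_DIM):
    """Raise locally if the state vector length does not match the embodiment."""
    if len(state) != expected_dim:
        raise ValueError(f"state has {len(state)} values; embodiment expects {expected_dim}")
    return state


def elapsed_ms(capture_time_ns, now_ns=None):
    """Wall-clock milliseconds elapsed since the observation was captured."""
    if now_ns is None:
        now_ns = time.time_ns()
    return (now_ns - capture_time_ns) / 1e6
```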
