Observation and action schema

The JSON shapes flowing between your robot and inference.

Every Reflex inference path — whether you use the ActionStream, the @connect decorator, or a shell observe/act subprocess wired into reflex connect — exchanges the same two payloads: observations (you to the server) and action chunks (server to you).

Observation

An observation describes one snapshot of the robot's world.

{
  "state": [0.1, 0.2, 0.3, 0.4, 0.5, 0.6],
  "images": {
    "wrist": "<base64-encoded JPEG>",
    "overhead": "<base64-encoded JPEG>"
  },
  "prompt": "pick up the red block",
  "seq": 17,
  "capture_time_ns": 1737830400000000000
}

| Field | Type | Required | Description |
|---|---|---|---|
| state | list[float] | yes | Robot proprioception vector. Length depends on the embodiment. |
| images | dict[str, ...] | no | Map of camera name to image. Values can be base64 JPEG strings or raw bytes; the SDK normalizes for you. |
| prompt | string | no | Per-step override of the task prompt. Falls back to the session prompt. |
| seq | int | no | Caller-supplied sequence number. The SDK fills one in if you omit it. |
| capture_time_ns | int | no | Wall-clock time, in nanoseconds, at which the observation was captured. Useful for latency tracing. |
| request_id | string | no | Idempotency key for the inference request. |
| max_gpu_seconds | float | no | Per-request GPU time cap. Overrides the session default. |
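
If you assemble the payload by hand rather than through the SDK, the fields above map onto a plain dict. The following is a minimal sketch, not the SDK's own implementation: `build_observation` is a hypothetical helper name, the timestamp is assumed to be taken with `time.time_ns()`, and the JPEG bytes are a placeholder rather than a real image.

```python
import base64
import json
import time


def build_observation(state, jpeg_by_camera=None, prompt=None, seq=None):
    """Assemble an observation dict using the field names from the table above.

    Hypothetical helper for illustration; not part of the Reflex SDK.
    """
    obs = {
        "state": [float(x) for x in state],
        # Assumption: wall-clock nanoseconds, read when the dict is assembled.
        "capture_time_ns": time.time_ns(),
    }
    if jpeg_by_camera:
        # JSON cannot carry raw bytes, so each JPEG is base64-encoded to ASCII text.
        obs["images"] = {
            name: base64.b64encode(data).decode("ascii")
            for name, data in jpeg_by_camera.items()
        }
    if prompt is not None:
        obs["prompt"] = prompt
    if seq is not None:
        obs["seq"] = int(seq)
    return obs


# Placeholder bytes standing in for a real JPEG frame.
fake_jpeg = b"\xff\xd8\xff\xe0 placeholder, not a real image"
obs = build_observation(
    [0.1, 0.2, 0.3],
    {"wrist": fake_jpeg},
    prompt="pick up the red block",
    seq=17,
)
line = json.dumps(obs)  # one JSON object, ready to send or print
```

Optional fields are left out entirely when not supplied, so the prompt and sequence-number defaults described in the table still apply.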

Python SDK

seq = stream.send_observation(
    state=read_joints(),
    images={"wrist": jpeg_bytes},
    prompt="pick up the cube",  # optional per-step override
)

Shell observe-command (used by reflex connect)

Print exactly one JSON object per line. reflex connect reads stdout line by line, one observation per control step.

python observe.py
# {"state": [0.1, ...], "images": {"wrist": "<b64-jpeg>"}, "prompt": "pick up the cube"}
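
The one-object-per-line rule is easy to break with pretty-printed JSON or a buffered pipe. Below is a rough sketch of the output side of an observe script, assuming the field names from the Observation table; `read_joints` is a stand-in stub and `observation_line` and `emit` are hypothetical names.

```python
import json
import sys


def read_joints():
    """Hypothetical stand-in for a real joint-state read."""
    return [0.1, 0.2, 0.3, 0.4, 0.5, 0.6]


def observation_line(state, prompt=None):
    """Serialize one observation as a single line of JSON text."""
    obs = {"state": list(state)}
    if prompt is not None:
        obs["prompt"] = prompt
    # No indent argument: json.dumps then emits no literal newlines, and any
    # newline inside a string value is escaped as \n.
    return json.dumps(obs, separators=(",", ":"))


def emit(line, out=None):
    """Write one line and flush so a line-by-line reader sees it immediately."""
    out = out if out is not None else sys.stdout
    out.write(line + "\n")
    out.flush()


emit(observation_line(read_joints(), prompt="pick up the cube"))
```

Flushing after every line matters because stdout is typically block-buffered when it is attached to a pipe rather than a terminal.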

Action chunk

The server returns an action chunk — a batch of consecutive action targets the robot should apply.

{
  "type": "action_chunk",
  "seq": 17,
  "actions": [
    [0.0, 0.01, 0.02, 0.03, 0.04, 0.05],
    [0.0, 0.02, 0.04, 0.06, 0.08, 0.10],
    [0.0, 0.03, 0.06, 0.09, 0.12, 0.15]
  ],
  "metadata": {
    "model": "pi0.5",
    "inference_ms": 0
  }
}

| Field | Type | Description |
|---|---|---|
| type | "action_chunk" | Frame type discriminator. |
| seq | int | The observation sequence this chunk responds to. |
| actions | list[list[float]] | List of action vectors. Each inner list is one step's worth of targets. |
| metadata | dict | Server metadata (model name, timing, etc.). Stable keys are not promised. |
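
Because each action vector ends up driving hardware, it can be worth checking a chunk's shape before applying it. The following is a sketch only: `validate_action_chunk` is a hypothetical helper, the expected `action_dim` comes from your own embodiment, and rejecting empty or non-finite chunks is an assumption rather than documented server behavior.

```python
import math


def validate_action_chunk(chunk, action_dim):
    """Return the list of action vectors, or raise ValueError if the shape is off."""
    if chunk.get("type") != "action_chunk":
        raise ValueError(f"unexpected frame type: {chunk.get('type')!r}")
    actions = chunk.get("actions")
    # Assumption: an empty chunk is treated as an error on the robot side.
    if not isinstance(actions, list) or not actions:
        raise ValueError("'actions' must be a non-empty list")
    for i, step in enumerate(actions):
        if not isinstance(step, list) or len(step) != action_dim:
            raise ValueError(f"step {i} does not have {action_dim} values")
        for value in step:
            is_number = isinstance(value, (int, float)) and not isinstance(value, bool)
            if not is_number or not math.isfinite(value):
                raise ValueError(f"step {i} has a non-numeric or non-finite value")
    return actions


chunk = {
    "type": "action_chunk",
    "seq": 17,
    "actions": [
        [0.0, 0.01, 0.02, 0.03, 0.04, 0.05],
        [0.0, 0.02, 0.04, 0.06, 0.08, 0.10],
    ],
    "metadata": {"model": "pi0.5"},
}
steps = validate_action_chunk(chunk, action_dim=6)
```

The check deliberately ignores `metadata`, since its keys are not promised to be stable.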

Python SDK

action = stream.recv_action()
for target in action["actions"]:
    apply_to_hardware(target)

Shell action-command (used by reflex connect)

Receives the chunk JSON on stdin. Apply it and exit 0. Any non-zero exit triggers the safe-stop path.

python act.py < /dev/stdin
# stdin: {"type":"action_chunk","actions":[[...],[...]],...}
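
The exit-code contract above can be sketched as a function that turns any failure into a non-zero status. This is an illustration under assumptions: `run_action_command` is a hypothetical name, and `apply_to_hardware` is whatever callable drives your robot.

```python
import json
import sys


def run_action_command(stdin_text, apply_to_hardware):
    """Apply every step in the chunk and return the process exit code."""
    try:
        chunk = json.loads(stdin_text)
        for target in chunk["actions"]:
            apply_to_hardware(target)
    except Exception as exc:
        # Any failure becomes a non-zero exit, which triggers the safe-stop path.
        print(f"act failed: {exc}", file=sys.stderr)
        return 1
    return 0


# Demonstration with a list standing in for the hardware interface.
applied = []
status = run_action_command(
    '{"type":"action_chunk","seq":17,"actions":[[0.0,0.1],[0.2,0.3]]}',
    applied.append,
)
```

In a real act script the last line would be along the lines of `sys.exit(run_action_command(sys.stdin.read(), apply_to_hardware))`, so the return value becomes the process exit status.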

Safe-stop frame

When reflex connect hits an unrecoverable error (network failure, command timeout, server-reported error), it invokes the safe-stop-command with a JSON payload describing the cause:

{
  "reason": "transport_error",
  "message": "websocket closed unexpectedly",
  "session_id": "session_..."
}

Your safe-stop command should bring the robot to a known safe state and exit. It must complete within the time set by the connector's safe_stop_timeout_s config field.
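
A safe-stop handler is most useful when halting the robot does not depend on the payload being readable. The sketch below assumes the payload reaches your command as JSON text; how it is delivered (stdin, argument, or otherwise) is not specified here, so the functions take the text as a parameter. `handle_safe_stop`, `describe_safe_stop`, and `stop_robot` are hypothetical names.

```python
import json


def describe_safe_stop(payload_text):
    """Best-effort one-line summary of the safe-stop payload; never raises."""
    try:
        payload = json.loads(payload_text)
    except (TypeError, ValueError):
        payload = {}
    if not isinstance(payload, dict):
        payload = {}
    reason = payload.get("reason", "unknown")
    message = payload.get("message", "")
    if message:
        return f"safe-stop: {reason}: {message}"
    return f"safe-stop: {reason}"


def handle_safe_stop(payload_text, stop_robot):
    """Stop first, then summarize, so a malformed payload cannot delay the halt."""
    stop_robot()
    return describe_safe_stop(payload_text)


calls = []
summary = handle_safe_stop(
    '{"reason": "transport_error", "message": "websocket closed unexpectedly"}',
    lambda: calls.append("stopped"),
)
```

Keeping the stop call free of parsing and logging also helps it finish well inside the configured timeout.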

Tips

  • Keep observations small. Images dominate; downsample on the robot side rather than shipping full-resolution frames.
  • Set capture_time_ns if you care about end-to-end latency budgets — the server returns timings keyed to it.
  • The server rejects observations whose state length doesn't match the embodiment's expected dim. Check your schema if you see validation errors.
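
The size and state-length tips can be checked locally before an observation is sent. This is a minimal sketch with assumed values: `preflight` is a hypothetical helper, `expected_state_dim` must come from your embodiment, and the 512 KiB budget is an arbitrary illustration rather than a server limit. It expects images to be base64 strings already, since raw bytes are not JSON-serializable.

```python
import json


def preflight(observation, expected_state_dim, max_bytes=512 * 1024):
    """Return a list of problems found before sending; an empty list means OK."""
    problems = []
    n = len(observation.get("state", []))
    if n != expected_state_dim:
        problems.append(f"state has {n} values, expected {expected_state_dim}")
    # Measure the serialized size, which is dominated by base64 image strings.
    size = len(json.dumps(observation).encode("utf-8"))
    if size > max_bytes:
        problems.append(f"payload is {size} bytes, over the {max_bytes}-byte budget")
    return problems


ok = preflight({"state": [0.0] * 6}, expected_state_dim=6)
short = preflight({"state": [0.0] * 5}, expected_state_dim=6)
heavy = preflight(
    {"state": [0.0] * 6, "images": {"wrist": "A" * 2000}},
    expected_state_dim=6,
    max_bytes=1000,
)
```

Running a check like this on the robot side surfaces a mismatched state vector before it turns into a server-side validation error.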