User Guide · Evaluation · Results schema

Results schema

Every diagnostic eval run emits a single results.json file. This is the field-by-field reference.

Overview

One results.json per eval run. Top-level keys are stable; nested fields under each diagnostic dimension are versioned by the meta.version field. Treat unknown keys as forward-compatible; never strip them on round-trip.

Top-level layout

{
  "understanding": { ... },
  "perception":    { ... },
  "behavior":      { ... },
  "meta":          { ... }
}

understanding

Field	Type	Description
`per_stage_success`	`{stage_name: float}`	Stage success rate over all eval episodes. Range [0, 1].
`overall`	float	Final-stage success rate; the headline U-axis number.
`n_episodes`	int	Total episodes in the U-axis aggregation.
`stage_order`	list[string]	Stage names in graph-declared order.

perception

Field	Type	Description
`ausc`	`{axis: float}`	Per-axis AUSC (area-under-success-curve), normalised [0, 1]. Axes: `light`, `view`, `rotation`.
`overall`	float	Mean of `ausc` values.
`curve_points`	`{axis: [[x, y], ...]}`	Raw success-vs-perturbation curve, normalised x axis ∈ [0, 1].
`sweep_config`	object	The `standard_dr_sweeps()` config used (max perturbation per axis, n_steps, episodes_per_step).

behavior

Field	Type	Description
`jerk_rms`	float	End-effector jerk RMS in m/s³, averaged over successful episodes.
`vel_var`	float	Normalised velocity variance (dimensionless).
`path_length_ratio`	float	Actual path length / straight-line distance. ≥ 1.0.
`joint_jerk_rms`	float (optional)	Same metric in joint space; recommended for cross-policy comparisons.
`frame_rate`	float	Eval rollout frame rate in Hz (default 25). Required for cross-paper comparisons.
`aggregation`	string	One of `"success_only"` (default), `"all_episodes"`.

Field	Type	Description
`version`	string	Results-schema semver. Bumped on any breaking field change.
`task_graph`	string	Path to the YAML used for this run.
`policy`	string	Policy identifier (`pi05`, `openvla-oft`, …).
`checkpoint`	string	Checkpoint path or hash.
`seed`	int	Eval seed.
`timestamp`	string	ISO 8601.
`commit`	string	Git SHA of the MetaFine repo at eval time.

Sample output

{
  "understanding": {
    "per_stage_success": { "engage": 0.93, "manipulate": 0.71, "release": 0.62 },
    "overall": 0.62,
    "n_episodes": 30,
    "stage_order": ["engage", "manipulate", "release"]
  },
  "perception": {
    "ausc": { "light": 0.81, "view": 0.74, "rotation": 0.55 },
    "overall": 0.70,
    "curve_points": {
      "light":    [[0,1.0],[0.2,0.97],[0.4,0.90],[0.6,0.83],[0.8,0.72],[1.0,0.55]],
      "view":     [[0,1.0],[0.2,0.95],[0.4,0.83],[0.6,0.70],[0.8,0.58],[1.0,0.40]],
      "rotation": [[0,1.0],[0.2,0.83],[0.4,0.60],[0.6,0.41],[0.8,0.30],[1.0,0.18]]
    },
    "sweep_config": { "n_steps": 6, "episodes_per_step": 20 }
  },
  "behavior": {
    "jerk_rms": 0.082, "vel_var": 0.011, "path_length_ratio": 1.18,
    "frame_rate": 25, "aggregation": "success_only"
  },
  "meta": {
    "version": "0.1.0",
    "task_graph": "configs/grasp_and_lift_cap.yaml",
    "policy": "pi05",
    "checkpoint": "/nat/demos/pi05/run_2026_05_12/step_50000.pt",
    "seed": 42,
    "timestamp": "2026-05-13T14:30:11Z",
    "commit": "5a6c780"
  }
}