Architecture

MetaFine sits on a three-layer pipeline: atomic skills compose into task graphs, task graphs drive recording and rollout, and rollouts feed a three-dimension diagnostic. Every concept maps onto something concrete in the source tree.

The pipeline

Every MetaFine evaluation has the same shape: atomic-skill primitives are composed into a multi-step task graph; the task graph drives a recording or rollout; and the rollout's trajectory and per-stage outcomes are scored along three orthogonal axes.

[ Atomic skills ] → [ Compositional task graph ] → [ Diagnostic evaluation ]

Layer	Source	Contents	Interface
Atomic skills	`core/skill.py`	21 typed primitives	`@register_skill`
Compositional task graph	`configs/*.yaml` · `core/predicates.py`	Stages + success predicates	YAML or Python DSL
Diagnostic evaluation	`utils/eval_*.py`	Understanding / Perception / Behavior	one `results.json` per run

Layer 1 — Atomic skills

A skill is a motion-planning solver that achieves one well-defined interaction with one well-defined part of an articulated asset — grasp this handle, rotate this knob 90°, slide this drawer 5 cm. The 21 atomic skills MetaFine ships fall into three phases:

Interaction — engages the object (transitions contact state). E.g. grasp_part, press, flip_switch, lift_lid.
Continuation — operates on an already-engaged part; safe to chain after an interaction. E.g. pure_rotate, pure_slide, pure_insert, release_gripper.
Bundle: pre-composed multi-step routines that are currently treated as atomic, pending a split into separate skills. E.g. lid_opening, stand_up, toggle_switch.

Every skill declares the affordances it requires of its target part — grasp_part requires graspable, pure_rotate requires rotatable, etc. See Affordances for the closed-set vocabulary.
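The registration pattern described above can be sketched as follows. This is a minimal illustration, not the contents of core/skill_registry.py: the `SkillSpec` fields, the decorator signature, and the solver arguments are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Callable, Dict, FrozenSet, Iterable

# Hypothetical stand-in for the real spec type; field names are assumptions.
@dataclass(frozen=True)
class SkillSpec:
    name: str
    phase: str                     # "interaction", "continuation", or "bundle"
    requires: FrozenSet[str]       # affordances the target part must offer
    solver: Callable[..., object]  # the motion-planning solver itself

SKILL_REGISTRY: Dict[str, SkillSpec] = {}

def register_skill(name: str, phase: str, requires: Iterable[str] = ()):
    """Decorator that records a solver together with its declared affordances."""
    def decorator(fn):
        if name in SKILL_REGISTRY:
            raise ValueError(f"skill {name!r} is already registered")
        SKILL_REGISTRY[name] = SkillSpec(name, phase, frozenset(requires), fn)
        return fn
    return decorator

@register_skill("grasp_part", phase="interaction", requires=["graspable"])
def grasp_part(env, target):
    ...  # solver body omitted in this sketch

@register_skill("pure_rotate", phase="continuation", requires=["rotatable"])
def pure_rotate(env, target):
    ...  # solver body omitted in this sketch

print(sorted(SKILL_REGISTRY))
```

Keeping the affordance requirements on the spec, rather than inside the solver body, is what would let a mismatch be detected before any planning starts.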

Layer 2 — Compositional task graphs

A task graph is a YAML chain of skill calls with optional per-stage success predicates. Multi-step tasks ("grasp the cap, then lift it 5 cm") become 20-line YAML files rather than new env classes:

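name: grasp_and_lift_cap
stages:
  # Stage 1: engage the cap on asset 100221.
  - skill: grasp_part
    target: { object: 100221, part: cap }
    success: 'grasped("cap")'
  # Stage 2: succeeds once the cap is still grasped and has risen 5 cm.
  - skill: pure_lift
    # Quoted so the ": " inside the predicate is not parsed as a YAML mapping.
    success: 'and(grasped("cap"), lifted("cap", height_m: 0.05))'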

Each predicate compiles to a callable that is evaluated at every step, so per-stage success rates fall out of a rollout with no extra instrumentation. The predicate DSL (and / or / not plus six atomic predicates) is documented under Predicate DSL.
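To make "compiles to a callable" concrete, here is a small sketch that builds the same predicate from Python combinators. It skips string parsing entirely, and the per-step state layout (a dict keyed by part name with `grasped` and `lift_m` entries) is an assumption made for the example, not the format core/predicates.py uses.

```python
from typing import Callable, Dict

State = Dict[str, dict]              # hypothetical: part name -> per-step facts
Predicate = Callable[[State], bool]

def grasped(part: str) -> Predicate:
    return lambda s: bool(s[part]["grasped"])

def lifted(part: str, height_m: float) -> Predicate:
    return lambda s: s[part]["lift_m"] >= height_m

def and_(*preds: Predicate) -> Predicate:
    return lambda s: all(p(s) for p in preds)

def or_(*preds: Predicate) -> Predicate:
    return lambda s: any(p(s) for p in preds)

def not_(pred: Predicate) -> Predicate:
    return lambda s: not pred(s)

# Equivalent of: and(grasped("cap"), lifted("cap", height_m: 0.05))
stage_success = and_(grasped("cap"), lifted("cap", height_m=0.05))

# Three consecutive steps of a toy rollout.
steps = [
    {"cap": {"grasped": False, "lift_m": 0.00}},
    {"cap": {"grasped": True, "lift_m": 0.02}},
    {"cap": {"grasped": True, "lift_m": 0.06}},
]
per_step = [stage_success(s) for s in steps]
print(per_step)  # [False, False, True]
```

Because each combinator returns a plain function of the state, evaluating a stage at every step is a single call, and the first step at which it returns True marks the stage as passed.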

Layer 3 — Diagnostic evaluation

A rollout produces three orthogonal signals, one per diagnostic axis:

Understanding — per-stage success rates over the task graph; surfaces where the chain breaks.
Perception — domain-randomisation sweeps (lighting, view, jitter) with AUSC normalisation; surfaces robustness to visual variation.
Behavior — trajectory smoothness (jerk RMS, velocity variance, path length); surfaces jerky / hesitant / chunk-artefact policies.
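For the Behavior axis, the three smoothness quantities can be estimated from sampled positions with finite differences. The sketch below is one plausible formulation in pure Python; it is not the `compute_smoothness` implementation in utils/eval_metrics.py, and the function name, the fixed time step `dt`, and the returned keys are assumptions.

```python
import math
from typing import Dict, List, Sequence

def _diff(points: Sequence[Sequence[float]], dt: float) -> List[List[float]]:
    """First finite difference of consecutive samples, divided by dt."""
    return [[(b - a) / dt for a, b in zip(p, q)] for p, q in zip(points, points[1:])]

def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))

def smoothness_sketch(positions: Sequence[Sequence[float]], dt: float) -> Dict[str, float]:
    if len(positions) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    vel = _diff(positions, dt)   # velocity
    acc = _diff(vel, dt)         # acceleration
    jerk = _diff(acc, dt)        # jerk (third derivative of position)
    speeds = [_norm(v) for v in vel]
    mean_speed = sum(speeds) / len(speeds)
    return {
        "jerk_rms": math.sqrt(sum(_norm(j) ** 2 for j in jerk) / len(jerk)),
        "velocity_variance": sum((s - mean_speed) ** 2 for s in speeds) / len(speeds),
        "path_length": sum(s * dt for s in speeds),
    }

# A constant-velocity line versus a back-and-forth oscillation.
smooth = smoothness_sketch([[0, 0, 0], [1, 0, 0], [2, 0, 0], [3, 0, 0]], dt=1.0)
jerky = smoothness_sketch([[0], [1], [0], [1], [0]], dt=1.0)
print(smooth["jerk_rms"], jerky["jerk_rms"])
```

The constant-velocity line has zero jerk, while the oscillating trajectory does not, which is the kind of separation this axis is meant to surface.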

The three scores are emitted into a single results.json per run, so two policies can be compared across the full diagnostic plane.
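As an illustration of the Understanding score, per-stage success rates can be aggregated from per-episode stage outcomes and serialised to JSON. The input layout and the JSON keys below are hypothetical; the actual results.json schema is not specified here.

```python
import json
from typing import List

def per_stage_success(episodes: List[List[bool]]) -> List[float]:
    """episodes[i][k] is True if episode i passed stage k.

    Assumes every episode reports an outcome for every stage.
    """
    if not episodes:
        raise ValueError("no episodes to aggregate")
    n_stages = len(episodes[0])
    return [sum(ep[k] for ep in episodes) / len(episodes) for k in range(n_stages)]

# Four episodes of a two-stage task graph (grasp, then lift).
episodes = [
    [True, True],
    [True, False],
    [False, False],
    [True, True],
]
rates = per_stage_success(episodes)

# Hypothetical output layout, shown only to illustrate the idea.
results = {"understanding": {"stage_success": rates}}
print(json.dumps(results, indent=2))
```

Here the rate drops from 0.75 at the first stage to 0.5 at the second, which is the "where the chain breaks" reading the Understanding axis is designed to give.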

How a request flows

Resolve. The task graph names a skill plus a target part. The skill spec is looked up in SKILL_REGISTRY, and the asset's capabilities.json is checked to confirm that the target part offers the required affordances. A mismatch fails fast with a clear error.
Plan. The skill solver constructs a motion plan (or the policy network predicts an action chunk).
Roll out. Each step's observation is captured; the stage predicate is evaluated.
Score. At end-of-episode, the three diagnostic dimensions are aggregated; the result is appended to the run's results.json.
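The Resolve step above amounts to a set comparison between what a skill requires and what the part offers. The following sketch assumes a capabilities.json layout of `{"parts": {part_name: [affordances]}}` and a plain dict of skill requirements; both are illustrative stand-ins, as is the `AffordanceMismatch` exception.

```python
import json
from typing import Dict, Set

class AffordanceMismatch(ValueError):
    """Raised when a target part lacks an affordance the skill requires."""

def resolve(skill: str, part: str,
            required_by_skill: Dict[str, Set[str]],
            capabilities: dict) -> Set[str]:
    if skill not in required_by_skill:
        raise KeyError(f"unknown skill {skill!r}")
    required = set(required_by_skill[skill])
    offered = set(capabilities.get("parts", {}).get(part, []))
    missing = sorted(required - offered)
    if missing:
        # Fail before any planning happens, naming exactly what is missing.
        raise AffordanceMismatch(
            f"skill {skill!r} requires {missing}, "
            f"but part {part!r} offers {sorted(offered)}"
        )
    return required

# Hypothetical requirements and capabilities for one asset.
REQUIRED = {"grasp_part": {"graspable"}, "pure_rotate": {"rotatable"}}
capabilities = json.loads('{"parts": {"cap": ["graspable"], "knob": ["rotatable"]}}')

ok = resolve("grasp_part", "cap", REQUIRED, capabilities)

try:
    resolve("pure_rotate", "cap", REQUIRED, capabilities)
    mismatch_message = None
except AffordanceMismatch as exc:
    mismatch_message = str(exc)
print(mismatch_message)
```

Checking at resolve time means an ill-formed task graph is rejected before a planner or policy is ever invoked.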

Source-tree map

Module	Role
`core/skill.py`	21 motion-planning skill solvers.
`core/skill_registry.py`	`@register_skill`, `SKILL_REGISTRY`, the 11-affordance vocabulary.
`core/predicates.py`	Predicate-DSL compiler.
`core/env.py`	19 Gym envs (single-skill + bundle).
`core/scene.py`	SceneBuilders (data-driven, no per-asset branches).
`core/env_mixins.py`	`EvalDREnvMixin` — camera / light jitter helpers.
`utils/task_graph.py`	`TaskGraph` dataclass + YAML loader + runner.
`utils/eval_metrics.py`	`EpisodeResult`, `EvalSummary`, `compute_smoothness`.
`utils/eval_sweep.py`	`dr_sweep` + `standard_dr_sweeps` with AUSC.
`utils/eval_setup.py`	`make_eval_env` — dispatches single-skill ↔ task-graph mode.