Architecture
MetaFine sits on a three-layer pipeline: atomic skills compose into task graphs, task graphs drive recording and rollout, and rollouts feed a three-axis diagnostic. Every concept maps onto something concrete in the source tree.
The pipeline
Every MetaFine evaluation has the same shape: an atomic-skill primitive is composed into a multi-step task graph, the task graph drives a recording or rollout, and the rollout's trajectory and per-stage outcomes are scored along three orthogonal axes.
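| Layer 1: Atomic skills | Layer 2: Task graphs | Layer 3: Diagnostic evaluation |
|---|---|---|
| core/skill.py | configs/*.yaml · core/predicates.py | utils/eval_*.py |
| 21 typed primitives | Stages + success predicates | Understanding / Perception / Behavior |
| @register_skill | YAML or Python DSL | one results.json per run |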
Layer 1 — Atomic skills
A skill is a motion-planning solver that achieves one well-defined interaction with one well-defined part of an articulated asset — grasp this handle, rotate this knob 90°, slide this drawer 5 cm. The 21 atomic skills MetaFine ships fall into three phases:
- Interaction: engages the object (transitions contact state). E.g. grasp_part, press, flip_switch, lift_lid.
- Continuation: operates on an already-engaged part; safe to chain after an interaction. E.g. pure_rotate, pure_slide, pure_insert, release_gripper.
- Bundle: pre-composed multi-step routines kept atomic for now (until we split them). E.g. lid_opening, stand_up, toggle_switch.
Every skill declares the affordances it requires of its target part — grasp_part requires graspable, pure_rotate requires rotatable, etc. See Affordances for the closed-set vocabulary.
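A minimal sketch of how a decorator-based registry could record each skill's phase and required affordances. Only the names register_skill and SKILL_REGISTRY come from this page; SkillSpec, Phase, the decorator's arguments, and the solver bodies are hypothetical stand-ins, not the actual signatures in core/skill_registry.py.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Callable, Dict, FrozenSet, Iterable


class Phase(Enum):
    INTERACTION = "interaction"    # engages the object (changes contact state)
    CONTINUATION = "continuation"  # operates on an already-engaged part
    BUNDLE = "bundle"              # pre-composed multi-step routine


@dataclass(frozen=True)
class SkillSpec:
    name: str
    phase: Phase
    requires: FrozenSet[str]       # affordances the target part must offer
    solver: Callable[..., object]  # motion-planning function for this skill


SKILL_REGISTRY: Dict[str, SkillSpec] = {}


def register_skill(name: str, phase: Phase, requires: Iterable[str]):
    """Hypothetical decorator: store the wrapped solver under `name`."""
    def decorator(fn: Callable[..., object]) -> Callable[..., object]:
        if name in SKILL_REGISTRY:
            raise ValueError(f"skill {name!r} is already registered")
        SKILL_REGISTRY[name] = SkillSpec(name, phase, frozenset(requires), fn)
        return fn  # the solver stays directly callable
    return decorator


@register_skill("grasp_part", Phase.INTERACTION, requires={"graspable"})
def grasp_part(part: str) -> str:
    # Placeholder: a real solver would return a motion plan, not a string.
    return f"approach and close gripper on {part}"


@register_skill("pure_rotate", Phase.CONTINUATION, requires={"rotatable"})
def pure_rotate(part: str, angle_deg: float = 90.0) -> str:
    return f"rotate {part} by {angle_deg} deg"
```

Registering at import time keeps lookup a plain dictionary access, and storing the required affordances next to the solver lets a caller validate a target part before any planning happens.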
Layer 2 — Compositional task graphs
A task graph is a YAML chain of skill calls with optional per-stage success predicates. Multi-step tasks ("grasp the cap, then lift it 5 cm") become 20-line YAML files rather than new env classes:
```yaml
name: grasp_and_lift_cap
stages:
  - skill: grasp_part
    target: { object: 100221, part: cap }
    success: grasped("cap")
  - skill: pure_lift
    # quoted: an unquoted ": " inside a plain scalar is a YAML parse error
    success: 'and(grasped("cap"), lifted("cap", height_m: 0.05))'
```
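To make the step from YAML to runner concrete, here is a sketch that loads a parsed task graph into dataclasses. Stage, TaskGraph.from_dict, and the rule that a stage without a target inherits the previous stage's target are assumptions for illustration; the real TaskGraph in utils/task_graph.py may differ. The input is the plain dict that PyYAML's yaml.safe_load would produce for the task graph above, so the sketch needs no third-party dependency.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass
class Stage:
    skill: str
    target: Optional[Dict[str, Any]] = None
    success: Optional[str] = None  # predicate-DSL source; compiled separately


@dataclass
class TaskGraph:
    name: str
    stages: List[Stage] = field(default_factory=list)

    @classmethod
    def from_dict(cls, raw: Dict[str, Any]) -> "TaskGraph":
        stages: List[Stage] = []
        prev_target: Optional[Dict[str, Any]] = None
        for i, s in enumerate(raw.get("stages", [])):
            if "skill" not in s:
                raise ValueError(f"stage {i} of {raw.get('name')!r} has no 'skill'")
            # Assumption: a stage without `target` keeps operating on the previous part.
            target = s.get("target", prev_target)
            stages.append(Stage(s["skill"], target, s.get("success")))
            prev_target = target
        return cls(raw["name"], stages)


# The dict yaml.safe_load would return for the grasp_and_lift_cap graph.
raw = {
    "name": "grasp_and_lift_cap",
    "stages": [
        {"skill": "grasp_part",
         "target": {"object": 100221, "part": "cap"},
         "success": 'grasped("cap")'},
        {"skill": "pure_lift",
         "success": 'and(grasped("cap"), lifted("cap", height_m: 0.05))'},
    ],
}
graph = TaskGraph.from_dict(raw)
```

Keeping predicates as strings at load time separates parsing the graph from compiling its success conditions, so a malformed predicate can be reported per stage.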
Each predicate compiles to a callable that is evaluated at every step, so per-stage success rates come for free. The predicate DSL (and / or / not plus six atomic predicates) is documented under Predicate DSL.
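One way such a DSL can compile to a per-step callable is a small recursive-descent parser over the syntax used in the task graph above. This sketch is not the code in core/predicates.py: only grasped and lifted are implemented because they are the only atomic predicates named on this page, and the state layout (a grasped set plus a lift_m map of metres lifted per part) is a hypothetical stand-in for MetaFine's observations.

```python
import re
from typing import Any, Callable, Dict, List, Tuple

State = Dict[str, Any]
Predicate = Callable[[State], bool]

_TOKEN = re.compile(
    r'\s*(?:(?P<num>-?\d+(?:\.\d+)?)|(?P<str>"[^"]*")'
    r'|(?P<name>[A-Za-z_]\w*)|(?P<punct>[(),:]))'
)


# Hypothetical atomic predicates over a hypothetical state dict.
def _grasped(part: str) -> Predicate:
    return lambda s: part in s["grasped"]


def _lifted(part: str, height_m: float = 0.0) -> Predicate:
    return lambda s: s["lift_m"].get(part, 0.0) >= height_m


_PREDICATES: Dict[str, Callable[..., Predicate]] = {
    "and": lambda *ps: (lambda s: all(p(s) for p in ps)),
    "or": lambda *ps: (lambda s: any(p(s) for p in ps)),
    "not": lambda p: (lambda s: not p(s)),
    "grasped": _grasped,
    "lifted": _lifted,
}


def _tokenize(src: str) -> List[Tuple[str, str]]:
    tokens, pos, src = [], 0, src.strip()
    while pos < len(src):
        m = _TOKEN.match(src, pos)
        if m is None:
            raise SyntaxError(f"unexpected input at {pos}: {src[pos:]!r}")
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens


def compile_predicate(src: str) -> Predicate:
    """Compile DSL source such as 'and(grasped("cap"), ...)' into a callable."""
    tokens = _tokenize(src)
    pos = 0

    def peek(k: int = 0) -> Tuple[Any, Any]:
        return tokens[pos + k] if pos + k < len(tokens) else (None, None)

    def take(expected: Any = None) -> Tuple[str, str]:
        nonlocal pos
        kind, text = peek()
        if kind is None or (expected is not None and text != expected):
            raise SyntaxError(f"expected {expected or 'a token'!r}, got {text!r}")
        pos += 1
        return kind, text

    def parse_call() -> Predicate:
        kind, name = take()
        if kind != "name":
            raise SyntaxError(f"expected a predicate name, got {name!r}")
        take("(")
        args: List[Any] = []
        kwargs: Dict[str, float] = {}
        while peek()[1] != ")":
            if args or kwargs:
                take(",")
            kind, text = peek()
            if kind == "name" and peek(1)[1] == ":":  # keyword argument, e.g. height_m: 0.05
                take()
                take(":")
                kwargs[text] = float(take()[1])
            elif kind == "name":  # nested predicate call
                args.append(parse_call())
            elif kind == "str":
                args.append(take()[1][1:-1])  # strip the quotes
            elif kind == "num":
                args.append(float(take()[1]))
            else:
                raise SyntaxError(f"unexpected token {text!r}")
        take(")")
        if name not in _PREDICATES:
            raise NameError(f"unknown predicate {name!r}")
        return _PREDICATES[name](*args, **kwargs)

    predicate = parse_call()
    if pos != len(tokens):
        raise SyntaxError("unexpected tokens after the predicate")
    return predicate


stage_ok = compile_predicate('and(grasped("cap"), lifted("cap", height_m: 0.05))')
state = {"grasped": {"cap"}, "lift_m": {"cap": 0.06}}
print(stage_ok(state))  # True
```

Compiling once and calling the resulting closure on every observation keeps the per-step cost to a few dictionary lookups.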
Layer 3 — Diagnostic evaluation
A rollout produces three orthogonal signals that get scored along the three axes:
- Understanding — per-stage success rates over the task graph; surfaces where the chain breaks.
- Perception — domain-randomisation sweeps (lighting, view, jitter) with AUSC normalisation; surfaces robustness to visual variation.
- Behavior — trajectory smoothness (jerk RMS, velocity variance, path length); surfaces jerky / hesitant / chunk-artefact policies.
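As a rough illustration of the Behavior metrics, the sketch below computes jerk RMS, velocity variance, and path length from positions sampled at a fixed timestep, using forward finite differences. The function name smoothness_metrics is hypothetical and is not the compute_smoothness in utils/eval_metrics.py; treating the input as end-effector positions and defining velocity variance as the variance of speed are also assumptions.

```python
import math
from typing import Dict, List, Sequence


def _diff(xs: List[List[float]], dt: float) -> List[List[float]]:
    """Forward finite difference of a sequence of equal-length vectors."""
    return [[(b - a) / dt for a, b in zip(x0, x1)] for x0, x1 in zip(xs, xs[1:])]


def _norm(v: Sequence[float]) -> float:
    return math.sqrt(sum(c * c for c in v))


def smoothness_metrics(positions: Sequence[Sequence[float]], dt: float) -> Dict[str, float]:
    """Hypothetical Behavior metrics for one trajectory sampled every `dt` seconds."""
    if len(positions) < 4:
        raise ValueError("need at least 4 samples to estimate jerk")
    p = [list(map(float, x)) for x in positions]
    vel = _diff(p, dt)     # n-1 velocity samples
    acc = _diff(vel, dt)   # n-2 acceleration samples
    jerk = _diff(acc, dt)  # n-3 jerk samples
    speeds = [_norm(v) for v in vel]
    mean_speed = sum(speeds) / len(speeds)
    return {
        "jerk_rms": math.sqrt(sum(_norm(j) ** 2 for j in jerk) / len(jerk)),
        "velocity_variance": sum((s - mean_speed) ** 2 for s in speeds) / len(speeds),
        "path_length": sum(math.dist(a, b) for a, b in zip(p, p[1:])),
    }


# A step-like 1-D trajectory: one abrupt 1-unit move between flat segments.
print(smoothness_metrics([[0], [0], [0], [1], [1], [1]], dt=1.0))
```

A constant-velocity trajectory scores zero jerk and zero velocity variance, so nonzero values flag the jerky or hesitant motion this axis is meant to surface.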
The three scores are emitted into a single results.json per run, so two policies can be compared across the full diagnostic plane.
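One way the three axes could be combined into a single results.json record is sketched below. The Episode fields, the definition of per-stage success as a conditional rate (stage k passed among episodes that reached stage k), and the JSON key names are assumptions; the actual EpisodeResult and EvalSummary types in utils/eval_metrics.py may be organised differently. The perception score is passed through unchanged because the AUSC computation is not described on this page.

```python
import json
from dataclasses import dataclass
from typing import Any, Dict, List, Optional


@dataclass
class Episode:
    stages_passed: int            # consecutive stages completed before the chain broke
    smoothness: Dict[str, float]  # Behavior metrics for this episode


def per_stage_success(episodes: List[Episode], n_stages: int) -> List[Optional[float]]:
    """Success rate of stage k among episodes that reached stage k (None if none did)."""
    rates: List[Optional[float]] = []
    for k in range(n_stages):
        reached = sum(e.stages_passed >= k for e in episodes)
        passed = sum(e.stages_passed >= k + 1 for e in episodes)
        rates.append(passed / reached if reached else None)
    return rates


def summarize(episodes: List[Episode], stage_names: List[str],
              perception: Optional[float] = None) -> Dict[str, Any]:
    if not episodes:
        raise ValueError("no episodes to summarize")
    n = len(episodes)
    rates = per_stage_success(episodes, len(stage_names))
    metric_names = episodes[0].smoothness.keys()
    return {
        "understanding": {
            "per_stage": [{"skill": s, "success_rate": r} for s, r in zip(stage_names, rates)],
            "task_success": sum(e.stages_passed >= len(stage_names) for e in episodes) / n,
        },
        "perception": perception,  # e.g. a DR-sweep score computed elsewhere
        "behavior": {k: sum(e.smoothness[k] for e in episodes) / n for k in metric_names},
    }


episodes = [Episode(2, {"jerk_rms": 1.0}), Episode(1, {"jerk_rms": 3.0}),
            Episode(0, {"jerk_rms": 2.0}), Episode(2, {"jerk_rms": 2.0})]
summary = summarize(episodes, ["grasp_part", "pure_lift"])
print(json.dumps(summary, indent=2))
```

Reporting each stage's rate conditional on reaching it makes the breaking point visible: here the grasp succeeds in 3 of 4 episodes, and the lift in 2 of the 3 that got that far.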
How a request flows
- Resolve. The task graph names a skill plus a target part. The SKILL_REGISTRY looks up the skill spec; the asset's capabilities.json confirms that the target part offers the required affordances. A mismatch fails fast with a clear error.
- Plan. The skill solver constructs a motion plan (or the policy network predicts an action chunk).
- Roll out. Each step's observation is captured and the stage predicate is evaluated.
- Score. At end of episode, the three diagnostic dimensions are aggregated and the result is appended to the run's results.json.
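The Resolve, Plan, and Roll out steps can be sketched as a single-stage runner that hands its trajectory to a later Score step. Everything here is hypothetical: the capabilities.json layout ({"parts": {part: [affordances]}}), a solver that returns a list of actions, the step function standing in for the environment, and ending the stage as soon as its predicate holds. A learned policy predicting action chunks would replace the solver call in the Plan step.

```python
from dataclasses import dataclass
from typing import Any, Callable, Dict, List, Set, Tuple

State = Dict[str, Any]


class AffordanceError(ValueError):
    """Raised when the target part lacks an affordance the skill requires."""


@dataclass
class SkillSpec:
    requires: Set[str]
    solver: Callable[[str], List[Dict[str, Any]]]  # part name -> planned actions


def resolve(skill: str, part: str, registry: Dict[str, SkillSpec],
            capabilities: Dict[str, Any]) -> SkillSpec:
    """Resolve: look the skill up and check the part's affordances; fail fast."""
    if skill not in registry:
        raise KeyError(f"unknown skill {skill!r}")
    spec = registry[skill]
    offered = set(capabilities.get("parts", {}).get(part, []))
    missing = spec.requires - offered
    if missing:
        raise AffordanceError(
            f"{skill} needs {sorted(missing)} on part {part!r}; asset offers {sorted(offered)}")
    return spec


def run_stage(spec: SkillSpec, part: str, step: Callable[[Dict[str, Any]], State],
              success: Callable[[State], bool]) -> Tuple[bool, List[State]]:
    """Plan, then roll out, evaluating the stage predicate after every step."""
    trajectory: List[State] = []
    for action in spec.solver(part):  # Plan
        obs = step(action)            # Roll out: capture each observation
        trajectory.append(obs)
        if success(obs):
            return True, trajectory
    return False, trajectory          # a Score step would aggregate these later


# Toy environment: "close" on a part marks it as grasped.
state: State = {"grasped": set()}


def step(action: Dict[str, Any]) -> State:
    if action["op"] == "close":
        state["grasped"].add(action["part"])
    return {"grasped": set(state["grasped"])}  # snapshot, not a live reference


registry = {"grasp_part": SkillSpec({"graspable"},
                                    lambda p: [{"op": "approach"}, {"op": "close", "part": p}])}
capabilities = {"parts": {"cap": ["graspable"], "base": ["rotatable"]}}

spec = resolve("grasp_part", "cap", registry, capabilities)
ok, traj = run_stage(spec, "cap", step, lambda s: "cap" in s["grasped"])
```

Checking affordances before planning means a mismatched task graph fails with a readable message instead of a solver error deep inside the rollout.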
Source-tree map
| Module | Role |
|---|---|
| core/skill.py | 21 motion-planning skill solvers. |
| core/skill_registry.py | @register_skill, SKILL_REGISTRY, the 11-affordance vocabulary. |
| core/predicates.py | Predicate-DSL compiler. |
| core/env.py | 19 Gym envs (single-skill + bundle). |
| core/scene.py | SceneBuilders (data-driven, no per-asset branches). |
| core/env_mixins.py | EvalDREnvMixin: camera / light jitter helpers. |
| utils/task_graph.py | TaskGraph dataclass + YAML loader + runner. |
| utils/eval_metrics.py | EpisodeResult, EvalSummary, compute_smoothness. |
| utils/eval_sweep.py | dr_sweep + standard_dr_sweeps with AUSC. |
| utils/eval_setup.py | make_eval_env: dispatches single-skill ↔ task-graph mode. |
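To illustrate what dispatching between single-skill and task-graph mode could look like, here is a sketch that picks a mode from the shape of the config. The config keys (skill, stages), the SingleSkillEval and TaskGraphEval classes, and the function name make_eval_env_sketch are hypothetical; the real make_eval_env in utils/eval_setup.py may take different arguments and return actual environments.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Union


@dataclass
class SingleSkillEval:
    skill: str
    target: Optional[Dict[str, Any]]


@dataclass
class TaskGraphEval:
    name: str
    skills: List[str]


def make_eval_env_sketch(config: Dict[str, Any]) -> Union[SingleSkillEval, TaskGraphEval]:
    """Choose task-graph mode when the config lists stages, else single-skill mode."""
    if "stages" in config and "skill" in config:
        raise ValueError("config must define either 'skill' or 'stages', not both")
    if "stages" in config:
        return TaskGraphEval(config["name"], [s["skill"] for s in config["stages"]])
    if "skill" in config:
        return SingleSkillEval(config["skill"], config.get("target"))
    raise ValueError("config has neither 'skill' nor 'stages'")
```

Dispatching on the config's shape keeps one entry point for both modes, and rejecting ambiguous configs avoids silently running the wrong one.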