Experiment Logbook
Canonical experiment lanes, what they prove, and where each one lives.
Canonical Lanes
This page is the stable map of the project. Use it before diving into the raw changelog.
Lane 1: Product / MVP
Purpose:
- Prove the same-VM GPU agent works as a real product surface.
Current reference:
- AWS public console
- Same-VM runtime + Hermes + CPU tool worker
What it covers:
- direct runtime chat
- agent tool use
- SQLite
- RAG
- memory
- file upload
- STT / TTS
- embeddings / reranking
- ORPO control path
Primary references:
Lane 2: Runtime Foundations
Purpose:
- Prove the monolith runtime is fast and numerically correct.
What it covers:
- hot latency
- cold latency
- TTFT (time to first token)
- numerical parity
- frontend/backend kernel comparisons
Primary references:
Lane 3: Routing Architecture
Purpose:
- Measure internal same-body routing against external hop-based routing.
Primary references:
Lane 4: Awareness / Reasoning
Purpose:
- Test uncertainty-aware generation and internal loop behavior.
Primary references:
Lane 5: Paper Package
Purpose:
- Keep the paper-grade tables and cross-hardware package separate from day-to-day iteration noise.
Primary references:
How To Read The Project
Recommended order:
- Index
- Objectives and thesis
- Experiment logbook
- Benchmark status
- Leaderboard
- Track B claim-safe table
- Findings changelog
- Raw artifacts
What Is Not Canonical
Do not treat these as primary proof unless explicitly linked from the canonical lanes:
- one-off exploratory reruns
- debug artifacts
- scratch UI probes
- abandoned comparison branches
- infrastructure attempts blocked by provider capacity
Those belong in the scratch view, not in the main narrative.