Treni

Experiment Logbook

Canonical experiment lanes, what they prove, and where each one lives.

Canonical Lanes

This page is the stable map of the project. Use it before diving into the raw changelog.

Lane 1: Product / MVP

Purpose:

  • Prove the same-VM GPU agent works as a real product surface.

Current reference:

  • AWS public console
  • Same-VM runtime + Hermes + CPU tool worker

What it covers:

  • direct runtime chat
  • agent tool use
  • SQLite
  • RAG
  • memory
  • file upload
  • STT / TTS
  • embeddings / reranking
  • ORPO control path

Primary references:

Lane 2: Runtime Foundations

Purpose:

  • Prove the monolith runtime is fast and numerically correct.

What it covers:

  • hot latency
  • cold latency
  • TTFT
  • parity
  • frontend/backend kernel comparisons
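The hot/cold latency and TTFT numbers in this lane come down to timing a token stream. A minimal sketch of such a harness, assuming nothing about the runtime's API (the `fake_stream` generator below is a hypothetical stand-in for the real token stream):

```python
import time

def measure_ttft(stream):
    """Time-to-first-token and total latency for any iterable of tokens."""
    start = time.perf_counter()
    ttft = None
    count = 0
    for _ in stream:
        if ttft is None:
            # First token arrived: record TTFT once.
            ttft = time.perf_counter() - start
        count += 1
    total = time.perf_counter() - start
    return ttft, total, count

def fake_stream(n=5, first_delay=0.01, step=0.002):
    # Hypothetical stand-in for the runtime's streaming endpoint.
    time.sleep(first_delay)
    yield "tok0"
    for i in range(1, n):
        time.sleep(step)
        yield f"tok{i}"

ttft, total, n = measure_ttft(fake_stream())
```

Hot vs. cold latency falls out of the same harness: run it once after a restart (cold) and again immediately (hot), and compare.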

Primary references:

Lane 3: Routing Architecture

Purpose:

  • Measure internal same-body routing against external hop-based routing.

Primary references:

Lane 4: Awareness / Reasoning

Purpose:

  • Test uncertainty-aware generation and internal loop behavior.
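One common way to make generation uncertainty-aware is to gate an internal loop on the entropy of the next-token distribution. A minimal sketch under that assumption (the `threshold` value and function names are illustrative, not taken from this project):

```python
import math

def token_entropy(probs):
    """Shannon entropy (in nats) of a next-token probability distribution."""
    return -sum(p * math.log(p) for p in probs if p > 0)

def should_refine(probs, threshold=1.0):
    """Hypothetical loop gate: keep iterating while the model's
    next-token distribution is high-entropy (i.e. the model is unsure)."""
    return token_entropy(probs) > threshold

confident = [0.97, 0.01, 0.01, 0.01]   # peaked: low entropy
uncertain = [0.25, 0.25, 0.25, 0.25]   # uniform: entropy = ln(4) ~ 1.39
```

With the illustrative threshold of 1.0 nat, `should_refine(confident)` is false and `should_refine(uncertain)` is true, so only the uncertain step triggers another internal pass.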

Primary references:

Lane 5: Paper Package

Purpose:

  • Keep the paper-grade tables and cross-hardware package separate from day-to-day iteration noise.

Primary references:

How To Read The Project

Recommended order:

  1. Index
  2. Objectives and thesis
  3. Experiment logbook
  4. Benchmark status
  5. Leaderboard
  6. Track B claim-safe table
  7. Findings changelog
  8. Raw artifacts

What Is Not Canonical

Do not treat these as primary proof unless explicitly linked from the canonical lanes:

  • one-off exploratory reruns
  • debug artifacts
  • scratch UI probes
  • abandoned comparison branches
  • provider-capacity-blocked infrastructure attempts

Those belong in the scratch view, not in the main narrative.