Treni Experiment Docs

Experiment Logbook

Canonical experiment lanes, what they prove, and where each one lives.

Canonical Lanes

This page is the stable map of the project. Use it before diving into the raw changelog.

Lane 1: Product / MVP

Purpose:

Prove the same-VM GPU agent works as a real product surface.

Current reference:

AWS public console
Same-VM runtime + Hermes + CPU tool worker

What it covers:

direct runtime chat
agent tool use
SQLite
RAG
memory
file upload
STT / TTS
embeddings / reranking
ORPO control path

Primary references:

Benchmark status
TODO

Lane 2: Runtime Foundations

Purpose:

Prove the monolith runtime is fast and numerically correct.

What it covers:

hot latency
cold latency
TTFT
parity
frontend/backend kernel comparisons

Primary references:

Canonical G5 set
Leaderboard

Lane 3: Routing Architecture

Purpose:

Measure internal same-body routing against external hop-based routing.

Primary references:

Routing comparison
Track B claim-safe table

Lane 4: Awareness / Reasoning

Purpose:

Test uncertainty-aware generation and internal loop behavior.

Primary references:

Benchmark status
Findings changelog

Lane 5: Paper Package

Purpose:

Keep the paper-grade tables and cross-hardware package separate from day-to-day iteration noise.

Primary references:

Paper package

How To Read The Project

Recommended order:

1. Index
2. Objectives and thesis
3. Experiment logbook
4. Benchmark status
5. Leaderboard
6. Track B claim-safe table
7. Findings changelog
8. Raw artifacts

What Is Not Canonical

Do not treat these as primary proof unless explicitly linked from the canonical lanes:

one-off exploratory reruns
debug artifacts
scratch UI probes
abandoned comparison branches
provider-capacity-blocked infrastructure attempts

Those belong in the scratch view, not in the main narrative.
