# TODO

Live execution checklist and next actions.

## Priority Order

## Current Checklist

### Same-VM Agent Productization
- Register Treni same-VM tools natively inside Hermes instead of wrapper-only injection.
- Verify no duplicate Hermes native tools for browser/code execution/same-VM registrations.
- Fix multi-turn native Hermes 4B session replay bug (`tool_call_id` uniqueness).
- Make the split real-world discover -> SQLite -> RAG -> memory -> recall conversation lane pass on native Hermes 4B.
- Make the single-turn combined persistence prompt reliably satisfy SQLite + RAG + memory in one freeform turn.
- Add true token streaming for agent-mode turns in the public GPU Agent console.
- Move the public demo from AWS `g5.2xlarge` to a larger GPU host once Lambda (or another provider) gives stable capacity.
- Harden `Qwen3.5-4B` runtime supervision on `18081` so long demo sessions survive without manual restarts.
### Track A: Cold/Hot Foundations
- True TTFT instrumentation in runtime request path.
- 3x cold-first-hit repeatability set (G5).
- 3x warm steady-state repeatability set (G5).
- Cold bottleneck fix: per-model tensor lookup index cache.
- Cold rerun after fix with artifact pack.
- Add stage-level cold decomposition metrics (tokenizer load, index build, tensor upload, first decode step).
- Optimize `model_tensor_index_build` via fast tensor collect path and rerun 3x cold validation.
- Rerun 3x cold validation after reverting regressed upload path (`clean7`) and confirm `clean4` parity.
- Add sub-stage upload instrumentation (`decoder_tensor_convert`, `decoder_tensor_h2d`, `decoder_tensor_copy_total`).
- Add startup preload + tokenizer cache path to cut first-request upload/tokenizer overhead.
- Wire request `max_tokens` through runtime HTTP path for token-parity comparisons.
- Disable decoder per-step trace by default (`TRENI_DEMO_TRACE` opt-in).
- Reduce remaining Qwen request-path TTFT/full gap vs vLLM (decoder/sampling fixes validated on token-parity reruns).
- Align decode-stop behavior with vLLM/HF semantics (stop on end markers, not `im_start`) and keep chat cleanup token-level (`TRENI_HTTP_OUTPUT_FILTER=1` default, sanitize still opt-in).
  - Validation (2026-03-02, AWS qwen05 probe): prior `"<|im"` leak removed in direct `/v1/chat/completions` responses with decode-stop on.
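The decode-stop policy above can be sketched as follows; this is an illustrative Python model of the rule (stop only on end markers, strip control tokens at the token level rather than sanitizing substrings), not the runtime's actual C decode loop or vocabulary:

```python
# Hypothetical sketch of the decode-stop policy: stop on true end-of-turn
# markers (<|im_end|>, <|endoftext|>), never on <|im_start|>, and drop
# control tokens from output token-by-token (no post-hoc string sanitizing).
STOP_MARKERS = {"<|im_end|>", "<|endoftext|>"}
CONTROL_TOKENS = STOP_MARKERS | {"<|im_start|>"}

def decode_until_stop(step_fn, max_tokens):
    """step_fn() yields one decoded token string per call."""
    out = []
    for _ in range(max_tokens):
        tok = step_fn()
        if tok in STOP_MARKERS:          # stop only on end markers
            break
        if tok not in CONTROL_TOKENS:    # token-level cleanup
            out.append(tok)
    return "".join(out)
```

Because the filter operates on whole tokens, a partial marker like `"<|im"` can never leak into the text stream, which is what the validation probe above checks.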
- Fix tokenizer special-token encode parity for chat templates (`<|...|>` now encoded as atomic tokens in BPE path).
  - Validation (2026-03-02, AWS qwen05 prompt-id probe): template prompt length dropped (35 -> 25) and first token is now the expected chat-control token id instead of punctuation-fragment ids.
- Prevent HTTP heuristic route-text fallback when inference succeeds with empty output.
  - Validation (2026-03-02): empty generation now returns empty assistant content, not synthetic "Routed to ..." text.
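The atomic special-token fix can be illustrated with a small sketch: split the template text on `<|...|>` specials first, map each special to its single reserved id, and only BPE-encode the plain-text segments in between. The `special_ids` table and `bpe_encode` callable are stand-ins for the runtime's real tokenizer tables:

```python
import re

# Sketch: encode chat-template special tokens atomically instead of letting
# BPE split them into punctuation fragments. Tables here are illustrative.
SPECIAL_RE = re.compile(r"<\|[^|<>]+\|>")

def encode_with_specials(text, special_ids, bpe_encode):
    ids, pos = [], 0
    for m in SPECIAL_RE.finditer(text):
        ids.extend(bpe_encode(text[pos:m.start()]))  # ordinary BPE for plain text
        ids.append(special_ids[m.group(0)])          # one atomic id per special
        pos = m.end()
    ids.extend(bpe_encode(text[pos:]))
    return ids
```

Under this scheme a template like `<|im_start|>...` begins with exactly one chat-control id, matching the prompt-length drop observed in the probe.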
- Resolve qwen05 deterministic MCQ empty-completion parity gap (runtime token-0 stop vs vLLM non-empty output).
  - Root cause (2026-03-02): runtime Qwen template did not inject the default system preamble for user-only chats (HF/vLLM template does), shifting next-token distribution toward immediate EOS for that prompt.
  - Fix: inject Qwen default system preamble in HTTP chat-template build when no `system` message is provided.
  - Validation (2026-03-02, user-only prompts): runtime now returns non-empty MCQ output ("12") and no longer immediate-stops on EOS for that case.
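The fix amounts to a small message-list normalization before template rendering. A minimal sketch, assuming HF-style role/content dicts (the preamble text here is a placeholder, not the exact string the runtime injects):

```python
# Sketch of the preamble-injection fix: user-only chats get the Qwen default
# system message prepended so the rendered template matches HF/vLLM.
DEFAULT_SYSTEM = "You are a helpful assistant."  # placeholder preamble text

def build_chat_messages(messages):
    if not any(m["role"] == "system" for m in messages):
        return [{"role": "system", "content": DEFAULT_SYSTEM}] + list(messages)
    return list(messages)  # caller-provided system message wins
```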
- Re-run qwen05 external-cold runtime-vLLM benchmark after template/decoder fixes and confirm non-empty runtime response on the prior failing path.
  - Artifacts (2026-03-02): `external_cold_qwen05_templatefix_20260302T154019Z.json`, `external_cold_qwen05_templatefix_nofixeos_20260302T154151Z.json`.
  - Result: runtime completion is non-empty (`usage_completion_tokens=3`), TTFT remains strongly ahead of vLLM on this profile.
- Re-run Phase 5 awareness benchmark on canonical `qwen` with matched depth/samples after qwen05 parity fixes.
  - Artifact (2026-03-02, `layers=36`, `samples=8`): `phase5_awareness_realbench_qwen-realbench-r9-templatefix1-l36s8_20260302T161123Z.json`.
  - Result snapshot: `gsm8k` recovered materially (A=0.625, C=0.750), while `gpqa_diamond` regressed vs `r5` (A=0.125), so the quality claim remains mixed by task family.
- Bring up Qwen3.5 serving path on AWS using vLLM nightly (`main` branch wheel path) and validate OpenAI-compatible endpoint.
  - Runtime env (2026-03-02): `.venv-vllm-nightly-q35`, `vllm 0.16.1rc1.dev...`.
  - Launch mode: `--language-model-only`, `--max-model-len 32768`, `--enforce-eager`.
- Resolve host infra blocker that broke Qwen3.5 startup (`No usable temporary directory`).
  - Root cause: root filesystem at 100%.
  - Fix: clean up old caches/venvs and run server with explicit `TMPDIR`.
- Run Qwen3.5 Phase 5 diagnostic sequence and remove A/B/C fairness noise.
  - `r1`: `phase5_awareness_realbench_qwen35-realbench-r1-s8-nonthinking_20260302T184159Z.json` (showed down deltas).
  - `r2`: `phase5_awareness_realbench_qwen35-realbench-r2-policyfix1-s8-nonthinking_20260302T184624Z.json` (partial improvement).
  - `r3` shared-first fairness fix: `phase5_awareness_realbench_qwen35-realbench-r3-sharedfirst-s8-nonthinking_20260302T184947Z.json` (all `B-A`/`C-A` deltas `0.0`).
- Clone paper reference implementation and align Phase 5 trigger policy to paper-style uncertainty loop.
  - Repo: `third_party/weave-logprobs-reasoning-loop`
  - Harness updates (`scripts/phase5_awareness_realbench.py`):
    - new trigger mode `paper|confidence|hybrid` (default `paper`),
    - paper trigger signals: `perplexity`, `max_entropy`, `low_confidence_tokens`,
    - retry prompt now carries uncertainty summary from first pass,
    - artifact trace now includes per-call uncertainty metrics/table.
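The three paper trigger signals can be computed from per-token logprob data. A minimal sketch, assuming `logprobs` is the chosen token's logprob per step and `top_logprobs` is a list of candidate logprobs per step (the harness's exact field shapes and thresholds may differ):

```python
import math

# Sketch of the paper-style uncertainty signals: perplexity over the chosen
# tokens, max per-step entropy over candidates, and a low-confidence count.
def uncertainty_signals(logprobs, top_logprobs, conf_threshold=-1.0):
    ppl = math.exp(-sum(logprobs) / len(logprobs))       # sequence perplexity
    entropies = []
    for cands in top_logprobs:                           # per-step entropy
        ps = [math.exp(lp) for lp in cands]
        z = sum(ps)                                      # renormalize truncated top-k
        entropies.append(-sum(p / z * math.log(p / z) for p in ps))
    low_conf = sum(1 for lp in logprobs if lp < conf_threshold)
    return {"perplexity": ppl, "max_entropy": max(entropies),
            "low_confidence_tokens": low_conf}
```

Any of the three crossing its threshold would mark the first pass as uncertain and trigger the retry prompt with the uncertainty summary attached.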
- Validate paper-mode path with an end-to-end AWS smoke run (Qwen3.5 nightly vLLM).
  - Artifact: `benchmarks/phase5_awareness_realbench/results/phase5_awareness_realbench_qwen35-paper-smoke_20260302T191420Z.json`
  - Validation: `retry_decision.paper_reasons` and `loop_trace[*].uncertainty` populated per run.
- Run Qwen3.5 Phase 5 rerun (`r4`) with `--awareness-trigger-mode paper` and compare deltas vs `r3` no-up baseline.
  - Artifact: `benchmarks/phase5_awareness_realbench/results/phase5_awareness_realbench_qwen35-realbench-r4-paper-s8-nonthinking_20260302T191642Z.json`
  - Outcome: loop path works and triggers correctly, but net quality uplift is not present on this sample (`overall B -0.046875`, `overall C 0.0` vs A).
- Retune from fixed paper thresholds to adaptive uncertainty gating and rerun on Qwen3.5.
  - Implementation: adaptive trigger mode with rolling per-task uncertainty history in `scripts/phase5_awareness_realbench.py`.
  - `r5` artifact: `benchmarks/phase5_awareness_realbench/results/phase5_awareness_realbench_qwen35-realbench-r5-adaptive-s8-nonthinking_20260302T202105Z.json`.
  - Result vs `r4`: lower negative delta (`B-A -0.015625` vs `-0.046875`) and much lower latency overhead.
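Rolling per-task gating can be sketched as below; the quantile-based cutoff and window/warmup sizes are assumptions about how "adaptive" is realized (the actual logic lives in `scripts/phase5_awareness_realbench.py`):

```python
from collections import defaultdict, deque

# Sketch of adaptive uncertainty gating: keep a rolling per-task history of
# uncertainty scores and retry only samples above a task-local quantile.
class AdaptiveTrigger:
    def __init__(self, window=16, quantile=0.75, warmup=4):
        self.history = defaultdict(lambda: deque(maxlen=window))
        self.quantile, self.warmup = quantile, warmup

    def should_retry(self, task, uncertainty):
        hist = self.history[task]
        hist.append(uncertainty)
        if len(hist) < self.warmup:
            return False                     # too little task evidence yet
        ranked = sorted(hist)
        cut = ranked[int(self.quantile * (len(ranked) - 1))]
        return uncertainty >= cut            # retry only unusually uncertain samples
```

Compared with fixed paper thresholds, this kind of gate retries fewer samples on a backend whose absolute uncertainty scale is shifted, which is consistent with the lower latency overhead reported for `r5`.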
- Run stricter adaptive follow-up (`r6`) and compare.
  - Artifact: `benchmarks/phase5_awareness_realbench/results/phase5_awareness_realbench_qwen35-realbench-r6-adaptive-strict-s8-nonthinking_20260302T202314Z.json`.
  - Result: `B-A=0.0` but `C-A=-0.03125`; kept `r5` adaptive defaults as the better balance.
- Add strict inference hard-fail mode for HTTP benchmarking (`TRENI_HTTP_REQUIRE_INFERENCE=1`) so empty/fallback outputs are rejected instead of silently counted.
  - Runtime now returns `502 {"error":"inference_required"}` when model inference is unused/invalid in strict mode.
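On the client side, the strict contract gives the harness an unambiguous three-way decision per sample. A minimal classification sketch (status codes match the contract above; the label names are illustrative):

```python
# Sketch: map a strict-mode HTTP reply to a scoring decision, so a fallback
# answer is counted as a hard failure instead of being silently scored.
def classify_strict_response(status, body):
    """status: HTTP status code; body: parsed JSON response."""
    if status == 200:
        return "score"                 # real inference output, safe to score
    if status == 502 and body.get("error") == "inference_required":
        return "hard_fail"             # strict mode rejected a fallback answer
    return "transport_error"           # anything else is an infra problem
```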
- Add strict canonical matrix runner for `Qwen/Qwen3.5-0.8B` runtime-vs-vLLM (`scripts/phase5_qwen35_runtime_vs_vllm_matrix.py`).
  - Enforces fixed seeds/params, endpoint preflight, hard artifact validation, and bootstrap CI output.
- Add explicit arm selection to Phase 5 harness + matrix runner (`--arms`, `--phase5-arms`) so the strict backend matrix can run Arm A-only.
- Unblock runtime decoder support for Qwen3.5 `linear_attn` layers and complete the strict canonical runtime-vs-vLLM matrix.
  - Matrix artifacts (2026-03-02): `phase5_qwen35_runtime_vs_vllm_matrix_20260302T221546Z.json`, `phase5_qwen35_runtime_vs_vllm_matrix_20260302T222013Z.json`.
  - Current canonical snapshot (`r1`, 3 seeds): runtime score `0.0503` vs vLLM `0.2170`; runtime latency `1881.188 ms` vs vLLM `178.093 ms`.
- Rerun strict Qwen3.5 matrix after decoder gate-layout fix in Arm A-only mode (3 seeds, 4 tasks, 8 samples/task).
  - Matrix artifacts (2026-03-03): `phase5_qwen35_runtime_vs_vllm_matrix_20260303T104038Z.json`, `phase5_qwen35_runtime_vs_vllm_matrix_20260303T104038Z.md`.
  - Outcome: quality gap narrowed (runtime `0.15625` vs vLLM `0.19097`), but runtime is still slower overall (`1723.685 ms` vs `958.757 ms`).
- Harden Phase 5 closed-form parsing to prevent false positives from long reasoning traces.
  - Changes (2026-03-04, `scripts/phase5_awareness_realbench.py`):
    - strict final-answer extraction for GPQA/GSM8K/AIME (`ANSWER:` / `Final Answer:` / boxed / strict numeric-only),
    - strip `<think>...</think>` blocks before parse,
    - reject long chain-of-thought "last number" fallback parses.
  - Validation artifact: `phase5_awareness_realbench_q35-parsefix-vllm-thinking1_20260304T032441Z.json` now returns `prediction_parsed=null` for unresolved thinking traces.
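The strict extraction rules above can be sketched in a few lines; the regex patterns are illustrative approximations of the harness's real rules:

```python
import re

# Sketch: strict final-answer extraction. Accept only an explicit
# ANSWER:/Final Answer: line, a \boxed{...} value, or a pure numeric-only
# completion; strip <think> blocks first; no "last number in the trace"
# fallback -- unresolved traces return None.
THINK_RE = re.compile(r"<think>.*?</think>", re.DOTALL)

def parse_final_answer(text):
    text = THINK_RE.sub("", text).strip()
    m = re.search(r"(?:ANSWER|Final Answer)\s*:\s*(.+)", text, re.IGNORECASE)
    if m:
        return m.group(1).strip()
    m = re.search(r"\\boxed\{([^{}]+)\}", text)
    if m:
        return m.group(1).strip()
    if re.fullmatch(r"-?\d+(?:\.\d+)?", text):   # strict numeric-only completion
        return text
    return None                                  # unresolved -> no parse
```

Returning `None` here is exactly what surfaces as `prediction_parsed=null` in the validation artifact: a long trace that never commits to a final answer scores as wrong instead of accidentally right.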
- Run post-parse-fix strict paired AB3 rerun on `gpqa_diamond` + `ifeval` (16/task, seeds `7/17/27`, Arm A, `request_logprobs=false`).
  - Summary artifact: `phase5_q35_runtime_vs_vllm_ifeval_gpqa_ab3_20260304T034227Z.json`
  - Result:
    - overall score: runtime `0.3403` vs vLLM `0.3229` (small edge, CI includes parity),
    - overall latency: runtime `1772.931 ms` vs vLLM `1553.034 ms` (runtime slower on aggregate),
    - stratified: runtime wins strongly on `ifeval` latency/score, but remains far slower on `gpqa_diamond`.
- Add Qwen3.5 tokenizer/full-vocab audit and extended endpoint probe matrix.
  - Tokenizer audit: `runtime-q35-tokenizer-audit-r4_20260306T190418Z.json`
  - Consolidated probe matrix: `qwen35-probe-matrix-r2_20260306T200035Z.json`
- Build same-VM Hermes MVP for local runtime + CPU tools and validate ORPO smoke training.
  - Smoke: `hermes-samevm-q35-smoke-r5_20260306T192703Z.json`
  - ORPO smoke launch: `hermes-samevm-q35-orpo-smoke-r1_20260306T194152Z.json`
- Recover the explicit AWS same-VM Qwen3.5 wrapper so it can auto-start runtime + tool worker and emit a usable final summary.
  - Wrapper artifact (2026-03-07): `benchmarks/same_vm_mvp/results/samevm-q35-stack_20260307T172158Z.json`
  - Smoke sub-artifacts: `benchmarks/qwen35_smoke/results/samevm-smoke-20260307T172124Z_20260307T172124Z.json`, `benchmarks/qwen35_smoke/results/samevm-smoke-20260307T172124Z_20260307T172124Z.md`
- Add a sequential one-host strict matrix runner for Qwen3.5 runtime-vs-vLLM and rerun it on the active AWS host.
  - Runner: `scripts/phase5_qwen35_remote_strict_matrix.py`
  - Contract artifacts (2026-03-07): `qwen35-tokenizer-audit-active_20260307T173024Z.json`, `qwen35-runtime-smoke-active2_20260307T173132Z.json`, `qwen35-isolated-ab-active_20260307T173228Z.json`
- Enable Qwen3.5 prefix cache by default and fix request-path TTFT accounting before the next strict rerun.
  - Code path: `monolith/main.c`
  - Late strict rerun artifact: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260307T191653Z.json`
  - Result: score recovered overall (runtime `0.333333` vs vLLM `0.315972`), but latency is still far behind (runtime `3809.745 ms` vs vLLM `1626.068 ms`).
- Keep two Qwen3.5 strict lanes explicit and fix sampled reproducibility.
  - Deterministic canonical strict run (2026-03-08): `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260308T204248Z.json`
    - overall: runtime `0.295139` vs vLLM `0.267361`; runtime `824.714 ms` vs vLLM `1572.529 ms`
    - `gpqa_diamond`: parity score, runtime slower
    - `ifeval`: runtime higher score and much faster
  - Sampled reproducibility is now fixed: `phase5_repro_runtime_ifeval_s7_raw_seedfix_r1.json` vs `phase5_repro_runtime_ifeval_s7_raw_seedfix_r2.json`
    - same seed/config holds at `0.3125` with `8/8` outputs identical
  - Sampled canonical strict run (2026-03-08): `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260308T220806Z.json`
    - overall: runtime `0.409722` vs vLLM `0.302083`; runtime `1617.187 ms` vs `2017.206 ms`
    - `gpqa_diamond`: runtime higher score, runtime slower
    - `ifeval`: runtime higher score and faster
- Profile the Qwen3.5 strict benchmark hotspot directly with `TRENI_STEP0_PROFILE`/`TRENI_DECODE_STAGE_PROFILE`.
  - Artifact: `benchmarks/phase5_awareness_realbench/results/q35-gpqa-profile-aws-clean_20260307T220200Z.json`
  - Clean current finding:
    - first call: `decoder_tensor_upload=218.091 ms`, `decoder_prefill=3263.527 ms`, `decoder_ttft=3317.441 ms`
    - second call: `decoder_tensor_upload=11.216 ms`, `decoder_prefix_cache_copy=0.162 ms`, `decoder_prefill=2690.001 ms`, `decoder_ttft=2750.672 ms`
    - step-0 decode remains small (`decoder_step0_layers ~8 ms`, `decoder_step0_logits_sample ~33-36 ms`)
  - Conclusion: the remaining GPQA gap is still dominated by long-prompt prefill, not tokenizer or step-0 decode.
- [~] Keep the Qwen3.5 prefix-cache path correctness-safe while continuing latency work.
  - Focused AWS profile (2x gpqa + 2x ifeval, 2026-03-07) found:
    - GPQA gets a real 64-token prefix-cache hit (`decoder_prefill ~3075 -> ~2697 ms`),
    - short IFEval prompts were tripping a prefix-cache/store CUDA invalid-argument path.
  - Safe runtime policy now skips prefix-cache store on short prompts while preserving long-prompt GPQA cache hits.
  - Follow-up smarter tiering (`cap=112`, quartile tiers + exact replay) now has clean latency evidence:
    - direct sequential GPQA profile: `q35-gpqa-profile-aws-seq2-cap112_20260307T222540Z.json`
      - second related-call `decoder_prefill 2696.101 -> 2544.202 ms`
      - second related-call `decoder_ttft 2747.697 -> 2595.907 ms`
    - clean strict seed-7 spot:
      - `cap112`: `phase5_qwen35_remote_strict_matrix_20260307T223218Z.json`
      - `cap64`: `phase5_qwen35_remote_strict_matrix_20260307T223555Z.json`
      - runtime latency delta (`112 - 64`):
        - overall `-363.908 ms`
        - `gpqa_diamond` `-420.699 ms`
        - `ifeval` `-307.116 ms`
  - Next requirement: convert this real but partial latency win into a multi-seed strict result that is not still behind vLLM overall.
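The safe store policy described above reduces to a small decision function; the short-prompt floor and the `cap=112` value mirror the tuning notes, while the exact boundary values in the runtime may differ:

```python
# Sketch of the prefix-cache store policy: skip the store for short prompts
# (which tripped the CUDA invalid-argument path and carry little hit value)
# and cap the stored prefix length for long prompts.
MIN_STORE_TOKENS = 64   # assumed floor below which storing is skipped
PREFIX_CAP = 112        # tuned cap from the cap112-vs-cap64 comparison

def prefix_store_len(prompt_len):
    """Tokens of the prompt to store in the prefix cache (0 = skip store)."""
    if prompt_len < MIN_STORE_TOKENS:
        return 0
    return min(prompt_len, PREFIX_CAP)
```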
- Remove Qwen3.5 launcher/config drift between the strict runner and the AWS same-VM stack.
  - Shared env source: `scripts/qwen_runtime_env.py`
  - Updated launchers: `scripts/qwen35_remote_isolated_ab.py`, `scripts/treni_local_tool_worker.py`, `scripts/hermes_same_vm_mvp.py`
  - Clean strict AB3 artifact: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260307T231500Z.json`
  - Current effect:
    - runtime overall score now leads in the paired set: `0.335648` vs vLLM `0.291667`
    - runtime overall latency is still far behind: `3690.124 ms` vs `1646.672 ms`
    - `gpqa_diamond` score is now parity, but latency remains the main loss
    - `ifeval` score improves clearly in runtime while remaining slower
- [~] Recover strict benchmark quality without giving back the new latency profile.
  - Batched hybrid prefill is now implemented:
    - linear-attention sequence forward
    - full-attention sequence prefill + K/V cache materialization
    - hybrid layer-major prefill in `main.c`
  - Latest clean split (2026-03-08, tie-stable fast-sampler AB3):
    - overall latency delta `-237.060 ms` (runtime faster)
    - `gpqa_diamond` score delta `+0.083333`
    - `ifeval` score delta `-0.145833`
  - Next implementation target is sampler/output-fidelity recovery on the runtime side, not another major prefill pass.
- Add explicit thinking-mode parity lane (vLLM `--reasoning-parser qwen3` + runtime equivalent output contract) before using thinking benchmarks for claim-grade comparisons.
  - First strict thinking artifact: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260308T223442Z.json`
  - Budget-fixed follow-up: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260308T224358Z.json`
  - Finalized follow-up: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260308T235628Z.json`
  - Lower-cost finalized follow-up: `benchmarks/phase5_awareness_realbench/results/phase5_qwen35_remote_strict_matrix_20260309T010353Z.json`
  - Current result:
    - lane is runnable and measurable end-to-end,
    - runtime now leads on score overall (`0.250000` vs `0.194444`),
    - and with `gpqa_max_tokens=256` it also wins overall latency (`6823.816 ms` vs `7503.000 ms`).
  - Early extension: `gsm8k` finalized thinking AB3 is directionally positive: `phase5_qwen35_remote_strict_matrix_20260310T022347Z.json`
    - runtime `0.197917` vs vLLM `0.177083`; runtime `7174.829 ms` vs vLLM `7643.231 ms`
- [~] Tighten runtime thinking-mode output contract for Qwen3.5.
  - Current issue: closed-form finalize now recovers parseable final answers, but long reasoning tasks are still too expensive and quality remains modest.
  - One-example probe artifacts: `benchmarks/phase5_awareness_realbench/results/qwen35_thinking_gpqa_oneoff_runtime_1024_20260308T230352Z.json`, `benchmarks/phase5_awareness_realbench/results/qwen35_thinking_gpqa_oneoff_vllm_1024_20260308T230352Z.json`
  - Next focus:
    - the old `512` runtime cap and long-decode host-buffer corruption are fixed,
    - the `gpqa_max_tokens=256` sweep already removed the worst latency collapse,
    - the remaining blocker is improving closed-form thinking quality beyond the current modest score while keeping this lower-cost lane,
    - and understanding why `aime25` stays `0.0` on both backends even after raising the reasoning budget and adding AIME-specific prompt/finalize guidance.
- Fix sampled Qwen3.5 reproducibility on the runtime path.
  - Root cause was harness-side: the `scripts/phase5_awareness_realbench.py` shared-first `arm_a_control` request skipped the request seed and task-specific decode payload.
  - Fixed state:
    - repeated sampled runtime-only reruns are identical: `phase5_repro_runtime_ifeval_s7_raw_seedfix_r1.json`, `phase5_repro_runtime_ifeval_s7_raw_seedfix_r2.json`
    - post-fix sampled strict matrix is now promotable: `phase5_qwen35_remote_strict_matrix_20260308T220806Z.json`
- Investigate the intermittent same-VM first tool-turn CUDA retry in Qwen3.5 wrapper runs.
  - Observed in `benchmarks/same_vm_mvp/logs/runtime_20260307T171918Z.log`: `compute/ops.cu:765` invalid argument during prefill gather; request recovered on retry and smoke still passed.
- Turn the same-VM Hermes path into an explicit demoable MVP flow with local runtime + local CPU tools + ORPO loop entrypoint.
  - Current entrypoints: `scripts/hermes_same_vm_mvp.py`, `scripts/run_samevm_qwen35_stack.sh`, `scripts/samevm_full_mvp_demo.py`, `scripts/run_samevm_full_mvp.sh`
  - Current multimodal additions already wired in code: `samevm_multimodal_status`, `samevm_embed`, `samevm_rerank`, `samevm_tts`, `samevm_stt`
  - New proof/demo entrypoints: `scripts/bootstrap_samevm_multimodal.sh`, `scripts/samevm_stack_probe.py`, `benchmarks/same_vm_mvp/README.md`
  - New proof artifacts:
    - canonical full MVP proof: `benchmarks/same_vm_mvp/results/samevm-full-mvp-aws-v15.json`
    - runtime-admin Hermes proof: `benchmarks/same_vm_mvp/results/samevm-q35-runtime-admin-proof-v5_20260307T212852Z.json`
    - Hermes SQLite query proof: `benchmarks/same_vm_mvp/results/hermes-demo-sqlite-query-v2.json`
    - Hermes RAG search proof: `benchmarks/same_vm_mvp/results/hermes-demo-rag-search-v1.json`
    - local stack proof: `benchmarks/same_vm_mvp/results/samevm-stack-probe-aws-v5.json`
    - ORPO control-plane proof: `benchmarks/same_vm_mvp/results/samevm-orpo-probe-aws_20260307T215307Z.json`
    - Qwen3.5 ORPO reload proof: `benchmarks/same_vm_mvp/results/samevm-orpo-reload-q35-fixed.json`
  - Hot-reload/hand-off now exists on AWS:
    - local ORPO output is merged into a full HF model dir,
    - packed into a new monolith container,
    - restarted as a second local runtime and verified with a real chat response.
  - MVP contract now covered in the canonical run:
    - runtime health
    - Hermes runtime-status call
    - Hermes multimodal-status call
    - basic non-thinking runtime smoke with first-turn tool calling
    - extended thinking runtime smoke with exact-match checks
    - SQLite + RAG + embedding + reranking
    - TTS + STT
    - ORPO reload sidecar proof
- Audit the current harness for stubbed-tool paths and lock the scope.
  - Direct Phase 5 runtime/vLLM lane is not stubbed.
  - Same-VM Hermes wrapper had a localized optional-import stub path in `/Users/andrewcorrea/treni/scripts/hermes_same_vm_mvp.py`; fixed.
  - Phase 3 still contains synthetic profiles by design; `realistic_v1` only reduces that bias.
- Re-prove dynamic Qwen-family runtime support on AWS.
  - `qwen35` (0.8B) direct inference restored on the live host.
  - `qwen35_4b` direct inference proved on the same host with correct pool sizing.
  - `qwen25` (Qwen2.5-0.5B-Instruct) packed fresh and direct inference proved.
- Clean the AWS host down to the active Qwen same-VM target set after the compatibility repro.
  - removed stale `monolith_qwen05*` artifacts
  - removed temporary `qwen25` host cache/artifacts after proving backward compatibility
  - kept the active `qwen35` (0.8B), `qwen35_4b` (4B), and multimodal model caches
- Deep-clean stale same-VM training/checkpoint/debug artifacts after the 4B promotion sweep.
  - removed the old `q35-orpo-notemplate-1772992302` training tree
  - removed `checkpoint-1` from `samevm-orpo-reload-q35-fixed_20260308T182430Z`
  - pruned old debug WAVs and surplus worker logs
  - current AWS root disk is back to about `4.0G` free
- Pack and prove `qwen35_9b` on a larger GPU host.
  - Current Lambda sweep is still blocked by provider-side insufficient-capacity errors plus Cloudflare `1015` rate limiting.
- Build and run a real model-dependent same-VM agent comparison suite.
  - Canonical current selector artifacts: `benchmarks/same_vm_mvp/results/samevm-agent-compare-aws-r2-qwen35.json`, `benchmarks/same_vm_mvp/results/samevm-agent-compare-aws-r2-qwen35_4b.json`
  - Current result on AWS A10G: `qwen35` (0.8B) = `10/10`, `qwen35_4b` (4B) = `2/10`
  - This selector lane is now historical for 4B; the repaired full suite below is the current source of truth.
- Promote the ORPO self-improvement loop from the current Qwen2.5 demo model to the main Qwen3.5 target family.
  - Current passing proof: `benchmarks/same_vm_mvp/results/samevm-orpo-reload-q35-fixed.json`
  - Current passing canonical full MVP: `benchmarks/same_vm_mvp/results/samevm-full-mvp-aws-v15.json`
- Recover the stricter extended/thinking same-VM runtime smoke lane as a separate quality target.
  - Current MVP gate now includes the thinking profile and passes in `benchmarks/same_vm_mvp/results/samevm-full-mvp-aws-v15.json`.
  - Extended non-thinking profile now also passes cleanly: `benchmarks/qwen35_smoke/results/postmvp-extended_20260308T185130Z.json`
- Revalidate real Hermes tool availability after the wrapper import/stub audit.
  - file/code tools now load again in the same-VM wrapper (`read_file`, `write_file`, `search_files`, `patch`, `execute_code`)
  - direct live `qwen35` tool-call smoke passes at the runtime level
  - live Hermes single-tool RAG search succeeds on raw-PDF-ingested local data
- Fix `qwen35_4b` exact-output and tool-call contract parity in the same-VM Hermes path.
  - Root cause was a decoder bug in the cached linear-attention step path.
  - Repaired full-suite artifact: `benchmarks/same_vm_mvp/results/samevm-agent-full-aws-r4-qwen35_4b_20260310T184433Z.json`
  - Repaired result: `15/15`; `qwen35_4b` now passes:
    - direct runtime smoke
    - direct PDF RAG
    - direct embed/rerank
    - direct TTS/STT
    - Hermes runtime-status/RAG/SQLite/memory/execute_code
- Refresh the short model-selector lane for `qwen35_4b` after the decoder fix.
  - The old `samevm-agent-compare-aws-r2-qwen35_4b.json` artifact is stale.
  - Use the repaired full suite as the current truth until the short compare lane is rerun cleanly.
- Run the first real same-VM multimodal proof pass after bootstrap.
  - Artifact: `benchmarks/same_vm_mvp/results/samevm-stack-probe-aws-v5.json`
  - Confirmed on AWS:
    - `samevm_multimodal_status`
    - Qwen TTS
    - Qwen ASR STT on the generated WAV
    - Qwen embedding + reranking
    - SQLite + RAG in the same local tool worker
- Run a live operator-style validation pass on the current AWS deployment.
  - Direct generation speed:
    - mean end-to-end throughput: `112.37 tok/s`
    - mean decode-only throughput: `121.90 tok/s`
  - Hermes tool proofs now include: `benchmarks/same_vm_mvp/results/hermes-demo-sqlite-query-v2.json`, `benchmarks/same_vm_mvp/results/hermes-demo-rag-search-v1.json`, `benchmarks/same_vm_mvp/results/hermes-tts-v2.json`, `benchmarks/same_vm_mvp/results/hermes-stt-v2.json`
  - Real-world document caveat is now explicit:
    - extracted PDF text ingests/searches correctly,
    - raw PDF parsing is not yet native in the worker
  - Qwen3.5-4B feasibility on the current AWS host:
    - GPU memory looks plausible on the A10G `24 GB` box,
    - current root disk headroom (`~12 GB`) is the first practical blocker for download + pack
- Move same-VM multimodal bootstrap into an isolated environment for AWS runs.
  - Active AWS runs are now executed from `/home/ubuntu/.venvs/hermes-treni`.
  - Local Mac cleanup remains separate from the current AWS experiment path.
- Add a worker-side multimodal cache clear path so local tool models do not keep starving the main runtime GPU.
  - New endpoint: `POST /v1/mm/clear_cache`
  - New Hermes tool: `samevm_multimodal_clear_cache`
  - Status now reports `loaded_model_count`, `loaded_models`, and CUDA allocation/reservation.
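The handler behind such an endpoint can be sketched as below, assuming the worker keeps lazily loaded tool models in a dict (the real worker's registry and framework may differ). The torch module is injected so the logic is testable without a GPU; `torch.cuda.empty_cache()` is the standard PyTorch call that releases cached allocator blocks back to the driver once the Python references are dropped:

```python
import gc

# Sketch of the /v1/mm/clear_cache handler body: drop model references,
# collect, then release cached CUDA allocator blocks.
def clear_mm_cache(loaded_models, torch_mod):
    count = len(loaded_models)
    loaded_models.clear()                 # drop references to tool models
    gc.collect()                          # ensure objects are actually freed
    if torch_mod.cuda.is_available():
        torch_mod.cuda.empty_cache()      # return cached blocks to the driver
    return {"cleared_model_count": count, "loaded_model_count": 0}
```

Note the ordering matters: `empty_cache()` can only release memory whose tensors no longer have live references, hence the clear-then-collect-then-empty sequence.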
- Add a true vision-encoder parity lane for Qwen3.5.
  - Current state:
    - runtime probe only validates multimodal placeholder handling,
    - current vLLM launch is `--language-model-only`, so multimodal cases fail by configuration.
- Wire Q/K head RMS-norm into decoder path (`q_norm_weight`/`k_norm_weight`) and rerun strict Qwen3.5 smoke check.
  - Artifacts (`qnorm-check1`, `seed=7`, 2 samples/task): `phase5_qwen35_runtime_vs_vllm_matrix_20260302T225529Z.json`.
  - Result: still negative (`rt_score=0.0000`, `vllm_score=0.0625`; runtime latency `1880.622 ms` vs `187.453 ms`), so missing linear-attn parity remains the dominant blocker.
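For reference, per-head Q/K RMS-norm is: normalize each head's query/key vector by its root-mean-square, then scale per dimension by the learned weight (`q_norm_weight`/`k_norm_weight`). A pure-Python sketch of the math the decoder wires in (the runtime does this in CUDA):

```python
import math

# Sketch: per-head RMS-norm with a learned per-dimension scale.
def head_rms_norm(vec, weight, eps=1e-6):
    rms = math.sqrt(sum(x * x for x in vec) / len(vec) + eps)
    return [w * x / rms for x, w in zip(vec, weight)]
```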
- Recover awareness uplift on Qwen3.5 with task-aware paper mode (`gpqa` retries on, summary `ifeval` retries off).
  - `32/task`: `phase5_awareness_realbench_qwen35-papersummaryfix-runtime-conf045-s32-ifevaloff_20260303T222841Z.json` (overall `B-A=+0.015624`).
  - 3-seed (`16/task`, `s7/s17/s27`): `...ifevaloff-rpt-*` mean `+0.020833`.
- Calibrate paper-trigger metrics for runtime-native uncertainty before next Phase 5 claim run.
  - New evidence (2026-03-03):
    - paper-mode selection bug is fixed in harness (`phase5_awareness_realbench.py`),
    - default paper thresholds over-trigger on runtime (`max_entropy` true on `16/16` cases in `qwen35-paperfix-runtime-sweep-p1_4_20260303T202135Z.json`),
    - summary-mode calibration fix is now implemented (`uncertainty_source=runtime_summary` + guarded vote rule), reducing retries (`16 -> 9`) and removing the immediate negative delta (`phase5_awareness_realbench_qwen35-papersummaryfix-runtime-sanity2_20260303T204120Z.json`),
    - task-aware follow-up (disable summary retries on IFEval) now yields the first repeatable positive signal: `phase5_awareness_realbench_qwen35-papersummaryfix-runtime-conf045-s32-ifevaloff_20260303T222841Z.json` (overall `B-A=+0.015624`)
      - 3-seed (`s7/s17/s27`, `16/task`) mean `+0.020833`.
    - compact invalid-parse recovery prompt + invalid-parse confidence gate (`--invalid-parse-retry-confidence-max`) now reduce overhead while preserving mean quality on repeatability:
      - `invmax=0.73` 3-seed (`16/task`) artifacts: `phase5_awareness_realbench_qwen35-papersummaryfix-runtime-compact-invmax073-s16_20260303T232029Z.json`, `...-rpt-s17_20260303T232254Z.json`, `...-rpt-s27_20260303T232516Z.json`
      - quality unchanged vs prior baseline (`overall B-A mean +0.020833`), latency overhead reduced (`+712.276 ms -> +404.603 ms`).
      - `32/task` confirmation: `phase5_awareness_realbench_qwen35-papersummaryfix-runtime-compact-invmax073-s32_20260303T232755Z.json` kept `overall B-A=+0.015624` while lowering latency overhead (`+618.068 ms -> +326.187 ms`).
  - Next implementation target: reduce absolute GPQA malformed-output rate on first pass (current retries are still dominated by `invalid_parse` failures from decode quality, not uncertainty only).
- Add GPU-side BF16/F16 cold tensor conversion path (`TRENI_TENSOR_CONVERT_GPU`) and validate Qwen cold upload ablation on G5.
- Stabilize preload upload cold variance (intermittent `decoder_tensor_h2d` spikes) with explicit page-residency strategy (`TRENI_TENSOR_HOST_PREFETCH`) and verify with runtime/all-backend reruns.
- Run TTFT softmax-kernel pass on AWS G5 (`lt0_sync0`) and confirm effect.
- Replace single-thread norm kernels (`rmsnorm`/`layernorm`) with row-parallel reductions and rerun cold/warm matrix.
- Isolate seq2seq/Bart TTFT hotspot via step0 stage profiling (`TRENI_STEP0_PROFILE`) and implement `seq_q=1` attention follow-up (tiny-kernel + direct K/V cache write).
- Run 3x repeatability for the new default `seq_q=1` attention path and publish mean/std.
- Resolve strict Week-3 parity gate: rebuilt parity container (`monolith_phase3_qbm.bin`, qwen+bart+minilm) and strict external-HF parity now passes (`checked=3`, `failed=0`, `missing=0`).
- Add strict attention backend selector + repeatable A/B harness (`custom` vs `cudnn_sdpa` proxy) with runtime env overrides and summary reporting.
- Run AWS G5 attention backend A/B and deconfound call-order effects with reverse-order rerun (`attn_backend_ab_rev_20260222T144736Z`).
- Cache attention runtime env config values once per process (remove per-call `getenv` overhead in request path).
- Add `seq_q=1` hybrid tuning knobs (`TRENI_ATTN_SEQ1_USE_CUBLAS_QK/PV`) and run warm/cold matrix on G5.
- Fuse `seq_q=1` softmax+PV custom path and retune seq1 QK block sizing; rerun warm/cold matrix (`seq1_hybrid_fused_20260222T192656Z`).
- Make `cudnn_sdpa` proxy behavior explicit opt-in (`TRENI_ATTN_ALLOW_SDPA_PROXY=1`) and keep strict fused-only semantics by default.
- Probe fused cuDNN SDPA availability on H100 across alignment/shape/layout sweeps and pip/system cuDNN sources (`cudnn_sdpa_h100_probe_20260222T1935Z`).
- Add hard A/B validation guard: fail frontend runs when fused marker is missing or runtime was built with `TRENI_WITH_CUDNN=0`.
- Add fused frontend stage profiler (`TRENI_ATTN_CUDNN_FRONTEND_PROFILE`) and capture miss-cost probe artifacts.
- Publish strict fused frontend A/B rerun with fixed `qwen` model + warmed query set (`attn_backend_ab_frontend_20260222T220111Z`).
- Publish repeatability proof matrix for custom vs fused frontend (`attn_backend_frontend_matrix_20260222T221948Z`, 3 repeats each for `warm_fixed` and `mixed_churn`).
- Publish frontend claim-strength report (paired deltas + CI95) for the repeatability matrix (`attn_backend_frontend_claim_report_20260222T222958Z`).
- Add fused miss-trace + startup-preload knobs (`TRENI_ATTN_CUDNN_FRONTEND_TRACE_MISSES`, `TRENI_HTTP_PRELOAD_PROMPTS`, frontend A/B preload flag).
- Run strict frontend matrix A/B `no_preload` vs `startup_preload_4prompts` and publish compare report (`attn_backend_frontend_missmit_compare_20260222T225215Z`).
- Fix runtime preload prompt splitter bug (`TRENI_HTTP_PRELOAD_PROMPTS`) and verify multi-run execution from logs (`run=1/4 ... run=4/4`).
- Run strict frontend matrix A/B `no_preload` vs `startup_preload_benchmark_queries` and publish compare report (`attn_backend_frontend_missmit_compare_20260222T231335Z`).
- Add shape-level seq1 prebuild controls (`TRENI_ATTN_CUDNN_FRONTEND_PREBUILD_SEQ1_MAX_KV`, `TRENI_ATTN_CUDNN_FRONTEND_PREBUILD_HEAD_DIM`) and expose them in frontend scripts.
- Validate no-preload fused cold TTFT fix with startup shape prebuild (`prebuild_startup_nopreload_probe_20260222T232932Z`).
- Run no-preload frontend matrix probe with shape prebuild and compare against no-preload baseline (`attn_backend_frontend_matrix_20260222T233003Z`, `attn_backend_frontend_missmit_compare_20260222T233116Z`).
- Tune shape-prebuild range (`seq_kv_max: 16 -> 10`) and reduce startup penalty while preserving no-preload fused TTFT/full (`prebuild_startup10_nopreload_probe_20260222T235944Z`).
- Probe cuDNN frontend heuristic modes (`A/B/FALLBACK`) for startup/build relief on current path.
- Run tuned shape-prebuild (`seq_kv_max=10`) matrix probe and compare against prior `seq_kv_max=16` matrix (`attn_backend_frontend_matrix_20260223T000256Z`, `attn_backend_frontend_missmit_compare_20260223T000343Z`).
- Run lower-range shape-prebuild bound probe (`seq_kv_max=8`) and confirm request-path regression (`prebuild_startup8_nopreload_probe_20260223T000600Z`).
- Add shape-gated fused policy controls (`TRENI_ATTN_CUDNN_FRONTEND_PREBUILD_SEQ1_MIN_KV`, `TRENI_ATTN_CUDNN_FRONTEND_MIN_SEQ_KV`) and expose them in frontend matrix flags.
- Fix strict-mode frontend gate fallback path so low-shape custom fallback remains inference-valid (`inference.used=true` under strict fused runs).
- Run 3x hybrid no-preload startup probe (`prebuild_hybrid10_nopreload_probe_r{1,2,3}_20260223T002214Z`) and lock startup/request repeatability.
- Run 3x hybrid frontend matrix (`attn_backend_frontend_matrix_20260223T001959Z`) and compare vs prior tuned no-gate baseline (`attn_backend_frontend_missmit_compare_20260223T002153Z`).
- Run broader-shape sanity probe for hybrid policy (
hybrid_shape_sanity_20260223T002857Z) and capture limitation: fused misses reappear forseq_kv>10long-prompt growth. - Add upper seq-kv gate control (
TRENI_ATTN_CUDNN_FRONTEND_MAX_SEQ_KV) and expose--attn-fused-max-seq-kvin frontend runners. - Rerun broader-shape sanity with bounded gate (
hybrid_shape_sanity_maxgate_20260223T003453Z) and confirm miss-cascade removal (miss_lines_head=[],inference.used=trueall requests). - Rerun 3x hybrid matrix with max gate (
attn_backend_frontend_matrix_20260223T003611Z) and compare vs prior hybrid policy (attn_backend_frontend_missmit_compare_20260223T003734Z). - Add per-request attention backend telemetry (
attentioncounters/shares in runtime HTTP responses) and aggregate it in benchmark artifacts. - Run coverage-instrumented fused matrix and profile sweeps (
attn_backend_frontend_matrix_20260223T011158Z,fused_coverage_profiles_20260223T011504Z,fused_coverage_cold_profiles_20260223T011534Z). - Decide lane direction from evidence: park cuDNN/frontend optimization and prioritize custom kernels (
2026-02-23). - (Parked) Replace current
cudnn_sdpaproxy path with true fused cuDNN SDPA/flash-attention frontend path, then rerun A/B. - Reduce shape-prebuild startup penalty while preserving no-preload fused TTFT/request gains (
~7.0s -> ~2.0sstartup baseline on G5 hybrid policy). - (Parked) Implement dynamic shape-reuse/coverage for fused path.
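A claim-strength report of the kind listed above reduces to paired per-query deltas plus a 95% interval; a minimal sketch of that computation, assuming a simple normal approximation and illustrative field names (not the harness's actual schema):

```python
import math

def paired_delta_ci95(custom_ms, fused_ms):
    """Paired per-query deltas (fused - custom) with a normal-approx 95% CI.

    Pairing by query removes between-query variance, so the CI reflects
    only the backend difference. For small n, a t-quantile would be the
    more careful choice than the fixed 1.96 used here.
    """
    assert len(custom_ms) == len(fused_ms)
    deltas = [f - c for c, f in zip(custom_ms, fused_ms)]
    n = len(deltas)
    mean = sum(deltas) / n
    # Sample variance of the paired deltas (n - 1 denominator).
    var = sum((d - mean) ** 2 for d in deltas) / (n - 1)
    half = 1.96 * math.sqrt(var / n)
    return {"mean_delta_ms": mean, "ci95": (mean - half, mean + half)}
```

When the interval excludes zero, the backend difference can be claimed as repeatable rather than noise, which is the role the report plays in this checklist.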
- Add/validate seq1 custom microfused path (`TRENI_ATTN_SEQ1_USE_MICROFUSED`) to reduce `decoder_step0_layers` launch overhead for small `seq_kv`.
  - G5 A/B result (`2026-02-23`): no net win (mean/TTFT regressions; isolated bart p99 improvement only), so kept as opt-in and defaulted off.
  - Artifact summary: `benchmarks/phase2_runtime/seq1_microfused_ab/seq1_microfused_ab_summary_20260223T014848Z.json` and `.md`.
- Add stream-cache toggles (`TRENI_LINEAR_STREAM_CACHE`, `TRENI_ATTN_STREAM_CACHE`) and run G5 on/off A/B.
  - Result (`2026-02-23`): near-neutral; keep cache enabled by default and prioritize higher-impact kernel paths.
  - Artifact summary: `benchmarks/phase2_runtime/results/stream_cache_ab_summary_20260223T015222Z.json` and `.md`.
- Prototype hash-backed registry/model-index lookups (`TRENI_REGISTRY_LOOKUP_HASH`, `TRENI_MODEL_INDEX_NAME_HASH`) and run G5 on/off A/B.
  - Result (`2026-02-23`): no meaningful cold/setup win on this profile; kept as opt-in and defaulted off.
  - Artifact summary: `benchmarks/phase2_runtime/results/registry_hash_ab_summary_20260223T020353Z.json` and `.md`.
- Fix cold-start harness startup timing granularity (`wait_for_health` now polls every 50 ms instead of a 1 s cadence).
- Add benchmark flag `--runtime-skip-startup-smoke` (default true) and validate cold startup impact on G5.
  - Result (`startup_smoke_ab_hf_20260223T030059Z`): startup-to-healthy `488.027 -> 404.184 ms` (`-17.18%`) and start-to-first-response `705.454 -> 622.167 ms` (`-11.81%`) with smoke skipped.
  - Runtime default also moved to skip startup smoke unless explicitly disabled (`TRENI_SKIP_STARTUP_SMOKE=0`).
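The `wait_for_health` granularity fix matters because a 1 s sleep quantizes every startup measurement to the poll cadence; a 50 ms loop bounds that error. A minimal sketch, with the probe callable and timeout defaults as illustrative assumptions rather than the harness's exact signature:

```python
import time

def wait_for_health(probe, timeout_s=30.0, poll_interval_s=0.05):
    """Poll `probe()` every 50 ms until it returns True; return elapsed seconds.

    A coarse 1 s cadence inflates startup-to-healthy readings by up to
    ~1000 ms of quantization error; 50 ms bounds that error to ~50 ms.
    """
    start = time.monotonic()
    deadline = start + timeout_s
    while time.monotonic() < deadline:
        if probe():
            return time.monotonic() - start
        time.sleep(poll_interval_s)
    raise TimeoutError("runtime did not become healthy in time")
```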
- Run custom cold-path A/B probes (`TRENI_TENSOR_ENV_CACHE`, `TRENI_TENSOR_H2D_CHUNK_MB`, `TRENI_TENSOR_HOST_REGISTER`) on G5.
  - Result (`2026-02-23`): all near-neutral/noise-level on this profile; keep as optional knobs and prioritize upload/decoder kernel hotspots.
- Add per-tensor upload hotspot profiler (`TRENI_TENSOR_UPLOAD_TOPK`) and run cold probe on G5.
  - Result (`tensor_upload_topk_probe_20260223T190829Z`): dominant upload hotspot is `model.embed_tokens.weight` (`~79.3 ms`, `~63.8%` of `decoder_tensor_upload` in that probe).
- Add/benchmark container-level readahead hint (`TRENI_CONTAINER_WILLNEED`) on G5.
  - Result (`container_willneed_ab8_20260223T191145Z`): modest but repeatable cold-total gain (start->first-response `-1.94%`, startup `-3.02%`), request-path TTFT/full near-flat.
  - Runtime default moved to enable this hint unless explicitly disabled (`TRENI_CONTAINER_WILLNEED=0`).
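The readahead hint lives in the native runtime; the same OS-level hint can be sketched from Python (assuming a Linux host, and assuming the runtime's `TRENI_CONTAINER_WILLNEED` path uses the equivalent `fadvise`/`madvise` mechanism):

```python
import os

def hint_willneed(path):
    """Ask the kernel to begin readahead on a model container file.

    POSIX_FADV_WILLNEED schedules asynchronous readahead so the first
    tensor reads hit the page cache instead of cold disk; length=0 means
    "to end of file". No-op on platforms without posix_fadvise.
    """
    fd = os.open(path, os.O_RDONLY)
    try:
        if hasattr(os, "posix_fadvise"):  # Linux
            os.posix_fadvise(fd, 0, 0, os.POSIX_FADV_WILLNEED)
    finally:
        os.close(fd)
```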
- Validate `TRENI_CONTAINER_WILLNEED + TRENI_TENSOR_HOST_REGISTER` combo on G5.
  - Result (`container_hostreg_ab8_20260223T191255Z`): no clear gain beyond readahead-only profile.
- Validate staged-H2D upload path (`TRENI_TENSOR_H2D_STAGING`) with chunk-size follow-up and decide lane status.
  - Result (`h2d_staging_followup_summary_20260224T101324Z`): both `min64/chunk32` (8-run A/B) and `min64/chunk128` (3-run probe) regress materially on this G5 profile.
  - Decision: keep staged-H2D as opt-in experimental path, default-off, and continue Track A cold optimization on the non-staging custom path.
- Run non-staging H2D chunk-size matrix (`TRENI_TENSOR_H2D_CHUNK_MB=0/64/128`, 8 runs each) on G5.
  - Result (`h2d_chunk_matrix_summary_20260224T101730Z`): request-path and upload-stage deltas were near-neutral in this initial run set (later superseded by the `2026-02-28` full-depth AB3 promotion to default `0`).
- Implement and benchmark host page-touch pre-fault path (`TRENI_TENSOR_HOST_TOUCH`) on G5.
  - Result (`host_touch_ab_summary_20260224T102444Z`): `decoder_tensor_h2d` decreased but `decoder_tensor_prefetch`/upload increased, yielding net request regression (full `+7.73%`, infer `+8.22%`).
  - Decision: keep host-touch path as opt-in/default-off, not part of canonical Track A settings.
- Run synchronized upload diagnostic probe (`TRENI_TENSOR_UPLOAD_SYNC=0/1`, 3 runs each) to isolate conversion vs transfer cost.
  - Result (`upload_sync_probe_summary_20260224T102618Z`): conversion is measurable with sync (`~6 ms`) but H2D remains dominant (`~118 ms`), so optimization focus stays transfer-path first.
- Run synchronized host-register probe (`TRENI_TENSOR_HOST_REGISTER=0/1`, `TRENI_TENSOR_UPLOAD_SYNC=1`) on G5.
  - Result (`host_register_sync_probe_summary_20260224T102915Z`): no transfer-stage gain and slight request regression, so this lane is deprioritized.
- Implement and benchmark decoder logits u16 path (`TRENI_DECODER_LOGITS_U16_PATH`) on G5.
  - Result (`logits_u16_ab_fix1_summary_20260224T105532Z`): cold upload/setup improved slightly, but request path regressed materially (ttft/infer/full), so the path remains opt-in/default-off.
- Implement and benchmark tensor-cache hash lookup path (`TRENI_TENSOR_CACHE_HASH`) on G5.
  - Result (`tensor_cache_hash_warm3_20260224T114126Z`): near-neutral request path with slight warm `p99` regression (`+0.149 ms`) in this profile, so the path remains opt-in/default-off.
- Implement and benchmark sampler direct-store path (`TRENI_SAMPLE_DIRECT_STORE`) on G5.
  - Result (`sample_direct_store_ab_20260224T114633Z`): enabled path regressed warm request metrics (mean `+0.062 ms`, p95 `+0.076 ms`, p99 `+0.143 ms`), so it remains opt-in/default-off.
- Implement and benchmark decoder direct-out residual path (`TRENI_DECODER_DIRECT_OUT_HIDDEN`) on G5.
  - Result (`direct_outhidden_ab_20260224T115051Z`): enabled path regressed warm request and infer metrics (mean `+0.540 ms`, p95 `+0.495 ms`, p99 `+0.444 ms`, infer `+0.150 ms`), so it remains opt-in/default-off.
- Implement and benchmark multi-head seq1 attention path (`TRENI_ATTN_SEQ1_USE_MULTIHEAD`) on G5.
  - Result (`seq1_multihead_ab_20260224T125127Z`, `seq1_multihead_bart_ab_20260224T125404Z`): clear request-path wins on qwen warm/mixed and bart warm (including TTFT/infer improvements).
  - Decision: promote to default-on (`TRENI_ATTN_SEQ1_USE_MULTIHEAD=1`, `TRENI_ATTN_SEQ1_MULTIHEAD_MAX_KV=2048`), keep off-switch for fallback.
- Add decode-stage profiling beyond step0 (`TRENI_DECODE_STAGE_PROFILE`) and publish first profile artifact (`external_cold_stepn_profile_20260225T001334Z`).
- Add external-cold runtime env passthrough (`--runtime-env`) for reproducible flag-driven A/B runs.
- Run uncertainty capture A/B (`TRENI_DEMO_CAPTURE_UNCERTAINTY=1/0`) with matched profile and capture decode-stage deltas (`external_cold_uncert_on/off_20260225T0017*`).
- Rerun runtime-vLLM cold comparison on same profile (`external_cold_runtime_vllm_uncertoff_20260225T001929Z`).
- Add non-step0 split metrics (`decoder_stepN_logits_proj` vs `decoder_stepN_sample`) and run immediate qwen A/B probes (lt16, fast16 GEMMEx, direct-u16-input, `lt_u16` workspace); all were near-neutral/regressed and reverted.
- Run full-depth (`--layers 36`, `--pool-mb 16384`) runtime-vLLM cold compare and validate hotspot shift (`decoder_stepN_layers` dominant).
- Run full-depth preload follow-up (`preload=1` and `preload=64`) to isolate cache-miss vs decode-compute contribution.
- Rerun full-depth seq1 hybrid matrix (`default` vs `qk` vs `pv` vs `both`) and confirm default custom remains best.
- Re-test full-depth direct-u16-input linear path; no gain, reverted.
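The `--runtime-env` passthrough exists so each A/B leg is fully described by its flag string; a sketch of parsing such a spec into a child-process environment (the `KEY=VAL,KEY=VAL` spec format here is an assumption for illustration, not the harness's documented syntax):

```python
import os

def parse_runtime_env(spec):
    """Parse 'KEY=VAL,KEY=VAL' into a dict of env overrides."""
    overrides = {}
    for item in filter(None, (s.strip() for s in spec.split(","))):
        key, _, val = item.partition("=")
        overrides[key] = val
    return overrides

def child_env(spec):
    """Merge the overrides onto the current environment for subprocess launch,
    so both A/B legs inherit identical base env plus the probed flags."""
    env = dict(os.environ)
    env.update(parse_runtime_env(spec))
    return env
```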
- Implement full-depth FFN u16 weight path (`TRENI_DECODER_FFN_U16_PATH`) and run runtime-vLLM A/B (`ab2` artifacts, 2026-02-25).
- Implement full-depth decoder attention u16 path (`TRENI_DECODER_ATTN_U16_PATH`) and run 3-seed runtime-vLLM matrix (`ab3`, 2026-02-25).
- Re-test logits u16 on top of full-depth attention/ffn u16 (`TRENI_DECODER_LOGITS_U16_PATH`) and run 3-seed runtime-vLLM matrix (`ab3`, 2026-02-25).
- Revert regressing fused `gate+up` FFN projection path after A/B regression and restore non-fused baseline.
- Implement shared decode-input pre-cast reuse for full-depth u16 decode GEMMs (q/k/v and gate/up) and run 3-seed runtime-vLLM matrix.
- Add u16 cublasLt cached path (with safe fallback) for decode u16 GEMMs and run 3-seed runtime-vLLM matrix.
- Implement residual-fused u16 Lt decode path (`o_proj` + `ffn_down` no-bias accumulate) and run 3-seed runtime-vLLM matrix.
- Add full-depth FFN sub-stage profiling (`ffn_proj_cast`, `ffn_proj_gate`, `ffn_proj_up`) and publish split profile artifact (`external_cold_layers36_stepn_profile_ffnsub_20260226T094140Z.log`).
- Probe batched gate+up FFN projection follow-up and revert after regression (higher `ffn_proj` and slower full decode in A/B).
- Implement attention qkv fused-alias path (`TRENI_DECODER_ATTN_U16_QKV_FUSED`) and run 3-seed runtime-only + runtime-vLLM A/B matrices.
- Promote qkv fused-alias path as default-on in the full-depth u16 lane after parity pass and repeatability wins.
- Probe `TRENI_LINEAR_LT_WORKSPACE_MB` in full-depth lane and reject after regression (full `1711.213 -> 1754.568 ms` in trial A/B).
- Implement FFN activation-to-u16 fused path (`TRENI_DECODER_FFN_ACT_U16_FUSED`) and run 3-seed runtime-only + runtime-vLLM A/B matrices.
- Promote FFN activation-to-u16 fused path as default-on after strict parity pass and repeatability wins.
- Probe FAST_16 compute modes on top of u16-Lt; keep as non-canonical lane and revert promotion (tiny request-full delta, noisy startup outlier in repeatability set).
- Probe full-depth `TRENI_DECODER_FFN_PROJ_U16_FUSED` in 3-seed runtime-only + runtime-vLLM A/B and reject after consistent regression.
- Add/probe `TRENI_LINEAR_U16_FAST_COMPUTE` in full-depth runtime-only 3-seed A/B; initial signal near-neutral/slight regression (superseded by later AB5 promotion rerun).
- Probe full-depth linear Lt knobs (`TRENI_LINEAR_LT_WORKSPACE_MB=64`, `TRENI_LINEAR_USE_LT=0`) and reject both after material regressions.
- Replace process-wide Lt disable-on-first-fail with shape-scoped Lt fail cache and run full-depth 3-seed runtime-only + runtime-vLLM validation (near-neutral; no canonical shift).
- Refresh full-depth decode-stage profile (`TRENI_DECODE_STAGE_PROFILE` + `TRENI_DECODER_STEP_PROFILE`) and relock hotspot ordering (`stepN_layers` dominant, FFN `ffn_proj` still top layer sub-stage).
- Implement full-depth FFN proj batched-two u16 GEMM path (`TRENI_DECODER_FFN_PROJ_U16_BATCHED2`) and run 3-seed runtime-only + runtime-vLLM A/B matrices.
- Promote FFN proj batched-two path as default-on after strict parity pass and stage-profile corroboration.
- Promote full-depth direct-out hidden path as default-on in this lane (`TRENI_DECODER_DIRECT_OUT_HIDDEN`) after 3-seed runtime-only A/B and strict parity pass.
- Add completion-length capture to external-cold harness (`completion_chars`, `completion_words`, streamed usage fields) for runtime and vLLM.
- Add fixed-token fairness controls to vLLM leg (`ignore_eos`, streamed usage capture) and rerun 3-seed runtime-vLLM comparison with matched `completion_tokens=64`.
- Implement fused qkv split+bias path (`TRENI_DECODER_QKV_SPLIT_BIAS_FUSED`) replacing copy+bias sequence, validate 3-seed runtime-only A/B, and promote default-on after strict parity pass.
- Wire `TRENI_DECODER_LOGITS_U16_FAST_COMPUTE` into runtime logits projection path (`*_f32_input_ex(..., use_fast_compute)`) and run full-depth runtime-only 3-seed A/B.
  - Result (`2026-02-27`): no material win and slight request-full regression (full `+0.767 ms`), so the knob is not promoted.
- Run fixed-token runtime-vLLM sanity rerun after logits-fast hook integration.
  - Result: matched `completion_tokens=64` still shows runtime TTFT lead and vLLM request-full lead in this profile.
- Run strict Week 3 parity after logits-fast hook integration.
  - Result: pass (`checked=3`, `failed=0`, strict).
- Implement u16 tensor-cache path (`copy_tensor_to_gpu_u16` lookup/store) and add explicit env gate `TRENI_TENSOR_CACHE_U16` (default-on) for claim-safe A/B.
- Route logits-u16 upload path through shared cached helper (`copy_tensor_to_gpu_u16`) instead of uncached manual copy.
- Run full-depth runtime-only 3-seed A/B for `TRENI_TENSOR_CACHE_U16=0/1`.
  - Result: large request-path win (full `-472.529 ms`, infer `-471.235 ms`) with near-flat TTFT.
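The u16 tensor-cache win comes from never re-converting and re-uploading an identical weight tensor; a behavioral sketch of the lookup/store shape in Python, where `upload_u16` is an illustrative stand-in for the CUDA-side helper and the name-keyed scheme is an assumption:

```python
_U16_CACHE = {}

def copy_tensor_to_gpu_u16_cached(name, host_bytes, upload_u16, enabled=True):
    """Keyed lookup/store around the raw upload helper.

    `name` is assumed unique and stable per weight tensor, so a cache hit
    skips both the u16 conversion and the H2D copy entirely; the env gate
    (TRENI_TENSOR_CACHE_U16 in the item above) maps to `enabled`.
    """
    if enabled and name in _U16_CACHE:
        return _U16_CACHE[name]       # hit: reuse existing device handle
    handle = upload_u16(host_bytes)   # miss: convert + H2D once
    if enabled:
        _U16_CACHE[name] = handle
    return handle
```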
- Run same-window runtime-vLLM A/B for `TRENI_TENSOR_CACHE_U16=0/1`.
  - Result: request-full ordering flipped from runtime slower (`+338.124 ms`) to runtime faster (`-98.671 ms`).
- Re-run strict Week 3 parity on final u16-cache default-on build.
  - Result: pass (`checked=3`, `failed=0`, strict).
- Add optional `TRENI_LINEAR_BATCHED2_USE_LT` lane for FFN batched2 GEMMs and run full-depth A/B (`ab3` runtime-only + `ab2` runtime-vLLM).
  - Result (`2026-02-27T222830Z`): regressed runtime request path (full `+12.469 ms`, infer `+12.534 ms`); not promoted.
- Run higher-N full-depth repeatability on `TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT=1` + `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=1` (`ab8` runtime-only).
  - Result (`2026-02-27T223241Z`): near-noise uplift (full `-0.198 ms`, infer `-0.101 ms`); not promoted.
- Extend FFN fused path bias-deferral logic (fold gate/up bias into fused SiLU*Up activation when `TRENI_DECODER_FFN_PROJ_U16_FUSED=1`) and rerun full-depth A/B (`ab3` runtime-only + `ab2` runtime-vLLM).
  - Result (`2026-02-27T223458Z`): runtime-only effect is negligible (full `-0.383 ms`), no canonical shift; not promoted.
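The bias-deferral idea is algebraically simple: instead of adding the gate/up biases in separate elementwise passes before the activation, fold them into the fused SiLU*Up step itself. A scalar-per-element numerical sketch of the equivalence (pure Python; the kernel fuses this across the hidden dimension):

```python
import math

def silu(x):
    # SiLU(x) = x * sigmoid(x)
    return x / (1.0 + math.exp(-x))

def ffn_act_separate(gate, up, b_gate, b_up):
    """Baseline: two standalone bias-add passes, then fused activation."""
    g = [gi + b_gate for gi in gate]
    u = [ui + b_up for ui in up]
    return [silu(gi) * ui for gi, ui in zip(g, u)]

def ffn_act_bias_folded(gate, up, b_gate, b_up):
    """Bias-deferred variant: biases folded into the SiLU*Up kernel,
    saving two full elementwise passes over the activations."""
    return [silu(gi + b_gate) * (ui + b_up) for gi, ui in zip(gate, up)]
```

The two paths are bit-for-bit equivalent in exact arithmetic; any benchmark delta is purely launch/memory-traffic savings, which is why a negligible result like the one above is a plausible outcome.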
- Run fast-profile (`--layers 2`) higher-N A/B for logits fast-compute (`TRENI_DECODER_LOGITS_U16_FAST_COMPUTE=0/1`, runtime-only `ab8`).
  - Result (`2026-02-28T005529Z`): near-noise movement (full `-0.299 ms`), stage profile unchanged; not promoted.
- Run mixed-load p99 repeatability on canonical lane (`run_mode=mixed_load`, `http_runs=120`, 3 runs).
  - Result (`2026-02-28T005626Z`): stable set (mean `122.247 ms`, p95 `198.518 ms`, p99 `199.608 ms`), no canonical config change.
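The mean/p95/p99 fields throughout these packs are empirical statistics over per-request latencies; a minimal sketch using the nearest-rank percentile convention (one common choice; the harness may use an interpolating variant that differs slightly at small n):

```python
import math

def latency_summary(samples_ms):
    """mean/p95/p99 over request latencies, nearest-rank percentiles."""
    xs = sorted(samples_ms)
    n = len(xs)
    def pct(p):
        # nearest-rank: smallest value with at least p% of samples at or below it
        rank = max(1, math.ceil(p / 100.0 * n))
        return xs[rank - 1]
    return {"mean": sum(xs) / n, "p95": pct(95), "p99": pct(99)}
```

With `http_runs=120` per run, p99 rests on the two slowest requests, which is why the checklist repeats each mixed-load set three times before trusting tail movement.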
- Re-run strict Week 3 parity after latest follow-up patches.
  - Result (`2026-02-28T005805Z`): pass (`checked=3`, `failed=0`, strict).
- Fix `phase2_runtime_benchmark.py` timing parser decimal handling (`TIMING_RX`) so stage telemetry preserves sub-ms values.
  - Result (`2026-02-28`): `decoder_step_profile_*` fields now parse as true decimals (for example `ffn_proj_mean ~0.366 ms/layer`) instead of integer-truncated values.
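The bug class here is a timing regex that only captures the integer part of a value, silently truncating `0.366ms` to `0`. A sketch of the fixed pattern; the actual `TIMING_RX` and log-line format in `phase2_runtime_benchmark.py` are assumptions for illustration:

```python
import re

# A broken pattern like r"(\w+)=(\d+)ms" drops the fraction of "0.366ms".
# Making the fractional part an optional group preserves sub-ms values.
TIMING_RX = re.compile(r"(?P<stage>[\w.]+)=(?P<ms>\d+(?:\.\d+)?)ms")

def parse_stage_timings(log_line):
    """Extract stage -> milliseconds from a telemetry line,
    keeping decimal precision for per-layer sub-ms stages."""
    return {m.group("stage"): float(m.group("ms"))
            for m in TIMING_RX.finditer(log_line)}
```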
- Rerun full-depth profile probes after parser fix (`cold_first_hit` + `warm_steady_state`, qwen, `layers=36`).
  - Result (`2026-02-28T011037Z`): hotspot remains FFN-heavy (`decoder_step_profile_ffn_proj_mean ~0.366 ms/layer`, `ffn_down_resid_mean ~0.190 ms/layer`, `step_total_mean ~0.705 ms/layer`).
- Run full-depth warm AB3 for `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=0/1` on the fixed profile.
  - Result (`ffn_fast_compute_ab3_20260228T011146Z_summary`): slight regression (request `+0.317 ms`, infer `+0.305 ms`), no stage win; not promoted.
- Replace batched2 Lt fallback with one-call strided-batched Lt path and rerun AB3 (`TRENI_LINEAR_BATCHED2_USE_LT=0/1`).
  - Result (`batched2lt_strided_ab3_20260228T011651Z_summary`): near-noise in warm AB3 (request `-0.190 ms`, infer `-0.194 ms`, stage flat) and slight regression in runtime-only external-cold probe (full `+0.579 ms`); not promoted.
- Implement FFN gate/up dual-bias fused add path (`TRENI_DECODER_FFN_BIAS_PAIR_FUSED`) and run full-depth warm/cold A/B.
  - Result (`ffn_bias_pair_ab3_20260228T020257Z/summary.json`, warm AB3): small warm gain (request `-0.229 ms`, p99 `-0.390 ms`, infer `-0.090 ms`) with near-flat TTFT (`+0.009 ms`).
  - Cold follow-up (`ffn_bias_pair_cold_ab2_20260228T020723Z/summary.json`, 3 seeds each after extension): slight cold regression (full `+1.928 ms`, infer `+1.875 ms`), so this remains non-canonical for now.
- Add optional batched2 `seq1` split-GEMM path (`TRENI_LINEAR_BATCHED2_SPLIT_SEQ1`) and run full-depth warm/cold AB3.
  - Warm AB3 (`batched2_splitseq1_ab3_20260228T025841Z/summary.json`): near-noise/slight regression (request `+0.014 ms`, p99 `+0.124 ms`, infer `+0.105 ms`).
  - Cold AB3 (`batched2_splitseq1_cold_ab3_20260228T025841Z/summary.json`): small gain (full `-2.070 ms`, infer `-2.002 ms`, ttft `-0.021 ms`).
  - Decision: keep opt-in and non-canonical (no warm-path win).
- Add optional batched2 dup-input strided lane (`TRENI_LINEAR_BATCHED2_DUP_INPUT`) and run full-depth warm/cold AB3.
  - Warm AB3 (`batched2_dupinput_ab3_20260228T031816Z/summary.json`): slight mean regression (request `+0.317 ms`, infer `+0.293 ms`, ttft `+0.009 ms`) despite minor p99 drop (`-0.208 ms`).
  - Cold AB3 (`batched2_dupinput_cold_ab3_20260228T031816Z/summary.json`): regression (full `+1.307 ms`, infer `+1.388 ms`, ttft `+0.010 ms`).
  - Decision: keep opt-in and non-canonical.
- Probe dup-input v2 implementation (replace two D2D memcpys with one duplicate kernel) as warm AB2 gate and revert if not better.
  - Gate AB2 (`batched2_dupinput_v2warm_ab2_20260228T032741Z/summary_gate_ab2.json`): regression (request `+0.438 ms`, infer `+0.381 ms`, ttft `+0.015 ms`, p99 `+0.217 ms`).
  - Decision: rejected and reverted before AB3/cold expansion.
- Recheck prior FFN projection alternatives on current baseline with warm AB2 gates.
  - `TRENI_DECODER_FFN_PROJ_U16_FUSED=0/1` (`ffn_proj_u16_fused_gate_ab2_20260228T033524Z/summary_gate_ab2.json`): near-flat/slight mean regression (request `+0.149 ms`, infer `+0.173 ms`), not expanded.
  - `TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT=0/1` (`ffn_proj_batched2_f32input_gate_ab2_20260228T033758Z/summary_gate_ab2.json`): regression (request `+0.236 ms`, infer `+0.248 ms`, p99 `+0.512 ms`), not expanded.
- Probe optional linear u16 `CUBLAS_COMPUTE_16F` lane as warm AB2 gate and revert if non-winning.
  - Gate AB2 (`linear_u16_compute16f_gate_ab2_20260228T034412Z/summary_gate_ab2.json`): regression (request `+0.210 ms`, infer `+0.240 ms`, p99 `+0.594 ms`).
  - Decision: rejected and reverted; no AB3 expansion.
- Rebaseline full-depth warm profile on explicit u16 lane (`qwen`, `layers=36`) before next FFN probes.
  - Result (`warm_profile_qwen_layers36_refresh_20260228T040010Z` + run logs): active hotspot remains FFN projection (`ffn_proj ~0.196 ms` of `step_total ~0.402 ms`) under batched2.
- Implement optional FFN gate/up contiguous pair-pack path (`TRENI_DECODER_FFN_PAIR_PACK_U16`) and run warm AB3 gate.
  - AB3 artifact (`ffn_pair_pack_gate_ab2_20260228T040616Z/summary_ab3.json`): small warm uplift (request `-0.423 ms`), but both off/on runs already showed contiguous pair active; non-causal for promotion.
  - Decision: keep implementation as experimental default-off (`TRENI_DECODER_FFN_PAIR_PACK_U16=0`), non-canonical.
- Rerun batched2 Lt on explicit u16 lane (`TRENI_LINEAR_BATCHED2_USE_LT`) with warm AB3 + cold AB3.
  - Warm AB3 (`batched2_use_lt_u16lane_gate_ab2_20260228T041041Z/summary_ab3.json`): small gain (request `-0.313 ms`, infer `-0.468 ms`, p99 `-0.511 ms`).
  - Cold AB3 (`batched2_use_lt_u16lane_cold_ab2_20260228T041359Z/summary_ab3.json`): regression (full `+1.165 ms`, infer `+1.424 ms`).
  - Decision: keep opt-in/non-canonical (warm-only win not enough).
- Add adaptive delayed-on policy for batched2 Lt (`TRENI_LINEAR_BATCHED2_USE_LT_ENABLE_AFTER_MS`) and rerun full-depth warm/cold AB3.
  - `5000ms` AB3 (`batched2_lt_enable_after_ms5000_warm_ab3_20260228T104525Z`, `batched2_lt_enable_after_ms5000_cold_ab3_20260228T104712Z`): warm gain, but small cold full regression (`+0.422 ms`) remained.
  - `10000ms` AB3 (`batched2_lt_enable_after_ms10000_warm_ab3_20260228T105028Z`, `batched2_lt_enable_after_ms10000_cold_ab3_20260228T105213Z`): warm and cold both improved:
    - warm: request `-0.363 ms`, infer `-0.326 ms`, p99 `-0.696 ms`.
    - cold: startup `-4.307 ms`, full `-0.635 ms`, infer `-0.347 ms`, TTFT `-0.070 ms`.
  - Strict parity passed (`week3_parity_report_batched2_lt_delay10000_20260228T105329Z.json`, `checked=3`, `failed=0`).
  - Default-path strict parity smoke also passed (`week3_parity_report_batched2_lt_defaultdelay_20260228T110825Z.json`).
  - Same-window mixed-load A/B (`mixed_load_defaultdelay_onoff_ab3_20260228T115010Z.json`) regressed with delayed-on (mean `+0.846 ms`, p95 `+1.627 ms`, p99 `+0.679 ms`).
  - Decision: keep lane opt-in/non-canonical; parser defaults remain off (`TRENI_LINEAR_BATCHED2_USE_LT=0`, `TRENI_LINEAR_BATCHED2_USE_LT_ENABLE_AFTER_MS=0`).
  - Post-revert strict parity passed on defaults (`week3_parity_report_postrevert_defaults_20260228T115543Z.json`).
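The delayed-on policy gates an optimization that hurts cold start but helps warm steady state behind process uptime. A behavioral sketch of the decision, reusing the env names from the item above but with threshold semantics assumed for illustration:

```python
import os
import time

_PROCESS_START = time.monotonic()

def batched2_lt_enabled():
    """Delayed-on gate: keep the Lt path off during cold start, then
    enable it once process uptime passes the configured threshold.

    With ENABLE_AFTER_MS=0 the lane behaves as a plain on/off switch;
    a nonzero threshold trades early-request behavior for warm gains.
    """
    if os.environ.get("TRENI_LINEAR_BATCHED2_USE_LT", "0") != "1":
        return False
    after_ms = float(os.environ.get(
        "TRENI_LINEAR_BATCHED2_USE_LT_ENABLE_AFTER_MS", "0"))
    uptime_ms = (time.monotonic() - _PROCESS_START) * 1000.0
    return uptime_ms >= after_ms
```

Note the uptime clock, not request count, drives the switch, which is why mixed-load runs (where requests keep arriving long after the threshold) can still expose a regression once the lane is live.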
- Rerun Track A parser-default foundation pack (warm/cold/mixed) after delayed-Lt probe and publish canonical decision artifact.
  - Pack summary: `foundation_defaultdelay_pack_20260228T114315Z.json` (`.md` companion).
  - Warm AB3 means: request `147.258 ms`, p99 `247.617 ms`, infer `128.450 ms`, TTFT `16.999 ms`.
  - Cold AB3 means: startup `425.532 ms`, full `598.787 ms`, infer `580.173 ms`, TTFT `12.210 ms`.
  - Mixed repeatability vs prior canonical: `mean +2.841 ms`, `p95 +5.587 ms`, `p99 +5.140 ms` (`mixed_load_repeatability_compare_defaultdelay_vs_prev_20260228T114748Z.json`).
  - Decision unchanged: delayed batched2 Lt stays opt-in/non-canonical.
- Add experimental FFN batched2 Lt prewarm path (`TRENI_DECODER_FFN_BATCHED2_LT_PREWARM`) and run fixed-Lt warm/cold A/B.
  - Warm AB2 (`batched2_lt_prewarm_warm_ab2_20260228T042453Z/summary_gate_ab2.json`): small gain (request `-0.328 ms`, infer `-0.394 ms`).
  - Cold AB3 (`batched2_lt_prewarm_cold_ab3_20260228T042649Z/summary_ab3.json`): first-hit gain (full `-1.497 ms`, infer `-1.406 ms`).
- Run direct same-window combo A/B (`lt=0,prewarm=0` vs `lt=1,prewarm=1`) to test promotability.
  - Combined summary (`batched2_lt_prewarm_combo_summary_20260228T042733Z.json`): mixed outcome.
  - Warm AB3 (`batched2_lt_prewarm_combo_warm_ab2_20260228T042733Z/summary_ab3.json`): regression (request `+0.198 ms`, infer `+0.178 ms`, p99 `+0.407 ms`).
  - Cold AB3 (`batched2_lt_prewarm_combo_cold_ab3_20260228T042733Z`): improvement (full `-1.099 ms`, infer `-0.819 ms`).
  - Decision: keep prewarm path experimental default-off; non-canonical.
- Probe and promote `TRENI_DECODER_FFN_DOWN_U16_FAST_COMPUTE` on canonical full-depth lane.
  - Warm AB3 (`ffn_down_fast_compute_gate_ab3_20260228T044546Z/summary_ab3.json`): request `-0.565 ms`, infer `-0.566 ms`, p99 `-1.405 ms`, TTFT `-0.030 ms`.
  - Cold AB3 (`ffn_down_fast_compute_cold_ab3_20260228T044753Z/summary_ab3.json`): startup `-8.405 ms`, full `-0.351 ms`, infer `-0.406 ms`, TTFT `-0.028 ms`.
  - Strict parity (`week3_parity_report_ffn_down_fast_20260228T044846Z.json`): pass (`checked=3`, `failed=0`, strict).
  - Decision: promote default-on in runtime parser (`TRENI_DECODER_FFN_DOWN_U16_FAST_COMPUTE=1`).
- Run post-promotion retest matrix on updated canonical lane (AB3/AB5) for remaining FFN toggles.
  - New structural stacked-GEMM lane `TRENI_LINEAR_BATCHED2_STACKED_SEQ1` AB3: regressed warm (request `+1.259 ms`, infer `+1.229 ms`, p99 `+2.830 ms`) with near-flat cold full (`+0.030 ms`), remains experimental/default-off.
  - `TRENI_LINEAR_BATCHED2_SPLIT_SEQ1` retest AB3: regressed warm (request `+0.964 ms`) and cold (full `+1.496 ms`), remains non-canonical.
  - `TRENI_LINEAR_BATCHED2_USE_LT` fixed-on retest AB3: warm gain (request `-0.855 ms`) but cold startup/full penalty (startup `+10.474 ms`, full `+0.330 ms`); delayed-on follow-up still regressed mixed-load and remains non-canonical.
  - `lt=1 + prewarm=1` combo retest: AB3 looked positive, but AB5 confirm failed on cold (full `+1.152 ms`, startup `+3.199 ms`), remains non-canonical.
  - `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE` retest AB3: near-noise warm gain with cold regression (full `+0.577 ms`), non-canonical.
  - `TRENI_LINEAR_U16_FAST_COMPUTE` initial AB3 signal was mixed and stayed pending.
- Revalidate `TRENI_LINEAR_U16_FAST_COMPUTE` with higher-N repeats and promote if stable.
  - warm+mixed AB5 (`linearfast_ab5_20260228T124736Z/summary_ab5.json`): `on-off` stayed positive in both modes (warm request `-0.139 ms`, mixed request `-0.139 ms`).
  - cold AB3 (`linearfast_cold_ab3_20260228T124510Z/summary_ab3.json`): near-flat full (`+0.302 ms`), better startup (`-4.207 ms`) and TTFT (`-0.019 ms`).
  - strict parity pass (`week3_parity_report_linearfast_20260228T124557Z.json`, `checked=3`, `failed=0`).
  - post-default strict parity smoke also passed (`week3_parity_report_post_linearfast_default_20260228T125804Z.json`).
  - Decision: promote runtime parser default `TRENI_LINEAR_U16_FAST_COMPUTE=1`.
- [~] Optimize custom-kernel best path with explicit profile split:
  - fast profile (`--layers 2`): `decoder_stepN_logits_proj` first.
  - full depth (`--layers 36`, `--pool-mb 16384`): preserve and extend post-cache lead (runtime full `~1190 ms` in latest same-window A/B) with deeper layer-compute work (still FFN-heavy) and higher-N repeatability.
  - continue mixed-load p99 and cold upload/convert/H2D in parallel.
- Rerun canonical foundation pack (`warm/cold/mixed` AB3) after `TRENI_LINEAR_U16_FAST_COMPUTE` promotion and compare vs prior parser-default pack.
  - pack root: `foundation_linearfastdefault_pack_20260228T134157Z` (`summary_ab3.json`).
  - warm/cold: near-flat/slightly slower vs prior parser-default foundation (warm request `+0.101 ms`, cold full `+0.491 ms`).
  - mixed: improved (request `-0.629 ms`, p95 `-1.281 ms`, p99 `-0.163 ms`).
- Rerun same-window runtime-vLLM full-depth AB3 on updated canonical lane.
  - run set: `aws_speedpass_runtime_vllm_linearfastdefault_ab3_20260228T134630Z` (`summary_ab3.json`).
  - averaged first-request full: runtime `1185.186 ms`, vLLM `1305.971 ms` (`vLLM/runtime = 1.102x`).
- Run higher-N same-window runtime-vLLM full-depth rerun on updated defaults (`AB5`) and publish aggregate summary.
  - run set: `aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z`.
  - summary: `aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z/summary_ab5.json` and `.md`.
  - AB5 means:
    - runtime full `1184.812 ms`, runtime TTFT `14.640 ms`, runtime cold-total full `4190.848 ms`.
    - vLLM full `1318.675 ms`, vLLM TTFT `50.309 ms`, vLLM cold-total full `24350.818 ms`.
    - ratios (`vLLM/runtime`): full `1.113x`, TTFT `3.436x`, cold-total full `5.810x`.
  - compare vs prior AB3: `compare_vs_prev_linearfastdefault_ab3.json`/`.md` confirms runtime full stayed slightly better (`-0.375 ms`) while keeping the same direction on full and cold-total ratios.
- Test and decide batched2-Lt default-path fast-fallback short-circuit experiment.
  - isolation AB3 (`fastfallback_isolation_ab3_20260228T140122Z/summary_ab3.json`) showed warm regression (request `+1.155 ms`) and mixed near-flat/slightly worse (mean `+0.144 ms`), despite cold full improvement (`-0.846 ms`).
  - Decision: reverted; keep prior canonical path.
  - post-revert strict parity passed (`week3_parity_report_post_fastfallback_revert_20260228T140626Z.json`).
- Re-evaluate `TRENI_TENSOR_H2D_CHUNK_MB` on current canonical full-depth profile and promote if still positive.
  - cold AB3 (`h2d_chunk_cold_ab3_20260228T142114Z/summary_ab3.json`): `chunk0 - chunk64` improved startup/full/infer and reduced `decoder_tensor_h2d`/`decoder_tensor_upload`.
  - warm+mixed AB3 (`h2d_chunk_warm_mixed_ab3_20260228T142258Z/summary_ab3.json`): warm improved and mixed was near-neutral/slightly better.
  - strict parity after promotion passed (`week3_parity_report_h2dchunk0_default_20260228T142805Z.json`).
  - Decision: promote parser default to `TRENI_TENSOR_H2D_CHUNK_MB=0` (no chunking).
- Run full-depth post-AB5 gate sweep on current defaults and re-check delayed-Lt/FFN-proj-fast promotability.
  - gate AB2 set (`fulldepth_gate_newdefaults_20260228T150709Z/summary_gate_ab2.json`):
    - delayed-Lt: warm/mixed request deltas `-0.384 / -0.256 ms` (promoted to AB3 confirmation).
    - `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=1`: mixed/noise signal at that gate stage (warm p99 `+0.129 ms`, mixed p99 `+0.022 ms`), not promoted in that cycle.
  - delayed-Lt AB3 confirmation (`fulldepth_delayedlt_ab3_20260228T151322Z/summary_ab3.json`):
    - warm `on-off`: request `-0.330 ms`, infer `-0.270 ms`, p99 `-0.098 ms`;
    - mixed `on-off`: request `+0.173 ms`, infer `+0.191 ms`, p99 `+0.291 ms`.
  - Decision: keep delayed-Lt non-canonical on parser defaults.
- Run tuned delayed-Lt slow-gate rescue probe and verify mixed-load tail behavior.
  - AB2 artifact (`delayedlt_tunedslow_ab2_20260228T152358Z/summary_gate_ab2.json`).
  - tuned `on` config: `TRENI_LINEAR_BATCHED2_LT_SLOW_RATIO_PCT=0`, `TRENI_LINEAR_BATCHED2_LT_SLOW_STREAK_DISABLE=4` (+ delayed-Lt envs).
  - deltas (`on-off`):
    - warm: request `-0.185 ms`, infer `-0.054 ms`, p99 `-0.417 ms`;
    - mixed: request `-0.004 ms`, infer `-0.032 ms`, p99 `+0.221 ms`.
  - Decision: still non-promotable (mixed near-flat with p99 regression); keep delayed-Lt non-canonical.
- Patch FFN-proj batched2 mixed-input fallback loop and re-gate `TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT`.
  - code patch: `monolith/models/linear.cu` caches unsupported mixed-input batched2 GEMM combos and short-circuits repeated failed calls.
  - forced-Lt diagnostic before/after:
    - pre-patch: `ffnproj_f32input_ltalways_20260228T153113Z.json`
    - post-patch: `ffnproj_f32input_ltalways_patch_20260228T154942Z.json`
    - request mean improved `175.208 -> 173.124 ms`; `linear_batched2_lt_failures` dropped `26112 -> 1`.
  - canonical AB2 re-gate (`ffnproj_f32input_gate_patch_ab2_20260228T155033Z/summary_gate_ab2.json`):
    - warm on-off: request `+0.026 ms`, p99 `+0.099 ms`;
    - mixed on-off: request `+0.057 ms`, p99 `+0.446 ms`.
  - Decision: keep `TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT=0` canonical; patch retained for robustness.
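The `linear.cu` patch above is a negative-result cache: once the fast batched2 GEMM rejects an input combo, that combo is remembered and future calls skip straight to the fallback instead of re-attempting the failing kernel every step (which is what produced the 26112 failure count). A language-neutral sketch of the idea in Python, with all names illustrative:

```python
# Sketch of the failed-combo cache described above. `fast_path`, `fallback`,
# and the key shape are assumptions; the real code lives in CUDA.
_unsupported_combos: set = set()

def batched2_matmul(a, b, key, fast_path, fallback):
    """`key` identifies the input combo, e.g. (a_dtype, b_dtype, layout)."""
    if key in _unsupported_combos:
        return fallback(a, b)          # short-circuit: known-unsupported combo
    try:
        return fast_path(a, b)
    except NotImplementedError:        # fast kernel rejects this combo
        _unsupported_combos.add(key)   # cache the failure once
        return fallback(a, b)
```

The point of the design is that the expensive failed attempt happens at most once per combo, so a steady-state decode loop pays only the cheap set lookup.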
- Re-run `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE` on the clean full-depth path and validate promotability.
  - profiled AB3 (`ffnprojfast_fullstep_ab3_20260228T160255Z/summary_ab3.json`): `on-off` request `-0.370 ms`, infer `-0.348 ms`, p99 `-0.533 ms`.
  - non-profiled warm AB3 (`ffnprojfast_fullwarm_ab3_20260228T160358Z/summary_ab3.json`): `on-off` request `-0.249 ms`, infer `-0.225 ms`, p99 `-0.328 ms`.
  - strict parity passed with explicit candidate env and temporary promoted build (`week3_parity_report_ffnprojfast_candidate_20260228T160459Z.json`, `week3_parity_report_ffnprojfast_default_20260228T160639Z.json`).
  - post-promotion sanity AB3 (`ffnprojfast_default_sanity_ab3_20260228T160557Z/summary_ab3.json`) stayed near-flat and directionally positive on means (default-`force_off` request `-0.094 ms`).
  - Interim result: the Qwen-focused path looked positive, but a full foundation gate was still required before a canonical decision.
- Run full foundation same-window gate (`default` vs `force_off`) for `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE` and finalize the canonical default.
  - foundation pack (`foundation_ffnprojfastdefault_pack_20260228T194204Z/summary_ab3.json`) was slower than the prior canonical across warm/cold/mixed.
  - same-window gate AB2 (`foundation_ffnprojfast_gate_ab2_20260228T195240Z/summary_gate_ab2.json`) showed warm/cold mean regressions with the default on (warm request `+0.489 ms`, cold full `+0.746 ms`), mixed near-flat with better tails.
  - Decision: keep parser canonical default `TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=0` (opt-in only).
Track B: Internal vs External Routing
- Minimal external baseline harness.
- Matched task set and budgets.
- Internal vs external run and report (G5).
- Add explicit failure-amplification tests (timeouts/retries under load).
- Publish first multi-profile stress matrix on G5 (baseline + 5 stress profiles).
- Add cross-host benchmark harness + standalone external router server.
- Expand cross-host pilot into full split-host matrix (6 profiles).
- Optional internet multi-hop expansion (Fly.io controller/tool hops) with commercial endpoints (OpenAI/OpenRouter).
- Publish grouped commercial root-cause analysis across fairness artifacts (`commercial_gap_root_cause_20260222T222958Z`).
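The failure-amplification tests above target a specific routing hazard: naive retries on timeout multiply backend load exactly when the system is already slow. With a per-attempt timeout probability p and up to k attempts, the expected number of backend calls per client request is 1 + p + p^2 + ... + p^(k-1). A tiny sketch (function name illustrative):

```python
# Expected backend calls per client request under timeout-triggered retries.
# At p=0.5 with 3 attempts, each request costs 1.75 backend calls on average,
# which is the load-amplification effect the stress tests probe.
def expected_calls(p_timeout: float, max_attempts: int) -> float:
    return sum(p_timeout ** i for i in range(max_attempts))
```

This is why the stress profiles measure tails under load rather than in isolation: amplification couples timeout probability and offered load into a feedback loop.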
Track B2: External Cold-Start Proof (Runtime vs PyTorch/vLLM/Ollama)
- Implement unified cold-start harness with matched prompt/output budget.
- Run G5 canonical set for all four backends.
- Publish report with startup/TTFT/full-latency plus caveat tags (BF16 vs quantized).
- Add canonical artifact links and leaderboard row for external-cold comparison.
- Run 3x all-backend repeatability after GPU-convert fix and publish summary.
- Rerun external-cold on G5 after the default-on seq1 multi-head path and publish a 3-run repeatability summary (`external_cold_seq1mh_default_repeatability_20260224T192020Z`).
- Apply first `decoder_step0_layers` kernel optimization pass (seq1 multi-head softmax/PV exp-reuse) and rerun 3-run external-cold repeatability (`external_cold_step0expfix_repeatability_20260224T194226Z`).
- Validate second `decoder_step0_layers` follow-up (seq1 multi-head shared-prob cache), compare against the exp-reuse patch, and revert it because it underperformed (`external_cold_step0shared_repeatability_20260224T194913Z`).
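The unified cold-start harness in this track hinges on measuring three clocks per backend under identical prompt and output budgets: startup (process/model load), TTFT (time to first streamed token), and full completion latency. A minimal sketch of that measurement shape; `start_backend` and `generate_stream` are placeholders, not real APIs from this repo:

```python
# Hedged sketch of a matched-budget cold-start measurement: one timer around
# backend startup, one around the first streamed token, one around the full
# generation. Callable names and the result dict are illustrative.
import time

def measure_cold_start(start_backend, generate_stream, prompt, max_tokens):
    t0 = time.perf_counter()
    start_backend()                              # cold process/model load
    startup_s = time.perf_counter() - t0
    t1 = time.perf_counter()
    ttft_s = None
    n_tokens = 0
    for _token in generate_stream(prompt, max_tokens=max_tokens):
        if ttft_s is None:
            ttft_s = time.perf_counter() - t1    # time to first token
        n_tokens += 1
    full_s = time.perf_counter() - t1            # full-latency budget window
    return {"startup_s": startup_s, "ttft_s": ttft_s,
            "full_s": full_s, "tokens": n_tokens}
```

Holding `prompt` and `max_tokens` fixed across Treni/PyTorch/vLLM/Ollama runs is what makes the startup/TTFT/full columns comparable (modulo the BF16-vs-quantized caveat tags).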
Track C: Agentic Loop Capability
- Freeze 3 loop scenarios and success criteria.
- Implement evaluators (success rate + steps-to-convergence).
- Run internal vs external loop benchmark (canonical G5 set complete: baseline + stress, 3 seeds each).
- Publish trace-backed capability report.
- Add file-backed `realistic_v1` profile to reduce synthetic stub bias in loop scenarios.
- Run realistic-v1 multi-seed loop pack (baseline + stress) and publish summary (`phase3_realistic_v1_summary_20260222T143919Z`).
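The two loop metrics frozen above reduce to a small evaluator: success rate over all episodes, and mean steps-to-convergence computed over successful episodes only (failed episodes would otherwise drag the step count toward the budget cap). A sketch with an assumed episode format:

```python
# Illustrative loop evaluator. The episode dict shape
# {"success": bool, "steps": int} is an assumption for this sketch.
def evaluate_loops(episodes):
    successes = [e for e in episodes if e["success"]]
    rate = len(successes) / len(episodes) if episodes else 0.0
    mean_steps = (sum(e["steps"] for e in successes) / len(successes)
                  if successes else None)        # undefined with zero successes
    return {"success_rate": rate, "mean_steps_to_convergence": mean_steps}
```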
Track C2: Uncertainty-Awareness Ablation
- Add uncertainty metric source modes (`normalized_logprob`, `raw_logit_margin`, `hybrid`, `runtime_native`) to the Phase 3 harness.
- Add independent uncertainty toggles on internal and external paths.
- Add matrix runner for uncertainty on/off ablation arms.
- Run first baseline ablation set (`runs=8`, all three uncertainty sources).
- Wire runtime uncertainty export/ingestion path (runtime HTTP `uncertainty` -> C2 `runtime_native` source).
- Run 3-seed repeatability baseline ablation set.
- Run stress-profile uncertainty ablation set.
- Publish consolidated baseline-vs-stress ablation summary.
- Run runtime-native quality-gated rerun with unified awareness payload (`awareness3`, zero fallbacks/errors confirmed).
- Retune runtime-native uncertainty policy (thresholds/mapping/decision gating) and recover positive deltas (`calib1`).
- Rerun runtime-native C2 baseline+stress after the policy retune and relock the canonical interpretation.
- Unify runtime response awareness payload (`awareness.route` + `awareness.generation`) and keep `uncertainty` backward compatible for existing clients.
- Run realistic-v1 uncertainty ablation baseline+stress pair and publish comparison (`phase3_uncertainty_compare_realistic_v1_s7_20260222T144116Z`).
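The three non-native uncertainty sources named above can be sketched from per-token statistics: `normalized_logprob` maps the mean token logprob to a [0, 1] confidence (the geometric-mean token probability), `raw_logit_margin` averages the top-1/top-2 logit gap per decode step, and `hybrid` blends the two. The exact mappings and the margin scale in the C2 harness may differ; this shows the shape only:

```python
# Hedged sketch of the three uncertainty sources. The margin `scale` and the
# 50/50 blend are illustrative assumptions, not the harness's actual values.
import math

def normalized_logprob(token_logprobs):
    mean_lp = sum(token_logprobs) / len(token_logprobs)
    return math.exp(mean_lp)                 # geometric-mean token probability

def raw_logit_margin(step_top2_logits):
    margins = [top1 - top2 for top1, top2 in step_top2_logits]
    return sum(margins) / len(margins)       # mean top-1 vs top-2 logit gap

def hybrid(token_logprobs, step_top2_logits, scale=5.0):
    conf = normalized_logprob(token_logprobs)
    margin = min(raw_logit_margin(step_top2_logits) / scale, 1.0)
    return 0.5 * conf + 0.5 * margin         # blended [0, 1] confidence
```

The `runtime_native` mode bypasses all of this and ingests the runtime's own HTTP `uncertainty` export instead, which is why it gets a separate wiring task above.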
Track C3: Real-Benchmark Awareness A/B/C
- Implement Phase 5 real-benchmark harness (`scripts/phase5_awareness_realbench.py`) with three arms:
  - `arm_a_control` (single pass)
  - `arm_b_awareness_retry` (uncertainty-gated second pass)
  - `arm_c_awareness_consistency` (uncertainty-gated consistency voting / IFEval self-check)
- Add run wrapper (`scripts/run_phase5_awareness_realbench.sh`) and benchmark README (`benchmarks/phase5_awareness_realbench/README.md`).
- Run first canonical real-data set on the current GPU host (`gpqa_diamond`, `ifeval`, `gsm8k`, `aime25`) and publish artifact pack (r5: `phase5_awareness_realbench_qwen-realbench-r5-tokenizerfix2_20260301T114510Z.json`).
- Publish first diagnostic A/B/C deltas and failure-mode traces (`r5` + template A/B `r6`) with fixed token budgets.
- Add HF-reference parity evaluation on the same sampled prompts and lock a claim-safe interpretation (`phase5_hf_reference_qwen_r5_20260301T1900Z.json`).
- Improve Phase 5 math-task quality floor (prompt/eval contract and model/task fit), then rerun runtime-vs-HF parity.
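The three arms differ only in what happens when reported confidence falls below a gate: arm A ignores it, arm B spends one extra pass, arm C spends a small vote. A minimal sketch of that control flow; `run_model`, the 0.5 threshold, and the vote count are placeholder assumptions (the real logic lives in `scripts/phase5_awareness_realbench.py`):

```python
# Illustrative arm logic. `run_model(prompt) -> (answer, confidence)` is an
# assumed interface; thresholds and vote counts are not the harness's values.
from collections import Counter

def arm_a_control(run_model, prompt):
    answer, _conf = run_model(prompt)
    return answer                              # single pass, awareness ignored

def arm_b_awareness_retry(run_model, prompt, threshold=0.5):
    answer, conf = run_model(prompt)
    if conf < threshold:                       # low confidence: one gated retry
        answer, _ = run_model(prompt)
    return answer

def arm_c_awareness_consistency(run_model, prompt, threshold=0.5, votes=3):
    answer, conf = run_model(prompt)
    if conf >= threshold:
        return answer
    samples = [run_model(prompt)[0] for _ in range(votes)]
    return Counter(samples).most_common(1)[0][0]   # majority consistency vote
```

Because B and C only spend extra tokens below the gate, the fixed-token-budget comparison above is what shows whether the awareness signal buys accuracy per token rather than just more sampling.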
Expansion
- Launch Lambda A100/H100 hosts and complete Phase 3 canonical loop reruns (baseline + stress, 3 seeds each).
- Stage `monolith_phase3.bin` on Lambda A100/H100 and run the Phase 4 hardware-pack script (phase2 + c2) end-to-end.
- Full A100 run set.
- Full H100 run set.
- Paper-grade figure/table package.
Immediate Next Actions
- Recover score on the latest Qwen3.5 fast-sampler lane without giving back the strict latency win.
- Continue isolating the remaining `ifeval` fidelity gap on the one-host strict Qwen3.5 set:
  - exact instruction-following misses
  - small decode/logit drift vs vLLM on some prompts
  - stack-level repair-loop policy that improves quality without erasing the latency lead
- Make the same-VM Hermes MVP explicit around `scripts/hermes_same_vm_mvp.py` and `scripts/run_samevm_qwen35_stack.sh`, then extend the demo flow for SQLite/RAG/ORPO.
- Improve custom mixed-load p99 (decode-shape specialization + cache/write-path tuning) and publish repeatability.
- Reduce custom cold-first-hit upload/convert/H2D cost and rerun the external-cold token-parity pack.
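Since p99 deltas drive most promote/reject calls in this checklist, the repeatability packs should pin one percentile definition so 3x runs are comparable. A small nearest-rank sketch (the project's actual summarizer may interpolate differently):

```python
# Nearest-rank p99 over per-request latencies: illustrative helper, assuming
# the summaries use a nearest-rank rather than interpolated percentile.
import math

def p99(latencies_ms):
    ordered = sorted(latencies_ms)
    rank = math.ceil(0.99 * len(ordered))   # nearest-rank percentile index
    return ordered[rank - 1]
```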