Treni

Raw Artifacts

Direct JSON and report files for each benchmark set.

Paper-Grade Package (Phase 4, 2026-02-20)

Qwen3.5 Runtime Support + Probe Matrix (G5, 2026-03-06)

Qwen3.5 Contract Validation + One-Host Strict Matrix (AWS G5, 2026-03-07)

Qwen3.5 Deterministic Strict Lane + Repro Checks (AWS G5, 2026-03-08)

Qwen3.5 Sampled Strict Lane (AWS G5, 2026-03-08)

Qwen3.5 Thinking Strict Lane (AWS G5, 2026-03-08)

Same-VM Hermes MVP + ORPO Smoke (G5, 2026-03-06)

Same-VM Wrapper Recovery (AWS G5, 2026-03-07)

Phase 5 Real-Benchmark Awareness (G5, 2026-03-01)

Phase 5 Qwen3.5 Strict Runtime-vs-vLLM Matrix (G5, 2026-03-03)

Phase 5 Parse-Fix + Task-Stratified AB3 (G5, 2026-03-04)

Full-Depth FFN/U16 Follow-Up (G5, 2026-02-26)

Full-Depth FFN/Linear Probe Cycle (G5, 2026-02-27)

Full-Depth Late-Cycle Follow-Up (G5, 2026-02-27)

Full-Depth Request-Path Unlock Cycle (G5, 2026-02-27 Night)

Full-Depth FFN Follow-Up (G5, 2026-02-27 Late Night)

  • Consolidated summary:
  • TRENI_LINEAR_BATCHED2_USE_LT A/B:
    • runtime-only ab3: external_cold_layers36_preload64_ab3_batched2lt_{off,on}_s{1,2,3}_20260227T222830Z.json
    • runtime-vLLM ab2: external_cold_layers36_preload64_ab2_batched2lt_{off,on}_vllm_s{1,2}_20260227T222830Z.json
  • TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT=1 + TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=1 A/B:
    • runtime-only ab8: external_cold_layers36_preload64_ab8_ffnprojf32fast_{off,on}_s{1..8}_20260227T223241Z.json
  • FFN fused path bias-deferral follow-up (TRENI_DECODER_FFN_PROJ_U16_FUSED) A/B:
    • runtime-only ab3: external_cold_layers36_preload64_ab3_ffnprojfused2_{off,on}_s{1,2,3}_20260227T223458Z.json
    • runtime-vLLM ab2: external_cold_layers36_preload64_ab2_ffnprojfused2_{off,on}_vllm_s{1,2}_20260227T223458Z.json
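The artifact names above use a compact shell-style brace notation. {off,on} lists the two arms of each A/B, and s{1,2,3} or s{1..8} lists the run index. The sketch below expands such a pattern into concrete filenames and groups them by arm. It assumes bash brace-expansion semantics: comma lists are alternatives, and a..b is an inclusive integer range. It also assumes each expanded name corresponds to exactly one JSON file in the artifact set, which this page does not confirm. The helper names expand and split_arms are hypothetical.

```python
import itertools
import re

# One brace group: "{a,b,c}" or an integer range "{1..8}" (bash-style, assumed).
_BRACE = re.compile(r"\{([^{}]*)\}")


def _alternatives(body: str) -> list[str]:
    """Return the choices for one brace group."""
    m = re.fullmatch(r"(\d+)\.\.(\d+)", body)
    if m:
        lo, hi = int(m.group(1)), int(m.group(2))
        step = 1 if hi >= lo else -1
        return [str(i) for i in range(lo, hi + step, step)]
    return body.split(",")


def expand(pattern: str) -> list[str]:
    """Hypothetical helper: expand every brace group into concrete filenames."""
    # re.split with one capture group alternates literal text and group bodies.
    parts = _BRACE.split(pattern)
    literals = parts[0::2]
    groups = [_alternatives(body) for body in parts[1::2]]
    names = []
    # itertools.product varies the rightmost group fastest (run index inside arm).
    for combo in itertools.product(*groups):
        pieces = [literals[0]]
        for choice, lit in zip(combo, literals[1:]):
            pieces.append(choice)
            pieces.append(lit)
        names.append("".join(pieces))
    return names


def split_arms(names: list[str]) -> dict[str, list[str]]:
    """Hypothetical helper: group filenames by their _off_ / _on_ A/B arm."""
    arms: dict[str, list[str]] = {"off": [], "on": []}
    for name in names:
        m = re.search(r"_(off|on)_", name)
        if m:
            arms[m.group(1)].append(name)
    return arms


if __name__ == "__main__":
    pat = ("external_cold_layers36_preload64_ab8_ffnprojf32fast_"
           "{off,on}_s{1..8}_20260227T223241Z.json")
    files = expand(pat)
    print(len(files))  # 16 (2 arms x 8 runs)
    arms = split_arms(files)
    print(len(arms["off"]), len(arms["on"]))  # 8 8
```

Expanding names explicitly, rather than globbing a directory, makes a missing run visible. A file that the pattern promises but the directory lacks shows up as a mismatch instead of silently shrinking an arm.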

Fast-Profile + Mixed-Load Follow-Up (G5, 2026-02-28)

Parser-Fixed Full-Depth Follow-Up (G5, 2026-02-28)

Decode-Stage + Uncertainty A/B (G5, 2026-02-25)

Decode StepN Logits-Proj Probe Set (G5, 2026-02-25)

Phase 4 Lambda Full Reruns (A100/H100, 2026-02-20)

Cold Optimization Rerun (G5, 2026-02-17, Index Cache)

Cold Decomposition + Fast Tensor Collect (G5, 2026-02-17, clean4)

Exploratory Upload Experiment (Regressed, Reverted)

Revert Validation Set (G5, 2026-02-18, clean7)

Qwen Cold Upload GPU-Convert Ablation (G5, 2026-02-19)

H2D Staging Follow-Up (G5, 2026-02-24)

Non-Staging H2D Chunk Matrix (G5, 2026-02-24)

Host Touch Prefault A/B (G5, 2026-02-24)

Upload Sync Diagnostic Probe (G5, 2026-02-24)

Host Register Sync Probe (G5, 2026-02-24)

Decoder Logits U16 Path A/B (G5, 2026-02-24)

Tensor Cache Hash A/B (G5, 2026-02-24)

Sampler Direct-Store A/B (G5, 2026-02-24)

Decoder Direct-Out Residual A/B (G5, 2026-02-24)

Custom Path Probe Consolidated Summary (G5, 2026-02-24)

Seq1 Multi-Head Attention A/B (G5, 2026-02-24)

AWS G5 Seq1 Fused Follow-Up (2026-02-22)

H100 Fused cuDNN SDPA Probe Pack (2026-02-22)

AWS G5 True Fused Frontend A/B (2026-02-22)

AWS G5 Frontend Miss-Cost Profile Probe (2026-02-22)

AWS G5 Frontend Repeatability Matrix (2026-02-22, repeats=3)

AWS G5 Frontend Claim-Strength Report (2026-02-22)

AWS G5 Frontend Matrix (No Preload, 2026-02-22)

AWS G5 Frontend Matrix (Startup Preload Prompt Set, 2026-02-22)

AWS G5 Frontend Miss-Mitigation Compare (2026-02-22)

AWS G5 Frontend Matrix (No Preload, Updated Canonical, 2026-02-22)

AWS G5 Frontend Matrix (Startup Preload Benchmark Queries, 2026-02-22)

AWS G5 Frontend Miss-Mitigation Compare (Updated Canonical, 2026-02-22)

AWS G5 Exact-Query Preload Probe (2026-02-22)

AWS G5 Frontend Matrix (No Preload + Shape Prebuild, 2026-02-22)

AWS G5 Frontend Compare (No Preload -> Shape Prebuild, 2026-02-22)

AWS G5 Frontend Matrix (No Preload + Shape Prebuild kv10, 2026-02-23)

AWS G5 Frontend Compare (Shape Prebuild kv16 -> kv10, 2026-02-23)

AWS G5 Frontend Matrix (Hybrid Shape Gate, 2026-02-23)

AWS G5 Frontend Compare (Tuned No-Gate -> Hybrid Shape Gate, 2026-02-23)

AWS G5 Frontend Matrix (Hybrid Shape Gate + Max Gate, 2026-02-23)

AWS G5 Frontend Compare (Hybrid Min-Gate -> Min+Max Gate, 2026-02-23)

AWS G5 Frontend Compare (Tuned kv10 -> Hybrid Min+Max Gate, 2026-02-23)

AWS G5 Frontend Matrix (Coverage Instrumented, 2026-02-23)

AWS G5 Frontend Coverage Profiles (Warm/Cold, 2026-02-23)

AWS G5 Shape Prebuild Probes (2026-02-22)

AWS G5 Frontend Miss Trace Probe (2026-02-22)

True-TTFT Rerun (G5, 2026-02-17)

Routing Comparison — Internal vs External (G5, 2026-02-17)

Routing Failure-Amplification Stress (G5, 2026-02-18)

Routing Matrix Expansion (G5, 2026-02-19)

Routing Cross-Host Pilot (2026-02-19)

Routing Split-Host Matrix (2026-02-19, Canonical Track B)

Routing Local-Control Fairness-Hardened Splits (2026-02-20, r8)

Commercial Root-Cause Grouped Analysis (2026-02-22)

Routing Internet Multi-Hop Matrix (Fly + Commercial APIs, 2026-02-20)

Routing Local Control Matrix (No Fly Scheduler Path, 2026-02-20)

Routing Local Control Matrix (No Fly Scheduler Path, Higher-N, 2026-02-20)

Routing Task-Family Parity Split (Local Control, Higher-N, 2026-02-20)

Track B Commercial Parity Appendix (2026-02-20)

External Cold Canonical (G5, 2026-02-18)

External Cold Optimized Runtime (G5, 2026-02-18)

External Cold Token-Parity (G5, 2026-02-18, max_tokens=48 wired)

External Cold Token-Parity + Decoder/Sampling Fix (G5, 2026-02-18)

External Cold Runtime-Only GPU-Convert Ablation (G5, 2026-02-19)

External Cold Runtime-vLLM Repeatability (G5, 2026-02-19, GPU-Convert Fix2)

External Cold All-Backend Repeatability (G5, 2026-02-19, GPU-Convert Fix2)

Runtime Cold Stability Sweep (G5, 2026-02-19, GPU-Convert Fix2)

External Cold All-Backend Repeatability (G5, 2026-02-19, GPU-Convert + Host-Prefetch Fix)

External Cold Repeatability (G5, 2026-02-24, Seq1 Multi-Head Default)

External Cold Repeatability (G5, 2026-02-24, Step0 Exp-Reuse Patch)

External Cold Repeatability (G5, 2026-02-24, Step0 Shared-Prob Follow-Up, Reverted)

Runtime Cold Stability Compare (G5, 2026-02-19, Host-Prefetch Fix)

Phase 3 Agentic Loops Canonical (G5, 2026-02-19)

Phase 3 Uncertainty Ablation (G5, 2026-02-19)

Full-Depth FFN Proj Fast Revalidation (G5, 2026-02-28 Late 8)

Full-Depth FFN Proj Fast Foundation Gate (G5, 2026-02-28 Late 9)

Runtime-vLLM AB5 Rerun (G5, 2026-02-28 Late 9)

Canonical Set (G5 Foundation Repeatability, 2026-02-16)

Previous G5 Set (2026-02-15)

Historical Set (T4, 2026-02-15)

Machine-Readable Index
