Raw Artifacts
Direct JSON and report files for each benchmark set.
Paper-Grade Package (Phase 4, 2026-02-20)
- latest/package_summary.json
- latest/paper_package.md
- latest/tables/phase2_cold_hot.csv
- latest/tables/routing_matrix.csv
- latest/tables/c2_runtime_native.csv
- latest/tables/phase3_loops_baseline.csv
- latest/tables/phase3_loops_stress.csv
- latest/tables/external_cold_allbackends_g5.csv
- latest/manuscript/figure_manifest.json
- latest/manuscript/captions.md
- latest/manuscript/claims.md
- latest/manuscript/figures/fig1_warm_latency.mmd
- latest/manuscript/figures/fig2_routing_stress.mmd
- latest/manuscript/figures/fig3_loop_success.mmd
- latest/manuscript/figures/fig4_c2_deltas.mmd
- latest/manuscript/figures/fig5_external_cold_full.mmd
Qwen3.5 Runtime Support + Probe Matrix (G5, 2026-03-06)
- Full tokenizer audit:
- Base runtime smoke:
- Extended warm probes:
- Consolidated probe matrix:
Qwen3.5 Contract Validation + One-Host Strict Matrix (AWS G5, 2026-03-07)
- Active tokenizer audit:
- Active runtime smoke:
- Isolated semantic A/B:
- Sequential one-host strict matrix summary:
- initial summary: phase5_qwen35_remote_strict_matrix_20260307T173807Z.json
- late summary after prefix-cache + TTFT fix: phase5_qwen35_remote_strict_matrix_20260307T191653Z.json
- late summary markdown: phase5_qwen35_remote_strict_matrix_20260307T191653Z.md
- fast-env AB3 summary: phase5_qwen35_remote_strict_matrix_20260307T231500Z.json
- hybrid linear-batch AB3 summary: phase5_qwen35_remote_strict_matrix_20260307T235503Z.json
- hybrid full-batch AB3 summary: phase5_qwen35_remote_strict_matrix_20260308T000429Z.json
- fast-sampler AB3 summary: phase5_qwen35_remote_strict_matrix_20260308T003749Z.json
- tie-stable fast-sampler AB3 summary: phase5_qwen35_remote_strict_matrix_20260308T004758Z.json
- tie-stable one-seed spot summary: phase5_qwen35_remote_strict_matrix_20260308T004511Z.json
- Seed-level strict runtime runs:
- initial set:
- late post-fix set:
- full-batch set:
- fast-sampler set:
- tie-stable fast-sampler set:
- Seed-level strict vLLM runs:
- initial set:
- late post-fix set:
- full-batch set:
- fast-sampler set:
- tie-stable fast-sampler set:
- Focused GPQA runtime profiles:
- linear-batch: q35-gpqa-profile-aws-linearbatch_20260307T235448Z.json
- full-batch: q35-gpqa-profile-aws-fullbatch_20260308T000420Z.json
- full-batch + step profile: q35-gpqa-profile-aws-fullbatch-stepn_20260308T001322Z.json
- stop-chunk probe: q35-gpqa-profile-aws-stopchunk8_20260308T003422Z.json
- fast-sampler probe: q35-gpqa-profile-aws-samplefast1_20260308T003727Z.json
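The strict matrix summaries in this section are not listed in strictly chronological order (for example, the `004758Z` summary appears before the `004511Z` one). A minimal sketch for ordering them by the timestamp embedded in the filename follows. It assumes the `YYYYMMDDTHHMMSSZ` suffix is a UTC timestamp placed directly before the extension; the helper name `artifact_time` is hypothetical and not part of the benchmark tooling.

```python
import re
from datetime import datetime, timezone

# Assumed suffix shape: _YYYYMMDDTHHMMSSZ immediately before .json or .md.
STAMP = re.compile(r"_(\d{8}T\d{6})Z\.(?:json|md)$")


def artifact_time(name: str) -> datetime:
    """Return the timestamp embedded in an artifact filename, read as UTC."""
    match = STAMP.search(name)
    if match is None:
        raise ValueError(f"no timestamp suffix in {name!r}")
    parsed = datetime.strptime(match.group(1), "%Y%m%dT%H%M%S")
    return parsed.replace(tzinfo=timezone.utc)


# Three of the summary filenames listed above, in their listed order.
summaries = [
    "phase5_qwen35_remote_strict_matrix_20260308T004758Z.json",
    "phase5_qwen35_remote_strict_matrix_20260307T173807Z.json",
    "phase5_qwen35_remote_strict_matrix_20260308T004511Z.json",
]

# Sort by the embedded timestamp rather than by listing order.
ordered = sorted(summaries, key=artifact_time)
for name in ordered:
    print(artifact_time(name).isoformat(), name)
```

Sorting on the parsed timestamp rather than on the raw string keeps the ordering correct even if a prefix differs between files.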
Qwen3.5 Deterministic Strict Lane + Repro Checks (AWS G5, 2026-03-08)
- Deterministic strict matrix summary:
- Deterministic runtime seed-level runs:
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-runtime-s7_20260308T204253Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-runtime-s17_20260308T204313Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-runtime-s27_20260308T204330Z.json
- Deterministic vLLM seed-level runs:
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-vllm-s7_20260308T204434Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-vllm-s17_20260308T204511Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-deterministic-v1-vllm-s27_20260308T204535Z.json
- Runtime reproducibility probes:
- historical sampled lane drift (non-canonical, traced to harness bug):
- deterministic lane stable:
- sampled lane fixed:
Qwen3.5 Sampled Strict Lane (AWS G5, 2026-03-08)
- Sampled strict matrix summary:
  - phase5_qwen35_remote_strict_matrix_20260308T220806Z.json
  - phase5_qwen35_remote_strict_matrix_20260308T220806Z.md
- repeatability confirmation:
- larger-N confirmation:
- latest larger-N confirmation:
- Sampled runtime seed-level runs:
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-runtime-s7_20260308T220812Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-runtime-s17_20260308T220844Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-runtime-s27_20260308T220912Z.json
- Sampled vLLM seed-level runs:
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-vllm-s7_20260308T221035Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-vllm-s17_20260308T221117Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-v1-vllm-s27_20260308T221149Z.json
- larger-N run set:
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-runtime-s7_20260308T222646Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-runtime-s17_20260308T222742Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-runtime-s27_20260308T222836Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-vllm-s7_20260308T223046Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-vllm-s17_20260308T223159Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-matrix-sampled-seedfix-s16-v1-vllm-s27_20260308T223308Z.json
- latest larger-N run set:
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-runtime-s7_20260308T235020Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-runtime-s17_20260308T235107Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-runtime-s27_20260308T235152Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-vllm-s7_20260308T235322Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-vllm-s17_20260308T235423Z.json
  - phase5_awareness_realbench_qwen35-remote-strict-sampled16-vllm-s27_20260308T235518Z.json
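The seed-level run sets above pair each runtime run with a vLLM run on the same seed. A rough sketch of grouping such filenames by backend and seed is shown here. It assumes the `-<backend>-s<seed>_<timestamp>Z.json` naming shape visible in the listed files; the function name `group_by_backend` is hypothetical.

```python
import re
from collections import defaultdict

# Assumed naming shape: ...-<backend>-s<seed>_<YYYYMMDDTHHMMSS>Z.json
RUN = re.compile(r"-(runtime|vllm)-s(\d+)_\d{8}T\d{6}Z\.json$")


def group_by_backend(names):
    """Map backend -> {seed: filename} for every name matching the run pattern."""
    groups = defaultdict(dict)
    for name in names:
        match = RUN.search(name)
        if match is None:
            continue  # Not a seed-level run file; skip it.
        backend, seed = match.group(1), int(match.group(2))
        groups[backend][seed] = name
    return dict(groups)


# The "latest larger-N run set" filenames listed above.
prefix = "phase5_awareness_realbench_qwen35-remote-strict-sampled16"
runs = [
    f"{prefix}-runtime-s7_20260308T235020Z.json",
    f"{prefix}-runtime-s17_20260308T235107Z.json",
    f"{prefix}-runtime-s27_20260308T235152Z.json",
    f"{prefix}-vllm-s7_20260308T235322Z.json",
    f"{prefix}-vllm-s17_20260308T235423Z.json",
    f"{prefix}-vllm-s27_20260308T235518Z.json",
]

groups = group_by_backend(runs)
# Seeds that exist for both backends can be compared pairwise.
paired_seeds = sorted(set(groups["runtime"]) & set(groups["vllm"]))
print(paired_seeds)
```

Intersecting the seed sets first makes a missing or failed run show up as a dropped seed instead of silently skewing a backend average.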
Qwen3.5 Thinking Strict Lane (AWS G5, 2026-03-08)
- Initial thinking strict matrix:
- Budget-fixed thinking strict matrix:
- Finalized thinking strict matrix:
- Lower-cost finalized thinking strict matrix:
- GSM8K-only finalized thinking matrix:
  - phase5_qwen35_remote_strict_matrix_20260309T012953Z.json
  - phase5_qwen35_remote_strict_matrix_20260309T012953Z.md
- larger-N follow-up:
- larger-N second follow-up:
- AIME25 isolated finalized thinking matrix:
  - phase5_qwen35_remote_strict_matrix_20260309T012346Z.json
  - phase5_qwen35_remote_strict_matrix_20260309T012346Z.md
- second-thinking recovery attempt:
- patched prompt follow-up:
- One-example GPQA thinking probes:
- Finalized thinking seed runs:
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-runtime-s7_20260308T235634Z.json
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-runtime-s17_20260309T000622Z.json
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-runtime-s27_20260309T001609Z.json
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-vllm-s7_20260309T002640Z.json
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-vllm-s17_20260309T003141Z.json
  - phase5_awareness_realbench_qwen35-thinking-strict-finalize-ab3-vllm-s27_20260309T003638Z.json
- Lower-cost finalized thinking seed runs:
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-runtime-s7_20260309T010358Z.json
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-runtime-s17_20260309T010552Z.json
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-runtime-s27_20260309T010746Z.json
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-vllm-s7_20260309T011026Z.json
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-vllm-s17_20260309T011234Z.json
  - phase5_awareness_realbench_qwen35-thinking-gpqa256-ab3-vllm-s27_20260309T011437Z.json
- GSM8K-only finalized thinking seed runs:
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-runtime-s7_20260309T012958Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-runtime-s17_20260309T013100Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-runtime-s27_20260309T013201Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-vllm-s7_20260309T013349Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-vllm-s17_20260309T013457Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-ab3-vllm-s27_20260309T013601Z.json
- larger-N run set:
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-runtime-s7_20260309T013913Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-runtime-s17_20260309T014112Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-runtime-s27_20260309T014312Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-vllm-s7_20260309T014557Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-vllm-s17_20260309T014809Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s16-ab3-vllm-s27_20260309T015014Z.json
- larger-N second run set:
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-runtime-s7_20260310T022354Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-runtime-s17_20260310T022749Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-runtime-s27_20260310T023144Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-vllm-s7_20260310T023624Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-vllm-s17_20260310T024036Z.json
  - phase5_awareness_realbench_qwen35-thinking-gsm8k-s32-ab3-vllm-s27_20260310T024447Z.json
- AIME25 isolated run set:
  - phase5_awareness_realbench_qwen35-thinking-aime512-s1-runtime-s7_20260309T012351Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-s1-vllm-s7_20260309T012722Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-r2-s1-runtime-s7_20260309T020700Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-r2-s1-vllm-s7_20260309T021029Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-r3-s1-runtime-s7_20260309T021337Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-r3-s1-vllm-s7_20260309T021718Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-s1-r2-runtime-s7_20260310T021738Z.json
  - phase5_awareness_realbench_qwen35-thinking-aime512-s1-r2-vllm-s7_20260310T022107Z.json
Same-VM Hermes MVP + ORPO Smoke (G5, 2026-03-06)
- Hermes same-VM Qwen3.5 smoke:
- Hermes same-VM ORPO smoke launch:
- Same-VM local worker job-status proof:
Same-VM Wrapper Recovery (AWS G5, 2026-03-07)
- Recovered end-to-end same-VM wrapper:
- Latest wrapper-driven smoke sub-artifacts:
Phase 5 Real-Benchmark Awareness (G5, 2026-03-01)
- Canonical diagnostic run (r5, current reference for Phase 5 debugging cycle):
- Template A/B diagnostic run (r6, non-canonical due to regression):
- HF-reference parity run on the same sampled set:
- Tokenizer/prompt parity debug snapshots:
Phase 5 Qwen3.5 Strict Runtime-vs-vLLM Matrix (G5, 2026-03-03)
- Matrix summaries:
  - phase5_qwen35_runtime_vs_vllm_matrix_20260303T104038Z.json
  - phase5_qwen35_runtime_vs_vllm_matrix_20260303T104038Z.md
  - phase5_qwen35_runtime_vs_vllm_matrix_20260302T221546Z.json
  - phase5_qwen35_runtime_vs_vllm_matrix_20260302T221546Z.md
  - phase5_qwen35_runtime_vs_vllm_matrix_20260302T222013Z.json
  - phase5_qwen35_runtime_vs_vllm_matrix_20260302T222013Z.md
- Seed-level runtime runs (20260303T104038Z, Arm A-only):
- Seed-level vLLM runs (20260303T104038Z, Arm A-only):
- Seed-level runtime runs (20260302T222013Z):
- Seed-level vLLM runs (20260302T222013Z):
- Post-fix smoke rerun (qnorm-check1):
Phase 5 Parse-Fix + Task-Stratified AB3 (G5, 2026-03-04)
- Parse-hardening sanity artifact:
- Paired AB3 summary (gpqa_diamond+ifeval, Arm A, 16/task, seeds 7/17/27, request_logprobs=false):
- Runtime seed artifacts:
- vLLM seed artifacts:
Full-Depth FFN/U16 Follow-Up (G5, 2026-02-26)
- cuBLASLt workspace probe:
- FFN activation-to-u16 fused trial A/B:
- Runtime-only 3-seed A/B:
  - external_cold_layers36_preload64_ab3_ffnactu16_off_s1_20260226T105916Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_off_s2_20260226T105921Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_off_s3_20260226T105926Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_on_s1_20260226T105932Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_on_s2_20260226T105937Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_on_s3_20260226T105942Z.json
- Runtime-vLLM 3-seed A/B:
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_off_s1_20260226T110414Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_off_s2_20260226T110505Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_off_s3_20260226T110555Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_on_s1_20260226T110115Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_on_s2_20260226T110209Z.json
  - external_cold_layers36_preload64_ab3_ffnactu16_vllm_on_s3_20260226T110259Z.json
- Default-on sanity:
- Week 3 parity reports:
Full-Depth FFN/Linear Probe Cycle (G5, 2026-02-27)
- FFN proj fused trial:
- FFN proj fused runtime-only 3-seed A/B:
  - external_cold_layers36_preload64_ab3_ffnprojfused_off_s1_20260227T174033Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_off_s2_20260227T174039Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_off_s3_20260227T174044Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_on_s1_20260227T174049Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_on_s2_20260227T174055Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_on_s3_20260227T174100Z.json
- FFN proj fused runtime-vLLM 3-seed A/B:
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_off_s1_20260227T174328Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_off_s2_20260227T174419Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_off_s3_20260227T174509Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_on_s1_20260227T174600Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_on_s2_20260227T174650Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfused_vllm_on_s3_20260227T174741Z.json
- Linear compute probes (runtime-only 3-seed):
- TRENI_LINEAR_U16_FAST_COMPUTE:
- TRENI_LINEAR_LT_WORKSPACE_MB:
- TRENI_LINEAR_USE_LT:
- Linear fast-compute promotion pack (2026-02-28):
- warm+mixed AB3 summary:
- cold AB3 summary:
- warm+mixed AB5 summary:
- post-promotion same-window sanity A/B:
- strict parity:
- canonical foundation rerun pack (warm/cold/mixed AB3):
- runtime-vLLM same-window full-depth AB3 rerun:
  - aws_speedpass_runtime_vllm_linearfastdefault_ab3_20260228T134630Z/summary_ab3.json
  - aws_speedpass_runtime_vllm_linearfastdefault_ab3_20260228T134630Z/external_cold_layers36_preload64_ab3_linearfastdefault_s1.json
  - aws_speedpass_runtime_vllm_linearfastdefault_ab3_20260228T134630Z/external_cold_layers36_preload64_ab3_linearfastdefault_s2.json
  - aws_speedpass_runtime_vllm_linearfastdefault_ab3_20260228T134630Z/external_cold_layers36_preload64_ab3_linearfastdefault_s3.json
- runtime-vLLM same-window higher-N AB5 rerun:
  - aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z/summary_ab5.json
  - aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z/summary_ab5.md
  - aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z/compare_vs_prev_linearfastdefault_ab3.json
  - aws_speedpass_runtime_vllm_newdefaults_ab5_20260228T145502Z/compare_vs_prev_linearfastdefault_ab3.md
- full-depth post-AB5 gate sweep + delayed-Lt AB3 confirmation:
- tuned delayed-Lt slow-gate rescue AB2:
- FFN proj f32_input fallback-loop diagnosis + patch validation:
- profile pack:
- forced-Lt diagnostics:
- canonical AB2 re-gate:
- batched2-Lt fast-fallback isolation A/B + post-revert parity:
- H2D chunk default-promotion pack (TRENI_TENSOR_H2D_CHUNK_MB):
  - h2d_chunk_cold_ab3_20260228T142114Z/summary_ab3.json
  - h2d_chunk_warm_mixed_ab3_20260228T142258Z/summary_ab3.json
  - week3_parity_report_h2dchunk0_default_20260228T142805Z.json
  - week3_runtime_h2dchunk0_default_20260228T142805Z.log
  - h2d_chunk_default_vs64_sanity_20260228T142845Z/cold_default.json
  - h2d_chunk_default_vs64_sanity_20260228T142845Z/cold_force64.json
  - h2d_chunk_default_vs64_sanity_20260228T142845Z/mixed_default.json
  - h2d_chunk_default_vs64_sanity_20260228T142845Z/mixed_force64.json
- Lt fail-cache follow-up (shape-scoped fallback, no global disable):
- runtime-only 3-seed:
- runtime-vLLM 3-seed:
- FFN proj batched2 lane (TRENI_DECODER_FFN_PROJ_U16_BATCHED2) A/B:
- runtime-only 3-seed:
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_off_s1_20260227T182003Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_off_s2_20260227T182009Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_off_s3_20260227T182014Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_on_s1_20260227T182019Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_on_s2_20260227T182024Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_on_s3_20260227T182030Z.json
- runtime-vLLM 3-seed:
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_off_s1_20260227T182100Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_off_s2_20260227T182150Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_off_s3_20260227T182241Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_on_s1_20260227T182331Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_on_s2_20260227T182422Z.json
  - external_cold_layers36_preload64_ab3_ffnprojbatch2_vllm_on_s3_20260227T182512Z.json
- default-on sanity:
- parity reports:
- Full-depth refreshed profiles:
Full-Depth Late-Cycle Follow-Up (G5, 2026-02-27)
- Direct-out-hidden A/B (runtime-only 3-seed):
- Stage profile after direct-out-hidden default:
- Fixed-token runtime-vLLM compare (post direct-out-hidden):
- FFN act-bias fused A/B (runtime-only 3-seed):
- QKV split+bias fused A/B (runtime-only 3-seed):
- Fixed-token runtime-vLLM compare (post qkv split+bias promotion):
- Week 3 parity reports (late-cycle defaults):
Full-Depth Request-Path Unlock Cycle (G5, 2026-02-27 Night)
- Logits fast-compute hook (runtime-only 3-seed + summary):
  - external_cold_layers36_preload64_ab3_logitsfast_off_s1_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_off_s2_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_off_s3_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_on_s1_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_on_s2_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_on_s3_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_hook_summary_20260227T193439Z.json
  - external_cold_layers36_preload64_ab3_logitsfast_hook_summary_20260227T193439Z.md
  - external_cold_layers36_preload64_logitsfast_off_vllm_s1_20260227T193632Z.json
  - week3_parity_report_logitsfast_hook_20260227T193756Z.json
- FFN proj fast-compute probe (runtime-only 3-seed + 8-seed):
  - external_cold_layers36_preload64_ab3_ffnprojfast_off_s1_20260227T194728Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfast_off_s2_20260227T194728Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfast_off_s3_20260227T194728Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfast_on_s1_20260227T194728Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfast_on_s2_20260227T194728Z.json
  - external_cold_layers36_preload64_ab3_ffnprojfast_on_s3_20260227T194728Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s1_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s2_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s3_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s4_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s5_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s6_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s7_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_off_s8_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s1_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s2_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s3_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s4_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s5_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s6_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s7_20260227T195024Z.json
  - external_cold_layers36_preload64_ab8_ffnprojfast_on_s8_20260227T195024Z.json
  - week3_parity_report_ffnprojfast_20260227T194853Z.json
- U16 tensor-cache unlock (runtime-only and runtime-vLLM claim-safe A/B):
  - external_cold_layers36_preload64_ab3_u16cachefix_default_s1_20260227T195511Z.json
  - external_cold_layers36_preload64_ab3_u16cachefix_default_s2_20260227T195511Z.json
  - external_cold_layers36_preload64_ab3_u16cachefix_default_s3_20260227T195511Z.json
  - external_cold_layers36_preload64_ab3_u16cachefix_default_vllm_s1_20260227T195617Z.json
  - external_cold_layers36_preload64_ab3_u16cachefix_default_vllm_s2_20260227T195617Z.json
  - external_cold_layers36_preload64_ab3_u16cachefix_default_vllm_s3_20260227T195617Z.json
  - external_cold_layers36_stageprofile_u16cachefix_default_20260227T195939Z.json
  - external_cold_layers36_preload64_ab3_u16cache_off_s1_20260227T200112Z.json
  - external_cold_layers36_preload64_ab3_u16cache_off_s2_20260227T200112Z.json
  - external_cold_layers36_preload64_ab3_u16cache_off_s3_20260227T200112Z.json
  - external_cold_layers36_preload64_ab3_u16cache_on_s1_20260227T200112Z.json
  - external_cold_layers36_preload64_ab3_u16cache_on_s2_20260227T200112Z.json
  - external_cold_layers36_preload64_ab3_u16cache_on_s3_20260227T200112Z.json
  - external_cold_layers36_preload64_ab2_u16cache_off_vllm_s1_20260227T200242Z.json
  - external_cold_layers36_preload64_ab2_u16cache_off_vllm_s2_20260227T200242Z.json
  - external_cold_layers36_preload64_ab2_u16cache_on_vllm_s1_20260227T200242Z.json
  - external_cold_layers36_preload64_ab2_u16cache_on_vllm_s2_20260227T200242Z.json
  - external_cold_layers36_preload64_u16cache_claimsafe_summary_20260227T200242Z.json
  - external_cold_layers36_preload64_u16cache_claimsafe_summary_20260227T200242Z.md
  - week3_parity_report_u16cachefix_default_20260227T195625Z.json
  - week3_parity_report_u16cache_toggle_default_20260227T200652Z.json
Full-Depth FFN Follow-Up (G5, 2026-02-27 Late Night)
- Consolidated summary:
- TRENI_LINEAR_BATCHED2_USE_LT A/B:
- runtime-only ab3: external_cold_layers36_preload64_ab3_batched2lt_{off,on}_s{1,2,3}_20260227T222830Z.json
- runtime-vLLM ab2: external_cold_layers36_preload64_ab2_batched2lt_{off,on}_vllm_s{1,2}_20260227T222830Z.json
- TRENI_DECODER_FFN_PROJ_U16_BATCHED2_F32_INPUT=1 + TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=1 A/B:
- runtime-only ab8: external_cold_layers36_preload64_ab8_ffnprojf32fast_{off,on}_s{1..8}_20260227T223241Z.json
- FFN fused path bias-deferral follow-up (TRENI_DECODER_FFN_PROJ_U16_FUSED) A/B:
- runtime-only ab3: external_cold_layers36_preload64_ab3_ffnprojfused2_{off,on}_s{1,2,3}_20260227T223458Z.json
- runtime-vLLM ab2: external_cold_layers36_preload64_ab2_ffnprojfused2_{off,on}_vllm_s{1,2}_20260227T223458Z.json
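The entries in this section use compact brace patterns such as `{off,on}` and `s{1..8}` instead of listing every file. A minimal sketch for expanding such a pattern into concrete filenames is given here. It assumes the braces follow shell-style brace expansion (comma lists and inclusive integer ranges, no nesting); the helper name `expand` is hypothetical.

```python
import itertools
import re

# Matches one brace group: either a comma list {a,b} or an integer range {m..n}.
BRACE = re.compile(r"\{([^{}]+)\}")


def expand(pattern: str) -> list[str]:
    """Expand non-nested {a,b} and {m..n} groups into all concrete names."""
    # re.split with one capturing group alternates literal text and group bodies.
    parts = BRACE.split(pattern)
    literals = parts[0::2]
    choices = []
    for group in parts[1::2]:
        rng = re.fullmatch(r"(\d+)\.\.(\d+)", group)
        if rng:
            low, high = int(rng.group(1)), int(rng.group(2))
            choices.append([str(i) for i in range(low, high + 1)])
        else:
            choices.append(group.split(","))
    names = []
    for combo in itertools.product(*choices):
        pieces = [literals[0]]
        for value, literal in zip(combo, literals[1:]):
            pieces.append(value)
            pieces.append(literal)
        names.append("".join(pieces))
    return names


ab3 = expand(
    "external_cold_layers36_preload64_ab3_batched2lt_{off,on}_s{1,2,3}_20260227T222830Z.json"
)
ab8 = expand(
    "external_cold_layers36_preload64_ab8_ffnprojf32fast_{off,on}_s{1..8}_20260227T223241Z.json"
)
print(len(ab3), len(ab8))
for name in ab3:
    print(name)
```

Expanding the patterns this way makes it straightforward to check that every expected arm and seed file is actually present on disk.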
Fast-Profile + Mixed-Load Follow-Up (G5, 2026-02-28)
- Fast-profile logits fast-compute A/B (--layers 2, runtime-only ab8):
  - external_cold_layers2_logitsfast_ab8_summary_20260228T005529Z.json
  - external_cold_layers2_logitsfast_ab8_summary_20260228T005529Z.md
  - external_cold_layers2_preload64_ab8_logitsfast2_{off,on}_s{1..8}_20260228T005529Z.json
- Mixed-load repeatability (run_mode=mixed_load, http_runs=120, 3 runs):
Parser-Fixed Full-Depth Follow-Up (G5, 2026-02-28)
- Parser fix validation reruns (phase2_runtime_benchmark.py decimal timing parse corrected):
- Full-depth FFN fast-compute warm AB3:
- Batched2 Lt strided warm AB3:
- FFN gate/up dual-bias fused add A/B:
- warm AB3 set directory: ffn_bias_pair_ab3_20260228T020257Z
- warm AB3 summary: ffn_bias_pair_ab3_20260228T020257Z/summary.json
- warm AB3 summary markdown: ffn_bias_pair_ab3_20260228T020257Z/summary.md
- cold follow-up set directory: ffn_bias_pair_cold_ab2_20260228T020723Z
- cold follow-up summary: ffn_bias_pair_cold_ab2_20260228T020723Z/summary.json
- cold follow-up summary markdown: ffn_bias_pair_cold_ab2_20260228T020723Z/summary.md
- Batched2 seq1 split-GEMM A/B:
- warm AB3 set directory: batched2_splitseq1_ab3_20260228T025841Z
- warm AB3 summary: batched2_splitseq1_ab3_20260228T025841Z/summary.json
- warm AB3 summary markdown: batched2_splitseq1_ab3_20260228T025841Z/summary.md
- cold AB3 set directory: batched2_splitseq1_cold_ab3_20260228T025841Z
- cold AB3 summary: batched2_splitseq1_cold_ab3_20260228T025841Z/summary.json
- cold AB3 summary markdown: batched2_splitseq1_cold_ab3_20260228T025841Z/summary.md
- Batched2 dup-input strided A/B:
- warm AB3 set directory: batched2_dupinput_ab3_20260228T031816Z
- warm AB3 summary: batched2_dupinput_ab3_20260228T031816Z/summary.json
- warm AB3 summary markdown: batched2_dupinput_ab3_20260228T031816Z/summary.md
- cold AB3 set directory: batched2_dupinput_cold_ab3_20260228T031816Z
- cold AB3 summary: batched2_dupinput_cold_ab3_20260228T031816Z/summary.json
- cold AB3 summary markdown: batched2_dupinput_cold_ab3_20260228T031816Z/summary.md
- Batched2 dup-input v2 warm gate (duplication-kernel swap):
- gate set directory: batched2_dupinput_v2warm_ab2_20260228T032741Z
- gate summary JSON: batched2_dupinput_v2warm_ab2_20260228T032741Z/summary_gate_ab2.json
- gate summary markdown: batched2_dupinput_v2warm_ab2_20260228T032741Z/summary_gate_ab2.md
- FFN proj u16 fused warm gate:
- gate set directory: ffn_proj_u16_fused_gate_ab2_20260228T033524Z
- gate summary JSON: ffn_proj_u16_fused_gate_ab2_20260228T033524Z/summary_gate_ab2.json
- gate summary markdown: ffn_proj_u16_fused_gate_ab2_20260228T033524Z/summary_gate_ab2.md
- FFN proj batched2 f32-input warm gate:
- gate set directory: ffn_proj_batched2_f32input_gate_ab2_20260228T033758Z
- gate summary JSON: ffn_proj_batched2_f32input_gate_ab2_20260228T033758Z/summary_gate_ab2.json
- gate summary markdown: ffn_proj_batched2_f32input_gate_ab2_20260228T033758Z/summary_gate_ab2.md
- Linear u16 compute16f warm gate:
- gate set directory: linear_u16_compute16f_gate_ab2_20260228T034412Z
- gate summary JSON: linear_u16_compute16f_gate_ab2_20260228T034412Z/summary_gate_ab2.json
- gate summary markdown: linear_u16_compute16f_gate_ab2_20260228T034412Z/summary_gate_ab2.md
- FFN gate/up contiguous pair-pack probe:
- AB3 set directory: ffn_pair_pack_gate_ab2_20260228T040616Z
- AB3 summary JSON: ffn_pair_pack_gate_ab2_20260228T040616Z/summary_ab3.json
- AB3 summary markdown: ffn_pair_pack_gate_ab2_20260228T040616Z/summary_ab3.md
- Batched2 Lt rerun on explicit-u16 lane:
- warm AB3 set directory: batched2_use_lt_u16lane_gate_ab2_20260228T041041Z
- warm AB3 summary JSON: batched2_use_lt_u16lane_gate_ab2_20260228T041041Z/summary_ab3.json
- warm AB3 summary markdown: batched2_use_lt_u16lane_gate_ab2_20260228T041041Z/summary_ab3.md
- cold AB2 set directory: batched2_use_lt_u16lane_cold_ab2_20260228T041359Z
- cold AB2 summary JSON: batched2_use_lt_u16lane_cold_ab2_20260228T041359Z/summary_gate_ab2.json
- cold AB2 summary markdown: batched2_use_lt_u16lane_cold_ab2_20260228T041359Z/summary_gate_ab2.md
- cold AB3 summary JSON: batched2_use_lt_u16lane_cold_ab2_20260228T041359Z/summary_ab3.json
- cold AB3 summary markdown: batched2_use_lt_u16lane_cold_ab2_20260228T041359Z/summary_ab3.md
- FFN batched2 Lt prewarm probes:
- fixed-Lt warm AB2 set directory: batched2_lt_prewarm_warm_ab2_20260228T042453Z
- fixed-Lt warm AB2 summary JSON: batched2_lt_prewarm_warm_ab2_20260228T042453Z/summary_gate_ab2.json
- fixed-Lt warm AB2 summary markdown: batched2_lt_prewarm_warm_ab2_20260228T042453Z/summary_gate_ab2.md
- fixed-Lt cold AB3 set directory: batched2_lt_prewarm_cold_ab3_20260228T042649Z
- fixed-Lt cold AB3 summary JSON: batched2_lt_prewarm_cold_ab3_20260228T042649Z/summary_ab3.json
- fixed-Lt cold AB3 summary markdown: batched2_lt_prewarm_cold_ab3_20260228T042649Z/summary_ab3.md
- direct combo warm AB3 set directory: batched2_lt_prewarm_combo_warm_ab2_20260228T042733Z
- direct combo warm AB3 summary JSON: batched2_lt_prewarm_combo_warm_ab2_20260228T042733Z/summary_ab3.json
- direct combo warm AB3 summary markdown: batched2_lt_prewarm_combo_warm_ab2_20260228T042733Z/summary_ab3.md
- direct combo cold AB3 set directory: batched2_lt_prewarm_combo_cold_ab3_20260228T042733Z
- direct combo aggregate summary JSON: batched2_lt_prewarm_combo_summary_20260228T042733Z.json
- FFN down fast-compute promotion set:
- warm AB3 set directory: ffn_down_fast_compute_gate_ab3_20260228T044546Z
- warm AB3 summary JSON: ffn_down_fast_compute_gate_ab3_20260228T044546Z/summary_ab3.json
- cold AB3 set directory: ffn_down_fast_compute_cold_ab3_20260228T044753Z
- cold AB3 summary JSON: ffn_down_fast_compute_cold_ab3_20260228T044753Z/summary_ab3.json
- strict parity report: week3_parity_report_ffn_down_fast_20260228T044846Z.json
- strict parity runtime log: week3_runtime_ffn_down_fast_20260228T044846Z.log
- Post-promotion FFN retest matrix:
- stacked-GEMM warm AB3: batched2_stackedseq1_warm_ab3_20260228T103422Z/summary_ab3.json
- stacked-GEMM cold AB3: batched2_stackedseq1_cold_ab3_20260228T103422Z/summary_ab3.json
- split-seq1 warm AB3: batched2_splitseq1_retest_warm_ab3_20260228T045805Z/summary_ab3.json
- split-seq1 cold AB3: batched2_splitseq1_retest_cold_ab3_20260228T045805Z/summary_ab3.json
- batched2 Lt warm AB3: batched2_use_lt_retest_warm_ab3_20260228T050132Z/summary_ab3.json
- batched2 Lt cold AB3: batched2_use_lt_retest_cold_ab3_20260228T050132Z/summary_ab3.json
- batched2 Lt+prewarm combo warm AB3: batched2_lt_prewarm_combo_retest_warm_ab3_20260228T050415Z/summary_ab3.json
- batched2 Lt+prewarm combo cold AB3: batched2_lt_prewarm_combo_retest_cold_ab3_20260228T050415Z/summary_ab3.json
- batched2 Lt+prewarm combo warm AB5 confirm: batched2_lt_prewarm_combo_confirm_warm_ab5_20260228T050657Z/summary_ab5.json
- batched2 Lt+prewarm combo cold AB5 confirm: batched2_lt_prewarm_combo_confirm_cold_ab5_20260228T050657Z/summary_ab5.json
- ffn-proj fast-compute warm AB3: ffn_proj_fast_compute_retest_warm_ab3_20260228T051054Z/summary_ab3.json
- ffn-proj fast-compute cold AB3: ffn_proj_fast_compute_retest_cold_ab3_20260228T051054Z/summary_ab3.json
- linear u16 fast-compute warm AB3: linear_u16_fast_compute_retest_warm_ab3_20260228T051338Z/summary_ab3.json
- linear u16 fast-compute cold AB3: linear_u16_fast_compute_retest_cold_ab3_20260228T051338Z/summary_ab3.json
- Batched2 Lt delayed-activation probes:
- 5000ms warm AB3 summary: batched2_lt_enable_after_ms5000_warm_ab3_20260228T104525Z/summary_ab3.json
- 5000ms cold AB3 summary: batched2_lt_enable_after_ms5000_cold_ab3_20260228T104712Z/summary_ab3.json
- 10000ms warm AB3 summary: batched2_lt_enable_after_ms10000_warm_ab3_20260228T105028Z/summary_ab3.json
- 10000ms cold AB3 summary: batched2_lt_enable_after_ms10000_cold_ab3_20260228T105213Z/summary_ab3.json
- strict parity report: week3_parity_report_batched2_lt_delay10000_20260228T105329Z.json
- strict parity runtime log: week3_runtime_batched2_lt_delay10000_20260228T105329Z.log
- default-path strict parity report: week3_parity_report_batched2_lt_defaultdelay_20260228T110825Z.json
- default-path strict parity runtime log: week3_runtime_batched2_lt_defaultdelay_20260228T110825Z.log
- post-revert default-path strict parity report: week3_parity_report_postrevert_defaults_20260228T115543Z.json
- post-revert default-path strict parity runtime log: week3_runtime_postrevert_defaults_20260228T115543Z.log
- same-window mixed-load on/off AB3 summary: mixed_load_defaultdelay_onoff_ab3_20260228T115010Z.json
- same-window mixed-load delayed-on runs:
- same-window mixed-load forced-off runs:
- Foundation rerun pack (parser-default batched2 Lt, G5):
- warm AB3 set: foundation_defaultdelay_warm_ab3_20260228T114315Z
- warm AB3 summary: foundation_defaultdelay_warm_ab3_20260228T114315Z/summary_ab3.json
- cold AB3 set: foundation_defaultdelay_cold_ab3_20260228T114315Z
- cold AB3 summary: foundation_defaultdelay_cold_ab3_20260228T114315Z/summary_ab3.json
- mixed repeatability summary: mixed_load_repeatability_summary_defaultdelay_20260228T114748Z.json
- mixed vs prior compare: mixed_load_repeatability_compare_defaultdelay_vs_prev_20260228T114748Z.json
- pack summary JSON: foundation_defaultdelay_pack_20260228T114315Z.json
- pack summary markdown: foundation_defaultdelay_pack_20260228T114315Z.md
- Runtime-only external-cold sanity for batched2 Lt strided (layers=36, preload64):
  - external_cold_layers36_preload64_batched2ltstrided_off_fix_20260228T011914Z.json
  - external_cold_layers36_preload64_batched2ltstrided_on_fix_20260228T011914Z.json
  - phase2_runtime_mixed_canonical_r1_20260228T005626Z.json
  - phase2_runtime_mixed_canonical_r2_20260228T005626Z.json
  - phase2_runtime_mixed_canonical_r3_20260228T005626Z.json
- Strict parity follow-up (canonical env):
Decode-Stage + Uncertainty A/B (G5, 2026-02-25)
- Non-step0 decode-stage profile:
- Uncertainty capture A/B (same host/profile):
- Runtime-vLLM cold rerun (same profile, uncertainty-off runtime):
Decode StepN Logits-Proj Probe Set (G5, 2026-02-25)
- Split baseline + revert:
- lt16probe A/B:
- fast16 GEMMEx probe:
- direct-u16-input A/B:
- lt_u16workspace A/B:
- Full-depth probes:
  - external_cold_stepn_split_layers36_20260225T083140Z.json (pool-alloc-failure run; warning case)
  - external_cold_stepn_split_layers36_pool16g_20260225T083216Z.json (valid full-depth profiled run)
  - external_cold_runtime_vllm_layers36_pool16g_20260225T083306Z.json (valid full-depth runtime-vLLM compare)
  - external_cold_runtime_vllm_layers36_pool16g_preload_20260225T150209Z.json
  - external_cold_runtime_vllm_layers36_pool16g_preload64_20260225T150410Z.json
  - external_cold_layers36_preload_a2_u16direct_off_20260225T150710Z.json
  - external_cold_layers36_preload_a2_u16direct_on_20260225T150715Z.json
  - external_cold_layers36_hybrid_default_20260225T150806Z.json
  - external_cold_layers36_hybrid_qk_20260225T150811Z.json
  - external_cold_layers36_hybrid_pv_20260225T150816Z.json
  - external_cold_layers36_hybrid_both_20260225T150821Z.json
  - external_cold_layers36_preload64_ab2_base_20260225T1628Z.json
  - external_cold_layers36_preload64_ab2_ffnu16_20260225T1628Z.json
  - external_cold_layers36_preload64_ab3_base_s1_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_base_s2_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_base_s3_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_attnffnu16_s1_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_attnffnu16_s2_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_attnffnu16_s3_20260225T1640Z.json
  - external_cold_layers36_preload64_ab3_attnffnlogitsu16_s1_20260225T1700Z.json
  - external_cold_layers36_preload64_ab3_attnffnlogitsu16_s2_20260225T1700Z.json
  - external_cold_layers36_preload64_ab3_attnffnlogitsu16_s3_20260225T1700Z.json
  - external_cold_layers36_preload64_sanity_revert_20260225T230834Z.json
  - external_cold_layers36_trial_precastreuse_attnffnlogitsu16_20260225T231315Z.json
  - external_cold_layers36_trial_precastreuse_tensorop_attnffnlogitsu16_20260225T231904Z.json
  - external_cold_layers36_trial_precastreuse_u16lt_attnffnlogitsu16_20260225T232209Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_attnffnlogitsu16_s1_20260225T231505Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_attnffnlogitsu16_s2_20260225T231556Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_attnffnlogitsu16_s3_20260225T231648Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_attnffnlogitsu16_s1_20260225T232342Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_attnffnlogitsu16_s2_20260225T232433Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_attnffnlogitsu16_s3_20260225T232523Z.json
  - external_cold_layers36_trial_precastreuse_u16lt_fast16_attnffnlogitsu16_20260225T233009Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_fast16_attnffnlogitsu16_s1_20260225T233232Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_fast16_attnffnlogitsu16_s2_20260225T233358Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_fast16_attnffnlogitsu16_s3_20260225T233449Z.json
  - external_cold_layers36_preload64_ab3_precastreuse_u16lt_fast16_attnffnlogitsu16_s4_20260225T233630Z.json
  - external_cold_layers36_trial_precastreuse_u16lt_postrevert_attnffnlogitsu16_20260225T233849Z.json
  - external_cold_layers36_trial_residfuse_u16lt_attnffnlogitsu16_20260225T235827Z.json
  - external_cold_layers36_preload64_ab3_residfuse_u16lt_attnffnlogitsu16_s1_20260226T000034Z.json
  - external_cold_layers36_preload64_ab3_residfuse_u16lt_attnffnlogitsu16_s2_20260226T000130Z.json
  - external_cold_layers36_preload64_ab3_residfuse_u16lt_attnffnlogitsu16_s3_20260226T000221Z.json
  - external_cold_layers36_sanity_postltwsoff_residfuse_u16lt_20260226T093127Z.json
  - external_cold_layers36_sanity_postbatch2revert_residfuse_u16lt_20260226T093905Z.json
  - external_cold_layers36_sanity_postffnsubprof_residfuse_u16lt_20260226T094109Z.json
  - external_cold_layers36_stepn_profile_ffnsub_20260226T094140Z.log
  - external_cold_layers36_trial_gateupfused_off_20260226T100140Z.json
  - external_cold_layers36_trial_gateupfused_on_20260226T100156Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_off_s1_20260226T100247Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_off_s2_20260226T100253Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_off_s3_20260226T100258Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_on_s1_20260226T100303Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_on_s2_20260226T100309Z.json
  - external_cold_layers36_preload64_ab3_gateupfused_on_s3_20260226T100314Z.json
  - external_cold_layers36_trial_qkvfused_off_20260226T101243Z.json
  - external_cold_layers36_trial_qkvfused_on_20260226T101248Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_off_s1_20260226T101337Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_off_s2_20260226T101342Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_off_s3_20260226T101347Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_on_s1_20260226T101353Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_on_s2_20260226T101358Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_on_s3_20260226T101403Z.json
  - external_cold_layers36_trial_qkvfused_vllm_off_20260226T101547Z.json
  - external_cold_layers36_trial_qkvfused_vllm_on_20260226T101644Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_off_s1_20260226T101822Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_off_s2_20260226T101913Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_off_s3_20260226T102003Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_on_s1_20260226T102054Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_on_s2_20260226T102144Z.json
  - external_cold_layers36_preload64_ab3_qkvfused_vllm_on_s3_20260226T102235Z.json
  - external_cold_layers36_sanity_qkvfuseddefault_residfuse_u16lt_20260226T102520Z.json
  - external_cold_layers36_stepn_profile_qkvfused_default_20260226T102646Z.log
  - external_cold_layers36_runtime_log_qkvfused_default_20260226T102646Z.log
Phase 4 Lambda Full Reruns (A100/H100, 2026-02-20)
- A100:
- H100:
Cold Optimization Rerun (G5, 2026-02-17, Index Cache)
- runtime_truettft_cold_indexcache_20260217T184439Z.json
- runtime_truettft_cold_indexcache_r1_20260217T184653Z.json
- runtime_truettft_cold_indexcache_r2_20260217T184821Z.json
- runtime_truettft_cold_indexcache_r3_20260217T184949Z.json
- cold_indexcache_summary_20260217T185316Z.json
- cold_indexcache_summary_20260217T185316Z.md
Cold Decomposition + Fast Tensor Collect (G5, 2026-02-17, clean4)
- runtime_truettft_cold_decomp_clean4_r1_20260217T210921Z.json
- runtime_truettft_cold_decomp_clean4_r2_20260217T210930Z.json
- runtime_truettft_cold_decomp_clean4_r3_20260217T210938Z.json
- cold_stage_decomposition_summary_clean4_20260217T211039Z.json
- cold_stage_decomposition_summary_clean4_20260217T211039Z.md
- runtime_truettft_warm_clean4_20260217T211130Z.json
Exploratory Upload Experiment (Regressed, Reverted)
- runtime_truettft_cold_decomp_clean5_r1_20260217T211620Z.json
- runtime_truettft_cold_decomp_clean5_r2_20260217T211630Z.json
- runtime_truettft_cold_decomp_clean5_r3_20260217T211639Z.json
- cold_stage_decomposition_summary_clean5_20260217T211725Z.json
- cold_stage_decomposition_summary_clean5_20260217T211725Z.md
- runtime_truettft_cold_decomp_clean7_sanity_20260217T212112Z.json
Revert Validation Set (G5, 2026-02-18, clean7)
- runtime_truettft_cold_decomp_clean7_r1_20260218T001534Z.json
- runtime_truettft_cold_decomp_clean7_r2_20260218T001542Z.json
- runtime_truettft_cold_decomp_clean7_r3_20260218T001551Z.json
- cold_stage_decomposition_summary_clean7_20260218T033225Z.json
- cold_stage_decomposition_summary_clean7_20260218T033225Z.md
- runtime_truettft_warm_clean7_20260218T092916Z.json
Qwen Cold Upload GPU-Convert Ablation (G5, 2026-02-19)
- runtime_truettft_cold_substage_fix1_20260219T163331Z.json (pre-fix reference)
- runtime_truettft_cold_substage_gpuconvert_fix2_20260219T163823Z.json (GPU conversion path on)
- runtime_truettft_cold_substage_gpuconvert_ablation_off_20260219T164012Z.json (TRENI_TENSOR_CONVERT_GPU=0)
- cold_gpuconvert_ablation_qwen_20260219T164135Z.json
- cold_gpuconvert_ablation_qwen_20260219T164135Z.md
H2D Staging Follow-up (G5, 2026-02-24)
- 8-run A/B (staging off vs staging on min64/chunk32):
- 3-run probe (staging off vs staging on min64/chunk128):
- Consolidated summary:
Non-Staging H2D Chunk Matrix (G5, 2026-02-24)
- 8-run matrix (TRENI_TENSOR_H2D_STAGING=0, TRENI_TENSOR_H2D_CHUNK_MB=0/64/128):
- Consolidated summary:
Host Touch Prefault A/B (G5, 2026-02-24)
- 8-run A/B (TRENI_TENSOR_HOST_TOUCH=0 vs TRENI_TENSOR_HOST_TOUCH=1, TRENI_TENSOR_HOST_TOUCH_MIN_MB=256):
- Consolidated summary:
Upload Sync Diagnostic Probe (G5, 2026-02-24)
- 3-run probe (TRENI_TENSOR_UPLOAD_SYNC=0 vs 1):
- Consolidated summary:
Host Register Sync Probe (G5, 2026-02-24)
- 3-run probe (TRENI_TENSOR_HOST_REGISTER=0 vs 1, both with TRENI_TENSOR_UPLOAD_SYNC=1):
- Consolidated summary:
Decoder Logits U16 Path A/B (G5, 2026-02-24)
- Valid run set (TRENI_DECODER_LOGITS_U16_PATH=0 vs 1):
- Consolidated summary:
- Fix2 pilot confirmation:
Tensor Cache Hash A/B (G5, 2026-02-24)
- Mixed-load A/B (TRENI_TENSOR_CACHE_HASH=0 vs 1):
- Warm 3-seed follow-up:
  - off: off_1.json, off_2.json, off_3.json
  - on: on_1.json, on_2.json, on_3.json
Sampler Direct-Store A/B (G5, 2026-02-24)
- 3-seed warm A/B (TRENI_SAMPLE_DIRECT_STORE=0 vs 1):
  - off: off_1.json, off_2.json, off_3.json
  - on: on_1.json, on_2.json, on_3.json
Decoder Direct-Out Residual A/B (G5, 2026-02-24)
- 3-seed warm A/B (TRENI_DECODER_DIRECT_OUT_HIDDEN=0 vs 1):
  - off: off_1.json, off_2.json, off_3.json
  - on: on_1.json, on_2.json, on_3.json
Custom Path Probe Consolidated Summary (G5, 2026-02-24)
Seq1 Multi-Head Attention A/B (G5, 2026-02-24)
- Qwen warm+mixed 3-seed matrix:
- Bart warm 3-seed matrix:
- Cold step0 probe:
- Default-on sanity check:
- Consolidated summary:
AWS G5 Seq1 Fused Follow-Up (2026-02-22)
- README.md
- runtime_default.json
- runtime_qk_cublas.json
- runtime_pv_cublas.json
- runtime_both_cublas.json
- runtime_cold_default.json
- runtime_cold_pv_cublas.json
H100 Fused cuDNN SDPA Probe Pack (2026-02-22)
- README.md
- sdpa_shape_sweep.out
- sdpa_align_out5.txt
- sdpa_align_dbg5.log
- sdpa_one_out.txt
- sdpa_one_dbg5.log
- sdpa_heur_contig_out.txt
- sdpa_heur_contig_dbg5.log
- sdpa_enginecfg_probe_dbg5.log
AWS G5 True Fused Frontend A/B (2026-02-22)
- attn_backend_ab_frontend_20260222T220111Z.json
- attn_backend_ab_frontend_20260222T220111Z.md
- runtime_custom_cold.json
- runtime_custom_warm.json
- runtime_cudnn_sdpa_frontend_cold.json
- runtime_cudnn_sdpa_frontend_warm.json
- attn_backend_ab_frontend_20260222T220111Z.log
- treni_phase2_http_fagi3kfz.log
- treni_phase2_http_il17e7zo.log
- treni_phase2_http_th8wn6nt.log
- treni_phase2_http_utr1u029.log
AWS G5 Frontend Miss-Cost Profile Probe (2026-02-22)
- README.md
- cudnn_frontend_profile_warm.json
- cudnn_frontend_warm_noprofile_wu8.json
- custom_warm_wu8.json
- treni_phase2_http_w375qwum.log
AWS G5 Frontend Repeatability Matrix (2026-02-22, repeats=3)
- attn_backend_frontend_matrix_20260222T221948Z.json
- attn_backend_frontend_matrix_20260222T221948Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T221948Z.log
AWS G5 Frontend Claim-Strength Report (2026-02-22)
- attn_backend_frontend_claim_report_20260222T222958Z.json
- attn_backend_frontend_claim_report_20260222T222958Z.md
AWS G5 Frontend Matrix (No Preload, 2026-02-22)
- attn_backend_frontend_matrix_20260222T224315Z.json
- attn_backend_frontend_matrix_20260222T224315Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T224315Z.log
AWS G5 Frontend Matrix (Startup Preload Prompt Set, 2026-02-22)
- attn_backend_frontend_matrix_20260222T224521Z.json
- attn_backend_frontend_matrix_20260222T224521Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T224521Z.log
AWS G5 Frontend Miss-Mitigation Compare (2026-02-22)
- attn_backend_frontend_missmit_compare_20260222T225215Z.json
- attn_backend_frontend_missmit_compare_20260222T225215Z.md
AWS G5 Frontend Matrix (No Preload, Updated Canonical, 2026-02-22)
- attn_backend_frontend_matrix_20260222T230445Z.json
- attn_backend_frontend_matrix_20260222T230445Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T230445Z.log
AWS G5 Frontend Matrix (Startup Preload Benchmark Queries, 2026-02-22)
- attn_backend_frontend_matrix_20260222T231139Z.json
- attn_backend_frontend_matrix_20260222T231139Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T231139Z.log
AWS G5 Frontend Miss-Mitigation Compare (Updated Canonical, 2026-02-22)
- attn_backend_frontend_missmit_compare_20260222T231335Z.json
- attn_backend_frontend_missmit_compare_20260222T231335Z.md
AWS G5 Exact-Query Preload Probe (2026-02-22)
AWS G5 Frontend Matrix (No Preload + Shape Prebuild, 2026-02-22)
- attn_backend_frontend_matrix_20260222T233003Z.json
- attn_backend_frontend_matrix_20260222T233003Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260222T233003Z.log
AWS G5 Frontend Compare (No Preload -> Shape Prebuild, 2026-02-22)
- attn_backend_frontend_missmit_compare_20260222T233116Z.json
- attn_backend_frontend_missmit_compare_20260222T233116Z.md
AWS G5 Frontend Matrix (No Preload + Shape Prebuild kv10, 2026-02-23)
- attn_backend_frontend_matrix_20260223T000256Z.json
- attn_backend_frontend_matrix_20260223T000256Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260223T000256Z.log
AWS G5 Frontend Compare (Shape Prebuild kv16 -> kv10, 2026-02-23)
- attn_backend_frontend_missmit_compare_20260223T000343Z.json
- attn_backend_frontend_missmit_compare_20260223T000343Z.md
AWS G5 Frontend Matrix (Hybrid Shape Gate, 2026-02-23)
- attn_backend_frontend_matrix_20260223T001959Z.json
- attn_backend_frontend_matrix_20260223T001959Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260223T001959Z.log
AWS G5 Frontend Compare (Tuned No-Gate -> Hybrid Shape Gate, 2026-02-23)
- attn_backend_frontend_missmit_compare_20260223T002153Z.json
- attn_backend_frontend_missmit_compare_20260223T002153Z.md
AWS G5 Frontend Matrix (Hybrid Shape Gate + Max Gate, 2026-02-23)
- attn_backend_frontend_matrix_20260223T003611Z.json
- attn_backend_frontend_matrix_20260223T003611Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260223T003611Z.log
AWS G5 Frontend Compare (Hybrid Min-Gate -> Min+Max Gate, 2026-02-23)
- attn_backend_frontend_missmit_compare_20260223T003734Z.json
- attn_backend_frontend_missmit_compare_20260223T003734Z.md
AWS G5 Frontend Compare (Tuned kv10 -> Hybrid Min+Max Gate, 2026-02-23)
- attn_backend_frontend_missmit_compare_20260223T004006Z.json
- attn_backend_frontend_missmit_compare_20260223T004006Z.md
AWS G5 Frontend Matrix (Coverage Instrumented, 2026-02-23)
- attn_backend_frontend_matrix_20260223T011158Z.json
- attn_backend_frontend_matrix_20260223T011158Z.md
- manifest.tsv
- attn_backend_frontend_matrix_20260223T011158Z.log
AWS G5 Frontend Coverage Profiles (Warm/Cold, 2026-02-23)
- Warm coverage profile set:
- Cold coverage profile set:
- KV-gated warm sweep set:
AWS G5 Shape Prebuild Probes (2026-02-22)
- prebuild_seq1_nopreload_probe_20260222T232644Z.json
- prebuild_seq1_nopreload_probe_20260222T232644Z.log
- prebuild_startup_nopreload_probe_20260222T232932Z.json
- prebuild_startup_nopreload_probe_20260222T232932Z.log
- prebuild_startup10_nopreload_probe_20260222T235944Z.json
- prebuild_startup10_nopreload_probe_20260222T235944Z.log
- prebuild_startup8_nopreload_probe_20260223T000600Z.json
- prebuild_startup8_nopreload_probe_20260223T000600Z.log
- heur_probe_A_20260222T235800Z.json
- heur_probe_B_20260222T235827Z.json
- prebuild_hybrid10_nopreload_probe_20260223T001931Z.json
- prebuild_hybrid10_nopreload_probe_r1_20260223T002214Z.json
- prebuild_hybrid10_nopreload_probe_r2_20260223T002214Z.json
- prebuild_hybrid10_nopreload_probe_r3_20260223T002214Z.json
- hybrid_shape_sanity_20260223T002857Z.json
- hybrid_shape_sanity_maxgate_20260223T003453Z.json
AWS G5 Frontend Miss Trace Probe (2026-02-22)
- README.md
- attn_backend_ab_frontend_20260222T224739Z.json
- attn_backend_ab_frontend_20260222T224739Z.md
True-TTFT Rerun (G5, 2026-02-17)
- runtime_truettft_cold_20260217T091104Z.json
- runtime_truettft_cold_20260217T092916Z.json
- runtime_truettft_cold_20260217T093339Z.json
- runtime_truettft_warm_20260217T092622Z.json
- foundation_true_ttft_summary_20260217T093839Z.json
Routing Comparison — Internal vs External (G5, 2026-02-17)
- routing_comparison_20260217T092003Z.md
- routing_comparison_20260217T092003Z.json
- internal_vs_external_20260217T092003Z.json
Routing Failure-Amplification Stress (G5, 2026-02-18)
- internal_vs_external_20260218T163643Z.json
- routing_comparison_failure_20260218T163643Z.json
- routing_comparison_failure_20260218T163643Z.md
Routing Matrix Expansion (G5, 2026-02-19)
- routing_matrix_20260219T005022Z.json
- routing_matrix_20260219T005022Z.md
- internal_vs_external_p00_baseline_20260219T005022Z.json
- internal_vs_external_p01_fail_mild_20260219T005022Z.json
- internal_vs_external_p02_timeout_mild_20260219T005022Z.json
- internal_vs_external_p03_mixed_moderate_20260219T005022Z.json
- internal_vs_external_p04_mixed_aggressive_20260219T005022Z.json
- internal_vs_external_p05_mixed_aggressive_retry2_20260219T005022Z.json
Routing Cross-Host Pilot (2026-02-19)
- internal_vs_external_crosshost_p00_baseline_20260219T144307Z.json
- internal_vs_external_crosshost_p02_timeout_mild_20260219T145233Z.json
- internal_vs_external_crosshost_p04_stress_20260219T144614Z.json
- routing_matrix_crosshost_20260219T145513Z.json
- routing_matrix_crosshost_20260219T145513Z.md
Routing Split-Host Matrix (2026-02-19, Canonical Track B)
- internal_vs_external_splithost_p00_baseline_20260219T160729Z.json
- internal_vs_external_splithost_p01_fail_mild_20260219T160729Z.json
- internal_vs_external_splithost_p02_timeout_mild_20260219T160729Z.json
- internal_vs_external_splithost_p03_mixed_moderate_20260219T160729Z.json
- internal_vs_external_splithost_p04_mixed_aggressive_20260219T160729Z.json
- internal_vs_external_splithost_p05_mixed_aggressive_retry2_20260219T160729Z.json
- routing_matrix_splithost_20260219T161945Z.json
- routing_matrix_splithost_20260219T161945Z.md
Routing Local-Control Fairness-Hardened Splits (2026-02-20, r8)
- internal_vs_external_control_openai_model_only_fairness_r8_20260220T193120Z.json
- internal_vs_external_control_openai_tool_only_fairness_r8_20260220T193246Z.json
- internal_vs_external_control_openrouter_sonnet46_model_only_fairness_r8_20260220T193432Z.json
- internal_vs_external_control_openrouter_sonnet46_tool_only_fairness_r8_20260220T193633Z.json
- trackb_commercial_parity_appendix_fairness_20260220T193757Z.json
- trackb_commercial_parity_appendix_fairness_20260220T193757Z.md
Commercial Root-Cause Grouped Analysis (2026-02-22)
Routing Internet Multi-Hop Matrix (Fly + Commercial APIs, 2026-02-20)
- OpenAI gpt-5.2 (repeatability runs=3/profile):
- OpenRouter openai/gpt-5.2 (repeatability runs=3/profile):
- OpenRouter anthropic/claude-sonnet-4.6 (repeatability runs=3/profile):
- OpenAI gpt-5.2 (initial exploratory):
- OpenRouter openai/gpt-5.2 (initial exploratory):
- Local smoke:
Routing Local Control Matrix (No Fly Scheduler Path, 2026-02-20)
- OpenAI gpt-5.2:
  - internal_vs_external_control_openai_20260220T115444Z_p00_baseline.json
  - internal_vs_external_control_openai_20260220T115444Z_p02_timeout_mild.json
  - internal_vs_external_control_openai_20260220T115444Z_p04_mixed_aggressive.json
  - routing_matrix_control_openai_20260220T115444Z.json
  - routing_matrix_control_openai_20260220T115444Z.md
- OpenRouter anthropic/claude-sonnet-4.6:
  - internal_vs_external_control_openrouter_sonnet46_20260220T115815Z_p00_baseline.json
  - internal_vs_external_control_openrouter_sonnet46_20260220T115815Z_p02_timeout_mild.json
  - internal_vs_external_control_openrouter_sonnet46_20260220T115815Z_p04_mixed_aggressive.json
  - routing_matrix_control_openrouter_sonnet46_20260220T115815Z.json
  - routing_matrix_control_openrouter_sonnet46_20260220T115815Z.md
Routing Local Control Matrix (No Fly Scheduler Path, Higher-N, 2026-02-20)
- OpenAI gpt-5.2 (runs=8/profile):
  - internal_vs_external_control_openai_r8_20260220T121820Z_p00_baseline.json
  - internal_vs_external_control_openai_r8_20260220T121820Z_p02_timeout_mild.json
  - internal_vs_external_control_openai_r8_20260220T121820Z_p04_mixed_aggressive.json
  - routing_matrix_control_openai_r8_20260220T121820Z.json
  - routing_matrix_control_openai_r8_20260220T121820Z.md
- OpenRouter anthropic/claude-sonnet-4.6 (runs=8/profile):
  - internal_vs_external_control_openrouter_sonnet46_r8_20260220T122446Z_p00_baseline.json
  - internal_vs_external_control_openrouter_sonnet46_r8_20260220T122446Z_p02_timeout_mild.json
  - internal_vs_external_control_openrouter_sonnet46_r8_20260220T122446Z_p04_mixed_aggressive.json
  - routing_matrix_control_openrouter_sonnet46_r8_20260220T122446Z.json
  - routing_matrix_control_openrouter_sonnet46_r8_20260220T122446Z.md
Routing Task-Family Parity Split (Local Control, Higher-N, 2026-02-20)
- OpenAI gpt-5.2:
- OpenRouter anthropic/claude-sonnet-4.6:
Track B Commercial Parity Appendix (2026-02-20)
- trackb_commercial_parity_appendix_20260220T124509Z.json
- trackb_commercial_parity_appendix_20260220T124509Z.md
External Cold Canonical (G5, 2026-02-18)
External Cold Optimized Runtime (G5, 2026-02-18)
External Cold Token-Parity (G5, 2026-02-18, max_tokens=48 wired)
External Cold Token-Parity + Decoder/Sampling Fix (G5, 2026-02-18)
- external_cold_g5_20260218T160120Z.json
- external_cold_g5_20260218T160120Z.md
- external_cold_g5_20260218T160633Z.json (runtime+vLLM confirmation run)
- external_cold_g5_20260218T160633Z.md (runtime+vLLM confirmation run)
- external_cold_g5_20260218T160949Z.json (runtime+vLLM confirmation run)
- external_cold_g5_20260218T160949Z.md (runtime+vLLM confirmation run)
- external_cold_g5_gpuconvert_fix2_20260219T163917Z.json (runtime-only success; vLLM missing in this host env)
External Cold Runtime-Only GPU-Convert Ablation (G5, 2026-02-19)
- external_cold_g5_gpuconvert_ablation_off_20260219T164354Z.json (TRENI_TENSOR_CONVERT_GPU=0)
- external_cold_g5_gpuconvert_ablation_on_20260219T164410Z.json (default GPU conversion on)
- external_cold_gpuconvert_ablation_runtime_20260219T164521Z.json
- external_cold_gpuconvert_ablation_runtime_20260219T164521Z.md
External Cold Runtime-vLLM Repeatability (G5, 2026-02-19, GPU-Convert Fix2)
- external_cold_g5_gpuconvert_fix2_runtime_vllm_20260219T183529Z.json
- external_cold_g5_gpuconvert_fix2_runtime_vllm_r1_20260219T183742Z.json
- external_cold_g5_gpuconvert_fix2_runtime_vllm_r2_20260219T183843Z.json
- external_cold_g5_gpuconvert_fix2_runtime_vllm_r3_20260219T183939Z.json
- external_cold_gpuconvert_fix2_runtime_vllm_repeatability_20260219T184234Z.json
- external_cold_gpuconvert_fix2_runtime_vllm_repeatability_20260219T184234Z.md
External Cold All-Backend Repeatability (G5, 2026-02-19, GPU-Convert Fix2)
- external_cold_20260219T184712Z.json
- external_cold_20260219T184712Z.md
- external_cold_20260219T184951Z.json
- external_cold_20260219T184951Z.md
- external_cold_20260219T185148Z.json
- external_cold_20260219T185148Z.md
- external_cold_gpuconvert_fix2_allbackends_repeatability_20260219T185610Z.json
- external_cold_gpuconvert_fix2_allbackends_repeatability_20260219T185610Z.md
Runtime Cold Stability Sweep (G5, 2026-02-19, GPU-Convert Fix2)
- external_cold_20260219T185634Z.json
- external_cold_20260219T185638Z.json
- external_cold_20260219T185639Z.json
- external_cold_20260219T185641Z.json
- external_cold_20260219T185643Z.json
- runtime_cold_stability_gpuconvert_fix2_20260219T185738Z.json
- runtime_cold_stability_gpuconvert_fix2_20260219T185738Z.md
External Cold All-Backend Repeatability (G5, 2026-02-19, GPU-Convert + Host-Prefetch Fix)
- external_cold_20260219T202312Z.json
- external_cold_20260219T202312Z.md
- external_cold_20260219T202504Z.json
- external_cold_20260219T202504Z.md
- external_cold_20260219T202711Z.json
- external_cold_20260219T202711Z.md
- external_cold_gpuconvert_prefetch_allbackends_repeatability_20260219T203017Z.json
- external_cold_gpuconvert_prefetch_allbackends_repeatability_20260219T203017Z.md
External Cold Repeatability (G5, 2026-02-24, Seq1 Multi-Head Default)
- First rerun probe:
- 3-run repeatability set:
  - external_cold_20260224T191644Z.json
  - external_cold_20260224T191644Z.md
  - external_cold_20260224T191644Z.log
  - external_cold_20260224T191739Z.json
  - external_cold_20260224T191739Z.md
  - external_cold_20260224T191739Z.log
  - external_cold_20260224T191834Z.json
  - external_cold_20260224T191834Z.md
  - external_cold_20260224T191834Z.log
- Consolidated summary:
External Cold Repeatability (G5, 2026-02-24, Step0 Exp-Reuse Patch)
- 3-run repeatability set:
  - external_cold_20260224T193850Z.json
  - external_cold_20260224T193850Z.md
  - external_cold_20260224T193850Z.log
  - external_cold_20260224T193945Z.json
  - external_cold_20260224T193945Z.md
  - external_cold_20260224T193945Z.log
  - external_cold_20260224T194040Z.json
  - external_cold_20260224T194040Z.md
  - external_cold_20260224T194040Z.log
- Consolidated summary:
External Cold Repeatability (G5, 2026-02-24, Step0 Shared-Prob Follow-Up, Reverted)
- 3-run repeatability set:
  - external_cold_20260224T194532Z.json
  - external_cold_20260224T194532Z.md
  - external_cold_20260224T194532Z.log
  - external_cold_20260224T194627Z.json
  - external_cold_20260224T194627Z.md
  - external_cold_20260224T194627Z.log
  - external_cold_20260224T194722Z.json
  - external_cold_20260224T194722Z.md
  - external_cold_20260224T194722Z.log
- Consolidated summary:
Runtime Cold Stability Compare (G5, 2026-02-19, Host-Prefetch Fix)
- external_cold_20260219T202209Z.json
- external_cold_20260219T202210Z.json
- external_cold_20260219T202212Z.json
- external_cold_20260219T202214Z.json
- external_cold_20260219T202215Z.json
- runtime_cold_stability_prefetch_compare_20260219T203017Z.json
- runtime_cold_stability_prefetch_compare_20260219T203017Z.md
Phase 3 Agentic Loops Canonical (G5, 2026-02-19)
- phase3_canonical_g5_summary_20260219T000733Z.json
- phase3_canonical_g5_summary_20260219T000733Z.md
- agentic_loops_20260219T000632Z.json (baseline s7)
- agentic_loops_20260219T000632Z.md (baseline s7)
- agentic_loops_20260219T000638Z.json (baseline s11)
- agentic_loops_20260219T000638Z.md (baseline s11)
- agentic_loops_20260219T000643Z.json (baseline s19)
- agentic_loops_20260219T000643Z.md (baseline s19)
- agentic_loops_20260219T000648Z.json (stress s7)
- agentic_loops_20260219T000648Z.md (stress s7)
- agentic_loops_20260219T000711Z.json (stress s11)
- agentic_loops_20260219T000711Z.md (stress s11)
- agentic_loops_20260219T000733Z.json (stress s19)
- agentic_loops_20260219T000733Z.md (stress s19)
Phase 3 Uncertainty Ablation (G5, 2026-02-19)
- Baseline repeatability summaries (runs=8, seeds 7/11/19):
- Stress repeatability summaries (runs=8, seeds 7/11/19, fail/timeout profile):
- Consolidated baseline-vs-stress report:
- Runtime-native canonical rerun summaries (runs=8, seeds 7/11/19, fixed greedy uncertainty):
- Runtime-native canonical comparison:
- Runtime-native awareness2 rerun (runs=8, seeds 7/11/19):
  - phase3_uncertainty_ablation_20260220T014149Z.json
  - phase3_uncertainty_ablation_20260220T014431Z.json
  - phase3_uncertainty_ablation_20260220T014626Z.json
  - phase3_uncertainty_ablation_20260220T014834Z.json
  - phase3_uncertainty_ablation_20260220T015116Z.json
  - phase3_uncertainty_ablation_20260220T015357Z.json
  - phase3_uncertainty_compare_runtime_native_awareness2_20260220T015651Z.json
  - phase3_uncertainty_compare_runtime_native_awareness2_20260220T015651Z.md
- Runtime-native awareness3 rerun (runs=8, seeds 7/11/19, quality-gated zero fallback/errors):
  - phase3_uncertainty_ablation_20260220T015803Z.json
  - phase3_uncertainty_ablation_20260220T015924Z.json
  - phase3_uncertainty_ablation_20260220T020044Z.json
  - phase3_uncertainty_ablation_20260220T020216Z.json
  - phase3_uncertainty_ablation_20260220T020441Z.json
  - phase3_uncertainty_ablation_20260220T020706Z.json
  - phase3_uncertainty_compare_runtime_native_awareness3_20260220T020947Z.json
  - phase3_uncertainty_compare_runtime_native_awareness3_20260220T020947Z.md
- Runtime-native calibrated rerun calib1 (runs=8, seeds 7/11/19, quality-gated zero fallback/errors):
  - phase3_uncertainty_ablation_runtime_native_calib1_baseline_s7.json
  - phase3_uncertainty_ablation_runtime_native_calib1_baseline_s11.json
  - phase3_uncertainty_ablation_runtime_native_calib1_baseline_s19.json
  - phase3_uncertainty_ablation_runtime_native_calib1_stress_s7.json
  - phase3_uncertainty_ablation_runtime_native_calib1_stress_s11.json
  - phase3_uncertainty_ablation_runtime_native_calib1_stress_s19.json
  - phase3_uncertainty_compare_runtime_native_calib1_20260220T023521Z.json
  - phase3_uncertainty_compare_runtime_native_calib1_20260220T023521Z.md
- Example per-arm artifact (normalized-logprob, baseline s7):
Full-Depth FFN Proj Fast Revalidation (G5, 2026-02-28 Late 8)
- Clean full-depth runtime profile captures (pool=16384, classifier-disabled HTTP lane):
- Profiled AB3 (TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=0/1):
  - set directory: ffnprojfast_fullstep_ab3_20260228T160255Z
  - summary: ffnprojfast_fullstep_ab3_20260228T160255Z/summary_ab3.json
- Non-profiled warm AB3 (TRENI_DECODER_FFN_PROJ_U16_FAST_COMPUTE=0/1):
  - set directory: ffnprojfast_fullwarm_ab3_20260228T160358Z
  - summary: ffnprojfast_fullwarm_ab3_20260228T160358Z/summary_ab3.json
- Temporary-promoted build sanity AB3 (default vs force_off):
- Strict parity reports:
  - candidate env report: week3_parity_report_ffnprojfast_candidate_20260228T160459Z.json
  - candidate env runtime log: week3_runtime_ffnprojfast_candidate_20260228T160459Z.log
  - temporary-promoted build report: week3_parity_report_ffnprojfast_default_20260228T160639Z.json
  - temporary-promoted build runtime log: week3_runtime_ffnprojfast_default_20260228T160639Z.log
Full-Depth FFN Proj Fast Foundation Gate (G5, 2026-02-28 Late 9)
- Foundation rerun pack (candidate-default lane):
- Same-window canonical gate AB2 (default vs force_off):
Runtime-vLLM AB5 Rerun (G5, 2026-02-28 Late 9)
- Same-window AB5 set (candidate lane):
  - set directory: aws_speedpass_runtime_vllm_ffnprojfastdefault_ab5_20260228T194454Z
  - summary: aws_speedpass_runtime_vllm_ffnprojfastdefault_ab5_20260228T194454Z/summary_ab5.json
  - summary markdown: aws_speedpass_runtime_vllm_ffnprojfastdefault_ab5_20260228T194454Z/summary_ab5.md
  - compare vs prior AB5: aws_speedpass_runtime_vllm_ffnprojfastdefault_ab5_20260228T194454Z/compare_vs_prev_newdefaults_ab5.json
Canonical Set (G5 Foundation Repeatability, 2026-02-16)
- baseline_20260215T064542Z.json
- runtime_foundation_warm_r1_20260216T215110Z.json
- runtime_foundation_warm_r2_20260216T215340Z.json
- runtime_foundation_warm_r3_20260216T215609Z.json
- runtime_foundation_cold_r1_20260216T215848Z.json
- runtime_foundation_cold_r2_20260216T220317Z.json
- runtime_foundation_cold_r3_20260216T220744Z.json
- foundation_summary_20260216T221232Z.json
- parity_fix2_20260216T170320Z.json
Previous G5 Set (2026-02-15)
- runtime_g5_20260215T061440Z.json
- parity_20260215T065535Z.json
- comparison_20260215T085608Z.json
- comparison_20260215T085608Z.md
Historical Set (T4, 2026-02-15)
- baseline_20260214T225641Z.json
- runtime_20260215T012146Z.json
- parity_20260215T032638Z.json
- comparison_20260215T040542Z.json
- comparison_20260215T040542Z.md