Track B Claim-Safe Table
Paper-ready, scoped Track B commercial claims with explicit model-only vs tool-only splits.
Purpose
This page is the claim-safe summary for Track B commercial comparisons.
Interpretation rule: `external/internal > 1` means internal routing is faster, because the external path took longer than the internal one.
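The interpretation rule fits in a tiny helper. This is a minimal sketch that assumes the ratio is external-path time divided by internal-path time, which is what the rule implies. The helper names `ext_int_ratio` and `faster_path` are hypothetical and are not part of the Track B harness.

```python
# Hypothetical helpers illustrating the interpretation rule.
# Assumption: the ratio is external-path time / internal-path time.


def ext_int_ratio(external_ms: float, internal_ms: float) -> float:
    """Return external/internal; values above 1 mean the internal path was faster."""
    if internal_ms <= 0:
        raise ValueError("internal_ms must be positive")
    return external_ms / internal_ms


def faster_path(ratio: float) -> str:
    """Apply the interpretation rule to a single Ext/Int ratio."""
    if ratio > 1:
        return "internal"
    if ratio < 1:
        return "external"
    return "neither"


if __name__ == "__main__":
    # Ratios copied from the mixed task set table.
    rows = [("OpenAI gpt-5.2", 0.987), ("OpenRouter anthropic/claude-sonnet-4.6", 1.066)]
    for name, ratio in rows:
        print(f"{name}: {ratio:.3f}x, faster path: {faster_path(ratio)}")
```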
Canonical Inputs
- `trackb_commercial_parity_appendix_20260220T124509Z.json`
- `trackb_commercial_parity_appendix_20260220T124509Z.md`
- `trackb_commercial_parity_appendix_fairness_20260220T193757Z.json`
- `trackb_commercial_parity_appendix_fairness_20260220T193757Z.md`
Mixed Task Set (Local Control, runs=8/profile)
| Provider/Model | Ext/Int Ratio | Internal Error | External Error |
|---|---|---|---|
| OpenAI gpt-5.2 | 0.987x | 0.0000 | 0.0313 |
| OpenRouter anthropic/claude-sonnet-4.6 | 1.066x | 0.0000 | 0.0313 |
Task-Family Split (Local Control, runs=8)
| Provider/Model | Model-Only Ext/Int | Tool-Only Ext/Int |
|---|---|---|
| OpenAI gpt-5.2 | 0.958x | 1.136x |
| OpenRouter anthropic/claude-sonnet-4.6 | 1.044x | 1.051x |
Task-Family Split (Fairness-Hardened, Local Control, runs=8)
Harness controls:
- `execution_mode=interleaved`
- `pair_order=alternate`
- deterministic generation defaults (`temperature=0`)
- strict tool parity enabled on `tool_only`
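One plausible reading of `execution_mode=interleaved` with `pair_order=alternate` is that each run issues one internal and one external request back to back, and the order inside the pair flips from run to run so neither path consistently goes first. The sketch below builds that schedule under this assumption; `build_schedule` is a hypothetical name, not the harness's API.

```python
from typing import List, Tuple


def build_schedule(runs: int) -> List[Tuple[int, str]]:
    """Return (run_index, path) pairs in execution order.

    Interleaved (assumed): each run executes both paths back to back.
    Alternate (assumed): even runs go internal-first, odd runs external-first.
    """
    schedule: List[Tuple[int, str]] = []
    for run in range(runs):
        pair = ("internal", "external") if run % 2 == 0 else ("external", "internal")
        schedule.extend((run, path) for path in pair)
    return schedule


if __name__ == "__main__":
    for run, path in build_schedule(4):
        print(run, path)
```

With an even run count such as `runs=8`, each path goes first in exactly half of the pairs, which is the balancing property the alternation is meant to provide under this reading.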
| Provider/Model | Model-Only Ext/Int | Tool-Only Ext/Int | Model-Only Int ms/token | Model-Only Ext ms/token | Tool-Only Int ms/token | Tool-Only Ext ms/token |
|---|---|---|---|---|---|---|
| OpenAI gpt-5.2 | 0.971x | 1.038x | 57.657 | 57.663 | 37.553 | 38.990 |
| OpenRouter anthropic/claude-sonnet-4.6 | 1.102x | 1.063x | 61.606 | 70.054 | 41.212 | 43.791 |
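The ms/token columns can be turned into token-normalized Ext/Int ratios directly. The sketch below derives them from the fairness-hardened table; these are derived values for illustration, not reported results, and `token_normalized_ratios` is a hypothetical helper.

```python
# Token-normalized Ext/Int ratios derived from the ms/token columns of the
# fairness-hardened table (derived values, not reported ones).

MS_PER_TOKEN = {
    # provider: {family: (internal_ms_per_token, external_ms_per_token)}
    "OpenAI gpt-5.2": {
        "model_only": (57.657, 57.663),
        "tool_only": (37.553, 38.990),
    },
    "OpenRouter anthropic/claude-sonnet-4.6": {
        "model_only": (61.606, 70.054),
        "tool_only": (41.212, 43.791),
    },
}


def token_normalized_ratios(table):
    """Return {(provider, family): external_ms_per_token / internal_ms_per_token}."""
    return {
        (provider, family): ext / internal
        for provider, families in table.items()
        for family, (internal, ext) in families.items()
    }


if __name__ == "__main__":
    for (provider, family), ratio in token_normalized_ratios(MS_PER_TOKEN).items():
        print(f"{provider:40s} {family:10s} {ratio:.3f}x")
```

On these numbers the tool-only values (about 1.038 and 1.063) line up with the reported Tool-Only Ext/Int column. The model-only values (about 1.000 and 1.137) differ from the reported 0.971x and 1.102x, so the headline ratio and the token-normalized ratio should be reported side by side rather than used interchangeably.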
Claim-Safe Reading
- After fairness hardening, `tool_only` favors internal routing on both providers (OpenAI 1.038x, Sonnet 1.063x).
- `model_only` remains provider-sensitive: OpenAI is still near parity with a slight inversion (0.971x, external marginally faster), while Sonnet favors internal (1.102x).
- Commercial Track B claims must remain stratified by task family and paired with token-normalized metrics.
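The stratification rule can be made mechanical: label each provider's ratio per task family, and emit a family-wide claim only when every provider agrees. This is a minimal sketch using the reported fairness-hardened ratios. The parity band of ±0.03 is a hypothetical threshold chosen for illustration, not a documented Track B setting, and the helper names are likewise hypothetical.

```python
from typing import Dict

# Reported fairness-hardened Ext/Int ratios, keyed by task family then provider.
FAIRNESS_RATIOS: Dict[str, Dict[str, float]] = {
    "model_only": {"OpenAI gpt-5.2": 0.971, "OpenRouter anthropic/claude-sonnet-4.6": 1.102},
    "tool_only": {"OpenAI gpt-5.2": 1.038, "OpenRouter anthropic/claude-sonnet-4.6": 1.063},
}

# Hypothetical parity band: ratios within +/- PARITY_BAND of 1.0 count as near parity.
PARITY_BAND = 0.03


def label(ratio: float, band: float = PARITY_BAND) -> str:
    """Classify one Ext/Int ratio using the interpretation rule plus a parity band."""
    if abs(ratio - 1.0) <= band:
        return "near parity"
    return "favors internal" if ratio > 1.0 else "favors external"


def stratified_claims(ratios: Dict[str, Dict[str, float]], band: float = PARITY_BAND) -> Dict[str, str]:
    """Emit one claim per task family; never pool across families."""
    claims: Dict[str, str] = {}
    for family, by_provider in ratios.items():
        labels = {provider: label(r, band) for provider, r in by_provider.items()}
        distinct = set(labels.values())
        if len(distinct) == 1:
            claims[family] = f"{distinct.pop()} on all providers"
        else:
            detail = "; ".join(f"{p}: {lab}" for p, lab in labels.items())
            claims[family] = f"provider-sensitive ({detail})"
    return claims


if __name__ == "__main__":
    for family, claim in stratified_claims(FAIRNESS_RATIOS).items():
        print(f"{family}: {claim}")
```

With this band the sketch reproduces the reading above: `tool_only` favors internal on all providers, while `model_only` is provider-sensitive. A different band would move the OpenAI model-only label, which is one reason to publish the raw ratios alongside any label.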
Direct Artifact Links
- OpenAI model-only (`runs=8`): `internal_vs_external_control_openai_model_only_r8_20260220T123747Z.json`
- OpenAI tool-only (`runs=8`): `internal_vs_external_control_openai_tool_only_r8_20260220T124123Z.json`
- OpenRouter Sonnet model-only (`runs=8`): `internal_vs_external_control_openrouter_sonnet46_model_only_r8_20260220T123923Z.json`
- OpenRouter Sonnet tool-only (`runs=8`): `internal_vs_external_control_openrouter_sonnet46_tool_only_r8_20260220T124216Z.json`
- OpenAI model-only fairness (`runs=8`): `internal_vs_external_control_openai_model_only_fairness_r8_20260220T193120Z.json`
- OpenAI tool-only fairness (`runs=8`, strict parity): `internal_vs_external_control_openai_tool_only_fairness_r8_20260220T193246Z.json`
- OpenRouter Sonnet model-only fairness (`runs=8`): `internal_vs_external_control_openrouter_sonnet46_model_only_fairness_r8_20260220T193432Z.json`
- OpenRouter Sonnet tool-only fairness (`runs=8`, strict parity): `internal_vs_external_control_openrouter_sonnet46_tool_only_fairness_r8_20260220T193633Z.json`
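The artifact names above follow a consistent pattern (provider, task family, optional `fairness` tag, run count, UTC timestamp), so they can be grouped programmatically. This is a hypothetical parser inferred from the eight names listed; the pattern, `Artifact`, and `parse_artifact` are assumptions, not a documented naming contract.

```python
import re
from dataclasses import dataclass
from datetime import datetime, timezone
from typing import Optional

# Pattern inferred from the listed names (assumption):
# internal_vs_external_control_<provider>_<family>[_fairness]_r<runs>_<YYYYMMDDTHHMMSSZ>.json
ARTIFACT_RE = re.compile(
    r"^internal_vs_external_control_"
    r"(?P<provider>.+?)_"
    r"(?P<family>model_only|tool_only)"
    r"(?P<fairness>_fairness)?"
    r"_r(?P<runs>\d+)_"
    r"(?P<ts>\d{8}T\d{6}Z)\.json$"
)


@dataclass(frozen=True)
class Artifact:
    provider: str
    family: str
    fairness: bool
    runs: int
    timestamp: datetime


def parse_artifact(name: str) -> Optional[Artifact]:
    """Return the parsed fields, or None if the name does not follow the pattern."""
    m = ARTIFACT_RE.match(name)
    if m is None:
        return None
    ts = datetime.strptime(m["ts"], "%Y%m%dT%H%M%SZ").replace(tzinfo=timezone.utc)
    return Artifact(
        provider=m["provider"],
        family=m["family"],
        fairness=m["fairness"] is not None,
        runs=int(m["runs"]),
        timestamp=ts,
    )


if __name__ == "__main__":
    print(parse_artifact(
        "internal_vs_external_control_openrouter_sonnet46_tool_only_fairness_r8_20260220T193633Z.json"
    ))
```

Sorting parsed artifacts by `timestamp` separates the baseline runs (around 12:37 to 12:42 UTC) from the fairness-hardened runs (around 19:31 to 19:36 UTC).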