Models catalog — version history
This document tracks changes to config/models_catalog.yaml across
PUMA releases. Each entry corresponds to a catalog_version value
published in a tagged release.
The catalog uses a list-of-dicts shape (models: [...]) keyed on
ollama_tag. The loader (src/puma/preflight/catalog.py) reads
raw.get("models", []) so the addition of root-level fields such as
catalog_version is backward-compatible.
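The backward-compatibility claim can be illustrated with a minimal sketch. It assumes the YAML has already been parsed into a plain dict; `load_models` and the placeholder entries are hypothetical and do not reproduce the real loader.

```python
from typing import Any


def load_models(raw: dict[str, Any]) -> list[dict[str, Any]]:
    """Return the model entries, ignoring any other root-level keys."""
    return list(raw.get("models", []))


# Parsed YAML as it might look before and after root-level fields were added.
# The entry values are illustrative placeholders, not real catalog data.
old_shape = {"models": [{"ollama_tag": "example:1b"}]}
new_shape = {
    "catalog_version": "2.5.0",
    "models": [{"ollama_tag": "example:1b"}],
}

# Both shapes yield the same entries, so a loader written this way is unaffected.
assert load_models(old_shape) == load_models(new_shape)

# Entries can then be indexed by their ollama_tag.
by_tag = {entry["ollama_tag"]: entry for entry in load_models(new_shape)}
```

Because the loader only asks for the `models` key, a missing key also degrades to an empty list rather than an error.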
catalog_version 2.7.0 — 2026-05-16 (PUMA v2.7.0)
Added
Two Qwen3 family entries, both restricted to gpu-high with
logprobs_supported: false until empirical verification on
appropriate hardware (P10 / P11):
- qwen3:30b: Qwen3 30B dense (Apache 2.0, Alibaba Qwen team). Hybrid Gated DeltaNet + self-attention architecture, native context 262144 tokens. GGUF size 17.3 GB, verified via an Ollama registry manifest probe (registry.ollama.ai/v2/library/qwen3/manifests/30b, sum of layer sizes).
- qwen3:30b-a3b: Qwen3 30B-A3B MoE with 30B total parameters and ~3B active per token. Same registry-verified GGUF size of 17.3 GB. The size is comparable to the dense sibling because the GGUF contains every expert even though only a subset is routed per forward pass; the gemma4 family showed the same property in F8. The notes field carries the F8/D18 caveat so that future contributors do not extend compatibility to smaller profiles without empirical evidence.
Both entries:
- params_b: 30.0 (TOTAL, following the gemma4:26b-a4b precedent where the tag itself encodes both numbers).
- profiles_compatible: [gpu-high] only. PUMA's reference validation hardware (gpu-entry, RTX 2060 Mobile 6 GB) cannot run a 17.3 GB GGUF. gpu-mid (12–24 GB VRAM) is borderline once the operating system and context are accounted for; gpu-high (24+ GB) is the only safe default.
- Excluded from every apple-silicon-* profile pending empirical validation (P11 invariant). Re-enabling requires new empirical evidence and an explicit debt entry referencing the gemma4 D18 precedent.
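The "sum of layer sizes" measurement mentioned above can be sketched as follows. This assumes the registry manifest follows the usual Docker/OCI image-manifest layout, with a `layers` list whose items carry a `size` in bytes, and that gigabytes are decimal (10^9 bytes). The function name and the sample manifest are hypothetical, and the sizes are made up.

```python
def gguf_size_gb(manifest: dict) -> float:
    """Sum the byte sizes of all layers and convert to decimal gigabytes."""
    total_bytes = sum(layer["size"] for layer in manifest.get("layers", []))
    return round(total_bytes / 1e9, 1)


# Hypothetical manifest with made-up sizes, only to exercise the function.
sample_manifest = {
    "schemaVersion": 2,
    "layers": [
        {"size": 2_000_000_000},
        {"size": 500_000_000},
    ],
}

print(gguf_size_gb(sample_manifest))  # 2.5
```

Whether small non-weight layers (templates, parameters) should be counted is a judgment call; they are typically negligible next to the weights.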
Considered but not catalogued: Kimi K2.6
Moonshot AI's Kimi K2.6 was considered for cataloguing in v2.7.0. A probe of the Ollama registry on 2026-05-16 returned HTTP 404 for every plausible tag name:
| Probed tag | Result |
|---|---|
| kimi-k2:6 | 404 |
| kimi-k2:latest | 404 |
| kimi-k2:1t | 404 |
| kimi-k2:1t-instruct | 404 |
| kimi-k2:0905 | 404 |
| kimi-k2:base | 404 |
| kimi-k2:instruct | 404 |
| kimi:latest | 404 |
| kimi-k2.6:latest | 404 |
| moonshot:latest | 404 |
| moonshot:kimi-k2 | 404 |
| kimi-k2-base:latest | 404 |
| kimi-k2-instruct:latest | 404 |
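A probe of this kind can be sketched with the standard library. The URL path mirrors the manifest address quoted for qwen3 earlier in this document; the https scheme, the helper names, and the absence of any special request headers are assumptions, so treat this as an illustration rather than the script that produced the table.

```python
import urllib.error
import urllib.request

REGISTRY = "https://registry.ollama.ai/v2/library"


def manifest_url(tag: str) -> str:
    """Build the manifest URL for a tag written as 'name:variant'."""
    name, _, variant = tag.partition(":")
    return f"{REGISTRY}/{name}/manifests/{variant or 'latest'}"


def probe(tag: str, timeout: float = 10.0) -> int:
    """Return the HTTP status for a tag's manifest (network access required)."""
    request = urllib.request.Request(manifest_url(tag), method="GET")
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code  # for example 404 when the tag is not published


print(manifest_url("qwen3:30b"))
```

Calling `probe("kimi-k2:latest")` would then report the status code for one row of the table.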
The model is not distributed via the Ollama registry as of the
v2.7.0 cut. Cataloguing a non-existent ollama_tag would violate
the project's empirical-first principle (P10) and would produce a
broken puma models pull command for users following the catalog
metadata.
The model is excluded from the v2.7.0 catalog. It may be reconsidered in a future release if Moonshot AI or a third-party distributor publishes K2.6 to the Ollama registry, or if PUMA extends its schema to support non-Ollama distribution channels (out of scope for v2.7.0).
Considered but deferred: additional Qwen3 variants
The registry probe also confirmed that the following Ollama tags exist. They were deferred from v2.7.0 to keep scope minimal:
| Tag | Real GGUF | Reason for deferral |
|---|---|---|
| qwen3:32b (dense) | 18.8 GB | Marginal upgrade over qwen3:30b; defer until empirical validation on gpu-high distinguishes them |
| qwen3:235b-a22b (MoE) | 132.4 GB | Requires multi-GPU rigs well beyond gpu-high (24+ GB VRAM); defer pending hardware tier extension |
| qwen3-coder:30b, qwen3-coder:480b | - | Coder family is task-specific; out of scope for PMO benchmarks |
Notes
- Schema unchanged: the catalog continues to use exactly the 8 fields established in v2.0.0–v2.6.0 (ollama_tag, params_b, gguf_size_gb, context_window, logprobs_supported, profiles_compatible, timeout_s, notes). All additional information (release date, license, MoE caveats, validation status, architecture details) is encoded as multi-line text inside notes. This preserves the project's minimum-complexity principle (P5) and keeps src/puma/preflight/catalog.py and the ModelEntry dataclass byte-identical to v2.6.0.
- Empirical validation roadmap: the two new entries declare validation as pending in their notes text. When gpu-high hardware (24+ GB NVIDIA VRAM) becomes available to the project, the validation protocol is: pull the model via Ollama, run the canonical baselines (triage_jira and estimation_tawos), measure parse_failure_rate and reproducibility, then either bump the logprobs_supported flag, extend profiles_compatible to appropriate Apple Silicon variants, or document a new failure mode.
- Regression guards: 5 new unit tests under tests/unit/test_catalog_metadata.py pin the v2.7.0 contract: exact profiles_compatible == ['gpu-high'], no gpu-entry, no apple-silicon-*, MoE caveat preserved in notes. Loosening any of these requires deliberate test-edit intent.
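The shape of such a regression guard might look like the sketch below. The eight field names come from the list above; the field types, the `timeout_s` value, the placeholder notes text, and the `contract_violations` helper are assumptions made for illustration and are not copied from the real ModelEntry or test file.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class ModelEntry:
    # Field names come from the changelog; the types are assumptions.
    ollama_tag: str
    params_b: float
    gguf_size_gb: float
    context_window: int
    logprobs_supported: bool
    profiles_compatible: list[str] = field(default_factory=list)
    timeout_s: int = 600  # hypothetical default
    notes: str = ""


def contract_violations(entry: ModelEntry) -> list[str]:
    """List the ways an entry loosens the v2.7.0 restrictions."""
    problems = []
    if entry.profiles_compatible != ["gpu-high"]:
        problems.append("profiles_compatible must be exactly ['gpu-high']")
    if entry.logprobs_supported:
        problems.append("logprobs_supported must stay false until verified")
    return problems


entry = ModelEntry(
    ollama_tag="qwen3:30b",
    params_b=30.0,
    gguf_size_gb=17.3,
    context_window=262144,
    logprobs_supported=False,
    profiles_compatible=["gpu-high"],
    notes="placeholder notes text",
)
assert contract_violations(entry) == []
```

Pinning the exact list, rather than checking only for the absence of gpu-entry, means any added profile trips the guard.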
catalog_version 2.6.0 — 2026-05-16 (PUMA v2.6.0)
Added
- 10 Apple Silicon profile identifiers in config/profiles.yaml: apple-silicon-m3, -m3-pro, -m3-max, -m4, -m4-pro, -m4-max, -m5, -m5-pro, -m5-max, -m5-ultra. All declare empirical_validation: pending. PUMA has no Mac hardware in the validation set as of v2.6.0; the dispatch infrastructure ships here so that empirical validation can be performed when hardware becomes available. See CROSS_ARCH_REPRODUCIBILITY.md for the testing protocol.
- Schema extension to profiles.yaml requirements (non-breaking): apple_silicon_required: bool, chip_brand_match: str, min_unified_memory_gb: int. The existing 5 NVIDIA/CPU profiles leave them at their defaults and are unaffected.
- Model profiles_compatible[] extended conservatively per a memory-headroom rule (≈ 2× GGUF + OS overhead). Summary:
  - qwen2.5:1.5b, gemma3:1b → compatible with all 10 apple-silicon-*
  - qwen2.5:3b, gemma3:4b → skip m3 base (8 GB tight)
  - 7B–8B models (qwen2.5:7b, mistral:7b, llama3.1:8b, deepseek-r1:7b) → require Pro/Max/Ultra (≥ 18 GB)
  - 14B models (qwen2.5:14b, deepseek-r1:14b) → require Max/Ultra (≥ 36 GB)
  - gemma3:12b → requires ≥ 24 GB (Pro/Max/Ultra of m3-max, m4-pro+, m5-pro+)
  - gemma3:27b → requires Max/Ultra (≥ 36 GB)
- New unit tests in tests/unit/test_catalog_metadata.py: test_valid_profiles_includes_all_apple_silicon_identifiers, test_apple_silicon_profiles_defined_in_profiles_yaml, test_apple_silicon_chip_brand_match_is_unique, test_gemma4_family_not_compatible_with_any_apple_silicon, test_qwen25_3b_compatible_with_apple_silicon_m4_pro.
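The memory-headroom rule (≈ 2× GGUF + OS overhead) can be written as a small check. The 4 GB overhead default and the function names are assumptions chosen for illustration; the changelog does not state the overhead figure actually used.

```python
def required_memory_gb(gguf_size_gb: float, os_overhead_gb: float = 4.0) -> float:
    """Approximate memory needed: twice the GGUF size plus OS overhead."""
    return 2.0 * gguf_size_gb + os_overhead_gb


def fits(gguf_size_gb: float, unified_memory_gb: float,
         os_overhead_gb: float = 4.0) -> bool:
    """True if the headroom rule is satisfied for the given unified memory."""
    return required_memory_gb(gguf_size_gb, os_overhead_gb) <= unified_memory_gb


# Illustrative numbers only; real GGUF sizes live in models_catalog.yaml.
print(required_memory_gb(1.0))  # 6.0
print(fits(1.0, 8))             # True
print(fits(10.0, 18))           # False
```

The factor of two leaves room for context and runtime buffers on top of the weights, which is why the rule is conservative for small chips.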
Preserved (P2 / P6)
The gemma4 family stays excluded from every apple-silicon-*
profile. Same VRAM-pressure failure mode that motivated the
gpu-entry exclusion (D18, F8) applies to unified memory on smaller
chip variants. Re-enabling any (gemma4, apple-silicon-*) pair
requires new empirical evidence on Mac hardware and an explicit
debt entry referencing the prior exclusion. The regression-guard
test test_gemma4_family_not_compatible_with_any_apple_silicon
enforces this from v2.6.0 onward.
Notes
- Compatibility is inferred from unified-memory specifications, not empirically validated on Apple Silicon hardware in v2.6.0.
- Q4_K_M quantisation is expected to make f1/mae bit-exact across architectures; logprobs (and therefore ECE) may differ. See CROSS_ARCH_REPRODUCIBILITY.md.
- select_profile() in src/puma/preflight/profile.py runs the apple-silicon branch BEFORE the existing GPU/CPU dispatch. On Linux+NVIDIA the new branch is a no-op (caps.chip_brand is None).
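The dispatch order described in the last note can be sketched as below. Only `select_profile` and `caps.chip_brand` are named in this document; the `Caps` fields, the brand strings, the memory thresholds, the simplified VRAM boundaries, and the "cpu" fallback name are hypothetical stand-ins.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Caps:
    # Hypothetical capability record; only chip_brand is named in the changelog.
    chip_brand: Optional[str] = None
    unified_memory_gb: int = 0
    vram_gb: int = 0


# Hypothetical subset: (profile, chip_brand_match, min_unified_memory_gb).
APPLE_PROFILES = [
    ("apple-silicon-m4-pro", "Apple M4 Pro", 24),
    ("apple-silicon-m4", "Apple M4", 16),
]


def select_profile(caps: Caps) -> str:
    # Apple-silicon branch runs first; it is skipped when chip_brand is None.
    if caps.chip_brand is not None:
        for profile, brand, min_mem in APPLE_PROFILES:
            if caps.chip_brand == brand and caps.unified_memory_gb >= min_mem:
                return profile
    # Existing GPU/CPU dispatch (thresholds simplified for illustration).
    if caps.vram_gb >= 24:
        return "gpu-high"
    if caps.vram_gb >= 12:
        return "gpu-mid"
    if caps.vram_gb > 0:
        return "gpu-entry"
    return "cpu"  # placeholder name for the CPU fallback


print(select_profile(Caps(vram_gb=6)))  # gpu-entry
```

With `chip_brand` left at None, the first branch never matches, so existing Linux+NVIDIA hosts resolve exactly as they did before the branch was added.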
catalog_version 2.5.0 — 2026-05-13 (PUMA v2.5.0)
Added
- catalog_version field at the YAML root, starting at "2.5.0".
- catalog_changelog_path field pointing to this document.
- Unit test tests/unit/test_catalog_metadata.py::test_catalog_has_version_field asserting both fields exist and match the expected values.
Documented (no entry change, status clarification only)
The gemma4 family (gemma4:e2b, gemma4:e4b, gemma4:26b-a4b)
was catalogued in earlier releases with the gpu-entry profile
excluded from profiles_compatible[]. This exclusion is empirically
grounded; v2.5.0 preserves it unchanged. The supporting evidence:
- F8 (closed): gemma4:e2b GGUF size was measured at 7.2 GB on disk, not the ~2 GB suggested by the model's effective active parameter count. Root cause: MoE architecture stores every expert in the GGUF artifact regardless of which are routed at inference time, so "effective active params" do not predict the on-disk size. Catalog field gguf_size_gb was corrected from 2.0 to 7.2.
- D18 (closed): all 5 smoke runs with gemma4:e2b on gpu-entry hardware (RTX 2060 Mobile 6 GB VRAM) returned empty raw_response strings across triage_jira, estimation_tawos, and prioritization_jira. The inferred root cause is VRAM pressure (7.2 GB GGUF vs 6 GB VRAM) causing partial CPU offload that the current parsers cannot recover from. Resolution: remove gpu-entry from profiles_compatible[] for the three gemma4 tags.
- Regression guard: tests/unit/test_catalog_metadata.py::test_gemma4_family_excluded_from_gpu_entry asserts "gpu-entry" not in entry.profiles_compatible for all three gemma4 tags. This test is never weakened: re-enabling gpu-entry for any gemma4 entry requires new empirical evidence on the validation hardware and reopening D18 with that evidence.
Users on gpu-mid (12–24 GB VRAM) or gpu-pro (24+ GB VRAM) hardware
can use the gemma4 family normally. Users on gpu-entry should
select qwen2.5:* or gemma3:* models instead.
Reference: Gemma 4 release timeline
The Gemma 4 family was released by Google on 2 April 2026 under the
Apache 2.0 license. PUMA's catalog reflects empirical compatibility
on the validation hardware (RTX 2060 Mobile 6 GB), which is more
restrictive than nominal specifications would suggest for MoE
variants. The catalog is not a marketing document — it is the
single source of truth for hardware-compatible model dispatch, and
its profiles_compatible[] field is binding.
catalog_version 2.4.0 and earlier (no version field)
No catalog_version field was present in v2.0.0 through v2.4.0.
The catalog content evolved organically across Sprints 1–7. The
de-facto baseline at v2.4.0 includes:
- qwen2.5:1.5b, qwen2.5:3b (canonical), qwen2.5:7b, qwen2.5:14b
- gemma3:1b, gemma3:4b, gemma3:12b, gemma3:27b
- gemma4:e2b, gemma4:e4b, gemma4:26b-a4b (all gpu-entry-excluded per D18/F8)
- mistral:7b, llama3.1:8b
- deepseek-r1:7b, deepseek-r1:14b
- phi3:mini (where present in the prevailing config)
For the exact catalog state at any earlier tag, use git:
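One way to do this is `git show <tag>:<path>`, wrapped here in a short Python helper. The tag name v2.4.0 is an assumption about how release tags are named in the repository, and the helper names are hypothetical.

```python
import subprocess


def git_show_command(tag: str, path: str = "config/models_catalog.yaml") -> list[str]:
    """Build a `git show <tag>:<path>` command for the catalog file."""
    return ["git", "show", f"{tag}:{path}"]


def catalog_at_tag(tag: str) -> str:
    """Return the catalog text at a tag; must run inside the repository."""
    result = subprocess.run(
        git_show_command(tag), capture_output=True, text=True, check=True
    )
    return result.stdout


print(" ".join(git_show_command("v2.4.0")))
```

The printed command can also be run directly in a shell from the repository root.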
Conventions for future entries
When adding or modifying a catalog entry, the following invariants hold:
- Bump catalog_version in config/models_catalog.yaml and add a section in this document describing what changed.
- Empirically validate or mark pending. Any new entry without empirical validation on PUMA's validation hardware must include empirical_validation: pending and must NOT include gpu-entry in profiles_compatible[]. This invariant generalises the F8/D18 lesson: nominal specifications do not predict runtime compatibility on constrained hardware.
- Never re-enable a previously-excluded (model, profile) pair without new empirical evidence and an explicit debt-tracker entry referencing the prior exclusion.
- Test the change. Catalog tests live in tests/unit/test_catalog_metadata.py; add an assertion when an entry encodes a non-obvious invariant.
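The "validate or mark pending" invariant lends itself to a mechanical check. The sketch below assumes an entry is available as a plain dict and that the pending marker is stored under an empirical_validation key; how the marker is actually stored, and the helper name, are assumptions.

```python
def violates_pending_rule(entry: dict) -> bool:
    """True if an entry marked pending still lists gpu-entry as compatible."""
    pending = entry.get("empirical_validation") == "pending"
    return pending and "gpu-entry" in entry.get("profiles_compatible", [])


# Hypothetical entries, only to exercise the check.
ok_entry = {
    "ollama_tag": "example:30b",
    "empirical_validation": "pending",
    "profiles_compatible": ["gpu-high"],
}
bad_entry = {
    "ollama_tag": "example:2b",
    "empirical_validation": "pending",
    "profiles_compatible": ["gpu-entry", "gpu-mid"],
}

print(violates_pending_rule(ok_entry))   # False
print(violates_pending_rule(bad_entry))  # True
```

A check like this could run over every catalog entry in a unit test, so that a new unvalidated model cannot silently claim gpu-entry support.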