Security¶
This page is PUMA's comprehensive security posture reference: the threat model, the determinism and integrity guarantees, the no-outbound defaults, the CI security tooling, the GitHub Actions permissions baseline, and the known security debt that is queued for follow-up work.
For the GitHub-canonical security policy (private disclosure path,
supported versions, disclosure timeline) see
SECURITY.md
at the repository root.
1. Overview¶
PUMA is a local-first benchmarking framework. Its security posture follows from three structural choices:
- Inference is local. Every model call goes through a local Ollama daemon on the same host. There is no SaaS inference path and no API token to leak.
- Predictions are reproducible. Fixed seed=42, temperature=0.0, and a pinned model digest produce byte-identical predictions across runs; the predictions_summary_hash is the audit signature.
- Submissions are integrity-checked. Every PUMA Community submission carries a SHA-256 over a canonical predictions tuple, the schema is immutable, and verification is available client-side and server-side.
The attack surface that remains — dependency vulnerabilities, leaked secrets, container image CVEs, GitHub Actions misconfiguration — is addressed by the CI security tooling and the audit cadence described below.
2. Threat Model¶
2.1 In scope¶
| Threat | Mitigation |
|---|---|
| Tampered or back-doored Python dependency | pip-audit on every push and weekly; pip install over PyPI HTTPS; pinned versions in requirements.txt. |
| SAST-detectable issue in PUMA source code | bandit -r src/puma/ on every push; MEDIUM+ reported, HIGH fails the build. |
| Secrets leaked in git history | gitleaks full-history scan on every push; .githooks/commit-msg strips AI-assistant trailers and other unwanted footers on the local side; the Phase Z-2 history rewrite (May 2026) removed prior co-author trailers. |
| CVE in the published container image | Trivy scan on every published tag (v*); upload as SARIF to the GitHub Security tab; build fails on HIGH/CRITICAL OS or library CVE. |
| Tampered PUMA Community submission | SHA-256 over the canonical predictions tuple; immutable schema/submission.v1.json (P3 constraint); JSON Schema validation; server-side recompute via the public verifier service. |
| GitHub Actions over-privilege | Least-privilege permissions: block declared per workflow / per job; no workflow uses a long-lived registry credential — GHCR publishes via the auto-provided GITHUB_TOKEN with packages: write; PyPI publishes via OIDC id-token: write (trusted publishing). |
| Drift of the determinism guarantees | The predictions_summary_hash is part of every persisted run and is verified across re-runs; any drift surfaces as a validate-baseline failure. |
2.2 Out of scope¶
- Host OS hardening: the contributor's local OS and Docker daemon configuration are the contributor's responsibility.
- Network-level attacks: PUMA does not run network services that expose attack surface beyond localhost (the dashboard binds 0.0.0.0:8501 by default, which listens on all network interfaces, but is intended for use from the local host).
- Physical access: full-disk encryption and similar host-level defenses are out of scope.
- Model-output adversarial attacks: prompt-injection, jailbreaking, and similar attacks against the model itself are research-grade concerns and are tracked separately (post-v4.x research agenda).
- Vulnerabilities in third-party services: Ollama, the SQLAlchemy driver, Streamlit, etc. are reported upstream and tracked here via pip-audit advisories.
3. Determinism guarantees¶
PUMA's reproducibility is a security property: it is what makes a benchmark result auditable.
| Guarantee | Where it is enforced |
|---|---|
| Fixed RNG seed 42 | src/puma/runtime/ (every spec uses inference.seed: 42 unless explicitly overridden). |
| Fixed sampling temperature 0.0 | Spec default; overrides surface in the run summary. |
| Pinned model digest | The Ollama manifest digest (e.g. qwen2.5:3b → 357c53fb659c) is captured in the runs table when the run starts. Re-running the same spec against a different digest is detectable as a hash mismatch. |
| Bi-temporal SQLite audit trail | runs, instances, predictions, metrics, emissions, profile_snapshots tables. Every row carries created_at; no row is mutated post-write. |
| predictions_summary_hash | Computed deterministically over the (instance_id, prediction) pairs in canonical order; recorded on the run and shipped in every PUMA Community submission. Byte-identical across re-runs of the same spec on the same hardware-and-runtime profile. |
The puma validate-baseline command compares the current canonical
spec's metrics against a reference value and exits 1 on drift —
the gate that turns determinism into a release-blocking check.
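The drift gate described above can be sketched in a few lines. This is a minimal illustration, not the implementation behind puma validate-baseline: the function name `baseline_exit_code` and the `tolerance` parameter are assumptions introduced here, and the metric values in the usage note are made up.

```python
import math
import sys


def baseline_exit_code(current: float, reference: float, tolerance: float = 0.0) -> int:
    """Return 0 when the metric matches the reference, 1 on drift.

    `tolerance` is a hypothetical knob; the default of 0.0 demands
    exact equality, which is what a deterministic pipeline should give.
    """
    matches = math.isclose(current, reference, rel_tol=0.0, abs_tol=tolerance)
    return 0 if matches else 1


if __name__ == "__main__":
    # Illustrative values only; a real gate would read them from the run.
    sys.exit(baseline_exit_code(current=0.8125, reference=0.8125))
```

Returning the exit code from a pure function keeps the comparison testable without spawning a process; the caller decides when to hand it to `sys.exit`.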
4. No outbound telemetry by default¶
The default configuration of PUMA makes no outbound network calls during a benchmark run.
| Surface | Default | How to opt in |
|---|---|---|
| Inference | Local Ollama daemon (http://localhost:11434); no outbound calls. | The endpoint can be reconfigured via OLLAMA_HOST for advanced setups; PUMA itself never reaches a remote inference provider. |
| Carbon tracking (CodeCarbon) | Offline measurement: energy and CO₂ are computed from local CPU/GPU counters; no network call required. | sustainability.codecarbon: true in the run-spec enables the measurement; even when enabled, CodeCarbon's default emission factors are used locally. |
| Publishing (PUMA Community) | Off. puma share-results --dry-run packages the artifact locally without any network call. | puma share-results (no --dry-run) opens a PR against pumacp/puma-community over api.github.com after puma auth login has stored a PAT locally at ~/.puma/credentials.toml (0600). |
| Server-side verification | Off. Local puma community verify-hash recomputes the hash on-disk. | puma community verify-hash --remote calls the public verifier service. |
| Web fetches inside the CLI | Off. There are no web_search / web_fetch paths in the production CLI. | N/A. |
For fully air-gapped operation, set sustainability.codecarbon: false,
do not invoke puma share-results or puma community verify-hash --remote,
and run only on a host where the Ollama daemon is local.
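One way to check the last condition before a run is to confirm that the configured endpoint resolves to the loopback interface. The sketch below is an assumption-laden illustration: `is_local_endpoint` is a hypothetical helper, not part of the PUMA CLI, and it only inspects the string form of the endpoint (it performs no DNS lookup).

```python
import ipaddress
from urllib.parse import urlsplit


def is_local_endpoint(endpoint: str) -> bool:
    """Return True when the endpoint's host is localhost or a loopback IP."""
    # Accept both "host:port" and full URLs by adding a scheme when missing.
    url = endpoint if "://" in endpoint else f"http://{endpoint}"
    host = urlsplit(url).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        # Any other hostname is treated as remote in this sketch.
        return False
```

For example, `is_local_endpoint("http://localhost:11434")` is true, while an address such as `10.0.0.5:11434` or the wildcard `0.0.0.0` is reported as non-local.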
5. Submission integrity (PUMA Community)¶
Every artifact shared with the public benchmark hub is integrity-checked end-to-end.
- Hash format: SHA-256 hex digest over the canonical tuple (instance_id, prediction) for every row, sorted by instance_id, serialized as compact JSON with stable key order.
- Where it is stored: in the submission JSON itself (predictions_summary_hash field) and as a row in the metrics table of the originating run.
- Schema: schema/submission.v1.json is immutable (P3 constraint). Any drift is detected by JSON Schema validation in the puma community validate path and rejected before submission.
- Local verification: puma community verify-hash <submission.json> --predictions <jsonl> recomputes the hash on-disk and compares it to the declared value. Exit 0 means verified, exit 1 means mismatch, exit 2 means the input could not be read.
- Server-side verification: puma community verify-hash --remote calls the public verifier service; the service hashes a different input shape (D23, tracked in docs/known_debt.md) so the --remote flag returns mismatch by construction for schema v1.0.0 submissions even when the local hash is correct. Reconciliation is deferred to v4.x with a schema decision; the canonical verification path is local.
- Auto-merge: the pumacp/puma-community repository's auto-merge workflow only accepts PRs that touch path-restricted submission files, refuses any diff outside that path filter, and requires the client-side hash to match the schema-declared one.
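The hash format and the local verification exit codes can be illustrated with a short sketch. The exact serialization used by src/puma/community/integrity.py is not shown on this page, so the JSON shape below (a list of two-key objects) is an assumption, and digests produced by this sketch should not be expected to match real submissions. Both function names are hypothetical.

```python
import hashlib
import json
from typing import Iterable, Mapping


def summary_hash(rows: Iterable[Mapping[str, str]]) -> str:
    """SHA-256 hex digest over (instance_id, prediction) pairs.

    Assumed canonical form: pairs sorted by instance_id, dumped as
    compact JSON with sorted keys. The real byte layout may differ.
    """
    pairs = sorted(
        ({"instance_id": r["instance_id"], "prediction": r["prediction"]} for r in rows),
        key=lambda pair: pair["instance_id"],
    )
    payload = json.dumps(pairs, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


def verify_exit_code(submission_path: str, predictions_path: str) -> int:
    """Mirror the documented exit codes: 0 verified, 1 mismatch, 2 unreadable."""
    try:
        with open(submission_path, encoding="utf-8") as fh:
            declared = json.load(fh)["predictions_summary_hash"]
        with open(predictions_path, encoding="utf-8") as fh:
            rows = [json.loads(line) for line in fh if line.strip()]
        actual = summary_hash(rows)
    except (OSError, ValueError, KeyError, TypeError):
        # Missing file, malformed JSON, or missing fields.
        return 2
    return 0 if actual == declared else 1
```

Sorting before serialization is what makes the digest independent of the order in which predictions were written, so two runs that emit the same pairs in different order still produce the same hash.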
6. Git history sanitization¶
The repository's history was sanitized in Phase Z-2 (May 2026) to remove AI-assistant co-author trailers that had accumulated on machine-drafted commits. The rewrite:
- Used git filter-repo to remove Co-Authored-By: trailer lines on every commit and every annotated tag, repo-wide.
- Was verified to have no external dependents on the rewritten SHAs before force-pushing (no open PRs, no published packages that pinned a pre-rewrite SHA).
- Was followed by a git push --force-with-lease on develop and main and a re-tag pass to keep vX.Y.Z pointing at the rewritten commits.
The repo-root .githooks/commit-msg hook now strips three classes of
trailer from every future commit locally:
Co-authored-by: # any AI-assistant Co-authored-by line
Signed-off-by: …<AI tool> # any Signed-off-by line that names an AI tool
Generated-by: # any Generated-by footer
Activation is one-time per fresh clone: git config core.hooksPath
.githooks. The new-contributor procedure for setting this is
documented in development-workflow.md
§§6 and 10.6.
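The filtering that such a hook performs can be sketched as follows. The contents of .githooks/commit-msg are not reproduced on this page, so this is an illustration in Python under stated assumptions: `strip_trailers` is a hypothetical function, the list of tool names is supplied by the caller, and Co-authored-by and Signed-off-by lines are removed only when they name one of those tools.

```python
import re
from typing import Sequence


def strip_trailers(message: str, ai_tool_names: Sequence[str]) -> str:
    """Drop Generated-by footers and AI-tool Co-authored-by / Signed-off-by lines."""
    generated = re.compile(r"^generated-by:", re.IGNORECASE)
    named = None
    if ai_tool_names:
        tools = "|".join(re.escape(name) for name in ai_tool_names)
        named = re.compile(
            rf"^(co-authored-by|signed-off-by):.*({tools})", re.IGNORECASE
        )
    kept = [
        line
        for line in message.splitlines()
        if not generated.match(line) and not (named and named.match(line))
    ]
    # Normalize to a single trailing newline, as git expects.
    return "\n".join(kept).rstrip() + "\n"
```

A human `Signed-off-by:` line survives because it does not match any supplied tool name; only the lines that identify a tool are removed.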
The GitHub UI for the repository surfaces commits with the
maintainer's identity as the only author; no AI-tool branding
appears in git log output.
7. Dependency management¶
| Tool | Workflow | Cadence | Severity gate |
|---|---|---|---|
| pip-audit | .github/workflows/pip-audit.yml | Every push to develop/main + every PR + weekly 0 6 * * 1 UTC + manual dispatch | Fails on HIGH or CRITICAL advisory in any production dependency listed in requirements.txt. |
| bandit | .github/workflows/bandit.yml | Every push to develop/main + every PR + manual dispatch | Reports MEDIUM+ via -ll; fails the build on HIGH (-lll). |
| Trivy (container) | Appended to .github/workflows/publish-docker.yml | Every tag push (v*) | Fails on HIGH or CRITICAL OS or library CVE in the published image; results uploaded as SARIF to the GitHub Security tab. |
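The "report at one level, fail at a higher one" pattern in the severity-gate column can be sketched generically. This is not how pip-audit, bandit, or Trivy are invoked in the workflows; `severity_gate` and the plain list-of-strings input are hypothetical, chosen only to show the two thresholds side by side.

```python
from typing import Iterable, List, Tuple

# Ordering assumed for this sketch; higher rank means more severe.
SEVERITY_RANK = {"LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def severity_gate(
    severities: Iterable[str],
    report_at: str = "MEDIUM",
    fail_at: str = "HIGH",
) -> Tuple[List[str], int]:
    """Return (findings to report, exit code) for a list of severities."""
    report_rank = SEVERITY_RANK[report_at]
    fail_rank = SEVERITY_RANK[fail_at]
    reported = [s for s in severities if SEVERITY_RANK[s.upper()] >= report_rank]
    failed = any(SEVERITY_RANK[s.upper()] >= fail_rank for s in reported)
    return reported, 1 if failed else 0
```

With the defaults, a run containing only LOW and MEDIUM findings reports the MEDIUM one and exits 0, while any HIGH or CRITICAL finding turns the exit code to 1.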
The production dependency surface is small and reads from
requirements.txt (27 entries, including typer, httpx, pydantic, pyyaml,
jinja2, jsonschema, pandas, numpy, scikit-learn, scipy, sqlalchemy,
alembic, psutil, codecarbon, PyGithub, streamlit, langdetect,
structlog, rich, pyfiglet, tomli-w, requests, gradio-client,
matplotlib, and seaborn; see the file for the canonical list). The
development-only surface (requirements-dev.txt: pytest, pytest-cov,
pytest-asyncio, respx, ruff, mypy, pre-commit) is excluded from
pip-audit to avoid blocking on advisories that cannot reach an
end user.
License-compatibility automation is deferred from this MVP — see §13 Known security debt entry D32.
8. Container image security¶
The published image at ghcr.io/pumacp/puma:vX.Y.Z is built from
Dockerfile.publish (added in S12.15). Properties:
- Multi-stage build: a builder stage produces the wheel; the final stage installs it into a clean image. Build tooling does not ship in the final layer.
- Slim base: python:3.11-slim, a minimal Debian-derived layer.
- Non-root user: a puma user (uid=1000) is created in the final stage and the USER puma directive switches to it before ENTRYPOINT. The image never runs as root.
- OCI labels: org.opencontainers.image.title, description, source, documentation, licenses are set at the Dockerfile level and reinforced by the docker/metadata-action@v5 step in the publish workflow.
- Entrypoint: puma. The published image is the CLI; the default command is --help.
- No bundled Ollama: the image expects an Ollama service to be reachable on the network (see the project's docker-compose.yml for the canonical multi-service layout).
- Trivy scan: every tag push triggers a Trivy scan with severity: CRITICAL,HIGH and exit-code: 1; results upload to the GitHub Security tab as SARIF.
SBOM generation (CycloneDX) is deferred from this MVP — see §13 Known security debt entry D33.
9. GitHub Actions security posture¶
PUMA's CI uses GitHub-hosted ubuntu-latest runners with explicit
least-privilege permissions: blocks per workflow / per job.
Audit at the time of S12-N2 (current state, develop branch):
| Workflow | Permissions | Notes |
|---|---|---|
| docs.yml (build job) | contents: read | Minimal: checkout + mkdocs build. |
| docs.yml (deploy job) | contents: write | Required to push to gh-pages. |
| release.yml | contents: write | Required by softprops/action-gh-release@v2. |
| publish-docker.yml | contents: read + packages: write | packages: write is required by docker/login-action + GHCR push. No long-lived registry credential; GITHUB_TOKEN only. |
| publish-pypi.yml | id-token: write | OIDC-based trusted publishing; no PyPI token in repo secrets. |
| wiki-sync.yml | contents: write | Required to push to the puma.wiki.git companion repo. |
| lint-and-test.yml | (none declared) | Inherits the repo-default permissions. Worth tightening to explicit contents: read in a follow-up; tracked as an audit finding, not blocking this PR. |
| smoke.yml | (none declared) | Same as above; audit finding. |
| pip-audit.yml (S12-N2) | contents: read | Minimal. |
| bandit.yml (S12-N2) | contents: read | Minimal. |
| gitleaks.yml (S12-N2) | contents: read | Minimal. |
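Rows marked "(none declared)" are the kind of finding a small script can surface automatically. The sketch below is a rough heuristic rather than a YAML parser: it treats a workflow as compliant if any line starts with a `permissions:` key, at workflow or job level. The function names and the sample workflow text in the usage note are assumptions.

```python
import re
from typing import Dict, List

# Matches a `permissions:` key at the start of a line (any indentation).
# Commented-out keys such as "# permissions:" do not match.
_PERMISSIONS_KEY = re.compile(r"^[ \t]*permissions[ \t]*:", re.MULTILINE)


def declares_permissions(workflow_text: str) -> bool:
    """Return True when the workflow text contains a permissions key."""
    return bool(_PERMISSIONS_KEY.search(workflow_text))


def workflows_without_permissions(workflows: Dict[str, str]) -> List[str]:
    """Return the sorted names of workflows that declare no permissions."""
    return sorted(
        name for name, text in workflows.items() if not declares_permissions(text)
    )
```

Given a mapping of file names to file contents, for example read from .github/workflows/, the second function returns the names that still inherit the repository defaults.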
GitHub branch protection on develop and main requires the
Lint and Test, Smoke Test, and Docs/build checks to be green
before merge; the security-tooling checks added in S12-N2 (pip-audit,
bandit, gitleaks) become required after one clean cycle on
develop to give time to remediate any first-pass findings.
10. Brand-scanner enforcement¶
tests/integration/test_agent_agnostic_remote.py::test_no_brand_references_in_tracked_tree
scans the entire tracked tree (via git grep -i -E) for the assistant
brand token and its case variants. The test enforces the project's
tool-agnostic posture documented in
development-workflow.md §13.
If you add a test, a config, or a doc that needs to reason about
the brand token (e.g., an audit that asserts the brand does not
appear elsewhere), the token must be assembled from fragments at
runtime, exactly as test_agent_agnostic_remote.py does:
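A minimal sketch of the fragment-assembly idiom, using a hypothetical placeholder token rather than the real brand string, might look like the following. The names `assemble_token` and `lines_with_token` are illustrative and are not taken from test_agent_agnostic_remote.py.

```python
from typing import Iterable, List


def assemble_token(fragments: Iterable[str]) -> str:
    """Join fragments at runtime so the full token never appears as a literal."""
    return "".join(fragments)


# Placeholder fragments for illustration only; not the real brand token.
TOKEN = assemble_token(("exa", "mple", "bot"))


def lines_with_token(text: str, token: str) -> List[int]:
    """Return 1-based line numbers containing the token, case-insensitively."""
    needle = token.lower()
    return [
        number
        for number, line in enumerate(text.splitlines(), start=1)
        if needle in line.lower()
    ]
```

Because the token exists only as separate fragments in the source file, a case-insensitive repo-wide search for the joined string does not match the file that defines it.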
This idiom is mandatory for any code that handles the brand token as
data; otherwise the repo-wide scanner trips on the literal in your
own file. The pattern is also applied in
tests/integration/test_security_doc.py (added in S12-N2) and in
tests/integration/test_development_workflow_doc.py (added in
S12-N4).
11. Vulnerability reporting¶
The canonical private-disclosure path is documented in
SECURITY.md
at the repository root.
In short: email pumacapstoneproject@gmail.com with a description,
reproduction steps, the affected version(s), and any suggested
mitigation. Acknowledgement within 72 hours; remediation plan within
30 days; public advisory within 90 days under the standard
coordinated-disclosure timeline.
12. Audit cadence¶
| Audit | Frequency |
|---|---|
| pip-audit weekly cron | Every Monday at 06:00 UTC. |
| pip-audit / bandit / gitleaks on push | Every push to develop/main + every PR. |
| Trivy on published image | Every tag push (v*). |
| Full re-audit (this page, the threat model, the workflow permissions table) | Every minor release (e.g. v4.1.0, v4.2.0) at minimum; on demand otherwise. |
| Hard re-audit (penetration test, dependency tree review) | Annually — first slot post-v4.0.0 release. |
13. Known security debt¶
Tracked in docs/known_debt.md. The S12-N2 MVP
deliberately right-sizes scope; the items below are queued for a
post-Sprint-12 backlog or a future audit cycle:
- D32: License compatibility automation. Add pip-licenses in CI to fail the build on non-permissive licenses entering the dependency tree.
- D33: SBOM CycloneDX generation. Produce a Software Bill of Materials in CycloneDX format and attach it to every published release artifact.
- D34: Mutation testing on src/puma/community/integrity.py. Use mutmut or equivalent to verify the integrity-hash test coverage actually catches subtle mutations of the canonical hash procedure.
- D35: Cross-platform install test. Verify the published wheel installs and the CLI starts on Linux, macOS (x86_64 + arm64), and Windows; today only Linux is exercised in CI.
Last reviewed: 2026-05-31 (S12-N2 MVP).