Development Stack Tools — PUMA

Overview

Detailed notes on all development tools used or evaluated for PUMA implementation.


🤖 LLM Runtime Alternatives

LM Studio

  • URL: lmstudio.ai
  • Purpose in PUMA: Alternative to Ollama for model discovery and local inference with GUI
  • When to use: When testing new models before adding them to Ollama pipeline, or for quick interactive testing
  • PUMA setup:
    1. Download LM Studio
    2. Search and download: llama-3.2-8b-instruct-q4_k_m
    3. Enable local server on port 1234 (OpenAI-compatible API)
    4. Test with: curl http://localhost:1234/v1/completions
    
  • Vs Ollama: LM Studio has better GUI for model comparison; Ollama is more scriptable and reproducible for batch experiments → Ollama preferred for PUMA experiments

Groq API

  • URL: console.groq.com
  • Purpose in PUMA: Ultra-fast cloud inference for rapid prototyping (not for final experiments — non-local)
  • Free tier: 14,400 req/day for Llama models
  • When to use: Testing prompts quickly before running full local Ollama batch
  • ⚠️ Not for final results — PUMA experiments must run locally for reproducibility and carbon measurement
  • Python client:
    from groq import Groq
    client = Groq(api_key="YOUR_KEY")
    response = client.chat.completions.create(
        model="llama-3.2-8b-preview",
        messages=[{"role": "user", "content": "Test prompt"}]
    )

OpenRouter

  • URL: openrouter.ai
  • Purpose in PUMA: Access to multiple LLMs via unified API for comparison (not for final experiments)
  • PUMA use: Quick benchmarking of a prompt across 3-4 models to choose which to prioritise locally
  • Python client:
    import requests
    response = requests.post(
        "https://openrouter.ai/api/v1/chat/completions",
        headers={"Authorization": f"Bearer {api_key}"},
        json={"model": "meta-llama/llama-3.2-8b-instruct", "messages": [...]}
    )

🤝 Multi-Agent Frameworks

CrewAI

  • GitHub: crewai-io/crewAI
  • Purpose: Framework for building collaborative AI agent teams with defined roles and tasks
  • PUMA relevance: Stage 5 (optional Smart PMO) concept — multiple specialized agents (Triage Agent, Estimation Agent, Reporter)
  • Current use: Not in MVP scope; architecture reference only
  • Example crew for PUMA:
    from crewai import Agent, Task, Crew
     
    triage_agent = Agent(
        role='Issue Triage Specialist',
        goal='Classify Jira issues by priority accurately',
        backstory='Expert PM with 10 years in software triage',
        llm=ollama_llm  # local model
    )

LangGraph (LangChain)

  • GitHub: langchain-ai/langgraph
  • Purpose: Building stateful, multi-step LLM workflows as directed graphs with conditional branching
  • PUMA relevance: Could replace simple sequential pipeline with graph-based orchestration for Stage 4 (RAG-enhanced triage)
  • Current use: Evaluated; overkill for MVP, may use in Stage 4
  • Template reference: wassim249/fastapi-langgraph-agent-production-ready-template

Pydantic AI

  • GitHub: pydantic/pydantic-ai
  • Purpose: Type-safe, structured output generation from LLMs using Pydantic models
  • PUMA use: Ensuring triage and estimation outputs conform to expected schema (Priority enum, SP int)
  • Example:
    from pydantic import BaseModel, Field
    from typing import Literal
     
    class TriageOutput(BaseModel):
        priority: Literal['Critical', 'High', 'Medium', 'Low']
        reasoning: str = Field(description="Step-by-step reasoning")
        confidence: Literal['high', 'medium', 'low']

📊 Observability & Evaluation

Arize Phoenix

  • URL: phoenix.arize.com (also local via pip install arize-phoenix)
  • Purpose: LLM observability — trace, evaluate, and debug LLM applications
  • PUMA use: Optional instrumentation for Stage 4 (RAG-enhanced triage) to inspect retrieval quality and model reasoning chains
  • Local setup:
    pip install arize-phoenix opentelemetry-sdk opentelemetry-exporter-otlp
    python -m phoenix.server.main &  # starts at localhost:6006

CodeCarbon

  • GitHub: mlco2/codecarbon
  • Purpose: Measure CO₂ emissions from Python code execution
  • PUMA use: Core measurement tool — every experiment condition is wrapped in EmissionsTracker
  • Integration:
    from codecarbon import EmissionsTracker
     
    tracker = EmissionsTracker(
        project_name="puma-triage-llama32-zeroshot",
        output_dir="./results/carbon/",
        log_level="error"
    )
    tracker.start()
    # ... run experiment ...
    emissions_kg = tracker.stop()
    emissions_gco2 = emissions_kg * 1000

🌐 Infrastructure

Docker Desktop + Docker Compose

  • Purpose: Containerised, reproducible environment for PUMA experiments
  • PUMA use: Docker Compose setup for: Ollama + ChromaDB + Jupyter + Streamlit
  • docker-compose.yml skeleton:
    version: '3.8'
    services:
      ollama:
        image: ollama/ollama
        ports: ["11434:11434"]
        volumes: ["ollama-models:/root/.ollama"]
      
      jupyter:
        build: .
        ports: ["8888:8888"]
        volumes: [".:/workspace"]
        environment:
          - OLLAMA_HOST=ollama:11434
      
      chromadb:
        image: chromadb/chroma
        ports: ["8000:8000"]
     
    volumes:
      ollama-models:

Streamlit

  • Purpose: Rapid web UI for PUMA demo and results visualisation
  • PUMA use: Optional interactive demo of triage agent (human-in-loop interface)
  • Setup:
    # puma_demo.py
    import streamlit as st
    from src.agents.triage_agent import TriageAgent
     
    st.title("PUMA — Issue Triage Demo")
    model = st.selectbox("Model", ["llama3.2:8b", "mistral:7b"])
    strategy = st.selectbox("Strategy", ["zero-shot", "few-shot-3", "cot"])
    issue_title = st.text_input("Issue title")
     
    if st.button("Classify"):
        agent = TriageAgent(model=model, strategy=strategy)
        result = agent.classify({"title": issue_title})
        st.metric("Predicted Priority", result["predicted_priority"])

📊 Data Analysis Libraries

pandas (McKinney)

  • URL: https://pandas.pydata.org
  • Reference: McKinney, W. (2022). Python for data analysis (3rd ed.). O’Reilly. https://wesmckinney.com/book/
  • Purpose: Core data manipulation library for PUMA experiment results
  • PUMA use: Loading Jira SR and TAWOS CSV datasets; computing F1-macro, MAE, class distributions; exporting results to Excel/CSV for thesis tables
  • Key methods used:
    import pandas as pd
     
    # Load and stratify dataset
    df = pd.read_csv("jira_sr.csv")
    subset = df.groupby('priority').apply(lambda x: x.sample(n=50, random_state=42))
     
    # Compute results
    results = pd.DataFrame(experiment_log)
    results.groupby(['model', 'strategy'])['f1_macro'].describe()
  • Justification: Industry-standard Python data manipulation; native integration with scikit-learn metrics and matplotlib/seaborn visualisation

🔗 Reference Templates & Repositories

RepositoryPurposePUMA use
wassim249/fastapi-langgraph-agent-production-ready-templateProduction FastAPI + LangGraph agentReference for Stage 4+ architecture
sushmitanandi/langgraph-multi-agentMulti-agent LangGraph exampleReference for Stage 5 (Smart PMO)
vency-ai/agentic-scrumAgile PM agent prototypeComparison reference
groepl/Obsidian-Zettelkasten-Starter-KitObsidian vault templateBase for this vault

LN-Tools-Ollama-ClaudeCode-OpenCode-BrowserOS LN-Tools-Research-Discovery SP-Architecture