πŸ—ƒοΈ Tools β€” RAG Systems & Vector Databases

Overview

Tools for the Retrieval-Augmented Generation (RAG) pipeline in PUMA. They are used in Stage 4 (RAG-enhanced triage) and for the document knowledge base that supports the research phase. See also: PN-RAG-Embeddings-VectorDB


Private Knowledge Base (Research Phase)

AnythingLLM

  • URL: https://anythingllm.com
  • Function: Local RAG platform; web interface for chat over PDFs without sending data to third parties
  • Phase: F0 – F1
  • PUMA use: Building a private knowledge base of all downloaded papers; semantic queries over the SLR corpus without re-reading; privacy-preserving (data stays local)
  • Backend: Ollama (nomic-embed-text for embeddings) + ChromaDB (local vector store)
  • Justification: All paper content stays on local hardware; GDPR-compliant research workflow
  • Related: Keshav-Reading-Log

NotebookLM (Google)

  • URL: https://notebooklm.google.com
  • Function: Google’s document-grounded conversational AI
  • Phase: F0 – F1
  • PUMA use: Verified Q&A against specific paper PDFs; audio synthesis of literature for mobile listening
  • Limitation: Cloud-based (data sent to Google); used only for non-sensitive research documents

Embedding Models

nomic-embed-text (via Ollama)

  • URL: https://ollama.ai/library/nomic-embed-text
  • Function: Local semantic embedding model
  • Dimensions: 768
  • Phase: F2 – F4
  • PUMA use: Generating embeddings for Jira SR historical issues (Stage 4 RAG); semantic similarity search for retrieving relevant historical examples
  • Install: ollama pull nomic-embed-text
  • Justification: Free, local, no API cost; compatible with both Qdrant and ChromaDB; 768-dim embeddings suitable for issue text (avg ~50-300 tokens)
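
As a minimal sketch of how these embeddings could be requested and compared, the snippet below assumes an Ollama server running at its default local address (http://localhost:11434) with the model already pulled. The function names embed and cosine_similarity are illustrative and not part of PUMA.

```python
import json
import math
import urllib.request

# Assumption: Ollama is running locally on its default port.
OLLAMA_URL = "http://localhost:11434/api/embeddings"


def embed(text, model="nomic-embed-text", url=OLLAMA_URL):
    """Request one embedding vector from a local Ollama server."""
    payload = json.dumps({"model": model, "prompt": text}).encode("utf-8")
    request = urllib.request.Request(
        url, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read())["embedding"]


def cosine_similarity(a, b):
    """Cosine similarity of two equal-length vectors; 0.0 if either is all zeros."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)
```

With the server running, cosine_similarity(embed(issue_a), embed(issue_b)) gives a rough similarity score between two issue texts; in Stage 4 the vector database performs this comparison at scale.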

Vector Databases

Qdrant

  • URL: https://qdrant.io
  • Function: High-performance vector database for production RAG systems
  • Phase: F2 – F4
  • PUMA use (Stage 4): Storing embeddings of Jira SR historical issues; semantic similarity search for RAG-enhanced triage agent; metadata filtering by project, priority class, date
  • Install: docker run -p 6333:6333 qdrant/qdrant
  • Key features: HNSW index for fast approximate search; payload filtering; persistent storage
  • Justification: Production-grade; supports metadata filters (filter by Apache project → reduce noise); native Python client
  • Related: PN-RAG-Embeddings-VectorDB
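
The metadata filtering described above could look roughly like the following sketch, which builds the JSON body for Qdrant's REST search endpoint (POST /collections/{collection_name}/points/search). The payload keys project and priority and the function name build_search_request are assumptions for illustration; the actual payload schema depends on how the issues are indexed.

```python
def build_search_request(query_vector, project=None, priority=None, limit=5):
    """Build a Qdrant REST search body with optional payload filters."""
    conditions = []
    if project is not None:
        # Hypothetical payload key: the Apache project the issue belongs to.
        conditions.append({"key": "project", "match": {"value": project}})
    if priority is not None:
        # Hypothetical payload key: the issue's priority class.
        conditions.append({"key": "priority", "match": {"value": priority}})

    body = {"vector": list(query_vector), "limit": limit, "with_payload": True}
    if conditions:
        # "must" means every listed condition has to hold (logical AND).
        body["filter"] = {"must": conditions}
    return body
```

Restricting the search to one project in this way keeps issues from unrelated projects out of the top-k results.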

ChromaDB

  • URL: https://www.trychroma.com
  • Function: Lightweight embeddable Python vector database; no server required
  • Phase: F2
  • PUMA use: Early Stage 4 prototyping; in-memory or disk-persistent vector store for rapid iteration
  • Install: pip install chromadb
  • When to use: If the prototype only needs semantic search without metadata filtering, ChromaDB is faster to set up than Qdrant
  • Migration path: prototype on ChromaDB → production on Qdrant (when metadata filtering is needed)
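
A prototype along these lines might be sketched as follows. The issue fields (key, text, priority) and the helper names are hypothetical. Note that when only documents are passed, Chroma embeds them with its own default embedding function; using nomic-embed-text would mean passing precomputed vectors through the embeddings argument instead.

```python
def to_chroma_batch(issues):
    """Convert issue dicts into keyword arguments for Collection.add."""
    return {
        "ids": [issue["key"] for issue in issues],
        "documents": [issue["text"] for issue in issues],
        "metadatas": [{"priority": issue["priority"]} for issue in issues],
    }


def index_issues(collection, issues):
    """Add issues to a Chroma-style collection and return how many were added."""
    batch = to_chroma_batch(issues)
    collection.add(**batch)
    return len(batch["ids"])


# Usage, assuming chromadb is installed (pip install chromadb):
#   import chromadb
#   client = chromadb.PersistentClient(path="./chroma_store")  # chromadb.Client() for in-memory
#   collection = client.get_or_create_collection("jira_issues")
#   index_issues(collection, issues)
#   hits = collection.query(query_texts=["crash on startup"], n_results=5)
```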

RAG Frameworks

LlamaIndex

  • URL: https://www.llamaindex.ai
  • Function: Python RAG framework for indexing, retrieval, and generation
  • Phase: F2 – F4
  • PUMA use (Stage 4):
    • Indexing Jira SR historical issues into Qdrant with nomic-embed-text embeddings
    • Building the RAG retrieval pipeline (query → embed → retrieve top-k issues → inject into prompt)
    • OllamaEmbedding + QdrantVectorStore integration
  • Install: pip install llama-index llama-index-embeddings-ollama llama-index-vector-stores-qdrant
  • Justification: Specialised for RAG (vs. LangChain which is general); simpler pipeline for Stage 4; native Ollama + Qdrant support
  • Alternative: LangChain for more general agentic patterns (Stage 5)
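
The OllamaEmbedding + QdrantVectorStore integration could be wired up roughly as in this sketch. It assumes the three packages from the install line plus a local Qdrant and Ollama; the issue fields (key, project, priority, summary, description), the collection name, and the function names are hypothetical.

```python
def issue_to_fields(issue):
    """Split a hypothetical issue dict into embeddable text and metadata."""
    text = f"{issue['summary']}\n\n{issue.get('description', '')}".strip()
    metadata = {
        "key": issue["key"],
        "project": issue["project"],
        "priority": issue["priority"],
    }
    return text, metadata


def build_retriever(issues, collection_name="jira_issues", top_k=5):
    """Index issues into Qdrant via LlamaIndex and return a top-k retriever."""
    # Imported lazily so issue_to_fields works without these packages installed.
    from llama_index.core import Document, StorageContext, VectorStoreIndex
    from llama_index.embeddings.ollama import OllamaEmbedding
    from llama_index.vector_stores.qdrant import QdrantVectorStore
    from qdrant_client import QdrantClient

    documents = []
    for issue in issues:
        text, metadata = issue_to_fields(issue)
        documents.append(Document(text=text, metadata=metadata))

    client = QdrantClient(url="http://localhost:6333")  # assumed local Qdrant
    vector_store = QdrantVectorStore(client=client, collection_name=collection_name)
    storage_context = StorageContext.from_defaults(vector_store=vector_store)
    embed_model = OllamaEmbedding(model_name="nomic-embed-text")

    index = VectorStoreIndex.from_documents(
        documents, storage_context=storage_context, embed_model=embed_model
    )
    return index.as_retriever(similarity_top_k=top_k)
```

Calling build_retriever(issues).retrieve("new issue text") would then return the most similar indexed issues together with their metadata.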

RAG Pipeline for PUMA Stage 4

Historical Jira SR issues (Ortu 2015)
           ↓
  nomic-embed-text (Ollama)
           ↓ 768-dim embeddings
     Qdrant vector store
           ↓
New issue (query) β†’ embed β†’ retrieve top-5 similar issues
           ↓
 [Issue text + 5 similar historical issues + their labels]
           ↓
  Llama 3.2 8B prompt (ReAct pattern)
           ↓
     Priority classification
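
The prompt-assembly step in the diagram (issue text plus similar historical issues and their labels) can be illustrated with the sketch below. It is a plain few-shot template rather than the ReAct prompt named in the diagram, and the wording, the function name build_triage_prompt, and the label set passed in are all assumptions.

```python
def build_triage_prompt(issue_text, similar_issues, labels):
    """Assemble a prompt from the new issue and retrieved (text, label) pairs."""
    lines = ["You are triaging a software issue. Similar historical issues:"]
    for position, (text, label) in enumerate(similar_issues, start=1):
        # Each retrieved example is shown with its known priority label.
        lines.append(f"{position}. [{label}] {text}")
    lines.append("")
    lines.append(f"New issue: {issue_text}")
    lines.append(f"Answer with exactly one priority from: {', '.join(labels)}.")
    return "\n".join(lines)
```

Constraining the answer to a fixed label list makes the model output easier to parse into a priority class.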

Metrics for Stage 4 evaluation:

  • Recall@5: proportion of the 5 retrieved issues that belong to the query issue's correct priority class (as defined here, it is computed like precision@5)
  • F1-macro: improvement over Stage 1 (CoT without RAG)
  • Latency overhead: additional ms from retrieval step
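
These three metrics could be computed along the following lines. This is a dependency-free sketch: same_class_at_k implements the first metric as defined in the list above (share of retrieved issues in the correct priority class), f1_macro is the unweighted mean of per-class F1, and timed_ms measures wall-clock time for any callable. All function names are illustrative.

```python
import time


def same_class_at_k(retrieved_labels, true_label, k=5):
    """Fraction of the top-k retrieved labels equal to the query's true label."""
    top = list(retrieved_labels)[:k]
    if not top:
        return 0.0
    return sum(1 for label in top if label == true_label) / len(top)


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over all classes in y_true or y_pred."""
    classes = sorted(set(y_true) | set(y_pred))
    scores = []
    for cls in classes:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p == cls)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != cls and p == cls)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == cls and p != cls)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def timed_ms(func, *args, **kwargs):
    """Run func and return (result, elapsed time in milliseconds)."""
    start = time.perf_counter()
    result = func(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

Timing the classification call once with retrieval and once without gives the latency overhead as the difference between the two measurements.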

MOCs