Tools: RAG Systems & Vector Databases
Overview
Tools for the Retrieval-Augmented Generation (RAG) pipeline in PUMA. Used in Stage 4 (RAG-enhanced triage) and in the document knowledge base for research. See also: PN-RAG-Embeddings-VectorDB
Private Knowledge Base (Research Phase)
AnythingLLM
- URL: https://anythingllm.com
- Function: Local RAG platform; web interface for chat over PDFs without sending data to third parties
- Phase: F0 → F1
- PUMA use: Building a private knowledge base of all downloaded papers; semantic queries over the SLR corpus without re-reading; privacy-preserving (data stays local)
- Backend: Ollama (nomic-embed-text for embeddings) + ChromaDB (local vector store)
- Justification: All paper content stays on local hardware; GDPR-compliant research workflow
- Related: Keshav-Reading-Log
NotebookLM (Google)
- URL: https://notebooklm.google.com
- Function: Google's document-grounded conversational AI
- Phase: F0 → F1
- PUMA use: Verified Q&A against specific paper PDFs; audio synthesis of literature for mobile listening
- Limitation: Cloud-based (data sent to Google); used only for non-sensitive research documents
Embedding Models
nomic-embed-text (via Ollama)
- URL: https://ollama.ai/library/nomic-embed-text
- Function: Local semantic embedding model
- Dimensions: 768
- Phase: F2 → F4
- PUMA use: Generating embeddings for Jira SR historical issues (Stage 4 RAG); semantic similarity search for retrieving relevant historical examples
- Install:
ollama pull nomic-embed-text
- Justification: Free, local, no API cost; compatible with both Qdrant and ChromaDB; 768-dim embeddings suitable for issue text (avg ~50-300 tokens)
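As a rough illustration of how nomic-embed-text embeddings might be requested and compared, the sketch below builds a request for a local Ollama server and computes cosine similarity in plain Python. The address `http://localhost:11434`, the `/api/embed` route, and the helper names `build_embed_request`, `embed`, and `cosine_similarity` are assumptions for illustration, not part of PUMA; `embed` only works when an Ollama server is running with the model pulled.

```python
import json
import math
import urllib.request

OLLAMA_URL = "http://localhost:11434/api/embed"  # assumed default local Ollama address
MODEL = "nomic-embed-text"


def build_embed_request(texts):
    """Build (without sending) the HTTP request that asks Ollama to embed a list of texts."""
    body = json.dumps({"model": MODEL, "input": texts}).encode("utf-8")
    return urllib.request.Request(
        OLLAMA_URL, data=body, headers={"Content-Type": "application/json"}
    )


def embed(texts):
    """Send the request; needs a reachable Ollama server with the model available."""
    with urllib.request.urlopen(build_embed_request(texts)) as resp:
        return json.loads(resp.read())["embeddings"]


def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0
```

In Stage 4 the similarity search itself would normally be delegated to the vector database; the cosine helper is only there to make the comparison explicit.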
Vector Databases
Qdrant
- URL: https://qdrant.io
- Function: High-performance vector database for production RAG systems
- Phase: F2 → F4
- PUMA use (Stage 4): Storing embeddings of Jira SR historical issues; semantic similarity search for RAG-enhanced triage agent; metadata filtering by project, priority class, date
- Install:
docker run -p 6333:6333 qdrant/qdrant
- Key features: HNSW index for fast approximate search; payload filtering; persistent storage
- Justification: Production-grade; supports metadata filters (filter by Apache project → reduce noise); native Python client
- Related: PN-RAG-Embeddings-VectorDB
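The metadata filtering described above could look roughly like the following, which builds a JSON search body using the `must` / `match` / `range` filter shape from Qdrant's REST API. The payload field names (`project`, `priority`, `created_ts`) and the helper `build_search_body` are hypothetical; the actual PUMA payload schema is not specified here, and the body should be checked against the installed Qdrant version before use.

```python
def build_search_body(query_vector, project=None, priority=None,
                      created_after=None, top_k=5):
    """Build a Qdrant search request body with optional payload filters.

    Field names are assumptions: 'project' and 'priority' are keyword payloads,
    'created_ts' is a numeric Unix timestamp.
    """
    must = []
    if project is not None:
        must.append({"key": "project", "match": {"value": project}})
    if priority is not None:
        must.append({"key": "priority", "match": {"value": priority}})
    if created_after is not None:
        must.append({"key": "created_ts", "range": {"gte": created_after}})

    body = {"vector": list(query_vector), "limit": top_k, "with_payload": True}
    if must:
        # All conditions in 'must' have to hold for a point to be returned.
        body["filter"] = {"must": must}
    return body
```

Restricting the search to one Apache project this way narrows the candidate set before similarity ranking, which is the noise reduction the justification refers to.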
ChromaDB
- URL: https://www.trychroma.com
- Function: Lightweight embeddable Python vector database; no server required
- Phase: F2
- PUMA use: Early Stage 4 prototyping; in-memory or disk-persistent vector store for rapid iteration
- Install:
pip install chromadb
- When to use: Apply the simplicity test: if the prototype only needs semantic search without metadata filtering, ChromaDB is faster to set up than Qdrant
- Migration path: Prototype (ChromaDB) → Production (Qdrant, when metadata filtering is needed)
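One way to keep the ChromaDB to Qdrant migration cheap is to have pipeline code depend on a small store interface rather than on either client directly. The sketch below is only an illustration of that idea: `VectorStore`, `InMemoryStore`, and `retrieve_similar` are hypothetical names, and the in-memory class is a brute-force stand-in, not a wrapper around ChromaDB or Qdrant.

```python
import math
from typing import Protocol, Sequence


class VectorStore(Protocol):
    """Minimal interface the retrieval code depends on (hypothetical)."""

    def add(self, doc_id: str, vector: Sequence[float], payload: dict) -> None: ...

    def query(self, vector: Sequence[float], top_k: int = 5) -> list: ...


def _cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


class InMemoryStore:
    """Brute-force cosine search; stands in for the prototype-phase store."""

    def __init__(self):
        self._items = {}

    def add(self, doc_id, vector, payload):
        self._items[doc_id] = (list(vector), payload)

    def query(self, vector, top_k=5):
        scored = [
            (_cosine(vector, stored), doc_id, payload)
            for doc_id, (stored, payload) in self._items.items()
        ]
        scored.sort(key=lambda item: item[0], reverse=True)
        # Return (doc_id, score, payload) tuples, best match first.
        return [(doc_id, score, payload) for score, doc_id, payload in scored[:top_k]]


def retrieve_similar(store: VectorStore, query_vector, top_k=5):
    """Pipeline code sees only the interface, so the backend can be swapped later."""
    return store.query(query_vector, top_k=top_k)
```

A Qdrant-backed class exposing the same two methods could then replace the prototype store without changes to `retrieve_similar`.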
RAG Frameworks
LlamaIndex
- URL: https://www.llamaindex.ai
- Function: Python RAG framework for indexing, retrieval, and generation
- Phase: F2 → F4
- PUMA use (Stage 4):
- Indexing Jira SR historical issues into Qdrant with nomic-embed-text embeddings
- Building the RAG retrieval pipeline (query → embed → retrieve top-k issues → inject into prompt)
- OllamaEmbedding + QdrantVectorStore integration
- Install:
pip install llama-index llama-index-embeddings-ollama llama-index-vector-stores-qdrant
- Justification: Specialised for RAG (vs. LangChain which is general); simpler pipeline for Stage 4; native Ollama + Qdrant support
- Alternative: LangChain for more general agentic patterns (Stage 5)
RAG Pipeline for PUMA Stage 4
Historical Jira SR issues (Ortu 2015)
↓
nomic-embed-text (Ollama)
↓ 768-dim embeddings
Qdrant vector store
↓
New issue (query) → embed → retrieve top-5 similar issues
↓
[Issue text + 5 similar historical issues + their labels]
↓
Llama 3.2 8B prompt (ReAct pattern)
↓
Priority classification
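The prompt-injection step in the pipeline above (issue text plus retrieved issues and their labels) might be sketched as follows. The function name `build_triage_prompt`, the dictionary keys (`key`, `summary`, `priority`), and the instruction wording are all assumptions for illustration; this does not reproduce PUMA's actual ReAct prompt.

```python
def build_triage_prompt(issue_text, similar_issues):
    """Assemble a prompt from the new issue and the retrieved historical issues.

    similar_issues: list of dicts with hypothetical keys 'key', 'summary', 'priority'.
    """
    lines = ["You are triaging a Jira issue. Similar historical issues:"]
    for i, issue in enumerate(similar_issues, start=1):
        lines.append(
            f"{i}. [{issue['key']}] {issue['summary']} (priority: {issue['priority']})"
        )
    lines.append("")
    lines.append("New issue:")
    lines.append(issue_text)
    lines.append("")
    lines.append("Reason step by step, then answer with one priority label.")
    return "\n".join(lines)
```

Keeping the retrieved labels next to each example is what lets the model use the historical priorities as evidence rather than just as similar text.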
Metrics for Stage 4 evaluation:
- Recall@5: proportion of the top-5 retrieved issues that belong to the correct priority class
- F1-macro: improvement over Stage 1 (CoT without RAG)
- Latency overhead: additional milliseconds added by the retrieval step
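A minimal sketch of how these three metrics could be computed in plain Python follows. `recall_at_k` implements the definition given in the list above (share of the top-k retrieved issues with the correct class); `f1_macro` is the unweighted mean of per-class F1; `timed_ms` is a hypothetical helper for measuring the retrieval step. All function names are assumptions, not PUMA code.

```python
import time


def recall_at_k(retrieved_labels, true_label, k=5):
    """Share of the top-k retrieved issues whose label equals the query's true label."""
    top = retrieved_labels[:k]
    return sum(1 for label in top if label == true_label) / len(top) if top else 0.0


def f1_macro(y_true, y_pred):
    """Unweighted mean of per-class F1 over all labels seen in y_true or y_pred."""
    labels = sorted(set(y_true) | set(y_pred))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        denom = 2 * tp + fp + fn
        scores.append(2 * tp / denom if denom else 0.0)
    return sum(scores) / len(scores) if scores else 0.0


def timed_ms(fn, *args, **kwargs):
    """Run fn and return (result, elapsed milliseconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, (time.perf_counter() - start) * 1000.0
```

The F1-macro improvement is then the difference between the Stage 4 and Stage 1 scores on the same test split, and the latency overhead is the elapsed time reported by `timed_ms` around the retrieval call.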