LN: Tunstall, von Werra & Wolf (2022) — NLP with Transformers

Bibliographic Reference

Citation: Tunstall, L., von Werra, L., & Wolf, T. (2022). Natural language processing with Transformers: Building language applications with Hugging Face. O’Reilly Media. https://www.oreilly.com/library/view/natural-language-processing/9781098136780/

Pass 1 — Bird’s Eye View (5 Cs)

C	Assessment
Category	Technical book — implementation guide
Context	Written by members of Hugging Face (the organisation behind the `transformers` library). Combines theoretical foundations with hands-on Python code
Correctness	Code examples tested against Hugging Face library versions; covers published models with accurate capability claims
Contributions	(1) End-to-end guide for fine-tuning and deploying Transformer models; (2) Efficiency techniques for production, including quantization (the 4-bit and GGUF formats PUMA relies on are later developments of the same idea); (3) Foundations for local inference with open-weight models; (4) Text classification, summarisation, generation, and Q&A pipelines
Clarity	Excellent — progressive difficulty, extensive code examples

Relevance: ⭐⭐⭐⭐

Provides the technical basis for PUMA’s model selection, fine-tuning rationale (LoRA/QLoRA), and the local execution pipeline (Ollama + open-weight models). The classification pipeline chapters directly map to PUMA’s H1 implementation.

Pass 2 — Key Concepts

Text Classification Pipeline (PUMA H1 Mapping)

The book’s classification chapter maps directly to PUMA’s triage task:

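from transformers import pipeline

# candidate_labels is an argument of the "zero-shot-classification" task,
# not of "text-classification". The NLI checkpoint below is illustrative.
classifier = pipeline("zero-shot-classification",
                      model="facebook/bart-large-mnli")

result = classifier(
    "Bug: User cannot login after password reset",
    candidate_labels=["Bug", "Story", "Task", "Improvement"]
)
print(result["labels"][0])  # labels are sorted by score, highest first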

For PUMA, the equivalent is a prompted LLM (via Ollama) rather than a fine-tuned classifier, but the conceptual pipeline is the same: issue text goes in, and one label from a fixed set comes out.
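A minimal sketch of the prompted-LLM variant, assuming an Ollama server on its default local port and a pulled model tagged `mistral`. The helper names (`build_prompt`, `parse_label`, `classify`) and the fallback label are hypothetical and are not taken from the book or from PUMA's code.

```python
import json
import urllib.request

LABELS = ["Bug", "Story", "Task", "Improvement"]
OLLAMA_URL = "http://localhost:11434/api/generate"  # Ollama's default local endpoint


def build_prompt(text, labels=LABELS):
    """Ask the model to answer with exactly one label from the fixed set."""
    return (
        "Classify the following issue into exactly one of these categories: "
        + ", ".join(labels)
        + ".\nAnswer with the category name only.\n\nIssue: "
        + text
    )


def parse_label(reply, labels=LABELS, default="Task"):
    """Map free-text model output back onto the label set (case-insensitive)."""
    lowered = reply.lower()
    for label in labels:
        if label.lower() in lowered:
            return label
    return default  # hypothetical fallback when no label is recognised


def classify(text, model="mistral"):
    """Send the prompt to a locally running Ollama server and parse the reply."""
    payload = json.dumps(
        {"model": model, "prompt": build_prompt(text), "stream": False}
    ).encode("utf-8")
    request = urllib.request.Request(
        OLLAMA_URL, data=payload, headers={"Content-Type": "application/json"}
    )
    with urllib.request.urlopen(request) as response:
        body = json.loads(response.read())
    return parse_label(body["response"])
```

With a server running, `classify("Bug: User cannot login after password reset")` would return one of the four labels. Parsing the reply against the fixed label set keeps the output constrained even when the model adds extra words.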

Fine-Tuning with LoRA

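The book’s fine-tuning chapters cover full fine-tuning with the Trainer API. Parameter-efficient fine-tuning with LoRA, via the later `peft` library, follows the same workflow while training only small adapter matrices: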

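from peft import LoraConfig, get_peft_model
from transformers import AutoModelForCausalLM

# Illustrative checkpoint; any causal LM with q_proj/v_proj modules works
base_model = AutoModelForCausalLM.from_pretrained("mistralai/Mistral-7B-v0.1")

config = LoraConfig(
    r=16,                                 # rank of the low-rank update
    lora_alpha=32,                        # scaling numerator (alpha / r)
    target_modules=["q_proj", "v_proj"],  # attention projections to adapt
    lora_dropout=0.05,
    bias="none",
    task_type="CAUSAL_LM"
)
model = get_peft_model(base_model, config)
model.print_trainable_parameters()  # only the adapter weights are trainable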

This is directly applicable to PUMA’s optional fine-tuning stage (PUMA Stage 3+).
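To make the parameter savings concrete, here is a small worked calculation. It assumes a square 4096×4096 projection matrix purely for illustration; real layer shapes depend on the model. It uses the `r` and `lora_alpha` values from the configuration above.

```python
def lora_param_count(d_in, d_out, r):
    """Parameters in the low-rank pair A (r x d_in) and B (d_out x r)."""
    return r * (d_in + d_out)


d_in = d_out = 4096     # assumed projection size, for illustration only
r, lora_alpha = 16, 32  # values from the LoraConfig above

full = d_in * d_out
lora = lora_param_count(d_in, d_out, r)
scaling = lora_alpha / r  # factor applied to the low-rank update

print(f"full matrix: {full:,} params")
print(f"LoRA adapter: {lora:,} params ({lora / full:.2%} of full)")
print(f"scaling: {scaling}")
```

Under these assumptions the adapter holds 131,072 parameters against 16,777,216 in the full matrix, which is under 1% of the weights for each adapted projection.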

Quantization and Local Deployment

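Key quantization concepts that extend the book’s treatment of model efficiency:

8-bit quantization (bitsandbytes): roughly 2× memory reduction relative to 16-bit weights, typically with <1% quality loss
4-bit NF4 quantization (QLoRA): roughly 4× memory reduction relative to 16-bit weights, enough to fit an 8B model on a consumer GPU
GGUF (llama.cpp): quantized model file format that also runs on CPU, used by Ollama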

These directly justify PUMA’s choice to use Ollama for local model execution.
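The memory reductions listed above follow from simple arithmetic on bits per weight. The sketch below counts weight storage only; it ignores activations, the KV cache, and the per-block metadata that quantized formats add, so real footprints are somewhat higher.

```python
def weight_memory_gb(n_params, bits_per_weight):
    """Approximate memory for the weights alone, in gigabytes (10^9 bytes)."""
    return n_params * bits_per_weight / 8 / 1e9


n_params = 8e9  # an 8B-parameter model, as in the list above

for name, bits in [("16-bit", 16), ("8-bit", 8), ("4-bit", 4)]:
    print(f"{name}: {weight_memory_gb(n_params, bits):.1f} GB")
```

This gives about 16 GB, 8 GB, and 4 GB respectively, which is why a 4-bit 8B model fits on GPUs with 8 GB of memory while the 16-bit version does not.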

PUMA Integration

Ch.3 Methods: Cite Tunstall et al. for the technical justification of local inference, quantization, and the Python classification pipeline
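Model selection: Mistral 7B, Gemma 2, and Phi-3.5 were released after the book’s 2022 publication, so the book supports PUMA’s model choices indirectly, through its treatment of the Transformer architecture and generation methods these models share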

Related Notes

PN-FineTuning-LoRA-Quantization: LoRA, QLoRA, GGUF detailed notes
PN-LLM-Models-PUMA: model catalog (Transformer foundations covered by Tunstall et al.)
LN-Vaswani-2017-AttentionIsAllYouNeed: Transformer architecture that underpins all models
LN-Tools-Ollama-ClaudeCode-OpenCode-BrowserOS: Ollama for local inference
