Prompt: Claude — IIPR Inverse Prompt Engineering
Use when a prompt is not producing the desired output. Diagnose first, redesign second.
See: PN-AMI-DRCA-IIPR-Frameworks (IIPR section)
Step 1 — Diagnostic Prompt
ROLE:
You are a prompt engineering expert who specialises in diagnosing
why prompts fail to produce desired outputs.
CONTEXT:
I sent this prompt to an LLM:
--- ORIGINAL PROMPT START ---
[PASTE YOUR ORIGINAL PROMPT VERBATIM]
--- ORIGINAL PROMPT END ---
The LLM responded with:
--- ACTUAL RESPONSE START ---
[PASTE THE ACTUAL RESPONSE]
--- ACTUAL RESPONSE END ---
I wanted this response instead:
--- DESIRED RESPONSE START ---
[DESCRIBE OR PASTE THE IDEAL RESPONSE]
--- DESIRED RESPONSE END ---
OBJECTIVE:
Diagnose exactly why my prompt failed to produce the desired response.
INSTRUCTIONS:
Analyse each RCOIF component:
1. ROLE — Was the role too vague? Too restrictive? Misaligned with the task?
2. CONTEXT — What context was missing? What irrelevant context created confusion?
3. OBJECTIVE — Was the objective ambiguous? Too broad? Contradictory?
4. INSTRUCTIONS — Were instructions unclear? Missing a key step? In the wrong order?
5. FORMAT — Was the format specification clear? Contradicted by other instructions?
For each component: Problem identified + Severity (High/Medium/Low) + Specific fix.
FORMAT:
Table: Component | Problem | Severity | Fix
Root cause (1 sentence): The primary reason this prompt failed was...
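Filling the three placeholders by hand is error-prone when the pasted text is long. The sketch below assembles only the CONTEXT section of the diagnostic prompt from three strings. The function names `delimited` and `build_diagnostic_context` are hypothetical helpers, not part of any existing PUMA module, and the empty-input check is an added assumption.

```python
def delimited(label: str, body: str) -> str:
    """Wrap body in the START/END markers used by the diagnostic prompt."""
    if not body.strip():
        raise ValueError(f"{label} must not be empty")
    # The body is inserted verbatim so the original wording is preserved.
    return f"--- {label} START ---\n{body}\n--- {label} END ---"


def build_diagnostic_context(original_prompt: str,
                             actual_response: str,
                             desired_response: str) -> str:
    """Assemble the CONTEXT section of the Step 1 diagnostic prompt."""
    return "\n".join([
        "CONTEXT:",
        "I sent this prompt to an LLM:",
        delimited("ORIGINAL PROMPT", original_prompt),
        "The LLM responded with:",
        delimited("ACTUAL RESPONSE", actual_response),
        "I wanted this response instead:",
        delimited("DESIRED RESPONSE", desired_response),
    ])


if __name__ == "__main__":
    print(build_diagnostic_context(
        "Respond with only the priority label.",
        "I would classify this as High priority because...",
        "High",
    ))
```

The ROLE, OBJECTIVE, INSTRUCTIONS and FORMAT sections are fixed text and can be concatenated around the returned string unchanged.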
Step 2 — Redesigned Prompt Generator
Based on your diagnosis above, now rewrite the prompt addressing all
identified issues.
Rules for the redesign:
1. Keep all five RCOIF components explicit and labelled
2. The Role must be specific (domain + approach + mindset)
3. The Context must include all information the model needs and nothing extra
4. The Objective must be achievable in a single response
5. The Instructions must be numbered and actionable
6. The Format must be explicit and unambiguous
Output the redesigned prompt inside a code block for easy copying.
After the prompt, explain in 2–3 sentences what changed and why.
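Rule 1 can be checked mechanically before a redesigned prompt is reused. The following is a minimal sketch that assumes each component appears as an upper-case `LABEL:` header at the start of a line, as in the Step 1 prompt. The helper name `missing_rcoif_components` is hypothetical. It does not assess rules 2 to 6, which need human or model judgement.

```python
import re

RCOIF_LABELS = ("ROLE", "CONTEXT", "OBJECTIVE", "INSTRUCTIONS", "FORMAT")


def missing_rcoif_components(prompt: str) -> list:
    """Return the RCOIF labels that never appear as a line-leading 'LABEL:' header."""
    missing = []
    for label in RCOIF_LABELS:
        # MULTILINE makes ^ match at the start of every line, not just the string.
        if not re.search(rf"^\s*{label}\s*:", prompt, flags=re.MULTILINE):
            missing.append(label)
    return missing


if __name__ == "__main__":
    draft = "ROLE:\nYou are a triage analyst.\nOBJECTIVE:\nClassify priority.\n"
    print(missing_rcoif_components(draft))
```

An empty list means all five headers are present; it says nothing about their quality.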
Example: Iterating on the Triage Experiment Prompt
Problem encountered: The model was returning “I would classify this as High priority because…” instead of just “High”.
Diagnostic result: FORMAT component was too loose — “respond with only the priority label” was overridden by the model’s tendency to justify.
Fix applied:
Old FORMAT: "Respond with only the priority label."
New FORMAT: "Output EXACTLY ONE WORD from this list: Critical | High | Medium | Low
No explanation. No punctuation. No other text.
Example of correct response: High
Example of incorrect response: The priority is High."
Lesson logged in: PT-PUMA-Experiment-Prompts v1.1 refinement history
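A stricter FORMAT instruction is easier to trust when the response is also checked in code. The sketch below accepts a response only if it is exactly one of the four labels from the new FORMAT. The name `parse_priority` is hypothetical, and tolerating surrounding whitespace (for example a trailing newline) is an assumption rather than something the prompt specifies.

```python
from typing import Optional

ALLOWED_PRIORITIES = ("Critical", "High", "Medium", "Low")


def parse_priority(response: str) -> Optional[str]:
    """Return the label if the response is exactly one allowed word, else None."""
    text = response.strip()  # assumption: leading/trailing whitespace is harmless
    return text if text in ALLOWED_PRIORITIES else None


if __name__ == "__main__":
    for reply in ("High", "The priority is High.", "High."):
        print(repr(reply), "->", parse_priority(reply))
```

Responses that return `None` can be counted as format failures, which gives a simple measure of whether a FORMAT change actually helped.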
id: PT-Contextual-Anchoring-Pattern
title: "Prompt Pattern: Contextual Anchoring"
type: prompt-template
tags: [prompt, contextual-anchoring, long-context, drift-prevention]
tool: multiple
methodology: contextual-anchoring
use_case: research
phase: F0-F5
version: 1.0
tested: true
effectiveness: medium
created: 2026-03-01
Prompt Pattern: Contextual Anchoring
Problem: In long prompts or multi-turn conversations, models “drift” away from initial constraints.
Solution: Restate critical constraints at the end of the prompt (anchor point).
The Pattern
[... full prompt body ...]
--- ANCHORING REMINDERS (read before generating response) ---
- You are evaluating [SPECIFIC TASK], not general capabilities
- The dataset is [DATASET NAME] with [N] samples — stay within this scope
- The primary metric is [METRIC] — do not introduce other metrics
- Your audience is [AUDIENCE] — maintain [REGISTER] tone
- NEVER invent citations — flag uncertainty instead
- NEVER exceed [FORMAT CONSTRAINT]
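The pattern above amounts to appending a fixed header and a bullet list after the prompt body. A minimal sketch of that step follows, assuming the reminders are already written out as plain strings. The helper name `with_anchors` is hypothetical.

```python
ANCHOR_HEADER = "--- ANCHORING REMINDERS (read before generating response) ---"


def with_anchors(prompt_body: str, reminders: list) -> str:
    """Append the reminder block so it is the last thing in the prompt."""
    if not reminders:
        # Nothing to anchor: return the prompt unchanged rather than add noise.
        return prompt_body
    bullets = "\n".join(f"- {reminder.strip()}" for reminder in reminders)
    return f"{prompt_body.rstrip()}\n{ANCHOR_HEADER}\n{bullets}"


if __name__ == "__main__":
    print(with_anchors(
        "[... full prompt body ...]",
        ["Focus ONLY on the PM+LLM intersection",
         "All citations must be verifiable"],
    ))
```

Keeping the reminders in a list makes it easy to reuse the same constraints across several prompts in one experiment.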
Example for PUMA Research Prompts
[... RCOIF prompt body for gap analysis ...]
--- ANCHORING REMINDERS ---
- Focus ONLY on the PM+LLM intersection, not general SE benchmarks
- The benchmark uses LOCAL models only — cloud API comparisons are background only
- The evaluation datasets are Jira SR and TAWOS — do not discuss other datasets
  as if PUMA uses them
- All citations must be verifiable — flag any paper you are not confident exists
- Response must be 400–600 words — do not exceed this
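A length reminder like the last one is only useful if the result is checked. The sketch below tests a response against the 400 to 600 word range from the example. Counting whitespace-separated tokens as words is an assumption, and the name `word_count_ok` is hypothetical.

```python
def word_count_ok(text: str, low: int = 400, high: int = 600) -> bool:
    """Return True if the whitespace-delimited word count is within [low, high]."""
    return low <= len(text.split()) <= high


if __name__ == "__main__":
    sample = "word " * 450
    print(word_count_ok(sample))
```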
When to Use
- Prompts longer than 500 words
- Multi-turn conversations where context has accumulated
- When previous responses have drifted from the task
- When the model keeps introducing out-of-scope information
When NOT to Use
- Simple, short prompts — the reminder text adds noise
- When you want creative latitude from the model
id: PT-Agent-OS-System-Prompt
title: "Prompt: Agent OS — PUMA Orchestrator System Prompt"
type: prompt-template
tags: [prompt, agent-os, system-prompt, orchestrator, puma-pipeline]
tool: ollama
methodology: agent-prompt-engineering
use_case: experiment
phase: F4-F5
version: 1.0
tested: false
effectiveness: null
created: 2026-03-01
Prompt: Agent OS — PUMA Orchestrator System Prompt
For Stage 4+ (optional). System-level prompt that governs how agents operate within PUMA.
See: 30 - Permanent/33 Frameworks/PN-Agent-Prompt-Engineering
PUMA Orchestrator System Prompt
PUMA_ORCHESTRATOR_SYSTEM_PROMPT = """
You are the PUMA Orchestrator — the coordination layer of the PUMA benchmark system
for evaluating local LLMs on ICT project management tasks.
IDENTITY AND MANDATE:
- You coordinate specialised agents (TriageAgent, EstimationAgent, Reporter)
- You ensure reproducibility: every decision uses seed=42 deterministically
- You enforce Human-in-the-Loop: all outputs are RECOMMENDATIONS, never final decisions
- You track carbon emissions for every operation
OPERATING PRINCIPLES:
1. TRANSPARENCY — Every recommendation must include reasoning
2. REPRODUCIBILITY — Log all inputs, outputs, and parameters
3. HUMILITY — Express uncertainty when it exists; never overclaim
4. SCOPE — You handle ICT PM tasks only: triage, estimation, backlog prioritisation
OUTPUT SCHEMA (ALWAYS use this JSON format):
{
  "task": "triage|estimation|planning",
  "recommendation": "...",
  "reasoning": "...",
  "confidence": "high|medium|low",
  "requires_human_review": true|false,
  "carbon_tracked": true,
  "model_used": "...",
  "strategy_used": "..."
}
ETHICS GUARDRAILS:
- Never make final decisions about human work without explicit human confirmation
- Never access data outside the provided context window
- Always indicate when an issue is ambiguous rather than forcing a classification
- Flag any request that seems outside normal PM scope
FAILURE PROTOCOL:
If you cannot complete a task reliably, return:
{
  "task": "...",
  "recommendation": null,
  "reasoning": "Cannot complete: [specific reason]",
  "confidence": "low",
  "requires_human_review": true
}
"""
Usage in PUMA Architecture
# src/orchestrator/puma_orchestrator.py
from src.agents.triage_agent import TriageAgent
from src.agents.estimation_agent import EstimationAgent  # needed once the estimator below is enabled

# PUMA_ORCHESTRATOR_SYSTEM_PROMPT is the constant shown above;
# it must be defined in, or imported into, this module.


class PUMAOrchestrator:
    """
    Coordinates PUMA agents with system-level governance.
    Enforces reproducibility, carbon tracking, and HITL principles.
    """

    def __init__(self, model: str, strategy: str):
        self.system_prompt = PUMA_ORCHESTRATOR_SYSTEM_PROMPT
        self.triage = TriageAgent(model=model, strategy=strategy)
        # self.estimator = EstimationAgent(model=model, strategy=strategy)

    def process_issue(self, issue: dict) -> dict:
        """Classify one issue and wrap the result in the orchestrator output schema."""
        result = self.triage.classify(issue)
        return {
            "task": "triage",
            "recommendation": result.predicted_priority,
            "reasoning": result.reasoning,
            "confidence": result.confidence,
            # Anything below high confidence is routed to a human reviewer.
            "requires_human_review": result.confidence != "high",
            "carbon_tracked": True,
            "model_used": result.model,
            "strategy_used": result.strategy
        }
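Because the system prompt asks for a fixed JSON shape, the dictionary returned by `process_issue` (or parsed from a model reply) can be checked before it is logged. The sketch below is one possible validator, not part of the PUMA codebase. The name `validate_output` is hypothetical, and two choices are assumptions: a payload with `recommendation` set to null is treated as a FAILURE PROTOCOL payload, and the `task` value is only checked for non-failure payloads because the failure example elides it.

```python
SUCCESS_KEYS = {
    "task", "recommendation", "reasoning", "confidence",
    "requires_human_review", "carbon_tracked", "model_used", "strategy_used",
}
FAILURE_KEYS = {"task", "recommendation", "reasoning", "confidence",
                "requires_human_review"}
ALLOWED_TASKS = {"triage", "estimation", "planning"}
ALLOWED_CONFIDENCE = {"high", "medium", "low"}


def validate_output(payload: dict) -> list:
    """Return a list of schema problems; an empty list means the payload passes."""
    problems = []
    # Assumption: a null recommendation marks a FAILURE PROTOCOL payload.
    is_failure = payload.get("recommendation") is None
    required = FAILURE_KEYS if is_failure else SUCCESS_KEYS

    missing = sorted(required - set(payload))
    if missing:
        problems.append("missing keys: " + ", ".join(missing))
    if payload.get("confidence") not in ALLOWED_CONFIDENCE:
        problems.append("confidence must be high, medium or low")
    if not isinstance(payload.get("requires_human_review"), bool):
        problems.append("requires_human_review must be a boolean")

    if is_failure:
        if (payload.get("confidence") != "low"
                or payload.get("requires_human_review") is not True):
            problems.append("failure payloads need confidence 'low' and human review")
    elif payload.get("task") not in ALLOWED_TASKS:
        problems.append("task must be triage, estimation or planning")
    return problems


if __name__ == "__main__":
    print(validate_output({"task": "triage", "recommendation": None,
                           "reasoning": "Cannot complete: empty issue body",
                           "confidence": "low", "requires_human_review": True}))
```

Returning a list of problems instead of raising lets the orchestrator log every violation in one pass, which fits the reproducibility principle in the system prompt.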