LLMs · AI · Prompt Engineering · Python

Prompt Engineering for Developers: Practical Patterns That Work in Production

10 June 2025 · 9 min read · Harshit Gupta
TL;DR

The highest-leverage prompt patterns: clear role definition, structured output instructions (JSON schema in the prompt), chain-of-thought for complex reasoning, few-shot examples for consistent formatting, and explicit negative constraints ("do not..."). These alone fix 80% of LLM output reliability problems in production.

Why "Just Ask Better" Isn't Enough

When I started building LLM features, I thought prompt engineering meant writing clearer instructions. Add "please". Be specific. Use bullet points. And yes, those things help marginally. But production reliability comes from understanding how LLMs actually process instructions — and designing prompts that exploit that architecture rather than fight it.

A well-engineered prompt is more like an API contract than a polite request. It defines: the model's role, the exact output structure, the constraints on behavior, and examples of the expected pattern. When you frame it this way, the reliability gap between "it usually works in demo" and "it works 98% of the time in production" becomes achievable.

Pattern 1: Role + Task + Output Structure

The most reliable prompt structure I've found combines three elements in the system message: who the model is, what it needs to do, and exactly what format to return results in:

SYSTEM_PROMPT = """You are a professional credential copywriter for CertifyMe, an enterprise certification platform.

Your task is to write concise, professional descriptions for digital credentials issued to learners.

Always return a JSON object with exactly these fields:
{
  "headline": "One sentence, max 15 words, action-oriented",
  "body": "Two to three sentences describing skills demonstrated, max 60 words",
  "skills": ["skill1", "skill2", "skill3"]  // 3 to 5 specific technical skills
}

Do not include markdown, explanation, or any text outside the JSON object."""
Put output format instructions in the system message

Output format instructions belong in the system prompt, not the user message. The system prompt sets the model's operating context and takes higher precedence. Mixing structural instructions into user messages leads to inconsistent adherence, especially when user messages vary widely.
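As a sketch of how this wires together (the `build_messages` helper is mine, not part of any SDK, and `SYSTEM_PROMPT` is abbreviated here — use the full prompt from above in practice):

```python
# Abbreviated system prompt; the real one includes the full JSON schema.
SYSTEM_PROMPT = (
    "You are a professional credential copywriter for CertifyMe.\n"
    "Always return a JSON object with fields: headline, body, skills.\n"
    "Do not include markdown or any text outside the JSON object."
)

def build_messages(credential_text: str) -> list[dict]:
    # Format instructions live in the system message; the user message
    # carries only the variable per-request input.
    return [
        {"role": "system", "content": SYSTEM_PROMPT},
        {"role": "user", "content": credential_text},
    ]
```

The resulting list is what you pass as `messages` to a chat-completion API; only the user message changes between requests, so format adherence stays stable.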

Pattern 2: Few-Shot Examples for Format Consistency

Describing output format in prose is good. Showing it with examples is better. Few-shot examples help the model pattern-match to your exact desired output — especially for formatting nuances that are hard to describe verbally:

FEW_SHOT_EXAMPLES = [
    {
        "role": "user",
        "content": "Credential: AWS Solutions Architect Associate. Issued by: Amazon Web Services."
    },
    {
        "role": "assistant",
        "content": '{"headline": "Demonstrates expertise in designing scalable AWS cloud architectures", "body": "Validates the ability to design and deploy distributed systems on AWS. Covers core services including EC2, S3, RDS, and VPC, with emphasis on high availability and cost optimization.", "skills": ["AWS", "Cloud Architecture", "EC2", "S3", "High Availability"]}'
    },
    {
        "role": "user",
        "content": "Credential: Python for Data Science. Issued by: DataCamp."
    },
    {
        "role": "assistant",
        "content": '{"headline": "Certifies proficiency in Python-based data analysis and visualization", "body": "Demonstrates hands-on skills in data manipulation with Pandas, statistical analysis with NumPy, and visualization with Matplotlib. Covers real-world data cleaning and exploratory analysis workflows.", "skills": ["Python", "Pandas", "NumPy", "Matplotlib", "Data Analysis"]}'
    }
]
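To use these, splice the examples between the system prompt and the live user message, so the model sees the pattern immediately before your real request. A minimal sketch (the helper name is mine, and the examples are abbreviated):

```python
def build_few_shot_messages(system_prompt: str,
                            examples: list[dict],
                            user_input: str) -> list[dict]:
    # Order matters: system prompt first, then the example user/assistant
    # pairs, then the real request last.
    return (
        [{"role": "system", "content": system_prompt}]
        + examples
        + [{"role": "user", "content": user_input}]
    )

# Abbreviated version of the examples above.
FEW_SHOT_EXAMPLES = [
    {"role": "user", "content": "Credential: AWS Solutions Architect Associate."},
    {"role": "assistant", "content": '{"headline": "..."}'},
]

messages = build_few_shot_messages(
    "You are a credential copywriter. Return only JSON.",
    FEW_SHOT_EXAMPLES,
    "Credential: Kubernetes Administrator. Issued by: CNCF.",
)
```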

Pattern 3: Chain-of-Thought for Complex Reasoning

For tasks requiring multi-step reasoning — classification, analysis, decision-making — ask the model to reason through the problem before giving its answer. This pattern, called chain-of-thought prompting, significantly improves accuracy on reasoning tasks:

ANALYSIS_PROMPT = """Analyze whether this credential is appropriate to display publicly.

Think through these criteria step by step:
1. Does the issuing organization appear to be a legitimate entity?
2. Does the credential title suggest a real skill or achievement?
3. Are there any red flags (suspicious content, inappropriate language, spam)?

After your analysis, give your final decision as JSON:
{"decision": "approve" | "reject", "confidence": 0.0-1.0, "reason": "one sentence"}

Credential data:
{credential_json}"""

The explicit instruction to reason step by step forces the model to work through the criteria in its output before committing to an answer. For our content moderation pipeline, this reduced false rejection rates by 31% compared to prompts that only asked for the final decision.
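Because the response now contains reasoning text before the decision, you have to pull the trailing JSON out of the reply. One way to do it — assuming, as the prompt instructs, that the decision object comes last and contains no nested braces:

```python
import json

def extract_final_json(response_text: str) -> dict:
    """Parse the trailing JSON object from a chain-of-thought response.

    Assumes the model ends its reply with a flat JSON object;
    raises ValueError if no object is found.
    """
    start = response_text.rfind("{")
    if start == -1:
        raise ValueError("no JSON object found in response")
    return json.loads(response_text[start:])

reply = (
    "1. The issuer is a well-known certification body.\n"
    "2. The title describes a real skill.\n"
    "3. No red flags found.\n"
    '{"decision": "approve", "confidence": 0.92, "reason": "Legitimate issuer."}'
)
result = extract_final_json(reply)
```

If you need the decision object to contain nested structures, swap the `rfind` heuristic for a proper delimiter (e.g. instruct the model to emit the JSON after a fixed marker line) rather than trying to balance braces.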

Pattern 4: Explicit Negative Constraints

Telling a model what NOT to do is often more effective than describing what to do. LLMs are trained to be helpful and verbose — without explicit negative constraints, they will add explanations, caveats, markdown formatting, and other content you didn't ask for:

STRICT_PROMPT = """Extract the certificate holder's name and issuing date from the document.

Return ONLY a JSON object: {"holder_name": "...", "issued_date": "YYYY-MM-DD"}

Do NOT:
- Include any explanation or commentary
- Add markdown formatting or code blocks
- Guess or infer values not present in the document
- Return partial results — if either field is missing, return null for that field"""
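Even with these constraints, models occasionally wrap the JSON in a code fence anyway, so it pays to parse defensively on the receiving end. A small sketch (the helper is mine, not part of any SDK):

```python
import json

def parse_strict_output(raw: str) -> dict:
    """Strip stray markdown fences and parse the JSON contract."""
    text = raw.strip()
    # Defensive: remove a ```json ... ``` wrapper if the model added one
    # despite the prompt's instructions.
    if text.startswith("```"):
        text = text.strip("`")
        if text.startswith("json"):
            text = text[len("json"):]
    data = json.loads(text)
    # Enforce the contract: both fields must be present (null is allowed).
    missing = {"holder_name", "issued_date"} - data.keys()
    if missing:
        raise ValueError(f"missing fields: {missing}")
    return data
```

Raising on a contract violation, rather than silently passing bad output downstream, gives your calling code a clean place to retry or flag the document.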

Pattern 5: Structured Error Handling in Prompts

Design your prompts to handle failure gracefully — tell the model explicitly what to do when it can't complete the task rather than letting it hallucinate or return malformed output:

EXTRACTION_PROMPT = """Extract structured data from the following text.

If a field cannot be found or inferred reliably, use null — do NOT guess.
If the input is not a credential document at all, return:
{"error": "not_a_credential", "fields": null}

Expected output schema:
{
  "credential_title": string | null,
  "issuer": string | null,
  "holder": string | null,
  "issued_date": "YYYY-MM-DD" | null,
  "expiry_date": "YYYY-MM-DD" | null
}"""
Temperature matters for structured output

For structured data extraction and classification tasks, set temperature=0 or 0.1. Higher temperatures increase creativity — which is useful for generation tasks but actively harmful for extraction tasks where you need consistent, deterministic output. Temperature is not a global setting; tune it per use case.

Testing Your Prompts Like Code

Prompt changes should be version-controlled and tested against a fixed eval set — just like code changes. The minimum viable prompt eval process:

  1. Build a dataset of 50-100 representative inputs with expected outputs
  2. Run every prompt change through the full eval set before deploying
  3. Track pass rate, edge case failures, and output format adherence over time
  4. Store prompt versions in code (not the UI) so changes are auditable
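The steps above can be sketched as a minimal harness. The eval set and model call here are stand-ins — in practice `call_model` would hit your actual LLM endpoint with your versioned prompt, and the set would have 50-100 cases:

```python
import json

# Stand-in eval set: representative inputs paired with the fields the
# extraction prompt must get right.
EVAL_SET = [
    {
        "input": "Certificate of Completion awarded to Ada Lovelace, 1 March 2024.",
        "expected": {"holder_name": "Ada Lovelace", "issued_date": "2024-03-01"},
    },
    {
        "input": "Invoice #4417 for consulting services.",
        "expected": {"holder_name": None, "issued_date": None},
    },
]

def run_eval(call_model) -> float:
    """Run every case through the model and return the pass rate."""
    passed = 0
    for case in EVAL_SET:
        raw = call_model(case["input"])
        try:
            output = json.loads(raw)  # format adherence check
        except json.JSONDecodeError:
            continue  # malformed output counts as a failure
        if output == case["expected"]:
            passed += 1
    return passed / len(EVAL_SET)
```

Log the returned pass rate alongside the prompt version for each run; a regression on a prompt change shows up as a drop in the rate before it ever reaches production.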

Key Takeaways

  • Role + task + output structure in the system prompt is the foundation of reliable LLM calls
  • Few-shot examples beat written format descriptions for formatting consistency
  • Chain-of-thought reasoning significantly improves accuracy on classification and analysis tasks
  • Explicit negative constraints ("do not...") prevent the model's default verbose behavior
  • Design prompts to handle failure gracefully with explicit null/error return paths
  • Set temperature near 0 for extraction/classification; higher for creative generation
  • Treat prompts like code — version control, eval sets, and regression testing