A Guide to Prompt Engineering Best Practices
Prompt engineering is the art and science of crafting effective inputs (prompts) to guide Large Language Models (LLMs) towards generating desired, accurate, and useful outputs. LLMs are sophisticated prediction engines: they repeatedly predict the most likely next token (a small group of characters) based on their training data and the preceding sequence, so the way you structure your prompt significantly influences the result. Finding the right prompt is an iterative process that requires tinkering: experimenting with wording, structure, model configurations, and a range of fundamental and advanced techniques.
Understanding LLM Output Configuration
Before diving into prompt structure, it's crucial to understand the parameters controlling the LLM's output generation. These settings work in concert with your prompt to shape the final response.
| Parameter | Description | Impact & Guidance |
| --- | --- | --- |
| Temperature | Controls the degree of randomness in token selection. Ranges typically from 0 to 1 (or higher). | "Lower temperatures are good for prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results." Setting temperature to 0 makes top-K and top-P irrelevant. |
| Top-K | Restricts the selection pool to the top 'K' most probable next tokens. | A low K (e.g., 1) is equivalent to greedy decoding. A higher K allows for more creative variation. |
| Top-P | Also known as nucleus sampling. Restricts the selection pool to the most probable tokens whose cumulative probability adds up to 'P'. Values range from 0 to 1. | Selects a dynamic number of tokens based on likelihood. A high P allows diversity while avoiding highly improbable tokens. A very low P approaches greedy decoding. |
| Max Output Tokens | Sets a limit on the number of tokens the model will generate in its response. | Controls output length. Crucial for managing cost, time, and avoiding truncated responses, especially with structured data. |
Interaction and Guidance:
- Top-K and Top-P filter candidates before Temperature adds randomness.
- Extreme settings (Temp=0, K=1, P≈0) lead to deterministic, greedy output.
- The best way to choose between top-K and top-P is to experiment with both methods (or both together) and see which one produces the results you are looking for.
- A general starting point for balanced results: temperature 0.2, top-P 0.95, top-K 30.
- Beware the “repetition loop bug” with inappropriate settings; careful tuning is key.
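To make these settings concrete, here is a minimal sketch in Python, assuming a hypothetical call_llm helper; the parameter names and the two presets are illustrative rather than any particular SDK's API.

```python
# Minimal sketch of passing sampling settings to a model call.
# `call_llm` is a hypothetical helper; real SDKs use provider-specific
# parameter names (e.g. the output limit may be called max_tokens).

def call_llm(prompt: str, *, temperature: float, top_p: float,
             top_k: int, max_output_tokens: int) -> str:
    raise NotImplementedError("Wire this up to your model provider's SDK.")

# Near-deterministic settings for extraction or classification tasks.
factual = dict(temperature=0.0, top_p=0.95, top_k=30, max_output_tokens=256)

# Looser settings when diverse or creative output is welcome.
creative = dict(temperature=0.9, top_p=0.99, top_k=40, max_output_tokens=1024)

# call_llm("Summarize this support ticket in one sentence: ...", **factual)
```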
Core Prompting Techniques
These fundamental approaches structure your interaction with the LLM.
Zero-Shot Prompting
The simplest form: instructing the model directly without providing examples in the prompt. Relies entirely on the model's pre-existing capabilities.
Example: Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE. Review: 'Her' is a disturbing study… Sentiment: (Expected output: POSITIVE, based on the model's internal knowledge.)
Few-Shot Prompting
Providing 1 to 5 input-output examples within the prompt to guide the model on format, style, or task logic. One example is “one-shot.”
Example: Demonstrating pizza order parsing into JSON with a couple of input/output pairs before presenting the actual order to parse.
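A rough sketch of how such a few-shot prompt could be assembled in Python; the example orders and JSON field names below are invented for illustration.

```python
# Few-shot prompt: two worked examples teach the output format before the
# real order is appended. Orders and field names are made up.
examples = [
    ("I want a small pizza with cheese, tomato sauce, and pepperoni.",
     '{"size": "small", "ingredients": ["cheese", "tomato sauce", "pepperoni"]}'),
    ("Can I get a large pizza with tomato sauce, basil and mozzarella?",
     '{"size": "large", "ingredients": ["tomato sauce", "basil", "mozzarella"]}'),
]

new_order = "Give me a large pizza, half with cheese and half with mushrooms."

prompt = "Parse a customer's pizza order into valid JSON.\n\n"
for order, parsed in examples:
    prompt += f"Order: {order}\nJSON: {parsed}\n\n"
prompt += f"Order: {new_order}\nJSON:"

print(prompt)
```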
System, Contextual, and Role Prompting
These techniques set the stage:
- System Prompting: Sets the overall rules or purpose; useful for generating output that meets specific requirements. The name 'system prompt' stands for 'providing an additional task to the system'. Example: Classify movie reviews as positive, neutral or negative. Only return the label in uppercase.
- Contextual Prompting: Provides specific background relevant to the current task; crucial for improving accuracy, especially with code or specific domains. Example: Context: You are writing for a blog about retro 80's arcade video games…
- Role Prompting: Assigns a specific persona. "Role prompting is a technique… that involves assigning a specific role to the gen AI model. This can help the model to generate more relevant and informative output…" Example: I want you to act as a travel guide… You can also specify the output style (e.g., humorous). A combined sketch of these layers appears below.
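This is a rough Python sketch assuming a generic chat-message convention (role/content dicts) rather than any specific provider's API; the review text is invented.

```python
# System, contextual, and role prompting layered in one chat-style request.
# The message dicts are a generic convention; adapt them to your SDK.
messages = [
    {"role": "system",
     "content": ("You are a film critic writing for a retro 80's arcade video game blog. "
                 "Classify movie reviews as POSITIVE, NEUTRAL or NEGATIVE. "
                 "Only return the label in uppercase.")},
    {"role": "user",
     "content": ("Review: The cabinet art alone made this documentary worth watching.\n"
                 "Sentiment:")},
]
```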
Advanced Prompting Techniques
Methods for eliciting more complex reasoning and behavior.
Step-Back Prompting
Ask a more general, high-level question related to the task before the specific task. Use the answer to the general question as context for the specific prompt. This activates broader reasoning.
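A minimal two-call sketch of step-back prompting, assuming a hypothetical call_llm helper; the game-design questions are illustrative.

```python
# Step-back prompting: ask a broader question first, then reuse the answer
# as context for the specific task. `call_llm` is a hypothetical stand-in.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a real model call.")

step_back_question = ("What are five key ingredients of an engaging "
                      "first-person shooter level?")
principles = call_llm(step_back_question)

specific_prompt = (
    "Context (general principles):\n"
    f"{principles}\n\n"
    "Using these principles, write a one-paragraph storyline for a new, "
    "challenging level in a first-person shooter game."
)
storyline = call_llm(specific_prompt)
```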
Chain of Thought (CoT) Prompting
Encourages the model to output its reasoning steps before the final answer, often improving accuracy on multi-step problems like math or logic puzzles. Triggered by phrases like "Let's think step by step." LLMs often get even simple arithmetic wrong when asked for the answer directly, but prompting for intermediate reasoning steps markedly improves the output. Can be combined with few-shot examples that demonstrate the reasoning.
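As a small illustration, a zero-shot CoT prompt can be as simple as appending the trigger phrase; the age puzzle below is the kind of question that benefits.

```python
# Zero-shot chain of thought: append the trigger phrase and keep the
# temperature low so the reasoning stays focused.
question = ("When I was 3 years old, my partner was 3 times my age. "
            "Now I am 20 years old. How old is my partner?")

cot_prompt = f"{question}\nLet's think step by step."
```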
Self-Consistency
Run the same CoT prompt multiple times with higher temperature, then take a majority vote on the final answers. "By generating many Chains of Thoughts, and taking the most commonly occurring answer… we can get a more consistently correct answer from the LLM." Increases accuracy at the cost of computation.
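A minimal sketch of self-consistency, assuming a hypothetical call_llm helper and the (assumed) convention that the model ends each completion with "Answer: <value>".

```python
import collections

def call_llm(prompt: str, temperature: float) -> str:
    raise NotImplementedError("Replace with a real model call.")

def extract_final_answer(completion: str) -> str:
    # Relies on the prompt's instruction to end with "Answer: <value>".
    return completion.rsplit("Answer:", 1)[-1].strip()

cot_prompt = ("A store sold 23 apples in the morning and twice as many in the "
              "afternoon. How many apples were sold in total?\n"
              "Let's think step by step, then end with 'Answer: <number>'.")

# Sample several reasoning paths at a higher temperature, then take a
# majority vote over the extracted final answers.
answers = [extract_final_answer(call_llm(cot_prompt, temperature=0.7))
           for _ in range(5)]
final_answer, votes = collections.Counter(answers).most_common(1)[0]
```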
Tree of Thoughts (ToT)
An extension of CoT where the model explores multiple reasoning paths simultaneously, like branches of a tree, potentially evaluating intermediate steps. Suited for complex problems requiring exploration.
ReAct (Reason & Act)
Combines reasoning (Thought) with the ability to use external tools (Action) and learn from the results (Observation) in a loop. Allows LLMs to access real-time information (e.g., search) or perform calculations beyond their internal capabilities. Requires specific implementation frameworks.
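Frameworks typically handle this loop for you, but a hand-rolled sketch shows the shape of it; call_llm and web_search below are hypothetical stand-ins, and the Thought/Action/Observation format is assumed for illustration.

```python
# Hand-rolled ReAct-style loop (a sketch only; production code usually
# relies on an agent framework). `call_llm` and `web_search` are hypothetical.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a real model call.")

def web_search(query: str) -> str:
    raise NotImplementedError("Replace with a real search tool.")

transcript = (
    "Answer the question. At each step respond with either:\n"
    "Thought: <reasoning>\nAction: search[<query>]\n"
    "or\nThought: <reasoning>\nAction: finish[<answer>]\n\n"
    "Question: How many children do the members of the band Metallica have in total?\n"
)

for _ in range(10):                       # cap iterations to avoid runaway loops
    step = call_llm(transcript)
    transcript += step + "\n"
    if "finish[" in step:                 # model decided it has the answer
        break
    if "search[" in step:                 # run the tool and feed back an Observation
        query = step.split("search[", 1)[1].split("]", 1)[0]
        transcript += f"Observation: {web_search(query)}\n"
```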
Automatic Prompt Engineering (APE)
Using one LLM to generate multiple candidate prompts for a task performed by another (or the same) LLM. These candidates are then evaluated, often using metrics like BLEU or ROUGE, to find the optimal prompt.
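The sketch below swaps the BLEU/ROUGE scoring mentioned above for a simpler exact-match accuracy to keep the example short; call_llm, the meta-prompt, and the tiny evaluation set are all invented.

```python
# Automatic prompt engineering sketch: one model call proposes prompt
# variants, and each variant is scored on a small labeled set. Exact-match
# accuracy stands in here for metrics like BLEU/ROUGE.

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a real model call.")

meta_prompt = ("We need instructions for classifying support tickets as BUG, "
               "FEATURE or QUESTION. Write 5 candidate prompt variants, one per line.")
candidates = [line for line in call_llm(meta_prompt).splitlines() if line.strip()]

eval_set = [("The app crashes when I upload a photo.", "BUG"),
            ("Could you add a dark mode?", "FEATURE")]

def accuracy(prompt_variant: str) -> float:
    hits = sum(
        call_llm(f"{prompt_variant}\n\nTicket: {text}\nLabel:").strip() == label
        for text, label in eval_set
    )
    return hits / len(eval_set)

best_prompt = max(candidates, key=accuracy)
```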
Prompting for Specific Tasks: Code
LLMs are powerful tools for coding tasks.
- Generating, Explaining, Translating: Provide clear instructions, specify languages, and include relevant code snippets as input.
- Debugging & Reviewing: Provide the code, error messages/tracebacks, and ask for identification of issues and suggestions for improvement. LLMs can often spot bugs and suggest better practices.
- Caution: Since an LLM's reasoning is imperfect and it may repeat patterns from its training data, always read and test generated code before relying on it.
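A sketch of a debugging prompt that packages the code, the exact error, and a concrete request; the buggy snippet and the error message are invented for illustration.

```python
# Debugging prompt: include the code, the exact error text, and a clear ask.
buggy_code = """
import os
folder = "my_folder"
prefix = "draft_"
for filename in os.listdir(folder):
    os.rename(filename, prefix + filename)  # paths are relative to the cwd, not `folder`
"""

error_text = "FileNotFoundError: [Errno 2] No such file or directory: 'notes.txt'"

debug_prompt = (
    "The following Python code raises an error. Explain the cause, provide a "
    "fixed version, and suggest any other improvements.\n\n"
    f"Code:\n{buggy_code}\n"
    f"Error:\n{error_text}"
)
```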
Multimodal Prompting
"Multimodal prompting is a separate concern, it refers to a technique where you use multiple input formats to guide a large language model, instead of just relying on text." This might involve images, audio, etc., alongside text, depending on the model's capabilities.
Handling Structured Data (JSON, YAML, Schemas)
Guiding LLMs to produce structured output is highly beneficial for consistency and programmatic use.
- Benefits: Consistent format, easier parsing, implicit data typing, reduced hallucinations.
- JSON: Common but verbose. Watch for truncation errors with long outputs; use json-repair tools if needed.
- YAML: More human-readable, with better handling of multi-line strings (|, >). Explicitly ask for the output inside yaml code blocks. Embedding reasoning as comments (# reason) before data points is effective.
- Always Validate: Even with YAML's flexibility, parse and validate the output in your code using assert statements or other schema checks; see the sketch after this list.
- Schemas (for Input): Providing a JSON Schema for input data helps the LLM understand structure and types, improving focus, especially for complex data.
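A minimal validation sketch in Python, reusing the invented pizza-order fields from the earlier few-shot example; if long responses get truncated, a JSON-repair step can sometimes salvage them before this check.

```python
import json

def parse_and_validate(raw: str) -> dict:
    """Parse a model response that should be a JSON pizza order and sanity-check it."""
    data = json.loads(raw)                      # raises if the output is not valid JSON
    assert isinstance(data.get("size"), str), "missing or invalid 'size'"
    assert isinstance(data.get("ingredients"), list), "missing 'ingredients' list"
    return data

# Example with a well-formed (invented) response:
order = parse_and_validate('{"size": "large", "ingredients": ["cheese", "mushrooms"]}')
```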
Long Context Considerations & Context Management
While models handle longer contexts, careful management is still needed.
- Relevance is Key: Providing ample relevant context (like code snippets from open files, as GitHub Copilot does) improves accuracy. However, irrelevant information in an LLM’s context decreases its accuracy. Selective context is better than flooding the model.
- Performance: Accuracy on tasks requiring pinpointing details or complex reasoning across vast distances in the context can degrade even with large windows.
- RAG: Retrieval-Augmented Generation is a related technique where relevant information is fetched from an external knowledge base and inserted into the context dynamically.
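A bare-bones RAG sketch; retrieve and call_llm are hypothetical stand-ins for a vector-store lookup and a model call.

```python
# Minimal retrieval-augmented generation loop. Both helpers are hypothetical.

def retrieve(query: str, k: int = 3) -> list[str]:
    raise NotImplementedError("Replace with a vector store or search index lookup.")

def call_llm(prompt: str) -> str:
    raise NotImplementedError("Replace with a real model call.")

def answer_with_rag(question: str) -> str:
    passages = retrieve(question)               # fetch only the most relevant passages
    context = "\n\n".join(passages)             # selective context, not the whole corpus
    prompt = ("Answer the question using only the context below.\n\n"
              f"Context:\n{context}\n\n"
              f"Question: {question}\nAnswer:")
    return call_llm(prompt)
```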
General Best Practices Synthesized
- Provide Examples (Few-Shot): Highly effective for teaching format, style, and logic.
- Design with Simplicity & Clarity: Use clear, concise language. Avoid ambiguity. Use action verbs.
- Be Specific About the Output: Define expectations clearly (format, length, tone, content). Quantify requests (e.g., “Write a sonnet with 14 lines…” vs. “Write a long poem”).
- Use Instructions Over Constraints: Tell the model what to do rather than only what not to do, where possible.
- Control Max Token Length: Manage cost, time, and prevent truncation.
- Iterate and Experiment: Prompting is a process. Test variations. Different models, model configurations, prompt formats, and word choices can yield different results, so it's important to experiment.
- Document Your Attempts: Track prompts, settings, models, and results rigorously (e.g., in a spreadsheet) to learn and debug.
- Mix Classes in Few-Shot Classification: Avoid ordering bias by interleaving examples from different categories.
- Adapt to Model Updates: Test prompts against new model versions.
- Use Variables: Employ placeholders (e.g., {city}) for dynamic inputs, making prompts reusable in applications; see the template sketch after this list.
- Consider Structured Output: Use JSON, YAML, etc., for data extraction, classification, or consistent formatting needs.
- Encourage Reasoning (CoT): For complex tasks, explicitly ask the model to show its work (“think step by step”). Use low temperature (even 0) for CoT.
- Structure Your Prompt: Consider a logical flow: Role/Objective, Instructions, Reasoning Steps (if CoT), Output Format specifications, Examples, Context, Final instructions. Use delimiters (Markdown --- or ```, XML tags) to separate sections.
- Leverage Tools (ReAct/Tool Use): For tasks needing external data or computation, explore techniques that allow the LLM to use tools.
- Validate Code: Always thoroughly review and test any code generated by an LLM.
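As referenced in the "Use Variables" item above, a reusable template might look like the sketch below; the template text and the {city} placeholder are illustrative.

```python
# Reusable prompt template with a variable and an explicit output format.
TEMPLATE = (
    "You are a travel guide. Tell me one interesting fact about the city: {city}.\n"
    'Respond as JSON with the keys "city" and "fact".'
)

def build_prompt(city: str) -> str:
    return TEMPLATE.format(city=city)

print(build_prompt("Amsterdam"))
```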