If you are looking for a comprehensive Google tutorial on prompt engineering in PDF format, the “Prompt Engineering” whitepaper by Lee Boonstra (February 2025) is the perfect resource. This detailed guide explains how to design, optimize, and apply prompts effectively when working with Google’s Gemini models in Vertex AI, as well as other large language models (LLMs) like GPT, Claude, Gemma, and LLaMA.

I. What is Prompt Engineering?

Prompt engineering is the practice of crafting well-structured inputs (prompts) to guide AI models in generating accurate, creative, and useful outputs. With the right techniques, prompts can be applied to tasks such as:

  • Text summarization and content creation
  • Question answering and data extraction
  • Code generation, debugging, and translation
  • Sentiment analysis and classification
  • Multimodal prompting (text + images, audio, or other formats)

II. What’s Inside the Google Prompt Engineering PDF?

This tutorial-style PDF walks you step by step through essential LLM concepts and techniques, including:

  • LLM Output Configuration – Control over token length, randomness (temperature), top-K, and top-P sampling.
  • Prompting Techniques – Zero-shot, one-shot, few-shot, system prompting, role prompting, contextual prompting, and advanced methods like Chain of Thought (CoT), Self-Consistency, Tree of Thoughts (ToT), and ReAct (Reason + Act).
  • Code Prompting – How to write, debug, explain, and even translate code using prompts.
  • Automatic Prompt Engineering (APE) – Using AI to generate and optimize prompts automatically.
  • Best Practices – How to provide examples, simplify instructions, use variables, and avoid common pitfalls such as repetition loops.

III. Why Download This Book?

  • Google-focused: Learn how to use prompts directly with Gemini inside Vertex AI Studio.
  • Hands-on examples: Real-world prompts with sample inputs, outputs, and configurations.
  • Future-proof skills: Mastering prompt engineering ensures you can leverage AI effectively in software development, education, business, and research.

IV. Who Should Read This Google Prompt Engineering Tutorial?

  • Developers & data scientists working with LLMs.
  • Business leaders exploring AI-driven automation.
  • Educators & researchers studying human-AI interaction.
  • Content creators and marketers who want better control over AI outputs.

V. Prompt Engineering Document Summary

  • Date: February 2025
  • Author: Lee Boonstra (with content contributions from Michael Sherman, Yuan Cao, Erick Armbrust, Anant Nawalgaria, Antonio Gulli, Simone Cammel; curated and edited by Antonio Gulli, Anant Nawalgaria, Grace Mollison; technical writer Joey Haymaker; designer Michael Lanning)
  • Source: Google

1. Introduction to Prompt Engineering

Prompt engineering is the iterative process of designing high-quality text inputs (prompts) to guide Large Language Models (LLMs) to produce accurate, relevant, and desired outputs. While anyone can write a prompt, “crafting the most effective prompt can be complicated” due to factors like the model used, its training data, configurations, word choice, style, tone, structure, and context. Inadequate prompts “can lead to ambiguous, inaccurate responses, and can hinder the model’s ability to provide meaningful output.”

LLMs operate as “prediction engines,” taking sequential text as input and predicting the next most probable token based on their training data. Prompt engineering aims to set up the LLM to predict the correct sequence of tokens for various tasks such as text summarization, information extraction, Q&A, classification, translation, and code generation/explanation/debugging.

2. LLM Output Configuration

Beyond the prompt itself, effective prompt engineering requires optimizing various model configuration settings that control the LLM’s output:

  • Output Length: This setting determines the number of tokens to generate. Generating more tokens incurs higher computation costs, slower response times, and increased energy consumption. Note that reducing the output length only makes the LLM stop generating tokens once the limit is reached; it does not make the output more succinct, so the prompt may still need to be engineered for stylistic or textual succinctness.
  • Sampling Controls: LLMs predict probabilities for each token in their vocabulary. Sampling controls determine how these probabilities are processed to choose the next output token.
  • Temperature: Controls the degree of randomness. “Lower temperatures are good for prompts that expect a more deterministic response, while higher temperatures can lead to more diverse or unexpected results.” A temperature of 0 (greedy decoding) is deterministic, always selecting the highest-probability token. Very high temperatures flatten the distribution, making all tokens increasingly likely to be selected and the output more random.
  • Top-K: Selects the top K most likely tokens. “The higher top-K, the more creative and varied the model’s output; the lower top-K, the more restrictive and factual the model’s output.” A Top-K of 1 is equivalent to greedy decoding.
  • Top-P (Nucleus Sampling): Selects tokens whose cumulative probability does not exceed a certain value (P). Values range from 0 (greedy decoding) to 1 (all tokens).
  • Interaction: These settings impact each other. If temperature is 0, Top-K and Top-P become irrelevant. If Top-K is 1, temperature and Top-P are irrelevant. Experimentation is crucial to find the optimal balance.
  • General Starting Points: for coherent, mildly creative output, Temperature 0.2, Top-P 0.95, Top-K 30; for creative output, Temperature 0.9, Top-P 0.99, Top-K 40; for less creative output, Temperature 0.1, Top-P 0.9, Top-K 20; for a single correct answer (e.g., a math problem), Temperature 0. (See the configuration sketch after this list.)
  • Repetition Loop Bug: An issue where LLMs get stuck generating repetitive filler words or phrases. This can occur at both low (overly deterministic) and high (excessively random) temperatures. “Solving this often requires careful tinkering with temperature and top-k/top-p values to find the optimal balance between determinism and randomness.”
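
As a concrete illustration, here is a minimal configuration sketch. It assumes the google-generativeai Python SDK and a user-supplied API key; the model name is an assumption, and the values simply mirror the “coherent, mildly creative” starting point above.

```python
# Minimal output-configuration sketch (assumes the google-generativeai SDK;
# the model name and API key handling are illustrative assumptions).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# "Coherent, mildly creative" starting point: Temperature 0.2, Top-P 0.95, Top-K 30.
config = genai.GenerationConfig(
    temperature=0.2,        # low randomness -> mostly deterministic wording
    top_p=0.95,             # nucleus sampling: keep tokens up to 95% cumulative probability
    top_k=30,               # consider only the 30 most likely tokens
    max_output_tokens=256,  # output-length cap: generation simply stops at the limit
)

response = model.generate_content(
    "Summarize the benefits of prompt engineering in three bullet points.",
    generation_config=config,
)
print(response.text)
```

Setting the temperature to 0 here would make the Top-K and Top-P values irrelevant, as noted above.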

3. Prompting Techniques

Specific techniques leverage LLM training and functionality to achieve better results:

  • General Prompting / Zero-Shot: The simplest type of prompt, providing only a task description and some initial text, with no examples at all (hence “zero-shot”).
  • One-Shot & Few-Shot: Provides one (one-shot) or multiple (few-shot) examples to the model. This is especially useful for guiding the model toward a specific output structure or pattern. For complex tasks, “at least three to five examples” are recommended, but this can vary based on task complexity, example quality, and model capabilities. Edge cases should be included for robust output.
  • System, Contextual, and Role Prompting: Techniques to guide LLM text generation by focusing on different aspects:
  • System Prompting: Sets the “overall context and purpose” or defines “additional information on how to return the output.” Examples include specifying output format (e.g., JSON) or safety instructions (“You should be respectful in your answer.”). Using JSON format can “force the model to create a structure and limit hallucinations.”
  • Contextual Prompting: Provides “specific details or background information relevant to the current conversation or task.” This helps the model understand nuances and tailor responses.
  • Role Prompting: Assigns a “specific character or identity” (e.g., book editor, travel guide) for the LLM to adopt, influencing its tone, style, and focused expertise. Styles can include “Confrontational, Descriptive, Direct, Formal, Humorous, Influential, Informal, Inspirational, Persuasive.”
  • Step-Back Prompting: Improves performance by first prompting the LLM with a general question related to the task. The answer to this general question is then fed into a subsequent prompt for the specific task. This “allows the LLM to activate relevant background knowledge and reasoning processes,” leading to “more accurate and insightful responses” and potentially mitigating biases.
  • Chain of Thought (CoT): Improves LLM reasoning by generating “intermediate reasoning steps.” This is effective for complex tasks, especially when combined with few-shot prompting. CoT is “low-effort while being very effective and works well with off-the-shelf LLMs.” It also provides interpretability by showing the reasoning steps. Disadvantages include increased output tokens, leading to higher costs and longer prediction times. For CoT, the temperature should typically be set to 0.
  • Self-Consistency: Addresses the limitations of CoT’s reliance on greedy decoding by combining sampling and majority voting. It involves “generating diverse reasoning paths” (by submitting the same prompt multiple times at a high temperature), extracting the answer from each, and choosing “the most common answer” (see the sketch after this list). This improves accuracy and coherence but has “high costs.”
  • Tree of Thoughts (ToT): Generalizes CoT by allowing LLMs to “explore multiple different reasoning paths simultaneously” (like a tree structure) instead of a single linear chain. This is well-suited for complex tasks requiring exploration.
  • ReAct (Reason & Act): A paradigm for LLMs to solve complex tasks by combining “natural language reasoning with external tools (search, code interpreter etc.)” This creates a “thought-action loop,” where the LLM reasons, plans actions, performs them, observes results, and updates its reasoning. ReAct mimics how humans operate by reasoning and taking actions to gain information.

4. Automatic Prompt Engineering (APE)

Recognizing the complexity of prompt writing, APE aims to automate the process. This involves prompting a model to “generate more prompts,” evaluating them (e.g., using BLEU or ROUGE metrics), altering good ones, and repeating the process to enhance model performance.
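
The toy sketch below illustrates that loop in plain Python: candidate prompt variants (which a real APE setup would have the model generate) are scored against a reference phrasing using a naive token-overlap metric standing in for BLEU/ROUGE, and the best variant is kept for the next round. All strings and the scoring function are invented for illustration.

```python
# Toy APE loop: score candidate prompt variants and keep the best one.
# The variants and the overlap metric are illustrative stand-ins, not the
# whitepaper's actual example or a real BLEU/ROUGE implementation.
def overlap_score(candidate: str, reference: str) -> float:
    """Fraction of reference tokens that also appear in the candidate."""
    cand, ref = set(candidate.lower().split()), set(reference.lower().split())
    return len(cand & ref) / len(ref) if ref else 0.0

# Step 1: in practice, prompt the model to "generate more prompts"; hard-coded here.
candidates = [
    "Order a small band t-shirt from the merch store.",
    "I want to buy one band t-shirt in size S.",
    "Can I get a single small-sized t-shirt, please?",
]

# Step 2: evaluate each candidate against a reference phrasing.
reference = "one band t-shirt size small"
scored = sorted(candidates, key=lambda c: overlap_score(c, reference), reverse=True)

# Step 3: keep the best candidate; a real loop would tweak it and repeat.
print(f"Best candidate: {scored[0]!r}")
```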

5. Code Prompting

LLMs like Gemini can be used to generate, explain, translate, debug, and review code:

  • Writing Code: LLMs can generate code in various programming languages, speeding up development. It’s crucial to “read and test your code first” as LLMs cannot truly reason and may repeat training data.
  • Explaining Code: LLMs can provide explanations for existing code snippets, aiding developers in understanding unfamiliar code.
  • Translating Code: LLMs can translate code from one programming language to another (e.g., Bash to Python), facilitating application development.
  • Debugging and Reviewing Code: LLMs can identify errors in code, explain what went wrong, and suggest improvements for robustness and flexibility (a debugging-prompt sketch follows this list).
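
As an example of the debugging use case, here is a minimal sketch that embeds a broken snippet in a prompt and asks the model to diagnose and fix it. It assumes the google-generativeai SDK; the snippet, the model name, and the low temperature (a common choice for code tasks) are illustrative assumptions.

```python
# Debugging-prompt sketch (assumes the google-generativeai SDK; the broken
# snippet and the prompt wording are illustrative).
import google.generativeai as genai

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

broken_snippet = '''
def average(numbers):
    total = 0
    for n in numbers:
        total += n
    return total / len(number)   # bug: undefined name "number"
'''

prompt = f"""The following Python code raises a NameError.
Explain what went wrong, provide the corrected code, and suggest one
improvement to make it more robust (for example, handling an empty list).

{broken_snippet}"""

response = model.generate_content(
    prompt,
    generation_config=genai.GenerationConfig(temperature=0.1),
)
print(response.text)
```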

6. Best Practices for Prompt Engineering

To become a proficient prompt engineer, several best practices are recommended:

  • Provide Examples: Providing one-shot or few-shot examples is “highly effective because it acts as a powerful teaching tool,” guiding the model toward desired outputs.
  • Design with Simplicity: Prompts should be “concise, clear, and easy to understand for both you and the model.” Avoid complex language and unnecessary information. Use strong verbs like Act, Analyze, Classify, Create, Describe, Generate, Summarize, Translate, Write.
  • Be Specific About the Output: Provide explicit details about the desired output format, length, and content. This helps the model focus and improves accuracy.
  • Use Instructions Over Constraints: “Focusing on positive instructions in prompting can be more effective than relying heavily on constraints.” Instructions directly communicate the desired outcome, fostering creativity, while constraints might leave the model guessing. Constraints are valuable for safety, clarity, or strict formatting needs.
  • Control the Max Token Length: Set a maximum token limit in the configuration or explicitly request a specific length in the prompt (e.g., “Explain quantum physics in a tweet length message.”).
  • Use Variables in Prompts: Incorporate variables (e.g., {city}) to make prompts reusable and dynamic for different inputs, especially when integrating prompts into applications (several of these practices are combined in the sketch after this list).
  • Experiment with Input Formats and Writing Styles: Different model configurations, prompt formats, word choices, and styles yield varying results. Experiment with questions, statements, or instructions.
  • Mix Up Classes for Few-Shot Classification: In classification tasks, randomize the order of possible response classes in few-shot examples to prevent overfitting to a specific order. A good starting point is 6 few-shot examples.
  • Adapt to Model Updates: Stay informed about model architecture changes, new data, and capabilities. Adjust prompts to leverage new features.
  • Experiment with Output Formats: For non-creative tasks (extraction, parsing, ordering), structured formats like JSON or XML are beneficial. JSON offers advantages such as consistent style, focus on data, reduced hallucinations, relationship awareness, data types, and sorting capabilities.
  • JSON Repair: Tools like json-repair can automatically fix incomplete or malformed JSON objects, which can occur due to token limits causing truncation.
  • Working with Schemas: JSON Schemas can also be used to structure input, providing the LLM with “a clear blueprint of the data it should expect,” reducing misinterpretation, establishing relationships between data, and making the LLM “time-aware.”
  • Experiment Together with Other Prompt Engineers: Collaborate with others to develop and test prompts, leveraging diverse perspectives.
  • CoT Best Practices: For CoT, place the answer after the reasoning steps. Ensure mechanisms to extract the final answer from the reasoning. Set temperature to 0 for single, correct answers in reasoning tasks.
  • Document the Various Prompt Attempts: Crucially, “document your prompt attempts in full detail so you can learn over time what went well and what did not.” This includes recording the prompt name, goal, model, configuration settings (temperature, token limit, Top-K, Top-P), the full prompt, and the output. Tracking versions, result status (OK/NOT OK/SOMETIMES OK), and feedback is also recommended. Prompts should be saved separately from code in project codebases and integrated into operationalized systems with automated tests and evaluation.
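
The sketch below pulls several of these practices together: a reusable template with a {city} variable, an explicit JSON output request, repair of possibly truncated JSON, and a simple record documenting the attempt. It assumes the google-generativeai SDK and the json-repair package; the model name and the fields of the log record are suggestions, not a prescribed schema.

```python
# Combined best-practices sketch (assumes the google-generativeai SDK and the
# json-repair package; model name and log fields are illustrative choices).
import json
import google.generativeai as genai
from json_repair import repair_json

genai.configure(api_key="YOUR_API_KEY")
model = genai.GenerativeModel("gemini-1.5-flash")

# Reusable prompt template with a variable and an explicit JSON output request.
TEMPLATE = (
    "You are a travel guide. List three facts about {city}. "
    "Return only valid JSON with the keys 'city' and 'facts'."
)

config = genai.GenerationConfig(temperature=0.2, top_p=0.95, top_k=30,
                                max_output_tokens=256)
prompt = TEMPLATE.format(city="Amsterdam")
response = model.generate_content(prompt, generation_config=config)

# Token limits can truncate JSON mid-object; json-repair patches malformed output.
data = json.loads(repair_json(response.text))

# Document the attempt (name, goal, model, settings, prompt, output, result).
attempt_log = {
    "name": "travel-facts-json-v1",
    "goal": "Return city facts as structured JSON",
    "model": "gemini-1.5-flash",
    "temperature": 0.2, "top_p": 0.95, "top_k": 30, "token_limit": 256,
    "prompt": prompt,
    "output": data,
    "result": "OK",  # OK / NOT OK / SOMETIMES OK
}
print(json.dumps(attempt_log, indent=2))
```

Saving records like this separately from the application code makes it easier to track versions and revisit prompts as models evolve.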

Prompt engineering is an “iterative process” requiring continuous crafting, testing, analysis, and refinement of prompts based on model performance.