User Simulation¶

Supported in ADKPython v1.18.0

When evaluating conversational agents, it is not always practical to use a fixed set of user prompts, as the conversation can proceed in unexpected ways. For example, if the agent needs the user to supply two values to perform a task, it may ask for those values one at a time or both at once. To resolve this issue, ADK can dynamically generate user prompts using a generative AI model.

To use this feature, you must specify a ConversationScenario which dictates the user's goals in their conversation with the agent. You may also specify a user persona that you expect the user to adhere to.

A ConversationScenario consists of the following components:

starting_prompt: A fixed initial prompt that the user should use to start the conversation with the agent.
conversation_plan: A high-level guideline for the goals the user must achieve.
user_persona: A definition of the user's traits, such as technical expertise or linguistic style.

A sample conversation scenario for the hello_world agent is shown below:

{
  "starting_prompt": "What can you do for me?",
  "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime."
}

The LLM uses the conversation_plan, along with the conversation history, to dynamically generate user prompts.

You can also specify a pre-built user_persona in the following manner:

{
  "starting_prompt": "What can you do for me?",
  "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime.",
  "user_persona": "NOVICE"
}

While the conversation plan dictates what must be accomplished, the persona dictates how the model phrases its queries and reacts to the agent's responses.

Try it in Colab

Test this entire workflow yourself in an interactive notebook on Simulating User Conversations to Dynamically Evaluate ADK Agents. You'll define a conversation scenario, run a "dry run" to check the dialogue, and then perform a full evaluation to score the agent's responses.

User Personas¶

A User Persona is a role that the simulated user adopts during the conversation. It is defined by a set of behaviors that dictate how the user interacts with the agent, such as their communication style, how they provide information, and how they react to errors.

A UserPersona consists of the following fields:

id: A unique identifier for the persona.
description: A high-level description of who the user is and how they interact with the agent.
behaviors: A list of UserBehavior objects that define specific traits.

Each UserBehavior includes:

name: The name of the behavior.
description: A summary of the expected behavior.
behavior_instructions: Specific instructions given to the simulated user (LLM) on how to act.
violation_rubrics: Used by evaluators to determine whether the user is following this behavior. If any of these rubrics are satisfied, the evaluator should determine the behavior was not followed.

Pre-built Personas¶

ADK provides a set of pre-built personas composed of common behaviors. The table below summarizes the behaviors for each persona:

Behavior	EXPERT persona	NOVICE persona	EVALUATOR persona
Advance	Detail oriented (proactively provides details)	Goal oriented (waits to be asked for details)	Detail oriented
Answer	Relevant questions only	Answer all questions	Relevant questions only
Correct Agent Inaccuracies	Yes	No	No
Troubleshoot Agent Errors	Once	Never	Never
Tone	Professional	Conversational	Conversational

Example: Evaluating the `hello_world` agent with conversation scenarios¶

To add evaluation cases containing conversation scenarios to a new or existing EvalSet, you need to first create a list of conversation scenarios to test the agent in.

Try saving the following to contributing/samples/hello_world/conversation_scenarios.json:

{
  "scenarios": [
    {
      "starting_prompt": "What can you do for me?",
      "conversation_plan": "Ask the agent to roll a 20-sided die. After you get the result, ask the agent to check if it is prime.",
      "user_persona": "NOVICE"
    },
    {
      "starting_prompt": "Hi, I'm running a tabletop RPG in which prime numbers are bad!",
      "conversation_plan": "Say that you don't care about the value; you just want the agent to tell you if a roll is good or bad. Once the agent agrees, ask it to roll a 6-sided die. Finally, ask the agent to do the same with 2 20-sided dice.",
      "user_persona": "EXPERT"
    }
  ]
}

You will also need a session input file containing information used during evaluation. Try saving the following to contributing/samples/hello_world/session_input.json:

{
  "app_name": "hello_world",
  "user_id": "user"
}

Then, you can add the conversation scenarios to an EvalSet:

# (optional) create a new EvalSet
adk eval_set create \
  contributing/samples/hello_world \
  eval_set_with_scenarios

# add conversation scenarios to the EvalSet as new eval cases
adk eval_set add_eval_case \
  contributing/samples/hello_world \
  eval_set_with_scenarios \
  --scenarios_file contributing/samples/hello_world/conversation_scenarios.json \
  --session_input_file contributing/samples/hello_world/session_input.json

By default, ADK runs evaluations with metrics that require the agent's expected response to be specified. Since that is not the case for a dynamic conversation scenario, we will use an EvalConfig with some alternate supported metrics.

Try saving the following to contributing/samples/hello_world/eval_config.json:

{
  "criteria": {
    "hallucinations_v1": {
      "threshold": 0.5,
      "evaluate_intermediate_nl_responses": true
    },
    "safety_v1": {
      "threshold": 0.8
    }
  }
}

Finally, you can use the adk eval command to run the evaluation:

adk eval \
    contributing/samples/hello_world \
    --config_file_path contributing/samples/hello_world/eval_config.json \
    eval_set_with_scenarios \
    --print_detailed_results

User simulator configuration¶

You can override the default user simulator configuration to change the model, internal model behavior, and the maximum number of user-agent interactions. The below EvalConfig shows the default user simulator configuration:

{
  "criteria": {
    # same as before
  },
  "user_simulator_config": {
    "model": "gemini-2.5-flash",
    "model_configuration": {
      "thinking_config": {
        "include_thoughts": true,
        "thinking_budget": 10240
      }
    },
    "max_allowed_invocations": 20
  }
}

model: The model backing the user simulator.
model_configuration: A GenerateContentConfig which controls the model behavior.
max_allowed_invocations: The maximum user-agent interactions allowed before the conversation is forcefully terminated. This should be set to be greater than the longest reasonable user-agent interaction in your EvalSet.
custom_instructions: Optional. Overrides the default instructions for the user simulator. The instruction string must contain the following formatting placeholders using Jinja syntax (do not substitute values in advance!):
- {{ stop_signal }} : text to be generated when the user simulator decides that the conversation is over.
- {{ conversation_plan }} : the overall plan for the conversation that the user simulator must follow.
- {{ conversation_history }} : the conversation between the user and the agent so far.
- You can also access the UserPersona object through the {{ persona }} placeholder.

Custom Personas¶

You can define your own custom persona by providing a UserPersona object in the ConversationScenario.

Example of a custom persona definition:

{
  "starting_prompt": "I need help with my account.",
  "conversation_plan": "Ask the agent to reset your password.",
  "user_persona": {
    "id": "IMPATIENT_USER",
    "description": "A user who is in a rush and gets easily frustrated.",
    "behaviors": [
      {
        "name": "Short responses",
        "description": "The user should provide very short, sometimes incomplete responses.",
        "behavior_instructions": [
            "Keep your responses under 10 words.",
            "Omit polite phrases."
        ],
        "violation_rubrics": [
            "The user response is over 10 words.",
            "The user response is overly polite."
        ]
      }
    ]
  }
}