fix(eval): include function-call events in invocation_events when skip_summarization is set by Koushik-Salammagari · Pull Request #5417 · google/adk-python

Koushik-Salammagari · 2026-04-20T19:25:58Z

Link to Issue or Description of Change

Description

EvaluationGenerator.convert_events_to_eval_invocations builds
invocation_events (the intermediate tool-call record used by
TrajectoryEvaluator) by collecting all qualifying events and then excluding
the final_event from the list.

The final event is identified via event.is_final_response(), but
is_final_response() returns True for any event with
skip_summarization=True — even events that contain function_call parts
(e.g. tools that use skip_summarization to surface their result directly
without an LLM summarization step). Those events were silently dropped from
invocation_events, causing get_all_tool_calls() to return [] for the
actual invocation. The result: tool_trajectory_avg_score was always 0.0
even when the tool name and args matched the expected exactly.

Root cause: is_final_response() conflates "final user-visible response"
with "should be excluded from tool trajectory". When skip_summarization=True
the function-call event is both the final response and an intermediate step
that must appear in the trajectory.

Fix: in the list comprehension that builds invocation_events, keep an
event even when it equals final_event if it contains function calls:

# before
if e is not final_event

# after
if e is not final_event or e.get_function_calls()

Changes

src/google/adk/evaluation/evaluation_generator.py: one-line fix
tests/unittests/evaluation/test_evaluation_generator.py: regression test that verifies tool calls are preserved when skip_summarization=True
tests/unittests/evaluation/test_trajectory_evaluator.py: end-to-end tests for InvocationEvents intermediate_data format (exact match → 1.0, mismatch → 0.0)
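The exact-match behaviour the new tests check (match → 1.0, mismatch → 0.0) can be sketched as follows. This is a simplified stand-in, assuming one score per invocation averaged over all invocations; the function name and the tuple representation of tool calls are assumptions for illustration, not the `TrajectoryEvaluator` implementation.

```python
def sketch_tool_trajectory_avg_score(actual_invocations, expected_invocations):
    """Average per-invocation exact-match scores (1.0 on match, 0.0 otherwise).

    Each invocation is modelled as a list of (tool_name, args) tuples.
    """
    scores = [
        1.0 if actual == expected else 0.0
        for actual, expected in zip(actual_invocations, expected_invocations)
    ]
    return sum(scores) / len(scores) if scores else 0.0


expected = [[("get_weather", {"city": "Paris"})]]
matching = [[("get_weather", {"city": "Paris"})]]
dropped = [[]]  # what the evaluator saw when the tool-call event was filtered out

score_matching = sketch_tool_trajectory_avg_score(matching, expected)
score_dropped = sketch_tool_trajectory_avg_score(dropped, expected)
```

With the tool-call event filtered out, the actual trajectory is empty and cannot match the expected one, which is why the score stayed at 0.0 before the fix.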

Testing Plan

pytest tests/unittests/evaluation/test_trajectory_evaluator.py \
       tests/unittests/evaluation/test_evaluation_generator.py -v
======================== 47 passed in 1.23s ============================

…in thread pool

When RunConfig.tool_thread_pool_config is enabled, _call_tool_in_thread_pool used None as a sentinel to distinguish "FunctionTool ran in thread pool" from "non-FunctionTool sync tool, needs async fallback". Because None is also a valid return value from any FunctionTool whose underlying function has no explicit return statement (implicit None), the sentinel check failed and execution fell through to tool.run_async(), invoking the function a second time silently.

Replace the None sentinel with a dedicated _SYNC_TOOL_RESULT_UNSET object so that a legitimate None result from a FunctionTool is correctly returned on the first execution, without triggering the async fallback path.

Fixes google#5284
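The sentinel pattern this commit message describes can be sketched as below. Only the name `_SYNC_TOOL_RESULT_UNSET` comes from the commit message; `run_in_pool_if_function_tool`, `call_tool`, and the `is_function_tool` flag are hypothetical helpers for illustration and do not reflect the actual ADK code or its threading.

```python
# Dedicated sentinel: a unique object that no tool can return by accident.
_SYNC_TOOL_RESULT_UNSET = object()


def run_in_pool_if_function_tool(func, is_function_tool):
    """Hypothetical helper: run func for a FunctionTool, else signal 'not run'."""
    if is_function_tool:
        return func()  # may legitimately return None, 0, '', {} or False
    return _SYNC_TOOL_RESULT_UNSET


def call_tool(func, fallback, is_function_tool=True):
    """Return the first result; use the fallback only if func never ran."""
    result = run_in_pool_if_function_tool(func, is_function_tool)
    # Identity check against the sentinel, not a truthiness or None check.
    if result is not _SYNC_TOOL_RESULT_UNSET:
        return result
    return fallback()


calls = []


def returns_none():
    calls.append("ran")  # no explicit return, so the result is None


def fallback():
    calls.append("fallback")
    return "from fallback"


result = call_tool(returns_none, fallback)
```

Because the check uses identity against a private object, a `None` result is returned after a single execution, and other falsy results (0, '', {}, False) are handled the same way.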

…ases

Per reviewer feedback: collapse the two near-identical None tests into a single @pytest.mark.parametrize test, and add falsy-but-not-None cases (0, '', {}, False) to prove the sentinel is identity-based and does not mishandle any falsy return value from a FunctionTool.

…p_summarization is set

EvaluationGenerator.convert_events_to_eval_invocations builds invocation_events by excluding the final_event from intermediate steps. However, is_final_response() returns True for any event with skip_summarization=True, even when that event contains function calls (e.g. tools using skip_summarization to bypass LLM summarization). Such events were incorrectly excluded from invocation_events, causing get_all_tool_calls() to return an empty list and tool_trajectory_avg_score to always be 0.0 despite matching tool calls.

Fix: keep an event in invocation_events even if it is the final_event when it contains function calls.

Fixes google#5410

rohityan · 2026-04-20T22:43:37Z

Hi @Koushik-Salammagari, thank you for your contribution! We appreciate you taking the time to submit this pull request. Please fix the formatting errors by running autoformat.sh.

Koushik-Salammagari added 5 commits April 14, 2026 08:14

style: apply pyink formatting to thread pool test file

5630c12

style: fix import ordering via autoformat.sh

1cf1330

adk-bot added the eval label ([Component] This issue is related to evaluation) Apr 20, 2026

Merge branch 'main' into fix/trajectory-eval-skip-summarization

eb6a51e

rohityan self-assigned this Apr 20, 2026

fix(eval): add type annotation and guard to resolve mypy union-attr e…

640135e

…rror

rohityan added the request clarification [Status] The maintainer need clarification or more information from the author label Apr 20, 2026

Koushik-Salammagari and others added 2 commits April 20, 2026 15:45

style: apply pyink formatting to evaluation_generator.py

fdb128b

Merge branch 'main' into fix/trajectory-eval-skip-summarization

87f8919

Koushik-Salammagari wants to merge 9 commits into google:main from Koushik-Salammagari:fix/trajectory-eval-skip-summarization
