UN-3096 add 1st e2e test case #179
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,114 @@ | ||
| # 🧪 End-to-End Agent Testing Framework | ||
|
|
||
| This project provides an extensible, reusable **pytest**-based test system to validate AI agent behavior through real CLI interactions. | ||
|
|
||
| It supports: | ||
| - Running **multiple connections** (`grpc`, `http`, `direct`) | ||
| - **Parallel execution** with **pytest-xdist** | ||
| - Optional **thinking file capture** for agent internals | ||
| - Config-driven prompts using **HOCON** files | ||
|
|
||
| --- | ||
|
|
||
| ## 📦 Project Structure | ||
|
|
||
| ```bash | ||
| e2e/ | ||
| ├── README.md # This documentation | ||
| ├── configs/ # Static agent configuration | ||
| │   └── config.hocon | ||
| ├── conftest.py # Pytest customizations (CLI args, test discovery) | ||
| ├── pytest.ini # Pytest settings | ||
| ├── requirements.txt # Python dependencies | ||
| ├── test_cases_data/ # Test data for each agent | ||
| │   └── mnpt_data.hocon | ||
| ├── tests/ # Test case source files | ||
| │   └── test_music_nerd_pro.py | ||
| └── utils/ # Helper modules (parsing, building commands, etc.) | ||
|     ├── mnpt_hocon_loader.py | ||
|     ├── mnpt_output_parser.py | ||
|     ├── mnpt_test_runner.py | ||
|     ├── thinking_file_builder.py | ||
|     └── verifier.py | ||
| ``` | ||
|
|
||
| --- | ||
|
|
||
| ## Running Tests | ||
|
|
||
| ### Install Dependencies | ||
|
|
||
| ```bash | ||
| pip install -r requirements.txt | ||
| ``` | ||
|
|
||
| ### Basic Test Command | ||
|
|
||
| Run a test (default: **all connections**): | ||
|
|
||
| ```bash | ||
| pytest tests/ --verbose | ||
| ``` | ||
|
|
||
| Run for a specific connection only: | ||
|
|
||
| ```bash | ||
| pytest tests/ --connection grpc --verbose | ||
| ``` | ||
|
|
||
| Run and enable thinking file output: | ||
|
|
||
| ```bash | ||
| pytest tests/ --thinking-file --verbose | ||
| ``` | ||
|
|
||
| Enable parallel test execution: | ||
|
|
||
| ```bash | ||
| pytest tests/ --connection grpc --repeat 5 --thinking-file -n auto --verbose | ||
| ``` | ||
|
|
||
| > 💡 When using `-n auto`, the repeated runs are distributed across multiple CPU cores. | ||
|
|
||
| --- | ||
|
|
||
| ## ⚙️ CLI Options | ||
|
|
||
| | Option | Description | | ||
| |:------------------|:------------| | ||
| | `--connection` | Run tests only for a specific connection (e.g., `grpc`, `http`, `direct`). | | ||
| | `--repeat` | Repeat each test multiple times. | | ||
| | `--thinking-file` | Save the agent's internal "thinking" to a temp directory during the test. | | ||
|
|
||
| --- | ||
|
|
||
| # 🧠 Agent: MusicNerdPro Test (test_music_nerd_pro.py) | ||
|
|
||
| This suite tests the `music_nerd_pro` agent over all connection types. | ||
|
|
||
| ### Test Logic | ||
|
|
||
| - Load prompt/expected outputs from **HOCON** config files | ||
| - Spawn a CLI agent process | ||
| - Send user questions | ||
| - Verify that: | ||
| - Correct keyword appears in the response | ||
| - Correct cost value is returned | ||
|
|
||
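The two checks listed above can be sketched roughly as follows. This is a minimal illustration and not the contents of `utils/verifier.py`: the function names `keyword_in_response` and `cost_matches` are hypothetical, the case-insensitive match is an assumption, and the sketch assumes the cost has already been extracted from the agent output as a string.

```python
import math


def keyword_in_response(response: str, keyword: str) -> bool:
    """Return True if the expected keyword occurs in the response.

    Case-insensitive matching is an assumption of this sketch.
    """
    return keyword.lower() in response.lower()


def cost_matches(actual: str, expected: str, tolerance: float = 1e-9) -> bool:
    """Compare costs numerically so that "3" and "3.0" count as equal."""
    return math.isclose(float(actual), float(expected), abs_tol=tolerance)


if __name__ == "__main__":
    reply = "Yellow Submarine was recorded by The Beatles."
    print(keyword_in_response(reply, "Beatles"))
    print(cost_matches("3", "3.0"))
```

Comparing the costs as floats rather than as strings avoids false failures when the agent prints `3` and the test data stores `"3.0"`.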
| ### Related Files | ||
|
|
||
| | File | Purpose | | ||
| |:-----|:--------| | ||
| | `tests/test_music_nerd_pro.py` | Main test case (pytest function) | | ||
| | `test_cases_data/mnpt_data.hocon` | Prompt/expected answer definitions | | ||
| | `configs/config.hocon` | Static agent config (connections list) | | ||
| | `utils/*.py` | Reusable helpers for all agent tests | | ||
|
|
||
| --- | ||
|
|
||
| # Notes | ||
|
|
||
| - **Thinking files** are stored under `/private/tmp/agent_thinking/` | ||
| - If `-n auto` is used, **worker-specific** folders are created (e.g., `run_gw0_1`). | ||
| - **PEXPECT** is used to fully simulate CLI typing behavior. | ||
| - Future agents can be easily added following the same pattern as MusicNerdPro! | ||
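One way the worker-specific folder names mentioned in the notes could be derived is sketched below. pytest-xdist exposes the worker id (for example `gw0`) through the `PYTEST_XDIST_WORKER` environment variable; everything else here is an assumption for illustration, including the helper name `thinking_dir`, the meaning of the numeric suffix, and the `run_<n>` fallback when no worker is set. It is not the code in `utils/thinking_file_builder.py`.

```python
import os
from typing import Mapping, Optional

# Base path as stated in the notes above.
THINKING_FILE_PATH = "/private/tmp/agent_thinking"


def thinking_dir(repeat_index: int, env: Optional[Mapping[str, str]] = None) -> str:
    """Build a per-run folder path, including the xdist worker id when present."""
    env = os.environ if env is None else env
    worker = env.get("PYTEST_XDIST_WORKER")  # set by pytest-xdist, e.g. "gw0"
    run_number = repeat_index + 1  # hypothetical: suffix is the 1-based repeat
    name = f"run_{worker}_{run_number}" if worker else f"run_{run_number}"
    return os.path.join(THINKING_FILE_PATH, name)


if __name__ == "__main__":
    print(thinking_dir(0, {"PYTEST_XDIST_WORKER": "gw0"}))
    print(thinking_dir(0, {}))
```

Keeping each worker in its own folder means parallel runs never write to the same thinking file.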
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,8 @@ | ||
| # config.hocon | ||
| # Agent config & connection setup | ||
|
||
|
|
||
| connection = ["direct", "grpc", "http"] | ||
| agent = [music_nerd_pro] | ||
|
|
||
| model_llm = ["gpt-4o", "llama3.1"] | ||
|
Collaborator
LLMs should be a property of the agent, not the test.
Contributor
Author
I listed them here because I was thinking of performance test cases. For example:
Contributor
Author
Alternatively, we could use the existing sly_data infrastructure to perform the comparison. |
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,105 @@ | ||
| # conftest.py | ||
| # ------------------------------------------------------------------------ | ||
| # Pytest configuration for MusicNerdPro tests. | ||
| # Provides custom CLI flags, dynamic test generation, and environment setup. | ||
| # ------------------------------------------------------------------------ | ||
|
|
||
| import pytest | ||
| import os | ||
| from pyhocon import ConfigFactory | ||
|
|
||
| # ------------------------------------------------------------------------------ | ||
| # Constants | ||
| # ------------------------------------------------------------------------------ | ||
|
|
||
| # Directory where agent CLI thinking files will be written (optional feature) | ||
| THINKING_FILE_PATH = "/private/tmp/agent_thinking" | ||
|
|
||
| # Static agent config (HOCON) loaded once for all tests | ||
| CONFIG_HOCON_PATH = os.path.join(os.path.dirname(__file__), "configs", "config.hocon") | ||
| config = ConfigFactory.parse_file(CONFIG_HOCON_PATH) | ||
|
Contributor
Author
Parse the config hocon to get connections. |
||
|
|
||
| # ------------------------------------------------------------------------------ | ||
| # Hooks | ||
| # ------------------------------------------------------------------------------ | ||
|
|
||
| def pytest_configure(config): | ||
|     """ | ||
|     Prints custom environment info when pytest starts. | ||
|     Helps verify environment settings. | ||
|     """ | ||
|     print("\nCustom Environment Info") | ||
|     print(f"thinking-file path : {THINKING_FILE_PATH}") | ||
|
|
||
| def pytest_addoption(parser): | ||
|     """ | ||
|     Adds custom command-line options for pytest to control the test suite: | ||
|         --connection -> Filter tests by specific connection method (direct/grpc/http) | ||
|         --repeat -> Repeat the same test multiple times (for stability/reliability) | ||
|         --thinking-file -> Enable writing out agent thinking_file logs during test | ||
|     """ | ||
|     group = parser.getgroup("custom options") | ||
|     group.addoption( | ||
|         "--connection", | ||
|         action="store", | ||
|         default=None, | ||
|         help="Specify a connection name to test (e.g., direct, grpc, http). If omitted, all will be tested." | ||
|     ) | ||
|     group.addoption( | ||
|         "--repeat", | ||
|         action="store", | ||
|         type=int, | ||
|         default=1, | ||
|         help="Number of times to repeat each test (for stress or reliability testing)." | ||
|     ) | ||
|     group.addoption( | ||
|         "--thinking-file", | ||
|         action="store_true", | ||
|         default=False, | ||
|         help="If enabled, agent will write a thinking_file log per test case (grpc/http/direct)." | ||
|     ) | ||
|
|
||
| def pytest_generate_tests(metafunc): | ||
|     """ | ||
|     Dynamically parameterizes the tests based on the connection(s) and repetition requested. | ||
|
|
||
|     Example: | ||
|         --connection grpc --repeat 3 | ||
|         → Runs 3 tests against 'grpc' connection. | ||
|
|
||
|         --repeat 2 (with no connection) | ||
|         → Runs 2 tests for each connection (direct, grpc, http). | ||
|
|
||
|     This auto-expands into (connection_name, repeat_index) fixture pairs. | ||
|     """ | ||
|     if "connection_name" in metafunc.fixturenames: | ||
|         all_connections = load_connections() | ||
|
Contributor
Author
By default, all three connections are used. |
||
|         selected_connection = metafunc.config.getoption("connection") | ||
|         repeat = metafunc.config.getoption("repeat") | ||
|
|
||
|         # Filter if a specific connection is selected | ||
|         if selected_connection: | ||
|             if selected_connection not in all_connections: | ||
|                 raise ValueError(f"Connection '{selected_connection}' not found in config: {all_connections}") | ||
|             all_connections = [selected_connection] | ||
|
|
||
|         # Generate combinations of (connection_name, repeat_index) | ||
|         test_params = [ | ||
|             pytest.param(conn, i, id=f"{conn}_run{i+1}") | ||
|             for conn in all_connections | ||
|             for i in range(repeat) | ||
|
Contributor
Author
Generate the matrix of runners |
||
|         ] | ||
|
|
||
|         # Parametrize the test function | ||
|         metafunc.parametrize("connection_name, repeat_index", test_params) | ||
|
|
||
| # ------------------------------------------------------------------------------ | ||
| # Utilities | ||
| # ------------------------------------------------------------------------------ | ||
|
|
||
| def load_connections(): | ||
|     """ | ||
|     Loads the list of supported connection names from the static config file. | ||
|     """ | ||
|     return config.get("connection") | ||
|
|
||
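To see what the parametrization in `pytest_generate_tests` expands to, the same logic can be exercised without pytest. The sketch below mirrors the filtering and the `{conn}_run{i+1}` id scheme from the hook; the helper name `build_matrix` and the returned triple layout are hypothetical and exist only for illustration.

```python
from typing import List, Optional, Tuple


def build_matrix(
    all_connections: List[str], selected: Optional[str], repeat: int
) -> List[Tuple[str, int, str]]:
    """Return (connection_name, repeat_index, test_id) triples."""
    if selected:
        if selected not in all_connections:
            raise ValueError(
                f"Connection '{selected}' not found in config: {all_connections}"
            )
        all_connections = [selected]
    # Same nesting order as the hook: connections outermost, repeats innermost.
    return [
        (conn, i, f"{conn}_run{i + 1}")
        for conn in all_connections
        for i in range(repeat)
    ]


if __name__ == "__main__":
    for row in build_matrix(["direct", "grpc", "http"], None, 2):
        print(row)
```

With three connections and `--repeat 2`, this yields six test ids, which is the set pytest-xdist then distributes when `-n auto` is passed.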
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,5 @@ | ||
| # pytest.ini | ||
| [pytest] | ||
| filterwarnings = | ||
|     ignore:.*use of forkpty.*:DeprecationWarning:pty | ||
|
||
|
|
||
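The `filterwarnings` entry above follows the `action:message:category:module` layout of Python's warning filters. As a rough standard-library counterpart, and assuming the message and module fields are treated as regular expressions, the same rule can be tried out in isolation. The helper name `surviving_warnings` is hypothetical, and the sample messages are made up for the demonstration.

```python
import warnings
from typing import Iterable, List


def surviving_warnings(messages: Iterable[str]) -> List[str]:
    """Emit each message as a DeprecationWarning from module 'pty' and
    return the ones that are not suppressed by the forkpty filter."""
    with warnings.catch_warnings(record=True) as caught:
        warnings.simplefilter("always")  # record everything by default
        # Counterpart of: ignore:.*use of forkpty.*:DeprecationWarning:pty
        warnings.filterwarnings(
            "ignore",
            message=r".*use of forkpty.*",
            category=DeprecationWarning,
            module="pty",
        )
        for text in messages:
            warnings.warn_explicit(
                text, DeprecationWarning, filename="pty.py", lineno=1, module="pty"
            )
        return [str(w.message) for w in caught]


if __name__ == "__main__":
    print(surviving_warnings([
        "This process is multi-threaded, use of forkpty() may lead to deadlocks in the child.",
        "some other API is deprecated",
    ]))
```

Only the forkpty deprecation is silenced; other deprecation warnings raised from the same module still surface.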
|
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,6 @@ | ||
| pexpect | ||
| pyhocon | ||
| pytest | ||
| pytest-xdist | ||
| pytest-timeout | ||
| pytest-timer | ||
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,29 @@ | ||
| # test_data.hocon | ||
| # Input/output test pairs | ||
|
Collaborator
Don't abbreviate leaving people guessing as to what this file is for. |
||
|
|
||
| test = [ | ||
|     { | ||
|         input_1: { | ||
|             user_text: "Who did yellow submarine?" | ||
|             answer: { | ||
|                 type_match: "keyword" | ||
|                 word: "Beatles" | ||
|                 cost: "3.0" | ||
|
Collaborator
That you have cost built in as a key likely means that this format is very tightly coupled to a particular test. |
||
|             } | ||
|         } | ||
|     }, | ||
|     { | ||
|         input_2: { | ||
|             user_text: "Where were they from?" | ||
|             answer: { | ||
|                 type_match: "keyword" | ||
|                 word: "Liverpool" | ||
|                 cost: "6.0" | ||
|             } | ||
|         } | ||
|     }, | ||
|     { | ||
|         input_done: "quit" | ||
|     } | ||
|
Collaborator
You should be able to make this test using the existing infrastructure.
Contributor
Author
Yes, I would like to use the sly_data feature, but I haven't gotten to it yet. That's an excellent suggestion; I'll look into it. |
||
| ] | ||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,21 @@ | ||
| # test_music_nerd_pro.py | ||
| # --------------------------------------------------------- | ||
| # Parametrized test case that drives CLI interaction test | ||
| # --------------------------------------------------------- | ||
|
|
||
| import pytest | ||
| from utils.mnpt_hocon_loader import extract_test_values | ||
| from utils.mnpt_test_runner import run_test | ||
|
|
||
| @pytest.mark.timeout(120) | ||
| def test_run_connection(connection_name, repeat_index, request): | ||
|     """ | ||
|     Main test entry point for testing music_nerd_pro agent over various connections. | ||
|     """ | ||
|     use_thinking_file = request.config.getoption("--thinking-file") | ||
|
|
||
|     # NEW: Only pass connection name | ||
|     result = extract_test_values(connection_name) | ||
|
|
||
|     run_test(*result, repeat_index, use_thinking_file) | ||
|
||
|
|
||
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,69 @@ | ||
| # ------------------------------------------------------------------------ | ||
| # mnpt_hocon_loader.py | ||
| # ------------------------------------------------------------------------ | ||
| # Utility functions for loading test prompt/response values from HOCON files. | ||
| # Separates test data loading from agent configuration loading. | ||
|
||
| # ------------------------------------------------------------------------ | ||
|
|
||
| import os | ||
| from pyhocon import ConfigFactory | ||
|
|
||
| # ------------------------------------------------------------------------ | ||
| # Path to the TEST DATA HOCON file | ||
| # - This file contains input prompts and expected agent outputs. | ||
| # - NOTE: Only test cases, no agent config. | ||
| # ------------------------------------------------------------------------ | ||
|
|
||
| TEST_DATA_HOCON_PATH = os.path.join( | ||
|     os.path.dirname(__file__), # This utils/ folder | ||
|     "../test_cases_data/mnpt_data.hocon" # Relative path to test_cases_data/ | ||
| ) | ||
|
|
||
| # ------------------------------------------------------------------------ | ||
| # Load the test data once at import time | ||
| # ------------------------------------------------------------------------ | ||
| test_data = ConfigFactory.parse_file(os.path.abspath(TEST_DATA_HOCON_PATH)) | ||
|
|
||
| # ------------------------------------------------------------------------ | ||
| # Function: extract_test_values | ||
| # Description: | ||
| # - Loads the prompts and expected answer keywords/costs | ||
| # - Validates the connection name if needed | ||
| # - Returns extracted values for CLI interaction testing | ||
| # ------------------------------------------------------------------------ | ||
| def extract_test_values(connection_name): | ||
|     """ | ||
|     Loads test prompts and expected outputs for a given connection | ||
|     from the test data HOCON file. | ||
|
|
||
|     Args: | ||
|         connection_name (str): The type of connection to validate (e.g., "grpc", "http") | ||
|
|
||
|     Returns: | ||
|         tuple: (connection_name, prompt_1, prompt_2, word_1, word_2, cost_1, cost_2, input_done) | ||
|     """ | ||
|
|
||
|     # If you want to validate connection types, you can add here | ||
|     # Example connection list: ["direct", "grpc", "http"] | ||
|
|
||
|     # Pull the list of test prompts and expected outputs | ||
|     test_entries = test_data.get("test") | ||
|
|
||
|     # Extract the first test input | ||
|     input_1 = next(item["input_1"] for item in test_entries if "input_1" in item) | ||
|     prompt_1 = input_1.get("user_text") | ||
|     word_1 = input_1.get("answer.word") | ||
|     cost_1 = input_1.get("answer.cost") | ||
|
|
||
|     # Extract the second test input | ||
|     input_2 = next(item["input_2"] for item in test_entries if "input_2" in item) | ||
|     prompt_2 = input_2.get("user_text") | ||
|     word_2 = input_2.get("answer.word") | ||
|     cost_2 = input_2.get("answer.cost") | ||
|
|
||
|     # Extract the input for termination (e.g., "quit") | ||
|     input_done = next((item.get("input_done") for item in test_entries if "input_done" in item), None) | ||
|
|
||
|     # Return all values required for the test runner | ||
|     return connection_name, prompt_1, prompt_2, word_1, word_2, cost_1, cost_2, input_done | ||
|
|
||
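The loader above is written for exactly two prompts. As a hedged alternative, the same data could be walked generically so that any number of `input_N` entries is supported. In the sketch below, plain dictionaries stand in for the parsed HOCON tree (so nested keys are read step by step instead of with dotted paths), and the names `Exchange`, `extract_exchanges`, and `SAMPLE` are hypothetical.

```python
from dataclasses import dataclass
from typing import Any, Dict, List, Optional, Tuple


@dataclass(frozen=True)
class Exchange:
    """One prompt together with its expected keyword and cost."""
    prompt: str
    word: str
    cost: float


def extract_exchanges(
    entries: List[Dict[str, Any]],
) -> Tuple[List[Exchange], Optional[str]]:
    """Collect every input_N entry in order, plus the termination input."""
    exchanges: List[Exchange] = []
    input_done: Optional[str] = None
    for entry in entries:
        for key, value in entry.items():
            if key == "input_done":
                input_done = value
            elif key.startswith("input_"):
                answer = value["answer"]
                exchanges.append(
                    Exchange(value["user_text"], answer["word"], float(answer["cost"]))
                )
    return exchanges, input_done


# Stand-in for the "test" list of the data file shown earlier.
SAMPLE = [
    {"input_1": {"user_text": "Who did yellow submarine?",
                 "answer": {"type_match": "keyword", "word": "Beatles", "cost": "3.0"}}},
    {"input_2": {"user_text": "Where were they from?",
                 "answer": {"type_match": "keyword", "word": "Liverpool", "cost": "6.0"}}},
    {"input_done": "quit"},
]

if __name__ == "__main__":
    found, done = extract_exchanges(SAMPLE)
    for exchange in found:
        print(exchange)
    print(done)
```

Returning a list of small records instead of an eight-element tuple also keeps the call site from depending on positional order.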