Amazon Bedrock AgentCore

2026/04/29 - Amazon Bedrock AgentCore - 14 new api methods

Changes: Adds batch evaluation for running evaluators against multiple agent sessions with server-side orchestration, AI-powered recommendations for optimizing system prompts and tool descriptions, and A/B testing with controlled traffic splitting and statistical significance reporting.

StopBatchEvaluation (new)

Stops a running batch evaluation. Sessions that have already been evaluated retain their results.

See also: AWS API Documentation

Request Syntax

client.stop_batch_evaluation(
    batchEvaluationId='string'
)
type batchEvaluationId:

string

param batchEvaluationId:

[REQUIRED]

The unique identifier of the batch evaluation to stop.

rtype:

dict

returns:

Response Syntax

{
    'batchEvaluationId': 'string',
    'batchEvaluationArn': 'string',
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'COMPLETED_WITH_ERRORS'|'FAILED'|'STOPPING'|'STOPPED'|'DELETING',
    'description': 'string'
}

Response Structure

  • (dict) --

    • batchEvaluationId (string) --

      The unique identifier of the stopped batch evaluation.

    • batchEvaluationArn (string) --

      The Amazon Resource Name (ARN) of the stopped batch evaluation.

    • status (string) --

      The status of the batch evaluation after the stop request.

    • description (string) --

      The description of the batch evaluation.
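A minimal sketch of calling this operation and interpreting the returned status. The helper name `stop_batch_evaluation_safely` and the `_StubClient` class are hypothetical: the stub stands in for a real boto3 client so the example runs offline. Treating `STOPPING` and `STOPPED` as the statuses that confirm the stop request is an assumption drawn from the status enum above, not documented behavior.

```python
from typing import Any, Dict

# Assumption: these two statuses indicate the stop request was accepted.
STOP_ACCEPTED_STATUSES = {"STOPPING", "STOPPED"}


def stop_batch_evaluation_safely(client: Any, batch_evaluation_id: str) -> Dict[str, Any]:
    """Call stop_batch_evaluation and report whether the stop was accepted."""
    if not batch_evaluation_id:
        raise ValueError("batchEvaluationId is required")
    response = client.stop_batch_evaluation(batchEvaluationId=batch_evaluation_id)
    status = response.get("status")
    return {
        "batchEvaluationId": response.get("batchEvaluationId"),
        "status": status,
        "stopAccepted": status in STOP_ACCEPTED_STATUSES,
    }


class _StubClient:
    """Hypothetical stand-in for a boto3 client; returns a canned response."""

    def stop_batch_evaluation(self, batchEvaluationId: str) -> Dict[str, Any]:
        return {
            "batchEvaluationId": batchEvaluationId,
            "batchEvaluationArn": "arn:example",  # placeholder, not a real ARN
            "status": "STOPPING",
            "description": "demo",
        }


outcome = stop_batch_evaluation_safely(_StubClient(), "example-batch-id")
print(outcome)
```

With a real client, pass it in place of `_StubClient()`. Results for sessions that were already evaluated are retained, so a follow-up `get_batch_evaluation` call can still read them.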

DeleteRecommendation (new)

Deletes a recommendation and its associated results.

See also: AWS API Documentation

Request Syntax

client.delete_recommendation(
    recommendationId='string'
)
type recommendationId:

string

param recommendationId:

[REQUIRED]

The unique identifier of the recommendation to delete.

rtype:

dict

returns:

Response Syntax

{
    'recommendationId': 'string',
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'DELETING'
}

Response Structure

  • (dict) --

    • recommendationId (string) --

      The unique identifier of the deleted recommendation.

    • status (string) --

      The status of the recommendation deletion operation.

GetRecommendation (new)

Retrieves detailed information about a recommendation, including its configuration, status, and results.

See also: AWS API Documentation

Request Syntax

client.get_recommendation(
    recommendationId='string'
)
type recommendationId:

string

param recommendationId:

[REQUIRED]

The unique identifier of the recommendation to retrieve.

rtype:

dict

returns:

Response Syntax

{
    'recommendationId': 'string',
    'recommendationArn': 'string',
    'name': 'string',
    'description': 'string',
    'type': 'SYSTEM_PROMPT_RECOMMENDATION'|'TOOL_DESCRIPTION_RECOMMENDATION',
    'recommendationConfig': {
        'systemPromptRecommendationConfig': {
            'systemPrompt': {
                'text': 'string',
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'systemPromptJsonPath': 'string'
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            },
            'evaluationConfig': {
                'evaluators': [
                    {
                        'evaluatorArn': 'string'
                    },
                ]
            }
        },
        'toolDescriptionRecommendationConfig': {
            'toolDescription': {
                'toolDescriptionText': {
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescription': {
                                'text': 'string'
                            }
                        },
                    ]
                },
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescriptionJsonPath': 'string'
                        },
                    ]
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            }
        }
    },
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'DELETING',
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1),
    'recommendationResult': {
        'systemPromptRecommendationResult': {
            'recommendedSystemPrompt': 'string',
            'configurationBundle': {
                'bundleArn': 'string',
                'versionId': 'string'
            },
            'errorCode': 'string',
            'errorMessage': 'string'
        },
        'toolDescriptionRecommendationResult': {
            'tools': [
                {
                    'toolName': 'string',
                    'recommendedToolDescription': 'string'
                },
            ],
            'configurationBundle': {
                'bundleArn': 'string',
                'versionId': 'string'
            },
            'errorCode': 'string',
            'errorMessage': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • recommendationId (string) --

      The unique identifier of the recommendation.

    • recommendationArn (string) --

      The Amazon Resource Name (ARN) of the recommendation.

    • name (string) --

      The name of the recommendation.

    • description (string) --

      The description of the recommendation.

    • type (string) --

      The type of recommendation.

    • recommendationConfig (dict) --

      The configuration for the recommendation.

      • systemPromptRecommendationConfig (dict) --

        The configuration for a system prompt recommendation.

        • systemPrompt (dict) --

          The current system prompt to optimize.

          • text (string) --

            The system prompt text provided inline.

          • configurationBundle (dict) --

            The system prompt sourced from a configuration bundle version.

            • bundleArn (string) --

              The Amazon Resource Name (ARN) of the configuration bundle.

            • versionId (string) --

              The version identifier of the configuration bundle.

            • systemPromptJsonPath (string) --

              The JSON path within the configuration bundle that contains the system prompt.

        • agentTraces (dict) --

          The agent traces to analyze for generating recommendations.

          • sessionSpans (list) --

            Agent traces provided as inline session spans in OpenTelemetry format.

            • (:ref:`document<document>`) --

          • cloudwatchLogs (dict) --

            Agent traces read from CloudWatch Logs.

            • logGroupArns (list) --

              The list of CloudWatch log group ARNs to read agent traces from.

              • (string) --

            • serviceNames (list) --

              The list of service names to filter traces within the specified log groups.

              • (string) --

            • startTime (datetime) --

              The start time of the time range to read traces from.

            • endTime (datetime) --

              The end time of the time range to read traces from.

            • rule (dict) --

              Optional rule configuration for filtering traces.

              • filters (list) --

                The list of filters to apply when reading agent traces.

                • (dict) --

                  A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

                  • key (string) --

                    The key or field name to filter on within the agent trace data.

                  • operator (string) --

                    The comparison operator to use for filtering.

                  • value (dict) --

                    The value to compare against using the specified operator.

                    • stringValue (string) --

                      A string value for text-based filtering.

                    • doubleValue (float) --

                      A numeric value for numerical filtering and comparisons.

                    • booleanValue (boolean) --

                      A boolean value for true/false filtering conditions.

        • evaluationConfig (dict) --

          The evaluation configuration specifying which evaluators to use for assessing recommendation quality.

          • evaluators (list) --

            The list of evaluators to use for assessing recommendation quality.

            • (dict) --

              A reference to an evaluator used for recommendation assessment.

              • evaluatorArn (string) --

                The Amazon Resource Name (ARN) of the evaluator.

      • toolDescriptionRecommendationConfig (dict) --

        The configuration for a tool description recommendation.

        • toolDescription (dict) --

          The current tool descriptions to optimize.

          • toolDescriptionText (dict) --

            Tool descriptions provided as inline text.

            • tools (list) --

              The list of tool descriptions to optimize.

              • (dict) --

                A tool description input containing the tool name and its current description.

                • toolName (string) --

                  The name of the tool.

                • toolDescription (dict) --

                  The current description of the tool to optimize.

                  • text (string) --

                    The tool description as inline text.

          • configurationBundle (dict) --

            Tool descriptions sourced from a configuration bundle version.

            • bundleArn (string) --

              The Amazon Resource Name (ARN) of the configuration bundle.

            • versionId (string) --

              The version identifier of the configuration bundle.

            • tools (list) --

              The list of tool entries mapping tool names to their JSON paths within the bundle.

              • (dict) --

                Maps a tool name to its JSON path within a configuration bundle.

                • toolName (string) --

                  The name of the tool.

                • toolDescriptionJsonPath (string) --

                  The JSON path within the configuration bundle's components that contains the tool description.

        • agentTraces (dict) --

          The agent traces to analyze for generating tool description recommendations.

          • sessionSpans (list) --

            Agent traces provided as inline session spans in OpenTelemetry format.

            • (:ref:`document<document>`) --

          • cloudwatchLogs (dict) --

            Agent traces read from CloudWatch Logs.

            • logGroupArns (list) --

              The list of CloudWatch log group ARNs to read agent traces from.

              • (string) --

            • serviceNames (list) --

              The list of service names to filter traces within the specified log groups.

              • (string) --

            • startTime (datetime) --

              The start time of the time range to read traces from.

            • endTime (datetime) --

              The end time of the time range to read traces from.

            • rule (dict) --

              Optional rule configuration for filtering traces.

              • filters (list) --

                The list of filters to apply when reading agent traces.

                • (dict) --

                  A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

                  • key (string) --

                    The key or field name to filter on within the agent trace data.

                  • operator (string) --

                    The comparison operator to use for filtering.

                  • value (dict) --

                    The value to compare against using the specified operator.

                    • stringValue (string) --

                      A string value for text-based filtering.

                    • doubleValue (float) --

                      A numeric value for numerical filtering and comparisons.

                    • booleanValue (boolean) --

                      A boolean value for true/false filtering conditions.

    • status (string) --

      The current status of the recommendation.

    • createdAt (datetime) --

      The timestamp when the recommendation was created.

    • updatedAt (datetime) --

      The timestamp when the recommendation was last updated.

    • recommendationResult (dict) --

      The result of the recommendation, containing the optimized system prompt or tool descriptions. Only present when the recommendation status is COMPLETED.

      • systemPromptRecommendationResult (dict) --

        The result of a system prompt recommendation.

        • recommendedSystemPrompt (string) --

          The optimized system prompt text generated by the recommendation.

        • configurationBundle (dict) --

          The configuration bundle containing the recommended system prompt, if the input was sourced from a configuration bundle.

          • bundleArn (string) --

            The Amazon Resource Name (ARN) of the configuration bundle.

          • versionId (string) --

            The version identifier of the configuration bundle containing the recommendation.

        • errorCode (string) --

          The error code if the recommendation failed.

        • errorMessage (string) --

          The error message if the recommendation failed.

      • toolDescriptionRecommendationResult (dict) --

        The result of a tool description recommendation.

        • tools (list) --

          The list of tools with their recommended descriptions.

          • (dict) --

            The output for a single tool description recommendation.

            • toolName (string) --

              The name of the tool.

            • recommendedToolDescription (string) --

              The optimized tool description text generated by the recommendation.

        • configurationBundle (dict) --

          The configuration bundle containing the recommended tool descriptions, if the input was sourced from a configuration bundle.

          • bundleArn (string) --

            The Amazon Resource Name (ARN) of the configuration bundle.

          • versionId (string) --

            The version identifier of the configuration bundle containing the recommendation.

        • errorCode (string) --

          The error code if the recommendation failed.

        • errorMessage (string) --

          The error message if the recommendation failed.
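The response nests its result under one of two keys, depending on the recommendation type. The sketch below shows one way to unpack it. The function name `extract_recommendation` and the sample dictionaries are hypothetical. Raising when `errorCode` is set is an assumed handling choice, since the reference does not say how errors and results combine.

```python
from typing import Any, Dict, Optional


def _raise_on_error(body: Dict[str, Any]) -> None:
    """Raise if the result body carries an error code."""
    if body.get("errorCode"):
        raise RuntimeError(f"{body['errorCode']}: {body.get('errorMessage', '')}")


def extract_recommendation(response: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the recommended text(s), or None if no result is available yet."""
    # recommendationResult is only present when the status is COMPLETED.
    if response.get("status") != "COMPLETED":
        return None
    result = response.get("recommendationResult") or {}
    if "systemPromptRecommendationResult" in result:
        body = result["systemPromptRecommendationResult"]
        _raise_on_error(body)
        return {
            "type": "SYSTEM_PROMPT_RECOMMENDATION",
            "systemPrompt": body.get("recommendedSystemPrompt"),
        }
    if "toolDescriptionRecommendationResult" in result:
        body = result["toolDescriptionRecommendationResult"]
        _raise_on_error(body)
        return {
            "type": "TOOL_DESCRIPTION_RECOMMENDATION",
            "tools": {
                tool["toolName"]: tool["recommendedToolDescription"]
                for tool in body.get("tools", [])
            },
        }
    return None


# Hypothetical sample responses shaped like the Response Syntax above.
prompt_response = {
    "status": "COMPLETED",
    "recommendationResult": {
        "systemPromptRecommendationResult": {"recommendedSystemPrompt": "Be concise."}
    },
}
tool_response = {
    "status": "COMPLETED",
    "recommendationResult": {
        "toolDescriptionRecommendationResult": {
            "tools": [{"toolName": "search", "recommendedToolDescription": "Searches the index."}]
        }
    },
}
pending_response = {"status": "IN_PROGRESS"}

prompt_result = extract_recommendation(prompt_response)
tool_result = extract_recommendation(tool_response)
pending_result = extract_recommendation(pending_response)
```

In practice the input would be the dictionary returned by `client.get_recommendation(recommendationId=...)`.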

StartBatchEvaluation (new)

Starts a batch evaluation job that evaluates agent performance across multiple sessions. Batch evaluations pull agent traces from CloudWatch Logs or an existing online evaluation configuration and run specified evaluators and insights against them.

See also: AWS API Documentation

Request Syntax

client.start_batch_evaluation(
    batchEvaluationName='string',
    evaluators=[
        {
            'evaluatorId': 'string'
        },
    ],
    dataSourceConfig={
        'cloudWatchLogs': {
            'serviceNames': [
                'string',
            ],
            'logGroupNames': [
                'string',
            ],
            'filterConfig': {
                'sessionIds': [
                    'string',
                ],
                'timeRange': {
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1)
                }
            }
        }
    },
    clientToken='string',
    evaluationMetadata={
        'sessionMetadata': [
            {
                'sessionId': 'string',
                'testScenarioId': 'string',
                'groundTruth': {
                    'inline': {
                        'assertions': [
                            {
                                'text': 'string'
                            },
                        ],
                        'expectedTrajectory': {
                            'toolNames': [
                                'string',
                            ]
                        },
                        'turns': [
                            {
                                'input': {
                                    'prompt': 'string'
                                },
                                'expectedResponse': {
                                    'text': 'string'
                                }
                            },
                        ]
                    }
                },
                'metadata': {
                    'string': 'string'
                }
            },
        ]
    },
    description='string'
)
type batchEvaluationName:

string

param batchEvaluationName:

[REQUIRED]

The name of the batch evaluation. Must be unique within your account.

type evaluators:

list

param evaluators:

The list of evaluators to apply during the batch evaluation. Can include both built-in evaluators and custom evaluators. Maximum of 10 evaluators.

  • (dict) --

    An evaluator to run against sessions.

    • evaluatorId (string) -- [REQUIRED]

      The unique identifier of the evaluator. Can reference built-in evaluators (e.g., Builtin.Helpfulness) or custom evaluators.

type dataSourceConfig:

dict

param dataSourceConfig:

[REQUIRED]

The data source configuration that specifies where to pull agent session traces from for evaluation.

  • cloudWatchLogs (dict) --

    Pulls session spans from CloudWatch Logs.

    • serviceNames (list) -- [REQUIRED]

      The list of agent service names to filter traces within the specified log groups.

      • (string) --

    • logGroupNames (list) -- [REQUIRED]

      The list of CloudWatch log group names to read agent traces from. Maximum of 5 log groups.

      • (string) --

    • filterConfig (dict) --

      Optional filter configuration to narrow down which sessions to evaluate.

      • sessionIds (list) --

        A list of specific session IDs to evaluate. If specified, only these sessions are included in the evaluation.

        • (string) --

      • timeRange (dict) --

        The time range filter for selecting sessions to evaluate.

        • startTime (datetime) --

          The start time of the time range. Only sessions with activity at or after this timestamp are included.

        • endTime (datetime) --

          The end time of the time range. Only sessions with activity before this timestamp are included.

type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, the service ignores the request, but does not return an error.

This field is autopopulated if not provided.

type evaluationMetadata:

dict

param evaluationMetadata:

Optional metadata for the evaluation, including session-specific ground truth data and test scenario identifiers.

  • sessionMetadata (list) --

    A list of session metadata entries containing ground truth data and test scenario identifiers for specific sessions.

    • (dict) --

      Metadata for a specific session in a batch evaluation, including ground truth data and test scenario identifiers.

      • sessionId (string) -- [REQUIRED]

        The unique identifier of the session this metadata applies to.

      • testScenarioId (string) --

        An optional test scenario identifier for categorizing and tracking evaluation results.

      • groundTruth (dict) --

        The ground truth data for this session, including expected responses and assertions.

        • inline (dict) --

          Ground truth data provided inline.

          • assertions (list) --

            The assertions to use for evaluation (shares the common EvaluationContentList model).

            • (dict) --

              A content block for ground truth data in evaluation reference inputs. Supports text content for expected responses and assertions.

              • text (string) --

                The text content of the ground truth data. Used for expected response text and assertion statements.

          • expectedTrajectory (dict) --

            The expected trajectory to use for evaluation (shares the common EvaluationExpectedTrajectory model).

            • toolNames (list) --

              The list of tool names representing the expected tool call sequence.

              • (string) --

          • turns (list) --

            A list of per-turn ground truth data, each containing an input prompt and expected response.

            • (dict) --

              Ground truth data for a single conversation turn.

              • input (dict) --

                The input for this conversation turn.

                • prompt (string) --

                  The text prompt for this conversation turn.

              • expectedResponse (dict) --

                The expected response for this conversation turn.

                • text (string) --

                  The text content of the ground truth data. Used for expected response text and assertion statements.

      • metadata (dict) --

        Additional key-value metadata associated with this session.

        • (string) --

          • (string) --

type description:

string

param description:

The description of the batch evaluation.

rtype:

dict

returns:

Response Syntax

{
    'batchEvaluationId': 'string',
    'batchEvaluationArn': 'string',
    'batchEvaluationName': 'string',
    'evaluators': [
        {
            'evaluatorId': 'string'
        },
    ],
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'COMPLETED_WITH_ERRORS'|'FAILED'|'STOPPING'|'STOPPED'|'DELETING',
    'createdAt': datetime(2015, 1, 1),
    'outputConfig': {
        'cloudWatchConfig': {
            'logGroupName': 'string',
            'logStreamName': 'string'
        }
    },
    'description': 'string'
}

Response Structure

  • (dict) --

    • batchEvaluationId (string) --

      The unique identifier of the created batch evaluation.

    • batchEvaluationArn (string) --

      The Amazon Resource Name (ARN) of the created batch evaluation.

    • batchEvaluationName (string) --

      The name of the batch evaluation.

    • evaluators (list) --

      The list of evaluators applied during the batch evaluation.

      • (dict) --

        An evaluator to run against sessions.

        • evaluatorId (string) --

          The unique identifier of the evaluator. Can reference built-in evaluators (e.g., Builtin.Helpfulness) or custom evaluators.

    • status (string) --

      The status of the batch evaluation.

    • createdAt (datetime) --

      The timestamp when the batch evaluation was created.

    • outputConfig (dict) --

      The output configuration specifying where evaluation results are written.

      • cloudWatchConfig (dict) --

        The CloudWatch Logs configuration for writing evaluation results.

        • logGroupName (string) --

          The name of the CloudWatch log group where evaluation results will be written.

        • logStreamName (string) --

          The name of the CloudWatch log stream where evaluation results will be written.

    • description (string) --

      The description of the batch evaluation.
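The request has two documented limits (at most 10 evaluators and at most 5 log groups), which can be checked client-side before the call. The sketch below builds the keyword arguments for `start_batch_evaluation`. The helper name, the sample log group and service names, and the `start_time < end_time` check are hypothetical; the service may enforce additional rules not shown here.

```python
from datetime import datetime, timezone
from typing import Any, Dict, Optional, Sequence

MAX_EVALUATORS = 10  # "Maximum of 10 evaluators"
MAX_LOG_GROUPS = 5   # "Maximum of 5 log groups"


def build_start_batch_evaluation_request(
    name: str,
    evaluator_ids: Sequence[str],
    service_names: Sequence[str],
    log_group_names: Sequence[str],
    start_time: Optional[datetime] = None,
    end_time: Optional[datetime] = None,
    description: Optional[str] = None,
) -> Dict[str, Any]:
    """Validate inputs and return kwargs for client.start_batch_evaluation."""
    if not name:
        raise ValueError("batchEvaluationName is required")
    if len(evaluator_ids) > MAX_EVALUATORS:
        raise ValueError(f"at most {MAX_EVALUATORS} evaluators are allowed")
    if not service_names or not log_group_names:
        raise ValueError("serviceNames and logGroupNames are required")
    if len(log_group_names) > MAX_LOG_GROUPS:
        raise ValueError(f"at most {MAX_LOG_GROUPS} log groups are allowed")

    cloud_watch_logs: Dict[str, Any] = {
        "serviceNames": list(service_names),
        "logGroupNames": list(log_group_names),
    }
    # startTime is inclusive and endTime is exclusive per the parameter docs.
    time_range: Dict[str, datetime] = {}
    if start_time is not None:
        time_range["startTime"] = start_time
    if end_time is not None:
        time_range["endTime"] = end_time
    if start_time is not None and end_time is not None and start_time >= end_time:
        raise ValueError("start_time must be earlier than end_time")
    if time_range:
        cloud_watch_logs["filterConfig"] = {"timeRange": time_range}

    request: Dict[str, Any] = {
        "batchEvaluationName": name,
        "dataSourceConfig": {"cloudWatchLogs": cloud_watch_logs},
    }
    if evaluator_ids:
        request["evaluators"] = [{"evaluatorId": e} for e in evaluator_ids]
    if description:
        request["description"] = description
    return request


request = build_start_batch_evaluation_request(
    name="nightly-eval",
    evaluator_ids=["Builtin.Helpfulness"],
    service_names=["example-agent"],        # hypothetical service name
    log_group_names=["/example/agent"],     # hypothetical log group name
    start_time=datetime(2026, 4, 1, tzinfo=timezone.utc),
    end_time=datetime(2026, 4, 2, tzinfo=timezone.utc),
)
# With a real client: client.start_batch_evaluation(**request)
```

`clientToken` is left out because the field is autopopulated when not provided; supply one explicitly if retries must be idempotent across processes.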

GetBatchEvaluation (new)

Retrieves detailed information about a batch evaluation, including its status, configuration, results, and any error details.

See also: AWS API Documentation

Request Syntax

client.get_batch_evaluation(
    batchEvaluationId='string'
)
type batchEvaluationId:

string

param batchEvaluationId:

[REQUIRED]

The unique identifier of the batch evaluation to retrieve.

rtype:

dict

returns:

Response Syntax

{
    'batchEvaluationId': 'string',
    'batchEvaluationArn': 'string',
    'batchEvaluationName': 'string',
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'COMPLETED_WITH_ERRORS'|'FAILED'|'STOPPING'|'STOPPED'|'DELETING',
    'createdAt': datetime(2015, 1, 1),
    'evaluators': [
        {
            'evaluatorId': 'string'
        },
    ],
    'dataSourceConfig': {
        'cloudWatchLogs': {
            'serviceNames': [
                'string',
            ],
            'logGroupNames': [
                'string',
            ],
            'filterConfig': {
                'sessionIds': [
                    'string',
                ],
                'timeRange': {
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1)
                }
            }
        }
    },
    'outputConfig': {
        'cloudWatchConfig': {
            'logGroupName': 'string',
            'logStreamName': 'string'
        }
    },
    'evaluationResults': {
        'numberOfSessionsCompleted': 123,
        'numberOfSessionsInProgress': 123,
        'numberOfSessionsFailed': 123,
        'totalNumberOfSessions': 123,
        'numberOfSessionsIgnored': 123,
        'evaluatorSummaries': [
            {
                'evaluatorId': 'string',
                'statistics': {
                    'averageScore': 123.0
                },
                'totalEvaluated': 123,
                'totalFailed': 123
            },
        ]
    },
    'errorDetails': [
        'string',
    ],
    'description': 'string',
    'updatedAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • batchEvaluationId (string) --

      The unique identifier of the batch evaluation.

    • batchEvaluationArn (string) --

      The Amazon Resource Name (ARN) of the batch evaluation.

    • batchEvaluationName (string) --

      The name of the batch evaluation.

    • status (string) --

      The current status of the batch evaluation.

    • createdAt (datetime) --

      The timestamp when the batch evaluation was created.

    • evaluators (list) --

      The list of evaluators applied during the batch evaluation.

      • (dict) --

        An evaluator to run against sessions.

        • evaluatorId (string) --

          The unique identifier of the evaluator. Can reference built-in evaluators (e.g., Builtin.Helpfulness) or custom evaluators.

    • dataSourceConfig (dict) --

      The data source configuration specifying where agent traces are pulled from.

      • cloudWatchLogs (dict) --

        Pulls session spans from CloudWatch Logs.

        • serviceNames (list) --

          The list of agent service names to filter traces within the specified log groups.

          • (string) --

        • logGroupNames (list) --

          The list of CloudWatch log group names to read agent traces from. Maximum of 5 log groups.

          • (string) --

        • filterConfig (dict) --

          Optional filter configuration to narrow down which sessions to evaluate.

          • sessionIds (list) --

            A list of specific session IDs to evaluate. If specified, only these sessions are included in the evaluation.

            • (string) --

          • timeRange (dict) --

            The time range filter for selecting sessions to evaluate.

            • startTime (datetime) --

              The start time of the time range. Only sessions with activity at or after this timestamp are included.

            • endTime (datetime) --

              The end time of the time range. Only sessions with activity before this timestamp are included.

    • outputConfig (dict) --

      The output configuration specifying where evaluation results are written.

      • cloudWatchConfig (dict) --

        The CloudWatch Logs configuration for writing evaluation results.

        • logGroupName (string) --

          The name of the CloudWatch log group where evaluation results will be written.

        • logStreamName (string) --

          The name of the CloudWatch log stream where evaluation results will be written.

    • evaluationResults (dict) --

      The aggregated evaluation results, including session completion counts and evaluator score summaries.

      • numberOfSessionsCompleted (integer) --

        The number of sessions that have been successfully evaluated.

      • numberOfSessionsInProgress (integer) --

        The number of sessions currently being evaluated.

      • numberOfSessionsFailed (integer) --

        The number of sessions that failed evaluation.

      • totalNumberOfSessions (integer) --

        The total number of sessions included in the batch evaluation.

      • numberOfSessionsIgnored (integer) --

        The number of sessions that were ignored during evaluation.

      • evaluatorSummaries (list) --

        A list of per-evaluator summary statistics.

        • (dict) --

          Summary statistics for a single evaluator within a batch evaluation.

          • evaluatorId (string) --

            The unique identifier of the evaluator.

          • statistics (dict) --

            The aggregated statistics for this evaluator.

            • averageScore (float) --

              The average score across all evaluated sessions for this evaluator.

          • totalEvaluated (integer) --

            The total number of sessions evaluated by this evaluator.

          • totalFailed (integer) --

            The total number of sessions that failed evaluation by this evaluator.

    • errorDetails (list) --

      The error details if the batch evaluation encountered failures.

      • (string) --

    • description (string) --

      The description of the batch evaluation.

    • updatedAt (datetime) --

      The timestamp when the batch evaluation was last updated.
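Because batch evaluations run asynchronously, a caller typically polls this operation until the job finishes. The sketch below shows one way to do that and to condense `evaluationResults` into a short summary. The function names, the polling interval, and the `_StubClient` class are hypothetical. The set of terminal statuses is an assumption inferred from the status enum, as the reference does not say which statuses are final.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these statuses mean the job will not change further.
TERMINAL_STATUSES = {"COMPLETED", "COMPLETED_WITH_ERRORS", "FAILED", "STOPPED"}


def wait_for_batch_evaluation(
    client: Any,
    batch_evaluation_id: str,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_batch_evaluation until an assumed-terminal status is seen."""
    for attempt in range(max_polls):
        response = client.get_batch_evaluation(batchEvaluationId=batch_evaluation_id)
        if response.get("status") in TERMINAL_STATUSES:
            return response
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"batch evaluation {batch_evaluation_id} did not finish")


def summarize_batch_evaluation(response: Dict[str, Any]) -> Dict[str, Any]:
    """Reduce evaluationResults to a completion fraction and per-evaluator averages."""
    results = response.get("evaluationResults") or {}
    total = results.get("totalNumberOfSessions", 0)
    completed = results.get("numberOfSessionsCompleted", 0)
    return {
        "status": response.get("status"),
        "completedFraction": completed / total if total else 0.0,
        "averageScores": {
            s["evaluatorId"]: s.get("statistics", {}).get("averageScore")
            for s in results.get("evaluatorSummaries", [])
        },
    }


class _StubClient:
    """Hypothetical stand-in for a boto3 client: IN_PROGRESS, then COMPLETED."""

    def __init__(self) -> None:
        self._responses = iter([
            {"status": "IN_PROGRESS"},
            {
                "status": "COMPLETED",
                "evaluationResults": {
                    "numberOfSessionsCompleted": 3,
                    "numberOfSessionsFailed": 1,
                    "totalNumberOfSessions": 4,
                    "evaluatorSummaries": [{
                        "evaluatorId": "Builtin.Helpfulness",
                        "statistics": {"averageScore": 0.75},
                        "totalEvaluated": 3,
                        "totalFailed": 1,
                    }],
                },
            },
        ])

    def get_batch_evaluation(self, batchEvaluationId: str) -> Dict[str, Any]:
        return next(self._responses)


# The no-op sleep keeps the demo instant; omit it to use time.sleep.
final = wait_for_batch_evaluation(_StubClient(), "example-batch-id", sleep=lambda _s: None)
summary = summarize_batch_evaluation(final)
print(summary)
```

Injecting `sleep` keeps the helper testable without real delays. Detailed per-session results are written to the log group and stream named in `outputConfig`, so this summary covers only the aggregate counts and scores.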

UpdateABTest (new)

Updates an A/B test's configuration, including variants, traffic allocation, evaluation settings, or execution status.

See also: AWS API Documentation

Request Syntax

client.update_ab_test(
    abTestId='string',
    clientToken='string',
    name='string',
    description='string',
    variants=[
        {
            'name': 'string',
            'weight': 123,
            'variantConfiguration': {
                'configurationBundle': {
                    'bundleArn': 'string',
                    'bundleVersion': 'string'
                },
                'target': {
                    'name': 'string'
                }
            }
        },
    ],
    gatewayFilter={
        'targetPaths': [
            'string',
        ]
    },
    evaluationConfig={
        'onlineEvaluationConfigArn': 'string',
        'perVariantOnlineEvaluationConfig': [
            {
                'name': 'string',
                'onlineEvaluationConfigArn': 'string'
            },
        ]
    },
    roleArn='string',
    executionStatus='PAUSED'|'RUNNING'|'STOPPED'|'NOT_STARTED'
)
type abTestId:

string

param abTestId:

[REQUIRED]

The unique identifier of the A/B test to update.

type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, the service ignores the request, but does not return an error.

This field is autopopulated if not provided.

type name:

string

param name:

The updated name of the A/B test.

type description:

string

param description:

The updated description of the A/B test.

type variants:

list

param variants:

The updated list of variants.

  • (dict) --

    A variant in an A/B test, representing either the control (C) or treatment (T1) configuration.

    • name (string) -- [REQUIRED]

      The name of the variant. Must be C for control or T1 for treatment.

    • weight (integer) -- [REQUIRED]

      The percentage of traffic to route to this variant. Weights across all variants must sum to 100.

    • variantConfiguration (dict) -- [REQUIRED]

      The configuration for this variant, including the configuration bundle or target reference.

      • configurationBundle (dict) --

        A reference to a configuration bundle version to use for this variant.

        • bundleArn (string) -- [REQUIRED]

          The Amazon Resource Name (ARN) of the configuration bundle.

        • bundleVersion (string) -- [REQUIRED]

          The version of the configuration bundle.

      • target (dict) --

        A reference to a gateway target to route traffic to for this variant.

        • name (string) -- [REQUIRED]

          The name of the gateway target.

type gatewayFilter:

dict

param gatewayFilter:

The updated gateway filter.

  • targetPaths (list) --

    A list of target path patterns to include in the A/B test.

    • (string) --

type evaluationConfig:

dict

param evaluationConfig:

The updated evaluation configuration.

  • onlineEvaluationConfigArn (string) --

    The Amazon Resource Name (ARN) of a single online evaluation configuration to use for both variants.

  • perVariantOnlineEvaluationConfig (list) --

    Per-variant online evaluation configurations, allowing different evaluation settings for each variant.

    • (dict) --

      An online evaluation configuration associated with a specific A/B test variant.

      • name (string) -- [REQUIRED]

        The name of the variant this evaluation configuration applies to.

      • onlineEvaluationConfigArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of the online evaluation configuration for this variant.

type roleArn:

string

param roleArn:

The updated IAM role ARN.

type executionStatus:

string

param executionStatus:

The updated execution status to enable or disable the A/B test.

rtype:

dict

returns:

Response Syntax

{
    'abTestId': 'string',
    'abTestArn': 'string',
    'status': 'CREATING'|'ACTIVE'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'|'DELETE_FAILED'|'FAILED',
    'executionStatus': 'PAUSED'|'RUNNING'|'STOPPED'|'NOT_STARTED',
    'updatedAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • abTestId (string) --

      The unique identifier of the updated A/B test.

    • abTestArn (string) --

      The Amazon Resource Name (ARN) of the updated A/B test.

    • status (string) --

      The status of the A/B test.

    • executionStatus (string) --

      The execution status of the A/B test.

    • updatedAt (datetime) --

      The timestamp when the A/B test was updated.
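
The variants parameter documents two constraints: each name must be C or T1, and the weights must sum to 100. A client-side check before calling update_ab_test might look like the sketch below. The helper check_variants, the target names, and the placeholder ID are assumptions made for illustration; the service performs its own validation regardless.

```python
ALLOWED_VARIANT_NAMES = {"C", "T1"}


def check_variants(variants):
    """Raise ValueError if the variants break the documented name or weight rules."""
    for variant in variants:
        if variant["name"] not in ALLOWED_VARIANT_NAMES:
            raise ValueError(f"unexpected variant name: {variant['name']!r}")
    total = sum(variant["weight"] for variant in variants)
    if total != 100:
        raise ValueError(f"variant weights sum to {total}, expected 100")
    return variants


# Hypothetical 90/10 split between a baseline and a candidate gateway target.
variants = check_variants([
    {"name": "C", "weight": 90,
     "variantConfiguration": {"target": {"name": "baseline-target"}}},
    {"name": "T1", "weight": 10,
     "variantConfiguration": {"target": {"name": "candidate-target"}}},
])

# With a real client, the validated list would then be passed along, e.g.:
# client.update_ab_test(abTestId="<your-ab-test-id>", variants=variants)
```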

ListBatchEvaluations (new)

Lists all batch evaluations in the account, providing summary information about each evaluation's status and configuration.

See also: AWS API Documentation

Request Syntax

client.list_batch_evaluations(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of results to return in the response. If the total number of results is greater than this value, use the token returned in the response in the nextToken field when making another request to return the next batch of results.

type nextToken:

string

param nextToken:

If the total number of results is greater than the maxResults value provided in the request, set this field to the token returned in the nextToken field of the previous response to return the next batch of results.

rtype:

dict

returns:

Response Syntax

{
    'batchEvaluations': [
        {
            'batchEvaluationId': 'string',
            'batchEvaluationArn': 'string',
            'batchEvaluationName': 'string',
            'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'COMPLETED_WITH_ERRORS'|'FAILED'|'STOPPING'|'STOPPED'|'DELETING',
            'createdAt': datetime(2015, 1, 1),
            'description': 'string',
            'evaluators': [
                {
                    'evaluatorId': 'string'
                },
            ],
            'evaluationResults': {
                'numberOfSessionsCompleted': 123,
                'numberOfSessionsInProgress': 123,
                'numberOfSessionsFailed': 123,
                'totalNumberOfSessions': 123,
                'numberOfSessionsIgnored': 123,
                'evaluatorSummaries': [
                    {
                        'evaluatorId': 'string',
                        'statistics': {
                            'averageScore': 123.0
                        },
                        'totalEvaluated': 123,
                        'totalFailed': 123
                    },
                ]
            },
            'errorDetails': [
                'string',
            ],
            'updatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • batchEvaluations (list) --

      The list of batch evaluation summaries.

      • (dict) --

        A summary of a batch evaluation, as returned in list responses.

        • batchEvaluationId (string) --

          The unique identifier of the batch evaluation.

        • batchEvaluationArn (string) --

          The Amazon Resource Name (ARN) of the batch evaluation.

        • batchEvaluationName (string) --

          The name of the batch evaluation.

        • status (string) --

          The current status of the batch evaluation.

        • createdAt (datetime) --

          The timestamp when the batch evaluation was created.

        • description (string) --

          The description of the batch evaluation.

        • evaluators (list) --

          The list of evaluators applied during the batch evaluation.

          • (dict) --

            An evaluator to run against sessions.

            • evaluatorId (string) --

              The unique identifier of the evaluator. Can reference built-in evaluators (e.g., Builtin.Helpfulness) or custom evaluators.

        • evaluationResults (dict) --

          The aggregated evaluation results.

          • numberOfSessionsCompleted (integer) --

            The number of sessions that have been successfully evaluated.

          • numberOfSessionsInProgress (integer) --

            The number of sessions currently being evaluated.

          • numberOfSessionsFailed (integer) --

            The number of sessions that failed evaluation.

          • totalNumberOfSessions (integer) --

            The total number of sessions included in the batch evaluation.

          • numberOfSessionsIgnored (integer) --

            The number of sessions that were ignored during evaluation.

          • evaluatorSummaries (list) --

            A list of per-evaluator summary statistics.

            • (dict) --

              Summary statistics for a single evaluator within a batch evaluation.

              • evaluatorId (string) --

                The unique identifier of the evaluator.

              • statistics (dict) --

                The aggregated statistics for this evaluator.

                • averageScore (float) --

                  The average score across all evaluated sessions for this evaluator.

              • totalEvaluated (integer) --

                The total number of sessions evaluated by this evaluator.

              • totalFailed (integer) --

                The total number of sessions that failed evaluation by this evaluator.

        • errorDetails (list) --

          The error details if the batch evaluation encountered failures.

          • (string) --

        • updatedAt (datetime) --

          The timestamp when the batch evaluation was last updated.

    • nextToken (string) --

      If the total number of results is greater than the maxResults value provided in the request, pass this token in the nextToken field of another request to return the next batch of results.
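
The maxResults and nextToken fields describe a standard token-based pagination loop. The sketch below follows nextToken until the response omits it. The generator iter_batch_evaluations, the page size of 50, and the stand-in fake_list_page are assumptions for illustration; with a real client, client.list_batch_evaluations would be passed in place of the stand-in.

```python
def iter_batch_evaluations(list_page, page_size=50):
    """Yield every batch evaluation summary, following nextToken until it is absent."""
    kwargs = {"maxResults": page_size}
    while True:
        page = list_page(**kwargs)
        yield from page.get("batchEvaluations", [])
        token = page.get("nextToken")
        if not token:
            return
        kwargs["nextToken"] = token


def fake_list_page(maxResults, nextToken=None):
    """Stand-in for client.list_batch_evaluations that returns two canned pages."""
    pages = {
        None: {
            "batchEvaluations": [{"batchEvaluationId": "be-1", "status": "COMPLETED"}],
            "nextToken": "t1",
        },
        "t1": {
            "batchEvaluations": [{"batchEvaluationId": "be-2", "status": "IN_PROGRESS"}],
        },
    }
    return pages[nextToken]


evaluation_ids = [
    item["batchEvaluationId"] for item in iter_batch_evaluations(fake_list_page)
]
```

Taking the list call as an argument keeps the loop independent of any client object, which also makes it straightforward to exercise without network access.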

GetABTest (new)

Retrieves detailed information about an A/B test, including its configuration, status, and statistical results.

See also: AWS API Documentation

Request Syntax

client.get_ab_test(
    abTestId='string'
)
type abTestId:

string

param abTestId:

[REQUIRED]

The unique identifier of the A/B test to retrieve.

rtype:

dict

returns:

Response Syntax

{
    'abTestId': 'string',
    'abTestArn': 'string',
    'name': 'string',
    'description': 'string',
    'status': 'CREATING'|'ACTIVE'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'|'DELETE_FAILED'|'FAILED',
    'executionStatus': 'PAUSED'|'RUNNING'|'STOPPED'|'NOT_STARTED',
    'gatewayArn': 'string',
    'variants': [
        {
            'name': 'string',
            'weight': 123,
            'variantConfiguration': {
                'configurationBundle': {
                    'bundleArn': 'string',
                    'bundleVersion': 'string'
                },
                'target': {
                    'name': 'string'
                }
            }
        },
    ],
    'gatewayFilter': {
        'targetPaths': [
            'string',
        ]
    },
    'evaluationConfig': {
        'onlineEvaluationConfigArn': 'string',
        'perVariantOnlineEvaluationConfig': [
            {
                'name': 'string',
                'onlineEvaluationConfigArn': 'string'
            },
        ]
    },
    'roleArn': 'string',
    'currentRunId': 'string',
    'errorDetails': [
        'string',
    ],
    'startedAt': datetime(2015, 1, 1),
    'stoppedAt': datetime(2015, 1, 1),
    'maxDurationExpiresAt': datetime(2015, 1, 1),
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1),
    'results': {
        'analysisTimestamp': datetime(2015, 1, 1),
        'evaluatorMetrics': [
            {
                'evaluatorArn': 'string',
                'controlStats': {
                    'variantName': 'string',
                    'sampleSize': 123,
                    'mean': 123.0
                },
                'variantResults': [
                    {
                        'variantName': 'string',
                        'sampleSize': 123,
                        'mean': 123.0,
                        'absoluteChange': 123.0,
                        'percentChange': 123.0,
                        'pValue': 123.0,
                        'confidenceInterval': {
                            'lower': 123.0,
                            'upper': 123.0
                        },
                        'isSignificant': True|False
                    },
                ]
            },
        ]
    }
}

Response Structure

  • (dict) --

    • abTestId (string) --

      The unique identifier of the A/B test.

    • abTestArn (string) --

      The Amazon Resource Name (ARN) of the A/B test.

    • name (string) --

      The name of the A/B test.

    • description (string) --

      The description of the A/B test.

    • status (string) --

      The current status of the A/B test.

    • executionStatus (string) --

      The execution status indicating whether the A/B test is currently running.

    • gatewayArn (string) --

      The Amazon Resource Name (ARN) of the gateway used for traffic splitting.

    • variants (list) --

      The list of variants in the A/B test.

      • (dict) --

        A variant in an A/B test, representing either the control (C) or treatment (T1) configuration.

        • name (string) --

          The name of the variant. Must be C for control or T1 for treatment.

        • weight (integer) --

          The percentage of traffic to route to this variant. Weights across all variants must sum to 100.

        • variantConfiguration (dict) --

          The configuration for this variant, including the configuration bundle or target reference.

          • configurationBundle (dict) --

            A reference to a configuration bundle version to use for this variant.

            • bundleArn (string) --

              The Amazon Resource Name (ARN) of the configuration bundle.

            • bundleVersion (string) --

              The version of the configuration bundle.

          • target (dict) --

            A reference to a gateway target to route traffic to for this variant.

            • name (string) --

              The name of the gateway target.

    • gatewayFilter (dict) --

      The gateway filter restricting which target paths are included.

      • targetPaths (list) --

        A list of target path patterns to include in the A/B test.

        • (string) --

    • evaluationConfig (dict) --

      The evaluation configuration for measuring variant performance.

      • onlineEvaluationConfigArn (string) --

        The Amazon Resource Name (ARN) of a single online evaluation configuration to use for both variants.

      • perVariantOnlineEvaluationConfig (list) --

        Per-variant online evaluation configurations, allowing different evaluation settings for each variant.

        • (dict) --

          An online evaluation configuration associated with a specific A/B test variant.

          • name (string) --

            The name of the variant this evaluation configuration applies to.

          • onlineEvaluationConfigArn (string) --

            The Amazon Resource Name (ARN) of the online evaluation configuration for this variant.

    • roleArn (string) --

      The IAM role ARN used by the A/B test.

    • currentRunId (string) --

      The identifier of the current run of the A/B test.

    • errorDetails (list) --

      The error details if the A/B test encountered failures.

      • (string) --

    • startedAt (datetime) --

      The timestamp when the A/B test was started.

    • stoppedAt (datetime) --

      The timestamp when the A/B test was stopped.

    • maxDurationExpiresAt (datetime) --

      The timestamp when the A/B test will automatically expire.

    • createdAt (datetime) --

      The timestamp when the A/B test was created.

    • updatedAt (datetime) --

      The timestamp when the A/B test was last updated.

    • results (dict) --

      The statistical results of the A/B test, including per-evaluator metrics and significance analysis.

      • analysisTimestamp (datetime) --

        The timestamp when the analysis was performed.

      • evaluatorMetrics (list) --

        The per-evaluator metrics comparing control and treatment variants.

        • (dict) --

          Statistical metrics for a single evaluator comparing control and treatment variants.

          • evaluatorArn (string) --

            The Amazon Resource Name (ARN) of the evaluator.

          • controlStats (dict) --

            The statistics for the control variant.

            • variantName (string) --

              The name of the control variant.

            • sampleSize (integer) --

              The number of sessions evaluated for the control variant.

            • mean (float) --

              The mean evaluation score for the control variant.

          • variantResults (list) --

            The results for each treatment variant compared against the control.

            • (dict) --

              Statistical results for a treatment variant compared against the control.

              • variantName (string) --

                The name of the treatment variant.

              • sampleSize (integer) --

                The number of sessions evaluated for this variant.

              • mean (float) --

                The mean evaluation score for this variant.

              • absoluteChange (float) --

                The absolute change in mean score compared to the control variant.

              • percentChange (float) --

                The percentage change in mean score compared to the control variant.

              • pValue (float) --

                The p-value indicating the statistical significance of the observed difference.

              • confidenceInterval (dict) --

                The confidence interval for the observed difference.

                • lower (float) --

                  The lower bound of the confidence interval.

                • upper (float) --

                  The upper bound of the confidence interval.

              • isSignificant (boolean) --

                Whether the observed difference is statistically significant.
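
The results structure nests treatment comparisons under each evaluator. One way to pull out only the statistically significant changes is sketched below, operating on a dict shaped like the documented results field. The helper significant_changes, the placeholder evaluator ARNs, and the sample numbers are illustrative assumptions.

```python
def significant_changes(results):
    """Return (evaluatorArn, variantName, absoluteChange) for each significant variant result."""
    findings = []
    for metric in results.get("evaluatorMetrics", []):
        for variant in metric.get("variantResults", []):
            if variant.get("isSignificant"):
                findings.append(
                    (metric["evaluatorArn"], variant["variantName"], variant["absoluteChange"])
                )
    return findings


# Hypothetical sample shaped like the 'results' field of a get_ab_test response.
sample_ab_results = {
    "evaluatorMetrics": [
        {
            "evaluatorArn": "example-evaluator-arn-1",
            "controlStats": {"variantName": "C", "sampleSize": 200, "mean": 3.5},
            "variantResults": [
                {"variantName": "T1", "sampleSize": 200, "mean": 4.0,
                 "absoluteChange": 0.5, "pValue": 0.01, "isSignificant": True},
            ],
        },
        {
            "evaluatorArn": "example-evaluator-arn-2",
            "controlStats": {"variantName": "C", "sampleSize": 200, "mean": 2.0},
            "variantResults": [
                {"variantName": "T1", "sampleSize": 200, "mean": 2.25,
                 "absoluteChange": 0.25, "pValue": 0.4, "isSignificant": False},
            ],
        },
    ]
}
findings = significant_changes(sample_ab_results)
```

The sketch relies on the service-reported isSignificant flag rather than re-deriving significance from pValue, since the threshold the service applies is not stated in this reference.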

StartRecommendation (new)

Starts a recommendation job that analyzes agent traces and generates optimization suggestions for system prompts or tool descriptions to improve agent performance.

See also: AWS API Documentation

Request Syntax

client.start_recommendation(
    name='string',
    description='string',
    type='SYSTEM_PROMPT_RECOMMENDATION'|'TOOL_DESCRIPTION_RECOMMENDATION',
    recommendationConfig={
        'systemPromptRecommendationConfig': {
            'systemPrompt': {
                'text': 'string',
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'systemPromptJsonPath': 'string'
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            },
            'evaluationConfig': {
                'evaluators': [
                    {
                        'evaluatorArn': 'string'
                    },
                ]
            }
        },
        'toolDescriptionRecommendationConfig': {
            'toolDescription': {
                'toolDescriptionText': {
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescription': {
                                'text': 'string'
                            }
                        },
                    ]
                },
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescriptionJsonPath': 'string'
                        },
                    ]
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            }
        }
    },
    clientToken='string'
)
type name:

string

param name:

[REQUIRED]

The name of the recommendation. Must be unique within your account.

type description:

string

param description:

The description of the recommendation.

type type:

string

param type:

[REQUIRED]

The type of recommendation to generate. Valid values are SYSTEM_PROMPT_RECOMMENDATION for system prompt optimization or TOOL_DESCRIPTION_RECOMMENDATION for tool description optimization.

type recommendationConfig:

dict

param recommendationConfig:

[REQUIRED]

The configuration for the recommendation, including the input to optimize, agent traces to analyze, and evaluation settings.

  • systemPromptRecommendationConfig (dict) --

    The configuration for a system prompt recommendation.

    • systemPrompt (dict) -- [REQUIRED]

      The current system prompt to optimize.

      • text (string) --

        The system prompt text provided inline.

      • configurationBundle (dict) --

        The system prompt sourced from a configuration bundle version.

        • bundleArn (string) -- [REQUIRED]

          The Amazon Resource Name (ARN) of the configuration bundle.

        • versionId (string) -- [REQUIRED]

          The version identifier of the configuration bundle.

        • systemPromptJsonPath (string) -- [REQUIRED]

          The JSON path within the configuration bundle that contains the system prompt.

    • agentTraces (dict) -- [REQUIRED]

      The agent traces to analyze for generating recommendations.

      • sessionSpans (list) --

        Agent traces provided as inline session spans in OpenTelemetry format.

        • (:ref:`document<document>`) --

      • cloudwatchLogs (dict) --

        Agent traces read from CloudWatch Logs.

        • logGroupArns (list) -- [REQUIRED]

          The list of CloudWatch log group ARNs to read agent traces from.

          • (string) --

        • serviceNames (list) -- [REQUIRED]

          The list of service names to filter traces within the specified log groups.

          • (string) --

        • startTime (datetime) -- [REQUIRED]

          The start time of the time range to read traces from.

        • endTime (datetime) -- [REQUIRED]

          The end time of the time range to read traces from.

        • rule (dict) --

          Optional rule configuration for filtering traces.

          • filters (list) --

            The list of filters to apply when reading agent traces.

            • (dict) --

              A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

              • key (string) -- [REQUIRED]

                The key or field name to filter on within the agent trace data.

              • operator (string) -- [REQUIRED]

                The comparison operator to use for filtering.

              • value (dict) -- [REQUIRED]

                The value to compare against using the specified operator.

                • stringValue (string) --

                  A string value for text-based filtering.

                • doubleValue (float) --

                  A numeric value for numerical filtering and comparisons.

                • booleanValue (boolean) --

                  A boolean value for true/false filtering conditions.

    • evaluationConfig (dict) -- [REQUIRED]

      The evaluation configuration specifying which evaluators to use for assessing recommendation quality.

      • evaluators (list) -- [REQUIRED]

        The list of evaluators to use for assessing recommendation quality.

        • (dict) --

          A reference to an evaluator used for recommendation assessment.

          • evaluatorArn (string) -- [REQUIRED]

            The Amazon Resource Name (ARN) of the evaluator.

  • toolDescriptionRecommendationConfig (dict) --

    The configuration for a tool description recommendation.

    • toolDescription (dict) -- [REQUIRED]

      The current tool descriptions to optimize.

      • toolDescriptionText (dict) --

        Tool descriptions provided as inline text.

        • tools (list) -- [REQUIRED]

          The list of tool descriptions to optimize.

          • (dict) --

            A tool description input containing the tool name and its current description.

            • toolName (string) -- [REQUIRED]

              The name of the tool.

            • toolDescription (dict) -- [REQUIRED]

              The current description of the tool to optimize.

              • text (string) --

                The tool description as inline text.

      • configurationBundle (dict) --

        Tool descriptions sourced from a configuration bundle version.

        • bundleArn (string) -- [REQUIRED]

          The Amazon Resource Name (ARN) of the configuration bundle.

        • versionId (string) -- [REQUIRED]

          The version identifier of the configuration bundle.

        • tools (list) -- [REQUIRED]

          The list of tool entries mapping tool names to their JSON paths within the bundle.

          • (dict) --

            Maps a tool name to its JSON path within a configuration bundle.

            • toolName (string) -- [REQUIRED]

              The name of the tool.

            • toolDescriptionJsonPath (string) -- [REQUIRED]

              The JSON path within the configuration bundle's components that contains the tool description.

    • agentTraces (dict) -- [REQUIRED]

      The agent traces to analyze for generating tool description recommendations.

      • sessionSpans (list) --

        Agent traces provided as inline session spans in OpenTelemetry format.

        • (:ref:`document<document>`) --

      • cloudwatchLogs (dict) --

        Agent traces read from CloudWatch Logs.

        • logGroupArns (list) -- [REQUIRED]

          The list of CloudWatch log group ARNs to read agent traces from.

          • (string) --

        • serviceNames (list) -- [REQUIRED]

          The list of service names to filter traces within the specified log groups.

          • (string) --

        • startTime (datetime) -- [REQUIRED]

          The start time of the time range to read traces from.

        • endTime (datetime) -- [REQUIRED]

          The end time of the time range to read traces from.

        • rule (dict) --

          Optional rule configuration for filtering traces.

          • filters (list) --

            The list of filters to apply when reading agent traces.

            • (dict) --

              A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

              • key (string) -- [REQUIRED]

                The key or field name to filter on within the agent trace data.

              • operator (string) -- [REQUIRED]

                The comparison operator to use for filtering.

              • value (dict) -- [REQUIRED]

                The value to compare against using the specified operator.

                • stringValue (string) --

                  A string value for text-based filtering.

                • doubleValue (float) --

                  A numeric value for numerical filtering and comparisons.

                • booleanValue (boolean) --

                  A boolean value for true/false filtering conditions.

type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, the service ignores the request, but does not return an error.

This field is autopopulated if not provided.
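
Because recommendationConfig is deeply nested, it can help to assemble the request in a small function. The sketch below builds keyword arguments for a system prompt recommendation that reads traces from CloudWatch Logs, using only the fields marked [REQUIRED] above plus an explicit clientToken. The helper name, the 24-hour lookback, and all placeholder ARNs and names are assumptions, as is the choice of inline text over configurationBundle and of cloudwatchLogs over sessionSpans.

```python
import uuid
from datetime import datetime, timedelta, timezone


def build_system_prompt_request(name, prompt_text, log_group_arns,
                                service_names, evaluator_arns, lookback_hours=24):
    """Assemble keyword arguments for start_recommendation (system prompt variant)."""
    end_time = datetime.now(timezone.utc)
    start_time = end_time - timedelta(hours=lookback_hours)
    return {
        "name": name,
        "type": "SYSTEM_PROMPT_RECOMMENDATION",
        "recommendationConfig": {
            "systemPromptRecommendationConfig": {
                "systemPrompt": {"text": prompt_text},
                "agentTraces": {
                    "cloudwatchLogs": {
                        "logGroupArns": list(log_group_arns),
                        "serviceNames": list(service_names),
                        "startTime": start_time,
                        "endTime": end_time,
                    }
                },
                "evaluationConfig": {
                    "evaluators": [{"evaluatorArn": arn} for arn in evaluator_arns]
                },
            }
        },
        # Explicit token so that a retried call reuses the same idempotency key.
        "clientToken": str(uuid.uuid4()),
    }


# Placeholder values; substitute real ARNs and names from your account.
request = build_system_prompt_request(
    name="travel-agent-prompt-tuning",
    prompt_text="You are a helpful travel assistant.",
    log_group_arns=["example-log-group-arn"],
    service_names=["example-agent-service"],
    evaluator_arns=["example-evaluator-arn"],
)

# With a real client: response = client.start_recommendation(**request)
```

Generating the token once and storing it alongside the request means a retry after a timeout sends the same clientToken, which is what allows the service to treat the retry as a duplicate.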

rtype:

dict

returns:

Response Syntax

{
    'recommendationId': 'string',
    'recommendationArn': 'string',
    'name': 'string',
    'description': 'string',
    'type': 'SYSTEM_PROMPT_RECOMMENDATION'|'TOOL_DESCRIPTION_RECOMMENDATION',
    'recommendationConfig': {
        'systemPromptRecommendationConfig': {
            'systemPrompt': {
                'text': 'string',
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'systemPromptJsonPath': 'string'
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            },
            'evaluationConfig': {
                'evaluators': [
                    {
                        'evaluatorArn': 'string'
                    },
                ]
            }
        },
        'toolDescriptionRecommendationConfig': {
            'toolDescription': {
                'toolDescriptionText': {
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescription': {
                                'text': 'string'
                            }
                        },
                    ]
                },
                'configurationBundle': {
                    'bundleArn': 'string',
                    'versionId': 'string',
                    'tools': [
                        {
                            'toolName': 'string',
                            'toolDescriptionJsonPath': 'string'
                        },
                    ]
                }
            },
            'agentTraces': {
                'sessionSpans': [
                    {...}|[...]|123|123.4|'string'|True|None,
                ],
                'cloudwatchLogs': {
                    'logGroupArns': [
                        'string',
                    ],
                    'serviceNames': [
                        'string',
                    ],
                    'startTime': datetime(2015, 1, 1),
                    'endTime': datetime(2015, 1, 1),
                    'rule': {
                        'filters': [
                            {
                                'key': 'string',
                                'operator': 'Equals'|'NotEquals'|'GreaterThan'|'LessThan'|'GreaterThanOrEqual'|'LessThanOrEqual'|'Contains'|'NotContains',
                                'value': {
                                    'stringValue': 'string',
                                    'doubleValue': 123.0,
                                    'booleanValue': True|False
                                }
                            },
                        ]
                    }
                }
            }
        }
    },
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'DELETING',
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • recommendationId (string) --

      The unique identifier of the created recommendation.

    • recommendationArn (string) --

      The Amazon Resource Name (ARN) of the created recommendation.

    • name (string) --

      The name of the recommendation.

    • description (string) --

      The description of the recommendation.

    • type (string) --

      The type of recommendation.

    • recommendationConfig (dict) --

      The configuration for the recommendation.

      • systemPromptRecommendationConfig (dict) --

        The configuration for a system prompt recommendation.

        • systemPrompt (dict) --

          The current system prompt to optimize.

          • text (string) --

            The system prompt text provided inline.

          • configurationBundle (dict) --

            The system prompt sourced from a configuration bundle version.

            • bundleArn (string) --

              The Amazon Resource Name (ARN) of the configuration bundle.

            • versionId (string) --

              The version identifier of the configuration bundle.

            • systemPromptJsonPath (string) --

              The JSON path within the configuration bundle that contains the system prompt.

        • agentTraces (dict) --

          The agent traces to analyze for generating recommendations.

          • sessionSpans (list) --

            Agent traces provided as inline session spans in OpenTelemetry format.

            • (document) --

          • cloudwatchLogs (dict) --

            Agent traces read from CloudWatch Logs.

            • logGroupArns (list) --

              The list of CloudWatch log group ARNs to read agent traces from.

              • (string) --

            • serviceNames (list) --

              The list of service names to filter traces within the specified log groups.

              • (string) --

            • startTime (datetime) --

              The start time of the time range to read traces from.

            • endTime (datetime) --

              The end time of the time range to read traces from.

            • rule (dict) --

              Optional rule configuration for filtering traces.

              • filters (list) --

                The list of filters to apply when reading agent traces.

                • (dict) --

                  A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

                  • key (string) --

                    The key or field name to filter on within the agent trace data.

                  • operator (string) --

                    The comparison operator to use for filtering.

                  • value (dict) --

                    The value to compare against using the specified operator.

                    • stringValue (string) --

                      A string value for text-based filtering.

                    • doubleValue (float) --

                      A numeric value for numerical filtering and comparisons.

                    • booleanValue (boolean) --

                      A boolean value for true/false filtering conditions.

        • evaluationConfig (dict) --

          The evaluation configuration specifying which evaluators to use for assessing recommendation quality.

          • evaluators (list) --

            The list of evaluators to use for assessing recommendation quality.

            • (dict) --

              A reference to an evaluator used for recommendation assessment.

              • evaluatorArn (string) --

                The Amazon Resource Name (ARN) of the evaluator.

      • toolDescriptionRecommendationConfig (dict) --

        The configuration for a tool description recommendation.

        • toolDescription (dict) --

          The current tool descriptions to optimize.

          • toolDescriptionText (dict) --

            Tool descriptions provided as inline text.

            • tools (list) --

              The list of tool descriptions to optimize.

              • (dict) --

                A tool description input containing the tool name and its current description.

                • toolName (string) --

                  The name of the tool.

                • toolDescription (dict) --

                  The current description of the tool to optimize.

                  • text (string) --

                    The tool description as inline text.

          • configurationBundle (dict) --

            Tool descriptions sourced from a configuration bundle version.

            • bundleArn (string) --

              The Amazon Resource Name (ARN) of the configuration bundle.

            • versionId (string) --

              The version identifier of the configuration bundle.

            • tools (list) --

              The list of tool entries mapping tool names to their JSON paths within the bundle.

              • (dict) --

                Maps a tool name to its JSON path within a configuration bundle.

                • toolName (string) --

                  The name of the tool.

                • toolDescriptionJsonPath (string) --

                  The JSON path within the configuration bundle's components that contains the tool description.

        • agentTraces (dict) --

          The agent traces to analyze for generating tool description recommendations.

          • sessionSpans (list) --

            Agent traces provided as inline session spans in OpenTelemetry format.

            • (document) --

          • cloudwatchLogs (dict) --

            Agent traces read from CloudWatch Logs.

            • logGroupArns (list) --

              The list of CloudWatch log group ARNs to read agent traces from.

              • (string) --

            • serviceNames (list) --

              The list of service names to filter traces within the specified log groups.

              • (string) --

            • startTime (datetime) --

              The start time of the time range to read traces from.

            • endTime (datetime) --

              The end time of the time range to read traces from.

            • rule (dict) --

              Optional rule configuration for filtering traces.

              • filters (list) --

                The list of filters to apply when reading agent traces.

                • (dict) --

                  A filter for narrowing down agent traces from CloudWatch Logs based on key-value comparisons.

                  • key (string) --

                    The key or field name to filter on within the agent trace data.

                  • operator (string) --

                    The comparison operator to use for filtering.

                  • value (dict) --

                    The value to compare against using the specified operator.

                    • stringValue (string) --

                      A string value for text-based filtering.

                    • doubleValue (float) --

                      A numeric value for numerical filtering and comparisons.

                    • booleanValue (boolean) --

                      A boolean value for true/false filtering conditions.

    • status (string) --

      The status of the recommendation.

    • createdAt (datetime) --

      The timestamp when the recommendation was created.

    • updatedAt (datetime) --

      The timestamp when the recommendation was last updated.
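
As a rough illustration of navigating the response structure above, the following sketch reports which configuration member is populated and where the agent traces came from. The helper name summarize_recommendation is hypothetical, and the sketch assumes that only one of the two configuration members is present in a given response.

```python
def summarize_recommendation(response):
    """Return (type, config_key, trace_source) for a recommendation response dict."""
    config = response.get("recommendationConfig", {})

    # Assumption: only one of the two config members is populated at a time.
    for config_key in ("systemPromptRecommendationConfig",
                       "toolDescriptionRecommendationConfig"):
        branch = config.get(config_key)
        if branch is not None:
            break
    else:
        # Neither member is present.
        return response.get("type"), None, None

    # agentTraces carries either CloudWatch Logs settings or inline session spans.
    traces = branch.get("agentTraces", {})
    if "cloudwatchLogs" in traces:
        trace_source = "cloudwatchLogs"
    elif "sessionSpans" in traces:
        trace_source = "sessionSpans"
    else:
        trace_source = None
    return response.get("type"), config_key, trace_source
```

The function only reads keys that appear in the Response Syntax above, so it can be applied to the returned dict as-is.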

ListABTests (new)

Lists all A/B tests in the account.

See also: AWS API Documentation

Request Syntax

client.list_ab_tests(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of results to return in the response. If more results are available than this value, pass the token returned in the nextToken field of the response in another request to retrieve the next batch of results.

type nextToken:

string

param nextToken:

If a previous response was truncated because the total number of results exceeded maxResults, set this field to the token returned in the nextToken field of that response to retrieve the next batch of results.

rtype:

dict

returns:

Response Syntax

{
    'abTests': [
        {
            'abTestId': 'string',
            'abTestArn': 'string',
            'name': 'string',
            'status': 'CREATING'|'ACTIVE'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'|'DELETE_FAILED'|'FAILED',
            'executionStatus': 'PAUSED'|'RUNNING'|'STOPPED'|'NOT_STARTED',
            'description': 'string',
            'gatewayArn': 'string',
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • abTests (list) --

      The list of A/B test summaries.

      • (dict) --

        Summary information about an A/B test.

        • abTestId (string) --

          The unique identifier of the A/B test.

        • abTestArn (string) --

          The Amazon Resource Name (ARN) of the A/B test.

        • name (string) --

          The name of the A/B test.

        • status (string) --

          The current status of the A/B test.

        • executionStatus (string) --

          The execution status of the A/B test.

        • description (string) --

          The description of the A/B test.

        • gatewayArn (string) --

          The Amazon Resource Name (ARN) of the gateway used for traffic splitting.

        • createdAt (datetime) --

          The timestamp when the A/B test was created.

        • updatedAt (datetime) --

          The timestamp when the A/B test was last updated.

    • nextToken (string) --

      If the total number of results is greater than the maxResults value provided in the request, pass this token in the nextToken field of another request to return the next batch of results.
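
The nextToken handshake described above can be sketched as a small generator. This is a minimal sketch rather than an official helper: the name iter_ab_tests and the default page_size of 50 are assumptions, and client stands for any object exposing list_ab_tests with the signature shown in the Request Syntax.

```python
def iter_ab_tests(client, page_size=50):
    """Yield every A/B test summary, following nextToken until it is absent."""
    kwargs = {"maxResults": page_size}  # page_size is an illustrative value
    while True:
        page = client.list_ab_tests(**kwargs)
        yield from page.get("abTests", [])

        token = page.get("nextToken")
        if not token:
            # No token means the last page has been reached.
            break
        kwargs["nextToken"] = token
```

A caller might then write `running = [t for t in iter_ab_tests(client) if t["executionStatus"] == "RUNNING"]` to filter on the summary fields listed above.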

DeleteABTest (new)

Deletes an A/B test and its associated gateway rules.

See also: AWS API Documentation

Request Syntax

client.delete_ab_test(
    abTestId='string'
)
type abTestId:

string

param abTestId:

[REQUIRED]

The unique identifier of the A/B test to delete.

rtype:

dict

returns:

Response Syntax

{
    'abTestId': 'string',
    'abTestArn': 'string',
    'status': 'CREATING'|'ACTIVE'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'|'DELETE_FAILED'|'FAILED'
}

Response Structure

  • (dict) --

    • abTestId (string) --

      The unique identifier of the deleted A/B test.

    • abTestArn (string) --

      The Amazon Resource Name (ARN) of the deleted A/B test.

    • status (string) --

      The status of the A/B test deletion operation.

ListRecommendations (new)

Lists all recommendations in the account, with optional filtering by status.

See also: AWS API Documentation

Request Syntax

client.list_recommendations(
    maxResults=123,
    nextToken='string',
    statusFilter='PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'DELETING'
)
type maxResults:

integer

param maxResults:

The maximum number of results to return in the response. If more results are available than this value, pass the token returned in the nextToken field of the response in another request to retrieve the next batch of results.

type nextToken:

string

param nextToken:

If a previous response was truncated because the total number of results exceeded maxResults, set this field to the token returned in the nextToken field of that response to retrieve the next batch of results.

type statusFilter:

string

param statusFilter:

Optional filter to return only recommendations with the specified status.

rtype:

dict

returns:

Response Syntax

{
    'recommendationSummaries': [
        {
            'recommendationId': 'string',
            'recommendationArn': 'string',
            'name': 'string',
            'description': 'string',
            'type': 'SYSTEM_PROMPT_RECOMMENDATION'|'TOOL_DESCRIPTION_RECOMMENDATION',
            'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'DELETING',
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • recommendationSummaries (list) --

      The list of recommendation summaries.

      • (dict) --

        Summary information about a recommendation.

        • recommendationId (string) --

          The unique identifier of the recommendation.

        • recommendationArn (string) --

          The Amazon Resource Name (ARN) of the recommendation.

        • name (string) --

          The name of the recommendation.

        • description (string) --

          The description of the recommendation.

        • type (string) --

          The type of recommendation.

        • status (string) --

          The current status of the recommendation.

        • createdAt (datetime) --

          The timestamp when the recommendation was created.

        • updatedAt (datetime) --

          The timestamp when the recommendation was last updated.

    • nextToken (string) --

      If the total number of results is greater than the maxResults value provided in the request, pass this token in the nextToken field of another request to return the next batch of results.
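
To illustrate combining statusFilter with pagination, the sketch below collects all recommendation summaries for one status. The function name and the local validation against the status values from the Request Syntax are illustrative assumptions; client is any object exposing list_recommendations as documented above.

```python
# Status values copied from the statusFilter enumeration in the Request Syntax.
RECOMMENDATION_STATUSES = {"PENDING", "IN_PROGRESS", "COMPLETED", "FAILED", "DELETING"}


def list_recommendations_by_status(client, status=None):
    """Return all recommendation summaries, optionally restricted to one status."""
    if status is not None and status not in RECOMMENDATION_STATUSES:
        raise ValueError(f"unknown status: {status!r}")

    kwargs = {}
    if status is not None:
        kwargs["statusFilter"] = status

    summaries = []
    while True:
        page = client.list_recommendations(**kwargs)
        summaries.extend(page.get("recommendationSummaries", []))
        token = page.get("nextToken")
        if not token:
            return summaries
        kwargs["nextToken"] = token
```

Validating the status locally is a convenience only; the service remains the authority on which values it accepts.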

DeleteBatchEvaluation (new)

Deletes a batch evaluation and its associated results.

See also: AWS API Documentation

Request Syntax

client.delete_batch_evaluation(
    batchEvaluationId='string'
)
type batchEvaluationId:

string

param batchEvaluationId:

[REQUIRED]

The unique identifier of the batch evaluation to delete.

rtype:

dict

returns:

Response Syntax

{
    'batchEvaluationId': 'string',
    'batchEvaluationArn': 'string',
    'status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'COMPLETED_WITH_ERRORS'|'FAILED'|'STOPPING'|'STOPPED'|'DELETING'
}

Response Structure

  • (dict) --

    • batchEvaluationId (string) --

      The unique identifier of the deleted batch evaluation.

    • batchEvaluationArn (string) --

      The Amazon Resource Name (ARN) of the deleted batch evaluation.

    • status (string) --

      The status of the batch evaluation deletion operation.

CreateABTest (new)

Creates an A/B test for comparing agent configurations. A/B tests split traffic between a control variant and a treatment variant through a gateway, then evaluate performance using online evaluation configurations to determine which variant performs better.

See also: AWS API Documentation

Request Syntax

client.create_ab_test(
    name='string',
    description='string',
    gatewayArn='string',
    variants=[
        {
            'name': 'string',
            'weight': 123,
            'variantConfiguration': {
                'configurationBundle': {
                    'bundleArn': 'string',
                    'bundleVersion': 'string'
                },
                'target': {
                    'name': 'string'
                }
            }
        },
    ],
    gatewayFilter={
        'targetPaths': [
            'string',
        ]
    },
    evaluationConfig={
        'onlineEvaluationConfigArn': 'string',
        'perVariantOnlineEvaluationConfig': [
            {
                'name': 'string',
                'onlineEvaluationConfigArn': 'string'
            },
        ]
    },
    roleArn='string',
    enableOnCreate=True|False,
    clientToken='string'
)
type name:

string

param name:

[REQUIRED]

The name of the A/B test. Must be unique within your account.

type description:

string

param description:

The description of the A/B test.

type gatewayArn:

string

param gatewayArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the gateway to use for traffic splitting.

type variants:

list

param variants:

[REQUIRED]

The list of variants for the A/B test. Must contain exactly two variants: a control (C) and a treatment (T1), each with a configuration bundle or target reference and a traffic weight.

  • (dict) --

    A variant in an A/B test, representing either the control (C) or treatment (T1) configuration.

    • name (string) -- [REQUIRED]

      The name of the variant. Must be C for control or T1 for treatment.

    • weight (integer) -- [REQUIRED]

      The percentage of traffic to route to this variant. Weights across all variants must sum to 100.

    • variantConfiguration (dict) -- [REQUIRED]

      The configuration for this variant, including the configuration bundle or target reference.

      • configurationBundle (dict) --

        A reference to a configuration bundle version to use for this variant.

        • bundleArn (string) -- [REQUIRED]

          The Amazon Resource Name (ARN) of the configuration bundle.

        • bundleVersion (string) -- [REQUIRED]

          The version of the configuration bundle.

      • target (dict) --

        A reference to a gateway target to route traffic to for this variant.

        • name (string) -- [REQUIRED]

          The name of the gateway target.
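
The constraints stated for variants (exactly two entries, named C and T1, weights summing to 100, each with a configuration bundle or target) can be checked locally before calling the API. The following is a minimal sketch; validate_variants is a hypothetical helper, and treating "at least one of configurationBundle or target" as sufficient is an assumption about how the service interprets variantConfiguration.

```python
def validate_variants(variants):
    """Raise ValueError if the variants list breaks the documented constraints."""
    if len(variants) != 2:
        raise ValueError("exactly two variants are required")

    names = sorted(v["name"] for v in variants)
    if names != ["C", "T1"]:
        raise ValueError("variant names must be 'C' (control) and 'T1' (treatment)")

    if sum(v["weight"] for v in variants) != 100:
        raise ValueError("variant weights must sum to 100")

    for v in variants:
        cfg = v["variantConfiguration"]
        # Assumption: at least one of the two reference kinds must be present.
        if "configurationBundle" not in cfg and "target" not in cfg:
            raise ValueError(f"variant {v['name']} needs a configurationBundle or target")
```

Failing fast on these rules avoids a round trip to the service for requests that would presumably be rejected anyway.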

type gatewayFilter:

dict

param gatewayFilter:

Optional filter to restrict which gateway target paths are included in the A/B test.

  • targetPaths (list) --

    A list of target path patterns to include in the A/B test.

    • (string) --

type evaluationConfig:

dict

param evaluationConfig:

[REQUIRED]

The evaluation configuration specifying which online evaluation configurations to use for measuring variant performance.

  • onlineEvaluationConfigArn (string) --

    The Amazon Resource Name (ARN) of a single online evaluation configuration to use for both variants.

  • perVariantOnlineEvaluationConfig (list) --

    Per-variant online evaluation configurations, allowing different evaluation settings for each variant.

    • (dict) --

      An online evaluation configuration associated with a specific A/B test variant.

      • name (string) -- [REQUIRED]

        The name of the variant this evaluation configuration applies to.

      • onlineEvaluationConfigArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of the online evaluation configuration for this variant.
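
The two shapes of evaluationConfig described above can be sketched with a small builder. The name build_evaluation_config is hypothetical, and the sketch assumes the two members are mutually exclusive, which the text above suggests but does not state outright.

```python
def build_evaluation_config(shared_arn=None, per_variant_arns=None):
    """Build an evaluationConfig dict from one shared ARN or a {variant: ARN} map."""
    # Assumption: exactly one of the two members should be supplied.
    if (shared_arn is None) == (per_variant_arns is None):
        raise ValueError("provide either shared_arn or per_variant_arns, not both")

    if shared_arn is not None:
        # One online evaluation configuration applied to both variants.
        return {"onlineEvaluationConfigArn": shared_arn}

    # One online evaluation configuration per variant name (for example C and T1).
    return {
        "perVariantOnlineEvaluationConfig": [
            {"name": name, "onlineEvaluationConfigArn": arn}
            for name, arn in per_variant_arns.items()
        ]
    }
```

The result can be passed directly as the evaluationConfig argument of create_ab_test.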

type roleArn:

string

param roleArn:

[REQUIRED]

The IAM role ARN that grants permissions for the A/B test to access gateway and evaluation resources.

type enableOnCreate:

boolean

param enableOnCreate:

Whether to enable the A/B test immediately upon creation. If true, traffic splitting begins automatically.

type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, the service ignores the request, but does not return an error.

This field is autopopulated if not provided.
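
Because a repeated clientToken makes the request safe to resend, a caller that retries manually should generate the token once and reuse it on every attempt. The sketch below illustrates that pattern; the function name, the attempts default, and the retry_on parameter are assumptions, and with boto3 one would typically pass the relevant botocore exception classes in retry_on.

```python
import uuid


def create_ab_test_with_retries(client, request, attempts=3, retry_on=(ConnectionError,)):
    """Call create_ab_test, reusing one clientToken across all retry attempts."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")

    # Generated once so that every retry is recognized as the same request.
    token = str(uuid.uuid4())

    for attempt in range(1, attempts + 1):
        try:
            return client.create_ab_test(clientToken=token, **request)
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
```

Generating a fresh token inside the loop would defeat the purpose, since each attempt would then look like a new request.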

rtype:

dict

returns:

Response Syntax

{
    'abTestId': 'string',
    'abTestArn': 'string',
    'name': 'string',
    'status': 'CREATING'|'ACTIVE'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'|'DELETE_FAILED'|'FAILED',
    'executionStatus': 'PAUSED'|'RUNNING'|'STOPPED'|'NOT_STARTED',
    'createdAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • abTestId (string) --

      The unique identifier of the created A/B test.

    • abTestArn (string) --

      The Amazon Resource Name (ARN) of the created A/B test.

    • name (string) --

      The name of the A/B test.

    • status (string) --

      The status of the A/B test.

    • executionStatus (string) --

      The execution status indicating whether the A/B test is currently running.

    • createdAt (datetime) --

      The timestamp when the A/B test was created.
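
Putting the request parameters together, a request for a simple bundle-versus-bundle test might be assembled as follows. This is a sketch under stated assumptions: build_ab_test_request is a hypothetical helper, all ARNs and versions are caller-supplied placeholders, and the 0 to 100 range check on treatment_weight is a local sanity check rather than a documented service limit.

```python
def build_ab_test_request(name, gateway_arn, role_arn, evaluation_config_arn,
                          control_bundle, treatment_bundle, treatment_weight=10):
    """Build keyword arguments for create_ab_test.

    control_bundle and treatment_bundle are (bundle_arn, bundle_version) tuples.
    """
    if not 0 <= treatment_weight <= 100:
        raise ValueError("treatment_weight must be between 0 and 100")

    def variant(variant_name, weight, bundle):
        bundle_arn, bundle_version = bundle
        return {
            "name": variant_name,
            "weight": weight,
            "variantConfiguration": {
                "configurationBundle": {
                    "bundleArn": bundle_arn,
                    "bundleVersion": bundle_version,
                }
            },
        }

    return {
        "name": name,
        "gatewayArn": gateway_arn,
        "variants": [
            variant("C", 100 - treatment_weight, control_bundle),   # control
            variant("T1", treatment_weight, treatment_bundle),      # treatment
        ],
        "evaluationConfig": {"onlineEvaluationConfigArn": evaluation_config_arn},
        "roleArn": role_arn,
        "enableOnCreate": False,  # create the test without starting traffic splitting
    }
```

The resulting dict can be expanded into the call, for example `response = client.create_ab_test(**request)`, after which `response["status"]` and `response["executionStatus"]` report the fields described in the Response Structure above.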