Amazon Bedrock AgentCore Control

2026/03/27 - Amazon Bedrock AgentCore Control - 4 updated API methods

Changes  Adds code-based evaluator configurations backed by customer-managed Lambda functions, via an optional lambdaConfig with lambdaArn and lambdaTimeoutInSeconds fields

CreateEvaluator (updated) Link ¶
Changes (request)
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
                                                    'lambdaTimeoutInSeconds': 'integer'}}}}

Creates a custom evaluator for agent quality assessment. Custom evaluators can use either LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings, or code-based configurations with customer-managed Lambda functions to evaluate agent performance at tool call, trace, or session levels.

See also: AWS API Documentation

Request Syntax

client.create_evaluator(
    clientToken='string',
    evaluatorName='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        },
        'codeBased': {
            'lambdaConfig': {
                'lambdaArn': 'string',
                'lambdaTimeoutInSeconds': 123
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION',
    tags={
        'string': 'string'
    }
)
type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.

This field is autopopulated if not provided.

type evaluatorName:

string

param evaluatorName:

[REQUIRED]

The name of the evaluator. Must be unique within your account.

type description:

string

param description:

The description of the evaluator that explains its purpose and evaluation criteria.

type evaluatorConfig:

dict

param evaluatorConfig:

[REQUIRED]

The configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.

  • llmAsAJudge (dict) --

    The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.

    • instructions (string) -- [REQUIRED]

      The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.

    • ratingScale (dict) -- [REQUIRED]

      The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.

      • numerical (list) --

        The numerical rating scale with defined score values and descriptions for quantitative evaluation.

        • (dict) --

          The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.

          • definition (string) -- [REQUIRED]

            The description that explains what this numerical rating represents and when it should be used.

          • value (float) -- [REQUIRED]

            The numerical value for this rating scale option.

          • label (string) -- [REQUIRED]

            The label or name that describes this numerical rating option.

      • categorical (list) --

        The categorical rating scale with named categories and definitions for qualitative evaluation.

        • (dict) --

          The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.

          • definition (string) -- [REQUIRED]

            The description that explains what this categorical rating represents and when it should be used.

          • label (string) -- [REQUIRED]

            The label or name of this categorical rating option.

    • modelConfig (dict) -- [REQUIRED]

      The model configuration that specifies which foundation model to use and how to configure it for evaluation.

      • bedrockEvaluatorModelConfig (dict) --

        The Amazon Bedrock model configuration for evaluation.

        • modelId (string) -- [REQUIRED]

          The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.

        • inferenceConfig (dict) --

          The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.

          • maxTokens (integer) --

            The maximum number of tokens to generate in the model response during evaluation.

          • temperature (float) --

            The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.

          • topP (float) --

            The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.

          • stopSequences (list) --

            The list of sequences that will cause the model to stop generating tokens when encountered.

            • (string) --

        • additionalModelRequestFields (document) --

          Additional model-specific request fields to customize model behavior beyond the standard inference configuration.

  • codeBased (dict) --

    Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.

    • lambdaConfig (dict) --

      The Lambda function configuration for code-based evaluation.

      • lambdaArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.

      • lambdaTimeoutInSeconds (integer) --

        The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.

type level:

string

param level:

[REQUIRED]

The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.

type tags:

dict

param tags:

A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'createdAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}

Response Structure

  • (dict) --

    • evaluatorArn (string) --

      The Amazon Resource Name (ARN) of the created evaluator.

    • evaluatorId (string) --

      The unique identifier of the created evaluator.

    • createdAt (datetime) --

      The timestamp when the evaluator was created.

    • status (string) --

      The status of the evaluator creation operation.
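
The new code-based branch of evaluatorConfig can be assembled and validated locally before calling the API. The sketch below is illustrative, not the service's own SDK helper: the builder function, the Lambda ARN, and the evaluator name are placeholders, and the live create_evaluator call is left commented out because it requires AWS credentials.

```python
def build_code_based_evaluator_config(lambda_arn, timeout_seconds=60):
    """Build the evaluatorConfig payload for a code-based evaluator.

    Per the docs, lambdaTimeoutInSeconds defaults to 60 and must be
    between 1 and 300, so we validate locally before calling the API.
    """
    if not 1 <= timeout_seconds <= 300:
        raise ValueError("lambdaTimeoutInSeconds must be between 1 and 300")
    return {
        "codeBased": {
            "lambdaConfig": {
                "lambdaArn": lambda_arn,
                "lambdaTimeoutInSeconds": timeout_seconds,
            }
        }
    }


# Hypothetical ARN and name, for illustration only:
config = build_code_based_evaluator_config(
    "arn:aws:lambda:us-east-1:123456789012:function:my-evaluator",
    timeout_seconds=120,
)

# The live call (requires credentials; service name assumed):
# client = boto3.client("bedrock-agentcore-control")
# response = client.create_evaluator(
#     evaluatorName="my-code-evaluator",
#     evaluatorConfig=config,
#     level="TRACE",
# )
```

Because the service rejects out-of-range timeouts, failing fast client-side avoids a round trip for an obviously invalid request.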

GetEvaluator (updated) Link ¶
Changes (response)
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
                                                    'lambdaTimeoutInSeconds': 'integer'}}}}

Retrieves detailed information about an evaluator, including its configuration, status, and metadata. Works with both built-in and custom evaluators.

See also: AWS API Documentation

Request Syntax

client.get_evaluator(
    evaluatorId='string'
)
type evaluatorId:

string

param evaluatorId:

[REQUIRED]

The unique identifier of the evaluator to retrieve. Can be a built-in evaluator ID (e.g., Builtin.Helpfulness) or a custom evaluator ID.

rtype:

dict

returns:

Response Syntax

{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'evaluatorName': 'string',
    'description': 'string',
    'evaluatorConfig': {
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        },
        'codeBased': {
            'lambdaConfig': {
                'lambdaArn': 'string',
                'lambdaTimeoutInSeconds': 123
            }
        }
    },
    'level': 'TOOL_CALL'|'TRACE'|'SESSION',
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1),
    'lockedForModification': True|False
}

Response Structure

  • (dict) --

    • evaluatorArn (string) --

      The Amazon Resource Name (ARN) of the evaluator.

    • evaluatorId (string) --

      The unique identifier of the evaluator.

    • evaluatorName (string) --

      The name of the evaluator.

    • description (string) --

      The description of the evaluator.

    • evaluatorConfig (dict) --

      The configuration of the evaluator, including LLM-as-a-Judge or code-based settings.

      • llmAsAJudge (dict) --

        The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.

        • instructions (string) --

          The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.

        • ratingScale (dict) --

          The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.

          • numerical (list) --

            The numerical rating scale with defined score values and descriptions for quantitative evaluation.

            • (dict) --

              The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.

              • definition (string) --

                The description that explains what this numerical rating represents and when it should be used.

              • value (float) --

                The numerical value for this rating scale option.

              • label (string) --

                The label or name that describes this numerical rating option.

          • categorical (list) --

            The categorical rating scale with named categories and definitions for qualitative evaluation.

            • (dict) --

              The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.

              • definition (string) --

                The description that explains what this categorical rating represents and when it should be used.

              • label (string) --

                The label or name of this categorical rating option.

        • modelConfig (dict) --

          The model configuration that specifies which foundation model to use and how to configure it for evaluation.

          • bedrockEvaluatorModelConfig (dict) --

            The Amazon Bedrock model configuration for evaluation.

            • modelId (string) --

              The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.

            • inferenceConfig (dict) --

              The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.

              • maxTokens (integer) --

                The maximum number of tokens to generate in the model response during evaluation.

              • temperature (float) --

                The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.

              • topP (float) --

                The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.

              • stopSequences (list) --

                The list of sequences that will cause the model to stop generating tokens when encountered.

                • (string) --

            • additionalModelRequestFields (document) --

              Additional model-specific request fields to customize model behavior beyond the standard inference configuration.

      • codeBased (dict) --

        Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.

        • lambdaConfig (dict) --

          The Lambda function configuration for code-based evaluation.

          • lambdaArn (string) --

            The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.

          • lambdaTimeoutInSeconds (integer) --

            The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.

    • level (string) --

      The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.

    • status (string) --

      The current status of the evaluator.

    • createdAt (datetime) --

      The timestamp when the evaluator was created.

    • updatedAt (datetime) --

      The timestamp when the evaluator was last updated.

    • lockedForModification (boolean) --

      Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.
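
Since only one of the llmAsAJudge / codeBased branches is populated in a GetEvaluator response, callers typically branch on which key is present. A minimal sketch, using a trimmed sample response with made-up values (no live call assumed):

```python
def describe_evaluator_kind(response):
    """Return a short description of how the evaluator is implemented,
    based on which evaluatorConfig branch appears in a GetEvaluator
    response (only one of llmAsAJudge / codeBased is populated)."""
    config = response.get("evaluatorConfig", {})
    if "codeBased" in config:
        arn = config["codeBased"]["lambdaConfig"]["lambdaArn"]
        return f"code-based (Lambda: {arn})"
    if "llmAsAJudge" in config:
        model_cfg = config["llmAsAJudge"]["modelConfig"]
        return f"LLM-as-a-Judge (model: {model_cfg['bedrockEvaluatorModelConfig']['modelId']})"
    return "unknown"


# Trimmed sample response; values are illustrative only:
sample = {
    "evaluatorId": "ev-123",
    "evaluatorConfig": {
        "codeBased": {
            "lambdaConfig": {
                "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:judge",
                "lambdaTimeoutInSeconds": 60,
            }
        }
    },
    "level": "SESSION",
    "lockedForModification": False,
}
print(describe_evaluator_kind(sample))
```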

ListEvaluators (updated) Link ¶
Changes (response)
{'evaluators': {'evaluatorType': {'CustomCode'}}}

Lists all available evaluators, including both built-in evaluators provided by the service and custom evaluators created by the user.

See also: AWS API Documentation

Request Syntax

client.list_evaluators(
    nextToken='string',
    maxResults=123
)
type nextToken:

string

param nextToken:

The pagination token from a previous request to retrieve the next page of results.

type maxResults:

integer

param maxResults:

The maximum number of evaluators to return in a single response.

rtype:

dict

returns:

Response Syntax

{
    'evaluators': [
        {
            'evaluatorArn': 'string',
            'evaluatorId': 'string',
            'evaluatorName': 'string',
            'description': 'string',
            'evaluatorType': 'Builtin'|'Custom'|'CustomCode',
            'level': 'TOOL_CALL'|'TRACE'|'SESSION',
            'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'lockedForModification': True|False
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • evaluators (list) --

      The list of evaluator summaries containing basic information about each evaluator.

      • (dict) --

        The summary information about an evaluator, including basic metadata and status information.

        • evaluatorArn (string) --

          The Amazon Resource Name (ARN) of the evaluator.

        • evaluatorId (string) --

          The unique identifier of the evaluator.

        • evaluatorName (string) --

          The name of the evaluator.

        • description (string) --

          The description of the evaluator.

        • evaluatorType (string) --

          The type of evaluator, indicating whether it is a built-in evaluator provided by the service or a custom evaluator created by the user.

        • level (string) --

          The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.

        • status (string) --

          The current status of the evaluator.

        • createdAt (datetime) --

          The timestamp when the evaluator was created.

        • updatedAt (datetime) --

          The timestamp when the evaluator was last updated.

        • lockedForModification (boolean) --

          Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.

    • nextToken (string) --

      The pagination token to use in a subsequent request to retrieve the next page of results.
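
nextToken / maxResults follow the standard AWS pagination pattern: keep calling with the returned token until no token comes back. A sketch of draining all pages, written against any page-returning callable (a stub stands in for client.list_evaluators here, since a live client isn't assumed):

```python
def list_all_evaluators(list_page):
    """Drain a paginated ListEvaluators-style API. `list_page` is any
    callable accepting an optional nextToken keyword and returning a
    dict with 'evaluators' and, while more pages remain, 'nextToken'."""
    evaluators, token = [], None
    while True:
        kwargs = {"nextToken": token} if token else {}
        page = list_page(**kwargs)
        evaluators.extend(page.get("evaluators", []))
        token = page.get("nextToken")
        if not token:
            return evaluators


# Stub simulating two pages of results:
def fake_list_evaluators(nextToken=None):
    if nextToken is None:
        return {"evaluators": [{"evaluatorId": "a"}], "nextToken": "p2"}
    return {"evaluators": [{"evaluatorId": "b"}]}


ids = [e["evaluatorId"] for e in list_all_evaluators(fake_list_evaluators)]
print(ids)  # ['a', 'b']
```

With a real client, `list_page` would be `client.list_evaluators` (optionally with maxResults bound via functools.partial).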

UpdateEvaluator (updated) Link ¶
Changes (request)
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
                                                    'lambdaTimeoutInSeconds': 'integer'}}}}

Updates a custom evaluator's configuration, description, or evaluation level. Built-in evaluators cannot be updated. The evaluator must not be locked for modification.

See also: AWS API Documentation

Request Syntax

client.update_evaluator(
    clientToken='string',
    evaluatorId='string',
    description='string',
    evaluatorConfig={
        'llmAsAJudge': {
            'instructions': 'string',
            'ratingScale': {
                'numerical': [
                    {
                        'definition': 'string',
                        'value': 123.0,
                        'label': 'string'
                    },
                ],
                'categorical': [
                    {
                        'definition': 'string',
                        'label': 'string'
                    },
                ]
            },
            'modelConfig': {
                'bedrockEvaluatorModelConfig': {
                    'modelId': 'string',
                    'inferenceConfig': {
                        'maxTokens': 123,
                        'temperature': ...,
                        'topP': ...,
                        'stopSequences': [
                            'string',
                        ]
                    },
                    'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
                }
            }
        },
        'codeBased': {
            'lambdaConfig': {
                'lambdaArn': 'string',
                'lambdaTimeoutInSeconds': 123
            }
        }
    },
    level='TOOL_CALL'|'TRACE'|'SESSION'
)
type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.

This field is autopopulated if not provided.

type evaluatorId:

string

param evaluatorId:

[REQUIRED]

The unique identifier of the evaluator to update.

type description:

string

param description:

The updated description of the evaluator.

type evaluatorConfig:

dict

param evaluatorConfig:

The updated configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.

  • llmAsAJudge (dict) --

    The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.

    • instructions (string) -- [REQUIRED]

      The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.

    • ratingScale (dict) -- [REQUIRED]

      The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.

      • numerical (list) --

        The numerical rating scale with defined score values and descriptions for quantitative evaluation.

        • (dict) --

          The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.

          • definition (string) -- [REQUIRED]

            The description that explains what this numerical rating represents and when it should be used.

          • value (float) -- [REQUIRED]

            The numerical value for this rating scale option.

          • label (string) -- [REQUIRED]

            The label or name that describes this numerical rating option.

      • categorical (list) --

        The categorical rating scale with named categories and definitions for qualitative evaluation.

        • (dict) --

          The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.

          • definition (string) -- [REQUIRED]

            The description that explains what this categorical rating represents and when it should be used.

          • label (string) -- [REQUIRED]

            The label or name of this categorical rating option.

    • modelConfig (dict) -- [REQUIRED]

      The model configuration that specifies which foundation model to use and how to configure it for evaluation.

      • bedrockEvaluatorModelConfig (dict) --

        The Amazon Bedrock model configuration for evaluation.

        • modelId (string) -- [REQUIRED]

          The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.

        • inferenceConfig (dict) --

          The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.

          • maxTokens (integer) --

            The maximum number of tokens to generate in the model response during evaluation.

          • temperature (float) --

            The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.

          • topP (float) --

            The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.

          • stopSequences (list) --

            The list of sequences that will cause the model to stop generating tokens when encountered.

            • (string) --

        • additionalModelRequestFields (document) --

          Additional model-specific request fields to customize model behavior beyond the standard inference configuration.

  • codeBased (dict) --

    Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.

    • lambdaConfig (dict) --

      The Lambda function configuration for code-based evaluation.

      • lambdaArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.

      • lambdaTimeoutInSeconds (integer) --

        The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.

type level:

string

param level:

The updated evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.

rtype:

dict

returns:

Response Syntax

{
    'evaluatorArn': 'string',
    'evaluatorId': 'string',
    'updatedAt': datetime(2015, 1, 1),
    'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}

Response Structure

  • (dict) --

    • evaluatorArn (string) --

      The Amazon Resource Name (ARN) of the updated evaluator.

    • evaluatorId (string) --

      The unique identifier of the updated evaluator.

    • updatedAt (datetime) --

      The timestamp when the evaluator was last updated.

    • status (string) --

      The status of the evaluator update operation.
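
The docs above state two preconditions for UpdateEvaluator: built-in evaluators cannot be updated, and the evaluator must not be locked for modification. A pre-flight check against a GetEvaluator response can surface both before attempting the call. The "Builtin." ID prefix follows the example given under GetEvaluator (e.g., Builtin.Helpfulness); the live update call is commented out and its values are placeholders.

```python
def can_update(get_response):
    """Pre-flight check mirroring the documented UpdateEvaluator rules:
    the evaluator must be custom (built-in IDs use a 'Builtin.' prefix)
    and must not be locked for modification."""
    if get_response["evaluatorId"].startswith("Builtin."):
        return False, "built-in evaluators cannot be updated"
    if get_response.get("lockedForModification"):
        return False, "evaluator is referenced by an active online evaluation"
    return True, "ok"


ok, reason = can_update({"evaluatorId": "ev-123", "lockedForModification": False})
# if ok:
#     client.update_evaluator(
#         evaluatorId="ev-123",
#         evaluatorConfig={"codeBased": {"lambdaConfig": {
#             "lambdaArn": "arn:aws:lambda:us-east-1:123456789012:function:judge",
#             "lambdaTimeoutInSeconds": 90,
#         }}},
#     )
```

This only saves a round trip; the service remains the authority, so the update call should still handle validation and conflict errors.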