2026/03/27 - Amazon Bedrock AgentCore Control - 4 updated api methods
Changes: Adds AgentCore Code Interpreter Node.js runtime support with an optional runtime field
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Creates a custom evaluator for agent quality assessment. Custom evaluators can use either LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings, or code-based configurations with customer-managed Lambda functions to evaluate agent performance at tool call, trace, or session levels.
See also: AWS API Documentation
Request Syntax
client.create_evaluator(
clientToken='string',
evaluatorName='string',
description='string',
evaluatorConfig={
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
level='TOOL_CALL'|'TRACE'|'SESSION',
tags={
'string': 'string'
}
)
clientToken (string)
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorName (string)
[REQUIRED]
The name of the evaluator. Must be unique within your account.
description (string)
The description of the evaluator that explains its purpose and evaluation criteria.
evaluatorConfig (dict)
[REQUIRED]
The configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) -- [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) -- [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) -- [REQUIRED]
The numerical value for this rating scale option.
label (string) -- [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) -- [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) -- [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) -- [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string)
[REQUIRED]
The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.
tags (dict)
A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.
(string) --
(string) --
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'createdAt': datetime(2015, 1, 1),
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the created evaluator.
evaluatorId (string) --
The unique identifier of the created evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
status (string) --
The status of the evaluator creation operation.
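The two configuration styles above can be sketched as request payloads. This is a minimal sketch, not a definitive recipe: the evaluator names, instructions, model ID, account ID, and Lambda ARN are all hypothetical placeholders, and the client is assumed to be a boto3 client for the Bedrock AgentCore control plane.

```python
# Hypothetical request payloads for create_evaluator.
# All names, the model ID, the account ID, and the ARN are placeholders.

LLM_JUDGE_REQUEST = {
    "evaluatorName": "response-helpfulness",
    "description": "Scores each trace for helpfulness on a 1-5 scale.",
    "evaluatorConfig": {
        "llmAsAJudge": {
            "instructions": "Rate how helpful the agent's final answer is.",
            "ratingScale": {
                "numerical": [
                    {"value": 1.0, "label": "unhelpful",
                     "definition": "Does not address the user's request."},
                    {"value": 5.0, "label": "very helpful",
                     "definition": "Fully resolves the user's request."},
                ]
            },
            "modelConfig": {
                "bedrockEvaluatorModelConfig": {
                    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
                    "inferenceConfig": {"maxTokens": 512, "temperature": 0.0},
                }
            },
        }
    },
    "level": "TRACE",
}

CODE_BASED_REQUEST = {
    "evaluatorName": "tool-arg-validator",
    "description": "Lambda-backed check of tool-call arguments.",
    "evaluatorConfig": {
        "codeBased": {
            "lambdaConfig": {
                "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:evaluate-tool-calls",
                "lambdaTimeoutInSeconds": 120,  # 1-300; defaults to 60
            }
        }
    },
    "level": "TOOL_CALL",
    "tags": {"team": "agent-quality"},
}

def create_agent_evaluator(client, request):
    """Issue the call and return the new evaluator's ID."""
    return client.create_evaluator(**request)["evaluatorId"]
```

With real credentials, `client` would come from something like `boto3.client("bedrock-agentcore-control")`; note that exactly one of `llmAsAJudge` or `codeBased` is set per request.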
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Retrieves detailed information about an evaluator, including its configuration, status, and metadata. Works with both built-in and custom evaluators.
See also: AWS API Documentation
Request Syntax
client.get_evaluator(
evaluatorId='string'
)
evaluatorId (string)
[REQUIRED]
The unique identifier of the evaluator to retrieve. Can be a built-in evaluator ID (e.g., Builtin.Helpfulness) or a custom evaluator ID.
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'evaluatorName': 'string',
'description': 'string',
'evaluatorConfig': {
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
'level': 'TOOL_CALL'|'TRACE'|'SESSION',
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
'createdAt': datetime(2015, 1, 1),
'updatedAt': datetime(2015, 1, 1),
'lockedForModification': True|False
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the evaluator.
evaluatorId (string) --
The unique identifier of the evaluator.
evaluatorName (string) --
The name of the evaluator.
description (string) --
The description of the evaluator.
evaluatorConfig (dict) --
The configuration of the evaluator, including LLM-as-a-Judge or code-based settings.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) --
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) --
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) --
The description that explains what this numerical rating represents and when it should be used.
value (float) --
The numerical value for this rating scale option.
label (string) --
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) --
The description that explains what this categorical rating represents and when it should be used.
label (string) --
The label or name of this categorical rating option.
modelConfig (dict) --
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) --
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) --
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string) --
The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
status (string) --
The current status of the evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
lockedForModification (boolean) --
Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.
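Since the response carries exactly one of `llmAsAJudge` or `codeBased` inside `evaluatorConfig`, a caller can branch on which key is present. A small sketch (the client is assumed to be a boto3 AgentCore control-plane client):

```python
def describe_evaluator(client, evaluator_id):
    """Fetch an evaluator and summarize which configuration style it uses."""
    resp = client.get_evaluator(evaluatorId=evaluator_id)
    config = resp["evaluatorConfig"]
    if "codeBased" in config:
        # Code-based evaluators identify themselves by their Lambda ARN.
        kind = "code-based: " + config["codeBased"]["lambdaConfig"]["lambdaArn"]
    else:
        # LLM-as-a-Judge evaluators identify themselves by their model ID.
        model = config["llmAsAJudge"]["modelConfig"]["bedrockEvaluatorModelConfig"]["modelId"]
        kind = "LLM-as-a-Judge: " + model
    return {
        "name": resp["evaluatorName"],
        "level": resp["level"],
        "kind": kind,
        "locked": resp["lockedForModification"],
    }
```

Checking `locked` before attempting an update avoids a failed call against an evaluator referenced by an active online evaluation configuration.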
{'evaluators': {'evaluatorType': {'CustomCode'}}}
Lists all available evaluators, including both built-in evaluators provided by the service and custom evaluators created by the user.
See also: AWS API Documentation
Request Syntax
client.list_evaluators(
nextToken='string',
maxResults=123
)
nextToken (string)
The pagination token from a previous request to retrieve the next page of results.
maxResults (integer)
The maximum number of evaluators to return in a single response.
dict
Response Syntax
{
'evaluators': [
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'evaluatorName': 'string',
'description': 'string',
'evaluatorType': 'Builtin'|'Custom'|'CustomCode',
'level': 'TOOL_CALL'|'TRACE'|'SESSION',
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
'createdAt': datetime(2015, 1, 1),
'updatedAt': datetime(2015, 1, 1),
'lockedForModification': True|False
},
],
'nextToken': 'string'
}
Response Structure
(dict) --
evaluators (list) --
The list of evaluator summaries containing basic information about each evaluator.
(dict) --
The summary information about an evaluator, including basic metadata and status information.
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the evaluator.
evaluatorId (string) --
The unique identifier of the evaluator.
evaluatorName (string) --
The name of the evaluator.
description (string) --
The description of the evaluator.
evaluatorType (string) --
The type of evaluator, indicating whether it is a built-in evaluator provided by the service or a custom evaluator created by the user.
level (string) --
The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
status (string) --
The current status of the evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
lockedForModification (boolean) --
Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.
nextToken (string) --
The pagination token to use in a subsequent request to retrieve the next page of results.
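Each response returns at most maxResults summaries plus an optional nextToken; feeding that token into the next request walks the full list. A sketch of the pagination loop (the client is assumed to be a boto3 AgentCore control-plane client):

```python
def list_all_evaluators(client, page_size=25):
    """Collect every evaluator summary by following nextToken until exhausted."""
    evaluators = []
    kwargs = {"maxResults": page_size}
    while True:
        page = client.list_evaluators(**kwargs)
        evaluators.extend(page["evaluators"])
        token = page.get("nextToken")
        if not token:
            # No token in the response means this was the last page.
            break
        kwargs["nextToken"] = token
    return evaluators
```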
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Updates a custom evaluator's configuration, description, or evaluation level. Built-in evaluators cannot be updated. The evaluator must not be locked for modification.
See also: AWS API Documentation
Request Syntax
client.update_evaluator(
clientToken='string',
evaluatorId='string',
description='string',
evaluatorConfig={
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
level='TOOL_CALL'|'TRACE'|'SESSION'
)
clientToken (string)
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorId (string)
[REQUIRED]
The unique identifier of the evaluator to update.
description (string)
The updated description of the evaluator.
evaluatorConfig (dict)
The updated configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) -- [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) -- [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) -- [REQUIRED]
The numerical value for this rating scale option.
label (string) -- [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) -- [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) -- [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) -- [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string)
The updated evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'updatedAt': datetime(2015, 1, 1),
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the updated evaluator.
evaluatorId (string) --
The unique identifier of the updated evaluator.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
status (string) --
The status of the evaluator update operation.
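Because evaluatorConfig is replaced as a whole rather than patched, updating one field (such as the Lambda timeout) still requires supplying the full code-based configuration. A sketch, with a generated clientToken for safe retries (the evaluator ID and ARN would come from your own account):

```python
import uuid

def set_lambda_timeout(client, evaluator_id, lambda_arn, timeout_seconds):
    """Replace a code-based evaluator's config with a new Lambda timeout.

    update_evaluator takes a full evaluatorConfig, so the current
    lambdaArn must be supplied alongside the new timeout.
    """
    if not 1 <= timeout_seconds <= 300:
        # Mirror the documented service constraint before calling the API.
        raise ValueError("lambdaTimeoutInSeconds must be between 1 and 300")
    return client.update_evaluator(
        clientToken=str(uuid.uuid4()),  # idempotency token for safe retries
        evaluatorId=evaluator_id,
        evaluatorConfig={
            "codeBased": {
                "lambdaConfig": {
                    "lambdaArn": lambda_arn,
                    "lambdaTimeoutInSeconds": timeout_seconds,
                }
            }
        },
    )
```

Reusing the same clientToken on a retry makes the service ignore the duplicate request instead of applying the update twice.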