2026/03/27 - Amazon Bedrock AgentCore Control - 4 updated api methods
Changes: Adds AgentCore Code Interpreter Node.js runtime support with an optional runtime field
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Creates a custom evaluator for agent quality assessment. Custom evaluators can use either LLM-as-a-Judge configurations with user-defined prompts, rating scales, and model settings, or code-based configurations with customer-managed Lambda functions to evaluate agent performance at tool call, trace, or session levels.
See also: AWS API Documentation
Request Syntax
client.create_evaluator(
clientToken='string',
evaluatorName='string',
description='string',
evaluatorConfig={
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
level='TOOL_CALL'|'TRACE'|'SESSION',
tags={
'string': 'string'
}
)
clientToken (string)
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorName (string)
[REQUIRED]
The name of the evaluator. Must be unique within your account.
description (string)
The description of the evaluator that explains its purpose and evaluation criteria.
evaluatorConfig (dict)
[REQUIRED]
The configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) -- [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) -- [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) -- [REQUIRED]
The numerical value for this rating scale option.
label (string) -- [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) -- [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) -- [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) -- [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string)
[REQUIRED]
The evaluation level that determines the scope of evaluation. Valid values are TOOL_CALL for individual tool invocations, TRACE for single request-response interactions, or SESSION for entire conversation sessions.
tags (dict)
A map of tag keys and values to assign to an AgentCore Evaluator. Tags enable you to categorize your resources in different ways, for example, by purpose, owner, or environment.
(string) --
(string) --
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'createdAt': datetime(2015, 1, 1),
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the created evaluator.
evaluatorId (string) --
The unique identifier of the created evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
status (string) --
The status of the evaluator creation operation.
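The two configuration styles above can be sketched as request payloads. This is a minimal sketch, not a definitive recipe: the evaluator names, instructions, model ID, account ID, and Lambda ARN are all hypothetical placeholders, and the client is assumed to be a boto3 client for the Bedrock AgentCore control plane.

```python
# Hypothetical request payloads for create_evaluator.
# All names, the model ID, the account ID, and the ARN are placeholders.

LLM_JUDGE_REQUEST = {
    "evaluatorName": "response-helpfulness",
    "description": "Scores each trace for helpfulness on a 1-5 scale.",
    "evaluatorConfig": {
        "llmAsAJudge": {
            "instructions": "Rate how helpful the agent's final answer is.",
            "ratingScale": {
                "numerical": [
                    {"value": 1.0, "label": "unhelpful",
                     "definition": "Does not address the user's request."},
                    {"value": 5.0, "label": "very helpful",
                     "definition": "Fully resolves the user's request."},
                ]
            },
            "modelConfig": {
                "bedrockEvaluatorModelConfig": {
                    "modelId": "anthropic.claude-3-5-sonnet-20240620-v1:0",
                    "inferenceConfig": {"maxTokens": 512, "temperature": 0.0},
                }
            },
        }
    },
    "level": "TRACE",
}

CODE_BASED_REQUEST = {
    "evaluatorName": "tool-arg-validator",
    "description": "Lambda-backed check of tool-call arguments.",
    "evaluatorConfig": {
        "codeBased": {
            "lambdaConfig": {
                "lambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:evaluate-tool-calls",
                "lambdaTimeoutInSeconds": 120,  # 1-300; defaults to 60
            }
        }
    },
    "level": "TOOL_CALL",
    "tags": {"team": "agent-quality"},
}

def create_agent_evaluator(client, request):
    """Issue the call and return the new evaluator's ID."""
    return client.create_evaluator(**request)["evaluatorId"]
```

With real credentials, `client` would come from something like `boto3.client("bedrock-agentcore-control")`; note that exactly one of `llmAsAJudge` or `codeBased` is set per request.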
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Retrieves detailed information about an evaluator, including its configuration, status, and metadata. Works with both built-in and custom evaluators.
See also: AWS API Documentation
Request Syntax
client.get_evaluator(
evaluatorId='string'
)
evaluatorId (string)
[REQUIRED]
The unique identifier of the evaluator to retrieve. Can be a built-in evaluator ID (e.g., Builtin.Helpfulness) or a custom evaluator ID.
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'evaluatorName': 'string',
'description': 'string',
'evaluatorConfig': {
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
'level': 'TOOL_CALL'|'TRACE'|'SESSION',
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
'createdAt': datetime(2015, 1, 1),
'updatedAt': datetime(2015, 1, 1),
'lockedForModification': True|False
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the evaluator.
evaluatorId (string) --
The unique identifier of the evaluator.
evaluatorName (string) --
The name of the evaluator.
description (string) --
The description of the evaluator.
evaluatorConfig (dict) --
The configuration of the evaluator, including LLM-as-a-Judge or code-based settings.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) --
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) --
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) --
The description that explains what this numerical rating represents and when it should be used.
value (float) --
The numerical value for this rating scale option.
label (string) --
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) --
The description that explains what this categorical rating represents and when it should be used.
label (string) --
The label or name of this categorical rating option.
modelConfig (dict) --
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) --
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) --
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string) --
The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
status (string) --
The current status of the evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
lockedForModification (boolean) --
Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.
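Since the response carries exactly one of `llmAsAJudge` or `codeBased` inside `evaluatorConfig`, a caller can branch on which key is present. A small sketch (the client is assumed to be a boto3 AgentCore control-plane client):

```python
def describe_evaluator(client, evaluator_id):
    """Fetch an evaluator and summarize which configuration style it uses."""
    resp = client.get_evaluator(evaluatorId=evaluator_id)
    config = resp["evaluatorConfig"]
    if "codeBased" in config:
        # Code-based evaluators identify themselves by their Lambda ARN.
        kind = "code-based: " + config["codeBased"]["lambdaConfig"]["lambdaArn"]
    else:
        # LLM-as-a-Judge evaluators identify themselves by their model ID.
        model = config["llmAsAJudge"]["modelConfig"]["bedrockEvaluatorModelConfig"]["modelId"]
        kind = "LLM-as-a-Judge: " + model
    return {
        "name": resp["evaluatorName"],
        "level": resp["level"],
        "kind": kind,
        "locked": resp["lockedForModification"],
    }
```

Checking `locked` before attempting an update avoids a failed call against an evaluator referenced by an active online evaluation configuration.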
{'evaluators': {'evaluatorType': {'CustomCode'}}}
Lists all available evaluators, including both built-in evaluators provided by the service and custom evaluators created by the user.
See also: AWS API Documentation
Request Syntax
client.list_evaluators(
nextToken='string',
maxResults=123
)
nextToken (string)
The pagination token from a previous request to retrieve the next page of results.
maxResults (integer)
The maximum number of evaluators to return in a single response.
dict
Response Syntax
{
'evaluators': [
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'evaluatorName': 'string',
'description': 'string',
'evaluatorType': 'Builtin'|'Custom'|'CustomCode',
'level': 'TOOL_CALL'|'TRACE'|'SESSION',
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING',
'createdAt': datetime(2015, 1, 1),
'updatedAt': datetime(2015, 1, 1),
'lockedForModification': True|False
},
],
'nextToken': 'string'
}
Response Structure
(dict) --
evaluators (list) --
The list of evaluator summaries containing basic information about each evaluator.
(dict) --
The summary information about an evaluator, including basic metadata and status information.
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the evaluator.
evaluatorId (string) --
The unique identifier of the evaluator.
evaluatorName (string) --
The name of the evaluator.
description (string) --
The description of the evaluator.
evaluatorType (string) --
The type of evaluator, indicating whether it is a built-in evaluator provided by the service or a custom evaluator created by the user.
level (string) --
The evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
status (string) --
The current status of the evaluator.
createdAt (datetime) --
The timestamp when the evaluator was created.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
lockedForModification (boolean) --
Whether the evaluator is locked for modification due to being referenced by active online evaluation configurations.
nextToken (string) --
The pagination token to use in a subsequent request to retrieve the next page of results.
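Each response returns at most maxResults summaries plus an optional nextToken; feeding that token into the next request walks the full list. A sketch of the pagination loop (the client is assumed to be a boto3 AgentCore control-plane client):

```python
def list_all_evaluators(client, page_size=25):
    """Collect every evaluator summary by following nextToken until exhausted."""
    evaluators = []
    kwargs = {"maxResults": page_size}
    while True:
        page = client.list_evaluators(**kwargs)
        evaluators.extend(page["evaluators"])
        token = page.get("nextToken")
        if not token:
            # No token in the response means this was the last page.
            break
        kwargs["nextToken"] = token
    return evaluators
```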
{'evaluatorConfig': {'codeBased': {'lambdaConfig': {'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 'integer'}}}}
Updates a custom evaluator's configuration, description, or evaluation level. Built-in evaluators cannot be updated. The evaluator must not be locked for modification.
See also: AWS API Documentation
Request Syntax
client.update_evaluator(
clientToken='string',
evaluatorId='string',
description='string',
evaluatorConfig={
'llmAsAJudge': {
'instructions': 'string',
'ratingScale': {
'numerical': [
{
'definition': 'string',
'value': 123.0,
'label': 'string'
},
],
'categorical': [
{
'definition': 'string',
'label': 'string'
},
]
},
'modelConfig': {
'bedrockEvaluatorModelConfig': {
'modelId': 'string',
'inferenceConfig': {
'maxTokens': 123,
'temperature': ...,
'topP': ...,
'stopSequences': [
'string',
]
},
'additionalModelRequestFields': {...}|[...]|123|123.4|'string'|True|None
}
}
},
'codeBased': {
'lambdaConfig': {
'lambdaArn': 'string',
'lambdaTimeoutInSeconds': 123
}
}
},
level='TOOL_CALL'|'TRACE'|'SESSION'
)
clientToken (string)
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If you don't specify this field, a value is randomly generated for you. If this token matches a previous request, the service ignores the request, but doesn't return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
evaluatorId (string)
[REQUIRED]
The unique identifier of the evaluator to update.
description (string)
The updated description of the evaluator.
evaluatorConfig (dict)
The updated configuration for the evaluator. Specify either LLM-as-a-Judge settings with instructions, rating scale, and model configuration, or code-based settings with a customer-managed Lambda function.
llmAsAJudge (dict) --
The LLM-as-a-Judge configuration that uses a language model to evaluate agent performance based on custom instructions and rating scales.
instructions (string) -- [REQUIRED]
The evaluation instructions that guide the language model in assessing agent performance, including criteria and evaluation guidelines.
ratingScale (dict) -- [REQUIRED]
The rating scale that defines how the evaluator should score agent performance, either numerical or categorical.
numerical (list) --
The numerical rating scale with defined score values and descriptions for quantitative evaluation.
(dict) --
The definition of a numerical rating scale option that provides a numeric value with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this numerical rating represents and when it should be used.
value (float) -- [REQUIRED]
The numerical value for this rating scale option.
label (string) -- [REQUIRED]
The label or name that describes this numerical rating option.
categorical (list) --
The categorical rating scale with named categories and definitions for qualitative evaluation.
(dict) --
The definition of a categorical rating scale option that provides a named category with its description for evaluation scoring.
definition (string) -- [REQUIRED]
The description that explains what this categorical rating represents and when it should be used.
label (string) -- [REQUIRED]
The label or name of this categorical rating option.
modelConfig (dict) -- [REQUIRED]
The model configuration that specifies which foundation model to use and how to configure it for evaluation.
bedrockEvaluatorModelConfig (dict) --
The Amazon Bedrock model configuration for evaluation.
modelId (string) -- [REQUIRED]
The identifier of the Amazon Bedrock model to use for evaluation. Must be a supported foundation model available in your region.
inferenceConfig (dict) --
The inference configuration parameters that control model behavior during evaluation, including temperature, token limits, and sampling settings.
maxTokens (integer) --
The maximum number of tokens to generate in the model response during evaluation.
temperature (float) --
The temperature value that controls randomness in the model's responses. Lower values produce more deterministic outputs.
topP (float) --
The top-p sampling parameter that controls the diversity of the model's responses by limiting the cumulative probability of token choices.
stopSequences (list) --
The list of sequences that will cause the model to stop generating tokens when encountered.
(string) --
additionalModelRequestFields (document) --
Additional model-specific request fields to customize model behavior beyond the standard inference configuration.
codeBased (dict) --
Configuration for a code-based evaluator that uses a customer-managed Lambda function to programmatically assess agent performance.
lambdaConfig (dict) --
The Lambda function configuration for code-based evaluation.
lambdaArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the Lambda function that implements the evaluation logic.
lambdaTimeoutInSeconds (integer) --
The timeout in seconds for the Lambda function invocation. Defaults to 60. Must be between 1 and 300.
level (string)
The updated evaluation level (TOOL_CALL, TRACE, or SESSION) that determines the scope of evaluation.
dict
Response Syntax
{
'evaluatorArn': 'string',
'evaluatorId': 'string',
'updatedAt': datetime(2015, 1, 1),
'status': 'ACTIVE'|'CREATING'|'CREATE_FAILED'|'UPDATING'|'UPDATE_FAILED'|'DELETING'
}
Response Structure
(dict) --
evaluatorArn (string) --
The Amazon Resource Name (ARN) of the updated evaluator.
evaluatorId (string) --
The unique identifier of the updated evaluator.
updatedAt (datetime) --
The timestamp when the evaluator was last updated.
status (string) --
The status of the evaluator update operation.
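Because evaluatorConfig is replaced as a whole rather than patched, updating one field (such as the Lambda timeout) still requires supplying the full code-based configuration. A sketch, with a generated clientToken for safe retries (the evaluator ID and ARN would come from your own account):

```python
import uuid

def set_lambda_timeout(client, evaluator_id, lambda_arn, timeout_seconds):
    """Replace a code-based evaluator's config with a new Lambda timeout.

    update_evaluator takes a full evaluatorConfig, so the current
    lambdaArn must be supplied alongside the new timeout.
    """
    if not 1 <= timeout_seconds <= 300:
        # Mirror the documented service constraint before calling the API.
        raise ValueError("lambdaTimeoutInSeconds must be between 1 and 300")
    return client.update_evaluator(
        clientToken=str(uuid.uuid4()),  # idempotency token for safe retries
        evaluatorId=evaluator_id,
        evaluatorConfig={
            "codeBased": {
                "lambdaConfig": {
                    "lambdaArn": lambda_arn,
                    "lambdaTimeoutInSeconds": timeout_seconds,
                }
            }
        },
    )
```

Reusing the same clientToken on a retry makes the service ignore the duplicate request instead of applying the update twice.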