AWS API Changes

2023/03/02 - Amazon SageMaker Service - 1 updated api methods

Changes: Adds a new field "EndpointMetrics" to the SageMaker Inference Recommender "ListInferenceRecommendationsJobSteps" API response.

ListInferenceRecommendationsJobSteps (updated)

Changes (response)

{'Steps': {'InferenceBenchmark': {'EndpointMetrics': {'MaxInvocations': 'integer',
                                                      'ModelLatency': 'integer'}}}}

Returns a list of the subtasks for an Inference Recommender job.

The supported subtasks are benchmarks, which evaluate the performance of your model on different instance types.

See also: AWS API Documentation

Request Syntax

client.list_inference_recommendations_job_steps(
    JobName='string',
    Status='PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
    StepType='BENCHMARK',
    MaxResults=123,
    NextToken='string'
)

type JobName:

string

param JobName:

[REQUIRED]

The name for the Inference Recommender job.

type Status:

string

param Status:

A filter to return benchmarks of a specified status. If this field is left empty, then all benchmarks are returned.

type StepType:

string

param StepType:

A filter to return details about the specified type of subtask.

BENCHMARK: Evaluate the performance of your model on different instance types.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return.

type NextToken:

string

param NextToken:

A token that you can specify to return more results from the list. Specify this field if you have a token that was returned from a previous request.
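The request parameters above can be combined into a simple pagination loop. The following is a minimal sketch, not an official helper: the function name `iter_benchmark_steps` and the default `page_size` are illustrative assumptions, and `client` is assumed to be a SageMaker client such as the one returned by `boto3.client("sagemaker")`.

```python
def iter_benchmark_steps(client, job_name, status=None, page_size=50):
    """Yield every step of an Inference Recommender job, following NextToken.

    `client` is assumed to expose list_inference_recommendations_job_steps,
    for example a client created with boto3.client("sagemaker").
    `page_size` is a hypothetical default chosen for illustration.
    """
    kwargs = {
        "JobName": job_name,        # required
        "StepType": "BENCHMARK",    # the only step type listed above
        "MaxResults": page_size,
    }
    if status is not None:
        kwargs["Status"] = status   # omit to return benchmarks of any status

    while True:
        response = client.list_inference_recommendations_job_steps(**kwargs)
        yield from response.get("Steps", [])
        token = response.get("NextToken")
        if not token:
            break
        # Pass the token from the previous response to fetch the next page.
        kwargs["NextToken"] = token
```

A caller would typically write `for step in iter_benchmark_steps(client, "my-job", status="COMPLETED"): ...`, where `"my-job"` is a placeholder job name.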

rtype:

dict

returns:

Response Syntax

{
    'Steps': [
        {
            'StepType': 'BENCHMARK',
            'JobName': 'string',
            'Status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
            'InferenceBenchmark': {
                'Metrics': {
                    'CostPerHour': ...,
                    'CostPerInference': ...,
                    'MaxInvocations': 123,
                    'ModelLatency': 123,
                    'CpuUtilization': ...,
                    'MemoryUtilization': ...
                },
                'EndpointConfiguration': {
                    'EndpointName': 'string',
                    'VariantName': 'string',
                    'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge'|'ml.c6i.large'|'ml.c6i.xlarge'|'ml.c6i.2xlarge'|'ml.c6i.4xlarge'|'ml.c6i.8xlarge'|'ml.c6i.12xlarge'|'ml.c6i.16xlarge'|'ml.c6i.24xlarge'|'ml.c6i.32xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.p4d.24xlarge'|'ml.c7g.large'|'ml.c7g.xlarge'|'ml.c7g.2xlarge'|'ml.c7g.4xlarge'|'ml.c7g.8xlarge'|'ml.c7g.12xlarge'|'ml.c7g.16xlarge'|'ml.m6g.large'|'ml.m6g.xlarge'|'ml.m6g.2xlarge'|'ml.m6g.4xlarge'|'ml.m6g.8xlarge'|'ml.m6g.12xlarge'|'ml.m6g.16xlarge'|'ml.m6gd.large'|'ml.m6gd.xlarge'|'ml.m6gd.2xlarge'|'ml.m6gd.4xlarge'|'ml.m6gd.8xlarge'|'ml.m6gd.12xlarge'|'ml.m6gd.16xlarge'|'ml.c6g.large'|'ml.c6g.xlarge'|'ml.c6g.2xlarge'|'ml.c6g.4xlarge'|'ml.c6g.8xlarge'|'ml.c6g.12xlarge'|'ml.c6g.16xlarge'|'ml.c6gd.large'|'ml.c6gd.xlarge'|'ml.c6gd.2xlarge'|'ml.c6gd.4xlarge'|'ml.c6gd.8xlarge'|'ml.c6gd.12xlarge'|'ml.c6gd.16xlarge'|'ml.c6gn.large'|'ml.c6gn.xlarge'|'ml.c6gn.2xlarge'|'ml.c6gn.4xlarge'|'ml.c6gn.8xlarge'|'ml.c6gn.12xlarge'|'ml.c6gn.16xlarge'|'ml.r6g.large'|'ml.r6g.xlarge'|'ml.r6g.2xlarge'|'ml.r6g.4xlarge'|'ml.r6g.8xlarge'|'ml.r6g.12xlarge'|'ml.r6g.16xlarge'|'ml.r6gd.large'|'ml.r6gd.xlarge'|'ml.r6gd.2xlarge'|'ml.r6gd.4xlarge'|'ml.r6gd.8xlarge'|'ml.r6gd.12xlarge'|'ml.r6gd.16xlarge'|'ml.p4de.24xlarge',
                    'InitialInstanceCount': 123
                },
                'ModelConfiguration': {
                    'InferenceSpecificationName': 'string',
                    'EnvironmentParameters': [
                        {
                            'Key': 'string',
                            'ValueType': 'string',
                            'Value': 'string'
                        },
                    ],
                    'CompilationJobName': 'string'
                },
                'FailureReason': 'string',
                'EndpointMetrics': {
                    'MaxInvocations': 123,
                    'ModelLatency': 123
                }
            }
        },
    ],
    'NextToken': 'string'
}
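In the response syntax above, the new EndpointMetrics block sits beside Metrics inside InferenceBenchmark. The sketch below shows one way to read it defensively. It assumes that EndpointMetrics may be missing from some steps (for example, when no existing endpoint was compared), which this page does not state explicitly; the helper name `endpoint_metrics` is hypothetical.

```python
def endpoint_metrics(step):
    """Return (MaxInvocations, ModelLatency) from a step's EndpointMetrics.

    Returns None when the step carries no EndpointMetrics block. Treating
    the block as optional is an assumption made for this sketch.
    """
    benchmark = step.get("InferenceBenchmark") or {}
    metrics = benchmark.get("EndpointMetrics")
    if not metrics:
        return None
    return metrics.get("MaxInvocations"), metrics.get("ModelLatency")


# Example with a hand-written step shaped like the response syntax above.
sample_step = {
    "StepType": "BENCHMARK",
    "JobName": "example-job",  # placeholder name
    "Status": "COMPLETED",
    "InferenceBenchmark": {
        "EndpointMetrics": {"MaxInvocations": 123, "ModelLatency": 123},
    },
}
print(endpoint_metrics(sample_step))
```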

Response Structure

(dict) --
- Steps (list) --
  
  A list of all subtask details in Inference Recommender.
  - (dict) --
    
    A returned array object for the Steps response field in the ListInferenceRecommendationsJobSteps API command.
    - StepType (string) --
      
      The type of the subtask.
      
      BENCHMARK: Evaluate the performance of your model on different instance types.
    - JobName (string) --
      
      The name of the Inference Recommender job.
    - Status (string) --
      
      The current status of the benchmark.
    - InferenceBenchmark (dict) --
      
      The details for a specific benchmark.
      - Metrics (dict) --
        
        The metrics of recommendations.
        
        CostPerHour (float) --
        
        Defines the cost per hour for the instance.
        
        CostPerInference (float) --
        
        Defines the cost per inference for the instance.
        
        MaxInvocations (integer) --
        
        The expected maximum number of requests per minute for the instance.
        
        ModelLatency (integer) --
        
        The expected model latency at maximum invocations per minute for the instance.
        
        CpuUtilization (float) --
        
        The expected CPU utilization at maximum invocations per minute for the instance.
        
        NaN indicates that the value is not available.
        
        MemoryUtilization (float) --
        
        The expected memory utilization at maximum invocations per minute for the instance.
        
        NaN indicates that the value is not available.
      - EndpointConfiguration (dict) --
        
        The endpoint configuration made by Inference Recommender during a recommendation job.
        
        EndpointName (string) --
        
        The name of the endpoint made during a recommendation job.
        
        VariantName (string) --
        
        The name of the production variant (deployed model) made during a recommendation job.
        
        InstanceType (string) --
        
        The instance type recommended by Amazon SageMaker Inference Recommender.
        
        InitialInstanceCount (integer) --
        
        The number of instances recommended to launch initially.
      - ModelConfiguration (dict) --
        
        Defines the model configuration. Includes the specification name and environment parameters.
        
        InferenceSpecificationName (string) --
        
        The inference specification name in the model package version.
        
        EnvironmentParameters (list) --
        
        Defines the environment parameters, which include keys, value types, and values.
        
        (dict) --
        
        A list of environment parameters suggested by the Amazon SageMaker Inference Recommender.
        
        Key (string) --
        
        The environment key suggested by the Amazon SageMaker Inference Recommender.
        
        ValueType (string) --
        
        The value type suggested by the Amazon SageMaker Inference Recommender.
        
        Value (string) --
        
        The value suggested by the Amazon SageMaker Inference Recommender.
        
        CompilationJobName (string) --
        
        The name of the compilation job used to create the recommended model artifacts.
      - FailureReason (string) --
        
        The reason why a benchmark failed.
      - EndpointMetrics (dict) --
        
        The metrics for an existing endpoint compared in an Inference Recommender job.
        
        MaxInvocations (integer) --
        
        The expected maximum number of requests per minute for the instance.
        
        ModelLatency (integer) --
        
        The expected model latency at maximum invocations per minute for the instance.
- NextToken (string) --
  
  A token that you can specify in your next request to return more results from the list.
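The response structure notes that CpuUtilization and MemoryUtilization can be NaN when the value is not available. Because NaN never compares equal to itself, a plain equality check will not detect it. The following sketch, with the hypothetical helper name `clean_utilization`, maps unavailable values to `None` before further processing.

```python
import math


def clean_utilization(metrics):
    """Return the utilization fields of a Metrics dict, with NaN mapped to None.

    A missing key is also reported as None. Only the two fields documented
    as possibly NaN are inspected.
    """
    cleaned = {}
    for key in ("CpuUtilization", "MemoryUtilization"):
        value = metrics.get(key)
        if value is None or math.isnan(value):
            cleaned[key] = None
        else:
            cleaned[key] = value
    return cleaned
```

For example, `clean_utilization(step["InferenceBenchmark"]["Metrics"])` could be applied to each completed step before sorting or averaging the values.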