Amazon SageMaker Service

2023/03/02 - Amazon SageMaker Service - 1 updated api methods

Changes  Adds a new field "EndpointMetrics" to the SageMaker Inference Recommender "ListInferenceRecommendationsJobSteps" API response.

ListInferenceRecommendationsJobSteps (updated)
Changes (response)
{'Steps': {'InferenceBenchmark': {'EndpointMetrics': {'MaxInvocations': 'integer',
                                                      'ModelLatency': 'integer'}}}}

Returns a list of the subtasks for an Inference Recommender job.

The supported subtasks are benchmarks, which evaluate the performance of your model on different instance types.

See also: AWS API Documentation

Request Syntax

client.list_inference_recommendations_job_steps(
    JobName='string',
    Status='PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
    StepType='BENCHMARK',
    MaxResults=123,
    NextToken='string'
)
type JobName:

string

param JobName:

[REQUIRED]

The name for the Inference Recommender job.

type Status:

string

param Status:

A filter to return benchmarks of a specified status. If this field is left empty, then all benchmarks are returned.

type StepType:

string

param StepType:

A filter to return details about the specified type of subtask.

BENCHMARK: Evaluate the performance of your model on different instance types.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return.

type NextToken:

string

param NextToken:

A token that you can specify to return more results from the list. Specify this field if you have a token that was returned from a previous request.
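The NextToken handling described above can be sketched as a small generator. This is a minimal illustration, not an official helper: iter_job_steps and its page_size default of 50 are hypothetical names and values chosen for the example, and client is assumed to be any object exposing list_inference_recommendations_job_steps with the request syntax shown above (for instance, a boto3 SageMaker client).

```python
from typing import Any, Dict, Iterator, Optional


def iter_job_steps(
    client: Any,
    job_name: str,
    status: Optional[str] = None,
    page_size: int = 50,  # arbitrary value chosen for this sketch
) -> Iterator[Dict[str, Any]]:
    """Yield every step of a job, requesting further pages while NextToken is returned."""
    kwargs: Dict[str, Any] = {
        "JobName": job_name,
        "StepType": "BENCHMARK",
        "MaxResults": page_size,
    }
    # Leaving Status out returns benchmarks of every status.
    if status is not None:
        kwargs["Status"] = status

    while True:
        page = client.list_inference_recommendations_job_steps(**kwargs)
        yield from page.get("Steps", [])

        token = page.get("NextToken")
        if not token:
            break
        # Pass the token from the previous response to fetch the next page.
        kwargs["NextToken"] = token
```

With boto3 installed and credentials configured, the generator would typically be called as iter_job_steps(boto3.client("sagemaker"), "my-job", status="COMPLETED"), where "my-job" is a placeholder job name.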

rtype:

dict

returns:

Response Syntax

{
    'Steps': [
        {
            'StepType': 'BENCHMARK',
            'JobName': 'string',
            'Status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
            'InferenceBenchmark': {
                'Metrics': {
                    'CostPerHour': ...,
                    'CostPerInference': ...,
                    'MaxInvocations': 123,
                    'ModelLatency': 123,
                    'CpuUtilization': ...,
                    'MemoryUtilization': ...
                },
                'EndpointConfiguration': {
                    'EndpointName': 'string',
                    'VariantName': 'string',
                    'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge'|'ml.c6i.large'|'ml.c6i.xlarge'|'ml.c6i.2xlarge'|'ml.c6i.4xlarge'|'ml.c6i.8xlarge'|'ml.c6i.12xlarge'|'ml.c6i.16xlarge'|'ml.c6i.24xlarge'|'ml.c6i.32xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.p4d.24xlarge'|'ml.c7g.large'|'ml.c7g.xlarge'|'ml.c7g.2xlarge'|'ml.c7g.4xlarge'|'ml.c7g.8xlarge'|'ml.c7g.12xlarge'|'ml.c7g.16xlarge'|'ml.m6g.large'|'ml.m6g.xlarge'|'ml.m6g.2xlarge'|'ml.m6g.4xlarge'|'ml.m6g.8xlarge'|'ml.m6g.12xlarge'|'ml.m6g.16xlarge'|'ml.m6gd.large'|'ml.m6gd.xlarge'|'ml.m6gd.2xlarge'|'ml.m6gd.4xlarge'|'ml.m6gd.8xlarge'|'ml.m6gd.12xlarge'|'ml.m6gd.16xlarge'|'ml.c6g.large'|'ml.c6g.xlarge'|'ml.c6g.2xlarge'|'ml.c6g.4xlarge'|'ml.c6g.8xlarge'|'ml.c6g.12xlarge'|'ml.c6g.16xlarge'|'ml.c6gd.large'|'ml.c6gd.xlarge'|'ml.c6gd.2xlarge'|'ml.c6gd.4xlarge'|'ml.c6gd.8xlarge'|'ml.c6gd.12xlarge'|'ml.c6gd.16xlarge'|'ml.c6gn.large'|'ml.c6gn.xlarge'|'ml.c6gn.2xlarge'|'ml.c6gn.4xlarge'|'ml.c6gn.8xlarge'|'ml.c6gn.12xlarge'|'ml.c6gn.16xlarge'|'ml.r6g.large'|'ml.r6g.xlarge'|'ml.r6g.2xlarge'|'ml.r6g.4xlarge'|'ml.r6g.8xlarge'|'ml.r6g.12xlarge'|'ml.r6g.16xlarge'|'ml.r6gd.large'|'ml.r6gd.xlarge'|'ml.r6gd.2xlarge'|'ml.r6gd.4xlarge'|'ml.r6gd.8xlarge'|'ml.r6gd.12xlarge'|'ml.r6gd.16xlarge'|'ml.p4de.24xlarge',
                    'InitialInstanceCount': 123
                },
                'ModelConfiguration': {
                    'InferenceSpecificationName': 'string',
                    'EnvironmentParameters': [
                        {
                            'Key': 'string',
                            'ValueType': 'string',
                            'Value': 'string'
                        },
                    ],
                    'CompilationJobName': 'string'
                },
                'FailureReason': 'string',
                'EndpointMetrics': {
                    'MaxInvocations': 123,
                    'ModelLatency': 123
                }
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Steps (list) --

      A list of all subtask details in Inference Recommender.

      • (dict) --

        A returned array object for the Steps response field in the ListInferenceRecommendationsJobSteps API command.

        • StepType (string) --

          The type of the subtask.

          BENCHMARK: Evaluate the performance of your model on different instance types.

        • JobName (string) --

          The name of the Inference Recommender job.

        • Status (string) --

          The current status of the benchmark.

        • InferenceBenchmark (dict) --

          The details for a specific benchmark.

          • Metrics (dict) --

            The metrics of recommendations.

            • CostPerHour (float) --

              Defines the cost per hour for the instance.

            • CostPerInference (float) --

              Defines the cost per inference for the instance.

            • MaxInvocations (integer) --

              The expected maximum number of requests per minute for the instance.

            • ModelLatency (integer) --

              The expected model latency at maximum invocations per minute for the instance.

            • CpuUtilization (float) --

              The expected CPU utilization at maximum invocations per minute for the instance.

              NaN indicates that the value is not available.

            • MemoryUtilization (float) --

              The expected memory utilization at maximum invocations per minute for the instance.

              NaN indicates that the value is not available.

          • EndpointConfiguration (dict) --

            The endpoint configuration made by Inference Recommender during a recommendation job.

            • EndpointName (string) --

              The name of the endpoint made during a recommendation job.

            • VariantName (string) --

              The name of the production variant (deployed model) made during a recommendation job.

            • InstanceType (string) --

              The instance type recommended by Amazon SageMaker Inference Recommender.

            • InitialInstanceCount (integer) --

              The number of instances recommended to launch initially.

          • ModelConfiguration (dict) --

            Defines the model configuration. Includes the specification name and environment parameters.

            • InferenceSpecificationName (string) --

              The inference specification name in the model package version.

            • EnvironmentParameters (list) --

              Defines the environment parameters, which include keys, value types, and values.

              • (dict) --

                A list of environment parameters suggested by the Amazon SageMaker Inference Recommender.

                • Key (string) --

                  The environment key suggested by the Amazon SageMaker Inference Recommender.

                • ValueType (string) --

                  The value type suggested by the Amazon SageMaker Inference Recommender.

                • Value (string) --

                  The value suggested by the Amazon SageMaker Inference Recommender.

            • CompilationJobName (string) --

              The name of the compilation job used to create the recommended model artifacts.

          • FailureReason (string) --

            The reason why a benchmark failed.

          • EndpointMetrics (dict) --

            The metrics for an existing endpoint compared in an Inference Recommender job.

            • MaxInvocations (integer) --

              The expected maximum number of requests per minute for the instance.

            • ModelLatency (integer) --

              The expected model latency at maximum invocations per minute for the instance.

    • NextToken (string) --

      A token that you can specify in your next request to return more results from the list.
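As an illustration of the response structure above, the following sketch flattens each step into one row so that the benchmark Metrics can be read next to the new EndpointMetrics field. The function names, row keys, and sample values are all made up for this example. It assumes that any of the nested fields may be absent (for example on a failed benchmark) and, following the field descriptions above, treats NaN in CpuUtilization and MemoryUtilization as "not available".

```python
import math
from typing import Any, Dict, List, Optional


def _nan_to_none(value: Optional[float]) -> Optional[float]:
    """Map NaN (documented as 'value is not available') to None."""
    if isinstance(value, float) and math.isnan(value):
        return None
    return value


def summarize_steps(steps: List[Dict[str, Any]]) -> List[Dict[str, Any]]:
    """Flatten each entry of the 'Steps' list into a single comparison row."""
    rows = []
    for step in steps:
        bench = step.get("InferenceBenchmark", {})
        metrics = bench.get("Metrics", {})
        existing = bench.get("EndpointMetrics", {})  # metrics for the existing endpoint
        rows.append({
            "status": step.get("Status"),
            "instance_type": bench.get("EndpointConfiguration", {}).get("InstanceType"),
            "max_invocations": metrics.get("MaxInvocations"),
            "model_latency": metrics.get("ModelLatency"),
            "cpu_utilization": _nan_to_none(metrics.get("CpuUtilization")),
            "memory_utilization": _nan_to_none(metrics.get("MemoryUtilization")),
            "existing_max_invocations": existing.get("MaxInvocations"),
            "existing_model_latency": existing.get("ModelLatency"),
            "failure_reason": bench.get("FailureReason"),
        })
    return rows
```

The function takes the value of the 'Steps' key from a response, so it can be applied to a single page or to steps collected across several pages.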