AWS API Changes

2023/05/23 - Amazon SageMaker Service - 3 updated api methods

Changes: Added ModelNameEquals and ModelPackageVersionArnEquals to the request, and ModelName, SamplePayloadUrl, and ModelPackageVersionArn to the response, of the ListInferenceRecommendationsJobs API. Added invocation timestamps to the responses of the DescribeInferenceRecommendationsJob and ListInferenceRecommendationsJobSteps APIs.

DescribeInferenceRecommendationsJob (updated)

Changes (response)

{'InferenceRecommendations': {'InvocationEndTime': 'timestamp',
                              'InvocationStartTime': 'timestamp'}}

Provides the results of the Inference Recommender job. One or more recommendation jobs are returned.

See also: AWS API Documentation

Request Syntax

client.describe_inference_recommendations_job(
    JobName='string'
)

type JobName:

string

param JobName:

[REQUIRED]

The name of the job. The name must be unique within an Amazon Web Services Region in the Amazon Web Services account.
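As a minimal sketch of the request above, the helper below calls describe_inference_recommendations_job and reports whether the job has reached a final state. The helper name, the `client` argument (assumed to be a boto3 SageMaker client), and the choice of which Status values count as final are assumptions made for illustration; they are not part of the API.

```python
# Assumption: treating these Status values as final is an illustrative choice.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def job_is_finished(client, job_name):
    """Return (finished, status) for an Inference Recommender job.

    `client` is assumed to be a boto3 SageMaker client, for example
    boto3.client("sagemaker"); any object exposing the same method works.
    """
    response = client.describe_inference_recommendations_job(JobName=job_name)
    status = response["Status"]
    return status in TERMINAL_STATUSES, status
```

Passing the client in, rather than creating it inside the helper, keeps the sketch usable with a stub object when no AWS credentials are available.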

rtype:

dict

returns:

Response Syntax

{
    'JobName': 'string',
    'JobDescription': 'string',
    'JobType': 'Default'|'Advanced',
    'JobArn': 'string',
    'RoleArn': 'string',
    'Status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
    'CreationTime': datetime(2015, 1, 1),
    'CompletionTime': datetime(2015, 1, 1),
    'LastModifiedTime': datetime(2015, 1, 1),
    'FailureReason': 'string',
    'InputConfig': {
        'ModelPackageVersionArn': 'string',
        'JobDurationInSeconds': 123,
        'TrafficPattern': {
            'TrafficType': 'PHASES',
            'Phases': [
                {
                    'InitialNumberOfUsers': 123,
                    'SpawnRate': 123,
                    'DurationInSeconds': 123
                },
            ]
        },
        'ResourceLimit': {
            'MaxNumberOfTests': 123,
            'MaxParallelOfTests': 123
        },
        'EndpointConfigurations': [
            {
                'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge'|'ml.c6i.large'|'ml.c6i.xlarge'|'ml.c6i.2xlarge'|'ml.c6i.4xlarge'|'ml.c6i.8xlarge'|'ml.c6i.12xlarge'|'ml.c6i.16xlarge'|'ml.c6i.24xlarge'|'ml.c6i.32xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.p4d.24xlarge'|'ml.c7g.large'|'ml.c7g.xlarge'|'ml.c7g.2xlarge'|'ml.c7g.4xlarge'|'ml.c7g.8xlarge'|'ml.c7g.12xlarge'|'ml.c7g.16xlarge'|'ml.m6g.large'|'ml.m6g.xlarge'|'ml.m6g.2xlarge'|'ml.m6g.4xlarge'|'ml.m6g.8xlarge'|'ml.m6g.12xlarge'|'ml.m6g.16xlarge'|'ml.m6gd.large'|'ml.m6gd.xlarge'|'ml.m6gd.2xlarge'|'ml.m6gd.4xlarge'|'ml.m6gd.8xlarge'|'ml.m6gd.12xlarge'|'ml.m6gd.16xlarge'|'ml.c6g.large'|'ml.c6g.xlarge'|'ml.c6g.2xlarge'|'ml.c6g.4xlarge'|'ml.c6g.8xlarge'|'ml.c6g.12xlarge'|'ml.c6g.16xlarge'|'ml.c6gd.large'|'ml.c6gd.xlarge'|'ml.c6gd.2xlarge'|'ml.c6gd.4xlarge'|'ml.c6gd.8xlarge'|'ml.c6gd.12xlarge'|'ml.c6gd.16xlarge'|'ml.c6gn.large'|'ml.c6gn.xlarge'|'ml.c6gn.2xlarge'|'ml.c6gn.4xlarge'|'ml.c6gn.8xlarge'|'ml.c6gn.12xlarge'|'ml.c6gn.16xlarge'|'ml.r6g.large'|'ml.r6g.xlarge'|'ml.r6g.2xlarge'|'ml.r6g.4xlarge'|'ml.r6g.8xlarge'|'ml.r6g.12xlarge'|'ml.r6g.16xlarge'|'ml.r6gd.large'|'ml.r6gd.xlarge'|'ml.r6gd.2xlarge'|'ml.r6gd.4xlarge'|'ml.r6gd.8xlarge'|'ml.r6gd.12xlarge'|'ml.r6gd.16xlarge'|'ml.p4de.24xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge',
                'InferenceSpecificationName': 'string',
                'EnvironmentParameterRanges': {
                    'CategoricalParameterRanges': [
                        {
                            'Name': 'string',
                            'Value': [
                                'string',
                            ]
                        },
                    ]
                }
            },
        ],
        'VolumeKmsKeyId': 'string',
        'ContainerConfig': {
            'Domain': 'string',
            'Task': 'string',
            'Framework': 'string',
            'FrameworkVersion': 'string',
            'PayloadConfig': {
                'SamplePayloadUrl': 'string',
                'SupportedContentTypes': [
                    'string',
                ]
            },
            'NearestModelName': 'string',
            'SupportedInstanceTypes': [
                'string',
            ],
            'DataInputConfig': 'string'
        },
        'Endpoints': [
            {
                'EndpointName': 'string'
            },
        ],
        'VpcConfig': {
            'SecurityGroupIds': [
                'string',
            ],
            'Subnets': [
                'string',
            ]
        },
        'ModelName': 'string'
    },
    'StoppingConditions': {
        'MaxInvocations': 123,
        'ModelLatencyThresholds': [
            {
                'Percentile': 'string',
                'ValueInMilliseconds': 123
            },
        ]
    },
    'InferenceRecommendations': [
        {
            'Metrics': {
                'CostPerHour': ...,
                'CostPerInference': ...,
                'MaxInvocations': 123,
                'ModelLatency': 123,
                'CpuUtilization': ...,
                'MemoryUtilization': ...
            },
            'EndpointConfiguration': {
                'EndpointName': 'string',
                'VariantName': 'string',
                'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge'|'ml.c6i.large'|'ml.c6i.xlarge'|'ml.c6i.2xlarge'|'ml.c6i.4xlarge'|'ml.c6i.8xlarge'|'ml.c6i.12xlarge'|'ml.c6i.16xlarge'|'ml.c6i.24xlarge'|'ml.c6i.32xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.p4d.24xlarge'|'ml.c7g.large'|'ml.c7g.xlarge'|'ml.c7g.2xlarge'|'ml.c7g.4xlarge'|'ml.c7g.8xlarge'|'ml.c7g.12xlarge'|'ml.c7g.16xlarge'|'ml.m6g.large'|'ml.m6g.xlarge'|'ml.m6g.2xlarge'|'ml.m6g.4xlarge'|'ml.m6g.8xlarge'|'ml.m6g.12xlarge'|'ml.m6g.16xlarge'|'ml.m6gd.large'|'ml.m6gd.xlarge'|'ml.m6gd.2xlarge'|'ml.m6gd.4xlarge'|'ml.m6gd.8xlarge'|'ml.m6gd.12xlarge'|'ml.m6gd.16xlarge'|'ml.c6g.large'|'ml.c6g.xlarge'|'ml.c6g.2xlarge'|'ml.c6g.4xlarge'|'ml.c6g.8xlarge'|'ml.c6g.12xlarge'|'ml.c6g.16xlarge'|'ml.c6gd.large'|'ml.c6gd.xlarge'|'ml.c6gd.2xlarge'|'ml.c6gd.4xlarge'|'ml.c6gd.8xlarge'|'ml.c6gd.12xlarge'|'ml.c6gd.16xlarge'|'ml.c6gn.large'|'ml.c6gn.xlarge'|'ml.c6gn.2xlarge'|'ml.c6gn.4xlarge'|'ml.c6gn.8xlarge'|'ml.c6gn.12xlarge'|'ml.c6gn.16xlarge'|'ml.r6g.large'|'ml.r6g.xlarge'|'ml.r6g.2xlarge'|'ml.r6g.4xlarge'|'ml.r6g.8xlarge'|'ml.r6g.12xlarge'|'ml.r6g.16xlarge'|'ml.r6gd.large'|'ml.r6gd.xlarge'|'ml.r6gd.2xlarge'|'ml.r6gd.4xlarge'|'ml.r6gd.8xlarge'|'ml.r6gd.12xlarge'|'ml.r6gd.16xlarge'|'ml.p4de.24xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge',
                'InitialInstanceCount': 123
            },
            'ModelConfiguration': {
                'InferenceSpecificationName': 'string',
                'EnvironmentParameters': [
                    {
                        'Key': 'string',
                        'ValueType': 'string',
                        'Value': 'string'
                    },
                ],
                'CompilationJobName': 'string'
            },
            'RecommendationId': 'string',
            'InvocationEndTime': datetime(2015, 1, 1),
            'InvocationStartTime': datetime(2015, 1, 1)
        },
    ],
    'EndpointPerformances': [
        {
            'Metrics': {
                'MaxInvocations': 123,
                'ModelLatency': 123
            },
            'EndpointInfo': {
                'EndpointName': 'string'
            }
        },
    ]
}

Response Structure

(dict) --
- JobName (string) --
  
  The name of the job. The name must be unique within an Amazon Web Services Region in the Amazon Web Services account.
- JobDescription (string) --
  
  The job description that you provided when you initiated the job.
- JobType (string) --
  
  The job type that you provided when you initiated the job.
- JobArn (string) --
  
  The Amazon Resource Name (ARN) of the job.
- RoleArn (string) --
  
  The Amazon Resource Name (ARN) of the Amazon Web Services Identity and Access Management (IAM) role you provided when you initiated the job.
- Status (string) --
  
  The status of the job.
- CreationTime (datetime) --
  
  A timestamp that shows when the job was created.
- CompletionTime (datetime) --
  
  A timestamp that shows when the job completed.
- LastModifiedTime (datetime) --
  
  A timestamp that shows when the job was last modified.
- FailureReason (string) --
  
  If the job fails, provides information why the job failed.
- InputConfig (dict) --
  
  Returns information about the versioned model package Amazon Resource Name (ARN), the traffic pattern, and endpoint configurations you provided when you initiated the job.
  - ModelPackageVersionArn (string) --
    
    The Amazon Resource Name (ARN) of a versioned model package.
  - JobDurationInSeconds (integer) --
    
    Specifies the maximum duration of the job, in seconds.
  - TrafficPattern (dict) --
    
    Specifies the traffic pattern of the job.
    - TrafficType (string) --
      
      Defines the traffic patterns.
    - Phases (list) --
      
      Defines the phases traffic specification.
      - (dict) --
        
        Defines the traffic pattern.
        
        - InitialNumberOfUsers (integer) --
          
          Specifies how many concurrent users to start with.
        - SpawnRate (integer) --
          
          Specifies how many new users to spawn in a minute.
        - DurationInSeconds (integer) --
          
          Specifies how long a traffic phase should be.
  - ResourceLimit (dict) --
    
    Defines the resource limit of the job.
    - MaxNumberOfTests (integer) --
      
      Defines the maximum number of load tests.
    - MaxParallelOfTests (integer) --
      
      Defines the maximum number of parallel load tests.
  - EndpointConfigurations (list) --
    
    Specifies the endpoint configuration to use for a job.
    - (dict) --
      
      The endpoint configuration for the load test.
      - InstanceType (string) --
        
        The instance types to use for the load test.
      - InferenceSpecificationName (string) --
        
        The inference specification name in the model package version.
      - EnvironmentParameterRanges (dict) --
        
        The parameter you want to benchmark against.
        
        - CategoricalParameterRanges (list) --
          
          Specifies a list of parameters for each category.
          - (dict) --
            
            Environment parameters you want to benchmark your load test against.
            - Name (string) --
              
              The name of the environment variable.
            - Value (list) --
              
              The list of values you can pass.
              - (string) --
  - VolumeKmsKeyId (string) --
    
    The Amazon Resource Name (ARN) of an Amazon Web Services Key Management Service (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance that hosts the endpoint. This key will be passed to SageMaker Hosting for endpoint creation.
    
    The SageMaker execution role must have kms:CreateGrant permission in order to encrypt data on the storage volume of the endpoints created for inference recommendation. The inference recommendation job will fail asynchronously during endpoint configuration creation if the role passed does not have kms:CreateGrant permission.
    
    The KmsKeyId can be any of the following formats:
    - // KMS Key ID "1234abcd-12ab-34cd-56ef-1234567890ab"
    - // Amazon Resource Name (ARN) of a KMS Key "arn:aws:kms:<region>:<account>:key/<key-id-12ab-34cd-56ef-1234567890ab>"
    - // KMS Key Alias "alias/ExampleAlias"
    - // Amazon Resource Name (ARN) of a KMS Key Alias "arn:aws:kms:<region>:<account>:alias/<ExampleAlias>"
    For more information about key identifiers, see Key identifiers (KeyID) in the Amazon Web Services Key Management Service (Amazon Web Services KMS) documentation.
  - ContainerConfig (dict) --
    
    Specifies mandatory fields for running an Inference Recommender job. The fields specified in ContainerConfig override the corresponding fields in the model package.
    - Domain (string) --
      
      The machine learning domain of the model and its components.
      
      Valid Values: COMPUTER_VISION | NATURAL_LANGUAGE_PROCESSING | MACHINE_LEARNING
    - Task (string) --
      
      The machine learning task that the model accomplishes.
      
      Valid Values: IMAGE_CLASSIFICATION | OBJECT_DETECTION | TEXT_GENERATION | IMAGE_SEGMENTATION | FILL_MASK | CLASSIFICATION | REGRESSION | OTHER
    - Framework (string) --
      
      The machine learning framework of the container image.
      
      Valid Values: TENSORFLOW | PYTORCH | XGBOOST | SAGEMAKER-SCIKIT-LEARN
    - FrameworkVersion (string) --
      
      The framework version of the container image.
    - PayloadConfig (dict) --
      
      Specifies the SamplePayloadUrl and all other sample payload-related fields.
      - SamplePayloadUrl (string) --
        
        The Amazon Simple Storage Service (Amazon S3) path where the sample payload is stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).
      - SupportedContentTypes (list) --
        
        The supported MIME types for the input data.
        
        - (string) --
    - NearestModelName (string) --
      
      The name of a pre-trained machine learning model benchmarked by Amazon SageMaker Inference Recommender that matches your model.
      
      Valid Values: efficientnetb7 | unet | xgboost | faster-rcnn-resnet101 | nasnetlarge | vgg16 | inception-v3 | mask-rcnn | sagemaker-scikit-learn | densenet201-gluon | resnet18v2-gluon | xception | densenet201 | yolov4 | resnet152 | bert-base-cased | xceptionV1-keras | resnet50 | retinanet
    - SupportedInstanceTypes (list) --
      
      A list of the instance types that are used to generate inferences in real-time.
      - (string) --
    - DataInputConfig (string) --
      
      Specifies the name and shape of the expected data inputs for your trained model with a JSON dictionary form. This field is used for optimizing your model using SageMaker Neo. For more information, see DataInputConfig.
  - Endpoints (list) --
    
    Existing customer endpoints on which to run an Inference Recommender job.
    - (dict) --
      
      Details about a customer endpoint that was compared in an Inference Recommender job.
      - EndpointName (string) --
        
        The name of a customer's endpoint.
  - VpcConfig (dict) --
    
    Inference Recommender provisions SageMaker endpoints with access to VPC in the inference recommendation job.
    - SecurityGroupIds (list) --
      
      The VPC security group IDs. IDs have the form of sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.
      - (string) --
    - Subnets (list) --
      
      The IDs of the subnets in the VPC to which you want to connect your model.
      - (string) --
  - ModelName (string) --
    
    The name of the created model.
- StoppingConditions (dict) --
  
  The stopping conditions that you provided when you initiated the job.
  - MaxInvocations (integer) --
    
    The maximum number of requests per minute expected for the endpoint.
  - ModelLatencyThresholds (list) --
    
    The interval of time taken by a model to respond as viewed from SageMaker. The interval includes the local communication time taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.
    - (dict) --
      
      The model latency threshold.
      - Percentile (string) --
        
        The model latency percentile threshold.
      - ValueInMilliseconds (integer) --
        
        The model latency percentile value in milliseconds.
- InferenceRecommendations (list) --
  
  The recommendations made by Inference Recommender.
  - (dict) --
    
    A list of recommendations made by Amazon SageMaker Inference Recommender.
    - Metrics (dict) --
      
      The metrics used to decide what recommendation to make.
      - CostPerHour (float) --
        
        Defines the cost per hour for the instance.
      - CostPerInference (float) --
        
        Defines the cost per inference for the instance.
      - MaxInvocations (integer) --
        
        The expected maximum number of requests per minute for the instance.
      - ModelLatency (integer) --
        
        The expected model latency at maximum invocations per minute for the instance.
      - CpuUtilization (float) --
        
        The expected CPU utilization at maximum invocations per minute for the instance.
        
        NaN indicates that the value is not available.
      - MemoryUtilization (float) --
        
        The expected memory utilization at maximum invocations per minute for the instance.
        
        NaN indicates that the value is not available.
    - EndpointConfiguration (dict) --
      
      Defines the endpoint configuration parameters.
      - EndpointName (string) --
        
        The name of the endpoint made during a recommendation job.
      - VariantName (string) --
        
        The name of the production variant (deployed model) made during a recommendation job.
      - InstanceType (string) --
        
        The instance type recommended by Amazon SageMaker Inference Recommender.
      - InitialInstanceCount (integer) --
        
        The number of instances recommended to launch initially.
    - ModelConfiguration (dict) --
      
      Defines the model configuration.
      - InferenceSpecificationName (string) --
        
        The inference specification name in the model package version.
      - EnvironmentParameters (list) --
        
        Defines the environment parameters, which include keys, value types, and values.
        - (dict) --
          
          A list of environment parameters suggested by the Amazon SageMaker Inference Recommender.
          - Key (string) --
            
            The environment key suggested by the Amazon SageMaker Inference Recommender.
          - ValueType (string) --
            
            The value type suggested by the Amazon SageMaker Inference Recommender.
          - Value (string) --
            
            The value suggested by the Amazon SageMaker Inference Recommender.
      - CompilationJobName (string) --
        
        The name of the compilation job used to create the recommended model artifacts.
    - RecommendationId (string) --
      
      The recommendation ID which uniquely identifies each recommendation.
    - InvocationEndTime (datetime) --
      
      A timestamp that shows when the benchmark completed.
    - InvocationStartTime (datetime) --
      
      A timestamp that shows when the benchmark started.
- EndpointPerformances (list) --
  
  The performance results from running an Inference Recommender job on an existing endpoint.
  - (dict) --
    
    The performance results from running an Inference Recommender job on an existing endpoint.
    - Metrics (dict) --
      
      The metrics for an existing endpoint.
      - MaxInvocations (integer) --
        
        The expected maximum number of requests per minute for the instance.
      - ModelLatency (integer) --
        
        The expected model latency at maximum invocations per minute for the instance.
    - EndpointInfo (dict) --
      
      Details about a customer endpoint that was compared in an Inference Recommender job.
      - EndpointName (string) --
        
        The name of a customer's endpoint.
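The new InvocationStartTime and InvocationEndTime fields make it possible to work out how long each benchmark ran. The following is a minimal sketch, not an official utility: the function name is hypothetical, it expects the response dict of describe_inference_recommendations_job, and it assumes (without confirmation from the source) that either timestamp may be missing from an entry, in which case the entry is skipped.

```python
from datetime import datetime


def benchmark_durations(response):
    """Map RecommendationId to InvocationEndTime minus InvocationStartTime.

    `response` is the dict returned by describe_inference_recommendations_job.
    Entries that lack either timestamp are skipped (an assumption about
    incomplete benchmarks, not documented behavior).
    """
    durations = {}
    for rec in response.get("InferenceRecommendations", []):
        start = rec.get("InvocationStartTime")
        end = rec.get("InvocationEndTime")
        if start is None or end is None:
            continue
        durations[rec.get("RecommendationId")] = end - start
    return durations


# Hand-built sample shaped like the response syntax above (not real data).
sample_response = {
    "InferenceRecommendations": [
        {
            "RecommendationId": "rec-1",
            "InvocationStartTime": datetime(2023, 5, 23, 12, 0, 0),
            "InvocationEndTime": datetime(2023, 5, 23, 12, 30, 0),
        },
        {"RecommendationId": "rec-2"},  # no timestamps, so it is skipped
    ]
}
sample_durations = benchmark_durations(sample_response)
```

Each value is a datetime.timedelta, so total_seconds() gives the benchmark length in seconds.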

ListInferenceRecommendationsJobSteps (updated)

Changes (response)

{'Steps': {'InferenceBenchmark': {'InvocationEndTime': 'timestamp',
                                  'InvocationStartTime': 'timestamp'}}}

Returns a list of the subtasks for an Inference Recommender job.

The supported subtasks are benchmarks, which evaluate the performance of your model on different instance types.

See also: AWS API Documentation

Request Syntax

client.list_inference_recommendations_job_steps(
    JobName='string',
    Status='PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
    StepType='BENCHMARK',
    MaxResults=123,
    NextToken='string'
)

type JobName:

string

param JobName:

[REQUIRED]

The name for the Inference Recommender job.

type Status:

string

param Status:

A filter to return benchmarks of a specified status. If this field is left empty, then all benchmarks are returned.

type StepType:

string

param StepType:

A filter to return details about the specified type of subtask.

BENCHMARK: Evaluate the performance of your model on different instance types.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return.

type NextToken:

string

param NextToken:

A token that you can specify to return more results from the list. Specify this field if you have a token that was returned from a previous request.
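The NextToken parameter implies a paging loop: call the API, consume the returned steps, and repeat with the token from the previous response until no token comes back. A minimal sketch under stated assumptions follows; the generator name and the default page size of 50 are arbitrary illustrative choices, and `client` is assumed to be a boto3 SageMaker client.

```python
def iter_job_steps(client, job_name, status=None, page_size=50):
    """Yield every step of an Inference Recommender job across all pages.

    `client` is assumed to be a boto3 SageMaker client. `page_size` is an
    arbitrary default chosen for this sketch.
    """
    kwargs = {
        "JobName": job_name,
        "StepType": "BENCHMARK",
        "MaxResults": page_size,
    }
    if status is not None:
        # Omitting Status returns benchmarks of every status.
        kwargs["Status"] = status
    while True:
        page = client.list_inference_recommendations_job_steps(**kwargs)
        yield from page.get("Steps", [])
        token = page.get("NextToken")
        if not token:
            break
        # Pass the token from this page to fetch the next one.
        kwargs["NextToken"] = token
```

A generator keeps memory use flat for jobs with many benchmarks, since each page is discarded once its steps have been consumed.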

rtype:

dict

returns:

Response Syntax

{
    'Steps': [
        {
            'StepType': 'BENCHMARK',
            'JobName': 'string',
            'Status': 'PENDING'|'IN_PROGRESS'|'COMPLETED'|'FAILED'|'STOPPING'|'STOPPED',
            'InferenceBenchmark': {
                'Metrics': {
                    'CostPerHour': ...,
                    'CostPerInference': ...,
                    'MaxInvocations': 123,
                    'ModelLatency': 123,
                    'CpuUtilization': ...,
                    'MemoryUtilization': ...
                },
                'EndpointConfiguration': {
                    'EndpointName': 'string',
                    'VariantName': 'string',
                    'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge'|'ml.c6i.large'|'ml.c6i.xlarge'|'ml.c6i.2xlarge'|'ml.c6i.4xlarge'|'ml.c6i.8xlarge'|'ml.c6i.12xlarge'|'ml.c6i.16xlarge'|'ml.c6i.24xlarge'|'ml.c6i.32xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.p4d.24xlarge'|'ml.c7g.large'|'ml.c7g.xlarge'|'ml.c7g.2xlarge'|'ml.c7g.4xlarge'|'ml.c7g.8xlarge'|'ml.c7g.12xlarge'|'ml.c7g.16xlarge'|'ml.m6g.large'|'ml.m6g.xlarge'|'ml.m6g.2xlarge'|'ml.m6g.4xlarge'|'ml.m6g.8xlarge'|'ml.m6g.12xlarge'|'ml.m6g.16xlarge'|'ml.m6gd.large'|'ml.m6gd.xlarge'|'ml.m6gd.2xlarge'|'ml.m6gd.4xlarge'|'ml.m6gd.8xlarge'|'ml.m6gd.12xlarge'|'ml.m6gd.16xlarge'|'ml.c6g.large'|'ml.c6g.xlarge'|'ml.c6g.2xlarge'|'ml.c6g.4xlarge'|'ml.c6g.8xlarge'|'ml.c6g.12xlarge'|'ml.c6g.16xlarge'|'ml.c6gd.large'|'ml.c6gd.xlarge'|'ml.c6gd.2xlarge'|'ml.c6gd.4xlarge'|'ml.c6gd.8xlarge'|'ml.c6gd.12xlarge'|'ml.c6gd.16xlarge'|'ml.c6gn.large'|'ml.c6gn.xlarge'|'ml.c6gn.2xlarge'|'ml.c6gn.4xlarge'|'ml.c6gn.8xlarge'|'ml.c6gn.12xlarge'|'ml.c6gn.16xlarge'|'ml.r6g.large'|'ml.r6g.xlarge'|'ml.r6g.2xlarge'|'ml.r6g.4xlarge'|'ml.r6g.8xlarge'|'ml.r6g.12xlarge'|'ml.r6g.16xlarge'|'ml.r6gd.large'|'ml.r6gd.xlarge'|'ml.r6gd.2xlarge'|'ml.r6gd.4xlarge'|'ml.r6gd.8xlarge'|'ml.r6gd.12xlarge'|'ml.r6gd.16xlarge'|'ml.p4de.24xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge',
                    'InitialInstanceCount': 123
                },
                'ModelConfiguration': {
                    'InferenceSpecificationName': 'string',
                    'EnvironmentParameters': [
                        {
                            'Key': 'string',
                            'ValueType': 'string',
                            'Value': 'string'
                        },
                    ],
                    'CompilationJobName': 'string'
                },
                'FailureReason': 'string',
                'EndpointMetrics': {
                    'MaxInvocations': 123,
                    'ModelLatency': 123
                },
                'InvocationEndTime': datetime(2015, 1, 1),
                'InvocationStartTime': datetime(2015, 1, 1)
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- Steps (list) --
  
  A list of all subtask details in Inference Recommender.
  - (dict) --
    
    A returned array object for the Steps response field in the ListInferenceRecommendationsJobSteps API command.
    - StepType (string) --
      
      The type of the subtask.
      
      BENCHMARK: Evaluate the performance of your model on different instance types.
    - JobName (string) --
      
      The name of the Inference Recommender job.
    - Status (string) --
      
      The current status of the benchmark.
    - InferenceBenchmark (dict) --
      
      The details for a specific benchmark.
      - Metrics (dict) --
        
        The metrics of recommendations.
        
        - CostPerHour (float) --
          
          Defines the cost per hour for the instance.
        - CostPerInference (float) --
          
          Defines the cost per inference for the instance.
        - MaxInvocations (integer) --
          
          The expected maximum number of requests per minute for the instance.
        - ModelLatency (integer) --
          
          The expected model latency at maximum invocations per minute for the instance.
        - CpuUtilization (float) --
          
          The expected CPU utilization at maximum invocations per minute for the instance.
          
          NaN indicates that the value is not available.
        - MemoryUtilization (float) --
          
          The expected memory utilization at maximum invocations per minute for the instance.
          
          NaN indicates that the value is not available.
      - EndpointConfiguration (dict) --
        
        The endpoint configuration made by Inference Recommender during a recommendation job.
        
        - EndpointName (string) --
          
          The name of the endpoint made during a recommendation job.
        - VariantName (string) --
          
          The name of the production variant (deployed model) made during a recommendation job.
        - InstanceType (string) --
          
          The instance type recommended by Amazon SageMaker Inference Recommender.
        - InitialInstanceCount (integer) --
          
          The number of instances recommended to launch initially.
      - ModelConfiguration (dict) --
        
        Defines the model configuration. Includes the specification name and environment parameters.
        
        - InferenceSpecificationName (string) --
          
          The inference specification name in the model package version.
        - EnvironmentParameters (list) --
          
          Defines the environment parameters, which include keys, value types, and values.
          - (dict) --
            
            A list of environment parameters suggested by the Amazon SageMaker Inference Recommender.
            - Key (string) --
              
              The environment key suggested by the Amazon SageMaker Inference Recommender.
            - ValueType (string) --
              
              The value type suggested by the Amazon SageMaker Inference Recommender.
            - Value (string) --
              
              The value suggested by the Amazon SageMaker Inference Recommender.
        - CompilationJobName (string) --
          
          The name of the compilation job used to create the recommended model artifacts.
      - FailureReason (string) --
        
        The reason why a benchmark failed.
      - EndpointMetrics (dict) --
        
        The metrics for an existing endpoint compared in an Inference Recommender job.
        
        - MaxInvocations (integer) --
          
          The expected maximum number of requests per minute for the instance.
        - ModelLatency (integer) --
          
          The expected model latency at maximum invocations per minute for the instance.
      - InvocationEndTime (datetime) --
        
        A timestamp that shows when the benchmark completed.
      - InvocationStartTime (datetime) --
        
        A timestamp that shows when the benchmark started.
- NextToken (string) --
  
  A token that you can specify in your next request to return more results from the list.
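Because CpuUtilization and MemoryUtilization use NaN to signal an unavailable value, comparing or averaging these metrics directly can give misleading results (NaN never compares equal to anything, including itself). One way to handle this, sketched below with a hypothetical helper name, is to drop NaN entries from a Metrics dict before using it.

```python
import math


def available_metrics(metrics):
    """Return a copy of a Metrics dict without float values that are NaN.

    NaN marks a value that is not available, so those keys are removed
    rather than passed on to later comparisons or arithmetic.
    """
    return {
        name: value
        for name, value in metrics.items()
        if not (isinstance(value, float) and math.isnan(value))
    }


# Hand-built sample shaped like InferenceBenchmark['Metrics'] (not real data).
sample_metrics = {
    "CostPerHour": 0.5,
    "MaxInvocations": 120,
    "ModelLatency": 35,
    "CpuUtilization": float("nan"),
}
cleaned_metrics = available_metrics(sample_metrics)
```

The isinstance check keeps integer fields such as MaxInvocations untouched and avoids calling math.isnan on non-numeric values.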

ListInferenceRecommendationsJobs (updated)

Changes (request, response)
Request

{'ModelNameEquals': 'string', 'ModelPackageVersionArnEquals': 'string'}

Response

{'InferenceRecommendationsJobs': {'ModelName': 'string',
                                  'ModelPackageVersionArn': 'string',
                                  'SamplePayloadUrl': 'string'}}

Lists recommendation jobs that satisfy various filters.
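The new ModelNameEquals and ModelPackageVersionArnEquals request filters, together with the new ModelName, ModelPackageVersionArn, and SamplePayloadUrl response fields, could be used roughly as follows. This is a sketch under assumptions: the helper name is hypothetical, `client` is assumed to be a boto3 SageMaker client, and only the first page of results is read.

```python
def find_jobs_for_model(client, model_name=None, model_package_version_arn=None):
    """List recommendation jobs filtered by model name or model package ARN.

    Only filters that were actually supplied are sent. Returns one dict per
    job containing the three response fields added in this release; a field
    absent from a job entry is reported as None.
    """
    kwargs = {}
    if model_name is not None:
        kwargs["ModelNameEquals"] = model_name
    if model_package_version_arn is not None:
        kwargs["ModelPackageVersionArnEquals"] = model_package_version_arn

    # Sketch only: reads a single page and does not follow pagination.
    response = client.list_inference_recommendations_jobs(**kwargs)
    return [
        {
            "ModelName": job.get("ModelName"),
            "ModelPackageVersionArn": job.get("ModelPackageVersionArn"),
            "SamplePayloadUrl": job.get("SamplePayloadUrl"),
        }
        for job in response.get("InferenceRecommendationsJobs", [])
    ]
```

Reading the fields with .get keeps the helper tolerant of jobs that were created from a model name rather than a model package, or the other way around.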