Amazon SageMaker Service

2023/08/02 - Amazon SageMaker Service - 1 new API method

Changes: SageMaker Inference Recommender introduces a new API, GetScalingConfigurationRecommendation, that recommends auto scaling policies based on completed Inference Recommender jobs.

GetScalingConfigurationRecommendation (new)

Starts an Amazon SageMaker Inference Recommender autoscaling recommendation job. Returns recommendations for autoscaling policies that you can apply to your SageMaker endpoint.

See also: AWS API Documentation

Request Syntax

client.get_scaling_configuration_recommendation(
    InferenceRecommendationsJobName='string',
    RecommendationId='string',
    EndpointName='string',
    TargetCpuUtilizationPerCore=123,
    ScalingPolicyObjective={
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    }
)

type InferenceRecommendationsJobName

string

param InferenceRecommendationsJobName

[REQUIRED]

The name of a previously completed Inference Recommender job.

type RecommendationId

string

param RecommendationId

The recommendation ID of a previously completed inference recommendation. This ID should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.

Specify either this field or the EndpointName field.

type EndpointName

string

param EndpointName

The name of an endpoint benchmarked during a previously completed inference recommendation job. This name should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.

Specify either this field or the RecommendationId field.

type TargetCpuUtilizationPerCore

integer

param TargetCpuUtilizationPerCore

The target CPU utilization per core, as a percentage, that you want an instance to reach before autoscaling. The default value is 50%.

type ScalingPolicyObjective

dict

param ScalingPolicyObjective

An object where you specify the anticipated traffic pattern for an endpoint.

  • MinInvocationsPerMinute (integer) --

    The minimum number of expected requests to your endpoint per minute.

  • MaxInvocationsPerMinute (integer) --

    The maximum number of expected requests to your endpoint per minute.
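The request parameters above can be assembled with a small helper. This is a minimal sketch, not part of the SDK: the function name build_scaling_recommendation_request and its argument names are hypothetical, and it assumes that "Specify either this field or the ... field" means exactly one of RecommendationId and EndpointName should be sent.

```python
from typing import Any, Dict, Optional


def build_scaling_recommendation_request(
    job_name: str,
    recommendation_id: Optional[str] = None,
    endpoint_name: Optional[str] = None,
    target_cpu_utilization_per_core: Optional[int] = None,
    min_invocations_per_minute: Optional[int] = None,
    max_invocations_per_minute: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for get_scaling_configuration_recommendation."""
    # Assumption: "either ... or" is read as exactly one of the two fields.
    if (recommendation_id is None) == (endpoint_name is None):
        raise ValueError("Specify exactly one of recommendation_id or endpoint_name")

    # InferenceRecommendationsJobName is the only [REQUIRED] parameter.
    request: Dict[str, Any] = {"InferenceRecommendationsJobName": job_name}
    if recommendation_id is not None:
        request["RecommendationId"] = recommendation_id
    else:
        request["EndpointName"] = endpoint_name

    # Optional fields are left out when unset so the service default applies.
    if target_cpu_utilization_per_core is not None:
        request["TargetCpuUtilizationPerCore"] = target_cpu_utilization_per_core

    objective: Dict[str, int] = {}
    if min_invocations_per_minute is not None:
        objective["MinInvocationsPerMinute"] = min_invocations_per_minute
    if max_invocations_per_minute is not None:
        objective["MaxInvocationsPerMinute"] = max_invocations_per_minute
    if objective:
        request["ScalingPolicyObjective"] = objective
    return request
```

The returned dict can be unpacked into the call shown in Request Syntax, for example client.get_scaling_configuration_recommendation(**request), where client is created with boto3.client("sagemaker").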

rtype

dict

returns

Response Syntax

{
    'InferenceRecommendationsJobName': 'string',
    'RecommendationId': 'string',
    'EndpointName': 'string',
    'TargetCpuUtilizationPerCore': 123,
    'ScalingPolicyObjective': {
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    },
    'Metric': {
        'InvocationsPerInstance': 123,
        'ModelLatency': 123
    },
    'DynamicScalingConfiguration': {
        'MinCapacity': 123,
        'MaxCapacity': 123,
        'ScaleInCooldown': 123,
        'ScaleOutCooldown': 123,
        'ScalingPolicies': [
            {
                'TargetTracking': {
                    'MetricSpecification': {
                        'Predefined': {
                            'PredefinedMetricType': 'string'
                        },
                        'Customized': {
                            'MetricName': 'string',
                            'Namespace': 'string',
                            'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'
                        }
                    },
                    'TargetValue': 123.0
                }
            },
        ]
    }
}
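Because ScalingPolicies and MetricSpecification are tagged unions (only one key is set in each, and an unrecognized member arrives as SDK_UNKNOWN_MEMBER), reading the response takes a little branching. The following is a rough sketch of one way to do it; summarize_scaling_policies is a hypothetical helper name, and skipping unknown members silently is a design assumption rather than documented behavior.

```python
from typing import Any, Dict, List, Tuple


def summarize_scaling_policies(response: Dict[str, Any]) -> List[Tuple[str, str, float]]:
    """Return (metric kind, metric name, target value) for each recommended policy."""
    config = response.get("DynamicScalingConfiguration", {})
    summary: List[Tuple[str, str, float]] = []
    for policy in config.get("ScalingPolicies", []):
        tracking = policy.get("TargetTracking")
        if tracking is None:
            # Not a TargetTracking member (for example SDK_UNKNOWN_MEMBER): skip it.
            continue
        spec = tracking.get("MetricSpecification", {})
        if "Predefined" in spec:
            kind = "Predefined"
            name = spec["Predefined"]["PredefinedMetricType"]
        elif "Customized" in spec:
            kind = "Customized"
            name = spec["Customized"]["MetricName"]
        else:
            # Unknown metric specification member: nothing to report.
            continue
        summary.append((kind, name, tracking["TargetValue"]))
    return summary
```

A caller that prefers to fail loudly on unknown members could raise an exception in the two skip branches instead of continuing.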

Response Structure

  • (dict) --

    • InferenceRecommendationsJobName (string) --

      The name of a previously completed Inference Recommender job.

    • RecommendationId (string) --

      The recommendation ID of a previously completed inference recommendation.

    • EndpointName (string) --

      The name of an endpoint benchmarked during a previously completed Inference Recommender job.

    • TargetCpuUtilizationPerCore (integer) --

      The target CPU utilization per core, as a percentage, that you want an instance to reach before autoscaling, as specified in the request. The default value is 50%.

    • ScalingPolicyObjective (dict) --

      An object representing the anticipated traffic pattern for an endpoint that you specified in the request.

      • MinInvocationsPerMinute (integer) --

        The minimum number of expected requests to your endpoint per minute.

      • MaxInvocationsPerMinute (integer) --

        The maximum number of expected requests to your endpoint per minute.

    • Metric (dict) --

      An object with a list of metrics that were benchmarked during the previously completed Inference Recommender job.

      • InvocationsPerInstance (integer) --

        The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the ProductionVariant behind the endpoint at the time of the request.

      • ModelLatency (integer) --

        The interval of time taken by a model to respond as viewed from SageMaker. This interval includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.

    • DynamicScalingConfiguration (dict) --

      An object with the recommended values for you to specify when creating an autoscaling policy.

      • MinCapacity (integer) --

        The recommended minimum capacity to specify for your autoscaling policy.

      • MaxCapacity (integer) --

        The recommended maximum capacity to specify for your autoscaling policy.

      • ScaleInCooldown (integer) --

        The recommended scale-in cooldown time for your autoscaling policy.

      • ScaleOutCooldown (integer) --

        The recommended scale-out cooldown time for your autoscaling policy.

      • ScalingPolicies (list) --

        A list of the recommended scaling policies, one for each metric.

        • (dict) --

          An object containing a recommended scaling policy.

          Note

          This is a Tagged Union structure. Only one of the following top level keys will be set: TargetTracking. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

          'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}

          • TargetTracking (dict) --

            A target tracking scaling policy. Includes support for predefined or customized metrics.

            • MetricSpecification (dict) --

              An object containing information about a metric.

              Note

              This is a Tagged Union structure. Only one of the following top level keys will be set: Predefined, Customized. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

              'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}

              • Predefined (dict) --

                Information about a predefined metric.

                • PredefinedMetricType (string) --

                  The metric type. You can only apply SageMaker metric types to SageMaker endpoints.

              • Customized (dict) --

                Information about a customized metric.

                • MetricName (string) --

                  The name of the customized metric.

                • Namespace (string) --

                  The namespace of the customized metric.

                • Statistic (string) --

                  The statistic of the customized metric.

            • TargetValue (float) --

              The recommended target value to specify for the metric when creating a scaling policy.
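The DynamicScalingConfiguration values are meant to be supplied when creating an autoscaling policy. As an illustration only, the sketch below maps them onto keyword arguments for the Application Auto Scaling operations register_scalable_target and put_scaling_policy. The helper name to_autoscaling_requests and the policy_name_prefix argument are hypothetical, and the caller supplies resource_id in the endpoint/<endpoint-name>/variant/<variant-name> form. This page does not state the unit of the cooldown values, while Application Auto Scaling expects seconds, so verify the unit before passing them through unchanged as this sketch does.

```python
from typing import Any, Dict, List, Tuple


def to_autoscaling_requests(
    response: Dict[str, Any],
    resource_id: str,
    policy_name_prefix: str = "recommended",
) -> Tuple[Dict[str, Any], List[Dict[str, Any]]]:
    """Build kwargs for register_scalable_target and put_scaling_policy."""
    config = response["DynamicScalingConfiguration"]
    common = {
        "ServiceNamespace": "sagemaker",
        "ResourceId": resource_id,
        "ScalableDimension": "sagemaker:variant:DesiredInstanceCount",
    }
    target = dict(common, MinCapacity=config["MinCapacity"], MaxCapacity=config["MaxCapacity"])

    policies: List[Dict[str, Any]] = []
    for index, policy in enumerate(config.get("ScalingPolicies", [])):
        tracking = policy.get("TargetTracking")
        if tracking is None:
            continue  # unknown union member, nothing to translate
        spec = tracking["MetricSpecification"]
        tracking_config: Dict[str, Any] = {"TargetValue": tracking["TargetValue"]}
        if "Predefined" in spec:
            tracking_config["PredefinedMetricSpecification"] = {
                "PredefinedMetricType": spec["Predefined"]["PredefinedMetricType"]
            }
        elif "Customized" in spec:
            # MetricName, Namespace and Statistic use the same key names on both sides.
            tracking_config["CustomizedMetricSpecification"] = dict(spec["Customized"])
        else:
            continue  # unknown metric specification member
        # Assumption: cooldowns are copied as-is; check their unit first.
        for key in ("ScaleInCooldown", "ScaleOutCooldown"):
            if key in config:
                tracking_config[key] = config[key]
        policies.append(dict(
            common,
            PolicyName=f"{policy_name_prefix}-{index}",
            PolicyType="TargetTrackingScaling",
            TargetTrackingScalingPolicyConfiguration=tracking_config,
        ))
    return target, policies
```

With autoscaling = boto3.client("application-autoscaling"), the results would be applied as autoscaling.register_scalable_target(**target) followed by autoscaling.put_scaling_policy(**p) for each p in policies.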