2023/08/02 - Amazon SageMaker Service - 1 new API method
Changes: SageMaker Inference Recommender introduces a new API, GetScalingConfigurationRecommendation, to recommend auto scaling policies based on completed Inference Recommender jobs.
Starts an Amazon SageMaker Inference Recommender autoscaling recommendation job. Returns recommendations for autoscaling policies that you can apply to your SageMaker endpoint.
See also: AWS API Documentation
Request Syntax
client.get_scaling_configuration_recommendation(
    InferenceRecommendationsJobName='string',
    RecommendationId='string',
    EndpointName='string',
    TargetCpuUtilizationPerCore=123,
    ScalingPolicyObjective={
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    }
)
InferenceRecommendationsJobName (string) -- [REQUIRED]
The name of a previously completed Inference Recommender job.
RecommendationId (string) --
The recommendation ID of a previously completed inference recommendation. This ID should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the EndpointName field.
EndpointName (string) --
The name of an endpoint benchmarked during a previously completed inference recommendation job. This name should come from one of the recommendations returned by the job specified in the InferenceRecommendationsJobName field.
Specify either this field or the RecommendationId field.
TargetCpuUtilizationPerCore (integer) --
The percentage of CPU utilization per core that you want an instance to use before autoscaling. The default value is 50%.
ScalingPolicyObjective (dict) --
An object where you specify the anticipated traffic pattern for an endpoint.
MinInvocationsPerMinute (integer) --
The minimum number of expected requests to your endpoint per minute.
MaxInvocationsPerMinute (integer) --
The maximum number of expected requests to your endpoint per minute.
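For orientation, here is a minimal sketch of calling the new method with boto3. The job name, endpoint name, and traffic numbers are placeholder assumptions for illustration, not values from this release note.

import boto3

sagemaker = boto3.client('sagemaker')

# Assumes an Inference Recommender job named 'my-recommender-job' has already
# completed and that it benchmarked the endpoint 'my-endpoint' (both assumed names).
response = sagemaker.get_scaling_configuration_recommendation(
    InferenceRecommendationsJobName='my-recommender-job',
    EndpointName='my-endpoint',             # or RecommendationId, but not both
    TargetCpuUtilizationPerCore=50,         # 50% is the documented default
    ScalingPolicyObjective={
        'MinInvocationsPerMinute': 60,      # assumed anticipated traffic floor
        'MaxInvocationsPerMinute': 600      # assumed anticipated traffic ceiling
    }
)
print(response['DynamicScalingConfiguration'])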
Return type: dict
Response Syntax
{
    'InferenceRecommendationsJobName': 'string',
    'RecommendationId': 'string',
    'EndpointName': 'string',
    'TargetCpuUtilizationPerCore': 123,
    'ScalingPolicyObjective': {
        'MinInvocationsPerMinute': 123,
        'MaxInvocationsPerMinute': 123
    },
    'Metric': {
        'InvocationsPerInstance': 123,
        'ModelLatency': 123
    },
    'DynamicScalingConfiguration': {
        'MinCapacity': 123,
        'MaxCapacity': 123,
        'ScaleInCooldown': 123,
        'ScaleOutCooldown': 123,
        'ScalingPolicies': [
            {
                'TargetTracking': {
                    'MetricSpecification': {
                        'Predefined': {
                            'PredefinedMetricType': 'string'
                        },
                        'Customized': {
                            'MetricName': 'string',
                            'Namespace': 'string',
                            'Statistic': 'Average'|'Minimum'|'Maximum'|'SampleCount'|'Sum'
                        }
                    },
                    'TargetValue': 123.0
                }
            },
        ]
    }
}
Response Structure
(dict) --
InferenceRecommendationsJobName (string) --
The name of a previously completed Inference Recommender job.
RecommendationId (string) --
The recommendation ID of a previously completed inference recommendation.
EndpointName (string) --
The name of an endpoint benchmarked during a previously completed Inference Recommender job.
TargetCpuUtilizationPerCore (integer) --
The percentage of CPU utilization per core that you want an instance to use before autoscaling, as specified in the request. The default value is 50%.
ScalingPolicyObjective (dict) --
An object representing the anticipated traffic pattern for an endpoint that you specified in the request.
MinInvocationsPerMinute (integer) --
The minimum number of expected requests to your endpoint per minute.
MaxInvocationsPerMinute (integer) --
The maximum number of expected requests to your endpoint per minute.
Metric (dict) --
An object with a list of metrics that were benchmarked during the previously completed Inference Recommender job.
InvocationsPerInstance (integer) --
The number of invocations sent to a model, normalized by InstanceCount in each ProductionVariant. 1/numberOfInstances is sent as the value on each request, where numberOfInstances is the number of active instances for the ProductionVariant behind the endpoint at the time of the request.
ModelLatency (integer) --
The interval of time taken by a model to respond as viewed from SageMaker. This interval includes the local communication times taken to send the request and to fetch the response from the container of a model and the time taken to complete the inference in the container.
DynamicScalingConfiguration (dict) --
An object with the recommended values for you to specify when creating an autoscaling policy.
MinCapacity (integer) --
The recommended minimum capacity to specify for your autoscaling policy.
MaxCapacity (integer) --
The recommended maximum capacity to specify for your autoscaling policy.
ScaleInCooldown (integer) --
The recommended scale-in cooldown time, in seconds, for your autoscaling policy.
ScaleOutCooldown (integer) --
The recommended scale-out cooldown time, in seconds, for your autoscaling policy.
ScalingPolicies (list) --
A list of the recommended scaling policies, one for each metric.
(dict) --
An object containing a recommended scaling policy.
TargetTracking (dict) --
A target tracking scaling policy. Includes support for predefined or customized metrics.
MetricSpecification (dict) --
An object containing information about a metric.
Predefined (dict) --
Information about a predefined metric.
PredefinedMetricType (string) --
The metric type. You can only apply SageMaker metric types to SageMaker endpoints.
Customized (dict) --
Information about a customized metric.
MetricName (string) --
The name of the customized metric.
Namespace (string) --
The namespace of the customized metric.
Statistic (string) --
The statistic of the customized metric.
TargetValue (float) --
The recommended target value to specify for the metric when creating a scaling policy.
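As a follow-up sketch (not part of this API), the values in DynamicScalingConfiguration are shaped to feed into Application Auto Scaling for a SageMaker endpoint variant. The endpoint name, variant name, and policy names below are assumptions; the response variable is the one returned by the earlier example call.

import boto3

autoscaling = boto3.client('application-autoscaling')
rec = response['DynamicScalingConfiguration']   # from the call shown earlier

# Resource ID format for a SageMaker endpoint variant; 'AllTraffic' is an assumed variant name.
resource_id = 'endpoint/my-endpoint/variant/AllTraffic'

# Register the endpoint variant as a scalable target using the recommended capacity bounds.
autoscaling.register_scalable_target(
    ServiceNamespace='sagemaker',
    ResourceId=resource_id,
    ScalableDimension='sagemaker:variant:DesiredInstanceCount',
    MinCapacity=rec['MinCapacity'],
    MaxCapacity=rec['MaxCapacity']
)

# Create one target-tracking policy per recommended scaling policy. The assumption
# here is that each MetricSpecification populates either Predefined or Customized.
for i, policy in enumerate(rec['ScalingPolicies']):
    tt = policy['TargetTracking']
    spec = tt['MetricSpecification']
    config = {
        'TargetValue': tt['TargetValue'],
        'ScaleInCooldown': rec['ScaleInCooldown'],
        'ScaleOutCooldown': rec['ScaleOutCooldown']
    }
    if 'Predefined' in spec:
        config['PredefinedMetricSpecification'] = {
            'PredefinedMetricType': spec['Predefined']['PredefinedMetricType']
        }
    else:
        config['CustomizedMetricSpecification'] = {
            'MetricName': spec['Customized']['MetricName'],
            'Namespace': spec['Customized']['Namespace'],
            'Statistic': spec['Customized']['Statistic']
        }
    autoscaling.put_scaling_policy(
        PolicyName=f'recommended-policy-{i}',    # assumed policy naming scheme
        ServiceNamespace='sagemaker',
        ResourceId=resource_id,
        ScalableDimension='sagemaker:variant:DesiredInstanceCount',
        PolicyType='TargetTrackingScaling',
        TargetTrackingScalingPolicyConfiguration=config
    )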