2025/03/21 - Amazon SageMaker Service - 3 updated api methods
Changes: This release does the following: 1) Adds DurationHours as a required field to the SearchTrainingPlanOfferings action in the SageMaker AI API; 2) Adds support for G6e instance types for SageMaker AI inference optimization jobs.
{'DeploymentInstanceType': {'ml.g6e.12xlarge', 'ml.g6e.16xlarge', 'ml.g6e.24xlarge', 'ml.g6e.2xlarge', 'ml.g6e.48xlarge', 'ml.g6e.4xlarge', 'ml.g6e.8xlarge', 'ml.g6e.xlarge'}}
Creates a job that optimizes a model for inference performance. To create the job, you provide the location of a source model, and you provide the settings for the optimization techniques that you want the job to apply. When the job completes successfully, SageMaker uploads the new optimized model to the output destination that you specify.
For more information about how to use this action, and about the supported optimization techniques, see Optimize model inference with Amazon SageMaker.
See also: AWS API Documentation
Request Syntax
client.create_optimization_job( OptimizationJobName='string', RoleArn='string', ModelSource={ 'S3': { 'S3Uri': 'string', 'ModelAccessConfig': { 'AcceptEula': True|False } } }, DeploymentInstanceType='ml.p4d.24xlarge'|'ml.p4de.24xlarge'|'ml.p5.48xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.g6.xlarge'|'ml.g6.2xlarge'|'ml.g6.4xlarge'|'ml.g6.8xlarge'|'ml.g6.12xlarge'|'ml.g6.16xlarge'|'ml.g6.24xlarge'|'ml.g6.48xlarge'|'ml.g6e.xlarge'|'ml.g6e.2xlarge'|'ml.g6e.4xlarge'|'ml.g6e.8xlarge'|'ml.g6e.12xlarge'|'ml.g6e.16xlarge'|'ml.g6e.24xlarge'|'ml.g6e.48xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.trn1n.32xlarge', OptimizationEnvironment={ 'string': 'string' }, OptimizationConfigs=[ { 'ModelQuantizationConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } }, 'ModelCompilationConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } }, 'ModelShardingConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } } }, ], OutputConfig={ 'KmsKeyId': 'string', 'S3OutputLocation': 'string' }, StoppingCondition={ 'MaxRuntimeInSeconds': 123, 'MaxWaitTimeInSeconds': 123, 'MaxPendingTimeInSeconds': 123 }, Tags=[ { 'Key': 'string', 'Value': 'string' }, ], VpcConfig={ 'SecurityGroupIds': [ 'string', ], 'Subnets': [ 'string', ] } )
OptimizationJobName (string) -- [REQUIRED]
A custom name for the new optimization job.
RoleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of an IAM role that enables Amazon SageMaker AI to perform tasks on your behalf.
During model optimization, Amazon SageMaker AI needs your permission to:
Read input data from an S3 bucket
Write model artifacts to an S3 bucket
Write logs to Amazon CloudWatch Logs
Publish metrics to Amazon CloudWatch
You grant permissions for all of these tasks to an IAM role. To pass this role to Amazon SageMaker AI, the caller of this API must have the iam:PassRole permission. For more information, see Amazon SageMaker AI Roles.
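The four tasks listed above could be expressed as an IAM policy document roughly as follows. This is a minimal sketch, not an official policy: the IAM action names are common choices for these tasks, and the helper `build_optimization_role_policy` and both bucket names are hypothetical.

```python
import json


def build_optimization_role_policy(input_bucket: str, output_bucket: str) -> dict:
    """Return an illustrative IAM policy covering the four tasks listed above.

    Assumption: the action names are typical for these tasks; review them
    against your own security requirements before attaching them to a role.
    """
    return {
        "Version": "2012-10-17",
        "Statement": [
            {  # Read input data from an S3 bucket
                "Effect": "Allow",
                "Action": ["s3:GetObject", "s3:ListBucket"],
                "Resource": [
                    f"arn:aws:s3:::{input_bucket}",
                    f"arn:aws:s3:::{input_bucket}/*",
                ],
            },
            {  # Write model artifacts to an S3 bucket
                "Effect": "Allow",
                "Action": ["s3:PutObject"],
                "Resource": [f"arn:aws:s3:::{output_bucket}/*"],
            },
            {  # Write logs to Amazon CloudWatch Logs
                "Effect": "Allow",
                "Action": [
                    "logs:CreateLogGroup",
                    "logs:CreateLogStream",
                    "logs:PutLogEvents",
                ],
                "Resource": "*",
            },
            {  # Publish metrics to Amazon CloudWatch
                "Effect": "Allow",
                "Action": ["cloudwatch:PutMetricData"],
                "Resource": "*",
            },
        ],
    }


# Placeholder bucket names, for illustration only.
policy = build_optimization_role_policy("my-input-bucket", "my-output-bucket")
print(json.dumps(policy, indent=2))
```

The wildcard resources on the logging and metrics statements keep the sketch short; a production policy would normally scope them more tightly.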
ModelSource (dict) -- [REQUIRED]
The location of the source model to optimize with an optimization job.
S3 (dict) --
The Amazon S3 location of a source model to optimize with an optimization job.
S3Uri (string) --
An Amazon S3 URI that locates a source model to optimize with an optimization job.
ModelAccessConfig (dict) --
The access configuration settings for the source ML model for an optimization job, where you can accept the model end-user license agreement (EULA).
AcceptEula (boolean) -- [REQUIRED]
Specifies agreement to the model end-user license agreement (EULA). The AcceptEula value must be explicitly defined as True in order to accept the EULA that this model requires. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using a model.
DeploymentInstanceType (string) -- [REQUIRED]
The type of instance that hosts the optimized model that you create with the optimization job.
OptimizationEnvironment (dict) --
The environment variables to set in the model container.
(string) --
(string) --
OptimizationConfigs (list) -- [REQUIRED]
Settings for each of the optimization techniques that the job applies.
(dict) --
Settings for an optimization technique that you apply with a model optimization job.
ModelQuantizationConfig (dict) --
Settings for the model quantization technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
ModelCompilationConfig (dict) --
Settings for the model compilation technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
ModelShardingConfig (dict) --
Settings for the model sharding technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
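As a rough illustration of the OptimizationConfigs structure described above, the sketch below builds one list entry per technique. The helper `technique_config` is hypothetical, and the environment variable name and value are placeholders rather than options the service is known to accept.

```python
from typing import Dict, Optional

# The three technique keys shown in the request syntax.
TECHNIQUE_KEYS = {
    "ModelQuantizationConfig",
    "ModelCompilationConfig",
    "ModelShardingConfig",
}


def technique_config(
    technique: str,
    image: Optional[str] = None,
    override_environment: Optional[Dict[str, str]] = None,
) -> dict:
    """Build a single OptimizationConfigs entry for one technique key."""
    if technique not in TECHNIQUE_KEYS:
        raise ValueError(f"unknown technique key: {technique}")
    settings: dict = {}
    if image is not None:
        settings["Image"] = image  # URI of an LMI DLC in Amazon ECR
    if override_environment:
        settings["OverrideEnvironment"] = dict(override_environment)
    return {technique: settings}


# Placeholder environment variable, for illustration only.
optimization_configs = [
    technique_config(
        "ModelQuantizationConfig",
        override_environment={"EXAMPLE_OPTION": "example-value"},
    )
]
print(optimization_configs)
```

Placing one technique in each entry is a choice made for this sketch; the resulting list can be passed as the OptimizationConfigs argument.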
OutputConfig (dict) -- [REQUIRED]
Details for where to store the optimized model that you create with the optimization job.
KmsKeyId (string) --
The Amazon Resource Name (ARN) of a key in Amazon Web Services KMS. SageMaker uses the key to encrypt the artifacts of the optimized model when SageMaker uploads the model to Amazon S3.
S3OutputLocation (string) -- [REQUIRED]
The Amazon S3 URI for where to store the optimized model that you create with an optimization job.
StoppingCondition (dict) -- [REQUIRED]
Specifies a limit to how long a job can run. When the job reaches the time limit, SageMaker ends the job. Use this API to cap costs.
To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost.
The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This attempt to save artifacts is best effort only, because the model might not be in a state from which it can be saved. For example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact. You can use it to create a model with CreateModel.
MaxRuntimeInSeconds (integer) --
The maximum length of time, in seconds, that a training or compilation job can run before it is stopped.
For compilation jobs, if the job does not complete during this time, a TimeOut error is generated. We recommend starting with 900 seconds and increasing as necessary based on your model.
For all other jobs, if the job does not complete during this time, SageMaker ends the job. When RetryStrategy is specified in the job request, MaxRuntimeInSeconds specifies the maximum time for all of the attempts in total, not each individual attempt. The default value is 1 day. The maximum value is 28 days.
The maximum time that a TrainingJob can run in total, including any time spent publishing metrics or archiving and uploading models after it has been stopped, is 30 days.
MaxWaitTimeInSeconds (integer) --
The maximum length of time, in seconds, that a managed Spot training job has to complete. It is the amount of time spent waiting for Spot capacity plus the amount of time the job can run. It must be equal to or greater than MaxRuntimeInSeconds. If the job does not complete during this time, SageMaker ends the job.
When RetryStrategy is specified in the job request, MaxWaitTimeInSeconds specifies the maximum time for all of the attempts in total, not each individual attempt.
MaxPendingTimeInSeconds (integer) --
The maximum length of time, in seconds, that a training or compilation job can be pending before it is stopped.
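The constraint that MaxWaitTimeInSeconds must be equal to or greater than MaxRuntimeInSeconds can be checked on the client side before the request is sent. The helper below is a hypothetical sketch; the service performs its own validation, and the positive-runtime check is an extra assumption of this example.

```python
from typing import Optional


def stopping_condition(
    max_runtime_seconds: int,
    max_wait_seconds: Optional[int] = None,
    max_pending_seconds: Optional[int] = None,
) -> dict:
    """Build a StoppingCondition dict, checking the documented ordering rule."""
    if max_runtime_seconds <= 0:
        # Assumption for this sketch: a non-positive runtime is never useful.
        raise ValueError("max_runtime_seconds must be positive")
    condition = {"MaxRuntimeInSeconds": max_runtime_seconds}
    if max_wait_seconds is not None:
        # Documented rule: the wait time must be >= the runtime.
        if max_wait_seconds < max_runtime_seconds:
            raise ValueError(
                "MaxWaitTimeInSeconds must be equal to or greater than "
                "MaxRuntimeInSeconds"
            )
        condition["MaxWaitTimeInSeconds"] = max_wait_seconds
    if max_pending_seconds is not None:
        condition["MaxPendingTimeInSeconds"] = max_pending_seconds
    return condition


one_hour = stopping_condition(3600)
print(one_hour)
```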
Tags (list) --
A list of key-value pairs associated with the optimization job. For more information, see Tagging Amazon Web Services resources in the Amazon Web Services General Reference Guide.
(dict) --
A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources.
You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags.
For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources. For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy.
Key (string) -- [REQUIRED]
The tag key. Tag keys must be unique per resource.
Value (string) -- [REQUIRED]
The tag value.
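Tags are often kept as a plain mapping in application code. The hypothetical helper below converts such a mapping into the list-of-dicts shape described above; because Python dict keys are unique, the output cannot contain a repeated tag key.

```python
from typing import Dict, List


def to_tag_list(tags: Dict[str, str]) -> List[Dict[str, str]]:
    """Convert {"key": "value"} pairs into [{"Key": ..., "Value": ...}] entries."""
    return [{"Key": key, "Value": value} for key, value in tags.items()]


# Placeholder tag names and values, for illustration only.
tag_list = to_tag_list({"project": "demo", "owner": "ml-team"})
print(tag_list)
```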
VpcConfig (dict) --
A VPC in Amazon VPC that your optimized model has access to.
SecurityGroupIds (list) -- [REQUIRED]
The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.
(string) --
Subnets (list) -- [REQUIRED]
The IDs of the subnets in the VPC to which you want to connect your optimized model.
(string) --
dict
Response Syntax
{ 'OptimizationJobArn': 'string' }
Response Structure
(dict) --
OptimizationJobArn (string) --
The Amazon Resource Name (ARN) of the optimization job.
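Putting the required parameters together, a request might be assembled as in the sketch below. The helper names `build_create_request` and `create_job` are hypothetical, every ARN, URI, and job name is a placeholder, and the choice to send ModelAccessConfig only when the EULA is accepted is an assumption of this example.

```python
from typing import List


def build_create_request(
    job_name: str,
    role_arn: str,
    model_s3_uri: str,
    instance_type: str,
    optimization_configs: List[dict],
    output_s3_uri: str,
    max_runtime_seconds: int = 3600,
    accept_eula: bool = False,
) -> dict:
    """Assemble keyword arguments for create_optimization_job."""
    s3_source: dict = {"S3Uri": model_s3_uri}
    if accept_eula:
        # AcceptEula must be set to True explicitly to accept the model EULA.
        s3_source["ModelAccessConfig"] = {"AcceptEula": True}
    return {
        "OptimizationJobName": job_name,
        "RoleArn": role_arn,
        "ModelSource": {"S3": s3_source},
        "DeploymentInstanceType": instance_type,
        "OptimizationConfigs": optimization_configs,
        "OutputConfig": {"S3OutputLocation": output_s3_uri},
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }


def create_job(client, request: dict) -> str:
    """Submit the request and return the ARN from the response."""
    response = client.create_optimization_job(**request)
    return response["OptimizationJobArn"]


# Placeholder values, for illustration only.
request = build_create_request(
    job_name="example-optimization-job",
    role_arn="arn:aws:iam::123456789012:role/ExampleSageMakerRole",
    model_s3_uri="s3://example-bucket/source-model/",
    instance_type="ml.g6e.2xlarge",
    optimization_configs=[
        {"ModelQuantizationConfig": {"OverrideEnvironment": {"EXAMPLE_OPTION": "example-value"}}}
    ],
    output_s3_uri="s3://example-bucket/optimized-model/",
)
print(sorted(request))
```

With credentials configured, `create_job(boto3.client("sagemaker"), request)` would submit the job; the block above only builds the request, so it runs without network access.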
{'DeploymentInstanceType': {'ml.g6e.12xlarge', 'ml.g6e.16xlarge', 'ml.g6e.24xlarge', 'ml.g6e.2xlarge', 'ml.g6e.48xlarge', 'ml.g6e.4xlarge', 'ml.g6e.8xlarge', 'ml.g6e.xlarge'}}
Provides the properties of the specified optimization job.
See also: AWS API Documentation
Request Syntax
client.describe_optimization_job( OptimizationJobName='string' )
OptimizationJobName (string) -- [REQUIRED]
The name that you assigned to the optimization job.
dict
Response Syntax
{ 'OptimizationJobArn': 'string', 'OptimizationJobStatus': 'INPROGRESS'|'COMPLETED'|'FAILED'|'STARTING'|'STOPPING'|'STOPPED', 'OptimizationStartTime': datetime(2015, 1, 1), 'OptimizationEndTime': datetime(2015, 1, 1), 'CreationTime': datetime(2015, 1, 1), 'LastModifiedTime': datetime(2015, 1, 1), 'FailureReason': 'string', 'OptimizationJobName': 'string', 'ModelSource': { 'S3': { 'S3Uri': 'string', 'ModelAccessConfig': { 'AcceptEula': True|False } } }, 'OptimizationEnvironment': { 'string': 'string' }, 'DeploymentInstanceType': 'ml.p4d.24xlarge'|'ml.p4de.24xlarge'|'ml.p5.48xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.g6.xlarge'|'ml.g6.2xlarge'|'ml.g6.4xlarge'|'ml.g6.8xlarge'|'ml.g6.12xlarge'|'ml.g6.16xlarge'|'ml.g6.24xlarge'|'ml.g6.48xlarge'|'ml.g6e.xlarge'|'ml.g6e.2xlarge'|'ml.g6e.4xlarge'|'ml.g6e.8xlarge'|'ml.g6e.12xlarge'|'ml.g6e.16xlarge'|'ml.g6e.24xlarge'|'ml.g6e.48xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.trn1n.32xlarge', 'OptimizationConfigs': [ { 'ModelQuantizationConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } }, 'ModelCompilationConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } }, 'ModelShardingConfig': { 'Image': 'string', 'OverrideEnvironment': { 'string': 'string' } } }, ], 'OutputConfig': { 'KmsKeyId': 'string', 'S3OutputLocation': 'string' }, 'OptimizationOutput': { 'RecommendedInferenceImage': 'string' }, 'RoleArn': 'string', 'StoppingCondition': { 'MaxRuntimeInSeconds': 123, 'MaxWaitTimeInSeconds': 123, 'MaxPendingTimeInSeconds': 123 }, 'VpcConfig': { 'SecurityGroupIds': [ 'string', ], 'Subnets': [ 'string', ] } }
Response Structure
(dict) --
OptimizationJobArn (string) --
The Amazon Resource Name (ARN) of the optimization job.
OptimizationJobStatus (string) --
The current status of the optimization job.
OptimizationStartTime (datetime) --
The time when the optimization job started.
OptimizationEndTime (datetime) --
The time when the optimization job finished processing.
CreationTime (datetime) --
The time when you created the optimization job.
LastModifiedTime (datetime) --
The time when the optimization job was last updated.
FailureReason (string) --
If the optimization job status is FAILED, the reason for the failure.
OptimizationJobName (string) --
The name that you assigned to the optimization job.
ModelSource (dict) --
The location of the source model to optimize with an optimization job.
S3 (dict) --
The Amazon S3 location of a source model to optimize with an optimization job.
S3Uri (string) --
An Amazon S3 URI that locates a source model to optimize with an optimization job.
ModelAccessConfig (dict) --
The access configuration settings for the source ML model for an optimization job, where you can accept the model end-user license agreement (EULA).
AcceptEula (boolean) --
Specifies agreement to the model end-user license agreement (EULA). The AcceptEula value must be explicitly defined as True in order to accept the EULA that this model requires. You are responsible for reviewing and complying with any applicable license terms and making sure they are acceptable for your use case before downloading or using a model.
OptimizationEnvironment (dict) --
The environment variables to set in the model container.
(string) --
(string) --
DeploymentInstanceType (string) --
The type of instance that hosts the optimized model that you create with the optimization job.
OptimizationConfigs (list) --
Settings for each of the optimization techniques that the job applies.
(dict) --
Settings for an optimization technique that you apply with a model optimization job.
ModelQuantizationConfig (dict) --
Settings for the model quantization technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
ModelCompilationConfig (dict) --
Settings for the model compilation technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
ModelShardingConfig (dict) --
Settings for the model sharding technique that's applied by a model optimization job.
Image (string) --
The URI of an LMI DLC in Amazon ECR. SageMaker uses this image to run the optimization.
OverrideEnvironment (dict) --
Environment variables that override the default ones in the model container.
(string) --
(string) --
OutputConfig (dict) --
Details for where to store the optimized model that you create with the optimization job.
KmsKeyId (string) --
The Amazon Resource Name (ARN) of a key in Amazon Web Services KMS. SageMaker uses the key to encrypt the artifacts of the optimized model when SageMaker uploads the model to Amazon S3.
S3OutputLocation (string) --
The Amazon S3 URI for where to store the optimized model that you create with an optimization job.
OptimizationOutput (dict) --
Output values produced by an optimization job.
RecommendedInferenceImage (string) --
The image that SageMaker recommends that you use to host the optimized model that you created with an optimization job.
RoleArn (string) --
The ARN of the IAM role that you assigned to the optimization job.
StoppingCondition (dict) --
Specifies a limit to how long a job can run. When the job reaches the time limit, SageMaker ends the job. Use this API to cap costs.
To stop a training job, SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms can use this 120-second window to save the model artifacts, so the results of training are not lost.
The training algorithms provided by SageMaker automatically save the intermediate results of a model training job when possible. This attempt to save artifacts is best effort only, because the model might not be in a state from which it can be saved. For example, if training has just started, the model might not be ready to save. When saved, this intermediate data is a valid model artifact. You can use it to create a model with CreateModel.
MaxRuntimeInSeconds (integer) --
The maximum length of time, in seconds, that a training or compilation job can run before it is stopped.
For compilation jobs, if the job does not complete during this time, a TimeOut error is generated. We recommend starting with 900 seconds and increasing as necessary based on your model.
For all other jobs, if the job does not complete during this time, SageMaker ends the job. When RetryStrategy is specified in the job request, MaxRuntimeInSeconds specifies the maximum time for all of the attempts in total, not each individual attempt. The default value is 1 day. The maximum value is 28 days.
The maximum time that a TrainingJob can run in total, including any time spent publishing metrics or archiving and uploading models after it has been stopped, is 30 days.
MaxWaitTimeInSeconds (integer) --
The maximum length of time, in seconds, that a managed Spot training job has to complete. It is the amount of time spent waiting for Spot capacity plus the amount of time the job can run. It must be equal to or greater than MaxRuntimeInSeconds. If the job does not complete during this time, SageMaker ends the job.
When RetryStrategy is specified in the job request, MaxWaitTimeInSeconds specifies the maximum time for all of the attempts in total, not each individual attempt.
MaxPendingTimeInSeconds (integer) --
The maximum length of time, in seconds, that a training or compilation job can be pending before it is stopped.
VpcConfig (dict) --
A VPC in Amazon VPC that your optimized model has access to.
SecurityGroupIds (list) --
The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.
(string) --
Subnets (list) --
The IDs of the subnets in the VPC to which you want to connect your optimized model.
(string) --
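Because the response reports OptimizationJobStatus, a caller can poll describe_optimization_job until the job stops changing state. The sketch below is one way to do that; the helper name `wait_for_optimization_job` is hypothetical, and treating COMPLETED, FAILED, and STOPPED as the terminal states is an assumption based on the status names alone.

```python
import time

# Assumption: these three statuses mean the job will not change state again.
TERMINAL_STATUSES = {"COMPLETED", "FAILED", "STOPPED"}


def wait_for_optimization_job(
    client,
    job_name: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep=time.sleep,
) -> dict:
    """Poll describe_optimization_job until a terminal status or a timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        description = client.describe_optimization_job(
            OptimizationJobName=job_name
        )
        if description["OptimizationJobStatus"] in TERMINAL_STATUSES:
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_name!r} did not finish in time")
        sleep(poll_seconds)  # injectable so that tests need not actually wait
```

A caller would typically check the returned status and, when it is FAILED, read FailureReason from the same response.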
{'OptimizationJobSummaries': {'DeploymentInstanceType': {'ml.g6e.12xlarge', 'ml.g6e.16xlarge', 'ml.g6e.24xlarge', 'ml.g6e.2xlarge', 'ml.g6e.48xlarge', 'ml.g6e.4xlarge', 'ml.g6e.8xlarge', 'ml.g6e.xlarge'}}}
Lists the optimization jobs in your account and their properties.
See also: AWS API Documentation
Request Syntax
client.list_optimization_jobs( NextToken='string', MaxResults=123, CreationTimeAfter=datetime(2015, 1, 1), CreationTimeBefore=datetime(2015, 1, 1), LastModifiedTimeAfter=datetime(2015, 1, 1), LastModifiedTimeBefore=datetime(2015, 1, 1), OptimizationContains='string', NameContains='string', StatusEquals='INPROGRESS'|'COMPLETED'|'FAILED'|'STARTING'|'STOPPING'|'STOPPED', SortBy='Name'|'CreationTime'|'Status', SortOrder='Ascending'|'Descending' )
NextToken (string) --
A token that you use to get the next set of results following a truncated response. If the response to the previous request was truncated, that response provides the value for this token.
MaxResults (integer) --
The maximum number of optimization jobs to return in the response. The default is 50.
CreationTimeAfter (datetime) --
Filters the results to only those optimization jobs that were created after the specified time.
CreationTimeBefore (datetime) --
Filters the results to only those optimization jobs that were created before the specified time.
LastModifiedTimeAfter (datetime) --
Filters the results to only those optimization jobs that were updated after the specified time.
LastModifiedTimeBefore (datetime) --
Filters the results to only those optimization jobs that were updated before the specified time.
OptimizationContains (string) --
Filters the results to only those optimization jobs that apply the specified optimization techniques. You can specify either Quantization or Compilation.
NameContains (string) --
Filters the results to only those optimization jobs with a name that contains the specified string.
StatusEquals (string) --
Filters the results to only those optimization jobs with the specified status.
SortBy (string) --
The field by which to sort the optimization jobs in the response. The default is CreationTime.
SortOrder (string) --
The sort order for results. The default is Ascending.
dict
Response Syntax
{ 'OptimizationJobSummaries': [ { 'OptimizationJobName': 'string', 'OptimizationJobArn': 'string', 'CreationTime': datetime(2015, 1, 1), 'OptimizationJobStatus': 'INPROGRESS'|'COMPLETED'|'FAILED'|'STARTING'|'STOPPING'|'STOPPED', 'OptimizationStartTime': datetime(2015, 1, 1), 'OptimizationEndTime': datetime(2015, 1, 1), 'LastModifiedTime': datetime(2015, 1, 1), 'DeploymentInstanceType': 'ml.p4d.24xlarge'|'ml.p4de.24xlarge'|'ml.p5.48xlarge'|'ml.g5.xlarge'|'ml.g5.2xlarge'|'ml.g5.4xlarge'|'ml.g5.8xlarge'|'ml.g5.12xlarge'|'ml.g5.16xlarge'|'ml.g5.24xlarge'|'ml.g5.48xlarge'|'ml.g6.xlarge'|'ml.g6.2xlarge'|'ml.g6.4xlarge'|'ml.g6.8xlarge'|'ml.g6.12xlarge'|'ml.g6.16xlarge'|'ml.g6.24xlarge'|'ml.g6.48xlarge'|'ml.g6e.xlarge'|'ml.g6e.2xlarge'|'ml.g6e.4xlarge'|'ml.g6e.8xlarge'|'ml.g6e.12xlarge'|'ml.g6e.16xlarge'|'ml.g6e.24xlarge'|'ml.g6e.48xlarge'|'ml.inf2.xlarge'|'ml.inf2.8xlarge'|'ml.inf2.24xlarge'|'ml.inf2.48xlarge'|'ml.trn1.2xlarge'|'ml.trn1.32xlarge'|'ml.trn1n.32xlarge', 'OptimizationTypes': [ 'string', ] }, ], 'NextToken': 'string' }
Response Structure
(dict) --
OptimizationJobSummaries (list) --
A list of optimization jobs, with their properties, that match any of the filters you specified in the request.
(dict) --
Summarizes an optimization job by providing some of its key properties.
OptimizationJobName (string) --
The name that you assigned to the optimization job.
OptimizationJobArn (string) --
The Amazon Resource Name (ARN) of the optimization job.
CreationTime (datetime) --
The time when you created the optimization job.
OptimizationJobStatus (string) --
The current status of the optimization job.
OptimizationStartTime (datetime) --
The time when the optimization job started.
OptimizationEndTime (datetime) --
The time when the optimization job finished processing.
LastModifiedTime (datetime) --
The time when the optimization job was last updated.
DeploymentInstanceType (string) --
The type of instance that hosts the optimized model that you create with the optimization job.
OptimizationTypes (list) --
The optimization techniques that are applied by the optimization job.
(string) --
NextToken (string) --
The token to use in a subsequent request to get the next set of results following a truncated response.
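A truncated response can be followed by passing its NextToken back in the next request. The generator below sketches that loop by hand; the helper name `iter_optimization_jobs` is hypothetical, and any filters are forwarded unchanged as keyword arguments.

```python
from typing import Iterator


def iter_optimization_jobs(client, **filters) -> Iterator[dict]:
    """Yield every job summary, following NextToken across pages."""
    kwargs = dict(filters)
    while True:
        page = client.list_optimization_jobs(**kwargs)
        yield from page.get("OptimizationJobSummaries", [])
        token = page.get("NextToken")
        if not token:
            return  # no token means the response was not truncated
        kwargs["NextToken"] = token
```

For example, `iter_optimization_jobs(client, StatusEquals="COMPLETED", MaxResults=50)` would walk all completed jobs 50 at a time.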