Amazon SageMaker Service

2021/11/08 - Amazon SageMaker Service - 4 updated api methods

Changes  SageMaker CreateEndpoint and UpdateEndpoint APIs now support additional deployment configuration to manage traffic shifting options and automatic rollback monitoring. DescribeEndpoint now shows new in-progress deployment details with stage status.

CreateEndpoint (updated)
Changes (request)
{'DeploymentConfig': {'AutoRollbackConfiguration': {'Alarms': [{'AlarmName': 'string'}]},
                      'BlueGreenUpdatePolicy': {'MaximumExecutionTimeoutInSeconds': 'integer',
                                                'TerminationWaitInSeconds': 'integer',
                                                'TrafficRoutingConfiguration': {'CanarySize': {'Type': 'INSTANCE_COUNT '
                                                                                                       '| '
                                                                                                       'CAPACITY_PERCENT',
                                                                                               'Value': 'integer'},
                                                                                'LinearStepSize': {'Type': 'INSTANCE_COUNT '
                                                                                                           '| '
                                                                                                           'CAPACITY_PERCENT',
                                                                                                   'Value': 'integer'},
                                                                                'Type': 'ALL_AT_ONCE '
                                                                                        '| '
                                                                                        'CANARY '
                                                                                        '| '
                                                                                        'LINEAR',
                                                                                'WaitIntervalInSeconds': 'integer'}}}}

Creates an endpoint using the endpoint configuration specified in the request. Amazon SageMaker uses the endpoint to provision resources and deploy models. You create the endpoint configuration with the CreateEndpointConfig API.

Use this API to deploy models using Amazon SageMaker hosting services.

For an example that calls this method when deploying a model to Amazon SageMaker hosting services, see the Create Endpoint example notebook.

Note

You must not delete an EndpointConfig that is in use by a live endpoint, or while UpdateEndpoint or CreateEndpoint operations are being performed on the endpoint. To update an endpoint, you must create a new EndpointConfig.

The endpoint name must be unique within an Amazon Web Services Region in your Amazon Web Services account.

When it receives the request, Amazon SageMaker creates the endpoint, launches the resources (ML compute instances), and deploys the model(s) on them.

Note

When you call CreateEndpoint, a load call is made to DynamoDB to verify that your endpoint configuration exists. A read from a DynamoDB table that supports eventually consistent reads might not reflect a recently completed write, so the response might include stale data. If the dependent entities are not yet in DynamoDB, the call fails with a validation error; repeating the request after a short time should return the latest data. Retry logic is therefore recommended. We also recommend that customers call DescribeEndpointConfig before calling CreateEndpoint to minimize the potential impact of a DynamoDB eventually consistent read.
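The retry recommendation above can be sketched as follows. This is a minimal illustration rather than an official pattern: `create_endpoint_with_retry` is a hypothetical helper, `client` is assumed to behave like `boto3.client("sagemaker")`, and the error code `ValidationException` is an assumption about how the stale-read validation error surfaces.

```python
import time


def create_endpoint_with_retry(client, endpoint_name, endpoint_config_name,
                               attempts=5, delay_seconds=2.0, sleep=time.sleep):
    """Call CreateEndpoint, retrying validation errors a few times.

    Hypothetical helper; `client` is expected to behave like
    boto3.client("sagemaker").
    """
    # Read the configuration first, as recommended above. This call raises
    # if the configuration cannot be found.
    client.describe_endpoint_config(EndpointConfigName=endpoint_config_name)

    for attempt in range(1, attempts + 1):
        try:
            response = client.create_endpoint(
                EndpointName=endpoint_name,
                EndpointConfigName=endpoint_config_name,
            )
            return response["EndpointArn"]
        except Exception as exc:
            # botocore's ClientError carries the service error in `.response`.
            code = getattr(exc, "response", {}).get("Error", {}).get("Code")
            # Assumption: the stale read surfaces as ValidationException.
            if code != "ValidationException" or attempt == attempts:
                raise
            sleep(delay_seconds * attempt)  # simple linear backoff
```

The `sleep` parameter exists only so the backoff can be replaced in tests; in normal use the default `time.sleep` applies.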

When Amazon SageMaker receives the request, it sets the endpoint status to Creating. After it creates the endpoint, it sets the status to InService. Amazon SageMaker can then process incoming requests for inferences. To check the status of an endpoint, use the DescribeEndpoint API.
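As a rough sketch of the status check described above, the loop below polls DescribeEndpoint until the endpoint reports InService. `wait_for_endpoint` is a hypothetical helper, `client` is assumed to behave like `boto3.client("sagemaker")`, and the polling interval and limit are arbitrary illustrative values.

```python
import time


def wait_for_endpoint(client, endpoint_name, poll_seconds=30, max_polls=60,
                      sleep=time.sleep):
    """Poll DescribeEndpoint until the endpoint is InService (hypothetical helper)."""
    for _ in range(max_polls):
        description = client.describe_endpoint(EndpointName=endpoint_name)
        status = description["EndpointStatus"]
        if status == "InService":
            return description
        if status == "Failed":
            # FailureReason is part of the DescribeEndpoint response.
            raise RuntimeError(description.get("FailureReason", "endpoint failed"))
        sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} was not InService after {max_polls} polls")
```

boto3 also provides a built-in waiter, `client.get_waiter("endpoint_in_service")`, which may be preferable when custom handling of intermediate states is not needed.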

If any of the models hosted at this endpoint get model data from an Amazon S3 location, Amazon SageMaker uses Amazon Web Services Security Token Service to download model artifacts from the S3 path you provided. Amazon Web Services STS is activated in your IAM user account by default. If you previously deactivated Amazon Web Services STS for a region, you need to reactivate Amazon Web Services STS for that region. For more information, see Activating and Deactivating Amazon Web Services STS in an Amazon Web Services Region in the Amazon Web Services Identity and Access Management User Guide.

Note

To add the IAM role policies for using this API operation, go to the IAM console and choose Roles in the left navigation pane. Search for the IAM role that you want to grant access to the CreateEndpoint and CreateEndpointConfig API operations, and add one of the following policies to the role.

  • Option 1: For full SageMaker access, search for and attach the AmazonSageMakerFullAccess policy.

  • Option 2: To grant limited access to an IAM role, paste the following Action and Resource elements manually into the JSON file of the IAM role:

    "Action": ["sagemaker:CreateEndpoint", "sagemaker:CreateEndpointConfig"],
    "Resource": [
        "arn:aws:sagemaker:region:account-id:endpoint/endpointName",
        "arn:aws:sagemaker:region:account-id:endpoint-config/endpointConfigName"
    ]

    For more information, see SageMaker API Permissions: Actions, Permissions, and Resources Reference.
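For Option 2, the policy document could be assembled programmatically along these lines. This is only a sketch: `endpoint_policy` is a hypothetical helper, the region, account ID, and resource names in the usage line are placeholders, and the `Version` and `Effect` fields are the standard IAM policy wrapper rather than something stated in the text above.

```python
import json


def endpoint_policy(region, account_id, endpoint_name, endpoint_config_name):
    """Return a limited-access IAM policy document as a dict (hypothetical helper)."""
    prefix = f"arn:aws:sagemaker:{region}:{account_id}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                "Action": [
                    "sagemaker:CreateEndpoint",
                    "sagemaker:CreateEndpointConfig",
                ],
                "Resource": [
                    f"{prefix}:endpoint/{endpoint_name}",
                    f"{prefix}:endpoint-config/{endpoint_config_name}",
                ],
            }
        ],
    }


# Placeholder values for illustration only.
policy = endpoint_policy("us-west-2", "123456789012", "my-endpoint", "my-endpoint-config")
print(json.dumps(policy, indent=2))
```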

See also: AWS API Documentation

Request Syntax

client.create_endpoint(
    EndpointName='string',
    EndpointConfigName='string',
    DeploymentConfig={
        'BlueGreenUpdatePolicy': {
            'TrafficRoutingConfiguration': {
                'Type': 'ALL_AT_ONCE'|'CANARY'|'LINEAR',
                'WaitIntervalInSeconds': 123,
                'CanarySize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                },
                'LinearStepSize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                }
            },
            'TerminationWaitInSeconds': 123,
            'MaximumExecutionTimeoutInSeconds': 123
        },
        'AutoRollbackConfiguration': {
            'Alarms': [
                {
                    'AlarmName': 'string'
                },
            ]
        }
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
type EndpointName

string

param EndpointName

[REQUIRED]

The name of the endpoint. The name must be unique within an Amazon Web Services Region in your Amazon Web Services account. The name is case-insensitive in CreateEndpoint, but the case is preserved and must be matched in .

type EndpointConfigName

string

param EndpointConfigName

[REQUIRED]

The name of an endpoint configuration. For more information, see CreateEndpointConfig.

type DeploymentConfig

dict

param DeploymentConfig

The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.

  • BlueGreenUpdatePolicy (dict) -- [REQUIRED]

    Update policy for a blue/green deployment. If this update policy is specified, SageMaker creates a new fleet during the deployment while maintaining the old fleet. SageMaker flips traffic to the new fleet according to the specified traffic routing configuration. Only one update policy should be used in the deployment configuration. If no update policy is specified, SageMaker uses a blue/green deployment strategy with all-at-once traffic shifting by default.

    • TrafficRoutingConfiguration (dict) -- [REQUIRED]

      Defines the traffic routing strategy to shift traffic from the old fleet to the new fleet during an endpoint deployment.

      • Type (string) -- [REQUIRED]

        Traffic routing strategy type.

        • ALL_AT_ONCE : Endpoint traffic shifts to the new fleet in a single step.

        • CANARY : Endpoint traffic shifts to the new fleet in two steps. The first step is the canary, which is a small portion of the traffic. The second step is the remainder of the traffic.

        • LINEAR : Endpoint traffic shifts to the new fleet in n steps of a configurable size.

      • WaitIntervalInSeconds (integer) -- [REQUIRED]

        The waiting time (in seconds) between incremental steps to turn on traffic on the new endpoint fleet.

      • CanarySize (dict) --

        Batch size for the first step to turn on traffic on the new endpoint fleet. Value must be less than or equal to 50% of the variant's total instance count.

        • Type (string) -- [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT : The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

        • Value (integer) -- [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

      • LinearStepSize (dict) --

        Batch size for each step to turn on traffic on the new endpoint fleet. Value must be 10-50% of the variant's total instance count.

        • Type (string) -- [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT : The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

        • Value (integer) -- [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

    • TerminationWaitInSeconds (integer) --

      Additional waiting time in seconds after the completion of an endpoint deployment before terminating the old endpoint fleet. Default is 0.

    • MaximumExecutionTimeoutInSeconds (integer) --

      Maximum execution timeout for the deployment. Note that the timeout value should be larger than the total waiting time specified in TerminationWaitInSeconds and WaitIntervalInSeconds.

  • AutoRollbackConfiguration (dict) --

    Automatic rollback configuration for handling endpoint deployment failures and recovery.

    • Alarms (list) --

      List of CloudWatch alarms in your account that are configured to monitor metrics on an endpoint. If any alarms are tripped during a deployment, SageMaker rolls back the deployment.

      • (dict) --

        An Amazon CloudWatch alarm configured to monitor metrics on an endpoint.

        • AlarmName (string) --

          The name of a CloudWatch alarm in your account.
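The constraints listed for DeploymentConfig can be checked before the request is sent. The sketch below builds the dictionary and validates it locally; `blue_green_config` is a hypothetical helper, it only handles CAPACITY_PERCENT sizes, and its estimate of total waiting time (one wait interval per traffic-shifting step plus the termination wait) is a deliberately conservative assumption, not the service's documented formula.

```python
import math


def blue_green_config(routing_type, wait_interval_seconds, size_percent=None,
                      termination_wait_seconds=0, max_timeout_seconds=None,
                      alarm_names=()):
    """Build a DeploymentConfig dict (hypothetical helper, not part of boto3)."""
    routing = {"Type": routing_type,
               "WaitIntervalInSeconds": wait_interval_seconds}

    if routing_type == "ALL_AT_ONCE":
        steps = 1
    elif routing_type == "CANARY":
        if size_percent is None or not 0 < size_percent <= 50:
            raise ValueError("CanarySize must be at most 50 percent")
        routing["CanarySize"] = {"Type": "CAPACITY_PERCENT", "Value": size_percent}
        steps = 2  # the canary, then the remainder
    elif routing_type == "LINEAR":
        if size_percent is None or not 10 <= size_percent <= 50:
            raise ValueError("LinearStepSize must be between 10 and 50 percent")
        routing["LinearStepSize"] = {"Type": "CAPACITY_PERCENT", "Value": size_percent}
        steps = math.ceil(100 / size_percent)
    else:
        raise ValueError(f"unknown routing type: {routing_type}")

    policy = {"TrafficRoutingConfiguration": routing,
              "TerminationWaitInSeconds": termination_wait_seconds}

    if max_timeout_seconds is not None:
        # Assumption: budget one wait interval per step plus the termination wait.
        total_wait = steps * wait_interval_seconds + termination_wait_seconds
        if max_timeout_seconds <= total_wait:
            raise ValueError(f"timeout must exceed total waiting time ({total_wait}s)")
        policy["MaximumExecutionTimeoutInSeconds"] = max_timeout_seconds

    config = {"BlueGreenUpdatePolicy": policy}
    if alarm_names:
        config["AutoRollbackConfiguration"] = {
            "Alarms": [{"AlarmName": name} for name in alarm_names]
        }
    return config
```

The returned dictionary would be passed as the `DeploymentConfig` argument of `client.create_endpoint`, for example `blue_green_config("CANARY", 300, size_percent=10, max_timeout_seconds=1800)`.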

type Tags

list

param Tags

An array of key-value pairs. You can use tags to categorize your Amazon Web Services resources in different ways, for example, by purpose, owner, or environment. For more information, see Tagging Amazon Web Services Resources.

  • (dict) --

    A tag object that consists of a key and an optional value, used to manage metadata for SageMaker Amazon Web Services resources.

    You can add tags to notebook instances, training jobs, hyperparameter tuning jobs, batch transform jobs, models, labeling jobs, work teams, endpoint configurations, and endpoints. For more information on adding tags to SageMaker resources, see AddTags.

    For more information on adding metadata to your Amazon Web Services resources with tagging, see Tagging Amazon Web Services resources. For advice on best practices for managing Amazon Web Services resources with tagging, see Tagging Best Practices: Implement an Effective Amazon Web Services Resource Tagging Strategy.

    • Key (string) -- [REQUIRED]

      The tag key. Tag keys must be unique per resource.

    • Value (string) -- [REQUIRED]

      The tag value.

rtype

dict

returns

Response Syntax

{
    'EndpointArn': 'string'
}

Response Structure

  • (dict) --

    • EndpointArn (string) --

      The Amazon Resource Name (ARN) of the endpoint.

DescribeEndpoint (updated)
Changes (response)
{'LastDeploymentConfig': {'BlueGreenUpdatePolicy': {'TrafficRoutingConfiguration': {'LinearStepSize': {'Type': 'INSTANCE_COUNT '
                                                                                                               '| '
                                                                                                               'CAPACITY_PERCENT',
                                                                                                       'Value': 'integer'},
                                                                                    'Type': {'LINEAR'}}}},
 'PendingDeploymentSummary': {'EndpointConfigName': 'string',
                              'ProductionVariants': [{'AcceleratorType': 'ml.eia1.medium '
                                                                         '| '
                                                                         'ml.eia1.large '
                                                                         '| '
                                                                         'ml.eia1.xlarge '
                                                                         '| '
                                                                         'ml.eia2.medium '
                                                                         '| '
                                                                         'ml.eia2.large '
                                                                         '| '
                                                                         'ml.eia2.xlarge',
                                                      'CurrentInstanceCount': 'integer',
                                                      'CurrentWeight': 'float',
                                                      'DeployedImages': [{'ResolutionTime': 'timestamp',
                                                                          'ResolvedImage': 'string',
                                                                          'SpecifiedImage': 'string'}],
                                                      'DesiredInstanceCount': 'integer',
                                                      'DesiredWeight': 'float',
                                                      'InstanceType': 'ml.t2.medium '
                                                                      '| '
                                                                      'ml.t2.large '
                                                                      '| '
                                                                      'ml.t2.xlarge '
                                                                      '| '
                                                                      'ml.t2.2xlarge '
                                                                      '| '
                                                                      'ml.m4.xlarge '
                                                                      '| '
                                                                      'ml.m4.2xlarge '
                                                                      '| '
                                                                      'ml.m4.4xlarge '
                                                                      '| '
                                                                      'ml.m4.10xlarge '
                                                                      '| '
                                                                      'ml.m4.16xlarge '
                                                                      '| '
                                                                      'ml.m5.large '
                                                                      '| '
                                                                      'ml.m5.xlarge '
                                                                      '| '
                                                                      'ml.m5.2xlarge '
                                                                      '| '
                                                                      'ml.m5.4xlarge '
                                                                      '| '
                                                                      'ml.m5.12xlarge '
                                                                      '| '
                                                                      'ml.m5.24xlarge '
                                                                      '| '
                                                                      'ml.m5d.large '
                                                                      '| '
                                                                      'ml.m5d.xlarge '
                                                                      '| '
                                                                      'ml.m5d.2xlarge '
                                                                      '| '
                                                                      'ml.m5d.4xlarge '
                                                                      '| '
                                                                      'ml.m5d.12xlarge '
                                                                      '| '
                                                                      'ml.m5d.24xlarge '
                                                                      '| '
                                                                      'ml.c4.large '
                                                                      '| '
                                                                      'ml.c4.xlarge '
                                                                      '| '
                                                                      'ml.c4.2xlarge '
                                                                      '| '
                                                                      'ml.c4.4xlarge '
                                                                      '| '
                                                                      'ml.c4.8xlarge '
                                                                      '| '
                                                                      'ml.p2.xlarge '
                                                                      '| '
                                                                      'ml.p2.8xlarge '
                                                                      '| '
                                                                      'ml.p2.16xlarge '
                                                                      '| '
                                                                      'ml.p3.2xlarge '
                                                                      '| '
                                                                      'ml.p3.8xlarge '
                                                                      '| '
                                                                      'ml.p3.16xlarge '
                                                                      '| '
                                                                      'ml.c5.large '
                                                                      '| '
                                                                      'ml.c5.xlarge '
                                                                      '| '
                                                                      'ml.c5.2xlarge '
                                                                      '| '
                                                                      'ml.c5.4xlarge '
                                                                      '| '
                                                                      'ml.c5.9xlarge '
                                                                      '| '
                                                                      'ml.c5.18xlarge '
                                                                      '| '
                                                                      'ml.c5d.large '
                                                                      '| '
                                                                      'ml.c5d.xlarge '
                                                                      '| '
                                                                      'ml.c5d.2xlarge '
                                                                      '| '
                                                                      'ml.c5d.4xlarge '
                                                                      '| '
                                                                      'ml.c5d.9xlarge '
                                                                      '| '
                                                                      'ml.c5d.18xlarge '
                                                                      '| '
                                                                      'ml.g4dn.xlarge '
                                                                      '| '
                                                                      'ml.g4dn.2xlarge '
                                                                      '| '
                                                                      'ml.g4dn.4xlarge '
                                                                      '| '
                                                                      'ml.g4dn.8xlarge '
                                                                      '| '
                                                                      'ml.g4dn.12xlarge '
                                                                      '| '
                                                                      'ml.g4dn.16xlarge '
                                                                      '| '
                                                                      'ml.r5.large '
                                                                      '| '
                                                                      'ml.r5.xlarge '
                                                                      '| '
                                                                      'ml.r5.2xlarge '
                                                                      '| '
                                                                      'ml.r5.4xlarge '
                                                                      '| '
                                                                      'ml.r5.12xlarge '
                                                                      '| '
                                                                      'ml.r5.24xlarge '
                                                                      '| '
                                                                      'ml.r5d.large '
                                                                      '| '
                                                                      'ml.r5d.xlarge '
                                                                      '| '
                                                                      'ml.r5d.2xlarge '
                                                                      '| '
                                                                      'ml.r5d.4xlarge '
                                                                      '| '
                                                                      'ml.r5d.12xlarge '
                                                                      '| '
                                                                      'ml.r5d.24xlarge '
                                                                      '| '
                                                                      'ml.inf1.xlarge '
                                                                      '| '
                                                                      'ml.inf1.2xlarge '
                                                                      '| '
                                                                      'ml.inf1.6xlarge '
                                                                      '| '
                                                                      'ml.inf1.24xlarge',
                                                      'VariantName': 'string',
                                                      'VariantStatus': [{'StartTime': 'timestamp',
                                                                         'Status': 'Creating '
                                                                                   '| '
                                                                                   'Updating '
                                                                                   '| '
                                                                                   'Deleting '
                                                                                   '| '
                                                                                   'ActivatingTraffic '
                                                                                   '| '
                                                                                   'Baking',
                                                                         'StatusMessage': 'string'}]}],
                              'StartTime': 'timestamp'},
 'ProductionVariants': {'VariantStatus': [{'StartTime': 'timestamp',
                                           'Status': 'Creating | Updating | '
                                                     'Deleting | '
                                                     'ActivatingTraffic | '
                                                     'Baking',
                                           'StatusMessage': 'string'}]}}

Returns the description of an endpoint.

See also: AWS API Documentation

Request Syntax

client.describe_endpoint(
    EndpointName='string'
)
type EndpointName

string

param EndpointName

[REQUIRED]

The name of the endpoint.

rtype

dict

returns

Response Syntax

{
    'EndpointName': 'string',
    'EndpointArn': 'string',
    'EndpointConfigName': 'string',
    'ProductionVariants': [
        {
            'VariantName': 'string',
            'DeployedImages': [
                {
                    'SpecifiedImage': 'string',
                    'ResolvedImage': 'string',
                    'ResolutionTime': datetime(2015, 1, 1)
                },
            ],
            'CurrentWeight': ...,
            'DesiredWeight': ...,
            'CurrentInstanceCount': 123,
            'DesiredInstanceCount': 123,
            'VariantStatus': [
                {
                    'Status': 'Creating'|'Updating'|'Deleting'|'ActivatingTraffic'|'Baking',
                    'StatusMessage': 'string',
                    'StartTime': datetime(2015, 1, 1)
                },
            ]
        },
    ],
    'DataCaptureConfig': {
        'EnableCapture': True|False,
        'CaptureStatus': 'Started'|'Stopped',
        'CurrentSamplingPercentage': 123,
        'DestinationS3Uri': 'string',
        'KmsKeyId': 'string'
    },
    'EndpointStatus': 'OutOfService'|'Creating'|'Updating'|'SystemUpdating'|'RollingBack'|'InService'|'Deleting'|'Failed',
    'FailureReason': 'string',
    'CreationTime': datetime(2015, 1, 1),
    'LastModifiedTime': datetime(2015, 1, 1),
    'LastDeploymentConfig': {
        'BlueGreenUpdatePolicy': {
            'TrafficRoutingConfiguration': {
                'Type': 'ALL_AT_ONCE'|'CANARY'|'LINEAR',
                'WaitIntervalInSeconds': 123,
                'CanarySize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                },
                'LinearStepSize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                }
            },
            'TerminationWaitInSeconds': 123,
            'MaximumExecutionTimeoutInSeconds': 123
        },
        'AutoRollbackConfiguration': {
            'Alarms': [
                {
                    'AlarmName': 'string'
                },
            ]
        }
    },
    'AsyncInferenceConfig': {
        'ClientConfig': {
            'MaxConcurrentInvocationsPerInstance': 123
        },
        'OutputConfig': {
            'KmsKeyId': 'string',
            'S3OutputPath': 'string',
            'NotificationConfig': {
                'SuccessTopic': 'string',
                'ErrorTopic': 'string'
            }
        }
    },
    'PendingDeploymentSummary': {
        'EndpointConfigName': 'string',
        'ProductionVariants': [
            {
                'VariantName': 'string',
                'DeployedImages': [
                    {
                        'SpecifiedImage': 'string',
                        'ResolvedImage': 'string',
                        'ResolutionTime': datetime(2015, 1, 1)
                    },
                ],
                'CurrentWeight': ...,
                'DesiredWeight': ...,
                'CurrentInstanceCount': 123,
                'DesiredInstanceCount': 123,
                'InstanceType': 'ml.t2.medium'|'ml.t2.large'|'ml.t2.xlarge'|'ml.t2.2xlarge'|'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.m5d.large'|'ml.m5d.xlarge'|'ml.m5d.2xlarge'|'ml.m5d.4xlarge'|'ml.m5d.12xlarge'|'ml.m5d.24xlarge'|'ml.c4.large'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.large'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.c5d.large'|'ml.c5d.xlarge'|'ml.c5d.2xlarge'|'ml.c5d.4xlarge'|'ml.c5d.9xlarge'|'ml.c5d.18xlarge'|'ml.g4dn.xlarge'|'ml.g4dn.2xlarge'|'ml.g4dn.4xlarge'|'ml.g4dn.8xlarge'|'ml.g4dn.12xlarge'|'ml.g4dn.16xlarge'|'ml.r5.large'|'ml.r5.xlarge'|'ml.r5.2xlarge'|'ml.r5.4xlarge'|'ml.r5.12xlarge'|'ml.r5.24xlarge'|'ml.r5d.large'|'ml.r5d.xlarge'|'ml.r5d.2xlarge'|'ml.r5d.4xlarge'|'ml.r5d.12xlarge'|'ml.r5d.24xlarge'|'ml.inf1.xlarge'|'ml.inf1.2xlarge'|'ml.inf1.6xlarge'|'ml.inf1.24xlarge',
                'AcceleratorType': 'ml.eia1.medium'|'ml.eia1.large'|'ml.eia1.xlarge'|'ml.eia2.medium'|'ml.eia2.large'|'ml.eia2.xlarge',
                'VariantStatus': [
                    {
                        'Status': 'Creating'|'Updating'|'Deleting'|'ActivatingTraffic'|'Baking',
                        'StatusMessage': 'string',
                        'StartTime': datetime(2015, 1, 1)
                    },
                ]
            },
        ],
        'StartTime': datetime(2015, 1, 1)
    }
}
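To illustrate how the new deployment fields in this response might be read, the sketch below collects the VariantStatus entries per variant, preferring PendingDeploymentSummary when a deployment is in progress. `variant_stages` is a hypothetical helper, and it assumes only the response shape shown above; optional keys are treated as possibly absent.

```python
def variant_stages(description):
    """Map each variant name to its list of status strings (hypothetical helper).

    `description` is a DescribeEndpoint response dict. Variants from
    PendingDeploymentSummary are used when present; otherwise the
    top-level ProductionVariants list is used.
    """
    pending = description.get("PendingDeploymentSummary")
    if pending:
        variants = pending.get("ProductionVariants", [])
    else:
        variants = description.get("ProductionVariants", [])

    stages = {}
    for variant in variants:
        statuses = variant.get("VariantStatus", [])
        stages[variant["VariantName"]] = [entry["Status"] for entry in statuses]
    return stages
```

In practice the argument would be the result of `client.describe_endpoint(EndpointName=...)`.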

Response Structure

  • (dict) --

    • EndpointName (string) --

      Name of the endpoint.

    • EndpointArn (string) --

      The Amazon Resource Name (ARN) of the endpoint.

    • EndpointConfigName (string) --

      The name of the endpoint configuration associated with this endpoint.

    • ProductionVariants (list) --

      An array of ProductionVariantSummary objects, one for each model hosted behind this endpoint.

      • (dict) --

        Describes weight and capacities for a production variant associated with an endpoint. If you sent a request to the UpdateEndpointWeightsAndCapacities API and the endpoint status is Updating , you get different desired and current values.

        • VariantName (string) --

          The name of the variant.

        • DeployedImages (list) --

          An array of DeployedImage objects that specify the Amazon EC2 Container Registry paths of the inference images deployed on instances of this ProductionVariant .

          • (dict) --

            Gets the Amazon EC2 Container Registry path of the docker image of the model that is hosted in this ProductionVariant.

            If you used the registry/repository[:tag] form to specify the image path of the primary container when you created the model hosted in this ProductionVariant , the path resolves to a path of the form registry/repository[@digest] . A digest is a hash value that identifies a specific version of an image. For information about Amazon ECR paths, see Pulling an Image in the Amazon ECR User Guide .

            • SpecifiedImage (string) --

              The image path you specified when you created the model.

            • ResolvedImage (string) --

              The specific digest path of the image hosted in this ProductionVariant .

            • ResolutionTime (datetime) --

              The date and time when the image path for the model was resolved to the ResolvedImage path.

        • CurrentWeight (float) --

          The weight associated with the variant.

        • DesiredWeight (float) --

          The requested weight, as specified in the UpdateEndpointWeightsAndCapacities request.

        • CurrentInstanceCount (integer) --

          The number of instances associated with the variant.

        • DesiredInstanceCount (integer) --

          The number of instances requested in the UpdateEndpointWeightsAndCapacities request.

        • VariantStatus (list) --

          The endpoint variant status which describes the current deployment stage status or operational status.

          • (dict) --

            Describes the status of the production variant.

            • Status (string) --

              The endpoint variant status which describes the current deployment stage status or operational status.

              • Creating : Creating inference resources for the production variant.

              • Deleting : Terminating inference resources for the production variant.

              • Updating : Updating capacity for the production variant.

              • ActivatingTraffic : Turning on traffic for the production variant.

              • Baking : Waiting period to monitor the CloudWatch alarms in the automatic rollback configuration.

            • StatusMessage (string) --

              A message that describes the status of the production variant.

            • StartTime (datetime) --

              The start time of the current status change.

    • DataCaptureConfig (dict) --

      • EnableCapture (boolean) --

      • CaptureStatus (string) --

      • CurrentSamplingPercentage (integer) --

      • DestinationS3Uri (string) --

      • KmsKeyId (string) --

    • EndpointStatus (string) --

      The status of the endpoint.

      • OutOfService : Endpoint is not available to take incoming requests.

      • Creating : CreateEndpoint is executing.

      • Updating : UpdateEndpoint or UpdateEndpointWeightsAndCapacities is executing.

      • SystemUpdating : Endpoint is undergoing maintenance and cannot be updated, deleted, or re-scaled until the maintenance has completed. This maintenance operation does not change any customer-specified values such as VPC config, KMS encryption, model, instance type, or instance count.

      • RollingBack : Endpoint failed to scale up or down or to change its variant weight and is rolling back to its previous configuration. Once the rollback completes, the endpoint returns to an InService status. This transitional status only applies to an endpoint that has autoscaling enabled and is undergoing variant weight or capacity changes as part of an UpdateEndpointWeightsAndCapacities call or when the UpdateEndpointWeightsAndCapacities operation is called explicitly.

      • InService : Endpoint is available to process incoming requests.

      • Deleting : DeleteEndpoint is executing.

      • Failed : Endpoint could not be created, updated, or re-scaled. Use DescribeEndpointOutput$FailureReason for information about the failure. DeleteEndpoint is the only operation that can be performed on a failed endpoint.

    • FailureReason (string) --

      If the status of the endpoint is Failed , the reason why it failed.

    • CreationTime (datetime) --

      A timestamp that shows when the endpoint was created.

    • LastModifiedTime (datetime) --

      A timestamp that shows when the endpoint was last modified.

    • LastDeploymentConfig (dict) --

      The most recent deployment configuration for the endpoint.

      • BlueGreenUpdatePolicy (dict) --

        Update policy for a blue/green deployment. If this update policy is specified, SageMaker creates a new fleet during the deployment while maintaining the old fleet. SageMaker flips traffic to the new fleet according to the specified traffic routing configuration. Only one update policy should be used in the deployment configuration. If no update policy is specified, SageMaker uses a blue/green deployment strategy with all at once traffic shifting by default.

        • TrafficRoutingConfiguration (dict) --

          Defines the traffic routing strategy to shift traffic from the old fleet to the new fleet during an endpoint deployment.

          • Type (string) --

            Traffic routing strategy type.

            • ALL_AT_ONCE : Endpoint traffic shifts to the new fleet in a single step.

            • CANARY : Endpoint traffic shifts to the new fleet in two steps. The first step is the canary, which is a small portion of the traffic. The second step is the remainder of the traffic.

            • LINEAR : Endpoint traffic shifts to the new fleet in n steps of a configurable size.

          • WaitIntervalInSeconds (integer) --

            The waiting time (in seconds) between incremental steps to turn on traffic on the new endpoint fleet.

          • CanarySize (dict) --

            Batch size for the first step to turn on traffic on the new endpoint fleet. Value must be less than or equal to 50% of the variant's total instance count.

            • Type (string) --

              Specifies the endpoint capacity type.

              • INSTANCE_COUNT : The endpoint activates based on the number of instances.

              • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

            • Value (integer) --

              Defines the capacity size, either as a number of instances or a capacity percentage.

          • LinearStepSize (dict) --

            Batch size for each step to turn on traffic on the new endpoint fleet. Value must be 10-50% of the variant's total instance count.

            • Type (string) --

              Specifies the endpoint capacity type.

              • INSTANCE_COUNT : The endpoint activates based on the number of instances.

              • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

            • Value (integer) --

              Defines the capacity size, either as a number of instances or a capacity percentage.

        • TerminationWaitInSeconds (integer) --

          Additional waiting time in seconds after the completion of an endpoint deployment before terminating the old endpoint fleet. Default is 0.

        • MaximumExecutionTimeoutInSeconds (integer) --

          Maximum execution timeout for the deployment. Note that the timeout value should be larger than the total waiting time specified in TerminationWaitInSeconds and WaitIntervalInSeconds .

      • AutoRollbackConfiguration (dict) --

        Automatic rollback configuration for handling endpoint deployment failures and recovery.

        • Alarms (list) --

          List of CloudWatch alarms in your account that are configured to monitor metrics on an endpoint. If any alarms are tripped during a deployment, SageMaker rolls back the deployment.

          • (dict) --

            An Amazon CloudWatch alarm configured to monitor metrics on an endpoint.

            • AlarmName (string) --

              The name of a CloudWatch alarm in your account.

    • AsyncInferenceConfig (dict) --

      Returns the description of an endpoint configuration created using the CreateEndpointConfig API.

      • ClientConfig (dict) --

        Configures the behavior of the client used by Amazon SageMaker to interact with the model container during asynchronous inference.

        • MaxConcurrentInvocationsPerInstance (integer) --

          The maximum number of concurrent requests sent by the SageMaker client to the model container. If no value is provided, Amazon SageMaker will choose an optimal value for you.

      • OutputConfig (dict) --

        Specifies the configuration for asynchronous inference invocation outputs.

        • KmsKeyId (string) --

          The Amazon Web Services Key Management Service (Amazon Web Services KMS) key that Amazon SageMaker uses to encrypt the asynchronous inference output in Amazon S3.

        • S3OutputPath (string) --

          The Amazon S3 location to upload inference responses to.

        • NotificationConfig (dict) --

          Specifies the configuration for notifications of inference results for asynchronous inference.

          • SuccessTopic (string) --

            Amazon SNS topic to post a notification to when inference completes successfully. If no topic is provided, no notification is sent on success.

          • ErrorTopic (string) --

            Amazon SNS topic to post a notification to when inference fails. If no topic is provided, no notification is sent on failure.

    • PendingDeploymentSummary (dict) --

      Returns the summary of an in-progress deployment. This field is only returned when the endpoint is creating or updating with a new endpoint configuration.

      • EndpointConfigName (string) --

        The name of the endpoint configuration used in the deployment.

      • ProductionVariants (list) --

        List of PendingProductionVariantSummary objects.

        • (dict) --

          The production variant summary for a deployment when an endpoint is creating or updating with the CreateEndpoint or UpdateEndpoint operations. Describes the VariantStatus , weight and capacity for a production variant associated with an endpoint.

          • VariantName (string) --

            The name of the variant.

          • DeployedImages (list) --

            An array of DeployedImage objects that specify the Amazon EC2 Container Registry paths of the inference images deployed on instances of this ProductionVariant .

            • (dict) --

              Gets the Amazon EC2 Container Registry path of the docker image of the model that is hosted in this ProductionVariant.

              If you used the registry/repository[:tag] form to specify the image path of the primary container when you created the model hosted in this ProductionVariant , the path resolves to a path of the form registry/repository[@digest] . A digest is a hash value that identifies a specific version of an image. For information about Amazon ECR paths, see Pulling an Image in the Amazon ECR User Guide .

              • SpecifiedImage (string) --

                The image path you specified when you created the model.

              • ResolvedImage (string) --

                The specific digest path of the image hosted in this ProductionVariant .

              • ResolutionTime (datetime) --

                The date and time when the image path for the model was resolved to the ResolvedImage path.

          • CurrentWeight (float) --

            The weight associated with the variant.

          • DesiredWeight (float) --

            The requested weight for the variant in this deployment, as specified in the endpoint configuration for the endpoint. The value is taken from the request to the CreateEndpointConfig operation.

          • CurrentInstanceCount (integer) --

            The number of instances associated with the variant.

          • DesiredInstanceCount (integer) --

            The number of instances requested in this deployment, as specified in the endpoint configuration for the endpoint. The value is taken from the request to the CreateEndpointConfig operation.

          • InstanceType (string) --

            The type of instances associated with the variant.

          • AcceleratorType (string) --

            The size of the Elastic Inference (EI) instance to use for the production variant. EI instances provide on-demand GPU computing for inference. For more information, see Using Elastic Inference in Amazon SageMaker.

          • VariantStatus (list) --

            The endpoint variant status which describes the current deployment stage status or operational status.

            • (dict) --

              Describes the status of the production variant.

              • Status (string) --

                The endpoint variant status which describes the current deployment stage status or operational status.

                • Creating : Creating inference resources for the production variant.

                • Deleting : Terminating inference resources for the production variant.

                • Updating : Updating capacity for the production variant.

                • ActivatingTraffic : Turning on traffic for the production variant.

                • Baking : Waiting period to monitor the CloudWatch alarms in the automatic rollback configuration.

              • StatusMessage (string) --

                A message that describes the status of the production variant.

              • StartTime (datetime) --

                The start time of the current status change.

      • StartTime (datetime) --

        The start time of the deployment.
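
As a minimal sketch of reading the in-progress deployment details described above, the helper below takes a dict shaped like a DescribeEndpoint response and prints one line per pending variant. The function name summarize_pending_deployment and the sample dict are illustrative and hand-written, not real service output. Treating the VariantStatus entry with the latest StartTime as the current stage is an assumption, not documented behavior.

```python
from datetime import datetime, timezone


def summarize_pending_deployment(response):
    """Return one line per pending variant, or [] if no deployment is in progress."""
    pending = response.get("PendingDeploymentSummary")
    if pending is None:
        # The field is only returned while the endpoint is creating or updating.
        return []
    lines = []
    for variant in pending.get("ProductionVariants", []):
        statuses = variant.get("VariantStatus", [])
        # Assumption: the entry with the latest StartTime is the current stage.
        latest = max(statuses, key=lambda s: s["StartTime"]) if statuses else None
        stage = latest["Status"] if latest else "Unknown"
        lines.append(
            f"{variant['VariantName']}: {stage}, "
            f"instances {variant.get('CurrentInstanceCount', 0)}"
            f"/{variant.get('DesiredInstanceCount', 0)}, "
            f"weight {variant.get('CurrentWeight', 0.0)}"
            f"/{variant.get('DesiredWeight', 0.0)}"
        )
    return lines


# Hand-written sample shaped like the response structure above (not real output).
sample_response = {
    "EndpointName": "my-endpoint",
    "EndpointStatus": "Updating",
    "PendingDeploymentSummary": {
        "EndpointConfigName": "my-config-v2",
        "StartTime": datetime(2021, 11, 8, 0, 0, tzinfo=timezone.utc),
        "ProductionVariants": [
            {
                "VariantName": "AllTraffic",
                "CurrentWeight": 0.0,
                "DesiredWeight": 1.0,
                "CurrentInstanceCount": 1,
                "DesiredInstanceCount": 4,
                "VariantStatus": [
                    {
                        "Status": "Creating",
                        "StatusMessage": "",
                        "StartTime": datetime(2021, 11, 8, 0, 0, tzinfo=timezone.utc),
                    },
                    {
                        "Status": "Baking",
                        "StatusMessage": "",
                        "StartTime": datetime(2021, 11, 8, 0, 10, tzinfo=timezone.utc),
                    },
                ],
            }
        ],
    },
}

for line in summarize_pending_deployment(sample_response):
    print(line)  # AllTraffic: Baking, instances 1/4, weight 0.0/1.0
```

With a real client, the same helper could be fed the result of client.describe_endpoint(EndpointName='...'). It returns an empty list once the deployment has finished and PendingDeploymentSummary is no longer present.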

UpdateEndpoint (updated)
Changes (request)
{'DeploymentConfig': {'BlueGreenUpdatePolicy': {'TrafficRoutingConfiguration': {'LinearStepSize': {'Type': 'INSTANCE_COUNT '
                                                                                                           '| '
                                                                                                           'CAPACITY_PERCENT',
                                                                                                   'Value': 'integer'},
                                                                                'Type': {'LINEAR'}}}},
 'RetainDeploymentConfig': 'boolean'}

Deploys the new EndpointConfig specified in the request, switches to using the newly created endpoint, and then deletes resources provisioned for the endpoint using the previous EndpointConfig (there is no availability loss).

When Amazon SageMaker receives the request, it sets the endpoint status to Updating . After updating the endpoint, it sets the status to InService . To check the status of an endpoint, use the DescribeEndpoint API.
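
The status check described above can be sketched as a small polling loop. The helper name wait_until_in_service and its poll_seconds, max_polls, and sleep parameters are illustrative assumptions. The describe argument is any callable with the same keyword signature as the client's describe_endpoint method, which keeps the sketch runnable without network access.

```python
import time


def wait_until_in_service(describe, endpoint_name, poll_seconds=30,
                          max_polls=120, sleep=time.sleep):
    """Poll until EndpointStatus is InService and return the last response.

    Raises RuntimeError if the endpoint reports Failed, and TimeoutError if it
    is still in a transitional status after max_polls attempts.
    """
    for _ in range(max_polls):
        response = describe(EndpointName=endpoint_name)
        status = response["EndpointStatus"]
        if status == "InService":
            return response
        if status == "Failed":
            reason = response.get("FailureReason", "no reason given")
            raise RuntimeError(f"{endpoint_name} failed: {reason}")
        # Any other status (Updating, RollingBack, ...) is treated as transitional.
        sleep(poll_seconds)
    raise TimeoutError(f"{endpoint_name} not InService after {max_polls} polls")


# Hypothetical usage with a real client:
#   import boto3
#   client = boto3.client("sagemaker")
#   final = wait_until_in_service(client.describe_endpoint, "my-endpoint")
#   print(final["EndpointConfigName"])
```

Because an endpoint that rolls back also returns to InService, a caller may want to compare the returned EndpointConfigName with the one it requested. boto3 also provides a built-in endpoint_in_service waiter, which may be preferable when no custom handling is needed.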

Note

You must not delete an EndpointConfig in use by an endpoint that is live or while the UpdateEndpoint or CreateEndpoint operations are being performed on the endpoint. To update an endpoint, you must create a new EndpointConfig .

If you delete the EndpointConfig of an endpoint that is active or being created or updated, you may lose visibility into the instance type the endpoint is using. The endpoint must be deleted in order to stop incurring charges.

See also: AWS API Documentation

Request Syntax

client.update_endpoint(
    EndpointName='string',
    EndpointConfigName='string',
    RetainAllVariantProperties=True|False,
    ExcludeRetainedVariantProperties=[
        {
            'VariantPropertyType': 'DesiredInstanceCount'|'DesiredWeight'|'DataCaptureConfig'
        },
    ],
    DeploymentConfig={
        'BlueGreenUpdatePolicy': {
            'TrafficRoutingConfiguration': {
                'Type': 'ALL_AT_ONCE'|'CANARY'|'LINEAR',
                'WaitIntervalInSeconds': 123,
                'CanarySize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                },
                'LinearStepSize': {
                    'Type': 'INSTANCE_COUNT'|'CAPACITY_PERCENT',
                    'Value': 123
                }
            },
            'TerminationWaitInSeconds': 123,
            'MaximumExecutionTimeoutInSeconds': 123
        },
        'AutoRollbackConfiguration': {
            'Alarms': [
                {
                    'AlarmName': 'string'
                },
            ]
        }
    },
    RetainDeploymentConfig=True|False
)
type EndpointName

string

param EndpointName

[REQUIRED]

The name of the endpoint whose configuration you want to update.

type EndpointConfigName

string

param EndpointConfigName

[REQUIRED]

The name of the new endpoint configuration.

type RetainAllVariantProperties

boolean

param RetainAllVariantProperties

When updating endpoint resources, enables or disables the retention of variant properties, such as the instance count or the variant weight. To retain the variant properties of an endpoint when updating it, set RetainAllVariantProperties to true . To use the variant properties specified in a new EndpointConfig call when updating an endpoint, set RetainAllVariantProperties to false . The default is false .

type ExcludeRetainedVariantProperties

list

param ExcludeRetainedVariantProperties

When you update endpoint resources with UpdateEndpointInput$RetainAllVariantProperties set to true , ExcludeRetainedVariantProperties specifies the list of type VariantProperty to override with the values provided by EndpointConfig . If you don't specify a value for ExcludeRetainedVariantProperties , no variant properties are overridden.

  • (dict) --

    Specifies a production variant property type for an Endpoint.

    If you are updating an endpoint with the UpdateEndpointInput$RetainAllVariantProperties option set to true , the VariantProperty objects listed in UpdateEndpointInput$ExcludeRetainedVariantProperties override the existing variant properties of the endpoint.

    • VariantPropertyType (string) -- [REQUIRED]

      The type of variant property. The supported values are:

      • DesiredInstanceCount : Overrides the existing variant instance counts using the ProductionVariant$InitialInstanceCount values in the CreateEndpointConfigInput$ProductionVariants.

      • DesiredWeight : Overrides the existing variant weights using the ProductionVariant$InitialVariantWeight values in the CreateEndpointConfigInput$ProductionVariants.

      • DataCaptureConfig : (Not currently supported.)
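
To illustrate how RetainAllVariantProperties and ExcludeRetainedVariantProperties fit together, here is a minimal sketch that assembles the keyword arguments for update_endpoint. The helper name build_update_kwargs and the endpoint and configuration names are hypothetical. The accepted property names are the three values listed in the request syntax above.

```python
ALLOWED_VARIANT_PROPERTIES = {"DesiredInstanceCount", "DesiredWeight", "DataCaptureConfig"}


def build_update_kwargs(endpoint_name, new_config_name, retain=True, override=()):
    """Build update_endpoint keyword arguments.

    retain   -> RetainAllVariantProperties
    override -> property types that should take their values from the new
                EndpointConfig instead of being retained.
    """
    unknown = set(override) - ALLOWED_VARIANT_PROPERTIES
    if unknown:
        raise ValueError(f"unsupported VariantPropertyType: {sorted(unknown)}")
    kwargs = {
        "EndpointName": endpoint_name,
        "EndpointConfigName": new_config_name,
        "RetainAllVariantProperties": retain,
    }
    if override:
        # Only sent when there is something to override; omitting the
        # parameter means no variant properties are overridden.
        kwargs["ExcludeRetainedVariantProperties"] = [
            {"VariantPropertyType": name} for name in override
        ]
    return kwargs


# Keep current instance counts but take weights from the new configuration.
kwargs = build_update_kwargs(
    "my-endpoint", "my-config-v2", retain=True, override=["DesiredWeight"]
)
print(kwargs["ExcludeRetainedVariantProperties"])
# [{'VariantPropertyType': 'DesiredWeight'}]

# Hypothetical call with a real client:
#   boto3.client("sagemaker").update_endpoint(**kwargs)
```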

type DeploymentConfig

dict

param DeploymentConfig

The deployment configuration for an endpoint, which contains the desired deployment strategy and rollback configurations.

  • BlueGreenUpdatePolicy (dict) -- [REQUIRED]

    Update policy for a blue/green deployment. If this update policy is specified, SageMaker creates a new fleet during the deployment while maintaining the old fleet. SageMaker flips traffic to the new fleet according to the specified traffic routing configuration. Only one update policy should be used in the deployment configuration. If no update policy is specified, SageMaker uses a blue/green deployment strategy with all at once traffic shifting by default.

    • TrafficRoutingConfiguration (dict) -- [REQUIRED]

      Defines the traffic routing strategy to shift traffic from the old fleet to the new fleet during an endpoint deployment.

      • Type (string) -- [REQUIRED]

        Traffic routing strategy type.

        • ALL_AT_ONCE : Endpoint traffic shifts to the new fleet in a single step.

        • CANARY : Endpoint traffic shifts to the new fleet in two steps. The first step is the canary, which is a small portion of the traffic. The second step is the remainder of the traffic.

        • LINEAR : Endpoint traffic shifts to the new fleet in n steps of a configurable size.

      • WaitIntervalInSeconds (integer) -- [REQUIRED]

        The waiting time (in seconds) between incremental steps to turn on traffic on the new endpoint fleet.

      • CanarySize (dict) --

        Batch size for the first step to turn on traffic on the new endpoint fleet. Value must be less than or equal to 50% of the variant's total instance count.

        • Type (string) -- [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT : The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

        • Value (integer) -- [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

      • LinearStepSize (dict) --

        Batch size for each step to turn on traffic on the new endpoint fleet. Value must be 10-50% of the variant's total instance count.

        • Type (string) -- [REQUIRED]

          Specifies the endpoint capacity type.

          • INSTANCE_COUNT : The endpoint activates based on the number of instances.

          • CAPACITY_PERCENT : The endpoint activates based on the specified percentage of capacity.

        • Value (integer) -- [REQUIRED]

          Defines the capacity size, either as a number of instances or a capacity percentage.

    • TerminationWaitInSeconds (integer) --

      Additional waiting time in seconds after the completion of an endpoint deployment before terminating the old endpoint fleet. Default is 0.

    • MaximumExecutionTimeoutInSeconds (integer) --

      Maximum execution timeout for the deployment. Note that the timeout value should be larger than the total waiting time specified in TerminationWaitInSeconds and WaitIntervalInSeconds .

  • AutoRollbackConfiguration (dict) --

    Automatic rollback configuration for handling endpoint deployment failures and recovery.

    • Alarms (list) --

      List of CloudWatch alarms in your account that are configured to monitor metrics on an endpoint. If any alarms are tripped during a deployment, SageMaker rolls back the deployment.

      • (dict) --

        An Amazon CloudWatch alarm configured to monitor metrics on an endpoint.

        • AlarmName (string) --

          The name of a CloudWatch alarm in your account.
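
The following sketch builds a DeploymentConfig for LINEAR traffic shifting and checks the timeout guidance given above. The helper name linear_deployment_config, the alarm name, and the numeric values are illustrative. The estimate of total waiting time (one WaitIntervalInSeconds per step plus TerminationWaitInSeconds) is an assumption used for a conservative check, not a documented formula.

```python
import math


def linear_deployment_config(step_percent, wait_seconds, termination_wait_seconds,
                             timeout_seconds, alarm_names):
    """Return a DeploymentConfig dict for LINEAR shifting by capacity percent."""
    if not 10 <= step_percent <= 50:
        raise ValueError("LinearStepSize must be 10-50% of the variant's capacity")
    steps = math.ceil(100 / step_percent)
    # Assumption: one wait interval per step, then the termination wait.
    total_wait = steps * wait_seconds + termination_wait_seconds
    if timeout_seconds <= total_wait:
        raise ValueError(
            f"MaximumExecutionTimeoutInSeconds ({timeout_seconds}) should be "
            f"larger than the estimated total waiting time ({total_wait})"
        )
    return {
        "BlueGreenUpdatePolicy": {
            "TrafficRoutingConfiguration": {
                "Type": "LINEAR",
                "WaitIntervalInSeconds": wait_seconds,
                "LinearStepSize": {"Type": "CAPACITY_PERCENT", "Value": step_percent},
            },
            "TerminationWaitInSeconds": termination_wait_seconds,
            "MaximumExecutionTimeoutInSeconds": timeout_seconds,
        },
        "AutoRollbackConfiguration": {
            "Alarms": [{"AlarmName": name} for name in alarm_names],
        },
    }


# 25% steps -> 4 steps; 4 * 300 + 600 = 1800 seconds of waiting, under 3600.
config = linear_deployment_config(
    step_percent=25,
    wait_seconds=300,
    termination_wait_seconds=600,
    timeout_seconds=3600,
    alarm_names=["my-endpoint-5xx-alarm"],  # hypothetical CloudWatch alarm name
)

# Hypothetical call with a real client:
#   boto3.client("sagemaker").update_endpoint(
#       EndpointName="my-endpoint",
#       EndpointConfigName="my-config-v2",
#       DeploymentConfig=config,
#   )
```

Validating on the client side is optional; the service performs its own validation, and this check only catches the most obvious mismatch before the request is sent.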

type RetainDeploymentConfig

boolean

param RetainDeploymentConfig

Specifies whether to reuse the last deployment configuration. The default value is false (the configuration is not reused).

rtype

dict

returns

Response Syntax

{
    'EndpointArn': 'string'
}

Response Structure

  • (dict) --

    • EndpointArn (string) --

      The Amazon Resource Name (ARN) of the endpoint.