Amazon SageMaker Service

2018/07/17 - Amazon SageMaker Service - 4 new api methods

Changes  Amazon SageMaker now lets customers run fully managed, high-throughput batch transform jobs on machine learning models with a simple API call. Batch Transform is ideal for high-throughput workloads and predictions in non-real-time scenarios where data is accumulated over a period of time for offline processing.

DescribeTransformJob (new)

Returns information about a transform job.

See also: AWS API Documentation

Request Syntax

client.describe_transform_job(
    TransformJobName='string'
)
type TransformJobName

string

param TransformJobName

[REQUIRED]

The name of the transform job that you want to view details of.

rtype

dict

returns

Response Syntax

{
    'TransformJobName': 'string',
    'TransformJobArn': 'string',
    'TransformJobStatus': 'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped',
    'FailureReason': 'string',
    'ModelName': 'string',
    'MaxConcurrentTransforms': 123,
    'MaxPayloadInMB': 123,
    'BatchStrategy': 'MultiRecord'|'SingleRecord',
    'Environment': {
        'string': 'string'
    },
    'TransformInput': {
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'ManifestFile'|'S3Prefix',
                'S3Uri': 'string'
            }
        },
        'ContentType': 'string',
        'CompressionType': 'None'|'Gzip',
        'SplitType': 'None'|'Line'|'RecordIO'
    },
    'TransformOutput': {
        'S3OutputPath': 'string',
        'Accept': 'string',
        'AssembleWith': 'None'|'Line',
        'KmsKeyId': 'string'
    },
    'TransformResources': {
        'InstanceType': 'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge',
        'InstanceCount': 123
    },
    'CreationTime': datetime(2015, 1, 1),
    'TransformStartTime': datetime(2015, 1, 1),
    'TransformEndTime': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • TransformJobName (string) --

      The name of the transform job.

    • TransformJobArn (string) --

      The Amazon Resource Name (ARN) of the transform job.

    • TransformJobStatus (string) --

      The status of the transform job. If the transform job failed, the reason is returned in the FailureReason field.

    • FailureReason (string) --

      If the transform job failed, the reason that it failed.

    • ModelName (string) --

      The name of the model used in the transform job.

    • MaxConcurrentTransforms (integer) --

      The maximum number of parallel requests on each instance node that can be launched in a transform job. The default value is 1.

    • MaxPayloadInMB (integer) --

      The maximum payload size, in MB, used in the transform job.

    • BatchStrategy (string) --

      SingleRecord means only one record was used per batch. MultiRecord means batches contained as many records as could fit within the MaxPayloadInMB limit.

    • Environment (dict) --

      • (string) --

        • (string) --

    • TransformInput (dict) --

      Describes the dataset to be transformed and the Amazon S3 location where it is stored.

      • DataSource (dict) --

        Describes the location of the channel data, meaning the S3 location of the input data that the model can consume.

        • S3DataSource (dict) --

          The S3 location of the data source that is associated with a channel.

          • S3DataType (string) --

            If you choose S3Prefix, S3Uri identifies a key name prefix. Amazon SageMaker uses all objects with the specified key name prefix for batch transform.

            If you choose ManifestFile, S3Uri identifies an object that is a manifest file containing a list of object keys that you want Amazon SageMaker to use for batch transform.

          • S3Uri (string) --

            Depending on the value specified for the S3DataType, identifies either a key name prefix or a manifest. For example:

            • A key name prefix might look like this: s3://bucketname/exampleprefix

            • A manifest might look like this: s3://bucketname/example.manifest

              The manifest is an S3 object which is a JSON file with the following format:

              [ {"prefix": "s3://customer_bucket/some/prefix/"}, "relative/path/to/custdata-1", "relative/path/custdata-2", ... ]

              The preceding JSON matches the following S3Uris:

              s3://customer_bucket/some/prefix/relative/path/to/custdata-1
              s3://customer_bucket/some/prefix/relative/path/custdata-2
              ...

              The complete set of S3Uris in this manifest constitutes the input data for the channel for this datasource. The object that each S3Uri points to must be readable by the IAM role that Amazon SageMaker uses to perform tasks on your behalf.

      • ContentType (string) --

        The multipurpose internet mail extension (MIME) type of the data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data to the transform job.

      • CompressionType (string) --

        Compressing data helps save on storage space. If your transform data is compressed, specify the compression type, and Amazon SageMaker automatically decompresses the data for the transform job. The default value is None.

      • SplitType (string) --

        The method to use to split the transform job's data into smaller batches. The default value is None. If you don't want to split the data, specify None. If you want to split records on a newline character boundary, specify Line. To split records according to the RecordIO format, specify RecordIO.

        Amazon SageMaker sends the maximum number of records per batch in each request, up to the MaxPayloadInMB limit. For more information, see RecordIO data format.

        Note

        For information about the RecordIO format, see Data Format.

    • TransformOutput (dict) --

      Identifies the Amazon S3 location where you want Amazon SageMaker to save the results from the transform job.

      • S3OutputPath (string) --

        The Amazon S3 path where you want Amazon SageMaker to store the results of the transform job. For example, s3://bucket-name/key-name-prefix.

        For every S3 object used as input for the transform job, the transformed data is stored in a corresponding subfolder in the location under the output prefix. For example, the input data s3://bucket-name/input-name-prefix/dataset01/data.csv will have the transformed data stored at s3://bucket-name/key-name-prefix/dataset01/, based on the original name, as a series of .part files (.part0001, .part0002, etc.).

      • Accept (string) --

        The MIME type used to specify the output data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data from the transform job.

      • AssembleWith (string) --

        Defines how to assemble the results of the transform job as a single S3 object. Select the format that is most convenient for you. To concatenate the results in binary format, specify None. To add a newline character at the end of every transformed record, specify Line. To assemble the output in RecordIO format, specify RecordIO. The default value is None.

        For information about the RecordIO format, see Data Format.

      • KmsKeyId (string) --

        The AWS Key Management Service (AWS KMS) key for Amazon S3 server-side encryption that Amazon SageMaker uses to encrypt the transformed data.

        If you don't provide a KMS key ID, Amazon SageMaker uses the default KMS key for Amazon S3 for your role's account. For more information, see KMS-Managed Encryption Keys in the Amazon Simple Storage Service Developer Guide.

        The KMS key policy must grant permission to the IAM role that you specify in your CreateTransformJob request. For more information, see Using Key Policies in AWS KMS in the AWS Key Management Service Developer Guide.

    • TransformResources (dict) --

      Describes the resources, including ML instance types and ML instance count, to use for the transform job.

      • InstanceType (string) --

        The ML compute instance type for the transform job. For using built-in algorithms to transform moderately sized datasets, ml.m4.xlarge or ml.m5.large should suffice. There is no default value for InstanceType.

      • InstanceCount (integer) --

        The number of ML compute instances to use in the transform job. For distributed transform, provide a value greater than 1. The default value is 1.

    • CreationTime (datetime) --

      A timestamp that shows when the transform job was created.

    • TransformStartTime (datetime) --

      Indicates when the transform job starts on ML instances. You are billed for the time interval between this time and the value of TransformEndTime.

    • TransformEndTime (datetime) --

      Indicates when the transform job is Completed, Stopped, or Failed. You are billed for the time interval between this time and the value of TransformStartTime.
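The status and timestamp fields above can be combined into a simple polling loop. The following is a minimal sketch, not part of the SDK: it assumes `client` is a SageMaker client (for example, one created with `boto3.client("sagemaker")`), and the helper names `wait_for_transform_job` and `billable_seconds`, as well as the `poll_seconds`, `max_polls`, and `sleep` arguments, are hypothetical.

```python
import time

# Statuses that the TransformEndTime description treats as the end of a job.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_transform_job(client, job_name, poll_seconds=30, max_polls=120,
                           sleep=time.sleep):
    """Poll describe_transform_job until the job reaches a terminal status.

    poll_seconds, max_polls, and sleep are illustrative knobs of this sketch,
    not parameters of the API.
    """
    for _ in range(max_polls):
        desc = client.describe_transform_job(TransformJobName=job_name)
        if desc["TransformJobStatus"] in TERMINAL_STATUSES:
            return desc
        sleep(poll_seconds)
    raise TimeoutError(f"{job_name} did not finish after {max_polls} polls")


def billable_seconds(desc):
    """Return the seconds between TransformStartTime and TransformEndTime.

    Returns None when either timestamp is missing from the response.
    """
    start = desc.get("TransformStartTime")
    end = desc.get("TransformEndTime")
    if start is None or end is None:
        return None
    return (end - start).total_seconds()
```

When the returned status is Failed, the FailureReason field of the same response carries the explanation.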

CreateTransformJob (new)

Starts a transform job. After the results are obtained, Amazon SageMaker saves them to an Amazon S3 location that you specify.

To perform batch transformations, you create a transform job and use the data that you have readily available.

In the request body, you provide the following:

  • TransformJobName - Identifies the transform job. The name must be unique within an AWS Region in an AWS account.

  • ModelName - Identifies the model to use.

  • TransformInput - Describes the dataset to be transformed and the Amazon S3 location where it is stored.

  • TransformOutput - Identifies the Amazon S3 location where you want Amazon SageMaker to save the results from the transform job.

  • TransformResources - Identifies the ML compute instances for the transform job.

For more information about how batch transformation works in Amazon SageMaker, see How It Works.

See also: AWS API Documentation

Request Syntax

client.create_transform_job(
    TransformJobName='string',
    ModelName='string',
    MaxConcurrentTransforms=123,
    MaxPayloadInMB=123,
    BatchStrategy='MultiRecord'|'SingleRecord',
    Environment={
        'string': 'string'
    },
    TransformInput={
        'DataSource': {
            'S3DataSource': {
                'S3DataType': 'ManifestFile'|'S3Prefix',
                'S3Uri': 'string'
            }
        },
        'ContentType': 'string',
        'CompressionType': 'None'|'Gzip',
        'SplitType': 'None'|'Line'|'RecordIO'
    },
    TransformOutput={
        'S3OutputPath': 'string',
        'Accept': 'string',
        'AssembleWith': 'None'|'Line',
        'KmsKeyId': 'string'
    },
    TransformResources={
        'InstanceType': 'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge',
        'InstanceCount': 123
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
type TransformJobName

string

param TransformJobName

[REQUIRED]

The name of the transform job. The name must be unique within an AWS Region in an AWS account.

type ModelName

string

param ModelName

[REQUIRED]

The name of the model that you want to use for the transform job.

type MaxConcurrentTransforms

integer

param MaxConcurrentTransforms

The maximum number of parallel requests on each instance node that can be launched in a transform job. The default value is 1. To allow Amazon SageMaker to determine the appropriate number for MaxConcurrentTransforms, set the value to 0.

type MaxPayloadInMB

integer

param MaxPayloadInMB

The maximum payload size allowed, in MB. A payload is the data portion of a record (without metadata). The value in MaxPayloadInMB must be greater than the size of a single record. You can approximate the size of a record by dividing the size of your dataset by the number of records. The value you enter should be proportional to the number of records you want per batch. It is recommended to enter a slightly higher value to ensure the records will fit within the maximum payload size. The default value is 6 MB. For an unlimited payload size, set the value to 0.
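The sizing advice above is plain arithmetic, and can be sketched as follows. This is an illustration only: the helper name `estimate_max_payload_mb`, the `records_per_batch` and `headroom` arguments, and the choice of 1 MB = 1024 * 1024 bytes are assumptions of this sketch, not part of the API.

```python
import math


def estimate_max_payload_mb(dataset_bytes, record_count, records_per_batch=1,
                            headroom=1.2):
    """Estimate a MaxPayloadInMB value from the average record size.

    records_per_batch and headroom are illustrative assumptions; the
    headroom factor stands in for the "slightly higher value" advice.
    """
    if record_count <= 0:
        raise ValueError("record_count must be positive")
    # Approximate record size: dataset size divided by number of records.
    avg_record_bytes = dataset_bytes / record_count
    needed_mb = avg_record_bytes * records_per_batch * headroom / (1024 * 1024)
    # MaxPayloadInMB is an integer. Round up, and never return 0, because
    # 0 has a special meaning (unlimited payload size).
    return max(1, math.ceil(needed_mb))
```

For example, a dataset of about 100 MB with 1,000 records and 10 records per batch gives an average record of roughly 0.1 MB, a batch of roughly 1 MB, and, with the 20% headroom, a suggested value of 2.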

type BatchStrategy

string

param BatchStrategy

Determines the number of records included in a single batch. SingleRecord means only one record is used per batch. MultiRecord means a batch is set to contain as many records as can fit within the MaxPayloadInMB limit.

type Environment

dict

param Environment

The environment variables to set in the Docker container. Up to 16 key-value entries are supported in the map.

  • (string) --

    • (string) --

type TransformInput

dict

param TransformInput

[REQUIRED]

Describes the input source and the way the transform job consumes it.

  • DataSource (dict) -- [REQUIRED]

    Describes the location of the channel data, meaning the S3 location of the input data that the model can consume.

    • S3DataSource (dict) -- [REQUIRED]

      The S3 location of the data source that is associated with a channel.

      • S3DataType (string) -- [REQUIRED]

        If you choose S3Prefix, S3Uri identifies a key name prefix. Amazon SageMaker uses all objects with the specified key name prefix for batch transform.

        If you choose ManifestFile, S3Uri identifies an object that is a manifest file containing a list of object keys that you want Amazon SageMaker to use for batch transform.

      • S3Uri (string) -- [REQUIRED]

        Depending on the value specified for the S3DataType, identifies either a key name prefix or a manifest. For example:

        • A key name prefix might look like this: s3://bucketname/exampleprefix

        • A manifest might look like this: s3://bucketname/example.manifest

          The manifest is an S3 object which is a JSON file with the following format:

          [ {"prefix": "s3://customer_bucket/some/prefix/"}, "relative/path/to/custdata-1", "relative/path/custdata-2", ... ]

          The preceding JSON matches the following S3Uris:

          s3://customer_bucket/some/prefix/relative/path/to/custdata-1
          s3://customer_bucket/some/prefix/relative/path/custdata-2
          ...

          The complete set of S3Uris in this manifest constitutes the input data for the channel for this datasource. The object that each S3Uri points to must be readable by the IAM role that Amazon SageMaker uses to perform tasks on your behalf.

  • ContentType (string) --

    The multipurpose internet mail extension (MIME) type of the data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data to the transform job.

  • CompressionType (string) --

    Compressing data helps save on storage space. If your transform data is compressed, specify the compression type, and Amazon SageMaker automatically decompresses the data for the transform job. The default value is None.

  • SplitType (string) --

    The method to use to split the transform job's data into smaller batches. The default value is None. If you don't want to split the data, specify None. If you want to split records on a newline character boundary, specify Line. To split records according to the RecordIO format, specify RecordIO.

    Amazon SageMaker sends the maximum number of records per batch in each request, up to the MaxPayloadInMB limit. For more information, see RecordIO data format.

    Note

    For information about the RecordIO format, see Data Format.
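The manifest format described under S3Uri can be illustrated by expanding a manifest into the object URIs it names. This is a minimal sketch of the format as described in this section; the helper name `expand_manifest` is hypothetical, and the sketch parses a JSON string locally rather than reading from Amazon S3.

```python
import json


def expand_manifest(manifest_text):
    """Expand a manifest JSON document into the full S3 URIs it describes."""
    entries = json.loads(manifest_text)
    if not entries or not isinstance(entries[0], dict) or "prefix" not in entries[0]:
        raise ValueError("the first manifest entry must be an object with a 'prefix' key")
    prefix = entries[0]["prefix"]
    # Every remaining entry is a key relative to the prefix.
    return [prefix + relative_key for relative_key in entries[1:]]


manifest = json.dumps([
    {"prefix": "s3://customer_bucket/some/prefix/"},
    "relative/path/to/custdata-1",
    "relative/path/custdata-2",
])
uris = expand_manifest(manifest)
```

Here `uris` holds one URI per relative key, each formed by appending the key to the prefix.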

type TransformOutput

dict

param TransformOutput

[REQUIRED]

Describes the results of the transform job.

  • S3OutputPath (string) -- [REQUIRED]

    The Amazon S3 path where you want Amazon SageMaker to store the results of the transform job. For example, s3://bucket-name/key-name-prefix.

    For every S3 object used as input for the transform job, the transformed data is stored in a corresponding subfolder in the location under the output prefix. For example, the input data s3://bucket-name/input-name-prefix/dataset01/data.csv will have the transformed data stored at s3://bucket-name/key-name-prefix/dataset01/, based on the original name, as a series of .part files (.part0001, .part0002, etc.).

  • Accept (string) --

    The MIME type used to specify the output data. Amazon SageMaker uses the MIME type with each HTTP call to transfer data from the transform job.

  • AssembleWith (string) --

    Defines how to assemble the results of the transform job as a single S3 object. Select the format that is most convenient for you. To concatenate the results in binary format, specify None. To add a newline character at the end of every transformed record, specify Line. To assemble the output in RecordIO format, specify RecordIO. The default value is None.

    For information about the RecordIO format, see Data Format.

  • KmsKeyId (string) --

    The AWS Key Management Service (AWS KMS) key for Amazon S3 server-side encryption that Amazon SageMaker uses to encrypt the transformed data.

    If you don't provide a KMS key ID, Amazon SageMaker uses the default KMS key for Amazon S3 for your role's account. For more information, see KMS-Managed Encryption Keys in the Amazon Simple Storage Service Developer Guide.

    The KMS key policy must grant permission to the IAM role that you specify in your CreateTransformJob request. For more information, see Using Key Policies in AWS KMS in the AWS Key Management Service Developer Guide.
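The input-to-output mapping described under S3OutputPath can be sketched as a small path calculation. This reproduces only the single example given in that description and is an assumption about the general rule, not a statement of how the service lays out its output; the helper name `output_prefix_for` is hypothetical.

```python
def output_prefix_for(input_uri, input_prefix, s3_output_path):
    """Map an input object URI to the output subfolder described above.

    Assumes the subfolder is the input key's path relative to the input
    prefix, minus the file name, as in the documented example.
    """
    base = input_prefix.rstrip("/") + "/"
    if not input_uri.startswith(base):
        raise ValueError("input_uri is not under input_prefix")
    relative_key = input_uri[len(base):]         # e.g. "dataset01/data.csv"
    subfolder = relative_key.rpartition("/")[0]  # e.g. "dataset01"
    out = s3_output_path.rstrip("/") + "/"
    return (out + subfolder + "/") if subfolder else out
```

With the values from the description, `output_prefix_for("s3://bucket-name/input-name-prefix/dataset01/data.csv", "s3://bucket-name/input-name-prefix", "s3://bucket-name/key-name-prefix")` yields the documented `s3://bucket-name/key-name-prefix/dataset01/` location.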

type TransformResources

dict

param TransformResources

[REQUIRED]

Describes the resources, including ML instance types and ML instance count, to use for the transform job.

  • InstanceType (string) -- [REQUIRED]

    The ML compute instance type for the transform job. For using built-in algorithms to transform moderately sized datasets, ml.m4.xlarge or ml.m5.large should suffice. There is no default value for InstanceType.

  • InstanceCount (integer) -- [REQUIRED]

    The number of ML compute instances to use in the transform job. For distributed transform, provide a value greater than 1. The default value is 1.

type Tags

list

param Tags

An array of key-value pairs. Adding tags is optional. For more information, see Using Cost Allocation Tags in the AWS Billing and Cost Management User Guide.

  • (dict) --

    Describes a tag.

    • Key (string) -- [REQUIRED]

      The tag key.

    • Value (string) -- [REQUIRED]

      The tag value.

rtype

dict

returns

Response Syntax

{
    'TransformJobArn': 'string'
}

Response Structure

  • (dict) --

    • TransformJobArn (string) --

      The Amazon Resource Name (ARN) of the transform job.
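Putting the required parameters together, a request might be assembled as in the following sketch. It assumes `client` is a SageMaker client (for example, one created with `boto3.client("sagemaker")`); the helper names, the job and model names passed to them, and the choice of `text/csv` with line splitting are illustrative assumptions, not defaults of the API.

```python
def build_transform_job_request(job_name, model_name, input_prefix, output_path,
                                instance_type="ml.m4.xlarge", instance_count=1):
    """Assemble keyword arguments for create_transform_job.

    The CSV content type and Line split/assemble settings are assumptions
    chosen for illustration; adjust them to match your data.
    """
    return {
        "TransformJobName": job_name,
        "ModelName": model_name,
        "BatchStrategy": "MultiRecord",
        "TransformInput": {
            "DataSource": {
                "S3DataSource": {"S3DataType": "S3Prefix", "S3Uri": input_prefix}
            },
            "ContentType": "text/csv",
            "SplitType": "Line",
        },
        "TransformOutput": {
            "S3OutputPath": output_path,
            "Accept": "text/csv",
            "AssembleWith": "Line",
        },
        "TransformResources": {
            "InstanceType": instance_type,
            "InstanceCount": instance_count,
        },
    }


def start_transform_job(client, **request):
    """Call create_transform_job and return the new job's ARN."""
    response = client.create_transform_job(**request)
    return response["TransformJobArn"]
```

Keeping request construction separate from the call makes the request easy to inspect or log before it is sent.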

ListTransformJobs (new)

Lists transform jobs.

See also: AWS API Documentation

Request Syntax

client.list_transform_jobs(
    CreationTimeAfter=datetime(2015, 1, 1),
    CreationTimeBefore=datetime(2015, 1, 1),
    LastModifiedTimeAfter=datetime(2015, 1, 1),
    LastModifiedTimeBefore=datetime(2015, 1, 1),
    NameContains='string',
    StatusEquals='InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped',
    SortBy='Name'|'CreationTime'|'Status',
    SortOrder='Ascending'|'Descending',
    NextToken='string',
    MaxResults=123
)
type CreationTimeAfter

datetime

param CreationTimeAfter

A filter that returns only transform jobs created after the specified time.

type CreationTimeBefore

datetime

param CreationTimeBefore

A filter that returns only transform jobs created before the specified time.

type LastModifiedTimeAfter

datetime

param LastModifiedTimeAfter

A filter that returns only transform jobs modified after the specified time.

type LastModifiedTimeBefore

datetime

param LastModifiedTimeBefore

A filter that returns only transform jobs modified before the specified time.

type NameContains

string

param NameContains

A string in the transform job name. This filter returns only transform jobs whose name contains the specified string.

type StatusEquals

string

param StatusEquals

A filter that retrieves only transform jobs with a specific status.

type SortBy

string

param SortBy

The field to sort results by. The default is CreationTime.

type SortOrder

string

param SortOrder

The sort order for results. The default is Descending.

type NextToken

string

param NextToken

If the result of the previous ListTransformJobs request was truncated, the response includes a NextToken. To retrieve the next set of transform jobs, use the token in the next request.

type MaxResults

integer

param MaxResults

The maximum number of transform jobs to return in the response. The default value is 10.

rtype

dict

returns

Response Syntax

{
    'TransformJobSummaries': [
        {
            'TransformJobName': 'string',
            'TransformJobArn': 'string',
            'CreationTime': datetime(2015, 1, 1),
            'TransformEndTime': datetime(2015, 1, 1),
            'LastModifiedTime': datetime(2015, 1, 1),
            'TransformJobStatus': 'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped',
            'FailureReason': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • TransformJobSummaries (list) --

      An array of TransformJobSummary objects.

      • (dict) --

        Provides summary information for a transform job. Multiple TransformJobSummary objects are returned as a list after calling ListTransformJobs.

        • TransformJobName (string) --

          The name of the transform job.

        • TransformJobArn (string) --

          The Amazon Resource Name (ARN) of the transform job.

        • CreationTime (datetime) --

          A timestamp that shows when the transform job was created.

        • TransformEndTime (datetime) --

          Indicates when the transform job ends on compute instances. For successful jobs and stopped jobs, this is the exact time recorded after the results are uploaded. For failed jobs, this is when Amazon SageMaker detected that the job failed.

        • LastModifiedTime (datetime) --

          Indicates when the transform job was last modified.

        • TransformJobStatus (string) --

          The status of the transform job.

        • FailureReason (string) --

          If the transform job failed, the reason it failed.

    • NextToken (string) --

      If the response is truncated, Amazon SageMaker returns this token. To retrieve the next set of transform jobs, use it in the next request.
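The NextToken handshake described above can be wrapped in a generator that walks every page. This is a minimal sketch assuming `client` is a SageMaker client (for example, one created with `boto3.client("sagemaker")`); the helper name `iter_transform_jobs` is hypothetical.

```python
def iter_transform_jobs(client, **filters):
    """Yield every TransformJobSummary, following NextToken across pages.

    filters are passed through unchanged, e.g. StatusEquals="Failed".
    """
    kwargs = dict(filters)
    while True:
        page = client.list_transform_jobs(**kwargs)
        for summary in page.get("TransformJobSummaries", []):
            yield summary
        token = page.get("NextToken")
        if not token:
            break  # no token means the listing is complete
        kwargs["NextToken"] = token
```

A caller might write `for job in iter_transform_jobs(client, StatusEquals="Failed"): print(job["TransformJobName"], job.get("FailureReason"))` to review failed jobs.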

StopTransformJob (new)

Stops a transform job.

When Amazon SageMaker receives a StopTransformJob request, the status of the job changes to Stopping. After Amazon SageMaker stops the job, the status is set to Stopped. When you stop a transform job before it is completed, Amazon SageMaker doesn't store the job's output in Amazon S3.

See also: AWS API Documentation

Request Syntax

client.stop_transform_job(
    TransformJobName='string'
)
type TransformJobName

string

param TransformJobName

[REQUIRED]

The name of the transform job to stop.

returns

None
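Because a stopped job does not store its output, it can be useful to check the status before requesting a stop. The following is a minimal sketch assuming `client` is a SageMaker client (for example, one created with `boto3.client("sagemaker")`); the helper name `stop_if_running` and its return convention are hypothetical.

```python
def stop_if_running(client, job_name):
    """Request a stop only when the job is still InProgress.

    Returns True when a stop request was sent, False otherwise.
    """
    desc = client.describe_transform_job(TransformJobName=job_name)
    if desc["TransformJobStatus"] != "InProgress":
        return False  # already Completed, Failed, Stopping, or Stopped
    client.stop_transform_job(TransformJobName=job_name)  # returns None
    return True
```

After a stop request is sent, the status can be read again with describe_transform_job to observe the change from Stopping to Stopped.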