Amazon SageMaker Service

2018/04/30 - Amazon SageMaker Service - 4 updated api methods

Changes  SageMaker has added support for VPC configuration for both Endpoints and Training Jobs. This allows you to connect from the instances running the Endpoint or Training Job to your VPC and any resources reachable in the VPC rather than being restricted to resources that were internet accessible.

CreateModel (updated)
Changes (request)
{'VpcConfig': {'SecurityGroupIds': ['string'], 'Subnets': ['string']}}

Creates a model in Amazon SageMaker. In the request, you name the model and describe one or more containers. For each container, you specify the docker image containing inference code, artifacts (from prior training), and custom environment map that the inference code uses when you deploy the model into production.

Use this API to create a model only if you want to use Amazon SageMaker hosting services. To host your model, you create an endpoint configuration with the CreateEndpointConfig API, and then create an endpoint with the CreateEndpoint API.

Amazon SageMaker then deploys all of the containers that you defined for the model in the hosting environment.

In the CreateModel request, you must define a container with the PrimaryContainer parameter.

In the request, you also provide an IAM role that Amazon SageMaker can assume to access model artifacts and the Docker image for deployment on ML compute hosting instances. You also use this IAM role to manage the permissions that the inference code needs. For example, if the inference code accesses any other AWS resources, you grant the necessary permissions via this role.

See also: AWS API Documentation

Request Syntax

client.create_model(
    ModelName='string',
    PrimaryContainer={
        'ContainerHostname': 'string',
        'Image': 'string',
        'ModelDataUrl': 'string',
        'Environment': {
            'string': 'string'
        }
    },
    ExecutionRoleArn='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ],
    VpcConfig={
        'SecurityGroupIds': [
            'string',
        ],
        'Subnets': [
            'string',
        ]
    }
)
type ModelName

string

param ModelName

[REQUIRED]

The name of the new model.

type PrimaryContainer

dict

param PrimaryContainer

[REQUIRED]

The location of the primary docker image containing inference code, associated artifacts, and custom environment map that the inference code uses when the model is deployed into production.

  • ContainerHostname (string) --

    The DNS host name for the container after Amazon SageMaker deploys it.

  • Image (string) -- [REQUIRED]

    The Amazon EC2 Container Registry (Amazon ECR) path where inference code is stored. If you are using your own custom algorithm instead of an algorithm provided by Amazon SageMaker, the inference code must meet Amazon SageMaker requirements. For more information, see Using Your Own Algorithms with Amazon SageMaker.

  • ModelDataUrl (string) --

    The S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).

  • Environment (dict) --

    The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. The map supports up to 16 entries.

    • (string) --

      • (string) --
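
The limits on the Environment map can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: the function name is hypothetical, and it assumes that the 1024 limit counts characters, which the description above does not state explicitly.

```python
# Limits taken from the Environment description above.
MAX_ENV_ENTRIES = 16
MAX_ENV_LENGTH = 1024  # assumption: measured in characters


def validate_environment(env):
    """Return a list of problems found in a container Environment map."""
    problems = []
    if len(env) > MAX_ENV_ENTRIES:
        problems.append(
            f"{len(env)} entries exceeds the limit of {MAX_ENV_ENTRIES}"
        )
    for key, value in env.items():
        # The map is string to string, so reject any other type early.
        if not isinstance(key, str) or not isinstance(value, str):
            problems.append(f"{key!r}: keys and values must be strings")
            continue
        if len(key) > MAX_ENV_LENGTH:
            problems.append(f"a key is longer than {MAX_ENV_LENGTH}")
        if len(value) > MAX_ENV_LENGTH:
            problems.append(f"value for {key!r} is longer than {MAX_ENV_LENGTH}")
    return problems
```

An empty list means the map satisfies the documented limits; the service remains the final authority on what it accepts.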

type ExecutionRoleArn

string

param ExecutionRoleArn

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role that Amazon SageMaker can assume to access model artifacts and docker image for deployment on ML compute instances. Deploying on ML compute instances is part of model hosting. For more information, see Amazon SageMaker Roles.

type Tags

list

param Tags

An array of key-value pairs. For more information, see Using Cost Allocation Tags in the AWS Billing and Cost Management User Guide.

  • (dict) --

    Describes a tag.

    • Key (string) -- [REQUIRED]

      The tag key.

    • Value (string) -- [REQUIRED]

      The tag value.

type VpcConfig

dict

param VpcConfig

An object that specifies the VPC that you want your model to connect to. Control access to and from your model container by configuring the VPC. For more information, see host-vpc.

  • SecurityGroupIds (list) -- [REQUIRED]

    The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.

    • (string) --

  • Subnets (list) -- [REQUIRED]

    The IDs of the subnets in the VPC to which you want to connect your training job or model.

    • (string) --

rtype

dict

returns

Response Syntax

{
    'ModelArn': 'string'
}

Response Structure

  • (dict) --

    • ModelArn (string) --

      The ARN of the model created in Amazon SageMaker.
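
As an illustration of the new VpcConfig parameter, the following sketch assembles a CreateModel request and passes it to a client. The helper names are hypothetical, every ARN, image path, and ID passed in is a placeholder supplied by the caller, and the client is assumed to be an object such as the one returned by boto3.client('sagemaker').

```python
def build_create_model_request(model_name, image, role_arn,
                               security_group_ids, subnets,
                               model_data_url=None):
    """Assemble keyword arguments for client.create_model()."""
    container = {"Image": image}
    # ModelDataUrl is optional, so include it only when one is given.
    if model_data_url is not None:
        container["ModelDataUrl"] = model_data_url
    return {
        "ModelName": model_name,
        "PrimaryContainer": container,
        "ExecutionRoleArn": role_arn,
        # Both lists are required whenever VpcConfig is present.
        "VpcConfig": {
            "SecurityGroupIds": list(security_group_ids),
            "Subnets": list(subnets),
        },
    }


def create_model_in_vpc(client, **kwargs):
    """Call create_model and return the ARN of the new model."""
    response = client.create_model(**build_create_model_request(**kwargs))
    return response["ModelArn"]
```

Keeping the request builder separate from the call makes it possible to inspect or log the request without contacting the service.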

CreateTrainingJob (updated)
Changes (request)
{'VpcConfig': {'SecurityGroupIds': ['string'], 'Subnets': ['string']}}

Starts a model training job. After training completes, Amazon SageMaker saves the resulting model artifacts to an Amazon S3 location that you specify.

If you choose to host your model using Amazon SageMaker hosting services, you can use the resulting model artifacts as part of the model. You can also use the artifacts in a deep learning service other than Amazon SageMaker, provided that you know how to use them for inferences.

In the request body, you provide the following:

  • AlgorithmSpecification - Identifies the training algorithm to use.

  • HyperParameters - Specify these algorithm-specific parameters to influence the quality of the final model. For a list of hyperparameters for each training algorithm provided by Amazon SageMaker, see Algorithms.

  • InputDataConfig - Describes the training dataset and the Amazon S3 location where it is stored.

  • OutputDataConfig - Identifies the Amazon S3 location where you want Amazon SageMaker to save the results of model training.

  • ResourceConfig - Identifies the resources, ML compute instances, and ML storage volumes to deploy for model training. In distributed training, you specify more than one instance.

  • RoleArn - The Amazon Resource Name (ARN) of the IAM role that Amazon SageMaker assumes to perform tasks on your behalf during model training. You must grant this role the necessary permissions so that Amazon SageMaker can successfully complete model training.

  • StoppingCondition - Sets a duration for training. Use this parameter to cap model training costs.

For more information about Amazon SageMaker, see How It Works.

See also: AWS API Documentation

Request Syntax

client.create_training_job(
    TrainingJobName='string',
    HyperParameters={
        'string': 'string'
    },
    AlgorithmSpecification={
        'TrainingImage': 'string',
        'TrainingInputMode': 'Pipe'|'File'
    },
    RoleArn='string',
    InputDataConfig=[
        {
            'ChannelName': 'string',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'ManifestFile'|'S3Prefix',
                    'S3Uri': 'string',
                    'S3DataDistributionType': 'FullyReplicated'|'ShardedByS3Key'
                }
            },
            'ContentType': 'string',
            'CompressionType': 'None'|'Gzip',
            'RecordWrapperType': 'None'|'RecordIO'
        },
    ],
    OutputDataConfig={
        'KmsKeyId': 'string',
        'S3OutputPath': 'string'
    },
    ResourceConfig={
        'InstanceType': 'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge',
        'InstanceCount': 123,
        'VolumeSizeInGB': 123,
        'VolumeKmsKeyId': 'string'
    },
    VpcConfig={
        'SecurityGroupIds': [
            'string',
        ],
        'Subnets': [
            'string',
        ]
    },
    StoppingCondition={
        'MaxRuntimeInSeconds': 123
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
type TrainingJobName

string

param TrainingJobName

[REQUIRED]

The name of the training job. The name must be unique within an AWS Region in an AWS account. It appears in the Amazon SageMaker console.

type HyperParameters

dict

param HyperParameters

Algorithm-specific parameters. You set hyperparameters before you start the learning process. Hyperparameters influence the quality of the model. For a list of hyperparameters for each training algorithm provided by Amazon SageMaker, see Algorithms.

You can specify a maximum of 100 hyperparameters. Each hyperparameter is a key-value pair. Each key and value is limited to 256 characters, as specified by the Length Constraint.

  • (string) --

    • (string) --
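
Because HyperParameters is a string-to-string map, numeric settings have to be converted before the request is sent. The sketch below is one way to do that while enforcing the documented limits; the function name and the hyperparameter names in the usage note are hypothetical.

```python
# Limits taken from the HyperParameters description above.
MAX_HYPERPARAMETERS = 100
MAX_HYPERPARAMETER_LENGTH = 256


def to_hyperparameters(params):
    """Convert a dict of Python values into a string-to-string map."""
    if len(params) > MAX_HYPERPARAMETERS:
        raise ValueError(
            f"{len(params)} hyperparameters exceeds the limit of "
            f"{MAX_HYPERPARAMETERS}"
        )
    result = {}
    for key, value in params.items():
        key_text, value_text = str(key), str(value)
        # Each key and each value has its own 256-character limit.
        if len(key_text) > MAX_HYPERPARAMETER_LENGTH:
            raise ValueError(f"key {key_text[:20]!r} is too long")
        if len(value_text) > MAX_HYPERPARAMETER_LENGTH:
            raise ValueError(f"value for {key_text!r} is too long")
        result[key_text] = value_text
    return result
```

For example, to_hyperparameters({'epochs': 10, 'learning_rate': 0.1}) returns {'epochs': '10', 'learning_rate': '0.1'}. Whether a given algorithm accepts Python's default string form of a value is something to confirm against that algorithm's documentation.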

type AlgorithmSpecification

dict

param AlgorithmSpecification

[REQUIRED]

The registry path of the Docker image that contains the training algorithm and algorithm-specific metadata, including the input mode. For more information about algorithms provided by Amazon SageMaker, see Algorithms. For information about providing your own algorithms, see your-algorithms.

  • TrainingImage (string) -- [REQUIRED]

    The registry path of the Docker image that contains the training algorithm. For information about docker registry paths for built-in algorithms, see sagemaker-algo-docker-registry-paths.

  • TrainingInputMode (string) -- [REQUIRED]

    The input mode that the algorithm supports. For the input modes that Amazon SageMaker algorithms support, see Algorithms. If an algorithm supports the File input mode, Amazon SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to a Docker volume for the training container. If an algorithm supports the Pipe input mode, Amazon SageMaker streams data directly from S3 to the container.

    In File mode, make sure you provision an ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store intermediate information, if any.

    For distributed algorithms using File mode, training data is distributed uniformly, and your training duration is predictable if the input data objects are approximately the same size. Amazon SageMaker does not split the files any further for model training. If the object sizes are skewed, training won't be optimal, because the data distribution is also skewed: one host in the training cluster is overloaded and becomes a bottleneck in training.

type RoleArn

string

param RoleArn

[REQUIRED]

The Amazon Resource Name (ARN) of an IAM role that Amazon SageMaker can assume to perform tasks on your behalf.

During model training, Amazon SageMaker needs your permission to read input data from an S3 bucket, download a Docker image that contains training code, write model artifacts to an S3 bucket, write logs to Amazon CloudWatch Logs, and publish metrics to Amazon CloudWatch. You grant permissions for all of these tasks to an IAM role. For more information, see Amazon SageMaker Roles.

type InputDataConfig

list

param InputDataConfig

[REQUIRED]

An array of Channel objects. Each channel is a named input source. InputDataConfig describes the input data and its location.

Algorithms can accept input data from one or more channels. For example, an algorithm might have two channels of input data, training_data and validation_data. The configuration for each channel provides the S3 location where the input data is stored. It also provides information about the stored data: the MIME type, compression method, and whether the data is wrapped in RecordIO format.

Depending on the input mode that the algorithm supports, Amazon SageMaker either copies input data files from an S3 bucket to a local directory in the Docker container, or makes it available as input streams.

  • (dict) --

    A channel is a named input source that training algorithms can consume.

    • ChannelName (string) -- [REQUIRED]

      The name of the channel.

    • DataSource (dict) -- [REQUIRED]

      The location of the channel data.

      • S3DataSource (dict) -- [REQUIRED]

        The S3 location of the data source that is associated with a channel.

        • S3DataType (string) -- [REQUIRED]

          If you choose S3Prefix , S3Uri identifies a key name prefix. Amazon SageMaker uses all objects with the specified key name prefix for model training.

          If you choose ManifestFile , S3Uri identifies an object that is a manifest file containing a list of object keys that you want Amazon SageMaker to use for model training.

        • S3Uri (string) -- [REQUIRED]

          Depending on the value specified for S3DataType, identifies either a key name prefix or a manifest. For example:

          • A key name prefix might look like this: s3://bucketname/exampleprefix.

          • A manifest might look like this: s3://bucketname/example.manifest

            The manifest is an S3 object that is a JSON file with the following format:

            [ {"prefix": "s3://customer_bucket/some/prefix/"}, "relative/path/to/custdata-1", "relative/path/custdata-2", ... ]

            The preceding JSON matches the following S3 URIs:

            s3://customer_bucket/some/prefix/relative/path/to/custdata-1
            s3://customer_bucket/some/prefix/relative/path/custdata-2
            ...

            The complete set of S3 URIs in this manifest constitutes the input data for the channel for this data source. The object that each S3 URI points to must be readable by the IAM role that Amazon SageMaker uses to perform tasks on your behalf.

        • S3DataDistributionType (string) --

          If you want Amazon SageMaker to replicate the entire dataset on each ML compute instance that is launched for model training, specify FullyReplicated .

          If you want Amazon SageMaker to replicate a subset of data on each ML compute instance that is launched for model training, specify ShardedByS3Key . If there are n ML compute instances launched for a training job, each instance gets approximately 1/n of the number of S3 objects. In this case, model training on each machine uses only the subset of training data.

          Don't choose more ML compute instances for training than available S3 objects. If you do, some nodes won't get any data and you will pay for nodes that aren't getting any training data. This applies in both FILE and PIPE modes. Keep this in mind when developing algorithms.

          In distributed training, where you use multiple ML compute EC2 instances, you might choose ShardedByS3Key . If the algorithm requires copying training data to the ML storage volume (when TrainingInputMode is set to File ), this copies 1/n of the number of objects.

    • ContentType (string) --

      The MIME type of the data.

    • CompressionType (string) --

      If training data is compressed, the compression type. The default value is None . CompressionType is used only in PIPE input mode. In FILE mode, leave this field unset or set it to None.

    • RecordWrapperType (string) --

      Specify RecordIO as the value when input data is in raw format but the training algorithm requires the RecordIO format, in which case Amazon SageMaker wraps each individual S3 object in a RecordIO record. If the input data is already in RecordIO format, you don't need to set this attribute. For more information, see Create a Dataset Using RecordIO.

      In FILE mode, leave this field unset or set it to None.
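
The manifest format described under S3Uri can be illustrated with a short sketch that expands a manifest into the S3 URIs it refers to. The function name is hypothetical, and joining the prefix to each relative path by plain concatenation is an assumption inferred from the example in the S3Uri description, not a documented rule.

```python
import json


def expand_manifest(manifest_text):
    """Return the S3 URIs described by a manifest document."""
    entries = json.loads(manifest_text)
    # The first entry carries the shared prefix; the rest are relative paths.
    if (not isinstance(entries, list) or not entries
            or not isinstance(entries[0], dict)
            or "prefix" not in entries[0]):
        raise ValueError("the first manifest entry must be a prefix object")
    prefix = entries[0]["prefix"]
    # Assumption: each URI is the prefix followed directly by the path.
    return [prefix + relative for relative in entries[1:]]
```

Applied to a manifest whose prefix is s3://customer_bucket/some/prefix/ and whose paths are relative/path/to/custdata-1 and relative/path/custdata-2, the function returns the two full URIs under that prefix, in manifest order.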

type OutputDataConfig

dict

param OutputDataConfig

[REQUIRED]

Specifies the path to the S3 bucket where you want to store model artifacts. Amazon SageMaker creates subfolders for the artifacts.

  • KmsKeyId (string) --

    The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt the model artifacts at rest using Amazon S3 server-side encryption.

    Note

    If the configuration of the output S3 bucket requires server-side encryption for objects, and you don't provide the KMS key ID, Amazon SageMaker uses the default service key. For more information, see KMS-Managed Encryption Keys in the Amazon Simple Storage Service Developer Guide.

    Note

    The KMS key policy must grant permission to the IAM role you specify in your CreateTrainingJob request. For more information, see Using Key Policies in AWS KMS in the AWS Key Management Service Developer Guide.

  • S3OutputPath (string) -- [REQUIRED]

    Identifies the S3 path where you want Amazon SageMaker to store the model artifacts. For example, s3://bucket-name/key-name-prefix .

type ResourceConfig

dict

param ResourceConfig

[REQUIRED]

The resources, including the ML compute instances and ML storage volumes, to use for model training.

ML storage volumes store model artifacts and incremental states. Training algorithms might also use ML storage volumes for scratch space. If you want Amazon SageMaker to use the ML storage volume to store the training data, choose File as the TrainingInputMode in the algorithm specification. For distributed training algorithms, specify an instance count greater than 1.

  • InstanceType (string) -- [REQUIRED]

    The ML compute instance type.

  • InstanceCount (integer) -- [REQUIRED]

    The number of ML compute instances to use. For distributed training, provide a value greater than 1.

  • VolumeSizeInGB (integer) -- [REQUIRED]

    The size of the ML storage volume that you want to provision.

    ML storage volumes store model artifacts and incremental states. Training algorithms might also use the ML storage volume for scratch space. If you want to store the training data in the ML storage volume, choose File as the TrainingInputMode in the algorithm specification.

    You must specify sufficient ML storage for your scenario.

    Note

    Amazon SageMaker supports only the General Purpose SSD (gp2) ML storage volume type.

  • VolumeKmsKeyId (string) --

    The Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) that run the training job.

type VpcConfig

dict

param VpcConfig

An object that specifies the VPC that you want your training job to connect to. Control access to and from your training container by configuring the VPC. For more information, see train-vpc.

  • SecurityGroupIds (list) -- [REQUIRED]

    The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.

    • (string) --

  • Subnets (list) -- [REQUIRED]

    The IDs of the subnets in the VPC to which you want to connect your training job or model.

    • (string) --

type StoppingCondition

dict

param StoppingCondition

[REQUIRED]

Sets a duration for training. Use this parameter to cap model training costs. To stop a job, Amazon SageMaker sends the algorithm the SIGTERM signal, which delays job termination for 120 seconds. Algorithms might use this 120-second window to save the model artifacts.

When Amazon SageMaker terminates a job because the stopping condition has been met, training algorithms provided by Amazon SageMaker save the intermediate results of the job. This intermediate data is a valid model artifact. You can use it to create a model using the CreateModel API.

  • MaxRuntimeInSeconds (integer) --

    The maximum length of time, in seconds, that the training job can run. If model training does not complete during this time, Amazon SageMaker ends the job. If no value is specified, the default is 1 day. The maximum value is 5 days.
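
A custom training algorithm can use the 120-second window after SIGTERM to save its work. The following is a minimal sketch of that pattern using Python's standard signal module. The class and function names are hypothetical, the loop body is a stand-in for real training work, and the save callback is assumed to finish well within the window.

```python
import signal


class StopRequest:
    """Records that SIGTERM arrived so the training loop can exit cleanly."""

    def __init__(self):
        self.requested = False

    def handle(self, signum, frame):
        # Only set a flag here; the heavy work happens in the main loop.
        self.requested = True

    def install(self):
        # Must be called from the main thread of the process.
        signal.signal(signal.SIGTERM, self.handle)


def train(steps, stop, save):
    """Run up to `steps` iterations, then save whatever was completed."""
    completed = 0
    for _ in range(steps):
        if stop.requested:
            break
        completed += 1  # stand-in for one unit of real training work
    save(completed)  # write artifacts before the process is terminated
    return completed
```

A training script would create a StopRequest, call install() once at startup, and pass the object to its training loop so that the loop stops at the next iteration boundary after the signal arrives.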

type Tags

list

param Tags

An array of key-value pairs. For more information, see Using Cost Allocation Tags in the AWS Billing and Cost Management User Guide.

  • (dict) --

    Describes a tag.

    • Key (string) -- [REQUIRED]

      The tag key.

    • Value (string) -- [REQUIRED]

      The tag value.

rtype

dict

returns

Response Syntax

{
    'TrainingJobArn': 'string'
}

Response Structure

  • (dict) --

    • TrainingJobArn (string) --

      The Amazon Resource Name (ARN) of the training job.
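
Putting the required parameters together with the new VpcConfig parameter, a CreateTrainingJob request might be assembled as in the sketch below. The helper name is hypothetical, and the instance type, instance count, volume size, default runtime cap, and File input mode are illustrative choices rather than recommendations. All ARNs, image paths, S3 locations, and IDs are placeholders supplied by the caller.

```python
def build_training_job_request(job_name, image, role_arn, train_s3_uri,
                               output_s3_path, security_group_ids, subnets,
                               max_runtime_seconds=3600):
    """Assemble keyword arguments for client.create_training_job()."""
    return {
        "TrainingJobName": job_name,
        "AlgorithmSpecification": {
            "TrainingImage": image,
            "TrainingInputMode": "File",
        },
        "RoleArn": role_arn,
        "InputDataConfig": [
            {
                "ChannelName": "training_data",
                "DataSource": {
                    "S3DataSource": {
                        "S3DataType": "S3Prefix",
                        "S3Uri": train_s3_uri,
                        "S3DataDistributionType": "FullyReplicated",
                    }
                },
            }
        ],
        "OutputDataConfig": {"S3OutputPath": output_s3_path},
        "ResourceConfig": {
            "InstanceType": "ml.m4.xlarge",
            "InstanceCount": 1,
            "VolumeSizeInGB": 10,
        },
        # New in this release: connect the training instances to a VPC.
        "VpcConfig": {
            "SecurityGroupIds": list(security_group_ids),
            "Subnets": list(subnets),
        },
        # Caps the job's runtime, and therefore its cost.
        "StoppingCondition": {"MaxRuntimeInSeconds": max_runtime_seconds},
    }
```

The resulting dictionary can be passed as client.create_training_job(**request), where client is assumed to come from boto3.client('sagemaker'); the response then contains the TrainingJobArn.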

DescribeModel (updated)
Changes (response)
{'VpcConfig': {'SecurityGroupIds': ['string'], 'Subnets': ['string']}}

Describes a model that you created using the CreateModel API.

See also: AWS API Documentation

Request Syntax

client.describe_model(
    ModelName='string'
)
type ModelName

string

param ModelName

[REQUIRED]

The name of the model.

rtype

dict

returns

Response Syntax

{
    'ModelName': 'string',
    'PrimaryContainer': {
        'ContainerHostname': 'string',
        'Image': 'string',
        'ModelDataUrl': 'string',
        'Environment': {
            'string': 'string'
        }
    },
    'ExecutionRoleArn': 'string',
    'VpcConfig': {
        'SecurityGroupIds': [
            'string',
        ],
        'Subnets': [
            'string',
        ]
    },
    'CreationTime': datetime(2015, 1, 1),
    'ModelArn': 'string'
}

Response Structure

  • (dict) --

    • ModelName (string) --

      Name of the Amazon SageMaker model.

    • PrimaryContainer (dict) --

      The location of the primary inference code, associated artifacts, and custom environment map that the inference code uses when it is deployed in production.

      • ContainerHostname (string) --

        The DNS host name for the container after Amazon SageMaker deploys it.

      • Image (string) --

        The Amazon EC2 Container Registry (Amazon ECR) path where inference code is stored. If you are using your own custom algorithm instead of an algorithm provided by Amazon SageMaker, the inference code must meet Amazon SageMaker requirements. For more information, see Using Your Own Algorithms with Amazon SageMaker.

      • ModelDataUrl (string) --

        The S3 path where the model artifacts, which result from model training, are stored. This path must point to a single gzip compressed tar archive (.tar.gz suffix).

      • Environment (dict) --

        The environment variables to set in the Docker container. Each key and value in the Environment string-to-string map can have a length of up to 1024. The map supports up to 16 entries.

        • (string) --

          • (string) --

    • ExecutionRoleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role that you specified for the model.

    • VpcConfig (dict) --

      An object that specifies the VPC that this model has access to. For more information, see host-vpc.

      • SecurityGroupIds (list) --

        The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.

        • (string) --

      • Subnets (list) --

        The IDs of the subnets in the VPC to which you want to connect your training job or model.

        • (string) --

    • CreationTime (datetime) --

      A timestamp that shows when the model was created.

    • ModelArn (string) --

      The Amazon Resource Name (ARN) of the model.
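
The VpcConfig section of the response can be read as in the following sketch. The function name is hypothetical, and the sketch assumes that the VpcConfig key may be missing from the response for a model created without a VPC configuration, so it reads the key defensively.

```python
def model_vpc_summary(description):
    """Summarize the VpcConfig section of a describe_model response."""
    # Assumption: the key can be absent when no VPC was configured.
    vpc = description.get("VpcConfig") or {}
    return {
        "in_vpc": bool(vpc),
        "security_group_ids": list(vpc.get("SecurityGroupIds", [])),
        "subnets": list(vpc.get("Subnets", [])),
    }
```

The argument is the dictionary returned by client.describe_model(ModelName=...).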

DescribeTrainingJob (updated)
Changes (response)
{'VpcConfig': {'SecurityGroupIds': ['string'], 'Subnets': ['string']}}

Returns information about a training job.

See also: AWS API Documentation

Request Syntax

client.describe_training_job(
    TrainingJobName='string'
)
type TrainingJobName

string

param TrainingJobName

[REQUIRED]

The name of the training job.
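
A common use of this call is to poll a job until it finishes. The sketch below treats Completed, Failed, and Stopped as final values of TrainingJobStatus, which is an assumption based on the status names in the Response Syntax; the function name and the default polling interval are hypothetical. The client is assumed to be an object such as the one returned by boto3.client('sagemaker').

```python
import time

# Assumption: these TrainingJobStatus values do not change once reached.
TERMINAL_STATUSES = {"Completed", "Failed", "Stopped"}


def wait_for_training_job(client, job_name, poll_seconds=30,
                          sleep=time.sleep):
    """Poll describe_training_job until the job reaches a final status."""
    while True:
        description = client.describe_training_job(TrainingJobName=job_name)
        if description["TrainingJobStatus"] in TERMINAL_STATUSES:
            return description
        # InProgress or Stopping: wait before asking again.
        sleep(poll_seconds)
```

The sleep parameter exists only so that the wait can be replaced in tests. The returned description includes FailureReason when the job failed.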

rtype

dict

returns

Response Syntax

{
    'TrainingJobName': 'string',
    'TrainingJobArn': 'string',
    'ModelArtifacts': {
        'S3ModelArtifacts': 'string'
    },
    'TrainingJobStatus': 'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped',
    'SecondaryStatus': 'Starting'|'Downloading'|'Training'|'Uploading'|'Stopping'|'Stopped'|'MaxRuntimeExceeded'|'Completed'|'Failed',
    'FailureReason': 'string',
    'HyperParameters': {
        'string': 'string'
    },
    'AlgorithmSpecification': {
        'TrainingImage': 'string',
        'TrainingInputMode': 'Pipe'|'File'
    },
    'RoleArn': 'string',
    'InputDataConfig': [
        {
            'ChannelName': 'string',
            'DataSource': {
                'S3DataSource': {
                    'S3DataType': 'ManifestFile'|'S3Prefix',
                    'S3Uri': 'string',
                    'S3DataDistributionType': 'FullyReplicated'|'ShardedByS3Key'
                }
            },
            'ContentType': 'string',
            'CompressionType': 'None'|'Gzip',
            'RecordWrapperType': 'None'|'RecordIO'
        },
    ],
    'OutputDataConfig': {
        'KmsKeyId': 'string',
        'S3OutputPath': 'string'
    },
    'ResourceConfig': {
        'InstanceType': 'ml.m4.xlarge'|'ml.m4.2xlarge'|'ml.m4.4xlarge'|'ml.m4.10xlarge'|'ml.m4.16xlarge'|'ml.m5.large'|'ml.m5.xlarge'|'ml.m5.2xlarge'|'ml.m5.4xlarge'|'ml.m5.12xlarge'|'ml.m5.24xlarge'|'ml.c4.xlarge'|'ml.c4.2xlarge'|'ml.c4.4xlarge'|'ml.c4.8xlarge'|'ml.p2.xlarge'|'ml.p2.8xlarge'|'ml.p2.16xlarge'|'ml.p3.2xlarge'|'ml.p3.8xlarge'|'ml.p3.16xlarge'|'ml.c5.xlarge'|'ml.c5.2xlarge'|'ml.c5.4xlarge'|'ml.c5.9xlarge'|'ml.c5.18xlarge',
        'InstanceCount': 123,
        'VolumeSizeInGB': 123,
        'VolumeKmsKeyId': 'string'
    },
    'VpcConfig': {
        'SecurityGroupIds': [
            'string',
        ],
        'Subnets': [
            'string',
        ]
    },
    'StoppingCondition': {
        'MaxRuntimeInSeconds': 123
    },
    'CreationTime': datetime(2015, 1, 1),
    'TrainingStartTime': datetime(2015, 1, 1),
    'TrainingEndTime': datetime(2015, 1, 1),
    'LastModifiedTime': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • TrainingJobName (string) --

      Name of the model training job.

    • TrainingJobArn (string) --

      The Amazon Resource Name (ARN) of the training job.

    • ModelArtifacts (dict) --

      Information about the Amazon S3 location that is configured for storing model artifacts.

      • S3ModelArtifacts (string) --

        The path of the S3 object that contains the model artifacts. For example, s3://bucket-name/keynameprefix/model.tar.gz .

    • TrainingJobStatus (string) --

      The status of the training job.

      For the InProgress status, Amazon SageMaker can return these secondary statuses:

      • Starting - Preparing for training.

      • Downloading - Optional stage for algorithms that support File training input mode. It indicates data is being downloaded to ML storage volumes.

      • Training - Training is in progress.

      • Uploading - Training is complete and model upload is in progress.

      For the Stopped training status, Amazon SageMaker can return these secondary statuses:

      • MaxRuntimeExceeded - The job stopped because it exceeded the maximum allowed runtime.

    • SecondaryStatus (string) --

      Provides granular information about the system state. For more information, see TrainingJobStatus .

    • FailureReason (string) --

      If the training job failed, the reason it failed.

    • HyperParameters (dict) --

      Algorithm-specific parameters.

      • (string) --

        • (string) --

    • AlgorithmSpecification (dict) --

      Information about the algorithm used for training, and algorithm metadata.

      • TrainingImage (string) --

        The registry path of the Docker image that contains the training algorithm. For information about docker registry paths for built-in algorithms, see sagemaker-algo-docker-registry-paths.

      • TrainingInputMode (string) --

        The input mode that the algorithm supports. For the input modes that Amazon SageMaker algorithms support, see Algorithms. If an algorithm supports the File input mode, Amazon SageMaker downloads the training data from S3 to the provisioned ML storage volume, and mounts the directory to a Docker volume for the training container. If an algorithm supports the Pipe input mode, Amazon SageMaker streams data directly from S3 to the container.

        In File mode, make sure you provision an ML storage volume with sufficient capacity to accommodate the data downloaded from S3. In addition to the training data, the ML storage volume also stores the output model. The algorithm container also uses the ML storage volume to store intermediate information, if any.

        For distributed algorithms using File mode, training data is distributed uniformly, and your training duration is predictable if the input data objects are approximately the same size. Amazon SageMaker does not split the files any further for model training. If the object sizes are skewed, training won't be optimal, because the data distribution is also skewed: one host in the training cluster is overloaded and becomes a bottleneck in training.

    • RoleArn (string) --

      The AWS Identity and Access Management (IAM) role configured for the training job.

    • InputDataConfig (list) --

      An array of Channel objects that describes each data input channel.

      • (dict) --

        A channel is a named input source that training algorithms can consume.

        • ChannelName (string) --

          The name of the channel.

        • DataSource (dict) --

          The location of the channel data.

          • S3DataSource (dict) --

            The S3 location of the data source that is associated with a channel.

            • S3DataType (string) --

              If you choose S3Prefix , S3Uri identifies a key name prefix. Amazon SageMaker uses all objects with the specified key name prefix for model training.

              If you choose ManifestFile , S3Uri identifies an object that is a manifest file containing a list of object keys that you want Amazon SageMaker to use for model training.

            • S3Uri (string) --

              Depending on the value specified for S3DataType, identifies either a key name prefix or a manifest. For example:

              • A key name prefix might look like this: s3://bucketname/exampleprefix.

              • A manifest might look like this: s3://bucketname/example.manifest

                The manifest is an S3 object that is a JSON file with the following format:

                [ {"prefix": "s3://customer_bucket/some/prefix/"}, "relative/path/to/custdata-1", "relative/path/custdata-2", ... ]

                The preceding JSON matches the following S3 URIs:

                s3://customer_bucket/some/prefix/relative/path/to/custdata-1
                s3://customer_bucket/some/prefix/relative/path/custdata-2
                ...

                The complete set of S3 URIs in this manifest constitutes the input data for the channel for this data source. The object that each S3 URI points to must be readable by the IAM role that Amazon SageMaker uses to perform tasks on your behalf.

            • S3DataDistributionType (string) --

              If you want Amazon SageMaker to replicate the entire dataset on each ML compute instance that is launched for model training, specify FullyReplicated .

              If you want Amazon SageMaker to replicate a subset of data on each ML compute instance that is launched for model training, specify ShardedByS3Key . If there are n ML compute instances launched for a training job, each instance gets approximately 1/n of the number of S3 objects. In this case, model training on each machine uses only the subset of training data.

              Don't choose more ML compute instances for training than available S3 objects. If you do, some nodes won't get any data and you will pay for nodes that aren't getting any training data. This applies in both FILE and PIPE modes. Keep this in mind when developing algorithms.

              In distributed training, where you use multiple ML compute EC2 instances, you might choose ShardedByS3Key . If the algorithm requires copying training data to the ML storage volume (when TrainingInputMode is set to File ), this copies 1/n of the number of objects.

        • ContentType (string) --

          The MIME type of the data.

        • CompressionType (string) --

          If training data is compressed, the compression type. The default value is None . CompressionType is used only in PIPE input mode. In FILE mode, leave this field unset or set it to None.

        • RecordWrapperType (string) --

          Specify RecordIO as the value when input data is in raw format but the training algorithm requires the RecordIO format, in which case Amazon SageMaker wraps each individual S3 object in a RecordIO record. If the input data is already in RecordIO format, you don't need to set this attribute. For more information, see Create a Dataset Using RecordIO.

          In FILE mode, leave this field unset or set it to None.

    • OutputDataConfig (dict) --

      The S3 path, configured when you created the job, where model artifacts are stored. Amazon SageMaker creates subfolders for model artifacts.

      • KmsKeyId (string) --

        The AWS Key Management Service (AWS KMS) key that Amazon SageMaker uses to encrypt the model artifacts at rest using Amazon S3 server-side encryption.

        Note

        If the configuration of the output S3 bucket requires server-side encryption for objects, and you don't provide the KMS key ID, Amazon SageMaker uses the default service key. For more information, see KMS-Managed Encryption Keys in the Amazon Simple Storage Service Developer Guide.

        Note

        The KMS key policy must grant permission to the IAM role you specify in your CreateTrainingJob request. For more information, see Using Key Policies in AWS KMS in the AWS Key Management Service Developer Guide.

      • S3OutputPath (string) --

        Identifies the S3 path where you want Amazon SageMaker to store the model artifacts. For example, s3://bucket-name/key-name-prefix .

    • ResourceConfig (dict) --

      Resources, including ML compute instances and ML storage volumes, that are configured for model training.

      • InstanceType (string) --

        The ML compute instance type.

      • InstanceCount (integer) --

        The number of ML compute instances to use. For distributed training, provide a value greater than 1.

      • VolumeSizeInGB (integer) --

        The size of the ML storage volume that you want to provision.

        ML storage volumes store model artifacts and incremental states. Training algorithms might also use the ML storage volume for scratch space. If you want to store the training data in the ML storage volume, choose File as the TrainingInputMode in the algorithm specification.

        You must specify sufficient ML storage for your scenario.

        Note

        Amazon SageMaker supports only the General Purpose SSD (gp2) ML storage volume type.

      • VolumeKmsKeyId (string) --

        The Amazon Resource Name (ARN) of an AWS Key Management Service key that Amazon SageMaker uses to encrypt data on the storage volume attached to the ML compute instance(s) that run the training job.

    • VpcConfig (dict) --

      An object that specifies the VPC that this training job has access to. For more information, see train-vpc.

      • SecurityGroupIds (list) --

        The VPC security group IDs, in the form sg-xxxxxxxx. Specify the security groups for the VPC that is specified in the Subnets field.

        • (string) --

      • Subnets (list) --

        The IDs of the subnets in the VPC to which you want to connect your training job or model.

        • (string) --
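The VpcConfig shape above is the same one added to the request in this release. A minimal sketch of assembling it follows. The helper name build_vpc_config and the sample IDs are hypothetical, and the non-empty check is an assumption rather than something stated in this section; only the sg- prefix check comes from the description.

```python
def build_vpc_config(security_group_ids, subnets):
    """Assemble a VpcConfig dict in the shape described above.

    Assumption: both lists must be non-empty. The "sg-" prefix check
    follows the documented sg-xxxxxxxx form for security group IDs.
    """
    if not security_group_ids or not subnets:
        raise ValueError("security group IDs and subnet IDs are both needed")
    for group_id in security_group_ids:
        if not group_id.startswith("sg-"):
            raise ValueError(f"not a security group ID: {group_id!r}")
    return {
        "SecurityGroupIds": list(security_group_ids),
        "Subnets": list(subnets),
    }


# Placeholder IDs for illustration only.
vpc_config = build_vpc_config(["sg-0123abcd"], ["subnet-0123abcd"])
print(vpc_config)
```

The resulting dict could then be passed as the VpcConfig argument of a request such as create_model or create_training_job.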

    • StoppingCondition (dict) --

      The condition under which to stop the training job.

      • MaxRuntimeInSeconds (integer) --

        The maximum length of time, in seconds, that the training job can run. If model training does not complete during this time, Amazon SageMaker ends the job. If no value is specified, the default is 1 day. The maximum value is 5 days.
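The default and maximum for MaxRuntimeInSeconds translate to 86,400 and 432,000 seconds. The sketch below applies those two limits when building a StoppingCondition dict. The helper name build_stopping_condition is hypothetical, and rejecting non-positive values is an assumption not stated above.

```python
ONE_DAY_SECONDS = 24 * 60 * 60
DEFAULT_MAX_RUNTIME = ONE_DAY_SECONDS      # documented default: 1 day
MAX_RUNTIME_LIMIT = 5 * ONE_DAY_SECONDS    # documented maximum: 5 days


def build_stopping_condition(max_runtime_in_seconds=None):
    """Return a StoppingCondition dict, applying the documented limits.

    Assumption: values of zero or less are rejected as well.
    """
    if max_runtime_in_seconds is None:
        max_runtime_in_seconds = DEFAULT_MAX_RUNTIME
    if not 0 < max_runtime_in_seconds <= MAX_RUNTIME_LIMIT:
        raise ValueError(
            f"MaxRuntimeInSeconds must be between 1 and {MAX_RUNTIME_LIMIT}"
        )
    return {"MaxRuntimeInSeconds": max_runtime_in_seconds}


print(build_stopping_condition())        # falls back to the 1-day default
print(build_stopping_condition(3600))    # one hour
```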

    • CreationTime (datetime) --

      A timestamp that indicates when the training job was created.

    • TrainingStartTime (datetime) --

      A timestamp that indicates when training started.

    • TrainingEndTime (datetime) --

      A timestamp that indicates when model training ended.

    • LastModifiedTime (datetime) --

      A timestamp that indicates when the status of the training job was last modified.
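The timestamp fields above are returned as datetime values, so elapsed training time can be derived by subtraction. A minimal sketch follows, using a hand-built dict in place of a real response. The sample values are invented for illustration, and the assumption that either training timestamp may be absent (for example, before training has started) is mine, not the source's.

```python
from datetime import datetime, timezone


def training_duration_seconds(description):
    """Return seconds between TrainingStartTime and TrainingEndTime.

    Assumption: either field may be missing, in which case None is returned.
    """
    start = description.get("TrainingStartTime")
    end = description.get("TrainingEndTime")
    if start is None or end is None:
        return None
    return (end - start).total_seconds()


# Hypothetical response fragment with only the timestamp fields.
description = {
    "CreationTime": datetime(2018, 4, 30, 12, 0, 0, tzinfo=timezone.utc),
    "TrainingStartTime": datetime(2018, 4, 30, 12, 5, 0, tzinfo=timezone.utc),
    "TrainingEndTime": datetime(2018, 4, 30, 12, 35, 30, tzinfo=timezone.utc),
}

print(training_duration_seconds(description))
```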