Amazon Forecast Service

2020/12/08 - Amazon Forecast Service - 4 updated api methods

Changes  This release adds support for the Amazon Forecast Weather Index, which can increase forecasting accuracy by automatically including weather forecasts in demand forecasts.

CreateDataset (updated)
Changes (request)
{'Schema': {'Attributes': {'AttributeType': {'geolocation'}}}}

Creates an Amazon Forecast dataset. The information about the dataset that you provide helps Forecast understand how to consume the data for model training. This includes the following:

  • DataFrequency - How frequently your historical time-series data is collected.

  • Domain and DatasetType - Each dataset has an associated dataset domain and a type within the domain. Amazon Forecast provides a list of predefined domains and types within each domain. For each unique dataset domain and type within the domain, Amazon Forecast requires your data to include a minimum set of predefined fields.

  • Schema - A schema specifies the fields in the dataset, including the field name and data type.

After creating a dataset, you import your training data into it and add the dataset to a dataset group. You use the dataset group to create a predictor. For more information, see howitworks-datasets-groups.

To get a list of all your datasets, use the ListDatasets operation.

For example Forecast datasets, see the Amazon Forecast Sample GitHub repository.

Note

The Status of a dataset must be ACTIVE before you can import training data. Use the DescribeDataset operation to get the status.
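The note above implies a polling step between creating a dataset and importing data. The following is a minimal sketch of such a loop, not an official waiter. It assumes `client` is a boto3 Forecast client (or any object exposing `describe_dataset`); the function name, poll interval, and attempt limit are hypothetical choices.

```python
import time


def wait_until_active(client, dataset_arn, poll_seconds=10, max_attempts=60,
                      sleep=time.sleep):
    """Poll DescribeDataset until Status is ACTIVE (hypothetical helper)."""
    for _ in range(max_attempts):
        status = client.describe_dataset(DatasetArn=dataset_arn)["Status"]
        if status == "ACTIVE":
            return status
        # Any *_FAILED state is treated as terminal in this sketch.
        if status.endswith("_FAILED"):
            raise RuntimeError(f"Dataset entered a failure state: {status}")
        sleep(poll_seconds)
    raise TimeoutError(f"Dataset not ACTIVE after {max_attempts} checks")
```

The `sleep` parameter exists only so the delay can be replaced in tests.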

See also: AWS API Documentation

Request Syntax

client.create_dataset(
    DatasetName='string',
    Domain='RETAIL'|'CUSTOM'|'INVENTORY_PLANNING'|'EC2_CAPACITY'|'WORK_FORCE'|'WEB_TRAFFIC'|'METRICS',
    DatasetType='TARGET_TIME_SERIES'|'RELATED_TIME_SERIES'|'ITEM_METADATA',
    DataFrequency='string',
    Schema={
        'Attributes': [
            {
                'AttributeName': 'string',
                'AttributeType': 'string'|'integer'|'float'|'timestamp'|'geolocation'
            },
        ]
    },
    EncryptionConfig={
        'RoleArn': 'string',
        'KMSKeyArn': 'string'
    },
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
type DatasetName

string

param DatasetName

[REQUIRED]

A name for the dataset.

type Domain

string

param Domain

[REQUIRED]

The domain associated with the dataset. When you add a dataset to a dataset group, this value and the value specified for the Domain parameter of the CreateDatasetGroup operation must match.

The Domain and DatasetType that you choose determine the fields that must be present in the training data that you import to the dataset. For example, if you choose the RETAIL domain and TARGET_TIME_SERIES as the DatasetType , Amazon Forecast requires item_id , timestamp , and demand fields to be present in your data. For more information, see howitworks-datasets-groups.

type DatasetType

string

param DatasetType

[REQUIRED]

The dataset type. Valid values depend on the chosen Domain .

type DataFrequency

string

param DataFrequency

The frequency of data collection. This parameter is required for RELATED_TIME_SERIES datasets.

Valid intervals are Y (Year), M (Month), W (Week), D (Day), H (Hour), 30min (30 minutes), 15min (15 minutes), 10min (10 minutes), 5min (5 minutes), and 1min (1 minute). For example, "D" indicates every day and "15min" indicates every 15 minutes.
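As an illustration of the interval list above, a client-side check could look like the following sketch. The table is copied from the description; the function name is hypothetical, and the service remains the authority on what it accepts.

```python
# Interval codes and their meanings, as listed in the DataFrequency description.
VALID_DATA_FREQUENCIES = {
    "Y": "Year",
    "M": "Month",
    "W": "Week",
    "D": "Day",
    "H": "Hour",
    "30min": "30 minutes",
    "15min": "15 minutes",
    "10min": "10 minutes",
    "5min": "5 minutes",
    "1min": "1 minute",
}


def describe_frequency(data_frequency):
    """Return the human-readable meaning of a DataFrequency code."""
    try:
        return VALID_DATA_FREQUENCIES[data_frequency]
    except KeyError:
        raise ValueError(f"Unsupported DataFrequency: {data_frequency!r}") from None
```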

type Schema

dict

param Schema

[REQUIRED]

The schema for the dataset. The schema attributes and their order must match the fields in your data. The dataset Domain and DatasetType that you choose determine the minimum required fields in your training data. For information about the required fields for a specific dataset domain and type, see howitworks-domains-ds-types.

  • Attributes (list) --

    An array of attributes specifying the name and type of each field in a dataset.

    • (dict) --

      An attribute of a schema, which defines a dataset field. A schema attribute is required for every field in a dataset. The Schema object contains an array of SchemaAttribute objects.

      • AttributeName (string) --

        The name of the dataset field.

      • AttributeType (string) --

        The data type of the field.
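To show where the new geolocation attribute type fits, here is a sketch that builds the keyword arguments for a RETAIL target time series. The `item_id`, `timestamp`, and `demand` fields come from the Domain description above; the field name `location`, its position in the schema, the field types chosen for `item_id` and `demand`, and the daily frequency are assumptions for illustration only.

```python
def build_create_dataset_request(dataset_name):
    """Build create_dataset keyword arguments (illustrative values only)."""
    return {
        "DatasetName": dataset_name,
        "Domain": "RETAIL",
        "DatasetType": "TARGET_TIME_SERIES",
        "DataFrequency": "D",
        "Schema": {
            # Attribute order must match the column order in your data.
            "Attributes": [
                {"AttributeName": "item_id", "AttributeType": "string"},
                {"AttributeName": "timestamp", "AttributeType": "timestamp"},
                {"AttributeName": "demand", "AttributeType": "float"},
                # "location" is a hypothetical field name.
                {"AttributeName": "location", "AttributeType": "geolocation"},
            ]
        },
    }


# Usage, assuming boto3 is installed and credentials are configured:
#   import boto3
#   client = boto3.client("forecast")
#   response = client.create_dataset(**build_create_dataset_request("my_dataset"))
```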

type EncryptionConfig

dict

param EncryptionConfig

An AWS Key Management Service (KMS) key and the AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the key.

  • RoleArn (string) -- [REQUIRED]

    The ARN of the IAM role that Amazon Forecast can assume to access the AWS KMS key.

    Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an InvalidInputException error.

  • KMSKeyArn (string) -- [REQUIRED]

    The Amazon Resource Name (ARN) of the KMS key.

type Tags

list

param Tags

The optional metadata that you apply to the dataset to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.

The following basic restrictions apply to tags:

  • Maximum number of tags per resource - 50.

  • For each resource, each tag key must be unique, and each tag key can have only one value.

  • Maximum key length - 128 Unicode characters in UTF-8.

  • Maximum value length - 256 Unicode characters in UTF-8.

  • If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.

  • Tag keys and values are case sensitive.

  • Do not use aws: , AWS: , or any uppercase or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Forecast considers it a user tag and counts it against the limit of 50 tags. Tags whose key has the aws prefix do not count against your tags-per-resource limit.

  • (dict) --

    The optional metadata that you apply to a resource to help you categorize and organize them. Each tag consists of a key and an optional value, both of which you define.

    The following basic restrictions apply to tags:

    • Maximum number of tags per resource - 50.

    • For each resource, each tag key must be unique, and each tag key can have only one value.

    • Maximum key length - 128 Unicode characters in UTF-8.

    • Maximum value length - 256 Unicode characters in UTF-8.

    • If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.

    • Tag keys and values are case sensitive.

    • Do not use aws: , AWS: , or any uppercase or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Forecast considers it a user tag and counts it against the limit of 50 tags. Tags whose key has the aws prefix do not count against your tags-per-resource limit.

    • Key (string) -- [REQUIRED]

      One part of a key-value pair that makes up a tag. A key is a general label that acts like a category for more specific tag values.

    • Value (string) -- [REQUIRED]

      The optional part of a key-value pair that makes up a tag. A value acts as a descriptor within a tag category (key).
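Several of the tag restrictions listed above can be checked before a request is sent. The sketch below covers the count, uniqueness, length, and reserved-prefix rules; it does not check allowed characters. The function name is hypothetical, and lengths are measured in Python string characters, which is an assumed reading of "Unicode characters in UTF-8".

```python
MAX_TAGS = 50
MAX_KEY_LENGTH = 128
MAX_VALUE_LENGTH = 256


def validate_tags(tags):
    """Return a list of problems found in a Tags list (empty if none)."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    seen_keys = set()
    for tag in tags:
        key = tag["Key"]
        value = tag.get("Value", "")
        # Keys are case sensitive, so "Env" and "env" are different keys.
        if key in seen_keys:
            problems.append(f"duplicate key: {key!r}")
        seen_keys.add(key)
        if len(key) > MAX_KEY_LENGTH:
            problems.append(f"key too long: {key!r}")
        if len(value) > MAX_VALUE_LENGTH:
            problems.append(f"value too long for key: {key!r}")
        # The aws: prefix is reserved in any letter case, for keys only.
        if key.lower().startswith("aws:"):
            problems.append(f"reserved prefix on key: {key!r}")
    return problems
```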

rtype

dict

returns

Response Syntax

{
    'DatasetArn': 'string'
}

Response Structure

  • (dict) --

    • DatasetArn (string) --

      The Amazon Resource Name (ARN) of the dataset.

CreateDatasetImportJob (updated)
Changes (request)
{'GeolocationFormat': 'string',
 'TimeZone': 'string',
 'UseGeolocationForTimeZone': 'boolean'}

Imports your training data to an Amazon Forecast dataset. You provide the location of your training data in an Amazon Simple Storage Service (Amazon S3) bucket and the Amazon Resource Name (ARN) of the dataset that you want to import the data to.

You must specify a DataSource object that includes an AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the data, as Amazon Forecast makes a copy of your data and processes it in an internal AWS system. For more information, see aws-forecast-iam-roles.

The training data must be in CSV format. The delimiter must be a comma (,).

You can specify the path to a specific CSV file, the S3 bucket, or to a folder in the S3 bucket. For the latter two cases, Amazon Forecast imports all files up to the limit of 10,000 files.

Because dataset imports are not aggregated, your most recent dataset import is the one that is used when training a predictor or generating a forecast. Make sure that your most recent dataset import contains all of the data you want to model, not just the new data collected since the previous import.

To get a list of all your dataset import jobs, filtered by specified criteria, use the ListDatasetImportJobs operation.

See also: AWS API Documentation

Request Syntax

client.create_dataset_import_job(
    DatasetImportJobName='string',
    DatasetArn='string',
    DataSource={
        'S3Config': {
            'Path': 'string',
            'RoleArn': 'string',
            'KMSKeyArn': 'string'
        }
    },
    TimestampFormat='string',
    TimeZone='string',
    UseGeolocationForTimeZone=True|False,
    GeolocationFormat='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ]
)
type DatasetImportJobName

string

param DatasetImportJobName

[REQUIRED]

The name for the dataset import job. We recommend including the current timestamp in the name, for example, 20190721DatasetImport . This can help you avoid getting a ResourceAlreadyExistsException exception.

type DatasetArn

string

param DatasetArn

[REQUIRED]

The Amazon Resource Name (ARN) of the Amazon Forecast dataset that you want to import data to.

type DataSource

dict

param DataSource

[REQUIRED]

The location of the training data to import and an AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the data. The training data must be stored in an Amazon S3 bucket.

If encryption is used, DataSource must include an AWS Key Management Service (KMS) key and the IAM role must allow Amazon Forecast permission to access the key. The KMS key and IAM role must match those specified in the EncryptionConfig parameter of the CreateDataset operation.

  • S3Config (dict) -- [REQUIRED]

    The path to the training data stored in an Amazon Simple Storage Service (Amazon S3) bucket along with the credentials to access the data.

    • Path (string) -- [REQUIRED]

      The path to an Amazon Simple Storage Service (Amazon S3) bucket or file(s) in an Amazon S3 bucket.

    • RoleArn (string) -- [REQUIRED]

      The ARN of the AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the Amazon S3 bucket or files. If you provide a value for the KMSKeyArn key, the role must allow access to the key.

      Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an InvalidInputException error.

    • KMSKeyArn (string) --

      The Amazon Resource Name (ARN) of an AWS Key Management Service (KMS) key.

type TimestampFormat

string

param TimestampFormat

The format of timestamps in the dataset. The format that you specify depends on the DataFrequency specified when the dataset was created. The following formats are supported:

  • "yyyy-MM-dd" - For the following data frequencies: Y, M, W, and D

  • "yyyy-MM-dd HH:mm:ss" - For the following data frequencies: H, 30min, 15min, and 1min; and optionally, for: Y, M, W, and D

If the format isn't specified, Amazon Forecast expects the format to be "yyyy-MM-dd HH:mm:ss".
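The relationship between DataFrequency and TimestampFormat described above can be sketched as a lookup. The list above does not name 10min or 5min; this sketch assumes they behave like the other sub-daily frequencies, which is not confirmed by the text. The mapping to Python `strptime` patterns is only a convenience for local checks, and `strptime` is more lenient than a strict pattern match (for example, it accepts months without zero padding).

```python
from datetime import datetime

# Forecast timestamp patterns and roughly equivalent strptime patterns.
STRPTIME_PATTERNS = {
    "yyyy-MM-dd": "%Y-%m-%d",
    "yyyy-MM-dd HH:mm:ss": "%Y-%m-%d %H:%M:%S",
}

DAILY_OR_COARSER = {"Y", "M", "W", "D"}


def supported_timestamp_formats(data_frequency):
    """Return the timestamp formats usable with a DataFrequency code."""
    if data_frequency in DAILY_OR_COARSER:
        return ["yyyy-MM-dd", "yyyy-MM-dd HH:mm:ss"]
    # Assumption: every sub-daily frequency needs the full date-time format.
    return ["yyyy-MM-dd HH:mm:ss"]


def matches_format(value, timestamp_format):
    """Approximate local check that a value parses with the given format."""
    try:
        datetime.strptime(value, STRPTIME_PATTERNS[timestamp_format])
    except ValueError:
        return False
    return True
```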

type TimeZone

string

param TimeZone

A single time zone for every item in your dataset. This option is ideal for datasets with all timestamps within a single time zone, or if all timestamps are normalized to a single time zone.

Refer to the Joda-Time API for a complete list of valid time zone names.

type UseGeolocationForTimeZone

boolean

param UseGeolocationForTimeZone

Automatically derive time zone information from the geolocation attribute. This option is ideal for datasets that contain timestamps in multiple time zones and those timestamps are expressed in local time.
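TimeZone and UseGeolocationForTimeZone read as two alternative ways to supply time zone information. The sketch below builds the matching keyword arguments and rejects requests that set both; that mutual exclusion is an assumption of this example, not something stated above. The function name and the time zone name used in the usage note are illustrative.

```python
def time_zone_options(time_zone=None, use_geolocation=False):
    """Return import-job keyword arguments for one time zone strategy."""
    if time_zone is not None and use_geolocation:
        # Assumption: the two options are alternatives, so pick only one.
        raise ValueError("Set either time_zone or use_geolocation, not both")
    if time_zone is not None:
        return {"TimeZone": time_zone}
    if use_geolocation:
        return {"UseGeolocationForTimeZone": True}
    return {}
```

For example, `time_zone_options(time_zone="America/Los_Angeles")` yields arguments for a dataset whose timestamps all share one zone, and the result can be merged into the other `create_dataset_import_job` arguments with `**`.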

type GeolocationFormat

string

param GeolocationFormat

The format of the geolocation attribute. The geolocation attribute can be formatted in one of two ways:

  • LAT_LONG - the latitude and longitude in decimal format (Example: 47.61_-122.33).

  • CC_POSTALCODE (US Only) - the country code (US), followed by the 5-digit ZIP code (Example: US_98121).
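The two geolocation formats above can be produced with small helpers such as the following sketch. The function names are hypothetical, and the default of two decimal places simply mirrors the example value; the precision the service expects is not stated here.

```python
def format_lat_long(latitude, longitude, places=2):
    """Format coordinates in the LAT_LONG style, e.g. 47.61_-122.33."""
    if not -90 <= latitude <= 90:
        raise ValueError(f"latitude out of range: {latitude}")
    if not -180 <= longitude <= 180:
        raise ValueError(f"longitude out of range: {longitude}")
    return f"{latitude:.{places}f}_{longitude:.{places}f}"


def format_us_postal_code(zip_code):
    """Format a 5-digit US ZIP code in the CC_POSTALCODE style, e.g. US_98121."""
    if not (len(zip_code) == 5 and zip_code.isascii() and zip_code.isdigit()):
        raise ValueError(f"expected a 5-digit ZIP code, got {zip_code!r}")
    return f"US_{zip_code}"
```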

type Tags

list

param Tags

The optional metadata that you apply to the dataset import job to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.

The following basic restrictions apply to tags:

  • Maximum number of tags per resource - 50.

  • For each resource, each tag key must be unique, and each tag key can have only one value.

  • Maximum key length - 128 Unicode characters in UTF-8.

  • Maximum value length - 256 Unicode characters in UTF-8.

  • If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.

  • Tag keys and values are case sensitive.

  • Do not use aws: , AWS: , or any uppercase or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Forecast considers it a user tag and counts it against the limit of 50 tags. Tags whose key has the aws prefix do not count against your tags-per-resource limit.

  • (dict) --

    The optional metadata that you apply to a resource to help you categorize and organize them. Each tag consists of a key and an optional value, both of which you define.

    The following basic restrictions apply to tags:

    • Maximum number of tags per resource - 50.

    • For each resource, each tag key must be unique, and each tag key can have only one value.

    • Maximum key length - 128 Unicode characters in UTF-8.

    • Maximum value length - 256 Unicode characters in UTF-8.

    • If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.

    • Tag keys and values are case sensitive.

    • Do not use aws: , AWS: , or any uppercase or lowercase combination of these as a prefix for keys, because this prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Forecast considers it a user tag and counts it against the limit of 50 tags. Tags whose key has the aws prefix do not count against your tags-per-resource limit.

    • Key (string) -- [REQUIRED]

      One part of a key-value pair that makes up a tag. A key is a general label that acts like a category for more specific tag values.

    • Value (string) -- [REQUIRED]

      The optional part of a key-value pair that makes up a tag. A value acts as a descriptor within a tag category (key).

rtype

dict

returns

Response Syntax

{
    'DatasetImportJobArn': 'string'
}

Response Structure

  • (dict) --

    • DatasetImportJobArn (string) --

      The Amazon Resource Name (ARN) of the dataset import job.

DescribeDataset (updated)
Changes (response)
{'Schema': {'Attributes': {'AttributeType': {'geolocation'}}}}

Describes an Amazon Forecast dataset created using the CreateDataset operation.

In addition to listing the parameters specified in the CreateDataset request, this operation includes the following dataset properties:

  • CreationTime

  • LastModificationTime

  • Status

See also: AWS API Documentation

Request Syntax

client.describe_dataset(
    DatasetArn='string'
)
type DatasetArn

string

param DatasetArn

[REQUIRED]

The Amazon Resource Name (ARN) of the dataset.

rtype

dict

returns

Response Syntax

{
    'DatasetArn': 'string',
    'DatasetName': 'string',
    'Domain': 'RETAIL'|'CUSTOM'|'INVENTORY_PLANNING'|'EC2_CAPACITY'|'WORK_FORCE'|'WEB_TRAFFIC'|'METRICS',
    'DatasetType': 'TARGET_TIME_SERIES'|'RELATED_TIME_SERIES'|'ITEM_METADATA',
    'DataFrequency': 'string',
    'Schema': {
        'Attributes': [
            {
                'AttributeName': 'string',
                'AttributeType': 'string'|'integer'|'float'|'timestamp'|'geolocation'
            },
        ]
    },
    'EncryptionConfig': {
        'RoleArn': 'string',
        'KMSKeyArn': 'string'
    },
    'Status': 'string',
    'CreationTime': datetime(2015, 1, 1),
    'LastModificationTime': datetime(2015, 1, 1)
}
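Given a response shaped like the syntax above, the geolocation fields of a dataset can be picked out of the schema. This is a small sketch with a hypothetical helper name; the sample response in the usage note is invented for illustration.

```python
def fields_of_type(response, attribute_type):
    """List the schema field names that have the given AttributeType."""
    return [
        attribute["AttributeName"]
        for attribute in response["Schema"]["Attributes"]
        if attribute["AttributeType"] == attribute_type
    ]
```

For instance, calling `fields_of_type(response, "geolocation")` on a DescribeDataset response returns an empty list for datasets created without a geolocation attribute.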

Response Structure

  • (dict) --

    • DatasetArn (string) --

      The Amazon Resource Name (ARN) of the dataset.

    • DatasetName (string) --

      The name of the dataset.

    • Domain (string) --

      The domain associated with the dataset.

    • DatasetType (string) --

      The dataset type.

    • DataFrequency (string) --

      The frequency of data collection.

      Valid intervals are Y (Year), M (Month), W (Week), D (Day), H (Hour), 30min (30 minutes), 15min (15 minutes), 10min (10 minutes), 5min (5 minutes), and 1min (1 minute). For example, "M" indicates every month and "30min" indicates every 30 minutes.

    • Schema (dict) --

      An array of SchemaAttribute objects that specify the dataset fields. Each SchemaAttribute specifies the name and data type of a field.

      • Attributes (list) --

        An array of attributes specifying the name and type of each field in a dataset.

        • (dict) --

          An attribute of a schema, which defines a dataset field. A schema attribute is required for every field in a dataset. The Schema object contains an array of SchemaAttribute objects.

          • AttributeName (string) --

            The name of the dataset field.

          • AttributeType (string) --

            The data type of the field.

    • EncryptionConfig (dict) --

      The AWS Key Management Service (KMS) key and the AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the key.

      • RoleArn (string) --

        The ARN of the IAM role that Amazon Forecast can assume to access the AWS KMS key.

        Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an InvalidInputException error.

      • KMSKeyArn (string) --

        The Amazon Resource Name (ARN) of the KMS key.

    • Status (string) --

      The status of the dataset. States include:

      • ACTIVE

      • CREATE_PENDING , CREATE_IN_PROGRESS , CREATE_FAILED

      • DELETE_PENDING , DELETE_IN_PROGRESS , DELETE_FAILED

      • UPDATE_PENDING , UPDATE_IN_PROGRESS , UPDATE_FAILED

      The UPDATE states apply while data is imported to the dataset from a call to the CreateDatasetImportJob operation and reflect the status of the dataset import job. For example, when the import job status is CREATE_IN_PROGRESS , the status of the dataset is UPDATE_IN_PROGRESS .

      Note

      The Status of the dataset must be ACTIVE before you can import training data.

    • CreationTime (datetime) --

      When the dataset was created.

    • LastModificationTime (datetime) --

      When you create a dataset, LastModificationTime is the same as CreationTime . While data is being imported to the dataset, LastModificationTime is the current time of the DescribeDataset call. After a CreateDatasetImportJob operation has finished, LastModificationTime is when the import job completed or failed.

DescribeDatasetImportJob (updated)
Changes (response)
{'GeolocationFormat': 'string',
 'TimeZone': 'string',
 'UseGeolocationForTimeZone': 'boolean'}

Describes a dataset import job created using the CreateDatasetImportJob operation.

In addition to listing the parameters provided in the CreateDatasetImportJob request, this operation includes the following properties:

  • CreationTime

  • LastModificationTime

  • DataSize

  • FieldStatistics

  • Status

  • Message - If an error occurred, information about the error.

See also: AWS API Documentation

Request Syntax

client.describe_dataset_import_job(
    DatasetImportJobArn='string'
)
type DatasetImportJobArn

string

param DatasetImportJobArn

[REQUIRED]

The Amazon Resource Name (ARN) of the dataset import job.

rtype

dict

returns

Response Syntax

{
    'DatasetImportJobName': 'string',
    'DatasetImportJobArn': 'string',
    'DatasetArn': 'string',
    'TimestampFormat': 'string',
    'TimeZone': 'string',
    'UseGeolocationForTimeZone': True|False,
    'GeolocationFormat': 'string',
    'DataSource': {
        'S3Config': {
            'Path': 'string',
            'RoleArn': 'string',
            'KMSKeyArn': 'string'
        }
    },
    'FieldStatistics': {
        'string': {
            'Count': 123,
            'CountDistinct': 123,
            'CountNull': 123,
            'CountNan': 123,
            'Min': 'string',
            'Max': 'string',
            'Avg': 123.0,
            'Stddev': 123.0
        }
    },
    'DataSize': 123.0,
    'Status': 'string',
    'Message': 'string',
    'CreationTime': datetime(2015, 1, 1),
    'LastModificationTime': datetime(2015, 1, 1)
}
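The FieldStatistics map in the response above lends itself to quick data-quality checks. The sketch below computes the share of null values per field. It assumes that Count is the total number of values including nulls, which the field descriptions do not spell out; the helper name is hypothetical.

```python
def null_fractions(response):
    """Map each field name to CountNull / Count (0.0 when Count is zero)."""
    fractions = {}
    for field_name, stats in response.get("FieldStatistics", {}).items():
        count = stats.get("Count", 0)
        nulls = stats.get("CountNull", 0)
        # Guard against division by zero for empty fields.
        fractions[field_name] = nulls / count if count else 0.0
    return fractions
```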

Response Structure

  • (dict) --

    • DatasetImportJobName (string) --

      The name of the dataset import job.

    • DatasetImportJobArn (string) --

      The ARN of the dataset import job.

    • DatasetArn (string) --

      The Amazon Resource Name (ARN) of the dataset that the training data was imported to.

    • TimestampFormat (string) --

      The format of timestamps in the dataset. The format that you specify depends on the DataFrequency specified when the dataset was created. The following formats are supported:

      • "yyyy-MM-dd" - For the following data frequencies: Y, M, W, and D

      • "yyyy-MM-dd HH:mm:ss" - For the following data frequencies: H, 30min, 15min, and 1min; and optionally, for: Y, M, W, and D

    • TimeZone (string) --

      The single time zone applied to every item in the dataset.

    • UseGeolocationForTimeZone (boolean) --

      Whether TimeZone is automatically derived from the geolocation attribute.

    • GeolocationFormat (string) --

      The format of the geolocation attribute. Valid Values: "LAT_LONG" and "CC_POSTALCODE" .

    • DataSource (dict) --

      The location of the training data to import and an AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the data.

      If encryption is used, DataSource includes an AWS Key Management Service (KMS) key.

      • S3Config (dict) --

        The path to the training data stored in an Amazon Simple Storage Service (Amazon S3) bucket along with the credentials to access the data.

        • Path (string) --

          The path to an Amazon Simple Storage Service (Amazon S3) bucket or file(s) in an Amazon S3 bucket.

        • RoleArn (string) --

          The ARN of the AWS Identity and Access Management (IAM) role that Amazon Forecast can assume to access the Amazon S3 bucket or files. If you provide a value for the KMSKeyArn key, the role must allow access to the key.

          Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an InvalidInputException error.

        • KMSKeyArn (string) --

          The Amazon Resource Name (ARN) of an AWS Key Management Service (KMS) key.

    • FieldStatistics (dict) --

      Statistical information about each field in the input data.

      • (string) --

        • (dict) --

          Provides statistics for each data field imported into an Amazon Forecast dataset with the CreateDatasetImportJob operation.

          • Count (integer) --

            The number of values in the field.

          • CountDistinct (integer) --

            The number of distinct values in the field.

          • CountNull (integer) --

            The number of null values in the field.

          • CountNan (integer) --

            The number of NaN (not a number) values in the field.

          • Min (string) --

            For a numeric field, the minimum value in the field.

          • Max (string) --

            For a numeric field, the maximum value in the field.

          • Avg (float) --

            For a numeric field, the average value in the field.

          • Stddev (float) --

            For a numeric field, the standard deviation.

    • DataSize (float) --

      The size of the dataset in gigabytes (GB) after the import job has finished.

    • Status (string) --

      The status of the dataset import job. The status is reflected in the status of the dataset. For example, when the import job status is CREATE_IN_PROGRESS , the status of the dataset is UPDATE_IN_PROGRESS . States include:

      • ACTIVE

      • CREATE_PENDING , CREATE_IN_PROGRESS , CREATE_FAILED

      • DELETE_PENDING , DELETE_IN_PROGRESS , DELETE_FAILED

    • Message (string) --

      If an error occurred, an informational message about the error.

    • CreationTime (datetime) --

      When the dataset import job was created.

    • LastModificationTime (datetime) --

      The last time that the dataset was modified. The time depends on the status of the job, as follows:

      • CREATE_PENDING - The same time as CreationTime .

      • CREATE_IN_PROGRESS - The current timestamp.

      • ACTIVE or CREATE_FAILED - When the job finished or failed.
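A caller usually wants a one-line outcome from the Status and Message fields described above. The following sketch interprets a DescribeDatasetImportJob response that way; the function name and the wording of the returned strings are hypothetical.

```python
def summarize_import_job(response):
    """Turn a DescribeDatasetImportJob response into a short outcome string."""
    name = response["DatasetImportJobName"]
    status = response["Status"]
    if status == "ACTIVE":
        return f"{name}: finished"
    if status.endswith("_FAILED"):
        # Message is only expected to be present when an error occurred.
        message = response.get("Message", "no message provided")
        return f"{name}: {status} ({message})"
    return f"{name}: still {status}"
```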