2025/05/12 - AWS Supply Chain - 9 new, 9 updated api methods
Changes: Launches new AWS Supply Chain public APIs for DataIntegrationEvent, DataIntegrationFlowExecution, and DatasetNamespace. Also adds more capabilities to existing public APIs to support direct dataset event publishing, data deduplication in DataIntegrationFlow, and partition specification for custom datasets.
Enables you to programmatically view the list of Amazon Web Services Supply Chain data lake namespaces. Developers can view the namespaces and the corresponding information, such as the description, for a given instance ID. Note that this API only returns custom namespaces; instance pre-defined namespaces are not included.
See also: AWS API Documentation
Request Syntax
client.list_data_lake_namespaces( instanceId='string', nextToken='string', maxResults=123 )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
The pagination token to fetch next page of namespaces.
integer
The max number of namespaces to fetch in this paginated request.
dict
Response Syntax
{ 'namespaces': [ { 'instanceId': 'string', 'name': 'string', 'arn': 'string', 'description': 'string', 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) }, ], 'nextToken': 'string' }
Response Structure
(dict) --
The response parameters of ListDataLakeNamespaces.
namespaces (list) --
The list of fetched namespace details. Note that it only contains custom namespaces; pre-defined namespaces are not included.
(dict) --
The data lake namespace details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
name (string) --
The name of the namespace.
arn (string) --
The arn of the namespace.
description (string) --
The description of the namespace.
createdTime (datetime) --
The creation time of the namespace.
lastModifiedTime (datetime) --
The last modified time of the namespace.
nextToken (string) --
The pagination token to fetch next page of namespaces.
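For illustration, a minimal sketch of paging through all custom namespaces with boto3. The service name 'supplychain' and the instance ID are assumptions, not values taken from this document.
import boto3

# Assumed boto3 service name; the instance ID is a placeholder.
client = boto3.client('supplychain')
instance_id = 'example-instance-id'

def list_all_namespaces(instance_id):
    """Follow nextToken until every custom namespace has been collected."""
    namespaces, token = [], None
    while True:
        kwargs = {'instanceId': instance_id, 'maxResults': 20}
        if token:
            kwargs['nextToken'] = token
        page = client.list_data_lake_namespaces(**kwargs)
        namespaces.extend(page['namespaces'])
        token = page.get('nextToken')
        if not token:
            return namespaces

for ns in list_all_namespaces(instance_id):
    print(ns['name'], ns['arn'])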
Enables you to programmatically create an Amazon Web Services Supply Chain data lake namespace. Developers can create the namespaces for a given instance ID.
See also: AWS API Documentation
Request Syntax
client.create_data_lake_namespace( instanceId='string', name='string', description='string', tags={ 'string': 'string' } )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The name of the namespace. Note that you cannot create a namespace with a name starting with asc, default, scn, aws, amazon, or amzn.
string
The description of the namespace.
dict
The tags of the namespace.
(string) --
(string) --
dict
Response Syntax
{ 'namespace': { 'instanceId': 'string', 'name': 'string', 'arn': 'string', 'description': 'string', 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters of CreateDataLakeNamespace.
namespace (dict) --
The detail of created namespace.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
name (string) --
The name of the namespace.
arn (string) --
The arn of the namespace.
description (string) --
The description of the namespace.
createdTime (datetime) --
The creation time of the namespace.
lastModifiedTime (datetime) --
The last modified time of the namespace.
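A minimal creation sketch using the same assumed 'supplychain' client; the namespace name, description, and tags are hypothetical, and the name must avoid the reserved prefixes listed above.
import boto3

client = boto3.client('supplychain')   # assumed service name
response = client.create_data_lake_namespace(
    instanceId='example-instance-id',
    name='vendor_feeds',                # must not start with asc, default, scn, aws, amazon, amzn
    description='Namespace for vendor-provided datasets',
    tags={'team': 'supply-chain-data'}
)
print(response['namespace']['arn'], response['namespace']['createdTime'])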
Enables you to programmatically delete an Amazon Web Services Supply Chain data lake namespace and its underlying datasets. Developers can delete existing namespaces for a given instance ID and namespace name.
See also: AWS API Documentation
Request Syntax
client.delete_data_lake_namespace( instanceId='string', name='string' )
string
[REQUIRED]
The AWS Supply Chain instance identifier.
string
[REQUIRED]
The name of the namespace. Note that you cannot delete pre-defined namespaces such as asc and default, which are only deleted through instance deletion.
dict
Response Syntax
{ 'instanceId': 'string', 'name': 'string' }
Response Structure
(dict) --
The response parameters of DeleteDataLakeNamespace.
instanceId (string) --
The AWS Supply Chain instance identifier.
name (string) --
The name of deleted namespace.
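A minimal deletion sketch with placeholder identifiers; this also removes the namespace's underlying datasets, and only custom namespaces can be deleted this way.
import boto3

client = boto3.client('supplychain')   # assumed service name
deleted = client.delete_data_lake_namespace(
    instanceId='example-instance-id',
    name='vendor_feeds'   # pre-defined namespaces such as asc and default cannot be deleted here
)
print('Deleted namespace', deleted['name'], 'in instance', deleted['instanceId'])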
List flow executions.
See also: AWS API Documentation
Request Syntax
client.list_data_integration_flow_executions( instanceId='string', flowName='string', nextToken='string', maxResults=123 )
string
[REQUIRED]
The AWS Supply Chain instance identifier.
string
[REQUIRED]
The flow name.
string
The pagination token to fetch next page of flow executions.
integer
The maximum number of flow executions to fetch in this paginated request.
dict
Response Syntax
{ 'flowExecutions': [ { 'instanceId': 'string', 'flowName': 'string', 'executionId': 'string', 'status': 'SUCCEEDED'|'IN_PROGRESS'|'FAILED', 'sourceInfo': { 'sourceType': 'S3'|'DATASET', 's3Source': { 'bucketName': 'string', 'key': 'string' }, 'datasetSource': { 'datasetIdentifier': 'string' } }, 'message': 'string', 'startTime': datetime(2015, 1, 1), 'endTime': datetime(2015, 1, 1), 'outputMetadata': { 'diagnosticReportsRootS3URI': 'string' } }, ], 'nextToken': 'string' }
Response Structure
(dict) --
The response parameters of ListFlowExecutions.
flowExecutions (list) --
The list of flow executions.
(dict) --
The flow execution details.
instanceId (string) --
The flow execution's instanceId.
flowName (string) --
The flow execution's flowName.
executionId (string) --
The flow executionId.
status (string) --
The status of flow execution.
sourceInfo (dict) --
The source information for a flow execution.
sourceType (string) --
The data integration flow execution source type.
s3Source (dict) --
The source details of a flow execution with S3 source.
bucketName (string) --
The S3 bucket name of the S3 source.
key (string) --
The S3 object key of the S3 source.
datasetSource (dict) --
The source details of a flow execution with dataset source.
datasetIdentifier (string) --
The ARN of the dataset source.
message (string) --
The failure message (if any) of failed flow execution.
startTime (datetime) --
The flow execution start timestamp.
endTime (datetime) --
The flow execution end timestamp.
outputMetadata (dict) --
The flow execution output metadata.
diagnosticReportsRootS3URI (string) --
The S3 URI under which all diagnostic files (such as deduped records if any) are stored.
nextToken (string) --
The pagination token to fetch next page of flow executions.
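A sketch that pages through the executions of one flow and surfaces failures with their diagnostic report location. The client name, instance ID, and flow name are placeholders.
import boto3

client = boto3.client('supplychain')   # assumed service name

def failed_executions(instance_id, flow_name):
    """Yield failed executions with their failure message and diagnostics URI."""
    token = None
    while True:
        kwargs = {'instanceId': instance_id, 'flowName': flow_name, 'maxResults': 50}
        if token:
            kwargs['nextToken'] = token
        page = client.list_data_integration_flow_executions(**kwargs)
        for execution in page['flowExecutions']:
            if execution['status'] == 'FAILED':
                yield (execution['executionId'],
                       execution.get('message'),
                       execution.get('outputMetadata', {}).get('diagnosticReportsRootS3URI'))
        token = page.get('nextToken')
        if not token:
            return

for execution_id, message, reports_uri in failed_executions('example-instance-id', 'orders-ingestion-flow'):
    print(execution_id, message, reports_uri)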
Enables you to programmatically update an Amazon Web Services Supply Chain data lake namespace. Developers can update the description of a data lake namespace for a given instance ID and namespace name.
See also: AWS API Documentation
Request Syntax
client.update_data_lake_namespace( instanceId='string', name='string', description='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The name of the namespace. Note that you cannot update a namespace with a name starting with asc, default, scn, aws, amazon, or amzn.
string
The updated description of the data lake namespace.
dict
Response Syntax
{ 'namespace': { 'instanceId': 'string', 'name': 'string', 'arn': 'string', 'description': 'string', 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters of UpdateDataLakeNamespace.
namespace (dict) --
The updated namespace details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
name (string) --
The name of the namespace.
arn (string) --
The arn of the namespace.
description (string) --
The description of the namespace.
createdTime (datetime) --
The creation time of the namespace.
lastModifiedTime (datetime) --
The last modified time of the namespace.
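A minimal update sketch with placeholder values; only the description can be changed here.
import boto3

client = boto3.client('supplychain')   # assumed service name
updated = client.update_data_lake_namespace(
    instanceId='example-instance-id',
    name='vendor_feeds',
    description='Vendor-provided datasets (refreshed daily)'
)
print(updated['namespace']['lastModifiedTime'])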
Enables you to programmatically view an Amazon Web Services Supply Chain Data Integration Event. Developers can view the eventType, eventGroupId, eventTimestamp, datasetTarget, and datasetLoadExecution.
See also: AWS API Documentation
Request Syntax
client.get_data_integration_event( instanceId='string', eventId='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The unique event identifier.
dict
Response Syntax
{ 'event': { 'instanceId': 'string', 'eventId': 'string', 'eventType': 'scn.data.forecast'|'scn.data.inventorylevel'|'scn.data.inboundorder'|'scn.data.inboundorderline'|'scn.data.inboundorderlineschedule'|'scn.data.outboundorderline'|'scn.data.outboundshipment'|'scn.data.processheader'|'scn.data.processoperation'|'scn.data.processproduct'|'scn.data.reservation'|'scn.data.shipment'|'scn.data.shipmentstop'|'scn.data.shipmentstoporder'|'scn.data.supplyplan'|'scn.data.dataset', 'eventGroupId': 'string', 'eventTimestamp': datetime(2015, 1, 1), 'datasetTargetDetails': { 'datasetIdentifier': 'string', 'operationType': 'APPEND'|'UPSERT'|'DELETE', 'datasetLoadExecution': { 'status': 'SUCCEEDED'|'IN_PROGRESS'|'FAILED', 'message': 'string' } } } }
Response Structure
(dict) --
The response parameters for GetDataIntegrationEvent.
event (dict) --
The details of the DataIntegrationEvent returned.
instanceId (string) --
The AWS Supply Chain instance identifier.
eventId (string) --
The unique event identifier.
eventType (string) --
The data event type.
eventGroupId (string) --
Event identifier (for example, orderId for InboundOrder) used for data sharding or partitioning.
eventTimestamp (datetime) --
The event timestamp (in epoch seconds).
datasetTargetDetails (dict) --
The target dataset details for a DATASET event type.
datasetIdentifier (string) --
The datalake dataset ARN identifier.
operationType (string) --
The target dataset load operation type. The available options are:
APPEND - Add new records to the dataset. Note that this operation type simply appends records as-is, without any primary key or partition constraints.
UPSERT - Modify existing records in a dataset with primary keys configured; events for datasets without primary keys are not allowed. If the event data contains primary keys that match records in the dataset within the same partition, those existing records (in that partition) will be updated. If the primary keys do not match, new records will be added. Note that if the dataset contains records with duplicate primary key values in the same partition, those duplicates will be deduped into one updated record.
DELETE - Remove existing records in a dataset with primary keys configured; events for datasets without primary keys are not allowed. If the event data contains primary keys that match records in the dataset within the same partition, those existing records (in that partition) will be deleted. If the primary keys do not match, no action is taken. Note that if the dataset contains records with duplicate primary key values in the same partition, all of those duplicates will be removed.
datasetLoadExecution (dict) --
The target dataset load execution.
status (string) --
The event load execution status to target dataset.
message (string) --
The failure message (if any) of failed event load execution to dataset.
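A sketch that looks up one event and reports how its dataset load went. The instance ID and event ID are placeholders (for example, an event ID returned by SendDataIntegrationEvent or ListDataIntegrationEvents).
import boto3

client = boto3.client('supplychain')   # assumed service name
event = client.get_data_integration_event(
    instanceId='example-instance-id',
    eventId='example-event-id'
)['event']

print('type:', event['eventType'], 'groupId:', event['eventGroupId'])
details = event.get('datasetTargetDetails')
if details:   # present for scn.data.dataset events
    load = details['datasetLoadExecution']
    print(details['operationType'], '->', details['datasetIdentifier'], load['status'], load.get('message', ''))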
Enables you to programmatically view an Amazon Web Services Supply Chain data lake namespace. Developers can view the data lake namespace information such as description for a given instance ID and namespace name.
See also: AWS API Documentation
Request Syntax
client.get_data_lake_namespace( instanceId='string', name='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The name of the namespace. Besides user-created namespaces, you can also specify the pre-defined namespaces:
asc - Pre-defined namespace containing Amazon Web Services Supply Chain supported datasets, see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - Pre-defined namespace containing datasets with custom user-defined schemas.
dict
Response Syntax
{ 'namespace': { 'instanceId': 'string', 'name': 'string', 'arn': 'string', 'description': 'string', 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters for GetDataLakeNamespace.
namespace (dict) --
The fetched namespace details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
name (string) --
The name of the namespace.
arn (string) --
The arn of the namespace.
description (string) --
The description of the namespace.
createdTime (datetime) --
The creation time of the namespace.
lastModifiedTime (datetime) --
The last modified time of the namespace.
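A sketch fetching the pre-defined asc namespace; the instance ID is a placeholder, and a custom namespace name works the same way.
import boto3

client = boto3.client('supplychain')   # assumed service name
namespace = client.get_data_lake_namespace(
    instanceId='example-instance-id',
    name='asc'
)['namespace']
print(namespace['arn'], namespace.get('description', ''))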
Enables you to programmatically list all data integration events for the provided Amazon Web Services Supply Chain instance.
See also: AWS API Documentation
Request Syntax
client.list_data_integration_events( instanceId='string', eventType='scn.data.forecast'|'scn.data.inventorylevel'|'scn.data.inboundorder'|'scn.data.inboundorderline'|'scn.data.inboundorderlineschedule'|'scn.data.outboundorderline'|'scn.data.outboundshipment'|'scn.data.processheader'|'scn.data.processoperation'|'scn.data.processproduct'|'scn.data.reservation'|'scn.data.shipment'|'scn.data.shipmentstop'|'scn.data.shipmentstoporder'|'scn.data.supplyplan'|'scn.data.dataset', nextToken='string', maxResults=123 )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
List data integration events for the specified eventType.
string
The pagination token to fetch the next page of the data integration events.
integer
Specify the maximum number of data integration events to fetch in one paginated request.
dict
Response Syntax
{ 'events': [ { 'instanceId': 'string', 'eventId': 'string', 'eventType': 'scn.data.forecast'|'scn.data.inventorylevel'|'scn.data.inboundorder'|'scn.data.inboundorderline'|'scn.data.inboundorderlineschedule'|'scn.data.outboundorderline'|'scn.data.outboundshipment'|'scn.data.processheader'|'scn.data.processoperation'|'scn.data.processproduct'|'scn.data.reservation'|'scn.data.shipment'|'scn.data.shipmentstop'|'scn.data.shipmentstoporder'|'scn.data.supplyplan'|'scn.data.dataset', 'eventGroupId': 'string', 'eventTimestamp': datetime(2015, 1, 1), 'datasetTargetDetails': { 'datasetIdentifier': 'string', 'operationType': 'APPEND'|'UPSERT'|'DELETE', 'datasetLoadExecution': { 'status': 'SUCCEEDED'|'IN_PROGRESS'|'FAILED', 'message': 'string' } } }, ], 'nextToken': 'string' }
Response Structure
(dict) --
The response parameters for ListDataIntegrationEvents.
events (list) --
The list of data integration events.
(dict) --
The data integration event details.
instanceId (string) --
The AWS Supply Chain instance identifier.
eventId (string) --
The unique event identifier.
eventType (string) --
The data event type.
eventGroupId (string) --
Event identifier (for example, orderId for InboundOrder) used for data sharding or partitioning.
eventTimestamp (datetime) --
The event timestamp (in epoch seconds).
datasetTargetDetails (dict) --
The target dataset details for a DATASET event type.
datasetIdentifier (string) --
The datalake dataset ARN identifier.
operationType (string) --
The target dataset load operation type. The available options are:
APPEND - Add new records to the dataset. Note that this operation type simply appends records as-is, without any primary key or partition constraints.
UPSERT - Modify existing records in a dataset with primary keys configured; events for datasets without primary keys are not allowed. If the event data contains primary keys that match records in the dataset within the same partition, those existing records (in that partition) will be updated. If the primary keys do not match, new records will be added. Note that if the dataset contains records with duplicate primary key values in the same partition, those duplicates will be deduped into one updated record.
DELETE - Remove existing records in a dataset with primary keys configured; events for datasets without primary keys are not allowed. If the event data contains primary keys that match records in the dataset within the same partition, those existing records (in that partition) will be deleted. If the primary keys do not match, no action is taken. Note that if the dataset contains records with duplicate primary key values in the same partition, all of those duplicates will be removed.
datasetLoadExecution (dict) --
The target dataset load execution.
status (string) --
The event load execution status to target dataset.
message (string) --
The failure message (if any) of failed event load execution to dataset.
nextToken (string) --
The pagination token to fetch the next page of the ListDataIntegrationEvents.
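A sketch listing only direct dataset-publish events by filtering on the scn.data.dataset event type; the client name and instance ID are placeholders.
import boto3

client = boto3.client('supplychain')   # assumed service name
token = None
while True:
    kwargs = {
        'instanceId': 'example-instance-id',
        'eventType': 'scn.data.dataset',
        'maxResults': 50,
    }
    if token:
        kwargs['nextToken'] = token
    page = client.list_data_integration_events(**kwargs)
    for event in page['events']:
        load = event['datasetTargetDetails']['datasetLoadExecution']
        print(event['eventId'], event['eventTimestamp'], load['status'])
    token = page.get('nextToken')
    if not token:
        break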
Get the flow execution.
See also: AWS API Documentation
Request Syntax
client.get_data_integration_flow_execution( instanceId='string', flowName='string', executionId='string' )
string
[REQUIRED]
The AWS Supply Chain instance identifier.
string
[REQUIRED]
The flow name.
string
[REQUIRED]
The flow execution identifier.
dict
Response Syntax
{ 'flowExecution': { 'instanceId': 'string', 'flowName': 'string', 'executionId': 'string', 'status': 'SUCCEEDED'|'IN_PROGRESS'|'FAILED', 'sourceInfo': { 'sourceType': 'S3'|'DATASET', 's3Source': { 'bucketName': 'string', 'key': 'string' }, 'datasetSource': { 'datasetIdentifier': 'string' } }, 'message': 'string', 'startTime': datetime(2015, 1, 1), 'endTime': datetime(2015, 1, 1), 'outputMetadata': { 'diagnosticReportsRootS3URI': 'string' } } }
Response Structure
(dict) --
The response parameters of GetFlowExecution.
flowExecution (dict) --
The flow execution details.
instanceId (string) --
The flow execution's instanceId.
flowName (string) --
The flow execution's flowName.
executionId (string) --
The flow executionId.
status (string) --
The status of flow execution.
sourceInfo (dict) --
The source information for a flow execution.
sourceType (string) --
The data integration flow execution source type.
s3Source (dict) --
The source details of a flow execution with S3 source.
bucketName (string) --
The S3 bucket name of the S3 source.
key (string) --
The S3 object key of the S3 source.
datasetSource (dict) --
The source details of a flow execution with dataset source.
datasetIdentifier (string) --
The ARN of the dataset source.
message (string) --
The failure message (if any) of failed flow execution.
startTime (datetime) --
The flow execution start timestamp.
endTime (datetime) --
The flow execution end timestamp.
outputMetadata (dict) --
The flow execution output metadata.
diagnosticReportsRootS3URI (string) --
The S3 URI under which all diagnostic files (such as deduped records if any) are stored.
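A polling sketch that waits for one execution to reach a terminal state. The identifiers are placeholders and the 30-second interval is arbitrary.
import time
import boto3

client = boto3.client('supplychain')   # assumed service name

def wait_for_execution(instance_id, flow_name, execution_id, poll_seconds=30):
    """Poll until the flow execution leaves IN_PROGRESS, then return its details."""
    while True:
        execution = client.get_data_integration_flow_execution(
            instanceId=instance_id,
            flowName=flow_name,
            executionId=execution_id
        )['flowExecution']
        if execution['status'] != 'IN_PROGRESS':
            return execution
        time.sleep(poll_seconds)

result = wait_for_execution('example-instance-id', 'orders-ingestion-flow', 'example-execution-id')
print(result['status'], result.get('message', ''))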
{'sources': {'datasetSource': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}, 'target': {'datasetTarget': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}}
Enables you to programmatically create a data pipeline to ingest data from source systems such as Amazon S3 buckets into a predefined Amazon Web Services Supply Chain dataset (such as product or inbound_order) or a temporary dataset, along with the data transformation query provided with the API.
See also: AWS API Documentation
Request Syntax
client.create_data_integration_flow( instanceId='string', name='string', sources=[ { 'sourceType': 'S3'|'DATASET', 'sourceName': 'string', 's3Source': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetSource': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, ], transformation={ 'transformationType': 'SQL'|'NONE', 'sqlTransformation': { 'query': 'string' } }, target={ 'targetType': 'S3'|'DATASET', 's3Target': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetTarget': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, tags={ 'string': 'string' } )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
Name of the DataIntegrationFlow.
list
[REQUIRED]
The source configurations for DataIntegrationFlow.
(dict) --
The DataIntegrationFlow source parameters.
sourceType (string) -- [REQUIRED]
The DataIntegrationFlow source type.
sourceName (string) -- [REQUIRED]
The DataIntegrationFlow source name that can be used as table alias in SQL transformation query.
s3Source (dict) --
The S3 DataIntegrationFlow source.
bucketName (string) -- [REQUIRED]
The bucketName of the S3 source objects.
prefix (string) -- [REQUIRED]
The prefix of the S3 source objects. To trigger data ingestion, S3 files need to be put under s3://bucketName/prefix/.
options (dict) --
The other options of the S3 DataIntegrationFlow source.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetSource (dict) --
The dataset DataIntegrationFlow source.
datasetIdentifier (string) -- [REQUIRED]
The ARN of the dataset.
options (dict) --
The dataset DataIntegrationFlow source options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - Target dataset will get replaced with the new file added under the source s3 prefix.
INCREMENTAL - Target dataset will get updated with the up-to-date content under S3 prefix incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values will be ingested into the dataset; for datasets within the asc namespace, such duplicates will cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured; when dedupe is enabled, it dedupes only against primary keys and retains only one record out of those duplicates, regardless of partition.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) -- [REQUIRED]
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break the data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain data record with larger or smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) -- [REQUIRED]
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) -- [REQUIRED]
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) -- [REQUIRED]
The sort order for the deduplication field.
dict
[REQUIRED]
The transformation configurations for DataIntegrationFlow.
transformationType (string) -- [REQUIRED]
The DataIntegrationFlow transformation type.
sqlTransformation (dict) --
The SQL DataIntegrationFlow transformation configuration.
query (string) -- [REQUIRED]
The transformation SQL query body based on SparkSQL.
dict
[REQUIRED]
The target configurations for DataIntegrationFlow.
targetType (string) -- [REQUIRED]
The DataIntegrationFlow target type.
s3Target (dict) --
The S3 DataIntegrationFlow target.
bucketName (string) -- [REQUIRED]
The bucketName of the S3 target objects.
prefix (string) -- [REQUIRED]
The prefix of the S3 target objects.
options (dict) --
The S3 DataIntegrationFlow target options.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetTarget (dict) --
The dataset DataIntegrationFlow target. Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it will be auto-populated.
datasetIdentifier (string) -- [REQUIRED]
The dataset ARN.
options (dict) --
The dataset DataIntegrationFlow target options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - Target dataset will get replaced with the new file added under the source s3 prefix.
INCREMENTAL - Target dataset will get updated with the up-to-date content under S3 prefix incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values will be ingested into the dataset; for datasets within the asc namespace, such duplicates will cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured; when dedupe is enabled, it dedupes only against primary keys and retains only one record out of those duplicates, regardless of partition.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) -- [REQUIRED]
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break the data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain data record with larger or smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) -- [REQUIRED]
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) -- [REQUIRED]
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) -- [REQUIRED]
The sort order for the deduplication field.
dict
The tags of the DataIntegrationFlow to be created
(string) --
(string) --
dict
Response Syntax
{ 'instanceId': 'string', 'name': 'string' }
Response Structure
(dict) --
The response parameters for CreateDataIntegrationFlow.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
name (string) --
The name of the DataIntegrationFlow created.
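A sketch of an S3-to-dataset flow that enables deduplication with a FIELD_PRIORITY strategy, keeping the record with the latest updated_at per primary key. The client name, instance ID, bucket, prefix, dataset ARN, field names, and flow name are all hypothetical.
import boto3

client = boto3.client('supplychain')   # assumed service name
flow = client.create_data_integration_flow(
    instanceId='example-instance-id',
    name='orders-ingestion-flow',
    sources=[{
        'sourceType': 'S3',
        'sourceName': 'orders_src',       # usable as a table alias in the SQL query
        's3Source': {
            'bucketName': 'example-ingestion-bucket',
            'prefix': 'orders',           # files land under s3://example-ingestion-bucket/orders/
            'options': {'fileType': 'CSV'}
        }
    }],
    transformation={
        'transformationType': 'SQL',
        'sqlTransformation': {'query': 'SELECT * FROM orders_src'}
    },
    target={
        'targetType': 'DATASET',
        'datasetTarget': {
            'datasetIdentifier': 'arn:aws:scn:example-dataset-arn',   # placeholder dataset ARN
            'options': {
                'loadType': 'INCREMENTAL',
                'dedupeRecords': True,
                'dedupeStrategy': {
                    'type': 'FIELD_PRIORITY',
                    'fieldPriority': {
                        # Highest-priority tie-breaker first: keep the newest record per primary key.
                        'fields': [{'name': 'updated_at', 'sortOrder': 'DESC'}]
                    }
                }
            }
        }
    }
)
print(flow['name'])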
{'partitionSpec': {'fields': [{'name': 'string', 'transform': {'type': 'YEAR | MONTH | DAY | HOUR | IDENTITY'}}]}, 'schema': {'fields': {'type': {'LONG'}}, 'primaryKeys': [{'name': 'string'}]}}
Response
{'dataset': {'partitionSpec': {'fields': [{'name': 'string', 'transform': {'type': 'YEAR | MONTH | DAY | HOUR | IDENTITY'}}]}, 'schema': {'fields': {'type': {'LONG'}}, 'primaryKeys': [{'name': 'string'}]}}}
Enables you to programmatically create an Amazon Web Services Supply Chain data lake dataset. Developers can create the datasets using their pre-defined or custom schema for a given instance ID, namespace, and dataset name.
See also: AWS API Documentation
Request Syntax
client.create_data_lake_dataset( instanceId='string', namespace='string', name='string', schema={ 'name': 'string', 'fields': [ { 'name': 'string', 'type': 'INT'|'DOUBLE'|'STRING'|'TIMESTAMP'|'LONG', 'isRequired': True|False }, ], 'primaryKeys': [ { 'name': 'string' }, ] }, description='string', partitionSpec={ 'fields': [ { 'name': 'string', 'transform': { 'type': 'YEAR'|'MONTH'|'DAY'|'HOUR'|'IDENTITY' } }, ] }, tags={ 'string': 'string' } )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The namespace of the dataset. Besides custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
string
[REQUIRED]
The name of the dataset. For the asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
dict
The custom schema of the data lake dataset; required for datasets in the default and custom namespaces.
name (string) -- [REQUIRED]
The name of the dataset schema.
fields (list) -- [REQUIRED]
The list of field details of the dataset schema.
(dict) --
The dataset field details.
name (string) -- [REQUIRED]
The dataset field name.
type (string) -- [REQUIRED]
The dataset field type.
isRequired (boolean) -- [REQUIRED]
Indicate if the field is required or not.
primaryKeys (list) --
The list of primary key fields for the dataset. Defined primary keys help the data ingestion methods ensure data uniqueness: CreateDataIntegrationFlow's dedupe strategy leverages primary keys to perform record deduplication before writing to the dataset, and SendDataIntegrationEvent's UPSERT and DELETE only work with datasets that have primary keys. For more details, refer to the documentation for those data ingestion methods.
Note that defining primary keys does not necessarily mean the dataset cannot have duplicate records; duplicates can still be ingested if CreateDataIntegrationFlow's dedupe is disabled, or through SendDataIntegrationEvent's APPEND operation.
(dict) --
The detail of the primary key field.
name (string) -- [REQUIRED]
The name of the primary key field.
string
The description of the dataset.
dict
The partition specification of the dataset. Partitioning can effectively improve dataset query performance by reducing the amount of data scanned during query execution. However, whether a dataset is partitioned also affects how data gets ingested by the data ingestion methods; for example, SendDataIntegrationEvent's dataset UPSERT will upsert records within a partition (instead of within the whole dataset). For more details, refer to the documentation for those data ingestion methods.
fields (list) -- [REQUIRED]
The fields on which to partition a dataset. The partitions will be applied hierarchically based on the order of this list.
(dict) --
The detail of the partition field.
name (string) -- [REQUIRED]
The name of the partition field.
transform (dict) -- [REQUIRED]
The transformation of the partition field. A transformation specifies how to partition on a given field. For example, with a timestamp field you can specify day-level partitioning, e.g. a data record with the value 2025-01-03T00:00:00Z in the partition field falls into the 2025-01-03 partition. Also note that a data record without any value in an optional partition field falls into the NULL partition.
type (string) -- [REQUIRED]
The type of partitioning transformation for this field. The available options are:
IDENTITY - Partitions data on a given field by its exact values.
YEAR - Partitions data on a timestamp field using year granularity.
MONTH - Partitions data on a timestamp field using month granularity.
DAY - Partitions data on a timestamp field using day granularity.
HOUR - Partitions data on a timestamp field using hour granularity.
dict
The tags of the dataset.
(string) --
(string) --
dict
Response Syntax
{ 'dataset': { 'instanceId': 'string', 'namespace': 'string', 'name': 'string', 'arn': 'string', 'schema': { 'name': 'string', 'fields': [ { 'name': 'string', 'type': 'INT'|'DOUBLE'|'STRING'|'TIMESTAMP'|'LONG', 'isRequired': True|False }, ], 'primaryKeys': [ { 'name': 'string' }, ] }, 'description': 'string', 'partitionSpec': { 'fields': [ { 'name': 'string', 'transform': { 'type': 'YEAR'|'MONTH'|'DAY'|'HOUR'|'IDENTITY' } }, ] }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters of CreateDataLakeDataset.
dataset (dict) --
The detail of created dataset.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
namespace (string) --
The namespace of the dataset. Besides custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
name (string) --
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
arn (string) --
The arn of the dataset.
schema (dict) --
The schema of the dataset.
name (string) --
The name of the dataset schema.
fields (list) --
The list of field details of the dataset schema.
(dict) --
The dataset field details.
name (string) --
The dataset field name.
type (string) --
The dataset field type.
isRequired (boolean) --
Indicate if the field is required or not.
primaryKeys (list) --
The list of primary key fields for the dataset. Defined primary keys help the data ingestion methods ensure data uniqueness: CreateDataIntegrationFlow's dedupe strategy leverages primary keys to perform record deduplication before writing to the dataset, and SendDataIntegrationEvent's UPSERT and DELETE only work with datasets that have primary keys. For more details, refer to the documentation for those data ingestion methods.
Note that defining primary keys does not necessarily mean the dataset cannot have duplicate records; duplicates can still be ingested if CreateDataIntegrationFlow's dedupe is disabled, or through SendDataIntegrationEvent's APPEND operation.
(dict) --
The detail of the primary key field.
name (string) --
The name of the primary key field.
description (string) --
The description of the dataset.
partitionSpec (dict) --
The partition specification for a dataset.
fields (list) --
The fields on which to partition a dataset. The partitions will be applied hierarchically based on the order of this list.
(dict) --
The detail of the partition field.
name (string) --
The name of the partition field.
transform (dict) --
The transformation of the partition field. A transformation specifies how to partition on a given field. For example, with a timestamp field you can specify day-level partitioning, e.g. a data record with the value 2025-01-03T00:00:00Z in the partition field falls into the 2025-01-03 partition. Also note that a data record without any value in an optional partition field falls into the NULL partition.
type (string) --
The type of partitioning transformation for this field. The available options are:
IDENTITY - Partitions data on a given field by its exact values.
YEAR - Partitions data on a timestamp field using year granularity.
MONTH - Partitions data on a timestamp field using month granularity.
DAY - Partitions data on a timestamp field using day granularity.
HOUR - Partitions data on a timestamp field using hour granularity.
createdTime (datetime) --
The creation time of the dataset.
lastModifiedTime (datetime) --
The last modified time of the dataset.
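A sketch creating a custom dataset in the default namespace with a primary key and day-level partitioning on a timestamp field; the client name, instance ID, dataset name, and field names are hypothetical.
import boto3

client = boto3.client('supplychain')   # assumed service name
dataset = client.create_data_lake_dataset(
    instanceId='example-instance-id',
    namespace='default',
    name='vendor_orders',
    schema={
        'name': 'VendorOrders',
        'fields': [
            {'name': 'order_id', 'type': 'STRING', 'isRequired': True},
            {'name': 'quantity', 'type': 'LONG', 'isRequired': False},
            {'name': 'updated_at', 'type': 'TIMESTAMP', 'isRequired': True},
        ],
        'primaryKeys': [{'name': 'order_id'}]   # enables UPSERT/DELETE events and dedupe
    },
    description='Vendor order lines ingested from S3',
    partitionSpec={
        # Hierarchical partitioning; here a single DAY transform on the timestamp field.
        'fields': [{'name': 'updated_at', 'transform': {'type': 'DAY'}}]
    }
)['dataset']
print(dataset['arn'])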
{'flow': {'sources': {'datasetSource': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}, 'target': {'datasetTarget': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}}}
Enables you to programmatically view a specific data pipeline for the provided Amazon Web Services Supply Chain instance and DataIntegrationFlow name.
See also: AWS API Documentation
Request Syntax
client.get_data_integration_flow( instanceId='string', name='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The name of the DataIntegrationFlow created.
dict
Response Syntax
{ 'flow': { 'instanceId': 'string', 'name': 'string', 'sources': [ { 'sourceType': 'S3'|'DATASET', 'sourceName': 'string', 's3Source': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetSource': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, ], 'transformation': { 'transformationType': 'SQL'|'NONE', 'sqlTransformation': { 'query': 'string' } }, 'target': { 'targetType': 'S3'|'DATASET', 's3Target': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetTarget': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters for GetDataIntegrationFlow.
flow (dict) --
The details of the DataIntegrationFlow returned.
instanceId (string) --
The DataIntegrationFlow instance ID.
name (string) --
The DataIntegrationFlow name.
sources (list) --
The DataIntegrationFlow source configurations.
(dict) --
The DataIntegrationFlow source parameters.
sourceType (string) --
The DataIntegrationFlow source type.
sourceName (string) --
The DataIntegrationFlow source name that can be used as table alias in SQL transformation query.
s3Source (dict) --
The S3 DataIntegrationFlow source.
bucketName (string) --
The bucketName of the S3 source objects.
prefix (string) --
The prefix of the S3 source objects. To trigger data ingestion, S3 files need to be put under s3://bucketName/prefix/.
options (dict) --
The other options of the S3 DataIntegrationFlow source.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetSource (dict) --
The dataset DataIntegrationFlow source.
datasetIdentifier (string) --
The ARN of the dataset.
options (dict) --
The dataset DataIntegrationFlow source options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - Target dataset will get replaced with the new file added under the source s3 prefix.
INCREMENTAL - Target dataset will get updated with the up-to-date content under S3 prefix incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values will be ingested into the dataset; for datasets within the asc namespace, such duplicates will cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured; when dedupe is enabled, it dedupes only against primary keys and retains only one record out of those duplicates, regardless of partition.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break the data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain data record with larger or smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
transformation (dict) --
The DataIntegrationFlow transformation configurations.
transformationType (string) --
The DataIntegrationFlow transformation type.
sqlTransformation (dict) --
The SQL DataIntegrationFlow transformation configuration.
query (string) --
The transformation SQL query body based on SparkSQL.
target (dict) --
The DataIntegrationFlow target configuration.
targetType (string) --
The DataIntegrationFlow target type.
s3Target (dict) --
The S3 DataIntegrationFlow target.
bucketName (string) --
The bucketName of the S3 target objects.
prefix (string) --
The prefix of the S3 target objects.
options (dict) --
The S3 DataIntegrationFlow target options.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetTarget (dict) --
The dataset DataIntegrationFlow target. Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it will be auto-populated.
datasetIdentifier (string) --
The dataset ARN.
options (dict) --
The dataset DataIntegrationFlow target options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - Target dataset will get replaced with the new file added under the source s3 prefix.
INCREMENTAL - Target dataset will get updated with the up-to-date content under S3 prefix incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values will be ingested into the dataset; for datasets within the asc namespace, such duplicates will cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured; when dedupe is enabled, it dedupes only against primary keys and retains only one record out of those duplicates, regardless of partition.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break the data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain data record with larger or smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
createdTime (datetime) --
The DataIntegrationFlow creation timestamp.
lastModifiedTime (datetime) --
The DataIntegrationFlow last modified timestamp.
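A sketch that fetches one flow and prints whether its dataset target has deduplication configured; the client name and identifiers are placeholders.
import boto3

client = boto3.client('supplychain')   # assumed service name
flow = client.get_data_integration_flow(
    instanceId='example-instance-id',
    name='orders-ingestion-flow'
)['flow']

target = flow['target']
if target['targetType'] == 'DATASET':
    options = target['datasetTarget'].get('options', {})
    print('dedupeRecords:', options.get('dedupeRecords'))
    print('dedupeStrategy:', options.get('dedupeStrategy'))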
{'dataset': {'partitionSpec': {'fields': [{'name': 'string', 'transform': {'type': 'YEAR | MONTH | DAY | HOUR | IDENTITY'}}]}, 'schema': {'fields': {'type': {'LONG'}}, 'primaryKeys': [{'name': 'string'}]}}}
Enables you to programmatically view an Amazon Web Services Supply Chain data lake dataset. Developers can view the data lake dataset information such as namespace, schema, and so on for a given instance ID, namespace, and dataset name.
See also: AWS API Documentation
Request Syntax
client.get_data_lake_dataset( instanceId='string', namespace='string', name='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The namespace of the dataset. Besides custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
string
[REQUIRED]
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
dict
Response Syntax
{ 'dataset': { 'instanceId': 'string', 'namespace': 'string', 'name': 'string', 'arn': 'string', 'schema': { 'name': 'string', 'fields': [ { 'name': 'string', 'type': 'INT'|'DOUBLE'|'STRING'|'TIMESTAMP'|'LONG', 'isRequired': True|False }, ], 'primaryKeys': [ { 'name': 'string' }, ] }, 'description': 'string', 'partitionSpec': { 'fields': [ { 'name': 'string', 'transform': { 'type': 'YEAR'|'MONTH'|'DAY'|'HOUR'|'IDENTITY' } }, ] }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters for GetDataLakeDataset.
dataset (dict) --
The fetched dataset details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
namespace (string) --
The namespace of the dataset. Besides custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
name (string) --
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
arn (string) --
The arn of the dataset.
schema (dict) --
The schema of the dataset.
name (string) --
The name of the dataset schema.
fields (list) --
The list of field details of the dataset schema.
(dict) --
The dataset field details.
name (string) --
The dataset field name.
type (string) --
The dataset field type.
isRequired (boolean) --
Indicate if the field is required or not.
primaryKeys (list) --
The list of primary key fields for the dataset. Defined primary keys help the data ingestion methods ensure data uniqueness: CreateDataIntegrationFlow's dedupe strategy leverages primary keys to perform record deduplication before writing to the dataset, and SendDataIntegrationEvent's UPSERT and DELETE only work with datasets that have primary keys. For more details, refer to the documentation for those data ingestion methods.
Note that defining primary keys does not necessarily mean the dataset cannot have duplicate records; duplicates can still be ingested if CreateDataIntegrationFlow's dedupe is disabled, or through SendDataIntegrationEvent's APPEND operation.
(dict) --
The detail of the primary key field.
name (string) --
The name of the primary key field.
description (string) --
The description of the dataset.
partitionSpec (dict) --
The partition specification for a dataset.
fields (list) --
The fields on which to partition a dataset. The partitions will be applied hierarchically based on the order of this list.
(dict) --
The detail of the partition field.
name (string) --
The name of the partition field.
transform (dict) --
The transformation of the partition field. A transformation specifies how to partition on a given field. For example, with a timestamp field you can specify day-level partitioning, e.g. a data record with the value 2025-01-03T00:00:00Z in the partition field falls into the 2025-01-03 partition. Also note that a data record without any value in an optional partition field falls into the NULL partition.
type (string) --
The type of partitioning transformation for this field. The available options are:
IDENTITY - Partitions data on a given field by its exact values.
YEAR - Partitions data on a timestamp field using year granularity.
MONTH - Partitions data on a timestamp field using month granularity.
DAY - Partitions data on a timestamp field using day granularity.
HOUR - Partitions data on a timestamp field using hour granularity.
createdTime (datetime) --
The creation time of the dataset.
lastModifiedTime (datetime) --
The last modified time of the dataset.
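A sketch that retrieves a dataset and prints its schema and partition layout; the client name, instance ID, namespace, and dataset name are placeholders.
import boto3

client = boto3.client('supplychain')   # assumed service name
dataset = client.get_data_lake_dataset(
    instanceId='example-instance-id',
    namespace='default',
    name='vendor_orders'
)['dataset']

for field in dataset['schema']['fields']:
    print(field['name'], field['type'], 'required' if field['isRequired'] else 'optional')
print('primary keys:', [key['name'] for key in dataset['schema'].get('primaryKeys', [])])
for part in dataset.get('partitionSpec', {}).get('fields', []):
    print('partitioned by', part['name'], 'via', part['transform']['type'])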
{'flows': {'sources': {'datasetSource': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}, 'target': {'datasetTarget': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}}}
Enables you to programmatically list all data pipelines for the provided Amazon Web Services Supply Chain instance.
See also: AWS API Documentation
Request Syntax
client.list_data_integration_flows( instanceId='string', nextToken='string', maxResults=123 )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
The pagination token to fetch the next page of the DataIntegrationFlows.
integer
Specify the maximum number of DataIntegrationFlows to fetch in one paginated request.
dict
Response Syntax
{ 'flows': [ { 'instanceId': 'string', 'name': 'string', 'sources': [ { 'sourceType': 'S3'|'DATASET', 'sourceName': 'string', 's3Source': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetSource': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, ], 'transformation': { 'transformationType': 'SQL'|'NONE', 'sqlTransformation': { 'query': 'string' } }, 'target': { 'targetType': 'S3'|'DATASET', 's3Target': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetTarget': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) }, ], 'nextToken': 'string' }
Response Structure
(dict) --
The response parameters for ListDataIntegrationFlows.
flows (list) --
The list of DataIntegrationFlows.
(dict) --
The DataIntegrationFlow details.
instanceId (string) --
The DataIntegrationFlow instance ID.
name (string) --
The DataIntegrationFlow name.
sources (list) --
The DataIntegrationFlow source configurations.
(dict) --
The DataIntegrationFlow source parameters.
sourceType (string) --
The DataIntegrationFlow source type.
sourceName (string) --
The DataIntegrationFlow source name that can be used as table alias in SQL transformation query.
s3Source (dict) --
The S3 DataIntegrationFlow source.
bucketName (string) --
The bucketName of the S3 source objects.
prefix (string) --
The prefix of the S3 source objects. To trigger data ingestion, S3 files need to be put under s3://bucketName/prefix/.
options (dict) --
The other options of the S3 DataIntegrationFlow source.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetSource (dict) --
The dataset DataIntegrationFlow source.
datasetIdentifier (string) --
The ARN of the dataset.
options (dict) --
The dataset DataIntegrationFlow source options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - Target dataset will get replaced with the new file added under the source s3 prefix.
INCREMENTAL - Target dataset will get updated with the up-to-date content under S3 prefix incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values will be ingested into the dataset; for datasets within the asc namespace, such duplicates will cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured; when dedupe is enabled, it dedupes only against primary keys and retains only one record out of those duplicates, regardless of partition.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break the data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain data record with larger or smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
transformation (dict) --
The DataIntegrationFlow transformation configurations.
transformationType (string) --
The DataIntegrationFlow transformation type.
sqlTransformation (dict) --
The SQL DataIntegrationFlow transformation configuration.
query (string) --
The transformation SQL query body based on SparkSQL.
target (dict) --
The DataIntegrationFlow target configuration.
targetType (string) --
The DataIntegrationFlow target type.
s3Target (dict) --
The S3 DataIntegrationFlow target.
bucketName (string) --
The bucketName of the S3 target objects.
prefix (string) --
The prefix of the S3 target objects.
options (dict) --
The S3 DataIntegrationFlow target options.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetTarget (dict) --
The dataset DataIntegrationFlow target. Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it will be auto-populated.
datasetIdentifier (string) --
The dataset ARN.
options (dict) --
The dataset DataIntegrationFlow target options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - The target dataset is replaced with the new file added under the source S3 prefix.
INCREMENTAL - The target dataset is updated with the up-to-date content under the S3 prefix, incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values is ingested into the dataset; for datasets within the asc namespace, such duplicates cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured. When dedupe is enabled, deduplication is applied only against primary keys, and only one record out of those duplicates is retained regardless of its partition status.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain the data record with the larger or the smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
createdTime (datetime) --
The DataIntegrationFlow creation timestamp.
lastModifiedTime (datetime) --
The DataIntegrationFlow last modified timestamp.
nextToken (string) --
The pagination token to fetch the next page of the DataIntegrationFlows.
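For illustration only, here is a minimal boto3 sketch of consuming the nextToken pagination above. The service client name, the placeholder instance ID, and the assumption that the paginated response exposes the flows under a flows key are assumptions of this sketch, not part of the reference above.
import boto3

client = boto3.client('supplychain')  # AWS Supply Chain data API client

def iter_flows(instance_id):
    # Yield every DataIntegrationFlow for the instance, following nextToken.
    token = None
    while True:
        kwargs = {'instanceId': instance_id, 'maxResults': 20}
        if token:
            kwargs['nextToken'] = token
        page = client.list_data_integration_flows(**kwargs)
        for flow in page.get('flows', []):
            yield flow
        token = page.get('nextToken')
        if not token:
            break

for flow in iter_flows('example-instance-id'):  # placeholder instance ID
    print(flow['name'], flow['target']['targetType'])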
{'datasets': {'partitionSpec': {'fields': [{'name': 'string', 'transform': {'type': 'YEAR | MONTH | DAY | HOUR | IDENTITY'}}]}, 'schema': {'fields': {'type': {'LONG'}}, 'primaryKeys': [{'name': 'string'}]}}}
Enables you to programmatically view the list of Amazon Web Services Supply Chain data lake datasets. Developers can view the datasets and the corresponding information such as namespace, schema, and so on for a given instance ID and namespace.
See also: AWS API Documentation
Request Syntax
client.list_data_lake_datasets( instanceId='string', namespace='string', nextToken='string', maxResults=123 )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The namespace of the dataset. Besides the custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
string
The pagination token to fetch next page of datasets.
integer
The max number of datasets to fetch in this paginated request.
dict
Response Syntax
{ 'datasets': [ { 'instanceId': 'string', 'namespace': 'string', 'name': 'string', 'arn': 'string', 'schema': { 'name': 'string', 'fields': [ { 'name': 'string', 'type': 'INT'|'DOUBLE'|'STRING'|'TIMESTAMP'|'LONG', 'isRequired': True|False }, ], 'primaryKeys': [ { 'name': 'string' }, ] }, 'description': 'string', 'partitionSpec': { 'fields': [ { 'name': 'string', 'transform': { 'type': 'YEAR'|'MONTH'|'DAY'|'HOUR'|'IDENTITY' } }, ] }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) }, ], 'nextToken': 'string' }
Response Structure
(dict) --
The response parameters of ListDataLakeDatasets.
datasets (list) --
The list of fetched dataset details.
(dict) --
The data lake dataset details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
namespace (string) --
The namespace of the dataset. Besides the custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
name (string) --
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
arn (string) --
The arn of the dataset.
schema (dict) --
The schema of the dataset.
name (string) --
The name of the dataset schema.
fields (list) --
The list of field details of the dataset schema.
(dict) --
The dataset field details.
name (string) --
The dataset field name.
type (string) --
The dataset field type.
isRequired (boolean) --
Indicates whether the field is required.
primaryKeys (list) --
The list of primary key fields for the dataset. Defined primary keys help data ingestion methods ensure data uniqueness: CreateDataIntegrationFlow's dedupe strategy leverages primary keys to deduplicate records before writing to the dataset, and SendDataIntegrationEvent's UPSERT and DELETE operations only work with datasets that have primary keys. For more details, refer to the documentation for those data ingestion methods.
Note that defining primary keys does not necessarily mean the dataset cannot have duplicate records; duplicates can still be ingested if CreateDataIntegrationFlow's dedupe is disabled or through SendDataIntegrationEvent's APPEND operation.
(dict) --
The detail of the primary key field.
name (string) --
The name of the primary key field.
description (string) --
The description of the dataset.
partitionSpec (dict) --
The partition specification for a dataset.
fields (list) --
The fields on which to partition a dataset. The partitions will be applied hierarchically based on the order of this list.
(dict) --
The detail of the partition field.
name (string) --
The name of the partition field.
transform (dict) --
The transformation of the partition field. A transformation specifies how to partition on a given field. For example, with a timestamp field you can partition by day, so a data record with the value 2025-01-03T00:00:00Z in the partition field falls into the 2025-01-03 partition. Also note that a data record without any value in an optional partition field falls into the NULL partition.
type (string) --
The type of partitioning transformation for this field. The available options are:
IDENTITY - Partitions data on a given field by its exact values.
YEAR - Partitions data on a timestamp field using year granularity.
MONTH - Partitions data on a timestamp field using month granularity.
DAY - Partitions data on a timestamp field using day granularity.
HOUR - Partitions data on a timestamp field using hour granularity.
createdTime (datetime) --
The creation time of the dataset.
lastModifiedTime (datetime) --
The last modified time of the dataset.
nextToken (string) --
The pagination token to fetch next page of datasets.
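As a rough usage sketch of ListDataLakeDatasets, the following example lists the datasets of a hypothetical custom namespace and prints their primary keys and partition fields. The instance ID and namespace name are placeholders, and partitionSpec and primaryKeys are read defensively since they may be absent for a given dataset.
import boto3

client = boto3.client('supplychain')

# List datasets in a hypothetical custom namespace and summarize their
# schema-level uniqueness and partitioning settings.
resp = client.list_data_lake_datasets(
    instanceId='example-instance-id',  # placeholder
    namespace='mynamespace',           # placeholder custom namespace
    maxResults=50,
)
for ds in resp['datasets']:
    pks = [k['name'] for k in ds['schema'].get('primaryKeys', [])]
    parts = [
        f"{f['name']} ({f['transform']['type']})"
        for f in ds.get('partitionSpec', {}).get('fields', [])
    ]
    print(ds['name'], 'primary keys:', pks, 'partitions:', parts)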
{'datasetTarget': {'datasetIdentifier': 'string', 'operationType': 'APPEND | UPSERT | DELETE'}, 'eventType': {'scn.data.dataset'}}
Send the data payload for the event with real-time data for analysis or monitoring. The real-time data events are stored in an Amazon Web Services service before being processed and stored in the data lake.
See also: AWS API Documentation
Request Syntax
client.send_data_integration_event( instanceId='string', eventType='scn.data.forecast'|'scn.data.inventorylevel'|'scn.data.inboundorder'|'scn.data.inboundorderline'|'scn.data.inboundorderlineschedule'|'scn.data.outboundorderline'|'scn.data.outboundshipment'|'scn.data.processheader'|'scn.data.processoperation'|'scn.data.processproduct'|'scn.data.reservation'|'scn.data.shipment'|'scn.data.shipmentstop'|'scn.data.shipmentstoporder'|'scn.data.supplyplan'|'scn.data.dataset', data='string', eventGroupId='string', eventTimestamp=datetime(2015, 1, 1), clientToken='string', datasetTarget={ 'datasetIdentifier': 'string', 'operationType': 'APPEND'|'UPSERT'|'DELETE' } )
string
[REQUIRED]
The AWS Supply Chain instance identifier.
string
[REQUIRED]
The data event type.
scn.data.dataset - Send data directly to any specified dataset.
scn.data.supplyplan - Send data to supply_plan dataset.
scn.data.shipmentstoporder - Send data to shipment_stop_order dataset.
scn.data.shipmentstop - Send data to shipment_stop dataset.
scn.data.shipment - Send data to shipment dataset.
scn.data.reservation - Send data to reservation dataset.
scn.data.processproduct - Send data to process_product dataset.
scn.data.processoperation - Send data to process_operation dataset.
scn.data.processheader - Send data to process_header dataset.
scn.data.forecast - Send data to forecast dataset.
scn.data.inventorylevel - Send data to inv_level dataset.
scn.data.inboundorder - Send data to inbound_order dataset.
scn.data.inboundorderline - Send data to inbound_order_line dataset.
scn.data.inboundorderlineschedule - Send data to inbound_order_line_schedule dataset.
scn.data.outboundorderline - Send data to outbound_order_line dataset.
scn.data.outboundshipment - Send data to outbound_shipment dataset.
string
[REQUIRED]
The data payload of the event. It should follow the data schema of the target dataset; see Data entities supported in AWS Supply Chain. To send a single data record, use JsonObject format; to send multiple data records, use JsonArray format.
Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it is auto-populated.
string
[REQUIRED]
Event identifier (for example, orderId for InboundOrder) used for data sharding or partitioning. Note that under one eventGroupId with the same eventType and instanceId, events are processed sequentially in the order they are received by the server.
datetime
The timestamp (in epoch seconds) associated with the event. If not provided, it will be assigned the current timestamp.
string
The idempotent client token. The token is active for 8 hours, and within its lifetime it ensures the request completes only once upon retry with the same client token. If omitted, the AWS SDK generates a unique value so that it can safely retry the request upon network errors.
This field is autopopulated if not provided.
dict
The target dataset configuration for scn.data.dataset event type.
datasetIdentifier (string) -- [REQUIRED]
The datalake dataset ARN identifier.
operationType (string) -- [REQUIRED]
The target dataset load operation type.
dict
Response Syntax
{ 'eventId': 'string' }
Response Structure
(dict) --
The response parameters for SendDataIntegrationEvent.
eventId (string) --
The unique event identifier.
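A hedged example of publishing records directly to a dataset with the scn.data.dataset event type described above. The instance ID, dataset ARN, and record fields (order_id, quantity) are placeholders and must be replaced with values that match your target dataset's schema.
import json
import uuid
from datetime import datetime, timezone

import boto3

client = boto3.client('supplychain')

# Two hypothetical records sent as a JsonArray payload.
records = [
    {'order_id': 'ORD-1', 'quantity': 5},
    {'order_id': 'ORD-2', 'quantity': 3},
]
resp = client.send_data_integration_event(
    instanceId='example-instance-id',            # placeholder
    eventType='scn.data.dataset',
    data=json.dumps(records),                    # JsonArray for multiple records
    eventGroupId='ORD-1',                        # used for sharding/ordering
    eventTimestamp=datetime.now(timezone.utc),
    clientToken=str(uuid.uuid4()),               # optional idempotency token
    datasetTarget={
        'datasetIdentifier': 'arn:aws:scn:...:dataset/...',  # placeholder ARN
        'operationType': 'APPEND',
    },
)
print('event id:', resp['eventId'])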
{'sources': {'datasetSource': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}, 'target': {'datasetTarget': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}}
Response
{'flow': {'sources': {'datasetSource': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}, 'target': {'datasetTarget': {'options': {'dedupeStrategy': {'fieldPriority': {'fields': [{'name': 'string', 'sortOrder': 'ASC | DESC'}]}, 'type': 'FIELD_PRIORITY'}}}}}}
Enables you to programmatically update an existing data pipeline to ingest data from source systems, such as Amazon S3 buckets, to a predefined Amazon Web Services Supply Chain dataset (product, inbound_order) or a temporary dataset, along with the data transformation query provided with the API.
See also: AWS API Documentation
Request Syntax
client.update_data_integration_flow( instanceId='string', name='string', sources=[ { 'sourceType': 'S3'|'DATASET', 'sourceName': 'string', 's3Source': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetSource': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, ], transformation={ 'transformationType': 'SQL'|'NONE', 'sqlTransformation': { 'query': 'string' } }, target={ 'targetType': 'S3'|'DATASET', 's3Target': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetTarget': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } } )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The name of the DataIntegrationFlow to be updated.
list
The new source configurations for the DataIntegrationFlow.
(dict) --
The DataIntegrationFlow source parameters.
sourceType (string) -- [REQUIRED]
The DataIntegrationFlow source type.
sourceName (string) -- [REQUIRED]
The DataIntegrationFlow source name that can be used as table alias in SQL transformation query.
s3Source (dict) --
The S3 DataIntegrationFlow source.
bucketName (string) -- [REQUIRED]
The bucketName of the S3 source objects.
prefix (string) -- [REQUIRED]
The prefix of the S3 source objects. To trigger data ingestion, S3 files need to be put under s3://bucketName/prefix/.
options (dict) --
The other options of the S3 DataIntegrationFlow source.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetSource (dict) --
The dataset DataIntegrationFlow source.
datasetIdentifier (string) -- [REQUIRED]
The ARN of the dataset.
options (dict) --
The dataset DataIntegrationFlow source options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - The target dataset is replaced with the new file added under the source S3 prefix.
INCREMENTAL - The target dataset is updated with the up-to-date content under the S3 prefix, incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values is ingested into the dataset; for datasets within the asc namespace, such duplicates cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured. When dedupe is enabled, deduplication is applied only against primary keys, and only one record out of those duplicates is retained regardless of its partition status.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) -- [REQUIRED]
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain the data record with the larger or the smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) -- [REQUIRED]
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) -- [REQUIRED]
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) -- [REQUIRED]
The sort order for the deduplication field.
dict
The new transformation configurations for the DataIntegrationFlow.
transformationType (string) -- [REQUIRED]
The DataIntegrationFlow transformation type.
sqlTransformation (dict) --
The SQL DataIntegrationFlow transformation configuration.
query (string) -- [REQUIRED]
The transformation SQL query body based on SparkSQL.
dict
The new target configurations for the DataIntegrationFlow.
targetType (string) -- [REQUIRED]
The DataIntegrationFlow target type.
s3Target (dict) --
The S3 DataIntegrationFlow target.
bucketName (string) -- [REQUIRED]
The bucketName of the S3 target objects.
prefix (string) -- [REQUIRED]
The prefix of the S3 target objects.
options (dict) --
The S3 DataIntegrationFlow target options.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetTarget (dict) --
The dataset DataIntegrationFlow target. Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it is auto-populated.
datasetIdentifier (string) -- [REQUIRED]
The dataset ARN.
options (dict) --
The dataset DataIntegrationFlow target options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - The target dataset is replaced with the new file added under the source S3 prefix.
INCREMENTAL - The target dataset is updated with the up-to-date content under the S3 prefix, incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values is ingested into the dataset; for datasets within the asc namespace, such duplicates cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured. When dedupe is enabled, deduplication is applied only against primary keys, and only one record out of those duplicates is retained regardless of its partition status.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) -- [REQUIRED]
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain the data record with the larger or the smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) -- [REQUIRED]
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) -- [REQUIRED]
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) -- [REQUIRED]
The sort order for the deduplication field.
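To tie the request parameters above together, here is a minimal, illustrative update_data_integration_flow call that enables dedupeRecords with a FIELD_PRIORITY strategy on the dataset target. The flow name, bucket, dataset ARN, source alias orders_src, and tie-break field updated_at are all assumptions of this sketch.
import boto3

client = boto3.client('supplychain')

# Update an existing flow so the dataset target deduplicates records sharing
# the same primary keys, keeping the row with the latest 'updated_at' value.
resp = client.update_data_integration_flow(
    instanceId='example-instance-id',   # placeholder
    name='example-flow',                # placeholder flow name
    sources=[
        {
            'sourceType': 'S3',
            'sourceName': 'orders_src',  # table alias in the SQL query below
            's3Source': {
                'bucketName': 'example-bucket',
                'prefix': 'orders',
                'options': {'fileType': 'CSV'},
            },
        },
    ],
    transformation={
        'transformationType': 'SQL',
        'sqlTransformation': {'query': 'SELECT * FROM orders_src'},
    },
    target={
        'targetType': 'DATASET',
        'datasetTarget': {
            'datasetIdentifier': 'arn:aws:scn:...:dataset/...',  # placeholder
            'options': {
                'loadType': 'INCREMENTAL',
                'dedupeRecords': True,
                'dedupeStrategy': {
                    'type': 'FIELD_PRIORITY',
                    'fieldPriority': {
                        'fields': [
                            {'name': 'updated_at', 'sortOrder': 'DESC'},
                        ]
                    },
                },
            },
        },
    },
)
print(resp['flow']['lastModifiedTime'])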
dict
Response Syntax
{ 'flow': { 'instanceId': 'string', 'name': 'string', 'sources': [ { 'sourceType': 'S3'|'DATASET', 'sourceName': 'string', 's3Source': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetSource': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, ], 'transformation': { 'transformationType': 'SQL'|'NONE', 'sqlTransformation': { 'query': 'string' } }, 'target': { 'targetType': 'S3'|'DATASET', 's3Target': { 'bucketName': 'string', 'prefix': 'string', 'options': { 'fileType': 'CSV'|'PARQUET'|'JSON' } }, 'datasetTarget': { 'datasetIdentifier': 'string', 'options': { 'loadType': 'INCREMENTAL'|'REPLACE', 'dedupeRecords': True|False, 'dedupeStrategy': { 'type': 'FIELD_PRIORITY', 'fieldPriority': { 'fields': [ { 'name': 'string', 'sortOrder': 'ASC'|'DESC' }, ] } } } } }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters for UpdateDataIntegrationFlow.
flow (dict) --
The details of the updated DataIntegrationFlow.
instanceId (string) --
The DataIntegrationFlow instance ID.
name (string) --
The DataIntegrationFlow name.
sources (list) --
The DataIntegrationFlow source configurations.
(dict) --
The DataIntegrationFlow source parameters.
sourceType (string) --
The DataIntegrationFlow source type.
sourceName (string) --
The DataIntegrationFlow source name that can be used as table alias in SQL transformation query.
s3Source (dict) --
The S3 DataIntegrationFlow source.
bucketName (string) --
The bucketName of the S3 source objects.
prefix (string) --
The prefix of the S3 source objects. To trigger data ingestion, S3 files need to be put under s3://bucketName/prefix/.
options (dict) --
The other options of the S3 DataIntegrationFlow source.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetSource (dict) --
The dataset DataIntegrationFlow source.
datasetIdentifier (string) --
The ARN of the dataset.
options (dict) --
The dataset DataIntegrationFlow source options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - The target dataset is replaced with the new file added under the source S3 prefix.
INCREMENTAL - The target dataset is updated with the up-to-date content under the S3 prefix, incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values is ingested into the dataset; for datasets within the asc namespace, such duplicates cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured. When dedupe is enabled, deduplication is applied only against primary keys, and only one record out of those duplicates is retained regardless of its partition status.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain the data record with the larger or the smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
transformation (dict) --
The DataIntegrationFlow transformation configurations.
transformationType (string) --
The DataIntegrationFlow transformation type.
sqlTransformation (dict) --
The SQL DataIntegrationFlow transformation configuration.
query (string) --
The transformation SQL query body based on SparkSQL.
target (dict) --
The DataIntegrationFlow target configuration.
targetType (string) --
The DataIntegrationFlow target type.
s3Target (dict) --
The S3 DataIntegrationFlow target.
bucketName (string) --
The bucketName of the S3 target objects.
prefix (string) --
The prefix of the S3 target objects.
options (dict) --
The S3 DataIntegrationFlow target options.
fileType (string) --
The Amazon S3 file type in S3 options.
datasetTarget (dict) --
The dataset DataIntegrationFlow target. Note that an AWS Supply Chain dataset under the asc namespace has an internal connection_id field that cannot be provided by the client directly; it is auto-populated.
datasetIdentifier (string) --
The dataset ARN.
options (dict) --
The dataset DataIntegrationFlow target options.
loadType (string) --
The target dataset's data load type. This only affects how source S3 files are selected in the S3-to-dataset flow.
REPLACE - The target dataset is replaced with the new file added under the source S3 prefix.
INCREMENTAL - The target dataset is updated with the up-to-date content under the S3 prefix, incorporating any file additions or removals there.
dedupeRecords (boolean) --
The option to perform deduplication on data records sharing the same primary key values. If disabled, transformed data with duplicate primary key values is ingested into the dataset; for datasets within the asc namespace, such duplicates cause ingestion to fail. If enabled without a dedupeStrategy, deduplication is done by retaining a random data record among those sharing the same primary key values. If enabled with a dedupeStrategy, deduplication follows that strategy.
Note that the target dataset may have partitions configured. When dedupe is enabled, deduplication is applied only against primary keys, and only one record out of those duplicates is retained regardless of its partition status.
dedupeStrategy (dict) --
The deduplication strategy used to dedupe data records sharing the same primary key values in the target dataset. This strategy only applies to a target dataset with primary keys and with the dedupeRecords option enabled. If the transformed data still contains duplicates after the dedupeStrategy evaluation, a random data record is chosen to be retained.
type (string) --
The type of the deduplication strategy.
FIELD_PRIORITY - Field priority configuration for the deduplication strategy specifies an ordered list of fields used to tie-break data records sharing the same primary key values. Fields earlier in the list have higher priority for evaluation. For each field, the sort order determines whether to retain the data record with the larger or the smaller field value.
fieldPriority (dict) --
The field priority deduplication strategy.
fields (list) --
The list of field names and their sort order for deduplication, arranged in descending priority from highest to lowest.
(dict) --
The field used in the field priority deduplication strategy.
name (string) --
The name of the deduplication field. Must exist in the dataset and not be a primary key.
sortOrder (string) --
The sort order for the deduplication field.
createdTime (datetime) --
The DataIntegrationFlow creation timestamp.
lastModifiedTime (datetime) --
The DataIntegrationFlow last modified timestamp.
{'dataset': {'partitionSpec': {'fields': [{'name': 'string', 'transform': {'type': 'YEAR | MONTH | DAY | HOUR | IDENTITY'}}]}, 'schema': {'fields': {'type': {'LONG'}}, 'primaryKeys': [{'name': 'string'}]}}}
Enables you to programmatically update an Amazon Web Services Supply Chain data lake dataset. Developers can update the description of a data lake dataset for a given instance ID, namespace, and dataset name.
See also: AWS API Documentation
Request Syntax
client.update_data_lake_dataset( instanceId='string', namespace='string', name='string', description='string' )
string
[REQUIRED]
The Amazon Web Services Supply Chain instance identifier.
string
[REQUIRED]
The namespace of the dataset. Besides the custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
string
[REQUIRED]
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
string
The updated description of the data lake dataset.
dict
Response Syntax
{ 'dataset': { 'instanceId': 'string', 'namespace': 'string', 'name': 'string', 'arn': 'string', 'schema': { 'name': 'string', 'fields': [ { 'name': 'string', 'type': 'INT'|'DOUBLE'|'STRING'|'TIMESTAMP'|'LONG', 'isRequired': True|False }, ], 'primaryKeys': [ { 'name': 'string' }, ] }, 'description': 'string', 'partitionSpec': { 'fields': [ { 'name': 'string', 'transform': { 'type': 'YEAR'|'MONTH'|'DAY'|'HOUR'|'IDENTITY' } }, ] }, 'createdTime': datetime(2015, 1, 1), 'lastModifiedTime': datetime(2015, 1, 1) } }
Response Structure
(dict) --
The response parameters of UpdateDataLakeDataset.
dataset (dict) --
The updated dataset details.
instanceId (string) --
The Amazon Web Services Supply Chain instance identifier.
namespace (string) --
The namespace of the dataset. Besides the custom-defined namespaces, every instance comes with the following pre-defined namespaces:
asc - For information on the Amazon Web Services Supply Chain supported datasets see https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
default - For datasets with custom user-defined schemas.
name (string) --
The name of the dataset. For asc namespace, the name must be one of the supported data entities under https://docs.aws.amazon.com/aws-supply-chain/latest/userguide/data-model-asc.html.
arn (string) --
The arn of the dataset.
schema (dict) --
The schema of the dataset.
name (string) --
The name of the dataset schema.
fields (list) --
The list of field details of the dataset schema.
(dict) --
The dataset field details.
name (string) --
The dataset field name.
type (string) --
The dataset field type.
isRequired (boolean) --
Indicates whether the field is required.
primaryKeys (list) --
The list of primary key fields for the dataset. Defined primary keys help data ingestion methods ensure data uniqueness: CreateDataIntegrationFlow's dedupe strategy leverages primary keys to deduplicate records before writing to the dataset, and SendDataIntegrationEvent's UPSERT and DELETE operations only work with datasets that have primary keys. For more details, refer to the documentation for those data ingestion methods.
Note that defining primary keys does not necessarily mean the dataset cannot have duplicate records; duplicates can still be ingested if CreateDataIntegrationFlow's dedupe is disabled or through SendDataIntegrationEvent's APPEND operation.
(dict) --
The detail of the primary key field.
name (string) --
The name of the primary key field.
description (string) --
The description of the dataset.
partitionSpec (dict) --
The partition specification for a dataset.
fields (list) --
The fields on which to partition a dataset. The partitions will be applied hierarchically based on the order of this list.
(dict) --
The detail of the partition field.
name (string) --
The name of the partition field.
transform (dict) --
The transformation of the partition field. A transformation specifies how to partition on a given field. For example, with a timestamp field you can partition by day, so a data record with the value 2025-01-03T00:00:00Z in the partition field falls into the 2025-01-03 partition. Also note that a data record without any value in an optional partition field falls into the NULL partition.
type (string) --
The type of partitioning transformation for this field. The available options are:
IDENTITY - Partitions data on a given field by its exact values.
YEAR - Partitions data on a timestamp field using year granularity.
MONTH - Partitions data on a timestamp field using month granularity.
DAY - Partitions data on a timestamp field using day granularity.
HOUR - Partitions data on a timestamp field using hour granularity.
createdTime (datetime) --
The creation time of the dataset.
lastModifiedTime (datetime) --
The last modified time of the dataset.
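For completeness, a small illustrative call to update_data_lake_dataset that only changes the description, since that is the only mutable field in the request syntax above. The instance ID, namespace, and dataset name are placeholders.
import boto3

client = boto3.client('supplychain')

# Update the description of a hypothetical custom dataset.
resp = client.update_data_lake_dataset(
    instanceId='example-instance-id',  # placeholder
    namespace='mynamespace',           # placeholder custom namespace
    name='orders',                     # placeholder dataset name
    description='Orders ingested nightly from the ERP export.',
)
print(resp['dataset']['lastModifiedTime'])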