AWS EntityResolution

2025/09/23 - AWS EntityResolution - 6 updated API methods

Changes: Support incremental ID mapping workflows for AWS Entity Resolution

CreateIdMappingWorkflow (updated)
Changes (both)
{'incrementalRunConfig': {'incrementalRunType': 'ON_DEMAND'}}

Creates an IdMappingWorkflow object which stores the configuration of the data processing job to be run. Each IdMappingWorkflow must have a unique workflow name. To modify an existing workflow, use the UpdateIdMappingWorkflow API.

See also: AWS API Documentation

Request Syntax

client.create_id_mapping_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'type': 'SOURCE'|'TARGET'
        },
    ],
    outputSourceConfig=[
        {
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    idMappingTechniques={
        'idMappingType': 'PROVIDER'|'RULE_BASED',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'ruleDefinitionType': 'SOURCE'|'TARGET',
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET'
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'ON_DEMAND'
    },
    roleArn='string',
    tags={
        'string': 'string'
    }
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow. There can't be multiple IdMappingWorkflows with the same name.

type description:

string

param description:

A description of the workflow.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, each of which has the fields inputSourceARN, schemaName, and type.

  • (dict) --

    An object containing inputSourceARN, schemaName, and type.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.

    • schemaName (string) --

      The name of the schema to be retrieved.

    • type (string) --

      The type of ID namespace. There are two types: SOURCE and TARGET.

      The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.

      The TARGET contains a configuration of targetId which all sourceIds will resolve to.

type outputSourceConfig:

list

param outputSourceConfig:

A list of IdMappingWorkflowOutputSource objects, each of which contains fields outputS3Path and KMSArn.

  • (dict) --

    The output source for the ID mapping workflow.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

type idMappingTechniques:

dict

param idMappingTechniques:

[REQUIRED]

An object which defines the ID mapping technique and any additional configurations.

  • idMappingType (string) -- [REQUIRED]

    The type of ID mapping.

  • ruleBasedProperties (dict) --

    An object which defines any additional configurations required by rule-based matching.

    • rules (list) --

      The rules that can be used for ID mapping.

      • (dict) --

        An object containing the ruleName and matchingKeys.

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --

    • ruleDefinitionType (string) -- [REQUIRED]

      The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.

    • attributeMatchingModel (string) -- [REQUIRED]

      The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.

      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.

      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.

    • recordMatchingModel (string) -- [REQUIRED]

      The type of matching record that is allowed to be used in an ID mapping workflow.

      If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.

      If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.

  • providerProperties (dict) --

    An object which defines any additional configurations required by the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

    • providerConfiguration (document) --

      The required configuration fields to use with the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it is being processed. Your information won't be saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
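The rule semantics above (a rule matches only when all of its matching keys match, and attributeMatchingModel decides whether sub-types such as Email and BusinessEmail are compared with each other) can be illustrated with a small local model. This is a hypothetical sketch, not the service's matching engine: the SUBTYPES mapping, the record dicts, and the function names are assumptions made for illustration.

```python
# Hypothetical local model of rule-based matching semantics; the real
# service evaluates rules internally and may normalize values first.

# Assumed schema: matching key -> the sub-type fields that carry it.
SUBTYPES = {
    "Email": ["Email", "BusinessEmail"],
    "Phone": ["Phone"],
}


def key_matches(a: dict, b: dict, key: str, model: str) -> bool:
    """Return True if records a and b match on a single matching key."""
    fields = SUBTYPES[key]
    if model == "ONE_TO_ONE":
        # Only the same sub-type is compared: Email to Email, and so on.
        return any(f in a and f in b and a[f] == b[f] for f in fields)
    if model == "MANY_TO_MANY":
        # Any sub-type of a may match any sub-type of b.
        values_a = {a[f] for f in fields if f in a}
        values_b = {b[f] for f in fields if f in b}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model}")


def rule_matches(a: dict, b: dict, matching_keys: list, model: str) -> bool:
    """A rule matches only if every one of its matching keys matches."""
    return all(key_matches(a, b, key, model) for key in matching_keys)


profile_a = {"Email": "jo@example.com", "Phone": "555-0100"}
profile_b = {"BusinessEmail": "jo@example.com", "Phone": "555-0100"}

print(rule_matches(profile_a, profile_b, ["Email"], "ONE_TO_ONE"))    # False
print(rule_matches(profile_a, profile_b, ["Email"], "MANY_TO_MANY"))  # True
print(rule_matches(profile_a, profile_b, ["Email", "Phone"], "MANY_TO_MANY"))  # True
```

With ONE_TO_ONE, Profile A's Email and Profile B's BusinessEmail are never compared, so the Email rule fails; MANY_TO_MANY compares across the sub-types and the rule succeeds.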

type incrementalRunConfig:

dict

param incrementalRunConfig:

The incremental run configuration for the ID mapping workflow.

  • incrementalRunType (string) --

    The incremental run type for an ID mapping workflow.

    It takes only one value: ON_DEMAND. This setting runs the ID mapping workflow when it's manually triggered through the StartIdMappingJob API.

type roleArn:

string

param roleArn:

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'type': 'SOURCE'|'TARGET'
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER'|'RULE_BASED',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'ruleDefinitionType': 'SOURCE'|'TARGET',
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET'
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'ON_DEMAND'
    },
    'roleArn': 'string'
}

Response Structure

  • (dict) --

    • workflowName (string) --

      The name of the workflow.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.

    • description (string) --

      A description of the workflow.

    • inputSourceConfig (list) --

      A list of InputSource objects, each of which has the fields inputSourceARN, schemaName, and type.

      • (dict) --

        An object containing inputSourceARN, schemaName, and type.

        • inputSourceARN (string) --

          A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

        • type (string) --

          The type of ID namespace. There are two types: SOURCE and TARGET.

          The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.

          The TARGET contains a configuration of targetId which all sourceIds will resolve to.

    • outputSourceConfig (list) --

      A list of IdMappingWorkflowOutputSource objects, each of which contains fields outputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • idMappingTechniques (dict) --

      An object which defines the ID mapping technique and any additional configurations.

      • idMappingType (string) --

        The type of ID mapping.

      • ruleBasedProperties (dict) --

        An object which defines any additional configurations required by rule-based matching.

        • rules (list) --

          The rules that can be used for ID mapping.

          • (dict) --

            An object containing the ruleName and matchingKeys.

            • ruleName (string) --

              A name for the matching rule.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

        • ruleDefinitionType (string) --

          The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.

        • attributeMatchingModel (string) --

          The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.

          If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.

          If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.

        • recordMatchingModel (string) --

          The type of matching record that is allowed to be used in an ID mapping workflow.

          If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.

          If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

        • providerConfiguration (document) --

          The required configuration fields to use with the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it is being processed. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • incrementalRunConfig (dict) --

      The incremental run configuration for the ID mapping workflow.

      • incrementalRunType (string) --

        The incremental run type for an ID mapping workflow.

        It takes only one value: ON_DEMAND. This setting runs the ID mapping workflow when it's manually triggered through the StartIdMappingJob API.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
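A request that opts a new rule-based workflow in to incremental runs combines the parameters described above. The sketch below is hedged: the helper names are hypothetical, every ARN, path, and workflow name is a caller-supplied placeholder, and the rule and matching-model choices are illustrative rather than recommended. The client is passed in (for example `boto3.client("entityresolution")`) so the request can be inspected without calling AWS.

```python
# Sketch: assemble a create_id_mapping_workflow request that opts in to
# incremental (ON_DEMAND) runs. All names, ARNs, and paths passed in are
# placeholders; the rule and model choices are illustrative only.

ALLOWED_INCREMENTAL_RUN_TYPES = {"ON_DEMAND"}  # the only documented value


def build_create_request(
    workflow_name: str,
    source_arn: str,
    target_arn: str,
    output_s3_path: str,
    role_arn: str,
    incremental_run_type: str = "ON_DEMAND",
) -> dict:
    """Return keyword arguments for client.create_id_mapping_workflow."""
    if incremental_run_type not in ALLOWED_INCREMENTAL_RUN_TYPES:
        raise ValueError(f"unsupported incrementalRunType: {incremental_run_type}")
    return {
        "workflowName": workflow_name,
        "inputSourceConfig": [
            {"inputSourceARN": source_arn, "type": "SOURCE"},
            {"inputSourceARN": target_arn, "type": "TARGET"},
        ],
        "outputSourceConfig": [{"outputS3Path": output_s3_path}],
        "idMappingTechniques": {
            "idMappingType": "RULE_BASED",
            "ruleBasedProperties": {
                # Hypothetical rule: match on the Email matching key only.
                "rules": [{"ruleName": "email-rule", "matchingKeys": ["Email"]}],
                "ruleDefinitionType": "SOURCE",
                "attributeMatchingModel": "ONE_TO_ONE",
                "recordMatchingModel": "ONE_SOURCE_TO_ONE_TARGET",
            },
        },
        "incrementalRunConfig": {"incrementalRunType": incremental_run_type},
        "roleArn": role_arn,
    }


def create_incremental_workflow(client, **request_args) -> str:
    """Create the workflow and return the workflowArn from the response."""
    request = build_create_request(**request_args)
    response = client.create_id_mapping_workflow(**request)
    return response["workflowArn"]


# Example (requires AWS credentials and real resources; values are placeholders):
# import boto3
# client = boto3.client("entityresolution")
# arn = create_incremental_workflow(
#     client,
#     workflow_name="<workflow-name>",
#     source_arn="<source-arn>",
#     target_arn="<target-arn>",
#     output_s3_path="<s3-output-path>",
#     role_arn="<role-arn>",
# )
```

Validating incrementalRunType locally only mirrors the documentation's single allowed value; the service remains the authority on what it accepts.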

GetIdMappingJob (updated)
Changes (response)
{'jobType': 'BATCH | INCREMENTAL | DELETE_ONLY',
 'metrics': {'deleteRecordsProcessed': 'integer',
             'mappedRecordsRemoved': 'integer',
             'mappedSourceRecordsRemoved': 'integer',
             'mappedTargetRecordsRemoved': 'integer',
             'newMappedRecords': 'integer',
             'newMappedSourceRecords': 'integer',
             'newMappedTargetRecords': 'integer',
             'newUniqueRecordsLoaded': 'integer'}}

Returns the status, metrics, and errors (if there are any) that are associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_id_mapping_job(
    workflowName='string',
    jobId='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

type jobId:

string

param jobId:

[REQUIRED]

The ID of the job.

rtype:

dict

returns:

Response Syntax

{
    'jobId': 'string',
    'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED',
    'startTime': datetime(2015, 1, 1),
    'endTime': datetime(2015, 1, 1),
    'metrics': {
        'inputRecords': 123,
        'totalRecordsProcessed': 123,
        'recordsNotProcessed': 123,
        'deleteRecordsProcessed': 123,
        'totalMappedRecords': 123,
        'totalMappedSourceRecords': 123,
        'totalMappedTargetRecords': 123,
        'uniqueRecordsLoaded': 123,
        'newMappedRecords': 123,
        'newMappedSourceRecords': 123,
        'newMappedTargetRecords': 123,
        'newUniqueRecordsLoaded': 123,
        'mappedRecordsRemoved': 123,
        'mappedSourceRecordsRemoved': 123,
        'mappedTargetRecordsRemoved': 123
    },
    'errorDetails': {
        'errorMessage': 'string'
    },
    'outputSourceConfig': [
        {
            'roleArn': 'string',
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    'jobType': 'BATCH'|'INCREMENTAL'|'DELETE_ONLY'
}

Response Structure

  • (dict) --

    • jobId (string) --

      The ID of the job.

    • status (string) --

      The current status of the job.

    • startTime (datetime) --

      The time at which the job was started.

    • endTime (datetime) --

      The time at which the job finished.

    • metrics (dict) --

      Metrics associated with the execution, specifically total records processed, unique IDs generated, and records the execution skipped.

      • inputRecords (integer) --

        The total number of records that were input for processing.

      • totalRecordsProcessed (integer) --

        The total number of records that were processed.

      • recordsNotProcessed (integer) --

        The total number of records that did not get processed.

      • deleteRecordsProcessed (integer) --

        The number of records processed that were marked for deletion in the input file using the DELETE schema mapping field. These are the records to be removed from the ID mapping table.

      • totalMappedRecords (integer) --

        The total number of records that were mapped.

      • totalMappedSourceRecords (integer) --

        The total number of mapped source records.

      • totalMappedTargetRecords (integer) --

        The total number of distinct mapped target records.

      • uniqueRecordsLoaded (integer) --

        The number of de-duplicated processed records across all runs, excluding deletion-related records. Duplicates are determined by the field marked as UNIQUE_ID in your schema mapping. Records sharing the same value in this field are considered duplicates. For example, if you specified "customer_id" as a UNIQUE_ID field and had three records with the same customer_id value, they would count as one unique record in this metric.

      • newMappedRecords (integer) --

        The number of new mapped records.

      • newMappedSourceRecords (integer) --

        The number of new source records mapped.

      • newMappedTargetRecords (integer) --

        The number of new mapped target records.

      • newUniqueRecordsLoaded (integer) --

        The number of new unique records processed in the current job run, after removing duplicates. This metric excludes deletion-related records. Duplicates are determined by the field marked as UNIQUE_ID in your schema mapping. Records sharing the same value in this field are considered duplicates. For example, if your current run processes five new records with the same UNIQUE_ID value, they would count as one new unique record in this metric.

      • mappedRecordsRemoved (integer) --

        The number of mapped records removed.

      • mappedSourceRecordsRemoved (integer) --

        The number of source records removed due to ID mapping.

      • mappedTargetRecordsRemoved (integer) --

        The number of mapped target records removed.

    • errorDetails (dict) --

      An object containing an error message, if there was an error.

      • errorMessage (string) --

        The error message from the job, if there is one.

    • outputSourceConfig (list) --

      A list of OutputSource objects.

      • (dict) --

        An object containing KMSArn, outputS3Path, and roleArn.

        • roleArn (string) --

          The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf as part of workflow execution.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • jobType (string) --

      The job type of the ID mapping job.

      A value of INCREMENTAL indicates that only data that was new or changed since the last job run was processed. This is the default job type if the workflow was created with an incrementalRunConfig.

      A value of BATCH indicates that all data was processed from the input source, regardless of previous job runs. This is the default job type if the workflow wasn't created with an incrementalRunConfig.

      A value of DELETE_ONLY indicates that only deletion requests from BatchDeleteUniqueIds were processed.
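The uniqueRecordsLoaded and newUniqueRecordsLoaded descriptions collapse records that share a UNIQUE_ID value and skip deletion-related records. The counting can be approximated locally as a worked example. This is a hypothetical sketch: records are plain dicts, "customer_id" is the UNIQUE_ID field from the example above, a truthy "DELETE" key marks a deletion record, and treating a value already loaded in an earlier run as not "new" is our reading, not something the documentation states.

```python
from typing import Iterable, List, Tuple


def unique_record_counts(
    previous_runs: Iterable[List[dict]],
    current_run: List[dict],
    unique_id_field: str = "customer_id",
) -> Tuple[int, int]:
    """Return (new unique records this run, unique records across all runs)."""

    def loaded_ids(records: List[dict]) -> set:
        # Records flagged for deletion are excluded, as both metrics do.
        return {r[unique_id_field] for r in records if not r.get("DELETE")}

    seen_before = set()
    for run in previous_runs:
        seen_before |= loaded_ids(run)
    current = loaded_ids(current_run)
    # Assumption: a UNIQUE_ID value loaded in an earlier run is not "new".
    return len(current - seen_before), len(seen_before | current)


earlier = [[{"customer_id": "c1"}] * 3]  # three records sharing customer_id c1
this_run = [{"customer_id": "c2"}] * 5 + [{"customer_id": "c3", "DELETE": True}]
print(unique_record_counts(earlier, this_run))  # (1, 2)
```

The three c1 records count once, the five c2 records count once as a new unique record, and the c3 deletion record is ignored by both counts.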

GetIdMappingWorkflow (updated)
Changes (response)
{'incrementalRunConfig': {'incrementalRunType': 'ON_DEMAND'}}

Returns the IdMappingWorkflow with a given name, if it exists.

See also: AWS API Documentation

Request Syntax

client.get_id_mapping_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'type': 'SOURCE'|'TARGET'
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER'|'RULE_BASED',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'ruleDefinitionType': 'SOURCE'|'TARGET',
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET'
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1),
    'incrementalRunConfig': {
        'incrementalRunType': 'ON_DEMAND'
    },
    'roleArn': 'string',
    'tags': {
        'string': 'string'
    }
}

Response Structure

  • (dict) --

    • workflowName (string) --

      The name of the workflow.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.

    • description (string) --

      A description of the workflow.

    • inputSourceConfig (list) --

      A list of InputSource objects, each of which has the fields inputSourceARN, schemaName, and type.

      • (dict) --

        An object containing inputSourceARN, schemaName, and type.

        • inputSourceARN (string) --

          A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

        • type (string) --

          The type of ID namespace. There are two types: SOURCE and TARGET.

          The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.

          The TARGET contains a configuration of targetId which all sourceIds will resolve to.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields outputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • idMappingTechniques (dict) --

      An object which defines the ID mapping technique and any additional configurations.

      • idMappingType (string) --

        The type of ID mapping.

      • ruleBasedProperties (dict) --

        An object which defines any additional configurations required by rule-based matching.

        • rules (list) --

          The rules that can be used for ID mapping.

          • (dict) --

            An object containing the ruleName and matchingKeys.

            • ruleName (string) --

              A name for the matching rule.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

        • ruleDefinitionType (string) --

          The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.

        • attributeMatchingModel (string) --

          The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.

          If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.

          If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.

        • recordMatchingModel (string) --

          The type of matching record that is allowed to be used in an ID mapping workflow.

          If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.

          If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

        • providerConfiguration (document) --

          The required configuration fields to use with the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it is being processed. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • createdAt (datetime) --

      The timestamp of when the workflow was created.

    • updatedAt (datetime) --

      The timestamp of when the workflow was last updated.

    • incrementalRunConfig (dict) --

      The incremental run configuration for the ID mapping workflow.

      • incrementalRunType (string) --

        The incremental run type for an ID mapping workflow.

        It takes only one value: ON_DEMAND. This setting runs the ID mapping workflow when it's manually triggered through the StartIdMappingJob API.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --
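Because the default jobType of StartIdMappingJob depends on whether the workflow has an incrementalRunConfig, a caller can predict it from the GetIdMappingWorkflow response before starting a job. A minimal sketch, assuming the workflow's current incrementalRunConfig (as returned here) is what determines the default; both helper names are hypothetical.

```python
def default_job_type(workflow: dict) -> str:
    """Predict the jobType StartIdMappingJob uses when none is passed.

    workflow: a GetIdMappingWorkflow response dict.
    """
    # Per the jobType descriptions: INCREMENTAL when an incrementalRunConfig
    # is present, otherwise BATCH.
    return "INCREMENTAL" if workflow.get("incrementalRunConfig") else "BATCH"


def describe_default_job_type(client, workflow_name: str) -> str:
    """Fetch the workflow (e.g. with a boto3 client) and predict its default."""
    workflow = client.get_id_mapping_workflow(workflowName=workflow_name)
    return default_job_type(workflow)
```

Passing jobType explicitly to StartIdMappingJob avoids relying on this prediction at all.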

GetMatchingJob (updated)
Changes (response)
{'metrics': {'deleteRecordsProcessed': 'integer'}}

Returns the status, metrics, and errors (if there are any) that are associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_matching_job(
    workflowName='string',
    jobId='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

type jobId:

string

param jobId:

[REQUIRED]

The ID of the job.

rtype:

dict

returns:

Response Syntax

{
    'jobId': 'string',
    'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED',
    'startTime': datetime(2015, 1, 1),
    'endTime': datetime(2015, 1, 1),
    'metrics': {
        'inputRecords': 123,
        'totalRecordsProcessed': 123,
        'recordsNotProcessed': 123,
        'deleteRecordsProcessed': 123,
        'matchIDs': 123
    },
    'errorDetails': {
        'errorMessage': 'string'
    },
    'outputSourceConfig': [
        {
            'roleArn': 'string',
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • jobId (string) --

      The unique identifier of the matching job.

    • status (string) --

      The current status of the job.

    • startTime (datetime) --

      The time at which the job was started.

    • endTime (datetime) --

      The time at which the job finished.

    • metrics (dict) --

      Metrics associated with the execution, specifically total records processed, unique IDs generated, and records the execution skipped.

      • inputRecords (integer) --

        The total number of input records.

      • totalRecordsProcessed (integer) --

        The total number of records processed.

      • recordsNotProcessed (integer) --

        The total number of records that did not get processed.

      • deleteRecordsProcessed (integer) --

        The number of records processed that were marked for deletion (DELETE = True) in the input file. This metric tracks records flagged for removal during the job execution.

      • matchIDs (integer) --

        The total number of matchID values generated.

    • errorDetails (dict) --

      An object containing an error message, if there was an error.

      • errorMessage (string) --

        The error message from the job, if there is one.

    • outputSourceConfig (list) --

      A list of OutputSource objects.

      • (dict) --

        An object containing KMSArn, outputS3Path, and roleArn.

        • roleArn (string) --

          The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf as part of workflow execution.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
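Both GetMatchingJob and GetIdMappingJob report a status of QUEUED, RUNNING, SUCCEEDED, or FAILED, so a caller typically polls until one of the last two. A minimal polling sketch: the helper name, poll interval, and timeout are assumptions, and the fetch function is passed in so the same loop works with either client.get_matching_job or client.get_id_mapping_job.

```python
import time
from typing import Callable

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_job(
    get_job: Callable[..., dict],
    workflow_name: str,
    job_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Poll until the job status is SUCCEEDED or FAILED and return the response.

    get_job is a bound method such as client.get_matching_job; sleep is
    injectable so the loop can be tested without waiting.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = get_job(workflowName=workflow_name, jobId=job_id)
        status = response["status"]
        if status in TERMINAL_STATUSES:
            return response
        if time.monotonic() >= deadline:
            raise TimeoutError(f"job {job_id} still {status} after {timeout_seconds} seconds")
        sleep(poll_seconds)  # QUEUED or RUNNING: wait, then poll again


# Example (requires AWS credentials; names are placeholders):
# job = wait_for_job(client.get_matching_job, "<workflow-name>", job_id)
# if job["status"] == "FAILED":
#     print(job.get("errorDetails", {}).get("errorMessage"))
# else:
#     print(job["metrics"].get("deleteRecordsProcessed"))
```

A FAILED job still returns normally so the caller can read errorDetails; only an expired timeout raises.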

StartIdMappingJob (updated)
Changes (both)
{'jobType': 'BATCH | INCREMENTAL | DELETE_ONLY'}

Starts the IdMappingJob of a workflow. The workflow must have previously been created using the CreateIdMappingWorkflow endpoint.

See also: AWS API Documentation

Request Syntax

client.start_id_mapping_job(
    workflowName='string',
    outputSourceConfig=[
        {
            'roleArn': 'string',
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    jobType='BATCH'|'INCREMENTAL'|'DELETE_ONLY'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the ID mapping workflow for which to start the job.

type outputSourceConfig:

list

param outputSourceConfig:

A list of OutputSource objects.

  • (dict) --

    An object containing KMSArn, outputS3Path, and roleArn.

    • roleArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf as part of workflow execution.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

type jobType:

string

param jobType:

The job type for the ID mapping job.

If the jobType value is set to INCREMENTAL, only data that is new or changed since the last job run is processed. This is the default value if the workflow was created through the CreateIdMappingWorkflow API with an incrementalRunConfig.

If the jobType value is set to BATCH, all data is processed from the input source, regardless of previous job runs. This is the default value if the workflow was created through the CreateIdMappingWorkflow API without an incrementalRunConfig.

If the jobType value is set to DELETE_ONLY, only deletion requests from BatchDeleteUniqueIds are processed.
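A caller can either pass jobType explicitly or omit it and let the service apply the default described above. A minimal sketch of both, assuming a client such as `boto3.client("entityresolution")`; the helper name is hypothetical, and the local check only mirrors the three documented values.

```python
from typing import Optional, Tuple

ALLOWED_JOB_TYPES = {"BATCH", "INCREMENTAL", "DELETE_ONLY"}


def start_job(
    client, workflow_name: str, job_type: Optional[str] = None
) -> Tuple[str, Optional[str]]:
    """Start an ID mapping job and return (jobId, jobType) from the response.

    Leaving job_type as None omits jobType so the service applies its default.
    """
    request = {"workflowName": workflow_name}
    if job_type is not None:
        if job_type not in ALLOWED_JOB_TYPES:
            raise ValueError(f"unsupported jobType: {job_type}")
        request["jobType"] = job_type
    response = client.start_id_mapping_job(**request)
    # The response echoes the job type the service actually used.
    return response["jobId"], response.get("jobType")


# Example (requires AWS credentials; the workflow name is a placeholder):
# job_id, used = start_job(client, "<workflow-name>", "INCREMENTAL")
```

Reading jobType back from the response is a cheap way to confirm which mode ran when the parameter was omitted.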

rtype:

dict

returns:

Response Syntax

{
    'jobId': 'string',
    'outputSourceConfig': [
        {
            'roleArn': 'string',
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    'jobType': 'BATCH'|'INCREMENTAL'|'DELETE_ONLY'
}

Response Structure

  • (dict) --

    • jobId (string) --

      The ID of the job.

    • outputSourceConfig (list) --

      A list of OutputSource objects.

      • (dict) --

        An object containing KMSArn, outputS3Path, and roleArn.

        • roleArn (string) --

          The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf as part of workflow execution.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          The ARN of a customer managed KMS key for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • jobType (string) --

      The job type for the started ID mapping job.

      A value of INCREMENTAL indicates that only data that was new or had changed since the last job run was processed. This is the default job type if the workflow is configured with an incrementalRunConfig.

      A value of BATCH indicates that all data in the input source was processed, regardless of previous job runs. This is the default job type if the workflow isn't configured with an incrementalRunConfig.

      A value of DELETE_ONLY indicates that only deletion requests submitted through BatchDeleteUniqueIds were processed.
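A response of this shape can be reduced to the fields most callers need, as in the following sketch. The helper summarize_start_response and the sample dict are hypothetical. The sample is hand-written to mirror the response syntax above and is not real service output. Using .get() for jobType is a defensive assumption for responses that lack the field.

```python
# Hypothetical helper: reads the documented fields from a
# start_id_mapping_job() response dict.
JOB_TYPE_MEANING = {
    "INCREMENTAL": "only new or changed data since the last run",
    "BATCH": "all data in the input source",
    "DELETE_ONLY": "only BatchDeleteUniqueIds deletion requests",
}


def summarize_start_response(response):
    job_type = response.get("jobType")  # guard against responses without jobType
    return {
        "jobId": response["jobId"],
        "jobType": job_type,
        "processed": JOB_TYPE_MEANING.get(job_type, "unknown"),
        "outputPaths": [o["outputS3Path"] for o in response.get("outputSourceConfig", [])],
    }


# Hand-written sample shaped like the response syntax above (not real output).
sample_response = {
    "jobId": "example-job-id",
    "outputSourceConfig": [{
        "roleArn": "arn:aws:iam::123456789012:role/ExampleEntityResolutionRole",
        "outputS3Path": "s3://example-bucket/id-mapping-output/",
    }],
    "jobType": "INCREMENTAL",
}
summary = summarize_start_response(sample_response)
```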

UpdateIdMappingWorkflow (updated)
Changes (both)
{'incrementalRunConfig': {'incrementalRunType': 'ON_DEMAND'}}

Updates an existing IdMappingWorkflow. This method is identical to CreateIdMappingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the IdMappingWorkflow must already exist for the method to succeed.

See also: AWS API Documentation

Request Syntax

client.update_id_mapping_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'type': 'SOURCE'|'TARGET'
        },
    ],
    outputSourceConfig=[
        {
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    idMappingTechniques={
        'idMappingType': 'PROVIDER'|'RULE_BASED',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'ruleDefinitionType': 'SOURCE'|'TARGET',
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET'
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'ON_DEMAND'
    },
    roleArn='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

type description:

string

param description:

A description of the workflow.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields inputSourceARN, schemaName, and type.

  • (dict) --

    An object containing inputSourceARN, schemaName, and type.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.

    • schemaName (string) --

      The name of the schema to be retrieved.

    • type (string) --

      The type of ID namespace. There are two types: SOURCE and TARGET.

      A SOURCE namespace contains the configuration for the sourceId data that is processed in an ID mapping workflow.

      A TARGET namespace contains the configuration of the targetId to which all sourceIds resolve.
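The following sketch builds inputSourceConfig entries with one SOURCE and one TARGET input. The helper input_source is hypothetical, and the Glue table ARNs and schema names are placeholders. The SOURCE/TARGET check only mirrors the two documented values.

```python
# Hypothetical helper for building inputSourceConfig entries.
VALID_INPUT_TYPES = {"SOURCE", "TARGET"}


def input_source(arn, schema_name=None, source_type=None):
    entry = {"inputSourceARN": arn}  # the only required field
    if schema_name is not None:
        entry["schemaName"] = schema_name
    if source_type is not None:
        if source_type not in VALID_INPUT_TYPES:
            raise ValueError(f"type must be SOURCE or TARGET, got {source_type!r}")
        entry["type"] = source_type
    return entry


# Placeholder Glue table ARNs and schema names.
input_source_config = [
    input_source("arn:aws:glue:us-east-1:123456789012:table/example_db/crm_customers",
                 "crm-schema", "SOURCE"),
    input_source("arn:aws:glue:us-east-1:123456789012:table/example_db/master_customers",
                 "master-schema", "TARGET"),
]
```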

type outputSourceConfig:

list

param outputSourceConfig:

A list of OutputSource objects, each of which contains fields outputS3Path and KMSArn.

  • (dict) --

    The output source for the ID mapping workflow.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

    • KMSArn (string) --

      The ARN of a customer managed KMS key for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
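In the workflow request syntax above, each output source carries only outputS3Path and an optional KMSArn. There is no per-entry roleArn as in StartIdMappingJob. The sketch below builds such a list. The helper name and the s3:// prefix check are assumptions for illustration and are not documented service rules. The KMS key ARN is a placeholder.

```python
def workflow_output_sources(*entries):
    """Build outputSourceConfig for create/update_id_mapping_workflow.

    Each entry is an (outputS3Path, KMSArn or None) tuple (hypothetical helper).
    """
    config = []
    for s3_path, kms_arn in entries:
        # Local sanity check (an assumption, not a documented service rule).
        if not s3_path.startswith("s3://"):
            raise ValueError(f"expected an s3:// path, got {s3_path!r}")
        entry = {"outputS3Path": s3_path}
        if kms_arn:  # omit to fall back to the Entity Resolution managed key
            entry["KMSArn"] = kms_arn
        config.append(entry)
    return config


# Placeholder bucket paths and KMS key ARN.
output_source_config = workflow_output_sources(
    ("s3://example-bucket/id-mapping-output/",
     "arn:aws:kms:us-east-1:123456789012:key/1234abcd-12ab-34cd-56ef-1234567890ab"),
    ("s3://example-bucket/id-mapping-output-default-key/", None),
)
```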

type idMappingTechniques:

dict

param idMappingTechniques:

[REQUIRED]

An object which defines the ID mapping technique and any additional configurations.

  • idMappingType (string) -- [REQUIRED]

    The type of ID mapping.

  • ruleBasedProperties (dict) --

    An object which defines any additional configurations required by rule-based matching.

    • rules (list) --

      The rules that can be used for ID mapping.

      • (dict) --

        An object containing the ruleName and matchingKeys.

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --
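The "all of the MatchingKeys must match" condition can be illustrated with a toy model. This is a simplified sketch and not the service's matching engine. It assumes exact string equality without normalization, treats a missing key as a non-match, and uses hypothetical rule names, key names, and records.

```python
def rule_matches(record_a, record_b, matching_keys):
    """Toy model: the rule matches only if every matching key is present and equal."""
    return all(
        key in record_a and key in record_b and record_a[key] == record_b[key]
        for key in matching_keys
    )


def matching_rule_names(record_a, record_b, rules):
    return [r["ruleName"] for r in rules if rule_matches(record_a, record_b, r["matchingKeys"])]


rules = [
    {"ruleName": "EmailAndPhone", "matchingKeys": ["email", "phone"]},
    {"ruleName": "EmailAndName", "matchingKeys": ["email", "name"]},
]
a = {"email": "jane@example.com", "phone": "555-0100", "name": "Jane Doe"}
b = {"email": "jane@example.com", "phone": "555-0199", "name": "Jane Doe"}
print(matching_rule_names(a, b, rules))  # ['EmailAndName']
```

EmailAndPhone fails because the phone values differ, even though the email values agree.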

    • ruleDefinitionType (string) -- [REQUIRED]

      The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.

    • attributeMatchingModel (string) -- [REQUIRED]

      The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.

      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.

      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
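The Email/BusinessEmail example above can be expressed as a toy model. The function attribute_type_matches is hypothetical, and the sub-type list is taken from the example rather than from any service schema. It compares exact strings only, and it simplifies whatever the service actually does.

```python
# Hypothetical attribute type "Email" with two sub-type fields.
EMAIL_SUBTYPES = ["Email", "BusinessEmail"]


def attribute_type_matches(profile_a, profile_b, subtypes, model):
    if model == "ONE_TO_ONE":
        # Compare only the same sub-type field on both profiles.
        return any(
            f in profile_a and f in profile_b and profile_a[f] == profile_b[f]
            for f in subtypes
        )
    if model == "MANY_TO_MANY":
        # Any sub-type value on one profile may match any sub-type value on the other.
        values_a = {profile_a[f] for f in subtypes if f in profile_a}
        values_b = {profile_b[f] for f in subtypes if f in profile_b}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model!r}")


profile_a = {"Email": "jane@example.com"}
profile_b = {"BusinessEmail": "jane@example.com"}
print(attribute_type_matches(profile_a, profile_b, EMAIL_SUBTYPES, "ONE_TO_ONE"))    # False
print(attribute_type_matches(profile_a, profile_b, EMAIL_SUBTYPES, "MANY_TO_MANY"))  # True
```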

    • recordMatchingModel (string) -- [REQUIRED]

      The type of matching record that is allowed to be used in an ID mapping workflow.

      If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.

      If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
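The rule-based fields above combine into one idMappingTechniques value. The sketch below builds one and checks each enum against the documented values. The helper name and the rule/key names are hypothetical. Rejecting an empty matchingKeys list is a local assumption.

```python
RULE_DEFINITION_TYPES = {"SOURCE", "TARGET"}
ATTRIBUTE_MATCHING_MODELS = {"ONE_TO_ONE", "MANY_TO_MANY"}
RECORD_MATCHING_MODELS = {"ONE_SOURCE_TO_ONE_TARGET", "MANY_SOURCE_TO_ONE_TARGET"}


def rule_based_techniques(rules, rule_definition_type, attribute_model, record_model):
    """Build an idMappingTechniques dict for RULE_BASED mapping (hypothetical helper)."""
    for value, allowed, name in (
        (rule_definition_type, RULE_DEFINITION_TYPES, "ruleDefinitionType"),
        (attribute_model, ATTRIBUTE_MATCHING_MODELS, "attributeMatchingModel"),
        (record_model, RECORD_MATCHING_MODELS, "recordMatchingModel"),
    ):
        if value not in allowed:
            raise ValueError(f"invalid {name}: {value!r}")
    for rule in rules:
        # Both fields are required; an empty key list is rejected here by assumption.
        if not rule.get("ruleName") or not rule.get("matchingKeys"):
            raise ValueError("each rule needs a ruleName and matchingKeys")
    return {
        "idMappingType": "RULE_BASED",
        "ruleBasedProperties": {
            "rules": rules,
            "ruleDefinitionType": rule_definition_type,
            "attributeMatchingModel": attribute_model,
            "recordMatchingModel": record_model,
        },
    }


techniques = rule_based_techniques(
    [{"ruleName": "EmailMatch", "matchingKeys": ["email"]}],  # placeholder names
    rule_definition_type="SOURCE",
    attribute_model="ONE_TO_ONE",
    record_model="MANY_SOURCE_TO_ONE_TARGET",
)
```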

  • providerProperties (dict) --

    An object which defines any additional configurations required by the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

    • providerConfiguration (:ref:`document<document>`) --

      The required configuration fields to use with the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it is being processed. Your information isn't saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
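For PROVIDER mapping, the sketch below builds the providerProperties branch. The helper name is hypothetical. The provider service ARN is left as an angle-bracket placeholder because its exact form depends on the provider. The providerConfiguration keys shown are invented, since that document's contents are provider-specific.

```python
def provider_techniques(provider_service_arn, provider_configuration=None,
                        intermediate_s3_path=None):
    """Build an idMappingTechniques dict for PROVIDER mapping (hypothetical helper)."""
    props = {"providerServiceArn": provider_service_arn}  # required
    if provider_configuration is not None:
        # A free-form document; the keys it needs depend on the provider.
        props["providerConfiguration"] = provider_configuration
    if intermediate_s3_path is not None:
        props["intermediateSourceConfiguration"] = {"intermediateS3Path": intermediate_s3_path}
    return {"idMappingType": "PROVIDER", "providerProperties": props}


techniques = provider_techniques(
    "<provider-service-arn>",                            # placeholder, not a real ARN
    provider_configuration={"exampleSetting": "value"},  # hypothetical keys
    intermediate_s3_path="s3://example-bucket/intermediate/",
)
```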

type incrementalRunConfig:

dict

param incrementalRunConfig:

The incremental run configuration for the ID mapping workflow.

  • incrementalRunType (string) --

    The incremental run type for an ID mapping workflow.

    The only supported value is ON_DEMAND, which runs the ID mapping workflow when it is manually triggered through the StartIdMappingJob API.
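Because the update is described as the same operation as CreateIdMappingWorkflow sent as a PUT, the sketch below re-submits a workflow's existing settings with incrementalRunConfig added. This assumes the update replaces the stored settings rather than merging them. The helper, the carried-over field list, and the sample settings are hypothetical. The tags field is left out because the update request syntax above has no tags parameter. RecordingClient is an offline stand-in; in practice you would pass boto3.client("entityresolution") and could read the current settings with get_id_mapping_workflow.

```python
# Fields carried over from the existing workflow (assumes the update replaces settings).
CARRIED_FIELDS = ("description", "inputSourceConfig", "outputSourceConfig",
                  "idMappingTechniques", "roleArn")


def enable_on_demand_incremental(client, workflow_name, current):
    """Re-submit `current` settings with on-demand incremental runs enabled."""
    kwargs = {"workflowName": workflow_name}
    for field in CARRIED_FIELDS:
        if current.get(field) is not None:
            kwargs[field] = current[field]
    kwargs["incrementalRunConfig"] = {"incrementalRunType": "ON_DEMAND"}
    return client.update_id_mapping_workflow(**kwargs)


class RecordingClient:
    """Offline stand-in that echoes the request instead of calling AWS."""

    def update_id_mapping_workflow(self, **kwargs):
        self.last_request = kwargs
        return kwargs


# Placeholder, abbreviated settings; workflowArn is read-only and is not forwarded.
current = {
    "workflowArn": "<workflow-arn>",
    "inputSourceConfig": [{"inputSourceARN": "arn:aws:glue:us-east-1:123456789012:table/example_db/crm_customers",
                           "type": "SOURCE"}],
    "idMappingTechniques": {"idMappingType": "PROVIDER",
                            "providerProperties": {"providerServiceArn": "<provider-service-arn>"}},
}
response = enable_on_demand_incremental(RecordingClient(), "my-id-mapping-workflow", current)
```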

type roleArn:

string

param roleArn:

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'type': 'SOURCE'|'TARGET'
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string'
        },
    ],
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER'|'RULE_BASED',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'ruleDefinitionType': 'SOURCE'|'TARGET',
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET'
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'ON_DEMAND'
    },
    'roleArn': 'string'
}

Response Structure

  • (dict) --

    • workflowName (string) --

      The name of the workflow.

    • workflowArn (string) --

      The Amazon Resource Name (ARN) of the ID mapping workflow.

    • description (string) --

      A description of the workflow.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields inputSourceARN, schemaName, and type.

      • (dict) --

        An object containing inputSourceARN, schemaName, and type.

        • inputSourceARN (string) --

          A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

        • type (string) --

          The type of ID namespace. There are two types: SOURCE and TARGET.

          A SOURCE namespace contains the configuration for the sourceId data that is processed in an ID mapping workflow.

          A TARGET namespace contains the configuration of the targetId to which all sourceIds resolve.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields outputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

        • KMSArn (string) --

          The ARN of a customer managed KMS key for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • idMappingTechniques (dict) --

      An object which defines the ID mapping technique and any additional configurations.

      • idMappingType (string) --

        The type of ID mapping.

      • ruleBasedProperties (dict) --

        An object which defines any additional configurations required by rule-based matching.

        • rules (list) --

          The rules that can be used for ID mapping.

          • (dict) --

            An object containing the ruleName and matchingKeys.

            • ruleName (string) --

              A name for the matching rule.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

        • ruleDefinitionType (string) --

          The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.

        • attributeMatchingModel (string) --

          The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.

          If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.

          If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.

        • recordMatchingModel (string) --

          The type of matching record that is allowed to be used in an ID mapping workflow.

          If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.

          If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it is being processed. Your information isn't saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • incrementalRunConfig (dict) --

      The incremental run configuration of the updated ID mapping workflow.

      • incrementalRunType (string) --

        The incremental run type for an ID mapping workflow.

        The only supported value is ON_DEMAND, which runs the ID mapping workflow when it is manually triggered through the StartIdMappingJob API.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
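The StartIdMappingJob jobType description ties the default job type to whether the workflow has an incrementalRunConfig. The sketch below applies that rule to a response shaped like the one above. The helper name and sample dicts are hypothetical. The rule itself comes from the jobType documentation and is not independently verified here.

```python
def default_job_type(workflow):
    """Job type StartIdMappingJob falls back to when jobType is omitted."""
    # Per the StartIdMappingJob jobType docs: INCREMENTAL if the workflow has
    # an incrementalRunConfig, otherwise BATCH.
    return "INCREMENTAL" if workflow.get("incrementalRunConfig") else "BATCH"


updated = {"workflowName": "my-id-mapping-workflow",
           "incrementalRunConfig": {"incrementalRunType": "ON_DEMAND"}}
print(default_job_type(updated))                              # INCREMENTAL
print(default_job_type({"workflowName": "legacy-workflow"}))  # BATCH
```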