AWS EntityResolution

2023/10/16 - AWS EntityResolution - 11 new, 7 updated API methods

Changes  This launch expands our matching techniques to include provider-based matching to help customers match, link, and enhance records with minimal data movement. With data service providers, we have removed the need for customers to build bespoke integrations.

DeleteIdMappingWorkflow (new)

Deletes the IdMappingWorkflow with a given name. This operation will succeed even if a workflow with the given name does not exist.

See also: AWS API Documentation

Request Syntax

client.delete_id_mapping_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be deleted.

rtype:

dict

returns:

Response Syntax

{
    'message': 'string'
}

Response Structure

  • (dict) --

    • message (string) --

      A successful operation message.
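
A minimal sketch of calling this operation. It assumes `client` was created with `boto3.client("entityresolution")`; the helper name `delete_workflow` is hypothetical and not part of the SDK.

```python
def delete_workflow(client, workflow_name):
    """Delete an ID mapping workflow and return the service's message.

    `client` is assumed to be a boto3 Entity Resolution client, e.g.
    boto3.client("entityresolution").
    """
    response = client.delete_id_mapping_workflow(workflowName=workflow_name)
    # The response dict carries a single 'message' key.
    return response["message"]
```

Because the operation succeeds even when no workflow with that name exists, the returned message alone may not tell you whether anything was actually removed.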

ListIdMappingJobs (new)

Lists all ID mapping jobs for a given workflow.

See also: AWS API Documentation

Request Syntax

client.list_id_mapping_jobs(
    maxResults=123,
    nextToken='string',
    workflowName='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous API call.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be retrieved.

rtype:

dict

returns:

Response Syntax

{
    'jobs': [
        {
            'endTime': datetime(2015, 1, 1),
            'jobId': 'string',
            'startTime': datetime(2015, 1, 1),
            'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED'
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • jobs (list) --

      A list of JobSummary objects.

      • (dict) --

        An object containing the JobId, Status, StartTime, and EndTime of a job.

        • endTime (datetime) --

          The time at which the job finished.

        • jobId (string) --

          The ID of the job.

        • startTime (datetime) --

          The time at which the job was started.

        • status (string) --

          The current status of the job.

    • nextToken (string) --

      The pagination token from the previous API call.
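
The `nextToken` round trip can be sketched as a small generator. This assumes `client` is a boto3 Entity Resolution client and that a missing or empty `nextToken` in the response means there are no further pages; `iter_id_mapping_jobs` is a hypothetical helper name.

```python
def iter_id_mapping_jobs(client, workflow_name, page_size=None):
    """Yield every job summary for a workflow, following nextToken."""
    kwargs = {"workflowName": workflow_name}
    if page_size is not None:
        kwargs["maxResults"] = page_size
    while True:
        page = client.list_id_mapping_jobs(**kwargs)
        # Each page holds a list of job summaries under 'jobs'.
        yield from page.get("jobs", [])
        token = page.get("nextToken")
        if not token:
            break
        # Pass the token from this response into the next request.
        kwargs["nextToken"] = token
```

A caller could then write `failed = [j for j in iter_id_mapping_jobs(client, "my-workflow") if j["status"] == "FAILED"]`, where `"my-workflow"` is a placeholder name.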

GetIdMappingWorkflow (new)

Returns the IdMappingWorkflow with a given name, if it exists.

See also: AWS API Documentation

Request Syntax

client.get_id_mapping_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'createdAt': datetime(2015, 1, 1),
    'description': 'string',
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER',
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        }
    },
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'outputS3Path': 'string'
        },
    ],
    'roleArn': 'string',
    'tags': {
        'string': 'string'
    },
    'updatedAt': datetime(2015, 1, 1),
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • createdAt (datetime) --

      The timestamp of when the workflow was created.

    • description (string) --

      A description of the workflow.

    • idMappingTechniques (dict) --

      An object which defines the idMappingType and the providerProperties.

      • idMappingType (string) --

        The type of ID mapping.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN and SchemaName.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, system will use an Entity Resolution managed KMS key.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access resources on your behalf.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

    • updatedAt (datetime) --

      The timestamp of when the workflow was last updated.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.

    • workflowName (string) --

      The name of the workflow.

UpdateIdMappingWorkflow (new)

Updates an existing IdMappingWorkflow. This method is identical to CreateIdMappingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the IdMappingWorkflow must already exist for the method to succeed.

See also: AWS API Documentation

Request Syntax

client.update_id_mapping_workflow(
    description='string',
    idMappingTechniques={
        'idMappingType': 'PROVIDER',
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        }
    },
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'outputS3Path': 'string'
        },
    ],
    roleArn='string',
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type idMappingTechniques:

dict

param idMappingTechniques:

[REQUIRED]

An object which defines the idMappingType and the providerProperties.

  • idMappingType (string) -- [REQUIRED]

    The type of ID mapping.

  • providerProperties (dict) -- [REQUIRED]

    An object which defines any additional configurations required by the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • providerConfiguration (:ref:`document<document>`) --

      The required configuration fields to use with the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN and SchemaName.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema to be retrieved.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.

  • (dict) --

    The output source for the ID mapping workflow.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, system will use an Entity Resolution managed KMS key.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access resources on your behalf.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER',
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        }
    },
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'outputS3Path': 'string'
        },
    ],
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • idMappingTechniques (dict) --

      An object which defines the idMappingType and the providerProperties.

      • idMappingType (string) --

        The type of ID mapping.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN and SchemaName.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, system will use an Entity Resolution managed KMS key.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access resources on your behalf.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.

    • workflowName (string) --

      The name of the workflow.
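
Since the request requires the full configuration (idMappingTechniques, inputSourceConfig, outputSourceConfig, roleArn, and workflowName) even when only one field changes, one possible approach is to read the current workflow with GetIdMappingWorkflow and resend it with the change applied. The sketch below assumes `client` is a boto3 Entity Resolution client; `UPDATE_FIELDS` and `update_description` are hypothetical names.

```python
# Parameters accepted by update_id_mapping_workflow, per the request syntax.
UPDATE_FIELDS = (
    "description",
    "idMappingTechniques",
    "inputSourceConfig",
    "outputSourceConfig",
    "roleArn",
    "workflowName",
)


def update_description(client, workflow_name, new_description):
    """Change only the description of an existing ID mapping workflow."""
    current = client.get_id_mapping_workflow(workflowName=workflow_name)
    # Keep only the keys the update call accepts; this drops createdAt,
    # updatedAt, tags, workflowArn, and any response metadata.
    request = {key: current[key] for key in UPDATE_FIELDS if key in current}
    request["description"] = new_description
    return client.update_id_mapping_workflow(**request)
```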

GetProviderService (new)

Returns the ProviderService of a given name.

See also: AWS API Documentation

Request Syntax

client.get_provider_service(
    providerName='string',
    providerServiceName='string'
)
type providerName:

string

param providerName:

[REQUIRED]

The name of the provider. This name is typically the company name.

type providerServiceName:

string

param providerServiceName:

[REQUIRED]

The ARN (Amazon Resource Name) of the product that the provider service provides.

rtype:

dict

returns:

Response Syntax

{
    'anonymizedOutput': True|False,
    'providerConfigurationDefinition': {...}|[...]|123|123.4|'string'|True|None,
    'providerEndpointConfiguration': {
        'marketplaceConfiguration': {
            'assetId': 'string',
            'dataSetId': 'string',
            'listingId': 'string',
            'revisionId': 'string'
        }
    },
    'providerEntityOutputDefinition': {...}|[...]|123|123.4|'string'|True|None,
    'providerIntermediateDataAccessConfiguration': {
        'awsAccountIds': [
            'string',
        ],
        'requiredBucketActions': [
            'string',
        ]
    },
    'providerName': 'string',
    'providerServiceArn': 'string',
    'providerServiceDisplayName': 'string',
    'providerServiceName': 'string',
    'providerServiceType': 'ASSIGNMENT'|'ID_MAPPING'
}

Response Structure

  • (dict) --

    • anonymizedOutput (boolean) --

      Specifies whether output data from the provider is anonymized. A value of TRUE means the output will be anonymized and you can't relate the data that comes back from the provider to the identifying input. A value of FALSE means the output won't be anonymized and you can relate the data that comes back from the provider to your source data.

    • providerConfigurationDefinition (:ref:`document<document>`) --

      The definition of the provider configuration.

    • providerEndpointConfiguration (dict) --

      The required configuration fields to use with the provider service.

      • marketplaceConfiguration (dict) --

        The identifiers of the provider service, from Data Exchange.

        • assetId (string) --

          The asset ID on Data Exchange.

        • dataSetId (string) --

          The dataset ID on Data Exchange.

        • listingId (string) --

          The listing ID on Data Exchange.

        • revisionId (string) --

          The revision ID on Data Exchange.

    • providerEntityOutputDefinition (:ref:`document<document>`) --

      The definition of the provider entity output.

    • providerIntermediateDataAccessConfiguration (dict) --

      The Amazon Web Services accounts and the S3 permissions that are required by some providers to create an S3 bucket for intermediate data storage.

      • awsAccountIds (list) --

        The Amazon Web Services accounts that the provider can use to read or write data into the customer's intermediate S3 bucket.

        • (string) --

      • requiredBucketActions (list) --

        The S3 bucket actions that the provider requires permission for.

        • (string) --

    • providerName (string) --

      The name of the provider. This name is typically the company name.

    • providerServiceArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the provider service.

    • providerServiceDisplayName (string) --

      The display name of the provider service.

    • providerServiceName (string) --

      The name of the product that the provider service provides.

    • providerServiceType (string) --

      The type of provider service.
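
As an illustration of how providerIntermediateDataAccessConfiguration might be used, the sketch below turns it into a single S3 bucket policy statement. It is built on assumptions that this page does not confirm: that the entries in `requiredBucketActions` are IAM action names (such as `s3:GetObject`), that the accounts live in the standard `aws` partition, and that access is granted on both the bucket and its objects. The function name `intermediate_bucket_statement` is hypothetical.

```python
def intermediate_bucket_statement(provider_service, bucket_name):
    """Build one bucket policy statement from a get_provider_service response.

    Returns None when the provider does not need intermediate bucket access.
    """
    access = provider_service.get("providerIntermediateDataAccessConfiguration")
    if not access:
        return None
    return {
        "Effect": "Allow",
        # Assumption: each account is granted access through its root principal.
        "Principal": {
            "AWS": [
                f"arn:aws:iam::{account_id}:root"
                for account_id in access.get("awsAccountIds", [])
            ]
        },
        # Assumption: the listed actions are already valid IAM action names.
        "Action": list(access.get("requiredBucketActions", [])),
        "Resource": [
            f"arn:aws:s3:::{bucket_name}",
            f"arn:aws:s3:::{bucket_name}/*",
        ],
    }
```

Review the resulting statement against the provider's own onboarding guidance before attaching it to a bucket.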

ListProviderServices (new)

Returns a list of all the ProviderServices that are available in this Amazon Web Services Region.

See also: AWS API Documentation

Request Syntax

client.list_provider_services(
    maxResults=123,
    nextToken='string',
    providerName='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous API call.

type providerName:

string

param providerName:

The name of the provider. This name is typically the company name.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'providerServiceSummaries': [
        {
            'providerName': 'string',
            'providerServiceArn': 'string',
            'providerServiceDisplayName': 'string',
            'providerServiceName': 'string',
            'providerServiceType': 'ASSIGNMENT'|'ID_MAPPING'
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token from the previous API call.

    • providerServiceSummaries (list) --

      A list of ProviderServices objects.

      • (dict) --

        A list of ProviderService objects, each of which contains the fields providerName, providerServiceArn, providerServiceName, and providerServiceType.

        • providerName (string) --

          The name of the provider. This name is typically the company name.

        • providerServiceArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the providerService.

        • providerServiceDisplayName (string) --

          The display name of the provider service.

        • providerServiceName (string) --

          The name of the product that the provider service provides.

        • providerServiceType (string) --

          The type of provider service.
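
A short sketch of using this listing to find the provider services that can back an ID mapping workflow. It assumes `client` is a boto3 Entity Resolution client; `find_id_mapping_service_arns` is a hypothetical helper name.

```python
def find_id_mapping_service_arns(client, provider_name=None):
    """Return the ARNs of all provider services of type ID_MAPPING."""
    kwargs = {}
    if provider_name is not None:
        kwargs["providerName"] = provider_name
    arns = []
    while True:
        page = client.list_provider_services(**kwargs)
        for summary in page.get("providerServiceSummaries", []):
            # providerServiceType is either 'ASSIGNMENT' or 'ID_MAPPING'.
            if summary.get("providerServiceType") == "ID_MAPPING":
                arns.append(summary["providerServiceArn"])
        token = page.get("nextToken")
        if not token:
            return arns
        kwargs["nextToken"] = token
```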

ListIdMappingWorkflows (new)

Returns a list of all the IdMappingWorkflows that have been created for an Amazon Web Services account.

See also: AWS API Documentation

Request Syntax

client.list_id_mapping_workflows(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous API call.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'workflowSummaries': [
        {
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'workflowArn': 'string',
            'workflowName': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token from the previous API call.

    • workflowSummaries (list) --

      A list of IdMappingWorkflowSummary objects.

      • (dict) --

        A list of IdMappingWorkflowSummary objects, each of which contains the fields WorkflowName, WorkflowArn, CreatedAt, and UpdatedAt.

        • createdAt (datetime) --

          The timestamp of when the workflow was created.

        • updatedAt (datetime) --

          The timestamp of when the workflow was last updated.

        • workflowArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.

        • workflowName (string) --

          The name of the workflow.

UpdateSchemaMapping (new)

Updates a schema mapping.

See also: AWS API Documentation

Request Syntax

client.update_schema_mapping(
    description='string',
    mappedInputFields=[
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'subType': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID'
        },
    ],
    schemaName='string'
)
type description:

string

param description:

A description of the schema.

type mappedInputFields:

list

param mappedInputFields:

[REQUIRED]

A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

  • (dict) --

    An object containing FieldName, Type, GroupName, and MatchKey.

    • fieldName (string) -- [REQUIRED]

      A string containing the field name.

    • groupName (string) --

      Instructs Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

    • matchKey (string) --

      A key that allows grouping of multiple input attributes into a unified matching group. For example, let's consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.

    • subType (string) --

      The subtype of the attribute, selected from a list of values.

    • type (string) -- [REQUIRED]

      The type of the attribute, selected from a list of values.

type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema. There can't be multiple SchemaMappings with the same name.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'mappedInputFields': [
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'subType': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID'
        },
    ],
    'schemaArn': 'string',
    'schemaName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the schema.

    • mappedInputFields (list) --

      A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

      • (dict) --

        An object containing FieldName, Type, GroupName, and MatchKey.

        • fieldName (string) --

          A string containing the field name.

        • groupName (string) --

          Instructs Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

        • matchKey (string) --

          A key that allows grouping of multiple input attributes into a unified matching group. For example, let's consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.

        • subType (string) --

          The subtype of the attribute, selected from a list of values.

        • type (string) --

          The type of the attribute, selected from a list of values.

    • schemaArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

    • schemaName (string) --

      The name of the schema.
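
The groupName and matchKey descriptions can be made concrete with a small request. The sketch below reuses the column names from those descriptions (first_name, middle_name, last_name, business_address, shipping_address); the `customer_id` column, the group name `full_name`, the match key `Name`, and both helper names are hypothetical. `client` is assumed to be a boto3 Entity Resolution client.

```python
def build_mapped_input_fields():
    """Return an illustrative mappedInputFields list."""
    fields = [{"fieldName": "customer_id", "type": "UNIQUE_ID"}]
    # Name parts share a groupName so they are combined into one value.
    for column, field_type in (
        ("first_name", "NAME_FIRST"),
        ("middle_name", "NAME_MIDDLE"),
        ("last_name", "NAME_LAST"),
    ):
        fields.append(
            {
                "fieldName": column,
                "type": field_type,
                "groupName": "full_name",
                "matchKey": "Name",
            }
        )
    # Both address columns share a matchKey so records match across them.
    for column in ("business_address", "shipping_address"):
        fields.append({"fieldName": column, "type": "ADDRESS", "matchKey": "Address"})
    return fields


def update_schema(client, schema_name, description=None):
    """Send the illustrative field list to update_schema_mapping."""
    kwargs = {
        "schemaName": schema_name,
        "mappedInputFields": build_mapped_input_fields(),
    }
    if description is not None:
        kwargs["description"] = description
    return client.update_schema_mapping(**kwargs)
```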

StartIdMappingJob (new)

Starts the IdMappingJob of a workflow. The workflow must have previously been created using the CreateIdMappingWorkflow endpoint.

See also: AWS API Documentation

Request Syntax

client.start_id_mapping_job(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the ID mapping workflow for which to start the job.

rtype:

dict

returns:

Response Syntax

{
    'jobId': 'string'
}

Response Structure

  • (dict) --

    • jobId (string) --

      The ID of the job.

GetIdMappingJob (new)

Gets the status, metrics, and errors (if there are any) that are associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_id_mapping_job(
    jobId='string',
    workflowName='string'
)
type jobId:

string

param jobId:

[REQUIRED]

The ID of the job.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'endTime': datetime(2015, 1, 1),
    'errorDetails': {
        'errorMessage': 'string'
    },
    'jobId': 'string',
    'metrics': {
        'inputRecords': 123,
        'recordsNotProcessed': 123,
        'totalRecordsProcessed': 123
    },
    'startTime': datetime(2015, 1, 1),
    'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED'
}

Response Structure

  • (dict) --

    • endTime (datetime) --

      The time at which the job finished.

    • errorDetails (dict) --

      An object containing an error message, if there was an error.

      • errorMessage (string) --

        The error message from the job, if there is one.

    • jobId (string) --

      The ID of the job.

    • metrics (dict) --

      Metrics associated with the execution, specifically the number of input records, the total records processed, and the records that were not processed.

      • inputRecords (integer) --

        The total number of input records.

      • recordsNotProcessed (integer) --

        The total number of records that did not get processed.

      • totalRecordsProcessed (integer) --

        The total number of records processed.

    • startTime (datetime) --

      The time at which the job was started.

    • status (string) --

      The current status of the job.
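
Together with StartIdMappingJob, this operation supports a simple start-and-poll loop. The sketch below assumes `client` is a boto3 Entity Resolution client and treats SUCCEEDED and FAILED as the only terminal statuses, which is an inference from the four status values listed above. The helper name, the polling interval, and the injectable `sleep` parameter are hypothetical.

```python
import time

# Assumption: jobs in these states will not change status again.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def run_id_mapping_job(client, workflow_name, poll_seconds=30, sleep=time.sleep):
    """Start a job for the workflow and block until it finishes."""
    job_id = client.start_id_mapping_job(workflowName=workflow_name)["jobId"]
    while True:
        job = client.get_id_mapping_job(workflowName=workflow_name, jobId=job_id)
        if job["status"] in TERMINAL_STATUSES:
            # On failure, job.get("errorDetails") may hold an errorMessage.
            return job
        sleep(poll_seconds)
```

This loop has no upper bound on waiting time; a production caller would likely add a timeout.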

CreateIdMappingWorkflow (new)

Creates an IdMappingWorkflow object which stores the configuration of the data processing job to be run. Each IdMappingWorkflow must have a unique workflow name. To modify an existing workflow, use the UpdateIdMappingWorkflow API.

See also: AWS API Documentation

Request Syntax

client.create_id_mapping_workflow(
    description='string',
    idMappingTechniques={
        'idMappingType': 'PROVIDER',
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        }
    },
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'outputS3Path': 'string'
        },
    ],
    roleArn='string',
    tags={
        'string': 'string'
    },
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type idMappingTechniques:

dict

param idMappingTechniques:

[REQUIRED]

An object which defines the idMappingType and the providerProperties.

  • idMappingType (string) -- [REQUIRED]

    The type of ID mapping.

  • providerProperties (dict) -- [REQUIRED]

    An object which defines any additional configurations required by the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • providerConfiguration (:ref:`document<document>`) --

      The required configuration fields to use with the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN and SchemaName.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema to be retrieved.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of IdMappingWorkflowOutputSource objects, each of which contains fields OutputS3Path and KMSArn.

  • (dict) --

    The output source for the ID mapping workflow.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, system will use an Entity Resolution managed KMS key.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow. There can't be multiple IdMappingWorkflows with the same name.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'idMappingTechniques': {
        'idMappingType': 'PROVIDER',
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        }
    },
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'outputS3Path': 'string'
        },
    ],
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • idMappingTechniques (dict) --

      An object which defines the idMappingType and the providerProperties.

      • idMappingType (string) --

        The type of ID mapping.

      • providerProperties (dict) --

        An object which defines any additional configurations required by the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN and SchemaName.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of IdMappingWorkflowOutputSource objects, each of which contains fields OutputS3Path and KMSArn.

      • (dict) --

        The output source for the ID mapping workflow.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the IDMappingWorkflow.

    • workflowName (string) --

      The name of the workflow.
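
A response of the shape shown above can be inspected with ordinary dictionary access. The following is a minimal sketch, not part of the API: summarize_id_mapping_outputs is a hypothetical helper and the sample values are made up. It assumes only the field names listed in the Response Structure.

```python
def summarize_id_mapping_outputs(response):
    """Return (outputS3Path, encryption key description) pairs from a response."""
    summary = []
    for output in response.get('outputSourceConfig', []):
        # KMSArn is optional; when it is absent, the documentation states that
        # an Entity Resolution managed KMS key is used.
        key = output.get('KMSArn') or 'Entity Resolution managed key'
        summary.append((output['outputS3Path'], key))
    return summary


# Hypothetical sample shaped like the Response Syntax above.
sample = {
    'outputSourceConfig': [
        {'outputS3Path': 's3://DOC-EXAMPLE-BUCKET/output/'},
    ],
    'workflowName': 'example-id-mapping',
}
print(summarize_id_mapping_outputs(sample))
```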

CreateMatchingWorkflow (updated)
Changes (both)
{'resolutionTechniques': {'providerProperties': {'intermediateSourceConfiguration': {'intermediateS3Path': 'string'},
                                                 'providerConfiguration': {},
                                                 'providerServiceArn': 'string'},
                          'resolutionType': {'PROVIDER'}}}

Creates a MatchingWorkflow object, which stores the configuration of the data processing job to be run. A MatchingWorkflow with the same name must not already exist. To modify an existing workflow, use the UpdateMatchingWorkflow API.

See also: AWS API Documentation

Request Syntax

client.create_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    tags={
        'string': 'string'
    },
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type incrementalRunConfig:

dict

param incrementalRunConfig:

An object which defines an incremental run type and has only incrementalRunType as a field.

  • incrementalRunType (string) --

    The type of incremental run. It takes only one value: IMMEDIATE.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN, SchemaName, and ApplyNormalization.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema to be retrieved.
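
The phone-number example given for applyNormalization can be reproduced locally to see the target format. This is only an illustrative sketch: normalize_us_phone is a hypothetical function, and the rules Entity Resolution actually applies are not specified here. It covers just the ten-digit case from the description above.

```python
import re


def normalize_us_phone(raw):
    """Format a ten-digit number as (123)-456-7890; return other input unchanged."""
    digits = re.sub(r'\D', '', raw)  # drop everything that is not a digit
    if len(digits) != 10:
        # Assumption: values that are not ten digits long are left as they are.
        return raw
    return f'({digits[:3]})-{digits[3:6]}-{digits[6:]}'


print(normalize_us_phone('1234567890'))  # (123)-456-7890
```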

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

  • (dict) --

    An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • output (list) -- [REQUIRED]

      A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

      • (dict) --

        An object with the fields Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.

        • hashed (boolean) --

          Enables the ability to hash the column values in the output.

        • name (string) -- [REQUIRED]

          A name of a column to be written to the output. This must be an InputField name in the schema mapping.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object which defines the resolutionType, the ruleBasedProperties, and the providerProperties.

  • providerProperties (dict) --

    The properties of the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • providerConfiguration (:ref:`document<document>`) --

      The required configuration fields to use with the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

  • resolutionType (string) -- [REQUIRED]

    The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.

  • ruleBasedProperties (dict) --

    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

    • attributeMatchingModel (string) -- [REQUIRED]

      The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

    • rules (list) -- [REQUIRED]

      A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

      • (dict) --

        An object containing RuleName and MatchingKeys.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.
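
The statement that two records match a rule when all of its MatchingKeys match can be pictured with a toy comparison. This is a sketch of the stated semantics only, not the service's implementation: records_match and the sample records are hypothetical, records are plain dicts keyed by match key, and normalization and the attributeMatchingModel setting are ignored.

```python
def records_match(rule, record_a, record_b):
    """Return True when both records agree on every key listed in the rule."""
    for key in rule['matchingKeys']:
        value_a = record_a.get(key)
        value_b = record_b.get(key)
        # Assumption: a missing value never counts as a match.
        if value_a is None or value_a != value_b:
            return False
    return True


rule = {'ruleName': 'name-and-email', 'matchingKeys': ['Name', 'Email']}
a = {'Name': 'Jane Doe', 'Email': 'jane@example.com', 'Phone': '1234567890'}
b = {'Name': 'Jane Doe', 'Email': 'jane@example.com', 'Phone': '0987654321'}
c = {'Name': 'Jane Doe', 'Email': 'j.doe@example.com', 'Phone': '1234567890'}

print(records_match(rule, a, b))  # True: Name and Email both agree
print(records_match(rule, a, c))  # False: Email differs
```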

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow. There can't be multiple MatchingWorkflows with the same name.
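
As a rough illustration of the parameters above, the sketch below assembles the keyword arguments for a PROVIDER-type workflow. build_provider_workflow_request is a hypothetical helper, every value is a placeholder, and leaving out ruleBasedProperties for a provider workflow is an assumption based on that field not being marked [REQUIRED].

```python
def build_provider_workflow_request(workflow_name, role_arn, input_source_arn,
                                    schema_name, output_s3_path, output_columns,
                                    provider_service_arn,
                                    intermediate_s3_path=None,
                                    provider_configuration=None):
    """Assemble keyword arguments for a PROVIDER-type matching workflow."""
    # providerServiceArn is the only field marked [REQUIRED] directly under
    # providerProperties; the other two are added only when supplied.
    provider_properties = {'providerServiceArn': provider_service_arn}
    if intermediate_s3_path is not None:
        provider_properties['intermediateSourceConfiguration'] = {
            'intermediateS3Path': intermediate_s3_path,
        }
    if provider_configuration is not None:
        provider_properties['providerConfiguration'] = provider_configuration
    return {
        'workflowName': workflow_name,
        'roleArn': role_arn,
        'inputSourceConfig': [
            {'inputSourceARN': input_source_arn, 'schemaName': schema_name},
        ],
        'outputSourceConfig': [
            {
                'outputS3Path': output_s3_path,
                'output': [{'name': column} for column in output_columns],
            },
        ],
        'resolutionTechniques': {
            'resolutionType': 'PROVIDER',
            'providerProperties': provider_properties,
        },
    }


# All values below are placeholders, not real resources.
request = build_provider_workflow_request(
    workflow_name='example-provider-workflow',
    role_arn='EXAMPLE-ROLE-ARN',
    input_source_arn='EXAMPLE-GLUE-TABLE-ARN',
    schema_name='example-schema',
    output_s3_path='s3://DOC-EXAMPLE-BUCKET/output/',
    output_columns=['email', 'phone'],
    provider_service_arn='EXAMPLE-PROVIDER-SERVICE-ARN',
    intermediate_s3_path='s3://DOC-EXAMPLE-BUCKET/intermediate/',
)
# With credentials configured, the request could be sent as:
#   import boto3
#   client = boto3.client('entityresolution')
#   response = client.create_matching_workflow(**request)
print(request['resolutionTechniques'])
```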

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

          • (dict) --

            An object with the fields Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType, the ruleBasedProperties, and the providerProperties.

      • providerProperties (dict) --

        The properties of the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

      • resolutionType (string) --

        The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

    • workflowName (string) --

      The name of the workflow.

CreateSchemaMapping (updated)
Changes (both)
{'mappedInputFields': {'subType': 'string', 'type': {'PROVIDER_ID'}}}

Creates a schema mapping, which defines the schema of the input customer records table. The SchemaMapping also provides Entity Resolution with some metadata about the table, such as the attribute types of the columns and which columns to match on.

See also: AWS API Documentation

Request Syntax

client.create_schema_mapping(
    description='string',
    mappedInputFields=[
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'subType': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID'
        },
    ],
    schemaName='string',
    tags={
        'string': 'string'
    }
)
type description:

string

param description:

A description of the schema.

type mappedInputFields:

list

param mappedInputFields:

[REQUIRED]

A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

  • (dict) --

    An object containing FieldName, Type, SubType, GroupName, and MatchKey.

    • fieldName (string) -- [REQUIRED]

      A string containing the field name.

    • groupName (string) --

      Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

    • matchKey (string) --

      A key that allows grouping of multiple input attributes into a unified matching group. For example, let's consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.

    • subType (string) --

      The subtype of the attribute, selected from a list of values.

    • type (string) -- [REQUIRED]

      The type of the attribute, selected from a list of values.

type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema. There can't be multiple SchemaMappings with the same name.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --
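
To make the mappedInputFields parameter more concrete, the sketch below builds entries for the business_address and shipping_address scenario from the matchKey description. mapped_field is a hypothetical helper, the column names are examples, and whether the service accepts this exact combination has not been verified; the type values are copied from the Request Syntax above.

```python
# Attribute types copied from the 'type' enum in the Request Syntax above.
ATTRIBUTE_TYPES = {
    'NAME', 'NAME_FIRST', 'NAME_MIDDLE', 'NAME_LAST', 'ADDRESS',
    'ADDRESS_STREET1', 'ADDRESS_STREET2', 'ADDRESS_STREET3', 'ADDRESS_CITY',
    'ADDRESS_STATE', 'ADDRESS_COUNTRY', 'ADDRESS_POSTALCODE', 'PHONE',
    'PHONE_NUMBER', 'PHONE_COUNTRYCODE', 'EMAIL_ADDRESS', 'UNIQUE_ID',
    'DATE', 'STRING', 'PROVIDER_ID',
}


def mapped_field(field_name, attribute_type, match_key=None, group_name=None,
                 sub_type=None):
    """Build one mappedInputFields entry, leaving out optional keys that are unset."""
    if attribute_type not in ATTRIBUTE_TYPES:
        raise ValueError(f'unknown attribute type: {attribute_type}')
    field = {'fieldName': field_name, 'type': attribute_type}
    if match_key is not None:
        field['matchKey'] = match_key
    if group_name is not None:
        field['groupName'] = group_name
    if sub_type is not None:
        field['subType'] = sub_type
    return field


fields = [
    # Both address columns share the MatchKey 'Address', as in the description.
    mapped_field('business_address', 'ADDRESS', match_key='Address'),
    mapped_field('shipping_address', 'ADDRESS', match_key='Address'),
    mapped_field('email', 'EMAIL_ADDRESS', match_key='Email'),
    # No MatchKey: not used for matching, but still included in the output table.
    mapped_field('loyalty_tier', 'STRING'),
]
request = {'schemaName': 'example-schema', 'mappedInputFields': fields}
# With credentials configured: client.create_schema_mapping(**request)
print(len(request['mappedInputFields']))
```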

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'mappedInputFields': [
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'subType': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID'
        },
    ],
    'schemaArn': 'string',
    'schemaName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the schema.

    • mappedInputFields (list) --

      A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

      • (dict) --

        An object containing FieldName, Type, SubType, GroupName, and MatchKey.

        • fieldName (string) --

          A string containing the field name.

        • groupName (string) --

          Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

        • matchKey (string) --

          A key that allows grouping of multiple input attributes into a unified matching group. For example, let's consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.

        • subType (string) --

          The subtype of the attribute, selected from a list of values.

        • type (string) --

          The type of the attribute, selected from a list of values.

    • schemaArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

    • schemaName (string) --

      The name of the schema.

GetMatchingWorkflow (updated)
Changes (response)
{'resolutionTechniques': {'providerProperties': {'intermediateSourceConfiguration': {'intermediateS3Path': 'string'},
                                                 'providerConfiguration': {},
                                                 'providerServiceArn': 'string'},
                          'resolutionType': {'PROVIDER'}}}

Returns the MatchingWorkflow with a given name, if it exists.

See also: AWS API Documentation

Request Syntax

client.get_matching_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'createdAt': datetime(2015, 1, 1),
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'tags': {
        'string': 'string'
    },
    'updatedAt': datetime(2015, 1, 1),
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • createdAt (datetime) --

      The timestamp of when the workflow was created.

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

          • (dict) --

            An object with the fields Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType, the ruleBasedProperties, and the providerProperties.

      • providerProperties (dict) --

        The properties of the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

      • resolutionType (string) --

        The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access resources on your behalf.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

    • updatedAt (datetime) --

      The timestamp of when the workflow was last updated.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

    • workflowName (string) --

      The name of the workflow.
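
Because resolutionType can now be PROVIDER, code that reads this response may need to branch on it. The sketch below shows one way to do that. describe_resolution is a hypothetical helper and the sample dicts are made-up fragments of the response shape above. It assumes that providerProperties is populated for PROVIDER workflows and ruleBasedProperties for RULE_MATCHING workflows.

```python
def describe_resolution(response):
    """Summarize the matching technique found in a GetMatchingWorkflow response."""
    techniques = response['resolutionTechniques']
    resolution_type = techniques['resolutionType']
    if resolution_type == 'PROVIDER':
        arn = techniques.get('providerProperties', {}).get('providerServiceArn')
        return f'PROVIDER via {arn}'
    if resolution_type == 'RULE_MATCHING':
        rules = techniques.get('ruleBasedProperties', {}).get('rules', [])
        names = ', '.join(rule['ruleName'] for rule in rules)
        return f'RULE_MATCHING with rules: {names}'
    # ML_MATCHING has no extra properties in the response syntax above.
    return resolution_type


# Hypothetical response fragments for illustration.
provider_response = {
    'resolutionTechniques': {
        'resolutionType': 'PROVIDER',
        'providerProperties': {'providerServiceArn': 'EXAMPLE-PROVIDER-SERVICE-ARN'},
    },
}
rule_response = {
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE',
            'rules': [{'ruleName': 'email-only', 'matchingKeys': ['Email']}],
        },
    },
}
print(describe_resolution(provider_response))
print(describe_resolution(rule_response))
```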

GetSchemaMapping (updated)
Changes (response)
{'hasWorkflows': 'boolean',
 'mappedInputFields': {'subType': 'string', 'type': {'PROVIDER_ID'}}}

Returns the SchemaMapping of a given name.

See also: AWS API Documentation

Request Syntax

client.get_schema_mapping(
    schemaName='string'
)
type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema to be retrieved.

rtype:

dict

returns:

Response Syntax

{
    'createdAt': datetime(2015, 1, 1),
    'description': 'string',
    'hasWorkflows': True|False,
    'mappedInputFields': [
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'subType': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID'
        },
    ],
    'schemaArn': 'string',
    'schemaName': 'string',
    'tags': {
        'string': 'string'
    },
    'updatedAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • createdAt (datetime) --

      The timestamp of when the SchemaMapping was created.

    • description (string) --

      A description of the schema.

    • hasWorkflows (boolean) --

      Specifies whether the schema mapping has been applied to a workflow.

    • mappedInputFields (list) --

      A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

      • (dict) --

        An object containing FieldName, Type, SubType, GroupName, and MatchKey.

        • fieldName (string) --

          A string containing the field name.

        • groupName (string) --

          Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

        • matchKey (string) --

          A key that allows grouping of multiple input attributes into a unified matching group. For example, let's consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.

        • subType (string) --

          The subtype of the attribute, selected from a list of values.

        • type (string) --

          The type of the attribute, selected from a list of values.

    • schemaArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

    • schemaName (string) --

      The name of the schema.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

    • updatedAt (datetime) --

      The timestamp of when the SchemaMapping was last updated.
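
The MatchKey behaviour described above can be inspected on the client side. The following is a minimal sketch, assuming a response dict shaped like the structure above; the helper name fields_by_match_key and the sample field values (including the type strings) are illustrative placeholders, not output from the service.

```python
from collections import defaultdict
from typing import Any, Dict, List, Tuple


def fields_by_match_key(
    mapped_input_fields: List[Dict[str, Any]],
) -> Tuple[Dict[str, List[str]], List[str]]:
    """Group field names by matchKey.

    Fields without a matchKey are returned separately: per the description
    above they are not used for matching but still appear in the output.
    """
    groups: Dict[str, List[str]] = defaultdict(list)
    output_only: List[str] = []
    for field in mapped_input_fields:
        key = field.get("matchKey")
        if key:
            groups[key].append(field["fieldName"])
        else:
            output_only.append(field["fieldName"])
    return dict(groups), output_only


# Hypothetical sample mirroring the business/shipping address example.
sample_fields = [
    {"fieldName": "business_address", "type": "ADDRESS", "matchKey": "Address"},
    {"fieldName": "shipping_address", "type": "ADDRESS", "matchKey": "Address"},
    {"fieldName": "loyalty_tier", "type": "STRING"},
]
matched, passthrough = fields_by_match_key(sample_fields)
print(matched)      # fields grouped under each matchKey
print(passthrough)  # fields carried to the output without matching
```

Passing response['mappedInputFields'] from a real call in place of sample_fields gives a quick view of which columns share a matching group.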

ListMatchingWorkflows (updated) Link ¶
Changes (response)
{'workflowSummaries': {'resolutionType': 'RULE_MATCHING | ML_MATCHING | PROVIDER'}}

Returns a list of all the MatchingWorkflows that have been created for an Amazon Web Services account.

See also: AWS API Documentation

Request Syntax

client.list_matching_workflows(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous API call.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'workflowSummaries': [
        {
            'createdAt': datetime(2015, 1, 1),
            'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
            'updatedAt': datetime(2015, 1, 1),
            'workflowArn': 'string',
            'workflowName': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token to pass to the next API call to retrieve the next page of results.

    • workflowSummaries (list) --

      A list of MatchingWorkflowSummary objects, each of which contains the fields WorkflowName, WorkflowArn, ResolutionType, CreatedAt, and UpdatedAt.

      • (dict) --

        An object containing WorkflowName, WorkflowArn, ResolutionType, CreatedAt, and UpdatedAt.

        • createdAt (datetime) --

          The timestamp of when the workflow was created.

        • resolutionType (string) --

          The method that has been specified for data matching, either using matching provided by Entity Resolution or through a provider service.

        • updatedAt (datetime) --

          The timestamp of when the workflow was last updated.

        • workflowArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

        • workflowName (string) --

          The name of the workflow.
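
Because results are paged with maxResults and nextToken, callers usually loop until no token is returned. The following is a minimal sketch, assuming client is a boto3 Entity Resolution client (or any object exposing list_matching_workflows with the syntax above); the helper names iter_matching_workflows and provider_workflow_names are hypothetical.

```python
from typing import Any, Dict, Iterator, List, Optional


def iter_matching_workflows(client: Any, page_size: int = 25) -> Iterator[Dict[str, Any]]:
    """Yield every workflow summary, following nextToken until it is absent."""
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {"maxResults": page_size}
        if next_token:
            kwargs["nextToken"] = next_token
        page = client.list_matching_workflows(**kwargs)
        yield from page.get("workflowSummaries", [])
        next_token = page.get("nextToken")
        if not next_token:
            break


def provider_workflow_names(client: Any) -> List[str]:
    """Return the names of workflows whose resolutionType is PROVIDER."""
    return [
        summary["workflowName"]
        for summary in iter_matching_workflows(client)
        if summary.get("resolutionType") == "PROVIDER"
    ]
```

With a real client this might be called as provider_workflow_names(boto3.client('entityresolution')), which requires boto3 and valid credentials.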

ListSchemaMappings (updated) Link ¶
Changes (response)
{'schemaList': {'hasWorkflows': 'boolean'}}

Returns a list of all the SchemaMappings that have been created for an Amazon Web Services account.

See also: AWS API Documentation

Request Syntax

client.list_schema_mappings(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous API call.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'schemaList': [
        {
            'createdAt': datetime(2015, 1, 1),
            'hasWorkflows': True|False,
            'schemaArn': 'string',
            'schemaName': 'string',
            'updatedAt': datetime(2015, 1, 1)
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token to pass to the next API call to retrieve the next page of results.

    • schemaList (list) --

      A list of SchemaMappingSummary objects, each of which contains the fields SchemaName, SchemaArn, HasWorkflows, CreatedAt, and UpdatedAt.

      • (dict) --

        An object containing SchemaName, SchemaArn, HasWorkflows, CreatedAt, and UpdatedAt.

        • createdAt (datetime) --

          The timestamp of when the SchemaMapping was created.

        • hasWorkflows (boolean) --

          Specifies whether the schema mapping has been applied to a workflow.

        • schemaArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

        • schemaName (string) --

          The name of the schema.

        • updatedAt (datetime) --

          The timestamp of when the SchemaMapping was last updated.
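
The new hasWorkflows flag makes it possible to spot schema mappings that no workflow uses. The following is a minimal sketch, assuming client exposes list_schema_mappings with the syntax above; the helper name unused_schema_names is hypothetical, and a missing hasWorkflows key is treated as False purely as a defensive assumption.

```python
from typing import Any, Dict, List


def unused_schema_names(client: Any, page_size: int = 25) -> List[str]:
    """Collect the names of schema mappings whose hasWorkflows flag is false."""
    names: List[str] = []
    kwargs: Dict[str, Any] = {"maxResults": page_size}
    while True:
        page = client.list_schema_mappings(**kwargs)
        for summary in page.get("schemaList", []):
            if not summary.get("hasWorkflows", False):
                names.append(summary["schemaName"])
        token = page.get("nextToken")
        if not token:
            return names
        # Request the next page with the token the service returned.
        kwargs["nextToken"] = token
```

The returned names are only candidates for cleanup; whether a schema mapping is safe to delete depends on the account's own conventions.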

UpdateMatchingWorkflow (updated) Link ¶
Changes (both)
{'resolutionTechniques': {'providerProperties': {'intermediateSourceConfiguration': {'intermediateS3Path': 'string'},
                                                 'providerConfiguration': {},
                                                 'providerServiceArn': 'string'},
                          'resolutionType': {'PROVIDER'}}}

Updates an existing MatchingWorkflow. This method is identical to CreateMatchingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the MatchingWorkflow must already exist for the method to succeed.

See also: AWS API Documentation

Request Syntax

client.update_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type incrementalRunConfig:

dict

param incrementalRunConfig:

An object which defines an incremental run type and has only incrementalRunType as a field.

  • incrementalRunType (string) --

    The type of incremental run. It takes only one value: IMMEDIATE.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN, SchemaName, and ApplyNormalization.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema to be retrieved.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

  • (dict) --

    An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • output (list) -- [REQUIRED]

      A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

      • (dict) --

        An object containing Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.

        • hashed (boolean) --

          Enables the ability to hash the column values in the output.

        • name (string) -- [REQUIRED]

          A name of a column to be written to the output. This must be an InputField name in the schema mapping.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object which defines the resolutionType and either the ruleBasedProperties or, for provider-based matching, the providerProperties.

  • providerProperties (dict) --

    The properties of the provider service.

    • intermediateSourceConfiguration (dict) --

      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

      • intermediateS3Path (string) -- [REQUIRED]

        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

    • providerConfiguration (:ref:`document<document>`) --

      The required configuration fields to use with the provider service.

    • providerServiceArn (string) -- [REQUIRED]

      The ARN of the provider service.

  • resolutionType (string) -- [REQUIRED]

    The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.

  • ruleBasedProperties (dict) --

    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

    • attributeMatchingModel (string) -- [REQUIRED]

      The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, only when the value of the Email field of Profile A and the value of the Email field of Profile B match are the two profiles matched on the Email type.

    • rules (list) -- [REQUIRED]

      A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

      • (dict) --

        An object containing RuleName and MatchingKeys.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.
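
The difference between the two attributeMatchingModel values can be illustrated with a toy comparison. This is a minimal sketch of the behaviour as described above, not the service's implementation; the Attr type and the profiles_match function are hypothetical names, and the equality test on raw values is a simplifying assumption.

```python
from typing import List, NamedTuple


class Attr(NamedTuple):
    attr_type: str  # attribute type, for example "Email"
    sub_type: str   # sub-type, for example "Email" or "BusinessEmail"
    value: str


def profiles_match(a: List[Attr], b: List[Attr], attr_type: str, model: str) -> bool:
    """Return True if the two profiles share a value for attr_type under the given model."""
    if model not in ("ONE_TO_ONE", "MANY_TO_MANY"):
        raise ValueError(f"unknown attributeMatchingModel: {model}")
    for x in a:
        for y in b:
            if x.attr_type != attr_type or y.attr_type != attr_type:
                continue
            if x.value != y.value:
                continue
            # MANY_TO_MANY accepts any pair of sub-types within the type;
            # ONE_TO_ONE additionally requires identical sub-types.
            if model == "MANY_TO_MANY" or x.sub_type == y.sub_type:
                return True
    return False


profile_a = [Attr("Email", "Email", "jane@example.com")]
profile_b = [Attr("Email", "BusinessEmail", "jane@example.com")]
print(profiles_match(profile_a, profile_b, "Email", "MANY_TO_MANY"))
print(profiles_match(profile_a, profile_b, "Email", "ONE_TO_ONE"))
```

Here the same address appears under Email in one profile and BusinessEmail in the other, so only the MANY_TO_MANY call reports a match.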

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be updated.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'providerProperties': {
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            },
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'providerServiceArn': 'string'
        },
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

          • (dict) --

            An object containing Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType and either the ruleBasedProperties or, for provider-based matching, the providerProperties.

      • providerProperties (dict) --

        The properties of the provider service.

        • intermediateSourceConfiguration (dict) --

          The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.

          • intermediateS3Path (string) --

            The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET

        • providerConfiguration (:ref:`document<document>`) --

          The required configuration fields to use with the provider service.

        • providerServiceArn (string) --

          The ARN of the provider service.

      • resolutionType (string) --

        The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          The comparison type. You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, only when the value of the Email field of Profile A and the value of the Email field of Profile B match are the two profiles matched on the Email type.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

    • workflowName (string) --

      The name of the workflow.
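
To show how the new providerProperties fit into the request, the following is a minimal sketch that assembles keyword arguments for update_matching_workflow with resolutionType set to PROVIDER. The helper name build_provider_update_request is hypothetical, the s3:// prefix check is a local sanity check rather than a documented service rule, and the contents of providerConfiguration depend on the provider, so none are invented here.

```python
from typing import Any, Dict, List, Optional


def build_provider_update_request(
    workflow_name: str,
    role_arn: str,
    input_source_config: List[Dict[str, Any]],
    output_source_config: List[Dict[str, Any]],
    provider_service_arn: str,
    intermediate_s3_path: Optional[str] = None,
    provider_configuration: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Build kwargs for client.update_matching_workflow using provider-based matching."""
    provider_properties: Dict[str, Any] = {"providerServiceArn": provider_service_arn}
    if intermediate_s3_path is not None:
        # Local sanity check only (assumption): the documented example uses an s3:// location.
        if not intermediate_s3_path.startswith("s3://"):
            raise ValueError("intermediateS3Path should be an s3:// location")
        provider_properties["intermediateSourceConfiguration"] = {
            "intermediateS3Path": intermediate_s3_path
        }
    if provider_configuration is not None:
        # Provider-specific document; its fields are defined by the provider service.
        provider_properties["providerConfiguration"] = provider_configuration
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": input_source_config,
        "outputSourceConfig": output_source_config,
        "resolutionTechniques": {
            "resolutionType": "PROVIDER",
            "providerProperties": provider_properties,
        },
    }
```

The resulting dict can be passed as client.update_matching_workflow(**request). Because the operation requires the workflow to exist already, a call against an unknown workflowName is expected to fail.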