AWS EntityResolution

2023/07/26 - AWS EntityResolution - 16 new api methods

Changes  AWS Entity Resolution can effectively match a source record from a customer relationship management (CRM) system with a source record from a marketing system containing campaign information.

StartMatchingJob (new)

Starts the MatchingJob of a workflow. The workflow must have previously been created using the CreateMatchingWorkflow endpoint.

See also: AWS API Documentation

Request Syntax

client.start_matching_job(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the matching workflow for which to start a job.

rtype:

dict

returns:

Response Syntax

{
    'jobId': 'string'
}

Response Structure

  • (dict) --

    • jobId (string) --

      The ID of the job.
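A minimal sketch of calling this operation, assuming a client object that exposes start_matching_job as shown in the request syntax (for example, one created with boto3.client('entityresolution')). The helper start_job, the StubClient class, and the workflow name are hypothetical and exist only so the snippet runs without AWS credentials.

```python
def start_job(client, workflow_name):
    """Start a matching job for an existing workflow and return its job ID."""
    response = client.start_matching_job(workflowName=workflow_name)
    return response["jobId"]


class StubClient:
    """Hypothetical stand-in for the real client; returns a canned response."""

    def start_matching_job(self, workflowName):
        return {"jobId": "job-for-" + workflowName}


# With real credentials, pass boto3.client("entityresolution") instead of the stub.
job_id = start_job(StubClient(), "my-workflow")
print(job_id)
```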

GetMatchId (new)

Returns the corresponding Match ID of a customer record if the record has been processed.

See also: AWS API Documentation

Request Syntax

client.get_match_id(
    record={
        'string': 'string'
    },
    workflowName='string'
)
type record:

dict

param record:

[REQUIRED]

The record to fetch the Match ID for.

  • (string) --

    • (string) --

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'matchId': 'string'
}

Response Structure

  • (dict) --

    • matchId (string) --

      The unique identifier for this group of matched records.
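Because record is a string-to-string map, values read from a source table may need converting before the call. The following is a sketch under that assumption; the helper lookup_match_id, the StubClient class, and the sample record are hypothetical. How the service responds for a record that has not been processed is not stated here, so the helper simply returns None when the response carries no matchId.

```python
def lookup_match_id(client, workflow_name, record):
    """Return the Match ID for one record, or None if the response has none."""
    # Coerce keys and values to str because the request shape is string-to-string.
    flat_record = {str(key): str(value) for key, value in record.items()}
    response = client.get_match_id(workflowName=workflow_name, record=flat_record)
    return response.get("matchId")


class StubClient:
    """Hypothetical stand-in for the real client; remembers the last request."""

    def __init__(self):
        self.last_record = None

    def get_match_id(self, workflowName, record):
        self.last_record = record
        return {"matchId": "match-0001"}


stub = StubClient()
match_id = lookup_match_id(
    stub, "my-workflow", {"customer_id": 42, "email": "a@example.com"}
)
print(match_id, stub.last_record)
```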

DeleteSchemaMapping (new)

Deletes the SchemaMapping with a given name. This operation succeeds even if a schema with the given name does not exist. It fails if a MatchingWorkflow object references the SchemaMapping in the workflow's InputSourceConfig.

See also: AWS API Documentation

Request Syntax

client.delete_schema_mapping(
    schemaName='string'
)
type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema to delete.

rtype:

dict

returns:

Response Syntax

{
    'message': 'string'
}

Response Structure

  • (dict) --

    • message (string) --

      A successful operation message.

ListMatchingJobs (new)

Lists all jobs for a given workflow.

See also: AWS API Documentation

Request Syntax

client.list_matching_jobs(
    maxResults=123,
    nextToken='string',
    workflowName='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous ListMatchingJobs API call.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow whose jobs are to be listed.

rtype:

dict

returns:

Response Syntax

{
    'jobs': [
        {
            'endTime': datetime(2015, 1, 1),
            'jobId': 'string',
            'startTime': datetime(2015, 1, 1),
            'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED'
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • jobs (list) --

      A list of JobSummary objects, each of which contains the ID, status, start time, and end time of a job.

      • (dict) --

        An object containing the JobId, Status, StartTime, and EndTime of a job.

        • endTime (datetime) --

          The time at which the job finished.

        • jobId (string) --

          The ID of the job.

        • startTime (datetime) --

          The time at which the job was started.

        • status (string) --

          The current status of the job. Either running, succeeded, queued, or failed.

    • nextToken (string) --

      The pagination token to pass in the next ListMatchingJobs API call to retrieve the following page.
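The nextToken field can be fed back into the request until it is no longer returned. The following sketch shows that loop as a generator; the helper iter_matching_jobs, the StubClient class, and its two canned pages are hypothetical, and the page size of 25 is an arbitrary choice.

```python
def iter_matching_jobs(client, workflow_name, page_size=25):
    """Yield every job summary for a workflow, following nextToken across pages."""
    kwargs = {"workflowName": workflow_name, "maxResults": page_size}
    while True:
        page = client.list_matching_jobs(**kwargs)
        yield from page.get("jobs", [])
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token


class StubClient:
    """Hypothetical stand-in for the real client; serves two canned pages."""

    PAGES = {
        None: {"jobs": [{"jobId": "j1", "status": "SUCCEEDED"}], "nextToken": "t1"},
        "t1": {"jobs": [{"jobId": "j2", "status": "RUNNING"}]},
    }

    def list_matching_jobs(self, workflowName, maxResults=None, nextToken=None):
        return self.PAGES[nextToken]


job_ids = [job["jobId"] for job in iter_matching_jobs(StubClient(), "my-workflow")]
print(job_ids)
```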

CreateMatchingWorkflow (new)

Creates a MatchingWorkflow object which stores the configuration of the data processing job to be run. A MatchingWorkflow with the same name must not already exist. To modify an existing workflow, use the UpdateMatchingWorkflow API.

See also: AWS API Documentation

Request Syntax

client.create_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    tags={
        'string': 'string'
    },
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type incrementalRunConfig:

dict

param incrementalRunConfig:

An object which defines an incremental run type and has only incrementalRunType as a field.

  • incrementalRunType (string) --

    The type of incremental run. It takes only one value: IMMEDIATE.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN, SchemaName, and ApplyNormalization.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema to be retrieved.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

  • (dict) --

    An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • output (list) -- [REQUIRED]

      A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table and specifies whether the values of the column should be hashed.

      • (dict) --

        An OutputAttribute object containing Name and Hashed.

        • hashed (boolean) --

          Enables the ability to hash the column values in the output.

        • name (string) -- [REQUIRED]

          A name of a column to be written to the output. This must be an InputField name in the schema mapping.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object which defines the resolutionType and the ruleBasedProperties.

  • resolutionType (string) --

    The type of matching. There are two types: RULE_MATCHING and ML_MATCHING.

  • ruleBasedProperties (dict) --

    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

    • attributeMatchingModel (string) -- [REQUIRED]

      You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

    • rules (list) -- [REQUIRED]

      A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

      • (dict) --

        An object containing RuleName and MatchingKeys.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. AWS Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow. There cannot be multiple MatchingWorkflows with the same name.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table and specifies whether the values of the column should be hashed.

          • (dict) --

            An OutputAttribute object containing Name and Hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType and the ruleBasedProperties.

      • resolutionType (string) --

        The type of matching. There are two types: RULE_MATCHING and ML_MATCHING.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. AWS Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

    • workflowName (string) --

      The name of the workflow.
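The request shape above can be assembled as a plain dictionary before it is passed to the client. The sketch below builds a rule-based request with the required fields only; the helper build_rule_based_workflow and every ARN, path, column, and rule name are hypothetical placeholders, not values from this release.

```python
def build_rule_based_workflow(workflow_name, role_arn, glue_table_arn,
                              schema_name, output_s3_path, output_columns, rules):
    """Build keyword arguments for create_matching_workflow (rule-based matching).

    output_columns: iterable of (column_name, hashed) pairs.
    rules: iterable of (rule_name, matching_keys) pairs.
    """
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {"inputSourceARN": glue_table_arn, "schemaName": schema_name}
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [
                    {"name": name, "hashed": hashed}
                    for name, hashed in output_columns
                ],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "rules": [
                    {"ruleName": rule_name, "matchingKeys": list(keys)}
                    for rule_name, keys in rules
                ],
            },
        },
    }


# All identifiers below are made-up placeholders for illustration.
request = build_rule_based_workflow(
    workflow_name="my-workflow",
    role_arn="arn:aws:iam::123456789012:role/ExampleRole",
    glue_table_arn="arn:aws:glue:us-east-1:123456789012:table/exampledb/customers",
    schema_name="my-schema",
    output_s3_path="s3://example-bucket/output/",
    output_columns=[("customer_id", False), ("email", True)],
    rules=[("email-and-name", ["Email", "Name"])],
)
print(sorted(request))
```

With a real client, the dictionary would be sent as client.create_matching_workflow(**request).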

CreateSchemaMapping (new)

Creates a schema mapping, which defines the schema of the input customer records table. The SchemaMapping also provides Entity Resolution with some metadata about the table, such as the attribute types of the columns and which columns to match on.

See also: AWS API Documentation

Request Syntax

client.create_schema_mapping(
    description='string',
    mappedInputFields=[
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'
        },
    ],
    schemaName='string',
    tags={
        'string': 'string'
    }
)
type description:

string

param description:

A description of the schema.

type mappedInputFields:

list

param mappedInputFields:

A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

  • (dict) --

    An object containing FieldName, Type, GroupName, and MatchKey.

    • fieldName (string) -- [REQUIRED]

      A string containing the field name.

    • groupName (string) --

      Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

    • matchKey (string) --

      A key that allows grouping of multiple input attributes into a unified matching group. For example, suppose the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be used for matching but will still be included in the output table.

    • type (string) -- [REQUIRED]

      The type of the attribute, selected from a list of values.

type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema. There cannot be multiple SchemaMappings with the same name.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'mappedInputFields': [
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'
        },
    ],
    'schemaArn': 'string',
    'schemaName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the schema.

    • mappedInputFields (list) --

      A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.

      • (dict) --

        An object containing FieldName, Type, GroupName, and MatchKey.

        • fieldName (string) --

          A string containing the field name.

        • groupName (string) --

          Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

        • matchKey (string) --

          A key that allows grouping of multiple input attributes into a unified matching group. For example, suppose the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be used for matching but will still be included in the output table.

        • type (string) --

          The type of the attribute, selected from a list of values.

    • schemaArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

    • schemaName (string) --

      The name of the schema.
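The groupName and matchKey semantics described above can be illustrated by building the mappedInputFields list locally. This is a sketch only: the helper mapped_field and the column names are hypothetical, and the client-side type check is an added convenience based on the enumeration in the request syntax, not something the API requires callers to do.

```python
# Attribute types copied from the 'type' enumeration in the request syntax.
ALLOWED_TYPES = {
    "NAME", "NAME_FIRST", "NAME_MIDDLE", "NAME_LAST",
    "ADDRESS", "ADDRESS_STREET1", "ADDRESS_STREET2", "ADDRESS_STREET3",
    "ADDRESS_CITY", "ADDRESS_STATE", "ADDRESS_COUNTRY", "ADDRESS_POSTALCODE",
    "PHONE", "PHONE_NUMBER", "PHONE_COUNTRYCODE",
    "EMAIL_ADDRESS", "UNIQUE_ID", "DATE", "STRING",
}


def mapped_field(field_name, attr_type, match_key=None, group_name=None):
    """Build one mappedInputFields entry, omitting optional keys that are unset."""
    if attr_type not in ALLOWED_TYPES:
        raise ValueError("unknown attribute type: " + attr_type)
    field = {"fieldName": field_name, "type": attr_type}
    if match_key is not None:
        field["matchKey"] = match_key
    if group_name is not None:
        field["groupName"] = group_name
    return field


fields = [
    # Same groupName: the two name columns are combined into one value.
    mapped_field("first_name", "NAME_FIRST", match_key="Name", group_name="FullName"),
    mapped_field("last_name", "NAME_LAST", match_key="Name", group_name="FullName"),
    # Same matchKey: both address columns feed one matching group.
    mapped_field("business_address", "ADDRESS", match_key="Address"),
    mapped_field("shipping_address", "ADDRESS", match_key="Address"),
    # No matchKey: carried to the output but not used for matching.
    mapped_field("customer_id", "UNIQUE_ID"),
]
print(fields[-1])
```

With a real client, the list would be passed as client.create_schema_mapping(schemaName=..., mappedInputFields=fields).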

GetMatchingJob (new)

Gets the status, metrics, and errors (if there are any) that are associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_matching_job(
    jobId='string',
    workflowName='string'
)
type jobId:

string

param jobId:

[REQUIRED]

The ID of the job.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'endTime': datetime(2015, 1, 1),
    'errorDetails': {
        'errorMessage': 'string'
    },
    'jobId': 'string',
    'metrics': {
        'inputRecords': 123,
        'matchIDs': 123,
        'recordsNotProcessed': 123,
        'totalRecordsProcessed': 123
    },
    'startTime': datetime(2015, 1, 1),
    'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED'
}

Response Structure

  • (dict) --

    • endTime (datetime) --

      The time at which the job finished.

    • errorDetails (dict) --

      An object containing an error message, if there was an error.

      • errorMessage (string) --

        The error message from the job, if there is one.

    • jobId (string) --

      The ID of the job.

    • metrics (dict) --

      Metrics associated with the execution, specifically total records processed, unique IDs generated, and records the execution skipped.

      • inputRecords (integer) --

        The total number of input records.

      • matchIDs (integer) --

        The total number of matchIDs generated.

      • recordsNotProcessed (integer) --

        The total number of records that did not get processed.

      • totalRecordsProcessed (integer) --

        The total number of records processed.

    • startTime (datetime) --

      The time at which the job was started.

    • status (string) --

      The current status of the job. Either running, succeeded, queued, or failed.
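A job moves through the QUEUED and RUNNING states before reaching SUCCEEDED or FAILED, so callers typically poll this operation. The following is a sketch of such a loop; the helper wait_for_job, the polling interval, the poll limit, and the StubClient class are all hypothetical choices rather than service recommendations.

```python
import time

TERMINAL_STATUSES = {"SUCCEEDED", "FAILED"}


def wait_for_job(client, workflow_name, job_id,
                 poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_matching_job until the job succeeds or fails, then return it."""
    for _ in range(max_polls):
        job = client.get_matching_job(workflowName=workflow_name, jobId=job_id)
        if job["status"] in TERMINAL_STATUSES:
            return job
        sleep(poll_seconds)
    raise TimeoutError("job %s did not finish after %d polls" % (job_id, max_polls))


class StubClient:
    """Hypothetical stand-in for the real client; advances one status per call."""

    def __init__(self):
        self._statuses = iter(["QUEUED", "RUNNING", "SUCCEEDED"])
        self.calls = 0

    def get_matching_job(self, workflowName, jobId):
        self.calls += 1
        return {
            "jobId": jobId,
            "status": next(self._statuses),
            "metrics": {"inputRecords": 10, "totalRecordsProcessed": 10,
                        "recordsNotProcessed": 0, "matchIDs": 7},
        }


stub = StubClient()
# Inject a no-op sleep so the example finishes immediately.
final = wait_for_job(stub, "my-workflow", "job-1", sleep=lambda seconds: None)
print(final["status"], final["metrics"]["matchIDs"])
```

For a FAILED job, the errorDetails field of the returned dictionary carries the error message.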

GetMatchingWorkflow (new)

Returns the MatchingWorkflow with a given name, if it exists.

See also: AWS API Documentation

Request Syntax

client.get_matching_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'createdAt': datetime(2015, 1, 1),
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'tags': {
        'string': 'string'
    },
    'updatedAt': datetime(2015, 1, 1),
    'workflowArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • createdAt (datetime) --

      The timestamp of when the workflow was created.

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema to be retrieved.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table and specifies whether the values of the column should be hashed.

          • (dict) --

            An OutputAttribute object containing Name and Hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType and the ruleBasedProperties.

      • resolutionType (string) --

        The type of matching. There are two types: RULE_MATCHING and ML_MATCHING.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. AWS Entity Resolution assumes this role to access resources on your behalf.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

    • updatedAt (datetime) --

      The timestamp of when the workflow was last updated.

    • workflowArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

    • workflowName (string) --

      The name of the workflow.

TagResource (new)

Assigns one or more tags (key-value pairs) to the specified AWS Entity Resolution resource. Tags can help you organize and categorize your resources. You can also use them to scope user permissions by granting a user permission to access or change only resources with certain tag values. In Entity Resolution, SchemaMapping and MatchingWorkflow can be tagged. Tags don't have any semantic meaning to AWS and are interpreted strictly as strings of characters. You can use the TagResource action with a resource that already has tags. If you specify a new tag key, this tag is appended to the list of tags associated with the resource. If you specify a tag key that is already associated with the resource, the new tag value that you specify replaces the previous value for that tag.

See also: AWS API Documentation

Request Syntax

client.tag_resource(
    resourceArn='string',
    tags={
        'string': 'string'
    }
)
type resourceArn:

string

param resourceArn:

[REQUIRED]

The ARN of the resource to which you want to add tags.

type tags:

dict

param tags:

[REQUIRED]

The tags used to organize, track, or control access for this resource.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
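The append-or-replace behaviour described for TagResource can be modelled locally with a dictionary update. The sketch below is only a model of that description, not a call to the service; the function name merged_tags and the sample tags are hypothetical.

```python
def merged_tags(existing, new):
    """Model the described result of TagResource on a resource's tags.

    New keys are added; keys that already exist take the new value.
    """
    result = dict(existing)  # copy so the caller's dictionary is not mutated
    result.update(new)
    return result


before = {"team": "marketing", "env": "dev"}
after = merged_tags(before, {"env": "prod", "owner": "crm"})
print(after)
```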

ListMatchingWorkflows (new)

Returns a list of all the MatchingWorkflows that have been created for an AWS account.

See also: AWS API Documentation

Request Syntax

client.list_matching_workflows(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous ListMatchingWorkflows API call.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'workflowSummaries': [
        {
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'workflowArn': 'string',
            'workflowName': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token to pass in the next ListMatchingWorkflows API call to retrieve the following page.

    • workflowSummaries (list) --

      A list of MatchingWorkflowSummary objects, each of which contains the fields WorkflowName, WorkflowArn, CreatedAt, and UpdatedAt.

      • (dict) --

        An object containing the fields WorkflowName, WorkflowArn, CreatedAt, and UpdatedAt.

        • createdAt (datetime) --

          The timestamp of when the workflow was created.

        • updatedAt (datetime) --

          The timestamp of when the workflow was last updated.

        • workflowArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.

        • workflowName (string) --

          The name of the workflow.
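
Because results are returned one page at a time, listing every workflow means feeding each response's nextToken into the next request. The loop below is a sketch under assumptions: iter_matching_workflows is a hypothetical helper, the page size of 10 is arbitrary, and client is assumed to expose list_matching_workflows as documented above (for example boto3.client('entityresolution')).

```python
def iter_matching_workflows(client, page_size=10):
    """Yield every workflow summary, following nextToken until none is returned."""
    kwargs = {"maxResults": page_size}
    while True:
        page = client.list_matching_workflows(**kwargs)
        # Each page carries a (possibly empty) list of summaries.
        yield from page.get("workflowSummaries", [])
        token = page.get("nextToken")
        if not token:
            break  # no further pages
        kwargs["nextToken"] = token


# Usage (requires AWS credentials, so it is left commented out):
# import boto3
# client = boto3.client("entityresolution")
# for summary in iter_matching_workflows(client):
#     print(summary["workflowName"], summary["workflowArn"])
```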

DeleteMatchingWorkflow (new) Link ¶

Deletes the MatchingWorkflow with a given name. This operation will succeed even if a workflow with the given name does not exist.

See also: AWS API Documentation

Request Syntax

client.delete_matching_workflow(
    workflowName='string'
)
type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be deleted.

rtype:

dict

returns:

Response Syntax

{
    'message': 'string'
}

Response Structure

  • (dict) --

    • message (string) --

      A successful operation message.

UpdateMatchingWorkflow (new) Link ¶

Updates an existing MatchingWorkflow. This method is identical to CreateMatchingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the MatchingWorkflow must already exist for the method to succeed.

See also: AWS API Documentation

Request Syntax

client.update_matching_workflow(
    description='string',
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    inputSourceConfig=[
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    outputSourceConfig=[
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    roleArn='string',
    workflowName='string'
)
type description:

string

param description:

A description of the workflow.

type incrementalRunConfig:

dict

param incrementalRunConfig:

An object which defines an incremental run type and has only incrementalRunType as a field.

  • incrementalRunType (string) --

    The type of incremental run. It takes only one value: IMMEDIATE.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

  • (dict) --

    An object containing InputSourceARN, SchemaName, and ApplyNormalization.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • inputSourceARN (string) -- [REQUIRED]

      A Glue table ARN for the input source table.

    • schemaName (string) -- [REQUIRED]

      The name of the schema mapping to apply to this input source.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

  • (dict) --

    An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

    • KMSArn (string) --

      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

    • applyNormalization (boolean) --

      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

    • output (list) -- [REQUIRED]

      A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

      • (dict) --

        An OutputAttribute object with the fields Name and Hashed. It selects a column to be included in the output table, and whether the values of the column should be hashed.

        • hashed (boolean) --

          Enables the ability to hash the column values in the output.

        • name (string) -- [REQUIRED]

          A name of a column to be written to the output. This must be an InputField name in the schema mapping.

    • outputS3Path (string) -- [REQUIRED]

      The S3 path to which Entity Resolution will write the output table.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object which defines the resolutionType and the ruleBasedProperties.

  • resolutionType (string) --

    There are two types of matching: RULE_MATCHING and ML_MATCHING.

  • ruleBasedProperties (dict) --

    An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

    • attributeMatchingModel (string) -- [REQUIRED]

      You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

    • rules (list) -- [REQUIRED]

      A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

      • (dict) --

        An object containing RuleName and MatchingKeys.

        • matchingKeys (list) -- [REQUIRED]

          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

          • (string) --

        • ruleName (string) -- [REQUIRED]

          A name for the matching rule.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. AWS Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be updated.

rtype:

dict

returns:

Response Syntax

{
    'description': 'string',
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'inputSourceConfig': [
        {
            'applyNormalization': True|False,
            'inputSourceARN': 'string',
            'schemaName': 'string'
        },
    ],
    'outputSourceConfig': [
        {
            'KMSArn': 'string',
            'applyNormalization': True|False,
            'output': [
                {
                    'hashed': True|False,
                    'name': 'string'
                },
            ],
            'outputS3Path': 'string'
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING',
        'ruleBasedProperties': {
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'rules': [
                {
                    'matchingKeys': [
                        'string',
                    ],
                    'ruleName': 'string'
                },
            ]
        }
    },
    'roleArn': 'string',
    'workflowName': 'string'
}

Response Structure

  • (dict) --

    • description (string) --

      A description of the workflow.

    • incrementalRunConfig (dict) --

      An object which defines an incremental run type and has only incrementalRunType as a field.

      • incrementalRunType (string) --

        The type of incremental run. It takes only one value: IMMEDIATE.

    • inputSourceConfig (list) --

      A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

      • (dict) --

        An object containing InputSourceARN, SchemaName, and ApplyNormalization.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • inputSourceARN (string) --

          A Glue table ARN for the input source table.

        • schemaName (string) --

          The name of the schema mapping applied to this input source.

    • outputSourceConfig (list) --

      A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.

      • (dict) --

        An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.

        • KMSArn (string) --

          Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.

        • applyNormalization (boolean) --

          Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

        • output (list) --

          A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.

          • (dict) --

            An OutputAttribute object with the fields Name and Hashed. It selects a column to be included in the output table, and whether the values of the column should be hashed.

            • hashed (boolean) --

              Enables the ability to hash the column values in the output.

            • name (string) --

              A name of a column to be written to the output. This must be an InputField name in the schema mapping.

        • outputS3Path (string) --

          The S3 path to which Entity Resolution will write the output table.

    • resolutionTechniques (dict) --

      An object which defines the resolutionType and the ruleBasedProperties.

      • resolutionType (string) --

        There are two types of matching: RULE_MATCHING and ML_MATCHING.

      • ruleBasedProperties (dict) --

        An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.

        • attributeMatchingModel (string) --

          You can choose either ONE_TO_ONE or MANY_TO_MANY as the AttributeMatchingModel. When choosing MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email type. When choosing ONE_TO_ONE, the system can only match if the sub-types are exact matches. For example, the two profiles are matched on the Email type only when the value of the Email field of Profile A and the value of the Email field of Profile B match.

        • rules (list) --

          A list of Rule objects, each of which has the fields RuleName and MatchingKeys.

          • (dict) --

            An object containing RuleName and MatchingKeys.

            • matchingKeys (list) --

              A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.

              • (string) --

            • ruleName (string) --

              A name for the matching rule.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the IAM role. AWS Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

    • workflowName (string) --

      The name of the workflow.
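
Since inputSourceConfig, outputSourceConfig, resolutionTechniques, roleArn, and workflowName are all required, changing a single field such as the description means resending the whole configuration. The sketch below reads the current configuration and sends it back with one field changed. It rests on assumptions: update_description is a hypothetical helper, and client is assumed to expose a get_matching_workflow call whose response carries the same configuration keys as the response shown above, possibly alongside extra keys that the update call does not accept.

```python
# Keys accepted by update_matching_workflow, per the request syntax above.
UPDATE_KEYS = (
    "description",
    "incrementalRunConfig",
    "inputSourceConfig",
    "outputSourceConfig",
    "resolutionTechniques",
    "roleArn",
    "workflowName",
)


def update_description(client, workflow_name, description):
    """Resend the existing workflow configuration with a new description."""
    # Assumption: get_matching_workflow returns the current configuration.
    current = client.get_matching_workflow(workflowName=workflow_name)
    # Keep only keys the update call accepts; timestamps, ARNs, and response
    # metadata present in the read response are dropped.
    request = {key: current[key] for key in UPDATE_KEYS if key in current}
    request["description"] = description
    return client.update_matching_workflow(**request)
```

Filtering with an allow-list rather than deleting known extra keys keeps the helper working if the read response gains fields later.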

ListSchemaMappings (new) Link ¶

Returns a list of all the SchemaMappings that have been created for an AWS account.

See also: AWS API Documentation

Request Syntax

client.list_schema_mappings(
    maxResults=123,
    nextToken='string'
)
type maxResults:

integer

param maxResults:

The maximum number of objects returned per page.

type nextToken:

string

param nextToken:

The pagination token from the previous ListSchemaMappings API call.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'schemaList': [
        {
            'createdAt': datetime(2015, 1, 1),
            'schemaArn': 'string',
            'schemaName': 'string',
            'updatedAt': datetime(2015, 1, 1)
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      The pagination token from the previous ListSchemaMappings API call.

    • schemaList (list) --

      A list of SchemaMappingSummary objects, each of which contains the fields SchemaName, SchemaArn, CreatedAt, and UpdatedAt.

      • (dict) --

        An object containing SchemaName, SchemaArn, CreatedAt, and UpdatedAt.

        • createdAt (datetime) --

          The timestamp of when the SchemaMapping was created.

        • schemaArn (string) --

          The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

        • schemaName (string) --

          The name of the schema.

        • updatedAt (datetime) --

          The timestamp of when the SchemaMapping was last updated.

UntagResource (new) Link ¶

Removes one or more tags from the specified AWS Entity Resolution resource. In Entity Resolution, SchemaMapping and MatchingWorkflow can be tagged.

See also: AWS API Documentation

Request Syntax

client.untag_resource(
    resourceArn='string',
    tagKeys=[
        'string',
    ]
)
type resourceArn:

string

param resourceArn:

[REQUIRED]

The ARN of the resource that you want to untag.

type tagKeys:

list

param tagKeys:

[REQUIRED]

The list of tag keys to remove from the resource.

  • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
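
Because tagKeys takes an explicit list of keys, removing a family of tags usually starts with reading the current tags. The following is a minimal sketch, assuming a hypothetical helper name remove_tags_with_prefix and a client that exposes list_tags_for_resource and untag_resource as documented here. Skipping the call when nothing matches is a cautious choice made for this example, not a documented requirement.

```python
def remove_tags_with_prefix(client, resource_arn, prefix):
    """Remove every tag whose key starts with `prefix`; return the removed keys."""
    tags = client.list_tags_for_resource(resourceArn=resource_arn)["tags"]
    keys = sorted(key for key in tags if key.startswith(prefix))
    if keys:
        # Only call UntagResource when there is at least one key to remove.
        client.untag_resource(resourceArn=resource_arn, tagKeys=keys)
    return keys
```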

ListTagsForResource (new) Link ¶

Displays the tags associated with an AWS Entity Resolution resource. In Entity Resolution, SchemaMapping and MatchingWorkflow can be tagged.

See also: AWS API Documentation

Request Syntax

client.list_tags_for_resource(
    resourceArn='string'
)
type resourceArn:

string

param resourceArn:

[REQUIRED]

The ARN of the resource for which you want to view tags.

rtype:

dict

returns:

Response Syntax

{
    'tags': {
        'string': 'string'
    }
}

Response Structure

  • (dict) --

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

GetSchemaMapping (new) Link ¶

Returns the SchemaMapping of a given name.

See also: AWS API Documentation

Request Syntax

client.get_schema_mapping(
    schemaName='string'
)
type schemaName:

string

param schemaName:

[REQUIRED]

The name of the schema to be retrieved.

rtype:

dict

returns:

Response Syntax

{
    'createdAt': datetime(2015, 1, 1),
    'description': 'string',
    'mappedInputFields': [
        {
            'fieldName': 'string',
            'groupName': 'string',
            'matchKey': 'string',
            'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'
        },
    ],
    'schemaArn': 'string',
    'schemaName': 'string',
    'tags': {
        'string': 'string'
    },
    'updatedAt': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • createdAt (datetime) --

      The timestamp of when the SchemaMapping was created.

    • description (string) --

      A description of the schema.

    • mappedInputFields (list) --

      A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table, and contains the column name plus additional information that Entity Resolution uses for matching.

      • (dict) --

        An object containing FieldName, Type, GroupName, and MatchKey.

        • fieldName (string) --

          A string containing the field name.

        • groupName (string) --

          Instruct Entity Resolution to combine several columns into a unified column with the identical attribute type. For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common GroupName will prompt Entity Resolution to concatenate them into a single value.

        • matchKey (string) --

          A key that allows grouping of multiple input attributes into a unified matching group. For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning the MatchKey Address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group. If no MatchKey is specified for a column, it won't be used for matching but will still be included in the output table.

        • type (string) --

          The type of the attribute, selected from a list of values.

    • schemaArn (string) --

      The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.

    • schemaName (string) --

      The name of the schema.

    • tags (dict) --

      The tags used to organize, track, or control access for this resource.

      • (string) --

        • (string) --

    • updatedAt (datetime) --

      The timestamp of when the SchemaMapping was last updated.
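
To see which columns take part in matching and which are only carried through to the output, the mappedInputFields list can be grouped by matchKey. The sketch below works on a plain dict shaped like the response above. The helper name columns_by_match_key and the sample field values are illustrative assumptions, not values returned by the service.

```python
from collections import defaultdict


def columns_by_match_key(schema):
    """Split fields into {matchKey: [fieldName, ...]} and a list of output-only fields."""
    grouped = defaultdict(list)
    output_only = []
    for field in schema.get("mappedInputFields", []):
        key = field.get("matchKey")
        if key:
            grouped[key].append(field["fieldName"])
        else:
            # No MatchKey: not used for matching, still written to the output.
            output_only.append(field["fieldName"])
    return dict(grouped), output_only


# Hand-written sample in the shape of a get_schema_mapping response.
sample = {
    "schemaName": "customers",
    "mappedInputFields": [
        {"fieldName": "business_address", "type": "ADDRESS", "matchKey": "Address"},
        {"fieldName": "shipping_address", "type": "ADDRESS", "matchKey": "Address"},
        {"fieldName": "email", "type": "EMAIL_ADDRESS", "matchKey": "Email"},
        {"fieldName": "notes", "type": "STRING"},
    ],
}
matched, unmatched = columns_by_match_key(sample)
print(matched)
print(unmatched)
```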