2025/07/31 - AWS EntityResolution - 3 updated api methods
Changes: Add support for creating advanced rule-based matching workflows in AWS Entity Resolution.
{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string', 'ruleName': 'string'}]}}}
Creates a matching workflow that defines the configuration for a data processing job. The workflow name must be unique. To modify an existing workflow, use UpdateMatchingWorkflow.
See also: AWS API Documentation
Request Syntax
client.create_matching_workflow( workflowName='string', description='string', inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string', 'applyNormalization': True|False }, ], outputSourceConfig=[ { 'outputS3Path': 'string', 'KMSArn': 'string', 'output': [ { 'name': 'string', 'hashed': True|False }, ], 'applyNormalization': True|False }, ], resolutionTechniques={ 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'rules': [ { 'ruleName': 'string', 'matchingKeys': [ 'string', ] }, ], 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING' }, 'ruleConditionProperties': { 'rules': [ { 'ruleName': 'string', 'condition': 'string' }, ] }, 'providerProperties': { 'providerServiceArn': 'string', 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' } } }, incrementalRunConfig={ 'incrementalRunType': 'IMMEDIATE' }, roleArn='string', tags={ 'string': 'string' } )
workflowName (string) -- [REQUIRED]
The name of the workflow. There can't be multiple MatchingWorkflows with the same name.
description (string) --
A description of the workflow.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) -- [REQUIRED]
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) -- [REQUIRED]
The name of the schema to be retrieved.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
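The phone number example above can be reproduced locally to see the target format. This is only a minimal sketch of the formatting described in the text; the function name normalize_phone is hypothetical, and the service's actual normalization logic is not documented here.

```python
import re


def normalize_phone(raw: str) -> str:
    """Format a 10-digit number as (123)-456-7890.

    Illustrative only: mirrors the example in the applyNormalization
    description, not the service's real implementation.
    """
    digits = re.sub(r"\D", "", raw)  # keep digits only
    if len(digits) != 10:
        return raw  # leave anything else untouched
    return f"({digits[:3]})-{digits[3:6]}-{digits[6:]}"


formatted = normalize_phone("1234567890")
```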
outputSourceConfig (list) -- [REQUIRED]
A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) --
An object containing outputS3Path, KMSArn, output, and applyNormalization.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
output (list) -- [REQUIRED]
A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An object containing name and hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) -- [REQUIRED]
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) --
Enables the ability to hash the column values in the output.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
resolutionTechniques (dict) -- [REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) -- [REQUIRED]
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) -- [REQUIRED]
A list of Rule objects, each of which have fields RuleName and MatchingKeys.
(dict) --
An object containing the ruleName and matchingKeys.
ruleName (string) -- [REQUIRED]
A name for the matching rule.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
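The difference between ONE_TO_ONE and MANY_TO_MANY can be sketched with two small comparison functions. This is a local illustration of the behavior described above, not the service's matching code; the function names and the EMAIL_FIELDS grouping are assumptions made for the example.

```python
from typing import Dict, List

# Hypothetical sub-types of the Email attribute type, taken from the example text.
EMAIL_FIELDS: List[str] = ["Email", "BusinessEmail"]


def matches_one_to_one(a: Dict[str, str], b: Dict[str, str], fields: List[str]) -> bool:
    """Match only when the same sub-type field holds the same value in both profiles."""
    return any(bool(a.get(f)) and a.get(f) == b.get(f) for f in fields)


def matches_many_to_many(a: Dict[str, str], b: Dict[str, str], fields: List[str]) -> bool:
    """Match when any sub-type value in one profile equals any sub-type value in the other."""
    values_a = {a[f] for f in fields if a.get(f)}
    values_b = {b[f] for f in fields if b.get(f)}
    return bool(values_a & values_b)


profile_a = {"Email": "pat@example.com"}
profile_b = {"BusinessEmail": "pat@example.com"}
```

With these two profiles, only the many-to-many comparison reports a match, because the shared value sits in different sub-type fields.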
ruleConditionProperties (dict) --
An object containing the rules for a matching workflow.
rules (list) -- [REQUIRED]
A list of rule objects, each of which have fields ruleName and condition.
(dict) --
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) -- [REQUIRED]
A name for the matching rule.
For example: Rule1
condition (string) -- [REQUIRED]
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
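Condition strings like the one above can be assembled from small helpers instead of being concatenated by hand. The sketch below only builds the string; the helper names (fn, all_of, any_of) are hypothetical, and the meaning of each function's arguments is not defined in this document, so the values are copied from the example as-is.

```python
def fn(name: str, *args: object) -> str:
    """Render a matching function call such as Exact(b, true)."""
    rendered = ", ".join(
        str(a).lower() if isinstance(a, bool) else str(a) for a in args
    )
    return f"{name}({rendered})"


def all_of(*parts: str) -> str:
    """Combine parts with AND and group them in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts: str) -> str:
    """Separate parts with OR."""
    return " OR ".join(parts)


condition = any_of(
    all_of(fn("Cosine", "a", 10), fn("Exact", "b", True)),
    fn("ExactManyToMany", "c", "d"),
)

# A rule object in the shape expected by ruleConditionProperties.rules.
rule = {"ruleName": "Rule1", "condition": condition}
```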
providerProperties (dict) --
The properties of the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) --
Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as "Automatic" in the console.
incrementalRunType (string) --
The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
roleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
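Putting the request parameters together, a call that uses the new ruleConditionProperties field might look like the following sketch. It assumes that resolutionType RULE_MATCHING is the value paired with ruleConditionProperties, which this document does not state, and every name, ARN, and path passed in is a placeholder. The boto3 import is deferred so the request can be built and inspected without AWS access.

```python
from typing import Any, Dict, List


def build_create_request(
    workflow_name: str,
    input_source_arn: str,
    schema_name: str,
    output_s3_path: str,
    output_columns: List[str],
    rules: List[Dict[str, str]],
    role_arn: str,
) -> Dict[str, Any]:
    """Build keyword arguments for create_matching_workflow (required fields only)."""
    return {
        "workflowName": workflow_name,
        "inputSourceConfig": [
            {"inputSourceARN": input_source_arn, "schemaName": schema_name}
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [{"name": column} for column in output_columns],
            }
        ],
        "resolutionTechniques": {
            # Assumption: RULE_MATCHING is the type used with ruleConditionProperties.
            "resolutionType": "RULE_MATCHING",
            "ruleConditionProperties": {"rules": rules},
        },
        "roleArn": role_arn,
    }


def create_workflow(request: Dict[str, Any]) -> Dict[str, Any]:
    """Send the request; needs boto3 and valid AWS credentials."""
    import boto3  # imported lazily so the builder works without boto3 installed

    client = boto3.client("entityresolution")
    return client.create_matching_workflow(**request)


request = build_create_request(
    workflow_name="example-workflow",  # placeholder
    input_source_arn="arn:aws:glue:region:account-id:table/db/table",  # placeholder
    schema_name="example-schema",  # placeholder
    output_s3_path="s3://example-bucket/output/",  # placeholder
    output_columns=["email", "name"],
    rules=[{"ruleName": "Rule1", "condition": "Exact(email)"}],
    role_arn="arn:aws:iam::account-id:role/example-role",  # placeholder
)
```

Keeping the builder separate from the call makes it possible to review or log the request before anything is sent.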
dict
Response Syntax
{ 'workflowName': 'string', 'workflowArn': 'string', 'description': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'applyNormalization': True|False }, ], 'outputSourceConfig': [ { 'outputS3Path': 'string', 'KMSArn': 'string', 'output': [ { 'name': 'string', 'hashed': True|False }, ], 'applyNormalization': True|False }, ], 'resolutionTechniques': { 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'rules': [ { 'ruleName': 'string', 'matchingKeys': [ 'string', ] }, ], 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING' }, 'ruleConditionProperties': { 'rules': [ { 'ruleName': 'string', 'condition': 'string' }, ] }, 'providerProperties': { 'providerServiceArn': 'string', 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' } } }, 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'roleArn': 'string' }
Response Structure
(dict) --
workflowName (string) --
The name of the workflow.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
description (string) --
A description of the workflow.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) --
An object containing outputS3Path, KMSArn, output, and applyNormalization.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
output (list) --
A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An object containing name and hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) --
Enables the ability to hash the column values in the output.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) --
A list of Rule objects, each of which have fields RuleName and MatchingKeys.
(dict) --
An object containing the ruleName and matchingKeys.
ruleName (string) --
A name for the matching rule.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
attributeMatchingModel (string) --
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) --
An object containing the rules for a matching workflow.
rules (list) --
A list of rule objects, each of which have fields ruleName and condition.
(dict) --
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) --
A name for the matching rule.
For example: Rule1
condition (string) --
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) --
The properties of the provider service.
providerServiceArn (string) --
The ARN of the provider service.
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string', 'ruleName': 'string'}]}}}
Returns the MatchingWorkflow with a given name, if it exists.
See also: AWS API Documentation
Request Syntax
client.get_matching_workflow( workflowName='string' )
workflowName (string) -- [REQUIRED]
The name of the workflow.
dict
Response Syntax
{ 'workflowName': 'string', 'workflowArn': 'string', 'description': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'applyNormalization': True|False }, ], 'outputSourceConfig': [ { 'outputS3Path': 'string', 'KMSArn': 'string', 'output': [ { 'name': 'string', 'hashed': True|False }, ], 'applyNormalization': True|False }, ], 'resolutionTechniques': { 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'rules': [ { 'ruleName': 'string', 'matchingKeys': [ 'string', ] }, ], 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING' }, 'ruleConditionProperties': { 'rules': [ { 'ruleName': 'string', 'condition': 'string' }, ] }, 'providerProperties': { 'providerServiceArn': 'string', 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' } } }, 'createdAt': datetime(2015, 1, 1), 'updatedAt': datetime(2015, 1, 1), 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'roleArn': 'string', 'tags': { 'string': 'string' } }
Response Structure
(dict) --
workflowName (string) --
The name of the workflow.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
description (string) --
A description of the workflow.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) --
An object containing outputS3Path, KMSArn, output, and applyNormalization.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
output (list) --
A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An object containing name and hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) --
Enables the ability to hash the column values in the output.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) --
A list of Rule objects, each of which have fields RuleName and MatchingKeys.
(dict) --
An object containing the ruleName and matchingKeys.
ruleName (string) --
A name for the matching rule.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
attributeMatchingModel (string) --
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) --
An object containing the rules for a matching workflow.
rules (list) --
A list of rule objects, each of which have fields ruleName and condition.
(dict) --
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) --
A name for the matching rule.
For example: Rule1
condition (string) --
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) --
The properties of the provider service.
providerServiceArn (string) --
The ARN of the provider service.
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
createdAt (datetime) --
The timestamp of when the workflow was created.
updatedAt (datetime) --
The timestamp of when the workflow was last updated.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
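Because the response nests the new field under resolutionTechniques, reading the rule conditions back takes a couple of lookups. The helper below is a small sketch that works on a response-shaped dict; the function name is hypothetical, and it returns an empty list when the workflow does not use ruleConditionProperties.

```python
from typing import Any, Dict, List, Tuple


def list_rule_conditions(workflow: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (ruleName, condition) pairs from a get_matching_workflow response."""
    techniques = workflow.get("resolutionTechniques", {})
    rules = techniques.get("ruleConditionProperties", {}).get("rules", [])
    return [(rule["ruleName"], rule["condition"]) for rule in rules]


# A trimmed, hand-written stand-in for a real response.
sample_response = {
    "workflowName": "example-workflow",
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleConditionProperties": {
            "rules": [{"ruleName": "Rule1", "condition": "Exact(email)"}]
        },
    },
}

pairs = list_rule_conditions(sample_response)
```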
{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string', 'ruleName': 'string'}]}}}
Updates an existing matching workflow. The workflow must already exist for this operation to succeed.
See also: AWS API Documentation
Request Syntax
client.update_matching_workflow( workflowName='string', description='string', inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string', 'applyNormalization': True|False }, ], outputSourceConfig=[ { 'outputS3Path': 'string', 'KMSArn': 'string', 'output': [ { 'name': 'string', 'hashed': True|False }, ], 'applyNormalization': True|False }, ], resolutionTechniques={ 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'rules': [ { 'ruleName': 'string', 'matchingKeys': [ 'string', ] }, ], 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING' }, 'ruleConditionProperties': { 'rules': [ { 'ruleName': 'string', 'condition': 'string' }, ] }, 'providerProperties': { 'providerServiceArn': 'string', 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' } } }, incrementalRunConfig={ 'incrementalRunType': 'IMMEDIATE' }, roleArn='string' )
workflowName (string) -- [REQUIRED]
The name of the workflow to be updated.
description (string) --
A description of the workflow.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) -- [REQUIRED]
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) -- [REQUIRED]
The name of the schema to be retrieved.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
outputSourceConfig (list) -- [REQUIRED]
A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) --
An object containing outputS3Path, KMSArn, output, and applyNormalization.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
output (list) -- [REQUIRED]
A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An object containing name and hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.
name (string) -- [REQUIRED]
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) --
Enables the ability to hash the column values in the output.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
resolutionTechniques (dict) -- [REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) -- [REQUIRED]
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) -- [REQUIRED]
A list of Rule objects, each of which have fields RuleName and MatchingKeys.
(dict) --
An object containing the ruleName and matchingKeys.
ruleName (string) -- [REQUIRED]
A name for the matching rule.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) --
An object containing the rules for a matching workflow.
rules (list) -- [REQUIRED]
A list of rule objects, each of which have fields ruleName and condition.
(dict) --
An object that defines the ruleCondition and the ruleName to use in a matching workflow.
ruleName (string) -- [REQUIRED]
A name for the matching rule.
For example: Rule1
condition (string) -- [REQUIRED]
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) --
The properties of the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) --
Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as "Automatic" in the console.
incrementalRunType (string) --
The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
roleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
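Since the update request marks inputSourceConfig, outputSourceConfig, resolutionTechniques, and roleArn as required, one way to change only the rule conditions is to start from a get_matching_workflow response and carry over the fields the update call accepts. The sketch below works on plain dicts; the function name and key list are assumptions derived from the request syntax above, and it does not check whether other fields (for example ruleBasedProperties) must be removed when ruleConditionProperties is set.

```python
import copy
from typing import Any, Dict, List

# Keyword arguments accepted by update_matching_workflow, per the request syntax.
UPDATE_KEYS = (
    "workflowName",
    "description",
    "inputSourceConfig",
    "outputSourceConfig",
    "resolutionTechniques",
    "incrementalRunConfig",
    "roleArn",
)


def to_update_request(
    workflow: Dict[str, Any], new_rules: List[Dict[str, str]]
) -> Dict[str, Any]:
    """Copy updatable fields from a get response and replace the rule conditions."""
    request = {
        key: copy.deepcopy(workflow[key]) for key in UPDATE_KEYS if key in workflow
    }
    techniques = request.setdefault("resolutionTechniques", {})
    techniques["ruleConditionProperties"] = {"rules": list(new_rules)}
    return request


# A trimmed, hand-written stand-in for a get_matching_workflow response.
existing = {
    "workflowName": "example-workflow",
    "workflowArn": "arn:placeholder",  # response-only field, dropped below
    "inputSourceConfig": [{"inputSourceARN": "arn:placeholder", "schemaName": "s"}],
    "outputSourceConfig": [{"outputS3Path": "s3://example-bucket/out/", "output": [{"name": "email"}]}],
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleConditionProperties": {"rules": [{"ruleName": "Rule1", "condition": "Exact(email)"}]},
    },
    "roleArn": "arn:placeholder",
    "tags": {"team": "data"},  # not accepted by the update call, dropped below
}

update_request = to_update_request(
    existing, [{"ruleName": "Rule1", "condition": "Exact(email) OR Exact(phone)"}]
)
```

The result can then be passed as client.update_matching_workflow(**update_request); the original response dict is left unmodified because the fields are deep-copied.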
dict
Response Syntax
{ 'workflowName': 'string', 'description': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'applyNormalization': True|False }, ], 'outputSourceConfig': [ { 'outputS3Path': 'string', 'KMSArn': 'string', 'output': [ { 'name': 'string', 'hashed': True|False }, ], 'applyNormalization': True|False }, ], 'resolutionTechniques': { 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'rules': [ { 'ruleName': 'string', 'matchingKeys': [ 'string', ] }, ], 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING' }, 'ruleConditionProperties': { 'rules': [ { 'ruleName': 'string', 'condition': 'string' }, ] }, 'providerProperties': { 'providerServiceArn': 'string', 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' } } }, 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'roleArn': 'string' }
Response Structure
(dict) --
workflowName (string) --
The name of the workflow.
description (string) --
A description of the workflow.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing inputSourceARN, schemaName, and applyNormalization.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
(dict) --
An object containing outputS3Path, KMSArn, output, and applyNormalization.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
output (list) --
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An OutputAttribute object with the fields name and hashed. It selects one column to be included in the output table, and whether the values of that column should be hashed.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
hashed (boolean) --
Enables the ability to hash the column values in the output.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
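As a rough sketch of how the outputSourceConfig part of the response might be read, the helper below collects the hashed column names for each output path. The helper name hashed_columns and all sample values are hypothetical; only the field names come from the structure above.

```python
def hashed_columns(response: dict) -> dict:
    """Map each outputS3Path to the names of the columns marked as hashed."""
    result = {}
    for source in response.get("outputSourceConfig", []):
        result[source["outputS3Path"]] = [
            attr["name"]
            for attr in source.get("output", [])
            if attr.get("hashed")
        ]
    return result


# Hypothetical response fragment, shaped like the structure documented above.
sample_response = {
    "outputSourceConfig": [
        {
            "outputS3Path": "s3://example-bucket/output/",
            "output": [
                {"name": "email", "hashed": True},
                {"name": "city", "hashed": False},
            ],
            "applyNormalization": True,
        }
    ]
}

print(hashed_columns(sample_response))
```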
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
rules (list) --
A list of Rule objects, each of which has fields RuleName and MatchingKeys.
(dict) --
An object containing the ruleName and matchingKeys.
ruleName (string) --
A name for the matching rule.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
attributeMatchingModel (string) --
The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
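The difference between ONE_TO_ONE and MANY_TO_MANY described above can be illustrated with a toy model. This is a sketch under stated assumptions, not the service's matching logic: profiles are plain dicts, and the field_types mapping from field name to attribute type, as well as the function name attributes_match, are hypothetical.

```python
def attributes_match(profile_a, profile_b, field_types, attribute_type, model):
    """Toy comparison of two profiles on one attribute type.

    field_types maps a field name (sub-type) to its attribute type.
    """
    fields = [f for f, t in field_types.items() if t == attribute_type]
    if model == "ONE_TO_ONE":
        # Only the same field on both profiles may match.
        return any(
            profile_a.get(f) is not None and profile_a.get(f) == profile_b.get(f)
            for f in fields
        )
    if model == "MANY_TO_MANY":
        # Any field of the attribute type may match any other.
        values_a = {profile_a[f] for f in fields if profile_a.get(f) is not None}
        values_b = {profile_b[f] for f in fields if profile_b.get(f) is not None}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model}")


# Hypothetical data: Email and BusinessEmail share the EMAIL attribute type.
field_types = {"Email": "EMAIL", "BusinessEmail": "EMAIL"}
profile_a = {"Email": "jane@example.com"}
profile_b = {"BusinessEmail": "jane@example.com"}

print(attributes_match(profile_a, profile_b, field_types, "EMAIL", "ONE_TO_ONE"))
print(attributes_match(profile_a, profile_b, field_types, "EMAIL", "MANY_TO_MANY"))
```

In this toy data the two profiles match only under MANY_TO_MANY, because the shared value sits in different sub-type fields.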
ruleConditionProperties (dict) --
An object containing the rules for a matching workflow.
rules (list) --
A list of rule objects, each of which has fields ruleName and condition.
(dict) --
An object that defines the ruleName and the condition to use in a matching workflow.
ruleName (string) --
A name for the matching rule.
For example: Rule1
condition (string) --
A statement that specifies the conditions for a matching rule.
If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
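A condition string like the one above can be assembled from smaller pieces. The helpers all_of and any_of below are hypothetical conveniences, not part of the SDK; the sketch only builds the ruleConditionProperties fragment and does not call the service or validate the condition syntax.

```python
def all_of(*parts: str) -> str:
    """Combine matching functions with AND and group them in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts: str) -> str:
    """Separate alternatives with OR."""
    return " OR ".join(parts)


# Rebuild the example condition from the description above.
condition = any_of(
    all_of("Cosine(a, 10)", "Exact(b, true)"),
    "ExactManyToMany(c, d)",
)

# Fragment shaped like the ruleConditionProperties structure.
rule_condition_properties = {
    "rules": [
        {"ruleName": "Rule1", "condition": condition},
    ]
}

print(condition)
# (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
```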
providerProperties (dict) --
The properties of the provider service.
providerServiceArn (string) --
The ARN of the provider service.
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
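Because the rules can appear under either ruleBasedProperties or ruleConditionProperties, code that reads this response may want to handle both. The following is a minimal sketch, assuming the response is an ordinary dict shaped as documented above; the helper name rule_names and the sample values are hypothetical.

```python
def rule_names(response: dict) -> list:
    """Collect ruleName values from whichever rule block the response contains."""
    techniques = response.get("resolutionTechniques", {})
    names = []
    for key in ("ruleBasedProperties", "ruleConditionProperties"):
        for rule in techniques.get(key, {}).get("rules", []):
            names.append(rule["ruleName"])
    return names


# Hypothetical response fragment; the values are made up for illustration.
sample = {
    "workflowName": "example-workflow",
    "resolutionTechniques": {
        "ruleConditionProperties": {
            "rules": [
                {"ruleName": "Rule1", "condition": "Exact(a, true)"},
                {"ruleName": "Rule2", "condition": "Soundex(b)"},
            ]
        },
    },
}

print(rule_names(sample))
```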