2024/07/23 - AWS EntityResolution - 14 updated api methods
Changes Support First Party ID Mapping
{'idMappingTechniques': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE | TARGET', 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Creates an IdMappingWorkflow object which stores the configuration of the data processing job to be run. Each IdMappingWorkflow must have a unique workflow name. To modify an existing workflow, use the UpdateIdMappingWorkflow API.
See also: AWS API Documentation
Request Syntax
client.create_id_mapping_workflow( description='string', idMappingTechniques={ 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE'|'TARGET', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string', 'type': 'SOURCE'|'TARGET' }, ], outputSourceConfig=[ { 'KMSArn': 'string', 'outputS3Path': 'string' }, ], roleArn='string', tags={ 'string': 'string' }, workflowName='string' )
description (string) --
A description of the workflow.
idMappingTechniques (dict) -- [REQUIRED]
An object which defines the ID mapping technique and any additional configurations.
idMappingType (string) -- [REQUIRED]
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModel (string) -- [REQUIRED]
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
ruleDefinitionType (string) -- [REQUIRED]
The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.
rules (list) --
The rules that can be used for ID mapping.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and Type.
inputSourceARN (string) -- [REQUIRED]
An AWS Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId which all sourceIds will resolve to.
outputSourceConfig (list) --
A list of IdMappingWorkflowOutputSource objects, each of which contains the fields OutputS3Path and KMSArn.
(dict) --
The output source for the ID mapping workflow.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
workflowName (string) -- [REQUIRED]
The name of the workflow. There can't be multiple IdMappingWorkflows with the same name.
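As a rough illustration of how the parameters above fit together, the following sketch assembles keyword arguments for a rule-based call. The helper name, the ARNs, the schema name, and the bucket are placeholders invented for this example, and the chosen enum values are just one possible combination; whether the service accepts them depends on your own resources.

```python
from typing import Any


def build_rule_based_id_mapping_request(
    workflow_name: str,
    source_table_arn: str,
    target_table_arn: str,
    schema_name: str,
    output_s3_path: str,
    rules: list[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble keyword arguments for client.create_id_mapping_workflow()."""
    for rule in rules:
        # ruleName and matchingKeys are both marked [REQUIRED] above.
        if not rule.get("ruleName") or not rule.get("matchingKeys"):
            raise ValueError("each rule needs a ruleName and at least one matching key")
    return {
        "workflowName": workflow_name,
        "idMappingTechniques": {
            "idMappingType": "RULE_BASED",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "recordMatchingModel": "ONE_SOURCE_TO_ONE_TARGET",
                "ruleDefinitionType": "SOURCE",
                "rules": rules,
            },
        },
        "inputSourceConfig": [
            {"inputSourceARN": source_table_arn, "schemaName": schema_name, "type": "SOURCE"},
            {"inputSourceARN": target_table_arn, "schemaName": schema_name, "type": "TARGET"},
        ],
        "outputSourceConfig": [{"outputS3Path": output_s3_path}],
    }


# Placeholder values for illustration only.
request = build_rule_based_id_mapping_request(
    workflow_name="example-id-mapping",
    source_table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/source_table",
    target_table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/target_table",
    schema_name="example-schema",
    output_s3_path="s3://DOC-EXAMPLE-BUCKET/id-mapping-output/",
    rules=[{"ruleName": "email-rule", "matchingKeys": ["email"]}],
)

# With credentials configured, the call itself would look like:
#   import boto3
#   client = boto3.client("entityresolution")
#   response = client.create_id_mapping_workflow(**request)
```

Building the dictionary separately from the call keeps the request easy to inspect or log before anything is sent to the service.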
dict
Response Syntax
{ 'description': 'string', 'idMappingTechniques': { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE'|'TARGET', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'type': 'SOURCE'|'TARGET' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'outputS3Path': 'string' }, ], 'roleArn': 'string', 'workflowArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the workflow.
idMappingTechniques (dict) --
An object which defines the ID mapping technique and any additional configurations.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModel (string) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
ruleDefinitionType (string) --
The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.
rules (list) --
The rules that can be used for ID mapping.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and Type.
inputSourceARN (string) --
An AWS Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId which all sourceIds will resolve to.
outputSourceConfig (list) --
A list of IdMappingWorkflowOutputSource objects, each of which contains the fields OutputS3Path and KMSArn.
(dict) --
The output source for the ID mapping workflow.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the IDMappingWorkflow.
workflowName (string) --
The name of the workflow.
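The difference between the two attributeMatchingModel values described above can be pictured with a small toy model. This is only an illustration of the documented Email/BusinessEmail example, not how Entity Resolution is implemented; the profile representation (a dict keyed by attribute type and sub-type) and the function name are assumptions made for this sketch.

```python
# A toy profile: (attribute type, sub-type) -> value. Hypothetical structure.
Profile = dict[tuple[str, str], str]


def attributes_match(a: Profile, b: Profile, attribute_type: str, model: str) -> bool:
    """Compare two toy profiles on one attribute type under the given model."""
    a_values = {sub: v for (t, sub), v in a.items() if t == attribute_type}
    b_values = {sub: v for (t, sub), v in b.items() if t == attribute_type}
    if model == "ONE_TO_ONE":
        # Only identical sub-types are compared with each other.
        return any(b_values.get(sub) == v for sub, v in a_values.items())
    if model == "MANY_TO_MANY":
        # Any sub-type of the attribute type may match any other sub-type.
        return bool(set(a_values.values()) & set(b_values.values()))
    raise ValueError(f"unknown attributeMatchingModel: {model}")


profile_a: Profile = {("Email", "Email"): "pat@example.com"}
profile_b: Profile = {("Email", "BusinessEmail"): "pat@example.com"}

many = attributes_match(profile_a, profile_b, "Email", "MANY_TO_MANY")
one = attributes_match(profile_a, profile_b, "Email", "ONE_TO_ONE")
```

Here `many` is true because the Email value of Profile A equals the BusinessEmail value of Profile B, while `one` is false because the sub-types differ.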
{'idMappingWorkflowProperties': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModels': ['ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET'], 'ruleDefinitionTypes': ['SOURCE | TARGET'], 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Creates an ID namespace object which will help customers provide metadata explaining their dataset and how to use it. Each ID namespace must have a unique name. To modify an existing ID namespace, use the UpdateIdNamespace API.
See also: AWS API Documentation
Request Syntax
client.create_id_namespace( description='string', idMappingWorkflowProperties=[ { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModels': [ 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', ], 'ruleDefinitionTypes': [ 'SOURCE'|'TARGET', ], 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, ], idNamespaceName='string', inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string' }, ], roleArn='string', tags={ 'string': 'string' }, type='SOURCE'|'TARGET' )
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
Determines the properties of IdMappingWorkflow where this IdNamespace can be used as a Source or a Target.
(dict) --
An object containing IdMappingType, ProviderProperties, and RuleBasedProperties.
idMappingType (string) -- [REQUIRED]
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
providerConfiguration (:ref:`document<document>`) --
An object which defines any additional configurations required by the provider service.
providerServiceArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModels (list) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source is matched to one record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, all matching records in the source are matched to one record in the target.
(string) --
ruleDefinitionTypes (list) --
The sets of rules you can use in an ID mapping workflow. The limitations specified for the source and target must be compatible.
(string) --
rules (list) --
The rules for the ID namespace.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
idNamespaceName (string) -- [REQUIRED]
The name of the ID namespace.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN and SchemaName.
inputSourceARN (string) -- [REQUIRED]
An AWS Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access the resources defined in this IdNamespace on your behalf as part of the workflow run.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
type (string) -- [REQUIRED]
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
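A minimal sketch of how the create_id_namespace parameters above might be assembled follows. Note that, unlike the workflow request, recordMatchingModels and ruleDefinitionTypes are lists here. The helper name, ARNs, and schema name are placeholders invented for this example.

```python
from typing import Any


def build_id_namespace_request(
    name: str,
    namespace_type: str,
    table_arn: str,
    schema_name: str,
    role_arn: str,
    rules: list[dict[str, Any]],
) -> dict[str, Any]:
    """Assemble keyword arguments for client.create_id_namespace()."""
    if namespace_type not in ("SOURCE", "TARGET"):
        raise ValueError("type must be SOURCE or TARGET")
    return {
        "idNamespaceName": name,
        "type": namespace_type,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {"inputSourceARN": table_arn, "schemaName": schema_name},
        ],
        "idMappingWorkflowProperties": [
            {
                "idMappingType": "RULE_BASED",
                "ruleBasedProperties": {
                    "attributeMatchingModel": "MANY_TO_MANY",
                    # Lists of the values this namespace allows.
                    "recordMatchingModels": ["ONE_SOURCE_TO_ONE_TARGET"],
                    "ruleDefinitionTypes": ["SOURCE", "TARGET"],
                    "rules": rules,
                },
            },
        ],
    }


# Placeholder values for illustration only.
namespace_request = build_id_namespace_request(
    name="example-source-namespace",
    namespace_type="SOURCE",
    table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/source_table",
    schema_name="example-schema",
    role_arn="arn:aws:iam::111122223333:role/example-entity-resolution-role",
    rules=[{"ruleName": "email-rule", "matchingKeys": ["email"]}],
)

# With credentials configured:
#   import boto3
#   boto3.client("entityresolution").create_id_namespace(**namespace_request)
```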
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'idMappingWorkflowProperties': [ { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModels': [ 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', ], 'ruleDefinitionTypes': [ 'SOURCE'|'TARGET', ], 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, ], 'idNamespaceArn': 'string', 'idNamespaceName': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'roleArn': 'string', 'tags': { 'string': 'string' }, 'type': 'SOURCE'|'TARGET', 'updatedAt': datetime(2015, 1, 1) }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the ID namespace was created.
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
Determines the properties of IdMappingWorkflow where this IdNamespace can be used as a Source or a Target.
(dict) --
An object containing IdMappingType, ProviderProperties, and RuleBasedProperties.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
providerConfiguration (:ref:`document<document>`) --
An object which defines any additional configurations required by the provider service.
providerServiceArn (string) --
The Amazon Resource Name (ARN) of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModels (list) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source is matched to one record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, all matching records in the source are matched to one record in the target.
(string) --
ruleDefinitionTypes (list) --
The sets of rules you can use in an ID mapping workflow. The limitations specified for the source and target must be compatible.
(string) --
rules (list) --
The rules for the ID namespace.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
idNamespaceArn (string) --
The Amazon Resource Name (ARN) of the ID namespace.
idNamespaceName (string) --
The name of the ID namespace.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN and SchemaName.
inputSourceARN (string) --
An AWS Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access the resources defined in inputSourceConfig on your behalf as part of the workflow run.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
updatedAt (datetime) --
The timestamp of when the ID namespace was last updated.
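The response above reports recordMatchingModels and ruleDefinitionTypes as lists, while an ID mapping workflow carries a single recordMatchingModel and ruleDefinitionType. One way to read the "must be compatible" wording is that the workflow's single values should appear in the namespace's lists. The sketch below encodes that reading as a local pre-check; this interpretation and the function name are assumptions, and the service's actual validation may differ.

```python
from typing import Any


def workflow_allowed_by_namespace(
    workflow_rule_props: dict[str, Any],
    namespace_rule_props: dict[str, Any],
) -> bool:
    """Local pre-check (assumed semantics): are the workflow's single values
    among the values the namespace lists as allowed?"""
    allowed_models = namespace_rule_props.get("recordMatchingModels", [])
    allowed_definitions = namespace_rule_props.get("ruleDefinitionTypes", [])
    return (
        workflow_rule_props.get("recordMatchingModel") in allowed_models
        and workflow_rule_props.get("ruleDefinitionType") in allowed_definitions
    )


namespace_props = {
    "recordMatchingModels": ["ONE_SOURCE_TO_ONE_TARGET"],
    "ruleDefinitionTypes": ["SOURCE", "TARGET"],
}
workflow_props = {
    "recordMatchingModel": "MANY_SOURCE_TO_ONE_TARGET",
    "ruleDefinitionType": "SOURCE",
}

compatible = workflow_allowed_by_namespace(workflow_props, namespace_props)
```

In this example `compatible` is false, because MANY_SOURCE_TO_ONE_TARGET is not among the record matching models the namespace lists.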
{'resolutionTechniques': {'ruleBasedProperties': {'matchPurpose': 'IDENTIFIER_GENERATION | INDEXING'}}}
Creates a MatchingWorkflow object which stores the configuration of the data processing job to be run. A MatchingWorkflow with the same name must not already exist. To modify an existing workflow, use the UpdateMatchingWorkflow API.
See also: AWS API Documentation
Request Syntax
client.create_matching_workflow( description='string', incrementalRunConfig={ 'incrementalRunType': 'IMMEDIATE' }, inputSourceConfig=[ { 'applyNormalization': True|False, 'inputSourceARN': 'string', 'schemaName': 'string' }, ], outputSourceConfig=[ { 'KMSArn': 'string', 'applyNormalization': True|False, 'output': [ { 'hashed': True|False, 'name': 'string' }, ], 'outputS3Path': 'string' }, ], resolutionTechniques={ 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, roleArn='string', tags={ 'string': 'string' }, workflowName='string' )
description (string) --
A description of the workflow.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) -- [REQUIRED]
An AWS Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) -- [REQUIRED]
The name of the schema to be retrieved.
outputSourceConfig (list) -- [REQUIRED]
A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.
(dict) --
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) -- [REQUIRED]
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An OutputAttribute object with the fields Name and Hashed. It selects a column to be included in the output table, and whether the values of the column should be hashed.
hashed (boolean) --
Enables the ability to hash the column values in the output.
name (string) -- [REQUIRED]
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) -- [REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) --
The properties of the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
resolutionType (string) -- [REQUIRED]
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) -- [REQUIRED]
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
roleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
workflowName (string) -- [REQUIRED]
The name of the workflow. There can't be multiple MatchingWorkflows with the same name.
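To show where the new matchPurpose member sits in the request, here is a small sketch that builds the resolutionTechniques value for rule-based matching. The helper name and the example rule are hypothetical; the allowed values are taken from the request syntax above.

```python
from typing import Any

MATCH_PURPOSES = {"IDENTIFIER_GENERATION", "INDEXING"}
ATTRIBUTE_MATCHING_MODELS = {"ONE_TO_ONE", "MANY_TO_MANY"}


def build_rule_matching_techniques(
    rules: list[dict[str, Any]],
    match_purpose: str = "IDENTIFIER_GENERATION",
    attribute_matching_model: str = "ONE_TO_ONE",
) -> dict[str, Any]:
    """Build the resolutionTechniques argument for create_matching_workflow()."""
    if match_purpose not in MATCH_PURPOSES:
        raise ValueError(f"matchPurpose must be one of {sorted(MATCH_PURPOSES)}")
    if attribute_matching_model not in ATTRIBUTE_MATCHING_MODELS:
        raise ValueError(
            f"attributeMatchingModel must be one of {sorted(ATTRIBUTE_MATCHING_MODELS)}"
        )
    return {
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "attributeMatchingModel": attribute_matching_model,
            "matchPurpose": match_purpose,
            "rules": rules,
        },
    }


# Index the data without generating IDs, per the INDEXING description above.
techniques = build_rule_matching_techniques(
    rules=[{"ruleName": "name-and-email", "matchingKeys": ["name", "email"]}],
    match_purpose="INDEXING",
)

# With credentials configured, this would be passed as
#   client.create_matching_workflow(..., resolutionTechniques=techniques, ...)
```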
dict
Response Syntax
{ 'description': 'string', 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'inputSourceConfig': [ { 'applyNormalization': True|False, 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'applyNormalization': True|False, 'output': [ { 'hashed': True|False, 'name': 'string' }, ], 'outputS3Path': 'string' }, ], 'resolutionTechniques': { 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'roleArn': 'string', 'workflowArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the workflow.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) --
An AWS Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.
(dict) --
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) --
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An OutputAttribute object with the fields Name and Hashed. It selects a column to be included in the output table, and whether the values of the column should be hashed.
hashed (boolean) --
Enables the ability to hash the column values in the output.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) --
The properties of the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) --
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
workflowName (string) --
The name of the workflow.
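As an illustration of the two attributeMatchingModel values described above, the toy sketch below compares two profiles on a single attribute type. It is not the service's matching logic: the profile layout (a list of (type, sub_type, value) tuples) and the function name profiles_match are assumptions made for this example only.

```python
def profiles_match(profile_a, profile_b, attribute_type, model):
    """Toy comparison of two profiles on one attribute type.

    Each profile is a list of (type, sub_type, value) tuples.
    This layout is hypothetical and only used for illustration.
    """
    a_vals = [(s, v) for t, s, v in profile_a if t == attribute_type]
    b_vals = [(s, v) for t, s, v in profile_b if t == attribute_type]
    if model == "MANY_TO_MANY":
        # Values may match across different sub-types of the same type.
        return any(va == vb for _, va in a_vals for _, vb in b_vals)
    if model == "ONE_TO_ONE":
        # Values only match when the sub-types are identical.
        return any(sa == sb and va == vb
                   for sa, va in a_vals for sb, vb in b_vals)
    raise ValueError(f"unknown attributeMatchingModel: {model}")


profile_a = [("EMAIL_ADDRESS", "Email", "pat@example.com")]
profile_b = [("EMAIL_ADDRESS", "BusinessEmail", "pat@example.com")]

print(profiles_match(profile_a, profile_b, "EMAIL_ADDRESS", "MANY_TO_MANY"))
print(profiles_match(profile_a, profile_b, "EMAIL_ADDRESS", "ONE_TO_ONE"))
```

With these sample profiles the same email value sits under two different sub-types, so only the MANY_TO_MANY comparison reports a match.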
{'mappedInputFields': {'hashed': 'boolean'}}
Creates a schema mapping, which defines the schema of the input customer records table. The SchemaMapping also provides Entity Resolution with some metadata about the table, such as the attribute types of the columns and which columns to match on.
See also: AWS API Documentation
Request Syntax
client.create_schema_mapping( description='string', mappedInputFields=[ { 'fieldName': 'string', 'groupName': 'string', 'hashed': True|False, 'matchKey': 'string', 'subType': 'string', 'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID' }, ], schemaName='string', tags={ 'string': 'string' } )
string
A description of the schema.
list
[REQUIRED]
A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.
(dict) --
An object containing FieldName, Type, GroupName, MatchKey, Hashed, and SubType.
fieldName (string) -- [REQUIRED]
A string containing the field name.
groupName (string) --
A string that instructs Entity Resolution to combine several columns into a unified column with the identical attribute type.
For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common groupName will prompt Entity Resolution to concatenate them into a single value.
hashed (boolean) --
Indicates if the column values are hashed in the schema input. If the value is set to TRUE, the column values are hashed. If the value is set to FALSE, the column values are cleartext.
matchKey (string) --
A key that allows grouping of multiple input attributes into a unified matching group.
For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning a matchKey called address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group.
If no matchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.
subType (string) --
The subtype of the attribute, selected from a list of values.
type (string) -- [REQUIRED]
The type of the attribute, selected from a list of values.
string
[REQUIRED]
The name of the schema. There can't be multiple SchemaMappings with the same name.
dict
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
dict
Response Syntax
{ 'description': 'string', 'mappedInputFields': [ { 'fieldName': 'string', 'groupName': 'string', 'hashed': True|False, 'matchKey': 'string', 'subType': 'string', 'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID' }, ], 'schemaArn': 'string', 'schemaName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the schema.
mappedInputFields (list) --
A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.
(dict) --
An object containing FieldName, Type, GroupName, MatchKey, Hashed, and SubType.
fieldName (string) --
A string containing the field name.
groupName (string) --
A string that instructs Entity Resolution to combine several columns into a unified column with the identical attribute type.
For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common groupName will prompt Entity Resolution to concatenate them into a single value.
hashed (boolean) --
Indicates if the column values are hashed in the schema input. If the value is set to TRUE, the column values are hashed. If the value is set to FALSE, the column values are cleartext.
matchKey (string) --
A key that allows grouping of multiple input attributes into a unified matching group.
For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning a matchKey called address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group.
If no matchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.
subType (string) --
The subtype of the attribute, selected from a list of values.
type (string) --
The type of the attribute, selected from a list of values.
schemaArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.
schemaName (string) --
The name of the schema.
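A request for create_schema_mapping can be assembled as a plain dictionary before it is sent. The sketch below is one possible way to do that; the helper name build_schema_mapping_request, its snake_case column layout, and the column names are hypothetical, while the request keys and type values come from the request syntax above.

```python
def build_schema_mapping_request(schema_name, columns, description=None):
    """Build keyword arguments for create_schema_mapping.

    `columns` is a list of dicts with the keys name and type, plus the
    optional keys match_key, group_name, sub_type, and hashed.
    This input layout is a convenience invented for this sketch.
    """
    optional_keys = (
        ("match_key", "matchKey"),
        ("group_name", "groupName"),
        ("sub_type", "subType"),
        ("hashed", "hashed"),
    )
    fields = []
    for col in columns:
        field = {"fieldName": col["name"], "type": col["type"]}
        # Optional keys are only included when the caller supplied them.
        for src, dst in optional_keys:
            if src in col:
                field[dst] = col[src]
        fields.append(field)
    request = {"schemaName": schema_name, "mappedInputFields": fields}
    if description is not None:
        request["description"] = description
    return request


request = build_schema_mapping_request(
    "customer_schema",
    [
        {"name": "customer_id", "type": "UNIQUE_ID"},
        {"name": "first_name", "type": "NAME_FIRST",
         "group_name": "full_name", "match_key": "name"},
        {"name": "last_name", "type": "NAME_LAST",
         "group_name": "full_name", "match_key": "name"},
        {"name": "email_sha256", "type": "EMAIL_ADDRESS",
         "match_key": "email", "hashed": True},
    ],
    description="Example schema",
)
print(request["mappedInputFields"][3])
```

Assuming a boto3 client created for the entityresolution service, the result could then be passed as client.create_schema_mapping(**request).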
{'metrics': {'totalMappedRecords': 'integer', 'totalMappedSourceRecords': 'integer', 'totalMappedTargetRecords': 'integer'}}
Gets the status, metrics, and errors (if there are any) that are associated with a job.
See also: AWS API Documentation
Request Syntax
client.get_id_mapping_job( jobId='string', workflowName='string' )
string
[REQUIRED]
The ID of the job.
string
[REQUIRED]
The name of the workflow.
dict
Response Syntax
{ 'endTime': datetime(2015, 1, 1), 'errorDetails': { 'errorMessage': 'string' }, 'jobId': 'string', 'metrics': { 'inputRecords': 123, 'recordsNotProcessed': 123, 'totalMappedRecords': 123, 'totalMappedSourceRecords': 123, 'totalMappedTargetRecords': 123, 'totalRecordsProcessed': 123 }, 'outputSourceConfig': [ { 'KMSArn': 'string', 'outputS3Path': 'string', 'roleArn': 'string' }, ], 'startTime': datetime(2015, 1, 1), 'status': 'RUNNING'|'SUCCEEDED'|'FAILED'|'QUEUED' }
Response Structure
(dict) --
endTime (datetime) --
The time at which the job finished.
errorDetails (dict) --
An object containing an error message, if there was an error.
errorMessage (string) --
The error message from the job, if there is one.
jobId (string) --
The ID of the job.
metrics (dict) --
Metrics associated with the execution, specifically total records processed, unique IDs generated, and records the execution skipped.
inputRecords (integer) --
The total number of records that were input for processing.
recordsNotProcessed (integer) --
The total number of records that did not get processed.
totalMappedRecords (integer) --
The total number of records that were mapped.
totalMappedSourceRecords (integer) --
The total number of mapped source records.
totalMappedTargetRecords (integer) --
The total number of distinct mapped target records.
totalRecordsProcessed (integer) --
The total number of records that were processed.
outputSourceConfig (list) --
A list of OutputSource objects.
(dict) --
An object containing KMSArn, OutputS3Path, and RoleARN.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf as part of workflow execution.
startTime (datetime) --
The time at which the job was started.
status (string) --
The current status of the job.
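A common use of get_id_mapping_job is to poll until the job leaves the QUEUED and RUNNING states and then read the metrics. The following is a minimal sketch under stated assumptions: it treats SUCCEEDED and FAILED as the final states, the helper names are hypothetical, and FakeClient is a stand-in used here so the example runs without AWS access. A real boto3 entityresolution client would be passed in its place.

```python
import time


def wait_for_id_mapping_job(client, workflow_name, job_id,
                            poll_seconds=30, max_polls=120):
    """Poll get_id_mapping_job until the job succeeds or fails."""
    for _ in range(max_polls):
        job = client.get_id_mapping_job(workflowName=workflow_name, jobId=job_id)
        # Assumption: SUCCEEDED and FAILED are the only final states.
        if job["status"] in ("SUCCEEDED", "FAILED"):
            return job
        time.sleep(poll_seconds)
    raise TimeoutError(f"job {job_id} did not finish after {max_polls} polls")


def summarize_job(job):
    """Return a one-line summary built from the response fields."""
    if job["status"] == "FAILED":
        message = job.get("errorDetails", {}).get("errorMessage", "no error message")
        return f"FAILED: {message}"
    metrics = job.get("metrics", {})
    return (f"{job['status']}: {metrics.get('totalMappedRecords', 0)} mapped "
            f"({metrics.get('totalMappedSourceRecords', 0)} source, "
            f"{metrics.get('totalMappedTargetRecords', 0)} target), "
            f"{metrics.get('recordsNotProcessed', 0)} not processed")


class FakeClient:
    """Stand-in for a boto3 client that replays canned responses."""

    def __init__(self, responses):
        self._responses = iter(responses)

    def get_id_mapping_job(self, workflowName, jobId):
        return next(self._responses)


fake = FakeClient([
    {"jobId": "job-1", "status": "QUEUED"},
    {"jobId": "job-1", "status": "RUNNING"},
    {"jobId": "job-1", "status": "SUCCEEDED",
     "metrics": {"totalMappedRecords": 90, "totalMappedSourceRecords": 60,
                 "totalMappedTargetRecords": 30, "recordsNotProcessed": 2}},
])
job = wait_for_id_mapping_job(fake, "my-workflow", "job-1", poll_seconds=0)
print(summarize_job(job))
```

The metric values in the canned response are invented sample numbers, not output from a real job.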
{'idMappingTechniques': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE | TARGET', 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Returns the IdMappingWorkflow with a given name, if it exists.
See also: AWS API Documentation
Request Syntax
client.get_id_mapping_workflow( workflowName='string' )
string
[REQUIRED]
The name of the workflow.
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'idMappingTechniques': { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE'|'TARGET', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'type': 'SOURCE'|'TARGET' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'outputS3Path': 'string' }, ], 'roleArn': 'string', 'tags': { 'string': 'string' }, 'updatedAt': datetime(2015, 1, 1), 'workflowArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the workflow was created.
description (string) --
A description of the workflow.
idMappingTechniques (dict) --
An object which defines the ID mapping technique and any additional configurations.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModel (string) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
ruleDefinitionType (string) --
The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.
rules (list) --
The rules that can be used for ID mapping.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and Type.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.
(dict) --
The output source for the ID mapping workflow.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
updatedAt (datetime) --
The timestamp of when the workflow was last updated.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the IdMappingWorkflow.
workflowName (string) --
The name of the workflow.
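The get_id_mapping_workflow response nests the rule-based settings and tags each input source as SOURCE or TARGET. The sketch below shows one way to pull those pieces out of a response dictionary. The helper names, the sample workflow, and its ARNs are placeholders invented for illustration; only the key names come from the response syntax above.

```python
def split_input_sources(workflow):
    """Separate inputSourceConfig entries by their type field."""
    configs = workflow.get("inputSourceConfig", [])
    sources = [c for c in configs if c.get("type") == "SOURCE"]
    targets = [c for c in configs if c.get("type") == "TARGET"]
    return sources, targets


def rules_by_name(workflow):
    """Map each rule name to its matching keys for RULE_BASED workflows."""
    techniques = workflow.get("idMappingTechniques", {})
    if techniques.get("idMappingType") != "RULE_BASED":
        return {}
    rule_props = techniques.get("ruleBasedProperties") or {}
    return {r["ruleName"]: list(r["matchingKeys"])
            for r in rule_props.get("rules", [])}


# Sample response with placeholder values (not real resources).
workflow = {
    "workflowName": "example-workflow",
    "idMappingTechniques": {
        "idMappingType": "RULE_BASED",
        "ruleBasedProperties": {
            "attributeMatchingModel": "ONE_TO_ONE",
            "recordMatchingModel": "ONE_SOURCE_TO_ONE_TARGET",
            "ruleDefinitionType": "TARGET",
            "rules": [
                {"ruleName": "email-only", "matchingKeys": ["email"]},
                {"ruleName": "name-and-phone", "matchingKeys": ["name", "phone"]},
            ],
        },
    },
    "inputSourceConfig": [
        {"inputSourceARN": "arn:aws:glue:us-east-1:111122223333:table/db/src",
         "schemaName": "source_schema", "type": "SOURCE"},
        {"inputSourceARN": "arn:aws:glue:us-east-1:111122223333:table/db/tgt",
         "schemaName": "target_schema", "type": "TARGET"},
    ],
}

sources, targets = split_input_sources(workflow)
print([s["schemaName"] for s in sources], [t["schemaName"] for t in targets])
print(rules_by_name(workflow))
```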
{'idMappingWorkflowProperties': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModels': ['ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET'], 'ruleDefinitionTypes': ['SOURCE | TARGET'], 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Returns the IdNamespace with a given name, if it exists.
See also: AWS API Documentation
Request Syntax
client.get_id_namespace( idNamespaceName='string' )
string
[REQUIRED]
The name of the ID namespace.
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'idMappingWorkflowProperties': [ { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModels': [ 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', ], 'ruleDefinitionTypes': [ 'SOURCE'|'TARGET', ], 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, ], 'idNamespaceArn': 'string', 'idNamespaceName': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'roleArn': 'string', 'tags': { 'string': 'string' }, 'type': 'SOURCE'|'TARGET', 'updatedAt': datetime(2015, 1, 1) }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the ID namespace was created.
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
Determines the properties of IdMappingWorkflow where this IdNamespace can be used as a Source or a Target.
(dict) --
An object containing IdMappingType, ProviderProperties, and RuleBasedProperties.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
providerConfiguration (:ref:`document<document>`) --
An object which defines any additional configurations required by the provider service.
providerServiceArn (string) --
The Amazon Resource Name (ARN) of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModels (list) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source is matched to one record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, all matching records in the source are matched to one record in the target.
(string) --
ruleDefinitionTypes (list) --
The sets of rules you can use in an ID mapping workflow. The limitations specified for the source and target must be compatible.
(string) --
rules (list) --
The rules for the ID namespace.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
idNamespaceArn (string) --
The Amazon Resource Name (ARN) of the ID namespace.
idNamespaceName (string) --
The name of the ID namespace.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN and SchemaName.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access the resources defined in this IdNamespace on your behalf as part of a workflow run.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
updatedAt (datetime) --
The timestamp of when the ID namespace was last updated.
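Because an ID namespace lists the recordMatchingModels and ruleDefinitionTypes it permits, a caller may want to check a planned workflow setting against a get_id_namespace response before creating the workflow. The sketch below does a simple membership check. Treating compatibility as plain list membership is an assumption of this example, not a documented rule, and the helper name and sample namespace are hypothetical.

```python
def namespace_allows(namespace, record_matching_model, rule_definition_type):
    """Return True if any RULE_BASED entry lists both requested values.

    Assumption: compatibility is modelled as simple list membership.
    """
    for props in namespace.get("idMappingWorkflowProperties", []):
        if props.get("idMappingType") != "RULE_BASED":
            continue
        rule_props = props.get("ruleBasedProperties") or {}
        models = rule_props.get("recordMatchingModels", [])
        definition_types = rule_props.get("ruleDefinitionTypes", [])
        if record_matching_model in models and rule_definition_type in definition_types:
            return True
    return False


# Sample response with placeholder values (not a real namespace).
namespace = {
    "idNamespaceName": "target-namespace",
    "type": "TARGET",
    "idMappingWorkflowProperties": [
        {
            "idMappingType": "RULE_BASED",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "recordMatchingModels": ["ONE_SOURCE_TO_ONE_TARGET"],
                "ruleDefinitionTypes": ["TARGET"],
                "rules": [{"ruleName": "email-rule", "matchingKeys": ["email"]}],
            },
        }
    ],
}

print(namespace_allows(namespace, "ONE_SOURCE_TO_ONE_TARGET", "TARGET"))
print(namespace_allows(namespace, "MANY_SOURCE_TO_ONE_TARGET", "TARGET"))
```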
{'resolutionTechniques': {'ruleBasedProperties': {'matchPurpose': 'IDENTIFIER_GENERATION | INDEXING'}}}
Returns the MatchingWorkflow with a given name, if it exists.
See also: AWS API Documentation
Request Syntax
client.get_matching_workflow( workflowName='string' )
string
[REQUIRED]
The name of the workflow.
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'inputSourceConfig': [ { 'applyNormalization': True|False, 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'applyNormalization': True|False, 'output': [ { 'hashed': True|False, 'name': 'string' }, ], 'outputS3Path': 'string' }, ], 'resolutionTechniques': { 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'roleArn': 'string', 'tags': { 'string': 'string' }, 'updatedAt': datetime(2015, 1, 1), 'workflowArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the workflow was created.
description (string) --
A description of the workflow.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.
(dict) --
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system will use an Entity Resolution managed KMS key.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) --
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
(dict) --
An object containing Name and Hashed. It selects a column to be included in the output table and specifies whether the values of the column should be hashed.
hashed (boolean) --
Enables the ability to hash the column values in the output.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) --
The properties of the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) --
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
updatedAt (datetime) --
The timestamp of when the workflow was last updated.
workflowArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
workflowName (string) --
The name of the workflow.
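The matchPurpose field added to ruleBasedProperties can be read from a get_matching_workflow response as sketched below. The helper name and the sample response are hypothetical. The sketch returns None when the workflow is not rule-based or when the field is absent, since this document does not state a default value.

```python
PURPOSE_DESCRIPTIONS = {
    # Wording taken from the field description above.
    "IDENTIFIER_GENERATION": "generates IDs and indexes the data",
    "INDEXING": "indexes the data without generating IDs",
}


def match_purpose(workflow):
    """Return the matchPurpose of a RULE_MATCHING workflow, else None."""
    techniques = workflow.get("resolutionTechniques", {})
    if techniques.get("resolutionType") != "RULE_MATCHING":
        return None
    rule_props = techniques.get("ruleBasedProperties") or {}
    return rule_props.get("matchPurpose")


# Sample response with placeholder values (not a real workflow).
workflow = {
    "workflowName": "example-matching-workflow",
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "attributeMatchingModel": "MANY_TO_MANY",
            "matchPurpose": "INDEXING",
            "rules": [{"ruleName": "email-rule", "matchingKeys": ["email"]}],
        },
    },
}

purpose = match_purpose(workflow)
print(purpose, "->", PURPOSE_DESCRIPTIONS.get(purpose, "not specified"))
```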
{'mappedInputFields': {'hashed': 'boolean'}}
Returns the SchemaMapping of a given name.
See also: AWS API Documentation
Request Syntax
client.get_schema_mapping( schemaName='string' )
string
[REQUIRED]
The name of the schema to be retrieved.
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'hasWorkflows': True|False, 'mappedInputFields': [ { 'fieldName': 'string', 'groupName': 'string', 'hashed': True|False, 'matchKey': 'string', 'subType': 'string', 'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID' }, ], 'schemaArn': 'string', 'schemaName': 'string', 'tags': { 'string': 'string' }, 'updatedAt': datetime(2015, 1, 1) }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the SchemaMapping was created.
description (string) --
A description of the schema.
hasWorkflows (boolean) --
Specifies whether the schema mapping has been applied to a workflow.
mappedInputFields (list) --
A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table and contains the column name plus additional information that Entity Resolution uses for matching.
(dict) --
An object containing FieldName, Type, GroupName, MatchKey, Hashed, and SubType.
fieldName (string) --
A string containing the field name.
groupName (string) --
A string that instructs Entity Resolution to combine several columns into a unified column with the identical attribute type.
For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common groupName will prompt Entity Resolution to concatenate them into a single value.
hashed (boolean) --
Indicates if the column values are hashed in the schema input. If the value is set to TRUE, the column values are hashed. If the value is set to FALSE, the column values are cleartext.
matchKey (string) --
A key that allows grouping of multiple input attributes into a unified matching group.
For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning a matchKey called address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group.
If no matchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.
subType (string) --
The subtype of the attribute, selected from a list of values.
type (string) --
The type of the attribute, selected from a list of values.
schemaArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.
schemaName (string) --
The name of the schema.
tags (dict) --
The tags used to organize, track, or control access for this resource.
(string) --
(string) --
updatedAt (datetime) --
The timestamp of when the SchemaMapping was last updated.
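To see how matchKey groups columns in a schema returned by get_schema_mapping, the response can be regrouped as sketched below. The helper name and the sample schema are hypothetical. The sample mirrors the business_address and shipping_address example from the matchKey description, with a column that has no matchKey and is therefore not used for matching.

```python
from collections import defaultdict


def fields_by_match_key(schema):
    """Group field names by matchKey and collect fields that have none."""
    groups = defaultdict(list)
    unmatched = []
    for field in schema.get("mappedInputFields", []):
        key = field.get("matchKey")
        if key:
            groups[key].append(field["fieldName"])
        else:
            # No matchKey: not used for matching, still part of the output.
            unmatched.append(field["fieldName"])
    return dict(groups), unmatched


def hashed_fields(schema):
    """Return the names of fields whose input values are marked as hashed."""
    return [f["fieldName"] for f in schema.get("mappedInputFields", [])
            if f.get("hashed")]


# Sample response with placeholder values (not a real schema).
schema = {
    "schemaName": "customer_schema",
    "mappedInputFields": [
        {"fieldName": "business_address", "type": "ADDRESS", "matchKey": "address"},
        {"fieldName": "shipping_address", "type": "ADDRESS", "matchKey": "address"},
        {"fieldName": "email_sha256", "type": "EMAIL_ADDRESS",
         "matchKey": "email", "hashed": True},
        {"fieldName": "notes", "type": "STRING"},
    ],
}

groups, unmatched = fields_by_match_key(schema)
print(groups)
print(unmatched)
print(hashed_fields(schema))
```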
{'idNamespaceSummaries': {'idMappingWorkflowProperties': [{'idMappingType': 'PROVIDER | RULE_BASED'}]}}
Returns a list of all ID namespaces.
See also: AWS API Documentation
Request Syntax
client.list_id_namespaces( maxResults=123, nextToken='string' )
integer
The maximum number of IdNamespace objects returned per page.
string
The pagination token from the previous API call.
dict
Response Syntax
{ 'idNamespaceSummaries': [ { 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'idMappingWorkflowProperties': [ { 'idMappingType': 'PROVIDER'|'RULE_BASED' }, ], 'idNamespaceArn': 'string', 'idNamespaceName': 'string', 'type': 'SOURCE'|'TARGET', 'updatedAt': datetime(2015, 1, 1) }, ], 'nextToken': 'string' }
Response Structure
(dict) --
idNamespaceSummaries (list) --
A list of IdNamespaceSummaries objects.
(dict) --
A summary of ID namespaces.
createdAt (datetime) --
The timestamp of when the ID namespace was created.
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
An object which defines any additional configurations required by the ID mapping workflow.
(dict) --
The settings for the ID namespace for the ID mapping workflow job.
idMappingType (string) --
The type of ID mapping.
idNamespaceArn (string) --
The Amazon Resource Name (ARN) of the ID namespace.
idNamespaceName (string) --
The name of the ID namespace.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
updatedAt (datetime) --
The timestamp of when the ID namespace was last updated.
nextToken (string) --
The pagination token from the previous API call.
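Since list_id_namespaces returns results in pages linked by nextToken, callers typically loop until no token is returned. The following is a minimal sketch of that loop. The generator name iter_id_namespaces is hypothetical, and FakeClient is a stand-in that replays two canned pages so the example runs without AWS access. A real boto3 entityresolution client would be passed in its place.

```python
def iter_id_namespaces(client, page_size=25):
    """Yield every ID namespace summary, following nextToken across pages."""
    kwargs = {"maxResults": page_size}
    while True:
        page = client.list_id_namespaces(**kwargs)
        yield from page.get("idNamespaceSummaries", [])
        token = page.get("nextToken")
        if not token:
            break
        # Pass the token from the previous call to fetch the next page.
        kwargs["nextToken"] = token


class FakeClient:
    """Stand-in for a boto3 client that serves two canned pages."""

    def list_id_namespaces(self, maxResults, nextToken=None):
        if nextToken is None:
            return {
                "idNamespaceSummaries": [
                    {"idNamespaceName": "ns-a", "type": "SOURCE",
                     "idMappingWorkflowProperties": [{"idMappingType": "RULE_BASED"}]},
                ],
                "nextToken": "page-2",
            }
        return {
            "idNamespaceSummaries": [
                {"idNamespaceName": "ns-b", "type": "TARGET",
                 "idMappingWorkflowProperties": [{"idMappingType": "PROVIDER"}]},
            ],
        }


names = [ns["idNamespaceName"] for ns in iter_id_namespaces(FakeClient())]
print(names)
```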
{'idMappingTechniques': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE | TARGET', 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Updates an existing IdMappingWorkflow. This method is identical to CreateIdMappingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the IdMappingWorkflow must already exist for the method to succeed.
See also: AWS API Documentation
Request Syntax
client.update_id_mapping_workflow( description='string', idMappingTechniques={ 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE'|'TARGET', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string', 'type': 'SOURCE'|'TARGET' }, ], outputSourceConfig=[ { 'KMSArn': 'string', 'outputS3Path': 'string' }, ], roleArn='string', workflowName='string' )
string
A description of the workflow.
dict
[REQUIRED]
An object which defines the ID mapping technique and any additional configurations.
idMappingType (string) -- [REQUIRED]
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModel (string) -- [REQUIRED]
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
ruleDefinitionType (string) -- [REQUIRED]
The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.
rules (list) --
The rules that can be used for ID mapping.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and Type.
inputSourceARN (string) -- [REQUIRED]
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.
(dict) --
The output source for the ID mapping workflow.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
workflowName (string) -- [REQUIRED]
The name of the workflow.
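As a rough illustration of the request shape above, the following sketch assembles keyword arguments for a rule-based update_id_mapping_workflow call. The helper name build_rule_based_update, the workflow name, the ARNs, the S3 path, the schema names, and the rule are all placeholders invented for this example; only the parameter names and enum values come from the request syntax. The call itself is left as a comment because it needs AWS credentials and an existing workflow.

```python
def build_rule_based_update(workflow_name, role_arn, source_arn, target_arn, output_s3_path):
    """Assemble kwargs for update_id_mapping_workflow (hypothetical helper)."""
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "idMappingTechniques": {
            "idMappingType": "RULE_BASED",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "recordMatchingModel": "ONE_SOURCE_TO_ONE_TARGET",
                "ruleDefinitionType": "SOURCE",
                # Placeholder rule: the matching key must exist in the schema mapping.
                "rules": [{"ruleName": "email_rule", "matchingKeys": ["email"]}],
            },
        },
        "inputSourceConfig": [
            {"inputSourceARN": source_arn, "schemaName": "source_schema", "type": "SOURCE"},
            {"inputSourceARN": target_arn, "schemaName": "target_schema", "type": "TARGET"},
        ],
        "outputSourceConfig": [{"outputS3Path": output_s3_path}],
    }


# All values below are placeholders, not real resources.
kwargs = build_rule_based_update(
    workflow_name="example-workflow",
    role_arn="arn:aws:iam::111122223333:role/ExampleRole",
    source_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/source_table",
    target_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/target_table",
    output_s3_path="s3://example-output-bucket/id-mapping/",
)

# With credentials configured, the request could then be sent like this:
# import boto3
# client = boto3.client("entityresolution")
# response = client.update_id_mapping_workflow(**kwargs)
```

Building the arguments separately from the call keeps the nested structure easy to inspect or log before anything is sent.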
dict
Response Syntax
{ 'description': 'string', 'idMappingTechniques': { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModel': 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', 'ruleDefinitionType': 'SOURCE'|'TARGET', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string', 'type': 'SOURCE'|'TARGET' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'outputS3Path': 'string' }, ], 'roleArn': 'string', 'workflowArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the workflow.
idMappingTechniques (dict) --
An object which defines the ID mapping technique and any additional configurations.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModel (string) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source can be matched to the same record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, multiple records in the source can be matched to one record in the target.
ruleDefinitionType (string) --
The set of rules you can use in an ID mapping workflow. The limitations specified for the source or target to define the match rules must be compatible.
rules (list) --
The rules that can be used for ID mapping.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and Type.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path and KMSArn.
(dict) --
The output source for the ID mapping workflow.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
workflowArn (string) --
The Amazon Resource Name (ARN) of the workflow.
workflowName (string) --
The name of the workflow.
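The difference between the two attributeMatchingModel values described above can be illustrated with a toy comparison. This is only a sketch of the documented semantics, not the service's implementation; the function name attributes_match, the profile dictionaries, and the list of sub-type field names are assumptions made for the example.

```python
def attributes_match(profile_a, profile_b, attribute_fields, model):
    """Toy comparison of one attribute type across two profiles.

    attribute_fields lists the sub-type field names that belong to the
    attribute type, for example ["Email", "BusinessEmail"] for Email.
    """
    if model == "ONE_TO_ONE":
        # Only identical sub-types are compared with each other.
        return any(
            f in profile_a and f in profile_b and profile_a[f] == profile_b[f]
            for f in attribute_fields
        )
    if model == "MANY_TO_MANY":
        # Any sub-type of the attribute may match any other sub-type.
        values_a = {profile_a[f] for f in attribute_fields if f in profile_a}
        values_b = {profile_b[f] for f in attribute_fields if f in profile_b}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model}")


profile_a = {"Email": "pat@example.com"}
profile_b = {"BusinessEmail": "pat@example.com"}
email_fields = ["Email", "BusinessEmail"]

many = attributes_match(profile_a, profile_b, email_fields, "MANY_TO_MANY")
one = attributes_match(profile_a, profile_b, email_fields, "ONE_TO_ONE")
```

Here the Email value of Profile A equals the BusinessEmail value of Profile B, so the profiles match under MANY_TO_MANY but not under ONE_TO_ONE.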
{'idMappingWorkflowProperties': {'idMappingType': {'RULE_BASED'}, 'ruleBasedProperties': {'attributeMatchingModel': 'ONE_TO_ONE | MANY_TO_MANY', 'recordMatchingModels': ['ONE_SOURCE_TO_ONE_TARGET | MANY_SOURCE_TO_ONE_TARGET'], 'ruleDefinitionTypes': ['SOURCE | TARGET'], 'rules': [{'matchingKeys': ['string'], 'ruleName': 'string'}]}}}
Updates an existing ID namespace.
See also: AWS API Documentation
Request Syntax
client.update_id_namespace( description='string', idMappingWorkflowProperties=[ { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModels': [ 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', ], 'ruleDefinitionTypes': [ 'SOURCE'|'TARGET', ], 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, ], idNamespaceName='string', inputSourceConfig=[ { 'inputSourceARN': 'string', 'schemaName': 'string' }, ], roleArn='string' )
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
Determines the properties of IdMappingWorkflow where this IdNamespace can be used as a Source or a Target.
(dict) --
An object containing IdMappingType, ProviderProperties, and RuleBasedProperties.
idMappingType (string) -- [REQUIRED]
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
providerConfiguration (:ref:`document<document>`) --
An object which defines any additional configurations required by the provider service.
providerServiceArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModels (list) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source is matched to one record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, all matching records in the source are matched to one record in the target.
(string) --
ruleDefinitionTypes (list) --
The sets of rules you can use in an ID mapping workflow. The limitations specified for the source and target must be compatible.
(string) --
rules (list) --
The rules for the ID namespace.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
idNamespaceName (string) -- [REQUIRED]
The name of the ID namespace.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN and SchemaName.
inputSourceARN (string) -- [REQUIRED]
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access the resources defined in this IdNamespace on your behalf as part of a workflow run.
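A sketch of how the update_id_namespace arguments might be assembled is shown below. Note that, unlike the ID mapping workflow request, recordMatchingModels and ruleDefinitionTypes are lists here. The helper name build_id_namespace_update and every value passed to it (namespace name, ARNs, schema name, rule) are placeholders assumed for the example; the call is commented out because it requires credentials and an existing ID namespace.

```python
def build_id_namespace_update(name, role_arn, glue_table_arn, schema_name):
    """Assemble kwargs for update_id_namespace (hypothetical helper)."""
    return {
        "idNamespaceName": name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {"inputSourceARN": glue_table_arn, "schemaName": schema_name},
        ],
        "idMappingWorkflowProperties": [
            {
                "idMappingType": "RULE_BASED",
                "ruleBasedProperties": {
                    "attributeMatchingModel": "MANY_TO_MANY",
                    # List-valued in this API: every model the namespace allows.
                    "recordMatchingModels": [
                        "ONE_SOURCE_TO_ONE_TARGET",
                        "MANY_SOURCE_TO_ONE_TARGET",
                    ],
                    # List-valued in this API as well.
                    "ruleDefinitionTypes": ["SOURCE", "TARGET"],
                    "rules": [{"ruleName": "email_rule", "matchingKeys": ["email"]}],
                },
            }
        ],
    }


# Placeholder values only.
namespace_kwargs = build_id_namespace_update(
    name="example-namespace",
    role_arn="arn:aws:iam::111122223333:role/ExampleRole",
    glue_table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/customers",
    schema_name="customer_schema",
)

# import boto3
# client = boto3.client("entityresolution")
# response = client.update_id_namespace(**namespace_kwargs)
```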
dict
Response Syntax
{ 'createdAt': datetime(2015, 1, 1), 'description': 'string', 'idMappingWorkflowProperties': [ { 'idMappingType': 'PROVIDER'|'RULE_BASED', 'providerProperties': { 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'recordMatchingModels': [ 'ONE_SOURCE_TO_ONE_TARGET'|'MANY_SOURCE_TO_ONE_TARGET', ], 'ruleDefinitionTypes': [ 'SOURCE'|'TARGET', ], 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, ], 'idNamespaceArn': 'string', 'idNamespaceName': 'string', 'inputSourceConfig': [ { 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'roleArn': 'string', 'type': 'SOURCE'|'TARGET', 'updatedAt': datetime(2015, 1, 1) }
Response Structure
(dict) --
createdAt (datetime) --
The timestamp of when the ID namespace was created.
description (string) --
The description of the ID namespace.
idMappingWorkflowProperties (list) --
Determines the properties of IdMappingWorkflow where this IdNamespace can be used as a Source or a Target.
(dict) --
An object containing IdMappingType, ProviderProperties, and RuleBasedProperties.
idMappingType (string) --
The type of ID mapping.
providerProperties (dict) --
An object which defines any additional configurations required by the provider service.
providerConfiguration (:ref:`document<document>`) --
An object which defines any additional configurations required by the provider service.
providerServiceArn (string) --
The Amazon Resource Name (ARN) of the provider service.
ruleBasedProperties (dict) --
An object which defines any additional configurations required by rule-based matching.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
recordMatchingModels (list) --
The type of matching record that is allowed to be used in an ID mapping workflow.
If the value is set to ONE_SOURCE_TO_ONE_TARGET, only one record in the source is matched to one record in the target.
If the value is set to MANY_SOURCE_TO_ONE_TARGET, all matching records in the source are matched to one record in the target.
(string) --
ruleDefinitionTypes (list) --
The sets of rules you can use in an ID mapping workflow. The limitations specified for the source and target must be compatible.
(string) --
rules (list) --
The rules for the ID namespace.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
idNamespaceArn (string) --
The Amazon Resource Name (ARN) of the ID namespace.
idNamespaceName (string) --
The name of the ID namespace.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN and SchemaName.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) or a matching workflow ARN for the input source table.
schemaName (string) --
The name of the schema.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access the resources defined in this IdNamespace on your behalf as part of a workflow run.
type (string) --
The type of ID namespace. There are two types: SOURCE and TARGET.
The SOURCE contains configurations for sourceId data that will be processed in an ID mapping workflow.
The TARGET contains a configuration of targetId to which all sourceIds will resolve.
updatedAt (datetime) --
The timestamp of when the ID namespace was last updated.
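The two record matching models described above can be pictured with a small toy mapping over a single matching key. This is a sketch of the documented behavior only: the function name map_sources_to_targets and the record layout are invented, and the choice of which source record is kept under ONE_SOURCE_TO_ONE_TARGET (here, the first one encountered) is an assumption, since the text above does not say how the service chooses.

```python
def map_sources_to_targets(source_records, target_records, key, model):
    """Toy mapping of source IDs to target IDs on a single matching key."""
    if model not in ("ONE_SOURCE_TO_ONE_TARGET", "MANY_SOURCE_TO_ONE_TARGET"):
        raise ValueError(f"unknown record matching model: {model}")

    targets_by_key = {}
    for target in target_records:
        # Keep the first target seen for each key value.
        targets_by_key.setdefault(target[key], target["id"])

    mapping = {}
    claimed = set()
    for source in source_records:
        target_id = targets_by_key.get(source[key])
        if target_id is None:
            continue  # no target shares this key value
        if model == "ONE_SOURCE_TO_ONE_TARGET" and target_id in claimed:
            continue  # this target already has its single source record
        mapping[source["id"]] = target_id
        claimed.add(target_id)
    return mapping


sources = [
    {"id": "s1", "email": "pat@example.com"},
    {"id": "s2", "email": "pat@example.com"},
]
targets = [{"id": "t1", "email": "pat@example.com"}]

one_to_one = map_sources_to_targets(sources, targets, "email", "ONE_SOURCE_TO_ONE_TARGET")
many_to_one = map_sources_to_targets(sources, targets, "email", "MANY_SOURCE_TO_ONE_TARGET")
```

With two source records sharing one email address, the first model maps only s1 to t1, while the second maps both s1 and s2 to t1.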
{'resolutionTechniques': {'ruleBasedProperties': {'matchPurpose': 'IDENTIFIER_GENERATION | INDEXING'}}}
Updates an existing MatchingWorkflow. This method is identical to CreateMatchingWorkflow, except it uses an HTTP PUT request instead of a POST request, and the MatchingWorkflow must already exist for the method to succeed.
See also: AWS API Documentation
Request Syntax
client.update_matching_workflow( description='string', incrementalRunConfig={ 'incrementalRunType': 'IMMEDIATE' }, inputSourceConfig=[ { 'applyNormalization': True|False, 'inputSourceARN': 'string', 'schemaName': 'string' }, ], outputSourceConfig=[ { 'KMSArn': 'string', 'applyNormalization': True|False, 'output': [ { 'hashed': True|False, 'name': 'string' }, ], 'outputS3Path': 'string' }, ], resolutionTechniques={ 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, roleArn='string', workflowName='string' )
description (string) --
A description of the workflow.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) -- [REQUIRED]
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) -- [REQUIRED]
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) -- [REQUIRED]
The name of the schema to be retrieved.
outputSourceConfig (list) -- [REQUIRED]
A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.
(dict) --
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) -- [REQUIRED]
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each object selects a column to include in the output table and specifies whether the values of that column should be hashed.
(dict) --
An object containing Name and Hashed.
hashed (boolean) --
Enables the ability to hash the column values in the output.
name (string) -- [REQUIRED]
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) -- [REQUIRED]
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) -- [REQUIRED]
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) --
The properties of the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) -- [REQUIRED]
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) -- [REQUIRED]
The ARN of the provider service.
resolutionType (string) -- [REQUIRED]
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) -- [REQUIRED]
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) -- [REQUIRED]
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) -- [REQUIRED]
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) -- [REQUIRED]
A name for the matching rule.
roleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowName (string) -- [REQUIRED]
The name of the workflow to be updated.
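The request parameters above, including the new matchPurpose field, might be assembled as in the following sketch. The helper name build_matching_workflow_update, the workflow name, ARNs, schema name, column name, and rule are placeholders assumed for illustration; only the parameter names and enum values are taken from the request syntax. The call is left commented out because it needs credentials and an existing matching workflow.

```python
def build_matching_workflow_update(workflow_name, role_arn, glue_table_arn,
                                   schema_name, output_s3_path,
                                   match_purpose="IDENTIFIER_GENERATION"):
    """Assemble kwargs for update_matching_workflow (hypothetical helper)."""
    if match_purpose not in ("IDENTIFIER_GENERATION", "INDEXING"):
        raise ValueError(f"unsupported matchPurpose: {match_purpose}")
    return {
        "workflowName": workflow_name,
        "roleArn": role_arn,
        "inputSourceConfig": [
            {
                "inputSourceARN": glue_table_arn,
                "schemaName": schema_name,
                "applyNormalization": True,
            }
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                # Placeholder column; it must be an InputField name in the schema mapping.
                "output": [{"name": "email_address", "hashed": True}],
            }
        ],
        "resolutionTechniques": {
            "resolutionType": "RULE_MATCHING",
            "ruleBasedProperties": {
                "attributeMatchingModel": "ONE_TO_ONE",
                "matchPurpose": match_purpose,
                "rules": [{"ruleName": "email_rule", "matchingKeys": ["email"]}],
            },
        },
    }


# Placeholder values only.
matching_kwargs = build_matching_workflow_update(
    workflow_name="example-matching-workflow",
    role_arn="arn:aws:iam::111122223333:role/ExampleRole",
    glue_table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/customers",
    schema_name="customer_schema",
    output_s3_path="s3://example-output-bucket/matching/",
    match_purpose="INDEXING",
)

# import boto3
# client = boto3.client("entityresolution")
# response = client.update_matching_workflow(**matching_kwargs)
```

Validating matchPurpose locally is a convenience of this sketch; the service performs its own validation.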
dict
Response Syntax
{ 'description': 'string', 'incrementalRunConfig': { 'incrementalRunType': 'IMMEDIATE' }, 'inputSourceConfig': [ { 'applyNormalization': True|False, 'inputSourceARN': 'string', 'schemaName': 'string' }, ], 'outputSourceConfig': [ { 'KMSArn': 'string', 'applyNormalization': True|False, 'output': [ { 'hashed': True|False, 'name': 'string' }, ], 'outputS3Path': 'string' }, ], 'resolutionTechniques': { 'providerProperties': { 'intermediateSourceConfiguration': { 'intermediateS3Path': 'string' }, 'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None, 'providerServiceArn': 'string' }, 'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER', 'ruleBasedProperties': { 'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY', 'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING', 'rules': [ { 'matchingKeys': [ 'string', ], 'ruleName': 'string' }, ] } }, 'roleArn': 'string', 'workflowName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the workflow.
incrementalRunConfig (dict) --
An object which defines an incremental run type and has only incrementalRunType as a field.
incrementalRunType (string) --
The type of incremental run. It takes only one value: IMMEDIATE.
inputSourceConfig (list) --
A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
(dict) --
An object containing InputSourceARN, SchemaName, and ApplyNormalization.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
inputSourceARN (string) --
A Glue table Amazon Resource Name (ARN) for the input source table.
schemaName (string) --
The name of the schema to be retrieved.
outputSourceConfig (list) --
A list of OutputSource objects, each of which contains fields OutputS3Path, ApplyNormalization, and Output.
(dict) --
An object containing KMSArn, ApplyNormalization, Output, and OutputS3Path.
KMSArn (string) --
Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
applyNormalization (boolean) --
Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
output (list) --
A list of OutputAttribute objects, each of which has the fields Name and Hashed. Each object selects a column to include in the output table and specifies whether the values of that column should be hashed.
(dict) --
An object containing Name and Hashed.
hashed (boolean) --
Enables the ability to hash the column values in the output.
name (string) --
A name of a column to be written to the output. This must be an InputField name in the schema mapping.
outputS3Path (string) --
The S3 path to which Entity Resolution will write the output table.
resolutionTechniques (dict) --
An object which defines the resolutionType and the ruleBasedProperties.
providerProperties (dict) --
The properties of the provider service.
intermediateSourceConfiguration (dict) --
The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
intermediateS3Path (string) --
The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
providerConfiguration (:ref:`document<document>`) --
The required configuration fields to use with the provider service.
providerServiceArn (string) --
The ARN of the provider service.
resolutionType (string) --
The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --
An object which defines the list of matching rules to run and has a field Rules, which is a list of rule objects.
attributeMatchingModel (string) --
The comparison type. You can either choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A matches the value of the BusinessEmail field of Profile B, the two profiles are matched on the Email attribute type.
If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
matchPurpose (string) --
An indicator of whether to generate IDs and index the data or not.
If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
If you choose INDEXING, the process indexes the data without generating IDs.
rules (list) --
A list of Rule objects, each of which has the fields RuleName and MatchingKeys.
(dict) --
An object containing RuleName and MatchingKeys.
matchingKeys (list) --
A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
(string) --
ruleName (string) --
A name for the matching rule.
roleArn (string) --
The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
workflowName (string) --
The name of the workflow.
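The rule semantics described above (two records match under a rule only if all of its MatchingKeys match) can be sketched as a small local check. This is an illustration, not the service's matching engine: the function name first_matching_rule and the sample records are invented, and evaluating rules in list order and treating missing values as non-matching are assumptions of the sketch.

```python
def first_matching_rule(record_a, record_b, rules):
    """Return the name of the first rule whose matching keys all agree, else None."""
    for rule in rules:
        keys = rule["matchingKeys"]
        # Every key of the rule must be present and equal in both records.
        if keys and all(
            record_a.get(k) is not None and record_a.get(k) == record_b.get(k)
            for k in keys
        ):
            return rule["ruleName"]
    return None


rules = [
    {"ruleName": "name_and_phone", "matchingKeys": ["name", "phone"]},
    {"ruleName": "email_only", "matchingKeys": ["email"]},
]
record_a = {"name": "Pat Lee", "phone": "1234567890", "email": "pat@example.com"}
record_b = {"name": "Pat Lee", "phone": "5550000000", "email": "pat@example.com"}
record_c = {"name": "Sam Roe", "phone": "5551111111", "email": "sam@example.com"}

matched_ab = first_matching_rule(record_a, record_b, rules)
matched_ac = first_matching_rule(record_a, record_c, rules)
```

Records A and B share a name but not a phone number, so the first rule fails and they match only on the email rule; records A and C share nothing and match no rule.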
{'mappedInputFields': {'hashed': 'boolean'}}
Updates a schema mapping.
See also: AWS API Documentation
Request Syntax
client.update_schema_mapping( description='string', mappedInputFields=[ { 'fieldName': 'string', 'groupName': 'string', 'hashed': True|False, 'matchKey': 'string', 'subType': 'string', 'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID' }, ], schemaName='string' )
description (string) --
A description of the schema.
mappedInputFields (list) -- [REQUIRED]
A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table, and contains the column name plus additional information that Entity Resolution uses for matching.
(dict) --
An object containing FieldName, Type, GroupName, MatchKey, Hashed, and SubType.
fieldName (string) -- [REQUIRED]
A string containing the field name.
groupName (string) --
A string that instructs Entity Resolution to combine several columns into a unified column with the identical attribute type.
For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common groupName will prompt Entity Resolution to concatenate them into a single value.
hashed (boolean) --
Indicates if the column values are hashed in the schema input. If the value is set to TRUE, the column values are hashed. If the value is set to FALSE, the column values are cleartext.
matchKey (string) --
A key that allows grouping of multiple input attributes into a unified matching group.
For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning a matchKey called address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group.
If no matchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.
subType (string) --
The subtype of the attribute, selected from a list of values.
type (string) -- [REQUIRED]
The type of the attribute, selected from a list of values.
schemaName (string) -- [REQUIRED]
The name of the schema. There can't be multiple SchemaMappings with the same name.
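To make the mappedInputFields structure above more concrete, the sketch below builds a small update_schema_mapping request, including the new hashed flag. The helper name mapped_field, the schema name, and all column, group, and match key names are placeholders assumed for the example; the type values are taken from the request syntax. The call is commented out because it needs credentials and an existing schema mapping.

```python
def mapped_field(field_name, field_type, match_key=None, group_name=None,
                 sub_type=None, hashed=None):
    """Build one MappedInputField dict, omitting optional keys that are unset."""
    field = {"fieldName": field_name, "type": field_type}
    optional = {
        "matchKey": match_key,
        "groupName": group_name,
        "subType": sub_type,
        "hashed": hashed,
    }
    field.update({k: v for k, v in optional.items() if v is not None})
    return field


# Placeholder schema: names are illustrative only.
schema_update = {
    "schemaName": "customer_schema",
    "mappedInputFields": [
        mapped_field("customer_id", "UNIQUE_ID"),
        mapped_field("first_name", "NAME_FIRST", match_key="name", group_name="full_name"),
        mapped_field("last_name", "NAME_LAST", match_key="name", group_name="full_name"),
        # hashed=True marks a column whose input values are already hashed.
        mapped_field("email_hashed", "EMAIL_ADDRESS", match_key="email", hashed=True),
    ],
}

# import boto3
# client = boto3.client("entityresolution")
# response = client.update_schema_mapping(**schema_update)
```

The customer_id column has no matchKey, so per the description above it would not be used for matching but would still appear in the output table.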
dict
Response Syntax
{ 'description': 'string', 'mappedInputFields': [ { 'fieldName': 'string', 'groupName': 'string', 'hashed': True|False, 'matchKey': 'string', 'subType': 'string', 'type': 'NAME'|'NAME_FIRST'|'NAME_MIDDLE'|'NAME_LAST'|'ADDRESS'|'ADDRESS_STREET1'|'ADDRESS_STREET2'|'ADDRESS_STREET3'|'ADDRESS_CITY'|'ADDRESS_STATE'|'ADDRESS_COUNTRY'|'ADDRESS_POSTALCODE'|'PHONE'|'PHONE_NUMBER'|'PHONE_COUNTRYCODE'|'EMAIL_ADDRESS'|'UNIQUE_ID'|'DATE'|'STRING'|'PROVIDER_ID' }, ], 'schemaArn': 'string', 'schemaName': 'string' }
Response Structure
(dict) --
description (string) --
A description of the schema.
mappedInputFields (list) --
A list of MappedInputFields. Each MappedInputField corresponds to a column in the source data table, and contains the column name plus additional information that Entity Resolution uses for matching.
(dict) --
An object containing FieldName, Type, GroupName, MatchKey, Hashed, and SubType.
fieldName (string) --
A string containing the field name.
groupName (string) --
A string that instructs Entity Resolution to combine several columns into a unified column with the identical attribute type.
For example, when working with columns such as first_name, middle_name, and last_name, assigning them a common groupName will prompt Entity Resolution to concatenate them into a single value.
hashed (boolean) --
Indicates if the column values are hashed in the schema input. If the value is set to TRUE, the column values are hashed. If the value is set to FALSE, the column values are cleartext.
matchKey (string) --
A key that allows grouping of multiple input attributes into a unified matching group.
For example, consider a scenario where the source table contains various addresses, such as business_address and shipping_address. By assigning a matchKey called address to both attributes, Entity Resolution will match records across these fields to create a consolidated matching group.
If no matchKey is specified for a column, it won't be utilized for matching purposes but will still be included in the output table.
subType (string) --
The subtype of the attribute, selected from a list of values.
type (string) --
The type of the attribute, selected from a list of values.
schemaArn (string) --
The ARN (Amazon Resource Name) that Entity Resolution generated for the SchemaMapping.
schemaName (string) --
The name of the schema.
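The groupName behavior described above (columns that share a groupName are concatenated into a single value) can be pictured with the following toy function. It only illustrates the idea: the function name combine_grouped_columns, the space separator, and the skipping of empty values are assumptions, and the real concatenation rules used by Entity Resolution are not specified here.

```python
def combine_grouped_columns(row, mapped_input_fields, separator=" "):
    """Concatenate the values of columns that share a groupName (toy version)."""
    grouped = {}
    for field in mapped_input_fields:
        group = field.get("groupName")
        value = row.get(field["fieldName"])
        if group is None or value in (None, ""):
            continue  # ungrouped column or nothing to concatenate
        grouped.setdefault(group, []).append(str(value))
    return {group: separator.join(values) for group, values in grouped.items()}


name_fields = [
    {"fieldName": "first_name", "type": "NAME_FIRST", "groupName": "full_name"},
    {"fieldName": "middle_name", "type": "NAME_MIDDLE", "groupName": "full_name"},
    {"fieldName": "last_name", "type": "NAME_LAST", "groupName": "full_name"},
    {"fieldName": "customer_id", "type": "UNIQUE_ID"},
]
row = {"first_name": "Pat", "middle_name": "", "last_name": "Lee", "customer_id": "42"}

combined = combine_grouped_columns(row, name_fields)
```

In this example the three name columns collapse into one full_name value, while customer_id, which has no groupName, is left out of the grouping.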