AWS API Changes

2025/07/31 - AWS EntityResolution - 3 updated api methods

Changes: Add support for creating advanced rule-based matching workflows in AWS Entity Resolution.

CreateMatchingWorkflow (updated)

Changes (both)

{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string',
                                                                 'ruleName': 'string'}]}}}

Creates a matching workflow that defines the configuration for a data processing job. The workflow name must be unique. To modify an existing workflow, use UpdateMatchingWorkflow.

See also: AWS API Documentation

Request Syntax

client.create_matching_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    outputSourceConfig=[
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    roleArn='string',
    tags={
        'string': 'string'
    }
)

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow. There can't be multiple MatchingWorkflows with the same name.

type description:

string

param description:

A description of the workflow.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

(dict) --

An object containing inputSourceARN, schemaName, and applyNormalization.
- inputSourceARN (string) -- [REQUIRED]
  
  A Glue table Amazon Resource Name (ARN) for the input source table.
- schemaName (string) -- [REQUIRED]
  
  The name of the schema to be retrieved.
- applyNormalization (boolean) --
  
  Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
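
The PHONE_NUMBER example above can be reproduced locally with a minimal sketch. The function name format_phone_like_example is hypothetical, and the code covers only the single 10-digit case given in the text. It is not Entity Resolution's normalization logic, which is not specified here.

```python
import re


def format_phone_like_example(raw: str) -> str:
    """Format a 10-digit string the way the documentation example shows.

    Hypothetical helper: it mirrors only the 1234567890 -> (123)-456-7890
    example and makes no claim about how the service treats other inputs.
    """
    if not re.fullmatch(r"[0-9]{10}", raw):
        raise ValueError("expected exactly 10 ASCII digits")
    return f"({raw[:3]})-{raw[3:6]}-{raw[6:]}"


formatted = format_phone_like_example("1234567890")
print(formatted)  # (123)-456-7890
```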

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.

(dict) --

An object containing outputS3Path, KMSArn, output, and applyNormalization.
- outputS3Path (string) -- [REQUIRED]
  
  The S3 path to which Entity Resolution will write the output table.
- KMSArn (string) --
  
  The customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
- output (list) -- [REQUIRED]
  
  A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
  - (dict) --
    
    An OutputAttribute object with the fields name and hashed. It selects a column to include in the output table and specifies whether the column's values should be hashed.
    - name (string) -- [REQUIRED]
      
      A name of a column to be written to the output. This must be an InputField name in the schema mapping.
    - hashed (boolean) --
      
      Enables the ability to hash the column values in the output.
- applyNormalization (boolean) --
  
  Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object that defines the resolutionType and the properties for that type: ruleBasedProperties, ruleConditionProperties, or providerProperties.

resolutionType (string) -- [REQUIRED]

The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --

An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
- rules (list) -- [REQUIRED]
  
  A list of Rule objects, each of which have fields RuleName and MatchingKeys.
  - (dict) --
    
    An object containing the ruleName and matchingKeys.
    - ruleName (string) -- [REQUIRED]
      
      A name for the matching rule.
    - matchingKeys (list) -- [REQUIRED]
      
      A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
      - (string) --
- attributeMatchingModel (string) -- [REQUIRED]
  
  The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
  
  If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
  
  If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
- matchPurpose (string) --
  
  An indicator of whether to generate IDs and index the data or not.
  
  If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
  
  If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) --

An object containing the rules for a matching workflow.
- rules (list) -- [REQUIRED]
  
  A list of rule objects, each of which have fields ruleName and condition.
  - (dict) --
    
    An object that defines the ruleCondition and the ruleName to use in a matching workflow.
    - ruleName (string) -- [REQUIRED]
      
      A name for the matching rule.
      
      For example: Rule1
    - condition (string) -- [REQUIRED]
      
      A statement that specifies the conditions for a matching rule.
      
      If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
      
      If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
      
      Use operators to combine (AND), separate (OR), or group (...) matching functions.
      
      For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) --

The properties of the provider service.
- providerServiceArn (string) -- [REQUIRED]
  
  The ARN of the provider service.
- providerConfiguration (document) --
  
  The required configuration fields to use with the provider service.
- intermediateSourceConfiguration (dict) --
  
  The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
  - intermediateS3Path (string) -- [REQUIRED]
    
    The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
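
As a rough sketch of how the new ruleConditionProperties field might be populated, the helpers below compose a condition string and assemble the keyword arguments for create_matching_workflow. The helper names (all_of, any_of, build_create_request), the ARNs, the bucket path, and the column names are placeholders invented for illustration. The sketch also assumes that resolutionType stays RULE_MATCHING when condition-based rules are used, which the text above does not state explicitly.

```python
def all_of(*terms: str) -> str:
    """Combine matching-function calls with AND and group them in parentheses."""
    return "(" + " AND ".join(terms) + ")"


def any_of(*terms: str) -> str:
    """Separate alternatives with OR."""
    return " OR ".join(terms)


def build_create_request(workflow_name, input_table_arn, schema_name,
                         output_s3_path, output_columns, rules, role_arn):
    """Assemble kwargs for create_matching_workflow (hypothetical helper).

    `rules` is a list of (ruleName, condition) pairs.
    """
    return {
        "workflowName": workflow_name,
        "inputSourceConfig": [
            {"inputSourceARN": input_table_arn, "schemaName": schema_name},
        ],
        "outputSourceConfig": [
            {
                "outputS3Path": output_s3_path,
                "output": [{"name": column} for column in output_columns],
            },
        ],
        "resolutionTechniques": {
            # Assumption: condition-based rules still use RULE_MATCHING.
            "resolutionType": "RULE_MATCHING",
            "ruleConditionProperties": {
                "rules": [
                    {"ruleName": name, "condition": condition}
                    for name, condition in rules
                ],
            },
        },
        "roleArn": role_arn,
    }


# Rebuilds the example condition from the parameter description.
condition = any_of(
    all_of("Cosine(a, 10)", "Exact(b, true)"),
    "ExactManyToMany(c, d)",
)

request = build_create_request(
    workflow_name="example-workflow",  # placeholder values throughout
    input_table_arn="arn:aws:glue:us-east-1:111122223333:table/example_db/example_table",
    schema_name="example-schema",
    output_s3_path="s3://DOC-EXAMPLE-BUCKET/output/",
    output_columns=["a", "b"],
    rules=[("Rule1", condition)],
    role_arn="arn:aws:iam::111122223333:role/ExampleEntityResolutionRole",
)
print(condition)
```

With credentials configured, the request could then be sent with boto3.client("entityresolution").create_matching_workflow(**request).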

type incrementalRunConfig:

dict

param incrementalRunConfig:

Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as "Automatic" in the console.

incrementalRunType (string) --

The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.

Warning

For workflows where resolutionType is ML_MATCHING, incremental processing is not supported.
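
The constraint in the warning above can be checked on the client side before a request is sent. This is a minimal sketch: check_incremental_run is a hypothetical helper, and the service performs its own validation, whose error behavior is not described here.

```python
from typing import Optional


def check_incremental_run(resolution_type: str,
                          incremental_run_config: Optional[dict]) -> None:
    """Raise ValueError for combinations the documentation rules out."""
    if incremental_run_config is None:
        return  # the parameter is optional
    run_type = incremental_run_config.get("incrementalRunType")
    if run_type is not None and run_type != "IMMEDIATE":
        raise ValueError("the only valid incrementalRunType is IMMEDIATE")
    if resolution_type == "ML_MATCHING":
        raise ValueError("incremental processing is not supported for ML_MATCHING")


# Passes silently: rule-based workflow with the only valid run type.
check_incremental_run("RULE_MATCHING", {"incrementalRunType": "IMMEDIATE"})
```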

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.

type tags:

dict

param tags:

The tags used to organize, track, or control access for this resource.

(string) --
- (string) --

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'roleArn': 'string'
}

Response Structure

(dict) --
- workflowName (string) --
  
  The name of the workflow.
- workflowArn (string) --
  
  The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
- description (string) --
  
  A description of the workflow.
- inputSourceConfig (list) --
  
  A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
  - (dict) --
    
    An object containing inputSourceARN, schemaName, and applyNormalization.
    - inputSourceARN (string) --
      
      A Glue table Amazon Resource Name (ARN) for the input source table.
    - schemaName (string) --
      
      The name of the schema to be retrieved.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- outputSourceConfig (list) --
  
  A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
  - (dict) --
    
    An object containing outputS3Path, KMSArn, output, and applyNormalization.
    - outputS3Path (string) --
      
      The S3 path to which Entity Resolution will write the output table.
    - KMSArn (string) --
      
      The customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
    - output (list) --
      
      A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
      - (dict) --
        
        An OutputAttribute object with the fields name and hashed. It selects a column to include in the output table and specifies whether the column's values should be hashed.
        
        name (string) --
        
        A name of a column to be written to the output. This must be an InputField name in the schema mapping.
        
        hashed (boolean) --
        
        Enables the ability to hash the column values in the output.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- resolutionTechniques (dict) --
  
  An object that defines the resolutionType and the properties for that type: ruleBasedProperties, ruleConditionProperties, or providerProperties.
  - resolutionType (string) --
    
    The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
  - ruleBasedProperties (dict) --
    
    An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
    - rules (list) --
      
      A list of Rule objects, each of which have fields RuleName and MatchingKeys.
      - (dict) --
        
        An object containing the ruleName and matchingKeys.
        
        ruleName (string) --
        
        A name for the matching rule.
        
        matchingKeys (list) --
        
        A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
        
        (string) --
    - attributeMatchingModel (string) --
      
      The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
      
      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
      
      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
    - matchPurpose (string) --
      
      An indicator of whether to generate IDs and index the data or not.
      
      If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
      
      If you choose INDEXING, the process indexes the data without generating IDs.
  - ruleConditionProperties (dict) --
    
    An object containing the rules for a matching workflow.
    - rules (list) --
      
      A list of rule objects, each of which have fields ruleName and condition.
      - (dict) --
        
        An object that defines the ruleCondition and the ruleName to use in a matching workflow.
        
        ruleName (string) --
        
        A name for the matching rule.
        
        For example: Rule1
        
        condition (string) --
        
        A statement that specifies the conditions for a matching rule.
        
        If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
        
        If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
        
        Use operators to combine (AND), separate (OR), or group (...) matching functions.
        
        For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
  - providerProperties (dict) --
    
    The properties of the provider service.
    - providerServiceArn (string) --
      
      The ARN of the provider service.
    - providerConfiguration (document) --
      
      The required configuration fields to use with the provider service.
    - intermediateSourceConfiguration (dict) --
      
      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
      - intermediateS3Path (string) --
        
        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
- incrementalRunConfig (dict) --
  
  An object which defines an incremental run type and has only incrementalRunType as a field.
  - incrementalRunType (string) --
    
    The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
    
    Warning
    
    For workflows where resolutionType is ML_MATCHING, incremental processing is not supported.
- roleArn (string) --
  
  The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
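
The difference between the two attributeMatchingModel values described above can be illustrated with a toy comparison. This is only a sketch of the Email and BusinessEmail example from the text: the function profiles_match and the profile dictionaries are hypothetical, and the code does not reproduce the service's matching engine.

```python
def profiles_match(profile_a: dict, profile_b: dict,
                   sub_types: list, model: str) -> bool:
    """Compare two profiles on one attribute type.

    `sub_types` lists the fields that belong to the attribute type,
    for example ["Email", "BusinessEmail"].
    """
    if model == "ONE_TO_ONE":
        # Only the same sub-type field on both sides can match.
        return any(
            field in profile_a and field in profile_b
            and profile_a[field] == profile_b[field]
            for field in sub_types
        )
    if model == "MANY_TO_MANY":
        # Any sub-type value on one side may match any on the other.
        values_a = {profile_a[f] for f in sub_types if f in profile_a}
        values_b = {profile_b[f] for f in sub_types if f in profile_b}
        return bool(values_a & values_b)
    raise ValueError(f"unknown attributeMatchingModel: {model}")


profile_a = {"Email": "pat@example.com"}
profile_b = {"BusinessEmail": "pat@example.com"}
email_fields = ["Email", "BusinessEmail"]

one_to_one = profiles_match(profile_a, profile_b, email_fields, "ONE_TO_ONE")
many_to_many = profiles_match(profile_a, profile_b, email_fields, "MANY_TO_MANY")
print(one_to_one, many_to_many)  # False True
```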

GetMatchingWorkflow (updated)

Changes (response)

{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string',
                                                                 'ruleName': 'string'}]}}}

Returns the MatchingWorkflow with a given name, if it exists.

See also: AWS API Documentation

Request Syntax

client.get_matching_workflow(
    workflowName='string'
)

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow.

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'workflowArn': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'createdAt': datetime(2015, 1, 1),
    'updatedAt': datetime(2015, 1, 1),
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'roleArn': 'string',
    'tags': {
        'string': 'string'
    }
}

Response Structure

(dict) --
- workflowName (string) --
  
  The name of the workflow.
- workflowArn (string) --
  
  The ARN (Amazon Resource Name) that Entity Resolution generated for the MatchingWorkflow.
- description (string) --
  
  A description of the workflow.
- inputSourceConfig (list) --
  
  A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
  - (dict) --
    
    An object containing inputSourceARN, schemaName, and applyNormalization.
    - inputSourceARN (string) --
      
      A Glue table Amazon Resource Name (ARN) for the input source table.
    - schemaName (string) --
      
      The name of the schema to be retrieved.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- outputSourceConfig (list) --
  
  A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
  - (dict) --
    
    An object containing outputS3Path, KMSArn, output, and applyNormalization.
    - outputS3Path (string) --
      
      The S3 path to which Entity Resolution will write the output table.
    - KMSArn (string) --
      
      The customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
    - output (list) --
      
      A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
      - (dict) --
        
        An OutputAttribute object with the fields name and hashed. It selects a column to include in the output table and specifies whether the column's values should be hashed.
        
        name (string) --
        
        A name of a column to be written to the output. This must be an InputField name in the schema mapping.
        
        hashed (boolean) --
        
        Enables the ability to hash the column values in the output.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- resolutionTechniques (dict) --
  
  An object that defines the resolutionType and the properties for that type: ruleBasedProperties, ruleConditionProperties, or providerProperties.
  - resolutionType (string) --
    
    The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
  - ruleBasedProperties (dict) --
    
    An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
    - rules (list) --
      
      A list of Rule objects, each of which have fields RuleName and MatchingKeys.
      - (dict) --
        
        An object containing the ruleName and matchingKeys.
        
        ruleName (string) --
        
        A name for the matching rule.
        
        matchingKeys (list) --
        
        A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
        
        (string) --
    - attributeMatchingModel (string) --
      
      The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
      
      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
      
      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
    - matchPurpose (string) --
      
      An indicator of whether to generate IDs and index the data or not.
      
      If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
      
      If you choose INDEXING, the process indexes the data without generating IDs.
  - ruleConditionProperties (dict) --
    
    An object containing the rules for a matching workflow.
    - rules (list) --
      
      A list of rule objects, each of which have fields ruleName and condition.
      - (dict) --
        
        An object that defines the ruleCondition and the ruleName to use in a matching workflow.
        
        ruleName (string) --
        
        A name for the matching rule.
        
        For example: Rule1
        
        condition (string) --
        
        A statement that specifies the conditions for a matching rule.
        
        If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
        
        If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
        
        Use operators to combine (AND), separate (OR), or group (...) matching functions.
        
        For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
  - providerProperties (dict) --
    
    The properties of the provider service.
    - providerServiceArn (string) --
      
      The ARN of the provider service.
    - providerConfiguration (document) --
      
      The required configuration fields to use with the provider service.
    - intermediateSourceConfiguration (dict) --
      
      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
      - intermediateS3Path (string) --
        
        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
- createdAt (datetime) --
  
  The timestamp of when the workflow was created.
- updatedAt (datetime) --
  
  The timestamp of when the workflow was last updated.
- incrementalRunConfig (dict) --
  
  An object which defines an incremental run type and has only incrementalRunType as a field.
  - incrementalRunType (string) --
    
    The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
    
    Warning
    
    For workflows where resolutionType is ML_MATCHING, incremental processing is not supported.
- roleArn (string) --
  
  The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to access Amazon Web Services resources on your behalf.
- tags (dict) --
  
  The tags used to organize, track, or control access for this resource.
  - (string) --
    - (string) --
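
A response shaped like the structure above can carry rules under ruleBasedProperties, ruleConditionProperties, or neither. The sketch below lists whatever rules are present. The function summarize_rules and the sample response are hypothetical, and the sample is hand-written rather than captured from the service.

```python
def summarize_rules(workflow: dict) -> list:
    """Return (ruleName, kind, detail) tuples for every rule in a response."""
    techniques = workflow.get("resolutionTechniques", {})
    summary = []
    for rule in techniques.get("ruleBasedProperties", {}).get("rules", []):
        summary.append(
            (rule["ruleName"], "matchingKeys", ", ".join(rule["matchingKeys"]))
        )
    for rule in techniques.get("ruleConditionProperties", {}).get("rules", []):
        summary.append((rule["ruleName"], "condition", rule["condition"]))
    return summary


# Hand-written stand-in for a get_matching_workflow response.
sample_response = {
    "workflowName": "example-workflow",
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleConditionProperties": {
            "rules": [
                {"ruleName": "Rule1", "condition": "Exact(b, true)"},
            ],
        },
    },
}

rule_summary = summarize_rules(sample_response)
print(rule_summary)  # [('Rule1', 'condition', 'Exact(b, true)')]
```

Against a live account, the same function could be applied to the dictionary returned by boto3.client("entityresolution").get_matching_workflow(workflowName=...).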

UpdateMatchingWorkflow (updated)

Changes (both)

{'resolutionTechniques': {'ruleConditionProperties': {'rules': [{'condition': 'string',
                                                                 'ruleName': 'string'}]}}}

Updates an existing matching workflow. The workflow must already exist for this operation to succeed.

See also: AWS API Documentation

Request Syntax

client.update_matching_workflow(
    workflowName='string',
    description='string',
    inputSourceConfig=[
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    outputSourceConfig=[
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    resolutionTechniques={
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    incrementalRunConfig={
        'incrementalRunType': 'IMMEDIATE'
    },
    roleArn='string'
)
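
The update request accepts a subset of the fields that GetMatchingWorkflow returns, so one possible pattern is to start from a fetched workflow, drop the response-only fields, and swap in new condition rules. This is a sketch under assumptions: to_update_request is a hypothetical helper, the sample workflow is hand-written, and whether ruleConditionProperties can coexist with ruleBasedProperties is not stated in this document.

```python
import copy
from datetime import datetime

# Fields accepted by update_matching_workflow, per the request syntax above.
UPDATE_REQUEST_KEYS = (
    "workflowName", "description", "inputSourceConfig", "outputSourceConfig",
    "resolutionTechniques", "incrementalRunConfig", "roleArn",
)


def to_update_request(workflow: dict, new_condition_rules: list) -> dict:
    """Build update kwargs from a fetched workflow without mutating it."""
    request = {
        key: copy.deepcopy(workflow[key])
        for key in UPDATE_REQUEST_KEYS
        if key in workflow
    }
    techniques = request.setdefault("resolutionTechniques", {})
    techniques["ruleConditionProperties"] = {
        "rules": copy.deepcopy(new_condition_rules),
    }
    return request


# Hand-written stand-in for a get_matching_workflow response.
current = {
    "workflowName": "example-workflow",
    "workflowArn": "arn:aws:entityresolution:us-east-1:111122223333:matchingworkflow/example-workflow",
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleConditionProperties": {
            "rules": [{"ruleName": "Rule1", "condition": "Exact(a, true)"}],
        },
    },
    "createdAt": datetime(2015, 1, 1),
    "updatedAt": datetime(2015, 1, 1),
    "roleArn": "arn:aws:iam::111122223333:role/ExampleEntityResolutionRole",
    "tags": {"team": "example"},
}

update_request = to_update_request(
    current, [{"ruleName": "Rule1", "condition": "Exact(b, true)"}]
)
print(sorted(update_request))
```

A real update would also need the required inputSourceConfig and outputSourceConfig, which the sample omits for brevity. The result could then be passed as boto3.client("entityresolution").update_matching_workflow(**update_request).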

type workflowName:

string

param workflowName:

[REQUIRED]

The name of the workflow to be updated.

type description:

string

param description:

A description of the workflow.

type inputSourceConfig:

list

param inputSourceConfig:

[REQUIRED]

A list of InputSource objects, which have the fields InputSourceARN and SchemaName.

(dict) --

An object containing inputSourceARN, schemaName, and applyNormalization.
- inputSourceARN (string) -- [REQUIRED]
  
  A Glue table Amazon Resource Name (ARN) for the input source table.
- schemaName (string) -- [REQUIRED]
  
  The name of the schema to be retrieved.
- applyNormalization (boolean) --
  
  Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

type outputSourceConfig:

list

param outputSourceConfig:

[REQUIRED]

A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.

(dict) --

An object containing outputS3Path, KMSArn, output, and applyNormalization.
- outputS3Path (string) -- [REQUIRED]
  
  The S3 path to which Entity Resolution will write the output table.
- KMSArn (string) --
  
  The customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
- output (list) -- [REQUIRED]
  
  A list of OutputAttribute objects, each of which have the fields Name and Hashed. Each of these objects selects a column to be included in the output table, and whether the values of the column should be hashed.
  - (dict) --
    
    An OutputAttribute object with the fields name and hashed. It selects a column to include in the output table and specifies whether the column's values should be hashed.
    - name (string) -- [REQUIRED]
      
      A name of a column to be written to the output. This must be an InputField name in the schema mapping.
    - hashed (boolean) --
      
      Enables the ability to hash the column values in the output.
- applyNormalization (boolean) --
  
  Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.

type resolutionTechniques:

dict

param resolutionTechniques:

[REQUIRED]

An object that defines the resolutionType and the properties for that type of matching: ruleBasedProperties, ruleConditionProperties, or providerProperties.

resolutionType (string) -- [REQUIRED]

The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
ruleBasedProperties (dict) --

An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
- rules (list) -- [REQUIRED]
  
  A list of Rule objects, each of which has the fields ruleName and matchingKeys.
  - (dict) --
    
    An object containing the ruleName and matchingKeys.
    - ruleName (string) -- [REQUIRED]
      
      A name for the matching rule.
    - matchingKeys (list) -- [REQUIRED]
      
      A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
      - (string) --
- attributeMatchingModel (string) -- [REQUIRED]
  
  The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
  
  If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
  
  If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
- matchPurpose (string) --
  
  An indicator of whether to generate IDs and index the data or not.
  
  If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
  
  If you choose INDEXING, the process indexes the data without generating IDs.
ruleConditionProperties (dict) --

An object containing the rules for a matching workflow.
- rules (list) -- [REQUIRED]
  
  A list of rule objects, each of which has the fields ruleName and condition.
  - (dict) --
    
    An object that defines the ruleName and the condition to use in a matching workflow.
    - ruleName (string) -- [REQUIRED]
      
      A name for the matching rule.
      
      For example: Rule1
    - condition (string) -- [REQUIRED]
      
      A statement that specifies the conditions for a matching rule.
      
      If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
      
      If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
      
      Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
      
      For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
providerProperties (dict) --

The properties of the provider service.
- providerServiceArn (string) -- [REQUIRED]
  
  The ARN of the provider service.
- providerConfiguration (:ref:`document<document>`) --
  
  The required configuration fields to use with the provider service.
- intermediateSourceConfiguration (dict) --
  
  The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
  - intermediateS3Path (string) -- [REQUIRED]
    
    The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
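
The condition string for ruleConditionProperties can be composed programmatically. The following is a minimal sketch: the helpers fn, all_of, and any_of are hypothetical, and the function arguments (a, 10, b, true, c, d) simply reproduce the documented example without interpreting them. It also assumes RULE_MATCHING is the resolutionType used with ruleConditionProperties, which this reference does not state explicitly.

```python
def fn(name, *args):
    """Render a matching-function call such as Exact(b, true)."""
    return f"{name}({', '.join(str(a) for a in args)})"


def all_of(*parts):
    """Combine parts with AND and group them in parentheses."""
    return "(" + " AND ".join(parts) + ")"


def any_of(*parts):
    """Separate parts with OR."""
    return " OR ".join(parts)


# Rebuilds the example condition from the parameter description.
condition = any_of(
    all_of(fn("Cosine", "a", 10), fn("Exact", "b", "true")),
    fn("ExactManyToMany", "c", "d"),
)

resolution_techniques = {
    "resolutionType": "RULE_MATCHING",  # assumption, see the note above
    "ruleConditionProperties": {
        "rules": [{"ruleName": "Rule1", "condition": condition}],
    },
}
```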

type incrementalRunConfig:

dict

param incrementalRunConfig:

Optional. An object that defines the incremental run type. This object contains only the incrementalRunType field, which appears as "Automatic" in the console.

incrementalRunType (string) --

The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.

Warning

For workflows where resolutionType is ML_MATCHING, incremental processing is not supported.
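
The warning above can be enforced on the client side before a request is sent. This is only a sketch: the helper name incremental_run_config and its incremental flag are hypothetical, and the service performs its own validation regardless.

```python
def incremental_run_config(resolution_type, incremental=True):
    """Return an incrementalRunConfig dict, or None when it should be omitted."""
    if not incremental:
        return None
    if resolution_type == "ML_MATCHING":
        # Incremental processing is documented as unsupported for ML_MATCHING.
        raise ValueError("incremental processing is not supported for ML_MATCHING")
    # IMMEDIATE is the only valid value ("Automatic" in the console).
    return {"incrementalRunType": "IMMEDIATE"}
```

Returning None for a non-incremental workflow lets the caller leave the optional incrementalRunConfig parameter out of the request entirely.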

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
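
Putting the parameters together, a request might be assembled and sent as sketched below. The helper names build_create_request and create_workflow are hypothetical, and every ARN, bucket, schema, and column name is a placeholder. Sending the request requires boto3, AWS credentials, and resources that actually exist.

```python
def build_create_request(workflow_name, role_arn, input_source_config,
                         output_source_config, resolution_techniques,
                         description=None, incremental_run_config=None):
    """Assemble keyword arguments for create_matching_workflow."""
    request = {
        "workflowName": workflow_name,
        "inputSourceConfig": input_source_config,
        "outputSourceConfig": output_source_config,
        "resolutionTechniques": resolution_techniques,
        "roleArn": role_arn,
    }
    # Optional parameters are omitted rather than sent as None.
    if description is not None:
        request["description"] = description
    if incremental_run_config is not None:
        request["incrementalRunConfig"] = incremental_run_config
    return request


def create_workflow(request, client=None):
    """Send the request; boto3 is imported lazily so the builder works offline."""
    if client is None:
        import boto3  # needs AWS credentials and a configured region
        client = boto3.client("entityresolution")
    return client.create_matching_workflow(**request)


# All values below are placeholders for illustration.
request = build_create_request(
    workflow_name="example-workflow",
    role_arn="arn:aws:iam::123456789012:role/ExampleEntityResolutionRole",
    input_source_config=[{
        "inputSourceARN": "arn:aws:glue:us-east-1:123456789012:table/example_db/example_table",
        "schemaName": "example-schema",
    }],
    output_source_config=[{
        "outputS3Path": "s3://example-bucket/output/",
        "output": [{"name": "email", "hashed": True}],
    }],
    resolution_techniques={
        "resolutionType": "RULE_MATCHING",
        "ruleBasedProperties": {
            "rules": [{"ruleName": "Rule1", "matchingKeys": ["email"]}],
            "attributeMatchingModel": "ONE_TO_ONE",
        },
    },
)
```

Calling create_workflow(request) would then issue the API call. Passing a client explicitly makes the function easy to exercise with a stub.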

rtype:

dict

returns:

Response Syntax

{
    'workflowName': 'string',
    'description': 'string',
    'inputSourceConfig': [
        {
            'inputSourceARN': 'string',
            'schemaName': 'string',
            'applyNormalization': True|False
        },
    ],
    'outputSourceConfig': [
        {
            'outputS3Path': 'string',
            'KMSArn': 'string',
            'output': [
                {
                    'name': 'string',
                    'hashed': True|False
                },
            ],
            'applyNormalization': True|False
        },
    ],
    'resolutionTechniques': {
        'resolutionType': 'RULE_MATCHING'|'ML_MATCHING'|'PROVIDER',
        'ruleBasedProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'matchingKeys': [
                        'string',
                    ]
                },
            ],
            'attributeMatchingModel': 'ONE_TO_ONE'|'MANY_TO_MANY',
            'matchPurpose': 'IDENTIFIER_GENERATION'|'INDEXING'
        },
        'ruleConditionProperties': {
            'rules': [
                {
                    'ruleName': 'string',
                    'condition': 'string'
                },
            ]
        },
        'providerProperties': {
            'providerServiceArn': 'string',
            'providerConfiguration': {...}|[...]|123|123.4|'string'|True|None,
            'intermediateSourceConfiguration': {
                'intermediateS3Path': 'string'
            }
        }
    },
    'incrementalRunConfig': {
        'incrementalRunType': 'IMMEDIATE'
    },
    'roleArn': 'string'
}

Response Structure

(dict) --
- workflowName (string) --
  
  The name of the workflow.
- description (string) --
  
  A description of the workflow.
- inputSourceConfig (list) --
  
  A list of InputSource objects, which have the fields InputSourceARN and SchemaName.
  - (dict) --
    
    An object containing inputSourceARN, schemaName, and applyNormalization.
    - inputSourceARN (string) --
      
      A Glue table Amazon Resource Name (ARN) for the input source table.
    - schemaName (string) --
      
      The name of the schema to be retrieved.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- outputSourceConfig (list) --
  
  A list of OutputSource objects, each of which contains fields outputS3Path, applyNormalization, KMSArn, and output.
  - (dict) --
    
    An object containing outputS3Path, applyNormalization, KMSArn, and output.
    - outputS3Path (string) --
      
      The S3 path to which Entity Resolution will write the output table.
    - KMSArn (string) --
      
      Customer KMS ARN for encryption at rest. If not provided, the system uses an Entity Resolution managed KMS key.
    - output (list) --
      
      A list of OutputAttribute objects, each of which has the fields name and hashed. Each object selects a column to include in the output table and specifies whether the column's values should be hashed.
      - (dict) --
        
        An object containing name and hashed. It selects a column to include in the output table and specifies whether the column's values should be hashed.
        - name (string) --
          
          A name of a column to be written to the output. This must be an InputField name in the schema mapping.
        - hashed (boolean) --
          
          Enables the ability to hash the column values in the output.
    - applyNormalization (boolean) --
      
      Normalizes the attributes defined in the schema in the input data. For example, if an attribute has an AttributeType of PHONE_NUMBER, and the data in the input table is in a format of 1234567890, Entity Resolution will normalize this field in the output to (123)-456-7890.
- resolutionTechniques (dict) --
  
  An object that defines the resolutionType and the properties for that type of matching: ruleBasedProperties, ruleConditionProperties, or providerProperties.
  - resolutionType (string) --
    
    The type of matching. There are three types of matching: RULE_MATCHING, ML_MATCHING, and PROVIDER.
  - ruleBasedProperties (dict) --
    
    An object which defines the list of matching rules to run and has a field rules, which is a list of rule objects.
    - rules (list) --
      
      A list of Rule objects, each of which has the fields ruleName and matchingKeys.
      - (dict) --
        
        An object containing the ruleName and matchingKeys.
        
        - ruleName (string) --
          
          A name for the matching rule.
        - matchingKeys (list) --
          
          A list of MatchingKeys. The MatchingKeys must have been defined in the SchemaMapping. Two records are considered to match according to this rule if all of the MatchingKeys match.
          - (string) --
    - attributeMatchingModel (string) --
      
      The comparison type. You can choose ONE_TO_ONE or MANY_TO_MANY as the attributeMatchingModel.
      
      If you choose ONE_TO_ONE, the system can only match attributes if the sub-types are an exact match. For example, for the Email attribute type, the system will only consider it a match if the value of the Email field of Profile A matches the value of the Email field of Profile B.
      
      If you choose MANY_TO_MANY, the system can match attributes across the sub-types of an attribute type. For example, if the value of the Email field of Profile A and the value of the BusinessEmail field of Profile B match, the two profiles are matched on the Email attribute type.
    - matchPurpose (string) --
      
      An indicator of whether to generate IDs and index the data or not.
      
      If you choose IDENTIFIER_GENERATION, the process generates IDs and indexes the data.
      
      If you choose INDEXING, the process indexes the data without generating IDs.
  - ruleConditionProperties (dict) --
    
    An object containing the rules for a matching workflow.
    - rules (list) --
      
      A list of rule objects, each of which has the fields ruleName and condition.
      - (dict) --
        
        An object that defines the ruleName and the condition to use in a matching workflow.
        - ruleName (string) --
          
          A name for the matching rule.
          
          For example: Rule1
        - condition (string) --
          
          A statement that specifies the conditions for a matching rule.
          
          If your data is accurate, use an Exact matching function: Exact or ExactManyToMany.
          
          If your data has variations in spelling or pronunciation, use a Fuzzy matching function: Cosine, Levenshtein, or Soundex.
          
          Use operators if you want to combine (AND), separate (OR), or group matching functions (...).
          
          For example: (Cosine(a, 10) AND Exact(b, true)) OR ExactManyToMany(c, d)
  - providerProperties (dict) --
    
    The properties of the provider service.
    - providerServiceArn (string) --
      
      The ARN of the provider service.
    - providerConfiguration (:ref:`document<document>`) --
      
      The required configuration fields to use with the provider service.
    - intermediateSourceConfiguration (dict) --
      
      The Amazon S3 location that temporarily stores your data while it processes. Your information won't be saved permanently.
      - intermediateS3Path (string) --
        
        The Amazon S3 location (bucket and prefix). For example: s3://provider_bucket/DOC-EXAMPLE-BUCKET
- incrementalRunConfig (dict) --
  
  An object which defines an incremental run type and has only incrementalRunType as a field.
  - incrementalRunType (string) --
    
    The type of incremental run. The only valid value is IMMEDIATE. This appears as "Automatic" in the console.
    
    Warning
    
    For workflows where resolutionType is ML_MATCHING, incremental processing is not supported.
- roleArn (string) --
  
  The Amazon Resource Name (ARN) of the IAM role. Entity Resolution assumes this role to create resources on your behalf as part of workflow execution.
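
Because several response fields appear only when they were set on the request, reading the response defensively can help. The sketch below uses a hypothetical helper, summarize_workflow, and a hand-written sample dictionary rather than real service output.

```python
def summarize_workflow(response):
    """Return a compact summary of a create_matching_workflow response."""
    techniques = response.get("resolutionTechniques", {})
    rule_names = []
    # Rules may live under either rule-based or rule-condition properties.
    for key in ("ruleBasedProperties", "ruleConditionProperties"):
        for rule in techniques.get(key, {}).get("rules", []):
            rule_names.append(rule["ruleName"])
    hashed = [
        attr["name"]
        for out in response.get("outputSourceConfig", [])
        for attr in out.get("output", [])
        if attr.get("hashed")
    ]
    return {
        "workflowName": response.get("workflowName"),
        "resolutionType": techniques.get("resolutionType"),
        "ruleNames": rule_names,
        "hashedColumns": hashed,
        "incremental": "incrementalRunConfig" in response,
    }


# Hand-written sample shaped like the response syntax, not real output.
sample_response = {
    "workflowName": "example-workflow",
    "outputSourceConfig": [{
        "outputS3Path": "s3://example-bucket/output/",
        "output": [{"name": "id", "hashed": False},
                   {"name": "email", "hashed": True}],
    }],
    "resolutionTechniques": {
        "resolutionType": "RULE_MATCHING",
        "ruleConditionProperties": {
            "rules": [{"ruleName": "Rule1", "condition": "Exact(email)"}],
        },
    },
    "roleArn": "arn:aws:iam::123456789012:role/ExampleEntityResolutionRole",
}

summary = summarize_workflow(sample_response)
```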