2024/06/13 - AWS Glue - 2 updated API methods
Changes: This release adds support for configuring the evaluation method for composite rules in Glue Data Quality rulesets.
GetDataQualityRulesetEvaluationRun (updated)
Changes (response)
{'AdditionalRunOptions': {'CompositeRuleEvaluationMethod': 'COLUMN | ROW'}}
Retrieves a specific run where a ruleset is evaluated against a data source.
Request Syntax
client.get_data_quality_ruleset_evaluation_run(
    RunId='string'
)
RunId (string) -- [REQUIRED]
The unique run identifier associated with this run.
dict
Response Syntax
{
    'RunId': 'string',
    'DataSource': {
        'GlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            }
        }
    },
    'Role': 'string',
    'NumberOfWorkers': 123,
    'Timeout': 123,
    'AdditionalRunOptions': {
        'CloudWatchMetricsEnabled': True|False,
        'ResultsS3Prefix': 'string',
        'CompositeRuleEvaluationMethod': 'COLUMN'|'ROW'
    },
    'Status': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
    'ErrorString': 'string',
    'StartedOn': datetime(2015, 1, 1),
    'LastModifiedOn': datetime(2015, 1, 1),
    'CompletedOn': datetime(2015, 1, 1),
    'ExecutionTime': 123,
    'RulesetNames': [
        'string',
    ],
    'ResultIds': [
        'string',
    ],
    'AdditionalDataSources': {
        'string': {
            'GlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                }
            }
        }
    }
}
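A response of the shape above can be reduced to the handful of fields most callers care about. The following is a minimal sketch, not part of the AWS documentation: `summarize_run` is a hypothetical helper, the sample values are invented for illustration, and every field other than `RunId` is treated as optional.

```python
from datetime import datetime, timezone


def summarize_run(run):
    """Return a compact summary of a GetDataQualityRulesetEvaluationRun response dict."""
    options = run.get("AdditionalRunOptions", {})
    table = run.get("DataSource", {}).get("GlueTable", {})
    db, name = table.get("DatabaseName"), table.get("TableName")
    started, completed = run.get("StartedOn"), run.get("CompletedOn")
    return {
        "run_id": run["RunId"],
        "status": run.get("Status"),
        "table": f"{db}.{name}" if db and name else None,
        # None when the response carries no CompositeRuleEvaluationMethod.
        "evaluation_method": options.get("CompositeRuleEvaluationMethod"),
        # Wall-clock duration; distinct from ExecutionTime, which the service reports itself.
        "wall_clock_seconds": (
            (completed - started).total_seconds() if started and completed else None
        ),
        "result_ids": list(run.get("ResultIds", [])),
    }


# Hypothetical response values, for illustration only.
sample_run = {
    "RunId": "dqrun-example",
    "Status": "SUCCEEDED",
    "DataSource": {"GlueTable": {"DatabaseName": "sales", "TableName": "orders"}},
    "AdditionalRunOptions": {"CompositeRuleEvaluationMethod": "ROW"},
    "StartedOn": datetime(2024, 6, 13, 12, 0, 0, tzinfo=timezone.utc),
    "CompletedOn": datetime(2024, 6, 13, 12, 5, 0, tzinfo=timezone.utc),
    "ResultIds": ["dqresult-example"],
}
print(summarize_run(sample_run))
```

With a real client, the same helper would be applied to the return value of `client.get_data_quality_ruleset_evaluation_run(RunId=...)`.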
Response Structure
(dict) --
RunId (string) --
The unique run identifier associated with this run.
DataSource (dict) --
The data source (a Glue table) associated with this evaluation run.
GlueTable (dict) --
A Glue table.
DatabaseName (string) --
A database name in the Glue Data Catalog.
TableName (string) --
A table name in the Glue Data Catalog.
CatalogId (string) --
A unique identifier for the Glue Data Catalog.
ConnectionName (string) --
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) --
Additional options for the table. Currently there are two keys supported:
pushDownPredicate : to filter on partitions without having to list and read all the files in your dataset.
catalogPartitionPredicate : to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) --
(string) --
Role (string) --
An IAM role supplied to encrypt the results of the run.
NumberOfWorkers (integer) --
The number of G.1X workers to be used in the run. The default is 5.
Timeout (integer) --
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
AdditionalRunOptions (dict) --
Additional run options you can specify for an evaluation run.
CloudWatchMetricsEnabled (boolean) --
Whether or not to enable CloudWatch metrics.
ResultsS3Prefix (string) --
Prefix for Amazon S3 to store results.
CompositeRuleEvaluationMethod (string) --
The evaluation method for composite rules in the ruleset: COLUMN or ROW.
Status (string) --
The status for this run.
ErrorString (string) --
The error strings that are associated with the run.
StartedOn (datetime) --
The date and time when this run started.
LastModifiedOn (datetime) --
A timestamp. The last point in time when this data quality ruleset evaluation run was modified.
CompletedOn (datetime) --
The date and time when this run was completed.
ExecutionTime (integer) --
The amount of time (in seconds) that the run consumed resources.
RulesetNames (list) --
A list of ruleset names for the run. Currently, this parameter takes only one ruleset name.
(string) --
ResultIds (list) --
A list of result IDs for the data quality results for the run.
(string) --
AdditionalDataSources (dict) --
A map of reference strings to additional data sources you can specify for an evaluation run.
(string) --
(dict) --
A data source (a Glue table) for which you want data quality results.
GlueTable (dict) --
A Glue table.
DatabaseName (string) --
A database name in the Glue Data Catalog.
TableName (string) --
A table name in the Glue Data Catalog.
CatalogId (string) --
A unique identifier for the Glue Data Catalog.
ConnectionName (string) --
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) --
Additional options for the table. Currently there are two keys supported:
pushDownPredicate : to filter on partitions without having to list and read all the files in your dataset.
catalogPartitionPredicate : to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) --
(string) --
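Because a run moves through several statuses before it completes, callers typically poll this operation until `Status` reaches a terminal value. The sketch below is one way to do that, under stated assumptions: `wait_for_evaluation_run`, its defaults, and the choice of which statuses count as terminal are illustrative and not part of the API; `client` is expected to be a boto3 Glue client such as `boto3.client("glue")`.

```python
import time

# Statuses after which a run is assumed not to change any more
# (a subset of the Status values listed in the response syntax).
TERMINAL_STATUSES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT"}


def wait_for_evaluation_run(client, run_id, poll_seconds=30, max_polls=120, sleep=time.sleep):
    """Poll get_data_quality_ruleset_evaluation_run until the run finishes.

    Returns the last response dict. Raises TimeoutError if the run is still
    in progress after max_polls attempts.
    """
    for _ in range(max_polls):
        run = client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
        if run["Status"] in TERMINAL_STATUSES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(f"Run {run_id} did not finish after {max_polls} polls")
```

The `sleep` parameter is injectable so the loop can be exercised in tests without waiting; in normal use the default `time.sleep` applies.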
StartDataQualityRulesetEvaluationRun (updated)
Changes (request)
{'AdditionalRunOptions': {'CompositeRuleEvaluationMethod': 'COLUMN | ROW'}}
Once you have a ruleset definition (either recommended or your own), you call this operation to evaluate the ruleset against a data source (Glue table). The evaluation computes results, which you can retrieve with the GetDataQualityResult API.
Request Syntax
client.start_data_quality_ruleset_evaluation_run(
    DataSource={
        'GlueTable': {
            'DatabaseName': 'string',
            'TableName': 'string',
            'CatalogId': 'string',
            'ConnectionName': 'string',
            'AdditionalOptions': {
                'string': 'string'
            }
        }
    },
    Role='string',
    NumberOfWorkers=123,
    Timeout=123,
    ClientToken='string',
    AdditionalRunOptions={
        'CloudWatchMetricsEnabled': True|False,
        'ResultsS3Prefix': 'string',
        'CompositeRuleEvaluationMethod': 'COLUMN'|'ROW'
    },
    RulesetNames=[
        'string',
    ],
    AdditionalDataSources={
        'string': {
            'GlueTable': {
                'DatabaseName': 'string',
                'TableName': 'string',
                'CatalogId': 'string',
                'ConnectionName': 'string',
                'AdditionalOptions': {
                    'string': 'string'
                }
            }
        }
    }
)
DataSource (dict) -- [REQUIRED]
The data source (Glue table) associated with this run.
GlueTable (dict) -- [REQUIRED]
A Glue table.
DatabaseName (string) -- [REQUIRED]
A database name in the Glue Data Catalog.
TableName (string) -- [REQUIRED]
A table name in the Glue Data Catalog.
CatalogId (string) --
A unique identifier for the Glue Data Catalog.
ConnectionName (string) --
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) --
Additional options for the table. Currently there are two keys supported:
pushDownPredicate : to filter on partitions without having to list and read all the files in your dataset.
catalogPartitionPredicate : to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) --
(string) --
Role (string) -- [REQUIRED]
An IAM role supplied to encrypt the results of the run.
NumberOfWorkers (integer) --
The number of G.1X workers to be used in the run. The default is 5.
Timeout (integer) --
The timeout for a run in minutes. This is the maximum time that a run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).
ClientToken (string) --
Used for idempotency. Setting it to a random ID (such as a UUID) is recommended to avoid creating or starting multiple instances of the same resource.
AdditionalRunOptions (dict) --
Additional run options you can specify for an evaluation run.
CloudWatchMetricsEnabled (boolean) --
Whether or not to enable CloudWatch metrics.
ResultsS3Prefix (string) --
Prefix for Amazon S3 to store results.
CompositeRuleEvaluationMethod (string) --
The evaluation method for composite rules in the ruleset: COLUMN or ROW.
RulesetNames (list) -- [REQUIRED]
A list of ruleset names.
(string) --
AdditionalDataSources (dict) --
A map of reference strings to additional data sources you can specify for an evaluation run.
(string) --
(dict) --
A data source (a Glue table) for which you want data quality results.
GlueTable (dict) -- [REQUIRED]
A Glue table.
DatabaseName (string) -- [REQUIRED]
A database name in the Glue Data Catalog.
TableName (string) -- [REQUIRED]
A table name in the Glue Data Catalog.
CatalogId (string) --
A unique identifier for the Glue Data Catalog.
ConnectionName (string) --
The name of the connection to the Glue Data Catalog.
AdditionalOptions (dict) --
Additional options for the table. Currently there are two keys supported:
pushDownPredicate : to filter on partitions without having to list and read all the files in your dataset.
catalogPartitionPredicate : to use server-side partition pruning using partition indexes in the Glue Data Catalog.
(string) --
(string) --
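The request parameters described above can be assembled into keyword arguments before the call is made. The following is a minimal sketch under stated assumptions: `build_start_request` is a hypothetical helper, it covers only the required parameters plus `ClientToken` and two of the `AdditionalRunOptions` keys, and it validates `CompositeRuleEvaluationMethod` locally against the two values shown in the request syntax.

```python
import uuid

# The two values shown for CompositeRuleEvaluationMethod in the request syntax.
VALID_METHODS = {"COLUMN", "ROW"}


def build_start_request(database, table, role, ruleset_names, method=None, results_prefix=None):
    """Build keyword arguments for start_data_quality_ruleset_evaluation_run."""
    if method is not None and method not in VALID_METHODS:
        raise ValueError(f"method must be one of {sorted(VALID_METHODS)}, got {method!r}")

    request = {
        "DataSource": {"GlueTable": {"DatabaseName": database, "TableName": table}},
        "Role": role,
        "RulesetNames": list(ruleset_names),
        # Random token for idempotency, as the ClientToken description recommends.
        "ClientToken": str(uuid.uuid4()),
    }

    # Only send AdditionalRunOptions when at least one option was requested.
    options = {}
    if method is not None:
        options["CompositeRuleEvaluationMethod"] = method
    if results_prefix is not None:
        options["ResultsS3Prefix"] = results_prefix
    if options:
        request["AdditionalRunOptions"] = options
    return request
```

The result would then be passed as `client.start_data_quality_ruleset_evaluation_run(**request)`. Note that a retry should reuse the same request dict, since generating a fresh token for each attempt would defeat the idempotency the token is meant to provide.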
dict
Response Syntax
{
    'RunId': 'string'
}
Response Structure
(dict) --
RunId (string) --
The unique run identifier associated with this run.
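The returned `RunId` is the handle for everything that follows: it identifies the run to GetDataQualityRulesetEvaluationRun, whose `ResultIds` can in turn be passed to GetDataQualityResult. The sketch below illustrates that chain; `start_run` and `fetch_results` are hypothetical helper names, `client` is assumed to be a boto3 Glue client, and `fetch_results` assumes the run has already finished.

```python
def start_run(client, **request):
    """Start an evaluation run and return its RunId."""
    response = client.start_data_quality_ruleset_evaluation_run(**request)
    return response["RunId"]


def fetch_results(client, run_id):
    """Fetch the data quality results produced by a finished run."""
    run = client.get_data_quality_ruleset_evaluation_run(RunId=run_id)
    # ResultIds may be absent or empty if the run produced no results.
    return [
        client.get_data_quality_result(ResultId=result_id)
        for result_id in run.get("ResultIds", [])
    ]
```

Keeping the two steps separate lets the caller decide how to wait between them, for example by polling the run status or by reacting to an external notification.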