2024/10/31 - AWS Glue - 6 new 2 updated api methods
Changes: Add schedule support for AWS Glue column statistics.
Updates settings for a column statistics task.
See also: AWS API Documentation
Request Syntax
client.update_column_statistics_task_settings( DatabaseName='string', TableName='string', Role='string', Schedule='string', ColumnNameList=[ 'string', ], SampleSize=123.0, CatalogID='string', SecurityConfiguration='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to generate column statistics.
Role (string) --
The role used for running the column statistics.
Schedule (string) --
A schedule for running the column statistics, specified in CRON syntax.
ColumnNameList (list) --
A list of column names for which to run statistics.
(string) --
SampleSize (float) --
The percentage of data to sample.
CatalogID (string) --
The ID of the Data Catalog in which the database resides.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs.
dict
Response Syntax
{}
Response Structure
(dict) --
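As a minimal sketch of how the request above might be assembled, the helper below builds the keyword arguments and leaves out every optional field that was not supplied. The helper name `build_update_settings_kwargs` and its Python parameter names are hypothetical, and the 0 to 100 range check is an assumption based on SampleSize being described as a percentage.

```python
from typing import Any, Dict, Optional, Sequence


def build_update_settings_kwargs(
    database_name: str,
    table_name: str,
    *,
    role: Optional[str] = None,
    schedule: Optional[str] = None,
    column_names: Optional[Sequence[str]] = None,
    sample_size: Optional[float] = None,
    catalog_id: Optional[str] = None,
    security_configuration: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble request kwargs, omitting optional fields left as None."""
    if sample_size is not None and not 0 <= sample_size <= 100:
        # Assumption: SampleSize is a percentage, so it should lie in 0..100.
        raise ValueError("sample_size must be between 0 and 100")

    optional = {
        "Role": role,
        "Schedule": schedule,
        "ColumnNameList": list(column_names) if column_names is not None else None,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
    }
    # The two required parameters are always present.
    kwargs: Dict[str, Any] = {"DatabaseName": database_name, "TableName": table_name}
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs
```

With a Glue client created through `boto3.client("glue")`, the result could then be passed as `client.update_column_statistics_task_settings(**kwargs)`.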
Deletes settings for a column statistics task.
See also: AWS API Documentation
Request Syntax
client.delete_column_statistics_task_settings( DatabaseName='string', TableName='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to delete column statistics.
dict
Response Syntax
{}
Response Structure
(dict) --
Gets settings for a column statistics task.
See also: AWS API Documentation
Request Syntax
client.get_column_statistics_task_settings( DatabaseName='string', TableName='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to retrieve column statistics.
dict
Response Syntax
{ 'ColumnStatisticsTaskSettings': { 'DatabaseName': 'string', 'TableName': 'string', 'Schedule': { 'ScheduleExpression': 'string', 'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING' }, 'ColumnNameList': [ 'string', ], 'CatalogID': 'string', 'Role': 'string', 'SampleSize': 123.0, 'SecurityConfiguration': 'string' } }
Response Structure
(dict) --
ColumnStatisticsTaskSettings (dict) --
A ColumnStatisticsTaskSettings object representing the settings for the column statistics task.
DatabaseName (string) --
The name of the database where the table resides.
TableName (string) --
The name of the table for which to generate column statistics.
Schedule (dict) --
A schedule for running the column statistics, specified in CRON syntax.
ScheduleExpression (string) --
A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
State (string) --
The state of the schedule.
ColumnNameList (list) --
A list of column names for which to run statistics.
(string) --
CatalogID (string) --
The ID of the Data Catalog in which the database resides.
Role (string) --
The role used for running the column statistics.
SampleSize (float) --
The percentage of data to sample.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs.
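The nested Schedule structure in the response above can be read defensively, since optional keys may be absent. The following is a small sketch; the function name `schedule_summary` is hypothetical, and treating missing keys as `None` is an assumption rather than documented behavior.

```python
from typing import Any, Dict, Optional, Tuple


def schedule_summary(response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (ScheduleExpression, State) from a get-settings response.

    Missing keys yield None instead of raising KeyError.
    """
    settings = response.get("ColumnStatisticsTaskSettings", {})
    schedule = settings.get("Schedule", {})
    return schedule.get("ScheduleExpression"), schedule.get("State")
```

A caller would pass in the dictionary returned by `client.get_column_statistics_task_settings(DatabaseName=..., TableName=...)`.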
Starts a column statistics task run schedule.
See also: AWS API Documentation
Request Syntax
client.start_column_statistics_task_run_schedule( DatabaseName='string', TableName='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to start a column statistics task run schedule.
dict
Response Syntax
{}
Response Structure
(dict) --
Stops a column statistics task run schedule.
See also: AWS API Documentation
Request Syntax
client.stop_column_statistics_task_run_schedule( DatabaseName='string', TableName='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to stop a column statistics task run schedule.
dict
Response Syntax
{}
Response Structure
(dict) --
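The start and stop calls can be combined with the get-settings call so that a schedule is only toggled when its reported state differs from the desired one. This is a sketch under stated assumptions: the helper `ensure_schedule_state` is hypothetical, and it assumes that `SCHEDULED` means the schedule is running and `NOT_SCHEDULED` means it is stopped. Any other state, including `TRANSITIONING`, is left alone.

```python
from typing import Any


def ensure_schedule_state(
    client: Any, database_name: str, table_name: str, *, running: bool
) -> str:
    """Start or stop the schedule if needed; return 'start', 'stop', 'none', or 'skipped'."""
    target = {"DatabaseName": database_name, "TableName": table_name}
    response = client.get_column_statistics_task_settings(**target)
    state = (
        response.get("ColumnStatisticsTaskSettings", {})
        .get("Schedule", {})
        .get("State")
    )

    if state not in ("SCHEDULED", "NOT_SCHEDULED"):
        # Assumption: TRANSITIONING (or a missing schedule) should not be touched.
        return "skipped"
    if running and state == "NOT_SCHEDULED":
        client.start_column_statistics_task_run_schedule(**target)
        return "start"
    if not running and state == "SCHEDULED":
        client.stop_column_statistics_task_run_schedule(**target)
        return "stop"
    return "none"
```

Here `client` is expected to be a Glue client such as the one returned by `boto3.client("glue")`. It is passed in as an argument so the logic can be exercised with a stub.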
Creates settings for a column statistics task.
See also: AWS API Documentation
Request Syntax
client.create_column_statistics_task_settings( DatabaseName='string', TableName='string', Role='string', Schedule='string', ColumnNameList=[ 'string', ], SampleSize=123.0, CatalogID='string', SecurityConfiguration='string', Tags={ 'string': 'string' } )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to generate column statistics.
Role (string) --
[REQUIRED]
The role used for running the column statistics.
Schedule (string) --
A schedule for running the column statistics, specified in CRON syntax.
ColumnNameList (list) --
A list of column names for which to run statistics.
(string) --
SampleSize (float) --
The percentage of data to sample.
CatalogID (string) --
The ID of the Data Catalog in which the database resides.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs.
Tags (dict) --
A map of tags.
(string) --
(string) --
dict
Response Syntax
{}
Response Structure
(dict) --
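To illustrate the Schedule parameter, the sketch below formats a daily cron expression in the six-field form used by the `cron(15 12 * * ? *)` example and places it in a create request alongside the three required fields. Both helper names (`daily_cron`, `build_create_settings_kwargs`) and the sample role ARN in the usage note are hypothetical.

```python
from typing import Any, Dict, Mapping, Optional


def daily_cron(hour: int, minute: int = 0) -> str:
    """Return a cron expression that fires once a day at hour:minute UTC."""
    if not 0 <= hour <= 23:
        raise ValueError("hour must be in 0..23")
    if not 0 <= minute <= 59:
        raise ValueError("minute must be in 0..59")
    # Field order: minutes, hours, day-of-month, month, day-of-week, year.
    return f"cron({minute} {hour} * * ? *)"


def build_create_settings_kwargs(
    database_name: str,
    table_name: str,
    role: str,
    *,
    schedule: Optional[str] = None,
    tags: Optional[Mapping[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble create-request kwargs; Role is required here, unlike in the update call."""
    kwargs: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,
    }
    if schedule is not None:
        kwargs["Schedule"] = schedule
    if tags:
        kwargs["Tags"] = dict(tags)
    return kwargs
```

For example, `build_create_settings_kwargs("sales_db", "orders", "arn:aws:iam::123456789012:role/ExampleRole", schedule=daily_cron(12, 15))` yields arguments that could be passed to `client.create_column_statistics_task_settings(**kwargs)`.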
{'ColumnStatisticsTaskRun': {'ComputationType': 'FULL | INCREMENTAL'}}
Gets the associated metadata for a task run, given a task run ID.
See also: AWS API Documentation
Request Syntax
client.get_column_statistics_task_run( ColumnStatisticsTaskRunId='string' )
ColumnStatisticsTaskRunId (string) --
[REQUIRED]
The identifier for the particular column statistics task run.
dict
Response Syntax
{ 'ColumnStatisticsTaskRun': { 'CustomerId': 'string', 'ColumnStatisticsTaskRunId': 'string', 'DatabaseName': 'string', 'TableName': 'string', 'ColumnNameList': [ 'string', ], 'CatalogID': 'string', 'Role': 'string', 'SampleSize': 123.0, 'SecurityConfiguration': 'string', 'NumberOfWorkers': 123, 'WorkerType': 'string', 'ComputationType': 'FULL'|'INCREMENTAL', 'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'StartTime': datetime(2015, 1, 1), 'EndTime': datetime(2015, 1, 1), 'ErrorMessage': 'string', 'DPUSeconds': 123.0 } }
Response Structure
(dict) --
ColumnStatisticsTaskRun (dict) --
A ColumnStatisticsTaskRun object representing the details of the column stats run.
CustomerId (string) --
The Amazon Web Services account ID.
ColumnStatisticsTaskRunId (string) --
The identifier for the particular column statistics task run.
DatabaseName (string) --
The database where the table resides.
TableName (string) --
The name of the table for which column statistics are generated.
ColumnNameList (list) --
A list of the column names. If none is supplied, all column names for the table will be used by default.
(string) --
CatalogID (string) --
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
Role (string) --
The IAM role that the service assumes to generate statistics.
SampleSize (float) --
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
NumberOfWorkers (integer) --
The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
WorkerType (string) --
The type of workers being used for generating stats. The default is g.1x.
ComputationType (string) --
The type of column statistics computation.
Status (string) --
The status of the task run.
CreationTime (datetime) --
The time that this task was created.
LastUpdated (datetime) --
The last point in time when this task was modified.
StartTime (datetime) --
The start time of the task.
EndTime (datetime) --
The end time of the task.
ErrorMessage (string) --
The error message for the job.
DPUSeconds (float) --
The calculated DPU usage in seconds for all autoscaled workers.
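Because Status moves through several values, a caller will often poll this operation until the run finishes. The following is a rough sketch: the helper `wait_for_task_run` is hypothetical, and treating `SUCCEEDED`, `FAILED`, and `STOPPED` as terminal states is an assumption drawn from the status names. The poll interval and timeout defaults are arbitrary.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these three statuses mean the run will not change any further.
TERMINAL_STATUSES = frozenset({"SUCCEEDED", "FAILED", "STOPPED"})


def wait_for_task_run(
    client: Any,
    run_id: str,
    *,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_column_statistics_task_run until a terminal status or timeout."""
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        if run.get("Status") in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task run {run_id} did not finish in time")
        sleep(poll_seconds)
```

The returned dictionary is the ColumnStatisticsTaskRun structure described above, so fields such as ErrorMessage and DPUSeconds can be inspected once the run has ended. The `sleep` parameter exists only so the wait can be replaced in tests.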
{'ColumnStatisticsTaskRuns': {'ComputationType': 'FULL | INCREMENTAL'}}
Retrieves information about all runs associated with the specified table.
See also: AWS API Documentation
Request Syntax
client.get_column_statistics_task_runs( DatabaseName='string', TableName='string', MaxResults=123, NextToken='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table.
MaxResults (integer) --
The maximum size of the response.
NextToken (string) --
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'ColumnStatisticsTaskRuns': [ { 'CustomerId': 'string', 'ColumnStatisticsTaskRunId': 'string', 'DatabaseName': 'string', 'TableName': 'string', 'ColumnNameList': [ 'string', ], 'CatalogID': 'string', 'Role': 'string', 'SampleSize': 123.0, 'SecurityConfiguration': 'string', 'NumberOfWorkers': 123, 'WorkerType': 'string', 'ComputationType': 'FULL'|'INCREMENTAL', 'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED', 'CreationTime': datetime(2015, 1, 1), 'LastUpdated': datetime(2015, 1, 1), 'StartTime': datetime(2015, 1, 1), 'EndTime': datetime(2015, 1, 1), 'ErrorMessage': 'string', 'DPUSeconds': 123.0 }, ], 'NextToken': 'string' }
Response Structure
(dict) --
ColumnStatisticsTaskRuns (list) --
A list of column statistics task runs.
(dict) --
The object that shows the details of the column stats run.
CustomerId (string) --
The Amazon Web Services account ID.
ColumnStatisticsTaskRunId (string) --
The identifier for the particular column statistics task run.
DatabaseName (string) --
The database where the table resides.
TableName (string) --
The name of the table for which column statistics are generated.
ColumnNameList (list) --
A list of the column names. If none is supplied, all column names for the table will be used by default.
(string) --
CatalogID (string) --
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
Role (string) --
The IAM role that the service assumes to generate statistics.
SampleSize (float) --
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
NumberOfWorkers (integer) --
The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
WorkerType (string) --
The type of workers being used for generating stats. The default is g.1x.
ComputationType (string) --
The type of column statistics computation.
Status (string) --
The status of the task run.
CreationTime (datetime) --
The time that this task was created.
LastUpdated (datetime) --
The last point in time when this task was modified.
StartTime (datetime) --
The start time of the task.
EndTime (datetime) --
The end time of the task.
ErrorMessage (string) --
The error message for the job.
DPUSeconds (float) --
The calculated DPU usage in seconds for all autoscaled workers.
NextToken (string) --
A continuation token, if not all task runs have yet been returned.
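The NextToken field implies a standard continuation loop: call the operation again with the returned token until no token comes back. Below is a minimal sketch of that loop; the generator name `iter_task_runs` and the default page size are hypothetical choices.

```python
from typing import Any, Dict, Iterator


def iter_task_runs(
    client: Any, database_name: str, table_name: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield every task run for a table, following NextToken across pages."""
    kwargs: Dict[str, Any] = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "MaxResults": page_size,
    }
    while True:
        page = client.get_column_statistics_task_runs(**kwargs)
        yield from page.get("ColumnStatisticsTaskRuns", [])
        token = page.get("NextToken")
        if not token:
            return  # No token means the last page has been read.
        kwargs["NextToken"] = token
```

A caller could, for instance, collect only failed runs with `[r for r in iter_task_runs(client, "sales_db", "orders") if r.get("Status") == "FAILED"]`.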