2023/11/16 - AWS Glue - 5 new api methods
Changes: Introduces new column statistics APIs to support statistics generation for tables within the Glue Data Catalog.
Starts a column statistics task run for a specified table and columns.
See also: AWS API Documentation
Request Syntax
client.start_column_statistics_task_run(
    DatabaseName='string',
    TableName='string',
    ColumnNameList=[
        'string',
    ],
    Role='string',
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string'
)
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table for which to generate statistics.
ColumnNameList (list) --
A list of the names of the columns for which to generate statistics. If none is supplied, all columns in the table are used by default.
(string) --
Role (string) --
[REQUIRED]
The IAM role that the service assumes to generate statistics.
SampleSize (float) --
The percentage of rows used to generate statistics. If none is supplied, the entire table is used.
CatalogID (string) --
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
SecurityConfiguration (string) --
The name of the security configuration that is used to encrypt CloudWatch logs for the column statistics task run.
dict
Response Syntax
{ 'ColumnStatisticsTaskRunId': 'string' }
Response Structure
(dict) --
ColumnStatisticsTaskRunId (string) --
The identifier for the column statistics task run.
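As an illustration of the request above, here is a minimal sketch of a helper that starts a run and returns its ID. The function name `start_stats_run` and its argument names are hypothetical, and `client` is assumed to be a boto3 Glue client; only the required parameters are always sent, so the documented defaults apply to anything left out.

```python
from typing import Any, Optional, Sequence


def start_stats_run(
    client: Any,
    database: str,
    table: str,
    role: str,
    columns: Optional[Sequence[str]] = None,
    sample_size: Optional[float] = None,
) -> str:
    """Start a column statistics task run and return its run ID.

    `client` is assumed to be a boto3 Glue client (hypothetical helper).
    """
    # DatabaseName, TableName and Role are the parameters marked [REQUIRED].
    params = {"DatabaseName": database, "TableName": table, "Role": role}
    # Optional parameters are omitted entirely when not given, so the
    # service falls back to all columns and the entire table.
    if columns:
        params["ColumnNameList"] = list(columns)
    if sample_size is not None:
        params["SampleSize"] = float(sample_size)
    response = client.start_column_statistics_task_run(**params)
    return response["ColumnStatisticsTaskRunId"]
```

With a real client this might be called as `start_stats_run(boto3.client("glue"), "sales_db", "orders", role_arn)`, where the database, table, and role values are placeholders.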
Retrieves information about all runs associated with the specified table.
See also: AWS API Documentation
Request Syntax
client.get_column_statistics_task_runs(
    DatabaseName='string',
    TableName='string',
    MaxResults=123,
    NextToken='string'
)
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table.
MaxResults (integer) --
The maximum size of the response.
NextToken (string) --
A continuation token, if this is a continuation call.
dict
Response Syntax
{
    'ColumnStatisticsTaskRuns': [
        {
            'CustomerId': 'string',
            'ColumnStatisticsTaskRunId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'ColumnNameList': [
                'string',
            ],
            'CatalogID': 'string',
            'Role': 'string',
            'SampleSize': 123.0,
            'SecurityConfiguration': 'string',
            'NumberOfWorkers': 123,
            'WorkerType': 'string',
            'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'ErrorMessage': 'string',
            'DPUSeconds': 123.0
        },
    ],
    'NextToken': 'string'
}
Response Structure
(dict) --
ColumnStatisticsTaskRuns (list) --
A list of column statistics task runs.
(dict) --
The object that shows the details of the column stats run.
CustomerId (string) --
The Amazon Web Services account ID.
ColumnStatisticsTaskRunId (string) --
The identifier for the particular column statistics task run.
DatabaseName (string) --
The database where the table resides.
TableName (string) --
The name of the table for which column statistics are generated.
ColumnNameList (list) --
A list of the column names. If none is supplied, all column names for the table will be used by default.
(string) --
CatalogID (string) --
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
Role (string) --
The IAM role that the service assumes to generate statistics.
SampleSize (float) --
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
NumberOfWorkers (integer) --
The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
WorkerType (string) --
The type of workers being used for generating stats. The default is g.1x.
Status (string) --
The status of the task run.
CreationTime (datetime) --
The time that this task was created.
LastUpdated (datetime) --
The last point in time when this task was modified.
StartTime (datetime) --
The start time of the task.
EndTime (datetime) --
The end time of the task.
ErrorMessage (string) --
The error message for the job.
DPUSeconds (float) --
The calculated DPU usage in seconds for all autoscaled workers.
NextToken (string) --
A continuation token, if not all task runs have yet been returned.
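Because the response carries a NextToken whenever more runs remain, a caller typically loops until the token is absent. The sketch below shows one way to do that; `iter_table_task_runs` and the `page_size` default are hypothetical, and `client` is assumed to be a boto3 Glue client.

```python
from typing import Any, Dict, Iterator


def iter_table_task_runs(
    client: Any, database: str, table: str, page_size: int = 50
) -> Iterator[Dict[str, Any]]:
    """Yield every column statistics task run recorded for one table.

    `client` is assumed to be a boto3 Glue client (hypothetical helper).
    """
    params = {"DatabaseName": database, "TableName": table, "MaxResults": page_size}
    while True:
        page = client.get_column_statistics_task_runs(**params)
        # Each element is a task run dict as described in Response Structure.
        yield from page.get("ColumnStatisticsTaskRuns", [])
        token = page.get("NextToken")
        if not token:
            break  # No token means the last page has been returned.
        # Pass the token back to continue where the previous call stopped.
        params["NextToken"] = token
```

Writing it as a generator lets the caller stop early, for example after finding the first run with a given Status, without fetching the remaining pages.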
Stops a task run for the specified table.
See also: AWS API Documentation
Request Syntax
client.stop_column_statistics_task_run( DatabaseName='string', TableName='string' )
DatabaseName (string) --
[REQUIRED]
The name of the database where the table resides.
TableName (string) --
[REQUIRED]
The name of the table.
dict
Response Syntax
{}
Response Structure
(dict) --
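Note that the stop request identifies the run by database and table rather than by run ID. One possible pattern, sketched here under the assumption that `client` is a boto3 Glue client, is to check for a run in the STARTING or RUNNING state before asking the service to stop it. The helper name `stop_if_active` and the `ACTIVE_STATUSES` set are hypothetical.

```python
from typing import Any

# Statuses treated as "still in progress" (an assumption of this sketch,
# based on the Status values listed in the response syntax).
ACTIVE_STATUSES = {"STARTING", "RUNNING"}


def stop_if_active(client: Any, database: str, table: str) -> bool:
    """Stop the table's task run if one is in progress.

    Returns True if a stop request was sent, False otherwise.
    `client` is assumed to be a boto3 Glue client (hypothetical helper).
    """
    params = {"DatabaseName": database, "TableName": table}
    while True:
        page = client.get_column_statistics_task_runs(**params)
        runs = page.get("ColumnStatisticsTaskRuns", [])
        if any(run.get("Status") in ACTIVE_STATUSES for run in runs):
            # The stop call takes only the database and table names.
            client.stop_column_statistics_task_run(
                DatabaseName=database, TableName=table
            )
            return True
        token = page.get("NextToken")
        if not token:
            return False  # Every page checked, nothing in progress.
        params["NextToken"] = token
```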
Gets the metadata for a task run, given a task run ID.
See also: AWS API Documentation
Request Syntax
client.get_column_statistics_task_run( ColumnStatisticsTaskRunId='string' )
ColumnStatisticsTaskRunId (string) --
[REQUIRED]
The identifier for the particular column statistics task run.
dict
Response Syntax
{
    'ColumnStatisticsTaskRun': {
        'CustomerId': 'string',
        'ColumnStatisticsTaskRunId': 'string',
        'DatabaseName': 'string',
        'TableName': 'string',
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string',
        'NumberOfWorkers': 123,
        'WorkerType': 'string',
        'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'ErrorMessage': 'string',
        'DPUSeconds': 123.0
    }
}
Response Structure
(dict) --
ColumnStatisticsTaskRun (dict) --
A ColumnStatisticsTaskRun object representing the details of the column stats run.
CustomerId (string) --
The Amazon Web Services account ID.
ColumnStatisticsTaskRunId (string) --
The identifier for the particular column statistics task run.
DatabaseName (string) --
The database where the table resides.
TableName (string) --
The name of the table for which column statistics are generated.
ColumnNameList (list) --
A list of the column names. If none is supplied, all column names for the table will be used by default.
(string) --
CatalogID (string) --
The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
Role (string) --
The IAM role that the service assumes to generate statistics.
SampleSize (float) --
The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
SecurityConfiguration (string) --
Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
NumberOfWorkers (integer) --
The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
WorkerType (string) --
The type of workers being used for generating stats. The default is g.1x.
Status (string) --
The status of the task run.
CreationTime (datetime) --
The time that this task was created.
LastUpdated (datetime) --
The last point in time when this task was modified.
StartTime (datetime) --
The start time of the task.
EndTime (datetime) --
The end time of the task.
ErrorMessage (string) --
The error message for the job.
DPUSeconds (float) --
The calculated DPU usage in seconds for all autoscaled workers.
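Since a run moves through the Status values shown above, a common use of this call is to poll until the run finishes. The following is a sketch only: `wait_for_task_run`, the polling interval, and the timeout are hypothetical choices, treating SUCCEEDED, FAILED, and STOPPED as final is an assumption of this example, and `client` is assumed to be a boto3 Glue client.

```python
import time
from typing import Any, Callable, Dict

# Statuses after which this sketch stops polling (an assumption).
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_task_run(
    client: Any,
    run_id: str,
    poll_seconds: float = 30.0,
    timeout_seconds: float = 3600.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll a task run until it reaches a terminal status, then return it.

    Raises TimeoutError if the run is still in progress after
    `timeout_seconds`. `client` is assumed to be a boto3 Glue client.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        if run["Status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task run {run_id} still {run['Status']}")
        # Wait before asking again to avoid hammering the API.
        sleep(poll_seconds)
```

On a FAILED result, the returned dict's ErrorMessage field is the natural place to look for the cause. The `sleep` parameter exists only so the wait can be replaced in tests.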
Lists all task runs for a particular account.
See also: AWS API Documentation
Request Syntax
client.list_column_statistics_task_runs( MaxResults=123, NextToken='string' )
MaxResults (integer) --
The maximum size of the response.
NextToken (string) --
A continuation token, if this is a continuation call.
dict
Response Syntax
{ 'ColumnStatisticsTaskRunIds': [ 'string', ], 'NextToken': 'string' }
Response Structure
(dict) --
ColumnStatisticsTaskRunIds (list) --
A list of column statistics task run IDs.
(string) --
NextToken (string) --
A continuation token, if not all task run IDs have yet been returned.
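This call returns only run IDs, so details such as Status have to be fetched per ID. The sketch below pairs it with get_column_statistics_task_run to tally runs by status across the account; `count_runs_by_status` and the `page_size` default are hypothetical, and `client` is assumed to be a boto3 Glue client.

```python
from collections import Counter
from typing import Any


def count_runs_by_status(client: Any, page_size: int = 100) -> Counter:
    """Count the account's column statistics task runs per Status value.

    `client` is assumed to be a boto3 Glue client (hypothetical helper).
    """
    counts: Counter = Counter()
    params = {"MaxResults": page_size}
    while True:
        page = client.list_column_statistics_task_runs(**params)
        for run_id in page.get("ColumnStatisticsTaskRunIds", []):
            # The list call gives IDs only; look up each run for its status.
            detail = client.get_column_statistics_task_run(
                ColumnStatisticsTaskRunId=run_id
            )
            counts[detail["ColumnStatisticsTaskRun"]["Status"]] += 1
        token = page.get("NextToken")
        if not token:
            return counts
        params["NextToken"] = token
```

This issues one extra request per run, which is acceptable for a small number of runs but may be slow for accounts with many.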