AWS API Changes

2023/11/16 - AWS Glue - 5 new API methods

Changes: Introduces new column statistics APIs to support statistics generation for tables within the Glue Data Catalog.

StartColumnStatisticsTaskRun (new)

Starts a column statistics task run, for a specified table and columns.

Request Syntax

client.start_column_statistics_task_run(
    DatabaseName='string',
    TableName='string',
    ColumnNameList=[
        'string',
    ],
    Role='string',
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to generate statistics.

type ColumnNameList:

list

param ColumnNameList:

A list of the names of the columns for which to generate statistics. If none is supplied, all column names for the table will be used by default.

(string) --

type Role:

string

param Role:

[REQUIRED]

The IAM role that the service assumes to generate statistics.

type SampleSize:

float

param SampleSize:

The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

type CatalogID:

string

param CatalogID:

The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

type SecurityConfiguration:

string

param SecurityConfiguration:

Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRunId': 'string'
}

Response Structure

(dict) --
- ColumnStatisticsTaskRunId (string) --
  
  The identifier for the column statistics task run.
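As a rough illustration of the request above, the following sketch builds the keyword arguments and returns the run identifier. The helper name `start_column_stats_run` is hypothetical, and `client` is assumed to be a Glue client such as the one returned by `boto3.client("glue")`; only the method name, parameter names, and response key come from the syntax documented above.

```python
def start_column_stats_run(client, database, table, role,
                           columns=None, sample_size=None):
    """Start a column statistics task run and return its run ID.

    `client` is assumed to expose start_column_statistics_task_run,
    as a boto3 Glue client does.
    """
    # The three parameters marked [REQUIRED] in the request syntax.
    params = {"DatabaseName": database, "TableName": table, "Role": role}

    # Optional parameters are left out entirely when not given, so the
    # documented defaults apply (all columns, entire table).
    if columns:
        params["ColumnNameList"] = list(columns)
    if sample_size is not None:
        params["SampleSize"] = float(sample_size)

    response = client.start_column_statistics_task_run(**params)
    return response["ColumnStatisticsTaskRunId"]
```

Omitting `ColumnNameList` and `SampleSize` rather than passing empty values is a design choice of this sketch, made so the service-side defaults described above take effect.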

GetColumnStatisticsTaskRuns (new)

Retrieves information about all runs associated with the specified table.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_runs(
    DatabaseName='string',
    TableName='string',
    MaxResults=123,
    NextToken='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type MaxResults:

integer

param MaxResults:

The maximum size of the response.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRuns': [
        {
            'CustomerId': 'string',
            'ColumnStatisticsTaskRunId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'ColumnNameList': [
                'string',
            ],
            'CatalogID': 'string',
            'Role': 'string',
            'SampleSize': 123.0,
            'SecurityConfiguration': 'string',
            'NumberOfWorkers': 123,
            'WorkerType': 'string',
            'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'ErrorMessage': 'string',
            'DPUSeconds': 123.0
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- ColumnStatisticsTaskRuns (list) --
  
  A list of column statistics task runs.
  - (dict) --
    
    The object that shows the details of the column stats run.
    - CustomerId (string) --
      
      The Amazon Web Services account ID.
    - ColumnStatisticsTaskRunId (string) --
      
      The identifier for the particular column statistics task run.
    - DatabaseName (string) --
      
      The database where the table resides.
    - TableName (string) --
      
      The name of the table for which column statistics are generated.
    - ColumnNameList (list) --
      
      A list of the column names. If none is supplied, all column names for the table will be used by default.
      - (string) --
    - CatalogID (string) --
      
      The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
    - Role (string) --
      
      The IAM role that the service assumes to generate statistics.
    - SampleSize (float) --
      
      The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
    - SecurityConfiguration (string) --
      
      Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
    - NumberOfWorkers (integer) --
      
      The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
    - WorkerType (string) --
      
      The type of workers being used for generating stats. The default is g.1x.
    - Status (string) --
      
      The status of the task run.
    - CreationTime (datetime) --
      
      The time that this task was created.
    - LastUpdated (datetime) --
      
      The last point in time when this task was modified.
    - StartTime (datetime) --
      
      The start time of the task.
    - EndTime (datetime) --
      
      The end time of the task.
    - ErrorMessage (string) --
      
      The error message for the job.
    - DPUSeconds (float) --
      
      The calculated DPU usage in seconds for all autoscaled workers.
- NextToken (string) --
  
  A continuation token, if not all task runs have yet been returned.
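Because the response carries a `NextToken` when more runs remain, a caller typically loops until the token is absent. The sketch below shows one way to do that; the generator name `iter_column_stats_runs` is hypothetical, and `client` is assumed to be a Glue client such as `boto3.client("glue")`.

```python
def iter_column_stats_runs(client, database, table, page_size=None):
    """Yield every task run dict for a table, following NextToken."""
    params = {"DatabaseName": database, "TableName": table}
    if page_size is not None:
        params["MaxResults"] = page_size

    while True:
        page = client.get_column_statistics_task_runs(**params)
        # Each element has the ColumnStatisticsTaskRun shape shown above.
        yield from page.get("ColumnStatisticsTaskRuns", [])

        token = page.get("NextToken")
        if not token:
            break
        # Pass the token back to request the next page.
        params["NextToken"] = token
```

For example, `[r["ColumnStatisticsTaskRunId"] for r in iter_column_stats_runs(client, "db", "tbl") if r["Status"] == "FAILED"]` would collect the IDs of failed runs for one table.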

StopColumnStatisticsTaskRun (new)

Stops a task run for the specified table.

See also: AWS API Documentation

Request Syntax

client.stop_column_statistics_task_run(
    DatabaseName='string',
    TableName='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --

GetColumnStatisticsTaskRun (new)

Gets the metadata for a task run, given a task run ID.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_run(
    ColumnStatisticsTaskRunId='string'
)

type ColumnStatisticsTaskRunId:

string

param ColumnStatisticsTaskRunId:

[REQUIRED]

The identifier for the particular column statistics task run.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRun': {
        'CustomerId': 'string',
        'ColumnStatisticsTaskRunId': 'string',
        'DatabaseName': 'string',
        'TableName': 'string',
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string',
        'NumberOfWorkers': 123,
        'WorkerType': 'string',
        'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'ErrorMessage': 'string',
        'DPUSeconds': 123.0
    }
}

Response Structure

(dict) --
- ColumnStatisticsTaskRun (dict) --
  
  A ColumnStatisticsTaskRun object representing the details of the column stats run.
  - CustomerId (string) --
    
    The Amazon Web Services account ID.
  - ColumnStatisticsTaskRunId (string) --
    
    The identifier for the particular column statistics task run.
  - DatabaseName (string) --
    
    The database where the table resides.
  - TableName (string) --
    
    The name of the table for which column statistics are generated.
  - ColumnNameList (list) --
    
    A list of the column names. If none is supplied, all column names for the table will be used by default.
    - (string) --
  - CatalogID (string) --
    
    The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
  - Role (string) --
    
    The IAM role that the service assumes to generate statistics.
  - SampleSize (float) --
    
    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
  - SecurityConfiguration (string) --
    
    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
  - NumberOfWorkers (integer) --
    
    The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
  - WorkerType (string) --
    
    The type of workers being used for generating stats. The default is g.1x.
  - Status (string) --
    
    The status of the task run.
  - CreationTime (datetime) --
    
    The time that this task was created.
  - LastUpdated (datetime) --
    
    The last point in time when this task was modified.
  - StartTime (datetime) --
    
    The start time of the task.
  - EndTime (datetime) --
    
    The end time of the task.
  - ErrorMessage (string) --
    
    The error message for the job.
  - DPUSeconds (float) --
    
    The calculated DPU usage in seconds for all autoscaled workers.
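A common use of this call is polling a run until it finishes. The following is a minimal sketch, not an official waiter: the function name, the polling interval, and the timeout are illustrative, and treating `SUCCEEDED`, `FAILED`, and `STOPPED` as the terminal states is an assumption inferred from the status names listed above. `client` is assumed to be a Glue client such as `boto3.client("glue")`.

```python
import time

# Assumption: these three statuses mean the run will not change again.
TERMINAL_STATUSES = {"SUCCEEDED", "FAILED", "STOPPED"}


def wait_for_column_stats_run(client, run_id, poll_seconds=15.0,
                              timeout_seconds=3600.0, sleep=time.sleep):
    """Poll a task run until it reaches a terminal status.

    Returns the ColumnStatisticsTaskRun dict, or raises TimeoutError
    if the run is still in progress when the timeout elapses.
    """
    deadline = time.monotonic() + timeout_seconds
    while True:
        response = client.get_column_statistics_task_run(
            ColumnStatisticsTaskRunId=run_id
        )
        run = response["ColumnStatisticsTaskRun"]
        if run["Status"] in TERMINAL_STATUSES:
            return run
        if time.monotonic() >= deadline:
            raise TimeoutError(f"run {run_id} is still {run['Status']}")
        sleep(poll_seconds)
```

The `sleep` parameter exists only so the wait can be replaced in tests; on a `FAILED` result, the returned dict's `ErrorMessage` field is the place to look for the cause.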

ListColumnStatisticsTaskRuns (new)

Lists all task runs for a particular account.

See also: AWS API Documentation

Request Syntax

client.list_column_statistics_task_runs(
    MaxResults=123,
    NextToken='string'
)

type MaxResults:

integer

param MaxResults:

The maximum size of the response.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRunIds': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- ColumnStatisticsTaskRunIds (list) --
  
  A list of column statistics task run IDs.
  - (string) --
- NextToken (string) --
  
  A continuation token, if not all task run IDs have yet been returned.
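Since this call returns run IDs only, details such as `Status` require a follow-up `get_column_statistics_task_run` call per ID. The sketch below combines the two to tally runs by status across the account; the function name `count_runs_by_status` is hypothetical, and `client` is assumed to be a Glue client such as `boto3.client("glue")`.

```python
from collections import Counter


def count_runs_by_status(client, page_size=None):
    """Return a Counter of Status values for all task runs in the account."""
    params = {}
    if page_size is not None:
        params["MaxResults"] = page_size

    counts = Counter()
    while True:
        page = client.list_column_statistics_task_runs(**params)
        for run_id in page.get("ColumnStatisticsTaskRunIds", []):
            # The list call yields IDs only, so fetch each run's details.
            detail = client.get_column_statistics_task_run(
                ColumnStatisticsTaskRunId=run_id
            )
            counts[detail["ColumnStatisticsTaskRun"]["Status"]] += 1

        token = page.get("NextToken")
        if not token:
            return counts
        params["NextToken"] = token
```

This issues one request per run, which may be slow for accounts with many runs; it is meant to show how the two calls fit together rather than to be an efficient report.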