AWS API Changes

2024/10/31 - AWS Glue - 6 new, 2 updated API methods

Changes: Add schedule support for AWS Glue column statistics.

StopColumnStatisticsTaskRunSchedule (new)

Stops a column statistics task run schedule.

Request Syntax

client.stop_column_statistics_task_run_schedule(
    DatabaseName='string',
    TableName='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to stop a column statistics task run schedule.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --
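
As a minimal sketch of the call above, the helper below wraps the request so the two required parameters are explicit. The function name `stop_schedule` and the convention of passing the client in as an argument are illustrative assumptions, not part of the SDK.

```python
def stop_schedule(client, database_name, table_name):
    """Stop the column statistics schedule for one table.

    `client` is expected to be a Glue client, for example one created
    with boto3.client("glue"). The helper name is hypothetical.
    """
    # Both parameters are marked [REQUIRED] in the request syntax.
    return client.stop_column_statistics_task_run_schedule(
        DatabaseName=database_name,
        TableName=table_name,
    )
```

With a real client this would be called as `stop_schedule(boto3.client("glue"), "my_db", "my_table")`, where the database and table names are placeholders.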

CreateColumnStatisticsTaskSettings (new)

Creates settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.create_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string',
    Role='string',
    Schedule='string',
    ColumnNameList=[
        'string',
    ],
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string',
    Tags={
        'string': 'string'
    }
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to generate column statistics.

type Role:

string

param Role:

[REQUIRED]

The role used for running the column statistics.

type Schedule:

string

param Schedule:

A schedule for running the column statistics, specified in CRON syntax.

type ColumnNameList:

list

param ColumnNameList:

A list of column names for which to run statistics.

(string) --

type SampleSize:

float

param SampleSize:

The percentage of data to sample.

type CatalogID:

string

param CatalogID:

The ID of the Data Catalog in which the database resides.

type SecurityConfiguration:

string

param SecurityConfiguration:

Name of the security configuration that is used to encrypt CloudWatch logs.

type Tags:

dict

param Tags:

A map of tags.

(string) --
- (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --
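
The request above has three required parameters and six optional ones. The sketch below shows one way to assemble the keyword arguments so that unset optional values are left out entirely rather than passed as `None`, which botocore's parameter validation would reject. The helper names and the snake_case argument names are assumptions made for illustration.

```python
def build_create_settings_kwargs(database_name, table_name, role, schedule=None,
                                 column_names=None, sample_size=None,
                                 catalog_id=None, security_configuration=None,
                                 tags=None):
    """Build keyword arguments for create_column_statistics_task_settings."""
    # Required parameters always go in.
    kwargs = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,
    }
    # Optional parameters are included only when a value was supplied.
    optional = {
        "Schedule": schedule,
        "ColumnNameList": column_names,
        "SampleSize": sample_size,
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
        "Tags": tags,
    }
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs


def create_settings(client, **settings):
    """Send the request using a Glue client passed in by the caller."""
    return client.create_column_statistics_task_settings(
        **build_create_settings_kwargs(**settings)
    )
```

A call might look like `create_settings(client, database_name="my_db", table_name="my_table", role="my-glue-role", schedule="cron(15 12 * * ? *)")`. The names are placeholders, and the cron string is the example given for `ScheduleExpression` elsewhere on this page.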

GetColumnStatisticsTaskSettings (new)

Gets settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to retrieve column statistics.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskSettings': {
        'DatabaseName': 'string',
        'TableName': 'string',
        'Schedule': {
            'ScheduleExpression': 'string',
            'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
        },
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string'
    }
}

Response Structure

(dict) --
- ColumnStatisticsTaskSettings (dict) --
  
  A ColumnStatisticsTaskSettings object representing the settings for the column statistics task.
  - DatabaseName (string) --
    
    The name of the database where the table resides.
  - TableName (string) --
    
    The name of the table for which to generate column statistics.
  - Schedule (dict) --
    
    A schedule for running the column statistics, specified in CRON syntax.
    - ScheduleExpression (string) --
      
      A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).
    - State (string) --
      
      The state of the schedule.
  - ColumnNameList (list) --
    
    A list of column names for which to run statistics.
    - (string) --
  - CatalogID (string) --
    
    The ID of the Data Catalog in which the database resides.
  - Role (string) --
    
    The role used for running the column statistics.
  - SampleSize (float) --
    
    The percentage of data to sample.
  - SecurityConfiguration (string) --
    
    Name of the security configuration that is used to encrypt CloudWatch logs.
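
To illustrate the nested response structure, the following sketch pulls the schedule expression and state out of the response. The helper name `get_schedule` is hypothetical, and the use of `.get()` with defaults is a defensive assumption in case a key is absent.

```python
def get_schedule(client, database_name, table_name):
    """Return (schedule_expression, state) for a table's task settings.

    Either value may be None if the response does not include it.
    """
    response = client.get_column_statistics_task_settings(
        DatabaseName=database_name,
        TableName=table_name,
    )
    settings = response.get("ColumnStatisticsTaskSettings", {})
    schedule = settings.get("Schedule", {})
    # State is one of 'SCHEDULED', 'NOT_SCHEDULED' or 'TRANSITIONING'.
    return schedule.get("ScheduleExpression"), schedule.get("State")
```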

DeleteColumnStatisticsTaskSettings (new)

Deletes settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.delete_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to delete column statistics.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --

UpdateColumnStatisticsTaskSettings (new)

Updates settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.update_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string',
    Role='string',
    Schedule='string',
    ColumnNameList=[
        'string',
    ],
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to generate column statistics.

type Role:

string

param Role:

The role used for running the column statistics.

type Schedule:

string

param Schedule:

A schedule for running the column statistics, specified in CRON syntax.

type ColumnNameList:

list

param ColumnNameList:

A list of column names for which to run statistics.

(string) --

type SampleSize:

float

param SampleSize:

The percentage of data to sample.

type CatalogID:

string

param CatalogID:

The ID of the Data Catalog in which the database resides.

type SecurityConfiguration:

string

param SecurityConfiguration:

Name of the security configuration that is used to encrypt CloudWatch logs.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --
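
Unlike the create call, only `DatabaseName` and `TableName` are required here, so a caller can send just the settings that change. The sketch below checks the supplied keys against the parameters listed above before sending the request. The helper name and the set `_UPDATABLE` are illustrative, and how the service treats omitted settings is not stated on this page.

```python
# Optional parameters accepted by update_column_statistics_task_settings,
# as listed in the request syntax above.
_UPDATABLE = {
    "Role",
    "Schedule",
    "ColumnNameList",
    "SampleSize",
    "CatalogID",
    "SecurityConfiguration",
}


def update_settings(client, database_name, table_name, **changes):
    """Update selected settings, rejecting unknown parameter names early."""
    unknown = set(changes) - _UPDATABLE
    if unknown:
        raise ValueError(f"Unsupported settings: {sorted(unknown)}")
    return client.update_column_statistics_task_settings(
        DatabaseName=database_name,
        TableName=table_name,
        **changes,
    )
```

For example, `update_settings(client, "my_db", "my_table", SampleSize=25.0)` would send only the new sample size along with the two required names.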

StartColumnStatisticsTaskRunSchedule (new)

Starts a column statistics task run schedule.

See also: AWS API Documentation

Request Syntax

client.start_column_statistics_task_run_schedule(
    DatabaseName='string',
    TableName='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to start a column statistics task run schedule.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --
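
One possible way to combine this call with GetColumnStatisticsTaskSettings is to start the schedule only when its reported state is `NOT_SCHEDULED`. This is a sketch under that assumption; the helper name `ensure_scheduled` is hypothetical, and the page does not describe how the service reacts to starting a schedule that is already running.

```python
def ensure_scheduled(client, database_name, table_name):
    """Start the schedule only if it currently reports NOT_SCHEDULED.

    Returns True if a start request was sent, False otherwise.
    """
    target = {"DatabaseName": database_name, "TableName": table_name}
    response = client.get_column_statistics_task_settings(**target)
    schedule = response.get("ColumnStatisticsTaskSettings", {}).get("Schedule", {})
    # Leave SCHEDULED and TRANSITIONING schedules untouched.
    if schedule.get("State") != "NOT_SCHEDULED":
        return False
    client.start_column_statistics_task_run_schedule(**target)
    return True
```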

GetColumnStatisticsTaskRun (updated)

Changes (response)

{'ColumnStatisticsTaskRun': {'ComputationType': 'FULL | INCREMENTAL'}}

Gets the metadata for a task run, given a task run ID.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_run(
    ColumnStatisticsTaskRunId='string'
)

type ColumnStatisticsTaskRunId:

string

param ColumnStatisticsTaskRunId:

[REQUIRED]

The identifier for the particular column statistics task run.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRun': {
        'CustomerId': 'string',
        'ColumnStatisticsTaskRunId': 'string',
        'DatabaseName': 'string',
        'TableName': 'string',
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string',
        'NumberOfWorkers': 123,
        'WorkerType': 'string',
        'ComputationType': 'FULL'|'INCREMENTAL',
        'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'ErrorMessage': 'string',
        'DPUSeconds': 123.0
    }
}

Response Structure

(dict) --
- ColumnStatisticsTaskRun (dict) --
  
  A ColumnStatisticsTaskRun object representing the details of the column stats run.
  - CustomerId (string) --
    
    The Amazon Web Services account ID.
  - ColumnStatisticsTaskRunId (string) --
    
    The identifier for the particular column statistics task run.
  - DatabaseName (string) --
    
    The database where the table resides.
  - TableName (string) --
    
    The name of the table for which column statistics are generated.
  - ColumnNameList (list) --
    
    A list of the column names. If none is supplied, all column names for the table will be used by default.
    - (string) --
  - CatalogID (string) --
    
    The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
  - Role (string) --
    
    The IAM role that the service assumes to generate statistics.
  - SampleSize (float) --
    
    The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
  - SecurityConfiguration (string) --
    
    Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
  - NumberOfWorkers (integer) --
    
    The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
  - WorkerType (string) --
    
    The type of workers being used for generating stats. The default is g.1x.
  - ComputationType (string) --
    
    The type of column statistics computation.
  - Status (string) --
    
    The status of the task run.
  - CreationTime (datetime) --
    
    The time that this task was created.
  - LastUpdated (datetime) --
    
    The last point in time when this task was modified.
  - StartTime (datetime) --
    
    The start time of the task.
  - EndTime (datetime) --
    
    The end time of the task.
  - ErrorMessage (string) --
    
    The error message for the job.
  - DPUSeconds (float) --
    
    The calculated DPU usage in seconds for all autoscaled workers.
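
As a rough illustration of reading this response, including the new `ComputationType` field, the sketch below condenses a task run into a small summary. The helper name and the shape of the returned dict are assumptions. The duration is computed only when both timestamps are present, since a run that has not finished may lack `EndTime`.

```python
def summarize_task_run(client, task_run_id):
    """Return status, computation type, duration and error for one task run."""
    response = client.get_column_statistics_task_run(
        ColumnStatisticsTaskRunId=task_run_id
    )
    run = response.get("ColumnStatisticsTaskRun", {})
    start, end = run.get("StartTime"), run.get("EndTime")
    # StartTime and EndTime are datetime objects, so subtraction gives a timedelta.
    duration = (end - start).total_seconds() if start and end else None
    return {
        "status": run.get("Status"),
        "computation_type": run.get("ComputationType"),  # 'FULL' or 'INCREMENTAL'
        "duration_seconds": duration,
        "error": run.get("ErrorMessage"),
    }
```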

GetColumnStatisticsTaskRuns (updated)

Changes (response)

{'ColumnStatisticsTaskRuns': {'ComputationType': 'FULL | INCREMENTAL'}}

Retrieves information about all runs associated with the specified table.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_runs(
    DatabaseName='string',
    TableName='string',
    MaxResults=123,
    NextToken='string'
)

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type MaxResults:

integer

param MaxResults:

The maximum size of the response.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRuns': [
        {
            'CustomerId': 'string',
            'ColumnStatisticsTaskRunId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'ColumnNameList': [
                'string',
            ],
            'CatalogID': 'string',
            'Role': 'string',
            'SampleSize': 123.0,
            'SecurityConfiguration': 'string',
            'NumberOfWorkers': 123,
            'WorkerType': 'string',
            'ComputationType': 'FULL'|'INCREMENTAL',
            'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'ErrorMessage': 'string',
            'DPUSeconds': 123.0
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- ColumnStatisticsTaskRuns (list) --
  
  A list of column statistics task runs.
  - (dict) --
    
    The object that shows the details of the column stats run.
    - CustomerId (string) --
      
      The Amazon Web Services account ID.
    - ColumnStatisticsTaskRunId (string) --
      
      The identifier for the particular column statistics task run.
    - DatabaseName (string) --
      
      The database where the table resides.
    - TableName (string) --
      
      The name of the table for which column statistics are generated.
    - ColumnNameList (list) --
      
      A list of the column names. If none is supplied, all column names for the table will be used by default.
      - (string) --
    - CatalogID (string) --
      
      The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.
    - Role (string) --
      
      The IAM role that the service assumes to generate statistics.
    - SampleSize (float) --
      
      The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.
    - SecurityConfiguration (string) --
      
      Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.
    - NumberOfWorkers (integer) --
      
      The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.
    - WorkerType (string) --
      
      The type of workers being used for generating stats. The default is g.1x.
    - ComputationType (string) --
      
      The type of column statistics computation.
    - Status (string) --
      
      The status of the task run.
    - CreationTime (datetime) --
      
      The time that this task was created.
    - LastUpdated (datetime) --
      
      The last point in time when this task was modified.
    - StartTime (datetime) --
      
      The start time of the task.
    - EndTime (datetime) --
      
      The end time of the task.
    - ErrorMessage (string) --
      
      The error message for the job.
    - DPUSeconds (float) --
      
      The calculated DPU usage in seconds for all autoscaled workers.
- NextToken (string) --
  
  A continuation token, if not all task runs have yet been returned.
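
Because the response carries a `NextToken` when more runs remain, a caller usually loops until the token is absent. The generator below is a minimal sketch of that loop. The name `iter_task_runs` and the optional `page_size` argument are illustrative, and `MaxResults` is sent only when a value is supplied, since its valid range is not given on this page.

```python
def iter_task_runs(client, database_name, table_name, page_size=None):
    """Yield every column statistics task run for a table, page by page."""
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        response = client.get_column_statistics_task_runs(**kwargs)
        yield from response.get("ColumnStatisticsTaskRuns", [])
        token = response.get("NextToken")
        if not token:
            break
        # Pass the continuation token back to fetch the next page.
        kwargs["NextToken"] = token
```

A caller could then filter on the new field, for example `[run for run in iter_task_runs(client, "my_db", "my_table") if run.get("ComputationType") == "INCREMENTAL"]`, with placeholder database and table names.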