AWS Glue

2024/10/31 - AWS Glue - 6 new, 2 updated API methods

Changes: Add schedule support for AWS Glue column statistics.

StopColumnStatisticsTaskRunSchedule (new)

Stops a column statistics task run schedule.

See also: AWS API Documentation

Request Syntax

client.stop_column_statistics_task_run_schedule(
    DatabaseName='string',
    TableName='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to stop a column statistics task run schedule.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
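
As an illustration of the call above, here is a minimal sketch that stops the schedule for one table. It assumes `glue_client` is a boto3 Glue client (for example one created with `boto3.client("glue")`); the helper name `stop_schedule` and the database and table names in the usage comment are hypothetical.

```python
def stop_schedule(glue_client, database_name, table_name):
    """Stop the column statistics task run schedule for one table.

    glue_client is assumed to be a boto3 Glue client; it is passed in
    so the helper can also be exercised with a stand-in object.
    """
    response = glue_client.stop_column_statistics_task_run_schedule(
        DatabaseName=database_name,
        TableName=table_name,
    )
    # The documented response body is empty, so there is nothing to parse.
    return response


# Hypothetical usage:
#   import boto3
#   stop_schedule(boto3.client("glue"), "sales_db", "orders")
```

With a real boto3 client the returned dict usually also carries a `ResponseMetadata` entry, so callers should not rely on it being exactly `{}`.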

CreateColumnStatisticsTaskSettings (new)

Creates settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.create_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string',
    Role='string',
    Schedule='string',
    ColumnNameList=[
        'string',
    ],
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string',
    Tags={
        'string': 'string'
    }
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to generate column statistics.

type Role:

string

param Role:

[REQUIRED]

The role used for running the column statistics.

type Schedule:

string

param Schedule:

A schedule for running the column statistics, specified in CRON syntax.

type ColumnNameList:

list

param ColumnNameList:

A list of column names for which to run statistics.

  • (string) --

type SampleSize:

float

param SampleSize:

The percentage of data to sample.

type CatalogID:

string

param CatalogID:

The ID of the Data Catalog in which the database resides.

type SecurityConfiguration:

string

param SecurityConfiguration:

Name of the security configuration that is used to encrypt CloudWatch logs.

type Tags:

dict

param Tags:

A map of tags.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
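
Because only `DatabaseName`, `TableName`, and `Role` are required, it can help to assemble the request so that unset optional parameters are left out entirely (boto3 parameter validation generally does not accept `None` for them). The sketch below shows one way to do that; the helper name and all example values are hypothetical, and the cron string reuses the format shown for `ScheduleExpression` in this changelog.

```python
def build_create_settings_request(database_name, table_name, role,
                                  schedule=None, column_names=None,
                                  sample_size=None, catalog_id=None,
                                  security_configuration=None, tags=None):
    """Build keyword arguments for create_column_statistics_task_settings."""
    request = {
        "DatabaseName": database_name,
        "TableName": table_name,
        "Role": role,
    }
    optional = {
        "Schedule": schedule,                    # CRON syntax string
        "ColumnNameList": column_names,          # list of column names
        "SampleSize": sample_size,               # percentage of data to sample
        "CatalogID": catalog_id,
        "SecurityConfiguration": security_configuration,
        "Tags": tags,                            # dict of string to string
    }
    # Keep only the optional parameters the caller actually supplied.
    request.update({k: v for k, v in optional.items() if v is not None})
    return request


# Hypothetical values, for illustration only.
request = build_create_settings_request(
    "sales_db", "orders", "example-glue-role",
    schedule="cron(15 12 * * ? *)",
    sample_size=10.0,
)
# A boto3 Glue client could then be called as:
#   glue_client.create_column_statistics_task_settings(**request)
```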

GetColumnStatisticsTaskSettings (new)

Gets settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to retrieve column statistics task settings.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskSettings': {
        'DatabaseName': 'string',
        'TableName': 'string',
        'Schedule': {
            'ScheduleExpression': 'string',
            'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
        },
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string'
    }
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskSettings (dict) --

      A ColumnStatisticsTaskSettings object representing the settings for the column statistics task.

      • DatabaseName (string) --

        The name of the database where the table resides.

      • TableName (string) --

        The name of the table for which to generate column statistics.

      • Schedule (dict) --

        A schedule for running the column statistics, specified in CRON syntax.

        • ScheduleExpression (string) --

          A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

        • State (string) --

          The state of the schedule.

      • ColumnNameList (list) --

        A list of column names for which to run statistics.

        • (string) --

      • CatalogID (string) --

        The ID of the Data Catalog in which the database resides.

      • Role (string) --

        The role used for running the column statistics.

      • SampleSize (float) --

        The percentage of data to sample.

      • SecurityConfiguration (string) --

        Name of the security configuration that is used to encrypt CloudWatch logs.
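
To show how the nested `Schedule` member of this response might be read, the sketch below pulls out the expression and state from a response dict. The helper names are hypothetical, and the use of `.get` reflects an assumption that some members may be absent (for example when no schedule has been configured).

```python
def describe_schedule(response):
    """Return (schedule_expression, state) from a
    get_column_statistics_task_settings response dict."""
    settings = response.get("ColumnStatisticsTaskSettings", {})
    schedule = settings.get("Schedule", {})
    return schedule.get("ScheduleExpression"), schedule.get("State")


def is_schedule_active(response):
    """True when the schedule state is SCHEDULED."""
    _, state = describe_schedule(response)
    return state == "SCHEDULED"


# Sample response shaped like the Response Syntax above (values are made up).
sample_response = {
    "ColumnStatisticsTaskSettings": {
        "DatabaseName": "sales_db",
        "TableName": "orders",
        "Schedule": {
            "ScheduleExpression": "cron(15 12 * * ? *)",
            "State": "NOT_SCHEDULED",
        },
    }
}
expression, state = describe_schedule(sample_response)
```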

DeleteColumnStatisticsTaskSettings (new)

Deletes settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.delete_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to delete column statistics task settings.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --

UpdateColumnStatisticsTaskSettings (new)

Updates settings for a column statistics task.

See also: AWS API Documentation

Request Syntax

client.update_column_statistics_task_settings(
    DatabaseName='string',
    TableName='string',
    Role='string',
    Schedule='string',
    ColumnNameList=[
        'string',
    ],
    SampleSize=123.0,
    CatalogID='string',
    SecurityConfiguration='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to generate column statistics.

type Role:

string

param Role:

The role used for running the column statistics.

type Schedule:

string

param Schedule:

A schedule for running the column statistics, specified in CRON syntax.

type ColumnNameList:

list

param ColumnNameList:

A list of column names for which to run statistics.

  • (string) --

type SampleSize:

float

param SampleSize:

The percentage of data to sample.

type CatalogID:

string

param CatalogID:

The ID of the Data Catalog in which the database resides.

type SecurityConfiguration:

string

param SecurityConfiguration:

Name of the security configuration that is used to encrypt CloudWatch logs.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
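
One detail worth noting when combining this call with GetColumnStatisticsTaskSettings: the Get response returns `Schedule` as a dict containing `ScheduleExpression`, whereas the update request takes `Schedule` as a plain string. The sketch below converts a settings dict from the Get response into update keyword arguments and applies overrides. The helper name and sample values are hypothetical, and resending the existing values is a cautious choice, since this changelog does not say how omitted parameters are treated.

```python
# Members that have the same name and shape in the Get response
# and in the update request.
SHARED_FIELDS = ("Role", "ColumnNameList", "SampleSize",
                 "CatalogID", "SecurityConfiguration")


def settings_to_update_request(settings, **overrides):
    """Turn a ColumnStatisticsTaskSettings dict into keyword arguments
    for update_column_statistics_task_settings."""
    request = {
        "DatabaseName": settings["DatabaseName"],
        "TableName": settings["TableName"],
    }
    for field in SHARED_FIELDS:
        if field in settings:
            request[field] = settings[field]
    # Flatten the nested schedule dict into the string the request expects.
    expression = settings.get("Schedule", {}).get("ScheduleExpression")
    if expression:
        request["Schedule"] = expression
    request.update(overrides)
    return request


# Made-up settings, shaped like the Get response.
current = {
    "DatabaseName": "sales_db",
    "TableName": "orders",
    "Schedule": {"ScheduleExpression": "cron(15 12 * * ? *)", "State": "SCHEDULED"},
    "Role": "example-glue-role",
    "SampleSize": 10.0,
}
update_request = settings_to_update_request(current, SampleSize=25.0)
# glue_client.update_column_statistics_task_settings(**update_request)
```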

StartColumnStatisticsTaskRunSchedule (new)

Starts a column statistics task run schedule.

See also: AWS API Documentation

Request Syntax

client.start_column_statistics_task_run_schedule(
    DatabaseName='string',
    TableName='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table for which to start a column statistics task run schedule.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
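
Since the start call returns an empty response, one way to confirm that it took effect is to poll GetColumnStatisticsTaskSettings and inspect the schedule `State`. The sketch below does this under the assumption (not stated in this changelog) that the state eventually reaches `SCHEDULED`, possibly after passing through `TRANSITIONING`. The helper name, attempt count, and delay are hypothetical, and `glue_client` is assumed to be a boto3 Glue client.

```python
import time


def start_schedule_and_wait(glue_client, database_name, table_name,
                            max_attempts=10, delay_seconds=2.0,
                            sleep=time.sleep):
    """Start the schedule, then poll until its state is SCHEDULED.

    Returns True if SCHEDULED was observed within max_attempts polls,
    otherwise False.
    """
    glue_client.start_column_statistics_task_run_schedule(
        DatabaseName=database_name,
        TableName=table_name,
    )
    for attempt in range(max_attempts):
        response = glue_client.get_column_statistics_task_settings(
            DatabaseName=database_name,
            TableName=table_name,
        )
        settings = response.get("ColumnStatisticsTaskSettings", {})
        state = settings.get("Schedule", {}).get("State")
        if state == "SCHEDULED":
            return True
        # Wait before the next poll, but not after the final one.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return False
```

The `sleep` parameter is injectable so the polling loop can be tested without real delays.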

GetColumnStatisticsTaskRun (updated)
Changes (response)
{'ColumnStatisticsTaskRun': {'ComputationType': 'FULL | INCREMENTAL'}}

Get the associated metadata/information for a task run, given a task run ID.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_run(
    ColumnStatisticsTaskRunId='string'
)
type ColumnStatisticsTaskRunId:

string

param ColumnStatisticsTaskRunId:

[REQUIRED]

The identifier for the particular column statistics task run.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRun': {
        'CustomerId': 'string',
        'ColumnStatisticsTaskRunId': 'string',
        'DatabaseName': 'string',
        'TableName': 'string',
        'ColumnNameList': [
            'string',
        ],
        'CatalogID': 'string',
        'Role': 'string',
        'SampleSize': 123.0,
        'SecurityConfiguration': 'string',
        'NumberOfWorkers': 123,
        'WorkerType': 'string',
        'ComputationType': 'FULL'|'INCREMENTAL',
        'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'ErrorMessage': 'string',
        'DPUSeconds': 123.0
    }
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRun (dict) --

      A ColumnStatisticsTaskRun object representing the details of the column stats run.

      • CustomerId (string) --

        The Amazon Web Services account ID.

      • ColumnStatisticsTaskRunId (string) --

        The identifier for the particular column statistics task run.

      • DatabaseName (string) --

        The database where the table resides.

      • TableName (string) --

        The name of the table for which column statistics are generated.

      • ColumnNameList (list) --

        A list of the column names. If none is supplied, all column names for the table will be used by default.

        • (string) --

      • CatalogID (string) --

        The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

      • Role (string) --

        The IAM role that the service assumes to generate statistics.

      • SampleSize (float) --

        The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

      • SecurityConfiguration (string) --

        Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

      • NumberOfWorkers (integer) --

        The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

      • WorkerType (string) --

        The type of workers being used for generating stats. The default is g.1x.

      • ComputationType (string) --

        The type of column statistics computation.

      • Status (string) --

        The status of the task run.

      • CreationTime (datetime) --

        The time that this task was created.

      • LastUpdated (datetime) --

        The last point in time when this task was modified.

      • StartTime (datetime) --

        The start time of the task.

      • EndTime (datetime) --

        The end time of the task.

      • ErrorMessage (string) --

        The error message for the job.

      • DPUSeconds (float) --

        The calculated DPU usage in seconds for all autoscaled workers.
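
As a sketch of how the fields above might be consumed, including the newly added `ComputationType`, the helper below condenses a response into a small summary. The helper name, the derived `duration_seconds` and `dpu_hours` values, and the sample data are illustrative assumptions, not part of the API; `.get` is used because some members may be missing, for instance `EndTime` on a run that is still in progress.

```python
from datetime import datetime


def summarize_run(response):
    """Condense a get_column_statistics_task_run response dict."""
    run = response["ColumnStatisticsTaskRun"]
    start, end = run.get("StartTime"), run.get("EndTime")
    dpu_seconds = run.get("DPUSeconds")
    return {
        "run_id": run.get("ColumnStatisticsTaskRunId"),
        "status": run.get("Status"),
        "computation_type": run.get("ComputationType"),  # FULL or INCREMENTAL
        # Wall-clock duration, only when both timestamps are present.
        "duration_seconds": (end - start).total_seconds() if start and end else None,
        # DPUSeconds converted to DPU-hours for easier reading.
        "dpu_hours": dpu_seconds / 3600.0 if dpu_seconds is not None else None,
    }


# Made-up response shaped like the Response Syntax above.
sample = {
    "ColumnStatisticsTaskRun": {
        "ColumnStatisticsTaskRunId": "run-1",
        "ComputationType": "INCREMENTAL",
        "Status": "SUCCEEDED",
        "StartTime": datetime(2024, 10, 31, 12, 0, 0),
        "EndTime": datetime(2024, 10, 31, 12, 5, 0),
        "DPUSeconds": 7200.0,
    }
}
summary = summarize_run(sample)
```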

GetColumnStatisticsTaskRuns (updated)
Changes (response)
{'ColumnStatisticsTaskRuns': {'ComputationType': 'FULL | INCREMENTAL'}}

Retrieves information about all runs associated with the specified table.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_task_runs(
    DatabaseName='string',
    TableName='string',
    MaxResults=123,
    NextToken='string'
)
type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database where the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type MaxResults:

integer

param MaxResults:

The maximum size of the response.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

rtype:

dict

returns:

Response Syntax

{
    'ColumnStatisticsTaskRuns': [
        {
            'CustomerId': 'string',
            'ColumnStatisticsTaskRunId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'ColumnNameList': [
                'string',
            ],
            'CatalogID': 'string',
            'Role': 'string',
            'SampleSize': 123.0,
            'SecurityConfiguration': 'string',
            'NumberOfWorkers': 123,
            'WorkerType': 'string',
            'ComputationType': 'FULL'|'INCREMENTAL',
            'Status': 'STARTING'|'RUNNING'|'SUCCEEDED'|'FAILED'|'STOPPED',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'ErrorMessage': 'string',
            'DPUSeconds': 123.0
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • ColumnStatisticsTaskRuns (list) --

      A list of column statistics task runs.

      • (dict) --

        The object that shows the details of the column stats run.

        • CustomerId (string) --

          The Amazon Web Services account ID.

        • ColumnStatisticsTaskRunId (string) --

          The identifier for the particular column statistics task run.

        • DatabaseName (string) --

          The database where the table resides.

        • TableName (string) --

          The name of the table for which column statistics are generated.

        • ColumnNameList (list) --

          A list of the column names. If none is supplied, all column names for the table will be used by default.

          • (string) --

        • CatalogID (string) --

          The ID of the Data Catalog where the table resides. If none is supplied, the Amazon Web Services account ID is used by default.

        • Role (string) --

          The IAM role that the service assumes to generate statistics.

        • SampleSize (float) --

          The percentage of rows used to generate statistics. If none is supplied, the entire table will be used to generate stats.

        • SecurityConfiguration (string) --

          Name of the security configuration that is used to encrypt CloudWatch logs for the column stats task run.

        • NumberOfWorkers (integer) --

          The number of workers used to generate column statistics. The job is preconfigured to autoscale up to 25 instances.

        • WorkerType (string) --

          The type of workers being used for generating stats. The default is g.1x.

        • ComputationType (string) --

          The type of column statistics computation.

        • Status (string) --

          The status of the task run.

        • CreationTime (datetime) --

          The time that this task was created.

        • LastUpdated (datetime) --

          The last point in time when this task was modified.

        • StartTime (datetime) --

          The start time of the task.

        • EndTime (datetime) --

          The end time of the task.

        • ErrorMessage (string) --

          The error message for the job.

        • DPUSeconds (float) --

          The calculated DPU usage in seconds for all autoscaled workers.

    • NextToken (string) --

      A continuation token, if not all task runs have yet been returned.
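
The `NextToken` member means a single call may return only part of the run history. A minimal sketch of following the continuation token manually is shown below; `iter_task_runs` is a hypothetical helper name, and `glue_client` is assumed to be a boto3 Glue client.

```python
def iter_task_runs(glue_client, database_name, table_name, max_results=None):
    """Yield every column statistics task run for a table,
    following NextToken until no further pages remain."""
    kwargs = {"DatabaseName": database_name, "TableName": table_name}
    if max_results is not None:
        kwargs["MaxResults"] = max_results
    while True:
        page = glue_client.get_column_statistics_task_runs(**kwargs)
        yield from page.get("ColumnStatisticsTaskRuns", [])
        token = page.get("NextToken")
        if not token:
            break
        # Pass the token back to request the next page.
        kwargs["NextToken"] = token


# Hypothetical usage: collect only incremental runs.
#   incremental = [
#       run for run in iter_task_runs(glue_client, "sales_db", "orders")
#       if run.get("ComputationType") == "INCREMENTAL"
#   ]
```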