AWS Glue

2023/11/14 - AWS Glue - 6 new api methods

Changes: Introduces new storage optimization APIs to support automatic compaction of Apache Iceberg tables.

CreateTableOptimizer (new)

Creates a new table optimizer for a specific function. Currently, compaction is the only supported optimizer type.

See also: AWS API Documentation

Request Syntax

client.create_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction',
    TableOptimizerConfiguration={
        'roleArn': 'string',
        'enabled': True|False
    }
)
type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer. Currently, the only valid value is compaction.

type TableOptimizerConfiguration:

dict

param TableOptimizerConfiguration:

[REQUIRED]

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

  • roleArn (string) --

    A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.

  • enabled (boolean) --

    Whether table optimization is enabled.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
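
As an illustration of the request shape above, the following is a minimal sketch of a wrapper around create_table_optimizer. The helper name create_compaction_optimizer and its argument names are hypothetical; only the keyword arguments passed to the client come from the Request Syntax. The client is passed in, so any object exposing that method works.

```python
def create_compaction_optimizer(client, catalog_id, database, table, role_arn, enabled=True):
    """Call CreateTableOptimizer using the documented request shape.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    Returns the keyword arguments that were sent, for logging or inspection.
    """
    request = {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",  # the only optimizer type documented so far
        "TableOptimizerConfiguration": {
            "roleArn": role_arn,
            "enabled": enabled,
        },
    }
    # The documented response is an empty dict, so nothing is read back from it.
    client.create_table_optimizer(**request)
    return request
```

A call would look like create_compaction_optimizer(boto3.client("glue"), account_id, "my_db", "my_table", role_arn), where every value is a placeholder.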

ListTableOptimizerRuns (new)

Lists the history of previous optimizer runs for a specific table.

See also: AWS API Documentation

Request Syntax

client.list_table_optimizer_runs(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction',
    MaxResults=123,
    NextToken='string'
)
type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer. Currently, the only valid value is compaction.

type MaxResults:

integer

param MaxResults:

The maximum number of optimizer runs to return on each call.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

rtype:

dict

returns:

Response Syntax

{
    'CatalogId': 'string',
    'DatabaseName': 'string',
    'TableName': 'string',
    'NextToken': 'string',
    'TableOptimizerRuns': [
        {
            'eventType': 'starting'|'completed'|'failed'|'in_progress',
            'startTimestamp': datetime(2015, 1, 1),
            'endTimestamp': datetime(2015, 1, 1),
            'metrics': {
                'NumberOfBytesCompacted': 'string',
                'NumberOfFilesCompacted': 'string',
                'NumberOfDpus': 'string',
                'JobDurationInHour': 'string'
            },
            'error': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • CatalogId (string) --

      The Catalog ID of the table.

    • DatabaseName (string) --

      The name of the database in the catalog in which the table resides.

    • TableName (string) --

      The name of the table.

    • NextToken (string) --

      A continuation token for paginating the returned list of optimizer runs, returned if the current segment of the list is not the last.

    • TableOptimizerRuns (list) --

      A list of the optimizer runs associated with a table.

      • (dict) --

        Contains details for a table optimizer run.

        • eventType (string) --

          An event type representing the status of the table optimizer run.

        • startTimestamp (datetime) --

          Represents the epoch timestamp at which the compaction job was started within Lake Formation.

        • endTimestamp (datetime) --

          Represents the epoch timestamp at which the compaction job ended.

        • metrics (dict) --

          A RunMetrics object containing metrics for the optimizer run.

          • NumberOfBytesCompacted (string) --

            The number of bytes removed by the compaction job run.

          • NumberOfFilesCompacted (string) --

            The number of files removed by the compaction job run.

          • NumberOfDpus (string) --

            The number of DPU hours consumed by the job.

          • JobDurationInHour (string) --

            The duration of the job in hours.

        • error (string) --

          An error that occurred during the optimizer run.
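
The NextToken field in the request and response suggests the usual continuation-token loop. The sketch below shows one way to walk every page; the generator name iter_optimizer_runs and the page_size default are hypothetical, and the client is passed in rather than created here.

```python
def iter_optimizer_runs(client, catalog_id, database, table, page_size=50):
    """Yield every run returned by ListTableOptimizerRuns, following NextToken.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    """
    kwargs = {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
        "MaxResults": page_size,
    }
    while True:
        response = client.list_table_optimizer_runs(**kwargs)
        yield from response.get("TableOptimizerRuns", [])
        token = response.get("NextToken")
        if not token:
            # No token means the current page was the last one.
            break
        kwargs["NextToken"] = token
```

Because the function is a generator, callers can stop early, for example after the first run whose eventType is 'failed', without fetching the remaining pages.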

DeleteTableOptimizer (new)

Deletes an optimizer and all associated metadata for a table. The optimization will no longer be performed on the table.

See also: AWS API Documentation

Request Syntax

client.delete_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction'
)
type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --

UpdateTableOptimizer (new)

Updates the configuration for an existing table optimizer.

See also: AWS API Documentation

Request Syntax

client.update_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction',
    TableOptimizerConfiguration={
        'roleArn': 'string',
        'enabled': True|False
    }
)
type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer. Currently, the only valid value is compaction.

type TableOptimizerConfiguration:

dict

param TableOptimizerConfiguration:

[REQUIRED]

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

  • roleArn (string) --

    A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.

  • enabled (boolean) --

    Whether table optimization is enabled.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
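
A common use of this call is switching an optimizer on or off. The sketch below reads the current configuration with get_table_optimizer and sends it back with only enabled changed. It assumes, without confirmation from the text above, that an update replaces the whole configuration, which is why roleArn is carried over. The helper name set_optimizer_enabled is hypothetical.

```python
def set_optimizer_enabled(client, catalog_id, database, table, enabled):
    """Enable or disable the compaction optimizer, keeping the stored roleArn.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    Returns the configuration that was sent.
    """
    target = {
        "CatalogId": catalog_id,
        "DatabaseName": database,
        "TableName": table,
        "Type": "compaction",
    }
    current = client.get_table_optimizer(**target)
    # Copy the stored configuration so the response dict is not mutated.
    config = dict(current["TableOptimizer"]["configuration"])
    config["enabled"] = enabled
    client.update_table_optimizer(TableOptimizerConfiguration=config, **target)
    return config
```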

BatchGetTableOptimizer (new)

Returns the configuration for the specified table optimizers.

See also: AWS API Documentation

Request Syntax

client.batch_get_table_optimizer(
    Entries=[
        {
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'type': 'compaction'
        },
    ]
)
type Entries:

list

param Entries:

[REQUIRED]

A list of BatchGetTableOptimizerEntry objects specifying the table optimizers to retrieve.

  • (dict) --

    Represents a table optimizer to retrieve in the BatchGetTableOptimizer operation.

    • catalogId (string) --

      The Catalog ID of the table.

    • databaseName (string) --

      The name of the database in the catalog in which the table resides.

    • tableName (string) --

      The name of the table.

    • type (string) --

      The type of table optimizer.

rtype:

dict

returns:

Response Syntax

{
    'TableOptimizers': [
        {
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'tableOptimizer': {
                'type': 'compaction',
                'configuration': {
                    'roleArn': 'string',
                    'enabled': True|False
                },
                'lastRun': {
                    'eventType': 'starting'|'completed'|'failed'|'in_progress',
                    'startTimestamp': datetime(2015, 1, 1),
                    'endTimestamp': datetime(2015, 1, 1),
                    'metrics': {
                        'NumberOfBytesCompacted': 'string',
                        'NumberOfFilesCompacted': 'string',
                        'NumberOfDpus': 'string',
                        'JobDurationInHour': 'string'
                    },
                    'error': 'string'
                }
            }
        },
    ],
    'Failures': [
        {
            'error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            },
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'type': 'compaction'
        },
    ]
}

Response Structure

  • (dict) --

    • TableOptimizers (list) --

      A list of BatchTableOptimizer objects.

      • (dict) --

        Contains details for one of the table optimizers returned by the BatchGetTableOptimizer operation.

        • catalogId (string) --

          The Catalog ID of the table.

        • databaseName (string) --

          The name of the database in the catalog in which the table resides.

        • tableName (string) --

          The name of the table.

        • tableOptimizer (dict) --

          A TableOptimizer object that contains details on the configuration and last run of a table optimizer.

          • type (string) --

            The type of table optimizer. Currently, the only valid value is compaction.

          • configuration (dict) --

            A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.

            • roleArn (string) --

              A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.

            • enabled (boolean) --

              Whether table optimization is enabled.

          • lastRun (dict) --

            A TableOptimizerRun object representing the last run of the table optimizer.

            • eventType (string) --

              An event type representing the status of the table optimizer run.

            • startTimestamp (datetime) --

              Represents the epoch timestamp at which the compaction job was started within Lake Formation.

            • endTimestamp (datetime) --

              Represents the epoch timestamp at which the compaction job ended.

            • metrics (dict) --

              A RunMetrics object containing metrics for the optimizer run.

              • NumberOfBytesCompacted (string) --

                The number of bytes removed by the compaction job run.

              • NumberOfFilesCompacted (string) --

                The number of files removed by the compaction job run.

              • NumberOfDpus (string) --

                The number of DPU hours consumed by the job.

              • JobDurationInHour (string) --

                The duration of the job in hours.

            • error (string) --

              An error that occurred during the optimizer run.

    • Failures (list) --

      A list of errors from the operation.

      • (dict) --

        Contains details on one of the errors in the error list returned by the BatchGetTableOptimizer operation.

        • error (dict) --

          An ErrorDetail object containing code and message details about the error.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.

        • catalogId (string) --

          The Catalog ID of the table.

        • databaseName (string) --

          The name of the database in the catalog in which the table resides.

        • tableName (string) --

          The name of the table.

        • type (string) --

          The type of table optimizer.
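
Since the response separates TableOptimizers from Failures, callers typically want both indexed by table. The following sketch builds the Entries list and returns two dictionaries keyed by (databaseName, tableName). The helper name batch_get_optimizers and its arguments are hypothetical, and it assumes all tables live in a single catalog.

```python
def batch_get_optimizers(client, catalog_id, tables):
    """Fetch compaction optimizers for several tables in one call.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    `tables` is an iterable of (database_name, table_name) pairs.
    Returns (found, failed), both keyed by (database_name, table_name).
    """
    entries = [
        {
            "catalogId": catalog_id,
            "databaseName": database,
            "tableName": table,
            "type": "compaction",
        }
        for database, table in tables
    ]
    response = client.batch_get_table_optimizer(Entries=entries)
    found = {
        (item["databaseName"], item["tableName"]): item["tableOptimizer"]
        for item in response.get("TableOptimizers", [])
    }
    failed = {
        (item["databaseName"], item["tableName"]): item["error"]
        for item in response.get("Failures", [])
    }
    return found, failed
```

Each value in failed is the ErrorDetail dict, so the ErrorCode and ErrorMessage fields stay available to the caller.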

GetTableOptimizer (new)

Returns the configuration of all optimizers associated with a specified table.

See also: AWS API Documentation

Request Syntax

client.get_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction'
)
type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer.

rtype:

dict

returns:

Response Syntax

{
    'CatalogId': 'string',
    'DatabaseName': 'string',
    'TableName': 'string',
    'TableOptimizer': {
        'type': 'compaction',
        'configuration': {
            'roleArn': 'string',
            'enabled': True|False
        },
        'lastRun': {
            'eventType': 'starting'|'completed'|'failed'|'in_progress',
            'startTimestamp': datetime(2015, 1, 1),
            'endTimestamp': datetime(2015, 1, 1),
            'metrics': {
                'NumberOfBytesCompacted': 'string',
                'NumberOfFilesCompacted': 'string',
                'NumberOfDpus': 'string',
                'JobDurationInHour': 'string'
            },
            'error': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • CatalogId (string) --

      The Catalog ID of the table.

    • DatabaseName (string) --

      The name of the database in the catalog in which the table resides.

    • TableName (string) --

      The name of the table.

    • TableOptimizer (dict) --

      The optimizer associated with the specified table.

      • type (string) --

        The type of table optimizer. Currently, the only valid value is compaction.

      • configuration (dict) --

        A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.

        • roleArn (string) --

          A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.

        • enabled (boolean) --

          Whether table optimization is enabled.

      • lastRun (dict) --

        A TableOptimizerRun object representing the last run of the table optimizer.

        • eventType (string) --

          An event type representing the status of the table optimizer run.

        • startTimestamp (datetime) --

          Represents the epoch timestamp at which the compaction job was started within Lake Formation.

        • endTimestamp (datetime) --

          Represents the epoch timestamp at which the compaction job ended.

        • metrics (dict) --

          A RunMetrics object containing metrics for the optimizer run.

          • NumberOfBytesCompacted (string) --

            The number of bytes removed by the compaction job run.

          • NumberOfFilesCompacted (string) --

            The number of files removed by the compaction job run.

          • NumberOfDpus (string) --

            The number of DPU hours consumed by the job.

          • JobDurationInHour (string) --

            The duration of the job in hours.

        • error (string) --

          An error that occurred during the optimizer run.
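
Because the metrics fields are typed as strings, a caller usually has to convert them before doing arithmetic. The sketch below summarizes the lastRun portion of a GetTableOptimizer response. The helper name summarize_last_run, the summary keys, and the sample values are all hypothetical; in particular, the assumption that the metric strings hold plain decimal numbers is not confirmed by the text above, so unparseable values are returned as None.

```python
from datetime import datetime


def _to_float(value):
    """Convert a metric string to float, or None if missing or unparseable."""
    try:
        return float(value)
    except (TypeError, ValueError):
        return None


def summarize_last_run(response):
    """Flatten the lastRun section of a GetTableOptimizer response."""
    run = response.get("TableOptimizer", {}).get("lastRun")
    if not run:
        return None  # the optimizer has not run yet
    metrics = run.get("metrics", {})
    start = run.get("startTimestamp")
    end = run.get("endTimestamp")
    return {
        "status": run.get("eventType"),
        "elapsed": end - start if start and end else None,
        "bytes_compacted": _to_float(metrics.get("NumberOfBytesCompacted")),
        "files_compacted": _to_float(metrics.get("NumberOfFilesCompacted")),
        "error": run.get("error"),
    }


# Hypothetical response, shaped like the Response Syntax above.
sample = {
    "TableOptimizer": {
        "type": "compaction",
        "lastRun": {
            "eventType": "completed",
            "startTimestamp": datetime(2023, 11, 14, 10, 0),
            "endTimestamp": datetime(2023, 11, 14, 10, 30),
            "metrics": {
                "NumberOfBytesCompacted": "1048576",
                "NumberOfFilesCompacted": "12",
            },
        }
    }
}
summary = summarize_last_run(sample)
```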