AWS API Changes

2025/08/07 - AWS Glue - 8 updated api methods

Changes: AWS Glue Data Catalog now supports Iceberg optimization settings at the catalog level and adds new options to control the optimization job run rate.

BatchGetTableOptimizer (updated)

Changes (response)

{'TableOptimizers': {'tableOptimizer': {'configuration': {'compactionConfiguration': {'icebergConfiguration': {'deleteFileThreshold': 'integer',
                                                                                                               'minInputFiles': 'integer'}},
                                                          'orphanFileDeletionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}},
                                                          'retentionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}}},
                                        'configurationSource': 'catalog | table'}}}

Returns the configuration for the specified table optimizers.

See also: AWS API Documentation

Request Syntax

client.batch_get_table_optimizer(
    Entries=[
        {
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'type': 'compaction'|'retention'|'orphan_file_deletion'
        },
    ]
)

type Entries:

list

param Entries:

[REQUIRED]

A list of BatchGetTableOptimizerEntry objects specifying the table optimizers to retrieve.

(dict) --

Represents a table optimizer to retrieve in the BatchGetTableOptimizer operation.
- catalogId (string) --
  
  The Catalog ID of the table.
- databaseName (string) --
  
  The name of the database in the catalog in which the table resides.
- tableName (string) --
  
  The name of the table.
- type (string) --
  
  The type of table optimizer.
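
The Entries list described above can be assembled programmatically. The following is a minimal sketch, not an official helper: build_entries is a hypothetical function name, and the account ID, database name, and table name are placeholders. It emits one entry per optimizer type listed in the request syntax.

```python
from typing import Dict, Iterable, List

# Optimizer types taken from the request syntax above.
OPTIMIZER_TYPES = ("compaction", "retention", "orphan_file_deletion")


def build_entries(catalog_id: str, database_name: str, table_name: str,
                  types: Iterable[str] = OPTIMIZER_TYPES) -> List[Dict[str, str]]:
    """Build one BatchGetTableOptimizerEntry dict per requested optimizer type."""
    entries = []
    for optimizer_type in types:
        if optimizer_type not in OPTIMIZER_TYPES:
            raise ValueError(f"unknown optimizer type: {optimizer_type!r}")
        entries.append({
            "catalogId": catalog_id,
            "databaseName": database_name,
            "tableName": table_name,
            "type": optimizer_type,
        })
    return entries


# Placeholder identifiers, for illustration only.
entries = build_entries("123456789012", "example_db", "example_table")
# With a Glue client, the list would be passed as:
#   response = client.batch_get_table_optimizer(Entries=entries)
```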

rtype:

dict

returns:

Response Syntax

{
    'TableOptimizers': [
        {
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'tableOptimizer': {
                'type': 'compaction'|'retention'|'orphan_file_deletion',
                'configuration': {
                    'roleArn': 'string',
                    'enabled': True|False,
                    'vpcConfiguration': {
                        'glueConnectionName': 'string'
                    },
                    'compactionConfiguration': {
                        'icebergConfiguration': {
                            'strategy': 'binpack'|'sort'|'z-order',
                            'minInputFiles': 123,
                            'deleteFileThreshold': 123
                        }
                    },
                    'retentionConfiguration': {
                        'icebergConfiguration': {
                            'snapshotRetentionPeriodInDays': 123,
                            'numberOfSnapshotsToRetain': 123,
                            'cleanExpiredFiles': True|False,
                            'runRateInHours': 123
                        }
                    },
                    'orphanFileDeletionConfiguration': {
                        'icebergConfiguration': {
                            'orphanFileRetentionPeriodInDays': 123,
                            'location': 'string',
                            'runRateInHours': 123
                        }
                    }
                },
                'lastRun': {
                    'eventType': 'starting'|'completed'|'failed'|'in_progress',
                    'startTimestamp': datetime(2015, 1, 1),
                    'endTimestamp': datetime(2015, 1, 1),
                    'metrics': {
                        'NumberOfBytesCompacted': 'string',
                        'NumberOfFilesCompacted': 'string',
                        'NumberOfDpus': 'string',
                        'JobDurationInHour': 'string'
                    },
                    'error': 'string',
                    'compactionMetrics': {
                        'IcebergMetrics': {
                            'NumberOfBytesCompacted': 123,
                            'NumberOfFilesCompacted': 123,
                            'DpuHours': 123.0,
                            'NumberOfDpus': 123,
                            'JobDurationInHour': 123.0
                        }
                    },
                    'compactionStrategy': 'binpack'|'sort'|'z-order',
                    'retentionMetrics': {
                        'IcebergMetrics': {
                            'NumberOfDataFilesDeleted': 123,
                            'NumberOfManifestFilesDeleted': 123,
                            'NumberOfManifestListsDeleted': 123,
                            'DpuHours': 123.0,
                            'NumberOfDpus': 123,
                            'JobDurationInHour': 123.0
                        }
                    },
                    'orphanFileDeletionMetrics': {
                        'IcebergMetrics': {
                            'NumberOfOrphanFilesDeleted': 123,
                            'DpuHours': 123.0,
                            'NumberOfDpus': 123,
                            'JobDurationInHour': 123.0
                        }
                    }
                },
                'configurationSource': 'catalog'|'table'
            }
        },
    ],
    'Failures': [
        {
            'error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            },
            'catalogId': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'type': 'compaction'|'retention'|'orphan_file_deletion'
        },
    ]
}
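
The response syntax above carries run metrics in two shapes: the string-valued metrics member, which the Response Structure marks as deprecated, and a typed member per optimizer type. The sketch below shows one way to pick the typed member; iceberg_metrics and METRICS_MEMBER are hypothetical names, and the sample dict is made-up data that only mirrors the documented shape.

```python
from typing import Any, Dict, Optional

# Maps each optimizer type to the typed metrics member of lastRun.
METRICS_MEMBER = {
    "compaction": "compactionMetrics",
    "retention": "retentionMetrics",
    "orphan_file_deletion": "orphanFileDeletionMetrics",
}


def iceberg_metrics(table_optimizer: Dict[str, Any]) -> Optional[Dict[str, Any]]:
    """Return the IcebergMetrics dict of the last run, or None if absent."""
    last_run = table_optimizer.get("lastRun") or {}
    member = METRICS_MEMBER.get(table_optimizer.get("type", ""))
    if member is None:
        return None
    return (last_run.get(member) or {}).get("IcebergMetrics")


# Made-up sample shaped like one 'tableOptimizer' value in the response.
sample_optimizer = {
    "type": "retention",
    "lastRun": {
        "eventType": "completed",
        "metrics": {"NumberOfDpus": "2"},  # deprecated, string-valued
        "retentionMetrics": {
            "IcebergMetrics": {"NumberOfDataFilesDeleted": 4, "DpuHours": 0.5}
        },
    },
}
typed = iceberg_metrics(sample_optimizer)
```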

Response Structure

(dict) --
- TableOptimizers (list) --
  
  A list of BatchTableOptimizer objects.
  - (dict) --
    
    Contains details for one of the table optimizers returned by the BatchGetTableOptimizer operation.
    - catalogId (string) --
      
      The Catalog ID of the table.
    - databaseName (string) --
      
      The name of the database in the catalog in which the table resides.
    - tableName (string) --
      
      The name of the table.
    - tableOptimizer (dict) --
      
      A TableOptimizer object that contains details on the configuration and last run of a table optimizer.
      - type (string) --
        
        The type of table optimizer. The valid values are:
        
        compaction: for managing compaction with a table optimizer.
        
        retention: for managing the retention of snapshots with a table optimizer.
        
        orphan_file_deletion: for managing the deletion of orphan files with a table optimizer.
      - configuration (dict) --
        
        A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.
        
        - roleArn (string) --
          
          A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.
        - enabled (boolean) --
          
          Whether table optimization is enabled.
        - vpcConfiguration (dict) --
          
          A TableOptimizerVpcConfiguration object representing the VPC configuration for a table optimizer.
          
          This configuration is necessary to perform optimization on tables that are in a customer VPC.
          
          Note
          
          This is a Tagged Union structure. Only one of the following top level keys will be set: glueConnectionName. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:
          
          'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
          - glueConnectionName (string) --
            
            The name of the Glue connection used for the VPC for the table optimizer.
        - compactionConfiguration (dict) --
          
          The configuration for a compaction optimizer. This configuration defines how data files in your table will be compacted to improve query performance and reduce storage costs.
          - icebergConfiguration (dict) --
            
            The configuration for an Iceberg compaction optimizer.
            - strategy (string) --
              
              The strategy to use for compaction. Valid values are:
              - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
              - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
              - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
              If an input is not provided, the default value 'binpack' will be used.
            - minInputFiles (integer) --
              
              The minimum number of data files that must be present in a partition before compaction will actually compact files. This parameter helps control when compaction is triggered, preventing unnecessary compaction operations on partitions with few files. If an input is not provided, the default value 100 will be used.
            - deleteFileThreshold (integer) --
              
              The minimum number of deletes that must be present in a data file to make it eligible for compaction. This parameter helps optimize compaction by focusing on files that contain a significant number of delete operations, which can improve query performance by removing deleted records. If an input is not provided, the default value 1 will be used.
        - retentionConfiguration (dict) --
          
          The configuration for a snapshot retention optimizer.
          - icebergConfiguration (dict) --
            
            The configuration for an Iceberg snapshot retention optimizer.
            - snapshotRetentionPeriodInDays (integer) --
              
              The number of days to retain the Iceberg snapshots. If an input is not provided, the corresponding Iceberg table configuration field will be used or if not present, the default value 5 will be used.
            - numberOfSnapshotsToRetain (integer) --
              
              The number of Iceberg snapshots to retain within the retention period. If an input is not provided, the corresponding Iceberg table configuration field will be used or if not present, the default value 1 will be used.
            - cleanExpiredFiles (boolean) --
              
              If set to false, snapshots are only deleted from table metadata, and the underlying data and metadata files are not deleted.
            - runRateInHours (integer) --
              
              The interval in hours between retention job runs. This parameter controls how frequently the retention optimizer will run to clean up expired snapshots. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
        - orphanFileDeletionConfiguration (dict) --
          
          The configuration for an orphan file deletion optimizer.
          - icebergConfiguration (dict) --
            
            The configuration for an Iceberg orphan file deletion optimizer.
            - orphanFileRetentionPeriodInDays (integer) --
              
              The number of days that orphan files should be retained before file deletion. If an input is not provided, the default value 3 will be used.
            - location (string) --
              
              Specifies a directory in which to look for files (defaults to the table's location). You may choose a sub-directory rather than the top-level table location.
            - runRateInHours (integer) --
              
              The interval in hours between orphan file deletion job runs. This parameter controls how frequently the orphan file deletion optimizer will run to clean up orphan files. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
      - lastRun (dict) --
        
        A TableOptimizerRun object representing the last run of the table optimizer.
        
        - eventType (string) --
          
          An event type representing the status of the table optimizer run.
        - startTimestamp (datetime) --
          
          Represents the epoch timestamp at which the compaction job was started within Lake Formation.
        - endTimestamp (datetime) --
          
          Represents the epoch timestamp at which the compaction job ended.
        - metrics (dict) --
          
          A RunMetrics object containing metrics for the optimizer run.
          
          This member is deprecated. See the individual metric members for compaction, retention, and orphan file deletion.
          - NumberOfBytesCompacted (string) --
            
            The number of bytes removed by the compaction job run.
          - NumberOfFilesCompacted (string) --
            
            The number of files removed by the compaction job run.
          - NumberOfDpus (string) --
            
            The number of DPUs consumed by the job, rounded up to the nearest whole number.
          - JobDurationInHour (string) --
            
            The duration of the job in hours.
        - error (string) --
          
          An error that occurred during the optimizer run.
        - compactionMetrics (dict) --
          
          A CompactionMetrics object containing metrics for the optimizer run.
          - IcebergMetrics (dict) --
            
            A structure containing the Iceberg compaction metrics for the optimizer run.
            - NumberOfBytesCompacted (integer) --
              
              The number of bytes removed by the compaction job run.
            - NumberOfFilesCompacted (integer) --
              
              The number of files removed by the compaction job run.
            - DpuHours (float) --
              
              The number of DPU hours consumed by the job.
            - NumberOfDpus (integer) --
              
              The number of DPUs consumed by the job, rounded up to the nearest whole number.
            - JobDurationInHour (float) --
              
              The duration of the job in hours.
        - compactionStrategy (string) --
          
          The strategy used for the compaction run. Indicates which algorithm was applied to determine how files were selected and combined during the compaction process. Valid values are:
          - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
          - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
          - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
        - retentionMetrics (dict) --
          
          A RetentionMetrics object containing metrics for the optimizer run.
          - IcebergMetrics (dict) --
            
            A structure containing the Iceberg retention metrics for the optimizer run.
            - NumberOfDataFilesDeleted (integer) --
              
              The number of data files deleted by the retention job run.
            - NumberOfManifestFilesDeleted (integer) --
              
              The number of manifest files deleted by the retention job run.
            - NumberOfManifestListsDeleted (integer) --
              
              The number of manifest lists deleted by the retention job run.
            - DpuHours (float) --
              
              The number of DPU hours consumed by the job.
            - NumberOfDpus (integer) --
              
              The number of DPUs consumed by the job, rounded up to the nearest whole number.
            - JobDurationInHour (float) --
              
              The duration of the job in hours.
        - orphanFileDeletionMetrics (dict) --
          
          An OrphanFileDeletionMetrics object containing metrics for the optimizer run.
          - IcebergMetrics (dict) --
            
            A structure containing the Iceberg orphan file deletion metrics for the optimizer run.
            - NumberOfOrphanFilesDeleted (integer) --
              
              The number of orphan files deleted by the orphan file deletion job run.
            - DpuHours (float) --
              
              The number of DPU hours consumed by the job.
            - NumberOfDpus (integer) --
              
              The number of DPUs consumed by the job, rounded up to the nearest whole number.
            - JobDurationInHour (float) --
              
              The duration of the job in hours.
      - configurationSource (string) --
        
        Specifies the source of the optimizer configuration. This indicates how the table optimizer was configured and which entity or service initiated the configuration. Valid values are catalog and table.
- Failures (list) --
  
  A list of errors from the operation.
  - (dict) --
    
    Contains details on one of the errors in the error list returned by the BatchGetTableOptimizer operation.
    - error (dict) --
      
      An ErrorDetail object containing code and message details about the error.
      - ErrorCode (string) --
        
        The code associated with this error.
      - ErrorMessage (string) --
        
        A message describing the error.
    - catalogId (string) --
      
      The Catalog ID of the table.
    - databaseName (string) --
      
      The name of the database in the catalog in which the table resides.
    - tableName (string) --
      
      The name of the table.
    - type (string) --
      
      The type of table optimizer.
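
A response contains both successful lookups (TableOptimizers) and per-entry errors (Failures), so callers will usually want to handle the two lists separately. The following sketch shows one possible way to flatten them; summarize is a hypothetical helper, and the sample response, including its error code, is made-up data that only mirrors the documented structure.

```python
from typing import Any, Dict, List, Tuple


def summarize(response: Dict[str, Any]) -> Tuple[List[Dict[str, Any]], List[str]]:
    """Split a BatchGetTableOptimizer response into optimizer rows and failure messages."""
    rows = []
    for item in response.get("TableOptimizers", []):
        optimizer = item.get("tableOptimizer", {})
        config = optimizer.get("configuration", {})
        rows.append({
            "table": f'{item.get("databaseName")}.{item.get("tableName")}',
            "type": optimizer.get("type"),
            "enabled": config.get("enabled"),
            # 'catalog' or 'table', per the response syntax.
            "source": optimizer.get("configurationSource"),
        })
    failures = []
    for failure in response.get("Failures", []):
        error = failure.get("error", {})
        failures.append(
            f'{failure.get("databaseName")}.{failure.get("tableName")} '
            f'[{failure.get("type")}]: {error.get("ErrorCode")}: {error.get("ErrorMessage")}'
        )
    return rows, failures


# Made-up response for illustration; the error code is a placeholder.
sample_response = {
    "TableOptimizers": [{
        "catalogId": "123456789012",
        "databaseName": "example_db",
        "tableName": "orders",
        "tableOptimizer": {
            "type": "compaction",
            "configuration": {"enabled": True},
            "configurationSource": "catalog",
        },
    }],
    "Failures": [{
        "error": {"ErrorCode": "ExampleErrorCode", "ErrorMessage": "example message"},
        "databaseName": "example_db",
        "tableName": "missing",
        "type": "retention",
    }],
}
rows, failures = summarize(sample_response)
```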

CreateCatalog (updated)

Changes (request)

{'CatalogInput': {'CatalogProperties': {'IcebergOptimizationProperties': {'Compaction': {'string': 'string'},
                                                                          'OrphanFileDeletion': {'string': 'string'},
                                                                          'Retention': {'string': 'string'},
                                                                          'RoleArn': 'string'}}}}

Creates a new catalog in the Glue Data Catalog.

See also: AWS API Documentation

Request Syntax

client.create_catalog(
    Name='string',
    CatalogInput={
        'Description': 'string',
        'FederatedCatalog': {
            'Identifier': 'string',
            'ConnectionName': 'string',
            'ConnectionType': 'string'
        },
        'Parameters': {
            'string': 'string'
        },
        'TargetRedshiftCatalog': {
            'CatalogArn': 'string'
        },
        'CatalogProperties': {
            'DataLakeAccessProperties': {
                'DataLakeAccess': True|False,
                'DataTransferRole': 'string',
                'KmsKey': 'string',
                'CatalogType': 'string'
            },
            'IcebergOptimizationProperties': {
                'RoleArn': 'string',
                'Compaction': {
                    'string': 'string'
                },
                'Retention': {
                    'string': 'string'
                },
                'OrphanFileDeletion': {
                    'string': 'string'
                }
            },
            'CustomProperties': {
                'string': 'string'
            }
        },
        'CreateTableDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'CreateDatabaseDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'AllowFullTableExternalDataAccess': 'True'|'False'
    },
    Tags={
        'string': 'string'
    }
)

type Name:

string

param Name:

[REQUIRED]

The name of the catalog to create.

type CatalogInput:

dict

param CatalogInput:

[REQUIRED]

A CatalogInput object that defines the metadata for the catalog.

Description (string) --

A description of the catalog. The string must be no more than 2048 bytes long and must match the URI address multi-line string pattern.
FederatedCatalog (dict) --

A FederatedCatalog object that references an entity outside the Glue Data Catalog, for example a Redshift database.
- Identifier (string) --
  
  A unique identifier for the federated catalog.
- ConnectionName (string) --
  
  The name of the connection to an external data source, for example a Redshift-federated catalog.
- ConnectionType (string) --
  
  The type of connection used to access the federated catalog, specifying the protocol or method for connection to the external data source.
Parameters (dict) --

A map of key-value pairs that define the parameters and properties of the catalog.
- (string) --
  - (string) --
TargetRedshiftCatalog (dict) --

A TargetRedshiftCatalog object that describes a target catalog for resource linking.
- CatalogArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the catalog resource.
CatalogProperties (dict) --

A CatalogProperties object that specifies data lake access properties and other custom properties.
- DataLakeAccessProperties (dict) --
  
  A DataLakeAccessProperties object that specifies properties to configure data lake access for your catalog resource in the Glue Data Catalog.
  - DataLakeAccess (boolean) --
    
    Turns on or off data lake access for Apache Spark applications that access Amazon Redshift databases in the Data Catalog from any non-Redshift engine, such as Amazon Athena, Amazon EMR, or Glue ETL.
  - DataTransferRole (string) --
    
    A role that will be assumed by Glue for transferring data into/out of the staging bucket during a query.
  - KmsKey (string) --
    
    An encryption key that will be used for the staging bucket that will be created along with the catalog.
  - CatalogType (string) --
    
    Specifies a federated catalog type for the native catalog resource. The currently supported type is aws:redshift.
- IcebergOptimizationProperties (dict) --
  
  A structure that specifies Iceberg table optimization properties for the catalog. This includes configuration for compaction, retention, and orphan file deletion operations that can be applied to Iceberg tables in this catalog.
  - RoleArn (string) --
    
    The Amazon Resource Name (ARN) of the IAM role that will be assumed to perform Iceberg table optimization operations.
  - Compaction (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg table compaction operations, which optimize the layout of data files to improve query performance.
    - (string) --
      - (string) --
  - Retention (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg table retention operations, which manage the lifecycle of table snapshots to control storage costs.
    - (string) --
      - (string) --
  - OrphanFileDeletion (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg orphan file deletion operations, which identify and remove files that are no longer referenced by the table metadata.
    - (string) --
      - (string) --
- CustomProperties (dict) --
  
  Additional key-value properties for the catalog, such as column statistics optimizations.
  - (string) --
    - (string) --
CreateTableDefaultPermissions (list) --

An array of PrincipalPermissions objects. Creates a set of default permissions on the table(s) for principals. Used by Amazon Web Services Lake Formation. Typically should be explicitly set as an empty list.
- (dict) --
  
  Permissions granted to a principal.
  - Principal (dict) --
    
    The principal who is granted permissions.
    - DataLakePrincipalIdentifier (string) --
      
      An identifier for the Lake Formation principal.
  - Permissions (list) --
    
    The permissions that are granted to the principal.
    - (string) --
CreateDatabaseDefaultPermissions (list) --

An array of PrincipalPermissions objects. Creates a set of default permissions on the database(s) for principals. Used by Amazon Web Services Lake Formation. Typically should be explicitly set as an empty list.
- (dict) --
  
  Permissions granted to a principal.
  - Principal (dict) --
    
    The principal who is granted permissions.
    - DataLakePrincipalIdentifier (string) --
      
      An identifier for the Lake Formation principal.
  - Permissions (list) --
    
    The permissions that are granted to the principal.
    - (string) --
AllowFullTableExternalDataAccess (string) --

Allows third-party engines to access data in Amazon S3 locations that are registered with Lake Formation.
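
As a rough illustration of how the new IcebergOptimizationProperties structure nests inside CatalogInput, the sketch below builds the dict by hand. build_catalog_input is a hypothetical helper, the role ARN is a placeholder, and "example-key"/"example-value" are NOT real setting names: this page documents the Compaction, Retention, and OrphanFileDeletion maps only as string-to-string maps, so the accepted keys must be taken from the Glue documentation. The sketch also reflects that AllowFullTableExternalDataAccess is a string ('True' or 'False') in the request syntax, not a boolean.

```python
from typing import Any, Dict, Mapping, Optional


def _string_map(name: str, values: Mapping[str, str]) -> Dict[str, str]:
    """Check that a settings map is string-to-string, as the request syntax requires."""
    for key, value in values.items():
        if not isinstance(key, str) or not isinstance(value, str):
            raise TypeError(f"{name} must map strings to strings, got {key!r}: {value!r}")
    return dict(values)


def build_catalog_input(
    role_arn: str,
    compaction: Optional[Mapping[str, str]] = None,
    retention: Optional[Mapping[str, str]] = None,
    orphan_file_deletion: Optional[Mapping[str, str]] = None,
    allow_full_table_external_data_access: Optional[bool] = None,
) -> Dict[str, Any]:
    """Assemble a CatalogInput dict carrying IcebergOptimizationProperties."""
    iceberg: Dict[str, Any] = {"RoleArn": role_arn}
    for key, values in (
        ("Compaction", compaction),
        ("Retention", retention),
        ("OrphanFileDeletion", orphan_file_deletion),
    ):
        if values:  # omit maps that were not supplied
            iceberg[key] = _string_map(key, values)
    catalog_input: Dict[str, Any] = {
        "CatalogProperties": {"IcebergOptimizationProperties": iceberg}
    }
    if allow_full_table_external_data_access is not None:
        # The API expects the strings 'True' or 'False' here.
        catalog_input["AllowFullTableExternalDataAccess"] = (
            "True" if allow_full_table_external_data_access else "False"
        )
    return catalog_input


# Placeholder ARN and placeholder map entry, for illustration only.
catalog_input = build_catalog_input(
    "arn:aws:iam::123456789012:role/ExampleRole",
    compaction={"example-key": "example-value"},
    allow_full_table_external_data_access=True,
)
# With a Glue client:
#   client.create_catalog(Name="example_catalog", CatalogInput=catalog_input)
```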

type Tags:

dict

param Tags:

The tags you assign to the catalog. A map of key-value pairs, not more than 50 pairs. Each key is a UTF-8 string, not less than 1 or more than 128 bytes long. Each value is a UTF-8 string, not more than 256 bytes long.

(string) --
- (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --

CreateTableOptimizer (updated)

Changes (request)

{'TableOptimizerConfiguration': {'compactionConfiguration': {'icebergConfiguration': {'deleteFileThreshold': 'integer',
                                                                                      'minInputFiles': 'integer'}},
                                 'orphanFileDeletionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}},
                                 'retentionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}}}}

Creates a new table optimizer for a specific function.

See also: AWS API Documentation

Request Syntax

client.create_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction'|'retention'|'orphan_file_deletion',
    TableOptimizerConfiguration={
        'roleArn': 'string',
        'enabled': True|False,
        'vpcConfiguration': {
            'glueConnectionName': 'string'
        },
        'compactionConfiguration': {
            'icebergConfiguration': {
                'strategy': 'binpack'|'sort'|'z-order',
                'minInputFiles': 123,
                'deleteFileThreshold': 123
            }
        },
        'retentionConfiguration': {
            'icebergConfiguration': {
                'snapshotRetentionPeriodInDays': 123,
                'numberOfSnapshotsToRetain': 123,
                'cleanExpiredFiles': True|False,
                'runRateInHours': 123
            }
        },
        'orphanFileDeletionConfiguration': {
            'icebergConfiguration': {
                'orphanFileRetentionPeriodInDays': 123,
                'location': 'string',
                'runRateInHours': 123
            }
        }
    }
)

type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer.

type TableOptimizerConfiguration:

dict

param TableOptimizerConfiguration:

[REQUIRED]

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

roleArn (string) --

A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.
enabled (boolean) --

Whether table optimization is enabled.
vpcConfiguration (dict) --

A TableOptimizerVpcConfiguration object representing the VPC configuration for a table optimizer.

This configuration is necessary to perform optimization on tables that are in a customer VPC.

Note

This is a Tagged Union structure. Only one of the following top level keys can be set: glueConnectionName.
- glueConnectionName (string) --
  
  The name of the Glue connection used for the VPC for the table optimizer.
compactionConfiguration (dict) --

The configuration for a compaction optimizer. This configuration defines how data files in your table will be compacted to improve query performance and reduce storage costs.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg compaction optimizer.
  - strategy (string) --
    
    The strategy to use for compaction. Valid values are:
    - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
    - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
    - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
    If an input is not provided, the default value 'binpack' will be used.
  - minInputFiles (integer) --
    
    The minimum number of data files that must be present in a partition before compaction will actually compact files. This parameter helps control when compaction is triggered, preventing unnecessary compaction operations on partitions with few files. If an input is not provided, the default value 100 will be used.
  - deleteFileThreshold (integer) --
    
    The minimum number of deletes that must be present in a data file to make it eligible for compaction. This parameter helps optimize compaction by focusing on files that contain a significant number of delete operations, which can improve query performance by removing deleted records. If an input is not provided, the default value 1 will be used.
retentionConfiguration (dict) --

The configuration for a snapshot retention optimizer.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg snapshot retention optimizer.
  - snapshotRetentionPeriodInDays (integer) --
    
    The number of days to retain the Iceberg snapshots. If an input is not provided, the corresponding Iceberg table configuration field is used; if that field is not present, the default value 5 is used.
  - numberOfSnapshotsToRetain (integer) --
    
    The number of Iceberg snapshots to retain within the retention period. If an input is not provided, the corresponding Iceberg table configuration field is used; if that field is not present, the default value 1 is used.
  - cleanExpiredFiles (boolean) --
    
    If set to false, snapshots are only deleted from table metadata, and the underlying data and metadata files are not deleted.
  - runRateInHours (integer) --
    
    The interval in hours between retention job runs. This parameter controls how frequently the retention optimizer will run to clean up expired snapshots. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
orphanFileDeletionConfiguration (dict) --

The configuration for an orphan file deletion optimizer.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg orphan file deletion optimizer.
  - orphanFileRetentionPeriodInDays (integer) --
    
    The number of days that orphan files should be retained before file deletion. If an input is not provided, the default value 3 will be used.
  - location (string) --
    
    Specifies a directory in which to look for files (defaults to the table's location). You may choose a sub-directory rather than the top-level table location.
  - runRateInHours (integer) --
    
    The interval in hours between orphan file deletion job runs. This parameter controls how frequently the orphan file deletion optimizer will run to clean up orphan files. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
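
The runRateInHours bounds described above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the Glue API: the helper name build_retention_configuration and its argument names are hypothetical, and the 3 to 168 range is assumed to be inclusive at both ends. Unset fields are omitted so that the documented defaults apply.

```python
def build_retention_configuration(
    snapshot_retention_days=None,
    snapshots_to_retain=None,
    clean_expired_files=None,
    run_rate_in_hours=None,
):
    """Build a retentionConfiguration dict, leaving out fields that were not set."""
    # Assumption: the documented range "between 3 and 168 hours" is inclusive.
    if run_rate_in_hours is not None and not 3 <= run_rate_in_hours <= 168:
        raise ValueError("runRateInHours must be between 3 and 168")

    fields = {
        "snapshotRetentionPeriodInDays": snapshot_retention_days,
        "numberOfSnapshotsToRetain": snapshots_to_retain,
        "cleanExpiredFiles": clean_expired_files,
        "runRateInHours": run_rate_in_hours,
    }
    # Drop unset fields so the service-side defaults are used for them.
    iceberg = {key: value for key, value in fields.items() if value is not None}
    return {"icebergConfiguration": iceberg}
```

The returned dict is shaped like the retentionConfiguration member described above and could be placed inside a table optimizer configuration.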

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --

GetCatalog (updated)

Link ¶
Changes (response)

{'Catalog': {'CatalogProperties': {'IcebergOptimizationProperties': {'Compaction': {'string': 'string'},
                                                                     'LastUpdatedTime': 'timestamp',
                                                                     'OrphanFileDeletion': {'string': 'string'},
                                                                     'Retention': {'string': 'string'},
                                                                     'RoleArn': 'string'}}}}

Retrieves the definition of the specified catalog in the Glue Data Catalog. The name of the catalog to retrieve should be all lowercase.

See also: AWS API Documentation

Request Syntax

client.get_catalog(
    CatalogId='string'
)

type CatalogId:

string

param CatalogId:

[REQUIRED]

The ID of the parent catalog in which the catalog resides. If none is provided, the Amazon Web Services Account Number is used by default.

rtype:

dict

returns:

Response Syntax

{
    'Catalog': {
        'CatalogId': 'string',
        'Name': 'string',
        'ResourceArn': 'string',
        'Description': 'string',
        'Parameters': {
            'string': 'string'
        },
        'CreateTime': datetime(2015, 1, 1),
        'UpdateTime': datetime(2015, 1, 1),
        'TargetRedshiftCatalog': {
            'CatalogArn': 'string'
        },
        'FederatedCatalog': {
            'Identifier': 'string',
            'ConnectionName': 'string',
            'ConnectionType': 'string'
        },
        'CatalogProperties': {
            'DataLakeAccessProperties': {
                'DataLakeAccess': True|False,
                'DataTransferRole': 'string',
                'KmsKey': 'string',
                'ManagedWorkgroupName': 'string',
                'ManagedWorkgroupStatus': 'string',
                'RedshiftDatabaseName': 'string',
                'StatusMessage': 'string',
                'CatalogType': 'string'
            },
            'IcebergOptimizationProperties': {
                'RoleArn': 'string',
                'Compaction': {
                    'string': 'string'
                },
                'Retention': {
                    'string': 'string'
                },
                'OrphanFileDeletion': {
                    'string': 'string'
                },
                'LastUpdatedTime': datetime(2015, 1, 1)
            },
            'CustomProperties': {
                'string': 'string'
            }
        },
        'CreateTableDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'CreateDatabaseDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'AllowFullTableExternalDataAccess': 'True'|'False'
    }
}

Response Structure

(dict) --
- Catalog (dict) --
  
  A Catalog object. The definition of the specified catalog in the Glue Data Catalog.
  - CatalogId (string) --
    
    The ID of the catalog. To grant access to the default catalog, this field should not be provided.
  - Name (string) --
    
    The name of the catalog. Cannot be the same as the account ID.
  - ResourceArn (string) --
    
    The Amazon Resource Name (ARN) assigned to the catalog resource.
  - Description (string) --
    
    A description of the catalog. The string must be no more than 2048 bytes long and must match the URI address multi-line string pattern.
  - Parameters (dict) --
    
    A map of key-value pairs that define parameters and properties of the catalog.
    - (string) --
      - (string) --
  - CreateTime (datetime) --
    
    The time at which the catalog was created.
  - UpdateTime (datetime) --
    
    The time at which the catalog was last updated.
  - TargetRedshiftCatalog (dict) --
    
    A TargetRedshiftCatalog object that describes a target catalog for database resource linking.
    - CatalogArn (string) --
      
      The Amazon Resource Name (ARN) of the catalog resource.
  - FederatedCatalog (dict) --
    
    A FederatedCatalog object that points to an entity outside the Glue Data Catalog.
    - Identifier (string) --
      
      A unique identifier for the federated catalog.
    - ConnectionName (string) --
      
      The name of the connection to an external data source, for example a Redshift-federated catalog.
    - ConnectionType (string) --
      
      The type of connection used to access the federated catalog, specifying the protocol or method for connection to the external data source.
  - CatalogProperties (dict) --
    
    A CatalogProperties object that specifies data lake access properties and other custom properties.
    - DataLakeAccessProperties (dict) --
      
      A DataLakeAccessProperties object with input properties to configure data lake access for your catalog resource in the Glue Data Catalog.
      - DataLakeAccess (boolean) --
        
        Turns on or off data lake access for Apache Spark applications that access Amazon Redshift databases in the Data Catalog.
      - DataTransferRole (string) --
        
        A role that will be assumed by Glue for transferring data into/out of the staging bucket during a query.
      - KmsKey (string) --
        
        An encryption key that will be used for the staging bucket that will be created along with the catalog.
      - ManagedWorkgroupName (string) --
        
        The managed Redshift Serverless compute name that is created for your catalog resource.
      - ManagedWorkgroupStatus (string) --
        
        The managed Redshift Serverless compute status.
      - RedshiftDatabaseName (string) --
        
        The default Redshift database resource name in the managed compute.
      - StatusMessage (string) --
        
        A message that gives more detailed information about the managed workgroup status.
      - CatalogType (string) --
        
        Specifies a federated catalog type for the native catalog resource. The currently supported type is aws:redshift.
    - IcebergOptimizationProperties (dict) --
      
      An IcebergOptimizationPropertiesOutput object that specifies Iceberg table optimization settings for the catalog, including configurations for compaction, retention, and orphan file deletion operations.
      - RoleArn (string) --
        
        The Amazon Resource Name (ARN) of the IAM role that is used to perform Iceberg table optimization operations.
      - Compaction (dict) --
        
        A map of key-value pairs that specify configuration parameters for Iceberg table compaction operations, which optimize the layout of data files to improve query performance.
        
        (string) --
        
        (string) --
      - Retention (dict) --
        
        A map of key-value pairs that specify configuration parameters for Iceberg table retention operations, which manage the lifecycle of table snapshots to control storage costs.
        
        (string) --
        
        (string) --
      - OrphanFileDeletion (dict) --
        
        A map of key-value pairs that specify configuration parameters for Iceberg orphan file deletion operations, which identify and remove files that are no longer referenced by the table metadata.
        
        (string) --
        
        (string) --
      - LastUpdatedTime (datetime) --
        
        The timestamp when the Iceberg optimization properties were last updated.
    - CustomProperties (dict) --
      
      Additional key-value properties for the catalog, such as column statistics optimizations.
      - (string) --
        
        (string) --
  - CreateTableDefaultPermissions (list) --
    
    An array of PrincipalPermissions objects. Creates a set of default permissions on the table(s) for principals. Used by Amazon Web Services Lake Formation. Not used in the normal course of Glue operations.
    - (dict) --
      
      Permissions granted to a principal.
      - Principal (dict) --
        
        The principal who is granted permissions.
        
        - DataLakePrincipalIdentifier (string) --
          
          An identifier for the Lake Formation principal.
      - Permissions (list) --
        
        The permissions that are granted to the principal.
        
        (string) --
  - CreateDatabaseDefaultPermissions (list) --
    
    An array of PrincipalPermissions objects. Creates a set of default permissions on the database(s) for principals. Used by Amazon Web Services Lake Formation. Not used in the normal course of Glue operations.
    - (dict) --
      
      Permissions granted to a principal.
      - Principal (dict) --
        
        The principal who is granted permissions.
        
        - DataLakePrincipalIdentifier (string) --
          
          An identifier for the Lake Formation principal.
      - Permissions (list) --
        
        The permissions that are granted to the principal.
        
        (string) --
  - AllowFullTableExternalDataAccess (string) --
    
    Allows third-party engines to access data in Amazon S3 locations that are registered with Lake Formation.
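
As an illustration of the response structure above, the catalog-level Iceberg optimization settings can be pulled out of a get_catalog response with plain dictionary access. This is a minimal sketch: the helper name iceberg_optimization_settings is hypothetical, and because the keys inside the Compaction, Retention, and OrphanFileDeletion maps are not listed here, the maps are returned as-is.

```python
def iceberg_optimization_settings(response):
    """Return the IcebergOptimizationProperties fields from a get_catalog response."""
    props = (
        response.get("Catalog", {})
        .get("CatalogProperties", {})
        .get("IcebergOptimizationProperties", {})
    )
    return {
        "RoleArn": props.get("RoleArn"),
        # Each of these is a string-to-string map; missing maps become empty dicts.
        "Compaction": dict(props.get("Compaction", {})),
        "Retention": dict(props.get("Retention", {})),
        "OrphanFileDeletion": dict(props.get("OrphanFileDeletion", {})),
        "LastUpdatedTime": props.get("LastUpdatedTime"),
    }


# Usage (needs AWS credentials; the catalog ID below is a placeholder):
# import boto3
# glue = boto3.client("glue")
# settings = iceberg_optimization_settings(glue.get_catalog(CatalogId="123456789012"))
```

Catalogs without Iceberg optimization settings yield empty maps and None values rather than raising a KeyError.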

GetCatalogs (updated)

Link ¶
Changes (response)

{'CatalogList': {'CatalogProperties': {'IcebergOptimizationProperties': {'Compaction': {'string': 'string'},
                                                                         'LastUpdatedTime': 'timestamp',
                                                                         'OrphanFileDeletion': {'string': 'string'},
                                                                         'Retention': {'string': 'string'},
                                                                         'RoleArn': 'string'}}}}

Retrieves all catalogs defined in a catalog in the Glue Data Catalog. For a Redshift-federated catalog use case, this operation returns the list of catalogs mapped to Redshift databases in the Redshift namespace catalog.

See also: AWS API Documentation

Request Syntax

client.get_catalogs(
    ParentCatalogId='string',
    NextToken='string',
    MaxResults=123,
    Recursive=True|False,
    IncludeRoot=True|False
)

type ParentCatalogId:

string

param ParentCatalogId:

The ID of the parent catalog in which the catalog resides. If none is provided, the Amazon Web Services Account Number is used by default.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation call.

type MaxResults:

integer

param MaxResults:

The maximum number of catalogs to return in one response.

type Recursive:

boolean

param Recursive:

Whether to list all catalogs across the catalog hierarchy, starting from the ParentCatalogId. Defaults to false. When true, all catalog objects in the ParentCatalogId hierarchy are enumerated in the response.

type IncludeRoot:

boolean

param IncludeRoot:

Whether to list the default catalog in the account and region in the response. Defaults to false. When true and ParentCatalogId is either null or the Amazon Web Services account ID, all catalogs and the default catalog are enumerated in the response.

When the ParentCatalogId is not equal to null, and this attribute is passed as false or true, an InvalidInputException is thrown.

rtype:

dict

returns:

Response Syntax

{
    'CatalogList': [
        {
            'CatalogId': 'string',
            'Name': 'string',
            'ResourceArn': 'string',
            'Description': 'string',
            'Parameters': {
                'string': 'string'
            },
            'CreateTime': datetime(2015, 1, 1),
            'UpdateTime': datetime(2015, 1, 1),
            'TargetRedshiftCatalog': {
                'CatalogArn': 'string'
            },
            'FederatedCatalog': {
                'Identifier': 'string',
                'ConnectionName': 'string',
                'ConnectionType': 'string'
            },
            'CatalogProperties': {
                'DataLakeAccessProperties': {
                    'DataLakeAccess': True|False,
                    'DataTransferRole': 'string',
                    'KmsKey': 'string',
                    'ManagedWorkgroupName': 'string',
                    'ManagedWorkgroupStatus': 'string',
                    'RedshiftDatabaseName': 'string',
                    'StatusMessage': 'string',
                    'CatalogType': 'string'
                },
                'IcebergOptimizationProperties': {
                    'RoleArn': 'string',
                    'Compaction': {
                        'string': 'string'
                    },
                    'Retention': {
                        'string': 'string'
                    },
                    'OrphanFileDeletion': {
                        'string': 'string'
                    },
                    'LastUpdatedTime': datetime(2015, 1, 1)
                },
                'CustomProperties': {
                    'string': 'string'
                }
            },
            'CreateTableDefaultPermissions': [
                {
                    'Principal': {
                        'DataLakePrincipalIdentifier': 'string'
                    },
                    'Permissions': [
                        'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                    ]
                },
            ],
            'CreateDatabaseDefaultPermissions': [
                {
                    'Principal': {
                        'DataLakePrincipalIdentifier': 'string'
                    },
                    'Permissions': [
                        'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                    ]
                },
            ],
            'AllowFullTableExternalDataAccess': 'True'|'False'
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- CatalogList (list) --
  
  An array of Catalog objects from the specified parent catalog.
  - (dict) --
    
    The catalog object represents a logical grouping of databases in the Glue Data Catalog or a federated source. You can now create a Redshift-federated catalog or a catalog containing resource links to Redshift databases in another account or region.
    - CatalogId (string) --
      
      The ID of the catalog. To grant access to the default catalog, this field should not be provided.
    - Name (string) --
      
      The name of the catalog. Cannot be the same as the account ID.
    - ResourceArn (string) --
      
      The Amazon Resource Name (ARN) assigned to the catalog resource.
    - Description (string) --
      
      A description of the catalog. The string must be no more than 2048 bytes long and must match the URI address multi-line string pattern.
    - Parameters (dict) --
      
      A map of key-value pairs that define parameters and properties of the catalog.
      - (string) --
        
        (string) --
    - CreateTime (datetime) --
      
      The time at which the catalog was created.
    - UpdateTime (datetime) --
      
      The time at which the catalog was last updated.
    - TargetRedshiftCatalog (dict) --
      
      A TargetRedshiftCatalog object that describes a target catalog for database resource linking.
      - CatalogArn (string) --
        
        The Amazon Resource Name (ARN) of the catalog resource.
    - FederatedCatalog (dict) --
      
      A FederatedCatalog object that points to an entity outside the Glue Data Catalog.
      - Identifier (string) --
        
        A unique identifier for the federated catalog.
      - ConnectionName (string) --
        
        The name of the connection to an external data source, for example a Redshift-federated catalog.
      - ConnectionType (string) --
        
        The type of connection used to access the federated catalog, specifying the protocol or method for connection to the external data source.
    - CatalogProperties (dict) --
      
      A CatalogProperties object that specifies data lake access properties and other custom properties.
      - DataLakeAccessProperties (dict) --
        
        A DataLakeAccessProperties object with input properties to configure data lake access for your catalog resource in the Glue Data Catalog.
        
        - DataLakeAccess (boolean) --
          
          Turns on or off data lake access for Apache Spark applications that access Amazon Redshift databases in the Data Catalog.
        - DataTransferRole (string) --
          
          A role that will be assumed by Glue for transferring data into/out of the staging bucket during a query.
        - KmsKey (string) --
          
          An encryption key that will be used for the staging bucket that will be created along with the catalog.
        - ManagedWorkgroupName (string) --
          
          The managed Redshift Serverless compute name that is created for your catalog resource.
        - ManagedWorkgroupStatus (string) --
          
          The managed Redshift Serverless compute status.
        - RedshiftDatabaseName (string) --
          
          The default Redshift database resource name in the managed compute.
        - StatusMessage (string) --
          
          A message that gives more detailed information about the managed workgroup status.
        - CatalogType (string) --
          
          Specifies a federated catalog type for the native catalog resource. The currently supported type is aws:redshift.
      - IcebergOptimizationProperties (dict) --
        
        An IcebergOptimizationPropertiesOutput object that specifies Iceberg table optimization settings for the catalog, including configurations for compaction, retention, and orphan file deletion operations.
        
        - RoleArn (string) --
          
          The Amazon Resource Name (ARN) of the IAM role that is used to perform Iceberg table optimization operations.
        - Compaction (dict) --
          
          A map of key-value pairs that specify configuration parameters for Iceberg table compaction operations, which optimize the layout of data files to improve query performance.
          - (string) --
            
            (string) --
        - Retention (dict) --
          
          A map of key-value pairs that specify configuration parameters for Iceberg table retention operations, which manage the lifecycle of table snapshots to control storage costs.
          - (string) --
            
            (string) --
        - OrphanFileDeletion (dict) --
          
          A map of key-value pairs that specify configuration parameters for Iceberg orphan file deletion operations, which identify and remove files that are no longer referenced by the table metadata.
          - (string) --
            
            (string) --
        - LastUpdatedTime (datetime) --
          
          The timestamp when the Iceberg optimization properties were last updated.
      - CustomProperties (dict) --
        
        Additional key-value properties for the catalog, such as column statistics optimizations.
        
        (string) --
        
        (string) --
    - CreateTableDefaultPermissions (list) --
      
      An array of PrincipalPermissions objects. Creates a set of default permissions on the table(s) for principals. Used by Amazon Web Services Lake Formation. Not used in the normal course of Glue operations.
      - (dict) --
        
        Permissions granted to a principal.
        
        - Principal (dict) --
          
          The principal who is granted permissions.
          - DataLakePrincipalIdentifier (string) --
            
            An identifier for the Lake Formation principal.
        - Permissions (list) --
          
          The permissions that are granted to the principal.
          - (string) --
    - CreateDatabaseDefaultPermissions (list) --
      
      An array of PrincipalPermissions objects. Creates a set of default permissions on the database(s) for principals. Used by Amazon Web Services Lake Formation. Not used in the normal course of Glue operations.
      - (dict) --
        
        Permissions granted to a principal.
        
        - Principal (dict) --
          
          The principal who is granted permissions.
          - DataLakePrincipalIdentifier (string) --
            
            An identifier for the Lake Formation principal.
        - Permissions (list) --
          
          The permissions that are granted to the principal.
          - (string) --
    - AllowFullTableExternalDataAccess (string) --
      
      Allows third-party engines to access data in Amazon S3 locations that are registered with Lake Formation.
- NextToken (string) --
  
  A continuation token for paginating the returned list of catalogs, returned if the current segment of the list is not the last.
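
The NextToken field above implies a simple loop: call get_catalogs, consume CatalogList, and repeat with the returned token until none is returned. The following is a minimal sketch of that loop; the generator name iter_catalogs is hypothetical, and the client argument is assumed to be a boto3 Glue client (or any object with a compatible get_catalogs method).

```python
def iter_catalogs(client, **request):
    """Yield every catalog from get_catalogs, following NextToken until exhausted."""
    while True:
        response = client.get_catalogs(**request)
        yield from response.get("CatalogList", [])
        token = response.get("NextToken")
        if not token:
            # No token means the current segment was the last one.
            break
        request["NextToken"] = token


# Usage (needs AWS credentials):
# import boto3
# glue = boto3.client("glue")
# for catalog in iter_catalogs(glue, Recursive=True):
#     print(catalog["Name"])
```

Passing the request parameters as keyword arguments keeps the same filters (for example Recursive or MaxResults) on every page.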

GetTableOptimizer (updated)

Link ¶
Changes (response)

{'TableOptimizer': {'configuration': {'compactionConfiguration': {'icebergConfiguration': {'deleteFileThreshold': 'integer',
                                                                                           'minInputFiles': 'integer'}},
                                      'orphanFileDeletionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}},
                                      'retentionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}}},
                    'configurationSource': 'catalog | table'}}

Returns the configuration of the optimizer of the specified type associated with a specified table.

See also: AWS API Documentation

Request Syntax

client.get_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction'|'retention'|'orphan_file_deletion'
)

type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer.

rtype:

dict

returns:

Response Syntax

{
    'CatalogId': 'string',
    'DatabaseName': 'string',
    'TableName': 'string',
    'TableOptimizer': {
        'type': 'compaction'|'retention'|'orphan_file_deletion',
        'configuration': {
            'roleArn': 'string',
            'enabled': True|False,
            'vpcConfiguration': {
                'glueConnectionName': 'string'
            },
            'compactionConfiguration': {
                'icebergConfiguration': {
                    'strategy': 'binpack'|'sort'|'z-order',
                    'minInputFiles': 123,
                    'deleteFileThreshold': 123
                }
            },
            'retentionConfiguration': {
                'icebergConfiguration': {
                    'snapshotRetentionPeriodInDays': 123,
                    'numberOfSnapshotsToRetain': 123,
                    'cleanExpiredFiles': True|False,
                    'runRateInHours': 123
                }
            },
            'orphanFileDeletionConfiguration': {
                'icebergConfiguration': {
                    'orphanFileRetentionPeriodInDays': 123,
                    'location': 'string',
                    'runRateInHours': 123
                }
            }
        },
        'lastRun': {
            'eventType': 'starting'|'completed'|'failed'|'in_progress',
            'startTimestamp': datetime(2015, 1, 1),
            'endTimestamp': datetime(2015, 1, 1),
            'metrics': {
                'NumberOfBytesCompacted': 'string',
                'NumberOfFilesCompacted': 'string',
                'NumberOfDpus': 'string',
                'JobDurationInHour': 'string'
            },
            'error': 'string',
            'compactionMetrics': {
                'IcebergMetrics': {
                    'NumberOfBytesCompacted': 123,
                    'NumberOfFilesCompacted': 123,
                    'DpuHours': 123.0,
                    'NumberOfDpus': 123,
                    'JobDurationInHour': 123.0
                }
            },
            'compactionStrategy': 'binpack'|'sort'|'z-order',
            'retentionMetrics': {
                'IcebergMetrics': {
                    'NumberOfDataFilesDeleted': 123,
                    'NumberOfManifestFilesDeleted': 123,
                    'NumberOfManifestListsDeleted': 123,
                    'DpuHours': 123.0,
                    'NumberOfDpus': 123,
                    'JobDurationInHour': 123.0
                }
            },
            'orphanFileDeletionMetrics': {
                'IcebergMetrics': {
                    'NumberOfOrphanFilesDeleted': 123,
                    'DpuHours': 123.0,
                    'NumberOfDpus': 123,
                    'JobDurationInHour': 123.0
                }
            }
        },
        'configurationSource': 'catalog'|'table'
    }
}
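
To illustrate how the response above might be consumed, the sketch below picks the Iceberg settings that match the optimizer type and reports whether they came from the catalog or the table. The helper name describe_optimizer and the shape of its return value are hypothetical; only the response keys are taken from the syntax above.

```python
# Maps each optimizer type to the configuration member that holds its settings.
SECTION_BY_TYPE = {
    "compaction": "compactionConfiguration",
    "retention": "retentionConfiguration",
    "orphan_file_deletion": "orphanFileDeletionConfiguration",
}


def describe_optimizer(response):
    """Summarize a get_table_optimizer response as a small flat dict."""
    optimizer = response.get("TableOptimizer", {})
    config = optimizer.get("configuration", {})
    section = SECTION_BY_TYPE.get(optimizer.get("type"))
    iceberg = config.get(section, {}).get("icebergConfiguration", {}) if section else {}
    return {
        "type": optimizer.get("type"),
        "enabled": config.get("enabled"),
        # 'catalog' or 'table', per the configurationSource field.
        "source": optimizer.get("configurationSource"),
        "iceberg": dict(iceberg),
    }


# Usage (needs AWS credentials; identifiers below are placeholders):
# import boto3
# glue = boto3.client("glue")
# summary = describe_optimizer(glue.get_table_optimizer(
#     CatalogId="123456789012", DatabaseName="db", TableName="tbl", Type="retention"))
```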

Response Structure

(dict) --
- CatalogId (string) --
  
  The Catalog ID of the table.
- DatabaseName (string) --
  
  The name of the database in the catalog in which the table resides.
- TableName (string) --
  
  The name of the table.
- TableOptimizer (dict) --
  
  The optimizer associated with the specified table.
  - type (string) --
    
    The type of table optimizer. The valid values are:
    - compaction: for managing compaction with a table optimizer.
    - retention: for managing the retention of snapshots with a table optimizer.
    - orphan_file_deletion: for managing the deletion of orphan files with a table optimizer.
  - configuration (dict) --
    
    A TableOptimizerConfiguration object that was specified when creating or updating a table optimizer.
    - roleArn (string) --
      
      A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.
    - enabled (boolean) --
      
      Whether table optimization is enabled.
    - vpcConfiguration (dict) --
      
      A TableOptimizerVpcConfiguration object representing the VPC configuration for a table optimizer.
      
      This configuration is necessary to perform optimization on tables that are in a customer VPC.
      
      Note: This is a Tagged Union structure. Only one of the following top-level keys will be set: glueConnectionName. If a client receives an unknown member, it will set SDK_UNKNOWN_MEMBER as the top-level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:
      
      'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
      - glueConnectionName (string) --
        
        The name of the Glue connection used for the VPC for the table optimizer.
    - compactionConfiguration (dict) --
      
      The configuration for a compaction optimizer. This configuration defines how data files in your table will be compacted to improve query performance and reduce storage costs.
      - icebergConfiguration (dict) --
        
        The configuration for an Iceberg compaction optimizer.
        
        - strategy (string) --
          
          The strategy to use for compaction. Valid values are:
          - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
          - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
          - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
          If an input is not provided, the default value 'binpack' will be used.
        - minInputFiles (integer) --
          
          The minimum number of data files that must be present in a partition before compaction will actually compact files. This parameter helps control when compaction is triggered, preventing unnecessary compaction operations on partitions with few files. If an input is not provided, the default value 100 will be used.
        - deleteFileThreshold (integer) --
          
          The minimum number of deletes that must be present in a data file to make it eligible for compaction. This parameter helps optimize compaction by focusing on files that contain a significant number of delete operations, which can improve query performance by removing deleted records. If an input is not provided, the default value 1 will be used.
    - retentionConfiguration (dict) --
      
      The configuration for a snapshot retention optimizer.
      - icebergConfiguration (dict) --
        
        The configuration for an Iceberg snapshot retention optimizer.
        
        snapshotRetentionPeriodInDays (integer) --
        
        The number of days to retain the Iceberg snapshots. If an input is not provided, the corresponding Iceberg table configuration field will be used; if that field is not present, the default value 5 will be used.
        
        numberOfSnapshotsToRetain (integer) --
        
        The number of Iceberg snapshots to retain within the retention period. If an input is not provided, the corresponding Iceberg table configuration field will be used; if that field is not present, the default value 1 will be used.
        
        cleanExpiredFiles (boolean) --
        
        If set to false, snapshots are only deleted from table metadata, and the underlying data and metadata files are not deleted.
        
        runRateInHours (integer) --
        
        The interval in hours between retention job runs. This parameter controls how frequently the retention optimizer will run to clean up expired snapshots. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
    - orphanFileDeletionConfiguration (dict) --
      
      The configuration for an orphan file deletion optimizer.
      - icebergConfiguration (dict) --
        
        The configuration for an Iceberg orphan file deletion optimizer.
        
        orphanFileRetentionPeriodInDays (integer) --
        
        The number of days that orphan files should be retained before file deletion. If an input is not provided, the default value 3 will be used.
        
        location (string) --
        
        Specifies a directory in which to look for files (defaults to the table's location). You may choose a sub-directory rather than the top-level table location.
        
        runRateInHours (integer) --
        
        The interval in hours between orphan file deletion job runs. This parameter controls how frequently the orphan file deletion optimizer will run to clean up orphan files. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
  - lastRun (dict) --
    
    A TableOptimizerRun object representing the last run of the table optimizer.
    - eventType (string) --
      
      An event type representing the status of the table optimizer run.
    - startTimestamp (datetime) --
      
      Represents the epoch timestamp at which the compaction job was started within Lake Formation.
    - endTimestamp (datetime) --
      
      Represents the epoch timestamp at which the compaction job ended.
    - metrics (dict) --
      
      A RunMetrics object containing metrics for the optimizer run.
      
      This member is deprecated. See the individual metric members for compaction, retention, and orphan file deletion.
      - NumberOfBytesCompacted (string) --
        
        The number of bytes removed by the compaction job run.
      - NumberOfFilesCompacted (string) --
        
        The number of files removed by the compaction job run.
      - NumberOfDpus (string) --
        
        The number of DPUs consumed by the job, rounded up to the nearest whole number.
      - JobDurationInHour (string) --
        
        The duration of the job in hours.
    - error (string) --
      
      An error that occurred during the optimizer run.
    - compactionMetrics (dict) --
      
      A CompactionMetrics object containing metrics for the optimizer run.
      - IcebergMetrics (dict) --
        
        A structure containing the Iceberg compaction metrics for the optimizer run.
        
        NumberOfBytesCompacted (integer) --
        
        The number of bytes removed by the compaction job run.
        
        NumberOfFilesCompacted (integer) --
        
        The number of files removed by the compaction job run.
        
        DpuHours (float) --
        
        The number of DPU hours consumed by the job.
        
        NumberOfDpus (integer) --
        
        The number of DPUs consumed by the job, rounded up to the nearest whole number.
        
        JobDurationInHour (float) --
        
        The duration of the job in hours.
    - compactionStrategy (string) --
      
      The strategy used for the compaction run. Indicates which algorithm was applied to determine how files were selected and combined during the compaction process. Valid values are:
      - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
      - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
      - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
    - retentionMetrics (dict) --
      
      A RetentionMetrics object containing metrics for the optimizer run.
      - IcebergMetrics (dict) --
        
        A structure containing the Iceberg retention metrics for the optimizer run.
        
        NumberOfDataFilesDeleted (integer) --
        
        The number of data files deleted by the retention job run.
        
        NumberOfManifestFilesDeleted (integer) --
        
        The number of manifest files deleted by the retention job run.
        
        NumberOfManifestListsDeleted (integer) --
        
        The number of manifest lists deleted by the retention job run.
        
        DpuHours (float) --
        
        The number of DPU hours consumed by the job.
        
        NumberOfDpus (integer) --
        
        The number of DPUs consumed by the job, rounded up to the nearest whole number.
        
        JobDurationInHour (float) --
        
        The duration of the job in hours.
    - orphanFileDeletionMetrics (dict) --
      
      An OrphanFileDeletionMetrics object containing metrics for the optimizer run.
      - IcebergMetrics (dict) --
        
        A structure containing the Iceberg orphan file deletion metrics for the optimizer run.
        
        NumberOfOrphanFilesDeleted (integer) --
        
        The number of orphan files deleted by the orphan file deletion job run.
        
        DpuHours (float) --
        
        The number of DPU hours consumed by the job.
        
        NumberOfDpus (integer) --
        
        The number of DPUs consumed by the job, rounded up to the nearest whole number.
        
        JobDurationInHour (float) --
        
        The duration of the job in hours.
  - configurationSource (string) --
    
    Specifies the source of the optimizer configuration. This indicates how the table optimizer was configured and which entity or service initiated the configuration.
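
As a rough illustration of reading the structure above, the sketch below summarizes one entry's lastRun, preferring the typed compactionMetrics.IcebergMetrics member over the deprecated, string-valued metrics member. The function name summarize_last_run and the sample values are hypothetical; only the key names come from the response structure.

```python
from typing import Any, Dict, Optional


def summarize_last_run(table_optimizer: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Collect a few fields from one 'tableOptimizer' dict (hypothetical helper)."""
    last_run = table_optimizer.get("lastRun", {})
    iceberg = last_run.get("compactionMetrics", {}).get("IcebergMetrics", {})
    legacy = last_run.get("metrics", {})  # deprecated member; values are strings

    files = iceberg.get("NumberOfFilesCompacted")
    if files is None and "NumberOfFilesCompacted" in legacy:
        # Fall back to the deprecated member and convert its string value.
        files = int(legacy["NumberOfFilesCompacted"])

    return {
        "configurationSource": table_optimizer.get("configurationSource"),
        "eventType": last_run.get("eventType"),
        "filesCompacted": files,
        "error": last_run.get("error"),
    }


# Sample data shaped like the response structure; the values are made up.
sample = {
    "configurationSource": "catalog",
    "lastRun": {
        "eventType": "completed",
        "metrics": {"NumberOfFilesCompacted": "12"},
    },
}
summary = summarize_last_run(sample)
print(summary)
```

Missing members simply yield None here, so the same helper works for retention or orphan file deletion entries that carry no compaction metrics.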

UpdateCatalog (updated)

Changes (request)

{'CatalogInput': {'CatalogProperties': {'IcebergOptimizationProperties': {'Compaction': {'string': 'string'},
                                                                          'OrphanFileDeletion': {'string': 'string'},
                                                                          'Retention': {'string': 'string'},
                                                                          'RoleArn': 'string'}}}}

Updates an existing catalog's properties in the Glue Data Catalog.

See also: AWS API Documentation

Request Syntax

client.update_catalog(
    CatalogId='string',
    CatalogInput={
        'Description': 'string',
        'FederatedCatalog': {
            'Identifier': 'string',
            'ConnectionName': 'string',
            'ConnectionType': 'string'
        },
        'Parameters': {
            'string': 'string'
        },
        'TargetRedshiftCatalog': {
            'CatalogArn': 'string'
        },
        'CatalogProperties': {
            'DataLakeAccessProperties': {
                'DataLakeAccess': True|False,
                'DataTransferRole': 'string',
                'KmsKey': 'string',
                'CatalogType': 'string'
            },
            'IcebergOptimizationProperties': {
                'RoleArn': 'string',
                'Compaction': {
                    'string': 'string'
                },
                'Retention': {
                    'string': 'string'
                },
                'OrphanFileDeletion': {
                    'string': 'string'
                }
            },
            'CustomProperties': {
                'string': 'string'
            }
        },
        'CreateTableDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'CreateDatabaseDefaultPermissions': [
            {
                'Principal': {
                    'DataLakePrincipalIdentifier': 'string'
                },
                'Permissions': [
                    'ALL'|'SELECT'|'ALTER'|'DROP'|'DELETE'|'INSERT'|'CREATE_DATABASE'|'CREATE_TABLE'|'DATA_LOCATION_ACCESS',
                ]
            },
        ],
        'AllowFullTableExternalDataAccess': 'True'|'False'
    }
)

type CatalogId:

string

param CatalogId:

[REQUIRED]

The ID of the catalog.

type CatalogInput:

dict

param CatalogInput:

[REQUIRED]

A CatalogInput object specifying the new properties of an existing catalog.

Description (string) --

A description of the catalog. The string must be no more than 2048 bytes long and match the URI address multi-line string pattern.
FederatedCatalog (dict) --

A FederatedCatalog object that references an entity outside the Glue Data Catalog, for example a Redshift database.
- Identifier (string) --
  
  A unique identifier for the federated catalog.
- ConnectionName (string) --
  
  The name of the connection to an external data source, for example a Redshift-federated catalog.
- ConnectionType (string) --
  
  The type of connection used to access the federated catalog, specifying the protocol or method for connection to the external data source.
Parameters (dict) --

A map of key-value pairs that define the parameters and properties of the catalog.
- (string) --
  - (string) --
TargetRedshiftCatalog (dict) --

A TargetRedshiftCatalog object that describes a target catalog for resource linking.
- CatalogArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the catalog resource.
CatalogProperties (dict) --

A CatalogProperties object that specifies data lake access properties and other custom properties.
- DataLakeAccessProperties (dict) --
  
  A DataLakeAccessProperties object that specifies properties to configure data lake access for your catalog resource in the Glue Data Catalog.
  - DataLakeAccess (boolean) --
    
    Turns on or off data lake access for Apache Spark applications that access Amazon Redshift databases in the Data Catalog from any non-Redshift engine, such as Amazon Athena, Amazon EMR, or Glue ETL.
  - DataTransferRole (string) --
    
    A role that will be assumed by Glue for transferring data into/out of the staging bucket during a query.
  - KmsKey (string) --
    
    An encryption key that will be used for the staging bucket that will be created along with the catalog.
  - CatalogType (string) --
    
    Specifies a federated catalog type for the native catalog resource. The currently supported type is aws:redshift.
- IcebergOptimizationProperties (dict) --
  
  A structure that specifies Iceberg table optimization properties for the catalog. This includes configuration for compaction, retention, and orphan file deletion operations that can be applied to Iceberg tables in this catalog.
  - RoleArn (string) --
    
    The Amazon Resource Name (ARN) of the IAM role that will be assumed to perform Iceberg table optimization operations.
  - Compaction (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg table compaction operations, which optimize the layout of data files to improve query performance.
    - (string) --
      - (string) --
  - Retention (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg table retention operations, which manage the lifecycle of table snapshots to control storage costs.
    - (string) --
      - (string) --
  - OrphanFileDeletion (dict) --
    
    A map of key-value pairs that specify configuration parameters for Iceberg orphan file deletion operations, which identify and remove files that are no longer referenced by the table metadata.
    - (string) --
      - (string) --
- CustomProperties (dict) --
  
  Additional key-value properties for the catalog, such as column statistics optimizations.
  - (string) --
    - (string) --
CreateTableDefaultPermissions (list) --

An array of PrincipalPermissions objects. Creates a set of default permissions on the table(s) for principals. Used by Amazon Web Services Lake Formation. This should typically be set explicitly to an empty list.
- (dict) --
  
  Permissions granted to a principal.
  - Principal (dict) --
    
    The principal who is granted permissions.
    - DataLakePrincipalIdentifier (string) --
      
      An identifier for the Lake Formation principal.
  - Permissions (list) --
    
    The permissions that are granted to the principal.
    - (string) --
CreateDatabaseDefaultPermissions (list) --

An array of PrincipalPermissions objects. Creates a set of default permissions on the database(s) for principals. Used by Amazon Web Services Lake Formation. This should typically be set explicitly to an empty list.
- (dict) --
  
  Permissions granted to a principal.
  - Principal (dict) --
    
    The principal who is granted permissions.
    - DataLakePrincipalIdentifier (string) --
      
      An identifier for the Lake Formation principal.
  - Permissions (list) --
    
    The permissions that are granted to the principal.
    - (string) --
AllowFullTableExternalDataAccess (string) --

Allows third-party engines to access data in Amazon S3 locations that are registered with Lake Formation.
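
The Compaction, Retention, and OrphanFileDeletion members are string-to-string maps, so any numeric or boolean setting would have to be passed as a string. The sketch below builds a CatalogInput fragment and checks that constraint before the call. The helper name build_iceberg_optimization_properties, the role ARN, and the map key and value in the usage lines are placeholders, not documented setting names.

```python
from typing import Dict, Optional


def build_iceberg_optimization_properties(
    role_arn: str,
    compaction: Optional[Dict[str, str]] = None,
    retention: Optional[Dict[str, str]] = None,
    orphan_file_deletion: Optional[Dict[str, str]] = None,
) -> Dict[str, object]:
    """Build the IcebergOptimizationProperties dict (hypothetical helper)."""
    props: Dict[str, object] = {"RoleArn": role_arn}
    maps = {
        "Compaction": compaction,
        "Retention": retention,
        "OrphanFileDeletion": orphan_file_deletion,
    }
    for member, mapping in maps.items():
        if mapping is None:
            continue  # leave the member out entirely when it is not configured
        for key, value in mapping.items():
            if not isinstance(key, str) or not isinstance(value, str):
                raise TypeError(f"{member}[{key!r}] must map a string to a string")
        props[member] = dict(mapping)
    return props


# Placeholder ARN and map entries; substitute real setting names before use.
catalog_input = {
    "CatalogProperties": {
        "IcebergOptimizationProperties": build_iceberg_optimization_properties(
            role_arn="arn:aws:iam::123456789012:role/ExampleOptimizerRole",
            compaction={"example-key": "example-value"},
        )
    }
}
# With a boto3 Glue client this would be passed as:
# client.update_catalog(CatalogId="example-catalog", CatalogInput=catalog_input)
print(catalog_input)
```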

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --

UpdateTableOptimizer (updated)

Changes (request)

{'TableOptimizerConfiguration': {'compactionConfiguration': {'icebergConfiguration': {'deleteFileThreshold': 'integer',
                                                                                      'minInputFiles': 'integer'}},
                                 'orphanFileDeletionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}},
                                 'retentionConfiguration': {'icebergConfiguration': {'runRateInHours': 'integer'}}}}

Updates the configuration for an existing table optimizer.

See also: AWS API Documentation

Request Syntax

client.update_table_optimizer(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Type='compaction'|'retention'|'orphan_file_deletion',
    TableOptimizerConfiguration={
        'roleArn': 'string',
        'enabled': True|False,
        'vpcConfiguration': {
            'glueConnectionName': 'string'
        },
        'compactionConfiguration': {
            'icebergConfiguration': {
                'strategy': 'binpack'|'sort'|'z-order',
                'minInputFiles': 123,
                'deleteFileThreshold': 123
            }
        },
        'retentionConfiguration': {
            'icebergConfiguration': {
                'snapshotRetentionPeriodInDays': 123,
                'numberOfSnapshotsToRetain': 123,
                'cleanExpiredFiles': True|False,
                'runRateInHours': 123
            }
        },
        'orphanFileDeletionConfiguration': {
            'icebergConfiguration': {
                'orphanFileRetentionPeriodInDays': 123,
                'location': 'string',
                'runRateInHours': 123
            }
        }
    }
)
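
A minimal sketch of assembling the TableOptimizerConfiguration argument for a retention optimizer. It leaves out any setting passed as None, on the assumption that omitted members fall back to the defaults documented for each field. The helper name build_retention_configuration and the example ARN are hypothetical.

```python
from typing import Any, Dict, Optional


def build_retention_configuration(
    role_arn: str,
    enabled: bool = True,
    snapshot_retention_period_in_days: Optional[int] = None,
    number_of_snapshots_to_retain: Optional[int] = None,
    clean_expired_files: Optional[bool] = None,
    run_rate_in_hours: Optional[int] = None,
) -> Dict[str, Any]:
    """Build a TableOptimizerConfiguration dict (hypothetical helper)."""
    candidates = {
        "snapshotRetentionPeriodInDays": snapshot_retention_period_in_days,
        "numberOfSnapshotsToRetain": number_of_snapshots_to_retain,
        "cleanExpiredFiles": clean_expired_files,
        "runRateInHours": run_rate_in_hours,
    }
    # Keep only the settings the caller actually supplied.
    iceberg = {key: value for key, value in candidates.items() if value is not None}

    config: Dict[str, Any] = {"roleArn": role_arn, "enabled": enabled}
    if iceberg:
        config["retentionConfiguration"] = {"icebergConfiguration": iceberg}
    return config


config = build_retention_configuration(
    role_arn="arn:aws:iam::123456789012:role/ExampleOptimizerRole",
    number_of_snapshots_to_retain=3,
    run_rate_in_hours=12,
)
# With a boto3 Glue client this would be passed as:
# client.update_table_optimizer(CatalogId="123456789012", DatabaseName="example_db",
#     TableName="example_table", Type="retention", TableOptimizerConfiguration=config)
print(config)
```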

type CatalogId:

string

param CatalogId:

[REQUIRED]

The Catalog ID of the table.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the database in the catalog in which the table resides.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table.

type Type:

string

param Type:

[REQUIRED]

The type of table optimizer.

type TableOptimizerConfiguration:

dict

param TableOptimizerConfiguration:

[REQUIRED]

A TableOptimizerConfiguration object representing the configuration of a table optimizer.

roleArn (string) --

A role passed by the caller which gives the service permission to update the resources associated with the optimizer on the caller's behalf.
enabled (boolean) --

Whether table optimization is enabled.
vpcConfiguration (dict) --

A TableOptimizerVpcConfiguration object representing the VPC configuration for a table optimizer.

This configuration is necessary to perform optimization on tables that are in a customer VPC.

Note

This is a Tagged Union structure. Only one of the following top-level keys can be set: glueConnectionName.
- glueConnectionName (string) --
  
  The name of the Glue connection used for the VPC for the table optimizer.
compactionConfiguration (dict) --

The configuration for a compaction optimizer. This configuration defines how data files in your table will be compacted to improve query performance and reduce storage costs.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg compaction optimizer.
  - strategy (string) --
    
    The strategy to use for compaction. Valid values are:
    - binpack: Combines small files into larger files, typically targeting sizes over 100MB, while applying any pending deletes. This is the recommended compaction strategy for most use cases.
    - sort: Organizes data based on specified columns which are sorted hierarchically during compaction, improving query performance for filtered operations. This strategy is recommended when your queries frequently filter on specific columns. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
    - z-order: Optimizes data organization by blending multiple attributes into a single scalar value that can be used for sorting, allowing efficient querying across multiple dimensions. This strategy is recommended when you need to query data across multiple dimensions simultaneously. To use this strategy, you must first define a sort order in your Iceberg table properties using the sort_order table property.
    If an input is not provided, the default value 'binpack' will be used.
  - minInputFiles (integer) --
    
    The minimum number of data files that must be present in a partition before compaction will actually compact files. This parameter helps control when compaction is triggered, preventing unnecessary compaction operations on partitions with few files. If an input is not provided, the default value 100 will be used.
  - deleteFileThreshold (integer) --
    
    The minimum number of deletes that must be present in a data file to make it eligible for compaction. This parameter helps optimize compaction by focusing on files that contain a significant number of delete operations, which can improve query performance by removing deleted records. If an input is not provided, the default value 1 will be used.
retentionConfiguration (dict) --

The configuration for a snapshot retention optimizer.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg snapshot retention optimizer.
  - snapshotRetentionPeriodInDays (integer) --
    
    The number of days to retain the Iceberg snapshots. If an input is not provided, the corresponding Iceberg table configuration field will be used; if that field is not present, the default value 5 will be used.
  - numberOfSnapshotsToRetain (integer) --
    
    The number of Iceberg snapshots to retain within the retention period. If an input is not provided, the corresponding Iceberg table configuration field will be used; if that field is not present, the default value 1 will be used.
  - cleanExpiredFiles (boolean) --
    
    If set to false, snapshots are only deleted from table metadata, and the underlying data and metadata files are not deleted.
  - runRateInHours (integer) --
    
    The interval in hours between retention job runs. This parameter controls how frequently the retention optimizer will run to clean up expired snapshots. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
orphanFileDeletionConfiguration (dict) --

The configuration for an orphan file deletion optimizer.
- icebergConfiguration (dict) --
  
  The configuration for an Iceberg orphan file deletion optimizer.
  - orphanFileRetentionPeriodInDays (integer) --
    
    The number of days that orphan files should be retained before file deletion. If an input is not provided, the default value 3 will be used.
  - location (string) --
    
    Specifies a directory in which to look for files (defaults to the table's location). You may choose a sub-directory rather than the top-level table location.
  - runRateInHours (integer) --
    
    The interval in hours between orphan file deletion job runs. This parameter controls how frequently the orphan file deletion optimizer will run to clean up orphan files. The value must be between 3 and 168 hours (7 days). If an input is not provided, the default value 24 will be used.
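
The bounds and defaults described above can be checked client-side before a request is sent. The sketch below validates runRateInHours against the documented 3 to 168 hour range (treated here as inclusive, which is an assumption) and fills in the documented defaults for the compaction settings. The function and constant names are hypothetical, and the service remains the authority on validation.

```python
from typing import Any, Dict, Optional

RUN_RATE_MIN_HOURS = 3
RUN_RATE_MAX_HOURS = 168  # 7 days
RUN_RATE_DEFAULT_HOURS = 24

# Documented defaults for the Iceberg compaction configuration.
COMPACTION_DEFAULTS: Dict[str, Any] = {
    "strategy": "binpack",
    "minInputFiles": 100,
    "deleteFileThreshold": 1,
}
VALID_STRATEGIES = ("binpack", "sort", "z-order")


def resolve_run_rate(run_rate_in_hours: Optional[int] = None) -> int:
    """Return the run rate that would apply, or raise if it is out of range."""
    if run_rate_in_hours is None:
        return RUN_RATE_DEFAULT_HOURS
    if not RUN_RATE_MIN_HOURS <= run_rate_in_hours <= RUN_RATE_MAX_HOURS:
        raise ValueError(
            f"runRateInHours must be between {RUN_RATE_MIN_HOURS} and "
            f"{RUN_RATE_MAX_HOURS}, got {run_rate_in_hours}"
        )
    return run_rate_in_hours


def effective_compaction_settings(iceberg_configuration: Dict[str, Any]) -> Dict[str, Any]:
    """Overlay supplied compaction settings on the documented defaults."""
    merged = dict(COMPACTION_DEFAULTS)
    merged.update(iceberg_configuration)
    if merged["strategy"] not in VALID_STRATEGIES:
        raise ValueError(f"unknown compaction strategy: {merged['strategy']!r}")
    return merged


print(resolve_run_rate())
print(effective_compaction_settings({"minInputFiles": 50}))
```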

rtype:

dict

returns:

Response Syntax

{}

Response Structure

(dict) --
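
Because the documented response is an empty dict, success is signaled by the call returning without raising. The sketch below wraps the call around an injected client and ignores the ResponseMetadata key that boto3 attaches to responses. StubGlueClient is a stand-in so the example runs without AWS credentials, and update_optimizer, the ARN, and the resource names are hypothetical.

```python
from typing import Any, Dict, List


def update_optimizer(
    client: Any,
    catalog_id: str,
    database_name: str,
    table_name: str,
    optimizer_type: str,
    configuration: Dict[str, Any],
) -> bool:
    """Call update_table_optimizer and report whether the payload is empty."""
    response = client.update_table_optimizer(
        CatalogId=catalog_id,
        DatabaseName=database_name,
        TableName=table_name,
        Type=optimizer_type,
        TableOptimizerConfiguration=configuration,
    )
    # Drop the metadata boto3 adds so only the service payload is compared.
    payload = {k: v for k, v in response.items() if k != "ResponseMetadata"}
    return payload == {}


class StubGlueClient:
    """Offline stand-in that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Dict[str, Any]] = []

    def update_table_optimizer(self, **kwargs: Any) -> Dict[str, Any]:
        self.calls.append(kwargs)
        return {}


stub = StubGlueClient()
ok = update_optimizer(
    stub,
    catalog_id="123456789012",
    database_name="example_db",
    table_name="example_table",
    optimizer_type="compaction",
    configuration={
        "roleArn": "arn:aws:iam::123456789012:role/ExampleOptimizerRole",
        "enabled": True,
    },
)
print(ok)
```

Passing the client in as an argument keeps the helper testable; a real boto3 Glue client can be substituted for the stub unchanged.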