AWS Glue

2020/09/09 - AWS Glue - 1 new 1 updated api methods

Changes  Adding support for partitionIndexes to improve GetPartitions performance.

GetPartitionIndexes (new) Link ¶

Retrieves the partition indexes associated with a table.

See also: AWS API Documentation

Request Syntax

client.get_partition_indexes(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    NextToken='string'
)
type CatalogId

string

param CatalogId

The catalog ID where the table resides.

type DatabaseName

string

param DatabaseName

[REQUIRED]

Specifies the name of a database from which you want to retrieve partition indexes.

type TableName

string

param TableName

[REQUIRED]

Specifies the name of a table for which you want to retrieve the partition indexes.

type NextToken

string

param NextToken

A continuation token, included if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'PartitionIndexDescriptorList': [
        {
            'IndexName': 'string',
            'Keys': [
                {
                    'Name': 'string',
                    'Type': 'string'
                },
            ],
            'IndexStatus': 'ACTIVE'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • PartitionIndexDescriptorList (list) --

      A list of index descriptors.

      • (dict) --

        A descriptor for a partition index in a table.

        • IndexName (string) --

          The name of the partition index.

        • Keys (list) --

          A list of one or more keys, as KeySchemaElement structures, for the partition index.

          • (dict) --

            A partition key pair consisting of a name and a type.

            • Name (string) --

              The name of a partition key.

            • Type (string) --

              The type of a partition key.

        • IndexStatus (string) --

          The status of the partition index.

    • NextToken (string) --

      A continuation token, present if the current list segment is not the last.

CreateTable (updated) Link ¶
Changes (request)
{'PartitionIndexes': [{'IndexName': 'string', 'Keys': ['string']}]}

Creates a new table definition in the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.create_table(
    CatalogId='string',
    DatabaseName='string',
    TableInput={
        'Name': 'string',
        'Description': 'string',
        'Owner': 'string',
        'LastAccessTime': datetime(2015, 1, 1),
        'LastAnalyzedTime': datetime(2015, 1, 1),
        'Retention': 123,
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
            ],
            'Location': 'string',
            'InputFormat': 'string',
            'OutputFormat': 'string',
            'Compressed': True|False,
            'NumberOfBuckets': 123,
            'SerdeInfo': {
                'Name': 'string',
                'SerializationLibrary': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
            'BucketColumns': [
                'string',
            ],
            'SortColumns': [
                {
                    'Column': 'string',
                    'SortOrder': 123
                },
            ],
            'Parameters': {
                'string': 'string'
            },
            'SkewedInfo': {
                'SkewedColumnNames': [
                    'string',
                ],
                'SkewedColumnValues': [
                    'string',
                ],
                'SkewedColumnValueLocationMaps': {
                    'string': 'string'
                }
            },
            'StoredAsSubDirectories': True|False
        },
        'PartitionKeys': [
            {
                'Name': 'string',
                'Type': 'string',
                'Comment': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
        ],
        'ViewOriginalText': 'string',
        'ViewExpandedText': 'string',
        'TableType': 'string',
        'Parameters': {
            'string': 'string'
        },
        'TargetTable': {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'Name': 'string'
        }
    },
    PartitionIndexes=[
        {
            'Keys': [
                'string',
            ],
            'IndexName': 'string'
        },
    ]
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog in which to create the Table . If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The catalog database in which to create the new table. For Hive compatibility, this name is entirely lowercase.

type TableInput

dict

param TableInput

[REQUIRED]

The TableInput object that defines the metadata table to create in the catalog.

  • Name (string) -- [REQUIRED]

    The table name. For Hive compatibility, this is folded to lowercase when it is stored.

  • Description (string) --

    A description of the table.

  • Owner (string) --

    The table owner.

  • LastAccessTime (datetime) --

    The last time that the table was accessed.

  • LastAnalyzedTime (datetime) --

    The last time that column statistics were computed for this table.

  • Retention (integer) --

    The retention time for this table.

  • StorageDescriptor (dict) --

    A storage descriptor containing information about the physical storage of this table.

    • Columns (list) --

      A list of the Columns in the table.

      • (dict) --

        A column in a Table .

        • Name (string) -- [REQUIRED]

          The name of the Column .

        • Type (string) --

          The data type of the Column .

        • Comment (string) --

          A free-form text comment.

        • Parameters (dict) --

          These key-value pairs define properties associated with the column.

          • (string) --

            • (string) --

    • Location (string) --

      The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

    • InputFormat (string) --

      The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.

    • OutputFormat (string) --

      The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.

    • Compressed (boolean) --

      True if the data in the table is compressed, or False if not.

    • NumberOfBuckets (integer) --

      Must be specified if the table contains any dimension columns.

    • SerdeInfo (dict) --

      The serialization/deserialization (SerDe) information.

      • Name (string) --

        Name of the SerDe.

      • SerializationLibrary (string) --

        Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .

      • Parameters (dict) --

        These key-value pairs define initialization parameters for the SerDe.

        • (string) --

          • (string) --

    • BucketColumns (list) --

      A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

      • (string) --

    • SortColumns (list) --

      A list specifying the sort order of each bucket in the table.

      • (dict) --

        Specifies the sort order of a sorted column.

        • Column (string) -- [REQUIRED]

          The name of the column.

        • SortOrder (integer) -- [REQUIRED]

          Indicates that the column is sorted in ascending order ( == 1 ), or in descending order ( ==0 ).

    • Parameters (dict) --

      The user-supplied properties in key-value form.

      • (string) --

        • (string) --

    • SkewedInfo (dict) --

      The information about values that appear frequently in a column (skewed values).

      • SkewedColumnNames (list) --

        A list of names of columns that contain skewed values.

        • (string) --

      • SkewedColumnValues (list) --

        A list of values that appear so frequently as to be considered skewed.

        • (string) --

      • SkewedColumnValueLocationMaps (dict) --

        A mapping of skewed values to the columns that contain them.

        • (string) --

          • (string) --

    • StoredAsSubDirectories (boolean) --

      True if the table data is stored in subdirectories, or False if not.

  • PartitionKeys (list) --

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

    When you create a table used by Amazon Athena, and you do not specify any partitionKeys , you must at least set the value of partitionKeys to an empty list. For example:

    "PartitionKeys": []

    • (dict) --

      A column in a Table .

      • Name (string) -- [REQUIRED]

        The name of the Column .

      • Type (string) --

        The data type of the Column .

      • Comment (string) --

        A free-form text comment.

      • Parameters (dict) --

        These key-value pairs define properties associated with the column.

        • (string) --

          • (string) --

  • ViewOriginalText (string) --

    If the table is a view, the original text of the view; otherwise null .

  • ViewExpandedText (string) --

    If the table is a view, the expanded text of the view; otherwise null .

  • TableType (string) --

    The type of this table ( EXTERNAL_TABLE , VIRTUAL_VIEW , etc.).

  • Parameters (dict) --

    These key-value pairs define properties associated with the table.

    • (string) --

      • (string) --

  • TargetTable (dict) --

    A TableIdentifier structure that describes a target table for resource linking.

    • CatalogId (string) --

      The ID of the Data Catalog in which the table resides.

    • DatabaseName (string) --

      The name of the catalog database that contains the target table.

    • Name (string) --

      The name of the target table.

type PartitionIndexes

list

param PartitionIndexes

A list of partition indexes, PartitionIndex structures, to create in the table.

  • (dict) --

    A structure for a partition index.

    • Keys (list) -- [REQUIRED]

      The keys for the partition index.

      • (string) --

    • IndexName (string) -- [REQUIRED]

      The name of the partition index.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --