AWS Glue

2021/02/23 - AWS Glue - 1 updated api methods

Changes  Update glue client to latest version

GetPartitions (updated) Link ΒΆ
Changes (request)
{'ExcludeColumnSchema': 'boolean'}

Retrieves information about the partitions in a table.

See also: AWS API Documentation

Request Syntax

client.get_partitions(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Expression='string',
    NextToken='string',
    Segment={
        'SegmentNumber': 123,
        'TotalSegments': 123
    },
    MaxResults=123,
    ExcludeColumnSchema=True|False
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is provided, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type Expression

string

param Expression

An expression that filters the partitions to be returned.

The expression uses SQL syntax similar to the SQL WHERE filter clause. The SQL statement parser JSQLParser parses the expression.

Operators : The following are the operators that you can use in the Expression API call:

=

Checks whether the values of the two operands are equal; if yes, then the condition becomes true.

Example: Assume 'variable a' holds 10 and 'variable b' holds 20.

(a = b) is not true.

< >

Checks whether the values of two operands are equal; if the values are not equal, then the condition becomes true.

Example: (a < > b) is true.

>

Checks whether the value of the left operand is greater than the value of the right operand; if yes, then the condition becomes true.

Example: (a > b) is not true.

<

Checks whether the value of the left operand is less than the value of the right operand; if yes, then the condition becomes true.

Example: (a < b) is true.

>=

Checks whether the value of the left operand is greater than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a >= b) is not true.

<=

Checks whether the value of the left operand is less than or equal to the value of the right operand; if yes, then the condition becomes true.

Example: (a <= b) is true.

AND, OR, IN, BETWEEN, LIKE, NOT, IS NULL

Logical operators.

Supported Partition Key Types : The following are the supported partition keys.

  • string

  • date

  • timestamp

  • int

  • bigint

  • long

  • tinyint

  • smallint

  • decimal

If an invalid type is encountered, an exception is thrown.

The following list shows the valid operators on each type. When you define a crawler, the partitionKey type is created as a STRING , to be compatible with the catalog partitions.

Sample API Call :

type NextToken

string

param NextToken

A continuation token, if this is not the first call to retrieve these partitions.

type Segment

dict

param Segment

The segment of the table's partitions to scan in this request.

  • SegmentNumber (integer) -- [REQUIRED]

    The zero-based index number of the segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.

  • TotalSegments (integer) -- [REQUIRED]

    The total number of segments.

type MaxResults

integer

param MaxResults

The maximum number of partitions to return in a single response.

type ExcludeColumnSchema

boolean

param ExcludeColumnSchema

rtype

dict

returns

Response Syntax

{
    'Partitions': [
        {
            'Values': [
                'string',
            ],
            'DatabaseName': 'string',
            'TableName': 'string',
            'CreationTime': datetime(2015, 1, 1),
            'LastAccessTime': datetime(2015, 1, 1),
            'StorageDescriptor': {
                'Columns': [
                    {
                        'Name': 'string',
                        'Type': 'string',
                        'Comment': 'string',
                        'Parameters': {
                            'string': 'string'
                        }
                    },
                ],
                'Location': 'string',
                'InputFormat': 'string',
                'OutputFormat': 'string',
                'Compressed': True|False,
                'NumberOfBuckets': 123,
                'SerdeInfo': {
                    'Name': 'string',
                    'SerializationLibrary': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
                'BucketColumns': [
                    'string',
                ],
                'SortColumns': [
                    {
                        'Column': 'string',
                        'SortOrder': 123
                    },
                ],
                'Parameters': {
                    'string': 'string'
                },
                'SkewedInfo': {
                    'SkewedColumnNames': [
                        'string',
                    ],
                    'SkewedColumnValues': [
                        'string',
                    ],
                    'SkewedColumnValueLocationMaps': {
                        'string': 'string'
                    }
                },
                'StoredAsSubDirectories': True|False,
                'SchemaReference': {
                    'SchemaId': {
                        'SchemaArn': 'string',
                        'SchemaName': 'string',
                        'RegistryName': 'string'
                    },
                    'SchemaVersionId': 'string',
                    'SchemaVersionNumber': 123
                }
            },
            'Parameters': {
                'string': 'string'
            },
            'LastAnalyzedTime': datetime(2015, 1, 1),
            'CatalogId': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Partitions (list) --

      A list of requested partitions.

      • (dict) --

        Represents a slice of table data.

        • Values (list) --

          The values of the partition.

          • (string) --

        • DatabaseName (string) --

          The name of the catalog database in which to create the partition.

        • TableName (string) --

          The name of the database table in which to create the partition.

        • CreationTime (datetime) --

          The time at which the partition was created.

        • LastAccessTime (datetime) --

          The last time at which the partition was accessed.

        • StorageDescriptor (dict) --

          Provides information about the physical location where the partition is stored.

          • Columns (list) --

            A list of the Columns in the table.

            • (dict) --

              A column in a Table .

              • Name (string) --

                The name of the Column .

              • Type (string) --

                The data type of the Column .

              • Comment (string) --

                A free-form text comment.

              • Parameters (dict) --

                These key-value pairs define properties associated with the column.

                • (string) --

                  • (string) --

          • Location (string) --

            The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

          • InputFormat (string) --

            The input format: SequenceFileInputFormat (binary), or TextInputFormat , or a custom format.

          • OutputFormat (string) --

            The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat , or a custom format.

          • Compressed (boolean) --

            True if the data in the table is compressed, or False if not.

          • NumberOfBuckets (integer) --

            Must be specified if the table contains any dimension columns.

          • SerdeInfo (dict) --

            The serialization/deserialization (SerDe) information.

            • Name (string) --

              Name of the SerDe.

            • SerializationLibrary (string) --

              Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe .

            • Parameters (dict) --

              These key-value pairs define initialization parameters for the SerDe.

              • (string) --

                • (string) --

          • BucketColumns (list) --

            A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

            • (string) --

          • SortColumns (list) --

            A list specifying the sort order of each bucket in the table.

            • (dict) --

              Specifies the sort order of a sorted column.

              • Column (string) --

                The name of the column.

              • SortOrder (integer) --

                Indicates that the column is sorted in ascending order ( == 1 ), or in descending order ( ==0 ).

          • Parameters (dict) --

            The user-supplied properties in key-value form.

            • (string) --

              • (string) --

          • SkewedInfo (dict) --

            The information about values that appear frequently in a column (skewed values).

            • SkewedColumnNames (list) --

              A list of names of columns that contain skewed values.

              • (string) --

            • SkewedColumnValues (list) --

              A list of values that appear so frequently as to be considered skewed.

              • (string) --

            • SkewedColumnValueLocationMaps (dict) --

              A mapping of skewed values to the columns that contain them.

              • (string) --

                • (string) --

          • StoredAsSubDirectories (boolean) --

            True if the table data is stored in subdirectories, or False if not.

          • SchemaReference (dict) --

            An object that references a schema stored in the AWS Glue Schema Registry.

            When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

            • SchemaId (dict) --

              A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

              • SchemaArn (string) --

                The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

              • SchemaName (string) --

                The name of the schema. One of SchemaArn or SchemaName has to be provided.

              • RegistryName (string) --

                The name of the schema registry that contains the schema.

            • SchemaVersionId (string) --

              The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

            • SchemaVersionNumber (integer) --

              The version number of the schema.

        • Parameters (dict) --

          These key-value pairs define partition parameters.

          • (string) --

            • (string) --

        • LastAnalyzedTime (datetime) --

          The last time at which column statistics were computed for this partition.

        • CatalogId (string) --

          The ID of the Data Catalog in which the partition resides.

    • NextToken (string) --

      A continuation token, if the returned list of partitions does not include the last one.