AWS Glue

2018/01/19 - AWS Glue - 3 new, 1 updated API methods

Changes: Update glue client to latest version

DeleteTableVersion (new)

Deletes a specified version of a table.

See also: AWS API Documentation

Request Syntax

client.delete_table_version(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    VersionId='string'
)
type CatalogId:

string

param CatalogId:

The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table. For Hive compatibility, this name is entirely lowercase.

type VersionId:

string

param VersionId:

[REQUIRED]

The ID of the table version to be deleted.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
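
A minimal sketch of how the call above might be wrapped, assuming `client` is a boto3 Glue client (for example one created with `boto3.client("glue")`). The helper name `delete_table_version` and its argument names are hypothetical; only the keyword arguments passed to the client come from the request syntax. `CatalogId` is left out when no value is given, so the service can fall back to the account ID as described above.

```python
def delete_table_version(client, database_name, table_name, version_id, catalog_id=None):
    """Delete a single table version.

    `client` is assumed to be a boto3 Glue client; this wrapper is illustrative only.
    """
    kwargs = {
        "DatabaseName": database_name,  # required
        "TableName": table_name,        # required
        "VersionId": version_id,        # required
    }
    # CatalogId is optional: omit it entirely rather than passing None.
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id
    return client.delete_table_version(**kwargs)

# Hypothetical usage (database, table, and version values are placeholders):
# import boto3
# delete_table_version(boto3.client("glue"), "sales_db", "orders", "3")
```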

GetTableVersion (new)

Retrieves a specified version of a table.

See also: AWS API Documentation

Request Syntax

client.get_table_version(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    VersionId='string'
)
type CatalogId:

string

param CatalogId:

The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table. For Hive compatibility, this name is entirely lowercase.

type VersionId:

string

param VersionId:

The ID value of the table version to be retrieved.

rtype:

dict

returns:

Response Syntax

{
    'TableVersion': {
        'Table': {
            'Name': 'string',
            'DatabaseName': 'string',
            'Description': 'string',
            'Owner': 'string',
            'CreateTime': datetime(2015, 1, 1),
            'UpdateTime': datetime(2015, 1, 1),
            'LastAccessTime': datetime(2015, 1, 1),
            'LastAnalyzedTime': datetime(2015, 1, 1),
            'Retention': 123,
            'StorageDescriptor': {
                'Columns': [
                    {
                        'Name': 'string',
                        'Type': 'string',
                        'Comment': 'string'
                    },
                ],
                'Location': 'string',
                'InputFormat': 'string',
                'OutputFormat': 'string',
                'Compressed': True|False,
                'NumberOfBuckets': 123,
                'SerdeInfo': {
                    'Name': 'string',
                    'SerializationLibrary': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
                'BucketColumns': [
                    'string',
                ],
                'SortColumns': [
                    {
                        'Column': 'string',
                        'SortOrder': 123
                    },
                ],
                'Parameters': {
                    'string': 'string'
                },
                'SkewedInfo': {
                    'SkewedColumnNames': [
                        'string',
                    ],
                    'SkewedColumnValues': [
                        'string',
                    ],
                    'SkewedColumnValueLocationMaps': {
                        'string': 'string'
                    }
                },
                'StoredAsSubDirectories': True|False
            },
            'PartitionKeys': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string'
                },
            ],
            'ViewOriginalText': 'string',
            'ViewExpandedText': 'string',
            'TableType': 'string',
            'Parameters': {
                'string': 'string'
            },
            'CreatedBy': 'string'
        },
        'VersionId': 'string'
    }
}

Response Structure

  • (dict) --

    • TableVersion (dict) --

      The requested table version.

      • Table (dict) --

        The table in question.

        • Name (string) --

          Name of the table. For Hive compatibility, this must be entirely lowercase.

        • DatabaseName (string) --

          Name of the metadata database where the table metadata resides. For Hive compatibility, this must be all lowercase.

        • Description (string) --

          Description of the table.

        • Owner (string) --

          Owner of the table.

        • CreateTime (datetime) --

          Time when the table definition was created in the Data Catalog.

        • UpdateTime (datetime) --

          Last time the table was updated.

        • LastAccessTime (datetime) --

          Last time the table was accessed. This is usually taken from HDFS, and may not be reliable.

        • LastAnalyzedTime (datetime) --

          Last time column statistics were computed for this table.

        • Retention (integer) --

          Retention time for this table.

        • StorageDescriptor (dict) --

          A storage descriptor containing information about the physical storage of this table.

          • Columns (list) --

            A list of the Columns in the table.

            • (dict) --

              A column in a Table.

              • Name (string) --

                The name of the Column.

              • Type (string) --

                The datatype of data in the Column.

              • Comment (string) --

                Free-form text comment.

          • Location (string) --

            The physical location of the table. By default this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

          • InputFormat (string) --

            The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

          • OutputFormat (string) --

            The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

          • Compressed (boolean) --

            True if the data in the table is compressed, or False if not.

          • NumberOfBuckets (integer) --

            Must be specified if the table contains any dimension columns.

          • SerdeInfo (dict) --

            Serialization/deserialization (SerDe) information.

            • Name (string) --

              Name of the SerDe.

            • SerializationLibrary (string) --

              Usually the class that implements the SerDe. An example is: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

            • Parameters (dict) --

              A list of initialization parameters for the SerDe, in key-value form.

              • (string) --

                • (string) --

          • BucketColumns (list) --

            A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

            • (string) --

          • SortColumns (list) --

            A list specifying the sort order of each bucket in the table.

            • (dict) --

              Specifies the sort order of a sorted column.

              • Column (string) --

                The name of the column.

              • SortOrder (integer) --

                Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).

          • Parameters (dict) --

            User-supplied properties in key-value form.

            • (string) --

              • (string) --

          • SkewedInfo (dict) --

            Information about values that appear very frequently in a column (skewed values).

            • SkewedColumnNames (list) --

              A list of names of columns that contain skewed values.

              • (string) --

            • SkewedColumnValues (list) --

              A list of values that appear so frequently as to be considered skewed.

              • (string) --

            • SkewedColumnValueLocationMaps (dict) --

              A mapping of skewed values to the columns that contain them.

              • (string) --

                • (string) --

          • StoredAsSubDirectories (boolean) --

            True if the table data is stored in subdirectories, or False if not.

        • PartitionKeys (list) --

          A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

          • (dict) --

            A column in a Table.

            • Name (string) --

              The name of the Column.

            • Type (string) --

              The datatype of data in the Column.

            • Comment (string) --

              Free-form text comment.

        • ViewOriginalText (string) --

          If the table is a view, the original text of the view; otherwise null.

        • ViewExpandedText (string) --

          If the table is a view, the expanded text of the view; otherwise null.

        • TableType (string) --

          The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

        • Parameters (dict) --

          Properties associated with this table, as a list of key-value pairs.

          • (string) --

            • (string) --

        • CreatedBy (string) --

          Person or entity who created the table.

      • VersionId (string) --

        The ID value that identifies this table version.
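
The response structure above is deeply nested, so a small accessor can make it easier to read. The following is a sketch only: `column_types` is a hypothetical helper, and the sample response uses made-up values shaped like the response syntax. It walks `TableVersion` → `Table` → `StorageDescriptor` → `Columns` and tolerates missing keys.

```python
def column_types(response):
    """Map column name -> column type from a get_table_version response dict."""
    table = response.get("TableVersion", {}).get("Table", {})
    columns = table.get("StorageDescriptor", {}).get("Columns", [])
    return {col["Name"]: col.get("Type") for col in columns}

# Made-up response fragment, shaped like the Response Syntax above.
sample_response = {
    "TableVersion": {
        "Table": {
            "Name": "orders",
            "StorageDescriptor": {
                "Columns": [
                    {"Name": "order_id", "Type": "bigint"},
                    {"Name": "amount", "Type": "double"},
                ]
            },
        },
        "VersionId": "3",
    }
}

print(column_types(sample_response))
```

With a real client, the input would be the dict returned by `client.get_table_version(...)`.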

BatchDeleteTableVersion (new)

Deletes a specified batch of versions of a table.

See also: AWS API Documentation

Request Syntax

client.batch_delete_table_version(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    VersionIds=[
        'string',
    ]
)
type CatalogId:

string

param CatalogId:

The ID of the Data Catalog where the tables reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The database in the catalog in which the table resides. For Hive compatibility, this name is entirely lowercase.

type TableName:

string

param TableName:

[REQUIRED]

The name of the table. For Hive compatibility, this name is entirely lowercase.

type VersionIds:

list

param VersionIds:

[REQUIRED]

A list of the IDs of versions to be deleted.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Errors': [
        {
            'TableName': 'string',
            'VersionId': 'string',
            'ErrorDetail': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • Errors (list) --

      A list of errors encountered while trying to delete the specified table versions.

      • (dict) --

        An error record for table-version operations.

        • TableName (string) --

          The name of the table in question.

        • VersionId (string) --

          The ID value of the version in question.

        • ErrorDetail (dict) --

          Detail about the error.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.
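
Because a batch delete reports failures in the `Errors` list of the response, callers will usually want to inspect that list. A minimal sketch, assuming the response has the shape shown above: `failed_versions` is a hypothetical helper, and the error code and message in the sample are illustrative values, not output captured from the service.

```python
def failed_versions(response):
    """Return {VersionId: (ErrorCode, ErrorMessage)} for each entry in Errors."""
    failures = {}
    for err in response.get("Errors", []):
        detail = err.get("ErrorDetail", {})
        failures[err.get("VersionId")] = (
            detail.get("ErrorCode"),
            detail.get("ErrorMessage"),
        )
    return failures

# Made-up response: one version could not be deleted.
sample_response = {
    "Errors": [
        {
            "TableName": "orders",
            "VersionId": "7",
            "ErrorDetail": {
                "ErrorCode": "EntityNotFoundException",   # illustrative value
                "ErrorMessage": "Table version not found.",  # illustrative value
            },
        }
    ]
}

print(failed_versions(sample_response))
```

An empty result means no errors were reported for the requested version IDs.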

UpdateTable (updated)
Changes (request)
{'SkipArchive': 'boolean'}

Updates a metadata table in the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.update_table(
    CatalogId='string',
    DatabaseName='string',
    TableInput={
        'Name': 'string',
        'Description': 'string',
        'Owner': 'string',
        'LastAccessTime': datetime(2015, 1, 1),
        'LastAnalyzedTime': datetime(2015, 1, 1),
        'Retention': 123,
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string'
                },
            ],
            'Location': 'string',
            'InputFormat': 'string',
            'OutputFormat': 'string',
            'Compressed': True|False,
            'NumberOfBuckets': 123,
            'SerdeInfo': {
                'Name': 'string',
                'SerializationLibrary': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
            'BucketColumns': [
                'string',
            ],
            'SortColumns': [
                {
                    'Column': 'string',
                    'SortOrder': 123
                },
            ],
            'Parameters': {
                'string': 'string'
            },
            'SkewedInfo': {
                'SkewedColumnNames': [
                    'string',
                ],
                'SkewedColumnValues': [
                    'string',
                ],
                'SkewedColumnValueLocationMaps': {
                    'string': 'string'
                }
            },
            'StoredAsSubDirectories': True|False
        },
        'PartitionKeys': [
            {
                'Name': 'string',
                'Type': 'string',
                'Comment': 'string'
            },
        ],
        'ViewOriginalText': 'string',
        'ViewExpandedText': 'string',
        'TableType': 'string',
        'Parameters': {
            'string': 'string'
        }
    },
    SkipArchive=True|False
)
type CatalogId:

string

param CatalogId:

The ID of the Data Catalog where the table resides. If none is supplied, the AWS account ID is used by default.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The name of the catalog database in which the table resides. For Hive compatibility, this name is entirely lowercase.

type TableInput:

dict

param TableInput:

[REQUIRED]

An updated TableInput object to define the metadata table in the catalog.

  • Name (string) -- [REQUIRED]

    Name of the table. For Hive compatibility, this is folded to lowercase when it is stored.

  • Description (string) --

    Description of the table.

  • Owner (string) --

    Owner of the table.

  • LastAccessTime (datetime) --

    Last time the table was accessed.

  • LastAnalyzedTime (datetime) --

    Last time column statistics were computed for this table.

  • Retention (integer) --

    Retention time for this table.

  • StorageDescriptor (dict) --

    A storage descriptor containing information about the physical storage of this table.

    • Columns (list) --

      A list of the Columns in the table.

      • (dict) --

        A column in a Table.

        • Name (string) -- [REQUIRED]

          The name of the Column.

        • Type (string) --

          The datatype of data in the Column.

        • Comment (string) --

          Free-form text comment.

    • Location (string) --

      The physical location of the table. By default this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

    • InputFormat (string) --

      The input format: SequenceFileInputFormat (binary), or TextInputFormat, or a custom format.

    • OutputFormat (string) --

      The output format: SequenceFileOutputFormat (binary), or IgnoreKeyTextOutputFormat, or a custom format.

    • Compressed (boolean) --

      True if the data in the table is compressed, or False if not.

    • NumberOfBuckets (integer) --

      Must be specified if the table contains any dimension columns.

    • SerdeInfo (dict) --

      Serialization/deserialization (SerDe) information.

      • Name (string) --

        Name of the SerDe.

      • SerializationLibrary (string) --

        Usually the class that implements the SerDe. An example is: org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

      • Parameters (dict) --

        A list of initialization parameters for the SerDe, in key-value form.

        • (string) --

          • (string) --

    • BucketColumns (list) --

      A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

      • (string) --

    • SortColumns (list) --

      A list specifying the sort order of each bucket in the table.

      • (dict) --

        Specifies the sort order of a sorted column.

        • Column (string) -- [REQUIRED]

          The name of the column.

        • SortOrder (integer) -- [REQUIRED]

          Indicates that the column is sorted in ascending order (== 1), or in descending order (== 0).

    • Parameters (dict) --

      User-supplied properties in key-value form.

      • (string) --

        • (string) --

    • SkewedInfo (dict) --

      Information about values that appear very frequently in a column (skewed values).

      • SkewedColumnNames (list) --

        A list of names of columns that contain skewed values.

        • (string) --

      • SkewedColumnValues (list) --

        A list of values that appear so frequently as to be considered skewed.

        • (string) --

      • SkewedColumnValueLocationMaps (dict) --

        A mapping of skewed values to the columns that contain them.

        • (string) --

          • (string) --

    • StoredAsSubDirectories (boolean) --

      True if the table data is stored in subdirectories, or False if not.

  • PartitionKeys (list) --

    A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

    • (dict) --

      A column in a Table.

      • Name (string) -- [REQUIRED]

        The name of the Column.

      • Type (string) --

        The datatype of data in the Column.

      • Comment (string) --

        Free-form text comment.

  • ViewOriginalText (string) --

    If the table is a view, the original text of the view; otherwise null.

  • ViewExpandedText (string) --

    If the table is a view, the expanded text of the view; otherwise null.

  • TableType (string) --

    The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

  • Parameters (dict) --

    Properties associated with this table, as a list of key-value pairs.

    • (string) --

      • (string) --

type SkipArchive:

boolean

param SkipArchive:

By default, UpdateTable always creates an archived version of the table before updating it. If SkipArchive is set to True, however, UpdateTable does not create the archived version.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
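
The `Table` structure returned by read calls carries fields (`DatabaseName`, `CreateTime`, `UpdateTime`, `CreatedBy`) that do not appear in the `TableInput` request syntax above, so a table dict usually has to be filtered before it is sent back. The sketch below assumes `client` is a boto3 Glue client; `TABLE_INPUT_KEYS`, `to_table_input`, and `update_without_archive` are hypothetical names. It passes `SkipArchive=True` to show the new request parameter.

```python
# Keys accepted in TableInput, taken from the request syntax above.
TABLE_INPUT_KEYS = {
    "Name", "Description", "Owner", "LastAccessTime", "LastAnalyzedTime",
    "Retention", "StorageDescriptor", "PartitionKeys", "ViewOriginalText",
    "ViewExpandedText", "TableType", "Parameters",
}

def to_table_input(table):
    """Keep only the keys of a Table dict that TableInput accepts."""
    return {key: value for key, value in table.items() if key in TABLE_INPUT_KEYS}

def update_without_archive(client, database_name, table, catalog_id=None):
    """Update a table without creating an archived version first.

    `client` is assumed to be a boto3 Glue client; this wrapper is illustrative only.
    """
    kwargs = {
        "DatabaseName": database_name,
        "TableInput": to_table_input(table),
        "SkipArchive": True,  # skip the archived version that is created by default
    }
    # CatalogId is optional: omit it entirely rather than passing None.
    if catalog_id is not None:
        kwargs["CatalogId"] = catalog_id
    return client.update_table(**kwargs)
```

Leaving `SkipArchive` out, or setting it to `False`, keeps the default behavior of archiving the previous version before the update.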