AWS Glue

2022/03/18 - AWS Glue - 9 new, 3 updated API methods

Changes: Added 9 new APIs for AWS Glue Interactive Sessions (ListSessions, StopSession, CreateSession, GetSession, DeleteSession, RunStatement, GetStatement, ListStatements, and CancelStatement).

CreateSession (new)

Creates a new session.

See also: AWS API Documentation

Request Syntax

client.create_session(
    Id='string',
    Description='string',
    Role='string',
    Command={
        'Name': 'string',
        'PythonVersion': 'string'
    },
    Timeout=123,
    IdleTimeout=123,
    DefaultArguments={
        'string': 'string'
    },
    Connections={
        'Connections': [
            'string',
        ]
    },
    MaxCapacity=123.0,
    NumberOfWorkers=123,
    WorkerType='Standard'|'G.1X'|'G.2X',
    SecurityConfiguration='string',
    GlueVersion='string',
    Tags={
        'string': 'string'
    },
    RequestOrigin='string'
)
type Id

string

param Id

[REQUIRED]

The ID of the session request.

type Description

string

param Description

The description of the session.

type Role

string

param Role

[REQUIRED]

The ARN of the IAM role.

type Command

dict

param Command

[REQUIRED]

The SessionCommand that runs the job.

  • Name (string) --

    Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.

  • PythonVersion (string) --

    Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.

type Timeout

integer

param Timeout

The number of seconds before the request times out.

type IdleTimeout

integer

param IdleTimeout

The number of seconds the session can be idle before the request times out.

type DefaultArguments

dict

param DefaultArguments

A map of key-value pairs. The maximum is 75 pairs.

  • (string) --

    • (string) --

type Connections

dict

param Connections

The connections to use for the session.

  • Connections (list) --

    A list of connections used by the job.

    • (string) --

type MaxCapacity

float

param MaxCapacity

The number of AWS Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.
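
To make the DPU definition concrete, the sketch below converts a MaxCapacity value into total vCPUs and memory using only the ratio stated above (4 vCPUs and 16 GB of memory per DPU). The helper name `dpu_resources` is hypothetical and is not part of boto3.

```python
from typing import Tuple

# Ratio stated in the MaxCapacity description: 1 DPU = 4 vCPUs + 16 GB memory.
VCPUS_PER_DPU = 4
MEMORY_GB_PER_DPU = 16


def dpu_resources(max_capacity: float) -> Tuple[float, float]:
    """Return (vCPUs, memory in GB) for a MaxCapacity value given in DPUs."""
    if max_capacity < 0:
        raise ValueError("MaxCapacity cannot be negative")
    return max_capacity * VCPUS_PER_DPU, max_capacity * MEMORY_GB_PER_DPU


print(dpu_resources(10.0))  # (40.0, 160.0)
```

Under this ratio, a MaxCapacity of 10.0 corresponds to 40 vCPUs and 160 GB of memory.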

type NumberOfWorkers

integer

param NumberOfWorkers

The number of workers to use for the session.

type WorkerType

string

param WorkerType

The type of worker. Can be one of G.1X, G.2X, or Standard.

type SecurityConfiguration

string

param SecurityConfiguration

The name of the SecurityConfiguration structure to be used with the session.

type GlueVersion

string

param GlueVersion

The Glue version determines the versions of Apache Spark and Python that AWS Glue supports. The GlueVersion must be greater than 2.0.

type Tags

dict

param Tags

The map of key-value pairs (tags) belonging to the session.

  • (string) --

    • (string) --

type RequestOrigin

string

param RequestOrigin

The origin of the request.
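
The following is a minimal sketch, not an official helper, that assembles keyword arguments for `client.create_session` and checks the constraints listed above: the required Id, Role, and Command; the Command.Name values 'glueetl' and 'gluestreaming'; the WorkerType values; and the 75-pair limit on DefaultArguments. The function name `build_create_session_request` and the default PythonVersion of '3' are assumptions for illustration.

```python
from typing import Dict, List, Optional

# Allowed values as listed in this document.
VALID_COMMAND_NAMES = {"glueetl", "gluestreaming"}
VALID_WORKER_TYPES = {"Standard", "G.1X", "G.2X"}
MAX_DEFAULT_ARGUMENTS = 75  # "Max is 75 pairs"


def build_create_session_request(
    session_id: str,
    role_arn: str,
    command_name: str = "glueetl",
    python_version: str = "3",  # assumed default, not taken from the API docs
    worker_type: Optional[str] = None,
    number_of_workers: Optional[int] = None,
    default_arguments: Optional[Dict[str, str]] = None,
    connections: Optional[List[str]] = None,
    glue_version: Optional[str] = None,
) -> dict:
    """Assemble keyword arguments for client.create_session(**kwargs)."""
    if command_name not in VALID_COMMAND_NAMES:
        raise ValueError(f"Command.Name must be one of {sorted(VALID_COMMAND_NAMES)}")
    if worker_type is not None and worker_type not in VALID_WORKER_TYPES:
        raise ValueError(f"WorkerType must be one of {sorted(VALID_WORKER_TYPES)}")
    if default_arguments and len(default_arguments) > MAX_DEFAULT_ARGUMENTS:
        raise ValueError("DefaultArguments accepts at most 75 key-value pairs")

    # Required parameters: Id, Role, and Command.
    request = {
        "Id": session_id,
        "Role": role_arn,
        "Command": {"Name": command_name, "PythonVersion": python_version},
    }
    # Optional parameters are included only when provided.
    if worker_type is not None:
        request["WorkerType"] = worker_type
    if number_of_workers is not None:
        request["NumberOfWorkers"] = number_of_workers
    if default_arguments:
        request["DefaultArguments"] = dict(default_arguments)
    if connections:
        request["Connections"] = {"Connections": list(connections)}
    if glue_version is not None:
        request["GlueVersion"] = glue_version
    return request
```

Assuming boto3 is installed and credentials are configured, the result could be passed as `boto3.client("glue").create_session(**build_create_session_request("my-session", "arn:aws:iam::123456789012:role/ExampleGlueRole", worker_type="G.1X", number_of_workers=2, glue_version="3.0"))`. The session ID, role ARN, and Glue version are placeholders.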

rtype

dict

returns

Response Syntax

{
    'Session': {
        'Id': 'string',
        'CreatedOn': datetime(2015, 1, 1),
        'Status': 'PROVISIONING'|'READY'|'FAILED'|'TIMEOUT'|'STOPPING'|'STOPPED',
        'ErrorMessage': 'string',
        'Description': 'string',
        'Role': 'string',
        'Command': {
            'Name': 'string',
            'PythonVersion': 'string'
        },
        'DefaultArguments': {
            'string': 'string'
        },
        'Connections': {
            'Connections': [
                'string',
            ]
        },
        'Progress': 123.0,
        'MaxCapacity': 123.0,
        'SecurityConfiguration': 'string',
        'GlueVersion': 'string'
    }
}

Response Structure

  • (dict) --

    • Session (dict) --

      Returns the session object in the response.

      • Id (string) --

        The ID of the session.

      • CreatedOn (datetime) --

        The time and date when the session was created.

      • Status (string) --

        The session status.

      • ErrorMessage (string) --

        The error message displayed during the session.

      • Description (string) --

        The description of the session.

      • Role (string) --

        The name or Amazon Resource Name (ARN) of the IAM role associated with the session.

      • Command (dict) --

        The command object. See SessionCommand.

        • Name (string) --

          Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.

        • PythonVersion (string) --

          Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.

      • DefaultArguments (dict) --

        A map of key-value pairs. The maximum is 75 pairs.

        • (string) --

          • (string) --

      • Connections (dict) --

        The connections used for the session.

        • Connections (list) --

          A list of connections used by the job.

          • (string) --

      • Progress (float) --

        The code execution progress of the session.

      • MaxCapacity (float) --

        The number of AWS Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with the session.

      • GlueVersion (string) --

        The Glue version determines the versions of Apache Spark and Python that AWS Glue supports. The GlueVersion must be greater than 2.0.

GetStatement (new)

Retrieves the statement.

See also: AWS API Documentation

Request Syntax

client.get_statement(
    SessionId='string',
    Id=123,
    RequestOrigin='string'
)
type SessionId

string

param SessionId

[REQUIRED]

The Session ID of the statement.

type Id

integer

param Id

[REQUIRED]

The ID of the statement.

type RequestOrigin

string

param RequestOrigin

The origin of the request.

rtype

dict

returns

Response Syntax

{
    'Statement': {
        'Id': 123,
        'Code': 'string',
        'State': 'WAITING'|'RUNNING'|'AVAILABLE'|'CANCELLING'|'CANCELLED'|'ERROR',
        'Output': {
            'Data': {
                'TextPlain': 'string'
            },
            'ExecutionCount': 123,
            'Status': 'WAITING'|'RUNNING'|'AVAILABLE'|'CANCELLING'|'CANCELLED'|'ERROR',
            'ErrorName': 'string',
            'ErrorValue': 'string',
            'Traceback': [
                'string',
            ]
        },
        'Progress': 123.0,
        'StartedOn': 123,
        'CompletedOn': 123
    }
}

Response Structure

  • (dict) --

    • Statement (dict) --

      Returns the statement.

      • Id (integer) --

        The ID of the statement.

      • Code (string) --

        The execution code of the statement.

      • State (string) --

        The state of the statement while the request is processed.

      • Output (dict) --

        The output in JSON.

        • Data (dict) --

          The code execution output.

          • TextPlain (string) --

            The code execution output in text format.

        • ExecutionCount (integer) --

          The execution count of the output.

        • Status (string) --

          The status of the code execution output.

        • ErrorName (string) --

          The name of the error in the output.

        • ErrorValue (string) --

          The error value of the output.

        • Traceback (list) --

          The traceback of the output.

          • (string) --

      • Progress (float) --

        The code execution progress.

      • StartedOn (integer) --

        The Unix time and date at which the statement was started.

      • CompletedOn (integer) --

        The Unix time and date at which the statement was completed.
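
A Statement returned by GetStatement carries either plain-text output or error details. The hypothetical helper below returns Output.Data.TextPlain and raises when the statement or its output reports ERROR, folding ErrorName, ErrorValue, and Traceback into the exception message. Treating an ERROR value as failure is an assumption based on the State and Status values listed above.

```python
def statement_result(statement: dict) -> str:
    """Return Output.Data.TextPlain, or raise RuntimeError if the statement failed.

    `statement` is the 'Statement' dict from get_statement (or one entry of
    'Statements' from list_statements).
    """
    output = statement.get("Output") or {}
    # Assumption: an ERROR State or Output.Status signals a failed execution.
    if statement.get("State") == "ERROR" or output.get("Status") == "ERROR":
        name = output.get("ErrorName", "UnknownError")
        value = output.get("ErrorValue", "")
        traceback = "\n".join(output.get("Traceback", []))
        raise RuntimeError(f"{name}: {value}\n{traceback}".rstrip())
    return output.get("Data", {}).get("TextPlain", "")
```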

ListSessions (new)

Retrieves a list of sessions.

See also: AWS API Documentation

Request Syntax

client.list_sessions(
    NextToken='string',
    MaxResults=123,
    Tags={
        'string': 'string'
    },
    RequestOrigin='string'
)
type NextToken

string

param NextToken

The token for the next set of results, or null if there are no more results.

type MaxResults

integer

param MaxResults

The maximum number of results.

type Tags

dict

param Tags

Tags belonging to the session.

  • (string) --

    • (string) --

type RequestOrigin

string

param RequestOrigin

The origin of the request.

rtype

dict

returns

Response Syntax

{
    'Ids': [
        'string',
    ],
    'Sessions': [
        {
            'Id': 'string',
            'CreatedOn': datetime(2015, 1, 1),
            'Status': 'PROVISIONING'|'READY'|'FAILED'|'TIMEOUT'|'STOPPING'|'STOPPED',
            'ErrorMessage': 'string',
            'Description': 'string',
            'Role': 'string',
            'Command': {
                'Name': 'string',
                'PythonVersion': 'string'
            },
            'DefaultArguments': {
                'string': 'string'
            },
            'Connections': {
                'Connections': [
                    'string',
                ]
            },
            'Progress': 123.0,
            'MaxCapacity': 123.0,
            'SecurityConfiguration': 'string',
            'GlueVersion': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Ids (list) --

      Returns the IDs of the sessions.

      • (string) --

    • Sessions (list) --

      Returns the session objects.

      • (dict) --

        The period in which a remote Spark runtime environment is running.

        • Id (string) --

          The ID of the session.

        • CreatedOn (datetime) --

          The time and date when the session was created.

        • Status (string) --

          The session status.

        • ErrorMessage (string) --

          The error message displayed during the session.

        • Description (string) --

          The description of the session.

        • Role (string) --

          The name or Amazon Resource Name (ARN) of the IAM role associated with the session.

        • Command (dict) --

          The command object. See SessionCommand.

          • Name (string) --

            Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.

          • PythonVersion (string) --

            Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.

        • DefaultArguments (dict) --

          A map of key-value pairs. The maximum is 75 pairs.

          • (string) --

            • (string) --

        • Connections (dict) --

          The connections used for the session.

          • Connections (list) --

            A list of connections used by the job.

            • (string) --

        • Progress (float) --

          The code execution progress of the session.

        • MaxCapacity (float) --

          The number of AWS Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with the session.

        • GlueVersion (string) --

          The Glue version determines the versions of Apache Spark and Python that AWS Glue supports. The GlueVersion must be greater than 2.0.

    • NextToken (string) --

      The token for the next set of results, or null if there are no more results.
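
ListSessions returns one page per call, with NextToken pointing at the next page. A minimal sketch of following NextToken until it is absent is shown below. `iter_sessions` is a hypothetical name, and it accepts any callable that behaves like the Glue client's `list_sessions` method so that it can be exercised without AWS access.

```python
from typing import Callable, Iterator


def iter_sessions(list_sessions: Callable[..., dict], **kwargs) -> Iterator[dict]:
    """Yield every Session dict, following NextToken until it is absent.

    Extra keyword arguments (for example MaxResults or Tags) are passed
    through unchanged on every call.
    """
    next_token = None
    while True:
        params = dict(kwargs)
        if next_token:
            params["NextToken"] = next_token
        response = list_sessions(**params)
        yield from response.get("Sessions", [])
        next_token = response.get("NextToken")
        if not next_token:
            break
```

With boto3, this might be used as `for session in iter_sessions(boto3.client("glue").list_sessions, MaxResults=50): print(session["Id"], session["Status"])`. Depending on the botocore version, a built-in paginator may also exist; `client.can_paginate("list_sessions")` reports whether one is available.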

DeleteSession (new)

Deletes the session.

See also: AWS API Documentation

Request Syntax

client.delete_session(
    Id='string',
    RequestOrigin='string'
)
type Id

string

param Id

[REQUIRED]

The ID of the session to be deleted.

type RequestOrigin

string

param RequestOrigin

The name of the origin of the delete session request.

rtype

dict

returns

Response Syntax

{
    'Id': 'string'
}

Response Structure

  • (dict) --

    • Id (string) --

      Returns the ID of the deleted session.

CancelStatement (new)

Cancels the statement.

See also: AWS API Documentation

Request Syntax

client.cancel_statement(
    SessionId='string',
    Id=123,
    RequestOrigin='string'
)
type SessionId

string

param SessionId

[REQUIRED]

The Session ID of the statement to be cancelled.

type Id

integer

param Id

[REQUIRED]

The ID of the statement to be cancelled.

type RequestOrigin

string

param RequestOrigin

The origin of the request to cancel the statement.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
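
CancelStatement returns an empty response, so a caller that needs to know when cancellation has taken effect has to poll GetStatement. The sketch below does that. The name `cancel_and_wait`, the polling interval, the timeout, and the choice of AVAILABLE, CANCELLED, and ERROR as final states (a statement may finish before the cancellation lands) are all assumptions.

```python
import time

# States in which a statement is assumed to stop changing.
TERMINAL_STATES = {"AVAILABLE", "CANCELLED", "ERROR"}


def cancel_and_wait(client, session_id: str, statement_id: int,
                    poll_interval: float = 2.0, timeout: float = 120.0) -> str:
    """Cancel a statement, then poll get_statement until it reaches a final state."""
    client.cancel_statement(SessionId=session_id, Id=statement_id)  # returns {}
    deadline = time.monotonic() + timeout
    while True:
        statement = client.get_statement(SessionId=session_id, Id=statement_id)["Statement"]
        state = statement["State"]
        if state in TERMINAL_STATES:
            return state
        if time.monotonic() >= deadline:
            raise TimeoutError(f"statement {statement_id} still {state} after {timeout}s")
        time.sleep(poll_interval)
```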

RunStatement (new)

Executes the statement.

See also: AWS API Documentation

Request Syntax

client.run_statement(
    SessionId='string',
    Code='string',
    RequestOrigin='string'
)
type SessionId

string

param SessionId

[REQUIRED]

The session ID of the statement to be run.

type Code

string

param Code

[REQUIRED]

The statement code to be run.

type RequestOrigin

string

param RequestOrigin

The origin of the request.

rtype

dict

returns

Response Syntax

{
    'Id': 123
}

Response Structure

  • (dict) --

    • Id (integer) --

      Returns the ID of the statement that was run.
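
RunStatement returns only the statement ID; the output arrives later through GetStatement. A minimal polling sketch follows, assuming `client` is a boto3 Glue client (or any object with the same `run_statement` and `get_statement` methods) and that AVAILABLE, CANCELLED, and ERROR are the states in which a statement stops changing. The name `run_and_wait` and the timing defaults are hypothetical.

```python
import time

# Assumed final states, taken from the State enum listed for GetStatement.
FINISHED_STATES = {"AVAILABLE", "CANCELLED", "ERROR"}


def run_and_wait(client, session_id: str, code: str,
                 poll_interval: float = 1.0, timeout: float = 600.0) -> dict:
    """Submit code with run_statement and return the finished Statement dict."""
    statement_id = client.run_statement(SessionId=session_id, Code=code)["Id"]
    deadline = time.monotonic() + timeout
    while True:
        statement = client.get_statement(SessionId=session_id, Id=statement_id)["Statement"]
        if statement["State"] in FINISHED_STATES:
            return statement
        if time.monotonic() >= deadline:
            raise TimeoutError(f"statement {statement_id} did not finish in {timeout}s")
        time.sleep(poll_interval)
```

For example, `run_and_wait(boto3.client("glue"), "my-session", "print(1 + 1)")` would return the finished Statement dict, with any text output under Output.Data.TextPlain. The session ID is a placeholder, and the session presumably has to be READY before statements are accepted.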

StopSession (new)

Stops the session.

See also: AWS API Documentation

Request Syntax

client.stop_session(
    Id='string',
    RequestOrigin='string'
)
type Id

string

param Id

[REQUIRED]

The ID of the session to be stopped.

type RequestOrigin

string

param RequestOrigin

The origin of the request.

rtype

dict

returns

Response Syntax

{
    'Id': 'string'
}

Response Structure

  • (dict) --

    • Id (string) --

      Returns the ID of the stopped session.
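
A common cleanup sequence is to stop a session and then delete it. The sketch below calls StopSession, polls GetSession until the status is STOPPED, FAILED, or TIMEOUT, and then calls DeleteSession. Whether DeleteSession actually requires a stopped session is not stated in this document, so waiting first is a cautious assumption; the function name `stop_and_delete_session` is hypothetical.

```python
import time

# Statuses after which the session is assumed to be no longer running.
NOT_RUNNING = {"STOPPED", "FAILED", "TIMEOUT"}


def stop_and_delete_session(client, session_id: str,
                            poll_interval: float = 5.0,
                            timeout: float = 300.0) -> str:
    """Stop a session, wait until it is no longer running, then delete it."""
    client.stop_session(Id=session_id)
    deadline = time.monotonic() + timeout
    while True:
        status = client.get_session(Id=session_id)["Session"]["Status"]
        if status in NOT_RUNNING:
            break
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session {session_id} still {status} after {timeout}s")
        time.sleep(poll_interval)
    # DeleteSession returns the ID of the deleted session.
    return client.delete_session(Id=session_id)["Id"]
```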

ListStatements (new)

Lists statements for the session.

See also: AWS API Documentation

Request Syntax

client.list_statements(
    SessionId='string',
    RequestOrigin='string',
    NextToken='string'
)
type SessionId

string

param SessionId

[REQUIRED]

The Session ID of the statements.

type RequestOrigin

string

param RequestOrigin

The origin of the request to list statements.

type NextToken

string

param NextToken

The token for the next set of results, if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'Statements': [
        {
            'Id': 123,
            'Code': 'string',
            'State': 'WAITING'|'RUNNING'|'AVAILABLE'|'CANCELLING'|'CANCELLED'|'ERROR',
            'Output': {
                'Data': {
                    'TextPlain': 'string'
                },
                'ExecutionCount': 123,
                'Status': 'WAITING'|'RUNNING'|'AVAILABLE'|'CANCELLING'|'CANCELLED'|'ERROR',
                'ErrorName': 'string',
                'ErrorValue': 'string',
                'Traceback': [
                    'string',
                ]
            },
            'Progress': 123.0,
            'StartedOn': 123,
            'CompletedOn': 123
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Statements (list) --

      Returns the list of statements.

      • (dict) --

        The statement or request for a particular action to occur in a session.

        • Id (integer) --

          The ID of the statement.

        • Code (string) --

          The execution code of the statement.

        • State (string) --

          The state of the statement while the request is processed.

        • Output (dict) --

          The output in JSON.

          • Data (dict) --

            The code execution output.

            • TextPlain (string) --

              The code execution output in text format.

          • ExecutionCount (integer) --

            The execution count of the output.

          • Status (string) --

            The status of the code execution output.

          • ErrorName (string) --

            The name of the error in the output.

          • ErrorValue (string) --

            The error value of the output.

          • Traceback (list) --

            The traceback of the output.

            • (string) --

        • Progress (float) --

          The code execution progress.

        • StartedOn (integer) --

          The Unix time and date at which the statement was started.

        • CompletedOn (integer) --

          The Unix time and date at which the statement was completed.

    • NextToken (string) --

      The token for the next set of results, or null if there are no more results.
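
The following hypothetical helpers collect all statements of a session by following NextToken across ListStatements pages, then tally them by State, which can help spot statements that ended in ERROR. The `client` argument is assumed to expose a `list_statements` method with the request shape shown above.

```python
from collections import Counter
from typing import List


def all_statements(client, session_id: str) -> List[dict]:
    """Collect every statement of a session, following NextToken across pages."""
    statements: List[dict] = []
    next_token = None
    while True:
        params = {"SessionId": session_id}
        if next_token:
            params["NextToken"] = next_token
        response = client.list_statements(**params)
        statements.extend(response.get("Statements", []))
        next_token = response.get("NextToken")
        if not next_token:
            return statements


def count_by_state(statements: List[dict]) -> Counter:
    """Tally statements by their State value (WAITING, RUNNING, AVAILABLE, ...)."""
    return Counter(s["State"] for s in statements)
```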

GetSession (new)

Retrieves the session.

See also: AWS API Documentation

Request Syntax

client.get_session(
    Id='string',
    RequestOrigin='string'
)
type Id

string

param Id

[REQUIRED]

The ID of the session.

type RequestOrigin

string

param RequestOrigin

The origin of the request.

rtype

dict

returns

Response Syntax

{
    'Session': {
        'Id': 'string',
        'CreatedOn': datetime(2015, 1, 1),
        'Status': 'PROVISIONING'|'READY'|'FAILED'|'TIMEOUT'|'STOPPING'|'STOPPED',
        'ErrorMessage': 'string',
        'Description': 'string',
        'Role': 'string',
        'Command': {
            'Name': 'string',
            'PythonVersion': 'string'
        },
        'DefaultArguments': {
            'string': 'string'
        },
        'Connections': {
            'Connections': [
                'string',
            ]
        },
        'Progress': 123.0,
        'MaxCapacity': 123.0,
        'SecurityConfiguration': 'string',
        'GlueVersion': 'string'
    }
}

Response Structure

  • (dict) --

    • Session (dict) --

      The session object is returned in the response.

      • Id (string) --

        The ID of the session.

      • CreatedOn (datetime) --

        The time and date when the session was created.

      • Status (string) --

        The session status.

      • ErrorMessage (string) --

        The error message displayed during the session.

      • Description (string) --

        The description of the session.

      • Role (string) --

        The name or Amazon Resource Name (ARN) of the IAM role associated with the session.

      • Command (dict) --

        The command object. See SessionCommand.

        • Name (string) --

          Specifies the name of the SessionCommand. Can be 'glueetl' or 'gluestreaming'.

        • PythonVersion (string) --

          Specifies the Python version. The Python version indicates the version supported for jobs of type Spark.

      • DefaultArguments (dict) --

        A map of key-value pairs. The maximum is 75 pairs.

        • (string) --

          • (string) --

      • Connections (dict) --

        The connections used for the session.

        • Connections (list) --

          A list of connections used by the job.

          • (string) --

      • Progress (float) --

        The code execution progress of the session.

      • MaxCapacity (float) --

        The number of AWS Glue data processing units (DPUs) that can be allocated when the job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB memory.

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with the session.

      • GlueVersion (string) --

        The Glue version determines the versions of Apache Spark and Python that AWS Glue supports. The GlueVersion must be greater than 2.0.
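
After CreateSession, a session presumably passes through PROVISIONING before it can accept statements. The sketch below polls GetSession until the Status is READY and raises if the session reaches FAILED, TIMEOUT, STOPPING, or STOPPED instead, including ErrorMessage in the exception. The function name `wait_until_ready`, the polling interval, the timeout, and the grouping of statuses are assumptions.

```python
import time

# Statuses from which a session is assumed never to become READY.
FAILURE_STATUSES = {"FAILED", "TIMEOUT", "STOPPING", "STOPPED"}


def wait_until_ready(client, session_id: str, poll_interval: float = 5.0,
                     timeout: float = 900.0) -> dict:
    """Poll get_session until the session is READY and return the Session dict."""
    deadline = time.monotonic() + timeout
    while True:
        session = client.get_session(Id=session_id)["Session"]
        status = session["Status"]
        if status == "READY":
            return session
        if status in FAILURE_STATUSES:
            raise RuntimeError(
                f"session {session_id} is {status}: {session.get('ErrorMessage', '')}"
            )
        if time.monotonic() >= deadline:
            raise TimeoutError(f"session {session_id} still {status} after {timeout}s")
        time.sleep(poll_interval)
```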

GetUnfilteredPartitionMetadata (updated)
Changes (request)
{'AuditContext': {'AllColumnsRequested': 'boolean',
                  'RequestedColumns': ['string']}}

See also: AWS API Documentation

Request Syntax

client.get_unfiltered_partition_metadata(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    PartitionValues=[
        'string',
    ],
    AuditContext={
        'AdditionalAuditContext': 'string',
        'RequestedColumns': [
            'string',
        ],
        'AllColumnsRequested': True|False
    },
    SupportedPermissionTypes=[
        'COLUMN_PERMISSION'|'CELL_FILTER_PERMISSION',
    ]
)
type CatalogId

string

param CatalogId

[REQUIRED]

type DatabaseName

string

param DatabaseName

[REQUIRED]

type TableName

string

param TableName

[REQUIRED]

type PartitionValues

list

param PartitionValues

[REQUIRED]

  • (string) --

type AuditContext

dict

param AuditContext

A structure containing information for audit.

  • AdditionalAuditContext (string) --

    The context for the audit.

  • RequestedColumns (list) --

    The requested columns for audit.

    • (string) --

  • AllColumnsRequested (boolean) --

    Whether all columns are requested for audit.
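
The AuditContext structure can name specific columns through RequestedColumns or flag that all columns are wanted through AllColumnsRequested. A minimal sketch of building it follows. `build_audit_context` is hypothetical, and treating the two fields as alternatives (setting only one of them) is an assumption rather than a documented rule.

```python
from typing import Optional, Sequence


def build_audit_context(columns: Optional[Sequence[str]] = None,
                        additional_context: Optional[str] = None) -> dict:
    """Build an AuditContext dict.

    Assumption: RequestedColumns and AllColumnsRequested are used as
    alternatives, so only one of them is set.
    """
    context: dict = {}
    if additional_context is not None:
        context["AdditionalAuditContext"] = additional_context
    if columns:
        # Audit a specific set of columns.
        context["RequestedColumns"] = list(columns)
    else:
        # No columns named: record that all columns were requested.
        context["AllColumnsRequested"] = True
    return context
```

It could then be passed as `AuditContext=build_audit_context(["order_id", "amount"])` in a `client.get_unfiltered_partition_metadata(...)` call together with the required CatalogId, DatabaseName, TableName, PartitionValues, and SupportedPermissionTypes arguments. The column names are placeholders.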

type SupportedPermissionTypes

list

param SupportedPermissionTypes

[REQUIRED]

  • (string) --

rtype

dict

returns

Response Syntax

{
    'Partition': {
        'Values': [
            'string',
        ],
        'DatabaseName': 'string',
        'TableName': 'string',
        'CreationTime': datetime(2015, 1, 1),
        'LastAccessTime': datetime(2015, 1, 1),
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
            ],
            'Location': 'string',
            'AdditionalLocations': [
                'string',
            ],
            'InputFormat': 'string',
            'OutputFormat': 'string',
            'Compressed': True|False,
            'NumberOfBuckets': 123,
            'SerdeInfo': {
                'Name': 'string',
                'SerializationLibrary': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
            'BucketColumns': [
                'string',
            ],
            'SortColumns': [
                {
                    'Column': 'string',
                    'SortOrder': 123
                },
            ],
            'Parameters': {
                'string': 'string'
            },
            'SkewedInfo': {
                'SkewedColumnNames': [
                    'string',
                ],
                'SkewedColumnValues': [
                    'string',
                ],
                'SkewedColumnValueLocationMaps': {
                    'string': 'string'
                }
            },
            'StoredAsSubDirectories': True|False,
            'SchemaReference': {
                'SchemaId': {
                    'SchemaArn': 'string',
                    'SchemaName': 'string',
                    'RegistryName': 'string'
                },
                'SchemaVersionId': 'string',
                'SchemaVersionNumber': 123
            }
        },
        'Parameters': {
            'string': 'string'
        },
        'LastAnalyzedTime': datetime(2015, 1, 1),
        'CatalogId': 'string'
    },
    'AuthorizedColumns': [
        'string',
    ],
    'IsRegisteredWithLakeFormation': True|False
}

Response Structure

  • (dict) --

    • Partition (dict) --

      Represents a slice of table data.

      • Values (list) --

        The values of the partition.

        • (string) --

      • DatabaseName (string) --

        The name of the catalog database in which to create the partition.

      • TableName (string) --

        The name of the database table in which to create the partition.

      • CreationTime (datetime) --

        The time at which the partition was created.

      • LastAccessTime (datetime) --

        The last time at which the partition was accessed.

      • StorageDescriptor (dict) --

        Provides information about the physical location where the partition is stored.

        • Columns (list) --

          A list of the Columns in the table.

          • (dict) --

            A column in a Table.

            • Name (string) --

              The name of the Column.

            • Type (string) --

              The data type of the Column.

            • Comment (string) --

              A free-form text comment.

            • Parameters (dict) --

              These key-value pairs define properties associated with the column.

              • (string) --

                • (string) --

        • Location (string) --

          The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

        • AdditionalLocations (list) --

          • (string) --

        • InputFormat (string) --

          The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.

        • OutputFormat (string) --

          The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.

        • Compressed (boolean) --

          True if the data in the table is compressed, or False if not.

        • NumberOfBuckets (integer) --

          Must be specified if the table contains any dimension columns.

        • SerdeInfo (dict) --

          The serialization/deserialization (SerDe) information.

          • Name (string) --

            Name of the SerDe.

          • SerializationLibrary (string) --

            Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

          • Parameters (dict) --

            These key-value pairs define initialization parameters for the SerDe.

            • (string) --

              • (string) --

        • BucketColumns (list) --

          A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

          • (string) --

        • SortColumns (list) --

          A list specifying the sort order of each bucket in the table.

          • (dict) --

            Specifies the sort order of a sorted column.

            • Column (string) --

              The name of the column.

            • SortOrder (integer) --

              Indicates that the column is sorted in ascending order (1) or in descending order (0).

        • Parameters (dict) --

          The user-supplied properties in key-value form.

          • (string) --

            • (string) --

        • SkewedInfo (dict) --

          The information about values that appear frequently in a column (skewed values).

          • SkewedColumnNames (list) --

            A list of names of columns that contain skewed values.

            • (string) --

          • SkewedColumnValues (list) --

            A list of values that appear so frequently as to be considered skewed.

            • (string) --

          • SkewedColumnValueLocationMaps (dict) --

            A mapping of skewed values to the columns that contain them.

            • (string) --

              • (string) --

        • StoredAsSubDirectories (boolean) --

          True if the table data is stored in subdirectories, or False if not.

        • SchemaReference (dict) --

          An object that references a schema stored in the Glue Schema Registry.

          When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

          • SchemaId (dict) --

            A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

            • SchemaArn (string) --

              The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

            • SchemaName (string) --

              The name of the schema. One of SchemaArn or SchemaName has to be provided.

            • RegistryName (string) --

              The name of the schema registry that contains the schema.

          • SchemaVersionId (string) --

            The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

          • SchemaVersionNumber (integer) --

            The version number of the schema.

      • Parameters (dict) --

        These key-value pairs define partition parameters.

        • (string) --

          • (string) --

      • LastAnalyzedTime (datetime) --

        The last time at which column statistics were computed for this partition.

      • CatalogId (string) --

        The ID of the Data Catalog in which the partition resides.

    • AuthorizedColumns (list) --

      • (string) --

    • IsRegisteredWithLakeFormation (boolean) --
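
Two parts of this response are easy to misread: SortOrder uses 1 for ascending and 0 for descending, and AuthorizedColumns (undocumented above) is assumed here to list the column names the caller is permitted to read. The hypothetical helpers below render the sort columns and intersect the partition's columns with AuthorizedColumns.

```python
from typing import List


def describe_sort_columns(partition: dict) -> List[str]:
    """Render SortColumns as 'name ASC' or 'name DESC' (1 = ascending, 0 = descending)."""
    sort_columns = partition.get("StorageDescriptor", {}).get("SortColumns", [])
    return [
        f"{col['Column']} {'ASC' if col['SortOrder'] == 1 else 'DESC'}"
        for col in sort_columns
    ]


def visible_columns(response: dict) -> List[str]:
    """Return partition column names that also appear in AuthorizedColumns.

    Assumption: AuthorizedColumns lists the column names the caller may read.
    """
    authorized = set(response.get("AuthorizedColumns", []))
    columns = response["Partition"].get("StorageDescriptor", {}).get("Columns", [])
    return [col["Name"] for col in columns if col["Name"] in authorized]
```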

GetUnfilteredPartitionsMetadata (updated)
Changes (request)
{'AuditContext': {'AllColumnsRequested': 'boolean',
                  'RequestedColumns': ['string']}}

See also: AWS API Documentation

Request Syntax

client.get_unfiltered_partitions_metadata(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    Expression='string',
    AuditContext={
        'AdditionalAuditContext': 'string',
        'RequestedColumns': [
            'string',
        ],
        'AllColumnsRequested': True|False
    },
    SupportedPermissionTypes=[
        'COLUMN_PERMISSION'|'CELL_FILTER_PERMISSION',
    ],
    NextToken='string',
    Segment={
        'SegmentNumber': 123,
        'TotalSegments': 123
    },
    MaxResults=123
)
type CatalogId

string

param CatalogId

[REQUIRED]

type DatabaseName

string

param DatabaseName

[REQUIRED]

type TableName

string

param TableName

[REQUIRED]

type Expression

string

param Expression

type AuditContext

dict

param AuditContext

A structure containing information for audit.

  • AdditionalAuditContext (string) --

    The context for the audit.

  • RequestedColumns (list) --

    The requested columns for audit.

    • (string) --

  • AllColumnsRequested (boolean) --

    Whether all columns are requested for audit.

type SupportedPermissionTypes

list

param SupportedPermissionTypes

[REQUIRED]

  • (string) --

type NextToken

string

param NextToken

The token for the next set of results, if this is a continuation call.

type Segment

dict

param Segment

Defines a non-overlapping region of a table's partitions, allowing multiple requests to be run in parallel.

  • SegmentNumber (integer) -- [REQUIRED]

    The zero-based index number of the segment. For example, if the total number of segments is 4, SegmentNumber values range from 0 through 3.

  • TotalSegments (integer) -- [REQUIRED]

    The total number of segments.
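
Segment lets several requests each cover a non-overlapping slice of a table's partitions. The sketch below builds the zero-based Segment dicts, pages through each segment with NextToken, and runs the segments in a thread pool. The helper names are hypothetical; `call` stands for a function such as a boto3 Glue client's `get_unfiltered_partitions_metadata` method, and sharing one boto3 client across threads relies on boto3's guidance that low-level clients are generally thread-safe.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Dict, List


def segments(total: int) -> List[Dict[str, int]]:
    """Build zero-based Segment dicts, e.g. SegmentNumber 0..3 when TotalSegments is 4."""
    if total < 1:
        raise ValueError("TotalSegments must be at least 1")
    return [{"SegmentNumber": n, "TotalSegments": total} for n in range(total)]


def fetch_segment(call: Callable[..., dict], segment: Dict[str, int], **kwargs) -> List[dict]:
    """Collect every UnfilteredPartitions entry for one segment, following NextToken."""
    partitions: List[dict] = []
    next_token = None
    while True:
        params = dict(kwargs, Segment=segment)
        if next_token:
            params["NextToken"] = next_token
        response = call(**params)
        partitions.extend(response.get("UnfilteredPartitions", []))
        next_token = response.get("NextToken")
        if not next_token:
            return partitions


def fetch_all_segments(call: Callable[..., dict], total: int, **kwargs) -> List[dict]:
    """Fetch all segments concurrently and concatenate results in segment order."""
    segs = segments(total)
    with ThreadPoolExecutor(max_workers=total) as pool:
        futures = [pool.submit(fetch_segment, call, seg, **kwargs) for seg in segs]
        return [p for future in futures for p in future.result()]
```

For example, `fetch_all_segments(client.get_unfiltered_partitions_metadata, 4, CatalogId="123456789012", DatabaseName="sales", TableName="orders", SupportedPermissionTypes=["COLUMN_PERMISSION"])` would run four parallel request streams. The catalog, database, and table names are placeholders.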

type MaxResults

integer

param MaxResults

The maximum number of results.

rtype

dict

returns

Response Syntax

{
    'UnfilteredPartitions': [
        {
            'Partition': {
                'Values': [
                    'string',
                ],
                'DatabaseName': 'string',
                'TableName': 'string',
                'CreationTime': datetime(2015, 1, 1),
                'LastAccessTime': datetime(2015, 1, 1),
                'StorageDescriptor': {
                    'Columns': [
                        {
                            'Name': 'string',
                            'Type': 'string',
                            'Comment': 'string',
                            'Parameters': {
                                'string': 'string'
                            }
                        },
                    ],
                    'Location': 'string',
                    'AdditionalLocations': [
                        'string',
                    ],
                    'InputFormat': 'string',
                    'OutputFormat': 'string',
                    'Compressed': True|False,
                    'NumberOfBuckets': 123,
                    'SerdeInfo': {
                        'Name': 'string',
                        'SerializationLibrary': 'string',
                        'Parameters': {
                            'string': 'string'
                        }
                    },
                    'BucketColumns': [
                        'string',
                    ],
                    'SortColumns': [
                        {
                            'Column': 'string',
                            'SortOrder': 123
                        },
                    ],
                    'Parameters': {
                        'string': 'string'
                    },
                    'SkewedInfo': {
                        'SkewedColumnNames': [
                            'string',
                        ],
                        'SkewedColumnValues': [
                            'string',
                        ],
                        'SkewedColumnValueLocationMaps': {
                            'string': 'string'
                        }
                    },
                    'StoredAsSubDirectories': True|False,
                    'SchemaReference': {
                        'SchemaId': {
                            'SchemaArn': 'string',
                            'SchemaName': 'string',
                            'RegistryName': 'string'
                        },
                        'SchemaVersionId': 'string',
                        'SchemaVersionNumber': 123
                    }
                },
                'Parameters': {
                    'string': 'string'
                },
                'LastAnalyzedTime': datetime(2015, 1, 1),
                'CatalogId': 'string'
            },
            'AuthorizedColumns': [
                'string',
            ],
            'IsRegisteredWithLakeFormation': True|False
        },
    ],
    'NextToken': 'string'
}
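Given a response shaped like the syntax above, a caller often needs only each partition's values and storage location. The hypothetical helper `partition_locations` below extracts those. It reads keys with `.get()` because this sketch does not assume every nested key is always present.

```python
from typing import Any, Dict, List, Optional, Tuple


def partition_locations(response: Dict[str, Any]) -> List[Tuple[List[str], Optional[str]]]:
    """Return (partition values, storage location) pairs from a response.

    A missing StorageDescriptor or Location yields None instead of raising KeyError.
    """
    pairs = []
    for entry in response.get("UnfilteredPartitions", []):
        partition = entry.get("Partition", {})
        location = partition.get("StorageDescriptor", {}).get("Location")
        pairs.append((partition.get("Values", []), location))
    return pairs
```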

Response Structure

  • (dict) --

    • UnfilteredPartitions (list) --

      • (dict) --

        • Partition (dict) --

          Represents a slice of table data.

          • Values (list) --

            The values of the partition.

            • (string) --

          • DatabaseName (string) --

            The name of the catalog database in which to create the partition.

          • TableName (string) --

            The name of the database table in which to create the partition.

          • CreationTime (datetime) --

            The time at which the partition was created.

          • LastAccessTime (datetime) --

            The last time at which the partition was accessed.

          • StorageDescriptor (dict) --

            Provides information about the physical location where the partition is stored.

            • Columns (list) --

              A list of the Columns in the table.

              • (dict) --

                A column in a Table.

                • Name (string) --

                  The name of the Column.

                • Type (string) --

                  The data type of the Column.

                • Comment (string) --

                  A free-form text comment.

                • Parameters (dict) --

                  These key-value pairs define properties associated with the column.

                  • (string) --

                    • (string) --

            • Location (string) --

              The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

            • AdditionalLocations (list) --

              • (string) --

            • InputFormat (string) --

              The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.

            • OutputFormat (string) --

              The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.

            • Compressed (boolean) --

              True if the data in the table is compressed, or False if not.

            • NumberOfBuckets (integer) --

              Must be specified if the table contains any dimension columns.

            • SerdeInfo (dict) --

              The serialization/deserialization (SerDe) information.

              • Name (string) --

                Name of the SerDe.

              • SerializationLibrary (string) --

                Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

              • Parameters (dict) --

                These key-value pairs define initialization parameters for the SerDe.

                • (string) --

                  • (string) --

            • BucketColumns (list) --

              A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

              • (string) --

            • SortColumns (list) --

              A list specifying the sort order of each bucket in the table.

              • (dict) --

                Specifies the sort order of a sorted column.

                • Column (string) --

                  The name of the column.

                • SortOrder (integer) --

                  Indicates that the column is sorted in ascending order (== 1) or in descending order (== 0).

            • Parameters (dict) --

              The user-supplied properties in key-value form.

              • (string) --

                • (string) --

            • SkewedInfo (dict) --

              The information about values that appear frequently in a column (skewed values).

              • SkewedColumnNames (list) --

                A list of names of columns that contain skewed values.

                • (string) --

              • SkewedColumnValues (list) --

                A list of values that appear so frequently as to be considered skewed.

                • (string) --

              • SkewedColumnValueLocationMaps (dict) --

                A mapping of skewed values to the columns that contain them.

                • (string) --

                  • (string) --

            • StoredAsSubDirectories (boolean) --

              True if the table data is stored in subdirectories, or False if not.

            • SchemaReference (dict) --

              An object that references a schema stored in the Glue Schema Registry.

              When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

              • SchemaId (dict) --

                A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

                • SchemaArn (string) --

                  The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

                • SchemaName (string) --

                  The name of the schema. One of SchemaArn or SchemaName has to be provided.

                • RegistryName (string) --

                  The name of the schema registry that contains the schema.

              • SchemaVersionId (string) --

                The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

              • SchemaVersionNumber (integer) --

                The version number of the schema.

          • Parameters (dict) --

            These key-value pairs define partition parameters.

            • (string) --

              • (string) --

          • LastAnalyzedTime (datetime) --

            The last time at which column statistics were computed for this partition.

          • CatalogId (string) --

            The ID of the Data Catalog in which the partition resides.

        • AuthorizedColumns (list) --

          • (string) --

        • IsRegisteredWithLakeFormation (boolean) --

    • NextToken (string) --
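NextToken lets a caller page through the results. The minimal loop below passes the token back on each subsequent request. It assumes the usual AWS convention that a missing or empty NextToken marks the last page. `collect_partitions` is a hypothetical helper, and `call` stands in for `client.get_unfiltered_partitions_metadata`.

```python
from typing import Any, Callable, Dict, List


def collect_partitions(call: Callable[..., Dict[str, Any]], **kwargs: Any) -> List[dict]:
    """Follow NextToken until the service stops returning one."""
    results: List[dict] = []
    token = None
    while True:
        request = dict(kwargs)
        if token:
            request["NextToken"] = token  # only send a token once we have one
        page = call(**request)
        results.extend(page.get("UnfilteredPartitions", []))
        token = page.get("NextToken")
        if not token:  # assumption: missing or empty token means last page
            return results
```

With boto3, `kwargs` would carry the arguments the request syntax marks as required, plus an optional `Segment` if paging is combined with a parallel segmented scan.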

GetUnfilteredTableMetadata (updated)
Changes (request)
{'AuditContext': {'AllColumnsRequested': 'boolean',
                  'RequestedColumns': ['string']}}

See also: AWS API Documentation

Request Syntax

client.get_unfiltered_table_metadata(
    CatalogId='string',
    DatabaseName='string',
    Name='string',
    AuditContext={
        'AdditionalAuditContext': 'string',
        'RequestedColumns': [
            'string',
        ],
        'AllColumnsRequested': True|False
    },
    SupportedPermissionTypes=[
        'COLUMN_PERMISSION'|'CELL_FILTER_PERMISSION',
    ]
)
type CatalogId

string

param CatalogId

[REQUIRED]

type DatabaseName

string

param DatabaseName

[REQUIRED]

type Name

string

param Name

[REQUIRED]

type AuditContext

dict

param AuditContext

A structure containing information for auditing.

  • AdditionalAuditContext (string) --

    The context for the audit.

  • RequestedColumns (list) --

    The requested columns for audit.

    • (string) --

  • AllColumnsRequested (boolean) --

    Whether all columns are requested for audit.
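A sketch of assembling the AuditContext dict from the three fields above. The helper `build_audit_context` is hypothetical. Treating RequestedColumns and AllColumnsRequested as alternatives (sending one or the other) is an assumption; the field descriptions here do not say whether both may be sent together.

```python
from typing import Dict, List, Optional, Union


def build_audit_context(
    requested_columns: Optional[List[str]] = None,
    additional_context: Optional[str] = None,
) -> Dict[str, Union[str, bool, List[str]]]:
    """Build an AuditContext: explicit columns if given, otherwise all columns."""
    context: Dict[str, Union[str, bool, List[str]]] = {}
    if additional_context is not None:
        context["AdditionalAuditContext"] = additional_context
    if requested_columns:
        context["RequestedColumns"] = list(requested_columns)
    else:
        # Assumption: with no explicit column list, report that all columns are requested.
        context["AllColumnsRequested"] = True
    return context
```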

type SupportedPermissionTypes

list

param SupportedPermissionTypes

[REQUIRED]

  • (string) --
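Putting the request parameters together, the sketch below assembles keyword arguments for `client.get_unfiltered_table_metadata` and checks SupportedPermissionTypes against the two values listed in the request syntax. `table_metadata_request` is a hypothetical helper that only builds the dict, so it runs without AWS credentials. Rejecting an empty permission list is an assumption of this sketch.

```python
from typing import Any, Dict, Iterable, Optional

# The two values shown in the request syntax for SupportedPermissionTypes.
ALLOWED_PERMISSION_TYPES = {"COLUMN_PERMISSION", "CELL_FILTER_PERMISSION"}


def table_metadata_request(
    catalog_id: str,
    database_name: str,
    name: str,
    permission_types: Iterable[str],
    audit_context: Optional[Dict[str, Any]] = None,
) -> Dict[str, Any]:
    """Return kwargs for get_unfiltered_table_metadata, validating the enum list."""
    types = list(permission_types)
    if not types:
        # Assumption: treat an empty list as a missing required parameter.
        raise ValueError("SupportedPermissionTypes must not be empty")
    unknown = set(types) - ALLOWED_PERMISSION_TYPES
    if unknown:
        raise ValueError(f"unsupported permission types: {sorted(unknown)}")
    request: Dict[str, Any] = {
        "CatalogId": catalog_id,
        "DatabaseName": database_name,
        "Name": name,
        "SupportedPermissionTypes": types,
    }
    if audit_context is not None:  # AuditContext is optional
        request["AuditContext"] = audit_context
    return request
```

With a boto3 Glue client, the result would be passed as `client.get_unfiltered_table_metadata(**table_metadata_request(...))`.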

rtype

dict

returns

Response Syntax

{
    'Table': {
        'Name': 'string',
        'DatabaseName': 'string',
        'Description': 'string',
        'Owner': 'string',
        'CreateTime': datetime(2015, 1, 1),
        'UpdateTime': datetime(2015, 1, 1),
        'LastAccessTime': datetime(2015, 1, 1),
        'LastAnalyzedTime': datetime(2015, 1, 1),
        'Retention': 123,
        'StorageDescriptor': {
            'Columns': [
                {
                    'Name': 'string',
                    'Type': 'string',
                    'Comment': 'string',
                    'Parameters': {
                        'string': 'string'
                    }
                },
            ],
            'Location': 'string',
            'AdditionalLocations': [
                'string',
            ],
            'InputFormat': 'string',
            'OutputFormat': 'string',
            'Compressed': True|False,
            'NumberOfBuckets': 123,
            'SerdeInfo': {
                'Name': 'string',
                'SerializationLibrary': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
            'BucketColumns': [
                'string',
            ],
            'SortColumns': [
                {
                    'Column': 'string',
                    'SortOrder': 123
                },
            ],
            'Parameters': {
                'string': 'string'
            },
            'SkewedInfo': {
                'SkewedColumnNames': [
                    'string',
                ],
                'SkewedColumnValues': [
                    'string',
                ],
                'SkewedColumnValueLocationMaps': {
                    'string': 'string'
                }
            },
            'StoredAsSubDirectories': True|False,
            'SchemaReference': {
                'SchemaId': {
                    'SchemaArn': 'string',
                    'SchemaName': 'string',
                    'RegistryName': 'string'
                },
                'SchemaVersionId': 'string',
                'SchemaVersionNumber': 123
            }
        },
        'PartitionKeys': [
            {
                'Name': 'string',
                'Type': 'string',
                'Comment': 'string',
                'Parameters': {
                    'string': 'string'
                }
            },
        ],
        'ViewOriginalText': 'string',
        'ViewExpandedText': 'string',
        'TableType': 'string',
        'Parameters': {
            'string': 'string'
        },
        'CreatedBy': 'string',
        'IsRegisteredWithLakeFormation': True|False,
        'TargetTable': {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'Name': 'string'
        },
        'CatalogId': 'string',
        'VersionId': 'string'
    },
    'AuthorizedColumns': [
        'string',
    ],
    'IsRegisteredWithLakeFormation': True|False,
    'CellFilters': [
        {
            'ColumnName': 'string',
            'RowFilterExpression': 'string'
        },
    ]
}
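In the SortColumns entries of this response, SortOrder is an integer: 1 for ascending and 0 for descending, per the field description. The hypothetical helper `sort_spec` below turns a Table's SortColumns into readable (column, direction) pairs and raises on any other value rather than guessing.

```python
from typing import Any, Dict, List, Tuple

# Mapping taken from the SortOrder field description: 1 = ascending, 0 = descending.
SORT_DIRECTIONS = {1: "ASC", 0: "DESC"}


def sort_spec(table: Dict[str, Any]) -> List[Tuple[str, str]]:
    """Return (column, 'ASC' or 'DESC') pairs from Table.StorageDescriptor.SortColumns."""
    sort_columns = table.get("StorageDescriptor", {}).get("SortColumns", [])
    spec = []
    for entry in sort_columns:
        order = entry["SortOrder"]
        if order not in SORT_DIRECTIONS:
            raise ValueError(f"unexpected SortOrder {order!r} for {entry['Column']}")
        spec.append((entry["Column"], SORT_DIRECTIONS[order]))
    return spec
```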

Response Structure

  • (dict) --

    • Table (dict) --

      Represents a collection of related data organized in columns and rows.

      • Name (string) --

        The table name. For Hive compatibility, this must be entirely lowercase.

      • DatabaseName (string) --

        The name of the database where the table metadata resides. For Hive compatibility, this must be all lowercase.

      • Description (string) --

        A description of the table.

      • Owner (string) --

        The owner of the table.

      • CreateTime (datetime) --

        The time when the table definition was created in the Data Catalog.

      • UpdateTime (datetime) --

        The last time that the table was updated.

      • LastAccessTime (datetime) --

        The last time that the table was accessed. This is usually taken from HDFS, and might not be reliable.

      • LastAnalyzedTime (datetime) --

        The last time that column statistics were computed for this table.

      • Retention (integer) --

        The retention time for this table.

      • StorageDescriptor (dict) --

        A storage descriptor containing information about the physical storage of this table.

        • Columns (list) --

          A list of the Columns in the table.

          • (dict) --

            A column in a Table.

            • Name (string) --

              The name of the Column.

            • Type (string) --

              The data type of the Column.

            • Comment (string) --

              A free-form text comment.

            • Parameters (dict) --

              These key-value pairs define properties associated with the column.

              • (string) --

                • (string) --

        • Location (string) --

          The physical location of the table. By default, this takes the form of the warehouse location, followed by the database location in the warehouse, followed by the table name.

        • AdditionalLocations (list) --

          • (string) --

        • InputFormat (string) --

          The input format: SequenceFileInputFormat (binary), TextInputFormat, or a custom format.

        • OutputFormat (string) --

          The output format: SequenceFileOutputFormat (binary), IgnoreKeyTextOutputFormat, or a custom format.

        • Compressed (boolean) --

          True if the data in the table is compressed, or False if not.

        • NumberOfBuckets (integer) --

          Must be specified if the table contains any dimension columns.

        • SerdeInfo (dict) --

          The serialization/deserialization (SerDe) information.

          • Name (string) --

            Name of the SerDe.

          • SerializationLibrary (string) --

            Usually the class that implements the SerDe. An example is org.apache.hadoop.hive.serde2.columnar.ColumnarSerDe.

          • Parameters (dict) --

            These key-value pairs define initialization parameters for the SerDe.

            • (string) --

              • (string) --

        • BucketColumns (list) --

          A list of reducer grouping columns, clustering columns, and bucketing columns in the table.

          • (string) --

        • SortColumns (list) --

          A list specifying the sort order of each bucket in the table.

          • (dict) --

            Specifies the sort order of a sorted column.

            • Column (string) --

              The name of the column.

            • SortOrder (integer) --

              Indicates that the column is sorted in ascending order (== 1) or in descending order (== 0).

        • Parameters (dict) --

          The user-supplied properties in key-value form.

          • (string) --

            • (string) --

        • SkewedInfo (dict) --

          The information about values that appear frequently in a column (skewed values).

          • SkewedColumnNames (list) --

            A list of names of columns that contain skewed values.

            • (string) --

          • SkewedColumnValues (list) --

            A list of values that appear so frequently as to be considered skewed.

            • (string) --

          • SkewedColumnValueLocationMaps (dict) --

            A mapping of skewed values to the columns that contain them.

            • (string) --

              • (string) --

        • StoredAsSubDirectories (boolean) --

          True if the table data is stored in subdirectories, or False if not.

        • SchemaReference (dict) --

          An object that references a schema stored in the Glue Schema Registry.

          When creating a table, you can pass an empty list of columns for the schema, and instead use a schema reference.

          • SchemaId (dict) --

            A structure that contains schema identity fields. Either this or the SchemaVersionId has to be provided.

            • SchemaArn (string) --

              The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

            • SchemaName (string) --

              The name of the schema. One of SchemaArn or SchemaName has to be provided.

            • RegistryName (string) --

              The name of the schema registry that contains the schema.

          • SchemaVersionId (string) --

            The unique ID assigned to a version of the schema. Either this or the SchemaId has to be provided.

          • SchemaVersionNumber (integer) --

            The version number of the schema.

      • PartitionKeys (list) --

        A list of columns by which the table is partitioned. Only primitive types are supported as partition keys.

        When you create a table used by Amazon Athena and do not specify any partitionKeys, you must at least set the value of partitionKeys to an empty list. For example:

        "PartitionKeys": []

        • (dict) --

          A column in a Table.

          • Name (string) --

            The name of the Column.

          • Type (string) --

            The data type of the Column.

          • Comment (string) --

            A free-form text comment.

          • Parameters (dict) --

            These key-value pairs define properties associated with the column.

            • (string) --

              • (string) --

      • ViewOriginalText (string) --

        If the table is a view, the original text of the view; otherwise null.

      • ViewExpandedText (string) --

        If the table is a view, the expanded text of the view; otherwise null.

      • TableType (string) --

        The type of this table (EXTERNAL_TABLE, VIRTUAL_VIEW, etc.).

      • Parameters (dict) --

        These key-value pairs define properties associated with the table.

        • (string) --

          • (string) --

      • CreatedBy (string) --

        The person or entity who created the table.

      • IsRegisteredWithLakeFormation (boolean) --

        Indicates whether the table has been registered with Lake Formation.

      • TargetTable (dict) --

        A TableIdentifier structure that describes a target table for resource linking.

        • CatalogId (string) --

          The ID of the Data Catalog in which the table resides.

        • DatabaseName (string) --

          The name of the catalog database that contains the target table.

        • Name (string) --

          The name of the target table.

      • CatalogId (string) --

        The ID of the Data Catalog in which the table resides.

      • VersionId (string) --

    • AuthorizedColumns (list) --

      • (string) --

    • IsRegisteredWithLakeFormation (boolean) --

    • CellFilters (list) --

      • (dict) --

        • ColumnName (string) --

        • RowFilterExpression (string) --
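The AuthorizedColumns and CellFilters response fields carry no descriptions here. As a hedged sketch only, assuming AuthorizedColumns lists the column names the caller may read, the hypothetical helper `project_row` drops any other columns from a row dict. The hypothetical `filters_by_column` groups CellFilters row-filter expressions by ColumnName without evaluating or interpreting them.

```python
from collections import defaultdict
from typing import Any, Dict, List


def project_row(row: Dict[str, Any], response: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only keys listed in AuthorizedColumns (assumed semantics).

    With no AuthorizedColumns in the response, this sketch conservatively returns {}.
    """
    allowed = set(response.get("AuthorizedColumns", []))
    return {col: val for col, val in row.items() if col in allowed}


def filters_by_column(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group RowFilterExpression strings by ColumnName; expressions are not evaluated."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for cell_filter in response.get("CellFilters", []):
        grouped[cell_filter["ColumnName"]].append(cell_filter["RowFilterExpression"])
    return dict(grouped)
```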