AWS Glue

2020/06/25 - AWS Glue - 6 new api methods

Changes: This release adds new APIs to support column-level statistics in the AWS Glue Data Catalog.

GetColumnStatisticsForPartition (new)

Retrieves partition statistics of columns.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_for_partition(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    PartitionValues=[
        'string',
    ],
    ColumnNames=[
        'string',
    ]
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type PartitionValues

list

param PartitionValues

[REQUIRED]

A list of partition values identifying the partition.

  • (string) --

type ColumnNames

list

param ColumnNames

[REQUIRED]

A list of the column names.

  • (string) --

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsList': [
        {
            'ColumnName': 'string',
            'ColumnType': 'string',
            'AnalyzedTime': datetime(2015, 1, 1),
            'StatisticsData': {
                'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                'BooleanColumnStatisticsData': {
                    'NumberOfTrues': 123,
                    'NumberOfFalses': 123,
                    'NumberOfNulls': 123
                },
                'DateColumnStatisticsData': {
                    'MinimumValue': datetime(2015, 1, 1),
                    'MaximumValue': datetime(2015, 1, 1),
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DecimalColumnStatisticsData': {
                    'MinimumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'MaximumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DoubleColumnStatisticsData': {
                    'MinimumValue': 123.0,
                    'MaximumValue': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'LongColumnStatisticsData': {
                    'MinimumValue': 123,
                    'MaximumValue': 123,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'StringColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'BinaryColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123
                }
            }
        },
    ],
    'Errors': [
        {
            'ColumnName': 'string',
            'Error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • ColumnStatisticsList (list) --

      List of ColumnStatistics that were successfully retrieved.

      • (dict) --

        Defines the statistics for a column.

        • ColumnName (string) --

          The name of the column.

        • ColumnType (string) --

          The type of the column.

        • AnalyzedTime (datetime) --

          The analyzed time of the column statistics.

        • StatisticsData (dict) --

          The statistics of the column.

          • Type (string) --

            The type of column statistics data.

          • BooleanColumnStatisticsData (dict) --

            Boolean Column Statistics Data.

            • NumberOfTrues (integer) --

              Number of true values.

            • NumberOfFalses (integer) --

              Number of false values.

            • NumberOfNulls (integer) --

              Number of nulls.

          • DateColumnStatisticsData (dict) --

            Date Column Statistics Data.

            • MinimumValue (datetime) --

              Minimum value of the column.

            • MaximumValue (datetime) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • DecimalColumnStatisticsData (dict) --

            Decimal Column Statistics Data.

            • MinimumValue (dict) --

              Minimum value of the column.

              • UnscaledValue (bytes) --

                The unscaled numeric value.

              • Scale (integer) --

                The scale that determines where the decimal point falls in the unscaled value.

            • MaximumValue (dict) --

              Maximum value of the column.

              • UnscaledValue (bytes) --

                The unscaled numeric value.

              • Scale (integer) --

                The scale that determines where the decimal point falls in the unscaled value.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • DoubleColumnStatisticsData (dict) --

            Double Column Statistics Data.

            • MinimumValue (float) --

              Minimum value of the column.

            • MaximumValue (float) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • LongColumnStatisticsData (dict) --

            Long Column Statistics Data.

            • MinimumValue (integer) --

              Minimum value of the column.

            • MaximumValue (integer) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • StringColumnStatisticsData (dict) --

            String Column Statistics Data.

            • MaximumLength (integer) --

              Maximum length of the column.

            • AverageLength (float) --

              Average length of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • BinaryColumnStatisticsData (dict) --

            Binary Column Statistics Data.

            • MaximumLength (integer) --

              Maximum length of the column.

            • AverageLength (float) --

              Average length of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

    • Errors (list) --

      Errors that occurred while retrieving column statistics data.

      • (dict) --

        Identifies a column for which an error occurred.

        • ColumnName (string) --

          The name of the column.

        • Error (dict) --

          The error that occurred during the operation.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.
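
As a minimal sketch of how the response above might be consumed, the helper below calls get_column_statistics_for_partition on a client supplied by the caller and separates the retrieved statistics from the per-column errors. The helper name fetch_partition_stats, the _FakeGlue stand-in client, and every sample value (database, table, column names, error code) are hypothetical and exist only so the example runs without AWS credentials; in practice the client would typically come from boto3.client('glue').

```python
def fetch_partition_stats(client, database, table, partition_values, columns):
    """Return (stats_by_column, error_code_by_column) for one partition."""
    response = client.get_column_statistics_for_partition(
        DatabaseName=database,
        TableName=table,
        PartitionValues=partition_values,
        ColumnNames=columns,
    )
    # Columns whose statistics were retrieved.
    stats = {
        entry["ColumnName"]: entry["StatisticsData"]
        for entry in response.get("ColumnStatisticsList", [])
    }
    # Columns that could not be retrieved, keyed to the reported error code.
    errors = {
        entry["ColumnName"]: entry["Error"]["ErrorCode"]
        for entry in response.get("Errors", [])
    }
    return stats, errors


class _FakeGlue:
    """Hypothetical stand-in for a Glue client; returns a canned response."""

    def get_column_statistics_for_partition(self, **kwargs):
        self.last_request = kwargs
        return {
            "ColumnStatisticsList": [
                {
                    "ColumnName": "price",
                    "ColumnType": "double",
                    "StatisticsData": {
                        "Type": "DOUBLE",
                        "DoubleColumnStatisticsData": {
                            "MinimumValue": 1.5,
                            "MaximumValue": 99.0,
                            "NumberOfNulls": 0,
                            "NumberOfDistinctValues": 42,
                        },
                    },
                }
            ],
            "Errors": [
                {
                    "ColumnName": "missing_col",
                    "Error": {
                        "ErrorCode": "ExampleErrorCode",  # illustrative value only
                        "ErrorMessage": "example message",
                    },
                }
            ],
        }


fake = _FakeGlue()
stats, errors = fetch_partition_stats(
    fake, "sales_db", "orders", ["2020", "06"], ["price", "missing_col"]
)
print(sorted(stats), errors)
```

Because a single call can return both lists, checking Errors even when ColumnStatisticsList is non-empty avoids silently missing columns.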

UpdateColumnStatisticsForTable (new)

Creates or updates table statistics of columns.

See also: AWS API Documentation

Request Syntax

client.update_column_statistics_for_table(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    ColumnStatisticsList=[
        {
            'ColumnName': 'string',
            'ColumnType': 'string',
            'AnalyzedTime': datetime(2015, 1, 1),
            'StatisticsData': {
                'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                'BooleanColumnStatisticsData': {
                    'NumberOfTrues': 123,
                    'NumberOfFalses': 123,
                    'NumberOfNulls': 123
                },
                'DateColumnStatisticsData': {
                    'MinimumValue': datetime(2015, 1, 1),
                    'MaximumValue': datetime(2015, 1, 1),
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DecimalColumnStatisticsData': {
                    'MinimumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'MaximumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DoubleColumnStatisticsData': {
                    'MinimumValue': 123.0,
                    'MaximumValue': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'LongColumnStatisticsData': {
                    'MinimumValue': 123,
                    'MaximumValue': 123,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'StringColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'BinaryColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123
                }
            }
        },
    ]
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type ColumnStatisticsList

list

param ColumnStatisticsList

[REQUIRED]

A list of the column statistics.

  • (dict) --

    Defines the statistics for a column.

    • ColumnName (string) -- [REQUIRED]

      The name of the column.

    • ColumnType (string) -- [REQUIRED]

      The type of the column.

    • AnalyzedTime (datetime) -- [REQUIRED]

      The analyzed time of the column statistics.

    • StatisticsData (dict) -- [REQUIRED]

      The statistics of the column.

      • Type (string) -- [REQUIRED]

        The type of column statistics data.

      • BooleanColumnStatisticsData (dict) --

        Boolean Column Statistics Data.

        • NumberOfTrues (integer) -- [REQUIRED]

          Number of true values.

        • NumberOfFalses (integer) -- [REQUIRED]

          Number of false values.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

      • DateColumnStatisticsData (dict) --

        Date Column Statistics Data.

        • MinimumValue (datetime) --

          Minimum value of the column.

        • MaximumValue (datetime) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • DecimalColumnStatisticsData (dict) --

        Decimal Column Statistics Data.

        • MinimumValue (dict) --

          Minimum value of the column.

          • UnscaledValue (bytes) -- [REQUIRED]

            The unscaled numeric value.

          • Scale (integer) -- [REQUIRED]

            The scale that determines where the decimal point falls in the unscaled value.

        • MaximumValue (dict) --

          Maximum value of the column.

          • UnscaledValue (bytes) -- [REQUIRED]

            The unscaled numeric value.

          • Scale (integer) -- [REQUIRED]

            The scale that determines where the decimal point falls in the unscaled value.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • DoubleColumnStatisticsData (dict) --

        Double Column Statistics Data.

        • MinimumValue (float) --

          Minimum value of the column.

        • MaximumValue (float) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • LongColumnStatisticsData (dict) --

        Long Column Statistics Data.

        • MinimumValue (integer) --

          Minimum value of the column.

        • MaximumValue (integer) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • StringColumnStatisticsData (dict) --

        String Column Statistics Data.

        • MaximumLength (integer) -- [REQUIRED]

          Maximum length of the column.

        • AverageLength (float) -- [REQUIRED]

          Average length of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • BinaryColumnStatisticsData (dict) --

        Binary Column Statistics Data.

        • MaximumLength (integer) -- [REQUIRED]

          Maximum length of the column.

        • AverageLength (float) -- [REQUIRED]

          Average length of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.
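
The following is a small sketch, not an official helper, of how one ColumnStatisticsList entry for a LONG column might be assembled so that every field marked [REQUIRED] above is present. The function name long_column_statistics, the column name, and the "bigint" column type string are assumptions made for illustration.

```python
from datetime import datetime, timezone


def long_column_statistics(name, column_type, nulls, distinct, minimum=None, maximum=None):
    """Build one ColumnStatisticsList entry for a LONG column."""
    data = {"NumberOfNulls": nulls, "NumberOfDistinctValues": distinct}
    # MinimumValue and MaximumValue are not marked [REQUIRED],
    # so they are included only when the caller knows them.
    if minimum is not None:
        data["MinimumValue"] = minimum
    if maximum is not None:
        data["MaximumValue"] = maximum
    return {
        "ColumnName": name,
        "ColumnType": column_type,
        "AnalyzedTime": datetime.now(timezone.utc),
        "StatisticsData": {"Type": "LONG", "LongColumnStatisticsData": data},
    }


entry = long_column_statistics(
    "order_id", "bigint", nulls=0, distinct=1000, minimum=1, maximum=1000
)
print(entry["StatisticsData"])
```

A list of such entries could then be passed as ColumnStatisticsList to update_column_statistics_for_table.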

rtype

dict

returns

Response Syntax

{
    'Errors': [
        {
            'ColumnStatistics': {
                'ColumnName': 'string',
                'ColumnType': 'string',
                'AnalyzedTime': datetime(2015, 1, 1),
                'StatisticsData': {
                    'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                    'BooleanColumnStatisticsData': {
                        'NumberOfTrues': 123,
                        'NumberOfFalses': 123,
                        'NumberOfNulls': 123
                    },
                    'DateColumnStatisticsData': {
                        'MinimumValue': datetime(2015, 1, 1),
                        'MaximumValue': datetime(2015, 1, 1),
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DecimalColumnStatisticsData': {
                        'MinimumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'MaximumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DoubleColumnStatisticsData': {
                        'MinimumValue': 123.0,
                        'MaximumValue': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'LongColumnStatisticsData': {
                        'MinimumValue': 123,
                        'MaximumValue': 123,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'StringColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'BinaryColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123
                    }
                }
            },
            'Error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • Errors (list) --

      List of ColumnStatisticsErrors.

      • (dict) --

        Describes the column statistics that failed and the associated error.

        • ColumnStatistics (dict) --

          The ColumnStatistics of the column.

          • ColumnName (string) --

            The name of the column.

          • ColumnType (string) --

            The type of the column.

          • AnalyzedTime (datetime) --

            The analyzed time of the column statistics.

          • StatisticsData (dict) --

            The statistics of the column.

            • Type (string) --

              The type of column statistics data.

            • BooleanColumnStatisticsData (dict) --

              Boolean Column Statistics Data.

              • NumberOfTrues (integer) --

                Number of true values.

              • NumberOfFalses (integer) --

                Number of false values.

              • NumberOfNulls (integer) --

                Number of nulls.

            • DateColumnStatisticsData (dict) --

              Date Column Statistics Data.

              • MinimumValue (datetime) --

                Minimum value of the column.

              • MaximumValue (datetime) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • DecimalColumnStatisticsData (dict) --

              Decimal Column Statistics Data.

              • MinimumValue (dict) --

                Minimum value of the column.

                • UnscaledValue (bytes) --

                  The unscaled numeric value.

                • Scale (integer) --

                  The scale that determines where the decimal point falls in the unscaled value.

              • MaximumValue (dict) --

                Maximum value of the column.

                • UnscaledValue (bytes) --

                  The unscaled numeric value.

                • Scale (integer) --

                  The scale that determines where the decimal point falls in the unscaled value.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • DoubleColumnStatisticsData (dict) --

              Double Column Statistics Data.

              • MinimumValue (float) --

                Minimum value of the column.

              • MaximumValue (float) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • LongColumnStatisticsData (dict) --

              Long Column Statistics Data.

              • MinimumValue (integer) --

                Minimum value of the column.

              • MaximumValue (integer) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • StringColumnStatisticsData (dict) --

              String Column Statistics Data.

              • MaximumLength (integer) --

                Maximum length of the column.

              • AverageLength (float) --

                Average length of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • BinaryColumnStatisticsData (dict) --

              Binary Column Statistics Data.

              • MaximumLength (integer) --

                Maximum length of the column.

              • AverageLength (float) --

                Average length of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

        • Error (dict) --

          The error that occurred during the operation.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.
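
Since the update response carries only an Errors list, one plausible way to check the outcome is to map each failed column to its error details. The sketch below works on a plain dict shaped like the response above; the function name failed_updates and the sample values are hypothetical.

```python
def failed_updates(response):
    """Map each column that failed to update to its (code, message) pair."""
    return {
        item["ColumnStatistics"]["ColumnName"]: (
            item["Error"]["ErrorCode"],
            item["Error"]["ErrorMessage"],
        )
        for item in response.get("Errors", [])
    }


# Hand-written sample shaped like the documented response (values are illustrative).
sample_response = {
    "Errors": [
        {
            "ColumnStatistics": {
                "ColumnName": "order_id",
                "ColumnType": "bigint",
                "StatisticsData": {"Type": "LONG"},
            },
            "Error": {
                "ErrorCode": "ExampleErrorCode",
                "ErrorMessage": "example message",
            },
        }
    ]
}

failures = failed_updates(sample_response)
print(failures)
```

An empty result from this helper means the response reported no per-column failures.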

DeleteColumnStatisticsForPartition (new)

Deletes the partition column statistics of a column.

See also: AWS API Documentation

Request Syntax

client.delete_column_statistics_for_partition(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    PartitionValues=[
        'string',
    ],
    ColumnName='string'
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type PartitionValues

list

param PartitionValues

[REQUIRED]

A list of partition values identifying the partition.

  • (string) --

type ColumnName

string

param ColumnName

[REQUIRED]

Name of the column.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
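
Because this operation takes a single ColumnName rather than a list, clearing statistics for several columns means one call per column. The loop below is a sketch under that reading of the request syntax; delete_partition_stats and the _RecordingClient stand-in are hypothetical names, and the stand-in only records calls instead of contacting AWS.

```python
def delete_partition_stats(client, database, table, partition_values, column_names):
    """Issue one delete call per column and return the names processed, in order."""
    deleted = []
    for name in column_names:
        client.delete_column_statistics_for_partition(
            DatabaseName=database,
            TableName=table,
            PartitionValues=partition_values,
            ColumnName=name,
        )
        deleted.append(name)
    return deleted


class _RecordingClient:
    """Hypothetical stand-in client that records each request it receives."""

    def __init__(self):
        self.calls = []

    def delete_column_statistics_for_partition(self, **kwargs):
        self.calls.append(kwargs)
        return {}  # the documented response is an empty dict


recorder = _RecordingClient()
done = delete_partition_stats(
    recorder, "sales_db", "orders", ["2020", "06"], ["price", "quantity"]
)
print(done, len(recorder.calls))
```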

DeleteColumnStatisticsForTable (new)

Deletes table statistics of columns.

See also: AWS API Documentation

Request Syntax

client.delete_column_statistics_for_table(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    ColumnName='string'
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type ColumnName

string

param ColumnName

[REQUIRED]

The name of the column.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --

GetColumnStatisticsForTable (new)

Retrieves table statistics of columns.

See also: AWS API Documentation

Request Syntax

client.get_column_statistics_for_table(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    ColumnNames=[
        'string',
    ]
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type ColumnNames

list

param ColumnNames

[REQUIRED]

A list of the column names.

  • (string) --

rtype

dict

returns

Response Syntax

{
    'ColumnStatisticsList': [
        {
            'ColumnName': 'string',
            'ColumnType': 'string',
            'AnalyzedTime': datetime(2015, 1, 1),
            'StatisticsData': {
                'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                'BooleanColumnStatisticsData': {
                    'NumberOfTrues': 123,
                    'NumberOfFalses': 123,
                    'NumberOfNulls': 123
                },
                'DateColumnStatisticsData': {
                    'MinimumValue': datetime(2015, 1, 1),
                    'MaximumValue': datetime(2015, 1, 1),
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DecimalColumnStatisticsData': {
                    'MinimumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'MaximumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DoubleColumnStatisticsData': {
                    'MinimumValue': 123.0,
                    'MaximumValue': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'LongColumnStatisticsData': {
                    'MinimumValue': 123,
                    'MaximumValue': 123,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'StringColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'BinaryColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123
                }
            }
        },
    ],
    'Errors': [
        {
            'ColumnName': 'string',
            'Error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • ColumnStatisticsList (list) --

      List of ColumnStatistics that were successfully retrieved.

      • (dict) --

        Defines the statistics for a column.

        • ColumnName (string) --

          The name of the column.

        • ColumnType (string) --

          The type of the column.

        • AnalyzedTime (datetime) --

          The analyzed time of the column statistics.

        • StatisticsData (dict) --

          The statistics of the column.

          • Type (string) --

            The type of column statistics data.

          • BooleanColumnStatisticsData (dict) --

            Boolean Column Statistics Data.

            • NumberOfTrues (integer) --

              Number of true values.

            • NumberOfFalses (integer) --

              Number of false values.

            • NumberOfNulls (integer) --

              Number of nulls.

          • DateColumnStatisticsData (dict) --

            Date Column Statistics Data.

            • MinimumValue (datetime) --

              Minimum value of the column.

            • MaximumValue (datetime) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • DecimalColumnStatisticsData (dict) --

            Decimal Column Statistics Data.

            • MinimumValue (dict) --

              Minimum value of the column.

              • UnscaledValue (bytes) --

                The unscaled numeric value.

              • Scale (integer) --

                The scale that determines where the decimal point falls in the unscaled value.

            • MaximumValue (dict) --

              Maximum value of the column.

              • UnscaledValue (bytes) --

                The unscaled numeric value.

              • Scale (integer) --

                The scale that determines where the decimal point falls in the unscaled value.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • DoubleColumnStatisticsData (dict) --

            Double Column Statistics Data.

            • MinimumValue (float) --

              Minimum value of the column.

            • MaximumValue (float) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • LongColumnStatisticsData (dict) --

            Long Column Statistics Data.

            • MinimumValue (integer) --

              Minimum value of the column.

            • MaximumValue (integer) --

              Maximum value of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • StringColumnStatisticsData (dict) --

            String Column Statistics Data.

            • MaximumLength (integer) --

              Maximum length of the column.

            • AverageLength (float) --

              Average length of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

            • NumberOfDistinctValues (integer) --

              Number of distinct values.

          • BinaryColumnStatisticsData (dict) --

            Binary Column Statistics Data.

            • MaximumLength (integer) --

              Maximum length of the column.

            • AverageLength (float) --

              Average length of the column.

            • NumberOfNulls (integer) --

              Number of nulls.

    • Errors (list) --

      List of ColumnStatistics that failed to be retrieved.

      • (dict) --

        Identifies a column for which an error occurred.

        • ColumnName (string) --

          The name of the column.

        • Error (dict) --

          The error that occurred during the operation.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.
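
The member names in StatisticsData follow a visible pattern: each Type value (for example STRING) corresponds to a member named with that word capitalized plus ColumnStatisticsData (StringColumnStatisticsData). Assuming that only the member matching Type is populated, which the reference above does not state explicitly, a reader could select the relevant payload as sketched below. The function name typed_statistics and the sample values are hypothetical.

```python
def typed_statistics(statistics_data):
    """Return the sub-dict whose name matches StatisticsData['Type'], or None if absent."""
    # 'BOOLEAN' -> 'BooleanColumnStatisticsData', 'LONG' -> 'LongColumnStatisticsData', ...
    key = statistics_data["Type"].capitalize() + "ColumnStatisticsData"
    return statistics_data.get(key)


string_stats = {
    "Type": "STRING",
    "StringColumnStatisticsData": {
        "MaximumLength": 12,
        "AverageLength": 7.5,
        "NumberOfNulls": 3,
        "NumberOfDistinctValues": 40,
    },
}

payload = typed_statistics(string_stats)
print(payload["MaximumLength"], payload["NumberOfNulls"])
```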

UpdateColumnStatisticsForPartition (new)

Creates or updates partition statistics of columns.

See also: AWS API Documentation

Request Syntax

client.update_column_statistics_for_partition(
    CatalogId='string',
    DatabaseName='string',
    TableName='string',
    PartitionValues=[
        'string',
    ],
    ColumnStatisticsList=[
        {
            'ColumnName': 'string',
            'ColumnType': 'string',
            'AnalyzedTime': datetime(2015, 1, 1),
            'StatisticsData': {
                'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                'BooleanColumnStatisticsData': {
                    'NumberOfTrues': 123,
                    'NumberOfFalses': 123,
                    'NumberOfNulls': 123
                },
                'DateColumnStatisticsData': {
                    'MinimumValue': datetime(2015, 1, 1),
                    'MaximumValue': datetime(2015, 1, 1),
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DecimalColumnStatisticsData': {
                    'MinimumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'MaximumValue': {
                        'UnscaledValue': b'bytes',
                        'Scale': 123
                    },
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'DoubleColumnStatisticsData': {
                    'MinimumValue': 123.0,
                    'MaximumValue': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'LongColumnStatisticsData': {
                    'MinimumValue': 123,
                    'MaximumValue': 123,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'StringColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123,
                    'NumberOfDistinctValues': 123
                },
                'BinaryColumnStatisticsData': {
                    'MaximumLength': 123,
                    'AverageLength': 123.0,
                    'NumberOfNulls': 123
                }
            }
        },
    ]
)
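
The syntax above lists every statistics member, but a single entry would normally populate only the member that matches its Type. The sketch below assembles the keyword arguments for one LONG column; the helper name build_long_column_request, the database, table, and column names, and the "bigint" column type are all hypothetical values chosen for illustration.

```python
from datetime import datetime, timezone


def build_long_column_request(database, table, partition_values, column,
                              minimum, maximum, nulls, distinct):
    """Assemble keyword arguments for update_column_statistics_for_partition."""
    return {
        # CatalogId is omitted, so the AWS account ID is used by default.
        "DatabaseName": database,
        "TableName": table,
        "PartitionValues": list(partition_values),
        "ColumnStatisticsList": [
            {
                "ColumnName": column,
                "ColumnType": "bigint",  # illustrative column type
                "AnalyzedTime": datetime.now(timezone.utc),
                "StatisticsData": {
                    "Type": "LONG",
                    "LongColumnStatisticsData": {
                        "MinimumValue": minimum,
                        "MaximumValue": maximum,
                        "NumberOfNulls": nulls,
                        "NumberOfDistinctValues": distinct,
                    },
                },
            }
        ],
    }


request = build_long_column_request(
    "sales_db", "orders", ["2020", "06"], "order_id",
    minimum=1, maximum=9000, nulls=0, distinct=9000,
)
# With boto3 installed and credentials configured, the call would be:
#   import boto3
#   response = boto3.client("glue").update_column_statistics_for_partition(**request)
```

Building the arguments separately from the call makes the request easy to inspect or log before it is sent.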
type CatalogId

string

param CatalogId

The ID of the Data Catalog where the partitions in question reside. If none is supplied, the AWS account ID is used by default.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The name of the catalog database where the partitions reside.

type TableName

string

param TableName

[REQUIRED]

The name of the partitions' table.

type PartitionValues

list

param PartitionValues

[REQUIRED]

A list of partition values identifying the partition.

  • (string) --

type ColumnStatisticsList

list

param ColumnStatisticsList

[REQUIRED]

A list of the column statistics.

  • (dict) --

    Defines the statistics for a single column.

    • ColumnName (string) -- [REQUIRED]

      The name of the column.

    • ColumnType (string) -- [REQUIRED]

      The type of the column.

    • AnalyzedTime (datetime) -- [REQUIRED]

      The time at which the column statistics were generated.

    • StatisticsData (dict) -- [REQUIRED]

      The statistics of the column.

      • Type (string) -- [REQUIRED]

        The type of the column statistics data.

      • BooleanColumnStatisticsData (dict) --

        Boolean Column Statistics Data.

        • NumberOfTrues (integer) -- [REQUIRED]

          Number of true values.

        • NumberOfFalses (integer) -- [REQUIRED]

          Number of false values.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

      • DateColumnStatisticsData (dict) --

        Date Column Statistics Data.

        • MinimumValue (datetime) --

          Minimum value of the column.

        • MaximumValue (datetime) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • DecimalColumnStatisticsData (dict) --

        Decimal Column Statistics Data.

        • MinimumValue (dict) --

          Minimum value of the column.

          • UnscaledValue (bytes) -- [REQUIRED]

            The unscaled numeric value.

          • Scale (integer) -- [REQUIRED]

            The scale that determines where the decimal point falls in the unscaled value.

        • MaximumValue (dict) --

          Maximum value of the column.

          • UnscaledValue (bytes) -- [REQUIRED]

            The unscaled numeric value.

          • Scale (integer) -- [REQUIRED]

            The scale that determines where the decimal point falls in the unscaled value.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • DoubleColumnStatisticsData (dict) --

        Double Column Statistics Data.

        • MinimumValue (float) --

          Minimum value of the column.

        • MaximumValue (float) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • LongColumnStatisticsData (dict) --

        Long Column Statistics Data.

        • MinimumValue (integer) --

          Minimum value of the column.

        • MaximumValue (integer) --

          Maximum value of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • StringColumnStatisticsData (dict) --

        String Column Statistics Data.

        • MaximumLength (integer) -- [REQUIRED]

          Maximum length of the column.

        • AverageLength (float) -- [REQUIRED]

          Average length of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.

        • NumberOfDistinctValues (integer) -- [REQUIRED]

          Number of distinct values.

      • BinaryColumnStatisticsData (dict) --

        Binary Column Statistics Data.

        • MaximumLength (integer) -- [REQUIRED]

          Maximum length of the column.

        • AverageLength (float) -- [REQUIRED]

          Average length of the column.

        • NumberOfNulls (integer) -- [REQUIRED]

          Number of nulls.
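
The decimal MinimumValue and MaximumValue members split a number into UnscaledValue and Scale, so that the value equals the unscaled integer multiplied by 10 to the power of minus Scale. The sketch below converts between Python's Decimal and that pair. This document does not state the byte layout of UnscaledValue; the big-endian two's-complement encoding used here is an assumption, and the helper names to_decimal_number and from_decimal_number are hypothetical.

```python
from decimal import Decimal


def to_decimal_number(value):
    """Split a finite Decimal into an UnscaledValue/Scale pair."""
    if not value.is_finite():
        raise ValueError("only finite decimals can be encoded")
    exponent = value.as_tuple().exponent
    scale = max(0, -exponent)            # digits after the decimal point
    unscaled = int(value.scaleb(scale))  # shift the point right by `scale`
    # ASSUMPTION: big-endian two's-complement bytes, sized to fit the sign bit.
    length = (unscaled.bit_length() + 8) // 8
    return {
        "UnscaledValue": unscaled.to_bytes(length, "big", signed=True),
        "Scale": scale,
    }


def from_decimal_number(number):
    """Rebuild a Decimal from an UnscaledValue/Scale pair."""
    unscaled = int.from_bytes(number["UnscaledValue"], "big", signed=True)
    return Decimal(unscaled).scaleb(-number["Scale"])


encoded = to_decimal_number(Decimal("123.45"))
decoded = from_decimal_number(encoded)
```

Here 123.45 has an unscaled value of 12345 and a scale of 2; decoding the pair gives back the original Decimal.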

rtype

dict

returns

Response Syntax

{
    'Errors': [
        {
            'ColumnStatistics': {
                'ColumnName': 'string',
                'ColumnType': 'string',
                'AnalyzedTime': datetime(2015, 1, 1),
                'StatisticsData': {
                    'Type': 'BOOLEAN'|'DATE'|'DECIMAL'|'DOUBLE'|'LONG'|'STRING'|'BINARY',
                    'BooleanColumnStatisticsData': {
                        'NumberOfTrues': 123,
                        'NumberOfFalses': 123,
                        'NumberOfNulls': 123
                    },
                    'DateColumnStatisticsData': {
                        'MinimumValue': datetime(2015, 1, 1),
                        'MaximumValue': datetime(2015, 1, 1),
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DecimalColumnStatisticsData': {
                        'MinimumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'MaximumValue': {
                            'UnscaledValue': b'bytes',
                            'Scale': 123
                        },
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'DoubleColumnStatisticsData': {
                        'MinimumValue': 123.0,
                        'MaximumValue': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'LongColumnStatisticsData': {
                        'MinimumValue': 123,
                        'MaximumValue': 123,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'StringColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123,
                        'NumberOfDistinctValues': 123
                    },
                    'BinaryColumnStatisticsData': {
                        'MaximumLength': 123,
                        'AverageLength': 123.0,
                        'NumberOfNulls': 123
                    }
                }
            },
            'Error': {
                'ErrorCode': 'string',
                'ErrorMessage': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • Errors (list) --

      Errors that occurred while updating the column statistics data.

      • (dict) --

        Identifies a column for which an error occurred.

        • ColumnStatistics (dict) --

          The ColumnStatistics of the column.

          • ColumnName (string) --

            The name of the column.

          • ColumnType (string) --

            The type of the column.

          • AnalyzedTime (datetime) --

            The time at which the column statistics were generated.

          • StatisticsData (dict) --

            The statistics of the column.

            • Type (string) --

              The type of the column statistics data.

            • BooleanColumnStatisticsData (dict) --

              Boolean Column Statistics Data.

              • NumberOfTrues (integer) --

                Number of true values.

              • NumberOfFalses (integer) --

                Number of false values.

              • NumberOfNulls (integer) --

                Number of nulls.

            • DateColumnStatisticsData (dict) --

              Date Column Statistics Data.

              • MinimumValue (datetime) --

                Minimum value of the column.

              • MaximumValue (datetime) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • DecimalColumnStatisticsData (dict) --

              Decimal Column Statistics Data.

              • MinimumValue (dict) --

                Minimum value of the column.

                • UnscaledValue (bytes) --

                  The unscaled numeric value.

                • Scale (integer) --

                  The scale that determines where the decimal point falls in the unscaled value.

              • MaximumValue (dict) --

                Maximum value of the column.

                • UnscaledValue (bytes) --

                  The unscaled numeric value.

                • Scale (integer) --

                  The scale that determines where the decimal point falls in the unscaled value.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • DoubleColumnStatisticsData (dict) --

              Double Column Statistics Data.

              • MinimumValue (float) --

                Minimum value of the column.

              • MaximumValue (float) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • LongColumnStatisticsData (dict) --

              Long Column Statistics Data.

              • MinimumValue (integer) --

                Minimum value of the column.

              • MaximumValue (integer) --

                Maximum value of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • StringColumnStatisticsData (dict) --

              String Column Statistics Data.

              • MaximumLength (integer) --

                Maximum length of the column.

              • AverageLength (float) --

                Average length of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

              • NumberOfDistinctValues (integer) --

                Number of distinct values.

            • BinaryColumnStatisticsData (dict) --

              Binary Column Statistics Data.

              • MaximumLength (integer) --

                Maximum length of the column.

              • AverageLength (float) --

                Average length of the column.

              • NumberOfNulls (integer) --

                Number of nulls.

        • Error (dict) --

          The error that occurred during the operation.

          • ErrorCode (string) --

            The code associated with this error.

          • ErrorMessage (string) --

            A message describing the error.
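
Each entry in Errors carries the full ColumnStatistics that was rejected, so a caller can report the failures and resubmit those entries. The following is a minimal sketch, assuming a response dict shaped like the structure above; the helper name split_update_errors and the sample error code and message are hypothetical.

```python
def split_update_errors(response):
    """Return (failures, retry_list) extracted from an update response.

    failures   -- list of (ColumnName, ErrorCode) pairs for reporting
    retry_list -- rejected ColumnStatistics dicts, reusable as a new
                  ColumnStatisticsList once the cause has been fixed
    """
    failures = []
    retry_list = []
    for entry in response.get("Errors", []):
        stats = entry.get("ColumnStatistics", {})
        error = entry.get("Error", {})
        failures.append((stats.get("ColumnName"), error.get("ErrorCode")))
        retry_list.append(stats)
    return failures, retry_list


# Hand-written sample response; the values are illustrative only.
sample_response = {
    "Errors": [
        {
            "ColumnStatistics": {
                "ColumnName": "order_id",
                "ColumnType": "bigint",
                "StatisticsData": {"Type": "LONG"},
            },
            "Error": {
                "ErrorCode": "InvalidInputException",
                "ErrorMessage": "Statistics type does not match column type",
            },
        }
    ]
}
failures, retry_list = split_update_errors(sample_response)
```

An empty failures list indicates that no per-column errors were reported in the response.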