AWS Glue

2022/02/02 - AWS Glue - 5 updated api methods

Changes: Launch Protobuf support for AWS Glue Schema Registry

CheckSchemaVersionValidity (updated)
Changes (request)
{'DataFormat': {'PROTOBUF'}}

Validates the supplied schema. This call has no side effects; it simply validates the supplied schema, using DataFormat as the format. Because it does not take a schema set name, no compatibility checks are performed.

See also: AWS API Documentation

Request Syntax

client.check_schema_version_validity(
    DataFormat='AVRO'|'JSON'|'PROTOBUF',
    SchemaDefinition='string'
)
type DataFormat

string

param DataFormat

[REQUIRED]

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

type SchemaDefinition

string

param SchemaDefinition

[REQUIRED]

The definition of the schema that has to be validated.

rtype

dict

returns

Response Syntax

{
    'Valid': True|False,
    'Error': 'string'
}

Response Structure

  • (dict) --

    • Valid (boolean) --

      Returns true if the schema is valid and false otherwise.

    • Error (string) --

      A validation failure error message.
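
As a minimal sketch of how the new PROTOBUF value might be used with this call: the helper below assumes `client` is a Glue client such as one created with `boto3.client("glue")`. The function name `validate_protobuf_schema` and the sample `Order` message are hypothetical and serve only as illustration.

```python
# Hypothetical Protobuf definition used only for illustration.
PROTO_DEFINITION = """
syntax = "proto3";

message Order {
  string order_id = 1;
  int64 quantity = 2;
}
"""


def validate_protobuf_schema(client, definition):
    """Return (is_valid, error_message) for a Protobuf schema definition.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    """
    response = client.check_schema_version_validity(
        DataFormat="PROTOBUF",
        SchemaDefinition=definition,
    )
    # 'Error' carries the validation failure message, so read it defensively
    # in case it is absent for a valid schema.
    return response["Valid"], response.get("Error")
```

Because the call has no side effects, a check like this could run before CreateSchema to catch a malformed definition early.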

CreateSchema (updated)
Changes (both)
{'DataFormat': {'PROTOBUF'}}

Creates a new schema set and registers the schema definition. If the schema set already exists, returns an error without registering the version.

When the schema set is created, a version checkpoint will be set to the first version. Compatibility mode "DISABLED" restricts any additional schema versions from being added after the first schema version. For all other compatibility modes, validation of compatibility settings will be applied only from the second version onwards when the RegisterSchemaVersion API is used.

When this API is called without a RegistryId, it creates an entry for a "default-registry" in the registry database tables if one is not already present.

See also: AWS API Documentation

Request Syntax

client.create_schema(
    RegistryId={
        'RegistryName': 'string',
        'RegistryArn': 'string'
    },
    SchemaName='string',
    DataFormat='AVRO'|'JSON'|'PROTOBUF',
    Compatibility='NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL',
    Description='string',
    Tags={
        'string': 'string'
    },
    SchemaDefinition='string'
)
type RegistryId

dict

param RegistryId

A wrapper shape that contains the registry identity fields. If it is not provided, the default registry is used. The ARN of the default registry has the format: arn:aws:glue:us-east-2:<customer id>:registry/default-registry:random-5-letter-id.

  • RegistryName (string) --

    Name of the registry. Used only for lookup. One of RegistryArn or RegistryName has to be provided.

  • RegistryArn (string) --

    ARN of the registry to be updated. One of RegistryArn or RegistryName has to be provided.

type SchemaName

string

param SchemaName

[REQUIRED]

Name of the schema to be created, with a maximum length of 255. It may contain only letters, numbers, hyphens, underscores, dollar signs, or hash marks. No whitespace is allowed.

type DataFormat

string

param DataFormat

[REQUIRED]

The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

type Compatibility

string

param Compatibility

The compatibility mode of the schema. The possible values are:

  • NONE : No compatibility mode applies. You can use this choice in development scenarios or if you do not know the compatibility mode that you want to apply to schemas. Any new version added will be accepted without undergoing a compatibility check.

  • DISABLED : This compatibility choice prevents versioning for a particular schema. You can use this choice to prevent future versioning of a schema.

  • BACKWARD : This compatibility choice is recommended as it allows data receivers to read both the current and one previous schema version. This means that for instance, a new schema version cannot drop data fields or change the type of these fields, so they can't be read by readers using the previous version.

  • BACKWARD_ALL : This compatibility choice allows data receivers to read both the current and all previous schema versions. You can use this choice when you need to delete fields or add optional fields, and check compatibility against all previous schema versions.

  • FORWARD : This compatibility choice allows data receivers to read both the current and one next schema version, but not necessarily later versions. You can use this choice when you need to add fields or delete optional fields, but only check compatibility against the last schema version.

  • FORWARD_ALL : This compatibility choice allows data receivers to read data written by producers of any new registered schema. You can use this choice when you need to add fields or delete optional fields, and check compatibility against all previous schema versions.

  • FULL : This compatibility choice allows data receivers to read data written by producers using the previous or next version of the schema, but not necessarily earlier or later versions. You can use this choice when you need to add or remove optional fields, but only check compatibility against the last schema version.

  • FULL_ALL : This compatibility choice allows data receivers to read data written by producers using all previous schema versions. You can use this choice when you need to add or remove optional fields, and check compatibility against all previous schema versions.

type Description

string

param Description

An optional description of the schema. If no description is provided, no default value is assigned.

type Tags

dict

param Tags

Amazon Web Services tags that contain a key-value pair and may be searched by console, command line, or API. If specified, follows the Amazon Web Services tags-on-create pattern.

  • (string) --

    • (string) --

type SchemaDefinition

string

param SchemaDefinition

The schema definition for SchemaName, written in the format given by the DataFormat setting.

rtype

dict

returns

Response Syntax

{
    'RegistryName': 'string',
    'RegistryArn': 'string',
    'SchemaName': 'string',
    'SchemaArn': 'string',
    'Description': 'string',
    'DataFormat': 'AVRO'|'JSON'|'PROTOBUF',
    'Compatibility': 'NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL',
    'SchemaCheckpoint': 123,
    'LatestSchemaVersion': 123,
    'NextSchemaVersion': 123,
    'SchemaStatus': 'AVAILABLE'|'PENDING'|'DELETING',
    'Tags': {
        'string': 'string'
    },
    'SchemaVersionId': 'string',
    'SchemaVersionStatus': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING'
}

Response Structure

  • (dict) --

    • RegistryName (string) --

      The name of the registry.

    • RegistryArn (string) --

      The Amazon Resource Name (ARN) of the registry.

    • SchemaName (string) --

      The name of the schema.

    • SchemaArn (string) --

      The Amazon Resource Name (ARN) of the schema.

    • Description (string) --

      A description of the schema if specified when created.

    • DataFormat (string) --

      The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

    • Compatibility (string) --

      The schema compatibility mode.

    • SchemaCheckpoint (integer) --

      The version number of the checkpoint (the last time the compatibility mode was changed).

    • LatestSchemaVersion (integer) --

      The latest version of the schema associated with the returned schema definition.

    • NextSchemaVersion (integer) --

      The next version of the schema associated with the returned schema definition.

    • SchemaStatus (string) --

      The status of the schema.

    • Tags (dict) --

      The tags for the schema.

      • (string) --

        • (string) --

    • SchemaVersionId (string) --

      The unique identifier of the first schema version.

    • SchemaVersionStatus (string) --

      The status of the first schema version created.
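
The request shape above can be sketched as a small helper, assuming `client` is a Glue client such as one created with `boto3.client("glue")`. The function name `create_protobuf_schema` and its arguments are hypothetical. BACKWARD is chosen only because the compatibility list describes it as the recommended mode; RegistryId is omitted when no registry name is given, in which case the default registry applies.

```python
def create_protobuf_schema(client, schema_name, definition, registry_name=None):
    """Create a schema set whose first version is a Protobuf definition.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    Returns the raw CreateSchema response dict.
    """
    kwargs = {
        "SchemaName": schema_name,
        "DataFormat": "PROTOBUF",
        # Illustrative choice; any documented compatibility mode could be used.
        "Compatibility": "BACKWARD",
        "SchemaDefinition": definition,
    }
    if registry_name is not None:
        # Without RegistryId the service falls back to the default registry.
        kwargs["RegistryId"] = {"RegistryName": registry_name}
    return client.create_schema(**kwargs)
```

A caller would typically read `SchemaArn` and `SchemaVersionId` from the returned dict and check `SchemaVersionStatus` before using the first version.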

GetSchema (updated)
Changes (response)
{'DataFormat': {'PROTOBUF'}}

Describes the specified schema in detail.

See also: AWS API Documentation

Request Syntax

client.get_schema(
    SchemaId={
        'SchemaArn': 'string',
        'SchemaName': 'string',
        'RegistryName': 'string'
    }
)
type SchemaId

dict

param SchemaId

[REQUIRED]

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.

  • SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.

  • SchemaArn (string) --

    The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaName (string) --

    The name of the schema. One of SchemaArn or SchemaName has to be provided.

  • RegistryName (string) --

    The name of the schema registry that contains the schema.

rtype

dict

returns

Response Syntax

{
    'RegistryName': 'string',
    'RegistryArn': 'string',
    'SchemaName': 'string',
    'SchemaArn': 'string',
    'Description': 'string',
    'DataFormat': 'AVRO'|'JSON'|'PROTOBUF',
    'Compatibility': 'NONE'|'DISABLED'|'BACKWARD'|'BACKWARD_ALL'|'FORWARD'|'FORWARD_ALL'|'FULL'|'FULL_ALL',
    'SchemaCheckpoint': 123,
    'LatestSchemaVersion': 123,
    'NextSchemaVersion': 123,
    'SchemaStatus': 'AVAILABLE'|'PENDING'|'DELETING',
    'CreatedTime': 'string',
    'UpdatedTime': 'string'
}

Response Structure

  • (dict) --

    • RegistryName (string) --

      The name of the registry.

    • RegistryArn (string) --

      The Amazon Resource Name (ARN) of the registry.

    • SchemaName (string) --

      The name of the schema.

    • SchemaArn (string) --

      The Amazon Resource Name (ARN) of the schema.

    • Description (string) --

      A description of the schema if specified when created.

    • DataFormat (string) --

      The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

    • Compatibility (string) --

      The compatibility mode of the schema.

    • SchemaCheckpoint (integer) --

      The version number of the checkpoint (the last time the compatibility mode was changed).

    • LatestSchemaVersion (integer) --

      The latest version of the schema associated with the returned schema definition.

    • NextSchemaVersion (integer) --

      The next version of the schema associated with the returned schema definition.

    • SchemaStatus (string) --

      The status of the schema.

    • CreatedTime (string) --

      The date and time the schema was created.

    • UpdatedTime (string) --

      The date and time the schema was updated.
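
The SchemaId wrapper accepts either an ARN or a schema name together with a registry name. The sketch below shows one way to build it and read back the data format, assuming `client` is a Glue client such as one created with `boto3.client("glue")`. Both function names are hypothetical.

```python
def build_schema_id(schema_arn=None, schema_name=None, registry_name=None):
    """Build a SchemaId dict from an ARN, or from a schema name plus registry name."""
    if schema_arn is not None:
        return {"SchemaArn": schema_arn}
    if schema_name is not None and registry_name is not None:
        return {"SchemaName": schema_name, "RegistryName": registry_name}
    raise ValueError("Provide schema_arn, or both schema_name and registry_name")


def describe_schema(client, **identity):
    """Return (DataFormat, Compatibility) for the identified schema.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    """
    response = client.get_schema(SchemaId=build_schema_id(**identity))
    return response["DataFormat"], response["Compatibility"]
```

For a schema registered with the Protobuf format, the first element of the returned tuple would be "PROTOBUF".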

GetSchemaByDefinition (updated)
Changes (response)
{'DataFormat': {'PROTOBUF'}}

Retrieves a schema by the SchemaDefinition. The schema definition is sent to the Schema Registry, canonicalized, and hashed. If the hash is matched within the scope of the SchemaName or ARN (or the default registry, if none is supplied), that schema’s metadata is returned. Otherwise, a 404 or NotFound error is returned. Schema versions in Deleted status are not included in the results.

See also: AWS API Documentation

Request Syntax

client.get_schema_by_definition(
    SchemaId={
        'SchemaArn': 'string',
        'SchemaName': 'string',
        'RegistryName': 'string'
    },
    SchemaDefinition='string'
)
type SchemaId

dict

param SchemaId

[REQUIRED]

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaId$SchemaName: The name of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaArn (string) --

    The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaName (string) --

    The name of the schema. One of SchemaArn or SchemaName has to be provided.

  • RegistryName (string) --

    The name of the schema registry that contains the schema.

type SchemaDefinition

string

param SchemaDefinition

[REQUIRED]

The definition of the schema for which schema details are required.

rtype

dict

returns

Response Syntax

{
    'SchemaVersionId': 'string',
    'SchemaArn': 'string',
    'DataFormat': 'AVRO'|'JSON'|'PROTOBUF',
    'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING',
    'CreatedTime': 'string'
}

Response Structure

  • (dict) --

    • SchemaVersionId (string) --

      The schema ID of the schema version.

    • SchemaArn (string) --

      The Amazon Resource Name (ARN) of the schema.

    • DataFormat (string) --

      The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

    • Status (string) --

      The status of the schema version.

    • CreatedTime (string) --

      The date and time the schema was created.
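
A lookup by definition might be wrapped as follows so that a missing definition yields None rather than an exception. This is a sketch under two assumptions: `client` is a Glue client such as one created with `boto3.client("glue")`, and the NotFound case surfaces as `client.exceptions.EntityNotFoundException`. The function name `find_schema_version_id` is hypothetical.

```python
def find_schema_version_id(client, schema_name, registry_name, definition):
    """Return the SchemaVersionId matching `definition`, or None if not found.

    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    """
    try:
        response = client.get_schema_by_definition(
            SchemaId={"SchemaName": schema_name, "RegistryName": registry_name},
            SchemaDefinition=definition,
        )
    except client.exceptions.EntityNotFoundException:
        # Assumption: an unmatched definition is reported with this exception.
        return None
    return response["SchemaVersionId"]
```

Since the registry canonicalizes the definition before hashing, the string passed here does not necessarily have to be byte-identical to the one originally registered.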

GetSchemaVersion (updated)
Changes (response)
{'DataFormat': {'PROTOBUF'}}

Gets the specified schema by the unique ID assigned when a version of the schema is created or registered. Schema versions in Deleted status will not be included in the results.

See also: AWS API Documentation

Request Syntax

client.get_schema_version(
    SchemaId={
        'SchemaArn': 'string',
        'SchemaName': 'string',
        'RegistryName': 'string'
    },
    SchemaVersionId='string',
    SchemaVersionNumber={
        'LatestVersion': True|False,
        'VersionNumber': 123
    }
)
type SchemaId

dict

param SchemaId

This is a wrapper structure to contain schema identity fields. The structure contains:

  • SchemaId$SchemaArn: The Amazon Resource Name (ARN) of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.

  • SchemaId$SchemaName: The name of the schema. Either SchemaArn or SchemaName and RegistryName has to be provided.

  • SchemaArn (string) --

    The Amazon Resource Name (ARN) of the schema. One of SchemaArn or SchemaName has to be provided.

  • SchemaName (string) --

    The name of the schema. One of SchemaArn or SchemaName has to be provided.

  • RegistryName (string) --

    The name of the schema registry that contains the schema.

type SchemaVersionId

string

param SchemaVersionId

The SchemaVersionId of the schema version. This field is required for fetching by schema ID. Either this or the SchemaId wrapper has to be provided.

type SchemaVersionNumber

dict

param SchemaVersionNumber

The version number of the schema.

  • LatestVersion (boolean) --

    The latest version available for the schema.

  • VersionNumber (integer) --

    The version number of the schema.

rtype

dict

returns

Response Syntax

{
    'SchemaVersionId': 'string',
    'SchemaDefinition': 'string',
    'DataFormat': 'AVRO'|'JSON'|'PROTOBUF',
    'SchemaArn': 'string',
    'VersionNumber': 123,
    'Status': 'AVAILABLE'|'PENDING'|'FAILURE'|'DELETING',
    'CreatedTime': 'string'
}

Response Structure

  • (dict) --

    • SchemaVersionId (string) --

      The SchemaVersionId of the schema version.

    • SchemaDefinition (string) --

      The schema definition for the schema ID.

    • DataFormat (string) --

      The data format of the schema definition. Currently AVRO, JSON and PROTOBUF are supported.

    • SchemaArn (string) --

      The Amazon Resource Name (ARN) of the schema.

    • VersionNumber (integer) --

      The version number of the schema.

    • Status (string) --

      The status of the schema version.

    • CreatedTime (string) --

      The date and time the schema version was created.
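
The two ways of addressing a version (by SchemaVersionId, or by the SchemaId wrapper plus a version number) can be sketched as a single helper. It assumes `client` is a Glue client such as one created with `boto3.client("glue")`. The function name `fetch_schema_definition` and its arguments are hypothetical.

```python
def fetch_schema_definition(client, schema_version_id=None, schema_id=None,
                            version_number=None):
    """Return (DataFormat, SchemaDefinition) for one schema version.

    Pass either `schema_version_id`, or a `schema_id` dict with an optional
    `version_number`; the latest version is requested when the number is omitted.
    `client` is assumed to be a Glue client, e.g. boto3.client("glue").
    """
    if schema_version_id is not None:
        kwargs = {"SchemaVersionId": schema_version_id}
    elif schema_id is not None:
        if version_number is None:
            number = {"LatestVersion": True}
        else:
            number = {"VersionNumber": version_number}
        kwargs = {"SchemaId": schema_id, "SchemaVersionNumber": number}
    else:
        raise ValueError("Provide schema_version_id or schema_id")
    response = client.get_schema_version(**kwargs)
    return response["DataFormat"], response["SchemaDefinition"]
```

Returning the DataFormat alongside the definition lets a consumer decide how to parse the string, for example as a Protobuf file rather than an Avro or JSON schema.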