AWS Glue

2018/02/06 - AWS Glue - 4 updated api methods

Changes: This feature lets customers add a custom JSON classifier. They can specify a JsonPath expression that identifies the object, array, or field of the JSON documents that crawlers should inspect when they crawl JSON files.

CreateClassifier (updated)
Changes (request)
{'JsonClassifier': {'JsonPath': 'string', 'Name': 'string'}}

Creates a classifier in the user's account. This may be a GrokClassifier, an XMLClassifier, or a JsonClassifier, depending on which field of the request is present.

See also: AWS API Documentation

Request Syntax

client.create_classifier(
    GrokClassifier={
        'Classification': 'string',
        'Name': 'string',
        'GrokPattern': 'string',
        'CustomPatterns': 'string'
    },
    XMLClassifier={
        'Classification': 'string',
        'Name': 'string',
        'RowTag': 'string'
    },
    JsonClassifier={
        'Name': 'string',
        'JsonPath': 'string'
    }
)
type GrokClassifier

dict

param GrokClassifier

A GrokClassifier object specifying the classifier to create.

  • Classification (string) -- [REQUIRED]

    An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.

  • Name (string) -- [REQUIRED]

    The name of the new classifier.

  • GrokPattern (string) -- [REQUIRED]

    The grok pattern used by this classifier.

  • CustomPatterns (string) --

    Optional custom grok patterns used by this classifier.
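A minimal sketch of assembling the GrokClassifier argument. It assumes CustomPatterns takes one `NAME regex` definition per line, as in grok pattern files; the helper `format_custom_patterns`, the classifier name, and the `MYTIMESTAMP` pattern are hypothetical.

```python
from typing import Dict


def format_custom_patterns(patterns: Dict[str, str]) -> str:
    """Join named regexes into a CustomPatterns string, one 'NAME regex' per line.

    Assumption: CustomPatterns uses the line-oriented layout of grok pattern files.
    """
    for name in patterns:
        # A pattern name must be a single non-empty token.
        if not name or any(ch.isspace() for ch in name):
            raise ValueError(f"invalid pattern name: {name!r}")
    return "\n".join(f"{name} {regex}" for name, regex in patterns.items())


# Hypothetical classifier: GrokPattern references the custom MYTIMESTAMP pattern.
grok_classifier = {
    "Classification": "app_log",
    "Name": "app-log-classifier",
    "GrokPattern": "%{MYTIMESTAMP:ts} %{GREEDYDATA:message}",
    "CustomPatterns": format_custom_patterns(
        {"MYTIMESTAMP": r"%{YEAR}-%{MONTHNUM}-%{MONTHDAY} %{TIME}"}
    ),
}

# Usage (requires AWS credentials):
#   import boto3
#   boto3.client("glue").create_classifier(GrokClassifier=grok_classifier)
```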

type XMLClassifier

dict

param XMLClassifier

An XMLClassifier object specifying the classifier to create.

  • Classification (string) -- [REQUIRED]

    An identifier of the data format that the classifier matches.

  • Name (string) -- [REQUIRED]

    The name of the classifier.

  • RowTag (string) --

    The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).
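Because a self-closing row element cannot be matched by RowTag, it can help to scan sample files before creating the classifier. A minimal sketch using a regex heuristic (an assumption, not how AWS Glue parses XML; attribute values containing `>` would confuse it); `has_self_closing_rows` is a hypothetical helper.

```python
import re


def has_self_closing_rows(xml_text: str, row_tag: str) -> bool:
    """Return True if any self-closing <row_tag ... /> element appears.

    Regex heuristic only: \\b stops "row" from matching "rows", and
    [^<>]* spans the attributes up to the "/>" terminator.
    """
    pattern = re.compile(r"<" + re.escape(row_tag) + r"\b[^<>]*/>")
    return bool(pattern.search(xml_text))


# The two forms from the RowTag note above:
#   has_self_closing_rows('<row item_a="A" item_b="B" />', "row")       -> True
#   has_self_closing_rows('<row item_a="A" item_b="B"></row>', "row")   -> False
```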

type JsonClassifier

dict

param JsonClassifier

A JsonClassifier object specifying the classifier to create.

  • Name (string) -- [REQUIRED]

    The name of the classifier.

  • JsonPath (string) -- [REQUIRED]

    A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.
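To illustrate what a JsonPath string selects, here is a toy evaluator for a tiny subset (`$`, `.name`, `[*]`, `[n]`). It is a hypothetical sketch, not AWS Glue's implementation, and the subset Glue actually supports is the one described in Writing JsonPath Custom Classifiers. For example, a path such as `$.Records[*]` would select each element of a top-level `Records` array.

```python
import re
from typing import Any, List

# Tokens after the leading "$": ".name", "[*]", or "[<index>]".
_TOKEN = re.compile(r"\.([A-Za-z_][A-Za-z0-9_]*)|\[\*\]|\[(\d+)\]")


def select(path: str, doc: Any) -> List[Any]:
    """Toy JsonPath evaluator: return the nodes of doc matched by path."""
    if not path.startswith("$"):
        raise ValueError("path must start with '$'")
    nodes = [doc]
    pos = 1
    while pos < len(path):
        m = _TOKEN.match(path, pos)
        if m is None:
            raise ValueError(f"unsupported syntax at {path[pos:]!r}")
        name, index = m.group(1), m.group(2)
        nxt: List[Any] = []
        for node in nodes:
            if name is not None:  # .name: child field of an object
                if isinstance(node, dict) and name in node:
                    nxt.append(node[name])
            elif index is not None:  # [n]: one array element
                i = int(index)
                if isinstance(node, list) and i < len(node):
                    nxt.append(node[i])
            elif isinstance(node, list):  # [*]: every array element
                nxt.extend(node)
        nodes = nxt
        pos = m.end()
    return nodes
```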

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
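A minimal sketch of creating a JSON classifier, assuming `client` is a boto3 Glue client such as `boto3.client("glue")`. The helper `create_json_classifier`, the classifier name, and the JsonPath are hypothetical. Only the JsonClassifier field is passed, so the request describes a single classifier type.

```python
from typing import Any, Dict


def create_json_classifier(client: Any, name: str, json_path: str) -> Dict[str, Any]:
    """Hypothetical helper: create a JSON classifier through a boto3 Glue client.

    Name and JsonPath are both [REQUIRED] for CreateClassifier.
    """
    if not name or not json_path:
        raise ValueError("both name and json_path are required")
    # Per the response syntax above, the body is empty; boto3 may also
    # include a ResponseMetadata key in the returned dict.
    return client.create_classifier(
        JsonClassifier={"Name": name, "JsonPath": json_path}
    )


# Usage (requires AWS credentials):
#   import boto3
#   glue = boto3.client("glue")
#   create_json_classifier(glue, "records-classifier", "$.Records[*]")
```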

GetClassifier (updated)
Changes (response)
{'Classifier': {'JsonClassifier': {'CreationTime': 'timestamp',
                                   'JsonPath': 'string',
                                   'LastUpdated': 'timestamp',
                                   'Name': 'string',
                                   'Version': 'long'}}}

Retrieve a classifier by name.

See also: AWS API Documentation

Request Syntax

client.get_classifier(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

Name of the classifier to retrieve.

rtype

dict

returns

Response Syntax

{
    'Classifier': {
        'GrokClassifier': {
            'Name': 'string',
            'Classification': 'string',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'Version': 123,
            'GrokPattern': 'string',
            'CustomPatterns': 'string'
        },
        'XMLClassifier': {
            'Name': 'string',
            'Classification': 'string',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'Version': 123,
            'RowTag': 'string'
        },
        'JsonClassifier': {
            'Name': 'string',
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'Version': 123,
            'JsonPath': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • Classifier (dict) --

      The requested classifier.

      • GrokClassifier (dict) --

        A GrokClassifier object.

        • Name (string) --

          The name of the classifier.

        • Classification (string) --

          An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.

        • CreationTime (datetime) --

          The time this classifier was registered.

        • LastUpdated (datetime) --

          The time this classifier was last updated.

        • Version (integer) --

          The version of this classifier.

        • GrokPattern (string) --

          The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.

        • CustomPatterns (string) --

          Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.

      • XMLClassifier (dict) --

        An XMLClassifier object.

        • Name (string) --

          The name of the classifier.

        • Classification (string) --

          An identifier of the data format that the classifier matches.

        • CreationTime (datetime) --

          The time this classifier was registered.

        • LastUpdated (datetime) --

          The time this classifier was last updated.

        • Version (integer) --

          The version of this classifier.

        • RowTag (string) --

          The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).

      • JsonClassifier (dict) --

        A JsonClassifier object.

        • Name (string) --

          The name of the classifier.

        • CreationTime (datetime) --

          The time this classifier was registered.

        • LastUpdated (datetime) --

          The time this classifier was last updated.

        • Version (integer) --

          The version of this classifier.

        • JsonPath (string) --

          A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.
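Since the response carries the classifier under a type-specific key, callers usually branch on which key is present. A minimal sketch, assuming exactly one of the three keys appears in Classifier; `classifier_kind` is a hypothetical helper.

```python
from typing import Any, Dict, Tuple

_KINDS = ("GrokClassifier", "XMLClassifier", "JsonClassifier")


def classifier_kind(response: Dict[str, Any]) -> Tuple[str, Dict[str, Any]]:
    """Return (kind, details) for the classifier in a GetClassifier response."""
    classifier = response["Classifier"]
    for kind in _KINDS:
        if kind in classifier:
            return kind, classifier[kind]
    raise ValueError("response contains no known classifier type")


# Usage (requires AWS credentials):
#   kind, details = classifier_kind(glue.get_classifier(Name="records-classifier"))
#   if kind == "JsonClassifier":
#       print(details["JsonPath"], details["Version"])
```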

GetClassifiers (updated)
Changes (response)
{'Classifiers': {'JsonClassifier': {'CreationTime': 'timestamp',
                                    'JsonPath': 'string',
                                    'LastUpdated': 'timestamp',
                                    'Name': 'string',
                                    'Version': 'long'}}}

Lists all classifier objects in the Data Catalog.

See also: AWS API Documentation

Request Syntax

client.get_classifiers(
    MaxResults=123,
    NextToken='string'
)
type MaxResults

integer

param MaxResults

The maximum number of classifiers to return in one call (optional).

type NextToken

string

param NextToken

An optional continuation token.

rtype

dict

returns

Response Syntax

{
    'Classifiers': [
        {
            'GrokClassifier': {
                'Name': 'string',
                'Classification': 'string',
                'CreationTime': datetime(2015, 1, 1),
                'LastUpdated': datetime(2015, 1, 1),
                'Version': 123,
                'GrokPattern': 'string',
                'CustomPatterns': 'string'
            },
            'XMLClassifier': {
                'Name': 'string',
                'Classification': 'string',
                'CreationTime': datetime(2015, 1, 1),
                'LastUpdated': datetime(2015, 1, 1),
                'Version': 123,
                'RowTag': 'string'
            },
            'JsonClassifier': {
                'Name': 'string',
                'CreationTime': datetime(2015, 1, 1),
                'LastUpdated': datetime(2015, 1, 1),
                'Version': 123,
                'JsonPath': 'string'
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Classifiers (list) --

      The requested list of classifier objects.

      • (dict) --

        Classifiers are written in Python and triggered during a crawl task. You can write your own classifiers to best categorize your data sources and specify the appropriate schemas to use for them. A classifier checks whether a given file is in a format it can handle, and if it is, the classifier creates a schema in the form of a StructType object that matches that data format.

        A classifier can be a grok classifier, an XML classifier, or a JSON classifier, as specified in one of the fields in the Classifier object.

        • GrokClassifier (dict) --

          A GrokClassifier object.

          • Name (string) --

            The name of the classifier.

          • Classification (string) --

            An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, and so on.

          • CreationTime (datetime) --

            The time this classifier was registered.

          • LastUpdated (datetime) --

            The time this classifier was last updated.

          • Version (integer) --

            The version of this classifier.

          • GrokPattern (string) --

            The grok pattern applied to a data store by this classifier. For more information, see built-in patterns in Writing Custom Classifiers.

          • CustomPatterns (string) --

            Optional custom grok patterns defined by this classifier. For more information, see custom patterns in Writing Custom Classifiers.

        • XMLClassifier (dict) --

          An XMLClassifier object.

          • Name (string) --

            The name of the classifier.

          • Classification (string) --

            An identifier of the data format that the classifier matches.

          • CreationTime (datetime) --

            The time this classifier was registered.

          • LastUpdated (datetime) --

            The time this classifier was last updated.

          • Version (integer) --

            The version of this classifier.

          • RowTag (string) --

            The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).

        • JsonClassifier (dict) --

          A JsonClassifier object.

          • Name (string) --

            The name of the classifier.

          • CreationTime (datetime) --

            The time this classifier was registered.

          • LastUpdated (datetime) --

            The time this classifier was last updated.

          • Version (integer) --

            The version of this classifier.

          • JsonPath (string) --

            A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.

    • NextToken (string) --

      A continuation token.
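A minimal sketch of reading every classifier by following NextToken until it is absent, assuming `client` is a boto3 Glue client; `iter_classifiers` and its `page_size` parameter are hypothetical.

```python
from typing import Any, Dict, Iterator, Optional


def iter_classifiers(client: Any, page_size: Optional[int] = None) -> Iterator[Dict[str, Any]]:
    """Yield every classifier dict, calling GetClassifiers once per page."""
    kwargs: Dict[str, Any] = {}
    if page_size is not None:
        kwargs["MaxResults"] = page_size  # optional page-size limit
    while True:
        page = client.get_classifiers(**kwargs)
        yield from page.get("Classifiers", [])
        token = page.get("NextToken")
        if not token:  # no continuation token: last page reached
            break
        kwargs["NextToken"] = token


# Usage (requires AWS credentials):
#   json_names = [c["JsonClassifier"]["Name"]
#                 for c in iter_classifiers(glue) if "JsonClassifier" in c]
```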

UpdateClassifier (updated)
Changes (request)
{'JsonClassifier': {'JsonPath': 'string', 'Name': 'string'}}

Modifies an existing classifier (a GrokClassifier, XMLClassifier, or JsonClassifier, depending on which field is present).

See also: AWS API Documentation

Request Syntax

client.update_classifier(
    GrokClassifier={
        'Name': 'string',
        'Classification': 'string',
        'GrokPattern': 'string',
        'CustomPatterns': 'string'
    },
    XMLClassifier={
        'Name': 'string',
        'Classification': 'string',
        'RowTag': 'string'
    },
    JsonClassifier={
        'Name': 'string',
        'JsonPath': 'string'
    }
)
type GrokClassifier

dict

param GrokClassifier

A GrokClassifier object with updated fields.

  • Name (string) -- [REQUIRED]

    The name of the GrokClassifier.

  • Classification (string) --

    An identifier of the data format that the classifier matches, such as Twitter, JSON, Omniture logs, Amazon CloudWatch Logs, and so on.

  • GrokPattern (string) --

    The grok pattern used by this classifier.

  • CustomPatterns (string) --

    Optional custom grok patterns used by this classifier.

type XMLClassifier

dict

param XMLClassifier

An XMLClassifier object with updated fields.

  • Name (string) -- [REQUIRED]

    The name of the classifier.

  • Classification (string) --

    An identifier of the data format that the classifier matches.

  • RowTag (string) --

    The XML tag designating the element that contains each record in an XML document being parsed. Note that this cannot identify a self-closing element (closed by />). An empty row element that contains only attributes can be parsed as long as it ends with a closing tag (for example, <row item_a="A" item_b="B"></row> is okay, but <row item_a="A" item_b="B" /> is not).

type JsonClassifier

dict

param JsonClassifier

A JsonClassifier object with updated fields.

  • Name (string) -- [REQUIRED]

    The name of the classifier.

  • JsonPath (string) --

    A JsonPath string defining the JSON data for the classifier to classify. AWS Glue supports a subset of JsonPath, as described in Writing JsonPath Custom Classifiers.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --
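A minimal sketch of changing the JsonPath of an existing JSON classifier. Since only Name is required in the update, the helper sends just Name and the new JsonPath; it first calls get_classifier to confirm the named classifier is a JsonClassifier and skips the update when nothing would change. The helper `update_json_path` is hypothetical, and `client` is assumed to be a boto3 Glue client.

```python
from typing import Any


def update_json_path(client: Any, name: str, new_json_path: str) -> bool:
    """Hypothetical helper: point a JSON classifier at a new JsonPath.

    Returns True if an update was sent, False if the path was already set.
    """
    current = client.get_classifier(Name=name)["Classifier"]
    if "JsonClassifier" not in current:
        raise ValueError(f"{name!r} is not a JSON classifier")
    if current["JsonClassifier"].get("JsonPath") == new_json_path:
        return False  # nothing to change
    # Name identifies the classifier; JsonPath is the only field being updated.
    client.update_classifier(
        JsonClassifier={"Name": name, "JsonPath": new_json_path}
    )
    return True
```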