AWS Glue

2018/01/12 - AWS Glue - 6 updated api methods

Changes: Support has been added for generating ETL scripts in Scala, which AWS Glue ETL jobs can now run. In addition, triggers can now fire when any of their conditions are met, not only when all of them are. Jobs can also be triggered by a "failed" or "stopped" job run, in addition to a "succeeded" one.

CreateScript (updated)
Changes (request, response)
Request
{'Language': 'PYTHON | SCALA'}
Response
{'ScalaCode': 'string'}

Transforms a directed acyclic graph (DAG) into code.

See also: AWS API Documentation

Request Syntax

client.create_script(
    DagNodes=[
        {
            'Id': 'string',
            'NodeType': 'string',
            'Args': [
                {
                    'Name': 'string',
                    'Value': 'string',
                    'Param': True|False
                },
            ],
            'LineNumber': 123
        },
    ],
    DagEdges=[
        {
            'Source': 'string',
            'Target': 'string',
            'TargetParameter': 'string'
        },
    ],
    Language='PYTHON'|'SCALA'
)
type DagNodes

list

param DagNodes

A list of the nodes in the DAG.

  • (dict) --

    Represents a node in a directed acyclic graph (DAG).

    • Id (string) -- [REQUIRED]

      A node identifier that is unique within the node's graph.

    • NodeType (string) -- [REQUIRED]

      The type of the node.

    • Args (list) -- [REQUIRED]

      Properties of the node, in the form of name-value pairs.

      • (dict) --

        An argument or property of a node.

        • Name (string) -- [REQUIRED]

          The name of the argument or property.

        • Value (string) -- [REQUIRED]

          The value of the argument or property.

        • Param (boolean) --

          True if the value is used as a parameter.

    • LineNumber (integer) --

      The line number of the node.

type DagEdges

list

param DagEdges

A list of the edges in the DAG.

  • (dict) --

    Represents a directional edge in a directed acyclic graph (DAG).

    • Source (string) -- [REQUIRED]

      The ID of the node at which the edge starts.

    • Target (string) -- [REQUIRED]

      The ID of the node at which the edge ends.

    • TargetParameter (string) --

      The target of the edge.
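Because CreateScript expects the nodes and edges to form a directed acyclic graph, it can be useful to check that on the client before calling the API. The sketch below assumes that only each node's Id and each edge's Source and Target matter for this check, and uses Kahn's topological sort to reject duplicate Ids, unknown node references and cycles. The function name topological_order is hypothetical (it is not part of boto3 or the Glue API), and AWS Glue may apply its own, different validation.

```python
from collections import deque


def topological_order(dag_nodes, dag_edges):
    """Return node Ids in topological order, or raise ValueError.

    dag_nodes / dag_edges use the same dict shapes as the DagNodes and
    DagEdges parameters of create_script; only 'Id', 'Source' and 'Target'
    are read here (hypothetical client-side pre-check).
    """
    ids = [node["Id"] for node in dag_nodes]
    if len(set(ids)) != len(ids):
        raise ValueError("node Ids must be unique within the graph")

    successors = {node_id: [] for node_id in ids}
    in_degree = {node_id: 0 for node_id in ids}
    for edge in dag_edges:
        src, dst = edge["Source"], edge["Target"]
        if src not in successors or dst not in successors:
            raise ValueError(f"edge {src!r} -> {dst!r} references an unknown node")
        successors[src].append(dst)
        in_degree[dst] += 1

    # Kahn's algorithm: repeatedly emit nodes with no remaining inbound edges.
    ready = deque(node_id for node_id in ids if in_degree[node_id] == 0)
    order = []
    while ready:
        node_id = ready.popleft()
        order.append(node_id)
        for nxt in successors[node_id]:
            in_degree[nxt] -= 1
            if in_degree[nxt] == 0:
                ready.append(nxt)

    # Any node never emitted sits on a cycle.
    if len(order) != len(ids):
        raise ValueError("DagEdges contain a cycle; the graph is not a DAG")
    return order
```

For a three-node chain with edges src -> map -> sink, the function returns the Ids in that order; adding an edge sink -> src makes it raise ValueError.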

type Language

string

param Language

The programming language of the code generated from the DAG.

rtype

dict

returns

Response Syntax

{
    'PythonScript': 'string',
    'ScalaCode': 'string'
}

Response Structure

  • (dict) --

    • PythonScript (string) --

      The Python script generated from the DAG.

    • ScalaCode (string) --

      The Scala code generated from the DAG.
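Since the generated code now comes back under a language-specific key (PythonScript or ScalaCode), callers may want one helper that picks the right field. A minimal sketch, assuming glue_client is a boto3 Glue client such as boto3.client("glue"). The helper name generate_script is hypothetical, and .get() is used because this page does not say whether the unused key is present in the response.

```python
def generate_script(glue_client, dag_nodes, dag_edges, language="SCALA"):
    """Call create_script and return the generated source for `language`.

    `glue_client` is assumed to be a boto3 Glue client (hypothetical helper).
    """
    if language not in ("PYTHON", "SCALA"):
        raise ValueError("Language must be 'PYTHON' or 'SCALA'")
    response = glue_client.create_script(
        DagNodes=dag_nodes,
        DagEdges=dag_edges,
        Language=language,
    )
    # Python output is returned in 'PythonScript', Scala output in 'ScalaCode'.
    key = "ScalaCode" if language == "SCALA" else "PythonScript"
    return response.get(key)
```

For example, generate_script(boto3.client("glue"), nodes, edges, "SCALA") would return the Scala source as a string, given valid DagNodes and DagEdges.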

CreateTrigger (updated)
Changes (request)
{'Predicate': {'Logical': {'ANY'}}}

Creates a new trigger.

See also: AWS API Documentation

Request Syntax

client.create_trigger(
    Name='string',
    Type='SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
    Schedule='string',
    Predicate={
        'Logical': 'AND'|'ANY',
        'Conditions': [
            {
                'LogicalOperator': 'EQUALS',
                'JobName': 'string',
                'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'
            },
        ]
    },
    Actions=[
        {
            'JobName': 'string',
            'Arguments': {
                'string': 'string'
            }
        },
    ],
    Description='string'
)
type Name

string

param Name

[REQUIRED]

The name of the trigger.

type Type

string

param Type

[REQUIRED]

The type of the new trigger.

type Schedule

string

param Schedule

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

This field is required when the trigger type is SCHEDULED.

type Predicate

dict

param Predicate

A predicate to specify when the new trigger should fire.

This field is required when the trigger type is CONDITIONAL.

  • Logical (string) --

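    How the conditions are combined: AND (the trigger fires when all conditions are met) or ANY (it fires when any condition is met). "OR" is not a supported value.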

  • Conditions (list) --

    A list of the conditions that determine when the trigger will fire.

    • (dict) --

      Defines a condition under which a trigger fires.

      • LogicalOperator (string) --

        A logical operator.

      • JobName (string) --

        The name of the job whose job runs this condition applies to and on which this trigger waits.

      • State (string) --

        The condition state. Currently, the values supported are SUCCEEDED, STOPPED and FAILED.
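To make the difference between the two Logical values concrete, the following sketch evaluates a Predicate against a snapshot of job-run states: with AND every condition must match, with ANY a single match is enough. It illustrates the documented semantics only, not how AWS Glue evaluates triggers internally (which is event-driven). The latest_states mapping and the default of AND when Logical is absent are assumptions.

```python
def predicate_is_met(predicate, latest_states):
    """Illustrative check of a trigger Predicate against job-run states.

    predicate: dict shaped like the Predicate parameter ('Logical', 'Conditions').
    latest_states: hypothetical mapping of job name -> state of its latest run.
    """
    results = []
    for cond in predicate.get("Conditions", []):
        operator = cond.get("LogicalOperator", "EQUALS")
        if operator != "EQUALS":
            raise ValueError(f"unsupported LogicalOperator: {operator!r}")
        results.append(latest_states.get(cond["JobName"]) == cond["State"])

    # Assumption: a missing Logical value behaves like AND.
    logical = predicate.get("Logical", "AND")
    if logical == "ANY":
        return any(results)
    if logical == "AND":
        return bool(results) and all(results)
    raise ValueError(f"unsupported Logical value: {logical!r}")
```

With conditions "etl-a is FAILED" and "etl-b is STOPPED", and only etl-b stopped, ANY yields True while AND yields False.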

type Actions

list

param Actions

[REQUIRED]

The actions initiated by this trigger when it fires.

  • (dict) --

    Defines an action to be initiated by a trigger.

    • JobName (string) --

      The name of a job to be executed.

    • Arguments (dict) --

      Arguments to be passed to the job.

      You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

      For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

      For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

      • (string) --

        • (string) --

type Description

string

param Description

A description of the new trigger.

rtype

dict

returns

Response Syntax

{
    'Name': 'string'
}

Response Structure

  • (dict) --

    • Name (string) --

      The name of the trigger.
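A sketch of a CONDITIONAL trigger that uses both additions from this release: Logical set to ANY, and conditions on the FAILED and STOPPED states. The helper only builds the keyword arguments, so the result can be passed to a boto3 Glue client as glue.create_trigger(**kwargs). The helper name, trigger name and job names are hypothetical placeholders.

```python
def build_failure_trigger(name, watched_jobs, action_job):
    """Build create_trigger kwargs for a CONDITIONAL trigger that fires when
    ANY run of a watched job ends in FAILED or STOPPED (hypothetical helper)."""
    if not watched_jobs:
        raise ValueError("at least one job to watch is required")
    conditions = [
        {"LogicalOperator": "EQUALS", "JobName": job, "State": state}
        for job in watched_jobs
        for state in ("FAILED", "STOPPED")
    ]
    return {
        "Name": name,
        "Type": "CONDITIONAL",  # Predicate is required for this type
        "Predicate": {"Logical": "ANY", "Conditions": conditions},
        "Actions": [{"JobName": action_job}],
        "Description": f"Runs {action_job} when a watched job fails or is stopped",
    }
```

Usage might look like glue.create_trigger(**build_failure_trigger("on-etl-failure", ["etl-a", "etl-b"], "notify-job")), where glue is assumed to be boto3.client("glue").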

GetPlan (updated)
Changes (request, response)
Request
{'Language': 'PYTHON | SCALA'}
Response
{'ScalaCode': 'string'}

Gets code to perform a specified mapping.

See also: AWS API Documentation

Request Syntax

client.get_plan(
    Mapping=[
        {
            'SourceTable': 'string',
            'SourcePath': 'string',
            'SourceType': 'string',
            'TargetTable': 'string',
            'TargetPath': 'string',
            'TargetType': 'string'
        },
    ],
    Source={
        'DatabaseName': 'string',
        'TableName': 'string'
    },
    Sinks=[
        {
            'DatabaseName': 'string',
            'TableName': 'string'
        },
    ],
    Location={
        'Jdbc': [
            {
                'Name': 'string',
                'Value': 'string',
                'Param': True|False
            },
        ],
        'S3': [
            {
                'Name': 'string',
                'Value': 'string',
                'Param': True|False
            },
        ]
    },
    Language='PYTHON'|'SCALA'
)
type Mapping

list

param Mapping

[REQUIRED]

The list of mappings from a source table to target tables.

  • (dict) --

    Defines a mapping.

    • SourceTable (string) --

      The name of the source table.

    • SourcePath (string) --

      The source path.

    • SourceType (string) --

      The source type.

    • TargetTable (string) --

      The target table.

    • TargetPath (string) --

      The target path.

    • TargetType (string) --

      The target type.

type Source

dict

param Source

[REQUIRED]

The source table.

  • DatabaseName (string) -- [REQUIRED]

    The database in which the table metadata resides.

  • TableName (string) -- [REQUIRED]

    The name of the table in question.

type Sinks

list

param Sinks

The target tables.

  • (dict) --

    Specifies a table definition in the Data Catalog.

    • DatabaseName (string) -- [REQUIRED]

      The database in which the table metadata resides.

    • TableName (string) -- [REQUIRED]

      The name of the table in question.

type Location

dict

param Location

Parameters for the mapping.

  • Jdbc (list) --

    A JDBC location.

    • (dict) --

      An argument or property of a node.

      • Name (string) -- [REQUIRED]

        The name of the argument or property.

      • Value (string) -- [REQUIRED]

        The value of the argument or property.

      • Param (boolean) --

        True if the value is used as a parameter.

  • S3 (list) --

    An Amazon S3 location.

    • (dict) --

      An argument or property of a node.

      • Name (string) -- [REQUIRED]

        The name of the argument or property.

      • Value (string) -- [REQUIRED]

        The value of the argument or property.

      • Param (boolean) --

        True if the value is used as a parameter.

type Language

string

param Language

The programming language of the code to perform the mapping.

rtype

dict

returns

Response Syntax

{
    'PythonScript': 'string',
    'ScalaCode': 'string'
}

Response Structure

  • (dict) --

    • PythonScript (string) --

      A Python script to perform the mapping.

    • ScalaCode (string) --

      Scala code to perform the mapping.
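The Mapping list is verbose to write by hand. The sketch below builds it from (source path, source type, target path, target type) tuples and then requests Scala output from GetPlan, reading the new ScalaCode field. The helper names, table names and column types are hypothetical, and glue_client is assumed to be a boto3 Glue client.

```python
def build_mapping(source_table, target_table, columns):
    """columns: iterable of (source_path, source_type, target_path, target_type)."""
    return [
        {
            "SourceTable": source_table,
            "SourcePath": src_path,
            "SourceType": src_type,
            "TargetTable": target_table,
            "TargetPath": dst_path,
            "TargetType": dst_type,
        }
        for src_path, src_type, dst_path, dst_type in columns
    ]


def get_scala_plan(glue_client, source, mapping, sinks=None):
    """Request Scala code for `mapping`; `source` is a {'DatabaseName', 'TableName'} dict."""
    kwargs = {"Mapping": mapping, "Source": source, "Language": "SCALA"}
    if sinks:
        kwargs["Sinks"] = sinks  # Sinks is optional
    response = glue_client.get_plan(**kwargs)
    return response.get("ScalaCode")
```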

GetTrigger (updated)
Changes (response)
{'Trigger': {'Predicate': {'Logical': {'ANY'}}}}

Retrieves the definition of a trigger.

See also: AWS API Documentation

Request Syntax

client.get_trigger(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

The name of the trigger to retrieve.

rtype

dict

returns

Response Syntax

{
    'Trigger': {
        'Name': 'string',
        'Id': 'string',
        'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
        'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                }
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'
                },
            ]
        }
    }
}

Response Structure

  • (dict) --

    • Trigger (dict) --

      The requested trigger definition.

      • Name (string) --

        Name of the trigger.

      • Id (string) --

        Reserved for future use.

      • Type (string) --

        The type of the trigger.

      • State (string) --

        The current state of the trigger.

      • Description (string) --

        A description of this trigger.

      • Schedule (string) --

        A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

      • Actions (list) --

        The actions initiated by this trigger.

        • (dict) --

          Defines an action to be initiated by a trigger.

          • JobName (string) --

            The name of a job to be executed.

          • Arguments (dict) --

            Arguments to be passed to the job.

            You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

            For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

            For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

            • (string) --

              • (string) --

      • Predicate (dict) --

        The predicate of this trigger, which defines when it will fire.

        • Logical (string) --

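          How the conditions are combined: AND (the trigger fires when all conditions are met) or ANY (it fires when any condition is met). "OR" is not a supported value.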

        • Conditions (list) --

          A list of the conditions that determine when the trigger will fire.

          • (dict) --

            Defines a condition under which a trigger fires.

            • LogicalOperator (string) --

              A logical operator.

            • JobName (string) --

              The name of the job whose job runs this condition applies to and on which this trigger waits.

            • State (string) --

              The condition state. Currently, the values supported are SUCCEEDED, STOPPED and FAILED.
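Because a trigger returned by GetTrigger may now use either AND or ANY, a short human-readable summary can help when auditing triggers. This sketch formats the Trigger dict from the response; the helper name and output wording are arbitrary, and treating a missing Logical value as AND is an assumption.

```python
def describe_predicate(trigger):
    """Summarize the Predicate of a Trigger dict as one line of text."""
    name = trigger.get("Name", "<unnamed>")
    predicate = trigger.get("Predicate") or {}
    conditions = predicate.get("Conditions") or []
    if not conditions:
        return f"{name} has no job-state conditions (Type={trigger.get('Type')})"
    # Assumption: a missing Logical value is treated like AND ("all").
    quantifier = "any" if predicate.get("Logical") == "ANY" else "all"
    parts = ", ".join(f"{c['JobName']}={c['State']}" for c in conditions)
    return f"{name} fires when {quantifier} of: {parts}"
```

For example, describe_predicate(glue.get_trigger(Name="on-etl-failure")["Trigger"]) might return "on-etl-failure fires when any of: etl-a=FAILED, etl-b=STOPPED", with glue assumed to be a boto3 Glue client.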

GetTriggers (updated)
Changes (response)
{'Triggers': {'Predicate': {'Logical': {'ANY'}}}}

Gets all the triggers associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_triggers(
    NextToken='string',
    DependentJobName='string',
    MaxResults=123
)
type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

type DependentJobName

string

param DependentJobName

The name of the job for which to retrieve triggers. The trigger that can start this job will be returned, and if there is no such trigger, all triggers will be returned.

type MaxResults

integer

param MaxResults

The maximum size of the response.

rtype

dict

returns

Response Syntax

{
    'Triggers': [
        {
            'Name': 'string',
            'Id': 'string',
            'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
            'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
            'Description': 'string',
            'Schedule': 'string',
            'Actions': [
                {
                    'JobName': 'string',
                    'Arguments': {
                        'string': 'string'
                    }
                },
            ],
            'Predicate': {
                'Logical': 'AND'|'ANY',
                'Conditions': [
                    {
                        'LogicalOperator': 'EQUALS',
                        'JobName': 'string',
                        'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'
                    },
                ]
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Triggers (list) --

      A list of triggers for the specified job.

      • (dict) --

        Information about a specific trigger.

        • Name (string) --

          Name of the trigger.

        • Id (string) --

          Reserved for future use.

        • Type (string) --

          The type of the trigger.

        • State (string) --

          The current state of the trigger.

        • Description (string) --

          A description of this trigger.

        • Schedule (string) --

          A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

        • Actions (list) --

          The actions initiated by this trigger.

          • (dict) --

            Defines an action to be initiated by a trigger.

            • JobName (string) --

              The name of a job to be executed.

            • Arguments (dict) --

              Arguments to be passed to the job.

              You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

              For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

              For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

              • (string) --

                • (string) --

        • Predicate (dict) --

          The predicate of this trigger, which defines when it will fire.

          • Logical (string) --

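            How the conditions are combined: AND (the trigger fires when all conditions are met) or ANY (it fires when any condition is met). "OR" is not a supported value.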

          • Conditions (list) --

            A list of the conditions that determine when the trigger will fire.

            • (dict) --

              Defines a condition under which a trigger fires.

              • LogicalOperator (string) --

                A logical operator.

              • JobName (string) --

                The name of the job whose job runs this condition applies to and on which this trigger waits.

              • State (string) --

                The condition state. Currently, the values supported are SUCCEEDED, STOPPED and FAILED.

    • NextToken (string) --

      A continuation token, if not all the requested triggers have yet been returned.
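GetTriggers returns results in pages, and NextToken is present only while more triggers remain. A minimal pagination loop, assuming glue_client is a boto3 Glue client; the generator name and the default page size of 50 are arbitrary choices, not documented limits.

```python
def iter_triggers(glue_client, dependent_job_name=None, page_size=50):
    """Yield every trigger returned by get_triggers, following NextToken."""
    kwargs = {"MaxResults": page_size}
    if dependent_job_name:
        kwargs["DependentJobName"] = dependent_job_name
    while True:
        page = glue_client.get_triggers(**kwargs)
        yield from page.get("Triggers", [])
        token = page.get("NextToken")
        if not token:
            break  # no continuation token: all triggers have been returned
        kwargs["NextToken"] = token
```

Combined with the Logical field, this makes it straightforward to find, say, every trigger that uses ANY: [t["Name"] for t in iter_triggers(glue) if t.get("Predicate", {}).get("Logical") == "ANY"].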

UpdateTrigger (updated)
Changes (request, response)
Request
{'TriggerUpdate': {'Predicate': {'Logical': {'ANY'}}}}
Response
{'Trigger': {'Predicate': {'Logical': {'ANY'}}}}

Updates a trigger definition.

See also: AWS API Documentation

Request Syntax

client.update_trigger(
    Name='string',
    TriggerUpdate={
        'Name': 'string',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                }
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'
                },
            ]
        }
    }
)
type Name

string

param Name

[REQUIRED]

The name of the trigger to update.

type TriggerUpdate

dict

param TriggerUpdate

[REQUIRED]

The new values with which to update the trigger.

  • Name (string) --

    Reserved for future use.

  • Description (string) --

    A description of this trigger.

  • Schedule (string) --

    A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

  • Actions (list) --

    The actions initiated by this trigger.

    • (dict) --

      Defines an action to be initiated by a trigger.

      • JobName (string) --

        The name of a job to be executed.

      • Arguments (dict) --

        Arguments to be passed to the job.

        You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

        For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

        For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

        • (string) --

          • (string) --

  • Predicate (dict) --

    The predicate of this trigger, which defines when it will fire.

    • Logical (string) --

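      How the conditions are combined: AND (the trigger fires when all conditions are met) or ANY (it fires when any condition is met). "OR" is not a supported value.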

    • Conditions (list) --

      A list of the conditions that determine when the trigger will fire.

      • (dict) --

        Defines a condition under which a trigger fires.

        • LogicalOperator (string) --

          A logical operator.

        • JobName (string) --

          The name of the job whose job runs this condition applies to and on which this trigger waits.

        • State (string) --

          The condition state. Currently, the values supported are SUCCEEDED, STOPPED and FAILED.

rtype

dict

returns

Response Syntax

{
    'Trigger': {
        'Name': 'string',
        'Id': 'string',
        'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
        'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                }
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'
                },
            ]
        }
    }
}

Response Structure

  • (dict) --

    • Trigger (dict) --

      The resulting trigger definition.

      • Name (string) --

        Name of the trigger.

      • Id (string) --

        Reserved for future use.

      • Type (string) --

        The type of the trigger.

      • State (string) --

        The current state of the trigger.

      • Description (string) --

        A description of this trigger.

      • Schedule (string) --

        A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers). For example, to run something every day at 12:15 UTC, specify cron(15 12 * * ? *).

      • Actions (list) --

        The actions initiated by this trigger.

        • (dict) --

          Defines an action to be initiated by a trigger.

          • JobName (string) --

            The name of a job to be executed.

          • Arguments (dict) --

            Arguments to be passed to the job.

            You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

            For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

            For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

            • (string) --

              • (string) --

      • Predicate (dict) --

        The predicate of this trigger, which defines when it will fire.

        • Logical (string) --

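          How the conditions are combined: AND (the trigger fires when all conditions are met) or ANY (it fires when any condition is met). "OR" is not a supported value.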

        • Conditions (list) --

          A list of the conditions that determine when the trigger will fire.

          • (dict) --

            Defines a condition under which a trigger fires.

            • LogicalOperator (string) --

              A logical operator.

            • JobName (string) --

              The name of the job whose job runs this condition applies to and on which this trigger waits.

            • State (string) --

              The condition state. Currently, the values supported are SUCCEEDED, STOPPED and FAILED.
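To switch an existing trigger from AND to ANY, one conservative approach is to read the current definition with GetTrigger, copy its updatable fields into TriggerUpdate, and change only Logical. This page does not state whether UpdateTrigger keeps fields omitted from TriggerUpdate, so re-sending them is an assumption made for safety. TriggerUpdate's Name is left out because it is documented as reserved for future use. Helper names are hypothetical, and glue_client is assumed to be a boto3 Glue client.

```python
UPDATABLE_FIELDS = ("Description", "Schedule", "Actions", "Predicate")


def build_any_update(current_trigger):
    """Copy the updatable fields of a Trigger dict, with Logical set to ANY."""
    predicate = current_trigger.get("Predicate") or {}
    if not predicate.get("Conditions"):
        raise ValueError("trigger has no predicate conditions to combine")
    update = {k: current_trigger[k] for k in UPDATABLE_FIELDS if k in current_trigger}
    # dict(...) makes a new Predicate dict, so the input trigger is not mutated.
    update["Predicate"] = dict(predicate, Logical="ANY")
    return update


def switch_trigger_to_any(glue_client, name):
    """Fetch a trigger, switch its predicate to ANY, and return the result."""
    current = glue_client.get_trigger(Name=name)["Trigger"]
    response = glue_client.update_trigger(
        Name=name, TriggerUpdate=build_any_update(current)
    )
    return response["Trigger"]
```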