AWS Glue

2019/02/22 - AWS Glue - 11 new4 updated api methods

Changes  Update glue client to latest version

ListCrawlers (new) Link ¶

Retrieves the names of all crawler resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag will be retrieved.

See also: AWS API Documentation

Request Syntax

client.list_crawlers(
    MaxResults=123,
    NextToken='string',
    Tags={
        'string': 'string'
    }
)
type MaxResults:

integer

param MaxResults:

The maximum size of a list to return.

type NextToken:

string

param NextToken:

A continuation token, if this is a continuation request.

type Tags:

dict

param Tags:

Specifies to return only these tagged resources.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'CrawlerNames': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • CrawlerNames (list) --

      The names of all crawlers in the account, or the crawlers with the specified tags.

      • (string) --

    • NextToken (string) --

      A continuation token, if the returned list does not contain the last metric available.

BatchGetTriggers (new) Link ¶

Returns a list of resource metadata for a given list of trigger names. After calling the ListTriggers operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that uses tags.

See also: AWS API Documentation

Request Syntax

client.batch_get_triggers(
    TriggerNames=[
        'string',
    ]
)
type TriggerNames:

list

param TriggerNames:

[REQUIRED]

A list of trigger names, which may be the names returned from the ListTriggers operation.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Triggers': [
        {
            'Name': 'string',
            'Id': 'string',
            'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
            'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
            'Description': 'string',
            'Schedule': 'string',
            'Actions': [
                {
                    'JobName': 'string',
                    'Arguments': {
                        'string': 'string'
                    },
                    'Timeout': 123,
                    'NotificationProperty': {
                        'NotifyDelayAfter': 123
                    },
                    'SecurityConfiguration': 'string'
                },
            ],
            'Predicate': {
                'Logical': 'AND'|'ANY',
                'Conditions': [
                    {
                        'LogicalOperator': 'EQUALS',
                        'JobName': 'string',
                        'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
                    },
                ]
            }
        },
    ],
    'TriggersNotFound': [
        'string',
    ]
}

Response Structure

  • (dict) --

    • Triggers (list) --

      A list of trigger definitions.

      • (dict) --

        Information about a specific trigger.

        • Name (string) --

          Name of the trigger.

        • Id (string) --

          Reserved for future use.

        • Type (string) --

          The type of trigger that this is.

        • State (string) --

          The current state of the trigger.

        • Description (string) --

          A description of this trigger.

        • Schedule (string) --

          A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

        • Actions (list) --

          The actions initiated by this trigger.

          • (dict) --

            Defines an action to be initiated by a trigger.

            • JobName (string) --

              The name of a job to be executed.

            • Arguments (dict) --

              The job arguments used when this trigger fires. For this job run, they replace the default arguments set in the job definition itself.

              You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

              For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

              For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

              • (string) --

                • (string) --

            • Timeout (integer) --

              The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

            • NotificationProperty (dict) --

              Specifies configuration properties of a job run notification.

              • NotifyDelayAfter (integer) --

                After a job run starts, the number of minutes to wait before sending a job run delay notification.

            • SecurityConfiguration (string) --

              The name of the SecurityConfiguration structure to be used with this action.

        • Predicate (dict) --

          The predicate of this trigger, which defines when it will fire.

          • Logical (string) --

            Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

          • Conditions (list) --

            A list of the conditions that determine when the trigger will fire.

            • (dict) --

              Defines a condition under which a trigger fires.

              • LogicalOperator (string) --

                A logical operator.

              • JobName (string) --

                The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

              • State (string) --

                The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

    • TriggersNotFound (list) --

      A list of names of triggers not found.

      • (string) --

BatchGetDevEndpoints (new) Link ¶

Returns a list of resource metadata for a given list of DevEndpoint names. After calling the ListDevEndpoints operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that uses tags.

See also: AWS API Documentation

Request Syntax

client.batch_get_dev_endpoints(
    DevEndpointNames=[
        'string',
    ]
)
type DevEndpointNames:

list

param DevEndpointNames:

[REQUIRED]

The list of DevEndpoint names, which may be the names returned from the ListDevEndpoint operation.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'DevEndpoints': [
        {
            'EndpointName': 'string',
            'RoleArn': 'string',
            'SecurityGroupIds': [
                'string',
            ],
            'SubnetId': 'string',
            'YarnEndpointAddress': 'string',
            'PrivateAddress': 'string',
            'ZeppelinRemoteSparkInterpreterPort': 123,
            'PublicAddress': 'string',
            'Status': 'string',
            'NumberOfNodes': 123,
            'AvailabilityZone': 'string',
            'VpcId': 'string',
            'ExtraPythonLibsS3Path': 'string',
            'ExtraJarsS3Path': 'string',
            'FailureReason': 'string',
            'LastUpdateStatus': 'string',
            'CreatedTimestamp': datetime(2015, 1, 1),
            'LastModifiedTimestamp': datetime(2015, 1, 1),
            'PublicKey': 'string',
            'PublicKeys': [
                'string',
            ],
            'SecurityConfiguration': 'string'
        },
    ],
    'DevEndpointsNotFound': [
        'string',
    ]
}

Response Structure

  • (dict) --

    • DevEndpoints (list) --

      A list of DevEndpoint definitions.

      • (dict) --

        A development endpoint where a developer can remotely debug ETL scripts.

        • EndpointName (string) --

          The name of the DevEndpoint.

        • RoleArn (string) --

          The AWS ARN of the IAM role used in this DevEndpoint.

        • SecurityGroupIds (list) --

          A list of security group identifiers used in this DevEndpoint.

          • (string) --

        • SubnetId (string) --

          The subnet ID for this DevEndpoint.

        • YarnEndpointAddress (string) --

          The YARN endpoint address used by this DevEndpoint.

        • PrivateAddress (string) --

          A private IP address to access the DevEndpoint within a VPC, if the DevEndpoint is created within one. The PrivateAddress field is present only when you create the DevEndpoint within your virtual private cloud (VPC).

        • ZeppelinRemoteSparkInterpreterPort (integer) --

          The Apache Zeppelin port for the remote Apache Spark interpreter.

        • PublicAddress (string) --

          The public IP address used by this DevEndpoint. The PublicAddress field is present only when you create a non-VPC (virtual private cloud) DevEndpoint.

        • Status (string) --

          The current status of this DevEndpoint.

        • NumberOfNodes (integer) --

          The number of AWS Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

        • AvailabilityZone (string) --

          The AWS availability zone where this DevEndpoint is located.

        • VpcId (string) --

          The ID of the virtual private cloud (VPC) used by this DevEndpoint.

        • ExtraPythonLibsS3Path (string) --

          Path(s) to one or more Python libraries in an S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

          Please note that only pure Python libraries can currently be used on a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

        • ExtraJarsS3Path (string) --

          Path to one or more Java Jars in an S3 bucket that should be loaded in your DevEndpoint.

          Please note that only pure Java/Scala libraries can currently be used on a DevEndpoint.

        • FailureReason (string) --

          The reason for a current failure in this DevEndpoint.

        • LastUpdateStatus (string) --

          The status of the last update.

        • CreatedTimestamp (datetime) --

          The point in time at which this DevEndpoint was created.

        • LastModifiedTimestamp (datetime) --

          The point in time at which this DevEndpoint was last modified.

        • PublicKey (string) --

          The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility, as the recommended attribute to use is public keys.

        • PublicKeys (list) --

          A list of public keys to be used by the DevEndpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

          • (string) --

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with this DevEndpoint.

    • DevEndpointsNotFound (list) --

      A list of DevEndpoints not found.

      • (string) --

TagResource (new) Link ¶

Adds tags to a resource. A tag is a label you can assign to an AWS resource. In AWS Glue, you can tag only certain resources. For information about what resources you can tag, see AWS Tags in AWS Glue.

See also: AWS API Documentation

Request Syntax

client.tag_resource(
    ResourceArn='string',
    TagsToAdd={
        'string': 'string'
    }
)
type ResourceArn:

string

param ResourceArn:

[REQUIRED]

The ARN of the AWS Glue resource to which to add the tags. For more information about AWS Glue resource ARNs, see the AWS Glue ARN string pattern.

type TagsToAdd:

dict

param TagsToAdd:

[REQUIRED]

Tags to add to this resource.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --

BatchGetJobs (new) Link ¶

Returns a list of resource metadata for a given list of job names. After calling the ListJobs operation, you can call this operation to access the data to which you have been granted permissions. This operation supports all IAM permissions, including permission conditions that uses tags.

See also: AWS API Documentation

Request Syntax

client.batch_get_jobs(
    JobNames=[
        'string',
    ]
)
type JobNames:

list

param JobNames:

[REQUIRED]

A list of job names, which may be the names returned from the ListJobs operation.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Jobs': [
        {
            'Name': 'string',
            'Description': 'string',
            'LogUri': 'string',
            'Role': 'string',
            'CreatedOn': datetime(2015, 1, 1),
            'LastModifiedOn': datetime(2015, 1, 1),
            'ExecutionProperty': {
                'MaxConcurrentRuns': 123
            },
            'Command': {
                'Name': 'string',
                'ScriptLocation': 'string'
            },
            'DefaultArguments': {
                'string': 'string'
            },
            'Connections': {
                'Connections': [
                    'string',
                ]
            },
            'MaxRetries': 123,
            'AllocatedCapacity': 123,
            'Timeout': 123,
            'MaxCapacity': 123.0,
            'NotificationProperty': {
                'NotifyDelayAfter': 123
            },
            'SecurityConfiguration': 'string'
        },
    ],
    'JobsNotFound': [
        'string',
    ]
}

Response Structure

  • (dict) --

    • Jobs (list) --

      A list of job definitions.

      • (dict) --

        Specifies a job definition.

        • Name (string) --

          The name you assign to this job definition.

        • Description (string) --

          Description of the job being defined.

        • LogUri (string) --

          This field is reserved for future use.

        • Role (string) --

          The name or ARN of the IAM role associated with this job.

        • CreatedOn (datetime) --

          The time and date that this job definition was created.

        • LastModifiedOn (datetime) --

          The last point in time when this job definition was modified.

        • ExecutionProperty (dict) --

          An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

          • MaxConcurrentRuns (integer) --

            The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

        • Command (dict) --

          The JobCommand that executes this job.

          • Name (string) --

            The name of the job command: this must be glueetl, for an Apache Spark ETL job, or pythonshell, for a Python shell job.

          • ScriptLocation (string) --

            Specifies the S3 path to a script that executes a job (required).

        • DefaultArguments (dict) --

          The default arguments for this job, specified as name-value pairs.

          You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

          For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

          For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

          • (string) --

            • (string) --

        • Connections (dict) --

          The connections used for this job.

          • Connections (list) --

            A list of connections used by the job.

            • (string) --

        • MaxRetries (integer) --

          The maximum number of times to retry this job after a JobRun fails.

        • AllocatedCapacity (integer) --

          This field is deprecated, use MaxCapacity instead.

          The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

        • Timeout (integer) --

          The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

        • MaxCapacity (float) --

          The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

          The value that can be allocated for MaxCapacity depends on whether you are running a python shell job, or an Apache Spark ETL job:

          • When you specify a python shell job ( ``JobCommand.Name``="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

          • When you specify an Apache Spark ETL job ( ``JobCommand.Name``="glueetl"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.

        • NotificationProperty (dict) --

          Specifies configuration properties of a job notification.

          • NotifyDelayAfter (integer) --

            After a job run starts, the number of minutes to wait before sending a job run delay notification.

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with this job.

    • JobsNotFound (list) --

      A list of names of jobs not found.

      • (string) --

ListJobs (new) Link ¶

Retrieves the names of all job resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag will be retrieved.

See also: AWS API Documentation

Request Syntax

client.list_jobs(
    NextToken='string',
    MaxResults=123,
    Tags={
        'string': 'string'
    }
)
type NextToken:

string

param NextToken:

A continuation token, if this is a continuation request.

type MaxResults:

integer

param MaxResults:

The maximum size of a list to return.

type Tags:

dict

param Tags:

Specifies to return only these tagged resources.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'JobNames': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • JobNames (list) --

      The names of all jobs in the account, or the jobs with the specified tags.

      • (string) --

    • NextToken (string) --

      A continuation token, if the returned list does not contain the last metric available.

GetTags (new) Link ¶

Retrieves a list of tags associated with a resource.

See also: AWS API Documentation

Request Syntax

client.get_tags(
    ResourceArn='string'
)
type ResourceArn:

string

param ResourceArn:

[REQUIRED]

The Amazon ARN of the resource for which to retrieve tags.

rtype:

dict

returns:

Response Syntax

{
    'Tags': {
        'string': 'string'
    }
}

Response Structure

  • (dict) --

    • Tags (dict) --

      The requested tags.

      • (string) --

        • (string) --

ListTriggers (new) Link ¶

Retrieves the names of all trigger resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag will be retrieved.

See also: AWS API Documentation

Request Syntax

client.list_triggers(
    NextToken='string',
    DependentJobName='string',
    MaxResults=123,
    Tags={
        'string': 'string'
    }
)
type NextToken:

string

param NextToken:

A continuation token, if this is a continuation request.

type DependentJobName:

string

param DependentJobName:

The name of the job for which to retrieve triggers. The trigger that can start this job will be returned, and if there is no such trigger, all triggers will be returned.

type MaxResults:

integer

param MaxResults:

The maximum size of a list to return.

type Tags:

dict

param Tags:

Specifies to return only these tagged resources.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'TriggerNames': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • TriggerNames (list) --

      The names of all triggers in the account, or the triggers with the specified tags.

      • (string) --

    • NextToken (string) --

      A continuation token, if the returned list does not contain the last metric available.

UntagResource (new) Link ¶

Removes tags from a resource.

See also: AWS API Documentation

Request Syntax

client.untag_resource(
    ResourceArn='string',
    TagsToRemove=[
        'string',
    ]
)
type ResourceArn:

string

param ResourceArn:

[REQUIRED]

The ARN of the resource from which to remove the tags.

type TagsToRemove:

list

param TagsToRemove:

[REQUIRED]

Tags to remove from this resource.

  • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --

ListDevEndpoints (new) Link ¶

Retrieves the names of all DevEndpoint resources in this AWS account, or the resources with the specified tag. This operation allows you to see which resources are available in your account, and their names.

This operation takes the optional Tags field which you can use as a filter on the response so that tagged resources can be retrieved as a group. If you choose to use tags filtering, only resources with the tag will be retrieved.

See also: AWS API Documentation

Request Syntax

client.list_dev_endpoints(
    NextToken='string',
    MaxResults=123,
    Tags={
        'string': 'string'
    }
)
type NextToken:

string

param NextToken:

A continuation token, if this is a continuation request.

type MaxResults:

integer

param MaxResults:

The maximum size of a list to return.

type Tags:

dict

param Tags:

Specifies to return only these tagged resources.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'DevEndpointNames': [
        'string',
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • DevEndpointNames (list) --

      The names of all DevEndpoints in the account, or the DevEndpoints with the specified tags.

      • (string) --

    • NextToken (string) --

      A continuation token, if the returned list does not contain the last metric available.

BatchGetCrawlers (new) Link ¶

Returns a list of resource metadata for a given list of crawler names. After calling the ListCrawlers operation, you can call this operation to access the data to which you have been granted permissions to based on tags.

See also: AWS API Documentation

Request Syntax

client.batch_get_crawlers(
    CrawlerNames=[
        'string',
    ]
)
type CrawlerNames:

list

param CrawlerNames:

[REQUIRED]

A list of crawler names, which may be the names returned from the ListCrawlers operation.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Crawlers': [
        {
            'Name': 'string',
            'Role': 'string',
            'Targets': {
                'S3Targets': [
                    {
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ]
                    },
                ],
                'JdbcTargets': [
                    {
                        'ConnectionName': 'string',
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ]
                    },
                ],
                'DynamoDBTargets': [
                    {
                        'Path': 'string'
                    },
                ]
            },
            'DatabaseName': 'string',
            'Description': 'string',
            'Classifiers': [
                'string',
            ],
            'SchemaChangePolicy': {
                'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
                'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
            },
            'State': 'READY'|'RUNNING'|'STOPPING',
            'TablePrefix': 'string',
            'Schedule': {
                'ScheduleExpression': 'string',
                'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
            },
            'CrawlElapsedTime': 123,
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'LastCrawl': {
                'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED',
                'ErrorMessage': 'string',
                'LogGroup': 'string',
                'LogStream': 'string',
                'MessagePrefix': 'string',
                'StartTime': datetime(2015, 1, 1)
            },
            'Version': 123,
            'Configuration': 'string',
            'CrawlerSecurityConfiguration': 'string'
        },
    ],
    'CrawlersNotFound': [
        'string',
    ]
}

Response Structure

  • (dict) --

    • Crawlers (list) --

      A list of crawler definitions.

      • (dict) --

        Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the AWS Glue Data Catalog.

        • Name (string) --

          The crawler name.

        • Role (string) --

          The IAM role (or ARN of an IAM role) used to access customer resources, such as data in Amazon S3.

        • Targets (dict) --

          A collection of targets to crawl.

          • S3Targets (list) --

            Specifies Amazon S3 targets.

            • (dict) --

              Specifies a data store in Amazon S3.

              • Path (string) --

                The path to the Amazon S3 target.

              • Exclusions (list) --

                A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

                • (string) --

          • JdbcTargets (list) --

            Specifies JDBC targets.

            • (dict) --

              Specifies a JDBC data store to crawl.

              • ConnectionName (string) --

                The name of the connection to use to connect to the JDBC target.

              • Path (string) --

                The path of the JDBC target.

              • Exclusions (list) --

                A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

                • (string) --

          • DynamoDBTargets (list) --

            Specifies DynamoDB targets.

            • (dict) --

              Specifies a DynamoDB table to crawl.

              • Path (string) --

                The name of the DynamoDB table to crawl.

        • DatabaseName (string) --

          The database where metadata is written by this crawler.

        • Description (string) --

          A description of the crawler.

        • Classifiers (list) --

          A list of custom classifiers associated with the crawler.

          • (string) --

        • SchemaChangePolicy (dict) --

          Sets the behavior when the crawler finds a changed or deleted object.

          • UpdateBehavior (string) --

            The update behavior when the crawler finds a changed schema.

          • DeleteBehavior (string) --

            The deletion behavior when the crawler finds a deleted object.

        • State (string) --

          Indicates whether the crawler is running, or whether a run is pending.

        • TablePrefix (string) --

          The prefix added to the names of tables that are created.

        • Schedule (dict) --

          For scheduled crawlers, the schedule when the crawler runs.

          • ScheduleExpression (string) --

            A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

          • State (string) --

            The state of the schedule.

        • CrawlElapsedTime (integer) --

          If the crawler is running, contains the total time elapsed since the last crawl began.

        • CreationTime (datetime) --

          The time when the crawler was created.

        • LastUpdated (datetime) --

          The time the crawler was last updated.

        • LastCrawl (dict) --

          The status of the last crawl, and potentially error information if an error occurred.

          • Status (string) --

            Status of the last crawl.

          • ErrorMessage (string) --

            If an error occurred, the error information about the last crawl.

          • LogGroup (string) --

            The log group for the last crawl.

          • LogStream (string) --

            The log stream for the last crawl.

          • MessagePrefix (string) --

            The prefix for a message about this crawl.

          • StartTime (datetime) --

            The time at which the crawl started.

        • Version (integer) --

          The version of the crawler.

        • Configuration (string) --

          Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Configuring a Crawler.

        • CrawlerSecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used by this Crawler.

    • CrawlersNotFound (list) --

      A list of crawlers not found.

      • (string) --

CreateCrawler (updated) Link ¶
Changes (request)
{'Tags': {'string': 'string'}}

Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the s3Targets field, the jdbcTargets field, or the DynamoDBTargets field.

See also: AWS API Documentation

Request Syntax

client.create_crawler(
    Name='string',
    Role='string',
    DatabaseName='string',
    Description='string',
    Targets={
        'S3Targets': [
            {
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'JdbcTargets': [
            {
                'ConnectionName': 'string',
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'DynamoDBTargets': [
            {
                'Path': 'string'
            },
        ]
    },
    Schedule='string',
    Classifiers=[
        'string',
    ],
    TablePrefix='string',
    SchemaChangePolicy={
        'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
        'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
    },
    Configuration='string',
    CrawlerSecurityConfiguration='string',
    Tags={
        'string': 'string'
    }
)
type Name:

string

param Name:

[REQUIRED]

Name of the new crawler.

type Role:

string

param Role:

[REQUIRED]

The IAM role (or ARN of an IAM role) used by the new crawler to access customer resources.

type DatabaseName:

string

param DatabaseName:

[REQUIRED]

The AWS Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/*.

type Description:

string

param Description:

A description of the new crawler.

type Targets:

dict

param Targets:

[REQUIRED]

A list of collection of targets to crawl.

  • S3Targets (list) --

    Specifies Amazon S3 targets.

    • (dict) --

      Specifies a data store in Amazon S3.

      • Path (string) --

        The path to the Amazon S3 target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • JdbcTargets (list) --

    Specifies JDBC targets.

    • (dict) --

      Specifies a JDBC data store to crawl.

      • ConnectionName (string) --

        The name of the connection to use to connect to the JDBC target.

      • Path (string) --

        The path of the JDBC target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • DynamoDBTargets (list) --

    Specifies DynamoDB targets.

    • (dict) --

      Specifies a DynamoDB table to crawl.

      • Path (string) --

        The name of the DynamoDB table to crawl.

type Schedule:

string

param Schedule:

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

type Classifiers:

list

param Classifiers:

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

  • (string) --

type TablePrefix:

string

param TablePrefix:

The table prefix used for catalog tables that are created.

type SchemaChangePolicy:

dict

param SchemaChangePolicy:

Policy for the crawler's update and deletion behavior.

  • UpdateBehavior (string) --

    The update behavior when the crawler finds a changed schema.

  • DeleteBehavior (string) --

    The deletion behavior when the crawler finds a deleted object.

type Configuration:

string

param Configuration:

Crawler configuration information. This versioned JSON string allows users to specify aspects of a crawler's behavior. For more information, see Configuring a Crawler.

type CrawlerSecurityConfiguration:

string

param CrawlerSecurityConfiguration:

The name of the SecurityConfiguration structure to be used by this Crawler.

type Tags:

dict

param Tags:

The tags to use with this crawler request. You may use tags to limit access to the crawler. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --

CreateDevEndpoint (updated) Link ¶
Changes (request)
{'Tags': {'string': 'string'}}

Creates a new DevEndpoint.

See also: AWS API Documentation

Request Syntax

client.create_dev_endpoint(
    EndpointName='string',
    RoleArn='string',
    SecurityGroupIds=[
        'string',
    ],
    SubnetId='string',
    PublicKey='string',
    PublicKeys=[
        'string',
    ],
    NumberOfNodes=123,
    ExtraPythonLibsS3Path='string',
    ExtraJarsS3Path='string',
    SecurityConfiguration='string',
    Tags={
        'string': 'string'
    }
)
type EndpointName:

string

param EndpointName:

[REQUIRED]

The name to be assigned to the new DevEndpoint.

type RoleArn:

string

param RoleArn:

[REQUIRED]

The IAM role for the DevEndpoint.

type SecurityGroupIds:

list

param SecurityGroupIds:

Security group IDs for the security groups to be used by the new DevEndpoint.

  • (string) --

type SubnetId:

string

param SubnetId:

The subnet ID for the new DevEndpoint to use.

type PublicKey:

string

param PublicKey:

The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility, as the recommended attribute to use is public keys.

type PublicKeys:

list

param PublicKeys:

A list of public keys to be used by the DevEndpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

  • (string) --

type NumberOfNodes:

integer

param NumberOfNodes:

The number of AWS Glue Data Processing Units (DPUs) to allocate to this DevEndpoint.

type ExtraPythonLibsS3Path:

string

param ExtraPythonLibsS3Path:

Path(s) to one or more Python libraries in an S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

Please note that only pure Python libraries can currently be used on a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

type ExtraJarsS3Path:

string

param ExtraJarsS3Path:

Path to one or more Java Jars in an S3 bucket that should be loaded in your DevEndpoint.

type SecurityConfiguration:

string

param SecurityConfiguration:

The name of the SecurityConfiguration structure to be used with this DevEndpoint.

type Tags:

dict

param Tags:

The tags to use with this DevEndpoint. You may use tags to limit access to the DevEndpoint. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'EndpointName': 'string',
    'Status': 'string',
    'SecurityGroupIds': [
        'string',
    ],
    'SubnetId': 'string',
    'RoleArn': 'string',
    'YarnEndpointAddress': 'string',
    'ZeppelinRemoteSparkInterpreterPort': 123,
    'NumberOfNodes': 123,
    'AvailabilityZone': 'string',
    'VpcId': 'string',
    'ExtraPythonLibsS3Path': 'string',
    'ExtraJarsS3Path': 'string',
    'FailureReason': 'string',
    'SecurityConfiguration': 'string',
    'CreatedTimestamp': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • EndpointName (string) --

      The name assigned to the new DevEndpoint.

    • Status (string) --

      The current status of the new DevEndpoint.

    • SecurityGroupIds (list) --

      The security groups assigned to the new DevEndpoint.

      • (string) --

    • SubnetId (string) --

      The subnet ID assigned to the new DevEndpoint.

    • RoleArn (string) --

      The AWS ARN of the role assigned to the new DevEndpoint.

    • YarnEndpointAddress (string) --

      The address of the YARN endpoint used by this DevEndpoint.

    • ZeppelinRemoteSparkInterpreterPort (integer) --

      The Apache Zeppelin port for the remote Apache Spark interpreter.

    • NumberOfNodes (integer) --

      The number of AWS Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

    • AvailabilityZone (string) --

      The AWS availability zone where this DevEndpoint is located.

    • VpcId (string) --

      The ID of the VPC used by this DevEndpoint.

    • ExtraPythonLibsS3Path (string) --

      Path(s) to one or more Python libraries in an S3 bucket that will be loaded in your DevEndpoint.

    • ExtraJarsS3Path (string) --

      Path to one or more Java Jars in an S3 bucket that will be loaded in your DevEndpoint.

    • FailureReason (string) --

      The reason for a current failure in this DevEndpoint.

    • SecurityConfiguration (string) --

      The name of the SecurityConfiguration structure being used with this DevEndpoint.

    • CreatedTimestamp (datetime) --

      The point in time at which this DevEndpoint was created.

CreateJob (updated) Link ¶
Changes (request)
{'Tags': {'string': 'string'}}

Creates a new job definition.

See also: AWS API Documentation

Request Syntax

client.create_job(
    Name='string',
    Description='string',
    LogUri='string',
    Role='string',
    ExecutionProperty={
        'MaxConcurrentRuns': 123
    },
    Command={
        'Name': 'string',
        'ScriptLocation': 'string'
    },
    DefaultArguments={
        'string': 'string'
    },
    Connections={
        'Connections': [
            'string',
        ]
    },
    MaxRetries=123,
    AllocatedCapacity=123,
    Timeout=123,
    MaxCapacity=123.0,
    NotificationProperty={
        'NotifyDelayAfter': 123
    },
    SecurityConfiguration='string',
    Tags={
        'string': 'string'
    }
)
type Name:

string

param Name:

[REQUIRED]

The name you assign to this job definition. It must be unique in your account.

type Description:

string

param Description:

Description of the job being defined.

type LogUri:

string

param LogUri:

This field is reserved for future use.

type Role:

string

param Role:

[REQUIRED]

The name or ARN of the IAM role associated with this job.

type ExecutionProperty:

dict

param ExecutionProperty:

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

  • MaxConcurrentRuns (integer) --

    The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

type Command:

dict

param Command:

[REQUIRED]

The JobCommand that executes this job.

  • Name (string) --

    The name of the job command: this must be glueetl, for an Apache Spark ETL job, or pythonshell, for a Python shell job.

  • ScriptLocation (string) --

    Specifies the S3 path to a script that executes a job (required).

type DefaultArguments:

dict

param DefaultArguments:

The default arguments for this job.

You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

  • (string) --

    • (string) --

type Connections:

dict

param Connections:

The connections used for this job.

  • Connections (list) --

    A list of connections used by the job.

    • (string) --

type MaxRetries:

integer

param MaxRetries:

The maximum number of times to retry this job if it fails.

type AllocatedCapacity:

integer

param AllocatedCapacity:

This parameter is deprecated. Use MaxCapacity instead.

The number of AWS Glue data processing units (DPUs) to allocate to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

type Timeout:

integer

param Timeout:

The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

type MaxCapacity:

float

param MaxCapacity:

The number of AWS Glue data processing units (DPUs) that can be allocated when this job runs. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

The value that can be allocated for MaxCapacity depends on whether you are running a python shell job, or an Apache Spark ETL job:

  • When you specify a python shell job ( ``JobCommand.Name``="pythonshell"), you can allocate either 0.0625 or 1 DPU. The default is 0.0625 DPU.

  • When you specify an Apache Spark ETL job ( ``JobCommand.Name``="glueetl"), you can allocate from 2 to 100 DPUs. The default is 10 DPUs. This job type cannot have a fractional DPU allocation.

type NotificationProperty:

dict

param NotificationProperty:

Specifies configuration properties of a job notification.

  • NotifyDelayAfter (integer) --

    After a job run starts, the number of minutes to wait before sending a job run delay notification.

type SecurityConfiguration:

string

param SecurityConfiguration:

The name of the SecurityConfiguration structure to be used with this job.

type Tags:

dict

param Tags:

The tags to use with this job. You may use tags to limit access to the job. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Name': 'string'
}

Response Structure

  • (dict) --

    • Name (string) --

      The unique name that was provided for this job definition.

CreateTrigger (updated) Link ¶
Changes (request)
{'Tags': {'string': 'string'}}

Creates a new trigger.

See also: AWS API Documentation

Request Syntax

client.create_trigger(
    Name='string',
    Type='SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
    Schedule='string',
    Predicate={
        'Logical': 'AND'|'ANY',
        'Conditions': [
            {
                'LogicalOperator': 'EQUALS',
                'JobName': 'string',
                'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
            },
        ]
    },
    Actions=[
        {
            'JobName': 'string',
            'Arguments': {
                'string': 'string'
            },
            'Timeout': 123,
            'NotificationProperty': {
                'NotifyDelayAfter': 123
            },
            'SecurityConfiguration': 'string'
        },
    ],
    Description='string',
    StartOnCreation=True|False,
    Tags={
        'string': 'string'
    }
)
type Name:

string

param Name:

[REQUIRED]

The name of the trigger.

type Type:

string

param Type:

[REQUIRED]

The type of the new trigger.

type Schedule:

string

param Schedule:

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *).

This field is required when the trigger type is SCHEDULED.

type Predicate:

dict

param Predicate:

A predicate to specify when the new trigger should fire.

This field is required when the trigger type is CONDITIONAL.

  • Logical (string) --

    Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

  • Conditions (list) --

    A list of the conditions that determine when the trigger will fire.

    • (dict) --

      Defines a condition under which a trigger fires.

      • LogicalOperator (string) --

        A logical operator.

      • JobName (string) --

        The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

      • State (string) --

        The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

type Actions:

list

param Actions:

[REQUIRED]

The actions initiated by this trigger when it fires.

  • (dict) --

    Defines an action to be initiated by a trigger.

    • JobName (string) --

      The name of a job to be executed.

    • Arguments (dict) --

      The job arguments used when this trigger fires. For this job run, they replace the default arguments set in the job definition itself.

      You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

      For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

      For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

      • (string) --

        • (string) --

    • Timeout (integer) --

      The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

    • NotificationProperty (dict) --

      Specifies configuration properties of a job run notification.

      • NotifyDelayAfter (integer) --

        After a job run starts, the number of minutes to wait before sending a job run delay notification.

    • SecurityConfiguration (string) --

      The name of the SecurityConfiguration structure to be used with this action.

type Description:

string

param Description:

A description of the new trigger.

type StartOnCreation:

boolean

param StartOnCreation:

Set to true to start SCHEDULED and CONDITIONAL triggers when created. True not supported for ON_DEMAND triggers.

type Tags:

dict

param Tags:

The tags to use with this trigger. You may use tags to limit access to the trigger. For more information about tags in AWS Glue, see AWS Tags in AWS Glue in the developer guide.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'Name': 'string'
}

Response Structure

  • (dict) --

    • Name (string) --

      The name of the trigger.