AWS Glue

2018/08/25 - AWS Glue - 5 new 18 updated api methods

Changes  AWS Glue now supports data encryption at rest for ETL jobs and development endpoints. With encryption enabled, when you run ETL jobs, or development endpoints, Glue will use AWS KMS keys to write encrypted data at rest. You can also encrypt the metadata stored in the Glue Data Catalog using keys that you manage with AWS KMS. Additionally, you can use AWS KMS keys to encrypt the logs generated by crawlers and ETL jobs as well as encrypt ETL job bookmarks. Encryption settings for Glue crawlers, ETL jobs, and development endpoints can be configured using the security configurations in Glue. Glue Data Catalog encryption can be enabled via the settings for the Glue Data Catalog.

DeleteSecurityConfiguration (new) Link ¶

Deletes a specified security configuration.

See also: AWS API Documentation

Request Syntax

client.delete_security_configuration(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

The name of the security configuration to delete.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --

GetSecurityConfiguration (new) Link ¶

Retrieves a specified security configuration.

See also: AWS API Documentation

Request Syntax

client.get_security_configuration(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

The name of the security configuration to retrieve.

rtype

dict

returns

Response Syntax

{
    'SecurityConfiguration': {
        'Name': 'string',
        'CreatedTimeStamp': datetime(2015, 1, 1),
        'EncryptionConfiguration': {
            'S3Encryption': [
                {
                    'S3EncryptionMode': 'DISABLED'|'SSE-KMS'|'SSE-S3',
                    'KmsKeyArn': 'string'
                },
            ],
            'CloudWatchEncryption': {
                'CloudWatchEncryptionMode': 'DISABLED'|'SSE-KMS',
                'KmsKeyArn': 'string'
            },
            'JobBookmarksEncryption': {
                'JobBookmarksEncryptionMode': 'DISABLED'|'CSE-KMS',
                'KmsKeyArn': 'string'
            }
        }
    }
}

Response Structure

  • (dict) --

    • SecurityConfiguration (dict) --

      The requested security configuration

      • Name (string) --

        The name of the security configuration.

      • CreatedTimeStamp (datetime) --

        The time at which this security configuration was created.

      • EncryptionConfiguration (dict) --

        The encryption configuration associated with this security configuration.

        • S3Encryption (list) --

          The encryption configuration for S3 data.

          • (dict) --

            Specifies how S3 data should be encrypted.

            • S3EncryptionMode (string) --

              The encryption mode to use for S3 data.

            • KmsKeyArn (string) --

              The AWS ARN of the KMS key to be used to encrypt the data.

        • CloudWatchEncryption (dict) --

          The encryption configuration for CloudWatch.

          • CloudWatchEncryptionMode (string) --

            The encryption mode to use for CloudWatch data.

          • KmsKeyArn (string) --

            The AWS ARN of the KMS key to be used to encrypt the data.

        • JobBookmarksEncryption (dict) --

          The encryption configuration for Job Bookmarks.

          • JobBookmarksEncryptionMode (string) --

            The encryption mode to use for Job bookmarks data.

          • KmsKeyArn (string) --

            The AWS ARN of the KMS key to be used to encrypt the data.

PutDataCatalogEncryptionSettings (new) Link ¶

Sets the security configuration for a specified catalog. Once the configuration has been set, the specified encryption is applied to every catalog write thereafter.

See also: AWS API Documentation

Request Syntax

client.put_data_catalog_encryption_settings(
    CatalogId='string',
    DataCatalogEncryptionSettings={
        'EncryptionAtRest': {
            'CatalogEncryptionMode': 'DISABLED'|'SSE-KMS',
            'SseAwsKmsKeyId': 'string'
        }
    }
)
type CatalogId

string

param CatalogId

The ID of the Data Catalog for which to set the security configuration. If none is supplied, the AWS account ID is used by default.

type DataCatalogEncryptionSettings

dict

param DataCatalogEncryptionSettings

[REQUIRED]

The security configuration to set.

  • EncryptionAtRest (dict) --

    Specifies encryption-at-rest configuration for the Data Catalog.

    • CatalogEncryptionMode (string) -- [REQUIRED]

      The encryption-at-rest mode for encrypting Data Catalog data.

    • SseAwsKmsKeyId (string) --

      The ID of the AWS KMS key to use for encryption at rest.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --

CreateSecurityConfiguration (new) Link ¶

Creates a new security configuration.

See also: AWS API Documentation

Request Syntax

client.create_security_configuration(
    Name='string',
    EncryptionConfiguration={
        'S3Encryption': [
            {
                'S3EncryptionMode': 'DISABLED'|'SSE-KMS'|'SSE-S3',
                'KmsKeyArn': 'string'
            },
        ],
        'CloudWatchEncryption': {
            'CloudWatchEncryptionMode': 'DISABLED'|'SSE-KMS',
            'KmsKeyArn': 'string'
        },
        'JobBookmarksEncryption': {
            'JobBookmarksEncryptionMode': 'DISABLED'|'CSE-KMS',
            'KmsKeyArn': 'string'
        }
    }
)
type Name

string

param Name

[REQUIRED]

The name for the new security configuration.

type EncryptionConfiguration

dict

param EncryptionConfiguration

[REQUIRED]

The encryption configuration for the new security configuration.

  • S3Encryption (list) --

    The encryption configuration for S3 data.

    • (dict) --

      Specifies how S3 data should be encrypted.

      • S3EncryptionMode (string) --

        The encryption mode to use for S3 data.

      • KmsKeyArn (string) --

        The AWS ARN of the KMS key to be used to encrypt the data.

  • CloudWatchEncryption (dict) --

    The encryption configuration for CloudWatch.

    • CloudWatchEncryptionMode (string) --

      The encryption mode to use for CloudWatch data.

    • KmsKeyArn (string) --

      The AWS ARN of the KMS key to be used to encrypt the data.

  • JobBookmarksEncryption (dict) --

    The encryption configuration for Job Bookmarks.

    • JobBookmarksEncryptionMode (string) --

      The encryption mode to use for Job bookmarks data.

    • KmsKeyArn (string) --

      The AWS ARN of the KMS key to be used to encrypt the data.

rtype

dict

returns

Response Syntax

{
    'Name': 'string',
    'CreatedTimestamp': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • Name (string) --

      The name assigned to the new security configuration.

    • CreatedTimestamp (datetime) --

      The time at which the new security configuration was created.

GetSecurityConfigurations (new) Link ¶

Retrieves a list of all security configurations.

See also: AWS API Documentation

Request Syntax

client.get_security_configurations(
    MaxResults=123,
    NextToken='string'
)
type MaxResults

integer

param MaxResults

The maximum number of results to return.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'SecurityConfigurations': [
        {
            'Name': 'string',
            'CreatedTimeStamp': datetime(2015, 1, 1),
            'EncryptionConfiguration': {
                'S3Encryption': [
                    {
                        'S3EncryptionMode': 'DISABLED'|'SSE-KMS'|'SSE-S3',
                        'KmsKeyArn': 'string'
                    },
                ],
                'CloudWatchEncryption': {
                    'CloudWatchEncryptionMode': 'DISABLED'|'SSE-KMS',
                    'KmsKeyArn': 'string'
                },
                'JobBookmarksEncryption': {
                    'JobBookmarksEncryptionMode': 'DISABLED'|'CSE-KMS',
                    'KmsKeyArn': 'string'
                }
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • SecurityConfigurations (list) --

      A list of security configurations.

      • (dict) --

        Specifies a security configuration.

        • Name (string) --

          The name of the security configuration.

        • CreatedTimeStamp (datetime) --

          The time at which this security configuration was created.

        • EncryptionConfiguration (dict) --

          The encryption configuration associated with this security configuration.

          • S3Encryption (list) --

            The encryption configuration for S3 data.

            • (dict) --

              Specifies how S3 data should be encrypted.

              • S3EncryptionMode (string) --

                The encryption mode to use for S3 data.

              • KmsKeyArn (string) --

                The AWS ARN of the KMS key to be used to encrypt the data.

          • CloudWatchEncryption (dict) --

            The encryption configuration for CloudWatch.

            • CloudWatchEncryptionMode (string) --

              The encryption mode to use for CloudWatch data.

            • KmsKeyArn (string) --

              The AWS ARN of the KMS key to be used to encrypt the data.

          • JobBookmarksEncryption (dict) --

            The encryption configuration for Job Bookmarks.

            • JobBookmarksEncryptionMode (string) --

              The encryption mode to use for Job bookmarks data.

            • KmsKeyArn (string) --

              The AWS ARN of the KMS key to be used to encrypt the data.

    • NextToken (string) --

      A continuation token, if there are more security configurations to return.

CreateCrawler (updated) Link ¶
Changes (request)
{'CrawlerSecurityConfiguration': 'string'}

Creates a new crawler with specified targets, role, configuration, and optional schedule. At least one crawl target must be specified, in the s3Targets field, the jdbcTargets field, or the DynamoDBTargets field.

See also: AWS API Documentation

Request Syntax

client.create_crawler(
    Name='string',
    Role='string',
    DatabaseName='string',
    Description='string',
    Targets={
        'S3Targets': [
            {
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'JdbcTargets': [
            {
                'ConnectionName': 'string',
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'DynamoDBTargets': [
            {
                'Path': 'string'
            },
        ]
    },
    Schedule='string',
    Classifiers=[
        'string',
    ],
    TablePrefix='string',
    SchemaChangePolicy={
        'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
        'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
    },
    Configuration='string',
    CrawlerSecurityConfiguration='string'
)
type Name

string

param Name

[REQUIRED]

Name of the new crawler.

type Role

string

param Role

[REQUIRED]

The IAM role (or ARN of an IAM role) used by the new crawler to access customer resources.

type DatabaseName

string

param DatabaseName

[REQUIRED]

The AWS Glue database where results are written, such as: arn:aws:daylight:us-east-1::database/sometable/* .

type Description

string

param Description

A description of the new crawler.

type Targets

dict

param Targets

[REQUIRED]

A list of collection of targets to crawl.

  • S3Targets (list) --

    Specifies Amazon S3 targets.

    • (dict) --

      Specifies a data store in Amazon S3.

      • Path (string) --

        The path to the Amazon S3 target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • JdbcTargets (list) --

    Specifies JDBC targets.

    • (dict) --

      Specifies a JDBC data store to crawl.

      • ConnectionName (string) --

        The name of the connection to use to connect to the JDBC target.

      • Path (string) --

        The path of the JDBC target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • DynamoDBTargets (list) --

    Specifies DynamoDB targets.

    • (dict) --

      Specifies a DynamoDB table to crawl.

      • Path (string) --

        The name of the DynamoDB table to crawl.

type Schedule

string

param Schedule

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

type Classifiers

list

param Classifiers

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

  • (string) --

type TablePrefix

string

param TablePrefix

The table prefix used for catalog tables that are created.

type SchemaChangePolicy

dict

param SchemaChangePolicy

Policy for the crawler's update and deletion behavior.

  • UpdateBehavior (string) --

    The update behavior when the crawler finds a changed schema.

  • DeleteBehavior (string) --

    The deletion behavior when the crawler finds a deleted object.

type Configuration

string

param Configuration

Crawler configuration information. This versioned JSON string allows users to specify aspects of a Crawler's behavior.

You can use this field to force partitions to inherit metadata such as classification, input format, output format, serde information, and schema from their parent table, rather than detect this information separately for each partition. Use the following JSON string to specify that behavior:

Example: '{ "Version": 1.0, "CrawlerOutput": { "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" } } }'

type CrawlerSecurityConfiguration

string

param CrawlerSecurityConfiguration

The name of the SecurityConfiguration structure to be used by this Crawler.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --

CreateDevEndpoint (updated) Link ¶
Changes (both)
{'SecurityConfiguration': 'string'}

Creates a new DevEndpoint.

See also: AWS API Documentation

Request Syntax

client.create_dev_endpoint(
    EndpointName='string',
    RoleArn='string',
    SecurityGroupIds=[
        'string',
    ],
    SubnetId='string',
    PublicKey='string',
    PublicKeys=[
        'string',
    ],
    NumberOfNodes=123,
    ExtraPythonLibsS3Path='string',
    ExtraJarsS3Path='string',
    SecurityConfiguration='string'
)
type EndpointName

string

param EndpointName

[REQUIRED]

The name to be assigned to the new DevEndpoint.

type RoleArn

string

param RoleArn

[REQUIRED]

The IAM role for the DevEndpoint.

type SecurityGroupIds

list

param SecurityGroupIds

Security group IDs for the security groups to be used by the new DevEndpoint.

  • (string) --

type SubnetId

string

param SubnetId

The subnet ID for the new DevEndpoint to use.

type PublicKey

string

param PublicKey

The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility, as the recommended attribute to use is public keys.

type PublicKeys

list

param PublicKeys

A list of public keys to be used by the DevEndpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

Note

If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys: call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.

  • (string) --

type NumberOfNodes

integer

param NumberOfNodes

The number of AWS Glue Data Processing Units (DPUs) to allocate to this DevEndpoint.

type ExtraPythonLibsS3Path

string

param ExtraPythonLibsS3Path

Path(s) to one or more Python libraries in an S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

Please note that only pure Python libraries can currently be used on a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

type ExtraJarsS3Path

string

param ExtraJarsS3Path

Path to one or more Java Jars in an S3 bucket that should be loaded in your DevEndpoint.

type SecurityConfiguration

string

param SecurityConfiguration

The name of the SecurityConfiguration structure to be used with this DevEndpoint.

rtype

dict

returns

Response Syntax

{
    'EndpointName': 'string',
    'Status': 'string',
    'SecurityGroupIds': [
        'string',
    ],
    'SubnetId': 'string',
    'RoleArn': 'string',
    'YarnEndpointAddress': 'string',
    'ZeppelinRemoteSparkInterpreterPort': 123,
    'NumberOfNodes': 123,
    'AvailabilityZone': 'string',
    'VpcId': 'string',
    'ExtraPythonLibsS3Path': 'string',
    'ExtraJarsS3Path': 'string',
    'FailureReason': 'string',
    'SecurityConfiguration': 'string',
    'CreatedTimestamp': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • EndpointName (string) --

      The name assigned to the new DevEndpoint.

    • Status (string) --

      The current status of the new DevEndpoint.

    • SecurityGroupIds (list) --

      The security groups assigned to the new DevEndpoint.

      • (string) --

    • SubnetId (string) --

      The subnet ID assigned to the new DevEndpoint.

    • RoleArn (string) --

      The AWS ARN of the role assigned to the new DevEndpoint.

    • YarnEndpointAddress (string) --

      The address of the YARN endpoint used by this DevEndpoint.

    • ZeppelinRemoteSparkInterpreterPort (integer) --

      The Apache Zeppelin port for the remote Apache Spark interpreter.

    • NumberOfNodes (integer) --

      The number of AWS Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

    • AvailabilityZone (string) --

      The AWS availability zone where this DevEndpoint is located.

    • VpcId (string) --

      The ID of the VPC used by this DevEndpoint.

    • ExtraPythonLibsS3Path (string) --

      Path(s) to one or more Python libraries in an S3 bucket that will be loaded in your DevEndpoint.

    • ExtraJarsS3Path (string) --

      Path to one or more Java Jars in an S3 bucket that will be loaded in your DevEndpoint.

    • FailureReason (string) --

      The reason for a current failure in this DevEndpoint.

    • SecurityConfiguration (string) --

      The name of the SecurityConfiguration structure being used with this DevEndpoint.

    • CreatedTimestamp (datetime) --

      The point in time at which this DevEndpoint was created.

CreateJob (updated) Link ¶
Changes (request)
{'SecurityConfiguration': 'string'}

Creates a new job definition.

See also: AWS API Documentation

Request Syntax

client.create_job(
    Name='string',
    Description='string',
    LogUri='string',
    Role='string',
    ExecutionProperty={
        'MaxConcurrentRuns': 123
    },
    Command={
        'Name': 'string',
        'ScriptLocation': 'string'
    },
    DefaultArguments={
        'string': 'string'
    },
    Connections={
        'Connections': [
            'string',
        ]
    },
    MaxRetries=123,
    AllocatedCapacity=123,
    Timeout=123,
    NotificationProperty={
        'NotifyDelayAfter': 123
    },
    SecurityConfiguration='string'
)
type Name

string

param Name

[REQUIRED]

The name you assign to this job definition. It must be unique in your account.

type Description

string

param Description

Description of the job being defined.

type LogUri

string

param LogUri

This field is reserved for future use.

type Role

string

param Role

[REQUIRED]

The name or ARN of the IAM role associated with this job.

type ExecutionProperty

dict

param ExecutionProperty

An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

  • MaxConcurrentRuns (integer) --

    The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

type Command

dict

param Command

[REQUIRED]

The JobCommand that executes this job.

  • Name (string) --

    The name of the job command: this must be glueetl .

  • ScriptLocation (string) --

    Specifies the S3 path to a script that executes a job (required).

type DefaultArguments

dict

param DefaultArguments

The default arguments for this job.

You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

  • (string) --

    • (string) --

type Connections

dict

param Connections

The connections used for this job.

  • Connections (list) --

    A list of connections used by the job.

    • (string) --

type MaxRetries

integer

param MaxRetries

The maximum number of times to retry this job if it fails.

type AllocatedCapacity

integer

param AllocatedCapacity

The number of AWS Glue data processing units (DPUs) to allocate to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

type Timeout

integer

param Timeout

The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

type NotificationProperty

dict

param NotificationProperty

Specifies configuration properties of a job notification.

  • NotifyDelayAfter (integer) --

    After a job run starts, the number of minutes to wait before sending a job run delay notification.

type SecurityConfiguration

string

param SecurityConfiguration

The name of the SecurityConfiguration structure to be used with this job.

rtype

dict

returns

Response Syntax

{
    'Name': 'string'
}

Response Structure

  • (dict) --

    • Name (string) --

      The unique name that was provided for this job definition.

CreateTrigger (updated) Link ¶
Changes (request)
{'Actions': {'SecurityConfiguration': 'string'}}

Creates a new trigger.

See also: AWS API Documentation

Request Syntax

client.create_trigger(
    Name='string',
    Type='SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
    Schedule='string',
    Predicate={
        'Logical': 'AND'|'ANY',
        'Conditions': [
            {
                'LogicalOperator': 'EQUALS',
                'JobName': 'string',
                'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
            },
        ]
    },
    Actions=[
        {
            'JobName': 'string',
            'Arguments': {
                'string': 'string'
            },
            'Timeout': 123,
            'NotificationProperty': {
                'NotifyDelayAfter': 123
            },
            'SecurityConfiguration': 'string'
        },
    ],
    Description='string',
    StartOnCreation=True|False
)
type Name

string

param Name

[REQUIRED]

The name of the trigger.

type Type

string

param Type

[REQUIRED]

The type of the new trigger.

type Schedule

string

param Schedule

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

This field is required when the trigger type is SCHEDULED.

type Predicate

dict

param Predicate

A predicate to specify when the new trigger should fire.

This field is required when the trigger type is CONDITIONAL.

  • Logical (string) --

    Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

  • Conditions (list) --

    A list of the conditions that determine when the trigger will fire.

    • (dict) --

      Defines a condition under which a trigger fires.

      • LogicalOperator (string) --

        A logical operator.

      • JobName (string) --

        The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

      • State (string) --

        The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

type Actions

list

param Actions

[REQUIRED]

The actions initiated by this trigger when it fires.

  • (dict) --

    Defines an action to be initiated by a trigger.

    • JobName (string) --

      The name of a job to be executed.

    • Arguments (dict) --

      Arguments to be passed to the job run.

      You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

      For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

      For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

      • (string) --

        • (string) --

    • Timeout (integer) --

      The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

    • NotificationProperty (dict) --

      Specifies configuration properties of a job run notification.

      • NotifyDelayAfter (integer) --

        After a job run starts, the number of minutes to wait before sending a job run delay notification.

    • SecurityConfiguration (string) --

      The name of the SecurityConfiguration structure to be used with this action.

type Description

string

param Description

A description of the new trigger.

type StartOnCreation

boolean

param StartOnCreation

Set to true to start SCHEDULED and CONDITIONAL triggers when created. True not supported for ON_DEMAND triggers.

rtype

dict

returns

Response Syntax

{
    'Name': 'string'
}

Response Structure

  • (dict) --

    • Name (string) --

      The name of the trigger.

GetCrawler (updated) Link ¶
Changes (response)
{'Crawler': {'CrawlerSecurityConfiguration': 'string'}}

Retrieves metadata for a specified crawler.

See also: AWS API Documentation

Request Syntax

client.get_crawler(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

Name of the crawler to retrieve metadata for.

rtype

dict

returns

Response Syntax

{
    'Crawler': {
        'Name': 'string',
        'Role': 'string',
        'Targets': {
            'S3Targets': [
                {
                    'Path': 'string',
                    'Exclusions': [
                        'string',
                    ]
                },
            ],
            'JdbcTargets': [
                {
                    'ConnectionName': 'string',
                    'Path': 'string',
                    'Exclusions': [
                        'string',
                    ]
                },
            ],
            'DynamoDBTargets': [
                {
                    'Path': 'string'
                },
            ]
        },
        'DatabaseName': 'string',
        'Description': 'string',
        'Classifiers': [
            'string',
        ],
        'SchemaChangePolicy': {
            'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
            'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
        },
        'State': 'READY'|'RUNNING'|'STOPPING',
        'TablePrefix': 'string',
        'Schedule': {
            'ScheduleExpression': 'string',
            'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
        },
        'CrawlElapsedTime': 123,
        'CreationTime': datetime(2015, 1, 1),
        'LastUpdated': datetime(2015, 1, 1),
        'LastCrawl': {
            'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED',
            'ErrorMessage': 'string',
            'LogGroup': 'string',
            'LogStream': 'string',
            'MessagePrefix': 'string',
            'StartTime': datetime(2015, 1, 1)
        },
        'Version': 123,
        'Configuration': 'string',
        'CrawlerSecurityConfiguration': 'string'
    }
}

Response Structure

  • (dict) --

    • Crawler (dict) --

      The metadata for the specified crawler.

      • Name (string) --

        The crawler name.

      • Role (string) --

        The IAM role (or ARN of an IAM role) used to access customer resources, such as data in Amazon S3.

      • Targets (dict) --

        A collection of targets to crawl.

        • S3Targets (list) --

          Specifies Amazon S3 targets.

          • (dict) --

            Specifies a data store in Amazon S3.

            • Path (string) --

              The path to the Amazon S3 target.

            • Exclusions (list) --

              A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

              • (string) --

        • JdbcTargets (list) --

          Specifies JDBC targets.

          • (dict) --

            Specifies a JDBC data store to crawl.

            • ConnectionName (string) --

              The name of the connection to use to connect to the JDBC target.

            • Path (string) --

              The path of the JDBC target.

            • Exclusions (list) --

              A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

              • (string) --

        • DynamoDBTargets (list) --

          Specifies DynamoDB targets.

          • (dict) --

            Specifies a DynamoDB table to crawl.

            • Path (string) --

              The name of the DynamoDB table to crawl.

      • DatabaseName (string) --

        The database where metadata is written by this crawler.

      • Description (string) --

        A description of the crawler.

      • Classifiers (list) --

        A list of custom classifiers associated with the crawler.

        • (string) --

      • SchemaChangePolicy (dict) --

        Sets the behavior when the crawler finds a changed or deleted object.

        • UpdateBehavior (string) --

          The update behavior when the crawler finds a changed schema.

        • DeleteBehavior (string) --

          The deletion behavior when the crawler finds a deleted object.

      • State (string) --

        Indicates whether the crawler is running, or whether a run is pending.

      • TablePrefix (string) --

        The prefix added to the names of tables that are created.

      • Schedule (dict) --

        For scheduled crawlers, the schedule when the crawler runs.

        • ScheduleExpression (string) --

          A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

        • State (string) --

          The state of the schedule.

      • CrawlElapsedTime (integer) --

        If the crawler is running, contains the total time elapsed since the last crawl began.

      • CreationTime (datetime) --

        The time when the crawler was created.

      • LastUpdated (datetime) --

        The time the crawler was last updated.

      • LastCrawl (dict) --

        The status of the last crawl, and potentially error information if an error occurred.

        • Status (string) --

          Status of the last crawl.

        • ErrorMessage (string) --

          If an error occurred, the error information about the last crawl.

        • LogGroup (string) --

          The log group for the last crawl.

        • LogStream (string) --

          The log stream for the last crawl.

        • MessagePrefix (string) --

          The prefix for a message about this crawl.

        • StartTime (datetime) --

          The time at which the crawl started.

      • Version (integer) --

        The version of the crawler.

      • Configuration (string) --

        Crawler configuration information. This versioned JSON string allows users to specify aspects of a Crawler's behavior.

        You can use this field to force partitions to inherit metadata such as classification, input format, output format, serde information, and schema from their parent table, rather than detect this information separately for each partition. Use the following JSON string to specify that behavior:

        Example: '{ "Version": 1.0, "CrawlerOutput": { "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" } } }'

      • CrawlerSecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used by this Crawler.

GetCrawlers (updated) Link ¶
Changes (response)
{'Crawlers': {'CrawlerSecurityConfiguration': 'string'}}

Retrieves metadata for all crawlers defined in the customer account.

See also: AWS API Documentation

Request Syntax

client.get_crawlers(
    MaxResults=123,
    NextToken='string'
)
type MaxResults

integer

param MaxResults

The number of crawlers to return on each call.

type NextToken

string

param NextToken

A continuation token, if this is a continuation request.

rtype

dict

returns

Response Syntax

{
    'Crawlers': [
        {
            'Name': 'string',
            'Role': 'string',
            'Targets': {
                'S3Targets': [
                    {
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ]
                    },
                ],
                'JdbcTargets': [
                    {
                        'ConnectionName': 'string',
                        'Path': 'string',
                        'Exclusions': [
                            'string',
                        ]
                    },
                ],
                'DynamoDBTargets': [
                    {
                        'Path': 'string'
                    },
                ]
            },
            'DatabaseName': 'string',
            'Description': 'string',
            'Classifiers': [
                'string',
            ],
            'SchemaChangePolicy': {
                'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
                'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
            },
            'State': 'READY'|'RUNNING'|'STOPPING',
            'TablePrefix': 'string',
            'Schedule': {
                'ScheduleExpression': 'string',
                'State': 'SCHEDULED'|'NOT_SCHEDULED'|'TRANSITIONING'
            },
            'CrawlElapsedTime': 123,
            'CreationTime': datetime(2015, 1, 1),
            'LastUpdated': datetime(2015, 1, 1),
            'LastCrawl': {
                'Status': 'SUCCEEDED'|'CANCELLED'|'FAILED',
                'ErrorMessage': 'string',
                'LogGroup': 'string',
                'LogStream': 'string',
                'MessagePrefix': 'string',
                'StartTime': datetime(2015, 1, 1)
            },
            'Version': 123,
            'Configuration': 'string',
            'CrawlerSecurityConfiguration': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Crawlers (list) --

      A list of crawler metadata.

      • (dict) --

        Specifies a crawler program that examines a data source and uses classifiers to try to determine its schema. If successful, the crawler records metadata concerning the data source in the AWS Glue Data Catalog.

        • Name (string) --

          The crawler name.

        • Role (string) --

          The IAM role (or ARN of an IAM role) used to access customer resources, such as data in Amazon S3.

        • Targets (dict) --

          A collection of targets to crawl.

          • S3Targets (list) --

            Specifies Amazon S3 targets.

            • (dict) --

              Specifies a data store in Amazon S3.

              • Path (string) --

                The path to the Amazon S3 target.

              • Exclusions (list) --

                A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

                • (string) --

          • JdbcTargets (list) --

            Specifies JDBC targets.

            • (dict) --

              Specifies a JDBC data store to crawl.

              • ConnectionName (string) --

                The name of the connection to use to connect to the JDBC target.

              • Path (string) --

                The path of the JDBC target.

              • Exclusions (list) --

                A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

                • (string) --

          • DynamoDBTargets (list) --

            Specifies DynamoDB targets.

            • (dict) --

              Specifies a DynamoDB table to crawl.

              • Path (string) --

                The name of the DynamoDB table to crawl.

        • DatabaseName (string) --

          The database where metadata is written by this crawler.

        • Description (string) --

          A description of the crawler.

        • Classifiers (list) --

          A list of custom classifiers associated with the crawler.

          • (string) --

        • SchemaChangePolicy (dict) --

          Sets the behavior when the crawler finds a changed or deleted object.

          • UpdateBehavior (string) --

            The update behavior when the crawler finds a changed schema.

          • DeleteBehavior (string) --

            The deletion behavior when the crawler finds a deleted object.

        • State (string) --

          Indicates whether the crawler is running, or whether a run is pending.

        • TablePrefix (string) --

          The prefix added to the names of tables that are created.

        • Schedule (dict) --

          For scheduled crawlers, the schedule when the crawler runs.

          • ScheduleExpression (string) --

            A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

          • State (string) --

            The state of the schedule.

        • CrawlElapsedTime (integer) --

          If the crawler is running, contains the total time elapsed since the last crawl began.

        • CreationTime (datetime) --

          The time when the crawler was created.

        • LastUpdated (datetime) --

          The time the crawler was last updated.

        • LastCrawl (dict) --

          The status of the last crawl, and potentially error information if an error occurred.

          • Status (string) --

            Status of the last crawl.

          • ErrorMessage (string) --

            If an error occurred, the error information about the last crawl.

          • LogGroup (string) --

            The log group for the last crawl.

          • LogStream (string) --

            The log stream for the last crawl.

          • MessagePrefix (string) --

            The prefix for a message about this crawl.

          • StartTime (datetime) --

            The time at which the crawl started.

        • Version (integer) --

          The version of the crawler.

        • Configuration (string) --

          Crawler configuration information. This versioned JSON string allows users to specify aspects of a Crawler's behavior.

          You can use this field to force partitions to inherit metadata such as classification, input format, output format, serde information, and schema from their parent table, rather than detect this information separately for each partition. Use the following JSON string to specify that behavior:

          Example: '{ "Version": 1.0, "CrawlerOutput": { "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" } } }'

        • CrawlerSecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used by this Crawler.

    • NextToken (string) --

      A continuation token, if the returned list has not reached the end of those defined in this customer account.

GetDevEndpoint (updated) Link ¶
Changes (response)
{'DevEndpoint': {'SecurityConfiguration': 'string'}}

Retrieves information about a specified DevEndpoint.

See also: AWS API Documentation

Request Syntax

client.get_dev_endpoint(
    EndpointName='string'
)
type EndpointName

string

param EndpointName

[REQUIRED]

Name of the DevEndpoint for which to retrieve information.

rtype

dict

returns

Response Syntax

{
    'DevEndpoint': {
        'EndpointName': 'string',
        'RoleArn': 'string',
        'SecurityGroupIds': [
            'string',
        ],
        'SubnetId': 'string',
        'YarnEndpointAddress': 'string',
        'PrivateAddress': 'string',
        'ZeppelinRemoteSparkInterpreterPort': 123,
        'PublicAddress': 'string',
        'Status': 'string',
        'NumberOfNodes': 123,
        'AvailabilityZone': 'string',
        'VpcId': 'string',
        'ExtraPythonLibsS3Path': 'string',
        'ExtraJarsS3Path': 'string',
        'FailureReason': 'string',
        'LastUpdateStatus': 'string',
        'CreatedTimestamp': datetime(2015, 1, 1),
        'LastModifiedTimestamp': datetime(2015, 1, 1),
        'PublicKey': 'string',
        'PublicKeys': [
            'string',
        ],
        'SecurityConfiguration': 'string'
    }
}

Response Structure

  • (dict) --

    • DevEndpoint (dict) --

      A DevEndpoint definition.

      • EndpointName (string) --

        The name of the DevEndpoint.

      • RoleArn (string) --

        The AWS ARN of the IAM role used in this DevEndpoint.

      • SecurityGroupIds (list) --

        A list of security group identifiers used in this DevEndpoint.

        • (string) --

      • SubnetId (string) --

        The subnet ID for this DevEndpoint.

      • YarnEndpointAddress (string) --

        The YARN endpoint address used by this DevEndpoint.

      • PrivateAddress (string) --

        A private DNS to access the DevEndpoint within a VPC, if the DevEndpoint is created within one.

      • ZeppelinRemoteSparkInterpreterPort (integer) --

        The Apache Zeppelin port for the remote Apache Spark interpreter.

      • PublicAddress (string) --

        The public VPC address used by this DevEndpoint.

      • Status (string) --

        The current status of this DevEndpoint.

      • NumberOfNodes (integer) --

        The number of AWS Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

      • AvailabilityZone (string) --

        The AWS availability zone where this DevEndpoint is located.

      • VpcId (string) --

        The ID of the virtual private cloud (VPC) used by this DevEndpoint.

      • ExtraPythonLibsS3Path (string) --

        Path(s) to one or more Python libraries in an S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

        Please note that only pure Python libraries can currently be used on a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

      • ExtraJarsS3Path (string) --

        Path to one or more Java Jars in an S3 bucket that should be loaded in your DevEndpoint.

        Please note that only pure Java/Scala libraries can currently be used on a DevEndpoint.

      • FailureReason (string) --

        The reason for a current failure in this DevEndpoint.

      • LastUpdateStatus (string) --

        The status of the last update.

      • CreatedTimestamp (datetime) --

        The point in time at which this DevEndpoint was created.

      • LastModifiedTimestamp (datetime) --

        The point in time at which this DevEndpoint was last modified.

      • PublicKey (string) --

        The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility, as the recommended attribute to use is public keys.

      • PublicKeys (list) --

        A list of public keys to be used by the DevEndpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

        Note

        If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys: call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.

        • (string) --

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with this DevEndpoint.

GetDevEndpoints (updated) Link ¶
Changes (response)
{'DevEndpoints': {'SecurityConfiguration': 'string'}}

Retrieves all the DevEndpoints in this AWS account.

See also: AWS API Documentation

Request Syntax

client.get_dev_endpoints(
    MaxResults=123,
    NextToken='string'
)
type MaxResults

integer

param MaxResults

The maximum size of information to return.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

rtype

dict

returns

Response Syntax

{
    'DevEndpoints': [
        {
            'EndpointName': 'string',
            'RoleArn': 'string',
            'SecurityGroupIds': [
                'string',
            ],
            'SubnetId': 'string',
            'YarnEndpointAddress': 'string',
            'PrivateAddress': 'string',
            'ZeppelinRemoteSparkInterpreterPort': 123,
            'PublicAddress': 'string',
            'Status': 'string',
            'NumberOfNodes': 123,
            'AvailabilityZone': 'string',
            'VpcId': 'string',
            'ExtraPythonLibsS3Path': 'string',
            'ExtraJarsS3Path': 'string',
            'FailureReason': 'string',
            'LastUpdateStatus': 'string',
            'CreatedTimestamp': datetime(2015, 1, 1),
            'LastModifiedTimestamp': datetime(2015, 1, 1),
            'PublicKey': 'string',
            'PublicKeys': [
                'string',
            ],
            'SecurityConfiguration': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • DevEndpoints (list) --

      A list of DevEndpoint definitions.

      • (dict) --

        A development endpoint where a developer can remotely debug ETL scripts.

        • EndpointName (string) --

          The name of the DevEndpoint.

        • RoleArn (string) --

          The AWS ARN of the IAM role used in this DevEndpoint.

        • SecurityGroupIds (list) --

          A list of security group identifiers used in this DevEndpoint.

          • (string) --

        • SubnetId (string) --

          The subnet ID for this DevEndpoint.

        • YarnEndpointAddress (string) --

          The YARN endpoint address used by this DevEndpoint.

        • PrivateAddress (string) --

          A private DNS to access the DevEndpoint within a VPC, if the DevEndpoint is created within one.

        • ZeppelinRemoteSparkInterpreterPort (integer) --

          The Apache Zeppelin port for the remote Apache Spark interpreter.

        • PublicAddress (string) --

          The public VPC address used by this DevEndpoint.

        • Status (string) --

          The current status of this DevEndpoint.

        • NumberOfNodes (integer) --

          The number of AWS Glue Data Processing Units (DPUs) allocated to this DevEndpoint.

        • AvailabilityZone (string) --

          The AWS availability zone where this DevEndpoint is located.

        • VpcId (string) --

          The ID of the virtual private cloud (VPC) used by this DevEndpoint.

        • ExtraPythonLibsS3Path (string) --

          Path(s) to one or more Python libraries in an S3 bucket that should be loaded in your DevEndpoint. Multiple values must be complete paths separated by a comma.

          Please note that only pure Python libraries can currently be used on a DevEndpoint. Libraries that rely on C extensions, such as the pandas Python data analysis library, are not yet supported.

        • ExtraJarsS3Path (string) --

          Path to one or more Java Jars in an S3 bucket that should be loaded in your DevEndpoint.

          Please note that only pure Java/Scala libraries can currently be used on a DevEndpoint.

        • FailureReason (string) --

          The reason for a current failure in this DevEndpoint.

        • LastUpdateStatus (string) --

          The status of the last update.

        • CreatedTimestamp (datetime) --

          The point in time at which this DevEndpoint was created.

        • LastModifiedTimestamp (datetime) --

          The point in time at which this DevEndpoint was last modified.

        • PublicKey (string) --

          The public key to be used by this DevEndpoint for authentication. This attribute is provided for backward compatibility, as the recommended attribute to use is public keys.

        • PublicKeys (list) --

          A list of public keys to be used by the DevEndpoints for authentication. The use of this attribute is preferred over a single public key because the public keys allow you to have a different private key per client.

          Note

          If you previously created an endpoint with a public key, you must remove that key to be able to set a list of public keys: call the UpdateDevEndpoint API with the public key content in the deletePublicKeys attribute, and the list of new keys in the addPublicKeys attribute.

          • (string) --

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with this DevEndpoint.

    • NextToken (string) --

      A continuation token, if not all DevEndpoint definitions have yet been returned.

GetJob (updated) Link ¶
Changes (response)
{'Job': {'SecurityConfiguration': 'string'}}

Retrieves an existing job definition.

See also: AWS API Documentation

Request Syntax

client.get_job(
    JobName='string'
)
type JobName

string

param JobName

[REQUIRED]

The name of the job definition to retrieve.

rtype

dict

returns

Response Syntax

{
    'Job': {
        'Name': 'string',
        'Description': 'string',
        'LogUri': 'string',
        'Role': 'string',
        'CreatedOn': datetime(2015, 1, 1),
        'LastModifiedOn': datetime(2015, 1, 1),
        'ExecutionProperty': {
            'MaxConcurrentRuns': 123
        },
        'Command': {
            'Name': 'string',
            'ScriptLocation': 'string'
        },
        'DefaultArguments': {
            'string': 'string'
        },
        'Connections': {
            'Connections': [
                'string',
            ]
        },
        'MaxRetries': 123,
        'AllocatedCapacity': 123,
        'Timeout': 123,
        'NotificationProperty': {
            'NotifyDelayAfter': 123
        },
        'SecurityConfiguration': 'string'
    }
}

Response Structure

  • (dict) --

    • Job (dict) --

      The requested job definition.

      • Name (string) --

        The name you assign to this job definition.

      • Description (string) --

        Description of the job being defined.

      • LogUri (string) --

        This field is reserved for future use.

      • Role (string) --

        The name or ARN of the IAM role associated with this job.

      • CreatedOn (datetime) --

        The time and date that this job definition was created.

      • LastModifiedOn (datetime) --

        The last point in time when this job definition was modified.

      • ExecutionProperty (dict) --

        An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

        • MaxConcurrentRuns (integer) --

          The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

      • Command (dict) --

        The JobCommand that executes this job.

        • Name (string) --

          The name of the job command: this must be glueetl .

        • ScriptLocation (string) --

          Specifies the S3 path to a script that executes a job (required).

      • DefaultArguments (dict) --

        The default arguments for this job, specified as name-value pairs.

        You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

        For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

        For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

        • (string) --

          • (string) --

      • Connections (dict) --

        The connections used for this job.

        • Connections (list) --

          A list of connections used by the job.

          • (string) --

      • MaxRetries (integer) --

        The maximum number of times to retry this job after a JobRun fails.

      • AllocatedCapacity (integer) --

        The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

      • Timeout (integer) --

        The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

      • NotificationProperty (dict) --

        Specifies configuration properties of a job notification.

        • NotifyDelayAfter (integer) --

          After a job run starts, the number of minutes to wait before sending a job run delay notification.

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with this job.

GetJobRun (updated) Link ¶
Changes (response)
{'JobRun': {'LogGroupName': 'string', 'SecurityConfiguration': 'string'}}

Retrieves the metadata for a given job run.

See also: AWS API Documentation

Request Syntax

client.get_job_run(
    JobName='string',
    RunId='string',
    PredecessorsIncluded=True|False
)
type JobName

string

param JobName

[REQUIRED]

Name of the job definition being run.

type RunId

string

param RunId

[REQUIRED]

The ID of the job run.

type PredecessorsIncluded

boolean

param PredecessorsIncluded

True if a list of predecessor runs should be returned.

rtype

dict

returns

Response Syntax

{
    'JobRun': {
        'Id': 'string',
        'Attempt': 123,
        'PreviousRunId': 'string',
        'TriggerName': 'string',
        'JobName': 'string',
        'StartedOn': datetime(2015, 1, 1),
        'LastModifiedOn': datetime(2015, 1, 1),
        'CompletedOn': datetime(2015, 1, 1),
        'JobRunState': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
        'Arguments': {
            'string': 'string'
        },
        'ErrorMessage': 'string',
        'PredecessorRuns': [
            {
                'JobName': 'string',
                'RunId': 'string'
            },
        ],
        'AllocatedCapacity': 123,
        'ExecutionTime': 123,
        'Timeout': 123,
        'NotificationProperty': {
            'NotifyDelayAfter': 123
        },
        'SecurityConfiguration': 'string',
        'LogGroupName': 'string'
    }
}

Response Structure

  • (dict) --

    • JobRun (dict) --

      The requested job-run metadata.

      • Id (string) --

        The ID of this job run.

      • Attempt (integer) --

        The number of the attempt to run this job.

      • PreviousRunId (string) --

        The ID of the previous run of this job. For example, the JobRunId specified in the StartJobRun action.

      • TriggerName (string) --

        The name of the trigger that started this job run.

      • JobName (string) --

        The name of the job definition being used in this run.

      • StartedOn (datetime) --

        The date and time at which this job run was started.

      • LastModifiedOn (datetime) --

        The last time this job run was modified.

      • CompletedOn (datetime) --

        The date and time this job run completed.

      • JobRunState (string) --

        The current state of the job run.

      • Arguments (dict) --

        The job arguments associated with this run. These override equivalent default arguments set for the job.

        You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

        For information about how to specify and consume your own job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

        For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

        • (string) --

          • (string) --

      • ErrorMessage (string) --

        An error message associated with this job run.

      • PredecessorRuns (list) --

        A list of predecessors to this job run.

        • (dict) --

          A job run that was used in the predicate of a conditional trigger that triggered this job run.

          • JobName (string) --

            The name of the job definition used by the predecessor job run.

          • RunId (string) --

            The job-run ID of the predecessor job run.

      • AllocatedCapacity (integer) --

        The number of AWS Glue data processing units (DPUs) allocated to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

      • ExecutionTime (integer) --

        The amount of time (in seconds) that the job run consumed resources.

      • Timeout (integer) --

        The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

      • NotificationProperty (dict) --

        Specifies configuration properties of a job run notification.

        • NotifyDelayAfter (integer) --

          After a job run starts, the number of minutes to wait before sending a job run delay notification.

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with this job run.

      • LogGroupName (string) --

        The name of the log group for secure logging, that can be server-side encrypted in CloudWatch using KMS. This name can be /aws-glue/jobs/ , in which case the default encryption is NONE . If you add a role name and SecurityConfiguration name (in other words, /aws-glue/jobs-yourRoleName-yourSecurityConfigurationName/ ), then that security configuration will be used to encrypt the log group.

GetJobRuns (updated) Link ¶
Changes (response)
{'JobRuns': {'LogGroupName': 'string', 'SecurityConfiguration': 'string'}}

Retrieves metadata for all runs of a given job definition.

See also: AWS API Documentation

Request Syntax

client.get_job_runs(
    JobName='string',
    NextToken='string',
    MaxResults=123
)
type JobName

string

param JobName

[REQUIRED]

The name of the job definition for which to retrieve all job runs.

type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

type MaxResults

integer

param MaxResults

The maximum size of the response.

rtype

dict

returns

Response Syntax

{
    'JobRuns': [
        {
            'Id': 'string',
            'Attempt': 123,
            'PreviousRunId': 'string',
            'TriggerName': 'string',
            'JobName': 'string',
            'StartedOn': datetime(2015, 1, 1),
            'LastModifiedOn': datetime(2015, 1, 1),
            'CompletedOn': datetime(2015, 1, 1),
            'JobRunState': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
            'Arguments': {
                'string': 'string'
            },
            'ErrorMessage': 'string',
            'PredecessorRuns': [
                {
                    'JobName': 'string',
                    'RunId': 'string'
                },
            ],
            'AllocatedCapacity': 123,
            'ExecutionTime': 123,
            'Timeout': 123,
            'NotificationProperty': {
                'NotifyDelayAfter': 123
            },
            'SecurityConfiguration': 'string',
            'LogGroupName': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • JobRuns (list) --

      A list of job-run metatdata objects.

      • (dict) --

        Contains information about a job run.

        • Id (string) --

          The ID of this job run.

        • Attempt (integer) --

          The number of the attempt to run this job.

        • PreviousRunId (string) --

          The ID of the previous run of this job. For example, the JobRunId specified in the StartJobRun action.

        • TriggerName (string) --

          The name of the trigger that started this job run.

        • JobName (string) --

          The name of the job definition being used in this run.

        • StartedOn (datetime) --

          The date and time at which this job run was started.

        • LastModifiedOn (datetime) --

          The last time this job run was modified.

        • CompletedOn (datetime) --

          The date and time this job run completed.

        • JobRunState (string) --

          The current state of the job run.

        • Arguments (dict) --

          The job arguments associated with this run. These override equivalent default arguments set for the job.

          You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

          For information about how to specify and consume your own job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

          For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

          • (string) --

            • (string) --

        • ErrorMessage (string) --

          An error message associated with this job run.

        • PredecessorRuns (list) --

          A list of predecessors to this job run.

          • (dict) --

            A job run that was used in the predicate of a conditional trigger that triggered this job run.

            • JobName (string) --

              The name of the job definition used by the predecessor job run.

            • RunId (string) --

              The job-run ID of the predecessor job run.

        • AllocatedCapacity (integer) --

          The number of AWS Glue data processing units (DPUs) allocated to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

        • ExecutionTime (integer) --

          The amount of time (in seconds) that the job run consumed resources.

        • Timeout (integer) --

          The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

        • NotificationProperty (dict) --

          Specifies configuration properties of a job run notification.

          • NotifyDelayAfter (integer) --

            After a job run starts, the number of minutes to wait before sending a job run delay notification.

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with this job run.

        • LogGroupName (string) --

          The name of the log group for secure logging, that can be server-side encrypted in CloudWatch using KMS. This name can be /aws-glue/jobs/ , in which case the default encryption is NONE . If you add a role name and SecurityConfiguration name (in other words, /aws-glue/jobs-yourRoleName-yourSecurityConfigurationName/ ), then that security configuration will be used to encrypt the log group.

    • NextToken (string) --

      A continuation token, if not all reequested job runs have been returned.

GetJobs (updated) Link ¶
Changes (response)
{'Jobs': {'SecurityConfiguration': 'string'}}

Retrieves all current job definitions.

See also: AWS API Documentation

Request Syntax

client.get_jobs(
    NextToken='string',
    MaxResults=123
)
type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

type MaxResults

integer

param MaxResults

The maximum size of the response.

rtype

dict

returns

Response Syntax

{
    'Jobs': [
        {
            'Name': 'string',
            'Description': 'string',
            'LogUri': 'string',
            'Role': 'string',
            'CreatedOn': datetime(2015, 1, 1),
            'LastModifiedOn': datetime(2015, 1, 1),
            'ExecutionProperty': {
                'MaxConcurrentRuns': 123
            },
            'Command': {
                'Name': 'string',
                'ScriptLocation': 'string'
            },
            'DefaultArguments': {
                'string': 'string'
            },
            'Connections': {
                'Connections': [
                    'string',
                ]
            },
            'MaxRetries': 123,
            'AllocatedCapacity': 123,
            'Timeout': 123,
            'NotificationProperty': {
                'NotifyDelayAfter': 123
            },
            'SecurityConfiguration': 'string'
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Jobs (list) --

      A list of job definitions.

      • (dict) --

        Specifies a job definition.

        • Name (string) --

          The name you assign to this job definition.

        • Description (string) --

          Description of the job being defined.

        • LogUri (string) --

          This field is reserved for future use.

        • Role (string) --

          The name or ARN of the IAM role associated with this job.

        • CreatedOn (datetime) --

          The time and date that this job definition was created.

        • LastModifiedOn (datetime) --

          The last point in time when this job definition was modified.

        • ExecutionProperty (dict) --

          An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

          • MaxConcurrentRuns (integer) --

            The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

        • Command (dict) --

          The JobCommand that executes this job.

          • Name (string) --

            The name of the job command: this must be glueetl .

          • ScriptLocation (string) --

            Specifies the S3 path to a script that executes a job (required).

        • DefaultArguments (dict) --

          The default arguments for this job, specified as name-value pairs.

          You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

          For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

          For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

          • (string) --

            • (string) --

        • Connections (dict) --

          The connections used for this job.

          • Connections (list) --

            A list of connections used by the job.

            • (string) --

        • MaxRetries (integer) --

          The maximum number of times to retry this job after a JobRun fails.

        • AllocatedCapacity (integer) --

          The number of AWS Glue data processing units (DPUs) allocated to runs of this job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

        • Timeout (integer) --

          The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

        • NotificationProperty (dict) --

          Specifies configuration properties of a job notification.

          • NotifyDelayAfter (integer) --

            After a job run starts, the number of minutes to wait before sending a job run delay notification.

        • SecurityConfiguration (string) --

          The name of the SecurityConfiguration structure to be used with this job.

    • NextToken (string) --

      A continuation token, if not all job definitions have yet been returned.

GetTrigger (updated) Link ¶
Changes (response)
{'Trigger': {'Actions': {'SecurityConfiguration': 'string'}}}

Retrieves the definition of a trigger.

See also: AWS API Documentation

Request Syntax

client.get_trigger(
    Name='string'
)
type Name

string

param Name

[REQUIRED]

The name of the trigger to retrieve.

rtype

dict

returns

Response Syntax

{
    'Trigger': {
        'Name': 'string',
        'Id': 'string',
        'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
        'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                },
                'Timeout': 123,
                'NotificationProperty': {
                    'NotifyDelayAfter': 123
                },
                'SecurityConfiguration': 'string'
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
                },
            ]
        }
    }
}

Response Structure

  • (dict) --

    • Trigger (dict) --

      The requested trigger definition.

      • Name (string) --

        Name of the trigger.

      • Id (string) --

        Reserved for future use.

      • Type (string) --

        The type of trigger that this is.

      • State (string) --

        The current state of the trigger.

      • Description (string) --

        A description of this trigger.

      • Schedule (string) --

        A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

      • Actions (list) --

        The actions initiated by this trigger.

        • (dict) --

          Defines an action to be initiated by a trigger.

          • JobName (string) --

            The name of a job to be executed.

          • Arguments (dict) --

            Arguments to be passed to the job run.

            You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

            For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

            For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

            • (string) --

              • (string) --

          • Timeout (integer) --

            The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

          • NotificationProperty (dict) --

            Specifies configuration properties of a job run notification.

            • NotifyDelayAfter (integer) --

              After a job run starts, the number of minutes to wait before sending a job run delay notification.

          • SecurityConfiguration (string) --

            The name of the SecurityConfiguration structure to be used with this action.

      • Predicate (dict) --

        The predicate of this trigger, which defines when it will fire.

        • Logical (string) --

          Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

        • Conditions (list) --

          A list of the conditions that determine when the trigger will fire.

          • (dict) --

            Defines a condition under which a trigger fires.

            • LogicalOperator (string) --

              A logical operator.

            • JobName (string) --

              The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

            • State (string) --

              The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

GetTriggers (updated) Link ¶
Changes (response)
{'Triggers': {'Actions': {'SecurityConfiguration': 'string'}}}

Gets all the triggers associated with a job.

See also: AWS API Documentation

Request Syntax

client.get_triggers(
    NextToken='string',
    DependentJobName='string',
    MaxResults=123
)
type NextToken

string

param NextToken

A continuation token, if this is a continuation call.

type DependentJobName

string

param DependentJobName

The name of the job for which to retrieve triggers. The trigger that can start this job will be returned, and if there is no such trigger, all triggers will be returned.

type MaxResults

integer

param MaxResults

The maximum size of the response.

rtype

dict

returns

Response Syntax

{
    'Triggers': [
        {
            'Name': 'string',
            'Id': 'string',
            'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
            'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
            'Description': 'string',
            'Schedule': 'string',
            'Actions': [
                {
                    'JobName': 'string',
                    'Arguments': {
                        'string': 'string'
                    },
                    'Timeout': 123,
                    'NotificationProperty': {
                        'NotifyDelayAfter': 123
                    },
                    'SecurityConfiguration': 'string'
                },
            ],
            'Predicate': {
                'Logical': 'AND'|'ANY',
                'Conditions': [
                    {
                        'LogicalOperator': 'EQUALS',
                        'JobName': 'string',
                        'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
                    },
                ]
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Triggers (list) --

      A list of triggers for the specified job.

      • (dict) --

        Information about a specific trigger.

        • Name (string) --

          Name of the trigger.

        • Id (string) --

          Reserved for future use.

        • Type (string) --

          The type of trigger that this is.

        • State (string) --

          The current state of the trigger.

        • Description (string) --

          A description of this trigger.

        • Schedule (string) --

          A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

        • Actions (list) --

          The actions initiated by this trigger.

          • (dict) --

            Defines an action to be initiated by a trigger.

            • JobName (string) --

              The name of a job to be executed.

            • Arguments (dict) --

              Arguments to be passed to the job run.

              You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

              For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

              For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

              • (string) --

                • (string) --

            • Timeout (integer) --

              The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

            • NotificationProperty (dict) --

              Specifies configuration properties of a job run notification.

              • NotifyDelayAfter (integer) --

                After a job run starts, the number of minutes to wait before sending a job run delay notification.

            • SecurityConfiguration (string) --

              The name of the SecurityConfiguration structure to be used with this action.

        • Predicate (dict) --

          The predicate of this trigger, which defines when it will fire.

          • Logical (string) --

            Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

          • Conditions (list) --

            A list of the conditions that determine when the trigger will fire.

            • (dict) --

              Defines a condition under which a trigger fires.

              • LogicalOperator (string) --

                A logical operator.

              • JobName (string) --

                The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

              • State (string) --

                The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

    • NextToken (string) --

      A continuation token, if not all the requested triggers have yet been returned.

StartJobRun (updated) Link ¶
Changes (request)
{'SecurityConfiguration': 'string'}

Starts a job run using a job definition.

See also: AWS API Documentation

Request Syntax

client.start_job_run(
    JobName='string',
    JobRunId='string',
    Arguments={
        'string': 'string'
    },
    AllocatedCapacity=123,
    Timeout=123,
    NotificationProperty={
        'NotifyDelayAfter': 123
    },
    SecurityConfiguration='string'
)
type JobName

string

param JobName

[REQUIRED]

The name of the job definition to use.

type JobRunId

string

param JobRunId

The ID of a previous JobRun to retry.

type Arguments

dict

param Arguments

The job arguments specifically for this run. They override the equivalent default arguments set for in the job definition itself.

You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

  • (string) --

    • (string) --

type AllocatedCapacity

integer

param AllocatedCapacity

The number of AWS Glue data processing units (DPUs) to allocate to this JobRun. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

type Timeout

integer

param Timeout

The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

type NotificationProperty

dict

param NotificationProperty

Specifies configuration properties of a job run notification.

  • NotifyDelayAfter (integer) --

    After a job run starts, the number of minutes to wait before sending a job run delay notification.

type SecurityConfiguration

string

param SecurityConfiguration

The name of the SecurityConfiguration structure to be used with this job run.

rtype

dict

returns

Response Syntax

{
    'JobRunId': 'string'
}

Response Structure

  • (dict) --

    • JobRunId (string) --

      The ID assigned to this job run.

UpdateCrawler (updated) Link ¶
Changes (request)
{'CrawlerSecurityConfiguration': 'string'}

Updates a crawler. If a crawler is running, you must stop it using StopCrawler before updating it.

See also: AWS API Documentation

Request Syntax

client.update_crawler(
    Name='string',
    Role='string',
    DatabaseName='string',
    Description='string',
    Targets={
        'S3Targets': [
            {
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'JdbcTargets': [
            {
                'ConnectionName': 'string',
                'Path': 'string',
                'Exclusions': [
                    'string',
                ]
            },
        ],
        'DynamoDBTargets': [
            {
                'Path': 'string'
            },
        ]
    },
    Schedule='string',
    Classifiers=[
        'string',
    ],
    TablePrefix='string',
    SchemaChangePolicy={
        'UpdateBehavior': 'LOG'|'UPDATE_IN_DATABASE',
        'DeleteBehavior': 'LOG'|'DELETE_FROM_DATABASE'|'DEPRECATE_IN_DATABASE'
    },
    Configuration='string',
    CrawlerSecurityConfiguration='string'
)
type Name

string

param Name

[REQUIRED]

Name of the new crawler.

type Role

string

param Role

The IAM role (or ARN of an IAM role) used by the new crawler to access customer resources.

type DatabaseName

string

param DatabaseName

The AWS Glue database where results are stored, such as: arn:aws:daylight:us-east-1::database/sometable/* .

type Description

string

param Description

A description of the new crawler.

type Targets

dict

param Targets

A list of targets to crawl.

  • S3Targets (list) --

    Specifies Amazon S3 targets.

    • (dict) --

      Specifies a data store in Amazon S3.

      • Path (string) --

        The path to the Amazon S3 target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • JdbcTargets (list) --

    Specifies JDBC targets.

    • (dict) --

      Specifies a JDBC data store to crawl.

      • ConnectionName (string) --

        The name of the connection to use to connect to the JDBC target.

      • Path (string) --

        The path of the JDBC target.

      • Exclusions (list) --

        A list of glob patterns used to exclude from the crawl. For more information, see Catalog Tables with a Crawler.

        • (string) --

  • DynamoDBTargets (list) --

    Specifies DynamoDB targets.

    • (dict) --

      Specifies a DynamoDB table to crawl.

      • Path (string) --

        The name of the DynamoDB table to crawl.

type Schedule

string

param Schedule

A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

type Classifiers

list

param Classifiers

A list of custom classifiers that the user has registered. By default, all built-in classifiers are included in a crawl, but these custom classifiers always override the default classifiers for a given classification.

  • (string) --

type TablePrefix

string

param TablePrefix

The table prefix used for catalog tables that are created.

type SchemaChangePolicy

dict

param SchemaChangePolicy

Policy for the crawler's update and deletion behavior.

  • UpdateBehavior (string) --

    The update behavior when the crawler finds a changed schema.

  • DeleteBehavior (string) --

    The deletion behavior when the crawler finds a deleted object.

type Configuration

string

param Configuration

Crawler configuration information. This versioned JSON string allows users to specify aspects of a Crawler's behavior.

You can use this field to force partitions to inherit metadata such as classification, input format, output format, serde information, and schema from their parent table, rather than detect this information separately for each partition. Use the following JSON string to specify that behavior:

Example: '{ "Version": 1.0, "CrawlerOutput": { "Partitions": { "AddOrUpdateBehavior": "InheritFromTable" } } }'

type CrawlerSecurityConfiguration

string

param CrawlerSecurityConfiguration

The name of the SecurityConfiguration structure to be used by this Crawler.

rtype

dict

returns

Response Syntax

{}

Response Structure

  • (dict) --

UpdateJob (updated) Link ¶
Changes (request)
{'JobUpdate': {'SecurityConfiguration': 'string'}}

Updates an existing job definition.

See also: AWS API Documentation

Request Syntax

client.update_job(
    JobName='string',
    JobUpdate={
        'Description': 'string',
        'LogUri': 'string',
        'Role': 'string',
        'ExecutionProperty': {
            'MaxConcurrentRuns': 123
        },
        'Command': {
            'Name': 'string',
            'ScriptLocation': 'string'
        },
        'DefaultArguments': {
            'string': 'string'
        },
        'Connections': {
            'Connections': [
                'string',
            ]
        },
        'MaxRetries': 123,
        'AllocatedCapacity': 123,
        'Timeout': 123,
        'NotificationProperty': {
            'NotifyDelayAfter': 123
        },
        'SecurityConfiguration': 'string'
    }
)
type JobName

string

param JobName

[REQUIRED]

Name of the job definition to update.

type JobUpdate

dict

param JobUpdate

[REQUIRED]

Specifies the values with which to update the job definition.

  • Description (string) --

    Description of the job being defined.

  • LogUri (string) --

    This field is reserved for future use.

  • Role (string) --

    The name or ARN of the IAM role associated with this job (required).

  • ExecutionProperty (dict) --

    An ExecutionProperty specifying the maximum number of concurrent runs allowed for this job.

    • MaxConcurrentRuns (integer) --

      The maximum number of concurrent runs allowed for the job. The default is 1. An error is returned when this threshold is reached. The maximum value you can specify is controlled by a service limit.

  • Command (dict) --

    The JobCommand that executes this job (required).

    • Name (string) --

      The name of the job command: this must be glueetl .

    • ScriptLocation (string) --

      Specifies the S3 path to a script that executes a job (required).

  • DefaultArguments (dict) --

    The default arguments for this job.

    You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

    For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

    For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

    • (string) --

      • (string) --

  • Connections (dict) --

    The connections used for this job.

    • Connections (list) --

      A list of connections used by the job.

      • (string) --

  • MaxRetries (integer) --

    The maximum number of times to retry this job if it fails.

  • AllocatedCapacity (integer) --

    The number of AWS Glue data processing units (DPUs) to allocate to this Job. From 2 to 100 DPUs can be allocated; the default is 10. A DPU is a relative measure of processing power that consists of 4 vCPUs of compute capacity and 16 GB of memory. For more information, see the AWS Glue pricing page.

  • Timeout (integer) --

    The job timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours).

  • NotificationProperty (dict) --

    Specifies configuration properties of a job notification.

    • NotifyDelayAfter (integer) --

      After a job run starts, the number of minutes to wait before sending a job run delay notification.

  • SecurityConfiguration (string) --

    The name of the SecurityConfiguration structure to be used with this job.

rtype

dict

returns

Response Syntax

{
    'JobName': 'string'
}

Response Structure

  • (dict) --

    • JobName (string) --

      Returns the name of the updated job definition.

UpdateTrigger (updated) Link ¶
Changes (request, response)
Request
{'TriggerUpdate': {'Actions': {'SecurityConfiguration': 'string'}}}
Response
{'Trigger': {'Actions': {'SecurityConfiguration': 'string'}}}

Updates a trigger definition.

See also: AWS API Documentation

Request Syntax

client.update_trigger(
    Name='string',
    TriggerUpdate={
        'Name': 'string',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                },
                'Timeout': 123,
                'NotificationProperty': {
                    'NotifyDelayAfter': 123
                },
                'SecurityConfiguration': 'string'
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
                },
            ]
        }
    }
)
type Name

string

param Name

[REQUIRED]

The name of the trigger to update.

type TriggerUpdate

dict

param TriggerUpdate

[REQUIRED]

The new values with which to update the trigger.

  • Name (string) --

    Reserved for future use.

  • Description (string) --

    A description of this trigger.

  • Schedule (string) --

    A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

  • Actions (list) --

    The actions initiated by this trigger.

    • (dict) --

      Defines an action to be initiated by a trigger.

      • JobName (string) --

        The name of a job to be executed.

      • Arguments (dict) --

        Arguments to be passed to the job run.

        You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

        For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

        For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

        • (string) --

          • (string) --

      • Timeout (integer) --

        The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

      • NotificationProperty (dict) --

        Specifies configuration properties of a job run notification.

        • NotifyDelayAfter (integer) --

          After a job run starts, the number of minutes to wait before sending a job run delay notification.

      • SecurityConfiguration (string) --

        The name of the SecurityConfiguration structure to be used with this action.

  • Predicate (dict) --

    The predicate of this trigger, which defines when it will fire.

    • Logical (string) --

      Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

    • Conditions (list) --

      A list of the conditions that determine when the trigger will fire.

      • (dict) --

        Defines a condition under which a trigger fires.

        • LogicalOperator (string) --

          A logical operator.

        • JobName (string) --

          The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

        • State (string) --

          The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.

rtype

dict

returns

Response Syntax

{
    'Trigger': {
        'Name': 'string',
        'Id': 'string',
        'Type': 'SCHEDULED'|'CONDITIONAL'|'ON_DEMAND',
        'State': 'CREATING'|'CREATED'|'ACTIVATING'|'ACTIVATED'|'DEACTIVATING'|'DEACTIVATED'|'DELETING'|'UPDATING',
        'Description': 'string',
        'Schedule': 'string',
        'Actions': [
            {
                'JobName': 'string',
                'Arguments': {
                    'string': 'string'
                },
                'Timeout': 123,
                'NotificationProperty': {
                    'NotifyDelayAfter': 123
                },
                'SecurityConfiguration': 'string'
            },
        ],
        'Predicate': {
            'Logical': 'AND'|'ANY',
            'Conditions': [
                {
                    'LogicalOperator': 'EQUALS',
                    'JobName': 'string',
                    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT'
                },
            ]
        }
    }
}

Response Structure

  • (dict) --

    • Trigger (dict) --

      The resulting trigger definition.

      • Name (string) --

        Name of the trigger.

      • Id (string) --

        Reserved for future use.

      • Type (string) --

        The type of trigger that this is.

      • State (string) --

        The current state of the trigger.

      • Description (string) --

        A description of this trigger.

      • Schedule (string) --

        A cron expression used to specify the schedule (see Time-Based Schedules for Jobs and Crawlers. For example, to run something every day at 12:15 UTC, you would specify: cron(15 12 * * ? *) .

      • Actions (list) --

        The actions initiated by this trigger.

        • (dict) --

          Defines an action to be initiated by a trigger.

          • JobName (string) --

            The name of a job to be executed.

          • Arguments (dict) --

            Arguments to be passed to the job run.

            You can specify arguments here that your own job-execution script consumes, as well as arguments that AWS Glue itself consumes.

            For information about how to specify and consume your own Job arguments, see the Calling AWS Glue APIs in Python topic in the developer guide.

            For information about the key-value pairs that AWS Glue consumes to set up your job, see the Special Parameters Used by AWS Glue topic in the developer guide.

            • (string) --

              • (string) --

          • Timeout (integer) --

            The JobRun timeout in minutes. This is the maximum time that a job run can consume resources before it is terminated and enters TIMEOUT status. The default is 2,880 minutes (48 hours). This overrides the timeout value set in the parent job.

          • NotificationProperty (dict) --

            Specifies configuration properties of a job run notification.

            • NotifyDelayAfter (integer) --

              After a job run starts, the number of minutes to wait before sending a job run delay notification.

          • SecurityConfiguration (string) --

            The name of the SecurityConfiguration structure to be used with this action.

      • Predicate (dict) --

        The predicate of this trigger, which defines when it will fire.

        • Logical (string) --

          Optional field if only one condition is listed. If multiple conditions are listed, then this field is required.

        • Conditions (list) --

          A list of the conditions that determine when the trigger will fire.

          • (dict) --

            Defines a condition under which a trigger fires.

            • LogicalOperator (string) --

              A logical operator.

            • JobName (string) --

              The name of the Job to whose JobRuns this condition applies and on which this trigger waits.

            • State (string) --

              The condition state. Currently, the values supported are SUCCEEDED, STOPPED, TIMEOUT and FAILED.