AWS API Changes

2021/06/30 - AWS Glue DataBrew - 6 updated api methods

Changes: Adds support for writing job results to the AWS Glue Data Catalog.

CreateRecipeJob (updated)

Changes (request)

{'DataCatalogOutputs': [{'CatalogId': 'string',
                         'DatabaseName': 'string',
                         'DatabaseOptions': {'TableName': 'string',
                                             'TempDirectory': {'Bucket': 'string',
                                                               'Key': 'string'}},
                         'Overwrite': 'boolean',
                         'S3Options': {'Location': {'Bucket': 'string',
                                                    'Key': 'string'}},
                         'TableName': 'string'}]}

Creates a new job to transform input data, using steps defined in an existing Glue DataBrew recipe.

See also: AWS API Documentation

Request Syntax

client.create_recipe_job(
    DatasetName='string',
    EncryptionKeyArn='string',
    EncryptionMode='SSE-KMS'|'SSE-S3',
    Name='string',
    LogSubscription='ENABLE'|'DISABLE',
    MaxCapacity=123,
    MaxRetries=123,
    Outputs=[
        {
            'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
            'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
            'PartitionColumns': [
                'string',
            ],
            'Location': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Overwrite': True|False,
            'FormatOptions': {
                'Csv': {
                    'Delimiter': 'string'
                }
            }
        },
    ],
    DataCatalogOutputs=[
        {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'S3Options': {
                'Location': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'DatabaseOptions': {
                'TempDirectory': {
                    'Bucket': 'string',
                    'Key': 'string'
                },
                'TableName': 'string'
            },
            'Overwrite': True|False
        },
    ],
    ProjectName='string',
    RecipeReference={
        'Name': 'string',
        'RecipeVersion': 'string'
    },
    RoleArn='string',
    Tags={
        'string': 'string'
    },
    Timeout=123
)

type DatasetName:

string

param DatasetName:

The name of the dataset that this job processes.

type EncryptionKeyArn:

string

param EncryptionKeyArn:

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

type EncryptionMode:

string

param EncryptionMode:

The encryption mode for the job, which can be one of the following:

- SSE-KMS - Server-side encryption with keys managed by KMS.
- SSE-S3 - Server-side encryption with keys managed by Amazon S3.

type Name:

string

param Name:

[REQUIRED]

A unique name for the job. Valid characters are alphanumeric (A-Z, a-z, 0-9), hyphen (-), period (.), and space.

type LogSubscription:

string

param LogSubscription:

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

type MaxCapacity:

integer

param MaxCapacity:

The maximum number of nodes that DataBrew can consume when the job processes data.

type MaxRetries:

integer

param MaxRetries:

The maximum number of times to retry the job after a job run fails.

type Outputs:

list

param Outputs:

One or more artifacts that represent the output from running the job.

(dict) --

Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
- CompressionFormat (string) --
  
  The compression algorithm used to compress the output text of the job.
- Format (string) --
  
  The data format of the output of the job.
- PartitionColumns (list) --
  
  The names of one or more partition columns for the output of the job.
  - (string) --
- Location (dict) -- [REQUIRED]
  
  The location in Amazon S3 where the job writes its output.
  - Bucket (string) -- [REQUIRED]
    
    The Amazon S3 bucket name.
  - Key (string) --
    
    The unique name of the object in the bucket.
- Overwrite (boolean) --
  
  A value that, if true, means that any data in the location specified for output is overwritten with new output.
- FormatOptions (dict) --
  
  Represents options that define how DataBrew formats job output files.
  - Csv (dict) --
    
    Represents a set of options that define the structure of comma-separated value (CSV) job output.
    - Delimiter (string) --
      
      A single character that specifies the delimiter used to create CSV job output.
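
As a rough illustration of the Outputs structure described above, the following sketch builds one entry and enforces the two constraints the reference states: Location.Bucket is required, and the CSV delimiter is a single character. The helper name `s3_output` and the bucket and key values are hypothetical and not part of the API.

```python
def s3_output(bucket, key=None, fmt="CSV", delimiter=None, overwrite=False):
    """Build one entry for the Outputs list (hypothetical helper)."""
    if not bucket:
        raise ValueError("Location.Bucket is required")
    location = {"Bucket": bucket}
    if key is not None:
        location["Key"] = key
    entry = {"Location": location, "Format": fmt, "Overwrite": overwrite}
    if delimiter is not None:
        # The reference describes Delimiter as a single character.
        if len(delimiter) != 1:
            raise ValueError("Delimiter must be a single character")
        entry["FormatOptions"] = {"Csv": {"Delimiter": delimiter}}
    return entry


# Placeholder bucket and key, for illustration only.
example_output = s3_output("example-bucket", key="cleaned/", delimiter="|")
```

The resulting dict could be passed as `Outputs=[example_output]`; the service may apply further validation that this sketch does not attempt.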

type DataCatalogOutputs:

list

param DataCatalogOutputs:

One or more artifacts that represent the AWS Glue Data Catalog output from running the job.

(dict) --

Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
- CatalogId (string) --
  
  The unique identifier of the AWS account that holds the Data Catalog that stores the data.
- DatabaseName (string) -- [REQUIRED]
  
  The name of a database in the Data Catalog.
- TableName (string) -- [REQUIRED]
  
  The name of a table in the Data Catalog.
- S3Options (dict) --
  
  Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
  - Location (dict) -- [REQUIRED]
    
    Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
    - Bucket (string) -- [REQUIRED]
      
      The Amazon S3 bucket name.
    - Key (string) --
      
      The unique name of the object in the bucket.
- DatabaseOptions (dict) --
  
  Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
  - TempDirectory (dict) --
    
    Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
    - Bucket (string) -- [REQUIRED]
      
      The Amazon S3 bucket name.
    - Key (string) --
      
      The unique name of the object in the bucket.
  - TableName (string) -- [REQUIRED]
    
    A prefix for the name of a table DataBrew will create in the database.
- Overwrite (boolean) --
  
  A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
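
The field rules above (required DatabaseName and TableName, a required Bucket inside any location, and Overwrite not being supported together with DatabaseOptions) can be sketched as a small builder. This is a client-side reading of the reference, not the service's own validation; the helper name `data_catalog_output` and all example values are hypothetical.

```python
def data_catalog_output(database_name, table_name, catalog_id=None,
                        s3_location=None, database_options=None,
                        overwrite=None):
    """Build one DataCatalogOutputs entry (hypothetical helper)."""
    if not database_name or not table_name:
        raise ValueError("DatabaseName and TableName are required")
    if database_options is not None and overwrite is not None:
        # The reference says Overwrite is not supported with DatabaseOptions.
        raise ValueError("Overwrite cannot be combined with DatabaseOptions")

    entry = {"DatabaseName": database_name, "TableName": table_name}
    if catalog_id is not None:
        entry["CatalogId"] = catalog_id
    if s3_location is not None:
        if not s3_location.get("Bucket"):
            raise ValueError("S3Options.Location.Bucket is required")
        entry["S3Options"] = {"Location": dict(s3_location)}
    if database_options is not None:
        if not database_options.get("TableName"):
            raise ValueError("DatabaseOptions.TableName is required")
        entry["DatabaseOptions"] = dict(database_options)
    if overwrite is not None:
        entry["Overwrite"] = overwrite
    return entry


# Placeholder names, for illustration only.
catalog_entry = data_catalog_output(
    "analytics_db",
    "orders_clean",
    s3_location={"Bucket": "example-bucket", "Key": "orders/"},
    overwrite=True,
)
```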

type ProjectName:

string

param ProjectName:

Either the name of an existing project, or a combination of a recipe and a dataset to associate with the recipe.

type RecipeReference:

dict

param RecipeReference:

Represents the name and version of a DataBrew recipe.

- Name (string) -- [REQUIRED]
  
  The name of the recipe.
- RecipeVersion (string) --
  
  The identifier for the version for the recipe.

type RoleArn:

string

param RoleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

type Tags:

dict

param Tags:

Metadata tags to apply to this job.

- (string) --
  - (string) --

type Timeout:

integer

param Timeout:

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

rtype:

dict

returns:

Response Syntax

{
    'Name': 'string'
}

Response Structure

(dict) --
- Name (string) --
  
  The name of the job that you created.
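
A minimal sketch of a call that uses the new DataCatalogOutputs parameter follows. It assumes `client` is any object exposing `create_recipe_job` with the request syntax above (for example, a boto3 client created with `boto3.client("databrew")`), and that a job may be created with DataCatalogOutputs alone. The wrapper name `create_catalog_job` and every argument value are hypothetical.

```python
def create_catalog_job(client, job_name, role_arn, dataset_name, recipe_name,
                       database_name, table_name, bucket, key=None):
    """Create a recipe job that writes to the Data Catalog (hypothetical wrapper)."""
    location = {"Bucket": bucket}
    if key is not None:
        location["Key"] = key

    response = client.create_recipe_job(
        Name=job_name,                      # required
        RoleArn=role_arn,                   # required
        DatasetName=dataset_name,
        RecipeReference={"Name": recipe_name},
        DataCatalogOutputs=[
            {
                "DatabaseName": database_name,
                "TableName": table_name,
                "S3Options": {"Location": location},
                "Overwrite": True,
            }
        ],
    )
    # The response contains only the name of the created job.
    return response["Name"]
```

Because the wrapper only needs an object with a `create_recipe_job` method, it can be exercised with a stub before being pointed at a real account.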

DescribeJob (updated)

Changes (response)

{'DataCatalogOutputs': [{'CatalogId': 'string',
                         'DatabaseName': 'string',
                         'DatabaseOptions': {'TableName': 'string',
                                             'TempDirectory': {'Bucket': 'string',
                                                               'Key': 'string'}},
                         'Overwrite': 'boolean',
                         'S3Options': {'Location': {'Bucket': 'string',
                                                    'Key': 'string'}},
                         'TableName': 'string'}]}

Returns the definition of a specific DataBrew job.

See also: AWS API Documentation

Request Syntax

client.describe_job(
    Name='string'
)

type Name:

string

param Name:

[REQUIRED]

The name of the job to be described.

rtype:

dict

returns:

Response Syntax

{
    'CreateDate': datetime(2015, 1, 1),
    'CreatedBy': 'string',
    'DatasetName': 'string',
    'EncryptionKeyArn': 'string',
    'EncryptionMode': 'SSE-KMS'|'SSE-S3',
    'Name': 'string',
    'Type': 'PROFILE'|'RECIPE',
    'LastModifiedBy': 'string',
    'LastModifiedDate': datetime(2015, 1, 1),
    'LogSubscription': 'ENABLE'|'DISABLE',
    'MaxCapacity': 123,
    'MaxRetries': 123,
    'Outputs': [
        {
            'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
            'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
            'PartitionColumns': [
                'string',
            ],
            'Location': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Overwrite': True|False,
            'FormatOptions': {
                'Csv': {
                    'Delimiter': 'string'
                }
            }
        },
    ],
    'DataCatalogOutputs': [
        {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'S3Options': {
                'Location': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'DatabaseOptions': {
                'TempDirectory': {
                    'Bucket': 'string',
                    'Key': 'string'
                },
                'TableName': 'string'
            },
            'Overwrite': True|False
        },
    ],
    'ProjectName': 'string',
    'RecipeReference': {
        'Name': 'string',
        'RecipeVersion': 'string'
    },
    'ResourceArn': 'string',
    'RoleArn': 'string',
    'Tags': {
        'string': 'string'
    },
    'Timeout': 123,
    'JobSample': {
        'Mode': 'FULL_DATASET'|'CUSTOM_ROWS',
        'Size': 123
    }
}

Response Structure

(dict) --
- CreateDate (datetime) --
  
  The date and time that the job was created.
- CreatedBy (string) --
  
  The identifier (user name) of the user associated with the creation of the job.
- DatasetName (string) --
  
  The dataset that the job acts upon.
- EncryptionKeyArn (string) --
  
  The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.
- EncryptionMode (string) --
  
  The encryption mode for the job, which can be one of the following:
  - SSE-KMS - Server-side encryption with keys managed by KMS.
  - SSE-S3 - Server-side encryption with keys managed by Amazon S3.
- Name (string) --
  
  The name of the job.
- Type (string) --
  
  The job type, which must be one of the following:
  - PROFILE - The job analyzes the dataset to determine its size, data types, data distribution, and more.
  - RECIPE - The job applies one or more transformations to a dataset.
- LastModifiedBy (string) --
  
  The identifier (user name) of the user who last modified the job.
- LastModifiedDate (datetime) --
  
  The date and time that the job was last modified.
- LogSubscription (string) --
  
  Indicates whether Amazon CloudWatch logging is enabled for this job.
- MaxCapacity (integer) --
  
  The maximum number of compute nodes that DataBrew can consume when the job processes data.
- MaxRetries (integer) --
  
  The maximum number of times to retry the job after a job run fails.
- Outputs (list) --
  
  One or more artifacts that represent the output from running the job.
  - (dict) --
    
    Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
    - CompressionFormat (string) --
      
      The compression algorithm used to compress the output text of the job.
    - Format (string) --
      
      The data format of the output of the job.
    - PartitionColumns (list) --
      
      The names of one or more partition columns for the output of the job.
      - (string) --
    - Location (dict) --
      
      The location in Amazon S3 where the job writes its output.
      - Bucket (string) --
        
        The Amazon S3 bucket name.
      - Key (string) --
        
        The unique name of the object in the bucket.
    - Overwrite (boolean) --
      
      A value that, if true, means that any data in the location specified for output is overwritten with new output.
    - FormatOptions (dict) --
      
      Represents options that define how DataBrew formats job output files.
      - Csv (dict) --
        
        Represents a set of options that define the structure of comma-separated value (CSV) job output.
        - Delimiter (string) --
          
          A single character that specifies the delimiter used to create CSV job output.
- DataCatalogOutputs (list) --
  
  One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
  - (dict) --
    
    Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
    - CatalogId (string) --
      
      The unique identifier of the AWS account that holds the Data Catalog that stores the data.
    - DatabaseName (string) --
      
      The name of a database in the Data Catalog.
    - TableName (string) --
      
      The name of a table in the Data Catalog.
    - S3Options (dict) --
      
      Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
      - Location (dict) --
        
        Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
        - Bucket (string) --
          
          The Amazon S3 bucket name.
        - Key (string) --
          
          The unique name of the object in the bucket.
    - DatabaseOptions (dict) --
      
      Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
      - TempDirectory (dict) --
        
        Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
        - Bucket (string) --
          
          The Amazon S3 bucket name.
        - Key (string) --
          
          The unique name of the object in the bucket.
      - TableName (string) --
        
        A prefix for the name of a table DataBrew will create in the database.
    - Overwrite (boolean) --
      
      A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
- ProjectName (string) --
  
  The DataBrew project associated with this job.
- RecipeReference (dict) --
  
  Represents the name and version of a DataBrew recipe.
  - Name (string) --
    
    The name of the recipe.
  - RecipeVersion (string) --
    
    The identifier for the version for the recipe.
- ResourceArn (string) --
  
  The Amazon Resource Name (ARN) of the job.
- RoleArn (string) --
  
  The ARN of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.
- Tags (dict) --
  
  Metadata tags associated with this job.
  - (string) --
    - (string) --
- Timeout (integer) --
  
  The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
- JobSample (dict) --
  
  Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed.
  - Mode (string) --
    
    A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:
    - FULL_DATASET - The profile job is run on the entire dataset.
    - CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
  - Size (integer) --
    
    The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
    
    Long.MAX_VALUE = 9223372036854775807
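
To show how the new DataCatalogOutputs field might be read from a DescribeJob response, the sketch below lists each target as a (database, table, destination) tuple. Response fields can be absent, so it uses `.get` throughout. The function name `catalog_targets` and the sample response are hypothetical.

```python
def catalog_targets(response):
    """Summarize DataCatalogOutputs from a describe_job response (hypothetical helper)."""
    targets = []
    for output in response.get("DataCatalogOutputs", []):
        if "S3Options" in output:
            destination = "s3"
        elif "DatabaseOptions" in output:
            destination = "database"
        else:
            destination = "unspecified"
        targets.append(
            (output.get("DatabaseName"), output.get("TableName"), destination)
        )
    return targets


# Hand-written sample shaped like the response syntax above.
sample_response = {
    "Name": "example-job",
    "Type": "RECIPE",
    "DataCatalogOutputs": [
        {
            "DatabaseName": "analytics_db",
            "TableName": "orders_clean",
            "S3Options": {"Location": {"Bucket": "example-bucket"}},
        }
    ],
}
targets = catalog_targets(sample_response)
```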

DescribeJobRun (updated)

Changes (response)

{'DataCatalogOutputs': [{'CatalogId': 'string',
                         'DatabaseName': 'string',
                         'DatabaseOptions': {'TableName': 'string',
                                             'TempDirectory': {'Bucket': 'string',
                                                               'Key': 'string'}},
                         'Overwrite': 'boolean',
                         'S3Options': {'Location': {'Bucket': 'string',
                                                    'Key': 'string'}},
                         'TableName': 'string'}]}

Represents one run of a DataBrew job.

See also: AWS API Documentation

Request Syntax

client.describe_job_run(
    Name='string',
    RunId='string'
)

type Name:

string

param Name:

[REQUIRED]

The name of the job being processed during this run.

type RunId:

string

param RunId:

[REQUIRED]

The unique identifier of the job run.

rtype:

dict

returns:

Response Syntax

{
    'Attempt': 123,
    'CompletedOn': datetime(2015, 1, 1),
    'DatasetName': 'string',
    'ErrorMessage': 'string',
    'ExecutionTime': 123,
    'JobName': 'string',
    'RunId': 'string',
    'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
    'LogSubscription': 'ENABLE'|'DISABLE',
    'LogGroupName': 'string',
    'Outputs': [
        {
            'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
            'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
            'PartitionColumns': [
                'string',
            ],
            'Location': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Overwrite': True|False,
            'FormatOptions': {
                'Csv': {
                    'Delimiter': 'string'
                }
            }
        },
    ],
    'DataCatalogOutputs': [
        {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'S3Options': {
                'Location': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'DatabaseOptions': {
                'TempDirectory': {
                    'Bucket': 'string',
                    'Key': 'string'
                },
                'TableName': 'string'
            },
            'Overwrite': True|False
        },
    ],
    'RecipeReference': {
        'Name': 'string',
        'RecipeVersion': 'string'
    },
    'StartedBy': 'string',
    'StartedOn': datetime(2015, 1, 1),
    'JobSample': {
        'Mode': 'FULL_DATASET'|'CUSTOM_ROWS',
        'Size': 123
    }
}

Response Structure

(dict) --
- Attempt (integer) --
  
  The number of times that DataBrew has attempted to run the job.
- CompletedOn (datetime) --
  
  The date and time when the job completed processing.
- DatasetName (string) --
  
  The name of the dataset for the job to process.
- ErrorMessage (string) --
  
  A message indicating an error (if any) that was encountered when the job ran.
- ExecutionTime (integer) --
  
  The amount of time, in seconds, during which the job run consumed resources.
- JobName (string) --
  
  The name of the job being processed during this run.
- RunId (string) --
  
  The unique identifier of the job run.
- State (string) --
  
  The current state of the job run entity itself.
- LogSubscription (string) --
  
  The current status of Amazon CloudWatch logging for the job run.
- LogGroupName (string) --
  
  The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.
- Outputs (list) --
  
  One or more output artifacts from a job run.
  - (dict) --
    
    Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
    - CompressionFormat (string) --
      
      The compression algorithm used to compress the output text of the job.
    - Format (string) --
      
      The data format of the output of the job.
    - PartitionColumns (list) --
      
      The names of one or more partition columns for the output of the job.
      - (string) --
    - Location (dict) --
      
      The location in Amazon S3 where the job writes its output.
      - Bucket (string) --
        
        The Amazon S3 bucket name.
      - Key (string) --
        
        The unique name of the object in the bucket.
    - Overwrite (boolean) --
      
      A value that, if true, means that any data in the location specified for output is overwritten with new output.
    - FormatOptions (dict) --
      
      Represents options that define how DataBrew formats job output files.
      - Csv (dict) --
        
        Represents a set of options that define the structure of comma-separated value (CSV) job output.
        - Delimiter (string) --
          
          A single character that specifies the delimiter used to create CSV job output.
- DataCatalogOutputs (list) --
  
  One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
  - (dict) --
    
    Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
    - CatalogId (string) --
      
      The unique identifier of the AWS account that holds the Data Catalog that stores the data.
    - DatabaseName (string) --
      
      The name of a database in the Data Catalog.
    - TableName (string) --
      
      The name of a table in the Data Catalog.
    - S3Options (dict) --
      
      Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
      - Location (dict) --
        
        Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
        - Bucket (string) --
          
          The Amazon S3 bucket name.
        - Key (string) --
          
          The unique name of the object in the bucket.
    - DatabaseOptions (dict) --
      
      Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
      - TempDirectory (dict) --
        
        Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
        - Bucket (string) --
          
          The Amazon S3 bucket name.
        - Key (string) --
          
          The unique name of the object in the bucket.
      - TableName (string) --
        
        A prefix for the name of a table DataBrew will create in the database.
    - Overwrite (boolean) --
      
      A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
- RecipeReference (dict) --
  
  Represents the name and version of a DataBrew recipe.
  - Name (string) --
    
    The name of the recipe.
  - RecipeVersion (string) --
    
    The identifier for the version for the recipe.
- StartedBy (string) --
  
  The Amazon Resource Name (ARN) of the user who started the job run.
- StartedOn (datetime) --
  
  The date and time when the job run began.
- JobSample (dict) --
  
  Sample configuration for profile jobs only. Determines the number of rows on which the profile job will be executed. If a JobSample value is not provided, the default value will be used. The default value is CUSTOM_ROWS for the mode parameter and 20000 for the size parameter.
  - Mode (string) --
    
    A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:
    - FULL_DATASET - The profile job is run on the entire dataset.
    - CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
  - Size (integer) --
    
    The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
    
    Long.MAX_VALUE = 9223372036854775807
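
One possible way to use DescribeJobRun is to poll until the run stops changing state, then inspect its outputs. The sketch below assumes that STOPPED, SUCCEEDED, FAILED, and TIMEOUT are the terminal values of State; the reference lists the values but does not say which are final, so treat that set as an assumption. The function name `wait_for_run` and its polling defaults are hypothetical.

```python
import time

# Assumption: these State values mean the run will not progress further.
TERMINAL_STATES = {"STOPPED", "SUCCEEDED", "FAILED", "TIMEOUT"}


def wait_for_run(client, job_name, run_id, poll_seconds=30, max_polls=120,
                 sleep=time.sleep):
    """Poll describe_job_run until the run reaches a terminal state."""
    for _ in range(max_polls):
        run = client.describe_job_run(Name=job_name, RunId=run_id)
        if run["State"] in TERMINAL_STATES:
            return run
        sleep(poll_seconds)
    raise TimeoutError(
        f"run {run_id} of job {job_name} did not finish after {max_polls} polls"
    )
```

The `sleep` parameter exists only so the loop can be tested without waiting; in normal use the default `time.sleep` applies.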

ListJobRuns (updated)

Changes (response)

{'JobRuns': {'DataCatalogOutputs': [{'CatalogId': 'string',
                                     'DatabaseName': 'string',
                                     'DatabaseOptions': {'TableName': 'string',
                                                         'TempDirectory': {'Bucket': 'string',
                                                                           'Key': 'string'}},
                                     'Overwrite': 'boolean',
                                     'S3Options': {'Location': {'Bucket': 'string',
                                                                'Key': 'string'}},
                                     'TableName': 'string'}]}}

Lists all of the previous runs of a particular DataBrew job.

See also: AWS API Documentation

Request Syntax

client.list_job_runs(
    Name='string',
    MaxResults=123,
    NextToken='string'
)

type Name:

string

param Name:

[REQUIRED]

The name of the job.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return in this request.

type NextToken:

string

param NextToken:

The token returned by a previous call to retrieve the next set of results.
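
The MaxResults and NextToken parameters suggest the usual token-based pagination loop. The sketch below assumes the response carries a `JobRuns` list and an optional `NextToken`, and that `client` is any object exposing `list_job_runs`. The generator name `iter_job_runs` and the page size of 50 are arbitrary choices made for illustration.

```python
def iter_job_runs(client, job_name, page_size=50):
    """Yield every job run for a job, following NextToken across pages."""
    kwargs = {"Name": job_name, "MaxResults": page_size}
    while True:
        page = client.list_job_runs(**kwargs)
        yield from page.get("JobRuns", [])
        token = page.get("NextToken")
        if not token:
            break
        # Pass the token back to request the next set of results.
        kwargs["NextToken"] = token
```

A caller could, for instance, collect the Data Catalog targets of all runs with `[run.get("DataCatalogOutputs", []) for run in iter_job_runs(client, "example-job")]`, where the job name is a placeholder.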

rtype:

dict

returns:

Response Syntax

{
    'JobRuns': [
        {
            'Attempt': 123,
            'CompletedOn': datetime(2015, 1, 1),
            'DatasetName': 'string',
            'ErrorMessage': 'string',
            'ExecutionTime': 123,
            'JobName': 'string',
            'RunId': 'string',
            'State': 'STARTING'|'RUNNING'|'STOPPING'|'STOPPED'|'SUCCEEDED'|'FAILED'|'TIMEOUT',
            'LogSubscription': 'ENABLE'|'DISABLE',
            'LogGroupName': 'string',
            'Outputs': [
                {
                    'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
                    'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
                    'PartitionColumns': [
                        'string',
                    ],
                    'Location': {
                        'Bucket': 'string',
                        'Key': 'string'
                    },
                    'Overwrite': True|False,
                    'FormatOptions': {
                        'Csv': {
                            'Delimiter': 'string'
                        }
                    }
                },
            ],
            'DataCatalogOutputs': [
                {
                    'CatalogId': 'string',
                    'DatabaseName': 'string',
                    'TableName': 'string',
                    'S3Options': {
                        'Location': {
                            'Bucket': 'string',
                            'Key': 'string'
                        }
                    },
                    'DatabaseOptions': {
                        'TempDirectory': {
                            'Bucket': 'string',
                            'Key': 'string'
                        },
                        'TableName': 'string'
                    },
                    'Overwrite': True|False
                },
            ],
            'RecipeReference': {
                'Name': 'string',
                'RecipeVersion': 'string'
            },
            'StartedBy': 'string',
            'StartedOn': datetime(2015, 1, 1),
            'JobSample': {
                'Mode': 'FULL_DATASET'|'CUSTOM_ROWS',
                'Size': 123
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- JobRuns (list) --
  
  A list of job runs that have occurred for the specified job.
  - (dict) --
    
    Represents one run of a DataBrew job.
    - Attempt (integer) --
      
      The number of times that DataBrew has attempted to run the job.
    - CompletedOn (datetime) --
      
      The date and time when the job completed processing.
    - DatasetName (string) --
      
      The name of the dataset for the job to process.
    - ErrorMessage (string) --
      
      A message indicating an error (if any) that was encountered when the job ran.
    - ExecutionTime (integer) --
      
      The amount of time, in seconds, during which a job run consumed resources.
    - JobName (string) --
      
      The name of the job being processed during this run.
    - RunId (string) --
      
      The unique identifier of the job run.
    - State (string) --
      
      The current state of the job run entity itself.
    - LogSubscription (string) --
      
      The current status of Amazon CloudWatch logging for the job run.
    - LogGroupName (string) --
      
      The name of an Amazon CloudWatch log group, where the job writes diagnostic messages when it runs.
    - Outputs (list) --
      
      One or more output artifacts from a job run.
      - (dict) --
        
        Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
        - CompressionFormat (string) --
          
          The compression algorithm used to compress the output text of the job.
        - Format (string) --
          
          The data format of the output of the job.
        - PartitionColumns (list) --
          
          The names of one or more partition columns for the output of the job.
          - (string) --
        - Location (dict) --
          
          The location in Amazon S3 where the job writes its output.
          - Bucket (string) --
            
            The Amazon S3 bucket name.
          - Key (string) --
            
            The unique name of the object in the bucket.
        - Overwrite (boolean) --
          
          A value that, if true, means that any data in the location specified for output is overwritten with new output.
        - FormatOptions (dict) --
          
          Represents options that define how DataBrew formats job output files.
          - Csv (dict) --
            
            Represents a set of options that define the structure of comma-separated value (CSV) job output.
            - Delimiter (string) --
              
              A single character that specifies the delimiter used to create CSV job output.
    - DataCatalogOutputs (list) --
      
      One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
      - (dict) --
        
        Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
        - CatalogId (string) --
          
          The unique identifier of the AWS account that holds the Data Catalog that stores the data.
        - DatabaseName (string) --
          
          The name of a database in the Data Catalog.
        - TableName (string) --
          
          The name of a table in the Data Catalog.
        - S3Options (dict) --
          
          Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
          - Location (dict) --
            
            Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
            - Bucket (string) --
              
              The Amazon S3 bucket name.
            - Key (string) --
              
              The unique name of the object in the bucket.
        - DatabaseOptions (dict) --
          
          Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
          - TempDirectory (dict) --
            
            Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
            - Bucket (string) --
              
              The Amazon S3 bucket name.
            - Key (string) --
              
              The unique name of the object in the bucket.
          - TableName (string) --
            
            A prefix for the name of a table DataBrew will create in the database.
        - Overwrite (boolean) --
          
          A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
    - RecipeReference (dict) --
      
      The set of steps processed by the job.
      - Name (string) --
        
        The name of the recipe.
      - RecipeVersion (string) --
        
        The identifier for the version of the recipe.
    - StartedBy (string) --
      
      The Amazon Resource Name (ARN) of the user who initiated the job run.
    - StartedOn (datetime) --
      
      The date and time when the job run began.
    - JobSample (dict) --
      
      A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
      - Mode (string) --
        
        A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:
        
        FULL_DATASET - The profile job is run on the entire dataset.
        
        CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
      - Size (integer) --
        
        The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
        
        Long.MAX_VALUE = 9223372036854775807
- NextToken (string) --
  
  A token that you can use in a subsequent call to retrieve the next set of results.
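
The JobSample rules described above (Size is needed only for CUSTOM_ROWS, and may not exceed Long.MAX_VALUE) can be sketched as a small client-side check. This is an illustrative helper, not part of the DataBrew API: the function name validate_job_sample and the wording of its messages are assumptions.

```python
# Long.MAX_VALUE as quoted in the Size description.
LONG_MAX_VALUE = 2**63 - 1


def validate_job_sample(job_sample):
    """Return a list of problems found in a JobSample dict (empty if none).

    Hypothetical helper: it only mirrors the rules stated in the field
    descriptions and does not call the service.
    """
    problems = []
    mode = job_sample.get("Mode")
    size = job_sample.get("Size")
    if mode not in ("FULL_DATASET", "CUSTOM_ROWS"):
        problems.append(f"unknown Mode: {mode!r}")
    if mode == "CUSTOM_ROWS" and size is None:
        problems.append("Size is required when Mode is CUSTOM_ROWS")
    if size is not None and size > LONG_MAX_VALUE:
        problems.append("Size exceeds Long.MAX_VALUE")
    return problems
```

An empty list means the dict passes these checks; the service may still apply rules that are not listed here.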

ListJobs (updated)

Changes (response)

{'Jobs': {'DataCatalogOutputs': [{'CatalogId': 'string',
                                  'DatabaseName': 'string',
                                  'DatabaseOptions': {'TableName': 'string',
                                                      'TempDirectory': {'Bucket': 'string',
                                                                        'Key': 'string'}},
                                  'Overwrite': 'boolean',
                                  'S3Options': {'Location': {'Bucket': 'string',
                                                             'Key': 'string'}},
                                  'TableName': 'string'}]}}

Lists all of the DataBrew jobs that are defined.

See also: AWS API Documentation

Request Syntax

client.list_jobs(
    DatasetName='string',
    MaxResults=123,
    NextToken='string',
    ProjectName='string'
)

type DatasetName:

string

param DatasetName:

The name of a dataset. If specified, only jobs that act on this dataset are returned.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return in this request.

type NextToken:

string

param NextToken:

A token generated by DataBrew that specifies where to continue pagination if a previous request was truncated. To get the next set of pages, pass in the NextToken value from the response object of the previous page call.

type ProjectName:

string

param ProjectName:

The name of a project. If specified, only jobs that are associated with this project are returned.
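
As a minimal sketch of the NextToken pagination described above, the generator below keeps calling list_jobs until a response carries no token. The name iter_jobs and the default page size of 50 are assumptions for illustration; client is expected to be a DataBrew client such as the one returned by boto3.client("databrew").

```python
def iter_jobs(client, dataset_name=None, project_name=None, page_size=50):
    """Yield every job dict, following NextToken across pages.

    Hypothetical helper; `client` only needs a list_jobs(**kwargs) method.
    """
    kwargs = {"MaxResults": page_size}
    # Only send the optional filters that the caller actually set.
    if dataset_name is not None:
        kwargs["DatasetName"] = dataset_name
    if project_name is not None:
        kwargs["ProjectName"] = project_name
    while True:
        page = client.list_jobs(**kwargs)
        yield from page.get("Jobs", [])
        token = page.get("NextToken")
        if not token:
            break  # no token means this was the last page
        kwargs["NextToken"] = token
```

Typical use would be a loop such as: for job in iter_jobs(client, project_name="my-project"), where "my-project" is a placeholder.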

rtype:

dict

returns:

Response Syntax

{
    'Jobs': [
        {
            'AccountId': 'string',
            'CreatedBy': 'string',
            'CreateDate': datetime(2015, 1, 1),
            'DatasetName': 'string',
            'EncryptionKeyArn': 'string',
            'EncryptionMode': 'SSE-KMS'|'SSE-S3',
            'Name': 'string',
            'Type': 'PROFILE'|'RECIPE',
            'LastModifiedBy': 'string',
            'LastModifiedDate': datetime(2015, 1, 1),
            'LogSubscription': 'ENABLE'|'DISABLE',
            'MaxCapacity': 123,
            'MaxRetries': 123,
            'Outputs': [
                {
                    'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
                    'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
                    'PartitionColumns': [
                        'string',
                    ],
                    'Location': {
                        'Bucket': 'string',
                        'Key': 'string'
                    },
                    'Overwrite': True|False,
                    'FormatOptions': {
                        'Csv': {
                            'Delimiter': 'string'
                        }
                    }
                },
            ],
            'DataCatalogOutputs': [
                {
                    'CatalogId': 'string',
                    'DatabaseName': 'string',
                    'TableName': 'string',
                    'S3Options': {
                        'Location': {
                            'Bucket': 'string',
                            'Key': 'string'
                        }
                    },
                    'DatabaseOptions': {
                        'TempDirectory': {
                            'Bucket': 'string',
                            'Key': 'string'
                        },
                        'TableName': 'string'
                    },
                    'Overwrite': True|False
                },
            ],
            'ProjectName': 'string',
            'RecipeReference': {
                'Name': 'string',
                'RecipeVersion': 'string'
            },
            'ResourceArn': 'string',
            'RoleArn': 'string',
            'Timeout': 123,
            'Tags': {
                'string': 'string'
            },
            'JobSample': {
                'Mode': 'FULL_DATASET'|'CUSTOM_ROWS',
                'Size': 123
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- Jobs (list) --
  
  A list of jobs that are defined.
  - (dict) --
    
    Represents all of the attributes of a DataBrew job.
    - AccountId (string) --
      
      The ID of the Amazon Web Services account that owns the job.
    - CreatedBy (string) --
      
      The Amazon Resource Name (ARN) of the user who created the job.
    - CreateDate (datetime) --
      
      The date and time that the job was created.
    - DatasetName (string) --
      
      A dataset that the job is to process.
    - EncryptionKeyArn (string) --
      
      The Amazon Resource Name (ARN) of an encryption key that is used to protect the job output. For more information, see Encrypting data written by DataBrew jobs.
    - EncryptionMode (string) --
      
      The encryption mode for the job, which can be one of the following:
      - SSE-KMS - Server-side encryption with keys managed by KMS.
      - SSE-S3 - Server-side encryption with keys managed by Amazon S3.
    - Name (string) --
      
      The unique name of the job.
    - Type (string) --
      
      The type of the job, which must be one of the following:
      - PROFILE - A job to analyze a dataset, to determine its size, data types, data distribution, and more.
      - RECIPE - A job to apply one or more transformations to a dataset.
    - LastModifiedBy (string) --
      
      The Amazon Resource Name (ARN) of the user who last modified the job.
    - LastModifiedDate (datetime) --
      
      The modification date and time of the job.
    - LogSubscription (string) --
      
      The current status of Amazon CloudWatch logging for the job.
    - MaxCapacity (integer) --
      
      The maximum number of nodes that can be consumed when the job processes data.
    - MaxRetries (integer) --
      
      The maximum number of times to retry the job after a job run fails.
    - Outputs (list) --
      
      One or more artifacts that represent output from running the job.
      - (dict) --
        
        Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
        - CompressionFormat (string) --
          
          The compression algorithm used to compress the output text of the job.
        - Format (string) --
          
          The data format of the output of the job.
        - PartitionColumns (list) --
          
          The names of one or more partition columns for the output of the job.
          - (string) --
        - Location (dict) --
          
          The location in Amazon S3 where the job writes its output.
          - Bucket (string) --
            
            The Amazon S3 bucket name.
          - Key (string) --
            
            The unique name of the object in the bucket.
        - Overwrite (boolean) --
          
          A value that, if true, means that any data in the location specified for output is overwritten with new output.
        - FormatOptions (dict) --
          
          Represents options that define how DataBrew formats job output files.
          - Csv (dict) --
            
            Represents a set of options that define the structure of comma-separated value (CSV) job output.
            - Delimiter (string) --
              
              A single character that specifies the delimiter used to create CSV job output.
    - DataCatalogOutputs (list) --
      
      One or more artifacts that represent the AWS Glue Data Catalog output from running the job.
      - (dict) --
        
        Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
        - CatalogId (string) --
          
          The unique identifier of the AWS account that holds the Data Catalog that stores the data.
        - DatabaseName (string) --
          
          The name of a database in the Data Catalog.
        - TableName (string) --
          
          The name of a table in the Data Catalog.
        - S3Options (dict) --
          
          Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
          - Location (dict) --
            
            Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
            - Bucket (string) --
              
              The Amazon S3 bucket name.
            - Key (string) --
              
              The unique name of the object in the bucket.
        - DatabaseOptions (dict) --
          
          Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
          - TempDirectory (dict) --
            
            Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
            - Bucket (string) --
              
              The Amazon S3 bucket name.
            - Key (string) --
              
              The unique name of the object in the bucket.
          - TableName (string) --
            
            A prefix for the name of a table DataBrew will create in the database.
        - Overwrite (boolean) --
          
          A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
    - ProjectName (string) --
      
      The name of the project that the job is associated with.
    - RecipeReference (dict) --
      
      A set of steps that the job runs.
      - Name (string) --
        
        The name of the recipe.
      - RecipeVersion (string) --
        
        The identifier for the version of the recipe.
    - ResourceArn (string) --
      
      The unique Amazon Resource Name (ARN) for the job.
    - RoleArn (string) --
      
      The Amazon Resource Name (ARN) of the role to be assumed for this job.
    - Timeout (integer) --
      
      The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.
    - Tags (dict) --
      
      Metadata tags that have been applied to the job.
      - (string) --
        
        (string) --
    - JobSample (dict) --
      
      A sample configuration for profile jobs only, which determines the number of rows on which the profile job is run. If a JobSample value isn't provided, the default value is used. The default value is CUSTOM_ROWS for the mode parameter and 20,000 for the size parameter.
      - Mode (string) --
        
        A value that determines whether the profile job is run on the entire dataset or a specified number of rows. This value must be one of the following:
        
        FULL_DATASET - The profile job is run on the entire dataset.
        
        CUSTOM_ROWS - The profile job is run on the number of rows specified in the Size parameter.
      - Size (integer) --
        
        The Size parameter is only required when the mode is CUSTOM_ROWS. The profile job is run on the specified number of rows. The maximum value for size is Long.MAX_VALUE.
        
        Long.MAX_VALUE = 9223372036854775807
- NextToken (string) --
  
  A token that you can use in a subsequent call to retrieve the next set of results.
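
To illustrate how the new DataCatalogOutputs field might be read from a ListJobs response, the sketch below collects the Data Catalog database and table for each job that has such outputs. The function name catalog_targets is an assumption, and the code uses .get() throughout because which fields are present in a given response is not guaranteed here.

```python
def catalog_targets(response):
    """Map job name -> list of (DatabaseName, TableName) tuples.

    Hypothetical helper operating on a list_jobs response dict. Jobs
    without DataCatalogOutputs are left out of the result.
    """
    targets = {}
    for job in response.get("Jobs", []):
        outputs = job.get("DataCatalogOutputs") or []
        if outputs:
            targets[job.get("Name")] = [
                (o.get("DatabaseName"), o.get("TableName")) for o in outputs
            ]
    return targets
```

Jobs that write only to Amazon S3 through Outputs do not appear in the returned mapping.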

UpdateRecipeJob (updated)

Changes (request)

{'DataCatalogOutputs': [{'CatalogId': 'string',
                         'DatabaseName': 'string',
                         'DatabaseOptions': {'TableName': 'string',
                                             'TempDirectory': {'Bucket': 'string',
                                                               'Key': 'string'}},
                         'Overwrite': 'boolean',
                         'S3Options': {'Location': {'Bucket': 'string',
                                                    'Key': 'string'}},
                         'TableName': 'string'}]}

Modifies the definition of an existing DataBrew recipe job.

See also: AWS API Documentation

Request Syntax

client.update_recipe_job(
    EncryptionKeyArn='string',
    EncryptionMode='SSE-KMS'|'SSE-S3',
    Name='string',
    LogSubscription='ENABLE'|'DISABLE',
    MaxCapacity=123,
    MaxRetries=123,
    Outputs=[
        {
            'CompressionFormat': 'GZIP'|'LZ4'|'SNAPPY'|'BZIP2'|'DEFLATE'|'LZO'|'BROTLI'|'ZSTD'|'ZLIB',
            'Format': 'CSV'|'JSON'|'PARQUET'|'GLUEPARQUET'|'AVRO'|'ORC'|'XML',
            'PartitionColumns': [
                'string',
            ],
            'Location': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Overwrite': True|False,
            'FormatOptions': {
                'Csv': {
                    'Delimiter': 'string'
                }
            }
        },
    ],
    DataCatalogOutputs=[
        {
            'CatalogId': 'string',
            'DatabaseName': 'string',
            'TableName': 'string',
            'S3Options': {
                'Location': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'DatabaseOptions': {
                'TempDirectory': {
                    'Bucket': 'string',
                    'Key': 'string'
                },
                'TableName': 'string'
            },
            'Overwrite': True|False
        },
    ],
    RoleArn='string',
    Timeout=123
)

type EncryptionKeyArn:

string

param EncryptionKeyArn:

The Amazon Resource Name (ARN) of an encryption key that is used to protect the job.

type EncryptionMode:

string

param EncryptionMode:

The encryption mode for the job, which can be one of the following:

- SSE-KMS - Server-side encryption with keys managed by KMS.
- SSE-S3 - Server-side encryption with keys managed by Amazon S3.

type Name:

string

param Name:

[REQUIRED]

The name of the job to update.

type LogSubscription:

string

param LogSubscription:

Enables or disables Amazon CloudWatch logging for the job. If logging is enabled, CloudWatch writes one log stream for each job run.

type MaxCapacity:

integer

param MaxCapacity:

The maximum number of nodes that DataBrew can consume when the job processes data.

type MaxRetries:

integer

param MaxRetries:

The maximum number of times to retry the job after a job run fails.

type Outputs:

list

param Outputs:

One or more artifacts that represent the output from running the job.

(dict) --

Represents options that specify how and where DataBrew writes the output generated by recipe jobs or profile jobs.
- CompressionFormat (string) --
  
  The compression algorithm used to compress the output text of the job.
- Format (string) --
  
  The data format of the output of the job.
- PartitionColumns (list) --
  
  The names of one or more partition columns for the output of the job.
  - (string) --
- Location (dict) -- [REQUIRED]
  
  The location in Amazon S3 where the job writes its output.
  - Bucket (string) -- [REQUIRED]
    
    The Amazon S3 bucket name.
  - Key (string) --
    
    The unique name of the object in the bucket.
- Overwrite (boolean) --
  
  A value that, if true, means that any data in the location specified for output is overwritten with new output.
- FormatOptions (dict) --
  
  Represents options that define how DataBrew formats job output files.
  - Csv (dict) --
    
    Represents a set of options that define the structure of comma-separated value (CSV) job output.
    - Delimiter (string) --
      
      A single character that specifies the delimiter used to create CSV job output.
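
A minimal sketch of building one Outputs entry for CSV output follows. It sets only Location.Bucket (the required field) plus a few optional ones, and rejects a delimiter that is not a single character, as the Delimiter description requires. The helper name csv_output and its defaults are assumptions for illustration.

```python
def csv_output(bucket, key=None, delimiter=",", overwrite=False):
    """Build one Outputs entry for CSV written to Amazon S3.

    Hypothetical helper; it only assembles the request dict.
    """
    if len(delimiter) != 1:
        raise ValueError("Delimiter must be a single character")
    location = {"Bucket": bucket}  # Bucket is the only required Location field
    if key is not None:
        location["Key"] = key
    return {
        "Format": "CSV",
        "Location": location,
        "Overwrite": overwrite,
        "FormatOptions": {"Csv": {"Delimiter": delimiter}},
    }
```

The resulting dict could be passed as an element of the Outputs list.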

type DataCatalogOutputs:

list

param DataCatalogOutputs:

One or more artifacts that represent the AWS Glue Data Catalog output from running the job.

(dict) --

Represents options that specify how and where DataBrew writes the output generated by recipe jobs.
- CatalogId (string) --
  
  The unique identifier of the AWS account that holds the Data Catalog that stores the data.
- DatabaseName (string) -- [REQUIRED]
  
  The name of a database in the Data Catalog.
- TableName (string) -- [REQUIRED]
  
  The name of a table in the Data Catalog.
- S3Options (dict) --
  
  Represents options that specify how and where DataBrew writes the S3 output generated by recipe jobs.
  - Location (dict) -- [REQUIRED]
    
    Represents an Amazon S3 location (bucket name and object key) where DataBrew can write output from a job.
    - Bucket (string) -- [REQUIRED]
      
      The Amazon S3 bucket name.
    - Key (string) --
      
      The unique name of the object in the bucket.
- DatabaseOptions (dict) --
  
  Represents options that specify how and where DataBrew writes the database output generated by recipe jobs.
  - TempDirectory (dict) --
    
    Represents an Amazon S3 location (bucket name and object key) where DataBrew can store intermediate results.
    - Bucket (string) -- [REQUIRED]
      
      The Amazon S3 bucket name.
    - Key (string) --
      
      The unique name of the object in the bucket.
  - TableName (string) -- [REQUIRED]
    
    A prefix for the name of a table DataBrew will create in the database.
- Overwrite (boolean) --
  
  A value that, if true, means that any data in the location specified for output is overwritten with new output. Not supported with DatabaseOptions.
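
The [REQUIRED] markers and the Overwrite note above can be sketched as a client-side check on a single DataCatalogOutputs entry. This is an illustrative helper rather than service behavior: the name check_catalog_output is an assumption, and treating a true Overwrite combined with DatabaseOptions as an error is one reading of "Not supported with DatabaseOptions".

```python
def check_catalog_output(entry):
    """Return a list of problems in one DataCatalogOutputs entry (empty if none)."""
    problems = []
    for field in ("DatabaseName", "TableName"):
        if not entry.get(field):
            problems.append(f"{field} is required")
    s3 = entry.get("S3Options")
    if s3 is not None and not s3.get("Location", {}).get("Bucket"):
        problems.append("S3Options.Location.Bucket is required")
    db = entry.get("DatabaseOptions")
    if db is not None:
        if not db.get("TableName"):
            problems.append("DatabaseOptions.TableName is required")
        temp = db.get("TempDirectory")
        if temp is not None and not temp.get("Bucket"):
            problems.append("DatabaseOptions.TempDirectory.Bucket is required")
        # Assumption: a true Overwrite alongside DatabaseOptions is rejected.
        if entry.get("Overwrite"):
            problems.append("Overwrite is not supported with DatabaseOptions")
    return problems
```

An empty result means the entry satisfies the constraints listed in this section; the service may enforce others.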

type RoleArn:

string

param RoleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the Identity and Access Management (IAM) role to be assumed when DataBrew runs the job.

type Timeout:

integer

param Timeout:

The job's timeout in minutes. A job that attempts to run longer than this timeout period ends with a status of TIMEOUT.

rtype:

dict

returns:

Response Syntax

{
    'Name': 'string'
}

Response Structure

(dict) --
- Name (string) --
  
  The name of the job that you updated.
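
As a hedged end-to-end sketch, the function below points an existing recipe job at a Data Catalog table backed by Amazon S3 and returns the Name field from the response. The function name and its parameters are assumptions; client is expected to be a DataBrew client such as boto3.client("databrew"), and only Name and RoleArn are treated as required, as marked above.

```python
def update_job_catalog_output(client, job_name, role_arn, database, table,
                              bucket, key=None, overwrite=False, timeout=None):
    """Call update_recipe_job with a single S3-backed DataCatalogOutputs entry.

    Hypothetical helper; returns the job name reported by the service.
    """
    location = {"Bucket": bucket}
    if key is not None:
        location["Key"] = key
    kwargs = {
        "Name": job_name,      # [REQUIRED]
        "RoleArn": role_arn,   # [REQUIRED]
        "DataCatalogOutputs": [
            {
                "DatabaseName": database,
                "TableName": table,
                "S3Options": {"Location": location},
                "Overwrite": overwrite,
            }
        ],
    }
    # Timeout is optional, so it is sent only when the caller sets it.
    if timeout is not None:
        kwargs["Timeout"] = timeout
    response = client.update_recipe_job(**kwargs)
    return response["Name"]
```

Because the client is passed in, the function can be exercised against a stub object before it is pointed at a real account.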