Amazon EMR Containers

2022/09/08 - Amazon EMR Containers - 3 updated API methods

Changes: EMR on EKS now allows running Spark SQL using the newly introduced Spark SQL job driver in the StartJobRun API.

DescribeJobRun (updated)
Changes (response)
{'jobRun': {'jobDriver': {'sparkSqlJobDriver': {'entryPoint': 'string',
                                                'sparkSqlParameters': 'string'}}}}

Displays detailed information about a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.

See also: AWS API Documentation

Request Syntax

client.describe_job_run(
    id='string',
    virtualClusterId='string'
)
type id:

string

param id:

[REQUIRED]

The ID of the job run request.

type virtualClusterId:

string

param virtualClusterId:

[REQUIRED]

The ID of the virtual cluster for which the job run is submitted.

rtype:

dict

returns:

Response Syntax

{
    'jobRun': {
        'id': 'string',
        'name': 'string',
        'virtualClusterId': 'string',
        'arn': 'string',
        'state': 'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
        'clientToken': 'string',
        'executionRoleArn': 'string',
        'releaseLabel': 'string',
        'configurationOverrides': {
            'applicationConfiguration': [
                {
                    'classification': 'string',
                    'properties': {
                        'string': 'string'
                    },
                    'configurations': {'... recursive ...'}
                },
            ],
            'monitoringConfiguration': {
                'persistentAppUI': 'ENABLED'|'DISABLED',
                'cloudWatchMonitoringConfiguration': {
                    'logGroupName': 'string',
                    'logStreamNamePrefix': 'string'
                },
                's3MonitoringConfiguration': {
                    'logUri': 'string'
                }
            }
        },
        'jobDriver': {
            'sparkSubmitJobDriver': {
                'entryPoint': 'string',
                'entryPointArguments': [
                    'string',
                ],
                'sparkSubmitParameters': 'string'
            },
            'sparkSqlJobDriver': {
                'entryPoint': 'string',
                'sparkSqlParameters': 'string'
            }
        },
        'createdAt': datetime(2015, 1, 1),
        'createdBy': 'string',
        'finishedAt': datetime(2015, 1, 1),
        'stateDetails': 'string',
        'failureReason': 'INTERNAL_ERROR'|'USER_ERROR'|'VALIDATION_ERROR'|'CLUSTER_UNAVAILABLE',
        'tags': {
            'string': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • jobRun (dict) --

      The output displays information about a job run.

      • id (string) --

        The ID of the job run.

      • name (string) --

        The name of the job run.

      • virtualClusterId (string) --

        The ID of the job run's virtual cluster.

      • arn (string) --

        The ARN of the job run.

      • state (string) --

        The state of the job run.

      • clientToken (string) --

        The client token used to start a job run.

      • executionRoleArn (string) --

        The execution role ARN of the job run.

      • releaseLabel (string) --

        The release version of Amazon EMR.

      • configurationOverrides (dict) --

        The configuration settings that are used to override the default configuration.

        • applicationConfiguration (list) --

          The configurations for the application run by the job run.

          • (dict) --

            A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.

            • classification (string) --

              The classification within a configuration.

            • properties (dict) --

              A set of properties specified within a configuration classification.

              • (string) --

                • (string) --

            • configurations (list) --

              A list of additional configurations to apply within a configuration object.

        • monitoringConfiguration (dict) --

          The configurations for monitoring.

          • persistentAppUI (string) --

            Monitoring configurations for the persistent application UI.

          • cloudWatchMonitoringConfiguration (dict) --

            Monitoring configurations for CloudWatch.

            • logGroupName (string) --

              The name of the log group for log publishing.

            • logStreamNamePrefix (string) --

              The specified name prefix for log streams.

          • s3MonitoringConfiguration (dict) --

            Amazon S3 configuration for monitoring log publishing.

            • logUri (string) --

              Amazon S3 destination URI for log publishing.

      • jobDriver (dict) --

        The job driver parameters for the job run.

        • sparkSubmitJobDriver (dict) --

          The job driver parameters specified for spark-submit.

          • entryPoint (string) --

            The entry point of the job application.

          • entryPointArguments (list) --

            The arguments for the job application.

            • (string) --

          • sparkSubmitParameters (string) --

            The Spark submit parameters that are used for job runs.

        • sparkSqlJobDriver (dict) --

          The job driver for the Spark SQL job type.

          • entryPoint (string) --

            The SQL file to be executed.

          • sparkSqlParameters (string) --

            The Spark parameters to be included in the Spark SQL command.

      • createdAt (datetime) --

        The date and time when the job run was created.

      • createdBy (string) --

        The user who created the job run.

      • finishedAt (datetime) --

        The date and time when the job run has finished.

      • stateDetails (string) --

        Additional details of the job run state.

      • failureReason (string) --

        The reason why the job run failed.

      • tags (dict) --

        The assigned tags of the job run.

        • (string) --

          • (string) --
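
Example

A minimal sketch of calling describe_job_run with boto3 and reading the newly added sparkSqlJobDriver fields from the response; the region, job run ID, and virtual cluster ID below are placeholders, not values taken from this documentation.

import boto3

client = boto3.client('emr-containers', region_name='us-east-1')

response = client.describe_job_run(
    id='00000002xxxxxxxxxxx',               # placeholder job run ID
    virtualClusterId='1234567890abcdef0'    # placeholder virtual cluster ID
)

job_run = response['jobRun']
print(job_run['state'])

# sparkSqlJobDriver is only present for Spark SQL job runs.
sql_driver = job_run['jobDriver'].get('sparkSqlJobDriver')
if sql_driver is not None:
    print(sql_driver['entryPoint'], sql_driver.get('sparkSqlParameters'))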

ListJobRuns (updated)
Changes (response)
{'jobRuns': {'jobDriver': {'sparkSqlJobDriver': {'entryPoint': 'string',
                                                 'sparkSqlParameters': 'string'}}}}

Lists job runs based on a set of parameters. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.

See also: AWS API Documentation

Request Syntax

client.list_job_runs(
    virtualClusterId='string',
    createdBefore=datetime(2015, 1, 1),
    createdAfter=datetime(2015, 1, 1),
    name='string',
    states=[
        'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
    ],
    maxResults=123,
    nextToken='string'
)
type virtualClusterId:

string

param virtualClusterId:

[REQUIRED]

The ID of the virtual cluster for which to list the job runs.

type createdBefore:

datetime

param createdBefore:

The date and time before which the job runs were submitted.

type createdAfter:

datetime

param createdAfter:

The date and time after which the job runs were submitted.

type name:

string

param name:

The name of the job run.

type states:

list

param states:

The states of the job runs to list.

  • (string) --

type maxResults:

integer

param maxResults:

The maximum number of job runs that can be listed.

type nextToken:

string

param nextToken:

The token for the next set of job runs to return.

rtype:

dict

returns:

Response Syntax

{
    'jobRuns': [
        {
            'id': 'string',
            'name': 'string',
            'virtualClusterId': 'string',
            'arn': 'string',
            'state': 'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
            'clientToken': 'string',
            'executionRoleArn': 'string',
            'releaseLabel': 'string',
            'configurationOverrides': {
                'applicationConfiguration': [
                    {
                        'classification': 'string',
                        'properties': {
                            'string': 'string'
                        },
                        'configurations': {'... recursive ...'}
                    },
                ],
                'monitoringConfiguration': {
                    'persistentAppUI': 'ENABLED'|'DISABLED',
                    'cloudWatchMonitoringConfiguration': {
                        'logGroupName': 'string',
                        'logStreamNamePrefix': 'string'
                    },
                    's3MonitoringConfiguration': {
                        'logUri': 'string'
                    }
                }
            },
            'jobDriver': {
                'sparkSubmitJobDriver': {
                    'entryPoint': 'string',
                    'entryPointArguments': [
                        'string',
                    ],
                    'sparkSubmitParameters': 'string'
                },
                'sparkSqlJobDriver': {
                    'entryPoint': 'string',
                    'sparkSqlParameters': 'string'
                }
            },
            'createdAt': datetime(2015, 1, 1),
            'createdBy': 'string',
            'finishedAt': datetime(2015, 1, 1),
            'stateDetails': 'string',
            'failureReason': 'INTERNAL_ERROR'|'USER_ERROR'|'VALIDATION_ERROR'|'CLUSTER_UNAVAILABLE',
            'tags': {
                'string': 'string'
            }
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • jobRuns (list) --

      This output lists information about the specified job runs.

      • (dict) --

        This entity describes a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.

        • id (string) --

          The ID of the job run.

        • name (string) --

          The name of the job run.

        • virtualClusterId (string) --

          The ID of the job run's virtual cluster.

        • arn (string) --

          The ARN of the job run.

        • state (string) --

          The state of the job run.

        • clientToken (string) --

          The client token used to start a job run.

        • executionRoleArn (string) --

          The execution role ARN of the job run.

        • releaseLabel (string) --

          The release version of Amazon EMR.

        • configurationOverrides (dict) --

          The configuration settings that are used to override the default configuration.

          • applicationConfiguration (list) --

            The configurations for the application run by the job run.

            • (dict) --

              A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.

              • classification (string) --

                The classification within a configuration.

              • properties (dict) --

                A set of properties specified within a configuration classification.

                • (string) --

                  • (string) --

              • configurations (list) --

                A list of additional configurations to apply within a configuration object.

          • monitoringConfiguration (dict) --

            The configurations for monitoring.

            • persistentAppUI (string) --

              Monitoring configurations for the persistent application UI.

            • cloudWatchMonitoringConfiguration (dict) --

              Monitoring configurations for CloudWatch.

              • logGroupName (string) --

                The name of the log group for log publishing.

              • logStreamNamePrefix (string) --

                The specified name prefix for log streams.

            • s3MonitoringConfiguration (dict) --

              Amazon S3 configuration for monitoring log publishing.

              • logUri (string) --

                Amazon S3 destination URI for log publishing.

        • jobDriver (dict) --

          The job driver parameters for the job run.

          • sparkSubmitJobDriver (dict) --

            The job driver parameters specified for spark-submit.

            • entryPoint (string) --

              The entry point of the job application.

            • entryPointArguments (list) --

              The arguments for the job application.

              • (string) --

            • sparkSubmitParameters (string) --

              The Spark submit parameters that are used for job runs.

          • sparkSqlJobDriver (dict) --

            The job driver for the Spark SQL job type.

            • entryPoint (string) --

              The SQL file to be executed.

            • sparkSqlParameters (string) --

              The Spark parameters to be included in the Spark SQL command.

        • createdAt (datetime) --

          The date and time when the job run was created.

        • createdBy (string) --

          The user who created the job run.

        • finishedAt (datetime) --

          The date and time when the job run has finished.

        • stateDetails (string) --

          Additional details of the job run state.

        • failureReason (string) --

          The reason why the job run failed.

        • tags (dict) --

          The assigned tags of the job run.

          • (string) --

            • (string) --

    • nextToken (string) --

      This output displays the token for the next set of job runs.
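
Example

A minimal sketch of paging through list_job_runs with the documented nextToken and maxResults parameters; the virtual cluster ID and the states filter are placeholders chosen for illustration.

import boto3

client = boto3.client('emr-containers')

job_runs = []
kwargs = {
    'virtualClusterId': '1234567890abcdef0',   # placeholder virtual cluster ID
    'states': ['RUNNING', 'COMPLETED'],
    'maxResults': 50
}
while True:
    page = client.list_job_runs(**kwargs)
    job_runs.extend(page.get('jobRuns', []))
    token = page.get('nextToken')
    if not token:
        break
    kwargs['nextToken'] = token

# Report which driver each job run used.
for run in job_runs:
    driver = run.get('jobDriver', {})
    kind = 'sparkSqlJobDriver' if 'sparkSqlJobDriver' in driver else 'sparkSubmitJobDriver'
    print(run['id'], run['state'], kind)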

StartJobRun (updated)
Changes (request)
{'jobDriver': {'sparkSqlJobDriver': {'entryPoint': 'string',
                                     'sparkSqlParameters': 'string'}}}

Starts a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.

See also: AWS API Documentation

Request Syntax

client.start_job_run(
    name='string',
    virtualClusterId='string',
    clientToken='string',
    executionRoleArn='string',
    releaseLabel='string',
    jobDriver={
        'sparkSubmitJobDriver': {
            'entryPoint': 'string',
            'entryPointArguments': [
                'string',
            ],
            'sparkSubmitParameters': 'string'
        },
        'sparkSqlJobDriver': {
            'entryPoint': 'string',
            'sparkSqlParameters': 'string'
        }
    },
    configurationOverrides={
        'applicationConfiguration': [
            {
                'classification': 'string',
                'properties': {
                    'string': 'string'
                },
                'configurations': {'... recursive ...'}
            },
        ],
        'monitoringConfiguration': {
            'persistentAppUI': 'ENABLED'|'DISABLED',
            'cloudWatchMonitoringConfiguration': {
                'logGroupName': 'string',
                'logStreamNamePrefix': 'string'
            },
            's3MonitoringConfiguration': {
                'logUri': 'string'
            }
        }
    },
    tags={
        'string': 'string'
    }
)
type name:

string

param name:

The name of the job run.

type virtualClusterId:

string

param virtualClusterId:

[REQUIRED]

The virtual cluster ID for which the job run request is submitted.

type clientToken:

string

param clientToken:

[REQUIRED]

The client idempotency token of the job run request.

This field is autopopulated if not provided.

type executionRoleArn:

string

param executionRoleArn:

[REQUIRED]

The execution role ARN for the job run.

type releaseLabel:

string

param releaseLabel:

[REQUIRED]

The Amazon EMR release version to use for the job run.

type jobDriver:

dict

param jobDriver:

[REQUIRED]

The job driver for the job run.

  • sparkSubmitJobDriver (dict) --

    The job driver parameters specified for spark-submit.

    • entryPoint (string) -- [REQUIRED]

      The entry point of the job application.

    • entryPointArguments (list) --

      The arguments for the job application.

      • (string) --

    • sparkSubmitParameters (string) --

      The Spark submit parameters that are used for job runs.

  • sparkSqlJobDriver (dict) --

    The job driver for the Spark SQL job type.

    • entryPoint (string) --

      The SQL file to be executed.

    • sparkSqlParameters (string) --

      The Spark parameters to be included in the Spark SQL command.
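
As an illustration, a hedged sketch of the two jobDriver shapes; the S3 URIs and Spark settings are placeholders, and a request typically supplies only one of the two keys.

# Spark SQL job: entryPoint is the SQL file to run (placeholder URI).
spark_sql_driver = {
    'sparkSqlJobDriver': {
        'entryPoint': 's3://my-bucket/queries/report.sql',
        'sparkSqlParameters': '--conf spark.executor.instances=2'
    }
}

# spark-submit job: entryPoint is the application script or jar (placeholder URI).
spark_submit_driver = {
    'sparkSubmitJobDriver': {
        'entryPoint': 's3://my-bucket/scripts/job.py',
        'entryPointArguments': ['--input', 's3://my-bucket/input/'],
        'sparkSubmitParameters': '--conf spark.executor.memory=2G'
    }
}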

type configurationOverrides:

dict

param configurationOverrides:

The configuration overrides for the job run.

  • applicationConfiguration (list) --

    The configurations for the application run by the job run.

    • (dict) --

      A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.

      • classification (string) -- [REQUIRED]

        The classification within a configuration.

      • properties (dict) --

        A set of properties specified within a configuration classification.

        • (string) --

          • (string) --

      • configurations (list) --

        A list of additional configurations to apply within a configuration object.

  • monitoringConfiguration (dict) --

    The configurations for monitoring.

    • persistentAppUI (string) --

      Monitoring configurations for the persistent application UI.

    • cloudWatchMonitoringConfiguration (dict) --

      Monitoring configurations for CloudWatch.

      • logGroupName (string) -- [REQUIRED]

        The name of the log group for log publishing.

      • logStreamNamePrefix (string) --

        The specified name prefix for log streams.

    • s3MonitoringConfiguration (dict) --

      Amazon S3 configuration for monitoring log publishing.

      • logUri (string) -- [REQUIRED]

        Amazon S3 destination URI for log publishing.

type tags:

dict

param tags:

The tags assigned to the job run.

  • (string) --

    • (string) --

rtype:

dict

returns:

Response Syntax

{
    'id': 'string',
    'name': 'string',
    'arn': 'string',
    'virtualClusterId': 'string'
}

Response Structure

  • (dict) --

    • id (string) --

      This output displays the started job run ID.

    • name (string) --

      This output displays the name of the started job run.

    • arn (string) --

      This output lists the ARN of the job run.

    • virtualClusterId (string) --

      This output displays the virtual cluster ID for which the job run was submitted.
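
Example

A minimal sketch of starting a Spark SQL job run with the new sparkSqlJobDriver; the virtual cluster ID, IAM role ARN, release label, and S3 URIs are placeholders and would need to match your environment. clientToken is omitted because it is auto-populated when not provided.

import boto3

client = boto3.client('emr-containers', region_name='us-east-1')

response = client.start_job_run(
    name='nightly-report',
    virtualClusterId='1234567890abcdef0',    # placeholder virtual cluster ID
    executionRoleArn='arn:aws:iam::111122223333:role/EMRContainersJobRole',    # placeholder role ARN
    releaseLabel='emr-6.7.0-latest',         # placeholder release label
    jobDriver={
        'sparkSqlJobDriver': {
            'entryPoint': 's3://my-bucket/queries/report.sql',    # placeholder SQL file
            'sparkSqlParameters': '--conf spark.executor.instances=2'
        }
    },
    configurationOverrides={
        'monitoringConfiguration': {
            'persistentAppUI': 'ENABLED',
            's3MonitoringConfiguration': {
                'logUri': 's3://my-bucket/emr-eks-logs/'    # placeholder log destination
            }
        }
    },
    tags={
        'team': 'analytics'
    }
)

print(response['id'], response['arn'])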