2023/02/09 - Amazon EMR Containers - 3 updated api methods
Changes: EMR on EKS allows configuring retry policies for job runs through the StartJobRun API. With a retry policy, a job can have its driver pod restarted automatically if the pod fails or is deleted. The job's status can be seen through the DescribeJobRun and ListJobRuns APIs and monitored using CloudWatch events.
DescribeJobRun (updated)
Changes (response)
{'jobRun': {'retryPolicyConfiguration': {'maxAttempts': 'integer'}, 'retryPolicyExecution': {'currentAttemptCount': 'integer'}}}
Displays detailed information about a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
See also: AWS API Documentation
Request Syntax
client.describe_job_run(
    id='string',
    virtualClusterId='string'
)
id (string) -- [REQUIRED]
The ID of the job run request.
virtualClusterId (string) -- [REQUIRED]
The ID of the virtual cluster for which the job run is submitted.
Return type: dict
Response Syntax
{
    'jobRun': {
        'id': 'string',
        'name': 'string',
        'virtualClusterId': 'string',
        'arn': 'string',
        'state': 'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
        'clientToken': 'string',
        'executionRoleArn': 'string',
        'releaseLabel': 'string',
        'configurationOverrides': {
            'applicationConfiguration': [
                {
                    'classification': 'string',
                    'properties': {
                        'string': 'string'
                    },
                    'configurations': {'... recursive ...'}
                },
            ],
            'monitoringConfiguration': {
                'persistentAppUI': 'ENABLED'|'DISABLED',
                'cloudWatchMonitoringConfiguration': {
                    'logGroupName': 'string',
                    'logStreamNamePrefix': 'string'
                },
                's3MonitoringConfiguration': {
                    'logUri': 'string'
                }
            }
        },
        'jobDriver': {
            'sparkSubmitJobDriver': {
                'entryPoint': 'string',
                'entryPointArguments': [
                    'string',
                ],
                'sparkSubmitParameters': 'string'
            },
            'sparkSqlJobDriver': {
                'entryPoint': 'string',
                'sparkSqlParameters': 'string'
            }
        },
        'createdAt': datetime(2015, 1, 1),
        'createdBy': 'string',
        'finishedAt': datetime(2015, 1, 1),
        'stateDetails': 'string',
        'failureReason': 'INTERNAL_ERROR'|'USER_ERROR'|'VALIDATION_ERROR'|'CLUSTER_UNAVAILABLE',
        'tags': {
            'string': 'string'
        },
        'retryPolicyConfiguration': {
            'maxAttempts': 123
        },
        'retryPolicyExecution': {
            'currentAttemptCount': 123
        }
    }
}
Response Structure
(dict) --
jobRun (dict) --
The output displays information about a job run.
id (string) --
The ID of the job run.
name (string) --
The name of the job run.
virtualClusterId (string) --
The ID of the job run's virtual cluster.
arn (string) --
The ARN of the job run.
state (string) --
The state of the job run.
clientToken (string) --
The client token used to start a job run.
executionRoleArn (string) --
The execution role ARN of the job run.
releaseLabel (string) --
The release version of Amazon EMR.
configurationOverrides (dict) --
The configuration settings that are used to override default configuration.
applicationConfiguration (list) --
The configurations for the application run by the job run.
(dict) --
A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.
classification (string) --
The classification within a configuration.
properties (dict) --
A set of properties specified within a configuration classification.
(string) --
(string) --
configurations (list) --
A list of additional configurations to apply within a configuration object.
monitoringConfiguration (dict) --
The configurations for monitoring.
persistentAppUI (string) --
Monitoring configurations for the persistent application UI.
cloudWatchMonitoringConfiguration (dict) --
Monitoring configurations for CloudWatch.
logGroupName (string) --
The name of the log group for log publishing.
logStreamNamePrefix (string) --
The specified name prefix for log streams.
s3MonitoringConfiguration (dict) --
Amazon S3 configuration for monitoring log publishing.
logUri (string) --
Amazon S3 destination URI for log publishing.
jobDriver (dict) --
Parameters of the job driver for the job run.
sparkSubmitJobDriver (dict) --
The job driver parameters specified for Spark submit.
entryPoint (string) --
The entry point of the job application.
entryPointArguments (list) --
The arguments for the job application.
(string) --
sparkSubmitParameters (string) --
The Spark submit parameters that are used for job runs.
sparkSqlJobDriver (dict) --
The job driver parameters specified for Spark SQL.
entryPoint (string) --
The SQL file to be executed.
sparkSqlParameters (string) --
The Spark parameters to be included in the Spark SQL command.
createdAt (datetime) --
The date and time when the job run was created.
createdBy (string) --
The user who created the job run.
finishedAt (datetime) --
The date and time when the job run finished.
stateDetails (string) --
Additional details of the job run state.
failureReason (string) --
The reason why the job run failed.
tags (dict) --
The assigned tags of the job run.
(string) --
(string) --
retryPolicyConfiguration (dict) --
The configuration of the retry policy that the job runs on.
maxAttempts (integer) --
The maximum number of attempts on the job's driver.
retryPolicyExecution (dict) --
The current status of the retry policy executed on the job.
currentAttemptCount (integer) --
The current number of attempts made on the driver of the job.
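As a rough illustration of how the two retry fields might be read together, the sketch below summarizes a `describe_job_run` response. The helper names `summarize_retry_status` and `describe_retry_status` and the `attemptsRemaining` key are hypothetical; only `describe_job_run`, `jobRun`, `retryPolicyConfiguration.maxAttempts`, and `retryPolicyExecution.currentAttemptCount` come from the syntax above. It assumes either retry field may be absent from a response, in which case `None` is reported instead of a guessed default.

```python
from typing import Any, Dict, Optional


def summarize_retry_status(response: Dict[str, Any]) -> Dict[str, Optional[int]]:
    """Pull the retry fields out of a describe_job_run response (hypothetical helper)."""
    job_run = response.get("jobRun", {})
    max_attempts = job_run.get("retryPolicyConfiguration", {}).get("maxAttempts")
    current = job_run.get("retryPolicyExecution", {}).get("currentAttemptCount")

    # Only compute a remainder when both values are present in the response.
    remaining = None
    if max_attempts is not None and current is not None:
        remaining = max(max_attempts - current, 0)

    return {
        "maxAttempts": max_attempts,
        "currentAttemptCount": current,
        "attemptsRemaining": remaining,
    }


def describe_retry_status(client: Any, job_run_id: str, virtual_cluster_id: str) -> Dict[str, Optional[int]]:
    """Call describe_job_run on the given client and summarize the retry fields."""
    response = client.describe_job_run(id=job_run_id, virtualClusterId=virtual_cluster_id)
    return summarize_retry_status(response)
```

With boto3 installed and credentials configured, `client` would typically be created with `boto3.client("emr-containers")`.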
ListJobRuns (updated)
Changes (response)
{'jobRuns': {'retryPolicyConfiguration': {'maxAttempts': 'integer'}, 'retryPolicyExecution': {'currentAttemptCount': 'integer'}}}
Lists job runs based on a set of parameters. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
See also: AWS API Documentation
Request Syntax
client.list_job_runs(
    virtualClusterId='string',
    createdBefore=datetime(2015, 1, 1),
    createdAfter=datetime(2015, 1, 1),
    name='string',
    states=[
        'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
    ],
    maxResults=123,
    nextToken='string'
)
virtualClusterId (string) -- [REQUIRED]
The ID of the virtual cluster for which to list the job runs.
createdBefore (datetime) --
The date and time before which the job runs were submitted.
createdAfter (datetime) --
The date and time after which the job runs were submitted.
name (string) --
The name of the job run.
states (list) --
The states of the job run.
(string) --
maxResults (integer) --
The maximum number of job runs that can be listed.
nextToken (string) --
The token for the next set of job runs to return.
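Since every filter except the virtual cluster ID is optional, one way to assemble the call is to build the keyword arguments first and leave out anything unset, because boto3 parameter validation generally rejects `None` values. The following is a minimal sketch under that assumption; `build_list_job_runs_kwargs` and `VALID_STATES` are hypothetical names, and the state values are copied from the request syntax above.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, Optional

# State values as listed in the list_job_runs request syntax.
VALID_STATES = frozenset(
    {"PENDING", "SUBMITTED", "RUNNING", "FAILED", "CANCELLED", "CANCEL_PENDING", "COMPLETED"}
)


def build_list_job_runs_kwargs(
    virtual_cluster_id: str,
    *,
    created_before: Optional[datetime] = None,
    created_after: Optional[datetime] = None,
    name: Optional[str] = None,
    states: Optional[Iterable[str]] = None,
    max_results: Optional[int] = None,
) -> Dict[str, Any]:
    """Build keyword arguments for list_job_runs, omitting unset filters (hypothetical helper)."""
    state_list = None
    if states is not None:
        state_list = list(states)
        unknown = sorted(set(state_list) - VALID_STATES)
        if unknown:
            raise ValueError(f"Unknown job run states: {unknown}")

    optional = {
        "createdBefore": created_before,
        "createdAfter": created_after,
        "name": name,
        "states": state_list,
        "maxResults": max_results,
    }
    kwargs: Dict[str, Any] = {"virtualClusterId": virtual_cluster_id}
    # Keep only the filters the caller actually supplied.
    kwargs.update({key: value for key, value in optional.items() if value is not None})
    return kwargs
```

The result can be passed straight through as `client.list_job_runs(**kwargs)`.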
Return type: dict
Response Syntax
{
    'jobRuns': [
        {
            'id': 'string',
            'name': 'string',
            'virtualClusterId': 'string',
            'arn': 'string',
            'state': 'PENDING'|'SUBMITTED'|'RUNNING'|'FAILED'|'CANCELLED'|'CANCEL_PENDING'|'COMPLETED',
            'clientToken': 'string',
            'executionRoleArn': 'string',
            'releaseLabel': 'string',
            'configurationOverrides': {
                'applicationConfiguration': [
                    {
                        'classification': 'string',
                        'properties': {
                            'string': 'string'
                        },
                        'configurations': {'... recursive ...'}
                    },
                ],
                'monitoringConfiguration': {
                    'persistentAppUI': 'ENABLED'|'DISABLED',
                    'cloudWatchMonitoringConfiguration': {
                        'logGroupName': 'string',
                        'logStreamNamePrefix': 'string'
                    },
                    's3MonitoringConfiguration': {
                        'logUri': 'string'
                    }
                }
            },
            'jobDriver': {
                'sparkSubmitJobDriver': {
                    'entryPoint': 'string',
                    'entryPointArguments': [
                        'string',
                    ],
                    'sparkSubmitParameters': 'string'
                },
                'sparkSqlJobDriver': {
                    'entryPoint': 'string',
                    'sparkSqlParameters': 'string'
                }
            },
            'createdAt': datetime(2015, 1, 1),
            'createdBy': 'string',
            'finishedAt': datetime(2015, 1, 1),
            'stateDetails': 'string',
            'failureReason': 'INTERNAL_ERROR'|'USER_ERROR'|'VALIDATION_ERROR'|'CLUSTER_UNAVAILABLE',
            'tags': {
                'string': 'string'
            },
            'retryPolicyConfiguration': {
                'maxAttempts': 123
            },
            'retryPolicyExecution': {
                'currentAttemptCount': 123
            }
        },
    ],
    'nextToken': 'string'
}
Response Structure
(dict) --
jobRuns (list) --
This output lists information about the specified job runs.
(dict) --
This entity describes a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
id (string) --
The ID of the job run.
name (string) --
The name of the job run.
virtualClusterId (string) --
The ID of the job run's virtual cluster.
arn (string) --
The ARN of the job run.
state (string) --
The state of the job run.
clientToken (string) --
The client token used to start a job run.
executionRoleArn (string) --
The execution role ARN of the job run.
releaseLabel (string) --
The release version of Amazon EMR.
configurationOverrides (dict) --
The configuration settings that are used to override default configuration.
applicationConfiguration (list) --
The configurations for the application run by the job run.
(dict) --
A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.
classification (string) --
The classification within a configuration.
properties (dict) --
A set of properties specified within a configuration classification.
(string) --
(string) --
configurations (list) --
A list of additional configurations to apply within a configuration object.
monitoringConfiguration (dict) --
The configurations for monitoring.
persistentAppUI (string) --
Monitoring configurations for the persistent application UI.
cloudWatchMonitoringConfiguration (dict) --
Monitoring configurations for CloudWatch.
logGroupName (string) --
The name of the log group for log publishing.
logStreamNamePrefix (string) --
The specified name prefix for log streams.
s3MonitoringConfiguration (dict) --
Amazon S3 configuration for monitoring log publishing.
logUri (string) --
Amazon S3 destination URI for log publishing.
jobDriver (dict) --
Parameters of the job driver for the job run.
sparkSubmitJobDriver (dict) --
The job driver parameters specified for Spark submit.
entryPoint (string) --
The entry point of the job application.
entryPointArguments (list) --
The arguments for the job application.
(string) --
sparkSubmitParameters (string) --
The Spark submit parameters that are used for job runs.
sparkSqlJobDriver (dict) --
The job driver parameters specified for Spark SQL.
entryPoint (string) --
The SQL file to be executed.
sparkSqlParameters (string) --
The Spark parameters to be included in the Spark SQL command.
createdAt (datetime) --
The date and time when the job run was created.
createdBy (string) --
The user who created the job run.
finishedAt (datetime) --
The date and time when the job run finished.
stateDetails (string) --
Additional details of the job run state.
failureReason (string) --
The reason why the job run failed.
tags (dict) --
The assigned tags of the job run.
(string) --
(string) --
retryPolicyConfiguration (dict) --
The configuration of the retry policy that the job runs on.
maxAttempts (integer) --
The maximum number of attempts on the job's driver.
retryPolicyExecution (dict) --
The current status of the retry policy executed on the job.
currentAttemptCount (integer) --
The current number of attempts made on the driver of the job.
nextToken (string) --
This output displays the token for the next set of job runs.
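Because results come back one page at a time with a `nextToken`, a caller usually loops until no token is returned. The sketch below shows one way to do that; `iter_job_runs` is a hypothetical helper name, and it assumes that a missing or empty `nextToken` marks the last page.

```python
from typing import Any, Dict, Iterator


def iter_job_runs(client: Any, **list_kwargs: Any) -> Iterator[Dict[str, Any]]:
    """Yield every job run across all pages of list_job_runs (hypothetical helper)."""
    kwargs = dict(list_kwargs)
    while True:
        page = client.list_job_runs(**kwargs)
        yield from page.get("jobRuns", [])

        # Assumption: no nextToken in the response means there are no more pages.
        token = page.get("nextToken")
        if not token:
            break
        kwargs["nextToken"] = token
```

For example, `[run["id"] for run in iter_job_runs(client, virtualClusterId=cluster_id, states=["FAILED"])]` would collect the IDs of all failed job runs in a virtual cluster, where `client` and `cluster_id` are supplied by the caller.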
StartJobRun (updated)
Changes (request)
{'retryPolicyConfiguration': {'maxAttempts': 'integer'}}
Starts a job run. A job run is a unit of work, such as a Spark jar, PySpark script, or SparkSQL query, that you submit to Amazon EMR on EKS.
See also: AWS API Documentation
Request Syntax
client.start_job_run(
    name='string',
    virtualClusterId='string',
    clientToken='string',
    executionRoleArn='string',
    releaseLabel='string',
    jobDriver={
        'sparkSubmitJobDriver': {
            'entryPoint': 'string',
            'entryPointArguments': [
                'string',
            ],
            'sparkSubmitParameters': 'string'
        },
        'sparkSqlJobDriver': {
            'entryPoint': 'string',
            'sparkSqlParameters': 'string'
        }
    },
    configurationOverrides={
        'applicationConfiguration': [
            {
                'classification': 'string',
                'properties': {
                    'string': 'string'
                },
                'configurations': {'... recursive ...'}
            },
        ],
        'monitoringConfiguration': {
            'persistentAppUI': 'ENABLED'|'DISABLED',
            'cloudWatchMonitoringConfiguration': {
                'logGroupName': 'string',
                'logStreamNamePrefix': 'string'
            },
            's3MonitoringConfiguration': {
                'logUri': 'string'
            }
        }
    },
    tags={
        'string': 'string'
    },
    jobTemplateId='string',
    jobTemplateParameters={
        'string': 'string'
    },
    retryPolicyConfiguration={
        'maxAttempts': 123
    }
)
name (string) --
The name of the job run.
virtualClusterId (string) -- [REQUIRED]
The virtual cluster ID for which the job run request is submitted.
clientToken (string) -- [REQUIRED]
The client idempotency token of the job run request.
This field is autopopulated if not provided.
executionRoleArn (string) --
The execution role ARN for the job run.
releaseLabel (string) --
The Amazon EMR release version to use for the job run.
jobDriver (dict) --
The job driver for the job run.
sparkSubmitJobDriver (dict) --
The job driver parameters specified for Spark submit.
entryPoint (string) -- [REQUIRED]
The entry point of the job application.
entryPointArguments (list) --
The arguments for the job application.
(string) --
sparkSubmitParameters (string) --
The Spark submit parameters that are used for job runs.
sparkSqlJobDriver (dict) --
The job driver parameters specified for Spark SQL.
entryPoint (string) --
The SQL file to be executed.
sparkSqlParameters (string) --
The Spark parameters to be included in the Spark SQL command.
configurationOverrides (dict) --
The configuration overrides for the job run.
applicationConfiguration (list) --
The configurations for the application run by the job run.
(dict) --
A configuration specification to be used when provisioning virtual clusters, which can include configurations for applications and software bundled with Amazon EMR on EKS. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.
classification (string) -- [REQUIRED]
The classification within a configuration.
properties (dict) --
A set of properties specified within a configuration classification.
(string) --
(string) --
configurations (list) --
A list of additional configurations to apply within a configuration object.
monitoringConfiguration (dict) --
The configurations for monitoring.
persistentAppUI (string) --
Monitoring configurations for the persistent application UI.
cloudWatchMonitoringConfiguration (dict) --
Monitoring configurations for CloudWatch.
logGroupName (string) -- [REQUIRED]
The name of the log group for log publishing.
logStreamNamePrefix (string) --
The specified name prefix for log streams.
s3MonitoringConfiguration (dict) --
Amazon S3 configuration for monitoring log publishing.
logUri (string) -- [REQUIRED]
Amazon S3 destination URI for log publishing.
tags (dict) --
The tags assigned to job runs.
(string) --
(string) --
jobTemplateId (string) --
The job template ID to be used to start the job run.
jobTemplateParameters (dict) --
The values of job template parameters to start a job run.
(string) --
(string) --
retryPolicyConfiguration (dict) --
The retry policy configuration for the job run.
maxAttempts (integer) -- [REQUIRED]
The maximum number of attempts on the job's driver.
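To make the shape of a request with a retry policy concrete, here is a minimal sketch that assembles the keyword arguments for `start_job_run` using a Spark submit driver. The function name `build_start_job_run_request` is hypothetical, and the check that `maxAttempts` is at least 1 is an assumption rather than a documented limit. `clientToken` is left out on the basis that it is autopopulated when not provided.

```python
from typing import Any, Dict, List, Optional


def build_start_job_run_request(
    virtual_cluster_id: str,
    execution_role_arn: str,
    release_label: str,
    entry_point: str,
    max_attempts: int,
    *,
    name: Optional[str] = None,
    entry_point_arguments: Optional[List[str]] = None,
    spark_submit_parameters: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble start_job_run keyword arguments with a retry policy (hypothetical helper)."""
    # Assumption: a retry policy with fewer than one attempt is not meaningful.
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")

    spark_submit: Dict[str, Any] = {"entryPoint": entry_point}
    if entry_point_arguments:
        spark_submit["entryPointArguments"] = list(entry_point_arguments)
    if spark_submit_parameters:
        spark_submit["sparkSubmitParameters"] = spark_submit_parameters

    request: Dict[str, Any] = {
        "virtualClusterId": virtual_cluster_id,
        "executionRoleArn": execution_role_arn,
        "releaseLabel": release_label,
        "jobDriver": {"sparkSubmitJobDriver": spark_submit},
        "retryPolicyConfiguration": {"maxAttempts": max_attempts},
    }
    if name:
        request["name"] = name
    return request
```

The returned dictionary can be unpacked into the call as `client.start_job_run(**request)`.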
Return type: dict
Response Syntax
{
    'id': 'string',
    'name': 'string',
    'arn': 'string',
    'virtualClusterId': 'string'
}
Response Structure
(dict) --
id (string) --
This output displays the started job run ID.
name (string) --
This output displays the name of the started job run.
arn (string) --
This output displays the ARN of the job run.
virtualClusterId (string) --
This output displays the virtual cluster ID for which the job run was submitted.
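The `id` and `virtualClusterId` returned here are the two values `describe_job_run` needs, so a caller can follow a submitted job through its driver retries by polling. The sketch below is one possible approach; `wait_for_job_run` and `TERMINAL_STATES` are hypothetical names, and treating `FAILED`, `CANCELLED`, and `COMPLETED` as the final states is an assumption based on the state names alone.

```python
import time
from typing import Any, Callable, Dict

# Assumption: these states are final; all other states mean the job is still in progress.
TERMINAL_STATES = frozenset({"FAILED", "CANCELLED", "COMPLETED"})


def wait_for_job_run(
    client: Any,
    job_run_id: str,
    virtual_cluster_id: str,
    *,
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll describe_job_run until the job run reaches a terminal state (hypothetical helper)."""
    for attempt in range(max_polls):
        response = client.describe_job_run(id=job_run_id, virtualClusterId=virtual_cluster_id)
        job_run = response["jobRun"]
        if job_run["state"] in TERMINAL_STATES:
            return job_run
        # Do not sleep after the final poll; fall through to the timeout instead.
        if attempt < max_polls - 1:
            sleep(poll_seconds)
    raise TimeoutError(f"Job run {job_run_id} did not finish after {max_polls} polls")
```

The `sleep` parameter exists only so the delay can be replaced in tests; in normal use the default `time.sleep` applies.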