EMR Serverless

2024/05/30 - EMR Serverless - 1 new, 4 updated API methods

Changes  This release adds support for Spark Structured Streaming.

ListJobRunAttempts (new)

Lists all attempts of a job run.

See also: AWS API Documentation

Request Syntax

client.list_job_run_attempts(
    applicationId='string',
    jobRunId='string',
    nextToken='string',
    maxResults=123
)
type applicationId

string

param applicationId

[REQUIRED]

The ID of the application for which to list job run attempts.

type jobRunId

string

param jobRunId

[REQUIRED]

The ID of the job run whose attempts to list.

type nextToken

string

param nextToken

The token for the next set of job run attempt results.

type maxResults

integer

param maxResults

The maximum number of job run attempts to list.

rtype

dict

returns

Response Syntax

{
    'jobRunAttempts': [
        {
            'applicationId': 'string',
            'id': 'string',
            'name': 'string',
            'mode': 'BATCH'|'STREAMING',
            'arn': 'string',
            'createdBy': 'string',
            'jobCreatedAt': datetime(2015, 1, 1),
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'executionRole': 'string',
            'state': 'SUBMITTED'|'PENDING'|'SCHEDULED'|'RUNNING'|'SUCCESS'|'FAILED'|'CANCELLING'|'CANCELLED',
            'stateDetails': 'string',
            'releaseLabel': 'string',
            'type': 'string',
            'attempt': 123
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • jobRunAttempts (list) --

      The array of the listed job run attempt objects.

      • (dict) --

        The summary of attributes associated with a job run attempt.

        • applicationId (string) --

          The ID of the application the job is running on.

        • id (string) --

          The ID of the job run attempt.

        • name (string) --

          The name of the job run attempt.

        • mode (string) --

          The mode of the job run attempt.

        • arn (string) --

          The Amazon Resource Name (ARN) of the job run.

        • createdBy (string) --

          The user who created the job run.

        • jobCreatedAt (datetime) --

          The date and time when the job run was created.

        • createdAt (datetime) --

          The date and time when the job run attempt was created.

        • updatedAt (datetime) --

          The date and time when the job run attempt was last updated.

        • executionRole (string) --

          The Amazon Resource Name (ARN) of the execution role of the job run.

        • state (string) --

          The state of the job run attempt.

        • stateDetails (string) --

          The state details of the job run attempt.

        • releaseLabel (string) --

          The Amazon EMR release label of the job run attempt.

        • type (string) --

          The type of the job run, such as Spark or Hive.

        • attempt (integer) --

          The attempt number of the job run execution.

    • nextToken (string) --

      The token for the next set of job run attempt results. This is required for pagination and is returned in the response to the previous request.
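Because results are paginated through nextToken, a caller that wants every attempt has to keep calling until no token is returned. The sketch below assumes a boto3 emr-serverless client from a release that includes list_job_run_attempts; the helpers iter_job_run_attempts and latest_failed_attempt, and the page size of 20, are hypothetical choices for illustration. The client is passed in as a parameter so the loop can be exercised without AWS credentials.

```python
from typing import Any, Dict, Iterable, Iterator, Optional


def iter_job_run_attempts(
    client: Any,
    application_id: str,
    job_run_id: str,
    page_size: int = 20,  # arbitrary page size, sent as maxResults
) -> Iterator[Dict[str, Any]]:
    """Yield every job run attempt summary, following nextToken.

    `client` is assumed to be boto3.client("emr-serverless") or any object
    with a compatible list_job_run_attempts method (hypothetical helper).
    """
    next_token: Optional[str] = None
    while True:
        kwargs: Dict[str, Any] = {
            "applicationId": application_id,
            "jobRunId": job_run_id,
            "maxResults": page_size,
        }
        # Only send nextToken after the first page.
        if next_token:
            kwargs["nextToken"] = next_token
        response = client.list_job_run_attempts(**kwargs)
        yield from response.get("jobRunAttempts", [])
        next_token = response.get("nextToken")
        if not next_token:
            break


def latest_failed_attempt(attempts: Iterable[Dict[str, Any]]) -> Optional[Dict[str, Any]]:
    """Return the FAILED attempt with the highest attempt number, or None."""
    failed = [a for a in attempts if a.get("state") == "FAILED"]
    return max(failed, key=lambda a: a["attempt"], default=None)
```

With a real client this would be called as iter_job_run_attempts(boto3.client("emr-serverless"), "<application-id>", "<job-run-id>"), where the IDs are placeholders.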

GetDashboardForJobRun (updated)
Changes (request)
{'attempt': 'integer'}

Creates and returns a URL that you can use to access the application UIs for a job run.

For jobs in a running state, the application UI is a live user interface such as the Spark or Tez web UI. For completed jobs, the application UI is a persistent application user interface such as the Spark History Server or persistent Tez UI.

Note

The URL is valid for one hour after you generate it. To access the application UI after that hour elapses, you must invoke the API again to generate a new URL.

See also: AWS API Documentation

Request Syntax

client.get_dashboard_for_job_run(
    applicationId='string',
    jobRunId='string',
    attempt=123
)
type applicationId

string

param applicationId

[REQUIRED]

The ID of the application.

type jobRunId

string

param jobRunId

[REQUIRED]

The ID of the job run.

type attempt

integer

param attempt

An optional parameter that specifies the attempt number of the job run. If not specified, this value defaults to the latest attempt.

rtype

dict

returns

Response Syntax

{
    'url': 'string'
}

Response Structure

  • (dict) --

    • url (string) --

      The URL to view the job run's dashboard.
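Since a generated URL is valid for only one hour, a long-lived tool may want to reuse a URL until shortly before it expires and then request a new one. A minimal sketch, assuming a boto3 emr-serverless client; the DashboardUrlCache class, the five-minute safety margin, and the injectable clock are hypothetical choices for illustration. It omits attempt when none is given so that the service falls back to the latest attempt.

```python
import time
from typing import Any, Callable, Dict, Optional, Tuple

URL_LIFETIME_SECONDS = 60 * 60  # documented validity: one hour
SAFETY_MARGIN_SECONDS = 5 * 60  # arbitrary: refresh a little early


class DashboardUrlCache:
    """Cache get_dashboard_for_job_run URLs per (application, job run, attempt)."""

    def __init__(self, client: Any, clock: Callable[[], float] = time.monotonic):
        self._client = client  # assumed: boto3.client("emr-serverless")
        self._clock = clock
        self._entries: Dict[Tuple[str, str, Optional[int]], Tuple[str, float]] = {}

    def get(self, application_id: str, job_run_id: str, attempt: Optional[int] = None) -> str:
        key = (application_id, job_run_id, attempt)
        now = self._clock()
        cached = self._entries.get(key)
        if cached and now < cached[1]:
            return cached[0]  # still inside the validity window
        kwargs: Dict[str, Any] = {"applicationId": application_id, "jobRunId": job_run_id}
        # Leave attempt out entirely so the service uses the latest attempt.
        if attempt is not None:
            kwargs["attempt"] = attempt
        url = self._client.get_dashboard_for_job_run(**kwargs)["url"]
        self._entries[key] = (url, now + URL_LIFETIME_SECONDS - SAFETY_MARGIN_SECONDS)
        return url
```

Caching the attempt=None key can return a URL for an attempt that a retry has since superseded, so callers watching streaming jobs that retry may prefer explicit attempt numbers.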

GetJobRun (updated)
Changes (request, response)
Request
{'attempt': 'integer'}
Response
{'jobRun': {'attempt': 'integer',
            'attemptCreatedAt': 'timestamp',
            'attemptUpdatedAt': 'timestamp',
            'mode': 'BATCH | STREAMING',
            'retryPolicy': {'maxAttempts': 'integer',
                            'maxFailedAttemptsPerHour': 'integer'}}}

Displays detailed information about a job run.

See also: AWS API Documentation

Request Syntax

client.get_job_run(
    applicationId='string',
    jobRunId='string',
    attempt=123
)
type applicationId

string

param applicationId

[REQUIRED]

The ID of the application on which the job run is submitted.

type jobRunId

string

param jobRunId

[REQUIRED]

The ID of the job run.

type attempt

integer

param attempt

An optional parameter that specifies the attempt number of the job run. If not specified, this value defaults to the latest attempt.

rtype

dict

returns

Response Syntax

{
    'jobRun': {
        'applicationId': 'string',
        'jobRunId': 'string',
        'name': 'string',
        'arn': 'string',
        'createdBy': 'string',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'executionRole': 'string',
        'state': 'SUBMITTED'|'PENDING'|'SCHEDULED'|'RUNNING'|'SUCCESS'|'FAILED'|'CANCELLING'|'CANCELLED',
        'stateDetails': 'string',
        'releaseLabel': 'string',
        'configurationOverrides': {
            'applicationConfiguration': [
                {
                    'classification': 'string',
                    'properties': {
                        'string': 'string'
                    },
                    'configurations': {'... recursive ...'}
                },
            ],
            'monitoringConfiguration': {
                's3MonitoringConfiguration': {
                    'logUri': 'string',
                    'encryptionKeyArn': 'string'
                },
                'managedPersistenceMonitoringConfiguration': {
                    'enabled': True|False,
                    'encryptionKeyArn': 'string'
                },
                'cloudWatchLoggingConfiguration': {
                    'enabled': True|False,
                    'logGroupName': 'string',
                    'logStreamNamePrefix': 'string',
                    'encryptionKeyArn': 'string',
                    'logTypes': {
                        'string': [
                            'string',
                        ]
                    }
                },
                'prometheusMonitoringConfiguration': {
                    'remoteWriteUrl': 'string'
                }
            }
        },
        'jobDriver': {
            'sparkSubmit': {
                'entryPoint': 'string',
                'entryPointArguments': [
                    'string',
                ],
                'sparkSubmitParameters': 'string'
            },
            'hive': {
                'query': 'string',
                'initQueryFile': 'string',
                'parameters': 'string'
            }
        },
        'tags': {
            'string': 'string'
        },
        'totalResourceUtilization': {
            'vCPUHour': 123.0,
            'memoryGBHour': 123.0,
            'storageGBHour': 123.0
        },
        'networkConfiguration': {
            'subnetIds': [
                'string',
            ],
            'securityGroupIds': [
                'string',
            ]
        },
        'totalExecutionDurationSeconds': 123,
        'executionTimeoutMinutes': 123,
        'billedResourceUtilization': {
            'vCPUHour': 123.0,
            'memoryGBHour': 123.0,
            'storageGBHour': 123.0
        },
        'mode': 'BATCH'|'STREAMING',
        'retryPolicy': {
            'maxAttempts': 123,
            'maxFailedAttemptsPerHour': 123
        },
        'attempt': 123,
        'attemptCreatedAt': datetime(2015, 1, 1),
        'attemptUpdatedAt': datetime(2015, 1, 1)
    }
}

Response Structure

  • (dict) --

    • jobRun (dict) --

      The output displays information about the job run.

      • applicationId (string) --

        The ID of the application the job is running on.

      • jobRunId (string) --

        The ID of the job run.

      • name (string) --

        The optional job run name. This doesn't have to be unique.

      • arn (string) --

        The Amazon Resource Name (ARN) of the job run.

      • createdBy (string) --

        The user who created the job run.

      • createdAt (datetime) --

        The date and time when the job run was created.

      • updatedAt (datetime) --

        The date and time when the job run was updated.

      • executionRole (string) --

        The execution role ARN of the job run.

      • state (string) --

        The state of the job run.

      • stateDetails (string) --

        The state details of the job run.

      • releaseLabel (string) --

        The Amazon EMR release associated with the application your job is running on.

      • configurationOverrides (dict) --

        The configuration settings that are used to override default configuration.

        • applicationConfiguration (list) --

          The override configurations for the application.

          • (dict) --

            A configuration specification to be used when provisioning an application. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.

            • classification (string) --

              The classification within a configuration.

            • properties (dict) --

              A set of properties specified within a configuration classification.

              • (string) --

                • (string) --

            • configurations (list) --

              A list of additional configurations to apply within a configuration object.

        • monitoringConfiguration (dict) --

          The override configurations for monitoring.

          • s3MonitoringConfiguration (dict) --

            The Amazon S3 configuration for monitoring log publishing.

            • logUri (string) --

              The Amazon S3 destination URI for log publishing.

            • encryptionKeyArn (string) --

              The KMS key ARN to encrypt the logs published to the given Amazon S3 destination.

          • managedPersistenceMonitoringConfiguration (dict) --

            The managed log persistence configuration for a job run.

            • enabled (boolean) --

              Enables managed logging and defaults to true. If set to false, managed logging will be turned off.

            • encryptionKeyArn (string) --

              The KMS key ARN to encrypt the logs stored in managed log persistence.

          • cloudWatchLoggingConfiguration (dict) --

            The Amazon CloudWatch configuration for monitoring logs. You can configure your jobs to send log information to CloudWatch.

            • enabled (boolean) --

              Enables CloudWatch logging.

            • logGroupName (string) --

              The name of the log group in Amazon CloudWatch Logs where you want to publish your logs.

            • logStreamNamePrefix (string) --

              Prefix for the CloudWatch log stream name.

            • encryptionKeyArn (string) --

              The Key Management Service (KMS) key ARN to encrypt the logs that you store in CloudWatch Logs.

            • logTypes (dict) --

              The types of logs that you want to publish to CloudWatch. If you don't specify any log types, driver STDOUT and STDERR logs will be published to CloudWatch Logs by default. For more information including the supported worker types for Hive and Spark, see Logging for EMR Serverless with CloudWatch.

              • Key Valid Values: SPARK_DRIVER, SPARK_EXECUTOR, HIVE_DRIVER, TEZ_TASK

              • Array Members Valid Values: STDOUT, STDERR, HIVE_LOG, TEZ_AM, SYSTEM_LOGS

              • (string) --

                Worker type for an analytics framework.

                • (list) --

                  • (string) --

                    Log type for a Spark/Hive job-run.

          • prometheusMonitoringConfiguration (dict) --

            The monitoring configuration object you can configure to send metrics to Amazon Managed Service for Prometheus for a job run.

            • remoteWriteUrl (string) --

              The remote write URL in the Amazon Managed Service for Prometheus workspace to send metrics to.

      • jobDriver (dict) --

        The job driver for the job run.

        Note

        This is a Tagged Union structure. Only one of the following top level keys will be set: sparkSubmit, hive. If a client receives an unknown member it will set SDK_UNKNOWN_MEMBER as the top level key, which maps to the name or tag of the unknown member. The structure of SDK_UNKNOWN_MEMBER is as follows:

        'SDK_UNKNOWN_MEMBER': {'name': 'UnknownMemberName'}
        • sparkSubmit (dict) --

          The job driver parameters specified for Spark.

          • entryPoint (string) --

            The entry point for the Spark submit job run.

          • entryPointArguments (list) --

            The arguments for the Spark submit job run.

            • (string) --

          • sparkSubmitParameters (string) --

            The parameters for the Spark submit job run.

        • hive (dict) --

          The job driver parameters specified for Hive.

          • query (string) --

            The query for the Hive job run.

          • initQueryFile (string) --

            The query file for the Hive job run.

          • parameters (string) --

            The parameters for the Hive job run.

      • tags (dict) --

        The tags assigned to the job run.

        • (string) --

          • (string) --

      • totalResourceUtilization (dict) --

        The aggregate vCPU, memory, and storage resources used from the time the job starts to execute, until the time the job terminates, rounded up to the nearest second.

        • vCPUHour (float) --

          The aggregated vCPU used per hour from the time the job starts executing until the job is terminated.

        • memoryGBHour (float) --

          The aggregated memory used per hour from the time the job starts executing until the job is terminated.

        • storageGBHour (float) --

          The aggregated storage used per hour from the time the job starts executing until the job is terminated.

      • networkConfiguration (dict) --

        The network configuration for customer VPC connectivity.

        • subnetIds (list) --

          The array of subnet Ids for customer VPC connectivity.

          • (string) --

        • securityGroupIds (list) --

          The array of security group Ids for customer VPC connectivity.

          • (string) --

      • totalExecutionDurationSeconds (integer) --

        The job run total execution duration in seconds. This field is only available for job runs in a COMPLETED, FAILED, or CANCELLED state.

      • executionTimeoutMinutes (integer) --

        Returns the job run timeout value from the StartJobRun call. If no timeout was specified, then it returns the default timeout of 720 minutes.

      • billedResourceUtilization (dict) --

        The aggregate vCPU, memory, and storage that Amazon Web Services has billed for the job run. The billed resources include a 1-minute minimum usage for workers, plus additional storage over 20 GB per worker. Note that billed resources do not include usage for idle pre-initialized workers.

        • vCPUHour (float) --

          The aggregated vCPU used per hour from the time the job starts executing until the job is terminated.

        • memoryGBHour (float) --

          The aggregated memory used per hour from the time the job starts executing until the job is terminated.

        • storageGBHour (float) --

          The aggregated storage used per hour from the time the job starts executing until the job is terminated.

      • mode (string) --

        The mode of the job run.

      • retryPolicy (dict) --

        The retry policy of the job run.

        • maxAttempts (integer) --

          Maximum number of attempts for the job run. This parameter is only applicable for BATCH mode.

        • maxFailedAttemptsPerHour (integer) --

          Maximum number of failed attempts per hour. This parameter is only applicable for STREAMING mode.

      • attempt (integer) --

        The attempt of the job run.

      • attemptCreatedAt (datetime) --

        The date and time when the job run attempt was created.

      • attemptUpdatedAt (datetime) --

        The date and time when the job run attempt was last updated.
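The two retry limits apply to different modes: maxAttempts only to BATCH and maxFailedAttemptsPerHour only to STREAMING. The sketch below reads a get_job_run response dict and reports the attempt number together with the limit that applies to its mode. The describe_attempt helper and the wording of its output are hypothetical; it treats a missing mode or retryPolicy as unknown rather than assuming a service default.

```python
from typing import Any, Dict


def describe_attempt(response: Dict[str, Any]) -> str:
    """Summarize the attempt and applicable retry limit of a get_job_run response."""
    job_run = response["jobRun"]
    mode = job_run.get("mode")
    policy = job_run.get("retryPolicy") or {}
    attempt = job_run.get("attempt")
    state = job_run.get("state")

    # Pick the retry field that is meaningful for this mode.
    if mode == "BATCH":
        field = "maxAttempts"
    elif mode == "STREAMING":
        field = "maxFailedAttemptsPerHour"
    else:
        field = None

    if field is None:
        limit_text = "retry limit unknown"
    elif policy.get(field) is None:
        limit_text = f"{field} unset"
    else:
        limit_text = f"{field}={policy[field]}"

    return f"{job_run['jobRunId']} attempt {attempt} ({mode}, {state}): {limit_text}"
```

With a real client, the input would come from a call such as client.get_job_run(applicationId="<application-id>", jobRunId="<job-run-id>", attempt=2), with placeholder IDs.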

ListJobRuns (updated)
Changes (request, response)
Request
{'mode': 'BATCH | STREAMING'}
Response
{'jobRuns': {'attempt': 'integer',
             'attemptCreatedAt': 'timestamp',
             'attemptUpdatedAt': 'timestamp',
             'mode': 'BATCH | STREAMING'}}

Lists job runs based on a set of parameters.

See also: AWS API Documentation

Request Syntax

client.list_job_runs(
    applicationId='string',
    nextToken='string',
    maxResults=123,
    createdAtAfter=datetime(2015, 1, 1),
    createdAtBefore=datetime(2015, 1, 1),
    states=[
        'SUBMITTED'|'PENDING'|'SCHEDULED'|'RUNNING'|'SUCCESS'|'FAILED'|'CANCELLING'|'CANCELLED',
    ],
    mode='BATCH'|'STREAMING'
)
type applicationId

string

param applicationId

[REQUIRED]

The ID of the application for which to list the job run.

type nextToken

string

param nextToken

The token for the next set of job run results.

type maxResults

integer

param maxResults

The maximum number of job runs that can be listed.

type createdAtAfter

datetime

param createdAtAfter

The lower bound of the creation date and time filter.

type createdAtBefore

datetime

param createdAtBefore

The upper bound of the creation date and time filter.

type states

list

param states

An optional filter for job run states. Note that if this filter contains multiple states, the resulting list will be grouped by the state.

  • (string) --

type mode

string

param mode

The mode of the job runs to list.

rtype

dict

returns

Response Syntax

{
    'jobRuns': [
        {
            'applicationId': 'string',
            'id': 'string',
            'name': 'string',
            'mode': 'BATCH'|'STREAMING',
            'arn': 'string',
            'createdBy': 'string',
            'createdAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'executionRole': 'string',
            'state': 'SUBMITTED'|'PENDING'|'SCHEDULED'|'RUNNING'|'SUCCESS'|'FAILED'|'CANCELLING'|'CANCELLED',
            'stateDetails': 'string',
            'releaseLabel': 'string',
            'type': 'string',
            'attempt': 123,
            'attemptCreatedAt': datetime(2015, 1, 1),
            'attemptUpdatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

  • (dict) --

    • jobRuns (list) --

      The output lists information about the specified job runs.

      • (dict) --

        The summary of attributes associated with a job run.

        • applicationId (string) --

          The ID of the application the job is running on.

        • id (string) --

          The ID of the job run.

        • name (string) --

          The optional job run name. This doesn't have to be unique.

        • mode (string) --

          The mode of the job run.

        • arn (string) --

          The ARN of the job run.

        • createdBy (string) --

          The user who created the job run.

        • createdAt (datetime) --

          The date and time when the job run was created.

        • updatedAt (datetime) --

          The date and time when the job run was last updated.

        • executionRole (string) --

          The execution role ARN of the job run.

        • state (string) --

          The state of the job run.

        • stateDetails (string) --

          The state details of the job run.

        • releaseLabel (string) --

          The Amazon EMR release associated with the application your job is running on.

        • type (string) --

          The type of job run, such as Spark or Hive.

        • attempt (integer) --

          The attempt number of the job run execution.

        • attemptCreatedAt (datetime) --

          The date and time when the job run attempt was created.

        • attemptUpdatedAt (datetime) --

          The date and time when the job run attempt was last updated.

    • nextToken (string) --

      The output displays the token for the next set of job run results. This is required for pagination and is available as a response of the previous request.
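The new mode filter can be combined with states and the creation-time bounds. A sketch, assuming a boto3 emr-serverless client; list_streaming_runs is a hypothetical helper, and the default 24-hour lookback window and RUNNING state filter are arbitrary examples. It pages manually with nextToken until no token is returned.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, List, Optional, Sequence


def list_streaming_runs(
    client: Any,
    application_id: str,
    states: Sequence[str] = ("RUNNING",),
    lookback: timedelta = timedelta(hours=24),
    now: Optional[datetime] = None,
) -> List[Dict[str, Any]]:
    """Return STREAMING job run summaries created within `lookback` (hypothetical helper)."""
    now = now or datetime.now(timezone.utc)
    base: Dict[str, Any] = {
        "applicationId": application_id,
        "mode": "STREAMING",
        "states": list(states),
        "createdAtAfter": now - lookback,
        "createdAtBefore": now,
    }
    runs: List[Dict[str, Any]] = []
    next_token: Optional[str] = None
    while True:
        # Add nextToken only once the service has returned one.
        kwargs = dict(base, nextToken=next_token) if next_token else base
        page = client.list_job_runs(**kwargs)
        runs.extend(page.get("jobRuns", []))
        next_token = page.get("nextToken")
        if not next_token:
            return runs
```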

StartJobRun (updated)
Changes (request)
{'mode': 'BATCH | STREAMING',
 'retryPolicy': {'maxAttempts': 'integer',
                 'maxFailedAttemptsPerHour': 'integer'}}

Starts a job run.

See also: AWS API Documentation

Request Syntax

client.start_job_run(
    applicationId='string',
    clientToken='string',
    executionRoleArn='string',
    jobDriver={
        'sparkSubmit': {
            'entryPoint': 'string',
            'entryPointArguments': [
                'string',
            ],
            'sparkSubmitParameters': 'string'
        },
        'hive': {
            'query': 'string',
            'initQueryFile': 'string',
            'parameters': 'string'
        }
    },
    configurationOverrides={
        'applicationConfiguration': [
            {
                'classification': 'string',
                'properties': {
                    'string': 'string'
                },
                'configurations': {'... recursive ...'}
            },
        ],
        'monitoringConfiguration': {
            's3MonitoringConfiguration': {
                'logUri': 'string',
                'encryptionKeyArn': 'string'
            },
            'managedPersistenceMonitoringConfiguration': {
                'enabled': True|False,
                'encryptionKeyArn': 'string'
            },
            'cloudWatchLoggingConfiguration': {
                'enabled': True|False,
                'logGroupName': 'string',
                'logStreamNamePrefix': 'string',
                'encryptionKeyArn': 'string',
                'logTypes': {
                    'string': [
                        'string',
                    ]
                }
            },
            'prometheusMonitoringConfiguration': {
                'remoteWriteUrl': 'string'
            }
        }
    },
    tags={
        'string': 'string'
    },
    executionTimeoutMinutes=123,
    name='string',
    mode='BATCH'|'STREAMING',
    retryPolicy={
        'maxAttempts': 123,
        'maxFailedAttemptsPerHour': 123
    }
)
type applicationId

string

param applicationId

[REQUIRED]

The ID of the application on which to run the job.

type clientToken

string

param clientToken

[REQUIRED]

The client idempotency token of the job run to start. Its value must be unique for each request.

This field is autopopulated if not provided.

type executionRoleArn

string

param executionRoleArn

[REQUIRED]

The execution role ARN for the job run.

type jobDriver

dict

param jobDriver

The job driver for the job run.

Note

This is a Tagged Union structure. Only one of the following top level keys can be set: sparkSubmit, hive.

  • sparkSubmit (dict) --

    The job driver parameters specified for Spark.

    • entryPoint (string) -- [REQUIRED]

      The entry point for the Spark submit job run.

    • entryPointArguments (list) --

      The arguments for the Spark submit job run.

      • (string) --

    • sparkSubmitParameters (string) --

      The parameters for the Spark submit job run.

  • hive (dict) --

    The job driver parameters specified for Hive.

    • query (string) -- [REQUIRED]

      The query for the Hive job run.

    • initQueryFile (string) --

      The query file for the Hive job run.

    • parameters (string) --

      The parameters for the Hive job run.

type configurationOverrides

dict

param configurationOverrides

The configuration overrides for the job run.

  • applicationConfiguration (list) --

    The override configurations for the application.

    • (dict) --

      A configuration specification to be used when provisioning an application. A configuration consists of a classification, properties, and optional nested configurations. A classification refers to an application-specific configuration file. Properties are the settings you want to change in that file.

      • classification (string) -- [REQUIRED]

        The classification within a configuration.

      • properties (dict) --

        A set of properties specified within a configuration classification.

        • (string) --

          • (string) --

      • configurations (list) --

        A list of additional configurations to apply within a configuration object.

  • monitoringConfiguration (dict) --

    The override configurations for monitoring.

    • s3MonitoringConfiguration (dict) --

      The Amazon S3 configuration for monitoring log publishing.

      • logUri (string) --

        The Amazon S3 destination URI for log publishing.

      • encryptionKeyArn (string) --

        The KMS key ARN to encrypt the logs published to the given Amazon S3 destination.

    • managedPersistenceMonitoringConfiguration (dict) --

      The managed log persistence configuration for a job run.

      • enabled (boolean) --

        Enables managed logging and defaults to true. If set to false, managed logging will be turned off.

      • encryptionKeyArn (string) --

        The KMS key ARN to encrypt the logs stored in managed log persistence.

    • cloudWatchLoggingConfiguration (dict) --

      The Amazon CloudWatch configuration for monitoring logs. You can configure your jobs to send log information to CloudWatch.

      • enabled (boolean) -- [REQUIRED]

        Enables CloudWatch logging.

      • logGroupName (string) --

        The name of the log group in Amazon CloudWatch Logs where you want to publish your logs.

      • logStreamNamePrefix (string) --

        Prefix for the CloudWatch log stream name.

      • encryptionKeyArn (string) --

        The Key Management Service (KMS) key ARN to encrypt the logs that you store in CloudWatch Logs.

      • logTypes (dict) --

        The types of logs that you want to publish to CloudWatch. If you don't specify any log types, driver STDOUT and STDERR logs will be published to CloudWatch Logs by default. For more information including the supported worker types for Hive and Spark, see Logging for EMR Serverless with CloudWatch.

        • Key Valid Values: SPARK_DRIVER, SPARK_EXECUTOR, HIVE_DRIVER, TEZ_TASK

        • Array Members Valid Values: STDOUT, STDERR, HIVE_LOG, TEZ_AM, SYSTEM_LOGS

        • (string) --

          Worker type for an analytics framework.

          • (list) --

            • (string) --

              Log type for a Spark/Hive job-run.

    • prometheusMonitoringConfiguration (dict) --

      The monitoring configuration object you can configure to send metrics to Amazon Managed Service for Prometheus for a job run.

      • remoteWriteUrl (string) --

        The remote write URL in the Amazon Managed Service for Prometheus workspace to send metrics to.
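A client-side check of logTypes against the valid values listed above can catch typos before start_job_run is called. This is a sketch: validate_log_types is a hypothetical helper, and it only checks the documented enumerations, not which log types each worker type actually supports (see Logging for EMR Serverless with CloudWatch for that).

```python
from typing import Dict, List

# Enumerations copied from the logTypes valid values above.
VALID_WORKER_TYPES = {"SPARK_DRIVER", "SPARK_EXECUTOR", "HIVE_DRIVER", "TEZ_TASK"}
VALID_LOG_TYPES = {"STDOUT", "STDERR", "HIVE_LOG", "TEZ_AM", "SYSTEM_LOGS"}


def validate_log_types(log_types: Dict[str, List[str]]) -> List[str]:
    """Return a list of problems found in a logTypes mapping (empty if none)."""
    problems: List[str] = []
    for worker_type, types in log_types.items():
        if worker_type not in VALID_WORKER_TYPES:
            problems.append(f"unknown worker type: {worker_type}")
        for log_type in types:
            if log_type not in VALID_LOG_TYPES:
                problems.append(f"unknown log type for {worker_type}: {log_type}")
    return problems
```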

type tags

dict

param tags

The tags assigned to the job run.

  • (string) --

    • (string) --

type executionTimeoutMinutes

integer

param executionTimeoutMinutes

The maximum duration of the job run, in minutes. If the job run exceeds this duration, it is automatically cancelled.

type name

string

param name

The optional job run name. This doesn't have to be unique.

type mode

string

param mode

The mode of the job run when it starts.

type retryPolicy

dict

param retryPolicy

The retry policy to apply when the job run starts.

  • maxAttempts (integer) --

    Maximum number of attempts for the job run. This parameter is only applicable for BATCH mode.

  • maxFailedAttemptsPerHour (integer) --

    Maximum number of failed attempts per hour. This parameter is only applicable for STREAMING mode.

rtype

dict

returns

Response Syntax

{
    'applicationId': 'string',
    'jobRunId': 'string',
    'arn': 'string'
}

Response Structure

  • (dict) --

    • applicationId (string) --

      This output displays the application ID on which the job run was submitted.

    • jobRunId (string) --

      The output contains the ID of the started job run.

    • arn (string) --

      This output displays the ARN of the job run.
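Putting the new request fields together, a streaming Spark job sets mode='STREAMING' and, if it wants a custom retry limit, retryPolicy.maxFailedAttemptsPerHour. The builder below is a hypothetical helper that enforces the constraints stated above: exactly one jobDriver key, and only the retry field that matches the mode. It leaves out clientToken and relies on boto3 autopopulating it; the application ID, role ARN, S3 path, and limits used with it are placeholders.

```python
from typing import Any, Dict, Optional

# Which retryPolicy field applies to each mode, per the parameter docs above.
RETRY_FIELD_BY_MODE = {
    "BATCH": "maxAttempts",
    "STREAMING": "maxFailedAttemptsPerHour",
}


def build_start_job_run_request(
    application_id: str,
    execution_role_arn: str,
    job_driver: Dict[str, Any],
    mode: str = "BATCH",
    retry_limit: Optional[int] = None,
    name: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for client.start_job_run (hypothetical helper)."""
    if mode not in RETRY_FIELD_BY_MODE:
        raise ValueError(f"mode must be BATCH or STREAMING, got {mode!r}")
    # jobDriver is a tagged union: exactly one of sparkSubmit or hive.
    if len(job_driver) != 1 or next(iter(job_driver)) not in ("sparkSubmit", "hive"):
        raise ValueError("jobDriver must contain exactly one of 'sparkSubmit' or 'hive'")

    request: Dict[str, Any] = {
        "applicationId": application_id,
        "executionRoleArn": execution_role_arn,
        "jobDriver": job_driver,
        "mode": mode,
    }
    if retry_limit is not None:
        # Send only the retry field that applies to the chosen mode.
        request["retryPolicy"] = {RETRY_FIELD_BY_MODE[mode]: retry_limit}
    if name is not None:
        request["name"] = name
    return request
```

The result would be passed as client.start_job_run(**request). Keeping the mode-to-field mapping in one table means the builder cannot send maxAttempts for a STREAMING run or maxFailedAttemptsPerHour for a BATCH run.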