Amazon EMR

2023/05/10 - Amazon EMR - 3 updated api methods

Changes  EMR Studio now supports programmatically executing notebooks on an EMR on EKS cluster. In addition, notebooks can now be executed by specifying their location in S3.

DescribeNotebookExecution (updated)
Changes (response)
{'NotebookExecution': {'EnvironmentVariables': {'string': 'string'},
                       'ExecutionEngine': {'ExecutionRoleArn': 'string'},
                       'NotebookS3Location': {'Bucket': 'string',
                                              'Key': 'string'},
                       'OutputNotebookFormat': 'HTML',
                       'OutputNotebookS3Location': {'Bucket': 'string',
                                                    'Key': 'string'}}}

Provides details of a notebook execution.

See also: AWS API Documentation

Request Syntax

client.describe_notebook_execution(
    NotebookExecutionId='string'
)
type NotebookExecutionId

string

param NotebookExecutionId

[REQUIRED]

The unique identifier of the notebook execution.

rtype

dict

returns

Response Syntax

{
    'NotebookExecution': {
        'NotebookExecutionId': 'string',
        'EditorId': 'string',
        'ExecutionEngine': {
            'Id': 'string',
            'Type': 'EMR',
            'MasterInstanceSecurityGroupId': 'string',
            'ExecutionRoleArn': 'string'
        },
        'NotebookExecutionName': 'string',
        'NotebookParams': 'string',
        'Status': 'START_PENDING'|'STARTING'|'RUNNING'|'FINISHING'|'FINISHED'|'FAILING'|'FAILED'|'STOP_PENDING'|'STOPPING'|'STOPPED',
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1),
        'Arn': 'string',
        'OutputNotebookURI': 'string',
        'LastStateChangeReason': 'string',
        'NotebookInstanceSecurityGroupId': 'string',
        'Tags': [
            {
                'Key': 'string',
                'Value': 'string'
            },
        ],
        'NotebookS3Location': {
            'Bucket': 'string',
            'Key': 'string'
        },
        'OutputNotebookS3Location': {
            'Bucket': 'string',
            'Key': 'string'
        },
        'OutputNotebookFormat': 'HTML',
        'EnvironmentVariables': {
            'string': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • NotebookExecution (dict) --

      Properties of the notebook execution.

      • NotebookExecutionId (string) --

        The unique identifier of a notebook execution.

      • EditorId (string) --

        The unique identifier of the Amazon EMR Notebook that is used for the notebook execution.

      • ExecutionEngine (dict) --

        The execution engine, such as an Amazon EMR cluster, used to run the Amazon EMR notebook and perform the notebook execution.

        • Id (string) --

          The unique identifier of the execution engine. For an Amazon EMR cluster, this is the cluster ID.

        • Type (string) --

          The type of execution engine. A value of EMR specifies an Amazon EMR cluster.

        • MasterInstanceSecurityGroupId (string) --

          An optional unique ID of an Amazon EC2 security group to associate with the master instance of the Amazon EMR cluster for this notebook execution. For more information, see Specifying Amazon EC2 Security Groups for Amazon EMR Notebooks in the EMR Management Guide.

        • ExecutionRoleArn (string) --

          The execution role ARN required for the notebook execution.

      • NotebookExecutionName (string) --

        A name for the notebook execution.

      • NotebookParams (string) --

        Input parameters in JSON format passed to the Amazon EMR Notebook at runtime for execution.

      • Status (string) --

        The status of the notebook execution.

        • START_PENDING indicates that the cluster has received the execution request but execution has not begun.

        • STARTING indicates that the execution is starting on the cluster.

        • RUNNING indicates that the execution is being processed by the cluster.

        • FINISHING indicates that execution processing is in the final stages.

        • FINISHED indicates that the execution has completed without error.

        • FAILING indicates that the execution is failing and will not finish successfully.

        • FAILED indicates that the execution failed.

        • STOP_PENDING indicates that the cluster has received a StopNotebookExecution request and the stop is pending.

        • STOPPING indicates that the cluster is in the process of stopping the execution as a result of a StopNotebookExecution request.

        • STOPPED indicates that the execution stopped because of a StopNotebookExecution request.

      • StartTime (datetime) --

        The timestamp when notebook execution started.

      • EndTime (datetime) --

        The timestamp when notebook execution ended.

      • Arn (string) --

        The Amazon Resource Name (ARN) of the notebook execution.

      • OutputNotebookURI (string) --

        The location of the notebook execution's output file in Amazon S3.

      • LastStateChangeReason (string) --

        The reason for the latest status change of the notebook execution.

      • NotebookInstanceSecurityGroupId (string) --

        The unique identifier of the Amazon EC2 security group associated with the Amazon EMR Notebook instance. For more information, see Specifying Amazon EC2 Security Groups for Amazon EMR Notebooks in the Amazon EMR Management Guide.

      • Tags (list) --

        A list of tags associated with a notebook execution. Tags are user-defined key-value pairs that consist of a required key string with a maximum of 128 characters and an optional value string with a maximum of 256 characters.

        • (dict) --

          A key-value pair containing user-defined metadata that you can associate with an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters.

          • Key (string) --

            A user-defined key, which is the minimum required information for a valid tag. For more information, see Tag.

          • Value (string) --

            A user-defined value, which is optional in a tag. For more information, see Tag Clusters.

      • NotebookS3Location (dict) --

        The Amazon S3 location that stores the notebook execution input.

        • Bucket (string) --

          The Amazon S3 bucket that stores the notebook execution input.

        • Key (string) --

          The key to the Amazon S3 location that stores the notebook execution input.

      • OutputNotebookS3Location (dict) --

        The Amazon S3 location for the notebook execution output.

        • Bucket (string) --

          The Amazon S3 bucket that stores the notebook execution output.

        • Key (string) --

          The key to the Amazon S3 location that stores the notebook execution output.

      • OutputNotebookFormat (string) --

        The output format for the notebook execution.

      • EnvironmentVariables (dict) --

        The environment variables associated with the notebook execution.

        • (string) --

          • (string) --
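The status values listed above suggest a simple polling loop. The following is a minimal sketch, not an official helper: the function name `wait_for_notebook_execution`, the polling interval, and the choice of FINISHED, FAILED, and STOPPED as terminal statuses are assumptions made for illustration. It expects `client` to be a boto3 EMR client.

```python
import time

# Statuses treated as terminal here (an assumption based on the Status list
# above: the execution has finished, failed, or been stopped).
TERMINAL_STATUSES = {"FINISHED", "FAILED", "STOPPED"}


def wait_for_notebook_execution(client, execution_id, poll_seconds=30, max_polls=120):
    """Poll DescribeNotebookExecution until a terminal status is reached.

    `client` is expected to be a boto3 EMR client, e.g. boto3.client("emr").
    Returns the final NotebookExecution dict, or raises TimeoutError.
    """
    for _ in range(max_polls):
        response = client.describe_notebook_execution(
            NotebookExecutionId=execution_id
        )
        execution = response["NotebookExecution"]
        if execution["Status"] in TERMINAL_STATUSES:
            return execution
        # Not finished yet: wait before asking again.
        time.sleep(poll_seconds)
    raise TimeoutError(f"{execution_id} did not finish after {max_polls} polls")
```

On return, fields such as `OutputNotebookS3Location` and `LastStateChangeReason` can be read from the returned dict when the service includes them in the response.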

ListNotebookExecutions (updated)
Changes (request, response)
Request
{'ExecutionEngineId': 'string'}
Response
{'NotebookExecutions': {'ExecutionEngineId': 'string',
                        'NotebookS3Location': {'Bucket': 'string',
                                               'Key': 'string'}}}

Provides summaries of all notebook executions. You can filter the list based on multiple criteria such as status, time range, and editor ID. Returns a maximum of 50 notebook executions and a marker to track the paging of a longer notebook execution list across multiple ListNotebookExecutions calls.

See also: AWS API Documentation

Request Syntax

client.list_notebook_executions(
    EditorId='string',
    Status='START_PENDING'|'STARTING'|'RUNNING'|'FINISHING'|'FINISHED'|'FAILING'|'FAILED'|'STOP_PENDING'|'STOPPING'|'STOPPED',
    From=datetime(2015, 1, 1),
    To=datetime(2015, 1, 1),
    Marker='string',
    ExecutionEngineId='string'
)
type EditorId

string

param EditorId

The unique ID of the editor associated with the notebook execution.

type Status

string

param Status

The status filter for listing notebook executions.

  • START_PENDING indicates that the cluster has received the execution request but execution has not begun.

  • STARTING indicates that the execution is starting on the cluster.

  • RUNNING indicates that the execution is being processed by the cluster.

  • FINISHING indicates that execution processing is in the final stages.

  • FINISHED indicates that the execution has completed without error.

  • FAILING indicates that the execution is failing and will not finish successfully.

  • FAILED indicates that the execution failed.

  • STOP_PENDING indicates that the cluster has received a StopNotebookExecution request and the stop is pending.

  • STOPPING indicates that the cluster is in the process of stopping the execution as a result of a StopNotebookExecution request.

  • STOPPED indicates that the execution stopped because of a StopNotebookExecution request.

type From

datetime

param From

The beginning of time range filter for listing notebook executions. The default is the timestamp of 30 days ago.

type To

datetime

param To

The end of time range filter for listing notebook executions. The default is the current timestamp.

type Marker

string

param Marker

The pagination token, returned by a previous ListNotebookExecutions call, that indicates the start of the list for this ListNotebookExecutions call.

type ExecutionEngineId

string

param ExecutionEngineId

The unique ID of the execution engine.

rtype

dict

returns

Response Syntax

{
    'NotebookExecutions': [
        {
            'NotebookExecutionId': 'string',
            'EditorId': 'string',
            'NotebookExecutionName': 'string',
            'Status': 'START_PENDING'|'STARTING'|'RUNNING'|'FINISHING'|'FINISHED'|'FAILING'|'FAILED'|'STOP_PENDING'|'STOPPING'|'STOPPED',
            'StartTime': datetime(2015, 1, 1),
            'EndTime': datetime(2015, 1, 1),
            'NotebookS3Location': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'ExecutionEngineId': 'string'
        },
    ],
    'Marker': 'string'
}

Response Structure

  • (dict) --

    • NotebookExecutions (list) --

      A list of notebook executions.

      • (dict) --

        Details for a notebook execution. The details include information such as the unique ID and status of the notebook execution.

        • NotebookExecutionId (string) --

          The unique identifier of the notebook execution.

        • EditorId (string) --

          The unique identifier of the editor associated with the notebook execution.

        • NotebookExecutionName (string) --

          The name of the notebook execution.

        • Status (string) --

          The status of the notebook execution.

          • START_PENDING indicates that the cluster has received the execution request but execution has not begun.

          • STARTING indicates that the execution is starting on the cluster.

          • RUNNING indicates that the execution is being processed by the cluster.

          • FINISHING indicates that execution processing is in the final stages.

          • FINISHED indicates that the execution has completed without error.

          • FAILING indicates that the execution is failing and will not finish successfully.

          • FAILED indicates that the execution failed.

          • STOP_PENDING indicates that the cluster has received a StopNotebookExecution request and the stop is pending.

          • STOPPING indicates that the cluster is in the process of stopping the execution as a result of a StopNotebookExecution request.

          • STOPPED indicates that the execution stopped because of a StopNotebookExecution request.

        • StartTime (datetime) --

          The timestamp when notebook execution started.

        • EndTime (datetime) --

          The timestamp when notebook execution ended.

        • NotebookS3Location (dict) --

          The Amazon S3 location that stores the notebook execution input.

          • Bucket (string) --

            The Amazon S3 bucket that stores the notebook execution input.

          • Key (string) --

            The key to the Amazon S3 location that stores the notebook execution input.

        • ExecutionEngineId (string) --

          The unique ID of the execution engine for the notebook execution.

    • Marker (string) --

      A pagination token that a subsequent ListNotebookExecutions call can use to determine the next set of results to retrieve.
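Because each call returns at most 50 executions plus a Marker, listing everything means feeding the Marker back into the next call. The sketch below shows one way to do that; the generator name `iter_notebook_executions` is hypothetical, and `client` is assumed to be a boto3 EMR client.

```python
def iter_notebook_executions(client, **filters):
    """Yield every notebook execution summary, following Marker pagination.

    `filters` may hold any of the request parameters described above:
    EditorId, Status, From, To, or ExecutionEngineId.
    """
    kwargs = dict(filters)
    while True:
        page = client.list_notebook_executions(**kwargs)
        yield from page.get("NotebookExecutions", [])
        marker = page.get("Marker")
        if not marker:
            # No Marker in the response means there are no further pages.
            break
        kwargs["Marker"] = marker
```

For example, `iter_notebook_executions(client, ExecutionEngineId=engine_id, Status="RUNNING")` would walk all running executions on one execution engine, where `engine_id` is a placeholder for your own value.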

StartNotebookExecution (updated)
Changes (request)
{'EnvironmentVariables': {'string': 'string'},
 'ExecutionEngine': {'ExecutionRoleArn': 'string'},
 'NotebookS3Location': {'Bucket': 'string', 'Key': 'string'},
 'OutputNotebookFormat': 'HTML',
 'OutputNotebookS3Location': {'Bucket': 'string', 'Key': 'string'}}

Starts a notebook execution.

See also: AWS API Documentation

Request Syntax

client.start_notebook_execution(
    EditorId='string',
    RelativePath='string',
    NotebookExecutionName='string',
    NotebookParams='string',
    ExecutionEngine={
        'Id': 'string',
        'Type': 'EMR',
        'MasterInstanceSecurityGroupId': 'string',
        'ExecutionRoleArn': 'string'
    },
    ServiceRole='string',
    NotebookInstanceSecurityGroupId='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ],
    NotebookS3Location={
        'Bucket': 'string',
        'Key': 'string'
    },
    OutputNotebookS3Location={
        'Bucket': 'string',
        'Key': 'string'
    },
    OutputNotebookFormat='HTML',
    EnvironmentVariables={
        'string': 'string'
    }
)
type EditorId

string

param EditorId

The unique identifier of the Amazon EMR Notebook to use for notebook execution.

type RelativePath

string

param RelativePath

The path and file name of the notebook file for this execution, relative to the path specified for the Amazon EMR Notebook. For example, if you specify a path of s3://MyBucket/MyNotebooks when you create an Amazon EMR Notebook for a notebook with an ID of e-ABCDEFGHIJK1234567890ABCD (the EditorId of this request), and you specify a RelativePath of my_notebook_executions/notebook_execution.ipynb, the location of the file for the notebook execution is s3://MyBucket/MyNotebooks/e-ABCDEFGHIJK1234567890ABCD/my_notebook_executions/notebook_execution.ipynb.
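The path layout in the RelativePath example can be reproduced with a small helper. This is only an illustration of the documented layout: the function `resolve_notebook_path` is hypothetical, and the service itself performs the actual resolution.

```python
def resolve_notebook_path(notebook_base_path, editor_id, relative_path):
    """Join the notebook's base S3 path, the EditorId, and the RelativePath.

    Mirrors the layout in the RelativePath description:
    <base path>/<EditorId>/<RelativePath>
    """
    parts = [
        notebook_base_path.rstrip("/"),  # avoid a doubled slash after the base
        editor_id,
        relative_path.lstrip("/"),       # RelativePath is relative, so drop a leading slash
    ]
    return "/".join(parts)


location = resolve_notebook_path(
    "s3://MyBucket/MyNotebooks",
    "e-ABCDEFGHIJK1234567890ABCD",
    "my_notebook_executions/notebook_execution.ipynb",
)
print(location)
```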

type NotebookExecutionName

string

param NotebookExecutionName

An optional name for the notebook execution.

type NotebookParams

string

param NotebookParams

Input parameters in JSON format passed to the Amazon EMR Notebook at runtime for execution.

type ExecutionEngine

dict

param ExecutionEngine

[REQUIRED]

Specifies the execution engine (cluster) that runs the notebook execution.

  • Id (string) -- [REQUIRED]

    The unique identifier of the execution engine. For an Amazon EMR cluster, this is the cluster ID.

  • Type (string) --

    The type of execution engine. A value of EMR specifies an Amazon EMR cluster.

  • MasterInstanceSecurityGroupId (string) --

    An optional unique ID of an Amazon EC2 security group to associate with the master instance of the Amazon EMR cluster for this notebook execution. For more information, see Specifying Amazon EC2 Security Groups for Amazon EMR Notebooks in the EMR Management Guide.

  • ExecutionRoleArn (string) --

    The execution role ARN required for the notebook execution.

type ServiceRole

string

param ServiceRole

[REQUIRED]

The name or ARN of the IAM role that is used as the service role for Amazon EMR (the Amazon EMR role) for the notebook execution.

type NotebookInstanceSecurityGroupId

string

param NotebookInstanceSecurityGroupId

The unique identifier of the Amazon EC2 security group to associate with the Amazon EMR Notebook for this notebook execution.

type Tags

list

param Tags

A list of tags associated with a notebook execution. Tags are user-defined key-value pairs that consist of a required key string with a maximum of 128 characters and an optional value string with a maximum of 256 characters.

  • (dict) --

    A key-value pair containing user-defined metadata that you can associate with an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tag Clusters.

    • Key (string) --

      A user-defined key, which is the minimum required information for a valid tag. For more information, see Tag.

    • Value (string) --

      A user-defined value, which is optional in a tag. For more information, see Tag Clusters.
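The tag limits stated above (a required key of at most 128 characters and an optional value of at most 256 characters) can be checked client-side before sending the request. The helper `validate_tags` below is a hypothetical sketch; the service remains the authority on what it accepts.

```python
MAX_TAG_KEY_LENGTH = 128    # from the Tags description above
MAX_TAG_VALUE_LENGTH = 256  # from the Tags description above


def validate_tags(tags):
    """Return a list of problems found in a Tags list; empty if none."""
    problems = []
    for index, tag in enumerate(tags):
        key = tag.get("Key")
        value = tag.get("Value")
        if not key:
            problems.append(f"tag {index}: Key is required")
        elif len(key) > MAX_TAG_KEY_LENGTH:
            problems.append(f"tag {index}: Key exceeds {MAX_TAG_KEY_LENGTH} characters")
        # Value is optional, so only check its length when it is present.
        if value is not None and len(value) > MAX_TAG_VALUE_LENGTH:
            problems.append(f"tag {index}: Value exceeds {MAX_TAG_VALUE_LENGTH} characters")
    return problems
```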

type NotebookS3Location

dict

param NotebookS3Location

The Amazon S3 location for the notebook execution input.

  • Bucket (string) --

    The Amazon S3 bucket that stores the notebook execution input.

  • Key (string) --

    The key to the Amazon S3 location that stores the notebook execution input.

type OutputNotebookS3Location

dict

param OutputNotebookS3Location

The Amazon S3 location for the notebook execution output.

  • Bucket (string) --

    The Amazon S3 bucket that stores the notebook execution output.

  • Key (string) --

    The key to the Amazon S3 location that stores the notebook execution output.

type OutputNotebookFormat

string

param OutputNotebookFormat

The output format for the notebook execution.

type EnvironmentVariables

dict

param EnvironmentVariables

The environment variables associated with the notebook execution.

  • (string) --

    • (string) --

rtype

dict

returns

Response Syntax

{
    'NotebookExecutionId': 'string'
}

Response Structure

  • (dict) --

    • NotebookExecutionId (string) --

      The unique identifier of the notebook execution.
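Putting the request parameters together, a notebook stored in S3 could be started roughly as follows. This is a sketch under stated assumptions: the helper names `build_start_request` and `start_notebook_from_s3` are hypothetical, `client` is assumed to be a boto3 EMR client, and omitting EditorId and RelativePath when NotebookS3Location is supplied is an assumption rather than something this reference states.

```python
import json


def build_start_request(cluster_id, service_role, notebook, output, params=None, env=None):
    """Assemble keyword arguments for start_notebook_execution.

    `notebook` and `output` are (bucket, key) tuples for the input and
    output S3 locations.
    """
    request = {
        "ExecutionEngine": {"Id": cluster_id, "Type": "EMR"},
        "ServiceRole": service_role,
        "NotebookS3Location": {"Bucket": notebook[0], "Key": notebook[1]},
        "OutputNotebookS3Location": {"Bucket": output[0], "Key": output[1]},
        "OutputNotebookFormat": "HTML",
    }
    if params is not None:
        # NotebookParams is a string holding JSON, not a dict.
        request["NotebookParams"] = json.dumps(params)
    if env:
        request["EnvironmentVariables"] = dict(env)
    return request


def start_notebook_from_s3(client, request):
    """Start the execution and return its NotebookExecutionId."""
    response = client.start_notebook_execution(**request)
    return response["NotebookExecutionId"]
```

The returned identifier is the value that DescribeNotebookExecution expects in its NotebookExecutionId parameter.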