Amazon Elastic MapReduce

2014/12/17 - Amazon Elastic MapReduce - 3 updated API methods

DescribeCluster (updated)
Changes (response)
{'Cluster': {'MasterPublicDnsName': 'string',
             'NormalizedInstanceHours': 'integer'}}

Provides cluster-level details including status, hardware and software configuration, VPC settings, and so on. For information about the cluster steps, see ListSteps.

Request Syntax

client.describe_cluster(
    ClusterId='string'
)
type ClusterId

string

param ClusterId

[REQUIRED]

The identifier of the cluster to describe.

rtype

dict

returns

Response Syntax

{
    'Cluster': {
        'Id': 'string',
        'Name': 'string',
        'Status': {
            'State': 'STARTING'|'BOOTSTRAPPING'|'RUNNING'|'WAITING'|'TERMINATING'|'TERMINATED'|'TERMINATED_WITH_ERRORS',
            'StateChangeReason': {
                'Code': 'INTERNAL_ERROR'|'VALIDATION_ERROR'|'INSTANCE_FAILURE'|'BOOTSTRAP_FAILURE'|'USER_REQUEST'|'STEP_FAILURE'|'ALL_STEPS_COMPLETED',
                'Message': 'string'
            },
            'Timeline': {
                'CreationDateTime': datetime(2015, 1, 1),
                'ReadyDateTime': datetime(2015, 1, 1),
                'EndDateTime': datetime(2015, 1, 1)
            }
        },
        'Ec2InstanceAttributes': {
            'Ec2KeyName': 'string',
            'Ec2SubnetId': 'string',
            'Ec2AvailabilityZone': 'string',
            'IamInstanceProfile': 'string'
        },
        'LogUri': 'string',
        'RequestedAmiVersion': 'string',
        'RunningAmiVersion': 'string',
        'AutoTerminate': True|False,
        'TerminationProtected': True|False,
        'VisibleToAllUsers': True|False,
        'Applications': [
            {
                'Name': 'string',
                'Version': 'string',
                'Args': [
                    'string',
                ],
                'AdditionalInfo': {
                    'string': 'string'
                }
            },
        ],
        'Tags': [
            {
                'Key': 'string',
                'Value': 'string'
            },
        ],
        'ServiceRole': 'string',
        'NormalizedInstanceHours': 123,
        'MasterPublicDnsName': 'string'
    }
}

Response Structure

  • (dict) --

    This output contains the description of the cluster.

    • Cluster (dict) --

      This output contains the details for the requested cluster.

      • Id (string) --

        The unique identifier for the cluster.

      • Name (string) --

        The name of the cluster.

      • Status (dict) --

        The current status details about the cluster.

        • State (string) --

          The current state of the cluster.

        • StateChangeReason (dict) --

          The reason for the cluster status change.

          • Code (string) --

            The programmatic code for the state change reason.

          • Message (string) --

            The descriptive message for the state change reason.

        • Timeline (dict) --

          A timeline that represents the status of a cluster over the lifetime of the cluster.

          • CreationDateTime (datetime) --

            The creation date and time of the cluster.

          • ReadyDateTime (datetime) --

            The date and time when the cluster was ready to execute steps.

          • EndDateTime (datetime) --

            The date and time when the cluster was terminated.

      • Ec2InstanceAttributes (dict) --

        Provides information about the EC2 instances in a cluster, grouped by category: for example, key name, subnet ID, IAM instance profile, and so on.

        • Ec2KeyName (string) --

          The name of the Amazon EC2 key pair to use when connecting with SSH into the master node as a user named "hadoop".

        • Ec2SubnetId (string) --

          To launch the job flow in Amazon VPC, set this parameter to the identifier of the Amazon VPC subnet where you want the job flow to launch. If you do not specify this value, the job flow is launched in the normal AWS cloud, outside of a VPC.

          Amazon VPC currently does not support cluster compute quadruple extra large (cc1.4xlarge) instances. Thus, you cannot specify the cc1.4xlarge instance type for nodes of a job flow launched in a VPC.

        • Ec2AvailabilityZone (string) --

          The Availability Zone in which the cluster will run.

        • IamInstanceProfile (string) --

          The IAM role that was specified when the job flow was launched. The EC2 instances of the job flow assume this role.

      • LogUri (string) --

        The path to the Amazon S3 location where logs for this cluster are stored.

      • RequestedAmiVersion (string) --

        The AMI version requested for this cluster.

      • RunningAmiVersion (string) --

        The AMI version running on this cluster. This differs from the requested version only if the requested version is a meta version, such as "latest".

      • AutoTerminate (boolean) --

        Specifies whether the cluster should terminate after completing all steps.

      • TerminationProtected (boolean) --

        Indicates whether Amazon EMR will lock the cluster to prevent the EC2 instances from being terminated by an API call or user intervention, or in the event of a cluster error.

      • VisibleToAllUsers (boolean) --

        Indicates whether the job flow is visible to all IAM users of the AWS account associated with the job flow. If this value is set to true, all IAM users of that AWS account can view and manage the job flow if they have the proper policy permissions set. If this value is false, only the IAM user that created the cluster can view and manage it. This value can be changed using the SetVisibleToAllUsers action.

      • Applications (list) --

        The applications installed on this cluster.

        • (dict) --

          An application is any Amazon or third-party software that you can add to the cluster. This structure contains a list of strings that indicates the software to use with the cluster and accepts a user argument list. Amazon EMR accepts and forwards the argument list to the corresponding installation script as bootstrap action argument. For more information, see Launch a Job Flow on the MapR Distribution for Hadoop. Currently supported values are:

          • "mapr-m3" - launch the job flow using MapR M3 Edition.

          • "mapr-m5" - launch the job flow using MapR M5 Edition.

          • "mapr" with the user arguments specifying "--edition,m3" or "--edition,m5" - launch the job flow using MapR M3 or M5 Edition, respectively.

          • Name (string) --

            The name of the application.

          • Version (string) --

            The version of the application.

          • Args (list) --

            Arguments for Amazon EMR to pass to the application.

            • (string) --

          • AdditionalInfo (dict) --

            This option is for advanced users only. This is meta information about third-party applications that third-party vendors use for testing purposes.

            • (string) --

              • (string) --

      • Tags (list) --

        A list of tags associated with a cluster.

        • (dict) --

          A key/value pair containing user-defined metadata that you can associate with an Amazon EMR resource. Tags make it easier to associate clusters in various ways, such as grouping clusters to track your Amazon EMR resource allocation costs. For more information, see Tagging Amazon EMR Resources.

      • ServiceRole (string) --

        The IAM role that will be assumed by the Amazon EMR service to access AWS resources on your behalf.

      • NormalizedInstanceHours (integer) --

        An approximation of the cost of the job flow, represented in m1.small/hours. This value is incremented one time for every hour an m1.small instance runs. Larger instances are weighted more, so an EC2 instance that is roughly four times more expensive would result in the normalized instance hours being incremented by four. This result is only an approximation and does not reflect the actual billing rate.

      • MasterPublicDnsName (string) --

        The public DNS name of the master EC2 instance.
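
As an illustration of reading the response structure above, the following is a minimal sketch that pulls a few commonly used fields out of a DescribeCluster response, including the two fields added in this update (MasterPublicDnsName and NormalizedInstanceHours). The helper name `summarize_cluster` and the keys of the returned dict are hypothetical; `client` is assumed to be an EMR client such as `boto3.client("emr")`. Fields are read with `.get()` on the assumption that a response may omit some of them, for example EndDateTime on a cluster that is still running.

```python
def summarize_cluster(client, cluster_id):
    """Return a small dict of fields taken from a DescribeCluster response.

    `client` is assumed to expose describe_cluster(ClusterId=...) as shown
    in the request syntax. The output key names are illustrative only.
    """
    cluster = client.describe_cluster(ClusterId=cluster_id)["Cluster"]
    status = cluster.get("Status", {})
    timeline = status.get("Timeline", {})

    created = timeline.get("CreationDateTime")
    ended = timeline.get("EndDateTime")
    # A lifetime can only be computed when both timestamps are present.
    lifetime = ended - created if created and ended else None

    return {
        "id": cluster.get("Id"),
        "state": status.get("State"),
        "reason_code": status.get("StateChangeReason", {}).get("Code"),
        "master_dns": cluster.get("MasterPublicDnsName"),
        "normalized_hours": cluster.get("NormalizedInstanceHours"),
        "lifetime": lifetime,
    }
```

A call such as `summarize_cluster(boto3.client("emr"), cluster_id)` would need valid AWS credentials and an existing cluster identifier.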

ListClusters (updated)
Changes (response)
{'Clusters': {'NormalizedInstanceHours': 'integer'}}

Provides the status of all clusters visible to this AWS account. Allows you to filter the list of clusters based on certain criteria; for example, filtering by cluster creation date and time or by status. Each call returns a maximum of 50 clusters, together with a marker that tracks the paging of the cluster list across multiple ListClusters calls.
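
The marker-based paging described here can be sketched as a small generator. This is one possible way to walk the pages, not the only one. The helper name `iter_clusters` is hypothetical, `client` is assumed to be an EMR client such as `boto3.client("emr")`, and the loop assumes that a response without a Marker value means there are no further pages.

```python
def iter_clusters(client, **filters):
    """Yield every cluster summary, following Marker across pages.

    `filters` may carry CreatedAfter, CreatedBefore and ClusterStates.
    Marker is only sent once a previous page has returned one.
    """
    kwargs = dict(filters)
    while True:
        page = client.list_clusters(**kwargs)
        for cluster in page.get("Clusters", []):
            yield cluster
        marker = page.get("Marker")
        if not marker:
            break  # assumed to mean: no more pages
        kwargs["Marker"] = marker
```

For example, `[c["Id"] for c in iter_clusters(client, ClusterStates=["RUNNING", "WAITING"])]` would collect the identifiers of all active clusters across pages.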

Request Syntax

client.list_clusters(
    CreatedAfter=datetime(2015, 1, 1),
    CreatedBefore=datetime(2015, 1, 1),
    ClusterStates=[
        'STARTING'|'BOOTSTRAPPING'|'RUNNING'|'WAITING'|'TERMINATING'|'TERMINATED'|'TERMINATED_WITH_ERRORS',
    ],
    Marker='string'
)
type CreatedAfter

datetime

param CreatedAfter

The creation date and time beginning value filter for listing clusters.

type CreatedBefore

datetime

param CreatedBefore

The creation date and time end value filter for listing clusters.

type ClusterStates

list

param ClusterStates

The cluster state filters to apply when listing clusters.

  • (string) --

type Marker

string

param Marker

The pagination token that indicates the next set of results to retrieve.

rtype

dict

returns

Response Syntax

{
    'Clusters': [
        {
            'Id': 'string',
            'Name': 'string',
            'Status': {
                'State': 'STARTING'|'BOOTSTRAPPING'|'RUNNING'|'WAITING'|'TERMINATING'|'TERMINATED'|'TERMINATED_WITH_ERRORS',
                'StateChangeReason': {
                    'Code': 'INTERNAL_ERROR'|'VALIDATION_ERROR'|'INSTANCE_FAILURE'|'BOOTSTRAP_FAILURE'|'USER_REQUEST'|'STEP_FAILURE'|'ALL_STEPS_COMPLETED',
                    'Message': 'string'
                },
                'Timeline': {
                    'CreationDateTime': datetime(2015, 1, 1),
                    'ReadyDateTime': datetime(2015, 1, 1),
                    'EndDateTime': datetime(2015, 1, 1)
                }
            },
            'NormalizedInstanceHours': 123
        },
    ],
    'Marker': 'string'
}

Response Structure

  • (dict) --

    This contains a ClusterSummaryList with the cluster details; for example, the cluster IDs, names, and status.

    • Clusters (list) --

      The list of clusters for the account based on the given filters.

      • (dict) --

        The summary description of the cluster.

        • Id (string) --

          The unique identifier for the cluster.

        • Name (string) --

          The name of the cluster.

        • Status (dict) --

          The details about the current status of the cluster.

          • State (string) --

            The current state of the cluster.

          • StateChangeReason (dict) --

            The reason for the cluster status change.

            • Code (string) --

              The programmatic code for the state change reason.

            • Message (string) --

              The descriptive message for the state change reason.

          • Timeline (dict) --

            A timeline that represents the status of a cluster over the lifetime of the cluster.

            • CreationDateTime (datetime) --

              The creation date and time of the cluster.

            • ReadyDateTime (datetime) --

              The date and time when the cluster was ready to execute steps.

            • EndDateTime (datetime) --

              The date and time when the cluster was terminated.

        • NormalizedInstanceHours (integer) --

          An approximation of the cost of the job flow, represented in m1.small/hours. This value is incremented one time for every hour an m1.small instance runs. Larger instances are weighted more, so an EC2 instance that is roughly four times more expensive would result in the normalized instance hours being incremented by four. This result is only an approximation and does not reflect the actual billing rate.

    • Marker (string) --

      The pagination token that indicates the next set of results to retrieve.
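
To make the NormalizedInstanceHours description concrete, here is a worked sketch of the arithmetic it describes: each instance contributes its weight once per hour it runs, with an m1.small counting as 1. The weights passed in below are hypothetical inputs chosen for illustration, not published values, and the calculation assumes whole hours. The second helper simply totals the field across ListClusters summaries.

```python
def normalized_instance_hours(fleet):
    """Estimate normalized instance hours for a fleet.

    `fleet` is an iterable of (weight, instance_count, hours_run) tuples,
    where weight is the instance's cost relative to an m1.small (assumed).
    """
    return sum(weight * count * hours for weight, count, hours in fleet)


def total_normalized_hours(cluster_summaries):
    """Sum NormalizedInstanceHours over ListClusters summaries.

    Summaries that lack the field are counted as 0.
    """
    return sum(c.get("NormalizedInstanceHours", 0) for c in cluster_summaries)


# One weight-1 instance and two weight-4 instances, each running 3 hours:
# 1*1*3 + 4*2*3 = 27
example = normalized_instance_hours([(1, 1, 3), (4, 2, 3)])
```

As the field description notes, the result is only an approximation and does not reflect the actual billing rate.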

ListSteps (updated)
Changes (request, response)
Request
{'StepIds': ['string']}
Response
{'Steps': {'ActionOnFailure': 'TERMINATE_JOB_FLOW | TERMINATE_CLUSTER | '
                              'CANCEL_AND_WAIT | CONTINUE',
           'Config': {'Args': ['string'],
                      'Jar': 'string',
                      'MainClass': 'string',
                      'Properties': {'string': 'string'}}}}

Provides a list of steps for the cluster.

Request Syntax

client.list_steps(
    ClusterId='string',
    StepStates=[
        'PENDING'|'RUNNING'|'COMPLETED'|'CANCELLED'|'FAILED'|'INTERRUPTED',
    ],
    StepIds=[
        'string',
    ],
    Marker='string'
)
type ClusterId

string

param ClusterId

[REQUIRED]

The identifier of the cluster for which to list the steps.

type StepStates

list

param StepStates

The filter to limit the step list based on certain states.

  • (string) --

type StepIds

list

param StepIds

The filter to limit the step list based on the identifier of the steps.

  • (string) --

type Marker

string

param Marker

The pagination token that indicates the next set of results to retrieve.
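
Since only ClusterId is required, a caller usually has to decide which optional filters to send. The sketch below builds the keyword arguments for `list_steps`, leaving out any filter that was not supplied, on the assumption that the client validates parameter types and does not accept `None` for an optional parameter. The helper name `list_steps_kwargs` is hypothetical.

```python
def list_steps_kwargs(cluster_id, step_states=None, step_ids=None, marker=None):
    """Build keyword arguments for client.list_steps().

    Optional filters are included only when they have a value, so the
    request never carries None or an empty list.
    """
    kwargs = {"ClusterId": cluster_id}
    if step_states:
        kwargs["StepStates"] = list(step_states)
    if step_ids:
        # StepIds is the request filter added in this update.
        kwargs["StepIds"] = list(step_ids)
    if marker:
        kwargs["Marker"] = marker
    return kwargs
```

The result can be passed straight through, for example `client.list_steps(**list_steps_kwargs(cluster_id, step_states=["FAILED"]))`.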

rtype

dict

returns

Response Syntax

{
    'Steps': [
        {
            'Id': 'string',
            'Name': 'string',
            'Config': {
                'Jar': 'string',
                'Properties': {
                    'string': 'string'
                },
                'MainClass': 'string',
                'Args': [
                    'string',
                ]
            },
            'ActionOnFailure': 'TERMINATE_JOB_FLOW'|'TERMINATE_CLUSTER'|'CANCEL_AND_WAIT'|'CONTINUE',
            'Status': {
                'State': 'PENDING'|'RUNNING'|'COMPLETED'|'CANCELLED'|'FAILED'|'INTERRUPTED',
                'StateChangeReason': {
                    'Code': 'NONE',
                    'Message': 'string'
                },
                'Timeline': {
                    'CreationDateTime': datetime(2015, 1, 1),
                    'StartDateTime': datetime(2015, 1, 1),
                    'EndDateTime': datetime(2015, 1, 1)
                }
            }
        },
    ],
    'Marker': 'string'
}

Response Structure

  • (dict) --

    This output contains the list of steps.

    • Steps (list) --

      The filtered list of steps for the cluster.

      • (dict) --

        The summary of the cluster step.

        • Id (string) --

          The identifier of the cluster step.

        • Name (string) --

          The name of the cluster step.

        • Config (dict) --

          The Hadoop job configuration of the cluster step.

          • Jar (string) --

            The path to the JAR file that runs during the step.

          • Properties (dict) --

            The list of Java properties that are set when the step runs. You can use these properties to pass key value pairs to your main function.

            • (string) --

              • (string) --

          • MainClass (string) --

            The name of the main class in the specified JAR file. If not specified, the JAR file should specify a main class in its manifest file.

          • Args (list) --

            The list of command line arguments to pass to the JAR file's main function for execution.

            • (string) --

        • ActionOnFailure (string) --

          This specifies what action to take when the cluster step fails. Possible values are TERMINATE_JOB_FLOW, TERMINATE_CLUSTER, CANCEL_AND_WAIT, and CONTINUE.

        • Status (dict) --

          The current execution status details of the cluster step.

          • State (string) --

            The execution state of the cluster step.

          • StateChangeReason (dict) --

            The reason for the step execution status change.

            • Code (string) --

              The programmatic code for the state change reason.

            • Message (string) --

              The descriptive message for the state change reason.

          • Timeline (dict) --

            The timeline of the cluster step status over time.

            • CreationDateTime (datetime) --

              The date and time when the cluster step was created.

            • StartDateTime (datetime) --

              The date and time when the cluster step execution started.

            • EndDateTime (datetime) --

              The date and time when the cluster step execution completed or failed.

    • Marker (string) --

      The pagination token that indicates the next set of results to retrieve.
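
The response fields added in this update (Config and ActionOnFailure) make it possible to report on failed steps from ListSteps alone. The following is a minimal sketch under a few assumptions: `client` is an EMR client such as `boto3.client("emr")`, the helper name `failed_steps` and its output keys are hypothetical, and a response without a Marker value is taken to mean the last page.

```python
def failed_steps(client, cluster_id):
    """Return one summary dict per FAILED step of the given cluster."""
    kwargs = {"ClusterId": cluster_id, "StepStates": ["FAILED"]}
    results = []
    while True:
        page = client.list_steps(**kwargs)
        for step in page.get("Steps", []):
            timeline = step.get("Status", {}).get("Timeline", {})
            start = timeline.get("StartDateTime")
            end = timeline.get("EndDateTime")
            results.append({
                "id": step.get("Id"),
                "name": step.get("Name"),
                "jar": step.get("Config", {}).get("Jar"),
                "action_on_failure": step.get("ActionOnFailure"),
                # Elapsed time is only known when both timestamps exist.
                "elapsed": end - start if start and end else None,
            })
        marker = page.get("Marker")
        if not marker:
            return results
        kwargs["Marker"] = marker
```

Each entry pairs the step's JAR path with the action that was configured for failure, which can help explain why a cluster terminated or kept running after the step failed.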