Amazon Athena

2023/05/15 - Amazon Athena - 2 updated api methods

Changes  You can now define custom Spark properties at the start of a session for use cases like cluster encryption, table formats, and general Spark tuning.

GetSession (updated)
Changes (response)
{'EngineConfiguration': {'SparkProperties': {'string': 'string'}}}

Gets the full details of a previously created session, including the session status and configuration.

See also: AWS API Documentation

Request Syntax

client.get_session(
    SessionId='string'
)
type SessionId:

string

param SessionId:

[REQUIRED]

The session ID.

rtype:

dict

returns:

Response Syntax

{
    'SessionId': 'string',
    'Description': 'string',
    'WorkGroup': 'string',
    'EngineVersion': 'string',
    'EngineConfiguration': {
        'CoordinatorDpuSize': 123,
        'MaxConcurrentDpus': 123,
        'DefaultExecutorDpuSize': 123,
        'AdditionalConfigs': {
            'string': 'string'
        },
        'SparkProperties': {
            'string': 'string'
        }
    },
    'NotebookVersion': 'string',
    'SessionConfiguration': {
        'ExecutionRole': 'string',
        'WorkingDirectory': 'string',
        'IdleTimeoutSeconds': 123,
        'EncryptionConfiguration': {
            'EncryptionOption': 'SSE_S3'|'SSE_KMS'|'CSE_KMS',
            'KmsKey': 'string'
        }
    },
    'Status': {
        'StartDateTime': datetime(2015, 1, 1),
        'LastModifiedDateTime': datetime(2015, 1, 1),
        'EndDateTime': datetime(2015, 1, 1),
        'IdleSinceDateTime': datetime(2015, 1, 1),
        'State': 'CREATING'|'CREATED'|'IDLE'|'BUSY'|'TERMINATING'|'TERMINATED'|'DEGRADED'|'FAILED',
        'StateChangeReason': 'string'
    },
    'Statistics': {
        'DpuExecutionInMillis': 123
    }
}

Response Structure

  • (dict) --

    • SessionId (string) --

      The session ID.

    • Description (string) --

      The session description.

    • WorkGroup (string) --

      The workgroup to which the session belongs.

    • EngineVersion (string) --

      The engine version used by the session (for example, PySpark engine version 3). You can get a list of engine versions by calling ListEngineVersions.

    • EngineConfiguration (dict) --

      Contains engine configuration information like DPU usage.

      • CoordinatorDpuSize (integer) --

        The number of DPUs to use for the coordinator. A coordinator is a special executor that orchestrates processing work and manages other executors in a notebook session. The default is 1.

      • MaxConcurrentDpus (integer) --

        The maximum number of DPUs that can run concurrently.

      • DefaultExecutorDpuSize (integer) --

        The default number of DPUs to use for executors. An executor is the smallest unit of compute that a notebook session can request from Athena. The default is 1.

      • AdditionalConfigs (dict) --

        Contains additional notebook engine MAP<string, string> parameter mappings in the form of key-value pairs. To specify an Athena notebook that the Jupyter server will download and serve, specify a value for the StartSessionRequest$NotebookVersion field, and then add a key named NotebookId to AdditionalConfigs that has the value of the Athena notebook ID.

        • (string) --

          • (string) --

      • SparkProperties (dict) --

        Specifies custom jar files and Spark properties for use cases like cluster encryption, table formats, and general Spark tuning.

        • (string) --

          • (string) --

    • NotebookVersion (string) --

      The notebook version.

    • SessionConfiguration (dict) --

      Contains the workgroup configuration information used by the session.

      • ExecutionRole (string) --

        The ARN of the execution role used for the session.

      • WorkingDirectory (string) --

        The Amazon S3 location that stores information for the notebook.

      • IdleTimeoutSeconds (integer) --

        The idle timeout in seconds for the session.

      • EncryptionConfiguration (dict) --

        If query and calculation results are encrypted in Amazon S3, indicates the encryption option used (for example, SSE_KMS or CSE_KMS) and key information.

        • EncryptionOption (string) --

          Indicates whether Amazon S3 server-side encryption with Amazon S3-managed keys (SSE_S3), server-side encryption with KMS-managed keys (SSE_KMS), or client-side encryption with KMS-managed keys (CSE_KMS) is used.

          If a query runs in a workgroup and the workgroup overrides client-side settings, then the workgroup's setting for encryption is used. It specifies whether query results must be encrypted, for all queries that run in this workgroup.

        • KmsKey (string) --

          For SSE_KMS and CSE_KMS, this is the KMS key ARN or ID.

    • Status (dict) --

      Contains information about the status of the session.

      • StartDateTime (datetime) --

        The date and time that the session started.

      • LastModifiedDateTime (datetime) --

        The most recent date and time that the session was modified.

      • EndDateTime (datetime) --

        The date and time that the session ended.

      • IdleSinceDateTime (datetime) --

        The date and time at which the session became idle. Can be empty if the session is not currently idle.

      • State (string) --

        The state of the session. A description of each state follows.

        CREATING - The session is being started, including acquiring resources.

        CREATED - The session has been started.

        IDLE - The session is able to accept a calculation.

        BUSY - The session is processing another task and is unable to accept a calculation.

        TERMINATING - The session is in the process of shutting down.

        TERMINATED - The session and its resources are no longer running.

        DEGRADED - The session has no healthy coordinators.

        FAILED - Due to a failure, the session and its resources are no longer running.

      • StateChangeReason (string) --

        The reason for the session state change (for example, canceled because the session was terminated).

    • Statistics (dict) --

      Contains the DPU execution time.

      • DpuExecutionInMillis (integer) --

        The data processing unit execution time for a session in milliseconds.
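
As a rough illustration of reading the response structure above, the sketch below pulls a few fields out of a get_session response dict. The helper name summarize_session and all sample values (session ID, Spark property, timings) are invented for this example; only the key names come from the response syntax.

```python
def summarize_session(response):
    """Extract a few fields from a get_session response dict.

    Hypothetical helper for illustration. It only reads keys shown in the
    response syntax and tolerates any of them being absent.
    """
    engine = response.get("EngineConfiguration", {})
    status = response.get("Status", {})
    millis = response.get("Statistics", {}).get("DpuExecutionInMillis")
    return {
        "session_id": response.get("SessionId"),
        "state": status.get("State"),
        # SparkProperties is a flat string-to-string map.
        "spark_properties": dict(engine.get("SparkProperties", {})),
        # Convert milliseconds to seconds; None if no statistics are present.
        "dpu_execution_seconds": None if millis is None else millis / 1000,
    }


# Sample data shaped like the response syntax (all values are invented).
sample_response = {
    "SessionId": "example-session-id",
    "EngineConfiguration": {
        "MaxConcurrentDpus": 20,
        "SparkProperties": {"spark.sql.shuffle.partitions": "64"},
    },
    "Status": {"State": "IDLE"},
    "Statistics": {"DpuExecutionInMillis": 4500},
}

summary = summarize_session(sample_response)
print(summary["state"], summary["dpu_execution_seconds"])
```

With a real client, the same helper could be applied to the return value of client.get_session(SessionId=...).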

StartSession (updated)
Changes (request)
{'EngineConfiguration': {'SparkProperties': {'string': 'string'}}}

Creates a session for running calculations within a workgroup. The session is ready when it reaches an IDLE state.

See also: AWS API Documentation

Request Syntax

client.start_session(
    Description='string',
    WorkGroup='string',
    EngineConfiguration={
        'CoordinatorDpuSize': 123,
        'MaxConcurrentDpus': 123,
        'DefaultExecutorDpuSize': 123,
        'AdditionalConfigs': {
            'string': 'string'
        },
        'SparkProperties': {
            'string': 'string'
        }
    },
    NotebookVersion='string',
    SessionIdleTimeoutInMinutes=123,
    ClientRequestToken='string'
)
type Description:

string

param Description:

The session description.

type WorkGroup:

string

param WorkGroup:

[REQUIRED]

The workgroup to which the session belongs.

type EngineConfiguration:

dict

param EngineConfiguration:

[REQUIRED]

Contains engine data processing unit (DPU) configuration settings and parameter mappings.

  • CoordinatorDpuSize (integer) --

    The number of DPUs to use for the coordinator. A coordinator is a special executor that orchestrates processing work and manages other executors in a notebook session. The default is 1.

  • MaxConcurrentDpus (integer) -- [REQUIRED]

    The maximum number of DPUs that can run concurrently.

  • DefaultExecutorDpuSize (integer) --

    The default number of DPUs to use for executors. An executor is the smallest unit of compute that a notebook session can request from Athena. The default is 1.

  • AdditionalConfigs (dict) --

    Contains additional notebook engine MAP<string, string> parameter mappings in the form of key-value pairs. To specify an Athena notebook that the Jupyter server will download and serve, specify a value for the StartSessionRequest$NotebookVersion field, and then add a key named NotebookId to AdditionalConfigs that has the value of the Athena notebook ID.

    • (string) --

      • (string) --

  • SparkProperties (dict) --

    Specifies custom jar files and Spark properties for use cases like cluster encryption, table formats, and general Spark tuning.

    • (string) --

      • (string) --
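
A minimal sketch of assembling the EngineConfiguration argument described above, including the new SparkProperties map. The function build_engine_configuration is a hypothetical helper, and the property name and DPU numbers are placeholder values, not recommendations; which Spark properties Athena accepts is not covered here.

```python
def build_engine_configuration(max_concurrent_dpus, spark_properties=None,
                               coordinator_dpu_size=None,
                               default_executor_dpu_size=None):
    """Build an EngineConfiguration dict for start_session.

    Hypothetical helper. MaxConcurrentDpus is the only required key;
    optional keys are left out when not supplied.
    """
    config = {"MaxConcurrentDpus": int(max_concurrent_dpus)}
    if coordinator_dpu_size is not None:
        config["CoordinatorDpuSize"] = int(coordinator_dpu_size)
    if default_executor_dpu_size is not None:
        config["DefaultExecutorDpuSize"] = int(default_executor_dpu_size)
    if spark_properties:
        # Keys and values are both strings in the request syntax, so
        # numeric values are converted here. Note that str(True) is
        # "True", so pass booleans as the exact string you want sent.
        config["SparkProperties"] = {
            str(key): str(value) for key, value in spark_properties.items()
        }
    return config


# Placeholder values for illustration only.
engine_configuration = build_engine_configuration(
    max_concurrent_dpus=20,
    spark_properties={"spark.sql.shuffle.partitions": 64},
)
print(engine_configuration)
```

The resulting dict could then be passed as EngineConfiguration=engine_configuration to client.start_session.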

type NotebookVersion:

string

param NotebookVersion:

The notebook version. This value is supplied automatically for notebook sessions in the Athena console and is not required for programmatic session access. The only valid notebook version is Athena notebook version 1. If you specify a value for NotebookVersion, you must also specify a value for NotebookId. See EngineConfiguration$AdditionalConfigs.

type SessionIdleTimeoutInMinutes:

integer

param SessionIdleTimeoutInMinutes:

The idle timeout in minutes for the session.

type ClientRequestToken:

string

param ClientRequestToken:

A unique case-sensitive string used to ensure the request to create the session is idempotent (executes only once). If another StartSessionRequest is received, the same response is returned and another session is not created. If a parameter has changed, an error is returned.
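
One way to use ClientRequestToken, sketched under the assumption that a UUID string is an acceptable token: generate the token once, then reuse the same request arguments on any retry. The helper make_start_session_request and the workgroup name are hypothetical.

```python
import uuid


def make_start_session_request(work_group, engine_configuration, token=None):
    """Build keyword arguments for start_session with a fixed token.

    Hypothetical helper. A random UUID is used as the token when none is
    supplied (an assumption of this sketch, not a documented requirement).
    """
    return {
        "WorkGroup": work_group,
        "EngineConfiguration": engine_configuration,
        "ClientRequestToken": token or str(uuid.uuid4()),
    }


request = make_start_session_request(
    "example-workgroup", {"MaxConcurrentDpus": 20}
)

# A retry sends a copy of the same arguments, so the token is unchanged
# and the parameters match the first attempt.
retry_request = dict(request)
print(retry_request["ClientRequestToken"] == request["ClientRequestToken"])
```

Both attempts would then be sent as client.start_session(**request). Building a fresh request for the retry instead would produce a new token and defeat the idempotency check.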

rtype:

dict

returns:

Response Syntax

{
    'SessionId': 'string',
    'State': 'CREATING'|'CREATED'|'IDLE'|'BUSY'|'TERMINATING'|'TERMINATED'|'DEGRADED'|'FAILED'
}

Response Structure

  • (dict) --

    • SessionId (string) --

      The session ID.

    • State (string) --

      The state of the session. A description of each state follows.

      CREATING - The session is being started, including acquiring resources.

      CREATED - The session has been started.

      IDLE - The session is able to accept a calculation.

      BUSY - The session is processing another task and is unable to accept a calculation.

      TERMINATING - The session is in the process of shutting down.

      TERMINATED - The session and its resources are no longer running.

      DEGRADED - The session has no healthy coordinators.

      FAILED - Due to a failure, the session and its resources are no longer running.
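
Since a session is ready once it reaches IDLE, a caller will usually poll get_session after start_session returns. The sketch below is one possible polling loop, not an official waiter. The function name, the polling interval, the attempt limit, and the choice to treat TERMINATING, TERMINATED, DEGRADED, and FAILED as errors are all assumptions of this example.

```python
import time

# States treated as "will not become ready" in this sketch (an assumption).
FAILURE_STATES = {"TERMINATING", "TERMINATED", "DEGRADED", "FAILED"}


def wait_for_idle(client, session_id, poll_seconds=5.0, max_attempts=60,
                  sleep=time.sleep):
    """Poll get_session until the session state is IDLE.

    `client` is any object with a get_session(SessionId=...) method that
    returns the documented response shape, such as an Athena client.
    `sleep` is injectable so the loop can be exercised without waiting.
    """
    for _ in range(max_attempts):
        response = client.get_session(SessionId=session_id)
        state = response["Status"]["State"]
        if state == "IDLE":
            return state
        if state in FAILURE_STATES:
            raise RuntimeError(f"Session {session_id} entered state {state}")
        # CREATING, CREATED, or BUSY: wait and check again.
        sleep(poll_seconds)
    raise TimeoutError(
        f"Session {session_id} was not IDLE after {max_attempts} checks"
    )
```

A typical call, assuming an Athena client created elsewhere, would be wait_for_idle(client, response['SessionId']) using the SessionId returned by start_session.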