Amazon Simple Storage Service

2019/10/28 - Amazon Simple Storage Service - 1 updated api methods

Changes  Adding support in SelectObjectContent for scanning a portion of an object specified by a scan range.

SelectObjectContent (updated)
Changes (request)
{'ScanRange': {'End': 'long', 'Start': 'long'}}

This operation filters the contents of an Amazon S3 object based on a simple Structured Query Language (SQL) statement. In the request, along with the SQL expression, you must also specify a data serialization format (JSON or CSV) of the object. Amazon S3 uses this to parse object data into records, and returns only records that match the specified SQL expression. You must also specify the data serialization format for the response.
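As a rough illustration of the paragraph above, the following sketch assembles the keyword arguments for a query over a CSV object with a header row and asks for JSON results. The helper name `build_select_request`, the bucket, the key, and the column names are hypothetical; only the parameter names come from the request syntax.

```python
def build_select_request(bucket, key, expression):
    """Assemble keyword arguments for select_object_content (hypothetical helper)."""
    return {
        "Bucket": bucket,
        "Key": key,
        "Expression": expression,
        "ExpressionType": "SQL",
        # Input: CSV whose first line holds column names usable in the query.
        "InputSerialization": {"CSV": {"FileHeaderInfo": "USE"}},
        # Output: JSON records separated by a newline.
        "OutputSerialization": {"JSON": {"RecordDelimiter": "\n"}},
    }


request = build_select_request(
    "example-bucket",    # placeholder bucket name
    "data/example.csv",  # placeholder object key
    "SELECT s.name FROM S3Object s WHERE s.city = 'Seattle'",
)
```

With a boto3 S3 client, the call would then be `client.select_object_content(**request)`.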

See also: AWS API Documentation

Request Syntax

client.select_object_content(
    Bucket='string',
    Key='string',
    SSECustomerAlgorithm='string',
    SSECustomerKey=b'bytes',
    SSECustomerKeyMD5='string',
    Expression='string',
    ExpressionType='SQL',
    RequestProgress={
        'Enabled': True|False
    },
    InputSerialization={
        'CSV': {
            'FileHeaderInfo': 'USE'|'IGNORE'|'NONE',
            'Comments': 'string',
            'QuoteEscapeCharacter': 'string',
            'RecordDelimiter': 'string',
            'FieldDelimiter': 'string',
            'QuoteCharacter': 'string',
            'AllowQuotedRecordDelimiter': True|False
        },
        'CompressionType': 'NONE'|'GZIP'|'BZIP2',
        'JSON': {
            'Type': 'DOCUMENT'|'LINES'
        },
        'Parquet': {}
    },
    OutputSerialization={
        'CSV': {
            'QuoteFields': 'ALWAYS'|'ASNEEDED',
            'QuoteEscapeCharacter': 'string',
            'RecordDelimiter': 'string',
            'FieldDelimiter': 'string',
            'QuoteCharacter': 'string'
        },
        'JSON': {
            'RecordDelimiter': 'string'
        }
    },
    ScanRange={
        'Start': 123,
        'End': 123
    }
)
type Bucket

string

param Bucket

[REQUIRED]

The S3 bucket.

type Key

string

param Key

[REQUIRED]

The object key.

type SSECustomerAlgorithm

string

param SSECustomerAlgorithm

The SSE Algorithm used to encrypt the object. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

type SSECustomerKey

bytes

param SSECustomerKey

The SSE Customer Key. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).

type SSECustomerKeyMD5

string

param SSECustomerKeyMD5

The SSE Customer Key MD5. For more information, see Server-Side Encryption (Using Customer-Provided Encryption Keys).
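The three SSE parameters only matter when the object was stored with a customer-provided key. A minimal sketch of preparing them follows. The helper name `sse_c_params` is hypothetical, and the sketch leaves out SSECustomerKeyMD5 on the assumption that the SDK derives it when it is omitted; verify that against your SDK version before relying on it.

```python
import os


def sse_c_params(key):
    """Return SSE-C keyword arguments for a 256-bit customer key (hypothetical helper)."""
    if len(key) != 32:
        raise ValueError("SSE-C with AES256 expects a 32-byte key")
    return {
        "SSECustomerAlgorithm": "AES256",
        "SSECustomerKey": key,
        # SSECustomerKeyMD5 is intentionally omitted (assumption: the SDK fills it in).
    }


# Placeholder key for illustration only; a real request must reuse the
# exact key that was supplied when the object was uploaded.
params = sse_c_params(os.urandom(32))
```

These entries would be merged into the other keyword arguments passed to `select_object_content`.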

type Expression

string

param Expression

[REQUIRED]

The expression that is used to query the object.

type ExpressionType

string

param ExpressionType

[REQUIRED]

The type of the provided expression (for example, SQL).

type RequestProgress

dict

param RequestProgress

Specifies if periodic request progress information should be enabled.

  • Enabled (boolean) --

    Specifies whether periodic QueryProgress frames should be sent. Valid values: TRUE, FALSE. Default value: FALSE.
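In Python the flag is passed as a boolean, for example `RequestProgress={'Enabled': True}`. The sketch below shows one way the resulting Progress events might be turned into a rough completion estimate. The helper `scanned_fractions` is hypothetical, and the caller is assumed to know the object size already (for instance from a prior metadata lookup).

```python
def scanned_fractions(events, object_size):
    """Yield BytesScanned / object_size for every Progress event (hypothetical helper)."""
    for event in events:
        if "Progress" in event:
            scanned = event["Progress"]["Details"]["BytesScanned"]
            yield scanned / object_size
```

Here `events` would be the `Payload` member of the response. Other event types are skipped, so the same stream still has to be read elsewhere if the records themselves are needed.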

type InputSerialization

dict

param InputSerialization

[REQUIRED]

Describes the format of the data in the object that is being queried.

  • CSV (dict) --

    Describes the serialization of a CSV-encoded object.

    • FileHeaderInfo (string) --

      Describes the first line of input. Valid values: NONE, IGNORE, USE.

    • Comments (string) --

      The single character used to indicate a row should be ignored when present at the start of a row.

    • QuoteEscapeCharacter (string) --

      The single character used for escaping the quote character inside an already escaped value.

    • RecordDelimiter (string) --

      The value used to separate individual records.

    • FieldDelimiter (string) --

      The value used to separate individual fields in a record.

    • QuoteCharacter (string) --

      The value used for escaping where the field delimiter is part of the value.

    • AllowQuotedRecordDelimiter (boolean) --

      Specifies that CSV field values may contain quoted record delimiters and such records should be allowed. Default value is FALSE. Setting this value to TRUE may lower performance.

  • CompressionType (string) --

    Specifies the object's compression format. Valid values: NONE, GZIP, BZIP2. Default value: NONE.

  • JSON (dict) --

    Specifies JSON as the object's input serialization format.

    • Type (string) --

      The type of JSON. Valid values: DOCUMENT, LINES.

  • Parquet (dict) --

    Specifies Parquet as the object's input serialization format.
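To make the InputSerialization options above concrete, here is a small sketch of two possible values: gzip-compressed JSON Lines and a pipe-delimited CSV with comment rows. The helper names are hypothetical, and the `|` and `#` characters are illustrative choices rather than defaults.

```python
VALID_COMPRESSION = {"NONE", "GZIP", "BZIP2"}


def json_lines_input(compression="NONE"):
    """InputSerialization for newline-delimited JSON (hypothetical helper)."""
    if compression not in VALID_COMPRESSION:
        raise ValueError(f"unsupported CompressionType: {compression!r}")
    return {"CompressionType": compression, "JSON": {"Type": "LINES"}}


def pipe_csv_input():
    """InputSerialization for pipe-delimited CSV with '#' comment rows (hypothetical helper)."""
    return {
        "CSV": {
            "FileHeaderInfo": "IGNORE",  # skip the first line instead of using it for names
            "FieldDelimiter": "|",
            "Comments": "#",
        }
    }
```

Either dict would be passed as the `InputSerialization` argument. Each sketch names a single format key (CSV or JSON), matching the one format the object is actually stored in.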

type OutputSerialization

dict

param OutputSerialization

[REQUIRED]

Describes the format of the data that you want Amazon S3 to return in response.

  • CSV (dict) --

    Describes the serialization of CSV-encoded Select results.

    • QuoteFields (string) --

      Indicates whether or not all output fields should be quoted.

    • QuoteEscapeCharacter (string) --

      The single character used for escaping the quote character inside an already escaped value.

    • RecordDelimiter (string) --

      The value used to separate individual records.

    • FieldDelimiter (string) --

      The value used to separate individual fields in a record.

    • QuoteCharacter (string) --

      The value used for escaping where the field delimiter is part of the value.

  • JSON (dict) --

    Specifies JSON as the request's output serialization format.

    • RecordDelimiter (string) --

      The value used to separate individual records in the output.
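When JSON output is requested, the RecordDelimiter determines how the returned bytes can be split back into records. The following sketch assumes the payload has already been fully reassembled from the response, that it is UTF-8 text, and that the delimiter does not occur inside a record. `parse_json_records` is a hypothetical helper, not part of the SDK.

```python
import json


def parse_json_records(payload, delimiter="\n"):
    """Split a complete JSON payload on the record delimiter and decode each record."""
    text = payload.decode("utf-8")
    # Skip the empty piece left after the trailing delimiter.
    return [json.loads(piece) for piece in text.split(delimiter) if piece.strip()]
```

The `delimiter` argument should match the value sent in `OutputSerialization['JSON']['RecordDelimiter']`.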

type ScanRange

dict

param ScanRange

Specifies the byte range of the object to get the records from. A record is processed when its first byte is contained by the range. This parameter is optional, but when specified, it must not be empty. See RFC 2616, Section 14.35.1 about how to specify the start and end of the range.

  • Start (integer) --

    Specifies the start of the byte range. This parameter is optional. Valid values: non-negative integers. The default value is 0.

  • End (integer) --

    Specifies the end of the byte range. This parameter is optional. Valid values: non-negative integers. The default value is one less than the size of the object being queried.
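Because a record is processed when its first byte falls inside the range, an object can be divided into adjacent, non-overlapping ranges and each record should be handled by exactly one of them. The sketch below computes such ranges. `scan_ranges` is a hypothetical helper, and it treats End as inclusive, which is inferred from the stated default of one less than the object size.

```python
def scan_ranges(object_size, chunk_size):
    """Return adjacent ScanRange dicts covering bytes 0 .. object_size - 1."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    ranges = []
    start = 0
    while start < object_size:
        # End is inclusive, so the last byte of this chunk is one before the next start.
        end = min(start + chunk_size, object_size) - 1
        ranges.append({"Start": start, "End": end})
        start = end + 1
    return ranges
```

Each dict can be passed as the `ScanRange` argument of a separate request, for example to query chunks of a large object in parallel.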

rtype

dict

returns

The response of this operation contains an :class:`.EventStream` member. When iterated, the :class:`.EventStream` will yield events based on the structure below, where only one of the top-level keys will be present for any given event.

Response Syntax

{
    'Payload': EventStream({
        'Records': {
            'Payload': b'bytes'
        },
        'Stats': {
            'Details': {
                'BytesScanned': 123,
                'BytesProcessed': 123,
                'BytesReturned': 123
            }
        },
        'Progress': {
            'Details': {
                'BytesScanned': 123,
                'BytesProcessed': 123,
                'BytesReturned': 123
            }
        },
        'Cont': {},
        'End': {}
    })
}

Response Structure

  • (dict) --

    • Payload (:class:`.EventStream`) --

      • Records (dict) --

        The Records Event.

        • Payload (bytes) --

          A byte array containing one or more result records, which may be partial.

      • Stats (dict) --

        The Stats Event.

        • Details (dict) --

          The Stats event details.

          • BytesScanned (integer) --

            The total number of object bytes scanned.

          • BytesProcessed (integer) --

            The total number of uncompressed object bytes processed.

          • BytesReturned (integer) --

            The total number of bytes of records payload data returned.

      • Progress (dict) --

        The Progress Event.

        • Details (dict) --

          The Progress event details.

          • BytesScanned (integer) --

            The current number of object bytes scanned.

          • BytesProcessed (integer) --

            The current number of uncompressed object bytes processed.

          • BytesReturned (integer) --

            The current number of bytes of records payload data returned.

      • Cont (dict) --

        The Continuation Event.

      • End (dict) --

        The End Event.
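The response structure above can be consumed with a loop along the following lines. This is a minimal sketch: `collect_select_events` is a hypothetical helper that accepts any iterable of event dicts, so the `Payload` member of the response can be passed in directly. Records payloads are joined before any decoding because a single payload may end in the middle of a record.

```python
def collect_select_events(events):
    """Gather record bytes, final stats, and the End marker from an event stream."""
    chunks = []
    stats = None
    ended = False
    for event in events:
        # Only one top-level key is present per event.
        if "Records" in event:
            chunks.append(event["Records"]["Payload"])
        elif "Stats" in event:
            stats = event["Stats"]["Details"]
        elif "End" in event:
            ended = True
        # Progress and Cont events are ignored in this sketch.
    return b"".join(chunks), stats, ended
```

A typical call would be `collect_select_events(response['Payload'])`, where `response` is the dict returned by `select_object_content`. If `ended` is false, it is safest to treat the result as incomplete.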