Amazon Kinesis

2017/07/06 - Amazon Kinesis - 2 new 4 updated api methods

Changes  You can now encrypt data at rest within an Amazon Kinesis stream using server-side encryption. Server-side encryption via AWS KMS helps customers meet strict data management requirements by encrypting their data at rest within Amazon Kinesis Streams, a fully managed real-time data processing service.

StartStreamEncryption (new)

Enables or updates server-side encryption using an AWS KMS key for a specified stream.

Starting encryption is an asynchronous operation. Upon receiving the request, Amazon Kinesis returns immediately and sets the status of the stream to UPDATING . After the update is complete, Amazon Kinesis sets the status of the stream back to ACTIVE . Updating or applying encryption normally takes a few seconds to complete, but it can take minutes. You can continue to read and write data to your stream while its status is UPDATING . Once the status of the stream is ACTIVE , records written to the stream will begin to be encrypted.

API Limits: You can successfully apply a new AWS KMS key for server-side encryption 25 times in a rolling 24-hour period.

Note: It can take up to 5 seconds after the stream is in an ACTIVE status before all records written to the stream are encrypted. After you’ve enabled encryption, you can verify encryption was applied by inspecting the API response from PutRecord or PutRecords .

See also: AWS API Documentation

Request Syntax

client.start_stream_encryption(
    StreamName='string',
    EncryptionType='NONE'|'KMS',
    KeyId='string'
)
type StreamName

string

param StreamName

[REQUIRED]

The name of the stream for which to start encrypting records.

type EncryptionType

string

param EncryptionType

[REQUIRED]

The encryption type to use. This parameter can be one of the following values:

  • NONE : Not valid for this operation. An InvalidOperationException will be thrown.

  • KMS : Use server-side encryption on the records in the stream using a customer-managed KMS key.

type KeyId

string

param KeyId

[REQUIRED]

The GUID for the customer-managed KMS key to use for encryption. You can also use a Kinesis-owned master key by specifying the alias aws/kinesis .

returns

None
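
Because the call returns before the change takes effect, a caller usually has to poll the stream status. The following is a minimal sketch of that pattern, not a definitive implementation. It assumes `client` is a boto3 Kinesis client (for example `boto3.client("kinesis")`); the helper name, polling interval, and timeout are illustrative choices that do not come from the API.

```python
import time


def start_encryption_and_wait(client, stream_name, key_id,
                              poll_seconds=1.0, timeout_seconds=300.0):
    """Enable KMS server-side encryption and block until the stream is ACTIVE.

    Hypothetical helper: the name, poll_seconds, and timeout_seconds are
    assumptions made for this sketch.
    """
    client.start_stream_encryption(
        StreamName=stream_name,
        EncryptionType="KMS",  # NONE is not valid for this operation
        KeyId=key_id,
    )

    deadline = time.monotonic() + timeout_seconds
    while True:
        # Limit=1 keeps the response small; only the status is needed here.
        description = client.describe_stream(
            StreamName=stream_name, Limit=1
        )["StreamDescription"]
        if description["StreamStatus"] == "ACTIVE":
            return description
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"{stream_name} is still {description['StreamStatus']} "
                f"after {timeout_seconds} seconds"
            )
        time.sleep(poll_seconds)
```

Polling once per second stays well under the DescribeStream limit of 10 transactions per second per account. Even after the helper returns, it can take a few more seconds before every new record is encrypted.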

StopStreamEncryption (new)

Disables server-side encryption for a specified stream.

Stopping encryption is an asynchronous operation. Upon receiving the request, Amazon Kinesis returns immediately and sets the status of the stream to UPDATING . After the update is complete, Amazon Kinesis sets the status of the stream back to ACTIVE . Stopping encryption normally takes a few seconds to complete, but it can take minutes. You can continue to read and write data to your stream while its status is UPDATING . Once the status of the stream is ACTIVE , records written to the stream will no longer be encrypted by the Amazon Kinesis Streams service.

API Limits: You can successfully disable server-side encryption 25 times in a rolling 24-hour period.

Note: It can take up to 5 seconds after the stream is in an ACTIVE status before all records written to the stream are no longer subject to encryption. After you’ve disabled encryption, you can verify encryption was not applied by inspecting the API response from PutRecord or PutRecords .

See also: AWS API Documentation

Request Syntax

client.stop_stream_encryption(
    StreamName='string',
    EncryptionType='NONE'|'KMS',
    KeyId='string'
)
type StreamName

string

param StreamName

[REQUIRED]

The name of the stream on which to stop encrypting records.

type EncryptionType

string

param EncryptionType

[REQUIRED]

The encryption type. This parameter can be one of the following values:

  • NONE : Not valid for this operation. An InvalidOperationException will be thrown.

  • KMS : Use server-side encryption on the records in the stream using a customer-managed KMS key.

type KeyId

string

param KeyId

[REQUIRED]

The GUID for the customer-managed key that was used for encryption.

returns

None
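
StopStreamEncryption requires the key that was used for encryption, and DescribeStream now reports that key. One way to combine the two calls is sketched below. It assumes `client` is a boto3 Kinesis client; the helper name is hypothetical, and treating a missing `EncryptionType` field as `NONE` is an assumption of this sketch.

```python
def stop_encryption_if_enabled(client, stream_name):
    """Disable server-side encryption if the stream currently uses KMS.

    Returns True when a StopStreamEncryption request was sent and False
    when the stream was already unencrypted. Hypothetical helper.
    """
    description = client.describe_stream(
        StreamName=stream_name, Limit=1
    )["StreamDescription"]

    # Assumption: a response without EncryptionType means no encryption.
    if description.get("EncryptionType", "NONE") != "KMS":
        return False

    client.stop_stream_encryption(
        StreamName=stream_name,
        EncryptionType="KMS",        # NONE is not valid for this operation
        KeyId=description["KeyId"],  # the key that was used for encryption
    )
    return True
```

The request is asynchronous, so the stream moves to UPDATING after a True result and returns to ACTIVE once the change is complete.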

DescribeStream (updated)
Changes (response)
{'StreamDescription': {'EncryptionType': 'NONE | KMS', 'KeyId': 'string'}}

Describes the specified Amazon Kinesis stream.

The information returned includes the stream name, Amazon Resource Name (ARN), creation time, enhanced metric configuration, and shard map. The shard map is an array of shard objects. For each shard object, there is the hash key and sequence number ranges that the shard spans, and the IDs of any earlier shards that played a role in creating the shard. Every record ingested in the stream is identified by a sequence number, which is assigned when the record is put into the stream.

You can limit the number of shards returned by each call. For more information, see Retrieving Shards from a Stream in the Amazon Kinesis Streams Developer Guide .

There are no guarantees about the chronological order of the shards returned. To process shards in chronological order, use the ID of the parent shard to track the lineage to the oldest shard.

This operation has a limit of 10 transactions per second per account.

See also: AWS API Documentation

Request Syntax

client.describe_stream(
    StreamName='string',
    Limit=123,
    ExclusiveStartShardId='string'
)
type StreamName

string

param StreamName

[REQUIRED]

The name of the stream to describe.

type Limit

integer

param Limit

The maximum number of shards to return in a single call. The default value is 100. If you specify a value greater than 100, at most 100 shards are returned.

type ExclusiveStartShardId

string

param ExclusiveStartShardId

The shard ID of the shard to start with.

rtype

dict

returns

Response Syntax

{
    'StreamDescription': {
        'StreamName': 'string',
        'StreamARN': 'string',
        'StreamStatus': 'CREATING'|'DELETING'|'ACTIVE'|'UPDATING',
        'Shards': [
            {
                'ShardId': 'string',
                'ParentShardId': 'string',
                'AdjacentParentShardId': 'string',
                'HashKeyRange': {
                    'StartingHashKey': 'string',
                    'EndingHashKey': 'string'
                },
                'SequenceNumberRange': {
                    'StartingSequenceNumber': 'string',
                    'EndingSequenceNumber': 'string'
                }
            },
        ],
        'HasMoreShards': True|False,
        'RetentionPeriodHours': 123,
        'StreamCreationTimestamp': datetime(2015, 1, 1),
        'EnhancedMonitoring': [
            {
                'ShardLevelMetrics': [
                    'IncomingBytes'|'IncomingRecords'|'OutgoingBytes'|'OutgoingRecords'|'WriteProvisionedThroughputExceeded'|'ReadProvisionedThroughputExceeded'|'IteratorAgeMilliseconds'|'ALL',
                ]
            },
        ],
        'EncryptionType': 'NONE'|'KMS',
        'KeyId': 'string'
    }
}

Response Structure

  • (dict) --

    Represents the output for DescribeStream .

    • StreamDescription (dict) --

      The current status of the stream, the stream ARN, an array of shard objects that comprise the stream, and whether there are more shards available.

      • StreamName (string) --

        The name of the stream being described.

      • StreamARN (string) --

        The Amazon Resource Name (ARN) for the stream being described.

      • StreamStatus (string) --

        The current status of the stream being described. The stream status is one of the following states:

        • CREATING - The stream is being created. Amazon Kinesis immediately returns and sets StreamStatus to CREATING .

        • DELETING - The stream is being deleted. The specified stream is in the DELETING state until Amazon Kinesis completes the deletion.

        • ACTIVE - The stream exists and is ready for read and write operations or deletion. You should perform read and write operations only on an ACTIVE stream.

        • UPDATING - Shards in the stream are being merged or split, or server-side encryption is being started or stopped. Read and write operations continue to work while the stream is in the UPDATING state.

      • Shards (list) --

        The shards that comprise the stream.

        • (dict) --

          A uniquely identified group of data records in an Amazon Kinesis stream.

          • ShardId (string) --

            The unique identifier of the shard within the stream.

          • ParentShardId (string) --

            The shard ID of the shard's parent.

          • AdjacentParentShardId (string) --

            The shard ID of the shard adjacent to the shard's parent.

          • HashKeyRange (dict) --

            The range of possible hash key values for the shard, which is a set of ordered contiguous positive integers.

            • StartingHashKey (string) --

              The starting hash key of the hash key range.

            • EndingHashKey (string) --

              The ending hash key of the hash key range.

          • SequenceNumberRange (dict) --

            The range of possible sequence numbers for the shard.

            • StartingSequenceNumber (string) --

              The starting sequence number for the range.

            • EndingSequenceNumber (string) --

              The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence number of null .

      • HasMoreShards (boolean) --

        If set to true , more shards in the stream are available to describe.

      • RetentionPeriodHours (integer) --

        The current retention period, in hours.

      • StreamCreationTimestamp (datetime) --

        The approximate time that the stream was created.

      • EnhancedMonitoring (list) --

        Represents the current enhanced monitoring settings of the stream.

        • (dict) --

          Represents enhanced metrics types.

          • ShardLevelMetrics (list) --

            List of shard-level metrics.

            The following are the valid shard-level metrics. The value " ALL " enhances every metric.

            • IncomingBytes

            • IncomingRecords

            • OutgoingBytes

            • OutgoingRecords

            • WriteProvisionedThroughputExceeded

            • ReadProvisionedThroughputExceeded

            • IteratorAgeMilliseconds

            • ALL

            For more information, see Monitoring the Amazon Kinesis Streams Service with Amazon CloudWatch in the Amazon Kinesis Streams Developer Guide .

            • (string) --

      • EncryptionType (string) --

        The server-side encryption type used on the stream. This parameter can be one of the following values:

        • NONE : Do not encrypt the records in the stream.

        • KMS : Use server-side encryption on the records in the stream using a customer-managed KMS key.

      • KeyId (string) --

        The GUID for the customer-managed KMS key used for encryption on the stream.
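
Since a single call returns at most 100 shards, listing a larger stream means following `HasMoreShards` and resuming with `ExclusiveStartShardId`. The sketch below shows one way to do that. It assumes `client` is a boto3 Kinesis client and, as the parameter name suggests, that the start shard ID is exclusive; the helper name is hypothetical.

```python
def list_all_shards(client, stream_name, page_size=100):
    """Collect every shard of a stream by following HasMoreShards.

    Hypothetical helper; page_size is passed as the Limit parameter.
    """
    shards = []
    kwargs = {"StreamName": stream_name, "Limit": page_size}
    while True:
        description = client.describe_stream(**kwargs)["StreamDescription"]
        page = description["Shards"]
        shards.extend(page)
        # Stop when the service reports no more shards (or returns none).
        if not description["HasMoreShards"] or not page:
            return shards
        # Assumption: the next page starts after this shard ID.
        kwargs["ExclusiveStartShardId"] = page[-1]["ShardId"]
```

The returned list is in the order the service supplied, which is not guaranteed to be chronological; use `ParentShardId` to reconstruct lineage when order matters.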

GetRecords (updated)
Changes (response)
{'Records': {'EncryptionType': 'NONE | KMS'}}

Gets data records from an Amazon Kinesis stream's shard.

Specify a shard iterator using the ShardIterator parameter. The shard iterator specifies the position in the shard from which you want to start reading data records sequentially. If there are no records available in the portion of the shard that the iterator points to, GetRecords returns an empty list. Note that it might take multiple calls to get to a portion of the shard that contains records.

You can scale by provisioning multiple shards per stream while considering service limits (for more information, see Streams Limits in the Amazon Kinesis Streams Developer Guide ). Your application should have one thread per shard, each reading continuously from its shard. To read from a stream continually, call GetRecords in a loop. Use GetShardIterator to get the shard iterator to specify in the first GetRecords call. GetRecords returns a new shard iterator in NextShardIterator . Specify the shard iterator returned in NextShardIterator in subsequent calls to GetRecords. Note that if the shard has been closed, the shard iterator can't return more data and GetRecords returns null in NextShardIterator . You can terminate the loop when the shard is closed, or when the shard iterator reaches the record with the sequence number or other attribute that marks it as the last record to process.

Each data record can be up to 1 MB in size, and each shard supports reads of up to 2 MB per second. You can ensure that your calls don't exceed the maximum supported size or throughput by using the Limit parameter to specify the maximum number of records that GetRecords can return. Consider your average record size when determining this limit.
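
As a rough worked example of sizing Limit from the average record size, the sketch below divides the per-shard read rate by the record size and clamps the result to the range GetRecords accepts. The function name is hypothetical, and counting 1 MB as 1,048,576 bytes is an assumption of this sketch.

```python
MAX_LIMIT = 10_000  # largest Limit value GetRecords accepts
SHARD_READ_BYTES_PER_SECOND = 2 * 1024 * 1024  # assumption: 1 MB = 1,048,576 bytes


def suggest_limit(average_record_bytes, calls_per_second=1.0):
    """Estimate a Limit that keeps reads within the per-shard read rate."""
    if average_record_bytes <= 0 or calls_per_second <= 0:
        raise ValueError("sizes and rates must be positive")
    # Bytes that one call may return without exceeding the read rate.
    budget_per_call = SHARD_READ_BYTES_PER_SECOND / calls_per_second
    records = int(budget_per_call // average_record_bytes)
    # Limit must be at least 1 and at most 10,000.
    return max(1, min(MAX_LIMIT, records))


print(suggest_limit(1024))  # 1 KB records, one call per second -> 2048
print(suggest_limit(100))   # small records are capped -> 10000
```

This is only an estimate: actual record sizes vary, so a consumer should still handle ProvisionedThroughputExceededException .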

The size of the data returned by GetRecords varies depending on the utilization of the shard. The maximum size of data that GetRecords can return is 10 MB. If a call returns this amount of data, subsequent calls made within the next 5 seconds throw ProvisionedThroughputExceededException . If there is insufficient provisioned throughput on the shard, subsequent calls made within the next 1 second throw ProvisionedThroughputExceededException . Note that GetRecords won't return any data when it throws an exception. For this reason, we recommend that you wait one second between calls to GetRecords; however, it's possible that the application will get exceptions for longer than 1 second.

To detect whether the application is falling behind in processing, you can use the MillisBehindLatest response attribute. You can also monitor the stream using CloudWatch metrics and other mechanisms (see Monitoring in the Amazon Kinesis Streams Developer Guide ).

Each Amazon Kinesis record includes a value, ApproximateArrivalTimestamp , that is set when a stream successfully receives and stores a record. This is commonly referred to as a server-side timestamp, whereas a client-side timestamp is set when a data producer creates or sends the record to a stream (a data producer is any data source putting data records into a stream, for example with PutRecords ). The timestamp has millisecond precision. There are no guarantees about the timestamp accuracy, or that the timestamp is always increasing. For example, records in a shard or across a stream might have timestamps that are out of order.

See also: AWS API Documentation

Request Syntax

client.get_records(
    ShardIterator='string',
    Limit=123
)
type ShardIterator

string

param ShardIterator

[REQUIRED]

The position in the shard from which you want to start sequentially reading data records. A shard iterator specifies this position using the sequence number of a data record in the shard.

type Limit

integer

param Limit

The maximum number of records to return. Specify a value of up to 10,000. If you specify a value that is greater than 10,000, GetRecords throws InvalidArgumentException .

rtype

dict

returns

Response Syntax

{
    'Records': [
        {
            'SequenceNumber': 'string',
            'ApproximateArrivalTimestamp': datetime(2015, 1, 1),
            'Data': b'bytes',
            'PartitionKey': 'string',
            'EncryptionType': 'NONE'|'KMS'
        },
    ],
    'NextShardIterator': 'string',
    'MillisBehindLatest': 123
}

Response Structure

  • (dict) --

    Represents the output for GetRecords.

    • Records (list) --

      The data records retrieved from the shard.

      • (dict) --

        The unit of data of the Amazon Kinesis stream, which is composed of a sequence number, a partition key, and a data blob.

        • SequenceNumber (string) --

          The unique identifier of the record within its shard.

        • ApproximateArrivalTimestamp (datetime) --

          The approximate time that the record was inserted into the stream.

        • Data (bytes) --

          The data blob. The data in the blob is both opaque and immutable to the Amazon Kinesis service, which does not inspect, interpret, or change the data in the blob in any way. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size (1 MB).

        • PartitionKey (string) --

          Identifies which shard in the stream the data record is assigned to.

        • EncryptionType (string) --

          The encryption type used on the record. This parameter can be one of the following values:

          • NONE : Do not encrypt the records in the stream.

          • KMS : Use server-side encryption on the records in the stream using a customer-managed KMS key.

    • NextShardIterator (string) --

      The next position in the shard from which to start sequentially reading data records. If set to null , the shard has been closed and the requested iterator will not return any more data.

    • MillisBehindLatest (integer) --

      The number of milliseconds the GetRecords response is from the tip of the stream, indicating how far behind current time the consumer is. A value of zero indicates record processing is caught up, and there are no new records to process at this moment.
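
The read loop described for this operation can be sketched as follows. It assumes `client` is a boto3 Kinesis client and uses `get_shard_iterator` with the `TRIM_HORIZON` iterator type to start at the oldest record, both of which belong to the GetShardIterator operation rather than to this page. The helper name, the callback, and the `stop_when_caught_up` flag are hypothetical.

```python
import time


def read_shard(client, stream_name, shard_id, handle_record,
               limit=1000, pause_seconds=1.0, stop_when_caught_up=False):
    """Read one shard in a loop and return the number of records handled.

    Hypothetical helper. The loop ends when the shard is closed, or, if
    stop_when_caught_up is set, when MillisBehindLatest reaches zero.
    """
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",  # start at the oldest record
    )["ShardIterator"]

    count = 0
    while iterator is not None:
        response = client.get_records(ShardIterator=iterator, Limit=limit)
        for record in response["Records"]:
            handle_record(record)
            count += 1
        # A null (None) NextShardIterator means the shard has been closed.
        iterator = response.get("NextShardIterator")
        if stop_when_caught_up and response.get("MillisBehindLatest") == 0:
            break
        if iterator is not None:
            # Wait between calls, as recommended, to avoid throttling.
            time.sleep(pause_seconds)
    return count
```

An empty `Records` list does not end the loop, because it can take several calls to reach a portion of the shard that contains records. Each record passed to the callback also carries `EncryptionType` , so a consumer can check whether it was stored with KMS encryption.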

PutRecord (updated)
Changes (response)
{'EncryptionType': 'NONE | KMS'}

Writes a single data record into an Amazon Kinesis stream. Call PutRecord to send data into the stream for real-time ingestion and subsequent processing, one record at a time. Each shard can support writes up to 1,000 records per second, up to a maximum data write total of 1 MB per second.

You must specify the name of the stream that captures, stores, and transports the data; a partition key; and the data blob itself.

The data blob can be any type of data; for example, a segment from a log file, geographic/location data, website clickstream data, and so on.

The partition key is used by Amazon Kinesis to distribute data across shards. Amazon Kinesis segregates the data records that belong to a stream into multiple shards, using the partition key associated with each data record to determine which shard a given data record belongs to.

Partition keys are Unicode strings, with a maximum length limit of 256 characters for each key. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards using the hash key ranges of the shards. You can override hashing the partition key to determine the shard by explicitly specifying a hash value using the ExplicitHashKey parameter. For more information, see Adding Data to a Stream in the Amazon Kinesis Streams Developer Guide .
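
To illustrate the mapping just described, the sketch below hashes a partition key with MD5 and looks for the shard whose hash key range contains the result. It is an approximation under stated assumptions: the key is encoded as UTF-8, the 16 digest bytes are read as a big-endian unsigned integer, and `shards` holds only open shards in the shape returned by DescribeStream. Both function names are hypothetical.

```python
import hashlib


def partition_key_hash(partition_key):
    """Map a partition key to a 128-bit integer using MD5."""
    digest = hashlib.md5(partition_key.encode("utf-8")).digest()
    # Assumption: the digest is interpreted as a big-endian unsigned integer.
    return int.from_bytes(digest, "big")


def shard_for_key(partition_key, shards):
    """Return the ShardId whose HashKeyRange contains the key's hash, or None."""
    hash_value = partition_key_hash(partition_key)
    for shard in shards:
        key_range = shard["HashKeyRange"]
        # Hash keys arrive as decimal strings, so convert before comparing.
        start = int(key_range["StartingHashKey"])
        end = int(key_range["EndingHashKey"])
        if start <= hash_value <= end:
            return shard["ShardId"]
    return None
```

Passing a decimal string of the same kind as `ExplicitHashKey` bypasses the MD5 step and selects the shard directly.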

PutRecord returns the shard ID of where the data record was placed and the sequence number that was assigned to the data record.

Sequence numbers increase over time and are specific to a shard within a stream, not across all shards within a stream. To guarantee strictly increasing ordering, write serially to a shard and use the SequenceNumberForOrdering parameter. For more information, see Adding Data to a Stream in the Amazon Kinesis Streams Developer Guide .

If a PutRecord request cannot be processed because of insufficient provisioned throughput on the shard involved in the request, PutRecord throws ProvisionedThroughputExceededException .

By default, data records are accessible for 24 hours from the time that they are added to a stream. You can use IncreaseStreamRetentionPeriod or DecreaseStreamRetentionPeriod to modify this retention period.

See also: AWS API Documentation

Request Syntax

client.put_record(
    StreamName='string',
    Data=b'bytes',
    PartitionKey='string',
    ExplicitHashKey='string',
    SequenceNumberForOrdering='string'
)
type StreamName

string

param StreamName

[REQUIRED]

The name of the stream to put the data record into.

type Data

bytes

param Data

[REQUIRED]

The data blob to put into the record, which is base64-encoded when the blob is serialized. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size (1 MB).

type PartitionKey

string

param PartitionKey

[REQUIRED]

Determines which shard in the stream the data record is assigned to. Partition keys are Unicode strings with a maximum length limit of 256 characters for each key. Amazon Kinesis uses the partition key as input to a hash function that maps the partition key and associated data to a specific shard. Specifically, an MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. As a result of this hashing mechanism, all data records with the same partition key map to the same shard within the stream.

type ExplicitHashKey

string

param ExplicitHashKey

The hash value used to explicitly determine the shard the data record is assigned to by overriding the partition key hash.

type SequenceNumberForOrdering

string

param SequenceNumberForOrdering

Guarantees strictly increasing sequence numbers, for puts from the same client and to the same partition key. Usage: set the SequenceNumberForOrdering of record n to the sequence number of record n-1 (as returned in the result when putting record n-1 ). If this parameter is not set, records will be coarsely ordered based on arrival time.

rtype

dict

returns

Response Syntax

{
    'ShardId': 'string',
    'SequenceNumber': 'string',
    'EncryptionType': 'NONE'|'KMS'
}

Response Structure

  • (dict) --

    Represents the output for PutRecord .

    • ShardId (string) --

      The shard ID of the shard where the data record was placed.

    • SequenceNumber (string) --

      The sequence number identifier that was assigned to the put data record. The sequence number for the record is unique across all records in the stream. A sequence number is the identifier associated with every record put into the stream.

    • EncryptionType (string) --

      The encryption type used on the record. This parameter can be one of the following values:

      • NONE : Do not encrypt the records in the stream.

      • KMS : Use server-side encryption on the records in the stream using a customer-managed KMS key.
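
The `SequenceNumberForOrdering` chaining and the new `EncryptionType` response field can be combined as in the sketch below. It assumes `client` is a boto3 Kinesis client and that all payloads share one partition key; both helper names are hypothetical.

```python
def put_in_order(client, stream_name, partition_key, payloads):
    """Write payloads serially so their sequence numbers strictly increase.

    Returns the list of PutRecord responses. Hypothetical helper.
    """
    responses = []
    previous_sequence_number = None
    for payload in payloads:
        kwargs = {
            "StreamName": stream_name,
            "Data": payload,  # bytes
            "PartitionKey": partition_key,
        }
        if previous_sequence_number is not None:
            # Record n carries the sequence number returned for record n-1.
            kwargs["SequenceNumberForOrdering"] = previous_sequence_number
        response = client.put_record(**kwargs)
        previous_sequence_number = response["SequenceNumber"]
        responses.append(response)
    return responses


def all_encrypted(responses):
    """Return True when every response reports KMS server-side encryption."""
    return all(r.get("EncryptionType") == "KMS" for r in responses)
```

Checking `all_encrypted` on fresh responses is one way to verify that encryption has taken effect after StartStreamEncryption.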

PutRecords (updated)
Changes (response)
{'EncryptionType': 'NONE | KMS'}

Writes multiple data records into an Amazon Kinesis stream in a single call (also referred to as a PutRecords request). Use this operation to send data into the stream for data ingestion and processing.

Each PutRecords request can support up to 500 records. Each record in the request can be as large as 1 MB, up to a limit of 5 MB for the entire request, including partition keys. Each shard can support writes up to 1,000 records per second, up to a maximum data write total of 1 MB per second.
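
A producer with more data than one request allows has to split it into batches. The sketch below groups records so that each batch respects the 500-record and 5 MB request limits. It assumes 1 MB means 1,048,576 bytes and that a partition key counts by its UTF-8 encoded length; the constants and function names are hypothetical.

```python
MAX_RECORDS_PER_REQUEST = 500
MAX_RECORD_BYTES = 1024 * 1024       # assumption: 1 MB = 1,048,576 bytes
MAX_REQUEST_BYTES = 5 * 1024 * 1024  # limit for the entire request


def record_size(record):
    """Size of one entry: data blob plus UTF-8 encoded partition key."""
    return len(record["Data"]) + len(record["PartitionKey"].encode("utf-8"))


def batch_records(records):
    """Yield lists of records that each fit in one PutRecords request."""
    batch, batch_bytes = [], 0
    for record in records:
        size = record_size(record)
        if size > MAX_RECORD_BYTES:
            raise ValueError("record exceeds the 1 MB record size limit")
        # Start a new batch when adding this record would break a limit.
        if batch and (len(batch) == MAX_RECORDS_PER_REQUEST
                      or batch_bytes + size > MAX_REQUEST_BYTES):
            yield batch
            batch, batch_bytes = [], 0
        batch.append(record)
        batch_bytes += size
    if batch:
        yield batch
```

Each yielded list can be passed as the `Records` argument of `put_records`. Batching does not address the per-shard write rate, so throttled entries can still appear in the response.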

You must specify the name of the stream that captures, stores, and transports the data; and an array of request Records , with each record in the array requiring a partition key and data blob. The record size limit applies to the total size of the partition key and data blob.

The data blob can be any type of data; for example, a segment from a log file, geographic/location data, website clickstream data, and so on.

The partition key is used by Amazon Kinesis as input to a hash function that maps the partition key and associated data to a specific shard. An MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. As a result of this hashing mechanism, all data records with the same partition key map to the same shard within the stream. For more information, see Adding Data to a Stream in the Amazon Kinesis Streams Developer Guide .

Each record in the Records array may include an optional parameter, ExplicitHashKey , which overrides the partition key to shard mapping. This parameter allows a data producer to determine explicitly the shard where the record is stored. For more information, see Adding Multiple Records with PutRecords in the Amazon Kinesis Streams Developer Guide .

The PutRecords response includes an array of response Records . Each record in the response array directly correlates with a record in the request array using natural ordering, from the top to the bottom of the request and response. The response Records array always includes the same number of records as the request array.

The response Records array includes both successfully and unsuccessfully processed records. Amazon Kinesis attempts to process all records in each PutRecords request. A single record failure does not stop the processing of subsequent records.

A successfully processed record includes ShardId and SequenceNumber values. The ShardId parameter identifies the shard in the stream where the record is stored. The SequenceNumber parameter is an identifier assigned to the put record, unique across all records in the stream.

An unsuccessfully processed record includes ErrorCode and ErrorMessage values. ErrorCode reflects the type of error and can be one of the following values: ProvisionedThroughputExceededException or InternalFailure . ErrorMessage provides more detailed information about the ProvisionedThroughputExceededException exception including the account ID, stream name, and shard ID of the record that was throttled. For more information about partially successful responses, see Adding Multiple Records with PutRecords in the Amazon Kinesis Streams Developer Guide .

By default, data records are accessible for 24 hours from the time that they are added to a stream. You can use IncreaseStreamRetentionPeriod or DecreaseStreamRetentionPeriod to modify this retention period.

See also: AWS API Documentation

Request Syntax

client.put_records(
    Records=[
        {
            'Data': b'bytes',
            'ExplicitHashKey': 'string',
            'PartitionKey': 'string'
        },
    ],
    StreamName='string'
)
type Records

list

param Records

[REQUIRED]

The records associated with the request.

  • (dict) --

    Represents a single record entry in a PutRecords request.

    • Data (bytes) -- [REQUIRED]

      The data blob to put into the record, which is base64-encoded when the blob is serialized. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size (1 MB).

    • ExplicitHashKey (string) --

      The hash value used to determine explicitly the shard that the data record is assigned to by overriding the partition key hash.

    • PartitionKey (string) -- [REQUIRED]

      Determines which shard in the stream the data record is assigned to. Partition keys are Unicode strings with a maximum length limit of 256 characters for each key. Amazon Kinesis uses the partition key as input to a hash function that maps the partition key and associated data to a specific shard. Specifically, an MD5 hash function is used to map partition keys to 128-bit integer values and to map associated data records to shards. As a result of this hashing mechanism, all data records with the same partition key map to the same shard within the stream.

type StreamName

string

param StreamName

[REQUIRED]

The stream name associated with the request.

rtype

dict

returns

Response Syntax

{
    'FailedRecordCount': 123,
    'Records': [
        {
            'SequenceNumber': 'string',
            'ShardId': 'string',
            'ErrorCode': 'string',
            'ErrorMessage': 'string'
        },
    ],
    'EncryptionType': 'NONE'|'KMS'
}

Response Structure

  • (dict) --

    PutRecords results.

    • FailedRecordCount (integer) --

      The number of unsuccessfully processed records in a PutRecords request.

    • Records (list) --

      An array of successfully and unsuccessfully processed record results, correlated with the request by natural ordering. A record that is successfully added to a stream includes SequenceNumber and ShardId in the result. A record that fails to be added to a stream includes ErrorCode and ErrorMessage in the result.

      • (dict) --

        Represents the result of an individual record from a PutRecords request. A record that is successfully added to a stream includes SequenceNumber and ShardId in the result. A record that fails to be added to the stream includes ErrorCode and ErrorMessage in the result.

        • SequenceNumber (string) --

          The sequence number for an individual record result.

        • ShardId (string) --

          The shard ID for an individual record result.

        • ErrorCode (string) --

          The error code for an individual record result. ErrorCodes can be either ProvisionedThroughputExceededException or InternalFailure .

        • ErrorMessage (string) --

          The error message for an individual record result. An ErrorCode value of ProvisionedThroughputExceededException has an error message that includes the account ID, stream name, and shard ID. An ErrorCode value of InternalFailure has the error message "Internal Service Failure" .

    • EncryptionType (string) --

      The encryption type used on the records. This parameter can be one of the following values:

      • NONE : Do not encrypt the records.

      • KMS : Use server-side encryption on the records using a customer-managed KMS key.
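
Because a PutRecords call can succeed for some records and fail for others, callers typically resend only the failed entries. The sketch below pairs each request entry with its result by position and retries those that carry an `ErrorCode` . It assumes `client` is a boto3 Kinesis client and that `records` already fits in one request; the helper name, attempt count, and backoff schedule are hypothetical.

```python
import time


def put_records_with_retry(client, stream_name, records,
                           max_attempts=5, base_delay_seconds=0.1):
    """Send records with PutRecords, resending only the entries that failed.

    Returns the records that still failed after max_attempts (an empty
    list on full success). Hypothetical helper.
    """
    pending = list(records)
    for attempt in range(max_attempts):
        if not pending:
            break
        response = client.put_records(StreamName=stream_name, Records=pending)
        if response["FailedRecordCount"] == 0:
            return []
        # Results line up with the request by position; failures carry ErrorCode.
        pending = [
            record
            for record, result in zip(pending, response["Records"])
            if result.get("ErrorCode")
        ]
        if attempt < max_attempts - 1:
            # Illustrative exponential backoff before the next attempt.
            time.sleep(base_delay_seconds * (2 ** attempt))
    return pending
```

Resent records are written after records that succeeded earlier, so this approach does not preserve the original order within the batch.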