Amazon Kinesis

2020/08/17 - Amazon Kinesis - 3 updated api methods

Changes: Update Kinesis client to latest version

GetRecords (updated)
Changes (response)
{'ChildShards': [{'HashKeyRange': {'EndingHashKey': 'string',
                                   'StartingHashKey': 'string'},
                  'ParentShards': ['string'],
                  'ShardId': 'string'}]}

Gets data records from a Kinesis data stream's shard.

Specify a shard iterator using the ShardIterator parameter. The shard iterator specifies the position in the shard from which you want to start reading data records sequentially. If there are no records available in the portion of the shard that the iterator points to, GetRecords returns an empty list. It might take multiple calls to get to a portion of the shard that contains records.

You can scale by provisioning multiple shards per stream while considering service limits (for more information, see Amazon Kinesis Data Streams Limits in the Amazon Kinesis Data Streams Developer Guide). Your application should have one thread per shard, each reading continuously from its stream. To read from a stream continually, call GetRecords in a loop. Use GetShardIterator to get the shard iterator to specify in the first GetRecords call. GetRecords returns a new shard iterator in NextShardIterator. Specify the shard iterator returned in NextShardIterator in subsequent calls to GetRecords. If the shard has been closed, the shard iterator can't return more data and GetRecords returns null in NextShardIterator. You can terminate the loop when the shard is closed, or when the shard iterator reaches the record with the sequence number or other attribute that marks it as the last record to process.
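
The read loop described above can be sketched as follows. This is a minimal sketch, assuming `client` is a boto3 Kinesis client (for example `boto3.client("kinesis")`); the function name `read_shard`, the `TRIM_HORIZON` starting point, the `handle_record` callback, and the `max_calls` cap are illustrative choices, not part of the API.

```python
import time


def read_shard(client, stream_name, shard_id, handle_record,
               max_calls=None, sleep=time.sleep):
    """Read one shard sequentially until it is closed or max_calls is reached.

    `handle_record`, `max_calls`, and `sleep` are hypothetical helpers
    introduced for this sketch.
    """
    # The first iterator comes from GetShardIterator.
    iterator = client.get_shard_iterator(
        StreamName=stream_name,
        ShardId=shard_id,
        ShardIteratorType="TRIM_HORIZON",
    )["ShardIterator"]

    calls = 0
    while iterator is not None:
        if max_calls is not None and calls >= max_calls:
            break
        response = client.get_records(ShardIterator=iterator)
        calls += 1
        for record in response["Records"]:
            handle_record(record)
        # A missing or null NextShardIterator means the shard is closed.
        iterator = response.get("NextShardIterator")
        if iterator is not None:
            sleep(1)  # pause between calls to stay under per-shard limits
    return calls
```

An empty `Records` list does not end the loop; only a null `NextShardIterator` (or the optional cap) does, which matches the note that several calls may be needed before records appear.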

Each data record can be up to 1 MiB in size, and each shard supports reads of up to 2 MiB per second. You can ensure that your calls don't exceed the maximum supported size or throughput by using the Limit parameter to specify the maximum number of records that GetRecords can return. Consider your average record size when determining this limit. The maximum number of records that can be returned per call is 10,000.
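
One way to turn an average record size into a Limit value is shown below. This is only a sketch of the arithmetic: the helper name `suggest_limit` and the `calls_per_second` parameter are assumptions made for illustration, and the result is a starting point rather than a guarantee against throttling.

```python
MIB = 1024 * 1024
MAX_RECORDS_PER_CALL = 10_000          # documented cap on Limit
SHARD_READ_BYTES_PER_SECOND = 2 * MIB  # documented per-shard read throughput


def suggest_limit(avg_record_bytes, calls_per_second=1.0):
    """Suggest a Limit so one call's expected payload fits the read budget.

    Hypothetical helper: divides the per-second read budget across the
    planned call rate, then across the average record size.
    """
    if avg_record_bytes <= 0 or calls_per_second <= 0:
        raise ValueError("avg_record_bytes and calls_per_second must be positive")
    budget_per_call = SHARD_READ_BYTES_PER_SECOND / calls_per_second
    limit = int(budget_per_call // avg_record_bytes)
    # Limit must be at least 1 and at most 10,000.
    return max(1, min(limit, MAX_RECORDS_PER_CALL))
```

For example, with 1 KiB records and one call per second the helper suggests 2048 records per call, while very small records are capped at 10,000.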

The size of the data returned by GetRecords varies depending on the utilization of the shard. The maximum size of data that GetRecords can return is 10 MiB. If a call returns this amount of data, subsequent calls made within the next 5 seconds throw ProvisionedThroughputExceededException. If there is insufficient provisioned throughput on the stream, subsequent calls made within the next 1 second throw ProvisionedThroughputExceededException. GetRecords doesn't return any data when it throws an exception. For this reason, we recommend that you wait 1 second between calls to GetRecords. However, it's possible that the application will get exceptions for longer than 1 second.
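
A retry wrapper for the throttling case might look like the sketch below. It assumes `client` is a boto3 Kinesis client, whose `client.exceptions` namespace exposes `ProvisionedThroughputExceededException`. The function name, the exponential backoff schedule, and the `max_attempts` and `base_delay` defaults are choices made for this example, not recommendations from the service.

```python
import time


def get_records_with_backoff(client, shard_iterator, limit=None,
                             max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call GetRecords, backing off when the shard reports throttling.

    Hypothetical helper: waits base_delay, then 2x, 4x, ... between attempts
    and re-raises the exception once max_attempts is exhausted.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    kwargs = {"ShardIterator": shard_iterator}
    if limit is not None:
        kwargs["Limit"] = limit
    throttled = client.exceptions.ProvisionedThroughputExceededException
    for attempt in range(max_attempts):
        try:
            return client.get_records(**kwargs)
        except throttled:
            if attempt == max_attempts - 1:
                raise
            # No data is returned with the exception, so retry the same iterator.
            sleep(base_delay * (2 ** attempt))
```

Because a throttled call returns no data, the same shard iterator is reused on each retry.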

To detect whether the application is falling behind in processing, you can use the MillisBehindLatest response attribute. You can also monitor the stream using CloudWatch metrics and other mechanisms (see Monitoring in the Amazon Kinesis Data Streams Developer Guide).
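
A lag check based on MillisBehindLatest can be as small as the sketch below. The helper name and the 60-second default threshold are assumptions for illustration; choose a tolerance that fits your application.

```python
def is_falling_behind(response, threshold_ms=60_000):
    """Return True when the consumer lags the stream tip by more than threshold_ms.

    `response` is a GetRecords response dict; `threshold_ms` is a hypothetical
    tolerance chosen for this sketch, not a service default.
    """
    return response.get("MillisBehindLatest", 0) > threshold_ms
```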

Each Amazon Kinesis record includes a value, ApproximateArrivalTimestamp, that is set when a stream successfully receives and stores a record. This is commonly referred to as a server-side time stamp, whereas a client-side time stamp is set when a data producer creates or sends the record to a stream (a data producer is any data source putting data records into a stream, for example with PutRecords). The time stamp has millisecond precision. There are no guarantees about the time stamp accuracy, or that the time stamp is always increasing. For example, records in a shard or across a stream might have time stamps that are out of order.

This operation has a limit of five transactions per second per shard.

See also: AWS API Documentation

Request Syntax

client.get_records(
    ShardIterator='string',
    Limit=123
)
type ShardIterator:

string

param ShardIterator:

[REQUIRED]

The position in the shard from which you want to start sequentially reading data records. A shard iterator specifies this position using the sequence number of a data record in the shard.

type Limit:

integer

param Limit:

The maximum number of records to return. Specify a value of up to 10,000. If you specify a value that is greater than 10,000, GetRecords throws InvalidArgumentException. The default value is 10,000.

rtype:

dict

returns:

Response Syntax

{
    'Records': [
        {
            'SequenceNumber': 'string',
            'ApproximateArrivalTimestamp': datetime(2015, 1, 1),
            'Data': b'bytes',
            'PartitionKey': 'string',
            'EncryptionType': 'NONE'|'KMS'
        },
    ],
    'NextShardIterator': 'string',
    'MillisBehindLatest': 123,
    'ChildShards': [
        {
            'ShardId': 'string',
            'ParentShards': [
                'string',
            ],
            'HashKeyRange': {
                'StartingHashKey': 'string',
                'EndingHashKey': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    Represents the output for GetRecords.

    • Records (list) --

      The data records retrieved from the shard.

      • (dict) --

        The unit of data of the Kinesis data stream, which is composed of a sequence number, a partition key, and a data blob.

        • SequenceNumber (string) --

          The unique identifier of the record within its shard.

        • ApproximateArrivalTimestamp (datetime) --

          The approximate time that the record was inserted into the stream.

        • Data (bytes) --

          The data blob. The data in the blob is both opaque and immutable to Kinesis Data Streams, which does not inspect, interpret, or change the data in the blob in any way. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size (1 MiB).

        • PartitionKey (string) --

          Identifies which shard in the stream the data record is assigned to.

        • EncryptionType (string) --

          The encryption type used on the record. This parameter can be one of the following values:

          • NONE: Do not encrypt the records in the stream.

          • KMS: Use server-side encryption on the records in the stream using a customer-managed AWS KMS key.

    • NextShardIterator (string) --

      The next position in the shard from which to start sequentially reading data records. If set to null, the shard has been closed and the requested iterator does not return any more data.

    • MillisBehindLatest (integer) --

      The number of milliseconds the GetRecords response is from the tip of the stream, indicating how far behind current time the consumer is. A value of zero indicates that record processing is caught up, and there are no new records to process at this moment.

    • ChildShards (list) --

      • (dict) --

        • ShardId (string) --

        • ParentShards (list) --

          • (string) --

        • HashKeyRange (dict) --

          The range of possible hash key values for the shard, which is a set of ordered contiguous positive integers.

          • StartingHashKey (string) --

            The starting hash key of the hash key range.

          • EndingHashKey (string) --

            The ending hash key of the hash key range.
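
The new ChildShards member has no description in this section. The sketch below assumes it is populated once a closed shard has been read to its end (that is, when NextShardIterator is null), and shows one way to collect the shards to read next. The helper name is hypothetical.

```python
def child_shards_after_close(response):
    """Map child shard ID -> list of parent shard IDs, once the shard is closed.

    Assumption for this sketch: child shards matter only when
    NextShardIterator is null. Returns an empty dict while the shard is open.
    """
    if response.get("NextShardIterator") is not None:
        return {}  # shard still open; keep reading it
    return {
        child["ShardId"]: list(child.get("ParentShards", []))
        for child in response.get("ChildShards", [])
    }
```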

ListShards (updated)
Changes (request)
{'ShardFilter': {'ShardId': 'string',
                 'Timestamp': 'timestamp',
                 'Type': 'AFTER_SHARD_ID | AT_TRIM_HORIZON | FROM_TRIM_HORIZON '
                         '| AT_LATEST | AT_TIMESTAMP | FROM_TIMESTAMP'}}

Lists the shards in a stream and provides information about each shard. This operation has a limit of 100 transactions per second per data stream.

See also: AWS API Documentation

Request Syntax

client.list_shards(
    StreamName='string',
    NextToken='string',
    ExclusiveStartShardId='string',
    MaxResults=123,
    StreamCreationTimestamp=datetime(2015, 1, 1),
    ShardFilter={
        'Type': 'AFTER_SHARD_ID'|'AT_TRIM_HORIZON'|'FROM_TRIM_HORIZON'|'AT_LATEST'|'AT_TIMESTAMP'|'FROM_TIMESTAMP',
        'ShardId': 'string',
        'Timestamp': datetime(2015, 1, 1)
    }
)
type StreamName:

string

param StreamName:

The name of the data stream whose shards you want to list.

You cannot specify this parameter if you specify the NextToken parameter.

type NextToken:

string

param NextToken:

When the number of shards in the data stream is greater than the default value for the MaxResults parameter, or if you explicitly specify a value for MaxResults that is less than the number of shards in the data stream, the response includes a pagination token named NextToken. You can specify this NextToken value in a subsequent call to ListShards to list the next set of shards.

Don't specify StreamName or StreamCreationTimestamp if you specify NextToken because the latter unambiguously identifies the stream.

You can optionally specify a value for the MaxResults parameter when you specify NextToken. If you specify a MaxResults value that is less than the number of shards that the operation returns if you don't specify MaxResults, the response will contain a new NextToken value. You can use the new NextToken value in a subsequent call to the ListShards operation.

type ExclusiveStartShardId:

string

param ExclusiveStartShardId:

Specify this parameter to indicate that you want to list the shards starting with the shard whose ID immediately follows ExclusiveStartShardId.

If you don't specify this parameter, the default behavior is for ListShards to list the shards starting with the first one in the stream.

You cannot specify this parameter if you specify NextToken.

type MaxResults:

integer

param MaxResults:

The maximum number of shards to return in a single call to ListShards. The minimum value you can specify for this parameter is 1, and the maximum is 10,000, which is also the default.

When the number of shards to be listed is greater than the value of MaxResults, the response contains a NextToken value that you can use in a subsequent call to ListShards to list the next set of shards.

type StreamCreationTimestamp:

datetime

param StreamCreationTimestamp:

Specify this input parameter to distinguish data streams that have the same name. For example, if you create a data stream and then delete it, and you later create another data stream with the same name, you can use this input parameter to specify which of the two streams you want to list the shards for.

You cannot specify this parameter if you specify the NextToken parameter.

type ShardFilter:

dict

param ShardFilter:
  • Type (string) -- [REQUIRED]

  • ShardId (string) --

  • Timestamp (datetime) --
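
The ShardFilter members are not described in this section. The sketch below assumes that ShardId accompanies AFTER_SHARD_ID, that Timestamp accompanies AT_TIMESTAMP and FROM_TIMESTAMP, and that the remaining types take no extra field. Treat that pairing, and the helper name `build_shard_filter`, as assumptions to verify against the service documentation.

```python
# Assumed pairing of filter types with their companion field (not stated above).
_REQUIRED_FIELD = {
    "AFTER_SHARD_ID": "ShardId",
    "AT_TRIM_HORIZON": None,
    "FROM_TRIM_HORIZON": None,
    "AT_LATEST": None,
    "AT_TIMESTAMP": "Timestamp",
    "FROM_TIMESTAMP": "Timestamp",
}


def build_shard_filter(filter_type, shard_id=None, timestamp=None):
    """Build a ShardFilter dict, including only the field the type needs."""
    if filter_type not in _REQUIRED_FIELD:
        raise ValueError(f"unknown ShardFilter type: {filter_type!r}")
    supplied = {"ShardId": shard_id, "Timestamp": timestamp}
    required = _REQUIRED_FIELD[filter_type]
    shard_filter = {"Type": filter_type}
    if required is not None:
        if supplied[required] is None:
            raise ValueError(f"{filter_type} requires {required}")
        shard_filter[required] = supplied[required]
    return shard_filter
```

The result can be passed as the ShardFilter argument of `client.list_shards`.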

rtype:

dict

returns:

Response Syntax

{
    'Shards': [
        {
            'ShardId': 'string',
            'ParentShardId': 'string',
            'AdjacentParentShardId': 'string',
            'HashKeyRange': {
                'StartingHashKey': 'string',
                'EndingHashKey': 'string'
            },
            'SequenceNumberRange': {
                'StartingSequenceNumber': 'string',
                'EndingSequenceNumber': 'string'
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • Shards (list) --

      An array of JSON objects. Each object represents one shard and specifies the IDs of the shard, the shard's parent, and the shard that's adjacent to the shard's parent. Each object also contains the starting and ending hash keys and the starting and ending sequence numbers for the shard.

      • (dict) --

        A uniquely identified group of data records in a Kinesis data stream.

        • ShardId (string) --

          The unique identifier of the shard within the stream.

        • ParentShardId (string) --

          The shard ID of the shard's parent.

        • AdjacentParentShardId (string) --

          The shard ID of the shard adjacent to the shard's parent.

        • HashKeyRange (dict) --

          The range of possible hash key values for the shard, which is a set of ordered contiguous positive integers.

          • StartingHashKey (string) --

            The starting hash key of the hash key range.

          • EndingHashKey (string) --

            The ending hash key of the hash key range.

        • SequenceNumberRange (dict) --

          The range of possible sequence numbers for the shard.

          • StartingSequenceNumber (string) --

            The starting sequence number for the range.

          • EndingSequenceNumber (string) --

            The ending sequence number for the range. Shards that are in the OPEN state have an ending sequence number of null.

    • NextToken (string) --

      When the number of shards in the data stream is greater than the default value for the MaxResults parameter, or if you explicitly specify a value for MaxResults that is less than the number of shards in the data stream, the response includes a pagination token named NextToken. You can specify this NextToken value in a subsequent call to ListShards to list the next set of shards. For more information about using this pagination token when calling ListShards, see the NextToken request parameter.
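
The pagination rules above can be sketched as a loop that sends StreamName only on the first call and NextToken on every later call. This is a minimal sketch assuming `client` is a boto3 Kinesis client; the function name `list_all_shards` is illustrative.

```python
def list_all_shards(client, stream_name, max_results=None):
    """Collect every shard of a stream by following NextToken.

    Hypothetical helper: `max_results`, if given, is passed as MaxResults
    on every call.
    """
    shards = []
    kwargs = {"StreamName": stream_name}
    while True:
        if max_results is not None:
            kwargs["MaxResults"] = max_results
        response = client.list_shards(**kwargs)
        shards.extend(response["Shards"])
        token = response.get("NextToken")
        if not token:
            return shards
        # NextToken already identifies the stream, so StreamName is dropped.
        kwargs = {"NextToken": token}
```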

SubscribeToShard (updated)
Changes (response)
{'EventStream': {'SubscribeToShardEvent': {'ChildShards': [{'HashKeyRange': {'EndingHashKey': 'string',
                                                                             'StartingHashKey': 'string'},
                                                            'ParentShards': ['string'],
                                                            'ShardId': 'string'}]}}}

This operation establishes an HTTP/2 connection between the consumer you specify in the ConsumerARN parameter and the shard you specify in the ShardId parameter. After the connection is successfully established, Kinesis Data Streams pushes records from the shard to the consumer over this connection. Before you call this operation, call RegisterStreamConsumer to register the consumer with Kinesis Data Streams.

When the SubscribeToShard call succeeds, your consumer starts receiving events of type SubscribeToShardEvent over the HTTP/2 connection for up to 5 minutes, after which time you need to call SubscribeToShard again to renew the subscription if you want to continue to receive records.

You can make one call to SubscribeToShard per second per registered consumer per shard. For example, if you have a 4000 shard stream and two registered stream consumers, you can make one SubscribeToShard request per second for each combination of shard and registered consumer, allowing you to subscribe both consumers to all 4000 shards in one second.

If you call SubscribeToShard again with the same ConsumerARN and ShardId within 5 seconds of a successful call, you'll get a ResourceInUseException. If you call SubscribeToShard 5 seconds or more after a successful call, the first connection will expire and the second call will take over the subscription.
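
The renewal pattern described above (resubscribe when the connection ends, but not within 5 seconds of the last successful call) can be sketched as follows. The sketch assumes `client` is a boto3 Kinesis client and that the consumer is already registered. The function name, the `LATEST` starting point, the `max_subscriptions` cap, and the choice of `AFTER_SEQUENCE_NUMBER` when resuming are assumptions made for illustration.

```python
import time

MIN_RESUBSCRIBE_SECONDS = 5  # earlier calls get ResourceInUseException


def follow_shard(client, consumer_arn, shard_id, handle_record,
                 max_subscriptions=1, clock=time.monotonic, sleep=time.sleep):
    """Subscribe repeatedly, resuming from the last ContinuationSequenceNumber.

    `clock` and `sleep` are injectable only to make the sketch testable.
    Returns the StartingPosition to use for the next subscription.
    """
    position = {"Type": "LATEST"}
    last_success = None
    for _ in range(max_subscriptions):
        if last_success is not None:
            wait = MIN_RESUBSCRIBE_SECONDS - (clock() - last_success)
            if wait > 0:
                sleep(wait)
        response = client.subscribe_to_shard(
            ConsumerARN=consumer_arn,
            ShardId=shard_id,
            StartingPosition=position,
        )
        last_success = clock()
        for event in response["EventStream"]:
            payload = event.get("SubscribeToShardEvent")
            if payload is None:
                continue  # error members are not handled in this sketch
            for record in payload.get("Records", []):
                handle_record(record)
            sequence = payload.get("ContinuationSequenceNumber")
            if sequence is not None:
                position = {"Type": "AFTER_SEQUENCE_NUMBER",
                            "SequenceNumber": sequence}
    return position
```

A production consumer would also need to stop when the shard is closed and to handle error events; both are left out here to keep the renewal logic visible.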

For an example of how to use this operation, see Enhanced Fan-Out Using the Kinesis Data Streams API.

See also: AWS API Documentation

Request Syntax

client.subscribe_to_shard(
    ConsumerARN='string',
    ShardId='string',
    StartingPosition={
        'Type': 'AT_SEQUENCE_NUMBER'|'AFTER_SEQUENCE_NUMBER'|'TRIM_HORIZON'|'LATEST'|'AT_TIMESTAMP',
        'SequenceNumber': 'string',
        'Timestamp': datetime(2015, 1, 1)
    }
)
type ConsumerARN:

string

param ConsumerARN:

[REQUIRED]

For this parameter, use the value you obtained when you called RegisterStreamConsumer.

type ShardId:

string

param ShardId:

[REQUIRED]

The ID of the shard you want to subscribe to. To see a list of all the shards for a given stream, use ListShards.

type StartingPosition:

dict

param StartingPosition:

[REQUIRED]

  • Type (string) -- [REQUIRED]

    You can set the starting position to one of the following values:

    AT_SEQUENCE_NUMBER: Start streaming from the position denoted by the sequence number specified in the SequenceNumber field.

    AFTER_SEQUENCE_NUMBER: Start streaming right after the position denoted by the sequence number specified in the SequenceNumber field.

    AT_TIMESTAMP: Start streaming from the position denoted by the time stamp specified in the Timestamp field.

    TRIM_HORIZON: Start streaming at the last untrimmed record in the shard, which is the oldest data record in the shard.

    LATEST: Start streaming just after the most recent record in the shard, so that you always read the most recent data in the shard.

  • SequenceNumber (string) --

    The sequence number of the data record in the shard from which to start streaming. To specify a sequence number, set StartingPosition to AT_SEQUENCE_NUMBER or AFTER_SEQUENCE_NUMBER.

  • Timestamp (datetime) --

    The time stamp of the data record from which to start reading. To specify a time stamp, set the StartingPosition Type to AT_TIMESTAMP. A time stamp is the Unix epoch date with precision in milliseconds. For example, 2016-04-04T19:58:46.480-00:00 or 1459799926.480. If a record with this exact time stamp does not exist, records will be streamed from the next (later) record. If the time stamp is older than the current trim horizon, records will be streamed from the oldest untrimmed data record (TRIM_HORIZON).
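
The pairing of Type with SequenceNumber and Timestamp described above can be captured in a small builder. This is a sketch: the helper name `build_starting_position` is hypothetical, and rejecting naive datetimes is a defensive choice of this example rather than an API requirement.

```python
from datetime import datetime, timezone


def build_starting_position(position_type, sequence_number=None, timestamp=None):
    """Build a StartingPosition dict with only the field its Type needs."""
    if position_type in ("AT_SEQUENCE_NUMBER", "AFTER_SEQUENCE_NUMBER"):
        if sequence_number is None:
            raise ValueError(f"{position_type} requires a sequence number")
        return {"Type": position_type, "SequenceNumber": sequence_number}
    if position_type == "AT_TIMESTAMP":
        if timestamp is None:
            raise ValueError("AT_TIMESTAMP requires a timestamp")
        if timestamp.tzinfo is None:
            # Sketch-only rule: avoid ambiguity about the local time zone.
            raise ValueError("use a timezone-aware datetime")
        return {"Type": position_type, "Timestamp": timestamp}
    if position_type in ("TRIM_HORIZON", "LATEST"):
        return {"Type": position_type}
    raise ValueError(f"unknown StartingPosition type: {position_type!r}")


# The documented example instant, 2016-04-04T19:58:46.480 UTC.
EXAMPLE_TIMESTAMP = datetime(2016, 4, 4, 19, 58, 46, 480000, tzinfo=timezone.utc)
```

`EXAMPLE_TIMESTAMP.timestamp()` corresponds to the epoch form 1459799926.480 quoted above.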

rtype:

dict

returns:

The response of this operation contains an EventStream member. When iterated, the EventStream yields events based on the structure below, where only one of the top-level keys is present for any given event.

Response Syntax

{
    'EventStream': EventStream({
        'SubscribeToShardEvent': {
            'Records': [
                {
                    'SequenceNumber': 'string',
                    'ApproximateArrivalTimestamp': datetime(2015, 1, 1),
                    'Data': b'bytes',
                    'PartitionKey': 'string',
                    'EncryptionType': 'NONE'|'KMS'
                },
            ],
            'ContinuationSequenceNumber': 'string',
            'MillisBehindLatest': 123,
            'ChildShards': [
                {
                    'ShardId': 'string',
                    'ParentShards': [
                        'string',
                    ],
                    'HashKeyRange': {
                        'StartingHashKey': 'string',
                        'EndingHashKey': 'string'
                    }
                },
            ]
        },
        'ResourceNotFoundException': {
            'message': 'string'
        },
        'ResourceInUseException': {
            'message': 'string'
        },
        'KMSDisabledException': {
            'message': 'string'
        },
        'KMSInvalidStateException': {
            'message': 'string'
        },
        'KMSAccessDeniedException': {
            'message': 'string'
        },
        'KMSNotFoundException': {
            'message': 'string'
        },
        'KMSOptInRequired': {
            'message': 'string'
        },
        'KMSThrottlingException': {
            'message': 'string'
        },
        'InternalFailureException': {
            'message': 'string'
        }
    })
}
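
Since each event carries exactly one top-level key, a consumer can dispatch on that key. The sketch below handles the dictionary form shown above; depending on the SDK version, error events may instead surface as raised exceptions, which this example does not cover. The function name and the use of `RuntimeError` are illustrative assumptions.

```python
def consume_events(event_stream, handle_record):
    """Iterate an event stream; return the last ContinuationSequenceNumber seen.

    `event_stream` is any iterable of event dicts shaped like the response
    syntax above, for example response["EventStream"].
    """
    checkpoint = None
    for event in event_stream:
        if "SubscribeToShardEvent" in event:
            payload = event["SubscribeToShardEvent"]
            for record in payload.get("Records", []):
                handle_record(record)
            checkpoint = payload.get("ContinuationSequenceNumber", checkpoint)
        else:
            # Any other top-level key is one of the documented error members.
            name, detail = next(iter(event.items()))
            raise RuntimeError(f"{name}: {detail.get('message')}")
    return checkpoint
```

The returned value can be stored as a checkpoint, because it advances even when an event contains no records.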

Response Structure

  • (dict) --

    • EventStream (EventStream) --

      The event stream that your consumer can use to read records from the shard.

      • SubscribeToShardEvent (dict) --

        After you call SubscribeToShard, Kinesis Data Streams sends events of this type to your consumer. For an example of how to handle these events, see Enhanced Fan-Out Using the Kinesis Data Streams API.

        • Records (list) --

          • (dict) --

            The unit of data of the Kinesis data stream, which is composed of a sequence number, a partition key, and a data blob.

            • SequenceNumber (string) --

              The unique identifier of the record within its shard.

            • ApproximateArrivalTimestamp (datetime) --

              The approximate time that the record was inserted into the stream.

            • Data (bytes) --

              The data blob. The data in the blob is both opaque and immutable to Kinesis Data Streams, which does not inspect, interpret, or change the data in the blob in any way. When the data blob (the payload before base64-encoding) is added to the partition key size, the total size must not exceed the maximum record size (1 MiB).

            • PartitionKey (string) --

              Identifies which shard in the stream the data record is assigned to.

            • EncryptionType (string) --

              The encryption type used on the record. This parameter can be one of the following values:

              • NONE: Do not encrypt the records in the stream.

              • KMS: Use server-side encryption on the records in the stream using a customer-managed AWS KMS key.

        • ContinuationSequenceNumber (string) --

          Use this as SequenceNumber in the next call to SubscribeToShard, with StartingPosition set to AT_SEQUENCE_NUMBER or AFTER_SEQUENCE_NUMBER. Use ContinuationSequenceNumber for checkpointing because it captures your shard progress even when no data is written to the shard.

        • MillisBehindLatest (integer) --

          The number of milliseconds the read records are from the tip of the stream, indicating how far behind current time the consumer is. A value of zero indicates that record processing is caught up, and there are no new records to process at this moment.

        • ChildShards (list) --

          • (dict) --

            • ShardId (string) --

            • ParentShards (list) --

              • (string) --

            • HashKeyRange (dict) --

              The range of possible hash key values for the shard, which is a set of ordered contiguous positive integers.

              • StartingHashKey (string) --

                The starting hash key of the hash key range.

              • EndingHashKey (string) --

                The ending hash key of the hash key range.

      • ResourceNotFoundException (dict) --

        The requested resource could not be found. The stream might not be specified correctly.

        • message (string) --

          A message that provides information about the error.

      • ResourceInUseException (dict) --

        The resource is not available for this operation. For successful operation, the resource must be in the ACTIVE state.

        • message (string) --

          A message that provides information about the error.

      • KMSDisabledException (dict) --

        The request was rejected because the specified customer master key (CMK) isn't enabled.

        • message (string) --

          A message that provides information about the error.

      • KMSInvalidStateException (dict) --

        The request was rejected because the state of the specified resource isn't valid for this request. For more information, see How Key State Affects Use of a Customer Master Key in the AWS Key Management Service Developer Guide.

        • message (string) --

          A message that provides information about the error.

      • KMSAccessDeniedException (dict) --

        The ciphertext references a key that doesn't exist or that you don't have access to.

        • message (string) --

          A message that provides information about the error.

      • KMSNotFoundException (dict) --

        The request was rejected because the specified entity or resource can't be found.

        • message (string) --

          A message that provides information about the error.

      • KMSOptInRequired (dict) --

        The AWS access key ID needs a subscription for the service.

        • message (string) --

          A message that provides information about the error.

      • KMSThrottlingException (dict) --

        The request was denied due to request throttling. For more information about throttling, see Limits in the AWS Key Management Service Developer Guide.

        • message (string) --

          A message that provides information about the error.

      • InternalFailureException (dict) --

        The processing of the request failed because of an unknown error, exception, or failure.

        • message (string) --