AWS API Changes

2024/12/04 - AWSKendraFrontendService - 6 updated api methods

Changes: This release adds GenAI Index in Amazon Kendra for Retrieval Augmented Generation (RAG) and intelligent search. With the Kendra GenAI Index, customers get high retrieval accuracy powered by the latest information retrieval technologies and semantic models.

BatchDeleteDocument (updated)

Changes (response)

{'FailedDocuments': {'DataSourceId': 'string'}}

Removes one or more documents from an index. The documents must have been added with the BatchPutDocument API.

The documents are deleted asynchronously. You can see the progress of the deletion by using Amazon CloudWatch. Any error messages related to the processing of the batch are sent to your Amazon CloudWatch log. You can also use the BatchGetDocumentStatus API to monitor the progress of deleting your documents.

Deleting documents from an index using BatchDeleteDocument could take up to an hour or more, depending on the number of documents you want to delete.

See also: AWS API Documentation

Request Syntax

client.batch_delete_document(
    IndexId='string',
    DocumentIdList=[
        'string',
    ],
    DataSourceSyncJobMetricTarget={
        'DataSourceId': 'string',
        'DataSourceSyncJobId': 'string'
    }
)

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index that contains the documents to delete.

type DocumentIdList:

list

param DocumentIdList:

[REQUIRED]

One or more identifiers for documents to delete from the index.

(string) --

type DataSourceSyncJobMetricTarget:

dict

param DataSourceSyncJobMetricTarget:

Maps a particular data source sync job to a particular data source.

DataSourceId (string) -- [REQUIRED]

The ID of the data source that is running the sync job.
DataSourceSyncJobId (string) --

The ID of the sync job that is running on the data source.

If the ID of a sync job is not provided and there is a sync job running, then the ID of this sync job is used and metrics are generated for this sync job.

If the ID of a sync job is not provided and there is no sync job running, then no metrics are generated and documents are indexed/deleted at the index level without sync job metrics included.
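As a minimal sketch of how the optional sync job ID shapes the request, the helper below assembles the keyword arguments for `batch_delete_document`. The function name `build_delete_request` is hypothetical; only the parameter names are taken from the request syntax above.

```python
from typing import Any, Dict, List, Optional


def build_delete_request(
    index_id: str,
    document_ids: List[str],
    data_source_id: Optional[str] = None,
    sync_job_id: Optional[str] = None,
) -> Dict[str, Any]:
    """Build kwargs for client.batch_delete_document (hypothetical helper)."""
    if sync_job_id is not None and data_source_id is None:
        # DataSourceId is marked [REQUIRED] inside DataSourceSyncJobMetricTarget.
        raise ValueError("DataSourceId is required when a sync job ID is given")

    request: Dict[str, Any] = {
        "IndexId": index_id,
        "DocumentIdList": list(document_ids),
    }
    if data_source_id is not None:
        target = {"DataSourceId": data_source_id}
        if sync_job_id is not None:
            # Omitting this key leaves it to the service to pick a running sync job.
            target["DataSourceSyncJobId"] = sync_job_id
        request["DataSourceSyncJobMetricTarget"] = target
    return request
```

The result can be passed as `client.batch_delete_document(**request)`, where `client` is a Kendra client created by the caller.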

rtype:

dict

returns:

Response Syntax

{
    'FailedDocuments': [
        {
            'Id': 'string',
            'DataSourceId': 'string',
            'ErrorCode': 'InternalError'|'InvalidRequest',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

(dict) --
- FailedDocuments (list) --
  
  A list of documents that could not be removed from the index. Each entry contains an error message that indicates why the document couldn't be removed from the index.
  - (dict) --
    
    Provides information about documents that could not be removed from an index by the BatchDeleteDocument API.
    - Id (string) --
      
      The identifier of the document that couldn't be removed from the index.
    - DataSourceId (string) --
      
      The identifier of the data source connector that the document belongs to.
    - ErrorCode (string) --
      
      The error code for why the document couldn't be removed from the index.
    - ErrorMessage (string) --
      
      An explanation for why the document couldn't be removed from the index.
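The response structure above, including the new `DataSourceId` field, could be consumed along these lines. This is a sketch only: the helper name and the `"<none>"` placeholder are hypothetical, and it assumes `DataSourceId` may be missing for documents that do not belong to a data source connector.

```python
from collections import defaultdict
from typing import Any, Dict, List


def group_failures_by_data_source(response: Dict[str, Any]) -> Dict[str, List[str]]:
    """Group FailedDocuments entries by their DataSourceId."""
    grouped: Dict[str, List[str]] = defaultdict(list)
    for failure in response.get("FailedDocuments", []):
        # Assumption: the key can be absent, so fall back to a placeholder.
        source = failure.get("DataSourceId", "<none>")
        grouped[source].append(f"{failure['Id']}: {failure['ErrorCode']}")
    return dict(grouped)
```

Grouping by connector makes it easier to see whether failed deletions cluster around one data source.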

BatchGetDocumentStatus (updated)

Changes (response)

{'Errors': {'DataSourceId': 'string'}}

Returns the indexing status for one or more documents submitted with the BatchPutDocument API.

When you use the BatchPutDocument API, documents are indexed asynchronously. You can use the BatchGetDocumentStatus API to get the current status of a list of documents so that you can determine if they have been successfully indexed.

You can also use the BatchGetDocumentStatus API to check the status of the BatchDeleteDocument API. When a document is deleted from the index, Amazon Kendra returns NOT_FOUND as the status.
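One way to use the NOT_FOUND status to confirm a deletion is a simple polling loop, sketched below. The function name, the polling interval, and the poll limit are hypothetical choices, and `client` is assumed to be a Kendra client supplied by the caller. The sketch does not split large ID lists into smaller requests, which a real caller may need to do.

```python
import time
from typing import Any, Callable, List


def wait_until_deleted(
    client: Any,
    index_id: str,
    document_ids: List[str],
    poll_seconds: float = 30.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> List[str]:
    """Poll until every document reports NOT_FOUND; return IDs still pending."""
    pending = list(document_ids)
    for _ in range(max_polls):
        response = client.batch_get_document_status(
            IndexId=index_id,
            DocumentInfoList=[{"DocumentId": doc_id} for doc_id in pending],
        )
        deleted = {
            status["DocumentId"]
            for status in response.get("DocumentStatusList", [])
            if status.get("DocumentStatus") == "NOT_FOUND"
        }
        pending = [doc_id for doc_id in pending if doc_id not in deleted]
        if not pending:
            break
        sleep(poll_seconds)
    return pending
```

An empty return value means every requested document was reported as NOT_FOUND within the polling budget.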

See also: AWS API Documentation

Request Syntax

client.batch_get_document_status(
    IndexId='string',
    DocumentInfoList=[
        {
            'DocumentId': 'string',
            'Attributes': [
                {
                    'Key': 'string',
                    'Value': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
            ]
        },
    ]
)

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index that contains the documents to check. The index ID is returned by the CreateIndex API.

type DocumentInfoList:

list

param DocumentInfoList:

[REQUIRED]

A list of DocumentInfo objects that identify the documents for which to get the status. You identify the documents by their document ID and optional attributes.

(dict) --

Identifies a document for which to retrieve status information.
- DocumentId (string) -- [REQUIRED]
  
  The identifier of the document.
- Attributes (list) --
  
  Attributes that identify a specific version of a document to check.
  
  The only valid attributes are:
  - version
  - datasourceId
  - jobExecutionId
  The attributes follow these rules:
  - dataSourceId and jobExecutionId must be used together.
  - version is ignored if dataSourceId and jobExecutionId are not provided.
  - If dataSourceId and jobExecutionId are provided, but version is not, the version defaults to "0".
  - (dict) --
    
    A document attribute or metadata field. To create custom document attributes, see Custom attributes.
    - Key (string) -- [REQUIRED]
      
      The identifier for the attribute.
    - Value (dict) -- [REQUIRED]
      
      The value of the attribute.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings. The default maximum length or number of strings is 10.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
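The attribute rules above can be enforced on the client side before a request is sent. The following is a sketch under stated assumptions: the helper name is hypothetical, the key spelling follows the rules list (`dataSourceId`) although the list of valid attributes spells it `datasourceId`, and passing each value as `StringValue` is an assumption rather than something this section confirms.

```python
from typing import Any, Dict, Optional


def build_document_info(
    document_id: str,
    data_source_id: Optional[str] = None,
    job_execution_id: Optional[str] = None,
    version: Optional[str] = None,
) -> Dict[str, Any]:
    """Build one DocumentInfoList entry while applying the attribute rules."""
    # Rule: dataSourceId and jobExecutionId must be used together.
    if (data_source_id is None) != (job_execution_id is None):
        raise ValueError("dataSourceId and jobExecutionId must be used together")

    info: Dict[str, Any] = {"DocumentId": document_id}
    if data_source_id is None:
        # Rule: version is ignored without the other two, so it is not sent.
        return info

    info["Attributes"] = [
        {"Key": "dataSourceId", "Value": {"StringValue": data_source_id}},
        {"Key": "jobExecutionId", "Value": {"StringValue": job_execution_id}},
        # Rule: version defaults to "0"; the default is written out explicitly here.
        {"Key": "version", "Value": {"StringValue": version if version is not None else "0"}},
    ]
    return info
```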

rtype:

dict

returns:

Response Syntax

{
    'Errors': [
        {
            'DocumentId': 'string',
            'DataSourceId': 'string',
            'ErrorCode': 'InternalError'|'InvalidRequest',
            'ErrorMessage': 'string'
        },
    ],
    'DocumentStatusList': [
        {
            'DocumentId': 'string',
            'DocumentStatus': 'NOT_FOUND'|'PROCESSING'|'INDEXED'|'UPDATED'|'FAILED'|'UPDATE_FAILED',
            'FailureCode': 'string',
            'FailureReason': 'string'
        },
    ]
}

Response Structure

(dict) --
- Errors (list) --
  
  A list of documents that Amazon Kendra couldn't get the status for. The list includes the ID of the document and the reason that the status couldn't be found.
  - (dict) --
    
    Provides a response when the status of a document could not be retrieved.
    - DocumentId (string) --
      
      The identifier of the document whose status could not be retrieved.
    - DataSourceId (string) --
      
      The identifier of the data source connector that the failed document belongs to.
    - ErrorCode (string) --
      
      Indicates the source of the error.
    - ErrorMessage (string) --
      
      States that the API could not get the status of a document. This could be because the request is not valid or there is a system error.
- DocumentStatusList (list) --
  
  The status of documents. The status indicates if the document is waiting to be indexed, is in the process of indexing, has completed indexing, or failed indexing. If a document failed indexing, the status provides the reason why.
  - (dict) --
    
    Provides information about the status of documents submitted for indexing.
    - DocumentId (string) --
      
      The identifier of the document.
    - DocumentStatus (string) --
      
      The current status of a document.
      
      If the document was submitted for deletion, the status is NOT_FOUND after the document is deleted.
    - FailureCode (string) --
      
      Indicates the source of the error.
    - FailureReason (string) --
      
      Provides detailed information about why the document couldn't be indexed. Use this information to correct the error before you resubmit the document for indexing.
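A response with this structure might be condensed into a short report as sketched below. The function name and the shape of the returned dictionary are hypothetical; only the field names and status values come from the response syntax above.

```python
from collections import Counter
from typing import Any, Dict


def summarize_status(response: Dict[str, Any]) -> Dict[str, Any]:
    """Summarize a BatchGetDocumentStatus response."""
    statuses = response.get("DocumentStatusList", [])
    counts = Counter(item["DocumentStatus"] for item in statuses)
    # Documents that failed indexing, mapped to the reported reason.
    failures = {
        item["DocumentId"]: item.get("FailureReason", "")
        for item in statuses
        if item["DocumentStatus"] in ("FAILED", "UPDATE_FAILED")
    }
    # Documents whose status could not be retrieved at all.
    unresolved = {
        error["DocumentId"]: error.get("ErrorMessage", "")
        for error in response.get("Errors", [])
    }
    return {"counts": dict(counts), "failures": failures, "unresolved": unresolved}
```

Keeping `Errors` separate from `FAILED` statuses reflects the distinction in the response: the first means the status lookup itself failed, the second means indexing failed.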

BatchPutDocument (updated)

Changes (response)

{'FailedDocuments': {'DataSourceId': 'string'}}

Adds one or more documents to an index.

The BatchPutDocument API enables you to ingest inline documents or a set of documents stored in an Amazon S3 bucket. Use this API to ingest your text and unstructured text into an index, add custom attributes to the documents, and to attach an access control list to the documents added to the index.

The documents are indexed asynchronously. You can see the progress of the batch using Amazon CloudWatch. Any error messages related to processing the batch are sent to your Amazon CloudWatch log. You can also use the BatchGetDocumentStatus API to monitor the progress of indexing your documents.

For an example of ingesting inline documents using Python and Java SDKs, see Adding files directly to an index.

See also: AWS API Documentation

Request Syntax

client.batch_put_document(
    IndexId='string',
    RoleArn='string',
    Documents=[
        {
            'Id': 'string',
            'Title': 'string',
            'Blob': b'bytes',
            'S3Path': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Attributes': [
                {
                    'Key': 'string',
                    'Value': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
            ],
            'AccessControlList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ],
            'HierarchicalAccessControlList': [
                {
                    'PrincipalList': [
                        {
                            'Name': 'string',
                            'Type': 'USER'|'GROUP',
                            'Access': 'ALLOW'|'DENY',
                            'DataSourceId': 'string'
                        },
                    ]
                },
            ],
            'ContentType': 'PDF'|'HTML'|'MS_WORD'|'PLAIN_TEXT'|'PPT'|'RTF'|'XML'|'XSLT'|'MS_EXCEL'|'CSV'|'JSON'|'MD',
            'AccessControlConfigurationId': 'string'
        },
    ],
    CustomDocumentEnrichmentConfiguration={
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
)

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index to add the documents to. You need to create the index first using the CreateIndex API.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of an IAM role with permission to access your S3 bucket. For more information, see IAM access roles for Amazon Kendra.

type Documents:

list

param Documents:

[REQUIRED]

One or more documents to add to the index.

Documents have the following file size limits.

50 MB total size for any file
5 MB extracted text for any file

For more information, see Quotas.

(dict) --

A document in an index.
- Id (string) -- [REQUIRED]
  
  An identifier of the document in the index.
  
  Note, each document ID must be unique per index. You cannot create a data source to index your documents with their unique IDs and then use the BatchPutDocument API to index the same documents, or vice versa. You can delete a data source and then use the BatchPutDocument API to index the same documents, or vice versa.
- Title (string) --
  
  The title of the document.
- Blob (bytes) --
  
  The contents of the document.
  
  Documents passed to the Blob parameter must be base64 encoded. Your code might not need to encode the document file bytes if you're using an Amazon Web Services SDK to call Amazon Kendra APIs. If you are calling the Amazon Kendra endpoint directly using REST, you must base64 encode the contents before sending.
- S3Path (dict) --
  
  Information required to find a specific file in an Amazon S3 bucket.
  - Bucket (string) -- [REQUIRED]
    
    The name of the S3 bucket that contains the file.
  - Key (string) -- [REQUIRED]
    
    The name of the file.
- Attributes (list) --
  
  Custom attributes to apply to the document. Use the custom attributes to provide additional information for searching, to provide facets for refining searches, and to provide additional information in the query response.
  
  For example, 'DataSourceId' and 'DataSourceSyncJobId' are custom attributes that provide information on the synchronization of documents running on a data source. Note, 'DataSourceSyncJobId' could be an optional custom attribute as Amazon Kendra will use the ID of a running sync job.
  - (dict) --
    
    A document attribute or metadata field. To create custom document attributes, see Custom attributes.
    - Key (string) -- [REQUIRED]
      
      The identifier for the attribute.
    - Value (dict) -- [REQUIRED]
      
      The value of the attribute.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings. The default maximum length or number of strings is 10.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- AccessControlList (list) --
  
  Information on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.
  - (dict) --
    
    Provides user and group information for user context filtering.
    - Name (string) -- [REQUIRED]
      
      The name of the user or group.
    - Type (string) -- [REQUIRED]
      
      The type of principal.
    - Access (string) -- [REQUIRED]
      
      Whether to allow or deny document access to the principal.
    - DataSourceId (string) --
      
      The identifier of the data source the principal should access documents from.
- HierarchicalAccessControlList (list) --
  
  The list of principal lists that define the hierarchy for which documents users should have access to.
  - (dict) --
    
    Information to define the hierarchy for which documents users should have access to.
    - PrincipalList (list) -- [REQUIRED]
      
      A list of principal lists that define the hierarchy for which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.
      - (dict) --
        
        Provides user and group information for user context filtering.
        
        - Name (string) -- [REQUIRED]
          
          The name of the user or group.
        - Type (string) -- [REQUIRED]
          
          The type of principal.
        - Access (string) -- [REQUIRED]
          
          Whether to allow or deny document access to the principal.
        - DataSourceId (string) --
          
          The identifier of the data source the principal should access documents from.
- ContentType (string) --
  
  The file type of the document in the Blob field.
  
  If you want to index snippets or subsets of HTML documents instead of the entirety of the HTML documents, you must add the HTML start and closing tags (<HTML>content</HTML>) around the content.
- AccessControlConfigurationId (string) --
  
  The identifier of the access control configuration that you want to apply to the document.
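The document structure above might be assembled as follows. This is a minimal sketch: the helper names and the `published_at` attribute key are hypothetical, the 50 MB limit is interpreted as 50 * 1024 * 1024 bytes (an assumption), and raw bytes are passed to `Blob` on the assumption that the SDK handles base64 encoding, as the `Blob` description suggests.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Assumption: "50 MB" is read as 50 MiB for this client-side check.
MAX_FILE_BYTES = 50 * 1024 * 1024


def inline_document(
    doc_id: str,
    title: str,
    content: bytes,
    content_type: str = "PLAIN_TEXT",
    published: Optional[datetime] = None,
) -> Dict[str, Any]:
    """Build a Documents entry that carries its content inline in Blob."""
    if len(content) > MAX_FILE_BYTES:
        raise ValueError("document exceeds the 50 MB file size limit")
    document: Dict[str, Any] = {
        "Id": doc_id,
        "Title": title,
        "Blob": content,  # raw bytes; no manual base64 step in this sketch
        "ContentType": content_type,
    }
    if published is not None:
        if published.tzinfo is None:
            # DateValue should include a time zone, per the description above.
            raise ValueError("DateValue must be timezone-aware")
        # "published_at" is a hypothetical custom attribute key.
        document["Attributes"] = [
            {"Key": "published_at", "Value": {"DateValue": published}}
        ]
    return document


def s3_document(doc_id: str, bucket: str, key: str) -> Dict[str, Any]:
    """Build a Documents entry that points at a file in Amazon S3."""
    return {"Id": doc_id, "S3Path": {"Bucket": bucket, "Key": key}}
```

A timezone-aware value such as `datetime(2012, 3, 25, 12, 30, 10, tzinfo=timezone(timedelta(hours=1)))` corresponds to the ISO 8601 example given for `DateValue`.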

type CustomDocumentEnrichmentConfiguration:

dict

param CustomDocumentEnrichmentConfiguration:

Configuration information for altering your document metadata and content during the document ingestion process when you use the BatchPutDocument API.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

InlineConfigurations (list) --

Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.
- (dict) --
  
  Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic allows, see HookConfiguration.
  
  For more information, see Customizing document metadata during the ingestion process.
  - Condition (dict) --
    
    Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.
    - ConditionDocumentAttributeKey (string) -- [REQUIRED]
      
      The identifier of the document attribute used for the condition.
      
      For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
      
      Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
    - Operator (string) -- [REQUIRED]
      
      The condition operator.
      
      For example, you can use 'Contains' to partially match a string.
    - ConditionOnValue (dict) --
      
      The value used by the operator.
      
      For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings. The default maximum length or number of strings is 10.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
  - Target (dict) --
    
    Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.
    - TargetDocumentAttributeKey (string) --
      
      The identifier of the target document attribute or metadata field.
      
      For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.
    - TargetDocumentAttributeValueDeletion (boolean) --
      
      TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.
    - TargetDocumentAttributeValue (dict) --
      
      The target value you want to create for the target attribute.
      
      For example, 'Finance' could be the target value for the target attribute key 'Department'.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings. The default maximum length or number of strings is 10.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
  - DocumentContentDeletion (boolean) --
    
    TRUE to delete content if the condition used for the target attribute is met.
PreExtractionHookConfiguration (dict) --

Configuration information for invoking a Lambda function on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
- InvocationCondition (dict) --
  
  The condition used for when a Lambda function should be invoked.
  
  For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
  - ConditionDocumentAttributeKey (string) -- [REQUIRED]
    
    The identifier of the document attribute used for the condition.
    
    For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
    
    Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
  - Operator (string) -- [REQUIRED]
    
    The condition operator.
    
    For example, you can use 'Contains' to partially match a string.
  - ConditionOnValue (dict) --
    
    The value used by the operator.
    
    For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
    - StringValue (string) --
      
      A string, such as "department".
    - StringListValue (list) --
      
      A list of strings. The default maximum length or number of strings is 10.
      - (string) --
    - LongValue (integer) --
      
      A long integer value.
    - DateValue (datetime) --
      
      A date expressed as an ISO 8601 string.
      
      It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- LambdaArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of an IAM role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.
- S3Bucket (string) -- [REQUIRED]
  
  Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.
PostExtractionHookConfiguration (dict) --

Configuration information for invoking a Lambda function on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
- InvocationCondition (dict) --
  
  The condition used for when a Lambda function should be invoked.
  
  For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
  - ConditionDocumentAttributeKey (string) -- [REQUIRED]
    
    The identifier of the document attribute used for the condition.
    
    For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
    
    Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
  - Operator (string) -- [REQUIRED]
    
    The condition operator.
    
    For example, you can use 'Contains' to partially match a string.
  - ConditionOnValue (dict) --
    
    The value used by the operator.
    
    For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
    - StringValue (string) --
      
      A string, such as "department".
    - StringListValue (list) --
      
      A list of strings. The default maximum length or number of strings is 10.
      - (string) --
    - LongValue (integer) --
      
      A long integer value.
    - DateValue (datetime) --
      
      A date expressed as an ISO 8601 string.
      
      It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- LambdaArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of an IAM role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.
- S3Bucket (string) -- [REQUIRED]
  
  Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.
RoleArn (string) --

The Amazon Resource Name (ARN) of an IAM role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
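The inline configuration described above can be illustrated with the documentation's own example values: a `Source_URI` that contains 'financial' sets `Department` to 'Finance'. The helper name `inline_rule` is hypothetical, and the sketch only handles string values.

```python
from typing import Any, Dict, Optional

# Operators listed in the request syntax above.
OPERATORS = {
    "GreaterThan", "GreaterThanOrEquals", "LessThan", "LessThanOrEquals",
    "Equals", "NotEquals", "Contains", "NotContains", "Exists",
    "NotExists", "BeginsWith",
}


def inline_rule(
    condition_key: str,
    operator: str,
    condition_value: str,
    target_key: str,
    target_value: Optional[str] = None,
    delete_target_value: bool = False,
) -> Dict[str, Any]:
    """Build one InlineConfigurations entry for string-valued attributes."""
    if condition_key == "_document_body":
        raise ValueError("_document_body is not supported as a condition key")
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    if delete_target_value and target_value is not None:
        # A target value cannot be created while deletion is set to TRUE.
        raise ValueError("cannot set a target value and delete it at once")

    target: Dict[str, Any] = {
        "TargetDocumentAttributeKey": target_key,
        "TargetDocumentAttributeValueDeletion": delete_target_value,
    }
    if target_value is not None:
        target["TargetDocumentAttributeValue"] = {"StringValue": target_value}
    return {
        "Condition": {
            "ConditionDocumentAttributeKey": condition_key,
            "Operator": operator,
            "ConditionOnValue": {"StringValue": condition_value},
        },
        "Target": target,
        "DocumentContentDeletion": False,
    }


enrichment_config = {
    "InlineConfigurations": [
        inline_rule("Source_URI", "Contains", "financial", "Department", "Finance")
    ]
}
```

The resulting dictionary would be passed as the `CustomDocumentEnrichmentConfiguration` argument of `batch_put_document`.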

rtype:

dict

returns:

Response Syntax

{
    'FailedDocuments': [
        {
            'Id': 'string',
            'DataSourceId': 'string',
            'ErrorCode': 'InternalError'|'InvalidRequest',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

(dict) --
- FailedDocuments (list) --
  
  A list of documents that were not added to the index because the document failed a validation check. Each document contains an error message that indicates why the document couldn't be added to the index.
  
  If there was an error adding a document to an index, the error is reported in your Amazon CloudWatch log. For more information, see Monitoring Amazon Kendra with Amazon CloudWatch logs.
  - (dict) --
    
    Provides information about a document that could not be indexed.
    - Id (string) --
      
      The identifier of the document.
    - DataSourceId (string) --
      
      The identifier of the data source connector that the failed document belongs to.
    - ErrorCode (string) --
      
      The type of error that caused the document to fail to be indexed.
    - ErrorMessage (string) --
      
      A description of the reason why the document could not be indexed.
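A caller might separate the failed documents by error code, as in the sketch below. The function name is hypothetical, `client` is assumed to be a Kendra client supplied by the caller, and treating `InternalError` as worth retrying while treating `InvalidRequest` as needing a fix is a design assumption, not documented behavior.

```python
from typing import Any, Dict, List, Optional, Tuple


def put_and_split_failures(
    client: Any,
    index_id: str,
    documents: List[Dict[str, Any]],
    role_arn: Optional[str] = None,
) -> Tuple[List[str], List[str]]:
    """Submit documents; return (retry_candidates, invalid) document IDs."""
    kwargs: Dict[str, Any] = {"IndexId": index_id, "Documents": documents}
    if role_arn is not None:
        kwargs["RoleArn"] = role_arn
    response = client.batch_put_document(**kwargs)

    retry_candidates: List[str] = []
    invalid: List[str] = []
    for failure in response.get("FailedDocuments", []):
        # Assumption: InternalError is transient, InvalidRequest is not.
        if failure.get("ErrorCode") == "InternalError":
            retry_candidates.append(failure["Id"])
        else:
            invalid.append(failure["Id"])
    return retry_candidates, invalid
```

Documents that pass this validation step are still indexed asynchronously, so a later status check remains necessary.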

CreateIndex (updated)

Changes (request)

{'Edition': {'GEN_AI_ENTERPRISE_EDITION'}}

Creates an Amazon Kendra index. Index creation is an asynchronous API. To determine if index creation has completed, check the Status field returned from a call to DescribeIndex. The Status field is set to ACTIVE when the index is ready to use.

Once the index is active, you can index your documents using the BatchPutDocument API or using one of the supported data sources.
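Because index creation is asynchronous, a caller typically polls DescribeIndex before indexing documents. The loop below is a sketch: the function name, interval, and poll limit are hypothetical, `client` is assumed to be a Kendra client, and stopping early on a `FAILED` status is an assumption about a status value not listed in this section.

```python
import time
from typing import Any, Callable


def wait_for_index_active(
    client: Any,
    index_id: str,
    poll_seconds: float = 60.0,
    max_polls: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll describe_index and return the last Status seen."""
    status = "UNKNOWN"
    for _ in range(max_polls):
        status = client.describe_index(Id=index_id)["Status"]
        # Assumption: FAILED is terminal, so there is no point in waiting further.
        if status in ("ACTIVE", "FAILED"):
            return status
        sleep(poll_seconds)
    return status
```

A return value of `"ACTIVE"` indicates the index is ready for BatchPutDocument calls or data source syncs.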

For an example of creating an index and data source using the Python SDK, see Getting started with Python SDK. For an example of creating an index and data source using the Java SDK, see Getting started with Java SDK.

See also: AWS API Documentation

Request Syntax

client.create_index(
    Name='string',
    Edition='DEVELOPER_EDITION'|'ENTERPRISE_EDITION'|'GEN_AI_ENTERPRISE_EDITION',
    RoleArn='string',
    ServerSideEncryptionConfiguration={
        'KmsKeyId': 'string'
    },
    Description='string',
    ClientToken='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ],
    UserTokenConfigurations=[
        {
            'JwtTokenTypeConfiguration': {
                'KeyLocation': 'URL'|'SECRET_MANAGER',
                'URL': 'string',
                'SecretManagerArn': 'string',
                'UserNameAttributeField': 'string',
                'GroupAttributeField': 'string',
                'Issuer': 'string',
                'ClaimRegex': 'string'
            },
            'JsonTokenTypeConfiguration': {
                'UserNameAttributeField': 'string',
                'GroupAttributeField': 'string'
            }
        },
    ],
    UserContextPolicy='ATTRIBUTE_FILTER'|'USER_TOKEN',
    UserGroupResolutionConfiguration={
        'UserGroupResolutionMode': 'AWS_SSO'|'NONE'
    }
)

type Name:

string

param Name:

[REQUIRED]

A name for the index.

type Edition:

string

param Edition:

The Amazon Kendra edition to use for the index. Choose DEVELOPER_EDITION for indexes intended for development, testing, or proof of concept. Use ENTERPRISE_EDITION for production. Use GEN_AI_ENTERPRISE_EDITION for creating generative AI applications. Once you set the edition for an index, it can't be changed.

The Edition parameter is optional. If you don't supply a value, the default is ENTERPRISE_EDITION.

For more information on quota limits for Gen AI Enterprise Edition, Enterprise Edition, and Developer Edition indices, see Quotas.

type RoleArn:

string

param RoleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of an IAM role with permission to access your Amazon CloudWatch logs and metrics. For more information, see IAM access roles for Amazon Kendra.

type ServerSideEncryptionConfiguration:

dict

param ServerSideEncryptionConfiguration:

The identifier of the KMS customer managed key (CMK) that's used to encrypt data indexed by Amazon Kendra. Amazon Kendra doesn't support asymmetric CMKs.

KmsKeyId (string) --

The identifier of the KMS key. Amazon Kendra doesn't support asymmetric keys.

type Description:

string

param Description:

A description for the index.

type ClientToken:

string

param ClientToken:

A token that you provide to identify the request to create an index. Multiple calls to the CreateIndex API with the same client token will create only one index.

This field is autopopulated if not provided.
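
Since repeated CreateIndex calls with the same client token create only one index, a retry loop can generate the token once and reuse it on every attempt. The sketch below illustrates that idea under stated assumptions: the function name, the attempt count, and the set of exceptions treated as retryable (retry_on) are all hypothetical and should be adapted to the errors your client actually raises.

```python
import uuid


def create_index_with_retries(client, name, role_arn, attempts=3,
                              retry_on=(ConnectionError, TimeoutError), **extra):
    """Call CreateIndex, reusing one ClientToken so retries cannot create duplicates."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    token = str(uuid.uuid4())  # generated once, sent with every attempt
    last_error = None
    for _ in range(attempts):
        try:
            response = client.create_index(
                Name=name, RoleArn=role_arn, ClientToken=token, **extra
            )
            return response["Id"]
        except retry_on as error:
            # Assumed to be transient; the same token is sent again.
            last_error = error
    raise last_error
```

Additional keyword arguments, for example Edition="GEN_AI_ENTERPRISE_EDITION", are passed through unchanged.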

type Tags:

list

param Tags:

A list of key-value pairs that identify or categorize the index. You can also use tags to help control access to the index. Tag keys and values can consist of Unicode letters, digits, white space, and any of the following symbols: _ . : / = + - @.

(dict) --

A key-value pair that identifies or categorizes an index, FAQ, data source, or other resource. A tag key and value can consist of Unicode letters, digits, white space, and any of the following symbols: _ . : / = + - @.
- Key (string) -- [REQUIRED]
  
  The key for the tag. Keys are not case sensitive and must be unique for the index, FAQ, data source, or other resource.
- Value (string) -- [REQUIRED]
  
  The value associated with the tag. The value may be an empty string but it can't be null.
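
The tag rules above (allowed characters, case-insensitive unique keys, non-null values) can be checked locally before sending a request. This is a rough sketch, not the service's own validation: the function names are hypothetical, Python's str.isalpha and str.isdigit are used as an approximation of "Unicode letters and digits", and length limits are not checked.

```python
ALLOWED_TAG_SYMBOLS = set("_.:/=+-@")


def is_valid_tag_text(text):
    """True if every character is a letter, digit, white space, or allowed symbol."""
    return all(
        ch.isalpha() or ch.isdigit() or ch.isspace() or ch in ALLOWED_TAG_SYMBOLS
        for ch in text
    )


def validate_tags(tags):
    """Return a list of problems found; an empty list means none were detected."""
    problems = []
    seen_keys = set()
    for tag in tags:
        key, value = tag.get("Key"), tag.get("Value")
        if not isinstance(key, str) or not is_valid_tag_text(key):
            problems.append(f"invalid key: {key!r}")
            continue
        # Keys are not case sensitive, so compare them lowercased.
        if key.lower() in seen_keys:
            problems.append(f"duplicate key (case-insensitive): {key!r}")
        seen_keys.add(key.lower())
        if value is None:
            problems.append(f"value for {key!r} is null")
        elif not isinstance(value, str) or not is_valid_tag_text(value):
            problems.append(f"invalid value for {key!r}: {value!r}")
    return problems
```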

type UserTokenConfigurations:

list

param UserTokenConfigurations:

The user token configuration.

(dict) --

Provides the configuration information for a token.

Warning

If you're using an Amazon Kendra Gen AI Enterprise Edition index and you try to use UserTokenConfigurations to configure user context policy, Amazon Kendra returns a ValidationException error.
- JwtTokenTypeConfiguration (dict) --
  
  Information about the JWT token type configuration.
  - KeyLocation (string) -- [REQUIRED]
    
    The location of the key.
  - URL (string) --
    
    The signing key URL.
  - SecretManagerArn (string) --
    
    The Amazon Resource Name (ARN) of the secret.
  - UserNameAttributeField (string) --
    
    The user name attribute field.
  - GroupAttributeField (string) --
    
    The group attribute field.
  - Issuer (string) --
    
    The issuer of the token.
  - ClaimRegex (string) --
    
    The regular expression that identifies the claim.
- JsonTokenTypeConfiguration (dict) --
  
  Information about the JSON token type configuration.
  - UserNameAttributeField (string) -- [REQUIRED]
    
    The user name attribute field.
  - GroupAttributeField (string) -- [REQUIRED]
    
    The group attribute field.
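
The warning and the [REQUIRED] markers above can be turned into a local pre-flight check on the arguments intended for create_index. The sketch below is an illustration only; the function and constant names are hypothetical, and the service remains the authority on what it accepts.

```python
GEN_AI_EDITION = "GEN_AI_ENTERPRISE_EDITION"
DEFAULT_EDITION = "ENTERPRISE_EDITION"  # documented default when Edition is omitted

# Fields marked [REQUIRED] in the parameter description above.
REQUIRED_FIELDS = {
    "JwtTokenTypeConfiguration": ("KeyLocation",),
    "JsonTokenTypeConfiguration": ("UserNameAttributeField", "GroupAttributeField"),
}


def preflight_user_tokens(create_index_kwargs):
    """Return a list of problems found in the UserTokenConfigurations argument."""
    problems = []
    configs = create_index_kwargs.get("UserTokenConfigurations") or []
    edition = create_index_kwargs.get("Edition", DEFAULT_EDITION)
    if configs and edition == GEN_AI_EDITION:
        problems.append(
            "UserTokenConfigurations is rejected for GEN_AI_ENTERPRISE_EDITION indexes"
        )
    for position, config in enumerate(configs):
        for section, fields in REQUIRED_FIELDS.items():
            if section not in config:
                continue
            for field in fields:
                if field not in config[section]:
                    problems.append(
                        f"UserTokenConfigurations[{position}].{section}.{field} is required"
                    )
    return problems
```

Calling preflight_user_tokens(kwargs) before client.create_index(**kwargs) surfaces these cases without a round trip to the service.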

type UserContextPolicy:

string

param UserContextPolicy:

The user context policy.

ATTRIBUTE_FILTER

All indexed content is searchable and displayable for all users. If you want to filter search results on user context, you can use the attribute filters of _user_id and _group_ids or you can provide user and group information in UserContext.

USER_TOKEN

Enables token-based user access control to filter search results on user context. All documents with no access control and all documents accessible to the user will be searchable and displayable.

type UserGroupResolutionConfiguration:

dict

param UserGroupResolutionConfiguration:

Gets users and groups from an IAM Identity Center identity source. To configure this, see UserGroupResolutionConfiguration. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

UserGroupResolutionMode (string) -- [REQUIRED]

The identity store provider (mode) you want to use to get users and groups. IAM Identity Center is currently the only available mode. Your users and groups must exist in an IAM Identity Center identity source in order to use this mode.

rtype:

dict

returns:

Response Syntax

{
    'Id': 'string'
}

Response Structure

(dict) --
- Id (string) --
  
  The identifier of the index. Use this identifier when you query an index, set up a data source, or index a document.

DescribeIndex (updated)

Changes (response)

{'Edition': {'GEN_AI_ENTERPRISE_EDITION'}}

Gets information about an Amazon Kendra index.

See also: AWS API Documentation

Request Syntax

client.describe_index(
    Id='string'
)

type Id:

string

param Id:

[REQUIRED]

The identifier of the index you want to get information on.

rtype:

dict

returns:

Response Syntax

{
    'Name': 'string',
    'Id': 'string',
    'Edition': 'DEVELOPER_EDITION'|'ENTERPRISE_EDITION'|'GEN_AI_ENTERPRISE_EDITION',
    'RoleArn': 'string',
    'ServerSideEncryptionConfiguration': {
        'KmsKeyId': 'string'
    },
    'Status': 'CREATING'|'ACTIVE'|'DELETING'|'FAILED'|'UPDATING'|'SYSTEM_UPDATING',
    'Description': 'string',
    'CreatedAt': datetime(2015, 1, 1),
    'UpdatedAt': datetime(2015, 1, 1),
    'DocumentMetadataConfigurations': [
        {
            'Name': 'string',
            'Type': 'STRING_VALUE'|'STRING_LIST_VALUE'|'LONG_VALUE'|'DATE_VALUE',
            'Relevance': {
                'Freshness': True|False,
                'Importance': 123,
                'Duration': 'string',
                'RankOrder': 'ASCENDING'|'DESCENDING',
                'ValueImportanceMap': {
                    'string': 123
                }
            },
            'Search': {
                'Facetable': True|False,
                'Searchable': True|False,
                'Displayable': True|False,
                'Sortable': True|False
            }
        },
    ],
    'IndexStatistics': {
        'FaqStatistics': {
            'IndexedQuestionAnswersCount': 123
        },
        'TextDocumentStatistics': {
            'IndexedTextDocumentsCount': 123,
            'IndexedTextBytes': 123
        }
    },
    'ErrorMessage': 'string',
    'CapacityUnits': {
        'StorageCapacityUnits': 123,
        'QueryCapacityUnits': 123
    },
    'UserTokenConfigurations': [
        {
            'JwtTokenTypeConfiguration': {
                'KeyLocation': 'URL'|'SECRET_MANAGER',
                'URL': 'string',
                'SecretManagerArn': 'string',
                'UserNameAttributeField': 'string',
                'GroupAttributeField': 'string',
                'Issuer': 'string',
                'ClaimRegex': 'string'
            },
            'JsonTokenTypeConfiguration': {
                'UserNameAttributeField': 'string',
                'GroupAttributeField': 'string'
            }
        },
    ],
    'UserContextPolicy': 'ATTRIBUTE_FILTER'|'USER_TOKEN',
    'UserGroupResolutionConfiguration': {
        'UserGroupResolutionMode': 'AWS_SSO'|'NONE'
    }
}

Response Structure

(dict) --
- Name (string) --
  
  The name of the index.
- Id (string) --
  
  The identifier of the index.
- Edition (string) --
  
  The Amazon Kendra edition used for the index. You decide the edition when you create the index.
- RoleArn (string) --
  
  The Amazon Resource Name (ARN) of the IAM role that gives Amazon Kendra permission to write to your Amazon CloudWatch logs.
- ServerSideEncryptionConfiguration (dict) --
  
  The identifier of the KMS customer managed key (CMK) that is used to encrypt your data. Amazon Kendra doesn't support asymmetric CMKs.
  - KmsKeyId (string) --
    
    The identifier of the KMS key. Amazon Kendra doesn't support asymmetric keys.
- Status (string) --
  
  The current status of the index. When the value is ACTIVE, the index is ready for use. If the Status field value is FAILED, the ErrorMessage field contains a message that explains why.
- Description (string) --
  
  The description for the index.
- CreatedAt (datetime) --
  
  The Unix timestamp when the index was created.
- UpdatedAt (datetime) --
  
  The Unix timestamp when the index was last updated.
- DocumentMetadataConfigurations (list) --
  
  Configuration information for document metadata or fields. Document metadata are fields or attributes associated with your documents, for example the company department name associated with each document.
  - (dict) --
    
    Specifies the properties, such as relevance tuning and searchability, of an index field.
    - Name (string) --
      
      The name of the index field.
    - Type (string) --
      
      The data type of the index field.
    - Relevance (dict) --
      
      Provides tuning parameters to determine how the field affects the search results.
      - Freshness (boolean) --
        
        Indicates that this field determines how "fresh" a document is. For example, if document 1 was created on November 5, and document 2 was created on October 31, document 1 is "fresher" than document 2. Only applies to DATE fields.
      - Importance (integer) --
        
        The relative importance of the field in the search. Larger numbers provide more of a boost than smaller numbers.
      - Duration (string) --
        
        Specifies the time period that the boost applies to. For example, to make the boost apply to documents with the field value within the last month, you would use "2628000s". Once the field value is beyond the specified range, the effect of the boost drops off. The higher the importance, the faster the effect drops off. If you don't specify a value, the default is 3 months. The value of the field is a numeric string followed by the character "s", for example "86400s" for one day, or "604800s" for one week.
        
        Only applies to DATE fields.
      - RankOrder (string) --
        
        Determines how values should be interpreted.
        
        When the RankOrder field is ASCENDING, higher numbers are better. For example, a document with a rating score of 10 is higher ranking than a document with a rating score of 1.
        
        When the RankOrder field is DESCENDING, lower numbers are better. For example, in a task tracking application, a priority 1 task is more important than a priority 5 task.
        
        Only applies to LONG fields.
      - ValueImportanceMap (dict) --
        
        A list of values that should be given a different boost when they appear in the result list. For example, if you are boosting a field called "department", query terms that match the department field are boosted in the result. However, you can add entries from the department field to boost documents with those values higher.
        
        For example, you can add entries to the map with names of departments. If you add "HR",5 and "Legal",3 those departments are given special attention when they appear in the metadata of a document. When those terms appear they are given the specified importance instead of the regular importance for the boost.
        
        (string) --
        
        (integer) --
    - Search (dict) --
      
      Provides information about how the field is used during a search.
      - Facetable (boolean) --
        
        Indicates that the field can be used to create search facets, a count of results for each value in the field. The default is false.
      - Searchable (boolean) --
        
        Determines whether the field is used in the search. If the Searchable field is true, you can use relevance tuning to manually tune how Amazon Kendra weights the field in the search. The default is true for string fields and false for number and date fields.
      - Displayable (boolean) --
        
        Determines whether the field is returned in the query response. The default is true.
      - Sortable (boolean) --
        
        Determines whether the field can be used to sort the results of a query. If you specify sorting on a field that does not have Sortable set to true, Amazon Kendra returns an exception. The default is false.
- IndexStatistics (dict) --
  
  Provides information about the number of FAQ questions and answers and the number of text documents indexed.
  - FaqStatistics (dict) --
    
    The number of question and answer topics in the index.
    - IndexedQuestionAnswersCount (integer) --
      
      The total number of FAQ questions and answers for an index.
  - TextDocumentStatistics (dict) --
    
    The number of text documents indexed.
    - IndexedTextDocumentsCount (integer) --
      
      The number of text documents indexed.
    - IndexedTextBytes (integer) --
      
      The total size, in bytes, of the indexed documents.
- ErrorMessage (string) --
  
  When the Status field value is FAILED, the ErrorMessage field contains a message that explains why.
- CapacityUnits (dict) --
  
  For Enterprise Edition indexes, you can choose to use additional capacity to meet the needs of your application. This contains the capacity units used for the index. A query or document storage capacity of zero indicates that the index is using the default capacity. For more information on the default capacity for an index and adjusting this, see Adjusting capacity.
  - StorageCapacityUnits (integer) --
    
    The amount of extra storage capacity for an index. A single capacity unit provides 30 GB of storage space or 100,000 documents, whichever is reached first. You can add up to 100 extra capacity units.
  - QueryCapacityUnits (integer) --
    
    The amount of extra query capacity for an index and GetQuerySuggestions capacity.
    
    A single extra capacity unit for an index provides 0.1 queries per second or approximately 8,000 queries per day. You can add up to 100 extra capacity units.
    
    GetQuerySuggestions capacity is five times the provisioned query capacity for an index, or the base capacity of 2.5 calls per second, whichever is higher. For example, the base capacity for an index is 0.1 queries per second, and GetQuerySuggestions capacity has a base of 2.5 calls per second. If you add another 0.1 queries per second to total 0.2 queries per second for an index, the GetQuerySuggestions capacity is 2.5 calls per second (higher than five times 0.2 queries per second).
- UserTokenConfigurations (list) --
  
  The user token configuration for the Amazon Kendra index.
  - (dict) --
    
    Provides the configuration information for a token.
    
    Warning
    
    If you're using an Amazon Kendra Gen AI Enterprise Edition index and you try to use UserTokenConfigurations to configure user context policy, Amazon Kendra returns a ValidationException error.
    - JwtTokenTypeConfiguration (dict) --
      
      Information about the JWT token type configuration.
      - KeyLocation (string) --
        
        The location of the key.
      - URL (string) --
        
        The signing key URL.
      - SecretManagerArn (string) --
        
        The Amazon Resource Name (ARN) of the secret.
      - UserNameAttributeField (string) --
        
        The user name attribute field.
      - GroupAttributeField (string) --
        
        The group attribute field.
      - Issuer (string) --
        
        The issuer of the token.
      - ClaimRegex (string) --
        
        The regular expression that identifies the claim.
    - JsonTokenTypeConfiguration (dict) --
      
      Information about the JSON token type configuration.
      - UserNameAttributeField (string) --
        
        The user name attribute field.
      - GroupAttributeField (string) --
        
        The group attribute field.
- UserContextPolicy (string) --
  
  The user context policy for the Amazon Kendra index.
- UserGroupResolutionConfiguration (dict) --
  
  Whether you have enabled an IAM Identity Center identity source for your users and groups. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.
  - UserGroupResolutionMode (string) --
    
    The identity store provider (mode) you want to use to get users and groups. IAM Identity Center is currently the only available mode. Your users and groups must exist in an IAM Identity Center identity source in order to use this mode.
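
The arithmetic described for Duration and CapacityUnits in the response structure above can be worked through in a few lines. This is a sketch under stated assumptions: the helper names are hypothetical, and the base query capacity of 0.1 queries per second is taken from the example in the QueryCapacityUnits description, so it may not apply to every edition.

```python
from datetime import timedelta

BASE_SUGGESTIONS_CAPACITY = 2.5   # GetQuerySuggestions base, calls per second
QPS_PER_QUERY_UNIT = 0.1          # queries per second added by one extra unit
GB_PER_STORAGE_UNIT = 30
DOCS_PER_STORAGE_UNIT = 100_000


def suggestions_capacity(query_capacity_units, base_query_capacity=0.1):
    """GetQuerySuggestions capacity: five times the query capacity or 2.5, whichever is higher."""
    total_qps = base_query_capacity + query_capacity_units * QPS_PER_QUERY_UNIT
    return max(5 * total_qps, BASE_SUGGESTIONS_CAPACITY)


def extra_storage(storage_capacity_units):
    """Extra storage as (gigabytes, documents); whichever limit is reached first applies."""
    return (storage_capacity_units * GB_PER_STORAGE_UNIT,
            storage_capacity_units * DOCS_PER_STORAGE_UNIT)


def to_duration_string(delta):
    """Format a timedelta as the numeric-seconds string used by Duration."""
    return f"{int(delta.total_seconds())}s"


def parse_duration(value):
    """Parse a Duration string such as '86400s' into a timedelta."""
    if not value.endswith("s") or not value[:-1].isdigit():
        raise ValueError(f"not a duration string: {value!r}")
    return timedelta(seconds=int(value[:-1]))
```

With one extra query capacity unit the index totals 0.2 queries per second; five times that is 1.0, so the GetQuerySuggestions capacity stays at the 2.5 base, matching the example in the description.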

ListIndices (updated)

Changes (response)

{'IndexConfigurationSummaryItems': {'Edition': {'GEN_AI_ENTERPRISE_EDITION'}}}

Lists the Amazon Kendra indexes that you created.

See also: AWS API Documentation

Request Syntax

client.list_indices(
    NextToken='string',
    MaxResults=123
)

type NextToken:

string

param NextToken:

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of indexes.

type MaxResults:

integer

param MaxResults:

The maximum number of indices to return.

rtype:

dict

returns:

Response Syntax

{
    'IndexConfigurationSummaryItems': [
        {
            'Name': 'string',
            'Id': 'string',
            'Edition': 'DEVELOPER_EDITION'|'ENTERPRISE_EDITION'|'GEN_AI_ENTERPRISE_EDITION',
            'CreatedAt': datetime(2015, 1, 1),
            'UpdatedAt': datetime(2015, 1, 1),
            'Status': 'CREATING'|'ACTIVE'|'DELETING'|'FAILED'|'UPDATING'|'SYSTEM_UPDATING'
        },
    ],
    'NextToken': 'string'
}

Response Structure

(dict) --
- IndexConfigurationSummaryItems (list) --
  
  An array of summary information on the configuration of one or more indexes.
  - (dict) --
    
    Summary information on the configuration of an index.
    - Name (string) --
      
      The name of the index.
    - Id (string) --
      
      An identifier for the index. Use this to identify the index when you are using APIs such as Query, DescribeIndex, UpdateIndex, and DeleteIndex.
    - Edition (string) --
      
      Indicates whether the index is a Developer Edition index, an Enterprise Edition index, or a Gen AI Enterprise Edition index.
    - CreatedAt (datetime) --
      
      The Unix timestamp when the index was created.
    - UpdatedAt (datetime) --
      
      The Unix timestamp when the index was last updated.
    - Status (string) --
      
      The current status of the index. When the status is ACTIVE, the index is ready to search.
- NextToken (string) --
  
  If the response is truncated, Amazon Kendra returns this token that you can use in the subsequent request to retrieve the next set of indexes.
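
The NextToken handshake described above can be wrapped in a generator that keeps requesting pages until no token is returned. The following is a minimal sketch; the function name and the page_size argument are hypothetical, and client is assumed to be a Kendra client such as boto3.client("kendra").

```python
def iter_indices(client, page_size=None):
    """Yield every index summary, following NextToken until the listing is exhausted."""
    kwargs = {}
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        page = client.list_indices(**kwargs)
        yield from page.get("IndexConfigurationSummaryItems", [])
        token = page.get("NextToken")
        if not token:
            break
        # Pass the token back to retrieve the next set of indexes.
        kwargs["NextToken"] = token


def gen_ai_index_ids(client):
    """Collect the IDs of indexes whose Edition is GEN_AI_ENTERPRISE_EDITION."""
    return [
        item["Id"]
        for item in iter_indices(client)
        if item.get("Edition") == "GEN_AI_ENTERPRISE_EDITION"
    ]
```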