AWS API Changes

2022/07/14 - AWSKendraFrontendService - 5 new 1 updated api methods

Changes: This release adds AccessControlConfigurations, which allow you to redefine your document-level access control without re-indexing your content.

ListAccessControlConfigurations (new)

Lists one or more access control configurations for an index. This includes user and group access information for your documents. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

See also: AWS API Documentation

Request Syntax

client.list_access_control_configurations(
    IndexId='string',
    NextToken='string',
    MaxResults=123
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index for the access control configuration.

type NextToken

string

param NextToken

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of access control configurations.

type MaxResults

integer

param MaxResults

The maximum number of access control configurations to return.

rtype

dict

returns

Response Syntax

{
    'NextToken': 'string',
    'AccessControlConfigurations': [
        {
            'Id': 'string'
        },
    ]
}

Response Structure

(dict) --
- NextToken (string) --
  
  If the response is truncated, Amazon Kendra returns this token that you can use in the subsequent request to retrieve the next set of access control configurations.
- AccessControlConfigurations (list) --
  
  The details of your access control configurations.
  - (dict) --
    
    Summary information on an access control configuration that you created for your documents in an index.
    - Id (string) --
      
      The identifier of the access control configuration.
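Results come back in pages, so callers typically loop until NextToken is absent. The following is a minimal sketch. It assumes `client` is a boto3 Kendra client, such as `boto3.client("kendra")`. The helper name `list_acl_config_ids` is hypothetical.

```python
from typing import Any, Dict, List, Optional


def list_acl_config_ids(client: Any, index_id: str,
                        page_size: Optional[int] = None) -> List[str]:
    """Return every access control configuration ID for an index.

    `client` is assumed to be a boto3 Kendra client; any object with a
    compatible list_access_control_configurations() method also works.
    """
    ids: List[str] = []
    kwargs: Dict[str, Any] = {"IndexId": index_id}
    if page_size is not None:
        kwargs["MaxResults"] = page_size  # only sent when the caller asks for it
    while True:
        response = client.list_access_control_configurations(**kwargs)
        ids.extend(cfg["Id"] for cfg in response.get("AccessControlConfigurations", []))
        next_token = response.get("NextToken")
        if not next_token:
            break  # no token means this was the last page
        kwargs["NextToken"] = next_token
    return ids
```

Example call: `list_acl_config_ids(boto3.client("kendra"), "your-index-id")`.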

UpdateAccessControlConfiguration (new)

Updates an access control configuration for your documents in an index. This includes user and group access information for your documents. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

You can update an access control configuration you created without indexing all of your documents again. For example, suppose your index contains top-secret company documents that only certain employees or users should access. You created an 'allow' access control configuration for a user who recently joined the 'top-secret' team, switching from a team with 'deny' access to top-secret documents. However, the user then returns to their previous team and should no longer have access to top-secret documents. You can update the access control configuration to reconfigure access control for your documents as circumstances change.

To apply the updated access control configuration, call the BatchPutDocument API with the AccessControlConfigurationId included in the Document object. If you use an S3 bucket as a data source, synchronize your data source to apply the AccessControlConfigurationId in the .metadata.json file. Amazon Kendra currently supports access control configurations only for S3 data sources and for documents indexed using the BatchPutDocument API.

See also: AWS API Documentation

Request Syntax

client.update_access_control_configuration(
    IndexId='string',
    Id='string',
    Name='string',
    Description='string',
    AccessControlList=[
        {
            'Name': 'string',
            'Type': 'USER'|'GROUP',
            'Access': 'ALLOW'|'DENY',
            'DataSourceId': 'string'
        },
    ],
    HierarchicalAccessControlList=[
        {
            'PrincipalList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ]
        },
    ]
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index for an access control configuration.

type Id

string

param Id

[REQUIRED]

The identifier of the access control configuration you want to update.

type Name

string

param Name

A new name for the access control configuration.

type Description

string

param Description

A new description for the access control configuration.

type AccessControlList

list

param AccessControlList

Information you want to update on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

(dict) --

Provides user and group information for user context filtering.
- Name (string) -- [REQUIRED]
  
  The name of the user or group.
- Type (string) -- [REQUIRED]
  
  The type of principal.
- Access (string) -- [REQUIRED]
  
  Whether to allow or deny document access to the principal.
- DataSourceId (string) --
  
  The identifier of the data source the principal should access documents from.

type HierarchicalAccessControlList

list

param HierarchicalAccessControlList

The updated list of principal lists that define the hierarchy for which documents users should have access to.

(dict) --

Information to define the hierarchy for which documents users should have access to.
- PrincipalList (list) -- [REQUIRED]
  
  A list of principal lists that define the hierarchy for which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.
  - (dict) --
    
    Provides user and group information for user context filtering.
    - Name (string) -- [REQUIRED]
      
      The name of the user or group.
    - Type (string) -- [REQUIRED]
      
      The type of principal.
    - Access (string) -- [REQUIRED]
      
      Whether to allow or deny document access to the principal.
    - DataSourceId (string) --
      
      The identifier of the data source the principal should access documents from.

rtype

dict

returns

Response Syntax

{}

Response Structure

(dict) --
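The following is a sketch of the scenario described above, in which one user's access is switched from ALLOW to DENY. It assumes that UpdateAccessControlConfiguration replaces the stored AccessControlList with the list you send. For that reason, it reads the current list with DescribeAccessControlConfiguration first. This reference does not say whether omitted fields such as HierarchicalAccessControlList are preserved, so verify that before relying on it. `set_principal_access` and `revoke_user` are hypothetical helper names, and `client` is assumed to be a boto3 Kendra client.

```python
from typing import Any, Dict, List


def set_principal_access(acl: List[Dict[str, str]], name: str,
                         principal_type: str, access: str) -> List[Dict[str, str]]:
    """Return a copy of `acl` in which the given principal has `access` ('ALLOW' or 'DENY').

    If the principal is not present yet, a new entry is appended.
    """
    updated: List[Dict[str, str]] = []
    found = False
    for entry in acl:
        entry = dict(entry)  # copy so the caller's list is not mutated
        if entry["Name"] == name and entry["Type"] == principal_type:
            entry["Access"] = access
            found = True
        updated.append(entry)
    if not found:
        updated.append({"Name": name, "Type": principal_type, "Access": access})
    return updated


def revoke_user(client: Any, index_id: str, config_id: str, user: str) -> None:
    """Switch `user` to DENY in an existing access control configuration."""
    current = client.describe_access_control_configuration(IndexId=index_id, Id=config_id)
    acl = set_principal_access(current.get("AccessControlList", []), user, "USER", "DENY")
    client.update_access_control_configuration(
        IndexId=index_id, Id=config_id, AccessControlList=acl
    )
```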

DescribeAccessControlConfiguration (new)

Gets information about an access control configuration that you created for your documents in an index. This includes user and group access information for your documents. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

See also: AWS API Documentation

Request Syntax

client.describe_access_control_configuration(
    IndexId='string',
    Id='string'
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index for an access control configuration.

type Id

string

param Id

[REQUIRED]

The identifier of the access control configuration you want to get information on.

rtype

dict

returns

Response Syntax

{
    'Name': 'string',
    'Description': 'string',
    'ErrorMessage': 'string',
    'AccessControlList': [
        {
            'Name': 'string',
            'Type': 'USER'|'GROUP',
            'Access': 'ALLOW'|'DENY',
            'DataSourceId': 'string'
        },
    ],
    'HierarchicalAccessControlList': [
        {
            'PrincipalList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ]
        },
    ]
}

Response Structure

(dict) --
- Name (string) --
  
  The name for the access control configuration.
- Description (string) --
  
  The description for the access control configuration.
- ErrorMessage (string) --
  
  The error message containing details if there are issues processing the access control configuration.
- AccessControlList (list) --
  
  Information on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.
  - (dict) --
    
    Provides user and group information for user context filtering.
    - Name (string) --
      
      The name of the user or group.
    - Type (string) --
      
      The type of principal.
    - Access (string) --
      
      Whether to allow or deny document access to the principal.
    - DataSourceId (string) --
      
      The identifier of the data source the principal should access documents from.
- HierarchicalAccessControlList (list) --
  
  The list of principal lists that define the hierarchy for which documents users should have access to.
  - (dict) --
    
    Information to define the hierarchy for which documents users should have access to.
    - PrincipalList (list) --
      
      A list of principal lists that define the hierarchy for which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.
      - (dict) --
        
        Provides user and group information for user context filtering.
        - Name (string) --
          
          The name of the user or group.
        - Type (string) --
          
          The type of principal.
        - Access (string) --
          
          Whether to allow or deny document access to the principal.
        - DataSourceId (string) --
          
          The identifier of the data source the principal should access documents from.
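When auditing a configuration, it can help to flatten both principal lists from the DescribeAccessControlConfiguration response into one view. The following is a minimal sketch. The `(level, name, type, access)` tuple layout, and the use of -1 to mark entries from the flat AccessControlList, are this example's own conventions and not part of the API. `summarize_principals` is a hypothetical helper.

```python
from typing import Any, Dict, List, Tuple


def summarize_principals(response: Dict[str, Any]) -> List[Tuple[int, str, str, str]]:
    """Flatten a describe_access_control_configuration response.

    Returns (level, name, type, access) tuples. Level -1 marks entries from
    AccessControlList; levels 0, 1, ... are positions in
    HierarchicalAccessControlList, in response order.
    """
    rows: List[Tuple[int, str, str, str]] = []
    for p in response.get("AccessControlList", []):
        rows.append((-1, p["Name"], p["Type"], p["Access"]))
    for level, tier in enumerate(response.get("HierarchicalAccessControlList", [])):
        for p in tier.get("PrincipalList", []):
            rows.append((level, p["Name"], p["Type"], p["Access"]))
    return rows
```

A non-empty ErrorMessage in the same response indicates a problem processing the configuration, so check it before trusting the summary.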

DeleteAccessControlConfiguration (new)

Deletes an access control configuration that you created for your documents in an index. This includes user and group access information for your documents. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

See also: AWS API Documentation

Request Syntax

client.delete_access_control_configuration(
    IndexId='string',
    Id='string'
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index for an access control configuration.

type Id

string

param Id

[REQUIRED]

The identifier of the access control configuration you want to delete.

rtype

dict

returns

Response Syntax

{}

Response Structure

(dict) --
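The following sketch combines the list, describe, and delete operations to remove every configuration with a given name. It collects all IDs before deleting anything, so deletions cannot disturb pagination. This reference does not say that configuration names are unique, so the sketch deletes all matches. `delete_configs_named` is a hypothetical helper, and `client` is assumed to be a boto3 Kendra client.

```python
from typing import Any, Dict, List


def delete_configs_named(client: Any, index_id: str, name: str) -> List[str]:
    """Delete every access control configuration in `index_id` whose Name equals `name`.

    Returns the IDs that were deleted.
    """
    # Collect all IDs first so deletions cannot shift later pages.
    ids: List[str] = []
    kwargs: Dict[str, Any] = {"IndexId": index_id}
    while True:
        page = client.list_access_control_configurations(**kwargs)
        ids.extend(cfg["Id"] for cfg in page.get("AccessControlConfigurations", []))
        token = page.get("NextToken")
        if not token:
            break
        kwargs["NextToken"] = token

    deleted: List[str] = []
    for config_id in ids:
        detail = client.describe_access_control_configuration(IndexId=index_id, Id=config_id)
        if detail.get("Name") == name:
            client.delete_access_control_configuration(IndexId=index_id, Id=config_id)
            deleted.append(config_id)
    return deleted
```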

CreateAccessControlConfiguration (new)

Creates an access control configuration for your documents. This includes user and group access information for your documents. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

You can use this to reconfigure your existing document-level access control without indexing all of your documents again. For example, suppose your index contains top-secret company documents that only certain employees or users should access. One of these users leaves the company or switches to a team that should be blocked from accessing top-secret documents. The documents in your index still give this user access, because the user had access at the time the documents were indexed. You can create a specific access control configuration for this user with deny access. If the user later returns to the company and rejoins the 'top-secret' team, you can update the access control configuration to allow access again. In this way, you can reconfigure access control for your documents as circumstances change.

To apply your access control configuration to certain documents, you call the BatchPutDocument API with the AccessControlConfigurationId included in the Document object. If you use an S3 bucket as a data source, you update the .metadata.json with the AccessControlConfigurationId and synchronize your data source. Amazon Kendra currently only supports access control configuration for S3 data sources and documents indexed using the BatchPutDocument API.
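For S3 data sources, the text above says to add the AccessControlConfigurationId to the document's .metadata.json file. The following minimal sketch produces such a file body. The top-level `AccessControlConfigurationId` key follows the field name used above. The `<object key>.metadata.json` naming in `metadata_key_for` is an assumption; check it against your data source's metadata settings. Both helper names are hypothetical.

```python
import json
from typing import Any, Dict, Optional


def build_s3_metadata(config_id: str, title: Optional[str] = None) -> str:
    """Return a .metadata.json body that points a document at an access control configuration."""
    metadata: Dict[str, Any] = {"AccessControlConfigurationId": config_id}
    if title is not None:
        metadata["Title"] = title  # optional extra field
    return json.dumps(metadata, indent=2)


def metadata_key_for(document_key: str) -> str:
    # Assumed convention: the metadata object is named after the full
    # document key with ".metadata.json" appended.
    return document_key + ".metadata.json"
```

After you upload the file, for example with the boto3 S3 client's `put_object`, start a sync so that the data source picks up the change. One way to do this is with the Kendra client's `start_data_source_sync_job(Id=..., IndexId=...)`.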

See also: AWS API Documentation

Request Syntax

client.create_access_control_configuration(
    IndexId='string',
    Name='string',
    Description='string',
    AccessControlList=[
        {
            'Name': 'string',
            'Type': 'USER'|'GROUP',
            'Access': 'ALLOW'|'DENY',
            'DataSourceId': 'string'
        },
    ],
    HierarchicalAccessControlList=[
        {
            'PrincipalList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ]
        },
    ],
    ClientToken='string'
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index in which to create an access control configuration for your documents.

type Name

string

param Name

[REQUIRED]

A name for the access control configuration.

type Description

string

param Description

A description for the access control configuration.

type AccessControlList

list

param AccessControlList

Information on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.

(dict) --

Provides user and group information for user context filtering.
- Name (string) -- [REQUIRED]
  
  The name of the user or group.
- Type (string) -- [REQUIRED]
  
  The type of principal.
- Access (string) -- [REQUIRED]
  
  Whether to allow or deny document access to the principal.
- DataSourceId (string) --
  
  The identifier of the data source the principal should access documents from.

type HierarchicalAccessControlList

list

param HierarchicalAccessControlList

The list of principal lists that define the hierarchy for which documents users should have access to.

(dict) --

Information to define the hierarchy for which documents users should have access to.
- PrincipalList (list) -- [REQUIRED]
  
  A list of principal lists that define the hierarchy for which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.
  - (dict) --
    
    Provides user and group information for user context filtering.
    - Name (string) -- [REQUIRED]
      
      The name of the user or group.
    - Type (string) -- [REQUIRED]
      
      The type of principal.
    - Access (string) -- [REQUIRED]
      
      Whether to allow or deny document access to the principal.
    - DataSourceId (string) --
      
      The identifier of the data source the principal should access documents from.
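Building HierarchicalAccessControlList by hand is verbose. The following minimal sketch converts nested (name, type, access) tuples into the request shape and validates the enum values listed in the request syntax above. This reference does not describe how Kendra resolves conflicts between levels, so the code only preserves the order you supply. `hierarchical_acl` is a hypothetical helper.

```python
from typing import Dict, List, Sequence, Tuple

Principal = Tuple[str, str, str]  # (name, 'USER' | 'GROUP', 'ALLOW' | 'DENY')


def hierarchical_acl(levels: Sequence[Sequence[Principal]]) -> List[Dict[str, List[Dict[str, str]]]]:
    """Convert nested principal tuples into the HierarchicalAccessControlList shape.

    Each inner sequence becomes one PrincipalList, in the order given.
    """
    result: List[Dict[str, List[Dict[str, str]]]] = []
    for level in levels:
        principals: List[Dict[str, str]] = []
        for name, principal_type, access in level:
            if principal_type not in ("USER", "GROUP") or access not in ("ALLOW", "DENY"):
                raise ValueError(f"invalid principal: {(name, principal_type, access)}")
            principals.append({"Name": name, "Type": principal_type, "Access": access})
        result.append({"PrincipalList": principals})
    return result
```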

type ClientToken

string

param ClientToken

A token that you provide to identify the request to create an access control configuration. Multiple calls to the CreateAccessControlConfiguration API with the same client token will create only one access control configuration.

This field is autopopulated if not provided.

rtype

dict

returns

Response Syntax

{
    'Id': 'string'
}

Response Structure

(dict) --
- Id (string) --
  
  The identifier of the access control configuration for your documents in an index.
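The following is a sketch of creating a deny configuration for users who have left a team, as in the example above. If you pass the same `client_token` when retrying, the service can deduplicate the request, as the ClientToken description states. Falling back to `uuid.uuid4()` when no token is given is this example's own choice. `create_deny_config` is a hypothetical helper, and `client` is assumed to be a boto3 Kendra client.

```python
import uuid
from typing import Any, Dict, List, Optional


def create_deny_config(client: Any, index_id: str, name: str, users: List[str],
                       client_token: Optional[str] = None) -> str:
    """Create a configuration that denies each listed user, and return its Id.

    Reuse the same client_token on retries so that at most one
    configuration is created.
    """
    acl: List[Dict[str, str]] = [
        {"Name": user, "Type": "USER", "Access": "DENY"} for user in users
    ]
    response = client.create_access_control_configuration(
        IndexId=index_id,
        Name=name,
        AccessControlList=acl,
        ClientToken=client_token or str(uuid.uuid4()),
    )
    return response["Id"]
```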

BatchPutDocument (updated)

Changes (request)

{'Documents': {'AccessControlConfigurationId': 'string'}}

Adds one or more documents to an index.

The BatchPutDocument API enables you to ingest inline documents or a set of documents stored in an Amazon S3 bucket. Use this API to ingest your text and unstructured text into an index, add custom attributes to the documents, and to attach an access control list to the documents added to the index.

The documents are indexed asynchronously. You can see the progress of the batch using Amazon Web Services CloudWatch. Any error messages related to processing the batch are sent to your Amazon Web Services CloudWatch log.

For an example of ingesting inline documents using Python and Java SDKs, see Adding files directly to an index.

See also: AWS API Documentation

Request Syntax

client.batch_put_document(
    IndexId='string',
    RoleArn='string',
    Documents=[
        {
            'Id': 'string',
            'Title': 'string',
            'Blob': b'bytes',
            'S3Path': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Attributes': [
                {
                    'Key': 'string',
                    'Value': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
            ],
            'AccessControlList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ],
            'HierarchicalAccessControlList': [
                {
                    'PrincipalList': [
                        {
                            'Name': 'string',
                            'Type': 'USER'|'GROUP',
                            'Access': 'ALLOW'|'DENY',
                            'DataSourceId': 'string'
                        },
                    ]
                },
            ],
            'ContentType': 'PDF'|'HTML'|'MS_WORD'|'PLAIN_TEXT'|'PPT',
            'AccessControlConfigurationId': 'string'
        },
    ],
    CustomDocumentEnrichmentConfiguration={
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
)

type IndexId

string

param IndexId

[REQUIRED]

The identifier of the index to add the documents to. You need to create the index first using the CreateIndex API.

type RoleArn

string

param RoleArn

The Amazon Resource Name (ARN) of a role that is allowed to run the BatchPutDocument API. For more information, see IAM Roles for Amazon Kendra.

type Documents

list

param Documents

[REQUIRED]

One or more documents to add to the index.

Documents have the following file size limits:

- 5 MB total size for inline documents
- 50 MB total size for files from an S3 bucket
- 5 MB extracted text for any file

For more information about file size and transaction per second quotas, see Quotas.
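A client-side pre-check can catch oversized inline documents before you call BatchPutDocument. This sketch makes two assumptions that you should confirm against the Quotas page: that the 5 MB inline limit applies to each document's Blob, and that MB means 5 * 1024 * 1024 bytes. `oversized_inline_documents` is a hypothetical helper.

```python
from typing import Any, Dict, List

# Assumption: "5 MB" is read as 5 * 1024 * 1024 bytes per inline document.
INLINE_LIMIT_BYTES = 5 * 1024 * 1024


def oversized_inline_documents(documents: List[Dict[str, Any]],
                               limit: int = INLINE_LIMIT_BYTES) -> List[str]:
    """Return the IDs of inline documents whose Blob is larger than `limit` bytes.

    Documents supplied through S3Path have no Blob and are skipped. The
    extracted-text limit cannot be checked here without parsing the files.
    """
    return [doc["Id"] for doc in documents if len(doc.get("Blob", b"")) > limit]
```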

(dict) --

A document in an index.
- Id (string) -- [REQUIRED]
  
  A unique identifier of the document in the index.
  
  Each document ID must be unique per index. You cannot create a data source to index your documents with their unique IDs and then use the BatchPutDocument API to index the same documents, or vice versa. However, you can delete a data source and then use the BatchPutDocument API to index the same documents, or vice versa.
- Title (string) --
  
  The title of the document.
- Blob (bytes) --
  
  The contents of the document.
  
  Documents passed to the Blob parameter must be base64 encoded. Your code might not need to encode the document file bytes if you're using an Amazon Web Services SDK to call Amazon Kendra APIs. If you are calling the Amazon Kendra endpoint directly using REST, you must base64 encode the contents before sending.
- S3Path (dict) --
  
  Information required to find a specific file in an Amazon S3 bucket.
  - Bucket (string) -- [REQUIRED]
    
    The name of the S3 bucket that contains the file.
  - Key (string) -- [REQUIRED]
    
    The name of the file.
- Attributes (list) --
  
  Custom attributes to apply to the document. Use the custom attributes to provide additional information for searching, to provide facets for refining searches, and to provide additional information in the query response.
  
  For example, 'DataSourceId' and 'DataSourceSyncJobId' are custom attributes that provide information on the synchronization of documents running on a data source. Note that 'DataSourceSyncJobId' can be an optional custom attribute, because Amazon Kendra uses the ID of a running sync job.
  - (dict) --
    
    A document attribute or metadata field. To create custom document attributes, see Custom attributes.
    - Key (string) -- [REQUIRED]
      
      The identifier for the attribute.
    - Value (dict) -- [REQUIRED]
      
      The value of the attribute.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- AccessControlList (list) --
  
  Information on principals (users and/or groups) and which documents they should have access to. This is useful for user context filtering, where search results are filtered based on the user or their group access to documents.
  - (dict) --
    
    Provides user and group information for user context filtering.
    - Name (string) -- [REQUIRED]
      
      The name of the user or group.
    - Type (string) -- [REQUIRED]
      
      The type of principal.
    - Access (string) -- [REQUIRED]
      
      Whether to allow or deny document access to the principal.
    - DataSourceId (string) --
      
      The identifier of the data source the principal should access documents from.
- HierarchicalAccessControlList (list) --
  
  The list of principal lists that define the hierarchy for which documents users should have access to.
  - (dict) --
    
    Information to define the hierarchy for which documents users should have access to.
    - PrincipalList (list) -- [REQUIRED]
      
      A list of principal lists that define the hierarchy for which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.
      - (dict) --
        
        Provides user and group information for user context filtering.
        - Name (string) -- [REQUIRED]
          
          The name of the user or group.
        - Type (string) -- [REQUIRED]
          
          The type of principal.
        - Access (string) -- [REQUIRED]
          
          Whether to allow or deny document access to the principal.
        - DataSourceId (string) --
          
          The identifier of the data source the principal should access documents from.
- ContentType (string) --
  
  The file type of the document in the Blob field.
- AccessControlConfigurationId (string) --
  
  The identifier of the access control configuration that you want to apply to the document.
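The following is a sketch of one Documents entry that references an access control configuration, which is the field added in this update. It passes Blob as raw bytes and relies on the SDK to base64-encode it, as described above. It also sets a DateValue with an explicit time zone. The attribute key `_created_at` is assumed to be one of Kendra's reserved date fields; replace it with a field that exists in your index. `make_document` is a hypothetical helper.

```python
from datetime import datetime, timezone
from typing import Any, Dict


def make_document(doc_id: str, text: str, config_id: str) -> Dict[str, Any]:
    """Build one entry for the Documents list of batch_put_document."""
    return {
        "Id": doc_id,
        "Blob": text.encode("utf-8"),  # raw bytes; the SDK handles base64 encoding
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            {
                "Key": "_created_at",  # assumed reserved date field
                # Time-zone-aware datetime, as the DateValue description recommends.
                "Value": {"DateValue": datetime(2022, 7, 14, 12, 0, tzinfo=timezone.utc)},
            }
        ],
        "AccessControlConfigurationId": config_id,
    }
```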

type CustomDocumentEnrichmentConfiguration

dict

param CustomDocumentEnrichmentConfiguration

Configuration information for altering your document metadata and content during the document ingestion process when you use the BatchPutDocument API.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

InlineConfigurations (list) --

Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.
- (dict) --
  
  Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic can do, see HookConfiguration.
  
  For more information, see Customizing document metadata during the ingestion process.
  - Condition (dict) --
    
    Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.
    - ConditionDocumentAttributeKey (string) -- [REQUIRED]
      
      The identifier of the document attribute used for the condition.
      
      For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
      
      Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
    - Operator (string) -- [REQUIRED]
      
      The condition operator.
      
      For example, you can use 'Contains' to partially match a string.
    - ConditionOnValue (dict) --
      
      The value used by the operator.
      
      For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
  - Target (dict) --
    
    Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.
    - TargetDocumentAttributeKey (string) --
      
      The identifier of the target document attribute or metadata field.
      
      For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.
    - TargetDocumentAttributeValueDeletion (boolean) --
      
      TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.
    - TargetDocumentAttributeValue (dict) --
      
      The target value you want to create for the target attribute.
      
      For example, 'Finance' could be the target value for the target attribute key 'Department'.
      - StringValue (string) --
        
        A string, such as "department".
      - StringListValue (list) --
        
        A list of strings.
        
        (string) --
      - LongValue (integer) --
        
        A long integer value.
      - DateValue (datetime) --
        
        A date expressed as an ISO 8601 string.
        
        It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
  - DocumentContentDeletion (boolean) --
    
    TRUE to delete content if the condition used for the target attribute is met.

PreExtractionHookConfiguration (dict) --

Configuration information for invoking a Lambda function on the original or raw documents before their metadata and text are extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
- InvocationCondition (dict) --
  
  The condition that determines when a Lambda function should be invoked.
  
  For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
  - ConditionDocumentAttributeKey (string) -- [REQUIRED]
    
    The identifier of the document attribute used for the condition.
    
    For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
    
    Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
  - Operator (string) -- [REQUIRED]
    
    The condition operator.
    
    For example, you can use 'Contains' to partially match a string.
  - ConditionOnValue (dict) --
    
    The value used by the operator.
    
    For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
    - StringValue (string) --
      
      A string, such as "department".
    - StringListValue (list) --
      
      A list of strings.
      - (string) --
    - LongValue (integer) --
      
      A long integer value.
    - DateValue (datetime) --
      
      A date expressed as an ISO 8601 string.
      
      It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- LambdaArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the Lambda function to invoke during ingestion. For more information, see IAM roles for Amazon Kendra.
- S3Bucket (string) -- [REQUIRED]
  
  The S3 bucket that stores the original raw documents or the structured, parsed documents before and after they are altered. For more information, see Data contracts for Lambda functions.

PostExtractionHookConfiguration (dict) --

Configuration information for invoking a Lambda function on the structured documents after their metadata and text have been extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.
- InvocationCondition (dict) --
  
  The condition that determines when a Lambda function should be invoked.
  
  For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.
  - ConditionDocumentAttributeKey (string) -- [REQUIRED]
    
    The identifier of the document attribute used for the condition.
    
    For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.
    
    Amazon Kendra currently does not support _document_body as an attribute key used for the condition.
  - Operator (string) -- [REQUIRED]
    
    The condition operator.
    
    For example, you can use 'Contains' to partially match a string.
  - ConditionOnValue (dict) --
    
    The value used by the operator.
    
    For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.
    - StringValue (string) --
      
      A string, such as "department".
    - StringListValue (list) --
      
      A list of strings.
      - (string) --
    - LongValue (integer) --
      
      A long integer value.
    - DateValue (datetime) --
      
      A date expressed as an ISO 8601 string.
      
      It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.
- LambdaArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the Lambda function to invoke during ingestion. For more information, see IAM roles for Amazon Kendra.
- S3Bucket (string) -- [REQUIRED]
  
  The S3 bucket that stores the original raw documents or the structured, parsed documents before and after they are altered. For more information, see Data contracts for Lambda functions.

RoleArn (string) --

The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
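The field descriptions above use 'Source_URI', 'Contains', 'financial', 'Department', and 'Finance' as examples. This sketch combines them into one InlineConfigurations entry that sets Department to Finance when Source_URI contains 'financial'. Those attribute keys are illustrative and must exist in your index. `department_rule` is a hypothetical helper.

```python
from typing import Any, Dict


def department_rule(uri_fragment: str, department: str) -> Dict[str, Any]:
    """Build one InlineConfigurations entry.

    When the Source_URI attribute contains `uri_fragment`, the Department
    attribute is set to `department`. Both keys are the illustrative
    examples from the field descriptions.
    """
    return {
        "Condition": {
            "ConditionDocumentAttributeKey": "Source_URI",
            "Operator": "Contains",
            "ConditionOnValue": {"StringValue": uri_fragment},
        },
        "Target": {
            "TargetDocumentAttributeKey": "Department",
            # Must be False when supplying a new target value.
            "TargetDocumentAttributeValueDeletion": False,
            "TargetDocumentAttributeValue": {"StringValue": department},
        },
        "DocumentContentDeletion": False,  # keep the document content
    }


enrichment = {"InlineConfigurations": [department_rule("financial", "Finance")]}
```

Pass the result as `CustomDocumentEnrichmentConfiguration=enrichment` in a `batch_put_document` call.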

rtype

dict

returns

Response Syntax

{
    'FailedDocuments': [
        {
            'Id': 'string',
            'ErrorCode': 'InternalError'|'InvalidRequest',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

(dict) --
- FailedDocuments (list) --
  
  A list of documents that were not added to the index because the document failed a validation check. Each document contains an error message that indicates why the document couldn't be added to the index.
  
  If there was an error adding a document to an index, the error is reported in your Amazon Web Services CloudWatch log. For more information, see Monitoring Amazon Kendra with Amazon CloudWatch Logs.
  - (dict) --
    
    Provides information about a document that could not be indexed.
    - Id (string) --
      
      The unique identifier of the document.
    - ErrorCode (string) --
      
      The type of error that caused the document to fail to be indexed.
    - ErrorMessage (string) --
      
      A description of the reason why the document could not be indexed.
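FailedDocuments reports only documents that failed validation, and indexing proceeds asynchronously. A caller therefore usually collects these entries for each batch and checks CloudWatch Logs for later errors. The following sketch sends documents in fixed-size batches. The batch size of 10 is an assumed per-call limit; confirm it against the Kendra quotas. `put_documents` is a hypothetical helper, and `client` is assumed to be a boto3 Kendra client.

```python
from typing import Any, Dict, Iterator, List


def chunks(items: List[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` with at most `size` elements."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def put_documents(client: Any, index_id: str, documents: List[Dict[str, Any]],
                  batch_size: int = 10) -> List[Dict[str, str]]:
    """Send documents in batches and return every FailedDocuments entry.

    An empty result means only that no document failed validation.
    """
    failed: List[Dict[str, str]] = []
    for batch in chunks(documents, batch_size):
        response = client.batch_put_document(IndexId=index_id, Documents=batch)
        failed.extend(response.get("FailedDocuments", []))
    return failed
```

Entries with ErrorCode 'InternalError' may be worth retrying. An 'InvalidRequest' entry more likely points to a problem in the document itself.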