AWS API Changes

2026/06/17 - Agents for Amazon Bedrock - 3 new, 15 updated API methods

Changes: Launching Bedrock Managed Knowledge Bases. Added support for resource-based policies on Knowledge Base resources, enabling cross-account access for Managed Knowledge Bases.

DeleteResourcePolicy (new)

Removes the resource policy associated with a knowledge base. After deletion, other AWS accounts can no longer access the knowledge base using cross-account permissions.

See also: AWS API Documentation

Request Syntax

client.delete_resource_policy(
    resourceArn='string',
    expectedRevisionId='string'
)

type resourceArn:

string

param resourceArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the knowledge base to remove the resource policy from.

type expectedRevisionId:

string

param expectedRevisionId:

The expected revision identifier of the resource policy. Use this to prevent conflicts when multiple users update the same policy concurrently.

rtype:

dict

returns:

Response Syntax

{
    'resourceArn': 'string',
    'revisionId': 'string'
}

Response Structure

(dict) --
- resourceArn (string) --
  
  The ARN of the knowledge base that the resource policy was removed from.
- revisionId (string) --
  
  The revision identifier after the resource policy was deleted.
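
The call above can be sketched as a small helper. This is a minimal sketch, assuming `client` is a boto3 `bedrock-agent` client from an SDK version that includes this release; the helper name `delete_policy` is hypothetical and not part of the SDK.

```python
from typing import Any, Optional


def delete_policy(client: Any, resource_arn: str,
                  expected_revision_id: Optional[str] = None) -> str:
    """Remove the resource policy from a knowledge base.

    `client` is assumed to expose delete_resource_policy with the request
    syntax shown above. Returns the revisionId from the response.
    """
    kwargs = {"resourceArn": resource_arn}
    # expectedRevisionId is optional: send it only when the caller has a
    # revision to guard against concurrent policy changes.
    if expected_revision_id is not None:
        kwargs["expectedRevisionId"] = expected_revision_id
    response = client.delete_resource_policy(**kwargs)
    return response["revisionId"]
```

With a real client this might be called as `delete_policy(boto3.client("bedrock-agent"), kb_arn, expected_revision_id=rev)`, where `kb_arn` and `rev` are values you already hold.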

GetResourcePolicy (new)

Retrieves the resource policy associated with a knowledge base.

See also: AWS API Documentation

Request Syntax

client.get_resource_policy(
    resourceArn='string'
)

type resourceArn:

string

param resourceArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the knowledge base to retrieve the resource policy for.

rtype:

dict

returns:

Response Syntax

{
    'resourceArn': 'string',
    'policy': 'string',
    'revisionId': 'string'
}

Response Structure

(dict) --
- resourceArn (string) --
  
  The ARN of the knowledge base that the resource policy is associated with.
- policy (string) --
  
  The JSON-formatted resource policy associated with the knowledge base.
- revisionId (string) --
  
  The revision identifier of the resource policy.
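
Because `policy` comes back as a JSON-formatted string, callers will usually want to decode it. A minimal sketch, assuming `client` is a boto3 `bedrock-agent` client that includes this operation; `fetch_policy` is a hypothetical helper name.

```python
import json
from typing import Any, Dict, Tuple


def fetch_policy(client: Any, resource_arn: str) -> Tuple[Dict[str, Any], str]:
    """Return the decoded policy document together with its revisionId."""
    response = client.get_resource_policy(resourceArn=resource_arn)
    # 'policy' is a JSON string in the response, so decode it into a dict.
    policy_document = json.loads(response["policy"])
    return policy_document, response["revisionId"]
```

Keeping the returned revision identifier alongside the document makes it available as `expectedRevisionId` for a later update or delete.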

PutResourcePolicy (new)

Associates a resource policy with a knowledge base. A resource policy allows other AWS accounts to access the knowledge base. For more information, see Cross-account access for knowledge bases.

See also: AWS API Documentation

Request Syntax

client.put_resource_policy(
    resourceArn='string',
    policy='string',
    expectedRevisionId='string'
)

type resourceArn:

string

param resourceArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the knowledge base to attach the resource policy to.

type policy:

string

param policy:

[REQUIRED]

The JSON-formatted resource policy to associate with the knowledge base.

type expectedRevisionId:

string

param expectedRevisionId:

The expected revision identifier of the resource policy. Use this to prevent conflicts when multiple users update the same policy concurrently. Specify the revisionId from the most recent GetResourcePolicy or PutResourcePolicy response.

rtype:

dict

returns:

Response Syntax

{
    'resourceArn': 'string',
    'revisionId': 'string'
}

Response Structure

(dict) --
- resourceArn (string) --
  
  The ARN of the knowledge base that the resource policy was attached to.
- revisionId (string) --
  
  The revision identifier of the resource policy. Use this value in the expectedRevisionId field of a subsequent PutResourcePolicy or DeleteResourcePolicy request.
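
The request and the revision chaining described above can be sketched as follows. This is an illustrative sketch only: the statement shape follows the general IAM policy grammar, and the actions a knowledge base resource policy accepts are not listed in this change log, so the caller supplies them. `build_cross_account_policy` and `put_policy` are hypothetical helper names, and `client` is assumed to be a boto3 `bedrock-agent` client that includes this operation.

```python
import json
from typing import Any, List, Optional


def build_cross_account_policy(resource_arn: str, account_id: str,
                               actions: List[str]) -> str:
    """Serialize a single-statement policy granting another account access.

    The statement layout is an assumption based on standard IAM policy JSON;
    check the cross-account access guide for the supported actions.
    """
    document = {
        "Version": "2012-10-17",
        "Statement": [{
            "Effect": "Allow",
            "Principal": {"AWS": f"arn:aws:iam::{account_id}:root"},
            "Action": actions,
            "Resource": resource_arn,
        }],
    }
    return json.dumps(document)


def put_policy(client: Any, resource_arn: str, policy: str,
               expected_revision_id: Optional[str] = None) -> str:
    """Attach the policy and return the new revisionId."""
    kwargs = {"resourceArn": resource_arn, "policy": policy}
    # Pass the last known revision, when there is one, so a concurrent
    # update by someone else can be detected rather than overwritten.
    if expected_revision_id is not None:
        kwargs["expectedRevisionId"] = expected_revision_id
    return client.put_resource_policy(**kwargs)["revisionId"]
```

A typical flow keeps the returned value and feeds it back in: `rev = put_policy(client, arn, policy)` followed later by `put_policy(client, arn, new_policy, expected_revision_id=rev)`.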

CreateDataSource (updated)

Changes (request, response)
Request

{'dataSourceConfiguration': {
     'managedKnowledgeBaseConnectorConfiguration': {
         'connectorParameters': {},
         'deletionProtectionConfiguration': {
             'deletionProtectionStatus': 'ENABLED | DISABLED',
             'deletionProtectionThreshold': 'integer'},
         'mediaExtractionConfiguration': {
             'audioExtractionConfiguration': {'audioExtractionStatus': 'ENABLED | DISABLED'},
             'imageExtractionConfiguration': {'imageExtractionStatus': 'ENABLED | DISABLED'},
             'videoExtractionConfiguration': {'videoExtractionStatus': 'ENABLED | DISABLED'}}},
     'type': {'MANAGED_KNOWLEDGE_BASE_CONNECTOR'}},
 'vectorIngestionConfiguration': {'parsingConfiguration': {'parsingStrategy': {'SMART_PARSING'}}}}

Response

{'dataSource': {
     'dataSourceConfiguration': {
         'managedKnowledgeBaseConnectorConfiguration': {
             'connectorParameters': {},
             'deletionProtectionConfiguration': {
                 'deletionProtectionStatus': 'ENABLED | DISABLED',
                 'deletionProtectionThreshold': 'integer'},
             'mediaExtractionConfiguration': {
                 'audioExtractionConfiguration': {'audioExtractionStatus': 'ENABLED | DISABLED'},
                 'imageExtractionConfiguration': {'imageExtractionStatus': 'ENABLED | DISABLED'},
                 'videoExtractionConfiguration': {'videoExtractionStatus': 'ENABLED | DISABLED'}}},
         'type': {'MANAGED_KNOWLEDGE_BASE_CONNECTOR'}},
     'status': {'CREATING', 'FAILED', 'UPDATING'},
     'vectorIngestionConfiguration': {'parsingConfiguration': {'parsingStrategy': {'SMART_PARSING'}}}}}

Connects a knowledge base to a data source. You specify the configuration for the specific data source service in the dataSourceConfiguration field.

See also: AWS API Documentation

Request Syntax

client.create_data_source(
    knowledgeBaseId='string',
    clientToken='string',
    name='string',
    description='string',
    dataSourceConfiguration={
        'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA'|'MANAGED_KNOWLEDGE_BASE_CONNECTOR',
        'managedKnowledgeBaseConnectorConfiguration': {
            'deletionProtectionConfiguration': {
                'deletionProtectionStatus': 'ENABLED'|'DISABLED',
                'deletionProtectionThreshold': 123
            },
            'mediaExtractionConfiguration': {
                'imageExtractionConfiguration': {
                    'imageExtractionStatus': 'ENABLED'|'DISABLED'
                },
                'audioExtractionConfiguration': {
                    'audioExtractionStatus': 'ENABLED'|'DISABLED'
                },
                'videoExtractionConfiguration': {
                    'videoExtractionStatus': 'ENABLED'|'DISABLED'
                }
            },
            'connectorParameters': {...}|[...]|123|123.4|'string'|True|None
        },
        's3Configuration': {
            'bucketArn': 'string',
            'inclusionPrefixes': [
                'string',
            ],
            'bucketOwnerAccountId': 'string'
        },
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {
                            'url': 'string'
                        },
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 123,
                    'maxPages': 123
                },
                'inclusionFilters': [
                    'string',
                ],
                'exclusionFilters': [
                    'string',
                ],
                'scope': 'HOST_ONLY'|'SUBDOMAINS',
                'userAgent': 'string',
                'userAgentHeader': 'string'
            }
        },
        'confluenceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'string',
                'hostType': 'SAAS',
                'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        },
        'salesforceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'string',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        },
        'sharePointConfiguration': {
            'sourceConfiguration': {
                'tenantId': 'string',
                'domain': 'string',
                'siteUrls': [
                    'string',
                ],
                'hostType': 'ONLINE',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        }
    },
    dataDeletionPolicy='RETAIN'|'DELETE',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            },
            'hierarchicalChunkingConfiguration': {
                'levelConfigurations': [
                    {
                        'maxTokens': 123
                    },
                ],
                'overlapTokens': 123
            },
            'semanticChunkingConfiguration': {
                'maxTokens': 123,
                'bufferSize': 123,
                'breakpointPercentileThreshold': 123
            }
        },
        'customTransformationConfiguration': {
            'intermediateStorage': {
                's3Location': {
                    'uri': 'string'
                }
            },
            'transformations': [
                {
                    'transformationFunction': {
                        'transformationLambdaConfiguration': {
                            'lambdaArn': 'string'
                        }
                    },
                    'stepToApply': 'POST_CHUNKING'
                },
            ]
        },
        'parsingConfiguration': {
            'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'|'SMART_PARSING',
            'bedrockFoundationModelConfiguration': {
                'modelArn': 'string',
                'parsingPrompt': {
                    'parsingPromptText': 'string'
                },
                'parsingModality': 'MULTIMODAL'
            },
            'bedrockDataAutomationConfiguration': {
                'parsingModality': 'MULTIMODAL'
            }
        },
        'contextEnrichmentConfiguration': {
            'type': 'BEDROCK_FOUNDATION_MODEL',
            'bedrockFoundationModelConfiguration': {
                'enrichmentStrategyConfiguration': {
                    'method': 'CHUNK_ENTITY_EXTRACTION'
                },
                'modelArn': 'string'
            }
        }
    }
)
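
As a rough illustration of the request syntax above, the following sketch assembles keyword arguments for a data source of the new `MANAGED_KNOWLEDGE_BASE_CONNECTOR` type. The helper name `managed_connector_request` is hypothetical. The keys inside `connectorParameters` are connector-specific and not listed here, so the caller supplies them as an opaque dict.

```python
from typing import Any, Dict


def managed_connector_request(knowledge_base_id: str, name: str,
                              connector_parameters: Dict[str, Any],
                              smart_parsing: bool = False) -> Dict[str, Any]:
    """Build kwargs for create_data_source using the managed connector type."""
    request: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base_id,
        "name": name,
        "dataSourceConfiguration": {
            "type": "MANAGED_KNOWLEDGE_BASE_CONNECTOR",
            "managedKnowledgeBaseConnectorConfiguration": {
                # Connector-specific document; its contents are an assumption
                # left entirely to the caller.
                "connectorParameters": connector_parameters,
            },
        },
    }
    if smart_parsing:
        # SMART_PARSING is the parsing strategy value added in this release.
        request["vectorIngestionConfiguration"] = {
            "parsingConfiguration": {"parsingStrategy": "SMART_PARSING"}
        }
    return request
```

The result could then be passed along as `client.create_data_source(**managed_connector_request(...))`, assuming `client` is a boto3 `bedrock-agent` client that includes this release.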

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base to which to add the data source.

type clientToken:

string

param clientToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

This field is autopopulated if not provided.

type name:

string

param name:

[REQUIRED]

The name of the data source.

type description:

string

param description:

A description of the data source.

type dataSourceConfiguration:

dict

param dataSourceConfiguration:

[REQUIRED]

The connection configuration for the data source.

type (string) -- [REQUIRED]

The type of data source.
managedKnowledgeBaseConnectorConfiguration (dict) --

Contains the configuration for a data source that connects a managed knowledge base to a supported data source connector. Specify this object when the data source type is MANAGED_KNOWLEDGE_BASE_CONNECTOR.
- deletionProtectionConfiguration (dict) --
  
  A safeguard against accidental bulk deletion of indexed content.
  - deletionProtectionStatus (string) -- [REQUIRED]
    
    Enable or disable deletion protection for the connector.
  - deletionProtectionThreshold (integer) --
    
    The threshold is the maximum percentage of documents that a sync job can delete from your index. If a sync would delete more than this percentage, the sync skips its delete phase, leaving your indexed documents in place. Not supported for the Custom connector.
- mediaExtractionConfiguration (dict) --
  
  Configuration for extracting media (images, audio, video) from data source files.
  - imageExtractionConfiguration (dict) --
    
    Configuration for image extraction.
    - imageExtractionStatus (string) -- [REQUIRED]
      
      Whether image extraction is enabled or disabled.
  - audioExtractionConfiguration (dict) --
    
    Configuration for audio extraction.
    - audioExtractionStatus (string) -- [REQUIRED]
      
      Whether audio extraction is enabled or disabled.
  - videoExtractionConfiguration (dict) --
    
    Configuration for video extraction.
    - videoExtractionStatus (string) -- [REQUIRED]
      
      Whether video extraction is enabled or disabled.
- connectorParameters (:ref:`document<document>`) --
  
  Connector-specific parameters. For more information, see Connect a data source.
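
The deletion protection rule described above can be illustrated with a little arithmetic. This is a local sketch of the documented behavior, not the service's implementation: both helper names are hypothetical, and the 0 to 100 range check is an assumption, since the valid range is not stated here.

```python
from typing import Any, Dict, Optional


def deletion_protection_config(enabled: bool,
                               threshold_percent: Optional[int] = None) -> Dict[str, Any]:
    """Build a deletionProtectionConfiguration dict for the request."""
    config: Dict[str, Any] = {
        "deletionProtectionStatus": "ENABLED" if enabled else "DISABLED"
    }
    if threshold_percent is not None:
        # Assumed bounds: the threshold is described as a percentage.
        if not 0 <= threshold_percent <= 100:
            raise ValueError("threshold_percent must be between 0 and 100")
        config["deletionProtectionThreshold"] = threshold_percent
    return config


def would_skip_delete_phase(indexed_documents: int, documents_to_delete: int,
                            threshold_percent: int) -> bool:
    """Return True when a sync would delete more than the threshold percentage."""
    if indexed_documents <= 0:
        return False  # nothing indexed, so nothing to protect
    deleted_percent = 100 * documents_to_delete / indexed_documents
    return deleted_percent > threshold_percent
```

For example, with 1,000 indexed documents and a threshold of 10, a sync that would delete 150 documents (15%) exceeds the threshold, while one that would delete exactly 100 (10%) does not.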
s3Configuration (dict) --

The configuration information to connect to Amazon S3 as your data source for self-managed knowledge bases. To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration.
- bucketArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
- inclusionPrefixes (list) --
  
  A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
  - (string) --
- bucketOwnerAccountId (string) --
  
  The account ID for the owner of the S3 bucket.
webConfiguration (dict) --

The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Web crawler data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The source configuration details for the web data source.
  - urlConfiguration (dict) -- [REQUIRED]
    
    The configuration of the URL/URLs.
    - seedUrls (list) --
      
      One or more seed or starting point URLs.
      - (dict) --
        
        The seed or starting point URL. You should be authorized to crawl the URL.
        
        url (string) --
        
        A seed or starting point URL.
- crawlerConfiguration (dict) --
  
  The Web Crawler configuration details for the web data source.
  - crawlerLimits (dict) --
    
    The configuration of crawl limits for the web URLs.
    - rateLimit (integer) --
      
      The max rate at which pages are crawled, up to 300 per minute per host.
    - maxPages (integer) --
      
      The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
  - inclusionFilters (list) --
    
    A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - (string) --
  - exclusionFilters (list) --
    
    A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - (string) --
  - scope (string) --
    
    The scope of what is crawled for your URLs.
    
    You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can also choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
  - userAgent (string) --
    
    The user agent suffix for your web crawler.
  - userAgentHeader (string) --
    
    A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of the bedrockbot, UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
confluenceConfiguration (dict) --

The configuration information to connect to Confluence as your data source for self-managed knowledge bases.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Confluence data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your Confluence data source.
  - hostUrl (string) -- [REQUIRED]
    
    The Confluence host URL or instance URL.
  - hostType (string) -- [REQUIRED]
    
    The supported host type, whether online/cloud or server/on-premises.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your Confluence instance.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the Confluence content. For example, configuring specific types of Confluence content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
salesforceConfiguration (dict) --

The configuration information to connect to Salesforce as your data source.

Note

Salesforce data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your Salesforce data source.
  - hostUrl (string) -- [REQUIRED]
    
    The Salesforce host URL or instance URL.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your Salesforce instance.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
sharePointConfiguration (dict) --

The configuration information to connect to SharePoint as your data source for self-managed knowledge bases.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. SharePoint data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your SharePoint data source.
  - tenantId (string) --
    
    The identifier of your Microsoft 365 tenant.
  - domain (string) -- [REQUIRED]
    
    The domain of your SharePoint instance or site URL/URLs.
  - siteUrls (list) -- [REQUIRED]
    
    A list of one or more SharePoint site URLs.
    - (string) --
  - hostType (string) -- [REQUIRED]
    
    The supported host type, whether online/cloud or server/on-premises.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your SharePoint site/sites.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
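
The precedence rule repeated in the filter descriptions above (when both an inclusion and an exclusion pattern match, exclusion wins) can be sketched locally. This is an illustration, not the crawler's implementation: `is_crawled` is a hypothetical helper, the use of `re.search` (match anywhere in the string) is an assumption, and so is treating an empty inclusion list as "include everything".

```python
import re
from typing import Sequence


def is_crawled(candidate: str,
               inclusion_filters: Sequence[str] = (),
               exclusion_filters: Sequence[str] = ()) -> bool:
    """Decide whether a URL or document name passes the filters."""
    # Exclusion takes precedence: any exclusion match rejects the candidate,
    # even when an inclusion pattern also matches.
    if any(re.search(pattern, candidate) for pattern in exclusion_filters):
        return False
    # With inclusion filters present, at least one must match.
    if inclusion_filters:
        return any(re.search(pattern, candidate) for pattern in inclusion_filters)
    # Assumption: with no inclusion filters, everything not excluded passes.
    return True
```

Testing patterns against a handful of sample URLs or object names this way, before starting a sync, can catch a filter that excludes more than intended.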

type dataDeletionPolicy:

string

param dataDeletionPolicy:

The data deletion policy for the data source.

You can set the data deletion policy to:

- DELETE: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an Amazon Web Services account is deleted.
- RETAIN: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.

type serverSideEncryptionConfiguration:

dict

param serverSideEncryptionConfiguration:

Contains details about the server-side encryption for the data source.

kmsKeyArn (string) --

The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.

type vectorIngestionConfiguration:

dict

param vectorIngestionConfiguration:

Contains details about how to ingest the documents in the data source.

chunkingConfiguration (dict) --

Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
- chunkingStrategy (string) -- [REQUIRED]
  
  A knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data:
  - FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
  - HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
  - SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
  - NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
- fixedSizeChunkingConfiguration (dict) --
  
  Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
  - maxTokens (integer) -- [REQUIRED]
    
    The maximum number of tokens to include in a chunk.
  - overlapPercentage (integer) -- [REQUIRED]
    
    The percentage of overlap between adjacent chunks of a data source.
- hierarchicalChunkingConfiguration (dict) --
  
  Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
  - levelConfigurations (list) -- [REQUIRED]
    
    Token settings for each layer.
    - (dict) --
      
      Token settings for a layer in a hierarchical chunking configuration.
      - maxTokens (integer) -- [REQUIRED]
        
        The maximum number of tokens that a chunk can contain in this layer.
  - overlapTokens (integer) -- [REQUIRED]
    
    The number of tokens to repeat across chunks in the same layer.
- semanticChunkingConfiguration (dict) --
  
  Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
  - maxTokens (integer) -- [REQUIRED]
    
    The maximum number of tokens that a chunk can contain.
  - bufferSize (integer) -- [REQUIRED]
    
    The buffer size.
  - breakpointPercentileThreshold (integer) -- [REQUIRED]
    
    The dissimilarity threshold for splitting chunks.
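
Each chunkingStrategy value pairs with at most one strategy-specific configuration object. A minimal sketch of a helper that assembles a consistent chunkingConfiguration dict is shown below. The helper name build_chunking_configuration and the numeric values are illustrative assumptions. The checks are purely client-side and do not reflect the limits the service itself enforces.

```python
def build_chunking_configuration(strategy, **settings):
    """Build a chunkingConfiguration dict for the given chunkingStrategy."""
    # Strategy -> (configuration key, fields marked [REQUIRED] above).
    spec = {
        "FIXED_SIZE": (
            "fixedSizeChunkingConfiguration",
            {"maxTokens", "overlapPercentage"},
        ),
        "HIERARCHICAL": (
            "hierarchicalChunkingConfiguration",
            {"levelConfigurations", "overlapTokens"},
        ),
        "SEMANTIC": (
            "semanticChunkingConfiguration",
            {"maxTokens", "bufferSize", "breakpointPercentileThreshold"},
        ),
    }
    if strategy == "NONE":
        # With NONE, each file is one chunk and no extra settings apply.
        if settings:
            raise ValueError("NONE takes no strategy-specific settings")
        return {"chunkingStrategy": "NONE"}
    if strategy not in spec:
        raise ValueError(f"unknown chunkingStrategy: {strategy}")
    key, required = spec[strategy]
    missing = required - set(settings)
    unknown = set(settings) - required
    if missing or unknown:
        raise ValueError(
            f"missing fields {sorted(missing)}, unknown fields {sorted(unknown)}"
        )
    return {"chunkingStrategy": strategy, key: dict(settings)}


# Illustrative values only.
fixed = build_chunking_configuration(
    "FIXED_SIZE", maxTokens=300, overlapPercentage=20
)
hierarchical = build_chunking_configuration(
    "HIERARCHICAL",
    levelConfigurations=[{"maxTokens": 1500}, {"maxTokens": 300}],
    overlapTokens=60,
)
```

The resulting dict can be passed as the chunkingConfiguration member of vectorIngestionConfiguration.
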
customTransformationConfiguration (dict) --

A custom document transformer for parsed data source documents.
- intermediateStorage (dict) -- [REQUIRED]
  
  An S3 bucket path for input and output objects.
  - s3Location (dict) -- [REQUIRED]
    
    An S3 bucket path.
    - uri (string) -- [REQUIRED]
      
      The location's URI. For example, s3://my-bucket/chunk-processor/.
- transformations (list) -- [REQUIRED]
  
  A list of Lambda functions that process documents.
  - (dict) --
    
    A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
    - transformationFunction (dict) -- [REQUIRED]
      
      A Lambda function that processes documents.
      - transformationLambdaConfiguration (dict) -- [REQUIRED]
        
        The Lambda function.
        
        lambdaArn (string) -- [REQUIRED]
        
        The function's ARN identifier.
    - stepToApply (string) -- [REQUIRED]
      
      When the service applies the transformation.
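
As a sketch of how the nested customTransformationConfiguration fits together, the helper below builds the dict from an S3 URI and a Lambda ARN. The helper name build_custom_transformation, the account ID, and the function name in the ARN are placeholders chosen for illustration. The s3:// prefix check is a local sanity check, not a documented service rule.

```python
def build_custom_transformation(s3_uri, lambda_arn):
    """Build a customTransformationConfiguration with one POST_CHUNKING step."""
    # Local sanity check only; the service performs its own validation.
    if not s3_uri.startswith("s3://"):
        raise ValueError("intermediate storage URI must start with s3://")
    return {
        "intermediateStorage": {"s3Location": {"uri": s3_uri}},
        "transformations": [
            {
                "transformationFunction": {
                    "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                },
                # Apply the Lambda function after documents are chunked.
                "stepToApply": "POST_CHUNKING",
            }
        ],
    }


custom = build_custom_transformation(
    "s3://my-bucket/chunk-processor/",
    # Placeholder ARN for illustration.
    "arn:aws:lambda:us-east-1:111122223333:function:chunk-processor",
)
```
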
parsingConfiguration (dict) --

Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
- parsingStrategy (string) -- [REQUIRED]
  
  The parsing strategy for the data source. Only SMART_PARSING can be selected for managed knowledge bases. For more information, see Customize ingestion for managed knowledge bases.
- bedrockFoundationModelConfiguration (dict) --
  
  If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
  - modelArn (string) -- [REQUIRED]
    
    The ARN of the foundation model to use for parsing.
  - parsingPrompt (dict) --
    
    Instructions for interpreting the contents of a document.
    - parsingPromptText (string) -- [REQUIRED]
      
      Instructions for interpreting the contents of a document.
  - parsingModality (string) --
    
    Specifies whether to enable parsing of multimodal data, including text, images, or both.
- bedrockDataAutomationConfiguration (dict) --
  
  If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
  - parsingModality (string) --
    
    Specifies whether to enable parsing of multimodal data, including text, images, or both.
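
The parsing strategy determines which companion object, if any, belongs in parsingConfiguration. The sketch below makes that pairing explicit. The helper name build_parsing_configuration and its keyword arguments are assumptions for illustration, and the model ARN is a placeholder. The enum values are taken from the Response Syntax of this operation.

```python
def build_parsing_configuration(strategy, model_arn=None, prompt_text=None,
                                multimodal=False):
    """Build a parsingConfiguration dict that matches the chosen strategy."""
    config = {"parsingStrategy": strategy}
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        # modelArn is marked [REQUIRED] inside this configuration object.
        if not model_arn:
            raise ValueError("modelArn is required for BEDROCK_FOUNDATION_MODEL")
        fm_config = {"modelArn": model_arn}
        if prompt_text:
            fm_config["parsingPrompt"] = {"parsingPromptText": prompt_text}
        if multimodal:
            fm_config["parsingModality"] = "MULTIMODAL"
        config["bedrockFoundationModelConfiguration"] = fm_config
    elif strategy == "BEDROCK_DATA_AUTOMATION":
        if multimodal:
            config["bedrockDataAutomationConfiguration"] = {
                "parsingModality": "MULTIMODAL"
            }
    elif strategy != "SMART_PARSING":
        raise ValueError(f"unknown parsingStrategy: {strategy}")
    return config


# Managed knowledge bases: only SMART_PARSING can be selected.
managed = build_parsing_configuration("SMART_PARSING")

# Placeholder model ARN for illustration.
with_model = build_parsing_configuration(
    "BEDROCK_FOUNDATION_MODEL",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/example-model",
    prompt_text="Transcribe tables as Markdown.",
    multimodal=True,
)
```
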
contextEnrichmentConfiguration (dict) --

The context enrichment configuration used for ingestion of the data into the vector store.
- type (string) -- [REQUIRED]
  
  The method used for context enrichment. The only supported value is BEDROCK_FOUNDATION_MODEL.
- bedrockFoundationModelConfiguration (dict) --
  
  The configuration of the Amazon Bedrock foundation model used for context enrichment.
  - enrichmentStrategyConfiguration (dict) -- [REQUIRED]
    
    The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
    - method (string) -- [REQUIRED]
      
      The method used for the context enrichment strategy.
  - modelArn (string) -- [REQUIRED]
    
    The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.

rtype:

dict

returns:

Response Syntax

{
    'dataSource': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'name': 'string',
        'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL'|'CREATING'|'UPDATING'|'FAILED',
        'description': 'string',
        'dataSourceConfiguration': {
            'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA'|'MANAGED_KNOWLEDGE_BASE_CONNECTOR',
            'managedKnowledgeBaseConnectorConfiguration': {
                'deletionProtectionConfiguration': {
                    'deletionProtectionStatus': 'ENABLED'|'DISABLED',
                    'deletionProtectionThreshold': 123
                },
                'mediaExtractionConfiguration': {
                    'imageExtractionConfiguration': {
                        'imageExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'audioExtractionConfiguration': {
                        'audioExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'videoExtractionConfiguration': {
                        'videoExtractionStatus': 'ENABLED'|'DISABLED'
                    }
                },
                'connectorParameters': {...}|[...]|123|123.4|'string'|True|None
            },
            's3Configuration': {
                'bucketArn': 'string',
                'inclusionPrefixes': [
                    'string',
                ],
                'bucketOwnerAccountId': 'string'
            },
            'webConfiguration': {
                'sourceConfiguration': {
                    'urlConfiguration': {
                        'seedUrls': [
                            {
                                'url': 'string'
                            },
                        ]
                    }
                },
                'crawlerConfiguration': {
                    'crawlerLimits': {
                        'rateLimit': 123,
                        'maxPages': 123
                    },
                    'inclusionFilters': [
                        'string',
                    ],
                    'exclusionFilters': [
                        'string',
                    ],
                    'scope': 'HOST_ONLY'|'SUBDOMAINS',
                    'userAgent': 'string',
                    'userAgentHeader': 'string'
                }
            },
            'confluenceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'hostType': 'SAAS',
                    'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'salesforceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'sharePointConfiguration': {
                'sourceConfiguration': {
                    'tenantId': 'string',
                    'domain': 'string',
                    'siteUrls': [
                        'string',
                    ],
                    'hostType': 'ONLINE',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            }
        },
        'serverSideEncryptionConfiguration': {
            'kmsKeyArn': 'string'
        },
        'vectorIngestionConfiguration': {
            'chunkingConfiguration': {
                'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
                'fixedSizeChunkingConfiguration': {
                    'maxTokens': 123,
                    'overlapPercentage': 123
                },
                'hierarchicalChunkingConfiguration': {
                    'levelConfigurations': [
                        {
                            'maxTokens': 123
                        },
                    ],
                    'overlapTokens': 123
                },
                'semanticChunkingConfiguration': {
                    'maxTokens': 123,
                    'bufferSize': 123,
                    'breakpointPercentileThreshold': 123
                }
            },
            'customTransformationConfiguration': {
                'intermediateStorage': {
                    's3Location': {
                        'uri': 'string'
                    }
                },
                'transformations': [
                    {
                        'transformationFunction': {
                            'transformationLambdaConfiguration': {
                                'lambdaArn': 'string'
                            }
                        },
                        'stepToApply': 'POST_CHUNKING'
                    },
                ]
            },
            'parsingConfiguration': {
                'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'|'SMART_PARSING',
                'bedrockFoundationModelConfiguration': {
                    'modelArn': 'string',
                    'parsingPrompt': {
                        'parsingPromptText': 'string'
                    },
                    'parsingModality': 'MULTIMODAL'
                },
                'bedrockDataAutomationConfiguration': {
                    'parsingModality': 'MULTIMODAL'
                }
            },
            'contextEnrichmentConfiguration': {
                'type': 'BEDROCK_FOUNDATION_MODEL',
                'bedrockFoundationModelConfiguration': {
                    'enrichmentStrategyConfiguration': {
                        'method': 'CHUNK_ENTITY_EXTRACTION'
                    },
                    'modelArn': 'string'
                }
            }
        },
        'dataDeletionPolicy': 'RETAIN'|'DELETE',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
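
A response of this shape can be reduced to the fields that are usually checked first. The sketch below is one possible approach: the helper name summarize_data_source and the sample identifiers are hypothetical, the sample response is hand-written rather than captured from the service, and treating DELETE_UNSUCCESSFUL as a failure state alongside FAILED is an assumption.

```python
def summarize_data_source(response):
    """Extract status, type, and failure details from a dataSource response."""
    data_source = response["dataSource"]
    config = data_source.get("dataSourceConfiguration", {})
    status = data_source.get("status")
    return {
        "id": data_source.get("dataSourceId"),
        "status": status,
        "type": config.get("type"),
        # Every key other than "type" is a connector-specific block.
        "connectorConfigurations": sorted(k for k in config if k != "type"),
        # Assumption: both of these statuses indicate a failed operation.
        "failed": status in {"FAILED", "DELETE_UNSUCCESSFUL"},
        "failureReasons": data_source.get("failureReasons", []),
    }


# Hand-written sample with hypothetical identifiers.
sample_response = {
    "dataSource": {
        "knowledgeBaseId": "KB12345678",
        "dataSourceId": "DS12345678",
        "name": "docs-bucket",
        "status": "AVAILABLE",
        "dataSourceConfiguration": {
            "type": "S3",
            "s3Configuration": {"bucketArn": "arn:aws:s3:::my-bucket"},
        },
        "dataDeletionPolicy": "RETAIN",
    }
}

summary = summarize_data_source(sample_response)
```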

Response Structure

(dict) --
- dataSource (dict) --
  
  Contains details about the data source.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base to which the data source belongs.
  - dataSourceId (string) --
    
    The unique identifier of the data source.
  - name (string) --
    
    The name of the data source.
  - status (string) --
    
    The status of the data source. The following statuses are possible:
    - Available – The data source has been created and is ready for ingestion into the knowledge base.
    - Deleting – The data source is being deleted.
  - description (string) --
    
    The description of the data source.
  - dataSourceConfiguration (dict) --
    
    The connection configuration for the data source.
    - type (string) --
      
      The type of data source.
    - managedKnowledgeBaseConnectorConfiguration (dict) --
      
      Contains the configuration for a data source that connects a managed knowledge base to a supported data source connector. Specify this object when the data source type is MANAGED_KNOWLEDGE_BASE_CONNECTOR.
      - deletionProtectionConfiguration (dict) --
        
        A safeguard against accidental bulk deletion of indexed content.
        
        deletionProtectionStatus (string) --
        
        Enable or disable deletion protection for the connector.
        
        deletionProtectionThreshold (integer) --
        
        The threshold is the maximum percentage of documents that a sync job can delete from your index. If a sync would delete more than this percentage, the sync skips its delete phase, leaving your indexed documents in place. Not supported for the Custom connector.
      - mediaExtractionConfiguration (dict) --
        
        Configuration for extracting media (images, audio, video) from data source files.
        
        imageExtractionConfiguration (dict) --
        
        Configuration for image extraction.
        
        imageExtractionStatus (string) --
        
        Whether image extraction is enabled or disabled.
        
        audioExtractionConfiguration (dict) --
        
        Configuration for audio extraction.
        
        audioExtractionStatus (string) --
        
        Whether audio extraction is enabled or disabled.
        
        videoExtractionConfiguration (dict) --
        
        Configuration for video extraction.
        
        videoExtractionStatus (string) --
        
        Whether video extraction is enabled or disabled.
      - connectorParameters (:ref:`document<document>`) --
        
        Connector-specific parameters. For more information, see Connect a data source.
    - s3Configuration (dict) --
      
      The configuration information to connect to Amazon S3 as your data source for self-managed knowledge bases. To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration.
      - bucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
      - inclusionPrefixes (list) --
        
        A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
        
        (string) --
      - bucketOwnerAccountId (string) --
        
        The account ID for the owner of the S3 bucket.
    - webConfiguration (dict) --
      
      The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. The Web Crawler data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The source configuration details for the web data source.
        
        urlConfiguration (dict) --
        
        The configuration of the URL/URLs.
        
        seedUrls (list) --
        
        One or more seed or starting point URLs.
        
        (dict) --
        
        The seed or starting point URL. You should be authorized to crawl the URL.
        
        url (string) --
        
        A seed or starting point URL.
      - crawlerConfiguration (dict) --
        
        The Web Crawler configuration details for the web data source.
        
        crawlerLimits (dict) --
        
        The configuration of crawl limits for the web URLs.
        
        rateLimit (integer) --
        
        The max rate at which pages are crawled, up to 300 per minute per host.
        
        maxPages (integer) --
        
        The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        scope (string) --
        
        The scope of what is crawled for your URLs.
        
        You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include sub domains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include sub domain "docs.aws.amazon.com".
        
        userAgent (string) --
        
        Returns the user agent suffix for your web crawler.
        
        userAgentHeader (string) --
        
        A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of the bedrockbot, UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
    - confluenceConfiguration (dict) --
      
      The configuration information to connect to Confluence as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. The Confluence data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Confluence data source.
        
        hostUrl (string) --
        
        The Confluence host URL or instance URL.
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Confluence instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Confluence content. For example, configuring specific types of Confluence content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
    - salesforceConfiguration (dict) --
      
      The configuration information to connect to Salesforce as your data source.
      
      Note
      
      The Salesforce data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Salesforce data source.
        
        hostUrl (string) --
        
        The Salesforce host URL or instance URL.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Salesforce instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
    - sharePointConfiguration (dict) --
      
      The configuration information to connect to SharePoint as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. The SharePoint data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your SharePoint data source.
        
        tenantId (string) --
        
        The identifier of your Microsoft 365 tenant.
        
        domain (string) --
        
        The domain of your SharePoint instance or site URL/URLs.
        
        siteUrls (list) --
        
        A list of one or more SharePoint site URLs.
        
        (string) --
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your SharePoint site/sites.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type applies regular expression patterns to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
  - serverSideEncryptionConfiguration (dict) --
    
    Contains details about the configuration of the server-side encryption.
    - kmsKeyArn (string) --
      
      The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
  - vectorIngestionConfiguration (dict) --
    
    Contains details about how to ingest the documents in the data source.
    - chunkingConfiguration (dict) --
      
      Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
      - chunkingStrategy (string) --
        
        A knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data:
        
        FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
        
        HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
        
        NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
      - fixedSizeChunkingConfiguration (dict) --
        
        Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
        
        maxTokens (integer) --
        
        The maximum number of tokens to include in a chunk.
        
        overlapPercentage (integer) --
        
        The percentage of overlap between adjacent chunks of a data source.
      - hierarchicalChunkingConfiguration (dict) --
        
        Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        levelConfigurations (list) --
        
        Token settings for each layer.
        
        (dict) --
        
        Token settings for a layer in a hierarchical chunking configuration.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain in this layer.
        
        overlapTokens (integer) --
        
        The number of tokens to repeat across chunks in the same layer.
      - semanticChunkingConfiguration (dict) --
        
        Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain.
        
        bufferSize (integer) --
        
        The buffer size.
        
        breakpointPercentileThreshold (integer) --
        
        The dissimilarity threshold for splitting chunks.
    - customTransformationConfiguration (dict) --
      
      A custom document transformer for parsed data source documents.
      - intermediateStorage (dict) --
        
        An S3 bucket path for input and output objects.
        
        s3Location (dict) --
        
        An S3 bucket path.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
      - transformations (list) --
        
        A Lambda function that processes documents.
        
        (dict) --
        
        A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
        
        transformationFunction (dict) --
        
        A Lambda function that processes documents.
        
        transformationLambdaConfiguration (dict) --
        
        The Lambda function.
        
        lambdaArn (string) --
        
        The function's ARN identifier.
        
        stepToApply (string) --
        
        When the service applies the transformation.
    - parsingConfiguration (dict) --
      
      Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
      - parsingStrategy (string) --
        
        The parsing strategy for the data source. Only SMART_PARSING can be selected for managed knowledge bases. For more information, see Customize ingestion for managed knowledge bases.
      - bedrockFoundationModelConfiguration (dict) --
        
        If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
        
        modelArn (string) --
        
        The ARN of the foundation model to use for parsing.
        
        parsingPrompt (dict) --
        
        Instructions for interpreting the contents of a document.
        
        parsingPromptText (string) --
        
        Instructions for interpreting the contents of a document.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, including text, images, or both.
      - bedrockDataAutomationConfiguration (dict) --
        
        If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, including text, images, or both.
    - contextEnrichmentConfiguration (dict) --
      
      The context enrichment configuration used for ingestion of the data into the vector store.
      - type (string) --
        
        The method used for context enrichment. It must be Amazon Bedrock foundation models.
      - bedrockFoundationModelConfiguration (dict) --
        
        The configuration of the Amazon Bedrock foundation model used for context enrichment.
        
        enrichmentStrategyConfiguration (dict) --
        
        The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
        
        method (string) --
        
        The method used for the context enrichment strategy.
        
        modelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
  - dataDeletionPolicy (string) --
    
    The data deletion policy for the data source.
  - createdAt (datetime) --
    
    The time at which the data source was created.
  - updatedAt (datetime) --
    
    The time at which the data source was last updated.
  - failureReasons (list) --
    
    The detailed reasons for the failure to delete a data source.
    - (string) --
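
The precedence rule stated for inclusionFilters and exclusionFilters in the structure above (when both match, the exclusion filter wins and the document isn't crawled) can be sketched locally. This is an illustration only, not the service's implementation: the function name is hypothetical, and it assumes patterns are applied with re.search and that an empty inclusion list places no restriction on what is crawled.

```python
import re


def is_crawled(name, inclusion_filters=(), exclusion_filters=()):
    """Return True if an object would be crawled under the stated rule.

    Assumptions (not confirmed by the source): patterns are applied with
    re.search, and an empty inclusion list means "include everything".
    """
    # Exclusion takes precedence: any matching exclusion pattern wins.
    if any(re.search(pattern, name) for pattern in exclusion_filters):
        return False
    # With no inclusion patterns, nothing restricts the object.
    if not inclusion_filters:
        return True
    # Otherwise at least one inclusion pattern has to match.
    return any(re.search(pattern, name) for pattern in inclusion_filters)


# Both patterns match "draft-report.pdf", so the exclusion filter decides.
print(is_crawled("draft-report.pdf", [r"\.pdf$"], [r"^draft"]))
print(is_crawled("report.pdf", [r"\.pdf$"], [r"^draft"]))
```

Checking filters this way before starting an ingestion job may help catch a pattern pair that silently excludes more than intended.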

CreateKnowledgeBase (updated)

Link ¶
Changes (request, response)
Request

{'knowledgeBaseConfiguration': {'managedKnowledgeBaseConfiguration': {'embeddingModelArn': 'string',
                                                                      'embeddingModelConfiguration': {'bedrockEmbeddingModelConfiguration': {'audio': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}],
                                                                                                                                             'dimensions': 'integer',
                                                                                                                                             'embeddingDataType': 'FLOAT32 '
                                                                                                                                                                  '| '
                                                                                                                                                                  'BINARY',
                                                                                                                                             'video': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}]}},
                                                                      'embeddingModelType': 'CUSTOM '
                                                                                            '| '
                                                                                            'MANAGED',
                                                                      'serverSideEncryptionConfiguration': {'kmsKeyArn': 'string'}},
                                'type': {'MANAGED'}}}

Response

{'knowledgeBase': {'knowledgeBaseConfiguration': {'managedKnowledgeBaseConfiguration': {'embeddingModelArn': 'string',
                                                                                        'embeddingModelConfiguration': {'bedrockEmbeddingModelConfiguration': {'audio': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}],
                                                                                                                                                               'dimensions': 'integer',
                                                                                                                                                               'embeddingDataType': 'FLOAT32 '
                                                                                                                                                                                    '| '
                                                                                                                                                                                    'BINARY',
                                                                                                                                                               'video': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}]}},
                                                                                        'embeddingModelType': 'CUSTOM '
                                                                                                              '| '
                                                                                                              'MANAGED',
                                                                                        'serverSideEncryptionConfiguration': {'kmsKeyArn': 'string'}},
                                                  'type': {'MANAGED'}},
                   'status': {'UPDATE_UNSUCCESSFUL'}}}

Creates a knowledge base. A knowledge base contains your data sources so that Large Language Models (LLMs) can use your data. To create a knowledge base, you must first set up your data sources and configure a supported vector store. For more information, see Set up a knowledge base.

Provide the name and an optional description.
Provide the Amazon Resource Name (ARN) of an IAM role with permissions to create a knowledge base in the roleArn field.
For managed knowledge bases, set embeddingModelType to MANAGED to use the service-managed embedding model, or CUSTOM with an embeddingModelArn to use your own. To use your own KMS key for encryption, provide the ARN in serverSideEncryptionConfiguration. No vector store configuration is required for managed knowledge bases.
For self-managed knowledge bases, provide the embedding model to use in the embeddingModelArn field in the knowledgeBaseConfiguration object.
For self-managed knowledge bases, provide the configuration for your vector store in the storageConfiguration object.
- For an Amazon OpenSearch Service database, use the opensearchServerlessConfiguration object. For more information, see Create a vector store in Amazon OpenSearch Service.
- For an Amazon Aurora database, use the rdsConfiguration object. For more information, see Create a vector store in Amazon Aurora.
- For a Pinecone database, use the pineconeConfiguration object. For more information, see Create a vector store in Pinecone.
- For a Redis Enterprise Cloud database, use the redisEnterpriseCloudConfiguration object. For more information, see Create a vector store in Redis Enterprise Cloud.
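
As a rough sketch of the managed knowledge base steps listed above, the helper below assembles the keyword arguments for create_knowledge_base without calling AWS. The helper name, knowledge base name, account ID, and role name are hypothetical placeholders. The commented call assumes the boto3 "bedrock-agent" client and valid credentials.

```python
def managed_kb_request(name, role_arn, embedding_model_arn=None, kms_key_arn=None):
    """Build keyword arguments for create_knowledge_base (managed type)."""
    managed = {
        # CUSTOM when you supply your own model ARN, otherwise MANAGED.
        "embeddingModelType": "CUSTOM" if embedding_model_arn else "MANAGED",
    }
    if embedding_model_arn:
        managed["embeddingModelArn"] = embedding_model_arn
    if kms_key_arn:
        managed["serverSideEncryptionConfiguration"] = {"kmsKeyArn": kms_key_arn}
    # storageConfiguration is left out: the text above says no vector store
    # configuration is required for managed knowledge bases.
    return {
        "name": name,
        "roleArn": role_arn,
        "knowledgeBaseConfiguration": {
            "type": "MANAGED",
            "managedKnowledgeBaseConfiguration": managed,
        },
    }


# Hypothetical account ID and role name, for illustration only.
request = managed_kb_request(
    name="example-managed-kb",
    role_arn="arn:aws:iam::111122223333:role/ExampleKnowledgeBaseRole",
)
print(request["knowledgeBaseConfiguration"])

# To send the request (requires boto3 and AWS credentials):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   response = client.create_knowledge_base(**request)
```

Keeping request construction separate from the network call makes the payload easy to inspect or unit test before anything is created.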

See also: AWS API Documentation

Request Syntax

client.create_knowledge_base(
    clientToken='string',
    name='string',
    description='string',
    roleArn='string',
    knowledgeBaseConfiguration={
        'type': 'VECTOR'|'KENDRA'|'SQL'|'MANAGED',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'string',
            'embeddingModelConfiguration': {
                'bedrockEmbeddingModelConfiguration': {
                    'dimensions': 123,
                    'embeddingDataType': 'FLOAT32'|'BINARY',
                    'audio': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ],
                    'video': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ]
                }
            },
            'supplementalDataStorageConfiguration': {
                'storageLocations': [
                    {
                        'type': 'S3',
                        's3Location': {
                            'uri': 'string'
                        }
                    },
                ]
            }
        },
        'managedKnowledgeBaseConfiguration': {
            'embeddingModelType': 'CUSTOM'|'MANAGED',
            'embeddingModelArn': 'string',
            'embeddingModelConfiguration': {
                'bedrockEmbeddingModelConfiguration': {
                    'dimensions': 123,
                    'embeddingDataType': 'FLOAT32'|'BINARY',
                    'audio': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ],
                    'video': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ]
                }
            },
            'serverSideEncryptionConfiguration': {
                'kmsKeyArn': 'string'
            }
        },
        'kendraKnowledgeBaseConfiguration': {
            'kendraIndexArn': 'string'
        },
        'sqlKnowledgeBaseConfiguration': {
            'type': 'REDSHIFT',
            'redshiftConfiguration': {
                'storageConfigurations': [
                    {
                        'type': 'REDSHIFT'|'AWS_DATA_CATALOG',
                        'awsDataCatalogConfiguration': {
                            'tableNames': [
                                'string',
                            ]
                        },
                        'redshiftConfiguration': {
                            'databaseName': 'string'
                        }
                    },
                ],
                'queryEngineConfiguration': {
                    'type': 'SERVERLESS'|'PROVISIONED',
                    'serverlessConfiguration': {
                        'workgroupArn': 'string',
                        'authConfiguration': {
                            'type': 'IAM'|'USERNAME_PASSWORD',
                            'usernamePasswordSecretArn': 'string'
                        }
                    },
                    'provisionedConfiguration': {
                        'clusterIdentifier': 'string',
                        'authConfiguration': {
                            'type': 'IAM'|'USERNAME_PASSWORD'|'USERNAME',
                            'databaseUser': 'string',
                            'usernamePasswordSecretArn': 'string'
                        }
                    }
                },
                'queryGenerationConfiguration': {
                    'executionTimeoutSeconds': 123,
                    'generationContext': {
                        'tables': [
                            {
                                'name': 'string',
                                'description': 'string',
                                'inclusion': 'INCLUDE'|'EXCLUDE',
                                'columns': [
                                    {
                                        'name': 'string',
                                        'description': 'string',
                                        'inclusion': 'INCLUDE'|'EXCLUDE'
                                    },
                                ]
                            },
                        ],
                        'curatedQueries': [
                            {
                                'naturalLanguage': 'string',
                                'sql': 'string'
                            },
                        ]
                    }
                }
            }
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS'|'PINECONE'|'REDIS_ENTERPRISE_CLOUD'|'RDS'|'MONGO_DB_ATLAS'|'NEPTUNE_ANALYTICS'|'OPENSEARCH_MANAGED_CLUSTER'|'S3_VECTORS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'string',
            'vectorIndexName': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'opensearchManagedClusterConfiguration': {
            'domainEndpoint': 'string',
            'domainArn': 'string',
            'vectorIndexName': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'pineconeConfiguration': {
            'connectionString': 'string',
            'credentialsSecretArn': 'string',
            'namespace': 'string',
            'fieldMapping': {
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'redisEnterpriseCloudConfiguration': {
            'endpoint': 'string',
            'vectorIndexName': 'string',
            'credentialsSecretArn': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'rdsConfiguration': {
            'resourceArn': 'string',
            'credentialsSecretArn': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'fieldMapping': {
                'primaryKeyField': 'string',
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string',
                'customMetadataField': 'string'
            }
        },
        'mongoDbAtlasConfiguration': {
            'endpoint': 'string',
            'databaseName': 'string',
            'collectionName': 'string',
            'vectorIndexName': 'string',
            'credentialsSecretArn': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            },
            'endpointServiceName': 'string',
            'textIndexName': 'string'
        },
        'neptuneAnalyticsConfiguration': {
            'graphArn': 'string',
            'fieldMapping': {
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        's3VectorsConfiguration': {
            'vectorBucketArn': 'string',
            'indexArn': 'string',
            'indexName': 'string'
        }
    },
    tags={
        'string': 'string'
    }
)
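
For self-managed knowledge bases, storageConfiguration in the request syntax above carries a type value plus one store-specific object. The sketch below wraps store settings under the key that appears to correspond to each type. The pairing is inferred from the field names in the request syntax rather than stated by the source, and the helper name, ARN, index name, and field names are hypothetical.

```python
# Pairing of each storage type with its configuration key, inferred from the
# names in the request syntax (an assumption, not confirmed by the source).
STORAGE_CONFIG_KEYS = {
    "OPENSEARCH_SERVERLESS": "opensearchServerlessConfiguration",
    "OPENSEARCH_MANAGED_CLUSTER": "opensearchManagedClusterConfiguration",
    "PINECONE": "pineconeConfiguration",
    "REDIS_ENTERPRISE_CLOUD": "redisEnterpriseCloudConfiguration",
    "RDS": "rdsConfiguration",
    "MONGO_DB_ATLAS": "mongoDbAtlasConfiguration",
    "NEPTUNE_ANALYTICS": "neptuneAnalyticsConfiguration",
    "S3_VECTORS": "s3VectorsConfiguration",
}


def storage_configuration(store_type, settings):
    """Wrap store-specific settings under the key matching store_type."""
    try:
        key = STORAGE_CONFIG_KEYS[store_type]
    except KeyError:
        raise ValueError(f"unsupported storage type: {store_type!r}") from None
    return {"type": store_type, key: settings}


# Hypothetical collection ARN, index name, and field names.
storage = storage_configuration(
    "OPENSEARCH_SERVERLESS",
    {
        "collectionArn": "arn:aws:aoss:us-east-1:111122223333:collection/example",
        "vectorIndexName": "example-index",
        "fieldMapping": {
            "vectorField": "embedding",
            "textField": "text",
            "metadataField": "metadata",
        },
    },
)
print(sorted(storage))
```

The resulting dictionary could be passed as the storageConfiguration argument of create_knowledge_base alongside a VECTOR knowledgeBaseConfiguration.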

type clientToken:

string

param clientToken:

This field is autopopulated if not provided.

type name:

string

param name:

[REQUIRED]

A name for the knowledge base.

type description:

string

param description:

A description of the knowledge base.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the knowledge base.

type knowledgeBaseConfiguration:

dict

param knowledgeBaseConfiguration:

[REQUIRED]

Contains details about the embeddings model used for the knowledge base.

type (string) -- [REQUIRED]

The type of data that the data source is converted into for the knowledge base. Choose MANAGED to create a managed knowledge base.
vectorKnowledgeBaseConfiguration (dict) --

Contains details about the model that's used to convert the data source into vector embeddings.
- embeddingModelArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
- embeddingModelConfiguration (dict) --
  
  The embeddings model configuration details for the vector model used in Knowledge Base.
  - bedrockEmbeddingModelConfiguration (dict) --
    
    The vector configuration details on the Bedrock embeddings model.
    - dimensions (integer) --
      
      The dimensions details for the vector configuration used on the Bedrock embeddings model.
    - embeddingDataType (string) --
      
      The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
    - audio (list) --
      
      Configuration settings for processing audio content in multimodal knowledge bases.
      - (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
    - video (list) --
      
      Configuration settings for processing video content in multimodal knowledge bases.
      - (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
- supplementalDataStorageConfiguration (dict) --
  
  If you include multimodal data from your data source, use this object to specify configurations for the storage location of the images extracted from your documents. These images can be retrieved and returned to the end user. They can also be used in generation when using RetrieveAndGenerate.
  - storageLocations (list) -- [REQUIRED]
    
    A list of objects specifying storage locations for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
    - (dict) --
      
      Contains information about a storage location for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
      - type (string) -- [REQUIRED]
        
        Specifies the storage service used for this location.
      - s3Location (dict) --
        
        Contains information about the Amazon S3 location for the extracted multimedia content.
        
        uri (string) -- [REQUIRED]
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
managedKnowledgeBaseConfiguration (dict) --

Configurations for a managed knowledge base.
- embeddingModelType (string) --
  
  Choose CUSTOM to provide your own Bedrock embedding model ARN. Choose MANAGED to use a service-managed embedding model. For more information, see Embedding model options.
- embeddingModelArn (string) --
  
  The ARN for the embeddings model.
- embeddingModelConfiguration (dict) --
  
  The configuration details for the embeddings model.
  - bedrockEmbeddingModelConfiguration (dict) --
    
    The vector configuration details on the Bedrock embeddings model.
    - dimensions (integer) --
      
      The dimensions details for the vector configuration used on the Bedrock embeddings model.
    - embeddingDataType (string) --
      
      The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
    - audio (list) --
      
      Configuration settings for processing audio content in multimodal knowledge bases.
      - (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
    - video (list) --
      
      Configuration settings for processing video content in multimodal knowledge bases.
      - (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
- serverSideEncryptionConfiguration (dict) --
  
  Contains the configuration for server-side encryption for your managed knowledge base.
  - kmsKeyArn (string) --
    
    The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
kendraKnowledgeBaseConfiguration (dict) --

Settings for an Amazon Kendra knowledge base.
- kendraIndexArn (string) -- [REQUIRED]
  
  The ARN of the Amazon Kendra index.
sqlKnowledgeBaseConfiguration (dict) --

Specifies configurations for a knowledge base connected to an SQL database.
- type (string) -- [REQUIRED]
  
  The type of SQL database to connect to the knowledge base.
- redshiftConfiguration (dict) --
  
  Specifies configurations for a knowledge base connected to an Amazon Redshift database.
  - storageConfigurations (list) -- [REQUIRED]
    
    Specifies configurations for Amazon Redshift database storage.
    - (dict) --
      
      Contains configurations for Amazon Redshift data storage. Specify the data storage service to use in the type field and include the corresponding field. For more information, see Build a knowledge base by connecting to a structured data source in the Amazon Bedrock User Guide.
      - type (string) -- [REQUIRED]
        
        The data storage service to use.
      - awsDataCatalogConfiguration (dict) --
        
        Specifies configurations for storage in Glue Data Catalog.
        
        tableNames (list) -- [REQUIRED]
        
        A list of names of the tables to use.
        
        (string) --
      - redshiftConfiguration (dict) --
        
        Specifies configurations for storage in Amazon Redshift.
        
        databaseName (string) -- [REQUIRED]
        
        The name of the Amazon Redshift database.
  - queryEngineConfiguration (dict) -- [REQUIRED]
    
    Specifies configurations for an Amazon Redshift query engine.
    - type (string) -- [REQUIRED]
      
      The type of query engine.
    - serverlessConfiguration (dict) --
      
      Specifies configurations for a serverless Amazon Redshift query engine.
      - workgroupArn (string) -- [REQUIRED]
        
        The ARN of the Amazon Redshift workgroup.
      - authConfiguration (dict) -- [REQUIRED]
        
        Specifies configurations for authentication to an Amazon Redshift provisioned data warehouse.
        
        type (string) -- [REQUIRED]
        
        The type of authentication to use.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
    - provisionedConfiguration (dict) --
      
      Specifies configurations for a provisioned Amazon Redshift query engine.
      - clusterIdentifier (string) -- [REQUIRED]
        
        The ID of the Amazon Redshift cluster.
      - authConfiguration (dict) -- [REQUIRED]
        
        Specifies configurations for authentication to Amazon Redshift.
        
        type (string) -- [REQUIRED]
        
        The type of authentication to use.
        
        databaseUser (string) --
        
        The database username for authentication to an Amazon Redshift provisioned data warehouse.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
  - queryGenerationConfiguration (dict) --
    
    Specifies configurations for generating queries.
    - executionTimeoutSeconds (integer) --
      
      The time after which query generation will time out.
    - generationContext (dict) --
      
      Specifies configurations for context to use during query generation.
      - tables (list) --
        
        An array of objects, each of which defines information about a table in the database.
        
        (dict) --
        
        Contains information about a table for the query engine to consider.
        
        name (string) -- [REQUIRED]
        
        The name of the table for which the other fields in this object apply.
        
        description (string) --
        
        A description of the table that helps the query engine understand the contents of the table.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the table during query generation. If you specify EXCLUDE, the table will be ignored. If you specify INCLUDE, all other tables will be ignored.
        
        columns (list) --
        
        An array of objects, each of which defines information about a column in the table.
        
        (dict) --
        
        Contains information about a column in the current table for the query engine to consider.
        
        name (string) --
        
        The name of the column for which the other fields in this object apply.
        
        description (string) --
        
        A description of the column that helps the query engine understand the contents of the column.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the column during query generation. If you specify EXCLUDE, the column will be ignored. If you specify INCLUDE, all other columns in the table will be ignored.
      - curatedQueries (list) --
        
        An array of objects, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        (dict) --
        
        Contains an example query, pairing a natural language question with its SQL equivalent, to help the query engine generate appropriate SQL queries.
        
        naturalLanguage (string) -- [REQUIRED]
        
        An example natural language query.
        
        sql (string) -- [REQUIRED]
        
        The SQL equivalent of the natural language query.
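
To make the nesting of sqlKnowledgeBaseConfiguration easier to follow, here is a minimal sketch that builds one for a serverless Amazon Redshift query engine with IAM authentication, using only fields described above. The helper name, workgroup ARN, database name, and table name are hypothetical, and the exact format the service expects for table names is not covered here.

```python
def redshift_serverless_sql_config(workgroup_arn, database_name, excluded_tables=()):
    """Build a sqlKnowledgeBaseConfiguration dict for Redshift Serverless."""
    redshift = {
        # Each storage entry names its service in "type" and includes the
        # corresponding configuration object.
        "storageConfigurations": [
            {
                "type": "REDSHIFT",
                "redshiftConfiguration": {"databaseName": database_name},
            }
        ],
        "queryEngineConfiguration": {
            "type": "SERVERLESS",
            "serverlessConfiguration": {
                "workgroupArn": workgroup_arn,
                "authConfiguration": {"type": "IAM"},
            },
        },
    }
    if excluded_tables:
        # EXCLUDE marks a table to be ignored during query generation.
        redshift["queryGenerationConfiguration"] = {
            "generationContext": {
                "tables": [
                    {"name": table, "inclusion": "EXCLUDE"}
                    for table in excluded_tables
                ]
            }
        }
    return {"type": "REDSHIFT", "redshiftConfiguration": redshift}


# Hypothetical workgroup ARN, database name, and table name.
sql_config = redshift_serverless_sql_config(
    workgroup_arn="arn:aws:redshift-serverless:us-east-1:111122223333:workgroup/example",
    database_name="dev",
    excluded_tables=["public.audit_log"],
)
print(sql_config["redshiftConfiguration"]["queryEngineConfiguration"]["type"])
```

The result would sit under knowledgeBaseConfiguration next to 'type': 'SQL' in a create_knowledge_base request.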

type storageConfiguration:

dict

param storageConfiguration:

Contains details about the configuration of the vector database used for the knowledge base.

type (string) -- [REQUIRED]

The vector store service in which the knowledge base is stored.
opensearchServerlessConfiguration (dict) --

Contains the storage configuration of the knowledge base in Amazon OpenSearch Service.
- collectionArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the OpenSearch Service vector store.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
opensearchManagedClusterConfiguration (dict) --

Contains details about the storage configuration of the knowledge base in OpenSearch Managed Cluster. For more information, see Create a vector index in Amazon OpenSearch Service.
- domainEndpoint (string) -- [REQUIRED]
  
  The endpoint URL of the OpenSearch domain.
- domainArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the OpenSearch domain.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
pineconeConfiguration (dict) --

Contains the storage configuration of the knowledge base in Pinecone.
- connectionString (string) -- [REQUIRED]
  
  The endpoint URL for your index management page.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Pinecone API key.
- namespace (string) --
  
  The namespace to be used to write new data to your database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
redisEnterpriseCloudConfiguration (dict) --

Contains the storage configuration of the knowledge base in Redis Enterprise Cloud.
- endpoint (string) -- [REQUIRED]
  
  The endpoint URL of the Redis Enterprise Cloud database.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector index.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Redis Enterprise Cloud database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
rdsConfiguration (dict) --

Contains details about the storage configuration of the knowledge base in Amazon RDS. For more information, see Create a vector index in Amazon RDS.
- resourceArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the vector store.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Amazon RDS database.
- databaseName (string) -- [REQUIRED]
  
  The name of your Amazon RDS database.
- tableName (string) -- [REQUIRED]
  
  The name of the table in the database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - primaryKeyField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the ID for each entry.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
  - customMetadataField (string) --
    
    Provide a name for the universal metadata field where Amazon Bedrock will store any custom metadata from your data source.
mongoDbAtlasConfiguration (dict) --

Contains the storage configuration of the knowledge base in MongoDB Atlas.
- endpoint (string) -- [REQUIRED]
  
  The endpoint URL of your MongoDB Atlas cluster for your knowledge base.
- databaseName (string) -- [REQUIRED]
  
  The database name in your MongoDB Atlas cluster for your knowledge base.
- collectionName (string) -- [REQUIRED]
  
  The collection name of the knowledge base in MongoDB Atlas.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the MongoDB Atlas vector search index.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that contains user credentials for your MongoDB Atlas cluster.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
- endpointServiceName (string) --
  
  The name of the VPC endpoint service in your account that is connected to your MongoDB Atlas cluster.
- textIndexName (string) --
  
  The name of the text search index in the MongoDB collection. This is required for using the hybrid search feature.
neptuneAnalyticsConfiguration (dict) --

Contains details about the Neptune Analytics configuration of the knowledge base in Amazon Neptune. For more information, see Create a vector index in Amazon Neptune Analytics.
- graphArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the Neptune Analytics vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
s3VectorsConfiguration (dict) --

The configuration settings for storing knowledge base data using S3 vectors. This includes vector index information and S3 bucket details for vector storage.
- vectorBucketArn (string) --
  
  The Amazon Resource Name (ARN) of the S3 bucket where vector embeddings are stored. This bucket contains the vector data used by the knowledge base.
- indexArn (string) --
  
  The Amazon Resource Name (ARN) of the vector index used for the knowledge base. This ARN identifies the specific vector index resource within Amazon Bedrock.
- indexName (string) --
  
  The name of the vector index used for the knowledge base. This name identifies the vector index within the Amazon Bedrock service.
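
A minimal sketch of one storageConfiguration value, using the OpenSearch Serverless variant as the example. The helper name opensearch_serverless_storage, the ARN, and the index and field names are all placeholders chosen for illustration; the key names and the OPENSEARCH_SERVERLESS type value follow this reference. Pairing the type value with the matching configuration key is an assumption based on their names.

```python
REQUIRED_FIELD_MAPPING_KEYS = ("vectorField", "textField", "metadataField")


def opensearch_serverless_storage(collection_arn, vector_index_name, field_mapping):
    """Build a storageConfiguration dict (hypothetical helper, not part of boto3)."""
    # All three fieldMapping entries are marked [REQUIRED] for this store.
    missing = [key for key in REQUIRED_FIELD_MAPPING_KEYS if not field_mapping.get(key)]
    if missing:
        raise ValueError(f"fieldMapping is missing required keys: {missing}")
    return {
        "type": "OPENSEARCH_SERVERLESS",
        "opensearchServerlessConfiguration": {
            "collectionArn": collection_arn,
            "vectorIndexName": vector_index_name,
            "fieldMapping": {
                key: field_mapping[key] for key in REQUIRED_FIELD_MAPPING_KEYS
            },
        },
    }


# Placeholder values, not real resources.
storage_configuration = opensearch_serverless_storage(
    collection_arn="arn:aws:aoss:us-east-1:111122223333:collection/example",
    vector_index_name="kb-index",
    field_mapping={
        "vectorField": "embedding",
        "textField": "chunk_text",
        "metadataField": "chunk_metadata",
    },
)
```

The dict could then be passed as the storageConfiguration argument of the request documented here. Other stores follow the same pattern with their own configuration key and required fields.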

type tags:

dict

param tags:

Specify the key-value pairs for the tags that you want to attach to your knowledge base in this object.

(string) --
- (string) --
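
Because tags is a flat mapping of string keys to string values, a small normalizing step can catch non-string values before the request is sent. The helper below is a hypothetical sketch, and the tag names are invented for illustration.

```python
def build_tags(pairs):
    """Return a tags dict whose keys and values are all strings (hypothetical helper)."""
    tags = {}
    for key, value in pairs.items():
        if not isinstance(key, str) or not key:
            raise ValueError("tag keys must be non-empty strings")
        # Values such as integers are converted, since the API expects strings.
        tags[key] = str(value)
    return tags


# Example tag names and values are made up.
tags = build_tags({"project": "docs-search", "cost-center": 1234})
```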

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBase': {
        'knowledgeBaseId': 'string',
        'name': 'string',
        'knowledgeBaseArn': 'string',
        'description': 'string',
        'roleArn': 'string',
        'knowledgeBaseConfiguration': {
            'type': 'VECTOR'|'KENDRA'|'SQL'|'MANAGED',
            'vectorKnowledgeBaseConfiguration': {
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'supplementalDataStorageConfiguration': {
                    'storageLocations': [
                        {
                            'type': 'S3',
                            's3Location': {
                                'uri': 'string'
                            }
                        },
                    ]
                }
            },
            'managedKnowledgeBaseConfiguration': {
                'embeddingModelType': 'CUSTOM'|'MANAGED',
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'serverSideEncryptionConfiguration': {
                    'kmsKeyArn': 'string'
                }
            },
            'kendraKnowledgeBaseConfiguration': {
                'kendraIndexArn': 'string'
            },
            'sqlKnowledgeBaseConfiguration': {
                'type': 'REDSHIFT',
                'redshiftConfiguration': {
                    'storageConfigurations': [
                        {
                            'type': 'REDSHIFT'|'AWS_DATA_CATALOG',
                            'awsDataCatalogConfiguration': {
                                'tableNames': [
                                    'string',
                                ]
                            },
                            'redshiftConfiguration': {
                                'databaseName': 'string'
                            }
                        },
                    ],
                    'queryEngineConfiguration': {
                        'type': 'SERVERLESS'|'PROVISIONED',
                        'serverlessConfiguration': {
                            'workgroupArn': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD',
                                'usernamePasswordSecretArn': 'string'
                            }
                        },
                        'provisionedConfiguration': {
                            'clusterIdentifier': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD'|'USERNAME',
                                'databaseUser': 'string',
                                'usernamePasswordSecretArn': 'string'
                            }
                        }
                    },
                    'queryGenerationConfiguration': {
                        'executionTimeoutSeconds': 123,
                        'generationContext': {
                            'tables': [
                                {
                                    'name': 'string',
                                    'description': 'string',
                                    'inclusion': 'INCLUDE'|'EXCLUDE',
                                    'columns': [
                                        {
                                            'name': 'string',
                                            'description': 'string',
                                            'inclusion': 'INCLUDE'|'EXCLUDE'
                                        },
                                    ]
                                },
                            ],
                            'curatedQueries': [
                                {
                                    'naturalLanguage': 'string',
                                    'sql': 'string'
                                },
                            ]
                        }
                    }
                }
            }
        },
        'storageConfiguration': {
            'type': 'OPENSEARCH_SERVERLESS'|'PINECONE'|'REDIS_ENTERPRISE_CLOUD'|'RDS'|'MONGO_DB_ATLAS'|'NEPTUNE_ANALYTICS'|'OPENSEARCH_MANAGED_CLUSTER'|'S3_VECTORS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'opensearchManagedClusterConfiguration': {
                'domainEndpoint': 'string',
                'domainArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'pineconeConfiguration': {
                'connectionString': 'string',
                'credentialsSecretArn': 'string',
                'namespace': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'redisEnterpriseCloudConfiguration': {
                'endpoint': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'rdsConfiguration': {
                'resourceArn': 'string',
                'credentialsSecretArn': 'string',
                'databaseName': 'string',
                'tableName': 'string',
                'fieldMapping': {
                    'primaryKeyField': 'string',
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string',
                    'customMetadataField': 'string'
                }
            },
            'mongoDbAtlasConfiguration': {
                'endpoint': 'string',
                'databaseName': 'string',
                'collectionName': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                },
                'endpointServiceName': 'string',
                'textIndexName': 'string'
            },
            'neptuneAnalyticsConfiguration': {
                'graphArn': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            's3VectorsConfiguration': {
                'vectorBucketArn': 'string',
                'indexArn': 'string',
                'indexName': 'string'
            }
        },
        'status': 'CREATING'|'ACTIVE'|'DELETING'|'UPDATING'|'FAILED'|'DELETE_UNSUCCESSFUL'|'UPDATE_UNSUCCESSFUL',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
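
The response above is deeply nested, so a caller will often reduce it to the few fields it needs. The sketch below is one possible way to do that. It rests on two assumptions that this reference does not state explicitly: that each knowledgeBaseConfiguration type value corresponds to the similarly named configuration key, and that FAILED, DELETE_UNSUCCESSFUL, and UPDATE_UNSUCCESSFUL are the statuses worth treating as failures. The helper name and the sample response are hypothetical.

```python
# Assumption: each type value selects the similarly named configuration key.
CONFIG_KEY_BY_TYPE = {
    "VECTOR": "vectorKnowledgeBaseConfiguration",
    "KENDRA": "kendraKnowledgeBaseConfiguration",
    "SQL": "sqlKnowledgeBaseConfiguration",
    "MANAGED": "managedKnowledgeBaseConfiguration",
}
# Assumption: these statuses indicate an operation that did not succeed.
FAILURE_STATUSES = {"FAILED", "DELETE_UNSUCCESSFUL", "UPDATE_UNSUCCESSFUL"}


def summarize_knowledge_base(response):
    """Flatten the response into a small summary dict (hypothetical helper)."""
    kb = response["knowledgeBase"]
    kb_config = kb.get("knowledgeBaseConfiguration", {})
    kb_type = kb_config.get("type")
    config_key = CONFIG_KEY_BY_TYPE.get(kb_type)
    return {
        "id": kb.get("knowledgeBaseId"),
        "status": kb.get("status"),
        "type": kb_type,
        # Only the configuration matching the type is kept.
        "typeConfiguration": kb_config.get(config_key, {}) if config_key else {},
        "failed": kb.get("status") in FAILURE_STATUSES,
        "failureReasons": kb.get("failureReasons", []),
    }


# Hand-written sample shaped like the response syntax; not real service output.
sample_response = {
    "knowledgeBase": {
        "knowledgeBaseId": "KB12345678",
        "status": "FAILED",
        "knowledgeBaseConfiguration": {
            "type": "KENDRA",
            "kendraKnowledgeBaseConfiguration": {"kendraIndexArn": "arn:example"},
        },
        "failureReasons": ["example failure reason"],
    }
}
summary = summarize_knowledge_base(sample_response)
```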

Response Structure

(dict) --
- knowledgeBase (dict) --
  
  Contains details about the knowledge base.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base.
  - name (string) --
    
    The name of the knowledge base.
  - knowledgeBaseArn (string) --
    
    The Amazon Resource Name (ARN) of the knowledge base.
  - description (string) --
    
    The description of the knowledge base.
  - roleArn (string) --
    
    The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the knowledge base.
  - knowledgeBaseConfiguration (dict) --
    
    Contains details about the embeddings configuration of the knowledge base.
    - type (string) --
      
      The type of data that the data source is converted into for the knowledge base. Choose MANAGED to create a managed knowledge base.
    - vectorKnowledgeBaseConfiguration (dict) --
      
      Contains details about the model that's used to convert the data source into vector embeddings.
      - embeddingModelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
      - embeddingModelConfiguration (dict) --
        
        The embeddings model configuration details for the vector model used in Knowledge Base.
        
        bedrockEmbeddingModelConfiguration (dict) --
        
        The vector configuration details on the Bedrock embeddings model.
        
        dimensions (integer) --
        
        The dimensions details for the vector configuration used on the Bedrock embeddings model.
        
        embeddingDataType (string) --
        
        The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
        
        audio (list) --
        
        Configuration settings for processing audio content in multimodal knowledge bases.
        
        (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
        
        video (list) --
        
        Configuration settings for processing video content in multimodal knowledge bases.
        
        (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - supplementalDataStorageConfiguration (dict) --
        
        If you include multimodal data from your data source, use this object to specify configurations for the storage location of the images extracted from your documents. These images can be retrieved and returned to the end user. They can also be used in generation when using RetrieveAndGenerate.
        
        storageLocations (list) --
        
        A list of objects specifying storage locations for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
        
        (dict) --
        
        Contains information about a storage location for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
        
        type (string) --
        
        Specifies the storage service used for this location.
        
        s3Location (dict) --
        
        Contains information about the Amazon S3 location for the extracted multimedia content.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
    - managedKnowledgeBaseConfiguration (dict) --
      
      Configurations for a managed knowledge base.
      - embeddingModelType (string) --
        
        Choose CUSTOM to provide your own Bedrock embedding model ARN. Choose MANAGED to use a service-managed embedding model. For more information, see Embedding model options.
      - embeddingModelArn (string) --
        
        The ARN for the embeddings model.
      - embeddingModelConfiguration (dict) --
        
        The configuration details for the embeddings model.
        
        bedrockEmbeddingModelConfiguration (dict) --
        
        The vector configuration details on the Bedrock embeddings model.
        
        dimensions (integer) --
        
        The dimensions details for the vector configuration used on the Bedrock embeddings model.
        
        embeddingDataType (string) --
        
        The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
        
        audio (list) --
        
        Configuration settings for processing audio content in multimodal knowledge bases.
        
        (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
        
        video (list) --
        
        Configuration settings for processing video content in multimodal knowledge bases.
        
        (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - serverSideEncryptionConfiguration (dict) --
        
        Contains the configuration for server-side encryption for your managed knowledge base.
        
        kmsKeyArn (string) --
        
        The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
    - kendraKnowledgeBaseConfiguration (dict) --
      
      Settings for an Amazon Kendra knowledge base.
      - kendraIndexArn (string) --
        
        The ARN of the Amazon Kendra index.
    - sqlKnowledgeBaseConfiguration (dict) --
      
      Specifies configurations for a knowledge base connected to an SQL database.
      - type (string) --
        
        The type of SQL database to connect to the knowledge base.
      - redshiftConfiguration (dict) --
        
        Specifies configurations for a knowledge base connected to an Amazon Redshift database.
        
        storageConfigurations (list) --
        
        Specifies configurations for Amazon Redshift database storage.
        
        (dict) --
        
        Contains configurations for Amazon Redshift data storage. Specify the data storage service to use in the type field and include the corresponding field. For more information, see Build a knowledge base by connecting to a structured data source in the Amazon Bedrock User Guide.
        
        type (string) --
        
        The data storage service to use.
        
        awsDataCatalogConfiguration (dict) --
        
        Specifies configurations for storage in Glue Data Catalog.
        
        tableNames (list) --
        
        A list of names of the tables to use.
        
        (string) --
        
        redshiftConfiguration (dict) --
        
        Specifies configurations for storage in Amazon Redshift.
        
        databaseName (string) --
        
        The name of the Amazon Redshift database.
        
        queryEngineConfiguration (dict) --
        
        Specifies configurations for an Amazon Redshift query engine.
        
        type (string) --
        
        The type of query engine.
        
        serverlessConfiguration (dict) --
        
        Specifies configurations for a serverless Amazon Redshift query engine.
        
        workgroupArn (string) --
        
        The ARN of the Amazon Redshift workgroup.
        
        authConfiguration (dict) --
        
        Specifies configurations for authentication to a serverless Amazon Redshift query engine.
        
        type (string) --
        
        The type of authentication to use.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
        
        provisionedConfiguration (dict) --
        
        Specifies configurations for a provisioned Amazon Redshift query engine.
        
        clusterIdentifier (string) --
        
        The ID of the Amazon Redshift cluster.
        
        authConfiguration (dict) --
        
        Specifies configurations for authentication to Amazon Redshift.
        
        type (string) --
        
        The type of authentication to use.
        
        databaseUser (string) --
        
        The database username for authentication to an Amazon Redshift provisioned data warehouse.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
        
        queryGenerationConfiguration (dict) --
        
        Specifies configurations for generating queries.
        
        executionTimeoutSeconds (integer) --
        
        The time, in seconds, after which query generation will time out.
        
        generationContext (dict) --
        
        Specifies configurations for context to use during query generation.
        
        tables (list) --
        
        An array of objects, each of which defines information about a table in the database.
        
        (dict) --
        
        Contains information about a table for the query engine to consider.
        
        name (string) --
        
        The name of the table for which the other fields in this object apply.
        
        description (string) --
        
        A description of the table that helps the query engine understand the contents of the table.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the table during query generation. If you specify EXCLUDE, the table will be ignored. If you specify INCLUDE, all other tables will be ignored.
        
        columns (list) --
        
        An array of objects, each of which defines information about a column in the table.
        
        (dict) --
        
        Contains information about a column in the current table for the query engine to consider.
        
        name (string) --
        
        The name of the column for which the other fields in this object apply.
        
        description (string) --
        
        A description of the column that helps the query engine understand the contents of the column.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the column during query generation. If you specify EXCLUDE, the column will be ignored. If you specify INCLUDE, all other columns in the table will be ignored.
        
        curatedQueries (list) --
        
        An array of objects, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        (dict) --
        
        Contains configurations for a query, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        naturalLanguage (string) --
        
        An example natural language query.
        
        sql (string) --
        
        The SQL equivalent of the natural language query.
  - storageConfiguration (dict) --
    
    Contains details about the storage configuration of the knowledge base.
    - type (string) --
      
      The vector store service in which the knowledge base is stored.
    - opensearchServerlessConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Amazon OpenSearch Service.
      - collectionArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch Service vector store.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - opensearchManagedClusterConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in OpenSearch Managed Cluster. For more information, see Create a vector index in Amazon OpenSearch Service.
      - domainEndpoint (string) --
        
        The endpoint URL of the OpenSearch domain.
      - domainArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch domain.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - pineconeConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Pinecone.
      - connectionString (string) --
        
        The endpoint URL for your index management page.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Pinecone API key.
      - namespace (string) --
        
        The namespace to be used to write new data to your database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - redisEnterpriseCloudConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Redis Enterprise Cloud.
      - endpoint (string) --
        
        The endpoint URL of the Redis Enterprise Cloud database.
      - vectorIndexName (string) --
        
        The name of the vector index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Redis Enterprise Cloud database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - rdsConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in Amazon RDS. For more information, see Create a vector index in Amazon RDS.
      - resourceArn (string) --
        
        The Amazon Resource Name (ARN) of the vector store.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Amazon RDS database.
      - databaseName (string) --
        
        The name of your Amazon RDS database.
      - tableName (string) --
        
        The name of the table in the database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        primaryKeyField (string) --
        
        The name of the field in which Amazon Bedrock stores the ID for each entry.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
        
        customMetadataField (string) --
        
        Provide a name for the universal metadata field where Amazon Bedrock will store any custom metadata from your data source.
    - mongoDbAtlasConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in MongoDB Atlas.
      - endpoint (string) --
        
        The endpoint URL of your MongoDB Atlas cluster for your knowledge base.
      - databaseName (string) --
        
        The database name in your MongoDB Atlas cluster for your knowledge base.
      - collectionName (string) --
        
        The collection name of the knowledge base in MongoDB Atlas.
      - vectorIndexName (string) --
        
        The name of the MongoDB Atlas vector search index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that contains user credentials for your MongoDB Atlas cluster.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
      - endpointServiceName (string) --
        
        The name of the VPC endpoint service in your account that is connected to your MongoDB Atlas cluster.
      - textIndexName (string) --
        
        The name of the text search index in the MongoDB collection. This is required for using the hybrid search feature.
    - neptuneAnalyticsConfiguration (dict) --
      
      Contains details about the Neptune Analytics configuration of the knowledge base in Amazon Neptune. For more information, see Create a vector index in Amazon Neptune Analytics.
      - graphArn (string) --
        
        The Amazon Resource Name (ARN) of the Neptune Analytics vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - s3VectorsConfiguration (dict) --
      
      The configuration settings for storing knowledge base data using S3 vectors. This includes vector index information and S3 bucket details for vector storage.
      - vectorBucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket where vector embeddings are stored. This bucket contains the vector data used by the knowledge base.
      - indexArn (string) --
        
        The Amazon Resource Name (ARN) of the vector index used for the knowledge base. This ARN identifies the specific vector index resource within Amazon Bedrock.
      - indexName (string) --
        
        The name of the vector index used for the knowledge base. This name identifies the vector index within the Amazon Bedrock service.
  - status (string) --
    
    The status of the knowledge base. The following statuses are possible:
    - CREATING – The knowledge base is being created.
    - ACTIVE – The knowledge base is ready to be queried.
    - DELETING – The knowledge base is being deleted.
    - UPDATING – The knowledge base is being updated.
    - FAILED – The knowledge base API operation failed.
  - createdAt (datetime) --
    
    The time the knowledge base was created.
  - updatedAt (datetime) --
    
    The time the knowledge base was last updated.
  - failureReasons (list) --
    
    A list of reasons that the API operation on the knowledge base failed.
    - (string) --
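
As a rough illustration of how the status and failureReasons fields above might be consumed, the sketch below turns a knowledge base dictionary into a one-line summary. It assumes the dictionary is the object that carries status and failureReasons as described in this response structure; the helper name describe_knowledge_base_state is hypothetical and not part of the SDK.

```python
def describe_knowledge_base_state(knowledge_base):
    """Summarize a knowledge base dict using its 'status' and 'failureReasons'.

    Hypothetical helper: it only reads fields documented in the response
    structure and makes no API calls.
    """
    status = knowledge_base.get("status", "UNKNOWN")
    if status == "ACTIVE":
        # ACTIVE is the only status documented as ready to be queried.
        return "ACTIVE: ready to be queried"
    if status == "FAILED":
        # failureReasons is a list of strings; it may be absent or empty.
        reasons = knowledge_base.get("failureReasons") or ["no reason reported"]
        return "FAILED: " + "; ".join(reasons)
    return f"{status}: not ready to be queried"
```

A caller would typically pass in the knowledge base object taken from a response and log or display the returned string.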

DeleteDataSource (updated)

Changes (response)

{'status': {'CREATING', 'FAILED', 'UPDATING'}}

Deletes a data source from a knowledge base.

See also: AWS API Documentation

Request Syntax

client.delete_data_source(
    knowledgeBaseId='string',
    dataSourceId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base from which to delete the data source.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source to delete.

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBaseId': 'string',
    'dataSourceId': 'string',
    'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL'|'CREATING'|'UPDATING'|'FAILED'
}

Response Structure

(dict) --
- knowledgeBaseId (string) --
  
  The unique identifier of the knowledge base to which the data source that was deleted belonged.
- dataSourceId (string) --
  
  The unique identifier of the data source that was deleted.
- status (string) --
  
  The status of the data source.
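
A minimal sketch of calling this operation and checking the returned status. It assumes client is a boto3 client for the bedrock-agent service (or any object exposing the same delete_data_source method); the wrapper name delete_data_source_checked and the choice to raise on DELETE_UNSUCCESSFUL are illustrative assumptions, not SDK behavior.

```python
def delete_data_source_checked(client, knowledge_base_id, data_source_id):
    """Call DeleteDataSource and return the status reported in the response.

    Hypothetical wrapper: raises RuntimeError when the service reports
    DELETE_UNSUCCESSFUL so the caller does not silently continue.
    """
    response = client.delete_data_source(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
    )
    status = response["status"]
    if status == "DELETE_UNSUCCESSFUL":
        raise RuntimeError(
            f"Data source {response['dataSourceId']} could not be deleted"
        )
    return status
```

With a real client this might be invoked as delete_data_source_checked(boto3.client("bedrock-agent"), kb_id, ds_id), where kb_id and ds_id are placeholders for your own identifiers.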

DeleteKnowledgeBase (updated)

Changes (response)

{'status': {'UPDATE_UNSUCCESSFUL'}}

Deletes a knowledge base. Before deleting a knowledge base, you should disassociate the knowledge base from any agents that it is associated with by making a DisassociateAgentKnowledgeBase request.

See also: AWS API Documentation

Request Syntax

client.delete_knowledge_base(
    knowledgeBaseId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base to delete.

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBaseId': 'string',
    'status': 'CREATING'|'ACTIVE'|'DELETING'|'UPDATING'|'FAILED'|'DELETE_UNSUCCESSFUL'|'UPDATE_UNSUCCESSFUL'
}

Response Structure

(dict) --
- knowledgeBaseId (string) --
  
  The unique identifier of the knowledge base that was deleted.
- status (string) --
  
  The status of the knowledge base and whether it has been successfully deleted.
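
The recommended order (disassociate from agents first, then delete) could be sketched as follows. This assumes client is a boto3 bedrock-agent client and that the caller already knows which agents use the knowledge base; the agent_associations argument and the function name are hypothetical, and the parameter names passed to disassociate_agent_knowledge_base should be checked against that operation's own reference.

```python
def delete_knowledge_base_safely(client, knowledge_base_id, agent_associations):
    """Disassociate a knowledge base from its agents, then delete it.

    agent_associations is a caller-supplied iterable of
    (agent_id, agent_version) pairs; discovering them is out of scope here.
    """
    for agent_id, agent_version in agent_associations:
        # Remove each agent association before deleting the knowledge base.
        client.disassociate_agent_knowledge_base(
            agentId=agent_id,
            agentVersion=agent_version,
            knowledgeBaseId=knowledge_base_id,
        )
    response = client.delete_knowledge_base(knowledgeBaseId=knowledge_base_id)
    # The status indicates whether deletion is in progress or was unsuccessful.
    return response["status"]
```

Passing an empty list for agent_associations reduces this to a plain DeleteKnowledgeBase call.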

GetDataSource (updated)

Changes (response)

{'dataSource': {
    'dataSourceConfiguration': {
        'managedKnowledgeBaseConnectorConfiguration': {
            'connectorParameters': {},
            'deletionProtectionConfiguration': {
                'deletionProtectionStatus': 'ENABLED | DISABLED',
                'deletionProtectionThreshold': 'integer'},
            'mediaExtractionConfiguration': {
                'audioExtractionConfiguration': {'audioExtractionStatus': 'ENABLED | DISABLED'},
                'imageExtractionConfiguration': {'imageExtractionStatus': 'ENABLED | DISABLED'},
                'videoExtractionConfiguration': {'videoExtractionStatus': 'ENABLED | DISABLED'}}},
        'type': {'MANAGED_KNOWLEDGE_BASE_CONNECTOR'}},
    'status': {'CREATING', 'FAILED', 'UPDATING'},
    'vectorIngestionConfiguration': {'parsingConfiguration': {'parsingStrategy': {'SMART_PARSING'}}}}}

Gets information about a data source.

See also: AWS API Documentation

Request Syntax

client.get_data_source(
    knowledgeBaseId='string',
    dataSourceId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the data source.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source.

rtype:

dict

returns:

Response Syntax

{
    'dataSource': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'name': 'string',
        'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL'|'CREATING'|'UPDATING'|'FAILED',
        'description': 'string',
        'dataSourceConfiguration': {
            'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA'|'MANAGED_KNOWLEDGE_BASE_CONNECTOR',
            'managedKnowledgeBaseConnectorConfiguration': {
                'deletionProtectionConfiguration': {
                    'deletionProtectionStatus': 'ENABLED'|'DISABLED',
                    'deletionProtectionThreshold': 123
                },
                'mediaExtractionConfiguration': {
                    'imageExtractionConfiguration': {
                        'imageExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'audioExtractionConfiguration': {
                        'audioExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'videoExtractionConfiguration': {
                        'videoExtractionStatus': 'ENABLED'|'DISABLED'
                    }
                },
                'connectorParameters': {...}|[...]|123|123.4|'string'|True|None
            },
            's3Configuration': {
                'bucketArn': 'string',
                'inclusionPrefixes': [
                    'string',
                ],
                'bucketOwnerAccountId': 'string'
            },
            'webConfiguration': {
                'sourceConfiguration': {
                    'urlConfiguration': {
                        'seedUrls': [
                            {
                                'url': 'string'
                            },
                        ]
                    }
                },
                'crawlerConfiguration': {
                    'crawlerLimits': {
                        'rateLimit': 123,
                        'maxPages': 123
                    },
                    'inclusionFilters': [
                        'string',
                    ],
                    'exclusionFilters': [
                        'string',
                    ],
                    'scope': 'HOST_ONLY'|'SUBDOMAINS',
                    'userAgent': 'string',
                    'userAgentHeader': 'string'
                }
            },
            'confluenceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'hostType': 'SAAS',
                    'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'salesforceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'sharePointConfiguration': {
                'sourceConfiguration': {
                    'tenantId': 'string',
                    'domain': 'string',
                    'siteUrls': [
                        'string',
                    ],
                    'hostType': 'ONLINE',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            }
        },
        'serverSideEncryptionConfiguration': {
            'kmsKeyArn': 'string'
        },
        'vectorIngestionConfiguration': {
            'chunkingConfiguration': {
                'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
                'fixedSizeChunkingConfiguration': {
                    'maxTokens': 123,
                    'overlapPercentage': 123
                },
                'hierarchicalChunkingConfiguration': {
                    'levelConfigurations': [
                        {
                            'maxTokens': 123
                        },
                    ],
                    'overlapTokens': 123
                },
                'semanticChunkingConfiguration': {
                    'maxTokens': 123,
                    'bufferSize': 123,
                    'breakpointPercentileThreshold': 123
                }
            },
            'customTransformationConfiguration': {
                'intermediateStorage': {
                    's3Location': {
                        'uri': 'string'
                    }
                },
                'transformations': [
                    {
                        'transformationFunction': {
                            'transformationLambdaConfiguration': {
                                'lambdaArn': 'string'
                            }
                        },
                        'stepToApply': 'POST_CHUNKING'
                    },
                ]
            },
            'parsingConfiguration': {
                'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'|'SMART_PARSING',
                'bedrockFoundationModelConfiguration': {
                    'modelArn': 'string',
                    'parsingPrompt': {
                        'parsingPromptText': 'string'
                    },
                    'parsingModality': 'MULTIMODAL'
                },
                'bedrockDataAutomationConfiguration': {
                    'parsingModality': 'MULTIMODAL'
                }
            },
            'contextEnrichmentConfiguration': {
                'type': 'BEDROCK_FOUNDATION_MODEL',
                'bedrockFoundationModelConfiguration': {
                    'enrichmentStrategyConfiguration': {
                        'method': 'CHUNK_ENTITY_EXTRACTION'
                    },
                    'modelArn': 'string'
                }
            }
        },
        'dataDeletionPolicy': 'RETAIN'|'DELETE',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
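
To show how the new managed connector fields in this response might be read, here is a small sketch that pulls out the deletion protection settings and the enabled media extraction types. It operates on a response dictionary shaped like the syntax above and makes no API calls; the function name and the keys of the returned dictionary are hypothetical.

```python
def summarize_managed_connector(response):
    """Summarize managed connector settings from a GetDataSource response.

    Returns None when the data source is not a
    MANAGED_KNOWLEDGE_BASE_CONNECTOR. The returned keys are illustrative.
    """
    config = response["dataSource"]["dataSourceConfiguration"]
    if config.get("type") != "MANAGED_KNOWLEDGE_BASE_CONNECTOR":
        return None
    managed = config.get("managedKnowledgeBaseConnectorConfiguration", {})
    protection = managed.get("deletionProtectionConfiguration", {})
    media = managed.get("mediaExtractionConfiguration", {})
    # Collect the media kinds whose <kind>ExtractionStatus is ENABLED.
    enabled_media = sorted(
        kind
        for kind in ("image", "audio", "video")
        if media.get(f"{kind}ExtractionConfiguration", {}).get(
            f"{kind}ExtractionStatus"
        ) == "ENABLED"
    )
    return {
        "deletion_protection_enabled": protection.get("deletionProtectionStatus") == "ENABLED",
        "deletion_threshold_percent": protection.get("deletionProtectionThreshold"),
        "enabled_media": enabled_media,
    }
```

The response argument would normally be the return value of client.get_data_source(knowledgeBaseId=..., dataSourceId=...).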

Response Structure

(dict) --
- dataSource (dict) --
  
  Contains details about the data source.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base to which the data source belongs.
  - dataSourceId (string) --
    
    The unique identifier of the data source.
  - name (string) --
    
    The name of the data source.
  - status (string) --
    
    The status of the data source. The following statuses are possible:
    - AVAILABLE – The data source has been created and is ready for ingestion into the knowledge base.
    - DELETING – The data source is being deleted.
  - description (string) --
    
    The description of the data source.
  - dataSourceConfiguration (dict) --
    
    The connection configuration for the data source.
    - type (string) --
      
      The type of data source.
    - managedKnowledgeBaseConnectorConfiguration (dict) --
      
      Contains the configuration for a data source that connects a managed knowledge base to a supported data source connector. Specify this object when the data source type is MANAGED_KNOWLEDGE_BASE_CONNECTOR.
      - deletionProtectionConfiguration (dict) --
        
        A safeguard against accidental bulk deletion of indexed content.
        
        deletionProtectionStatus (string) --
        
        Enable or disable deletion protection for the connector.
        
        deletionProtectionThreshold (integer) --
        
        The threshold is the maximum percentage of documents that a sync job can delete from your index. If a sync would delete more than this percentage, the sync skips its delete phase, leaving your indexed documents in place. Not supported for the Custom connector.
      - mediaExtractionConfiguration (dict) --
        
        Configuration for extracting media (images, audio, video) from data source files.
        
        imageExtractionConfiguration (dict) --
        
        Configuration for image extraction.
        
        imageExtractionStatus (string) --
        
        Whether image extraction is enabled or disabled.
        
        audioExtractionConfiguration (dict) --
        
        Configuration for audio extraction.
        
        audioExtractionStatus (string) --
        
        Whether audio extraction is enabled or disabled.
        
        videoExtractionConfiguration (dict) --
        
        Configuration for video extraction.
        
        videoExtractionStatus (string) --
        
        Whether video extraction is enabled or disabled.
      - connectorParameters (:ref:`document<document>`) --
        
        Connector-specific parameters. For more information, see Connect a data source.
    - s3Configuration (dict) --
      
      The configuration information to connect to Amazon S3 as your data source for self-managed knowledge bases. To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration.
      - bucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
      - inclusionPrefixes (list) --
        
        A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
        
        (string) --
      - bucketOwnerAccountId (string) --
        
        The account ID for the owner of the S3 bucket.
    - webConfiguration (dict) --
      
      The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Web crawler data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The source configuration details for the web data source.
        
        urlConfiguration (dict) --
        
        The configuration of the URL/URLs.
        
        seedUrls (list) --
        
        One or more seed or starting point URLs.
        
        (dict) --
        
        The seed or starting point URL. You should be authorized to crawl the URL.
        
        url (string) --
        
        A seed or starting point URL.
      - crawlerConfiguration (dict) --
        
        The Web Crawler configuration details for the web data source.
        
        crawlerLimits (dict) --
        
        The configuration of crawl limits for the web URLs.
        
        rateLimit (integer) --
        
        The max rate at which pages are crawled, up to 300 per minute per host.
        
        maxPages (integer) --
        
        The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        scope (string) --
        
        The scope of what is crawled for your URLs.
        
        You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include sub domains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include sub domain "docs.aws.amazon.com".
        
        userAgent (string) --
        
        Returns the user agent suffix for your web crawler.
        
        userAgentHeader (string) --
        
        A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of the bedrockbot, UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
    - confluenceConfiguration (dict) --
      
      The configuration information to connect to Confluence as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Confluence data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Confluence data source.
        
        hostUrl (string) --
        
        The Confluence host URL or instance URL.
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Confluence instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Confluence content. For example, configuring specific types of Confluence content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
    - salesforceConfiguration (dict) --
      
      The configuration information to connect to Salesforce as your data source.
      
      Note
      
      Salesforce data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Salesforce data source.
        
        hostUrl (string) --
        
        The Salesforce host URL or instance URL.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Salesforce instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
    - sharePointConfiguration (dict) --
      
      The configuration information to connect to SharePoint as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. SharePoint data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your SharePoint data source.
        
        tenantId (string) --
        
        The identifier of your Microsoft 365 tenant.
        
        domain (string) --
        
        The domain of your SharePoint instance or site URL/URLs.
        
        siteUrls (list) --
        
        A list of one or more SharePoint site URLs.
        
        (string) --
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your SharePoint site/sites.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
  - serverSideEncryptionConfiguration (dict) --
    
    Contains details about the configuration of the server-side encryption.
    - kmsKeyArn (string) --
      
      The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
  - vectorIngestionConfiguration (dict) --
    
    Contains details about how to ingest the documents in the data source.
    - chunkingConfiguration (dict) --
      
      Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
      - chunkingStrategy (string) --
        
        Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
        
        FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
        
        HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
        
        NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
      - fixedSizeChunkingConfiguration (dict) --
        
        Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
        
        maxTokens (integer) --
        
        The maximum number of tokens to include in a chunk.
        
        overlapPercentage (integer) --
        
        The percentage of overlap between adjacent chunks of a data source.
      - hierarchicalChunkingConfiguration (dict) --
        
        Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        levelConfigurations (list) --
        
        Token settings for each layer.
        
        (dict) --
        
        Token settings for a layer in a hierarchical chunking configuration.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain in this layer.
        
        overlapTokens (integer) --
        
        The number of tokens to repeat across chunks in the same layer.
      - semanticChunkingConfiguration (dict) --
        
        Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain.
        
        bufferSize (integer) --
        
        The buffer size.
        
        breakpointPercentileThreshold (integer) --
        
        The dissimilarity threshold for splitting chunks.
    - customTransformationConfiguration (dict) --
      
      A custom document transformer for parsed data source documents.
      - intermediateStorage (dict) --
        
        An S3 bucket path for input and output objects.
        
        s3Location (dict) --
        
        An S3 bucket path.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
      - transformations (list) --
        
        A Lambda function that processes documents.
        
        (dict) --
        
        A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
        
        transformationFunction (dict) --
        
        A Lambda function that processes documents.
        
        transformationLambdaConfiguration (dict) --
        
        The Lambda function.
        
        lambdaArn (string) --
        
        The function's ARN identifier.
        
        stepToApply (string) --
        
        When the service applies the transformation.
    - parsingConfiguration (dict) --
      
      Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
      - parsingStrategy (string) --
        
        The parsing strategy for the data source. Only SMART_PARSING can be selected for managed knowledge bases. For more information, see Customize ingestion for managed knowledge bases.
      - bedrockFoundationModelConfiguration (dict) --
        
        If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
        
        modelArn (string) --
        
        The ARN of the foundation model to use for parsing.
        
        parsingPrompt (dict) --
        
        Instructions for interpreting the contents of a document.
        
        parsingPromptText (string) --
        
        Instructions for interpreting the contents of a document.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, including text, images, or both.
      - bedrockDataAutomationConfiguration (dict) --
        
        If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, including text, images, or both.
    - contextEnrichmentConfiguration (dict) --
      
      The context enrichment configuration used for ingestion of the data into the vector store.
      - type (string) --
        
        The method used for context enrichment. The method must use Amazon Bedrock foundation models.
      - bedrockFoundationModelConfiguration (dict) --
        
        The configuration of the Amazon Bedrock foundation model used for context enrichment.
        
        enrichmentStrategyConfiguration (dict) --
        
        The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
        
        method (string) --
        
        The method used for the context enrichment strategy.
        
        modelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
  - dataDeletionPolicy (string) --
    
    The data deletion policy for the data source.
  - createdAt (datetime) --
    
    The time at which the data source was created.
  - updatedAt (datetime) --
    
    The time at which the data source was last updated.
  - failureReasons (list) --
    
    The detailed reasons for the failure to delete the data source.
    - (string) --
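
The precedence rule stated for inclusionFilters and exclusionFilters (when both match a document, the exclusion filter wins and the document isn't crawled) can be modeled locally. The following is a minimal sketch, not the service's implementation: the helper name is_crawled is hypothetical, it assumes patterns are applied with Python's re.search, and it assumes that an empty inclusion list admits everything that is not excluded. The service's regular expression dialect and matching mode may differ.

```python
import re


def is_crawled(name, inclusion_filters, exclusion_filters):
    """Local model of the documented precedence: exclusion beats inclusion."""
    # Assumption: a pattern matches if re.search finds it anywhere in the name.
    if any(re.search(pattern, name) for pattern in exclusion_filters):
        return False
    # Assumption: with no inclusion patterns, everything not excluded is kept.
    if not inclusion_filters:
        return True
    return any(re.search(pattern, name) for pattern in inclusion_filters)


# Illustrative filter entry shaped like one item of patternObjectFilter.filters.
example_filter = {
    "objectType": "Page",  # hypothetical object type value
    "inclusionFilters": [r".*\.pdf$"],
    "exclusionFilters": [r".*draft.*"],
}

for document in ["report.pdf", "draft-report.pdf", "notes.txt"]:
    crawled = is_crawled(
        document,
        example_filter["inclusionFilters"],
        example_filter["exclusionFilters"],
    )
    print(document, crawled)
```

Checking the exclusion patterns first keeps the precedence explicit: a document that matches both lists is rejected before the inclusion patterns are consulted.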

GetIngestionJob (updated)

Changes (response)

{'ingestionJob': {'statistics': {'numberOfDocumentsSkipped': 'long'}}}

Gets information about a data ingestion job. Data sources are ingested into your knowledge base so that Large Language Models (LLMs) can use your data.

See also: AWS API Documentation

Request Syntax

client.get_ingestion_job(
    knowledgeBaseId='string',
    dataSourceId='string',
    ingestionJobId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the data ingestion job you want to get information on.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source for the data ingestion job you want to get information on.

type ingestionJobId:

string

param ingestionJobId:

[REQUIRED]

The unique identifier of the data ingestion job you want to get information on.

rtype:

dict

returns:

Response Syntax

{
    'ingestionJob': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'ingestionJobId': 'string',
        'description': 'string',
        'status': 'STARTING'|'IN_PROGRESS'|'COMPLETE'|'FAILED'|'STOPPING'|'STOPPED',
        'statistics': {
            'numberOfDocumentsScanned': 123,
            'numberOfMetadataDocumentsScanned': 123,
            'numberOfNewDocumentsIndexed': 123,
            'numberOfModifiedDocumentsIndexed': 123,
            'numberOfMetadataDocumentsModified': 123,
            'numberOfDocumentsDeleted': 123,
            'numberOfDocumentsFailed': 123,
            'numberOfDocumentsSkipped': 123
        },
        'failureReasons': [
            'string',
        ],
        'startedAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1)
    }
}
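
As a rough illustration of reading the statistics block shown above, the sketch below condenses it into a few totals. The helper name summarize_statistics and the sample values are hypothetical; only the field names come from the response syntax.

```python
def summarize_statistics(statistics):
    """Condense an ingestionJob 'statistics' dict into a few totals."""
    # .get(..., 0) guards against fields that are absent from a response.
    indexed = (
        statistics.get("numberOfNewDocumentsIndexed", 0)
        + statistics.get("numberOfModifiedDocumentsIndexed", 0)
    )
    return {
        "scanned": statistics.get("numberOfDocumentsScanned", 0),
        "indexed": indexed,
        "deleted": statistics.get("numberOfDocumentsDeleted", 0),
        "failed": statistics.get("numberOfDocumentsFailed", 0),
        "skipped": statistics.get("numberOfDocumentsSkipped", 0),
    }


# Made-up values, shaped like the 'statistics' block of the response.
sample_statistics = {
    "numberOfDocumentsScanned": 10,
    "numberOfNewDocumentsIndexed": 4,
    "numberOfModifiedDocumentsIndexed": 2,
    "numberOfDocumentsDeleted": 1,
    "numberOfDocumentsFailed": 1,
    "numberOfDocumentsSkipped": 2,
}

print(summarize_statistics(sample_statistics))
```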

Response Structure

(dict) --
- ingestionJob (dict) --
  
  Contains details about the data ingestion job.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base for the data ingestion job.
  - dataSourceId (string) --
    
    The unique identifier of the data source for the data ingestion job.
  - ingestionJobId (string) --
    
    The unique identifier of the data ingestion job.
  - description (string) --
    
    The description of the data ingestion job.
  - status (string) --
    
    The status of the data ingestion job.
  - statistics (dict) --
    
    Contains statistics about the data ingestion job.
    - numberOfDocumentsScanned (integer) --
      
      The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
    - numberOfMetadataDocumentsScanned (integer) --
      
      The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
    - numberOfNewDocumentsIndexed (integer) --
      
      The number of new source documents in the data source that were successfully indexed.
    - numberOfModifiedDocumentsIndexed (integer) --
      
      The number of modified source documents in the data source that were successfully indexed.
    - numberOfMetadataDocumentsModified (integer) --
      
      The number of metadata files that were updated or deleted.
    - numberOfDocumentsDeleted (integer) --
      
      The number of source documents that were deleted.
    - numberOfDocumentsFailed (integer) --
      
      The number of source documents that failed to be ingested.
    - numberOfDocumentsSkipped (integer) --
      
      The number of source documents that were skipped during ingestion.
  - failureReasons (list) --
    
    A list of reasons that the data ingestion job failed.
    - (string) --
  - startedAt (datetime) --
    
    The time the data ingestion job started.
    
    If you stop a data ingestion job, the startedAt time is the time the job was started before the job was stopped.
  - updatedAt (datetime) --
    
    The time the data ingestion job was last updated.
    
    If you stop a data ingestion job, the updatedAt time is the time the job was stopped.
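
One common use of this operation is to poll a job until it finishes. The sketch below is illustrative only: the helper name wait_for_ingestion_job, the delay, and the attempt limit are hypothetical, and treating COMPLETE, FAILED, and STOPPED as the terminal statuses is an assumption based on the status names, not something stated above. The client argument is expected to be a boto3 client for this service, as noted in the trailing comment.

```python
import time

# Assumption: these statuses mean the job will not change state any further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def wait_for_ingestion_job(client, knowledge_base_id, data_source_id,
                           ingestion_job_id, delay_seconds=10, max_attempts=60):
    """Poll get_ingestion_job until the job reaches an assumed terminal status."""
    for _ in range(max_attempts):
        response = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=ingestion_job_id,
        )
        job = response["ingestionJob"]
        if job["status"] in TERMINAL_STATUSES:
            return job
        time.sleep(delay_seconds)
    raise TimeoutError("ingestion job did not reach a terminal status in time")


# Usage (requires boto3 and AWS credentials; the identifiers are placeholders):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   job = wait_for_ingestion_job(client, "KB_ID", "DS_ID", "JOB_ID")
#   print(job["status"], job.get("failureReasons", []))
```

Passing the client in as an argument keeps the helper easy to exercise with a stub object, without network access.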

GetKnowledgeBase (updated)

Changes (response)

{'knowledgeBase': {
    'knowledgeBaseConfiguration': {
        'managedKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'string',
            'embeddingModelConfiguration': {
                'bedrockEmbeddingModelConfiguration': {
                    'audio': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}],
                    'dimensions': 'integer',
                    'embeddingDataType': 'FLOAT32 | BINARY',
                    'video': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}]}},
            'embeddingModelType': 'CUSTOM | MANAGED',
            'serverSideEncryptionConfiguration': {'kmsKeyArn': 'string'}},
        'type': {'MANAGED'}},
    'status': {'UPDATE_UNSUCCESSFUL'}}}

Gets information about a knowledge base.

See also: AWS API Documentation

Request Syntax

client.get_knowledge_base(
    knowledgeBaseId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base you want to get information on.

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBase': {
        'knowledgeBaseId': 'string',
        'name': 'string',
        'knowledgeBaseArn': 'string',
        'description': 'string',
        'roleArn': 'string',
        'knowledgeBaseConfiguration': {
            'type': 'VECTOR'|'KENDRA'|'SQL'|'MANAGED',
            'vectorKnowledgeBaseConfiguration': {
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'supplementalDataStorageConfiguration': {
                    'storageLocations': [
                        {
                            'type': 'S3',
                            's3Location': {
                                'uri': 'string'
                            }
                        },
                    ]
                }
            },
            'managedKnowledgeBaseConfiguration': {
                'embeddingModelType': 'CUSTOM'|'MANAGED',
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'serverSideEncryptionConfiguration': {
                    'kmsKeyArn': 'string'
                }
            },
            'kendraKnowledgeBaseConfiguration': {
                'kendraIndexArn': 'string'
            },
            'sqlKnowledgeBaseConfiguration': {
                'type': 'REDSHIFT',
                'redshiftConfiguration': {
                    'storageConfigurations': [
                        {
                            'type': 'REDSHIFT'|'AWS_DATA_CATALOG',
                            'awsDataCatalogConfiguration': {
                                'tableNames': [
                                    'string',
                                ]
                            },
                            'redshiftConfiguration': {
                                'databaseName': 'string'
                            }
                        },
                    ],
                    'queryEngineConfiguration': {
                        'type': 'SERVERLESS'|'PROVISIONED',
                        'serverlessConfiguration': {
                            'workgroupArn': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD',
                                'usernamePasswordSecretArn': 'string'
                            }
                        },
                        'provisionedConfiguration': {
                            'clusterIdentifier': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD'|'USERNAME',
                                'databaseUser': 'string',
                                'usernamePasswordSecretArn': 'string'
                            }
                        }
                    },
                    'queryGenerationConfiguration': {
                        'executionTimeoutSeconds': 123,
                        'generationContext': {
                            'tables': [
                                {
                                    'name': 'string',
                                    'description': 'string',
                                    'inclusion': 'INCLUDE'|'EXCLUDE',
                                    'columns': [
                                        {
                                            'name': 'string',
                                            'description': 'string',
                                            'inclusion': 'INCLUDE'|'EXCLUDE'
                                        },
                                    ]
                                },
                            ],
                            'curatedQueries': [
                                {
                                    'naturalLanguage': 'string',
                                    'sql': 'string'
                                },
                            ]
                        }
                    }
                }
            }
        },
        'storageConfiguration': {
            'type': 'OPENSEARCH_SERVERLESS'|'PINECONE'|'REDIS_ENTERPRISE_CLOUD'|'RDS'|'MONGO_DB_ATLAS'|'NEPTUNE_ANALYTICS'|'OPENSEARCH_MANAGED_CLUSTER'|'S3_VECTORS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'opensearchManagedClusterConfiguration': {
                'domainEndpoint': 'string',
                'domainArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'pineconeConfiguration': {
                'connectionString': 'string',
                'credentialsSecretArn': 'string',
                'namespace': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'redisEnterpriseCloudConfiguration': {
                'endpoint': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'rdsConfiguration': {
                'resourceArn': 'string',
                'credentialsSecretArn': 'string',
                'databaseName': 'string',
                'tableName': 'string',
                'fieldMapping': {
                    'primaryKeyField': 'string',
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string',
                    'customMetadataField': 'string'
                }
            },
            'mongoDbAtlasConfiguration': {
                'endpoint': 'string',
                'databaseName': 'string',
                'collectionName': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                },
                'endpointServiceName': 'string',
                'textIndexName': 'string'
            },
            'neptuneAnalyticsConfiguration': {
                'graphArn': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            's3VectorsConfiguration': {
                'vectorBucketArn': 'string',
                'indexArn': 'string',
                'indexName': 'string'
            }
        },
        'status': 'CREATING'|'ACTIVE'|'DELETING'|'UPDATING'|'FAILED'|'DELETE_UNSUCCESSFUL'|'UPDATE_UNSUCCESSFUL',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
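
The response carries one configuration block per knowledge base type. As a hedged sketch, the block for a given response might be selected as shown below. The helper name type_specific_configuration is hypothetical, and the pairing of each type value with a configuration key is inferred from the field names in the response syntax rather than stated by the documentation.

```python
# Assumption: each 'type' value pairs with the similarly named configuration key.
CONFIG_KEY_BY_TYPE = {
    "VECTOR": "vectorKnowledgeBaseConfiguration",
    "KENDRA": "kendraKnowledgeBaseConfiguration",
    "SQL": "sqlKnowledgeBaseConfiguration",
    "MANAGED": "managedKnowledgeBaseConfiguration",
}


def type_specific_configuration(knowledge_base):
    """Return the configuration dict matching the knowledge base's type."""
    configuration = knowledge_base["knowledgeBaseConfiguration"]
    key = CONFIG_KEY_BY_TYPE.get(configuration["type"])
    if key is None:
        return {}  # unknown type value: nothing to select
    return configuration.get(key, {})


# Made-up response fragment, shaped like the 'knowledgeBase' dict above.
sample_knowledge_base = {
    "knowledgeBaseId": "EXAMPLEID",
    "status": "ACTIVE",
    "knowledgeBaseConfiguration": {
        "type": "MANAGED",
        "managedKnowledgeBaseConfiguration": {
            "embeddingModelType": "MANAGED",
        },
    },
}

print(type_specific_configuration(sample_knowledge_base))
```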

Response Structure

(dict) --
- knowledgeBase (dict) --
  
  Contains details about the knowledge base.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base.
  - name (string) --
    
    The name of the knowledge base.
  - knowledgeBaseArn (string) --
    
    The Amazon Resource Name (ARN) of the knowledge base.
  - description (string) --
    
    The description of the knowledge base.
  - roleArn (string) --
    
    The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the knowledge base.
  - knowledgeBaseConfiguration (dict) --
    
    Contains details about the embeddings configuration of the knowledge base.
    - type (string) --
      
      The type of data that the data source is converted into for the knowledge base. Choose MANAGED to create a managed knowledge base.
    - vectorKnowledgeBaseConfiguration (dict) --
      
      Contains details about the model that's used to convert the data source into vector embeddings.
      - embeddingModelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
      - embeddingModelConfiguration (dict) --
        
        The embeddings model configuration details for the vector model used in Knowledge Base.
        
        bedrockEmbeddingModelConfiguration (dict) --
        
        The vector configuration details on the Bedrock embeddings model.
        
        dimensions (integer) --
        
        The dimensions details for the vector configuration used on the Bedrock embeddings model.
        
        embeddingDataType (string) --
        
        The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
        
        audio (list) --
        
        Configuration settings for processing audio content in multimodal knowledge bases.
        
        (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
        
        video (list) --
        
        Configuration settings for processing video content in multimodal knowledge bases.
        
        (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - supplementalDataStorageConfiguration (dict) --
        
        If you include multimodal data from your data source, use this object to specify configurations for the storage location of the images extracted from your documents. These images can be retrieved and returned to the end user. They can also be used in generation when using RetrieveAndGenerate.
        
        storageLocations (list) --
        
        A list of objects specifying storage locations for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
        
        (dict) --
        
        Contains information about a storage location for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
        
        type (string) --
        
        Specifies the storage service used for this location.
        
        s3Location (dict) --
        
        Contains information about the Amazon S3 location for the extracted multimedia content.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
    - managedKnowledgeBaseConfiguration (dict) --
      
      Configurations for a managed knowledge base.
      - embeddingModelType (string) --
        
        Choose CUSTOM to provide your own Bedrock embedding model ARN. Choose MANAGED to use a service-managed embedding model. For more information, see Embedding model options.
      - embeddingModelArn (string) --
        
        The ARN for the embeddings model.
      - embeddingModelConfiguration (dict) --
        
        The configuration details for the embeddings model.
        
        bedrockEmbeddingModelConfiguration (dict) --
        
        The vector configuration details on the Bedrock embeddings model.
        
        dimensions (integer) --
        
        The dimensions details for the vector configuration used on the Bedrock embeddings model.
        
        embeddingDataType (string) --
        
        The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
        
        audio (list) --
        
        Configuration settings for processing audio content in multimodal knowledge bases.
        
        (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
        
        video (list) --
        
        Configuration settings for processing video content in multimodal knowledge bases.
        
        (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - serverSideEncryptionConfiguration (dict) --
        
        Contains the configuration for server-side encryption for your managed knowledge base.
        
        kmsKeyArn (string) --
        
        The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
    - kendraKnowledgeBaseConfiguration (dict) --
      
      Settings for an Amazon Kendra knowledge base.
      - kendraIndexArn (string) --
        
        The ARN of the Amazon Kendra index.
    - sqlKnowledgeBaseConfiguration (dict) --
      
      Specifies configurations for a knowledge base connected to an SQL database.
      - type (string) --
        
        The type of SQL database to connect to the knowledge base.
      - redshiftConfiguration (dict) --
        
        Specifies configurations for a knowledge base connected to an Amazon Redshift database.
        
        storageConfigurations (list) --
        
        Specifies configurations for Amazon Redshift database storage.
        
        (dict) --
        
        Contains configurations for Amazon Redshift data storage. Specify the data storage service to use in the type field and include the corresponding field. For more information, see Build a knowledge base by connecting to a structured data source in the Amazon Bedrock User Guide.
        
        type (string) --
        
        The data storage service to use.
        
        awsDataCatalogConfiguration (dict) --
        
        Specifies configurations for storage in Glue Data Catalog.
        
        tableNames (list) --
        
        A list of names of the tables to use.
        
        (string) --
        
        redshiftConfiguration (dict) --
        
        Specifies configurations for storage in Amazon Redshift.
        
        databaseName (string) --
        
        The name of the Amazon Redshift database.
        
        queryEngineConfiguration (dict) --
        
        Specifies configurations for an Amazon Redshift query engine.
        
        type (string) --
        
        The type of query engine.
        
        serverlessConfiguration (dict) --
        
        Specifies configurations for a serverless Amazon Redshift query engine.
        
        workgroupArn (string) --
        
        The ARN of the Amazon Redshift workgroup.
        
        authConfiguration (dict) --
        
        Specifies configurations for authentication to an Amazon Redshift provisioned data warehouse.
        
        type (string) --
        
        The type of authentication to use.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
        
        provisionedConfiguration (dict) --
        
        Specifies configurations for a provisioned Amazon Redshift query engine.
        
        clusterIdentifier (string) --
        
        The ID of the Amazon Redshift cluster.
        
        authConfiguration (dict) --
        
        Specifies configurations for authentication to Amazon Redshift.
        
        type (string) --
        
        The type of authentication to use.
        
        databaseUser (string) --
        
        The database username for authentication to an Amazon Redshift provisioned data warehouse.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
        
        queryGenerationConfiguration (dict) --
        
        Specifies configurations for generating queries.
        
        executionTimeoutSeconds (integer) --
        
        The time after which query generation will time out.
        
        generationContext (dict) --
        
        Specifies configurations for context to use during query generation.
        
        tables (list) --
        
        An array of objects, each of which defines information about a table in the database.
        
        (dict) --
        
        Contains information about a table for the query engine to consider.
        
        name (string) --
        
        The name of the table for which the other fields in this object apply.
        
        description (string) --
        
        A description of the table that helps the query engine understand the contents of the table.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the table during query generation. If you specify EXCLUDE, the table will be ignored. If you specify INCLUDE, all other tables will be ignored.
        
        columns (list) --
        
        An array of objects, each of which defines information about a column in the table.
        
        (dict) --
        
        Contains information about a column in the current table for the query engine to consider.
        
        name (string) --
        
        The name of the column for which the other fields in this object apply.
        
        description (string) --
        
        A description of the column that helps the query engine understand the contents of the column.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the column during query generation. If you specify EXCLUDE, the column will be ignored. If you specify INCLUDE, all other columns in the table will be ignored.
        
        curatedQueries (list) --
        
        An array of objects, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        (dict) --
        
        Contains configurations for a query, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        naturalLanguage (string) --
        
        An example natural language query.
        
        sql (string) --
        
        The SQL equivalent of the natural language query.
  - storageConfiguration (dict) --
    
    Contains details about the storage configuration of the knowledge base.
    - type (string) --
      
      The vector store service in which the knowledge base is stored.
    - opensearchServerlessConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Amazon OpenSearch Service.
      - collectionArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch Service vector store.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - opensearchManagedClusterConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in OpenSearch Managed Cluster. For more information, see Create a vector index in Amazon OpenSearch Service.
      - domainEndpoint (string) --
        
        The endpoint URL of the OpenSearch domain.
      - domainArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch domain.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - pineconeConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Pinecone.
      - connectionString (string) --
        
        The endpoint URL for your index management page.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Pinecone API key.
      - namespace (string) --
        
        The namespace to be used to write new data to your database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - redisEnterpriseCloudConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Redis Enterprise Cloud.
      - endpoint (string) --
        
        The endpoint URL of the Redis Enterprise Cloud database.
      - vectorIndexName (string) --
        
        The name of the vector index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Redis Enterprise Cloud database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - rdsConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in Amazon RDS. For more information, see Create a vector index in Amazon RDS.
      - resourceArn (string) --
        
        The Amazon Resource Name (ARN) of the vector store.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Amazon RDS database.
      - databaseName (string) --
        
        The name of your Amazon RDS database.
      - tableName (string) --
        
        The name of the table in the database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        primaryKeyField (string) --
        
        The name of the field in which Amazon Bedrock stores the ID for each entry.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
        
        customMetadataField (string) --
        
        Provide a name for the universal metadata field where Amazon Bedrock will store any custom metadata from your data source.
    - mongoDbAtlasConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in MongoDB Atlas.
      - endpoint (string) --
        
        The endpoint URL of your MongoDB Atlas cluster for your knowledge base.
      - databaseName (string) --
        
        The database name in your MongoDB Atlas cluster for your knowledge base.
      - collectionName (string) --
        
        The collection name of the knowledge base in MongoDB Atlas.
      - vectorIndexName (string) --
        
        The name of the MongoDB Atlas vector search index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that contains user credentials for your MongoDB Atlas cluster.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        vectorField (string) --
        
        The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
      - endpointServiceName (string) --
        
        The name of the VPC endpoint service in your account that is connected to your MongoDB Atlas cluster.
      - textIndexName (string) --
        
        The name of the text search index in the MongoDB collection. This is required for using the hybrid search feature.
    - neptuneAnalyticsConfiguration (dict) --
      
      Contains details about the Neptune Analytics configuration of the knowledge base in Amazon Neptune. For more information, see Create a vector index in Amazon Neptune Analytics.
      - graphArn (string) --
        
        The Amazon Resource Name (ARN) of the Neptune Analytics vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        
        textField (string) --
        
        The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        
        metadataField (string) --
        
        The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - s3VectorsConfiguration (dict) --
      
      The configuration settings for storing knowledge base data using S3 vectors. This includes vector index information and S3 bucket details for vector storage.
      - vectorBucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket where vector embeddings are stored. This bucket contains the vector data used by the knowledge base.
      - indexArn (string) --
        
        The Amazon Resource Name (ARN) of the vector index used for the knowledge base. This ARN identifies the specific vector index resource within Amazon Bedrock.
      - indexName (string) --
        
        The name of the vector index used for the knowledge base. This name identifies the vector index within the Amazon Bedrock service.
  - status (string) --
    
    The status of the knowledge base. The following statuses are possible:
    - CREATING – The knowledge base is being created.
    - ACTIVE – The knowledge base is ready to be queried.
    - DELETING – The knowledge base is being deleted.
    - UPDATING – The knowledge base is being updated.
    - FAILED – The knowledge base API operation failed.
  - createdAt (datetime) --
    
    The time the knowledge base was created.
  - updatedAt (datetime) --
    
    The time the knowledge base was last updated.
  - failureReasons (list) --
    
    A list of reasons that the API operation on the knowledge base failed.
    - (string) --
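
The status values and failureReasons described above can be interpreted with a small helper. This is a minimal sketch, assuming kb is the knowledge base dict from this response structure; the name summarize_knowledge_base and the grouping of CREATING, UPDATING, and DELETING as "in flight" are illustrative assumptions, not part of the API.

```python
# Hypothetical helper: only the 'status' and 'failureReasons' keys come from
# the response structure above; everything else is illustrative.
IN_FLIGHT_STATUSES = {"CREATING", "UPDATING", "DELETING"}


def summarize_knowledge_base(kb):
    """Return (ready, in_flight, reasons) for a knowledge base dict."""
    status = kb.get("status")
    # failureReasons may be absent when no operation has failed.
    reasons = list(kb.get("failureReasons", []))
    ready = status == "ACTIVE"          # ACTIVE means ready to be queried
    in_flight = status in IN_FLIGHT_STATUSES
    return ready, in_flight, reasons
```

A caller might poll until in_flight is False, then either query the knowledge base (ready is True) or surface the collected reasons.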

IngestKnowledgeBaseDocuments (updated)

Changes (request)

{'documents': {'metadata': {'accessControlList': [{'access': 'ALLOW | DENY',
                                                   'name': 'string',
                                                   'type': 'USER'}]}}}

Ingests documents directly into the knowledge base that is connected to the data source. The dataSourceType specified in the content for each document must match the type of the data source that you specify in the header. For more information, see Ingest changes directly into a knowledge base in the Amazon Bedrock User Guide.

See also: AWS API Documentation

Request Syntax

client.ingest_knowledge_base_documents(
    knowledgeBaseId='string',
    dataSourceId='string',
    clientToken='string',
    documents=[
        {
            'metadata': {
                'type': 'IN_LINE_ATTRIBUTE'|'S3_LOCATION',
                'inlineAttributes': [
                    {
                        'key': 'string',
                        'value': {
                            'type': 'BOOLEAN'|'NUMBER'|'STRING'|'STRING_LIST',
                            'numberValue': 123.0,
                            'booleanValue': True|False,
                            'stringValue': 'string',
                            'stringListValue': [
                                'string',
                            ]
                        }
                    },
                ],
                's3Location': {
                    'uri': 'string',
                    'bucketOwnerAccountId': 'string'
                },
                'accessControlList': [
                    {
                        'name': 'string',
                        'type': 'USER',
                        'access': 'ALLOW'|'DENY'
                    },
                ]
            },
            'content': {
                'dataSourceType': 'CUSTOM'|'S3',
                'custom': {
                    'customDocumentIdentifier': {
                        'id': 'string'
                    },
                    'sourceType': 'IN_LINE'|'S3_LOCATION',
                    's3Location': {
                        'uri': 'string',
                        'bucketOwnerAccountId': 'string'
                    },
                    'inlineContent': {
                        'type': 'BYTE'|'TEXT',
                        'byteContent': {
                            'mimeType': 'string',
                            'data': b'bytes'
                        },
                        'textContent': {
                            'data': 'string'
                        }
                    }
                },
                's3': {
                    's3Location': {
                        'uri': 'string'
                    }
                }
            }
        },
    ]
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base to ingest the documents into.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source connected to the knowledge base that you're adding documents to.

type clientToken:

string

param clientToken:

This field is autopopulated if not provided.

type documents:

list

param documents:

[REQUIRED]

A list of objects, each of which contains information about the documents to add.

(dict) --

Contains information about a document to ingest into a knowledge base and metadata to associate with it.
- metadata (dict) --
  
  Contains the metadata to associate with the document.
  - type (string) -- [REQUIRED]
    
    The type of the source from which to add metadata.
  - inlineAttributes (list) --
    
    An array of objects, each of which defines a metadata attribute to associate with the content to ingest. You define the attributes inline.
    - (dict) --
      
      Contains information about a metadata attribute.
      - key (string) -- [REQUIRED]
        
        The key of the metadata attribute.
      - value (dict) -- [REQUIRED]
        
        Contains the value of the metadata attribute.
        
        type (string) -- [REQUIRED]
        
        The type of the metadata attribute.
        
        numberValue (float) --
        
        The value of the numeric metadata attribute.
        
        booleanValue (boolean) --
        
        The value of the Boolean metadata attribute.
        
        stringValue (string) --
        
        The value of the string metadata attribute.
        
        stringListValue (list) --
        
        An array of strings that define the value of the metadata attribute.
        
        (string) --
  - s3Location (dict) --
    
    The Amazon S3 location of the file containing metadata to associate with the content to ingest.
    - uri (string) -- [REQUIRED]
      
      The S3 URI of the file containing the content to ingest.
    - bucketOwnerAccountId (string) --
      
      The identifier of the Amazon Web Services account that owns the S3 bucket containing the content to ingest.
  - accessControlList (list) --
    
    Access control list for the document. Used when metadata type is IN_LINE_ATTRIBUTE.
    - (dict) --
      
      An access control entry specifying a principal and their access level.
      - name (string) -- [REQUIRED]
        
        The user identifier.
      - type (string) -- [REQUIRED]
        
        The type of principal.
      - access (string) -- [REQUIRED]
        
        Whether to allow or deny access.
- content (dict) -- [REQUIRED]
  
  Contains the content of the document.
  - dataSourceType (string) -- [REQUIRED]
    
    The type of data source that is connected to the knowledge base to which to ingest this document.
  - custom (dict) --
    
    Contains information about the content to ingest into a knowledge base connected to a custom data source.
    - customDocumentIdentifier (dict) -- [REQUIRED]
      
      A unique identifier for the document.
      - id (string) -- [REQUIRED]
        
        The identifier of the document to ingest into a custom data source.
    - sourceType (string) -- [REQUIRED]
      
      The source of the data to ingest.
    - s3Location (dict) --
      
      Contains information about the Amazon S3 location of the file from which to ingest data.
      - uri (string) -- [REQUIRED]
        
        The S3 URI of the file containing the content to ingest.
      - bucketOwnerAccountId (string) --
        
        The identifier of the Amazon Web Services account that owns the S3 bucket containing the content to ingest.
    - inlineContent (dict) --
      
      Contains information about content defined inline to ingest into a knowledge base.
      - type (string) -- [REQUIRED]
        
        The type of inline content to define.
      - byteContent (dict) --
        
        Contains information about content defined inline in bytes.
        
        mimeType (string) -- [REQUIRED]
        
        The MIME type of the content. For a list of MIME types, see Media Types. The following MIME types are supported:
        
        text/plain
        
        text/html
        
        text/csv
        
        text/vtt
        
        message/rfc822
        
        application/xhtml+xml
        
        application/pdf
        
        application/msword
        
        application/vnd.ms-word.document.macroenabled.12
        
        application/vnd.ms-word.template.macroenabled.12
        
        application/vnd.ms-excel
        
        application/vnd.ms-excel.addin.macroenabled.12
        
        application/vnd.ms-excel.sheet.macroenabled.12
        
        application/vnd.ms-excel.template.macroenabled.12
        
        application/vnd.ms-excel.sheet.binary.macroenabled.12
        
        application/vnd.ms-spreadsheetml
        
        application/vnd.openxmlformats-officedocument.spreadsheetml.sheet
        
        application/vnd.openxmlformats-officedocument.spreadsheetml.template
        
        application/vnd.openxmlformats-officedocument.wordprocessingml.document
        
        application/vnd.openxmlformats-officedocument.wordprocessingml.template
        
        data (bytes) -- [REQUIRED]
        
        The base64-encoded string of the content.
      - textContent (dict) --
        
        Contains information about content defined inline in text.
        
        data (string) -- [REQUIRED]
        
        The text of the content.
  - s3 (dict) --
    
    Contains information about the content to ingest into a knowledge base connected to an Amazon S3 data source.
    - s3Location (dict) -- [REQUIRED]
      
      The S3 location of the file containing the content to ingest.
      - uri (string) -- [REQUIRED]
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
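
The request parameters above can be assembled as in the following sketch, which builds one inline text document for a custom data source and attaches the new accessControlList. The helper names build_inline_text_document and ingest_documents are hypothetical, and the client argument is assumed to be a boto3 client for this service. Whether the service accepts an accessControlList without inlineAttributes is not stated in this reference, so treat that combination as an assumption.

```python
# Hypothetical helpers; field names follow the request syntax above.
def build_inline_text_document(doc_id, text, allowed_users=(), denied_users=()):
    """Build one entry for the 'documents' list with inline text content."""
    # Each ACL entry names a principal, its type, and ALLOW or DENY.
    acl = [{"name": u, "type": "USER", "access": "ALLOW"} for u in allowed_users]
    acl += [{"name": u, "type": "USER", "access": "DENY"} for u in denied_users]

    document = {
        "content": {
            "dataSourceType": "CUSTOM",
            "custom": {
                "customDocumentIdentifier": {"id": doc_id},
                "sourceType": "IN_LINE",
                "inlineContent": {
                    "type": "TEXT",
                    "textContent": {"data": text},
                },
            },
        }
    }
    if acl:
        # The ACL is documented as used with the IN_LINE_ATTRIBUTE metadata type.
        document["metadata"] = {
            "type": "IN_LINE_ATTRIBUTE",
            "accessControlList": acl,
        }
    return document


def ingest_documents(client, knowledge_base_id, data_source_id, documents):
    """Send the documents; clientToken is omitted so it is autopopulated."""
    return client.ingest_knowledge_base_documents(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        documents=documents,
    )
```

Keeping the document construction separate from the API call makes the payload easy to inspect or unit test before anything is sent.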

rtype:

dict

returns:

Response Syntax

{
    'documentDetails': [
        {
            'knowledgeBaseId': 'string',
            'dataSourceId': 'string',
            'status': 'INDEXED'|'PARTIALLY_INDEXED'|'PENDING'|'FAILED'|'METADATA_PARTIALLY_INDEXED'|'METADATA_UPDATE_FAILED'|'IGNORED'|'NOT_FOUND'|'STARTING'|'IN_PROGRESS'|'DELETING'|'DELETE_IN_PROGRESS',
            'identifier': {
                'dataSourceType': 'CUSTOM'|'S3',
                's3': {
                    'uri': 'string'
                },
                'custom': {
                    'id': 'string'
                }
            },
            'statusReason': 'string',
            'updatedAt': datetime(2015, 1, 1)
        },
    ]
}

Response Structure

(dict) --
- documentDetails (list) --
  
  A list of objects, each of which contains information about the documents that were ingested.
  - (dict) --
    
    Contains the details for a document that was ingested or deleted.
    - knowledgeBaseId (string) --
      
      The identifier of the knowledge base that the document was ingested into or deleted from.
    - dataSourceId (string) --
      
      The identifier of the data source connected to the knowledge base that the document was ingested into or deleted from.
    - status (string) --
      
      The ingestion status of the document. The following statuses are possible:
      - STARTING – You submitted the ingestion job containing the document.
      - PENDING – The document is waiting to be ingested.
      - IN_PROGRESS – The document is being ingested.
      - INDEXED – The document was successfully indexed.
      - PARTIALLY_INDEXED – The document was partially indexed.
      - METADATA_PARTIALLY_INDEXED – You submitted metadata for an existing document and it was partially indexed.
      - METADATA_UPDATE_FAILED – You submitted a metadata update for an existing document but it failed.
      - FAILED – The document failed to be ingested.
      - NOT_FOUND – The document wasn't found.
      - IGNORED – The document was ignored during ingestion.
      - DELETING – You submitted the delete job containing the document.
      - DELETE_IN_PROGRESS – The document is being deleted.
    - identifier (dict) --
      
      Contains information that identifies the document.
      - dataSourceType (string) --
        
        The type of data source connected to the knowledge base that contains the document.
      - s3 (dict) --
        
        Contains information that identifies the document in an S3 data source.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
      - custom (dict) --
        
        Contains information that identifies the document in a custom data source.
        
        id (string) --
        
        The identifier of the document to ingest into a custom data source.
    - statusReason (string) --
      
      The reason for the status. Appears alongside the status IGNORED.
    - updatedAt (datetime) --
      
      The date and time at which the document was last updated.
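
The documentDetails list described above can be summarized per status with a short helper. This is a sketch only: the function names are hypothetical, and treating STARTING, PENDING, IN_PROGRESS, DELETING, and DELETE_IN_PROGRESS as "still processing" is an assumption drawn from the status descriptions, not a documented guarantee.

```python
from collections import defaultdict

# Assumption: these statuses mean work is still under way for the document.
PROCESSING_STATUSES = {
    "STARTING", "PENDING", "IN_PROGRESS", "DELETING", "DELETE_IN_PROGRESS",
}


def document_key(detail):
    """Return the S3 URI or custom ID that identifies the document."""
    identifier = detail["identifier"]
    if identifier["dataSourceType"] == "S3":
        return identifier["s3"]["uri"]
    return identifier["custom"]["id"]


def group_by_status(document_details):
    """Map each status to the list of document keys reporting it."""
    groups = defaultdict(list)
    for detail in document_details:
        groups[detail["status"]].append(document_key(detail))
    return dict(groups)


def still_processing(document_details):
    """True if any document is in one of the assumed in-progress statuses."""
    return any(d["status"] in PROCESSING_STATUSES for d in document_details)
```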

ListDataSources (updated)

Changes (response)

{'dataSourceSummaries': {'status': {'CREATING', 'FAILED', 'UPDATING'}}}

Lists the data sources in a knowledge base and information about each one.

See also: AWS API Documentation

Request Syntax

client.list_data_sources(
    knowledgeBaseId='string',
    maxResults=123,
    nextToken='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for which to return a list of data sources.

type maxResults:

integer

param maxResults:

The maximum number of results to return in the response. If the total number of results is greater than this value, use the token returned in the response in the nextToken field when making another request to return the next batch of results.

type nextToken:

string

param nextToken:

If the total number of results is greater than the maxResults value provided in the request, enter the token returned in the nextToken field in the response in this field to return the next batch of results.

rtype:

dict

returns:

Response Syntax

{
    'dataSourceSummaries': [
        {
            'knowledgeBaseId': 'string',
            'dataSourceId': 'string',
            'name': 'string',
            'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL'|'CREATING'|'UPDATING'|'FAILED',
            'description': 'string',
            'updatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

(dict) --
- dataSourceSummaries (list) --
  
  A list of objects, each of which contains information about a data source.
  - (dict) --
    
    Contains details about a data source.
    - knowledgeBaseId (string) --
      
      The unique identifier of the knowledge base to which the data source belongs.
    - dataSourceId (string) --
      
      The unique identifier of the data source.
    - name (string) --
      
      The name of the data source.
    - status (string) --
      
      The status of the data source.
    - description (string) --
      
      The description of the data source.
    - updatedAt (datetime) --
      
      The time at which the data source was last updated.
- nextToken (string) --
  
  If the total number of results is greater than the maxResults value provided in the request, use this token when making another request in the nextToken field to return the next batch of results.
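
The maxResults and nextToken fields described above imply a simple pagination loop. The sketch below is illustrative only: the helper `iter_data_sources`, the page size of 25, and the `FakeClient` stand-in (used so the example runs without AWS credentials) are assumptions, not part of the API.

```python
def iter_data_sources(client, knowledge_base_id, page_size=25):
    """Yield every data source summary, following nextToken until it is absent."""
    kwargs = {"knowledgeBaseId": knowledge_base_id, "maxResults": page_size}
    while True:
        page = client.list_data_sources(**kwargs)
        yield from page.get("dataSourceSummaries", [])
        token = page.get("nextToken")
        if not token:
            break
        # Pass the returned token back to request the next batch of results.
        kwargs["nextToken"] = token


class FakeClient:
    """Offline stand-in for the real client; the pages below are hypothetical."""

    def __init__(self):
        self.pages = {
            None: {
                "dataSourceSummaries": [{"dataSourceId": "ds-1", "status": "CREATING"}],
                "nextToken": "t1",
            },
            "t1": {
                "dataSourceSummaries": [{"dataSourceId": "ds-2", "status": "AVAILABLE"}],
            },
        }

    def list_data_sources(self, knowledgeBaseId, maxResults, nextToken=None):
        return self.pages[nextToken]


summaries = list(iter_data_sources(FakeClient(), "kb-example"))
print([s["dataSourceId"] for s in summaries])
```

With a real client, the same helper would presumably be called as `iter_data_sources(boto3.client("bedrock-agent"), knowledge_base_id)`.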

ListIngestionJobs (updated)

Changes (response)

{'ingestionJobSummaries': {'statistics': {'numberOfDocumentsSkipped': 'long'}}}

Lists the data ingestion jobs for a data source. The list also includes information about each job.

See also: AWS API Documentation

Request Syntax

client.list_ingestion_jobs(
    knowledgeBaseId='string',
    dataSourceId='string',
    filters=[
        {
            'attribute': 'STATUS',
            'operator': 'EQ',
            'values': [
                'string',
            ]
        },
    ],
    sortBy={
        'attribute': 'STATUS'|'STARTED_AT',
        'order': 'ASCENDING'|'DESCENDING'
    },
    maxResults=123,
    nextToken='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the list of data ingestion jobs.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source for the list of data ingestion jobs.

type filters:

list

param filters:

Contains information about the filters for filtering the data.

(dict) --

The definition of a filter to filter the data.
- attribute (string) -- [REQUIRED]
  
  The name of the field or attribute to apply the filter to.
- operator (string) -- [REQUIRED]
  
  The operation to apply to the field or attribute.
- values (list) -- [REQUIRED]
  
  A list of values that belong to the field or attribute.
  - (string) --

type sortBy:

dict

param sortBy:

Contains details about how to sort the data.

- attribute (string) -- [REQUIRED]
  
  The name of the field or attribute to sort the data by.
- order (string) -- [REQUIRED]
  
  The order for sorting the data.

type maxResults:

integer

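param maxResults:

The maximum number of results to return in the response. If the total number of results is greater than this value, use the token returned in the response in the nextToken field when making another request to return the next batch of results.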

type nextToken:

string

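param nextToken:

If the total number of results is greater than the maxResults value provided in the request, enter the token returned in the nextToken field in the response in this field to return the next batch of results.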

rtype:

dict

returns:

Response Syntax

{
    'ingestionJobSummaries': [
        {
            'knowledgeBaseId': 'string',
            'dataSourceId': 'string',
            'ingestionJobId': 'string',
            'description': 'string',
            'status': 'STARTING'|'IN_PROGRESS'|'COMPLETE'|'FAILED'|'STOPPING'|'STOPPED',
            'startedAt': datetime(2015, 1, 1),
            'updatedAt': datetime(2015, 1, 1),
            'statistics': {
                'numberOfDocumentsScanned': 123,
                'numberOfMetadataDocumentsScanned': 123,
                'numberOfNewDocumentsIndexed': 123,
                'numberOfModifiedDocumentsIndexed': 123,
                'numberOfMetadataDocumentsModified': 123,
                'numberOfDocumentsDeleted': 123,
                'numberOfDocumentsFailed': 123,
                'numberOfDocumentsSkipped': 123
            }
        },
    ],
    'nextToken': 'string'
}

Response Structure

(dict) --
- ingestionJobSummaries (list) --
  
  A list of data ingestion jobs with information about each job.
  - (dict) --
    
    Contains details about a data ingestion job.
    - knowledgeBaseId (string) --
      
      The unique identifier of the knowledge base for the data ingestion job.
    - dataSourceId (string) --
      
      The unique identifier of the data source for the data ingestion job.
    - ingestionJobId (string) --
      
      The unique identifier of the data ingestion job.
    - description (string) --
      
      The description of the data ingestion job.
    - status (string) --
      
      The status of the data ingestion job.
    - startedAt (datetime) --
      
      The time the data ingestion job started.
    - updatedAt (datetime) --
      
      The time the data ingestion job was last updated.
    - statistics (dict) --
      
      Contains statistics for the data ingestion job.
      - numberOfDocumentsScanned (integer) --
        
        The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
      - numberOfMetadataDocumentsScanned (integer) --
        
        The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
      - numberOfNewDocumentsIndexed (integer) --
        
        The number of new source documents in the data source that were successfully indexed.
      - numberOfModifiedDocumentsIndexed (integer) --
        
        The number of modified source documents in the data source that were successfully indexed.
      - numberOfMetadataDocumentsModified (integer) --
        
        The number of metadata files that were updated or deleted.
      - numberOfDocumentsDeleted (integer) --
        
        The number of source documents that were deleted.
      - numberOfDocumentsFailed (integer) --
        
        The number of source documents that failed to be ingested.
      - numberOfDocumentsSkipped (integer) --
        
        The number of source documents that were skipped during ingestion.
- nextToken (string) --
  
  If the total number of results is greater than the maxResults value provided in the request, use this token when making another request in the nextToken field to return the next batch of results.
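
As a rough illustration of the filters and sortBy parameters and of the new numberOfDocumentsSkipped statistic, the sketch below builds a request for failed jobs and totals the skipped documents across job summaries. The helper names and sample data are hypothetical; only the field names and enum values come from the syntax above.

```python
def failed_jobs_request(knowledge_base_id, data_source_id):
    """Build keyword arguments for list_ingestion_jobs: FAILED jobs, newest first."""
    return {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
        "filters": [
            {"attribute": "STATUS", "operator": "EQ", "values": ["FAILED"]},
        ],
        "sortBy": {"attribute": "STARTED_AT", "order": "DESCENDING"},
    }


def total_skipped(job_summaries):
    """Sum numberOfDocumentsSkipped, treating a missing statistic as zero."""
    return sum(
        job.get("statistics", {}).get("numberOfDocumentsSkipped", 0)
        for job in job_summaries
    )


request = failed_jobs_request("kb-example", "ds-example")

# Hypothetical summaries shaped like the response syntax above.
sample_summaries = [
    {"ingestionJobId": "job-1", "statistics": {"numberOfDocumentsSkipped": 3}},
    {"ingestionJobId": "job-2", "statistics": {"numberOfDocumentsFailed": 1}},
    {"ingestionJobId": "job-3"},
]
print(total_skipped(sample_summaries))
```

The request dict would be passed as `client.list_ingestion_jobs(**request)`. Defaulting a missing statistic to zero is a defensive assumption, since the source does not say whether the field is always present.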

ListKnowledgeBases (updated)

Changes (response)

{'knowledgeBaseSummaries': {'status': {'UPDATE_UNSUCCESSFUL'}}}

Lists the knowledge bases in an account. The list also includes information about each knowledge base.

See also: AWS API Documentation

Request Syntax

client.list_knowledge_bases(
    maxResults=123,
    nextToken='string'
)

type maxResults:

integer

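param maxResults:

The maximum number of results to return in the response. If the total number of results is greater than this value, use the token returned in the response in the nextToken field when making another request to return the next batch of results.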

type nextToken:

string

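param nextToken:

If the total number of results is greater than the maxResults value provided in the request, enter the token returned in the nextToken field in the response in this field to return the next batch of results.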

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBaseSummaries': [
        {
            'knowledgeBaseId': 'string',
            'name': 'string',
            'description': 'string',
            'status': 'CREATING'|'ACTIVE'|'DELETING'|'UPDATING'|'FAILED'|'DELETE_UNSUCCESSFUL'|'UPDATE_UNSUCCESSFUL',
            'updatedAt': datetime(2015, 1, 1)
        },
    ],
    'nextToken': 'string'
}

Response Structure

(dict) --
- knowledgeBaseSummaries (list) --
  
  A list of knowledge bases with information about each knowledge base.
  - (dict) --
    
    Contains details about a knowledge base.
    - knowledgeBaseId (string) --
      
      The unique identifier of the knowledge base.
    - name (string) --
      
      The name of the knowledge base.
    - description (string) --
      
      The description of the knowledge base.
    - status (string) --
      
      The status of the knowledge base.
    - updatedAt (datetime) --
      
      The time the knowledge base was last updated.
- nextToken (string) --
  
  If the total number of results is greater than the maxResults value provided in the request, use this token when making another request in the nextToken field to return the next batch of results.
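
With UPDATE_UNSUCCESSFUL added to the status enum, a caller may want to flag knowledge bases that ended in an unsuccessful state. The sketch below is one possible approach: grouping FAILED, DELETE_UNSUCCESSFUL, and UPDATE_UNSUCCESSFUL together is an assumption made for illustration, and the helper name and sample data are hypothetical.

```python
# Assumption: these three statuses are treated as "needs attention".
UNSUCCESSFUL_STATUSES = {"FAILED", "DELETE_UNSUCCESSFUL", "UPDATE_UNSUCCESSFUL"}


def knowledge_bases_needing_attention(summaries):
    """Return the IDs of knowledge bases whose status is in the unsuccessful set."""
    return [
        kb["knowledgeBaseId"]
        for kb in summaries
        if kb["status"] in UNSUCCESSFUL_STATUSES
    ]


# Hypothetical summaries shaped like the response syntax above.
sample_kbs = [
    {"knowledgeBaseId": "kb-1", "name": "docs", "status": "ACTIVE"},
    {"knowledgeBaseId": "kb-2", "name": "wiki", "status": "UPDATE_UNSUCCESSFUL"},
    {"knowledgeBaseId": "kb-3", "name": "faq", "status": "UPDATING"},
]
print(knowledge_bases_needing_attention(sample_kbs))
```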

StartIngestionJob (updated)

Changes (response)

{'ingestionJob': {'statistics': {'numberOfDocumentsSkipped': 'long'}}}

Begins a data ingestion job. Data sources are ingested into your knowledge base so that Large Language Models (LLMs) can use your data.

See also: AWS API Documentation

Request Syntax

client.start_ingestion_job(
    knowledgeBaseId='string',
    dataSourceId='string',
    clientToken='string',
    description='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the data ingestion job.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source you want to ingest into your knowledge base.

type clientToken:

string

param clientToken:

This field is autopopulated if not provided.

type description:

string

param description:

A description of the data ingestion job.

rtype:

dict

returns:

Response Syntax

{
    'ingestionJob': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'ingestionJobId': 'string',
        'description': 'string',
        'status': 'STARTING'|'IN_PROGRESS'|'COMPLETE'|'FAILED'|'STOPPING'|'STOPPED',
        'statistics': {
            'numberOfDocumentsScanned': 123,
            'numberOfMetadataDocumentsScanned': 123,
            'numberOfNewDocumentsIndexed': 123,
            'numberOfModifiedDocumentsIndexed': 123,
            'numberOfMetadataDocumentsModified': 123,
            'numberOfDocumentsDeleted': 123,
            'numberOfDocumentsFailed': 123,
            'numberOfDocumentsSkipped': 123
        },
        'failureReasons': [
            'string',
        ],
        'startedAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1)
    }
}

Response Structure

(dict) --
- ingestionJob (dict) --
  
  Contains information about the data ingestion job.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base for the data ingestion job.
  - dataSourceId (string) --
    
    The unique identifier of the data source for the data ingestion job.
  - ingestionJobId (string) --
    
    The unique identifier of the data ingestion job.
  - description (string) --
    
    The description of the data ingestion job.
  - status (string) --
    
    The status of the data ingestion job.
  - statistics (dict) --
    
    Contains statistics about the data ingestion job.
    - numberOfDocumentsScanned (integer) --
      
      The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
    - numberOfMetadataDocumentsScanned (integer) --
      
      The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
    - numberOfNewDocumentsIndexed (integer) --
      
      The number of new source documents in the data source that were successfully indexed.
    - numberOfModifiedDocumentsIndexed (integer) --
      
      The number of modified source documents in the data source that were successfully indexed.
    - numberOfMetadataDocumentsModified (integer) --
      
      The number of metadata files that were updated or deleted.
    - numberOfDocumentsDeleted (integer) --
      
      The number of source documents that were deleted.
    - numberOfDocumentsFailed (integer) --
      
      The number of source documents that failed to be ingested.
    - numberOfDocumentsSkipped (integer) --
      
      The number of source documents that were skipped during ingestion.
  - failureReasons (list) --
    
    A list of reasons that the data ingestion job failed.
    - (string) --
  - startedAt (datetime) --
    
    The time the data ingestion job started.
    
    If you stop a data ingestion job, the startedAt time is the time the job was started before the job was stopped.
  - updatedAt (datetime) --
    
    The time the data ingestion job was last updated.
    
    If you stop a data ingestion job, the updatedAt time is the time the job was stopped.
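
A common pattern is to start a job and poll until it reaches a final status. The following is a minimal sketch under stated assumptions: it relies on the GetIngestionJob operation (`get_ingestion_job`), which is not shown in this section, it treats COMPLETE, FAILED, and STOPPED as terminal, and it uses a hypothetical `FakeClient` so the example runs offline.

```python
import time

# Assumption: these statuses mean the job will not change any further.
TERMINAL_STATUSES = {"COMPLETE", "FAILED", "STOPPED"}


def run_ingestion_job(client, knowledge_base_id, data_source_id,
                      poll_seconds=5.0, sleep=time.sleep):
    """Start an ingestion job and poll until it reaches a terminal status."""
    job = client.start_ingestion_job(
        knowledgeBaseId=knowledge_base_id,
        dataSourceId=data_source_id,
        description="Started by an example script",
    )["ingestionJob"]
    while job["status"] not in TERMINAL_STATUSES:
        sleep(poll_seconds)
        job = client.get_ingestion_job(
            knowledgeBaseId=knowledge_base_id,
            dataSourceId=data_source_id,
            ingestionJobId=job["ingestionJobId"],
        )["ingestionJob"]
    return job


class FakeClient:
    """Offline stand-in that walks through a hypothetical status sequence."""

    def __init__(self):
        self._statuses = iter(["IN_PROGRESS", "COMPLETE"])

    def start_ingestion_job(self, **kwargs):
        return {"ingestionJob": {"ingestionJobId": "job-1", "status": "STARTING"}}

    def get_ingestion_job(self, **kwargs):
        return {
            "ingestionJob": {
                "ingestionJobId": "job-1",
                "status": next(self._statuses),
                "statistics": {"numberOfDocumentsSkipped": 2},
            }
        }


final_job = run_ingestion_job(
    FakeClient(), "kb-example", "ds-example", sleep=lambda seconds: None
)
print(final_job["status"], final_job["statistics"]["numberOfDocumentsSkipped"])
```

Injecting `sleep` keeps the loop testable; a production version would likely also add a timeout.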

StopIngestionJob (updated)

Changes (response)

{'ingestionJob': {'statistics': {'numberOfDocumentsSkipped': 'long'}}}

Stops a currently running data ingestion job. You can send a StartIngestionJob request again to ingest the rest of your data when you are ready.

See also: AWS API Documentation

Request Syntax

client.stop_ingestion_job(
    knowledgeBaseId='string',
    dataSourceId='string',
    ingestionJobId='string'
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the data ingestion job you want to stop.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source for the data ingestion job you want to stop.

type ingestionJobId:

string

param ingestionJobId:

[REQUIRED]

The unique identifier of the data ingestion job you want to stop.

rtype:

dict

returns:

Response Syntax

{
    'ingestionJob': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'ingestionJobId': 'string',
        'description': 'string',
        'status': 'STARTING'|'IN_PROGRESS'|'COMPLETE'|'FAILED'|'STOPPING'|'STOPPED',
        'statistics': {
            'numberOfDocumentsScanned': 123,
            'numberOfMetadataDocumentsScanned': 123,
            'numberOfNewDocumentsIndexed': 123,
            'numberOfModifiedDocumentsIndexed': 123,
            'numberOfMetadataDocumentsModified': 123,
            'numberOfDocumentsDeleted': 123,
            'numberOfDocumentsFailed': 123,
            'numberOfDocumentsSkipped': 123
        },
        'failureReasons': [
            'string',
        ],
        'startedAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1)
    }
}

Response Structure

(dict) --
- ingestionJob (dict) --
  
  Contains information about the stopped data ingestion job.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base for the data ingestion job.
  - dataSourceId (string) --
    
    The unique identifier of the data source for the data ingestion job.
  - ingestionJobId (string) --
    
    The unique identifier of the data ingestion job.
  - description (string) --
    
    The description of the data ingestion job.
  - status (string) --
    
    The status of the data ingestion job.
  - statistics (dict) --
    
    Contains statistics about the data ingestion job.
    - numberOfDocumentsScanned (integer) --
      
      The total number of source documents that were scanned. Includes new, updated, and unchanged documents.
    - numberOfMetadataDocumentsScanned (integer) --
      
      The total number of metadata files that were scanned. Includes new, updated, and unchanged files.
    - numberOfNewDocumentsIndexed (integer) --
      
      The number of new source documents in the data source that were successfully indexed.
    - numberOfModifiedDocumentsIndexed (integer) --
      
      The number of modified source documents in the data source that were successfully indexed.
    - numberOfMetadataDocumentsModified (integer) --
      
      The number of metadata files that were updated or deleted.
    - numberOfDocumentsDeleted (integer) --
      
      The number of source documents that were deleted.
    - numberOfDocumentsFailed (integer) --
      
      The number of source documents that failed to be ingested.
    - numberOfDocumentsSkipped (integer) --
      
      The number of source documents that were skipped during ingestion.
  - failureReasons (list) --
    
    A list of reasons that the data ingestion job failed.
    - (string) --
  - startedAt (datetime) --
    
    The time the data ingestion job started.
    
    If you stop a data ingestion job, the startedAt time is the time the job was started before the job was stopped.
  - updatedAt (datetime) --
    
    The time the data ingestion job was last updated.
    
    If you stop a data ingestion job, the updatedAt time is the time the job was stopped.
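
Because updatedAt is the time the job was stopped and startedAt is the time it was started, their difference gives how long a stopped job ran. The sketch below assumes the ingestionJob dict comes from a stop_ingestion_job response whose job has already reached STOPPED; the helper name and sample timestamps are hypothetical.

```python
from datetime import datetime, timezone


def stopped_job_runtime(ingestion_job):
    """Return how long a stopped job ran, as updatedAt minus startedAt."""
    return ingestion_job["updatedAt"] - ingestion_job["startedAt"]


# Hypothetical job; a real one would come from
# client.stop_ingestion_job(...)["ingestionJob"].
sample_job = {
    "ingestionJobId": "job-1",
    "status": "STOPPED",
    "startedAt": datetime(2026, 6, 17, 12, 0, 0, tzinfo=timezone.utc),
    "updatedAt": datetime(2026, 6, 17, 12, 1, 30, tzinfo=timezone.utc),
}

runtime = stopped_job_runtime(sample_job)
print(runtime.total_seconds())
```

If the returned status is still STOPPING, updatedAt may not yet reflect the final stop time, so the value should be read as provisional in that case.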

UpdateDataSource (updated)

Changes (request, response)
Request

{'dataSourceConfiguration': {'managedKnowledgeBaseConnectorConfiguration': {'connectorParameters': {},
                                                                            'deletionProtectionConfiguration': {'deletionProtectionStatus': 'ENABLED '
                                                                                                                                            '| '
                                                                                                                                            'DISABLED',
                                                                                                                'deletionProtectionThreshold': 'integer'},
                                                                            'mediaExtractionConfiguration': {'audioExtractionConfiguration': {'audioExtractionStatus': 'ENABLED '
                                                                                                                                                                       '| '
                                                                                                                                                                       'DISABLED'},
                                                                                                             'imageExtractionConfiguration': {'imageExtractionStatus': 'ENABLED '
                                                                                                                                                                       '| '
                                                                                                                                                                       'DISABLED'},
                                                                                                             'videoExtractionConfiguration': {'videoExtractionStatus': 'ENABLED '
                                                                                                                                                                       '| '
                                                                                                                                                                       'DISABLED'}}},
                             'type': {'MANAGED_KNOWLEDGE_BASE_CONNECTOR'}},
 'vectorIngestionConfiguration': {'parsingConfiguration': {'parsingStrategy': {'SMART_PARSING'}}}}

Response

{'dataSource': {'dataSourceConfiguration': {'managedKnowledgeBaseConnectorConfiguration': {'connectorParameters': {},
                                                                                           'deletionProtectionConfiguration': {'deletionProtectionStatus': 'ENABLED '
                                                                                                                                                           '| '
                                                                                                                                                           'DISABLED',
                                                                                                                               'deletionProtectionThreshold': 'integer'},
                                                                                           'mediaExtractionConfiguration': {'audioExtractionConfiguration': {'audioExtractionStatus': 'ENABLED '
                                                                                                                                                                                      '| '
                                                                                                                                                                                      'DISABLED'},
                                                                                                                            'imageExtractionConfiguration': {'imageExtractionStatus': 'ENABLED '
                                                                                                                                                                                      '| '
                                                                                                                                                                                      'DISABLED'},
                                                                                                                            'videoExtractionConfiguration': {'videoExtractionStatus': 'ENABLED '
                                                                                                                                                                                      '| '
                                                                                                                                                                                      'DISABLED'}}},
                                            'type': {'MANAGED_KNOWLEDGE_BASE_CONNECTOR'}},
                'status': {'CREATING', 'FAILED', 'UPDATING'},
                'vectorIngestionConfiguration': {'parsingConfiguration': {'parsingStrategy': {'SMART_PARSING'}}}}}

Updates the configurations for a data source connector.

See also: AWS API Documentation

Request Syntax

client.update_data_source(
    knowledgeBaseId='string',
    dataSourceId='string',
    name='string',
    description='string',
    dataSourceConfiguration={
        'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA'|'MANAGED_KNOWLEDGE_BASE_CONNECTOR',
        'managedKnowledgeBaseConnectorConfiguration': {
            'deletionProtectionConfiguration': {
                'deletionProtectionStatus': 'ENABLED'|'DISABLED',
                'deletionProtectionThreshold': 123
            },
            'mediaExtractionConfiguration': {
                'imageExtractionConfiguration': {
                    'imageExtractionStatus': 'ENABLED'|'DISABLED'
                },
                'audioExtractionConfiguration': {
                    'audioExtractionStatus': 'ENABLED'|'DISABLED'
                },
                'videoExtractionConfiguration': {
                    'videoExtractionStatus': 'ENABLED'|'DISABLED'
                }
            },
            'connectorParameters': {...}|[...]|123|123.4|'string'|True|None
        },
        's3Configuration': {
            'bucketArn': 'string',
            'inclusionPrefixes': [
                'string',
            ],
            'bucketOwnerAccountId': 'string'
        },
        'webConfiguration': {
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {
                            'url': 'string'
                        },
                    ]
                }
            },
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'rateLimit': 123,
                    'maxPages': 123
                },
                'inclusionFilters': [
                    'string',
                ],
                'exclusionFilters': [
                    'string',
                ],
                'scope': 'HOST_ONLY'|'SUBDOMAINS',
                'userAgent': 'string',
                'userAgentHeader': 'string'
            }
        },
        'confluenceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'string',
                'hostType': 'SAAS',
                'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        },
        'salesforceConfiguration': {
            'sourceConfiguration': {
                'hostUrl': 'string',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        },
        'sharePointConfiguration': {
            'sourceConfiguration': {
                'tenantId': 'string',
                'domain': 'string',
                'siteUrls': [
                    'string',
                ],
                'hostType': 'ONLINE',
                'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string'
            },
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'type': 'PATTERN',
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'objectType': 'string',
                                'inclusionFilters': [
                                    'string',
                                ],
                                'exclusionFilters': [
                                    'string',
                                ]
                            },
                        ]
                    }
                }
            }
        }
    },
    dataDeletionPolicy='RETAIN'|'DELETE',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            },
            'hierarchicalChunkingConfiguration': {
                'levelConfigurations': [
                    {
                        'maxTokens': 123
                    },
                ],
                'overlapTokens': 123
            },
            'semanticChunkingConfiguration': {
                'maxTokens': 123,
                'bufferSize': 123,
                'breakpointPercentileThreshold': 123
            }
        },
        'customTransformationConfiguration': {
            'intermediateStorage': {
                's3Location': {
                    'uri': 'string'
                }
            },
            'transformations': [
                {
                    'transformationFunction': {
                        'transformationLambdaConfiguration': {
                            'lambdaArn': 'string'
                        }
                    },
                    'stepToApply': 'POST_CHUNKING'
                },
            ]
        },
        'parsingConfiguration': {
            'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'|'SMART_PARSING',
            'bedrockFoundationModelConfiguration': {
                'modelArn': 'string',
                'parsingPrompt': {
                    'parsingPromptText': 'string'
                },
                'parsingModality': 'MULTIMODAL'
            },
            'bedrockDataAutomationConfiguration': {
                'parsingModality': 'MULTIMODAL'
            }
        },
        'contextEnrichmentConfiguration': {
            'type': 'BEDROCK_FOUNDATION_MODEL',
            'bedrockFoundationModelConfiguration': {
                'enrichmentStrategyConfiguration': {
                    'method': 'CHUNK_ENTITY_EXTRACTION'
                },
                'modelArn': 'string'
            }
        }
    }
)
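
As an illustration of the new managed connector fields in the request syntax above, the sketch below builds a dataSourceConfiguration value for the MANAGED_KNOWLEDGE_BASE_CONNECTOR type. The helper name and its arguments are hypothetical, and the 0 to 100 range check is an assumption based on the threshold being a percentage.

```python
def managed_connector_configuration(threshold_percent=None, images=False,
                                    audio=False, video=False,
                                    connector_parameters=None):
    """Build a dataSourceConfiguration dict for a managed knowledge base connector."""

    def status(enabled):
        return "ENABLED" if enabled else "DISABLED"

    config = {
        "mediaExtractionConfiguration": {
            "imageExtractionConfiguration": {"imageExtractionStatus": status(images)},
            "audioExtractionConfiguration": {"audioExtractionStatus": status(audio)},
            "videoExtractionConfiguration": {"videoExtractionStatus": status(video)},
        }
    }
    if threshold_percent is None:
        config["deletionProtectionConfiguration"] = {
            "deletionProtectionStatus": "DISABLED"
        }
    else:
        # Assumption: a percentage threshold should fall between 0 and 100.
        if not 0 <= threshold_percent <= 100:
            raise ValueError("threshold_percent must be between 0 and 100")
        config["deletionProtectionConfiguration"] = {
            "deletionProtectionStatus": "ENABLED",
            "deletionProtectionThreshold": threshold_percent,
        }
    if connector_parameters is not None:
        config["connectorParameters"] = connector_parameters
    return {
        "type": "MANAGED_KNOWLEDGE_BASE_CONNECTOR",
        "managedKnowledgeBaseConnectorConfiguration": config,
    }


example_configuration = managed_connector_configuration(
    threshold_percent=20, images=True
)
print(example_configuration["type"])
```

The result would be passed as the `dataSourceConfiguration` argument of `client.update_data_source`, together with the required `knowledgeBaseId`, `dataSourceId`, and `name`.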

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base for the data source.

type dataSourceId:

string

param dataSourceId:

[REQUIRED]

The unique identifier of the data source.

type name:

string

param name:

[REQUIRED]

Specifies a new name for the data source.

type description:

string

param description:

Specifies a new description for the data source.

type dataSourceConfiguration:

dict

param dataSourceConfiguration:

[REQUIRED]

The connection configuration for the data source that you want to update.

type (string) -- [REQUIRED]

The type of data source.
managedKnowledgeBaseConnectorConfiguration (dict) --

Contains the configuration for a data source that connects a managed knowledge base to a supported data source connector. Specify this object when the data source type is MANAGED_KNOWLEDGE_BASE_CONNECTOR.
- deletionProtectionConfiguration (dict) --
  
  A safeguard against accidental bulk deletion of indexed content.
  - deletionProtectionStatus (string) -- [REQUIRED]
    
    Enable or disable deletion protection for the connector.
  - deletionProtectionThreshold (integer) --
    
    The threshold is the maximum percentage of documents that a sync job can delete from your index. If a sync would delete more than this percentage, the sync skips its delete phase, leaving your indexed documents in place. Not supported for the Custom connector.
- mediaExtractionConfiguration (dict) --
  
  Configuration for extracting media (images, audio, video) from data source files.
  - imageExtractionConfiguration (dict) --
    
    Configuration for image extraction.
    - imageExtractionStatus (string) -- [REQUIRED]
      
      Whether image extraction is enabled or disabled.
  - audioExtractionConfiguration (dict) --
    
    Configuration for audio extraction.
    - audioExtractionStatus (string) -- [REQUIRED]
      
      Whether audio extraction is enabled or disabled.
  - videoExtractionConfiguration (dict) --
    
    Configuration for video extraction.
    - videoExtractionStatus (string) -- [REQUIRED]
      
      Whether video extraction is enabled or disabled.
- connectorParameters (:ref:`document<document>`) --
  
  Connector-specific parameters. For more information, see Connect a data source.
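The deletion protection threshold described above can be read as a percentage check. The following is a minimal sketch, assuming the percentage is computed as documents to delete divided by documents currently indexed; the helper name `sync_skips_delete_phase` and the sample values are hypothetical and not part of the API.

```python
def sync_skips_delete_phase(indexed_docs: int, docs_to_delete: int, threshold_percent: int) -> bool:
    """Return True if a sync would skip its delete phase.

    Assumption: the service compares docs_to_delete / indexed_docs * 100
    against the threshold. "More than this percentage" is a strict comparison.
    """
    if indexed_docs <= 0:
        return False
    # Integer arithmetic avoids floating-point rounding at the boundary.
    return docs_to_delete * 100 > threshold_percent * indexed_docs


# Illustrative request fragment; the threshold value is hypothetical.
managed_connector_configuration = {
    "deletionProtectionConfiguration": {
        "deletionProtectionStatus": "ENABLED",
        "deletionProtectionThreshold": 20,
    },
    "mediaExtractionConfiguration": {
        "imageExtractionConfiguration": {"imageExtractionStatus": "ENABLED"},
        "audioExtractionConfiguration": {"audioExtractionStatus": "DISABLED"},
        "videoExtractionConfiguration": {"videoExtractionStatus": "DISABLED"},
    },
}
```

With a threshold of 20, a sync that would remove 250 of 1,000 indexed documents skips its delete phase under this reading, while one that would remove exactly 200 does not.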
s3Configuration (dict) --

The configuration information to connect to Amazon S3 as your data source for self-managed knowledge bases. To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration.
- bucketArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
- inclusionPrefixes (list) --
  
  A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
  - (string) --
- bucketOwnerAccountId (string) --
  
  The account ID for the owner of the S3 bucket.
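As a rough illustration of how inclusionPrefixes narrows the objects that are read from the bucket, the sketch below treats each prefix as a plain string prefix of the object key. The bucket ARN, the prefix, and the helper `key_is_included` are hypothetical, and the assumption that an empty or missing list includes everything is not confirmed by this reference.

```python
def key_is_included(object_key, inclusion_prefixes):
    """Return True if the object key falls under one of the prefixes.

    Assumption: no prefixes means every object in the bucket is included.
    """
    if not inclusion_prefixes:
        return True
    return any(object_key.startswith(prefix) for prefix in inclusion_prefixes)


# Illustrative request fragment with a placeholder bucket.
s3_configuration = {
    "bucketArn": "arn:aws:s3:::amzn-s3-demo-bucket",
    "inclusionPrefixes": ["reports/2026/"],
}
```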
webConfiguration (dict) --

The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Web crawler data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The source configuration details for the web data source.
  - urlConfiguration (dict) -- [REQUIRED]
    
    The configuration of the URL/URLs.
    - seedUrls (list) --
      
      One or more seed or starting point URLs.
      - (dict) --
        
        The seed or starting point URL. You should be authorized to crawl the URL.
        
        url (string) --
        
        A seed or starting point URL.
- crawlerConfiguration (dict) --
  
  The Web Crawler configuration details for the web data source.
  - crawlerLimits (dict) --
    
    The configuration of crawl limits for the web URLs.
    - rateLimit (integer) --
      
      The max rate at which pages are crawled, up to 300 per minute per host.
    - maxPages (integer) --
      
      The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
  - inclusionFilters (list) --
    
    A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - (string) --
  - exclusionFilters (list) --
    
    A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
    - (string) --
  - scope (string) --
    
    The scope of what is crawled for your URLs.
    
    You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include sub domains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include sub domain "docs.aws.amazon.com".
  - userAgent (string) --
    
    The user agent suffix for your web crawler.
  - userAgentHeader (string) --
    
    A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of the bedrockbot, UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
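The precedence rule for inclusionFilters and exclusionFilters can be sketched as follows. This is an illustration only: it assumes patterns are applied with Python's `re.search`, and that a URL must match at least one inclusion pattern when any are given. The service's actual regular expression dialect and matching mode are not stated here. The helper `url_is_crawled` and the filter patterns are hypothetical.

```python
import re


def url_is_crawled(url, inclusion_filters, exclusion_filters):
    """Apply the documented rule: a matching exclusion filter always wins."""
    if any(re.search(pattern, url) for pattern in exclusion_filters):
        return False
    if inclusion_filters:
        return any(re.search(pattern, url) for pattern in inclusion_filters)
    return True


# Illustrative request fragment using the documented upper limits.
web_configuration = {
    "sourceConfiguration": {
        "urlConfiguration": {
            "seedUrls": [{"url": "https://docs.aws.amazon.com/bedrock/latest/userguide/"}]
        }
    },
    "crawlerConfiguration": {
        "crawlerLimits": {"rateLimit": 300, "maxPages": 25000},
        "inclusionFilters": [r"/bedrock/latest/userguide/"],
        "exclusionFilters": [r"\.pdf$"],
        "scope": "HOST_ONLY",
    },
}
```

A URL ending in `.pdf` under the user guide matches both filters, so the exclusion filter takes precedence and the page is not crawled.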
confluenceConfiguration (dict) --

The configuration information to connect to Confluence as your data source for self-managed knowledge bases.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Confluence data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your Confluence data source.
  - hostUrl (string) -- [REQUIRED]
    
    The Confluence host URL or instance URL.
  - hostType (string) -- [REQUIRED]
    
    The supported host type, whether online/cloud or server/on-premises.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your Confluence instance.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the Confluence content. For example, configuring specific types of Confluence content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
salesforceConfiguration (dict) --

The configuration information to connect to Salesforce as your data source.

Note

Salesforce data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your Salesforce data source.
  - hostUrl (string) -- [REQUIRED]
    
    The Salesforce host URL or instance URL.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your Salesforce instance.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
sharePointConfiguration (dict) --

The configuration information to connect to SharePoint as your data source for self-managed knowledge bases.

Note

To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. SharePoint data source connector for self-managed knowledge bases is in preview release and is subject to change.
- sourceConfiguration (dict) -- [REQUIRED]
  
  The endpoint information to connect to your SharePoint data source.
  - tenantId (string) --
    
    The identifier of your Microsoft 365 tenant.
  - domain (string) -- [REQUIRED]
    
    The domain of your SharePoint instance or site URL/URLs.
  - siteUrls (list) -- [REQUIRED]
    
    A list of one or more SharePoint site URLs.
    - (string) --
  - hostType (string) -- [REQUIRED]
    
    The supported host type, whether online/cloud or server/on-premises.
  - authType (string) -- [REQUIRED]
    
    The supported authentication type to authenticate and connect to your SharePoint site/sites.
  - credentialsSecretArn (string) -- [REQUIRED]
    
    The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
- crawlerConfiguration (dict) --
  
  The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
  - filterConfiguration (dict) --
    
    The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
    - type (string) -- [REQUIRED]
      
      The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
    - patternObjectFilter (dict) --
      
      The configuration of filtering certain objects or content types of the data source.
      - filters (list) -- [REQUIRED]
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) -- [REQUIRED]
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
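The Confluence, Salesforce, and SharePoint connectors share the same filterConfiguration shape. The sketch below builds that structure from the fields documented above. The helper name `pattern_filter_configuration`, the object type "Page", and the patterns are hypothetical placeholders; check the connector documentation for the object types each source actually supports.

```python
def pattern_filter_configuration(object_type, inclusion=None, exclusion=None):
    """Build a filterConfiguration dict with a single PATTERN filter entry."""
    entry = {"objectType": object_type}
    # Only emit the optional lists when the caller supplied patterns.
    if inclusion:
        entry["inclusionFilters"] = list(inclusion)
    if exclusion:
        entry["exclusionFilters"] = list(exclusion)
    return {
        "type": "PATTERN",
        "patternObjectFilter": {"filters": [entry]},
    }


# Hypothetical example: include engineering pages, exclude drafts.
filter_configuration = pattern_filter_configuration(
    "Page",
    inclusion=[r".*Engineering.*"],
    exclusion=[r".*Draft.*"],
)
crawler_configuration = {"filterConfiguration": filter_configuration}
```

When a document matches both lists, the exclusion pattern takes precedence, as described for each connector above.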

type dataDeletionPolicy:

string

param dataDeletionPolicy:

The data deletion policy for the data source that you want to update.

type serverSideEncryptionConfiguration:

dict

param serverSideEncryptionConfiguration:

Contains details about server-side encryption of the data source.

kmsKeyArn (string) --

The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.

type vectorIngestionConfiguration:

dict

param vectorIngestionConfiguration:

Contains details about how to ingest the documents in the data source.

chunkingConfiguration (dict) --

Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
- chunkingStrategy (string) -- [REQUIRED]
  
  Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
  - FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
  - HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
  - SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
  - NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
- fixedSizeChunkingConfiguration (dict) --
  
  Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
  - maxTokens (integer) -- [REQUIRED]
    
    The maximum number of tokens to include in a chunk.
  - overlapPercentage (integer) -- [REQUIRED]
    
    The percentage of overlap between adjacent chunks of a data source.
- hierarchicalChunkingConfiguration (dict) --
  
  Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
  - levelConfigurations (list) -- [REQUIRED]
    
    Token settings for each layer.
    - (dict) --
      
      Token settings for a layer in a hierarchical chunking configuration.
      - maxTokens (integer) -- [REQUIRED]
        
        The maximum number of tokens that a chunk can contain in this layer.
  - overlapTokens (integer) -- [REQUIRED]
    
    The number of tokens to repeat across chunks in the same layer.
- semanticChunkingConfiguration (dict) --
  
  Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
  - maxTokens (integer) -- [REQUIRED]
    
    The maximum number of tokens that a chunk can contain.
  - bufferSize (integer) -- [REQUIRED]
    
    The buffer size.
  - breakpointPercentileThreshold (integer) -- [REQUIRED]
    
    The dissimilarity threshold for splitting chunks.
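Each chunking strategy pairs with at most one of the nested configuration objects above. A minimal sketch of that pairing follows; the helper `chunking_configuration` and all token values are hypothetical, chosen only to show the shape of the request.

```python
# Maps each strategy to the nested field that carries its settings.
FIELD_BY_STRATEGY = {
    "FIXED_SIZE": "fixedSizeChunkingConfiguration",
    "HIERARCHICAL": "hierarchicalChunkingConfiguration",
    "SEMANTIC": "semanticChunkingConfiguration",
}


def chunking_configuration(strategy, **settings):
    """Build a chunkingConfiguration dict with only the matching sub-config."""
    config = {"chunkingStrategy": strategy}
    if strategy == "NONE":
        if settings:
            raise ValueError("NONE takes no strategy-specific settings")
        return config
    if strategy not in FIELD_BY_STRATEGY:
        raise ValueError(f"unknown chunking strategy: {strategy}")
    config[FIELD_BY_STRATEGY[strategy]] = settings
    return config


fixed = chunking_configuration("FIXED_SIZE", maxTokens=300, overlapPercentage=20)
hierarchical = chunking_configuration(
    "HIERARCHICAL",
    levelConfigurations=[{"maxTokens": 1500}, {"maxTokens": 300}],  # large layer first
    overlapTokens=60,
)
no_chunking = chunking_configuration("NONE")
```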
customTransformationConfiguration (dict) --

A custom document transformer for parsed data source documents.
- intermediateStorage (dict) -- [REQUIRED]
  
  An S3 bucket path for input and output objects.
  - s3Location (dict) -- [REQUIRED]
    
    An S3 bucket path.
    - uri (string) -- [REQUIRED]
      
      The location's URI. For example, s3://my-bucket/chunk-processor/.
- transformations (list) -- [REQUIRED]
  
  A Lambda function that processes documents.
  - (dict) --
    
    A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
    - transformationFunction (dict) -- [REQUIRED]
      
      A Lambda function that processes documents.
      - transformationLambdaConfiguration (dict) -- [REQUIRED]
        
        The Lambda function.
        
        lambdaArn (string) -- [REQUIRED]
        
        The function's ARN identifier.
    - stepToApply (string) -- [REQUIRED]
      
      When the service applies the transformation.
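A customTransformationConfiguration combines an intermediate S3 location with one or more Lambda steps. The sketch below assembles one such object; the helper name, the bucket path, and the Lambda ARN (including the account ID and Region) are hypothetical placeholders.

```python
def custom_transformation_configuration(intermediate_uri, lambda_arn):
    """Build a customTransformationConfiguration with one POST_CHUNKING step."""
    if not intermediate_uri.startswith("s3://"):
        raise ValueError("intermediate storage must be an s3:// URI")
    return {
        "intermediateStorage": {"s3Location": {"uri": intermediate_uri}},
        "transformations": [
            {
                "transformationFunction": {
                    "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                },
                "stepToApply": "POST_CHUNKING",
            }
        ],
    }


transformation = custom_transformation_configuration(
    "s3://my-bucket/chunk-processor/",
    "arn:aws:lambda:us-east-1:111122223333:function:chunk-processor",  # placeholder ARN
)
```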
parsingConfiguration (dict) --

Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
- parsingStrategy (string) -- [REQUIRED]
  
  The parsing strategy for the data source. Only SMART_PARSING can be selected for managed knowledge bases. For more information, see Customize ingestion for managed knowledge bases.
- bedrockFoundationModelConfiguration (dict) --
  
  If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
  - modelArn (string) -- [REQUIRED]
    
    The ARN of the foundation model to use for parsing.
  - parsingPrompt (dict) --
    
    Instructions for interpreting the contents of a document.
    - parsingPromptText (string) -- [REQUIRED]
      
      Instructions for interpreting the contents of a document.
  - parsingModality (string) --
    
    Specifies whether to enable parsing of multimodal data, including text, images, or both.
- bedrockDataAutomationConfiguration (dict) --
  
  If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
  - parsingModality (string) --
    
    Specifies whether to enable parsing of multimodal data, including text, images, or both.
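The parsing strategy determines which nested object, if any, accompanies it. The following sketch shows one way to keep the two in step. The helper `parsing_configuration` is hypothetical, and the model ARN shown is a placeholder rather than a recommendation of a specific model.

```python
def parsing_configuration(strategy, model_arn=None, prompt_text=None, multimodal=False):
    """Build a parsingConfiguration dict for the given strategy."""
    config = {"parsingStrategy": strategy}
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        if model_arn is None:
            raise ValueError("modelArn is required for BEDROCK_FOUNDATION_MODEL")
        fm_config = {"modelArn": model_arn}
        if prompt_text is not None:
            fm_config["parsingPrompt"] = {"parsingPromptText": prompt_text}
        if multimodal:
            fm_config["parsingModality"] = "MULTIMODAL"
        config["bedrockFoundationModelConfiguration"] = fm_config
    elif strategy == "BEDROCK_DATA_AUTOMATION" and multimodal:
        config["bedrockDataAutomationConfiguration"] = {"parsingModality": "MULTIMODAL"}
    # SMART_PARSING has no strategy-specific object in this reference.
    return config


smart = parsing_configuration("SMART_PARSING")
foundation_model = parsing_configuration(
    "BEDROCK_FOUNDATION_MODEL",
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/example-model-id",  # placeholder
    prompt_text="Transcribe tables as Markdown.",
)
```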
contextEnrichmentConfiguration (dict) --

The context enrichment configuration used for ingestion of the data into the vector store.
- type (string) -- [REQUIRED]
  
  The method used for context enrichment. The only supported value is BEDROCK_FOUNDATION_MODEL (Amazon Bedrock foundation models).
- bedrockFoundationModelConfiguration (dict) --
  
  The configuration of the Amazon Bedrock foundation model used for context enrichment.
  - enrichmentStrategyConfiguration (dict) -- [REQUIRED]
    
    The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
    - method (string) -- [REQUIRED]
      
      The method used for the context enrichment strategy.
  - modelArn (string) -- [REQUIRED]
    
    The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
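Putting the parameters above together, a request can be assembled as keyword arguments and passed to the client. This is a sketch under assumptions: it assumes the request also takes a knowledgeBaseId (as the response does), that `client` is a boto3 `bedrock-agent` client, and that the method is `update_data_source`. The helper names and all identifiers are hypothetical.

```python
def build_update_data_source_request(
    knowledge_base_id,
    data_source_id,
    name,
    data_source_configuration,
    description=None,
    data_deletion_policy=None,
    vector_ingestion_configuration=None,
):
    """Assemble keyword arguments, leaving out optional fields that are unset."""
    request = {
        "knowledgeBaseId": knowledge_base_id,
        "dataSourceId": data_source_id,
        "name": name,
        "dataSourceConfiguration": data_source_configuration,
    }
    optional = {
        "description": description,
        "dataDeletionPolicy": data_deletion_policy,
        "vectorIngestionConfiguration": vector_ingestion_configuration,
    }
    request.update({key: value for key, value in optional.items() if value is not None})
    return request


def update_data_source(client, **kwargs):
    """Send the request; `client` is assumed to be boto3.client("bedrock-agent")."""
    response = client.update_data_source(**build_update_data_source_request(**kwargs))
    return response["dataSource"]


request = build_update_data_source_request(
    knowledge_base_id="KB12345678",  # placeholder identifiers
    data_source_id="DS12345678",
    name="reports",
    data_source_configuration={
        "type": "S3",
        "s3Configuration": {"bucketArn": "arn:aws:s3:::amzn-s3-demo-bucket"},
    },
    data_deletion_policy="RETAIN",
)
```

Because name and dataSourceConfiguration are required, an update that changes only one setting still has to resend the current values of both.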

rtype:

dict

returns:

Response Syntax

{
    'dataSource': {
        'knowledgeBaseId': 'string',
        'dataSourceId': 'string',
        'name': 'string',
        'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL'|'CREATING'|'UPDATING'|'FAILED',
        'description': 'string',
        'dataSourceConfiguration': {
            'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA'|'MANAGED_KNOWLEDGE_BASE_CONNECTOR',
            'managedKnowledgeBaseConnectorConfiguration': {
                'deletionProtectionConfiguration': {
                    'deletionProtectionStatus': 'ENABLED'|'DISABLED',
                    'deletionProtectionThreshold': 123
                },
                'mediaExtractionConfiguration': {
                    'imageExtractionConfiguration': {
                        'imageExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'audioExtractionConfiguration': {
                        'audioExtractionStatus': 'ENABLED'|'DISABLED'
                    },
                    'videoExtractionConfiguration': {
                        'videoExtractionStatus': 'ENABLED'|'DISABLED'
                    }
                },
                'connectorParameters': {...}|[...]|123|123.4|'string'|True|None
            },
            's3Configuration': {
                'bucketArn': 'string',
                'inclusionPrefixes': [
                    'string',
                ],
                'bucketOwnerAccountId': 'string'
            },
            'webConfiguration': {
                'sourceConfiguration': {
                    'urlConfiguration': {
                        'seedUrls': [
                            {
                                'url': 'string'
                            },
                        ]
                    }
                },
                'crawlerConfiguration': {
                    'crawlerLimits': {
                        'rateLimit': 123,
                        'maxPages': 123
                    },
                    'inclusionFilters': [
                        'string',
                    ],
                    'exclusionFilters': [
                        'string',
                    ],
                    'scope': 'HOST_ONLY'|'SUBDOMAINS',
                    'userAgent': 'string',
                    'userAgentHeader': 'string'
                }
            },
            'confluenceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'hostType': 'SAAS',
                    'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'salesforceConfiguration': {
                'sourceConfiguration': {
                    'hostUrl': 'string',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            },
            'sharePointConfiguration': {
                'sourceConfiguration': {
                    'tenantId': 'string',
                    'domain': 'string',
                    'siteUrls': [
                        'string',
                    ],
                    'hostType': 'ONLINE',
                    'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                    'credentialsSecretArn': 'string'
                },
                'crawlerConfiguration': {
                    'filterConfiguration': {
                        'type': 'PATTERN',
                        'patternObjectFilter': {
                            'filters': [
                                {
                                    'objectType': 'string',
                                    'inclusionFilters': [
                                        'string',
                                    ],
                                    'exclusionFilters': [
                                        'string',
                                    ]
                                },
                            ]
                        }
                    }
                }
            }
        },
        'serverSideEncryptionConfiguration': {
            'kmsKeyArn': 'string'
        },
        'vectorIngestionConfiguration': {
            'chunkingConfiguration': {
                'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
                'fixedSizeChunkingConfiguration': {
                    'maxTokens': 123,
                    'overlapPercentage': 123
                },
                'hierarchicalChunkingConfiguration': {
                    'levelConfigurations': [
                        {
                            'maxTokens': 123
                        },
                    ],
                    'overlapTokens': 123
                },
                'semanticChunkingConfiguration': {
                    'maxTokens': 123,
                    'bufferSize': 123,
                    'breakpointPercentileThreshold': 123
                }
            },
            'customTransformationConfiguration': {
                'intermediateStorage': {
                    's3Location': {
                        'uri': 'string'
                    }
                },
                'transformations': [
                    {
                        'transformationFunction': {
                            'transformationLambdaConfiguration': {
                                'lambdaArn': 'string'
                            }
                        },
                        'stepToApply': 'POST_CHUNKING'
                    },
                ]
            },
            'parsingConfiguration': {
                'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'|'SMART_PARSING',
                'bedrockFoundationModelConfiguration': {
                    'modelArn': 'string',
                    'parsingPrompt': {
                        'parsingPromptText': 'string'
                    },
                    'parsingModality': 'MULTIMODAL'
                },
                'bedrockDataAutomationConfiguration': {
                    'parsingModality': 'MULTIMODAL'
                }
            },
            'contextEnrichmentConfiguration': {
                'type': 'BEDROCK_FOUNDATION_MODEL',
                'bedrockFoundationModelConfiguration': {
                    'enrichmentStrategyConfiguration': {
                        'method': 'CHUNK_ENTITY_EXTRACTION'
                    },
                    'modelArn': 'string'
                }
            }
        },
        'dataDeletionPolicy': 'RETAIN'|'DELETE',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
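A caller will typically read only a few fields from the response above. The sketch below extracts them from a dict shaped like the response syntax; the helper `summarize_data_source` and the sample values are hypothetical, and treating AVAILABLE as "ready" is an interpretation of the status description rather than a documented guarantee.

```python
from datetime import datetime


def summarize_data_source(response):
    """Pull the commonly inspected fields out of the response dict."""
    data_source = response["dataSource"]
    return {
        "id": data_source["dataSourceId"],
        "status": data_source["status"],
        "type": data_source["dataSourceConfiguration"]["type"],
        "ready": data_source["status"] == "AVAILABLE",
        # failureReasons may be absent when nothing has failed.
        "failureReasons": data_source.get("failureReasons", []),
    }


# Hand-written sample in the documented shape, not real service output.
sample_response = {
    "dataSource": {
        "knowledgeBaseId": "KB12345678",
        "dataSourceId": "DS12345678",
        "name": "reports",
        "status": "FAILED",
        "dataSourceConfiguration": {"type": "MANAGED_KNOWLEDGE_BASE_CONNECTOR"},
        "dataDeletionPolicy": "RETAIN",
        "createdAt": datetime(2015, 1, 1),
        "updatedAt": datetime(2015, 1, 1),
        "failureReasons": ["example failure reason"],
    }
}
summary = summarize_data_source(sample_response)
```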

Response Structure

(dict) --
- dataSource (dict) --
  
  Contains details about the data source.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base to which the data source belongs.
  - dataSourceId (string) --
    
    The unique identifier of the data source.
  - name (string) --
    
    The name of the data source.
  - status (string) --
    
    The status of the data source. The following statuses are possible:
    - Available – The data source has been created and is ready for ingestion into the knowledge base.
    - Deleting – The data source is being deleted.
  - description (string) --
    
    The description of the data source.
  - dataSourceConfiguration (dict) --
    
    The connection configuration for the data source.
    - type (string) --
      
      The type of data source.
    - managedKnowledgeBaseConnectorConfiguration (dict) --
      
      Contains the configuration for a data source that connects a managed knowledge base to a supported data source connector. Specify this object when the data source type is MANAGED_KNOWLEDGE_BASE_CONNECTOR.
      - deletionProtectionConfiguration (dict) --
        
        A safeguard against accidental bulk deletion of indexed content.
        
        deletionProtectionStatus (string) --
        
        Enable or disable deletion protection for the connector.
        
        deletionProtectionThreshold (integer) --
        
        The threshold is the maximum percentage of documents that a sync job can delete from your index. If a sync would delete more than this percentage, the sync skips its delete phase, leaving your indexed documents in place. Not supported for the Custom connector.
      - mediaExtractionConfiguration (dict) --
        
        Configuration for extracting media (images, audio, video) from data source files.
        
        imageExtractionConfiguration (dict) --
        
        Configuration for image extraction.
        
        imageExtractionStatus (string) --
        
        Whether image extraction is enabled or disabled.
        
        audioExtractionConfiguration (dict) --
        
        Configuration for audio extraction.
        
        audioExtractionStatus (string) --
        
        Whether audio extraction is enabled or disabled.
        
        videoExtractionConfiguration (dict) --
        
        Configuration for video extraction.
        
        videoExtractionStatus (string) --
        
        Whether video extraction is enabled or disabled.
      - connectorParameters (:ref:`document<document>`) --
        
        Connector-specific parameters. For more information, see Connect a data source.
    - s3Configuration (dict) --
      
      The configuration information to connect to Amazon S3 as your data source for self-managed knowledge bases. To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration.
      - bucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
      - inclusionPrefixes (list) --
        
        A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
        
        (string) --
      - bucketOwnerAccountId (string) --
        
        The account ID for the owner of the S3 bucket.
    - webConfiguration (dict) --
      
      The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Web crawler data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The source configuration details for the web data source.
        
        urlConfiguration (dict) --
        
        The configuration of the URL/URLs.
        
        seedUrls (list) --
        
        One or more seed or starting point URLs.
        
        (dict) --
        
        The seed or starting point URL. You should be authorized to crawl the URL.
        
        url (string) --
        
        A seed or starting point URL.
      - crawlerConfiguration (dict) --
        
        The Web Crawler configuration details for the web data source.
        
        crawlerLimits (dict) --
        
        The configuration of crawl limits for the web URLs.
        
        rateLimit (integer) --
        
        The max rate at which pages are crawled, up to 300 per minute per host.
        
        maxPages (integer) --
        
        The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
        
        (string) --
        
        scope (string) --
        
        The scope of what is crawled for your URLs.
        
        You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can also choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
        
        userAgent (string) --
        
        Returns the user agent suffix for your web crawler.
        
        userAgentHeader (string) --
        
        A string used for identifying the crawler or bot when it accesses a web server. The user agent header value consists of bedrockbot, a UUID, and a user agent suffix for your crawler (if one is provided). By default, it is set to bedrockbot_UUID. You can optionally append a custom suffix to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
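
The filter precedence described for inclusionFilters and exclusionFilters can be sketched in a few lines of Python. This is only an illustration of the documented rule (an exclusion match wins over an inclusion match), not the crawler's actual implementation. The function name should_crawl, the use of re.search for matching, and the behavior when no inclusion filters are given are all assumptions.

```python
import re


def should_crawl(url, inclusion_filters=(), exclusion_filters=()):
    """Return True if the URL would be crawled under the documented precedence."""
    # An exclusion match always wins, even when an inclusion pattern also matches.
    if any(re.search(pattern, url) for pattern in exclusion_filters):
        return False
    # Assumption: with no inclusion filters, every non-excluded URL is in scope.
    if not inclusion_filters:
        return True
    return any(re.search(pattern, url) for pattern in inclusion_filters)


# Hypothetical patterns for illustration only.
inclusion = [r".*/userguide/.*"]
exclusion = [r".*\.pdf$"]

base = "https://docs.aws.amazon.com/bedrock/latest/userguide/"
print(should_crawl(base + "kb.html", inclusion, exclusion))  # True
print(should_crawl(base + "kb.pdf", inclusion, exclusion))   # False: both match, exclusion wins
```
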
    - confluenceConfiguration (dict) --
      
      The configuration information to connect to Confluence as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. Confluence data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Confluence data source.
        
        hostUrl (string) --
        
        The Confluence host URL or instance URL.
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Confluence instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Confluence content. For example, configuring specific types of Confluence content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
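
Because the flattened listing above hides how the filter fields nest, the following sketch builds a crawlerConfiguration dictionary with a single PATTERN filter. The helper name pattern_crawler_configuration is hypothetical, and the objectType value "Page" and the regular expressions are placeholders; check the connector documentation for the object types your data source supports.

```python
def pattern_crawler_configuration(object_type, inclusion=None, exclusion=None):
    """Build a crawlerConfiguration dict holding one PATTERN object filter."""
    object_filter = {"objectType": object_type}
    # Only include the filter lists that were actually provided.
    if inclusion:
        object_filter["inclusionFilters"] = list(inclusion)
    if exclusion:
        object_filter["exclusionFilters"] = list(exclusion)
    return {
        "filterConfiguration": {
            "type": "PATTERN",
            "patternObjectFilter": {"filters": [object_filter]},
        }
    }


# "Page" is an assumed, illustrative object type.
crawler_configuration = pattern_crawler_configuration(
    "Page",
    inclusion=[r".*Release Notes.*"],
    exclusion=[r".*Draft.*"],
)
print(crawler_configuration)
```

The same shape applies to the crawlerConfiguration of the Salesforce and SharePoint connectors, which list the same filterConfiguration fields.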
    - salesforceConfiguration (dict) --
      
      The configuration information to connect to Salesforce as your data source.
      
      Note
      
      Salesforce data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your Salesforce data source.
        
        hostUrl (string) --
        
        The Salesforce host URL or instance URL.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your Salesforce instance.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
    - sharePointConfiguration (dict) --
      
      The configuration information to connect to SharePoint as your data source for self-managed knowledge bases.
      
      Note
      
      To configure this data source for managed knowledge bases, use managedKnowledgeBaseConnectorConfiguration. SharePoint data source connector for self-managed knowledge bases is in preview release and is subject to change.
      - sourceConfiguration (dict) --
        
        The endpoint information to connect to your SharePoint data source.
        
        tenantId (string) --
        
        The identifier of your Microsoft 365 tenant.
        
        domain (string) --
        
        The domain of your SharePoint instance or site URL/URLs.
        
        siteUrls (list) --
        
        A list of one or more SharePoint site URLs.
        
        (string) --
        
        hostType (string) --
        
        The supported host type, whether online/cloud or server/on-premises.
        
        authType (string) --
        
        The supported authentication type to authenticate and connect to your SharePoint site/sites.
        
        credentialsSecretArn (string) --
        
        The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
      - crawlerConfiguration (dict) --
        
        The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
        
        filterConfiguration (dict) --
        
        The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
        
        type (string) --
        
        The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
        
        patternObjectFilter (dict) --
        
        The configuration of filtering certain objects or content types of the data source.
        
        filters (list) --
        
        The configuration of specific filters applied to your data source content. You can filter out or include certain content.
        
        (dict) --
        
        The specific filters applied to your data source content. You can filter out or include certain content.
        
        objectType (string) --
        
        The supported object type or content type of the data source.
        
        inclusionFilters (list) --
        
        A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
        
        exclusionFilters (list) --
        
        A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
        
        (string) --
  - serverSideEncryptionConfiguration (dict) --
    
    Contains details about the configuration of the server-side encryption.
    - kmsKeyArn (string) --
      
      The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
  - vectorIngestionConfiguration (dict) --
    
    Contains details about how to ingest the documents in the data source.
    - chunkingConfiguration (dict) --
      
      Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
      - chunkingStrategy (string) --
        
        A knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
        
        FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
        
        HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
        
        NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
      - fixedSizeChunkingConfiguration (dict) --
        
        Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
        
        maxTokens (integer) --
        
        The maximum number of tokens to include in a chunk.
        
        overlapPercentage (integer) --
        
        The percentage of overlap between adjacent chunks of a data source.
      - hierarchicalChunkingConfiguration (dict) --
        
        Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
        
        levelConfigurations (list) --
        
        Token settings for each layer.
        
        (dict) --
        
        Token settings for a layer in a hierarchical chunking configuration.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain in this layer.
        
        overlapTokens (integer) --
        
        The number of tokens to repeat across chunks in the same layer.
      - semanticChunkingConfiguration (dict) --
        
        Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
        
        maxTokens (integer) --
        
        The maximum number of tokens that a chunk can contain.
        
        bufferSize (integer) --
        
        The buffer size.
        
        breakpointPercentileThreshold (integer) --
        
        The dissimilarity threshold for splitting chunks.
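
To make maxTokens and overlapPercentage in fixedSizeChunkingConfiguration concrete, here is a minimal sketch of fixed-size chunking with overlap. It is not the service's algorithm: Amazon Bedrock describes its chunks as approximate in size, and the tokenization, rounding, and function name fixed_size_chunks used here are assumptions made for illustration.

```python
def fixed_size_chunks(tokens, max_tokens, overlap_percentage):
    """Split a token list into chunks of at most max_tokens with a percentage overlap."""
    if max_tokens <= 0:
        raise ValueError("max_tokens must be positive")
    if not 0 <= overlap_percentage < 100:
        raise ValueError("overlap_percentage must be in [0, 100)")
    # Number of tokens shared by adjacent chunks (rounded down, an assumption).
    overlap = max_tokens * overlap_percentage // 100
    step = max_tokens - overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_tokens])
        # Stop once a chunk reaches the end of the input.
        if start + max_tokens >= len(tokens):
            break
    return chunks


# Ten stand-in "tokens", chunks of 4 with 25% overlap (1 shared token).
print(fixed_size_chunks(list(range(10)), max_tokens=4, overlap_percentage=25))
# [[0, 1, 2, 3], [3, 4, 5, 6], [6, 7, 8, 9]]
```
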
    - customTransformationConfiguration (dict) --
      
      A custom document transformer for parsed data source documents.
      - intermediateStorage (dict) --
        
        An S3 bucket path for input and output objects.
        
        s3Location (dict) --
        
        An S3 bucket path.
        
        uri (string) --
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
      - transformations (list) --
        
        A Lambda function that processes documents.
        
        (dict) --
        
        A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
        
        transformationFunction (dict) --
        
        A Lambda function that processes documents.
        
        transformationLambdaConfiguration (dict) --
        
        The Lambda function.
        
        lambdaArn (string) --
        
        The function's ARN identifier.
        
        stepToApply (string) --
        
        When the service applies the transformation.
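
The nesting of customTransformationConfiguration is easier to see as a literal structure. The sketch below assembles one from the fields listed above; the helper name custom_transformation_configuration is hypothetical, and the Lambda ARN is a made-up placeholder. The S3 URI reuses the example value from the uri description.

```python
def custom_transformation_configuration(lambda_arn, intermediate_s3_uri):
    """Build a customTransformationConfiguration dict with one Lambda step."""
    return {
        # S3 path used for the transformation's input and output objects.
        "intermediateStorage": {"s3Location": {"uri": intermediate_s3_uri}},
        "transformations": [
            {
                "transformationFunction": {
                    "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                },
                # Apply the Lambda function after documents are split into chunks.
                "stepToApply": "POST_CHUNKING",
            }
        ],
    }


# Placeholder ARN for illustration only.
configuration = custom_transformation_configuration(
    "arn:aws:lambda:us-east-1:111122223333:function:chunk-processor",
    "s3://my-bucket/chunk-processor/",
)
print(configuration)
```
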
    - parsingConfiguration (dict) --
      
      Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
      - parsingStrategy (string) --
        
        The parsing strategy for the data source. Only SMART_PARSING can be selected for managed knowledge bases. For more information, see Customize ingestion for managed knowledge bases.
      - bedrockFoundationModelConfiguration (dict) --
        
        If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
        
        modelArn (string) --
        
        The ARN of the foundation model to use for parsing.
        
        parsingPrompt (dict) --
        
        Instructions for interpreting the contents of a document.
        
        parsingPromptText (string) --
        
        Instructions for interpreting the contents of a document.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, which can include text, images, or both.
      - bedrockDataAutomationConfiguration (dict) --
        
        If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
        
        parsingModality (string) --
        
        Specifies whether to enable parsing of multimodal data, which can include text, images, or both.
    - contextEnrichmentConfiguration (dict) --
      
      The context enrichment configuration used for ingestion of the data into the vector store.
      - type (string) --
        
        The method used for context enrichment. It must be Amazon Bedrock foundation models.
      - bedrockFoundationModelConfiguration (dict) --
        
        The configuration of the Amazon Bedrock foundation model used for context enrichment.
        
        enrichmentStrategyConfiguration (dict) --
        
        The enrichment strategy used to provide additional context. For example, Neptune GraphRAG uses Amazon Bedrock foundation models to perform chunk entity extraction.
        
        method (string) --
        
        The method used for the context enrichment strategy.
        
        modelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
  - dataDeletionPolicy (string) --
    
    The data deletion policy for the data source.
  - createdAt (datetime) --
    
    The time at which the data source was created.
  - updatedAt (datetime) --
    
    The time at which the data source was last updated.
  - failureReasons (list) --
    
    The detailed reasons for the failure to delete a data source.
    - (string) --

UpdateKnowledgeBase (updated)

Changes (request, response)
Request

{'knowledgeBaseConfiguration': {'managedKnowledgeBaseConfiguration': {'embeddingModelArn': 'string',
                                                                      'embeddingModelConfiguration': {'bedrockEmbeddingModelConfiguration': {'audio': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}],
                                                                                                                                             'dimensions': 'integer',
                                                                                                                                             'embeddingDataType': 'FLOAT32 '
                                                                                                                                                                  '| '
                                                                                                                                                                  'BINARY',
                                                                                                                                             'video': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}]}},
                                                                      'embeddingModelType': 'CUSTOM '
                                                                                            '| '
                                                                                            'MANAGED',
                                                                      'serverSideEncryptionConfiguration': {'kmsKeyArn': 'string'}},
                                'type': {'MANAGED'}}}

Response

{'knowledgeBase': {'knowledgeBaseConfiguration': {'managedKnowledgeBaseConfiguration': {'embeddingModelArn': 'string',
                                                                                        'embeddingModelConfiguration': {'bedrockEmbeddingModelConfiguration': {'audio': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}],
                                                                                                                                                               'dimensions': 'integer',
                                                                                                                                                               'embeddingDataType': 'FLOAT32 '
                                                                                                                                                                                    '| '
                                                                                                                                                                                    'BINARY',
                                                                                                                                                               'video': [{'segmentationConfiguration': {'fixedLengthDuration': 'integer'}}]}},
                                                                                        'embeddingModelType': 'CUSTOM '
                                                                                                              '| '
                                                                                                              'MANAGED',
                                                                                        'serverSideEncryptionConfiguration': {'kmsKeyArn': 'string'}},
                                                  'type': {'MANAGED'}},
                   'status': {'UPDATE_UNSUCCESSFUL'}}}

Updates the configuration of a knowledge base with the fields that you specify. Because the request overwrites all fields, you must resend the current values of any fields that you want to keep unchanged.

You can change the following fields:

name
description
roleArn

You can't change the knowledgeBaseConfiguration or storageConfiguration fields, so you must specify the same configurations as when you created the knowledge base. You can send a GetKnowledgeBase request and copy the same configurations.
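
One way to follow this advice is to copy the unchangeable configurations from a GetKnowledgeBase response and override only the fields you intend to change. The sketch below assumes the response's knowledgeBase dictionary carries knowledgeBaseId, name, roleArn, knowledgeBaseConfiguration, and optionally description and storageConfiguration. The helper name build_update_request and the sample values are hypothetical.

```python
def build_update_request(knowledge_base, name=None, description=None, role_arn=None):
    """Build update_knowledge_base keyword arguments from a GetKnowledgeBase result."""
    request = {
        "knowledgeBaseId": knowledge_base["knowledgeBaseId"],
        "name": name if name is not None else knowledge_base["name"],
        "roleArn": role_arn if role_arn is not None else knowledge_base["roleArn"],
        # Must match the configuration used when the knowledge base was created.
        "knowledgeBaseConfiguration": knowledge_base["knowledgeBaseConfiguration"],
    }
    new_description = (
        description if description is not None else knowledge_base.get("description")
    )
    if new_description is not None:
        request["description"] = new_description
    # Assumption: some knowledge base types return no storageConfiguration.
    if "storageConfiguration" in knowledge_base:
        request["storageConfiguration"] = knowledge_base["storageConfiguration"]
    return request


# Stand-in for client.get_knowledge_base(...)["knowledgeBase"]; values are made up.
current = {
    "knowledgeBaseId": "KB12345678",
    "name": "old-name",
    "roleArn": "arn:aws:iam::111122223333:role/kb-role",
    "knowledgeBaseConfiguration": {"type": "MANAGED"},
}
print(build_update_request(current, name="new-name"))

# With boto3 and credentials available, the call might look like this:
# import boto3
# client = boto3.client("bedrock-agent")
# current = client.get_knowledge_base(knowledgeBaseId="KB12345678")["knowledgeBase"]
# client.update_knowledge_base(**build_update_request(current, name="new-name"))
```
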

See also: AWS API Documentation

Request Syntax

client.update_knowledge_base(
    knowledgeBaseId='string',
    name='string',
    description='string',
    roleArn='string',
    knowledgeBaseConfiguration={
        'type': 'VECTOR'|'KENDRA'|'SQL'|'MANAGED',
        'vectorKnowledgeBaseConfiguration': {
            'embeddingModelArn': 'string',
            'embeddingModelConfiguration': {
                'bedrockEmbeddingModelConfiguration': {
                    'dimensions': 123,
                    'embeddingDataType': 'FLOAT32'|'BINARY',
                    'audio': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ],
                    'video': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ]
                }
            },
            'supplementalDataStorageConfiguration': {
                'storageLocations': [
                    {
                        'type': 'S3',
                        's3Location': {
                            'uri': 'string'
                        }
                    },
                ]
            }
        },
        'managedKnowledgeBaseConfiguration': {
            'embeddingModelType': 'CUSTOM'|'MANAGED',
            'embeddingModelArn': 'string',
            'embeddingModelConfiguration': {
                'bedrockEmbeddingModelConfiguration': {
                    'dimensions': 123,
                    'embeddingDataType': 'FLOAT32'|'BINARY',
                    'audio': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ],
                    'video': [
                        {
                            'segmentationConfiguration': {
                                'fixedLengthDuration': 123
                            }
                        },
                    ]
                }
            },
            'serverSideEncryptionConfiguration': {
                'kmsKeyArn': 'string'
            }
        },
        'kendraKnowledgeBaseConfiguration': {
            'kendraIndexArn': 'string'
        },
        'sqlKnowledgeBaseConfiguration': {
            'type': 'REDSHIFT',
            'redshiftConfiguration': {
                'storageConfigurations': [
                    {
                        'type': 'REDSHIFT'|'AWS_DATA_CATALOG',
                        'awsDataCatalogConfiguration': {
                            'tableNames': [
                                'string',
                            ]
                        },
                        'redshiftConfiguration': {
                            'databaseName': 'string'
                        }
                    },
                ],
                'queryEngineConfiguration': {
                    'type': 'SERVERLESS'|'PROVISIONED',
                    'serverlessConfiguration': {
                        'workgroupArn': 'string',
                        'authConfiguration': {
                            'type': 'IAM'|'USERNAME_PASSWORD',
                            'usernamePasswordSecretArn': 'string'
                        }
                    },
                    'provisionedConfiguration': {
                        'clusterIdentifier': 'string',
                        'authConfiguration': {
                            'type': 'IAM'|'USERNAME_PASSWORD'|'USERNAME',
                            'databaseUser': 'string',
                            'usernamePasswordSecretArn': 'string'
                        }
                    }
                },
                'queryGenerationConfiguration': {
                    'executionTimeoutSeconds': 123,
                    'generationContext': {
                        'tables': [
                            {
                                'name': 'string',
                                'description': 'string',
                                'inclusion': 'INCLUDE'|'EXCLUDE',
                                'columns': [
                                    {
                                        'name': 'string',
                                        'description': 'string',
                                        'inclusion': 'INCLUDE'|'EXCLUDE'
                                    },
                                ]
                            },
                        ],
                        'curatedQueries': [
                            {
                                'naturalLanguage': 'string',
                                'sql': 'string'
                            },
                        ]
                    }
                }
            }
        }
    },
    storageConfiguration={
        'type': 'OPENSEARCH_SERVERLESS'|'PINECONE'|'REDIS_ENTERPRISE_CLOUD'|'RDS'|'MONGO_DB_ATLAS'|'NEPTUNE_ANALYTICS'|'OPENSEARCH_MANAGED_CLUSTER'|'S3_VECTORS',
        'opensearchServerlessConfiguration': {
            'collectionArn': 'string',
            'vectorIndexName': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'opensearchManagedClusterConfiguration': {
            'domainEndpoint': 'string',
            'domainArn': 'string',
            'vectorIndexName': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'pineconeConfiguration': {
            'connectionString': 'string',
            'credentialsSecretArn': 'string',
            'namespace': 'string',
            'fieldMapping': {
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'redisEnterpriseCloudConfiguration': {
            'endpoint': 'string',
            'vectorIndexName': 'string',
            'credentialsSecretArn': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        'rdsConfiguration': {
            'resourceArn': 'string',
            'credentialsSecretArn': 'string',
            'databaseName': 'string',
            'tableName': 'string',
            'fieldMapping': {
                'primaryKeyField': 'string',
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string',
                'customMetadataField': 'string'
            }
        },
        'mongoDbAtlasConfiguration': {
            'endpoint': 'string',
            'databaseName': 'string',
            'collectionName': 'string',
            'vectorIndexName': 'string',
            'credentialsSecretArn': 'string',
            'fieldMapping': {
                'vectorField': 'string',
                'textField': 'string',
                'metadataField': 'string'
            },
            'endpointServiceName': 'string',
            'textIndexName': 'string'
        },
        'neptuneAnalyticsConfiguration': {
            'graphArn': 'string',
            'fieldMapping': {
                'textField': 'string',
                'metadataField': 'string'
            }
        },
        's3VectorsConfiguration': {
            'vectorBucketArn': 'string',
            'indexArn': 'string',
            'indexName': 'string'
        }
    }
)

type knowledgeBaseId:

string

param knowledgeBaseId:

[REQUIRED]

The unique identifier of the knowledge base to update.

type name:

string

param name:

[REQUIRED]

Specifies a new name for the knowledge base.

type description:

string

param description:

Specifies a new description for the knowledge base.

type roleArn:

string

param roleArn:

[REQUIRED]

Specifies the Amazon Resource Name (ARN) of a different IAM role with permissions to invoke API operations on the knowledge base.

type knowledgeBaseConfiguration:

dict

param knowledgeBaseConfiguration:

[REQUIRED]

Specifies the configuration for the embeddings model used for the knowledge base. You must use the same configuration as when the knowledge base was created.

type (string) -- [REQUIRED]

The type of data that the data source is converted into for the knowledge base. Choose MANAGED to create a managed knowledge base.
vectorKnowledgeBaseConfiguration (dict) --

Contains details about the model that's used to convert the data source into vector embeddings.
- embeddingModelArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
- embeddingModelConfiguration (dict) --
  
  The embeddings model configuration details for the vector model used in the knowledge base.
  - bedrockEmbeddingModelConfiguration (dict) --
    
    The vector configuration details on the Bedrock embeddings model.
    - dimensions (integer) --
      
      The number of dimensions for the vector embeddings produced by the Bedrock embeddings model.
    - embeddingDataType (string) --
      
      The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
    - audio (list) --
      
      Configuration settings for processing audio content in multimodal knowledge bases.
      - (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
    - video (list) --
      
      Configuration settings for processing video content in multimodal knowledge bases.
      - (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
- supplementalDataStorageConfiguration (dict) --
  
  If you include multimodal data from your data source, use this object to specify configurations for the storage location of the images extracted from your documents. These images can be retrieved and returned to the end user. They can also be used in generation when using RetrieveAndGenerate.
  - storageLocations (list) -- [REQUIRED]
    
    A list of objects specifying storage locations for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
    - (dict) --
      
      Contains information about a storage location for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
      - type (string) -- [REQUIRED]
        
        Specifies the storage service used for this location.
      - s3Location (dict) --
        
        Contains information about the Amazon S3 location for the extracted multimedia content.
        
        uri (string) -- [REQUIRED]
        
        The location's URI. For example, s3://my-bucket/chunk-processor/.
managedKnowledgeBaseConfiguration (dict) --

Configurations for a managed knowledge base.
- embeddingModelType (string) --
  
  Choose CUSTOM to provide your own Bedrock embedding model ARN. Choose MANAGED to use a service-managed embedding model. For more information, see Embedding model options.
- embeddingModelArn (string) --
  
  The ARN for the embeddings model.
- embeddingModelConfiguration (dict) --
  
  The configuration details for the embeddings model.
  - bedrockEmbeddingModelConfiguration (dict) --
    
    The vector configuration details on the Bedrock embeddings model.
    - dimensions (integer) --
      
      The number of dimensions for the vector embeddings produced by the Bedrock embeddings model.
    - embeddingDataType (string) --
      
      The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
    - audio (list) --
      
      Configuration settings for processing audio content in multimodal knowledge bases.
      - (dict) --
        
        Audio configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
    - video (list) --
      
      Configuration settings for processing video content in multimodal knowledge bases.
      - (dict) --
        
        Video configuration for multimodal ingestion.
        
        segmentationConfiguration (dict) -- [REQUIRED]
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) -- [REQUIRED]
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
- serverSideEncryptionConfiguration (dict) --
  
  Contains the configuration for server-side encryption for your managed knowledge base.
  - kmsKeyArn (string) --
    
    The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
kendraKnowledgeBaseConfiguration (dict) --

Settings for an Amazon Kendra knowledge base.
- kendraIndexArn (string) -- [REQUIRED]
  
  The ARN of the Amazon Kendra index.
sqlKnowledgeBaseConfiguration (dict) --

Specifies configurations for a knowledge base connected to an SQL database.
- type (string) -- [REQUIRED]
  
  The type of SQL database to connect to the knowledge base.
- redshiftConfiguration (dict) --
  
  Specifies configurations for a knowledge base connected to an Amazon Redshift database.
  - storageConfigurations (list) -- [REQUIRED]
    
    Specifies configurations for Amazon Redshift database storage.
    - (dict) --
      
      Contains configurations for Amazon Redshift data storage. Specify the data storage service to use in the type field and include the corresponding field. For more information, see Build a knowledge base by connecting to a structured data source in the Amazon Bedrock User Guide.
      - type (string) -- [REQUIRED]
        
        The data storage service to use.
      - awsDataCatalogConfiguration (dict) --
        
        Specifies configurations for storage in Glue Data Catalog.
        
        tableNames (list) -- [REQUIRED]
        
        A list of names of the tables to use.
        
        (string) --
      - redshiftConfiguration (dict) --
        
        Specifies configurations for storage in Amazon Redshift.
        
        databaseName (string) -- [REQUIRED]
        
        The name of the Amazon Redshift database.
  - queryEngineConfiguration (dict) -- [REQUIRED]
    
    Specifies configurations for an Amazon Redshift query engine.
    - type (string) -- [REQUIRED]
      
      The type of query engine.
    - serverlessConfiguration (dict) --
      
      Specifies configurations for a serverless Amazon Redshift query engine.
      - workgroupArn (string) -- [REQUIRED]
        
        The ARN of the Amazon Redshift workgroup.
      - authConfiguration (dict) -- [REQUIRED]
        
        Specifies configurations for authentication to a serverless Amazon Redshift workgroup.
        
        type (string) -- [REQUIRED]
        
        The type of authentication to use.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
    - provisionedConfiguration (dict) --
      
      Specifies configurations for a provisioned Amazon Redshift query engine.
      - clusterIdentifier (string) -- [REQUIRED]
        
        The ID of the Amazon Redshift cluster.
      - authConfiguration (dict) -- [REQUIRED]
        
        Specifies configurations for authentication to Amazon Redshift.
        
        type (string) -- [REQUIRED]
        
        The type of authentication to use.
        
        databaseUser (string) --
        
        The database username for authentication to an Amazon Redshift provisioned data warehouse.
        
        usernamePasswordSecretArn (string) --
        
        The ARN of a Secrets Manager secret for authentication.
  - queryGenerationConfiguration (dict) --
    
    Specifies configurations for generating queries.
    - executionTimeoutSeconds (integer) --
      
      The time, in seconds, after which query generation times out.
    - generationContext (dict) --
      
      Specifies configurations for context to use during query generation.
      - tables (list) --
        
        An array of objects, each of which defines information about a table in the database.
        
        (dict) --
        
        Contains information about a table for the query engine to consider.
        
        name (string) -- [REQUIRED]
        
        The name of the table for which the other fields in this object apply.
        
        description (string) --
        
        A description of the table that helps the query engine understand the contents of the table.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the table during query generation. If you specify EXCLUDE, the table will be ignored. If you specify INCLUDE, all other tables will be ignored.
        
        columns (list) --
        
        An array of objects, each of which defines information about a column in the table.
        
        (dict) --
        
        Contains information about a column in the current table for the query engine to consider.
        
        name (string) --
        
        The name of the column for which the other fields in this object apply.
        
        description (string) --
        
        A description of the column that helps the query engine understand the contents of the column.
        
        inclusion (string) --
        
        Specifies whether to include or exclude the column during query generation. If you specify EXCLUDE, the column will be ignored. If you specify INCLUDE, all other columns in the table will be ignored.
      - curatedQueries (list) --
        
        An array of objects, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
        
        (dict) --
        
        Contains an example query, pairing a natural language question with its SQL equivalent, to help the query engine generate appropriate SQL queries.
        
        naturalLanguage (string) -- [REQUIRED]
        
        An example natural language query.
        
        sql (string) -- [REQUIRED]
        
        The SQL equivalent of the natural language query.
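
The queryGenerationConfiguration fields described above can be assembled with small helper functions. The following is a minimal sketch, not part of the SDK: the helper names (table_context, query_generation_configuration), the table and column names, and the timeout value are all hypothetical and chosen only for illustration. It only builds the dictionary; it does not call the service.

```python
from typing import Any, Dict, List, Optional


def table_context(
    name: str,
    description: Optional[str] = None,
    inclusion: Optional[str] = None,
    columns: Optional[List[Dict[str, str]]] = None,
) -> Dict[str, Any]:
    """Build one entry for generationContext['tables']; only name is required."""
    if inclusion not in (None, "INCLUDE", "EXCLUDE"):
        raise ValueError("inclusion must be 'INCLUDE' or 'EXCLUDE'")
    table: Dict[str, Any] = {"name": name}
    if description is not None:
        table["description"] = description
    if inclusion is not None:
        table["inclusion"] = inclusion
    if columns:
        table["columns"] = columns
    return table


def query_generation_configuration(
    tables: List[Dict[str, Any]],
    curated_queries: List[Dict[str, str]],
    timeout_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble a queryGenerationConfiguration dict from its parts."""
    for query in curated_queries:
        # Both keys are marked [REQUIRED] in the reference above.
        if not {"naturalLanguage", "sql"} <= set(query):
            raise ValueError("curated queries need 'naturalLanguage' and 'sql'")
    config: Dict[str, Any] = {
        "generationContext": {"tables": tables, "curatedQueries": curated_queries}
    }
    if timeout_seconds is not None:
        config["executionTimeoutSeconds"] = timeout_seconds
    return config


config = query_generation_configuration(
    tables=[
        table_context(
            "orders",  # hypothetical table name
            description="One row per customer order",
            # Hide a single column from query generation.
            columns=[{"name": "internal_notes", "inclusion": "EXCLUDE"}],
        )
    ],
    curated_queries=[
        {
            "naturalLanguage": "How many orders were placed?",
            "sql": "SELECT COUNT(*) FROM orders",
        }
    ],
    timeout_seconds=30,  # hypothetical value
)
```

The resulting dictionary would sit under sqlKnowledgeBaseConfiguration.redshiftConfiguration.queryGenerationConfiguration in the request. Optional keys are omitted rather than sent as None, because the SDK validates parameter types before sending the request.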

type storageConfiguration:

dict

param storageConfiguration:

Specifies the configuration for the vector store used for the knowledge base. You must use the same configuration as when the knowledge base was created.

type (string) -- [REQUIRED]

The vector store service in which the knowledge base is stored.
opensearchServerlessConfiguration (dict) --

Contains the storage configuration of the knowledge base in Amazon OpenSearch Service.
- collectionArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the OpenSearch Service vector store.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
opensearchManagedClusterConfiguration (dict) --

Contains details about the storage configuration of the knowledge base in OpenSearch Managed Cluster. For more information, see Create a vector index in Amazon OpenSearch Service.
- domainEndpoint (string) -- [REQUIRED]
  
  The endpoint URL of the OpenSearch domain.
- domainArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the OpenSearch domain.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
pineconeConfiguration (dict) --

Contains the storage configuration of the knowledge base in Pinecone.
- connectionString (string) -- [REQUIRED]
  
  The endpoint URL for your index management page.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Pinecone API key.
- namespace (string) --
  
  The namespace to be used to write new data to your database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
redisEnterpriseCloudConfiguration (dict) --

Contains the storage configuration of the knowledge base in Redis Enterprise Cloud.
- endpoint (string) -- [REQUIRED]
  
  The endpoint URL of the Redis Enterprise Cloud database.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the vector index.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Redis Enterprise Cloud database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
rdsConfiguration (dict) --

Contains details about the storage configuration of the knowledge base in Amazon RDS. For more information, see Create a vector index in Amazon RDS.
- resourceArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the vector store.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Amazon RDS database.
- databaseName (string) -- [REQUIRED]
  
  The name of your Amazon RDS database.
- tableName (string) -- [REQUIRED]
  
  The name of the table in the database.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - primaryKeyField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the ID for each entry.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
  - customMetadataField (string) --
    
    Provide a name for the universal metadata field where Amazon Bedrock will store any custom metadata from your data source.
mongoDbAtlasConfiguration (dict) --

Contains the storage configuration of the knowledge base in MongoDB Atlas.
- endpoint (string) -- [REQUIRED]
  
  The endpoint URL of your MongoDB Atlas cluster for your knowledge base.
- databaseName (string) -- [REQUIRED]
  
  The database name in your MongoDB Atlas cluster for your knowledge base.
- collectionName (string) -- [REQUIRED]
  
  The collection name of the knowledge base in MongoDB Atlas.
- vectorIndexName (string) -- [REQUIRED]
  
  The name of the MongoDB Atlas vector search index.
- credentialsSecretArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that contains user credentials for your MongoDB Atlas cluster.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - vectorField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
- endpointServiceName (string) --
  
  The name of the VPC endpoint service in your account that is connected to your MongoDB Atlas cluster.
- textIndexName (string) --
  
  The name of the text search index in the MongoDB collection. This is required for using the hybrid search feature.
neptuneAnalyticsConfiguration (dict) --

Contains details about the Neptune Analytics configuration of the knowledge base in Amazon Neptune. For more information, see Create a vector index in Amazon Neptune Analytics.
- graphArn (string) -- [REQUIRED]
  
  The Amazon Resource Name (ARN) of the Neptune Analytics vector store.
- fieldMapping (dict) -- [REQUIRED]
  
  Contains the names of the fields to which to map information about the vector store.
  - textField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
  - metadataField (string) -- [REQUIRED]
    
    The name of the field in which Amazon Bedrock stores metadata about the vector store.
s3VectorsConfiguration (dict) --

The configuration settings for storing knowledge base data using S3 vectors. This includes vector index information and S3 bucket details for vector storage.
- vectorBucketArn (string) --
  
  The Amazon Resource Name (ARN) of the S3 bucket where vector embeddings are stored. This bucket contains the vector data used by the knowledge base.
- indexArn (string) --
  
  The Amazon Resource Name (ARN) of the vector index used for the knowledge base. This ARN identifies the specific vector index resource within Amazon Bedrock.
- indexName (string) --
  
  The name of the vector index used for the knowledge base. This name identifies the vector index within the Amazon Bedrock service.
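
Because knowledgeBaseConfiguration and storageConfiguration must match the values used at creation, one way to avoid retyping them is to read the current knowledge base and pass its configuration back unchanged. The following is a minimal sketch under that assumption: build_update_request is a hypothetical helper, the sample values are invented, and it assumes the configuration blocks returned by get_knowledge_base are accepted as-is by update_knowledge_base (the request and response shapes shown in this reference are the same, but this is not stated explicitly).

```python
from typing import Any, Dict, Optional


def build_update_request(
    knowledge_base: Dict[str, Any],
    name: Optional[str] = None,
    description: Optional[str] = None,
    role_arn: Optional[str] = None,
) -> Dict[str, Any]:
    """Return keyword arguments for update_knowledge_base.

    knowledge_base is the 'knowledgeBase' dict from a get_knowledge_base
    response. Its configuration blocks are passed back unchanged.
    """
    request: Dict[str, Any] = {
        "knowledgeBaseId": knowledge_base["knowledgeBaseId"],
        "name": name or knowledge_base["name"],
        "roleArn": role_arn or knowledge_base["roleArn"],
        "knowledgeBaseConfiguration": knowledge_base["knowledgeBaseConfiguration"],
    }
    # storageConfiguration is optional in the request; copy it only if present.
    if "storageConfiguration" in knowledge_base:
        request["storageConfiguration"] = knowledge_base["storageConfiguration"]
    # Keep the existing description unless a new one is supplied.
    new_description = (
        description if description is not None else knowledge_base.get("description")
    )
    if new_description:
        request["description"] = new_description
    return request


current = {  # hypothetical values shaped like a get_knowledge_base response
    "knowledgeBaseId": "KB-EXAMPLE",
    "name": "docs-kb",
    "knowledgeBaseArn": "arn:aws:bedrock:us-east-1:111122223333:knowledge-base/KB-EXAMPLE",
    "roleArn": "arn:aws:iam::111122223333:role/ExampleKnowledgeBaseRole",
    "knowledgeBaseConfiguration": {
        "type": "KENDRA",
        "kendraKnowledgeBaseConfiguration": {
            "kendraIndexArn": "arn:aws:kendra:us-east-1:111122223333:index/EXAMPLE"
        },
    },
    "status": "ACTIVE",
}
request = build_update_request(current, description="Product documentation")

# With real credentials (not executed here):
#   import boto3
#   client = boto3.client("bedrock-agent")
#   current = client.get_knowledge_base(knowledgeBaseId="KB-EXAMPLE")["knowledgeBase"]
#   client.update_knowledge_base(**build_update_request(current, name="new-name"))
```

Read-only response fields such as status, createdAt, and knowledgeBaseArn are deliberately left out of the request, since they are not request parameters.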

rtype:

dict

returns:

Response Syntax

{
    'knowledgeBase': {
        'knowledgeBaseId': 'string',
        'name': 'string',
        'knowledgeBaseArn': 'string',
        'description': 'string',
        'roleArn': 'string',
        'knowledgeBaseConfiguration': {
            'type': 'VECTOR'|'KENDRA'|'SQL'|'MANAGED',
            'vectorKnowledgeBaseConfiguration': {
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'supplementalDataStorageConfiguration': {
                    'storageLocations': [
                        {
                            'type': 'S3',
                            's3Location': {
                                'uri': 'string'
                            }
                        },
                    ]
                }
            },
            'managedKnowledgeBaseConfiguration': {
                'embeddingModelType': 'CUSTOM'|'MANAGED',
                'embeddingModelArn': 'string',
                'embeddingModelConfiguration': {
                    'bedrockEmbeddingModelConfiguration': {
                        'dimensions': 123,
                        'embeddingDataType': 'FLOAT32'|'BINARY',
                        'audio': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ],
                        'video': [
                            {
                                'segmentationConfiguration': {
                                    'fixedLengthDuration': 123
                                }
                            },
                        ]
                    }
                },
                'serverSideEncryptionConfiguration': {
                    'kmsKeyArn': 'string'
                }
            },
            'kendraKnowledgeBaseConfiguration': {
                'kendraIndexArn': 'string'
            },
            'sqlKnowledgeBaseConfiguration': {
                'type': 'REDSHIFT',
                'redshiftConfiguration': {
                    'storageConfigurations': [
                        {
                            'type': 'REDSHIFT'|'AWS_DATA_CATALOG',
                            'awsDataCatalogConfiguration': {
                                'tableNames': [
                                    'string',
                                ]
                            },
                            'redshiftConfiguration': {
                                'databaseName': 'string'
                            }
                        },
                    ],
                    'queryEngineConfiguration': {
                        'type': 'SERVERLESS'|'PROVISIONED',
                        'serverlessConfiguration': {
                            'workgroupArn': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD',
                                'usernamePasswordSecretArn': 'string'
                            }
                        },
                        'provisionedConfiguration': {
                            'clusterIdentifier': 'string',
                            'authConfiguration': {
                                'type': 'IAM'|'USERNAME_PASSWORD'|'USERNAME',
                                'databaseUser': 'string',
                                'usernamePasswordSecretArn': 'string'
                            }
                        }
                    },
                    'queryGenerationConfiguration': {
                        'executionTimeoutSeconds': 123,
                        'generationContext': {
                            'tables': [
                                {
                                    'name': 'string',
                                    'description': 'string',
                                    'inclusion': 'INCLUDE'|'EXCLUDE',
                                    'columns': [
                                        {
                                            'name': 'string',
                                            'description': 'string',
                                            'inclusion': 'INCLUDE'|'EXCLUDE'
                                        },
                                    ]
                                },
                            ],
                            'curatedQueries': [
                                {
                                    'naturalLanguage': 'string',
                                    'sql': 'string'
                                },
                            ]
                        }
                    }
                }
            }
        },
        'storageConfiguration': {
            'type': 'OPENSEARCH_SERVERLESS'|'PINECONE'|'REDIS_ENTERPRISE_CLOUD'|'RDS'|'MONGO_DB_ATLAS'|'NEPTUNE_ANALYTICS'|'OPENSEARCH_MANAGED_CLUSTER'|'S3_VECTORS',
            'opensearchServerlessConfiguration': {
                'collectionArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'opensearchManagedClusterConfiguration': {
                'domainEndpoint': 'string',
                'domainArn': 'string',
                'vectorIndexName': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'pineconeConfiguration': {
                'connectionString': 'string',
                'credentialsSecretArn': 'string',
                'namespace': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'redisEnterpriseCloudConfiguration': {
                'endpoint': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            'rdsConfiguration': {
                'resourceArn': 'string',
                'credentialsSecretArn': 'string',
                'databaseName': 'string',
                'tableName': 'string',
                'fieldMapping': {
                    'primaryKeyField': 'string',
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string',
                    'customMetadataField': 'string'
                }
            },
            'mongoDbAtlasConfiguration': {
                'endpoint': 'string',
                'databaseName': 'string',
                'collectionName': 'string',
                'vectorIndexName': 'string',
                'credentialsSecretArn': 'string',
                'fieldMapping': {
                    'vectorField': 'string',
                    'textField': 'string',
                    'metadataField': 'string'
                },
                'endpointServiceName': 'string',
                'textIndexName': 'string'
            },
            'neptuneAnalyticsConfiguration': {
                'graphArn': 'string',
                'fieldMapping': {
                    'textField': 'string',
                    'metadataField': 'string'
                }
            },
            's3VectorsConfiguration': {
                'vectorBucketArn': 'string',
                'indexArn': 'string',
                'indexName': 'string'
            }
        },
        'status': 'CREATING'|'ACTIVE'|'DELETING'|'UPDATING'|'FAILED'|'DELETE_UNSUCCESSFUL'|'UPDATE_UNSUCCESSFUL',
        'createdAt': datetime(2015, 1, 1),
        'updatedAt': datetime(2015, 1, 1),
        'failureReasons': [
            'string',
        ]
    }
}
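
The status and failureReasons fields in the response can be used to follow an update to completion. The following is a minimal polling sketch, not an SDK feature: wait_for_knowledge_base and FakeClient are hypothetical names, and treating CREATING, UPDATING, and DELETING as the only in-progress statuses is an assumption based on the status names listed above. FakeClient stands in for a real bedrock-agent client so the example runs without credentials.

```python
import time
from typing import Any, Callable, Dict, List

# Assumption: these statuses mean work is still in flight; the rest are final.
IN_PROGRESS = {"CREATING", "UPDATING", "DELETING"}


def wait_for_knowledge_base(
    client: Any,
    knowledge_base_id: str,
    delay_seconds: float = 5.0,
    max_attempts: int = 60,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll get_knowledge_base until the status is no longer in progress."""
    for _ in range(max_attempts):
        response = client.get_knowledge_base(knowledgeBaseId=knowledge_base_id)
        kb = response["knowledgeBase"]
        status = kb["status"]
        if status == "ACTIVE":
            return kb
        if status not in IN_PROGRESS:
            # FAILED, UPDATE_UNSUCCESSFUL, DELETE_UNSUCCESSFUL land here.
            reasons = "; ".join(kb.get("failureReasons", [])) or "no reason reported"
            raise RuntimeError(f"knowledge base ended in {status}: {reasons}")
        sleep(delay_seconds)
    raise TimeoutError(f"knowledge base {knowledge_base_id} still in progress")


class FakeClient:
    """Stand-in for a bedrock-agent client so the sketch runs offline."""

    def __init__(self, statuses: List[str]) -> None:
        self._statuses = iter(statuses)

    def get_knowledge_base(self, knowledgeBaseId: str) -> Dict[str, Any]:
        return {
            "knowledgeBase": {
                "knowledgeBaseId": knowledgeBaseId,
                "status": next(self._statuses),
            }
        }


waits: List[float] = []
result = wait_for_knowledge_base(
    FakeClient(["UPDATING", "UPDATING", "ACTIVE"]),
    "KB-EXAMPLE",  # hypothetical identifier
    delay_seconds=2.0,
    sleep=waits.append,  # record delays instead of sleeping
)
print(result["status"], waits)  # prints: ACTIVE [2.0, 2.0]
```

With a real client, the same function could be called as wait_for_knowledge_base(boto3.client("bedrock-agent"), knowledge_base_id) after update_knowledge_base returns.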

Response Structure

(dict) --
- knowledgeBase (dict) --
  
  Contains details about the knowledge base.
  - knowledgeBaseId (string) --
    
    The unique identifier of the knowledge base.
  - name (string) --
    
    The name of the knowledge base.
  - knowledgeBaseArn (string) --
    
    The Amazon Resource Name (ARN) of the knowledge base.
  - description (string) --
    
    The description of the knowledge base.
  - roleArn (string) --
    
    The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the knowledge base.
  - knowledgeBaseConfiguration (dict) --
    
    Contains details about the embeddings configuration of the knowledge base.
    - type (string) --
      
      The type of data that the data source is converted into for the knowledge base. Choose MANAGED to create a managed knowledge base.
    - vectorKnowledgeBaseConfiguration (dict) --
      
      Contains details about the model that's used to convert the data source into vector embeddings.
      - embeddingModelArn (string) --
        
        The Amazon Resource Name (ARN) of the model used to create vector embeddings for the knowledge base.
      - embeddingModelConfiguration (dict) --
        
        The embeddings model configuration details for the vector model used in the knowledge base.
        
        bedrockEmbeddingModelConfiguration (dict) --
        
        The vector configuration details on the Bedrock embeddings model.
        
        dimensions (integer) --
        
        The number of dimensions for the vector embeddings produced by the Bedrock embeddings model.
        
        embeddingDataType (string) --
        
        The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
        
        audio (list) --
        
        Configuration settings for processing audio content in multimodal knowledge bases.
        
        (dict) --
        
        Audio configuration for multi modal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting audio content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
        
        video (list) --
        
        Configuration settings for processing video content in multimodal knowledge bases.
        
        (dict) --
        
        Video configuration for multi modal ingestion.
        
        segmentationConfiguration (dict) --
        
        Configuration for segmenting video content during processing.
        
        fixedLengthDuration (integer) --
        
        The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - supplementalDataStorageConfiguration (dict) --
        
        If you include multimodal data from your data source, use this object to specify configurations for the storage location of the images extracted from your documents. These images can be retrieved and returned to the end user. They can also be used in generation when using RetrieveAndGenerate.
        - storageLocations (list) --
          
          A list of objects specifying storage locations for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
          - (dict) --
            
            Contains information about a storage location for multimedia content (images, audio, and video) extracted from multimodal documents in your data source.
            - type (string) --
              
              Specifies the storage service used for this location.
            - s3Location (dict) --
              
              Contains information about the Amazon S3 location for the extracted multimedia content.
              - uri (string) --
                
                The location's URI. For example, s3://my-bucket/chunk-processor/.
    - managedKnowledgeBaseConfiguration (dict) --
      
      Configurations for a managed knowledge base.
      - embeddingModelType (string) --
        
        Choose CUSTOM to provide your own Bedrock embedding model ARN. Choose MANAGED to use a service-managed embedding model. For more information, see Embedding model options.
      - embeddingModelArn (string) --
        
        The ARN for the embeddings model.
      - embeddingModelConfiguration (dict) --
        
        The configuration details for the embeddings model.
        - bedrockEmbeddingModelConfiguration (dict) --
          
          The vector configuration details on the Bedrock embeddings model.
          - dimensions (integer) --
            
            The dimensions details for the vector configuration used on the Bedrock embeddings model.
          - embeddingDataType (string) --
            
            The data type for the vectors when using a model to convert text into vector embeddings. The model must support the specified data type for vector embeddings. Floating-point (float32) is the default data type, and is supported by most models for vector embeddings. See Supported embeddings models for information on the available models and their vector data types.
          - audio (list) --
            
            Configuration settings for processing audio content in multimodal knowledge bases.
            - (dict) --
              
              Audio configuration for multimodal ingestion.
              - segmentationConfiguration (dict) --
                
                Configuration for segmenting audio content during processing.
                - fixedLengthDuration (integer) --
                  
                  The duration in seconds for each audio segment. Audio files will be divided into chunks of this length for processing.
          - video (list) --
            
            Configuration settings for processing video content in multimodal knowledge bases.
            - (dict) --
              
              Video configuration for multimodal ingestion.
              - segmentationConfiguration (dict) --
                
                Configuration for segmenting video content during processing.
                - fixedLengthDuration (integer) --
                  
                  The duration in seconds for each video segment. Video files will be divided into chunks of this length for processing.
      - serverSideEncryptionConfiguration (dict) --
        
        Contains the configuration for server-side encryption for your managed knowledge base.
        - kmsKeyArn (string) --
          
          The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
    - kendraKnowledgeBaseConfiguration (dict) --
      
      Settings for an Amazon Kendra knowledge base.
      - kendraIndexArn (string) --
        
        The ARN of the Amazon Kendra index.
    - sqlKnowledgeBaseConfiguration (dict) --
      
      Specifies configurations for a knowledge base connected to an SQL database.
      - type (string) --
        
        The type of SQL database to connect to the knowledge base.
      - redshiftConfiguration (dict) --
        
        Specifies configurations for a knowledge base connected to an Amazon Redshift database.
        - storageConfigurations (list) --
          
          Specifies configurations for Amazon Redshift database storage.
          - (dict) --
            
            Contains configurations for Amazon Redshift data storage. Specify the data storage service to use in the type field and include the corresponding field. For more information, see Build a knowledge base by connecting to a structured data source in the Amazon Bedrock User Guide.
            - type (string) --
              
              The data storage service to use.
            - awsDataCatalogConfiguration (dict) --
              
              Specifies configurations for storage in Glue Data Catalog.
              - tableNames (list) --
                
                A list of names of the tables to use.
                - (string) --
            - redshiftConfiguration (dict) --
              
              Specifies configurations for storage in Amazon Redshift.
              - databaseName (string) --
                
                The name of the Amazon Redshift database.
        - queryEngineConfiguration (dict) --
          
          Specifies configurations for an Amazon Redshift query engine.
          - type (string) --
            
            The type of query engine.
          - serverlessConfiguration (dict) --
            
            Specifies configurations for a serverless Amazon Redshift query engine.
            - workgroupArn (string) --
              
              The ARN of the Amazon Redshift workgroup.
            - authConfiguration (dict) --
              
              Specifies configurations for authentication to an Amazon Redshift provisioned data warehouse.
              - type (string) --
                
                The type of authentication to use.
              - usernamePasswordSecretArn (string) --
                
                The ARN of a Secrets Manager secret for authentication.
          - provisionedConfiguration (dict) --
            
            Specifies configurations for a provisioned Amazon Redshift query engine.
            - clusterIdentifier (string) --
              
              The ID of the Amazon Redshift cluster.
            - authConfiguration (dict) --
              
              Specifies configurations for authentication to Amazon Redshift.
              - type (string) --
                
                The type of authentication to use.
              - databaseUser (string) --
                
                The database username for authentication to an Amazon Redshift provisioned data warehouse.
              - usernamePasswordSecretArn (string) --
                
                The ARN of a Secrets Manager secret for authentication.
        - queryGenerationConfiguration (dict) --
          
          Specifies configurations for generating queries.
          - executionTimeoutSeconds (integer) --
            
            The time, in seconds, after which query generation times out.
          - generationContext (dict) --
            
            Specifies configurations for context to use during query generation.
            - tables (list) --
              
              An array of objects, each of which defines information about a table in the database.
              - (dict) --
                
                Contains information about a table for the query engine to consider.
                - name (string) --
                  
                  The name of the table to which the other fields in this object apply.
                - description (string) --
                  
                  A description of the table that helps the query engine understand the contents of the table.
                - inclusion (string) --
                  
                  Specifies whether to include or exclude the table during query generation. If you specify EXCLUDE, the table will be ignored. If you specify INCLUDE, all other tables will be ignored.
                - columns (list) --
                  
                  An array of objects, each of which defines information about a column in the table.
                  - (dict) --
                    
                    Contains information about a column in the current table for the query engine to consider.
                    - name (string) --
                      
                      The name of the column to which the other fields in this object apply.
                    - description (string) --
                      
                      A description of the column that helps the query engine understand the contents of the column.
                    - inclusion (string) --
                      
                      Specifies whether to include or exclude the column during query generation. If you specify EXCLUDE, the column will be ignored. If you specify INCLUDE, all other columns in the table will be ignored.
            - curatedQueries (list) --
              
              An array of objects, each of which defines information about example queries to help the query engine generate appropriate SQL queries.
              - (dict) --
                
                Contains an example query that helps the query engine generate appropriate SQL queries.
                - naturalLanguage (string) --
                  
                  An example natural language query.
                - sql (string) --
                  
                  The SQL equivalent of the natural language query.
  - storageConfiguration (dict) --
    
    Contains details about the storage configuration of the knowledge base.
    - type (string) --
      
      The vector store service in which the knowledge base is stored.
    - opensearchServerlessConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Amazon OpenSearch Service.
      - collectionArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch Service vector store.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - vectorField (string) --
          
          The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - opensearchManagedClusterConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in OpenSearch Managed Cluster. For more information, see Create a vector index in Amazon OpenSearch Service.
      - domainEndpoint (string) --
        
        The endpoint URL of the OpenSearch domain.
      - domainArn (string) --
        
        The Amazon Resource Name (ARN) of the OpenSearch domain.
      - vectorIndexName (string) --
        
        The name of the vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - vectorField (string) --
          
          The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - pineconeConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Pinecone.
      - connectionString (string) --
        
        The endpoint URL for your index management page.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Pinecone API key.
      - namespace (string) --
        
        The namespace to be used to write new data to your database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - redisEnterpriseCloudConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in Redis Enterprise Cloud.
      - endpoint (string) --
        
        The endpoint URL of the Redis Enterprise Cloud database.
      - vectorIndexName (string) --
        
        The name of the vector index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Redis Enterprise Cloud database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - vectorField (string) --
          
          The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - rdsConfiguration (dict) --
      
      Contains details about the storage configuration of the knowledge base in Amazon RDS. For more information, see Create a vector index in Amazon RDS.
      - resourceArn (string) --
        
        The Amazon Resource Name (ARN) of the vector store.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that is linked to your Amazon RDS database.
      - databaseName (string) --
        
        The name of your Amazon RDS database.
      - tableName (string) --
        
        The name of the table in the database.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - primaryKeyField (string) --
          
          The name of the field in which Amazon Bedrock stores the ID for each entry.
        - vectorField (string) --
          
          The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
        - customMetadataField (string) --
          
          Provide a name for the universal metadata field where Amazon Bedrock will store any custom metadata from your data source.
    - mongoDbAtlasConfiguration (dict) --
      
      Contains the storage configuration of the knowledge base in MongoDB Atlas.
      - endpoint (string) --
        
        The endpoint URL of your MongoDB Atlas cluster for your knowledge base.
      - databaseName (string) --
        
        The database name in your MongoDB Atlas cluster for your knowledge base.
      - collectionName (string) --
        
        The collection name of the knowledge base in MongoDB Atlas.
      - vectorIndexName (string) --
        
        The name of the MongoDB Atlas vector search index.
      - credentialsSecretArn (string) --
        
        The Amazon Resource Name (ARN) of the secret that you created in Secrets Manager that contains user credentials for your MongoDB Atlas cluster.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - vectorField (string) --
          
          The name of the field in which Amazon Bedrock stores the vector embeddings for your data sources.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
      - endpointServiceName (string) --
        
        The name of the VPC endpoint service in your account that is connected to your MongoDB Atlas cluster.
      - textIndexName (string) --
        
        The name of the text search index in the MongoDB collection. This is required for using the hybrid search feature.
    - neptuneAnalyticsConfiguration (dict) --
      
      Contains details about the Neptune Analytics configuration of the knowledge base in Amazon Neptune. For more information, see Create a vector index in Amazon Neptune Analytics.
      - graphArn (string) --
        
        The Amazon Resource Name (ARN) of the Neptune Analytics vector store.
      - fieldMapping (dict) --
        
        Contains the names of the fields to which to map information about the vector store.
        - textField (string) --
          
          The name of the field in which Amazon Bedrock stores the raw text from your data. The text is split according to the chunking strategy you choose.
        - metadataField (string) --
          
          The name of the field in which Amazon Bedrock stores metadata about the vector store.
    - s3VectorsConfiguration (dict) --
      
      The configuration settings for storing knowledge base data using S3 vectors. This includes vector index information and S3 bucket details for vector storage.
      - vectorBucketArn (string) --
        
        The Amazon Resource Name (ARN) of the S3 bucket where vector embeddings are stored. This bucket contains the vector data used by the knowledge base.
      - indexArn (string) --
        
        The Amazon Resource Name (ARN) of the vector index used for the knowledge base. This ARN identifies the specific vector index resource within Amazon Bedrock.
      - indexName (string) --
        
        The name of the vector index used for the knowledge base. This name identifies the vector index within the Amazon Bedrock service.
  - status (string) --
    
    The status of the knowledge base. The following statuses are possible:
    - CREATING – The knowledge base is being created.
    - ACTIVE – The knowledge base is ready to be queried.
    - DELETING – The knowledge base is being deleted.
    - UPDATING – The knowledge base is being updated.
    - FAILED – The knowledge base API operation failed.
  - createdAt (datetime) --
    
    The time the knowledge base was created.
  - updatedAt (datetime) --
    
    The time the knowledge base was last updated.
  - failureReasons (list) --
    
    A list of reasons that the API operation on the knowledge base failed.
    - (string) --
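
As a rough illustration of how a caller might read the structure above, the following sketch pulls the status, failure reasons, and populated configuration blocks out of a response dict. It is not part of the API: the helper name `summarize_knowledge_base`, the `TERMINAL_STATUSES` grouping, and every value in `sample` (IDs, ARNs, and the `type` strings) are assumptions made for the example. In practice the dict would come from an operation that returns `knowledgeBase`.

```python
from typing import Any, Dict, List

# Assumption for this sketch: treat ACTIVE and FAILED as "settled" states,
# and CREATING, UPDATING, and DELETING as still in progress.
TERMINAL_STATUSES = {"ACTIVE", "FAILED"}


def populated_blocks(config: Dict[str, Any]) -> List[str]:
    """Return the names of the '...Configuration' sub-dicts present in config."""
    return sorted(
        key
        for key, value in config.items()
        if key.endswith("Configuration") and isinstance(value, dict)
    )


def summarize_knowledge_base(response: Dict[str, Any]) -> Dict[str, Any]:
    """Collect a few commonly needed fields from a `knowledgeBase` response."""
    kb = response["knowledgeBase"]
    kb_config = kb.get("knowledgeBaseConfiguration", {})
    storage = kb.get("storageConfiguration", {})
    return {
        "id": kb["knowledgeBaseId"],
        "status": kb["status"],
        "is_settled": kb["status"] in TERMINAL_STATUSES,
        "config_type": kb_config.get("type"),
        "config_blocks": populated_blocks(kb_config),
        "storage_type": storage.get("type"),
        "storage_blocks": populated_blocks(storage),
        # failureReasons may be absent, so default to an empty list.
        "failure_reasons": list(kb.get("failureReasons", [])),
    }


# Hand-written sample shaped like the response structure above.
# All identifiers and enum strings are illustrative placeholders.
sample = {
    "knowledgeBase": {
        "knowledgeBaseId": "KB12345678",
        "name": "example-kb",
        "status": "FAILED",
        "knowledgeBaseConfiguration": {
            "type": "VECTOR",
            "vectorKnowledgeBaseConfiguration": {
                "embeddingModelArn": "arn:aws:bedrock:us-east-1::foundation-model/example-model",
            },
        },
        "storageConfiguration": {
            "type": "S3_VECTORS",
            "s3VectorsConfiguration": {"indexName": "example-index"},
        },
        "failureReasons": ["Example failure reason"],
    }
}

summary = summarize_knowledge_base(sample)
print(summary["status"], summary["storage_blocks"], summary["failure_reasons"])
```

Reading optional keys with `.get` keeps the helper tolerant of responses that omit `storageConfiguration` or `failureReasons`, which may be the case for some knowledge base types.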