2024/12/20 - Agents for Amazon Bedrock - 7 updated api methods
Changes: Support for a custom user agent and a maximum number of web pages crawled for the web connector. Support for app-only credentials for the SharePoint connector. Increase the agent memory duration limit to 365 days. Support for specifying the maximum number of session summaries to include in the agent invocation context.
Request
{'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}
Response
{'agent': {'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}}
Creates an agent that orchestrates interactions between foundation models, data sources, software applications, user conversations, and APIs to carry out tasks to help customers.
Specify the following fields for security purposes.
agentResourceRoleArn – The Amazon Resource Name (ARN) of the role with permissions to invoke API operations on an agent.
(Optional) customerEncryptionKeyArn – The Amazon Resource Name (ARN) of a KMS key with which to encrypt the agent.
(Optional) idleSessionTTLInSeconds – Specify the number of seconds for which the agent should maintain session information. After this time expires, the subsequent InvokeAgent request begins a new session.
To enable your agent to retain conversational context across multiple sessions, include a memoryConfiguration object. For more information, see Configure memory.
To override the default prompt behavior for agent orchestration and to use advanced prompts, include a promptOverrideConfiguration object. For more information, see Advanced prompts.
If your agent fails to be created, the response returns a list of failureReasons alongside a list of recommendedActions for you to troubleshoot.
The agent instructions will not be honored if your agent has only one knowledge base, uses default prompts, has no action group, and user input is disabled.
See also: AWS API Documentation
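As a rough sketch of the fields described above, the helper below assembles keyword arguments for create_agent. The helper name build_create_agent_request and the example values are hypothetical, and nothing is sent to AWS; with boto3 installed, the resulting dict would typically be passed as keyword arguments to create_agent on a client for the bedrock-agent service.

```python
from typing import Any, Dict, Optional


def build_create_agent_request(
    agent_name: str,
    role_arn: str,
    kms_key_arn: Optional[str] = None,
    idle_session_ttl_seconds: Optional[int] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for create_agent (illustrative helper)."""
    request: Dict[str, Any] = {
        "agentName": agent_name,           # the only [REQUIRED] parameter
        "agentResourceRoleArn": role_arn,  # role with permissions to invoke API operations
    }
    # Optional fields are left out entirely when they are not supplied.
    if kms_key_arn is not None:
        request["customerEncryptionKeyArn"] = kms_key_arn
    if idle_session_ttl_seconds is not None:
        request["idleSessionTTLInSeconds"] = idle_session_ttl_seconds
    return request


# Placeholder values, not real resources.
request = build_create_agent_request(
    agent_name="support-agent",
    role_arn="arn:aws:iam::111122223333:role/ExampleAgentRole",
    idle_session_ttl_seconds=600,
)
# With boto3 installed and credentials configured, this could be sent as:
#   boto3.client("bedrock-agent").create_agent(**request)
```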
Request Syntax
client.create_agent(
    agentCollaboration='SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED',
    agentName='string',
    agentResourceRoleArn='string',
    clientToken='string',
    customOrchestration={
        'executor': {
            'lambda': 'string'
        }
    },
    customerEncryptionKeyArn='string',
    description='string',
    foundationModel='string',
    guardrailConfiguration={
        'guardrailIdentifier': 'string',
        'guardrailVersion': 'string'
    },
    idleSessionTTLInSeconds=123,
    instruction='string',
    memoryConfiguration={
        'enabledMemoryTypes': [
            'SESSION_SUMMARY',
        ],
        'sessionSummaryConfiguration': {
            'maxRecentSessions': 123
        },
        'storageDays': 123
    },
    orchestrationType='DEFAULT'|'CUSTOM_ORCHESTRATION',
    promptOverrideConfiguration={
        'overrideLambda': 'string',
        'promptConfigurations': [
            {
                'basePromptTemplate': 'string',
                'foundationModel': 'string',
                'inferenceConfiguration': {
                    'maximumLength': 123,
                    'stopSequences': [
                        'string',
                    ],
                    'temperature': ...,
                    'topK': 123,
                    'topP': ...
                },
                'parserMode': 'DEFAULT'|'OVERRIDDEN',
                'promptCreationMode': 'DEFAULT'|'OVERRIDDEN',
                'promptState': 'ENABLED'|'DISABLED',
                'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION'
            },
        ]
    },
    tags={
        'string': 'string'
    }
)
agentCollaboration (string) --
The agent's collaboration role.
agentName (string) -- [REQUIRED]
A name for the agent that you create.
agentResourceRoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
clientToken (string) --
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
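The idempotency behaviour described for clientToken can be sketched as follows: generate one token per logical request and reuse it on every retry. The helper name with_client_token is hypothetical, and using a UUID as the token is an assumption about a convenient unique value, not a documented requirement.

```python
import uuid
from typing import Any, Dict


def with_client_token(request: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the request with a clientToken, generating one if absent."""
    tagged = dict(request)
    tagged.setdefault("clientToken", str(uuid.uuid4()))
    return tagged


first_attempt = with_client_token({"agentName": "support-agent"})
# A retry reuses the token from the first attempt, so a request that already
# completed is ignored rather than repeated.
retry = with_client_token(first_attempt)
```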
customOrchestration (dict) --
Contains details of the custom orchestration configured for the agent.
executor (dict) --
The structure of the executor invoking the actions in custom orchestration.
lambda (string) --
The Amazon Resource Name (ARN) of the Lambda function containing the business logic that is carried out upon invoking the action.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key with which to encrypt the agent.
description (string) --
A description of the agent.
foundationModel (string) --
The identifier for the model that you want to be used for orchestration by the agent you create.
The value to provide depends on the type of model or throughput that you use:
If you use a base model, specify the model ID or its ARN. For a list of model IDs for base models, see Amazon Bedrock base model IDs (on-demand throughput) in the Amazon Bedrock User Guide.
If you use an inference profile, specify the inference profile ID or its ARN. For a list of inference profile IDs, see Supported Regions and models for cross-region inference in the Amazon Bedrock User Guide.
If you use a provisioned model, specify the ARN of the Provisioned Throughput. For more information, see Run inference using a Provisioned Throughput in the Amazon Bedrock User Guide.
If you use a custom model, first purchase Provisioned Throughput for it. Then specify the ARN of the resulting provisioned model. For more information, see Use a custom model in Amazon Bedrock in the Amazon Bedrock User Guide.
If you use an imported model, specify the ARN of the imported model. You can get the model ARN from a successful call to CreateModelImportJob or from the Imported models page in the Amazon Bedrock console.
guardrailConfiguration (dict) --
The unique Guardrail configuration assigned to the agent when it is created.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
Instructions that tell the agent what it should do and how it should interact with users.
memoryConfiguration (dict) --
Contains the details of the memory configured for the agent.
enabledMemoryTypes (list) -- [REQUIRED]
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
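A minimal sketch of a memoryConfiguration value using the new maxRecentSessions field. The helper name is hypothetical. The 365-day ceiling comes from the change summary for this release; the lower bounds checked here are assumptions, since this section does not state them.

```python
from typing import Any, Dict

MAX_STORAGE_DAYS = 365  # limit stated in the 2024/12/20 change summary


def build_memory_configuration(storage_days: int, max_recent_sessions: int) -> Dict[str, Any]:
    """Assemble a memoryConfiguration dict with basic client-side checks."""
    if storage_days < 0 or storage_days > MAX_STORAGE_DAYS:
        raise ValueError(f"storageDays must be between 0 and {MAX_STORAGE_DAYS}")
    # Assumption: at least one session summary must be requested.
    if max_recent_sessions < 1:
        raise ValueError("maxRecentSessions must be at least 1")
    return {
        "enabledMemoryTypes": ["SESSION_SUMMARY"],
        "sessionSummaryConfiguration": {"maxRecentSessions": max_recent_sessions},
        "storageDays": storage_days,
    }


memory_configuration = build_memory_configuration(storage_days=90, max_recent_sessions=5)
```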
orchestrationType (string) --
Specifies the type of orchestration strategy for the agent. The default is DEFAULT.
promptOverrideConfiguration (dict) --
Contains configurations to override prompts in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) -- [REQUIRED]
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP is the cumulative probability cutoff that defines the pool of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
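To make the topK and topP descriptions concrete, here is a toy illustration of how the two cutoffs narrow the candidate pool. How a given foundation model actually combines these parameters is model-specific; the function name, the ordering of the two cuts, and the token probabilities are all invented for this sketch.

```python
from typing import Dict, List


def candidate_pool(probs: Dict[str, float], top_k: int, top_p: float) -> List[str]:
    """Tokens that survive a top-k cut followed by a cumulative top-p cut."""
    # Keep the top_k most likely tokens, highest probability first.
    ranked = sorted(probs.items(), key=lambda item: item[1], reverse=True)[:top_k]
    pool: List[str] = []
    cumulative = 0.0
    for token, prob in ranked:
        pool.append(token)
        cumulative += prob
        # Stop once the kept tokens cover the requested share of probability.
        if cumulative >= top_p:
            break
    return pool


toy_distribution = {"the": 0.5, "a": 0.25, "an": 0.125, "this": 0.125}
pool = candidate_pool(toy_distribution, top_k=3, top_p=0.7)
```

With these numbers, "the" and "a" together cover 0.75 of the probability mass, which is the first point at which the 0.7 cutoff is reached, so the pool stops at two tokens.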
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field to OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
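The pairing rule between overrideLambda and parserMode can be checked on the client side before calling the API. The sketch below is illustrative only: the helper name, the template text, and the inference values are placeholders, and the second check simply mirrors the parserMode description.

```python
from typing import Any, Dict, List, Optional


def build_prompt_override(
    prompt_configurations: List[Dict[str, Any]],
    override_lambda: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble promptOverrideConfiguration and check the parser Lambda pairing."""
    uses_custom_parser = any(
        cfg.get("parserMode") == "OVERRIDDEN" for cfg in prompt_configurations
    )
    if override_lambda is not None and not uses_custom_parser:
        raise ValueError("overrideLambda needs at least one parserMode of OVERRIDDEN")
    if uses_custom_parser and override_lambda is None:
        raise ValueError("parserMode OVERRIDDEN needs overrideLambda to be set")
    config: Dict[str, Any] = {"promptConfigurations": prompt_configurations}
    if override_lambda is not None:
        config["overrideLambda"] = override_lambda
    return config


# Overrides the prompt for the newly supported MEMORY_SUMMARIZATION step.
memory_summarization = {
    "promptType": "MEMORY_SUMMARIZATION",
    "promptCreationMode": "OVERRIDDEN",
    "promptState": "ENABLED",
    "basePromptTemplate": "Placeholder template text for session summaries.",
    "inferenceConfiguration": {"maximumLength": 512, "temperature": 0.0, "topK": 50, "topP": 0.9},
    "parserMode": "DEFAULT",
}
prompt_override = build_prompt_override([memory_summarization])
```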
tags (dict) --
Any tags that you want to attach to the agent.
(string) --
(string) --
Return type: dict
Response Syntax
{
    'agent': {
        'agentArn': 'string',
        'agentCollaboration': 'SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED',
        'agentId': 'string',
        'agentName': 'string',
        'agentResourceRoleArn': 'string',
        'agentStatus': 'CREATING'|'PREPARING'|'PREPARED'|'NOT_PREPARED'|'DELETING'|'FAILED'|'VERSIONING'|'UPDATING',
        'agentVersion': 'string',
        'clientToken': 'string',
        'createdAt': datetime(2015, 1, 1),
        'customOrchestration': {
            'executor': {
                'lambda': 'string'
            }
        },
        'customerEncryptionKeyArn': 'string',
        'description': 'string',
        'failureReasons': [
            'string',
        ],
        'foundationModel': 'string',
        'guardrailConfiguration': {
            'guardrailIdentifier': 'string',
            'guardrailVersion': 'string'
        },
        'idleSessionTTLInSeconds': 123,
        'instruction': 'string',
        'memoryConfiguration': {
            'enabledMemoryTypes': [
                'SESSION_SUMMARY',
            ],
            'sessionSummaryConfiguration': {
                'maxRecentSessions': 123
            },
            'storageDays': 123
        },
        'orchestrationType': 'DEFAULT'|'CUSTOM_ORCHESTRATION',
        'preparedAt': datetime(2015, 1, 1),
        'promptOverrideConfiguration': {
            'overrideLambda': 'string',
            'promptConfigurations': [
                {
                    'basePromptTemplate': 'string',
                    'foundationModel': 'string',
                    'inferenceConfiguration': {
                        'maximumLength': 123,
                        'stopSequences': [
                            'string',
                        ],
                        'temperature': ...,
                        'topK': 123,
                        'topP': ...
                    },
                    'parserMode': 'DEFAULT'|'OVERRIDDEN',
                    'promptCreationMode': 'DEFAULT'|'OVERRIDDEN',
                    'promptState': 'ENABLED'|'DISABLED',
                    'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION'
                },
            ]
        },
        'recommendedActions': [
            'string',
        ],
        'updatedAt': datetime(2015, 1, 1)
    }
}
Response Structure
(dict) --
agent (dict) --
Contains details about the agent created.
agentArn (string) --
The Amazon Resource Name (ARN) of the agent.
agentCollaboration (string) --
The agent's collaboration settings.
agentId (string) --
The unique identifier of the agent.
agentName (string) --
The name of the agent.
agentResourceRoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
agentStatus (string) --
The status of the agent and whether it is ready for use. The following statuses are possible:
CREATING – The agent is being created.
PREPARING – The agent is being prepared.
PREPARED – The agent is prepared and ready to be invoked.
NOT_PREPARED – The agent has been created but not yet prepared.
FAILED – The agent API operation failed.
UPDATING – The agent is being updated.
DELETING – The agent is being deleted.
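Because a newly created agent starts in the CREATING status, callers often poll until the status changes. The sketch below shows one way to do that. The function name, delay, and attempt limit are arbitrary choices, and a stub stands in for the GetAgent call so that the example runs without AWS access.

```python
import time
from typing import Any, Callable, Dict


def wait_until_created(
    get_agent: Callable[[], Dict[str, Any]],
    delay_seconds: float = 2.0,
    max_attempts: int = 30,
) -> str:
    """Poll until agentStatus is no longer CREATING and return the new status."""
    for _ in range(max_attempts):
        status = get_agent()["agent"]["agentStatus"]
        if status != "CREATING":
            return status
        time.sleep(delay_seconds)
    raise TimeoutError("agent is still CREATING")


# Stub responses standing in for a real client call such as
#   lambda: client.get_agent(agentId=agent_id)
_statuses = iter(["CREATING", "CREATING", "NOT_PREPARED"])


def fake_get_agent() -> Dict[str, Any]:
    return {"agent": {"agentStatus": next(_statuses)}}


final_status = wait_until_created(fake_get_agent, delay_seconds=0.0)
```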
agentVersion (string) --
The version of the agent.
clientToken (string) --
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
createdAt (datetime) --
The time at which the agent was created.
customOrchestration (dict) --
Contains custom orchestration configurations for the agent.
executor (dict) --
The structure of the executor invoking the actions in custom orchestration.
lambda (string) --
The Amazon Resource Name (ARN) of the Lambda function containing the business logic that is carried out upon invoking the action.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that encrypts the agent.
description (string) --
The description of the agent.
failureReasons (list) --
Contains reasons that the agent-related API that you invoked failed.
(string) --
foundationModel (string) --
The foundation model used for orchestration by the agent.
guardrailConfiguration (dict) --
Details about the guardrail associated with the agent.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
Instructions that tell the agent what it should do and how it should interact with users.
memoryConfiguration (dict) --
Contains memory configuration for the agent.
enabledMemoryTypes (list) --
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
orchestrationType (string) --
Specifies the orchestration strategy for the agent.
preparedAt (datetime) --
The time at which the agent was last prepared.
promptOverrideConfiguration (dict) --
Contains configurations to override prompt templates in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP is the cumulative probability cutoff that defines the pool of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field to OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
recommendedActions (list) --
Contains recommended actions to take for the agent-related API that you invoked to succeed.
(string) --
updatedAt (datetime) --
The time at which the agent was last updated.
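One way to act on the response is to surface failureReasons and recommendedActions when the status is FAILED. The sketch below uses a trimmed, invented response dict. The helper name and the reason and action strings are placeholders, not real service output.

```python
from typing import Any, Dict, List


def summarize_agent_response(response: Dict[str, Any]) -> List[str]:
    """Produce human-readable lines describing the outcome of an agent call."""
    agent = response["agent"]
    lines = [f"{agent['agentName']} ({agent['agentId']}): {agent['agentStatus']}"]
    if agent["agentStatus"] == "FAILED":
        # Both lists are optional in the response, so default to empty.
        lines += [f"reason: {reason}" for reason in agent.get("failureReasons", [])]
        lines += [f"action: {action}" for action in agent.get("recommendedActions", [])]
    return lines


# Trimmed, invented response used only to exercise the helper.
sample_response = {
    "agent": {
        "agentId": "AGENT12345",
        "agentName": "support-agent",
        "agentStatus": "FAILED",
        "failureReasons": ["example failure reason"],
        "recommendedActions": ["example recommended action"],
    }
}
summary = summarize_agent_response(sample_response)
```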
Request
{'dataSourceConfiguration': {'sharePointConfiguration': {'sourceConfiguration': {'authType': {'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS'}}}, 'webConfiguration': {'crawlerConfiguration': {'crawlerLimits': {'maxPages': 'integer'}, 'userAgent': 'string'}}}}
Response
{'dataSource': {'dataSourceConfiguration': {'sharePointConfiguration': {'sourceConfiguration': {'authType': {'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS'}}}, 'webConfiguration': {'crawlerConfiguration': {'crawlerLimits': {'maxPages': 'integer'}, 'userAgent': 'string'}}}}}
Connects a knowledge base to a data source. You specify the configuration for the specific data source service in the dataSourceConfiguration field.
See also: AWS API Documentation
Request Syntax
client.create_data_source(
    clientToken='string',
    dataDeletionPolicy='RETAIN'|'DELETE',
    dataSourceConfiguration={
        'confluenceConfiguration': {
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'exclusionFilters': [
                                    'string',
                                ],
                                'inclusionFilters': [
                                    'string',
                                ],
                                'objectType': 'string'
                            },
                        ]
                    },
                    'type': 'PATTERN'
                }
            },
            'sourceConfiguration': {
                'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string',
                'hostType': 'SAAS',
                'hostUrl': 'string'
            }
        },
        's3Configuration': {
            'bucketArn': 'string',
            'bucketOwnerAccountId': 'string',
            'inclusionPrefixes': [
                'string',
            ]
        },
        'salesforceConfiguration': {
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'exclusionFilters': [
                                    'string',
                                ],
                                'inclusionFilters': [
                                    'string',
                                ],
                                'objectType': 'string'
                            },
                        ]
                    },
                    'type': 'PATTERN'
                }
            },
            'sourceConfiguration': {
                'authType': 'OAUTH2_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string',
                'hostUrl': 'string'
            }
        },
        'sharePointConfiguration': {
            'crawlerConfiguration': {
                'filterConfiguration': {
                    'patternObjectFilter': {
                        'filters': [
                            {
                                'exclusionFilters': [
                                    'string',
                                ],
                                'inclusionFilters': [
                                    'string',
                                ],
                                'objectType': 'string'
                            },
                        ]
                    },
                    'type': 'PATTERN'
                }
            },
            'sourceConfiguration': {
                'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS',
                'credentialsSecretArn': 'string',
                'domain': 'string',
                'hostType': 'ONLINE',
                'siteUrls': [
                    'string',
                ],
                'tenantId': 'string'
            }
        },
        'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA',
        'webConfiguration': {
            'crawlerConfiguration': {
                'crawlerLimits': {
                    'maxPages': 123,
                    'rateLimit': 123
                },
                'exclusionFilters': [
                    'string',
                ],
                'inclusionFilters': [
                    'string',
                ],
                'scope': 'HOST_ONLY'|'SUBDOMAINS',
                'userAgent': 'string'
            },
            'sourceConfiguration': {
                'urlConfiguration': {
                    'seedUrls': [
                        {
                            'url': 'string'
                        },
                    ]
                }
            }
        }
    },
    description='string',
    knowledgeBaseId='string',
    name='string',
    serverSideEncryptionConfiguration={
        'kmsKeyArn': 'string'
    },
    vectorIngestionConfiguration={
        'chunkingConfiguration': {
            'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC',
            'fixedSizeChunkingConfiguration': {
                'maxTokens': 123,
                'overlapPercentage': 123
            },
            'hierarchicalChunkingConfiguration': {
                'levelConfigurations': [
                    {
                        'maxTokens': 123
                    },
                ],
                'overlapTokens': 123
            },
            'semanticChunkingConfiguration': {
                'breakpointPercentileThreshold': 123,
                'bufferSize': 123,
                'maxTokens': 123
            }
        },
        'customTransformationConfiguration': {
            'intermediateStorage': {
                's3Location': {
                    'uri': 'string'
                }
            },
            'transformations': [
                {
                    'stepToApply': 'POST_CHUNKING',
                    'transformationFunction': {
                        'transformationLambdaConfiguration': {
                            'lambdaArn': 'string'
                        }
                    }
                },
            ]
        },
        'parsingConfiguration': {
            'bedrockDataAutomationConfiguration': {
                'parsingModality': 'MULTIMODAL'
            },
            'bedrockFoundationModelConfiguration': {
                'modelArn': 'string',
                'parsingModality': 'MULTIMODAL',
                'parsingPrompt': {
                    'parsingPromptText': 'string'
                }
            },
            'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION'
        }
    }
)
clientToken (string) --
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
This field is autopopulated if not provided.
dataDeletionPolicy (string) --
The data deletion policy for the data source.
You can set the data deletion policy to:
DELETE: Deletes all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted, only the data. This flag is ignored if an Amazon Web Services account is deleted.
RETAIN: Retains all data from your data source that’s converted into vector embeddings upon deletion of a knowledge base or data source resource. Note that the vector store itself is not deleted if you delete a knowledge base or data source resource.
dataSourceConfiguration (dict) -- [REQUIRED]
The connection configuration for the data source.
confluenceConfiguration (dict) --
The configuration information to connect to Confluence as your data source.
crawlerConfiguration (dict) --
The configuration of the Confluence content. For example, configuring specific types of Confluence content.
filterConfiguration (dict) --
The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your Confluence data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your Confluence instance.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
hostType (string) -- [REQUIRED]
The supported host type, whether online/cloud or server/on-premises.
hostUrl (string) -- [REQUIRED]
The Confluence host URL or instance URL.
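The precedence rule for inclusion and exclusion filters can be simulated locally as a rough check of a pattern set before a sync. This is a sketch under two assumptions not stated here: that patterns behave like Python regular expressions applied with re.search, and that an empty inclusion list filters nothing out. The function name and the page titles are invented.

```python
import re
from typing import Sequence


def is_crawled(name: str, inclusion: Sequence[str], exclusion: Sequence[str]) -> bool:
    """Apply the documented rule: a matching exclusion pattern always wins."""
    if any(re.search(pattern, name) for pattern in exclusion):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion:
        return True
    return any(re.search(pattern, name) for pattern in inclusion)


inclusion = [r".*Handbook.*"]
exclusion = [r".*Draft.*"]
results = {
    title: is_crawled(title, inclusion, exclusion)
    for title in ("Engineering Handbook", "Draft Handbook", "Release Notes")
}
```

Here "Draft Handbook" matches both lists and is skipped, while "Release Notes" is skipped only because it matches no inclusion pattern.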
s3Configuration (dict) --
The configuration information to connect to Amazon S3 as your data source.
bucketArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
bucketOwnerAccountId (string) --
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) --
A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
(string) --
salesforceConfiguration (dict) --
The configuration information to connect to Salesforce as your data source.
crawlerConfiguration (dict) --
The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
filterConfiguration (dict) --
The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your Salesforce data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your Salesforce instance.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
hostUrl (string) -- [REQUIRED]
The Salesforce host URL or instance URL.
sharePointConfiguration (dict) --
The configuration information to connect to SharePoint as your data source.
crawlerConfiguration (dict) --
The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
filterConfiguration (dict) --
The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your SharePoint data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your SharePoint site/sites.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
domain (string) -- [REQUIRED]
The domain of your SharePoint instance or site URL/URLs.
hostType (string) -- [REQUIRED]
The supported host type, whether online/cloud or server/on-premises.
siteUrls (list) -- [REQUIRED]
A list of one or more SharePoint site URLs.
(string) --
tenantId (string) --
The identifier of your Microsoft 365 tenant.
type (string) -- [REQUIRED]
The type of data source.
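As a rough sketch, a SharePoint dataSourceConfiguration using the app-only credentials option could be assembled like this. The helper name and all argument values are hypothetical; the enum strings are taken from the response syntax in this reference.

```python
def sharepoint_data_source_configuration(domain, site_urls, tenant_id,
                                         credentials_secret_arn, app_only=True):
    """Build a dataSourceConfiguration dict for a SharePoint data source."""
    auth_type = (
        "OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS"
        if app_only
        else "OAUTH2_CLIENT_CREDENTIALS"
    )
    return {
        "type": "SHAREPOINT",
        "sharePointConfiguration": {
            "sourceConfiguration": {
                "authType": auth_type,
                "credentialsSecretArn": credentials_secret_arn,
                "domain": domain,
                "hostType": "ONLINE",
                "siteUrls": list(site_urls),
                "tenantId": tenant_id,  # optional in the request structure
            }
        },
    }
```

The resulting dict would be passed as the dataSourceConfiguration argument when creating or updating the data source.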
webConfiguration (dict) --
The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
crawlerConfiguration (dict) --
The Web Crawler configuration details for the web data source.
crawlerLimits (dict) --
The configuration of crawl limits for the web URLs.
maxPages (integer) --
The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
rateLimit (integer) --
The max rate at which pages are crawled, up to 300 per minute per host.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
scope (string) --
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
userAgent (string) --
A string used for identifying the crawler or a bot when it accesses a web server. By default, this is set to bedrockbot_UUID for your crawler. You can optionally append a custom string to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
sourceConfiguration (dict) -- [REQUIRED]
The source configuration details for the web data source.
urlConfiguration (dict) -- [REQUIRED]
The configuration of the URL/URLs.
seedUrls (list) --
One or more seed or starting point URLs.
(dict) --
The seed or starting point URL. You should be authorized to crawl the URL.
url (string) --
A seed or starting point URL.
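The crawl limits and the new userAgent field can be combined into a webConfiguration roughly as follows. This is a sketch under assumptions: the helper web_configuration is hypothetical, the client-side limit checks simply mirror the documented upper bounds (the service performs its own validation), and the exact format expected for a custom userAgent string is not specified here.

```python
MAX_PAGES_LIMIT = 25_000  # documented upper bound for maxPages
MAX_RATE_LIMIT = 300      # documented upper bound for rateLimit (per minute per host)


def web_configuration(seed_urls, max_pages=None, rate_limit=None,
                      scope="HOST_ONLY", user_agent=None):
    """Build a webConfiguration dict, rejecting limits above the documented maxima."""
    if max_pages is not None and max_pages > MAX_PAGES_LIMIT:
        raise ValueError(f"maxPages must not exceed {MAX_PAGES_LIMIT}")
    if rate_limit is not None and rate_limit > MAX_RATE_LIMIT:
        raise ValueError(f"rateLimit must not exceed {MAX_RATE_LIMIT}")

    crawler = {"scope": scope}
    limits = {}
    if max_pages is not None:
        limits["maxPages"] = max_pages
    if rate_limit is not None:
        limits["rateLimit"] = rate_limit
    if limits:
        crawler["crawlerLimits"] = limits
    if user_agent:
        crawler["userAgent"] = user_agent

    return {
        "crawlerConfiguration": crawler,
        "sourceConfiguration": {
            "urlConfiguration": {"seedUrls": [{"url": url} for url in seed_urls]}
        },
    }
```

Optional fields are omitted rather than sent as empty values, so service defaults apply when nothing is specified.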
string
A description of the data source.
string
[REQUIRED]
The unique identifier of the knowledge base to which to add the data source.
string
[REQUIRED]
The name of the data source.
dict
Contains details about the server-side encryption for the data source.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
dict
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) --
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) -- [REQUIRED]
Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) --
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) -- [REQUIRED]
The percentage of overlap between adjacent chunks of a data source.
hierarchicalChunkingConfiguration (dict) --
Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
levelConfigurations (list) -- [REQUIRED]
Token settings for each layer.
(dict) --
Token settings for a layer in a hierarchical chunking configuration.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens that a chunk can contain in this layer.
overlapTokens (integer) -- [REQUIRED]
The number of tokens to repeat across chunks in the same layer.
semanticChunkingConfiguration (dict) --
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
breakpointPercentileThreshold (integer) -- [REQUIRED]
The dissimilarity threshold for splitting chunks.
bufferSize (integer) -- [REQUIRED]
The buffer size.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens that a chunk can contain.
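Because each chunking strategy pairs with its own settings object, and NONE takes none, a small builder can keep the two in step. The function below is a hypothetical helper, not part of the SDK, and the numeric values in the usage note are arbitrary examples.

```python
# Maps each strategy to the settings key documented for it.
SETTINGS_KEY_BY_STRATEGY = {
    "FIXED_SIZE": "fixedSizeChunkingConfiguration",
    "HIERARCHICAL": "hierarchicalChunkingConfiguration",
    "SEMANTIC": "semanticChunkingConfiguration",
}


def chunking_configuration(strategy, **settings):
    """Build a chunkingConfiguration dict containing only the matching settings."""
    config = {"chunkingStrategy": strategy}
    if strategy == "NONE":
        # The reference says to exclude strategy-specific settings for NONE.
        if settings:
            raise ValueError("NONE does not accept chunking settings")
        return config
    if strategy not in SETTINGS_KEY_BY_STRATEGY:
        raise ValueError(f"unknown chunking strategy: {strategy}")
    config[SETTINGS_KEY_BY_STRATEGY[strategy]] = settings
    return config
```

For example, chunking_configuration("FIXED_SIZE", maxTokens=300, overlapPercentage=20) yields a dict with chunkingStrategy set to FIXED_SIZE and a fixedSizeChunkingConfiguration holding both values.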
customTransformationConfiguration (dict) --
A custom document transformer for parsed data source documents.
intermediateStorage (dict) -- [REQUIRED]
An S3 bucket path for input and output objects.
s3Location (dict) -- [REQUIRED]
An S3 bucket path.
uri (string) -- [REQUIRED]
The location's URI. For example, s3://my-bucket/chunk-processor/.
transformations (list) -- [REQUIRED]
A Lambda function that processes documents.
(dict) --
A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
stepToApply (string) -- [REQUIRED]
When the service applies the transformation.
transformationFunction (dict) -- [REQUIRED]
A Lambda function that processes documents.
transformationLambdaConfiguration (dict) -- [REQUIRED]
The Lambda function.
lambdaArn (string) -- [REQUIRED]
The function's ARN identifier.
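A customTransformationConfiguration with a single post-chunking Lambda step might look like the sketch below. The helper name is hypothetical and the ARN in the usage note is a placeholder; POST_CHUNKING is the step value named in this reference.

```python
def custom_transformation_configuration(lambda_arn, intermediate_s3_uri):
    """Build a customTransformationConfiguration with one POST_CHUNKING step."""
    return {
        "intermediateStorage": {"s3Location": {"uri": intermediate_s3_uri}},
        "transformations": [
            {
                "stepToApply": "POST_CHUNKING",
                "transformationFunction": {
                    "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                },
            }
        ],
    }
```

A call such as custom_transformation_configuration("arn:aws:lambda:us-east-1:111122223333:function:example", "s3://my-bucket/chunk-processor/") returns the nested structure expected under vectorIngestionConfiguration.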
parsingConfiguration (dict) --
Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
bedrockDataAutomationConfiguration (dict) --
If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
bedrockFoundationModelConfiguration (dict) --
If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
modelArn (string) -- [REQUIRED]
The ARN of the foundation model or inference profile to use for parsing.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
parsingPrompt (dict) --
Instructions for interpreting the contents of a document.
parsingPromptText (string) -- [REQUIRED]
Instructions for interpreting the contents of a document.
parsingStrategy (string) -- [REQUIRED]
The parsing strategy for the data source.
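The two parsing strategies each use a different settings object, which the following hypothetical builder makes explicit. The strategy and modality strings come from the response syntax in this reference; the function itself and its argument names are illustrative assumptions.

```python
def parsing_configuration(strategy, model_arn=None, prompt_text=None,
                          multimodal=False):
    """Build a parsingConfiguration dict for the chosen parsing strategy."""
    config = {"parsingStrategy": strategy}
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        # modelArn is marked [REQUIRED] inside the foundation model settings.
        if not model_arn:
            raise ValueError("modelArn is required for BEDROCK_FOUNDATION_MODEL")
        settings = {"modelArn": model_arn}
        if multimodal:
            settings["parsingModality"] = "MULTIMODAL"
        if prompt_text:
            settings["parsingPrompt"] = {"parsingPromptText": prompt_text}
        config["bedrockFoundationModelConfiguration"] = settings
    elif strategy == "BEDROCK_DATA_AUTOMATION":
        if multimodal:
            config["bedrockDataAutomationConfiguration"] = {
                "parsingModality": "MULTIMODAL"
            }
    else:
        raise ValueError(f"unknown parsing strategy: {strategy}")
    return config
```

Omitting parsingConfiguration from the request entirely leaves the default parser in place, as noted above.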
dict
Response Syntax
{ 'dataSource': { 'createdAt': datetime(2015, 1, 1), 'dataDeletionPolicy': 'RETAIN'|'DELETE', 'dataSourceConfiguration': { 'confluenceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostType': 'SAAS', 'hostUrl': 'string' } }, 's3Configuration': { 'bucketArn': 'string', 'bucketOwnerAccountId': 'string', 'inclusionPrefixes': [ 'string', ] }, 'salesforceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostUrl': 'string' } }, 'sharePointConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'domain': 'string', 'hostType': 'ONLINE', 'siteUrls': [ 'string', ], 'tenantId': 'string' } }, 'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA', 'webConfiguration': { 'crawlerConfiguration': { 'crawlerLimits': { 'maxPages': 123, 'rateLimit': 123 }, 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'scope': 'HOST_ONLY'|'SUBDOMAINS', 'userAgent': 'string' }, 'sourceConfiguration': { 'urlConfiguration': { 'seedUrls': [ { 'url': 'string' }, ] } } } }, 'dataSourceId': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 
'knowledgeBaseId': 'string', 'name': 'string', 'serverSideEncryptionConfiguration': { 'kmsKeyArn': 'string' }, 'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL', 'updatedAt': datetime(2015, 1, 1), 'vectorIngestionConfiguration': { 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC', 'fixedSizeChunkingConfiguration': { 'maxTokens': 123, 'overlapPercentage': 123 }, 'hierarchicalChunkingConfiguration': { 'levelConfigurations': [ { 'maxTokens': 123 }, ], 'overlapTokens': 123 }, 'semanticChunkingConfiguration': { 'breakpointPercentileThreshold': 123, 'bufferSize': 123, 'maxTokens': 123 } }, 'customTransformationConfiguration': { 'intermediateStorage': { 's3Location': { 'uri': 'string' } }, 'transformations': [ { 'stepToApply': 'POST_CHUNKING', 'transformationFunction': { 'transformationLambdaConfiguration': { 'lambdaArn': 'string' } } }, ] }, 'parsingConfiguration': { 'bedrockDataAutomationConfiguration': { 'parsingModality': 'MULTIMODAL' }, 'bedrockFoundationModelConfiguration': { 'modelArn': 'string', 'parsingModality': 'MULTIMODAL', 'parsingPrompt': { 'parsingPromptText': 'string' } }, 'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION' } } } }
Response Structure
(dict) --
dataSource (dict) --
Contains details about the data source.
createdAt (datetime) --
The time at which the data source was created.
dataDeletionPolicy (string) --
The data deletion policy for the data source.
dataSourceConfiguration (dict) --
The connection configuration for the data source.
confluenceConfiguration (dict) --
The configuration information to connect to Confluence as your data source.
crawlerConfiguration (dict) --
The configuration of the Confluence content. For example, configuring specific types of Confluence content.
filterConfiguration (dict) --
The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Confluence data source.
authType (string) --
The supported authentication type to authenticate and connect to your Confluence instance.
credentialsSecretArn (string) --
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
hostUrl (string) --
The Confluence host URL or instance URL.
s3Configuration (dict) --
The configuration information to connect to Amazon S3 as your data source.
bucketArn (string) --
The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
bucketOwnerAccountId (string) --
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) --
A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
(string) --
salesforceConfiguration (dict) --
The configuration information to connect to Salesforce as your data source.
crawlerConfiguration (dict) --
The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
filterConfiguration (dict) --
The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Salesforce data source.
authType (string) --
The supported authentication type to authenticate and connect to your Salesforce instance.
credentialsSecretArn (string) --
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
hostUrl (string) --
The Salesforce host URL or instance URL.
sharePointConfiguration (dict) --
The configuration information to connect to SharePoint as your data source.
crawlerConfiguration (dict) --
The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
filterConfiguration (dict) --
The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your SharePoint data source.
authType (string) --
The supported authentication type to authenticate and connect to your SharePoint site/sites.
credentialsSecretArn (string) --
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
domain (string) --
The domain of your SharePoint instance or site URL/URLs.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
siteUrls (list) --
A list of one or more SharePoint site URLs.
(string) --
tenantId (string) --
The identifier of your Microsoft 365 tenant.
type (string) --
The type of data source.
webConfiguration (dict) --
The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
crawlerConfiguration (dict) --
The Web Crawler configuration details for the web data source.
crawlerLimits (dict) --
The configuration of crawl limits for the web URLs.
maxPages (integer) --
The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
rateLimit (integer) --
The max rate at which pages are crawled, up to 300 per minute per host.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
scope (string) --
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
userAgent (string) --
A string used for identifying the crawler or a bot when it accesses a web server. By default, this is set to bedrockbot_UUID for your crawler. You can optionally append a custom string to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
sourceConfiguration (dict) --
The source configuration details for the web data source.
urlConfiguration (dict) --
The configuration of the URL/URLs.
seedUrls (list) --
One or more seed or starting point URLs.
(dict) --
The seed or starting point URL. You should be authorized to crawl the URL.
url (string) --
A seed or starting point URL.
dataSourceId (string) --
The unique identifier of the data source.
description (string) --
The description of the data source.
failureReasons (list) --
The detailed reasons for the failure to delete a data source.
(string) --
knowledgeBaseId (string) --
The unique identifier of the knowledge base to which the data source belongs.
name (string) --
The name of the data source.
serverSideEncryptionConfiguration (dict) --
Contains details about the configuration of the server-side encryption.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
status (string) --
The status of the data source. The following statuses are possible:
Available – The data source has been created and is ready for ingestion into the knowledge base.
Deleting – The data source is being deleted.
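A caller might interpret the status and failureReasons fields of the returned dataSource along these lines. The helper and its message strings are hypothetical; the status values are the ones listed in the response syntax, including DELETE_UNSUCCESSFUL.

```python
def describe_data_source(data_source):
    """Summarize the state of a dataSource dict from the response."""
    status = data_source["status"]
    if status == "AVAILABLE":
        return "ready for ingestion"
    if status == "DELETING":
        return "deletion in progress"
    if status == "DELETE_UNSUCCESSFUL":
        # failureReasons explains why the deletion did not succeed.
        reasons = "; ".join(data_source.get("failureReasons", []))
        return f"deletion failed: {reasons or 'no reasons reported'}"
    return f"unrecognized status: {status}"
```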
updatedAt (datetime) --
The time at which the data source was last updated.
vectorIngestionConfiguration (dict) --
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) --
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) --
Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) --
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) --
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) --
The percentage of overlap between adjacent chunks of a data source.
hierarchicalChunkingConfiguration (dict) --
Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
levelConfigurations (list) --
Token settings for each layer.
(dict) --
Token settings for a layer in a hierarchical chunking configuration.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain in this layer.
overlapTokens (integer) --
The number of tokens to repeat across chunks in the same layer.
semanticChunkingConfiguration (dict) --
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
breakpointPercentileThreshold (integer) --
The dissimilarity threshold for splitting chunks.
bufferSize (integer) --
The buffer size.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain.
customTransformationConfiguration (dict) --
A custom document transformer for parsed data source documents.
intermediateStorage (dict) --
An S3 bucket path for input and output objects.
s3Location (dict) --
An S3 bucket path.
uri (string) --
The location's URI. For example, s3://my-bucket/chunk-processor/.
transformations (list) --
A Lambda function that processes documents.
(dict) --
A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
stepToApply (string) --
When the service applies the transformation.
transformationFunction (dict) --
A Lambda function that processes documents.
transformationLambdaConfiguration (dict) --
The Lambda function.
lambdaArn (string) --
The function's ARN identifier.
parsingConfiguration (dict) --
Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
bedrockDataAutomationConfiguration (dict) --
If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
bedrockFoundationModelConfiguration (dict) --
If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
modelArn (string) --
The ARN of the foundation model or inference profile to use for parsing.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
parsingPrompt (dict) --
Instructions for interpreting the contents of a document.
parsingPromptText (string) --
Instructions for interpreting the contents of a document.
parsingStrategy (string) --
The parsing strategy for the data source.
{'agent': {'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}}
Gets information about an agent.
See also: AWS API Documentation
Request Syntax
client.get_agent( agentId='string' )
string
[REQUIRED]
The unique identifier of the agent.
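A minimal sketch of calling get_agent and reading the new maxRecentSessions field follows. The helper name is hypothetical, and the client is passed in so the function can be exercised without AWS credentials; in real use it would typically be created with boto3.client("bedrock-agent").

```python
def max_recent_sessions(client, agent_id):
    """Return the agent's configured maxRecentSessions, or None if not set."""
    agent = client.get_agent(agentId=agent_id)["agent"]
    memory = agent.get("memoryConfiguration", {})
    summary = memory.get("sessionSummaryConfiguration", {})
    return summary.get("maxRecentSessions")


# Example usage (requires boto3 and valid credentials; the ID is a placeholder):
# import boto3
# client = boto3.client("bedrock-agent")
# print(max_recent_sessions(client, "AGENT_ID"))
```

Using .get with defaults avoids a KeyError for agents that have no memory configuration.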
dict
Response Syntax
{ 'agent': { 'agentArn': 'string', 'agentCollaboration': 'SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED', 'agentId': 'string', 'agentName': 'string', 'agentResourceRoleArn': 'string', 'agentStatus': 'CREATING'|'PREPARING'|'PREPARED'|'NOT_PREPARED'|'DELETING'|'FAILED'|'VERSIONING'|'UPDATING', 'agentVersion': 'string', 'clientToken': 'string', 'createdAt': datetime(2015, 1, 1), 'customOrchestration': { 'executor': { 'lambda': 'string' } }, 'customerEncryptionKeyArn': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 'foundationModel': 'string', 'guardrailConfiguration': { 'guardrailIdentifier': 'string', 'guardrailVersion': 'string' }, 'idleSessionTTLInSeconds': 123, 'instruction': 'string', 'memoryConfiguration': { 'enabledMemoryTypes': [ 'SESSION_SUMMARY', ], 'sessionSummaryConfiguration': { 'maxRecentSessions': 123 }, 'storageDays': 123 }, 'orchestrationType': 'DEFAULT'|'CUSTOM_ORCHESTRATION', 'preparedAt': datetime(2015, 1, 1), 'promptOverrideConfiguration': { 'overrideLambda': 'string', 'promptConfigurations': [ { 'basePromptTemplate': 'string', 'foundationModel': 'string', 'inferenceConfiguration': { 'maximumLength': 123, 'stopSequences': [ 'string', ], 'temperature': ..., 'topK': 123, 'topP': ... }, 'parserMode': 'DEFAULT'|'OVERRIDDEN', 'promptCreationMode': 'DEFAULT'|'OVERRIDDEN', 'promptState': 'ENABLED'|'DISABLED', 'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION' }, ] }, 'recommendedActions': [ 'string', ], 'updatedAt': datetime(2015, 1, 1) } }
Response Structure
(dict) --
agent (dict) --
Contains details about the agent.
agentArn (string) --
The Amazon Resource Name (ARN) of the agent.
agentCollaboration (string) --
The agent's collaboration settings.
agentId (string) --
The unique identifier of the agent.
agentName (string) --
The name of the agent.
agentResourceRoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
agentStatus (string) --
The status of the agent and whether it is ready for use. The following statuses are possible:
CREATING – The agent is being created.
PREPARING – The agent is being prepared.
PREPARED – The agent is prepared and ready to be invoked.
NOT_PREPARED – The agent has been created but not yet prepared.
FAILED – The agent API operation failed.
UPDATING – The agent is being updated.
DELETING – The agent is being deleted.
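Since several of these statuses are transitional, a caller may want to poll get_agent until the agent settles. The loop below is one possible sketch, not an SDK feature: the function name, the set of statuses treated as transitional, and the default timing values are all assumptions.

```python
import time

# Assumption: these statuses indicate work still in progress.
TRANSITIONAL_STATUSES = {"CREATING", "PREPARING", "UPDATING", "VERSIONING"}


def wait_for_agent(client, agent_id, max_attempts=20, delay_seconds=5.0,
                   sleep=time.sleep):
    """Poll get_agent until agentStatus leaves the transitional set."""
    for _ in range(max_attempts):
        agent = client.get_agent(agentId=agent_id)["agent"]
        if agent["agentStatus"] not in TRANSITIONAL_STATUSES:
            return agent
        sleep(delay_seconds)
    raise TimeoutError(f"agent {agent_id} did not settle after {max_attempts} attempts")
```

The returned agent may be PREPARED, NOT_PREPARED, FAILED, or DELETING, so the caller still needs to check agentStatus (and failureReasons on FAILED) before invoking it.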
agentVersion (string) --
The version of the agent.
clientToken (string) --
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
createdAt (datetime) --
The time at which the agent was created.
customOrchestration (dict) --
Contains custom orchestration configurations for the agent.
executor (dict) --
The structure of the executor invoking the actions in custom orchestration.
lambda (string) --
The Amazon Resource Name (ARN) of the Lambda function containing the business logic that is carried out upon invoking the action.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that encrypts the agent.
description (string) --
The description of the agent.
failureReasons (list) --
Contains reasons that the agent-related API that you invoked failed.
(string) --
foundationModel (string) --
The foundation model used for orchestration by the agent.
guardrailConfiguration (dict) --
Details about the guardrail associated with the agent.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
Instructions that tell the agent what it should do and how it should interact with users.
memoryConfiguration (dict) --
Contains memory configuration for the agent.
enabledMemoryTypes (list) --
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
orchestrationType (string) --
Specifies the orchestration strategy for the agent.
preparedAt (datetime) --
The time at which the agent was last prepared.
promptOverrideConfiguration (dict) --
Contains configurations to override prompt templates in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP determines the share of the probability distribution, counted from the most likely candidates down, from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field as OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
recommendedActions (list) --
Contains recommended actions to take for the agent-related API that you invoked to succeed.
(string) --
updatedAt (datetime) --
The time at which the agent was last updated.
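The coupling between overrideLambda and parserMode described in the structure above can be checked locally before a request is sent. The following is a minimal sketch under stated assumptions, not part of the SDK: the helper name check_prompt_override is hypothetical, and it only inspects a plain dict shaped like promptOverrideConfiguration.

```python
def check_prompt_override(config):
    """Return a list of problems found in a promptOverrideConfiguration dict.

    Hypothetical helper (not part of boto3). It encodes only the two rules
    stated in the overrideLambda and parserMode field descriptions.
    """
    problems = []
    prompt_configs = config.get("promptConfigurations", [])
    has_lambda = bool(config.get("overrideLambda"))
    overridden = [
        pc.get("promptType", "<unknown>")
        for pc in prompt_configs
        if pc.get("parserMode") == "OVERRIDDEN"
    ]

    # Rule 1: overrideLambda requires at least one parserMode of OVERRIDDEN.
    if has_lambda and not overridden:
        problems.append("overrideLambda is set but no parserMode is OVERRIDDEN")

    # Rule 2: a parserMode of OVERRIDDEN requires overrideLambda.
    if overridden and not has_lambda:
        problems.append(
            "parserMode is OVERRIDDEN for "
            + ", ".join(overridden)
            + " but overrideLambda is missing"
        )
    return problems
```

An empty list means neither rule is violated; the service may still reject the configuration for other reasons that this sketch does not model.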
{'agentVersion': {'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}}
Gets details about a version of an agent.
See also: AWS API Documentation
Request Syntax
client.get_agent_version( agentId='string', agentVersion='string' )
agentId (string) --
[REQUIRED]
The unique identifier of the agent.
agentVersion (string) --
[REQUIRED]
The version of the agent.
dict
Response Syntax
{ 'agentVersion': { 'agentArn': 'string', 'agentCollaboration': 'SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED', 'agentId': 'string', 'agentName': 'string', 'agentResourceRoleArn': 'string', 'agentStatus': 'CREATING'|'PREPARING'|'PREPARED'|'NOT_PREPARED'|'DELETING'|'FAILED'|'VERSIONING'|'UPDATING', 'createdAt': datetime(2015, 1, 1), 'customerEncryptionKeyArn': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 'foundationModel': 'string', 'guardrailConfiguration': { 'guardrailIdentifier': 'string', 'guardrailVersion': 'string' }, 'idleSessionTTLInSeconds': 123, 'instruction': 'string', 'memoryConfiguration': { 'enabledMemoryTypes': [ 'SESSION_SUMMARY', ], 'sessionSummaryConfiguration': { 'maxRecentSessions': 123 }, 'storageDays': 123 }, 'promptOverrideConfiguration': { 'overrideLambda': 'string', 'promptConfigurations': [ { 'basePromptTemplate': 'string', 'foundationModel': 'string', 'inferenceConfiguration': { 'maximumLength': 123, 'stopSequences': [ 'string', ], 'temperature': ..., 'topK': 123, 'topP': ... }, 'parserMode': 'DEFAULT'|'OVERRIDDEN', 'promptCreationMode': 'DEFAULT'|'OVERRIDDEN', 'promptState': 'ENABLED'|'DISABLED', 'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION' }, ] }, 'recommendedActions': [ 'string', ], 'updatedAt': datetime(2015, 1, 1), 'version': 'string' } }
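As an illustration of reading this response, the sketch below pulls the memory-related fields out of a get_agent_version call. It assumes that client is a boto3 client created with boto3.client("bedrock-agent"); the function name and the keys of the returned summary are hypothetical and chosen only for this example.

```python
def summarize_agent_version(client, agent_id, agent_version):
    """Fetch one agent version and return a few fields as a flat dict.

    `client` is assumed to be a boto3 "bedrock-agent" client. The keys of
    the returned summary are illustrative, not part of the API.
    """
    response = client.get_agent_version(
        agentId=agent_id,
        agentVersion=agent_version,
    )
    details = response["agentVersion"]

    # memoryConfiguration and its nested dicts may be absent, so read
    # them defensively with .get().
    memory = details.get("memoryConfiguration", {})
    summary_config = memory.get("sessionSummaryConfiguration", {})

    return {
        "status": details.get("agentStatus"),
        "version": details.get("version"),
        "memory_types": memory.get("enabledMemoryTypes", []),
        "storage_days": memory.get("storageDays"),
        "max_recent_sessions": summary_config.get("maxRecentSessions"),
    }
```

Taking the client as a parameter keeps the function easy to exercise with a stub object that returns a canned response.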
Response Structure
(dict) --
agentVersion (dict) --
Contains details about the version of the agent.
agentArn (string) --
The Amazon Resource Name (ARN) of the agent that the version belongs to.
agentCollaboration (string) --
The agent's collaboration settings.
agentId (string) --
The unique identifier of the agent that the version belongs to.
agentName (string) --
The name of the agent that the version belongs to.
agentResourceRoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
agentStatus (string) --
The status of the agent that the version belongs to.
createdAt (datetime) --
The time at which the version was created.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that encrypts the agent.
description (string) --
The description of the version.
failureReasons (list) --
A list of reasons that the API operation on the version failed.
(string) --
foundationModel (string) --
The foundation model that the version invokes.
guardrailConfiguration (dict) --
Details about the guardrail associated with the agent.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
The instructions provided to the agent.
memoryConfiguration (dict) --
Contains details of the memory configuration on the version of the agent.
enabledMemoryTypes (list) --
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
promptOverrideConfiguration (dict) --
Contains configurations to override prompt templates in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP determines the share of the probability distribution, counted from the most likely candidates down, from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field as OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
recommendedActions (list) --
A list of recommended actions to take for the failed API operation on the version to succeed.
(string) --
updatedAt (datetime) --
The time at which the version was last updated.
version (string) --
The version number.
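The default promptState values listed above can be combined with whatever a returned agentVersion actually specifies. The sketch below is one way to do that, assuming that a promptState present in a returned prompt configuration takes the place of the listed default; the names DEFAULT_PROMPT_STATES and effective_prompt_states are hypothetical.

```python
# Defaults as listed under promptState. MEMORY_SUMMARIZATION is left out
# because no default is listed for it there.
DEFAULT_PROMPT_STATES = {
    "PRE_PROCESSING": "ENABLED",
    "ORCHESTRATION": "ENABLED",
    "KNOWLEDGE_BASE_RESPONSE_GENERATION": "ENABLED",
    "POST_PROCESSING": "DISABLED",
}


def effective_prompt_states(agent_version):
    """Map each promptType to its state for an agentVersion dict.

    Starts from the documented defaults, then applies any promptState
    found in promptOverrideConfiguration.promptConfigurations.
    """
    states = dict(DEFAULT_PROMPT_STATES)
    override = agent_version.get("promptOverrideConfiguration", {})
    for prompt_config in override.get("promptConfigurations", []):
        prompt_type = prompt_config.get("promptType")
        state = prompt_config.get("promptState")
        if prompt_type and state:
            states[prompt_type] = state
    return states
```

A promptType such as MEMORY_SUMMARIZATION appears in the result only when the response carries an explicit state for it.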
{'dataSource': {'dataSourceConfiguration': {'sharePointConfiguration': {'sourceConfiguration': {'authType': {'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS'}}}, 'webConfiguration': {'crawlerConfiguration': {'crawlerLimits': {'maxPages': 'integer'}, 'userAgent': 'string'}}}}}
Gets information about a data source.
See also: AWS API Documentation
Request Syntax
client.get_data_source( dataSourceId='string', knowledgeBaseId='string' )
dataSourceId (string) --
[REQUIRED]
The unique identifier of the data source.
knowledgeBaseId (string) --
[REQUIRED]
The unique identifier of the knowledge base for the data source.
dict
Response Syntax
{ 'dataSource': { 'createdAt': datetime(2015, 1, 1), 'dataDeletionPolicy': 'RETAIN'|'DELETE', 'dataSourceConfiguration': { 'confluenceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostType': 'SAAS', 'hostUrl': 'string' } }, 's3Configuration': { 'bucketArn': 'string', 'bucketOwnerAccountId': 'string', 'inclusionPrefixes': [ 'string', ] }, 'salesforceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostUrl': 'string' } }, 'sharePointConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'domain': 'string', 'hostType': 'ONLINE', 'siteUrls': [ 'string', ], 'tenantId': 'string' } }, 'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA', 'webConfiguration': { 'crawlerConfiguration': { 'crawlerLimits': { 'maxPages': 123, 'rateLimit': 123 }, 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'scope': 'HOST_ONLY'|'SUBDOMAINS', 'userAgent': 'string' }, 'sourceConfiguration': { 'urlConfiguration': { 'seedUrls': [ { 'url': 'string' }, ] } } } }, 'dataSourceId': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 
'knowledgeBaseId': 'string', 'name': 'string', 'serverSideEncryptionConfiguration': { 'kmsKeyArn': 'string' }, 'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL', 'updatedAt': datetime(2015, 1, 1), 'vectorIngestionConfiguration': { 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC', 'fixedSizeChunkingConfiguration': { 'maxTokens': 123, 'overlapPercentage': 123 }, 'hierarchicalChunkingConfiguration': { 'levelConfigurations': [ { 'maxTokens': 123 }, ], 'overlapTokens': 123 }, 'semanticChunkingConfiguration': { 'breakpointPercentileThreshold': 123, 'bufferSize': 123, 'maxTokens': 123 } }, 'customTransformationConfiguration': { 'intermediateStorage': { 's3Location': { 'uri': 'string' } }, 'transformations': [ { 'stepToApply': 'POST_CHUNKING', 'transformationFunction': { 'transformationLambdaConfiguration': { 'lambdaArn': 'string' } } }, ] }, 'parsingConfiguration': { 'bedrockDataAutomationConfiguration': { 'parsingModality': 'MULTIMODAL' }, 'bedrockFoundationModelConfiguration': { 'modelArn': 'string', 'parsingModality': 'MULTIMODAL', 'parsingPrompt': { 'parsingPromptText': 'string' } }, 'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION' } } } }
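Because the response nests the web connector settings several levels deep, a small accessor can make them easier to read. The following is a sketch only: web_crawler_settings is a hypothetical helper that takes the dataSource dict from a get_data_source response and returns the crawler fields that this release adds (maxPages, userAgent) alongside the existing rateLimit and scope.

```python
def web_crawler_settings(data_source):
    """Return the web crawler settings of a dataSource dict, or None.

    `data_source` is assumed to be response["dataSource"] from
    get_data_source. Returns None for any data source whose type
    is not WEB.
    """
    config = data_source.get("dataSourceConfiguration", {})
    if config.get("type") != "WEB":
        return None

    crawler = config.get("webConfiguration", {}).get("crawlerConfiguration", {})
    limits = crawler.get("crawlerLimits", {})
    return {
        "max_pages": limits.get("maxPages"),
        "rate_limit": limits.get("rateLimit"),
        "scope": crawler.get("scope"),
        "user_agent": crawler.get("userAgent"),
    }
```

Fields that the response omits come back as None rather than raising KeyError.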
Response Structure
(dict) --
dataSource (dict) --
Contains details about the data source.
createdAt (datetime) --
The time at which the data source was created.
dataDeletionPolicy (string) --
The data deletion policy for the data source.
dataSourceConfiguration (dict) --
The connection configuration for the data source.
confluenceConfiguration (dict) --
The configuration information to connect to Confluence as your data source.
crawlerConfiguration (dict) --
The configuration of the Confluence content. For example, configuring specific types of Confluence content.
filterConfiguration (dict) --
The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Confluence data source.
authType (string) --
The supported authentication type to authenticate and connect to your Confluence instance.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
hostUrl (string) --
The Confluence host URL or instance URL.
s3Configuration (dict) --
The configuration information to connect to Amazon S3 as your data source.
bucketArn (string) --
The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
bucketOwnerAccountId (string) --
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) --
A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
(string) --
salesforceConfiguration (dict) --
The configuration information to connect to Salesforce as your data source.
crawlerConfiguration (dict) --
The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
filterConfiguration (dict) --
The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Salesforce data source.
authType (string) --
The supported authentication type to authenticate and connect to your Salesforce instance.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
hostUrl (string) --
The Salesforce host URL or instance URL.
sharePointConfiguration (dict) --
The configuration information to connect to SharePoint as your data source.
crawlerConfiguration (dict) --
The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
filterConfiguration (dict) --
The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your SharePoint data source.
authType (string) --
The supported authentication type to authenticate and connect to your SharePoint site/sites.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site/sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
domain (string) --
The domain of your SharePoint instance or site URL/URLs.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
siteUrls (list) --
A list of one or more SharePoint site URLs.
(string) --
tenantId (string) --
The identifier of your Microsoft 365 tenant.
type (string) --
The type of data source.
webConfiguration (dict) --
The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
crawlerConfiguration (dict) --
The Web Crawler configuration details for the web data source.
crawlerLimits (dict) --
The configuration of crawl limits for the web URLs.
maxPages (integer) --
The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
rateLimit (integer) --
The max rate at which pages are crawled, up to 300 per minute per host.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
scope (string) --
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can also choose to include subdomains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
userAgent (string) --
A string used for identifying the crawler or a bot when it accesses a web server. By default, this is set to bedrockbot_UUID for your crawler. You can optionally append a custom string to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
sourceConfiguration (dict) --
The source configuration details for the web data source.
urlConfiguration (dict) --
The configuration of the URL/URLs.
seedUrls (list) --
One or more seed or starting point URLs.
(dict) --
The seed or starting point URL. You should be authorized to crawl the URL.
url (string) --
A seed or starting point URL.
dataSourceId (string) --
The unique identifier of the data source.
description (string) --
The description of the data source.
failureReasons (list) --
The detailed reasons for the failure to delete a data source.
(string) --
knowledgeBaseId (string) --
The unique identifier of the knowledge base to which the data source belongs.
name (string) --
The name of the data source.
serverSideEncryptionConfiguration (dict) --
Contains details about the configuration of the server-side encryption.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
status (string) --
The status of the data source. The following statuses are possible:
Available – The data source has been created and is ready for ingestion into the knowledge base.
Deleting – The data source is being deleted.
updatedAt (datetime) --
The time at which the data source was last updated.
vectorIngestionConfiguration (dict) --
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) --
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) --
Knowledge base can split your source data into chunks. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried. You have the following options for chunking your data. If you opt for NONE, then you may want to pre-process your files by splitting them up such that each file corresponds to a chunk.
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) --
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) --
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) --
The percentage of overlap between adjacent chunks of a data source.
hierarchicalChunkingConfiguration (dict) --
Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
levelConfigurations (list) --
Token settings for each layer.
(dict) --
Token settings for a layer in a hierarchical chunking configuration.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain in this layer.
overlapTokens (integer) --
The number of tokens to repeat across chunks in the same layer.
semanticChunkingConfiguration (dict) --
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
breakpointPercentileThreshold (integer) --
The dissimilarity threshold for splitting chunks.
bufferSize (integer) --
The buffer size.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain.
customTransformationConfiguration (dict) --
A custom document transformer for parsed data source documents.
intermediateStorage (dict) --
An S3 bucket path for input and output objects.
s3Location (dict) --
An S3 bucket path.
uri (string) --
The location's URI. For example, s3://my-bucket/chunk-processor/.
transformations (list) --
A Lambda function that processes documents.
(dict) --
A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
stepToApply (string) --
When the service applies the transformation.
transformationFunction (dict) --
A Lambda function that processes documents.
transformationLambdaConfiguration (dict) --
The Lambda function.
lambdaArn (string) --
The function's ARN identifier.
parsingConfiguration (dict) --
Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
bedrockDataAutomationConfiguration (dict) --
If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
bedrockFoundationModelConfiguration (dict) --
If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
modelArn (string) --
The ARN of the foundation model or inference profile to use for parsing.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including text, images, or both.
parsingPrompt (dict) --
Instructions for interpreting the contents of a document.
parsingPromptText (string) --
Instructions for interpreting the contents of a document.
parsingStrategy (string) --
The parsing strategy for the data source.
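The chunking, transformation, and parsing fields described above nest inside a single ingestion configuration. The following is a minimal sketch of that nesting as a plain Python dict. The helper name build_vector_ingestion_configuration is hypothetical, and the numeric values are illustrative assumptions rather than recommended settings.

```python
def build_vector_ingestion_configuration(lambda_arn, intermediate_s3_uri, model_arn, parsing_prompt):
    """Return an ingestion configuration that uses semantic chunking.

    All numeric values below are illustrative assumptions, not service defaults.
    """
    return {
        "chunkingConfiguration": {
            "chunkingStrategy": "SEMANTIC",
            "semanticChunkingConfiguration": {
                "breakpointPercentileThreshold": 95,  # dissimilarity threshold (assumed value)
                "bufferSize": 1,                      # assumed value
                "maxTokens": 300,                     # max tokens per chunk (assumed value)
            },
        },
        "customTransformationConfiguration": {
            # S3 path for the transformer's input and output objects
            "intermediateStorage": {"s3Location": {"uri": intermediate_s3_uri}},
            "transformations": [
                {
                    # Run the Lambda function after documents are chunked
                    "stepToApply": "POST_CHUNKING",
                    "transformationFunction": {
                        "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                    },
                }
            ],
        },
        "parsingConfiguration": {
            "parsingStrategy": "BEDROCK_FOUNDATION_MODEL",
            "bedrockFoundationModelConfiguration": {
                "modelArn": model_arn,
                "parsingModality": "MULTIMODAL",
                "parsingPrompt": {"parsingPromptText": parsing_prompt},
            },
        },
    }
```

The returned dict has the shape expected for the vectorIngestionConfiguration argument of a data source request; the ARNs and the S3 URI are supplied by the caller.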
{'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}
Response
{'agent': {'memoryConfiguration': {'sessionSummaryConfiguration': {'maxRecentSessions': 'integer'}}, 'promptOverrideConfiguration': {'promptConfigurations': {'promptType': {'MEMORY_SUMMARIZATION'}}}}}
Updates the configuration of an agent.
See also: AWS API Documentation
Request Syntax
client.update_agent( agentCollaboration='SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED', agentId='string', agentName='string', agentResourceRoleArn='string', customOrchestration={ 'executor': { 'lambda': 'string' } }, customerEncryptionKeyArn='string', description='string', foundationModel='string', guardrailConfiguration={ 'guardrailIdentifier': 'string', 'guardrailVersion': 'string' }, idleSessionTTLInSeconds=123, instruction='string', memoryConfiguration={ 'enabledMemoryTypes': [ 'SESSION_SUMMARY', ], 'sessionSummaryConfiguration': { 'maxRecentSessions': 123 }, 'storageDays': 123 }, orchestrationType='DEFAULT'|'CUSTOM_ORCHESTRATION', promptOverrideConfiguration={ 'overrideLambda': 'string', 'promptConfigurations': [ { 'basePromptTemplate': 'string', 'foundationModel': 'string', 'inferenceConfiguration': { 'maximumLength': 123, 'stopSequences': [ 'string', ], 'temperature': ..., 'topK': 123, 'topP': ... }, 'parserMode': 'DEFAULT'|'OVERRIDDEN', 'promptCreationMode': 'DEFAULT'|'OVERRIDDEN', 'promptState': 'ENABLED'|'DISABLED', 'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION' }, ] } )
agentCollaboration (string) --
The agent's collaboration role.
agentId (string) -- [REQUIRED]
The unique identifier of the agent.
agentName (string) -- [REQUIRED]
Specifies a new name for the agent.
agentResourceRoleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
customOrchestration (dict) --
Contains details of the custom orchestration configured for the agent.
executor (dict) --
The structure of the executor invoking the actions in custom orchestration.
lambda (string) --
The Amazon Resource Name (ARN) of the Lambda function containing the business logic that is carried out upon invoking the action.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key with which to encrypt the agent.
description (string) --
Specifies a new description of the agent.
foundationModel (string) -- [REQUIRED]
The identifier of the model that the agent uses for orchestration.
The value to provide depends on the type of model or throughput that you use:
If you use a base model, specify the model ID or its ARN. For a list of model IDs for base models, see Amazon Bedrock base model IDs (on-demand throughput) in the Amazon Bedrock User Guide.
If you use an inference profile, specify the inference profile ID or its ARN. For a list of inference profile IDs, see Supported Regions and models for cross-region inference in the Amazon Bedrock User Guide.
If you use a provisioned model, specify the ARN of the Provisioned Throughput. For more information, see Run inference using a Provisioned Throughput in the Amazon Bedrock User Guide.
If you use a custom model, first purchase Provisioned Throughput for it. Then specify the ARN of the resulting provisioned model. For more information, see Use a custom model in Amazon Bedrock in the Amazon Bedrock User Guide.
If you use an imported model, specify the ARN of the imported model. You can get the model ARN from a successful call to CreateModelImportJob or from the Imported models page in the Amazon Bedrock console.
guardrailConfiguration (dict) --
The unique Guardrail configuration assigned to the agent when it is updated.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
Specifies new instructions that tell the agent what it should do and how it should interact with users.
memoryConfiguration (dict) --
Specifies the new memory configuration for the agent.
enabledMemoryTypes (list) -- [REQUIRED]
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
orchestrationType (string) --
Specifies the type of orchestration strategy for the agent. The default is DEFAULT.
promptOverrideConfiguration (dict) --
Contains configurations to override prompts in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) -- [REQUIRED]
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP determines the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field to OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
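The required fields and the new memory settings described above can be assembled into keyword arguments for client.update_agent. This is a minimal sketch, not a verified production call: the helper name build_update_agent_request and its default values are hypothetical, the 365-day check reflects the memory duration limit mentioned in this release's change summary, and the lower bound of 1 on maxRecentSessions is an assumption.

```python
def build_update_agent_request(agent_id, agent_name, role_arn, foundation_model,
                               max_recent_sessions=5, storage_days=30):
    """Assemble keyword arguments for client.update_agent().

    The defaults are illustrative only; they are not service defaults.
    """
    if storage_days > 365:
        # Limit taken from this release's change summary
        raise ValueError("storageDays must not exceed 365")
    if max_recent_sessions < 1:
        # Lower bound is an assumption, not stated in the reference text
        raise ValueError("maxRecentSessions must be at least 1")

    return {
        # Required fields
        "agentId": agent_id,
        "agentName": agent_name,
        "agentResourceRoleArn": role_arn,
        "foundationModel": foundation_model,
        # Optional memory settings, including the new session summary limit
        "memoryConfiguration": {
            "enabledMemoryTypes": ["SESSION_SUMMARY"],
            "sessionSummaryConfiguration": {"maxRecentSessions": max_recent_sessions},
            "storageDays": storage_days,
        },
    }
```

With boto3 installed and credentials configured, the result could then be passed as boto3.client("bedrock-agent").update_agent(**request).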
dict
Response Syntax
{ 'agent': { 'agentArn': 'string', 'agentCollaboration': 'SUPERVISOR'|'SUPERVISOR_ROUTER'|'DISABLED', 'agentId': 'string', 'agentName': 'string', 'agentResourceRoleArn': 'string', 'agentStatus': 'CREATING'|'PREPARING'|'PREPARED'|'NOT_PREPARED'|'DELETING'|'FAILED'|'VERSIONING'|'UPDATING', 'agentVersion': 'string', 'clientToken': 'string', 'createdAt': datetime(2015, 1, 1), 'customOrchestration': { 'executor': { 'lambda': 'string' } }, 'customerEncryptionKeyArn': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 'foundationModel': 'string', 'guardrailConfiguration': { 'guardrailIdentifier': 'string', 'guardrailVersion': 'string' }, 'idleSessionTTLInSeconds': 123, 'instruction': 'string', 'memoryConfiguration': { 'enabledMemoryTypes': [ 'SESSION_SUMMARY', ], 'sessionSummaryConfiguration': { 'maxRecentSessions': 123 }, 'storageDays': 123 }, 'orchestrationType': 'DEFAULT'|'CUSTOM_ORCHESTRATION', 'preparedAt': datetime(2015, 1, 1), 'promptOverrideConfiguration': { 'overrideLambda': 'string', 'promptConfigurations': [ { 'basePromptTemplate': 'string', 'foundationModel': 'string', 'inferenceConfiguration': { 'maximumLength': 123, 'stopSequences': [ 'string', ], 'temperature': ..., 'topK': 123, 'topP': ... }, 'parserMode': 'DEFAULT'|'OVERRIDDEN', 'promptCreationMode': 'DEFAULT'|'OVERRIDDEN', 'promptState': 'ENABLED'|'DISABLED', 'promptType': 'PRE_PROCESSING'|'ORCHESTRATION'|'POST_PROCESSING'|'KNOWLEDGE_BASE_RESPONSE_GENERATION'|'MEMORY_SUMMARIZATION' }, ] }, 'recommendedActions': [ 'string', ], 'updatedAt': datetime(2015, 1, 1) } }
Response Structure
(dict) --
agent (dict) --
Contains details about the agent that was updated.
agentArn (string) --
The Amazon Resource Name (ARN) of the agent.
agentCollaboration (string) --
The agent's collaboration settings.
agentId (string) --
The unique identifier of the agent.
agentName (string) --
The name of the agent.
agentResourceRoleArn (string) --
The Amazon Resource Name (ARN) of the IAM role with permissions to invoke API operations on the agent.
agentStatus (string) --
The status of the agent and whether it is ready for use. The following statuses are possible:
CREATING – The agent is being created.
PREPARING – The agent is being prepared.
PREPARED – The agent is prepared and ready to be invoked.
NOT_PREPARED – The agent has been created but not yet prepared.
FAILED – The agent API operation failed.
UPDATING – The agent is being updated.
DELETING – The agent is being deleted.
agentVersion (string) --
The version of the agent.
clientToken (string) --
A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.
createdAt (datetime) --
The time at which the agent was created.
customOrchestration (dict) --
Contains custom orchestration configurations for the agent.
executor (dict) --
The structure of the executor invoking the actions in custom orchestration.
lambda (string) --
The Amazon Resource Name (ARN) of the Lambda function containing the business logic that is carried out upon invoking the action.
customerEncryptionKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that encrypts the agent.
description (string) --
The description of the agent.
failureReasons (list) --
Contains reasons that the agent-related API that you invoked failed.
(string) --
foundationModel (string) --
The foundation model used for orchestration by the agent.
guardrailConfiguration (dict) --
Details about the guardrail associated with the agent.
guardrailIdentifier (string) --
The unique identifier of the guardrail.
guardrailVersion (string) --
The version of the guardrail.
idleSessionTTLInSeconds (integer) --
The number of seconds for which Amazon Bedrock keeps information about a user's conversation with the agent.
A user interaction remains active for the amount of time specified. If no conversation occurs during this time, the session expires and Amazon Bedrock deletes any data provided before the timeout.
instruction (string) --
Instructions that tell the agent what it should do and how it should interact with users.
memoryConfiguration (dict) --
Contains memory configuration for the agent.
enabledMemoryTypes (list) --
The type of memory that is stored.
(string) --
sessionSummaryConfiguration (dict) --
Contains the configuration for SESSION_SUMMARY memory type enabled for the agent.
maxRecentSessions (integer) --
Maximum number of recent session summaries to include in the agent's prompt context.
storageDays (integer) --
The number of days the agent is configured to retain the conversational context.
orchestrationType (string) --
Specifies the orchestration strategy for the agent.
preparedAt (datetime) --
The time at which the agent was last prepared.
promptOverrideConfiguration (dict) --
Contains configurations to override prompt templates in different parts of an agent sequence. For more information, see Advanced prompts.
overrideLambda (string) --
The ARN of the Lambda function to use when parsing the raw foundation model output in parts of the agent sequence. If you specify this field, at least one of the promptConfigurations must contain a parserMode value that is set to OVERRIDDEN. For more information, see Parser Lambda function in Amazon Bedrock Agents.
promptConfigurations (list) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
(dict) --
Contains configurations to override a prompt template in one part of an agent sequence. For more information, see Advanced prompts.
basePromptTemplate (string) --
Defines the prompt template with which to replace the default prompt template. You can use placeholder variables in the base prompt template to customize the prompt. For more information, see Prompt template placeholder variables and Configure the prompt templates.
foundationModel (string) --
The agent's foundation model.
inferenceConfiguration (dict) --
Contains inference parameters to use when the agent invokes a foundation model in the part of the agent sequence defined by the promptType. For more information, see Inference parameters for foundation models.
maximumLength (integer) --
The maximum number of tokens to allow in the generated response.
stopSequences (list) --
A list of stop sequences. A stop sequence is a sequence of characters that causes the model to stop generating the response.
(string) --
temperature (float) --
The likelihood of the model selecting higher-probability options while generating a response. A lower value makes the model more likely to choose higher-probability options, while a higher value makes the model more likely to choose lower-probability options.
topK (integer) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topK is the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topK to 50, the model selects the next token from among the top 50 most likely choices.
topP (float) --
While generating a response, the model determines the probability of the following token at each point of generation. The value that you set for topP determines the number of most-likely candidates from which the model chooses the next token in the sequence. For example, if you set topP to 0.8, the model only selects the next token from the top 80% of the probability distribution of next tokens.
parserMode (string) --
Specifies whether to override the default parser Lambda function when parsing the raw foundation model output in the part of the agent sequence defined by the promptType. If you set the field to OVERRIDDEN, the overrideLambda field in the PromptOverrideConfiguration must be specified with the ARN of a Lambda function.
promptCreationMode (string) --
Specifies whether to override the default prompt template for this promptType. Set this value to OVERRIDDEN to use the prompt that you provide in the basePromptTemplate. If you leave it as DEFAULT, the agent uses a default prompt template.
promptState (string) --
Specifies whether to allow the agent to carry out the step specified in the promptType. If you set this value to DISABLED, the agent skips that step. The default state for each promptType is as follows.
PRE_PROCESSING – ENABLED
ORCHESTRATION – ENABLED
KNOWLEDGE_BASE_RESPONSE_GENERATION – ENABLED
POST_PROCESSING – DISABLED
promptType (string) --
The step in the agent sequence that this prompt configuration applies to.
recommendedActions (list) --
Contains recommended actions to take for the agent-related API that you invoked to succeed.
(string) --
updatedAt (datetime) --
The time at which the agent was last updated.
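The response fields above can be reduced to a short status report. The sketch below reads only keys that appear in the response structure (agentName, agentId, agentStatus, failureReasons, recommendedActions); the helper name summarize_agent_update and the output format are hypothetical.

```python
def summarize_agent_update(response):
    """Return human-readable lines describing an update_agent response."""
    agent = response["agent"]
    status = agent["agentStatus"]
    lines = [f"{agent['agentName']} ({agent['agentId']}): {status}"]

    if status == "FAILED":
        # Both lists may be absent, so fall back to empty lists
        for reason in agent.get("failureReasons", []):
            lines.append(f"  reason: {reason}")
        for action in agent.get("recommendedActions", []):
            lines.append(f"  recommended action: {action}")
    return lines
```

A caller might print these lines after each update and retry only once the recommended actions have been addressed.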
{'dataSourceConfiguration': {'sharePointConfiguration': {'sourceConfiguration': {'authType': {'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS'}}}, 'webConfiguration': {'crawlerConfiguration': {'crawlerLimits': {'maxPages': 'integer'}, 'userAgent': 'string'}}}}
Response
{'dataSource': {'dataSourceConfiguration': {'sharePointConfiguration': {'sourceConfiguration': {'authType': {'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS'}}}, 'webConfiguration': {'crawlerConfiguration': {'crawlerLimits': {'maxPages': 'integer'}, 'userAgent': 'string'}}}}}
Updates the configurations for a data source connector.
See also: AWS API Documentation
Request Syntax
client.update_data_source( dataDeletionPolicy='RETAIN'|'DELETE', dataSourceConfiguration={ 'confluenceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostType': 'SAAS', 'hostUrl': 'string' } }, 's3Configuration': { 'bucketArn': 'string', 'bucketOwnerAccountId': 'string', 'inclusionPrefixes': [ 'string', ] }, 'salesforceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostUrl': 'string' } }, 'sharePointConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'domain': 'string', 'hostType': 'ONLINE', 'siteUrls': [ 'string', ], 'tenantId': 'string' } }, 'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA', 'webConfiguration': { 'crawlerConfiguration': { 'crawlerLimits': { 'maxPages': 123, 'rateLimit': 123 }, 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'scope': 'HOST_ONLY'|'SUBDOMAINS', 'userAgent': 'string' }, 'sourceConfiguration': { 'urlConfiguration': { 'seedUrls': [ { 'url': 'string' }, ] } } } }, dataSourceId='string', description='string', knowledgeBaseId='string', name='string', serverSideEncryptionConfiguration={ 
'kmsKeyArn': 'string' }, vectorIngestionConfiguration={ 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC', 'fixedSizeChunkingConfiguration': { 'maxTokens': 123, 'overlapPercentage': 123 }, 'hierarchicalChunkingConfiguration': { 'levelConfigurations': [ { 'maxTokens': 123 }, ], 'overlapTokens': 123 }, 'semanticChunkingConfiguration': { 'breakpointPercentileThreshold': 123, 'bufferSize': 123, 'maxTokens': 123 } }, 'customTransformationConfiguration': { 'intermediateStorage': { 's3Location': { 'uri': 'string' } }, 'transformations': [ { 'stepToApply': 'POST_CHUNKING', 'transformationFunction': { 'transformationLambdaConfiguration': { 'lambdaArn': 'string' } } }, ] }, 'parsingConfiguration': { 'bedrockDataAutomationConfiguration': { 'parsingModality': 'MULTIMODAL' }, 'bedrockFoundationModelConfiguration': { 'modelArn': 'string', 'parsingModality': 'MULTIMODAL', 'parsingPrompt': { 'parsingPromptText': 'string' } }, 'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION' } } )
dataDeletionPolicy (string) --
The data deletion policy for the data source that you want to update.
dataSourceConfiguration (dict) -- [REQUIRED]
The connection configuration for the data source that you want to update.
confluenceConfiguration (dict) --
The configuration information to connect to Confluence as your data source.
crawlerConfiguration (dict) --
The configuration of the Confluence content. For example, configuring specific types of Confluence content.
filterConfiguration (dict) --
The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your Confluence data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your Confluence instance.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
hostType (string) -- [REQUIRED]
The supported host type, whether online/cloud or server/on-premises.
hostUrl (string) -- [REQUIRED]
The Confluence host URL or instance URL.
s3Configuration (dict) --
The configuration information to connect to Amazon S3 as your data source.
bucketArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
bucketOwnerAccountId (string) --
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) --
A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
(string) --
salesforceConfiguration (dict) --
The configuration information to connect to Salesforce as your data source.
crawlerConfiguration (dict) --
The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
filterConfiguration (dict) --
The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your Salesforce data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your Salesforce instance.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
hostUrl (string) -- [REQUIRED]
The Salesforce host URL or instance URL.
sharePointConfiguration (dict) --
The configuration information to connect to SharePoint as your data source.
crawlerConfiguration (dict) --
The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
filterConfiguration (dict) --
The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) -- [REQUIRED]
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) -- [REQUIRED]
The supported object type or content type of the data source.
type (string) -- [REQUIRED]
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) -- [REQUIRED]
The endpoint information to connect to your SharePoint data source.
authType (string) -- [REQUIRED]
The supported authentication type to authenticate and connect to your SharePoint site/sites.
credentialsSecretArn (string) -- [REQUIRED]
The Amazon Resource Name of a Secrets Manager secret that stores your authentication credentials for your SharePoint site or sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
domain (string) -- [REQUIRED]
The domain of your SharePoint instance or site URL/URLs.
hostType (string) -- [REQUIRED]
The supported host type, whether online/cloud or server/on-premises.
siteUrls (list) -- [REQUIRED]
A list of one or more SharePoint site URLs.
(string) --
tenantId (string) --
The identifier of your Microsoft 365 tenant.
type (string) -- [REQUIRED]
The type of data source.
webConfiguration (dict) --
The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
crawlerConfiguration (dict) --
The Web Crawler configuration details for the web data source.
crawlerLimits (dict) --
The configuration of crawl limits for the web URLs.
maxPages (integer) --
The max number of web pages crawled from your source URLs, up to 25,000 pages. If the web pages exceed this limit, the data source sync will fail and no web pages will be ingested.
rateLimit (integer) --
The max rate at which pages are crawled, up to 300 per minute per host.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
scope (string) --
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages that contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" and no other domains. You can choose to include sub domains in addition to the host or primary domain. For example, web pages that contain "aws.amazon.com" can also include sub domain "docs.aws.amazon.com".
userAgent (string) --
A string used for identifying the crawler or a bot when it accesses a web server. By default, this is set to bedrockbot_UUID for your crawler. You can optionally append a custom string to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
sourceConfiguration (dict) -- [REQUIRED]
The source configuration details for the web data source.
urlConfiguration (dict) -- [REQUIRED]
The configuration of the URL/URLs.
seedUrls (list) --
One or more seed or starting point URLs.
(dict) --
The seed or starting point URL. You should be authorized to crawl the URL.
url (string) --
A seed or starting point URL.
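The Web Crawler fields above can be assembled into a webConfiguration dict before it is passed to the API. The following is a minimal sketch, not part of any SDK: build_web_configuration is a hypothetical helper, the example.com URL is a placeholder, and rejecting values below 1 is an assumption (only the upper bounds of 25,000 pages and 300 pages per minute are stated above).

```python
from typing import Any, Dict, List, Optional

MAX_PAGES_LIMIT = 25_000  # upper bound from the maxPages description
MAX_RATE_LIMIT = 300      # pages per minute per host, from the rateLimit description
VALID_SCOPES = {"HOST_ONLY", "SUBDOMAINS"}  # values listed in the Response Syntax


def build_web_configuration(
    seed_urls: List[str],
    max_pages: int,
    rate_limit: int,
    scope: str = "HOST_ONLY",
    user_agent: Optional[str] = None,
    inclusion_filters: Optional[List[str]] = None,
    exclusion_filters: Optional[List[str]] = None,
) -> Dict[str, Any]:
    """Assemble a webConfiguration dict (hypothetical helper)."""
    if not seed_urls:
        raise ValueError("at least one seed URL is required")
    # Rejecting values below 1 is an assumption; only the upper bounds are documented.
    if not 1 <= max_pages <= MAX_PAGES_LIMIT:
        raise ValueError(f"max_pages must be between 1 and {MAX_PAGES_LIMIT}")
    if not 1 <= rate_limit <= MAX_RATE_LIMIT:
        raise ValueError(f"rate_limit must be between 1 and {MAX_RATE_LIMIT}")
    if scope not in VALID_SCOPES:
        raise ValueError(f"scope must be one of {sorted(VALID_SCOPES)}")

    crawler: Dict[str, Any] = {
        "crawlerLimits": {"maxPages": max_pages, "rateLimit": rate_limit},
        "scope": scope,
    }
    # Optional fields are only included when the caller supplies them.
    if inclusion_filters:
        crawler["inclusionFilters"] = list(inclusion_filters)
    if exclusion_filters:
        crawler["exclusionFilters"] = list(exclusion_filters)
    if user_agent:
        crawler["userAgent"] = user_agent

    return {
        "crawlerConfiguration": crawler,
        "sourceConfiguration": {
            "urlConfiguration": {"seedUrls": [{"url": url} for url in seed_urls]}
        },
    }


web_config = build_web_configuration(
    ["https://example.com/docs/"],  # placeholder seed URL
    max_pages=500,
    rate_limit=60,
    exclusion_filters=[r".*\.pdf$"],
)
```

The resulting dict would be placed under the webConfiguration key of dataSourceConfiguration, alongside a type of WEB.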
string
[REQUIRED]
The unique identifier of the data source.
string
Specifies a new description for the data source.
string
[REQUIRED]
The unique identifier of the knowledge base for the data source.
string
[REQUIRED]
Specifies a new name for the data source.
dict
Contains details about server-side encryption of the data source.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
dict
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) --
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) -- [REQUIRED]
The strategy that the knowledge base uses to split your source data into chunks. You have the following options for chunking your data:
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) --
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) -- [REQUIRED]
The percentage of overlap between adjacent chunks of a data source.
hierarchicalChunkingConfiguration (dict) --
Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
levelConfigurations (list) -- [REQUIRED]
Token settings for each layer.
(dict) --
Token settings for a layer in a hierarchical chunking configuration.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens that a chunk can contain in this layer.
overlapTokens (integer) -- [REQUIRED]
The number of tokens to repeat across chunks in the same layer.
semanticChunkingConfiguration (dict) --
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
breakpointPercentileThreshold (integer) -- [REQUIRED]
The dissimilarity threshold for splitting chunks.
bufferSize (integer) -- [REQUIRED]
The buffer size.
maxTokens (integer) -- [REQUIRED]
The maximum number of tokens that a chunk can contain.
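Each chunking strategy pairs with its own settings object. The sketch below shows one way to keep the two consistent. The helper name and all token values are illustrative assumptions, and the hierarchical layout (overlapTokens beside levelConfigurations) follows the Response Syntax shown for this operation.

```python
from typing import Any, Dict, Optional

# Maps each strategy to the settings key it uses; NONE takes no settings object.
STRATEGY_SETTINGS_KEY = {
    "FIXED_SIZE": "fixedSizeChunkingConfiguration",
    "HIERARCHICAL": "hierarchicalChunkingConfiguration",
    "SEMANTIC": "semanticChunkingConfiguration",
    "NONE": None,
}


def build_chunking_configuration(
    strategy: str, settings: Optional[Dict[str, Any]] = None
) -> Dict[str, Any]:
    """Return a chunkingConfiguration dict (hypothetical helper)."""
    if strategy not in STRATEGY_SETTINGS_KEY:
        raise ValueError(f"unknown chunking strategy: {strategy}")
    config: Dict[str, Any] = {"chunkingStrategy": strategy}
    key = STRATEGY_SETTINGS_KEY[strategy]
    if key is None:
        if settings:
            raise ValueError("NONE does not accept chunking settings")
        return config
    if settings is not None:
        config[key] = dict(settings)
    return config


# Illustrative values only; choose sizes that suit your documents.
fixed = build_chunking_configuration(
    "FIXED_SIZE", {"maxTokens": 300, "overlapPercentage": 20}
)
hierarchical = build_chunking_configuration(
    "HIERARCHICAL",
    {
        # First layer holds the large chunks, second layer the smaller ones.
        "levelConfigurations": [{"maxTokens": 1500}, {"maxTokens": 300}],
        "overlapTokens": 60,
    },
)
semantic = build_chunking_configuration(
    "SEMANTIC",
    {"breakpointPercentileThreshold": 95, "bufferSize": 1, "maxTokens": 300},
)
no_chunking = build_chunking_configuration("NONE")
```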
customTransformationConfiguration (dict) --
A custom document transformer for parsed data source documents.
intermediateStorage (dict) -- [REQUIRED]
An S3 bucket path for input and output objects.
s3Location (dict) -- [REQUIRED]
An S3 bucket path.
uri (string) -- [REQUIRED]
The location's URI. For example, s3://my-bucket/chunk-processor/.
transformations (list) -- [REQUIRED]
A Lambda function that processes documents.
(dict) --
A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
stepToApply (string) -- [REQUIRED]
When the service applies the transformation.
transformationFunction (dict) -- [REQUIRED]
A Lambda function that processes documents.
transformationLambdaConfiguration (dict) -- [REQUIRED]
The Lambda function.
lambdaArn (string) -- [REQUIRED]
The function's ARN identifier.
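The nesting of customTransformationConfiguration is easy to get wrong, so a small builder can help. This is a sketch under assumptions: build_custom_transformation is a hypothetical helper, the Lambda ARN is a placeholder, and the s3:// prefix check is a local sanity check rather than a documented validation rule.

```python
from typing import Any, Dict


def build_custom_transformation(lambda_arn: str, s3_uri: str) -> Dict[str, Any]:
    """Return a customTransformationConfiguration dict (hypothetical helper)."""
    # Local sanity check only; the service performs its own validation.
    if not s3_uri.startswith("s3://"):
        raise ValueError("intermediate storage URI must start with s3://")
    return {
        "intermediateStorage": {"s3Location": {"uri": s3_uri}},
        "transformations": [
            {
                # POST_CHUNKING is the step value listed in the Response Syntax.
                "stepToApply": "POST_CHUNKING",
                "transformationFunction": {
                    "transformationLambdaConfiguration": {"lambdaArn": lambda_arn}
                },
            }
        ],
    }


transformation = build_custom_transformation(
    # Placeholder ARN for illustration.
    "arn:aws:lambda:us-east-1:111122223333:function:chunk-processor",
    "s3://my-bucket/chunk-processor/",
)
```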
parsingConfiguration (dict) --
Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
bedrockDataAutomationConfiguration (dict) --
If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including both text and images.
bedrockFoundationModelConfiguration (dict) --
If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
modelArn (string) -- [REQUIRED]
The ARN of the foundation model or inference profile to use for parsing.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including both text and images.
parsingPrompt (dict) --
Instructions for interpreting the contents of a document.
parsingPromptText (string) -- [REQUIRED]
Instructions for interpreting the contents of a document.
parsingStrategy (string) -- [REQUIRED]
The parsing strategy for the data source.
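The two parsing strategies each use a different settings object, as described above. The following sketch builds a parsingConfiguration for either strategy. The helper name, the model ARN, and the prompt text are illustrative assumptions.

```python
from typing import Any, Dict, Optional


def build_parsing_configuration(
    strategy: str,
    model_arn: Optional[str] = None,
    prompt_text: Optional[str] = None,
    multimodal: bool = False,
) -> Dict[str, Any]:
    """Return a parsingConfiguration dict (hypothetical helper)."""
    if strategy == "BEDROCK_FOUNDATION_MODEL":
        if not model_arn:
            # modelArn is marked [REQUIRED] inside bedrockFoundationModelConfiguration.
            raise ValueError("model_arn is required for BEDROCK_FOUNDATION_MODEL")
        model_config: Dict[str, Any] = {"modelArn": model_arn}
        if prompt_text:
            model_config["parsingPrompt"] = {"parsingPromptText": prompt_text}
        if multimodal:
            model_config["parsingModality"] = "MULTIMODAL"
        return {
            "parsingStrategy": strategy,
            "bedrockFoundationModelConfiguration": model_config,
        }
    if strategy == "BEDROCK_DATA_AUTOMATION":
        config: Dict[str, Any] = {"parsingStrategy": strategy}
        if multimodal:
            config["bedrockDataAutomationConfiguration"] = {
                "parsingModality": "MULTIMODAL"
            }
        return config
    raise ValueError(f"unknown parsing strategy: {strategy}")


fm_parsing = build_parsing_configuration(
    "BEDROCK_FOUNDATION_MODEL",
    # Placeholder ARN; substitute a model or inference profile you have access to.
    model_arn="arn:aws:bedrock:us-east-1::foundation-model/example-model",
    prompt_text="Transcribe tables as Markdown.",
    multimodal=True,
)
bda_parsing = build_parsing_configuration("BEDROCK_DATA_AUTOMATION")
```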
dict
Response Syntax
{ 'dataSource': { 'createdAt': datetime(2015, 1, 1), 'dataDeletionPolicy': 'RETAIN'|'DELETE', 'dataSourceConfiguration': { 'confluenceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'BASIC'|'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostType': 'SAAS', 'hostUrl': 'string' } }, 's3Configuration': { 'bucketArn': 'string', 'bucketOwnerAccountId': 'string', 'inclusionPrefixes': [ 'string', ] }, 'salesforceConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'hostUrl': 'string' } }, 'sharePointConfiguration': { 'crawlerConfiguration': { 'filterConfiguration': { 'patternObjectFilter': { 'filters': [ { 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'objectType': 'string' }, ] }, 'type': 'PATTERN' } }, 'sourceConfiguration': { 'authType': 'OAUTH2_CLIENT_CREDENTIALS'|'OAUTH2_SHAREPOINT_APP_ONLY_CLIENT_CREDENTIALS', 'credentialsSecretArn': 'string', 'domain': 'string', 'hostType': 'ONLINE', 'siteUrls': [ 'string', ], 'tenantId': 'string' } }, 'type': 'S3'|'WEB'|'CONFLUENCE'|'SALESFORCE'|'SHAREPOINT'|'CUSTOM'|'REDSHIFT_METADATA', 'webConfiguration': { 'crawlerConfiguration': { 'crawlerLimits': { 'maxPages': 123, 'rateLimit': 123 }, 'exclusionFilters': [ 'string', ], 'inclusionFilters': [ 'string', ], 'scope': 'HOST_ONLY'|'SUBDOMAINS', 'userAgent': 'string' }, 'sourceConfiguration': { 'urlConfiguration': { 'seedUrls': [ { 'url': 'string' }, ] } } } }, 'dataSourceId': 'string', 'description': 'string', 'failureReasons': [ 'string', ], 
'knowledgeBaseId': 'string', 'name': 'string', 'serverSideEncryptionConfiguration': { 'kmsKeyArn': 'string' }, 'status': 'AVAILABLE'|'DELETING'|'DELETE_UNSUCCESSFUL', 'updatedAt': datetime(2015, 1, 1), 'vectorIngestionConfiguration': { 'chunkingConfiguration': { 'chunkingStrategy': 'FIXED_SIZE'|'NONE'|'HIERARCHICAL'|'SEMANTIC', 'fixedSizeChunkingConfiguration': { 'maxTokens': 123, 'overlapPercentage': 123 }, 'hierarchicalChunkingConfiguration': { 'levelConfigurations': [ { 'maxTokens': 123 }, ], 'overlapTokens': 123 }, 'semanticChunkingConfiguration': { 'breakpointPercentileThreshold': 123, 'bufferSize': 123, 'maxTokens': 123 } }, 'customTransformationConfiguration': { 'intermediateStorage': { 's3Location': { 'uri': 'string' } }, 'transformations': [ { 'stepToApply': 'POST_CHUNKING', 'transformationFunction': { 'transformationLambdaConfiguration': { 'lambdaArn': 'string' } } }, ] }, 'parsingConfiguration': { 'bedrockDataAutomationConfiguration': { 'parsingModality': 'MULTIMODAL' }, 'bedrockFoundationModelConfiguration': { 'modelArn': 'string', 'parsingModality': 'MULTIMODAL', 'parsingPrompt': { 'parsingPromptText': 'string' } }, 'parsingStrategy': 'BEDROCK_FOUNDATION_MODEL'|'BEDROCK_DATA_AUTOMATION' } } } }
Response Structure
(dict) --
dataSource (dict) --
Contains details about the data source.
createdAt (datetime) --
The time at which the data source was created.
dataDeletionPolicy (string) --
The data deletion policy for the data source.
dataSourceConfiguration (dict) --
The connection configuration for the data source.
confluenceConfiguration (dict) --
The configuration information to connect to Confluence as your data source.
crawlerConfiguration (dict) --
The configuration of the Confluence content. For example, configuring specific types of Confluence content.
filterConfiguration (dict) --
The configuration of filtering the Confluence content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Confluence data source.
authType (string) --
The supported authentication type to authenticate and connect to your Confluence instance.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Confluence instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Confluence connection configuration.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
hostUrl (string) --
The Confluence host URL or instance URL.
s3Configuration (dict) --
The configuration information to connect to Amazon S3 as your data source.
bucketArn (string) --
The Amazon Resource Name (ARN) of the S3 bucket that contains your data.
bucketOwnerAccountId (string) --
The account ID for the owner of the S3 bucket.
inclusionPrefixes (list) --
A list of S3 prefixes to include certain files or content. For more information, see Organizing objects using prefixes.
(string) --
salesforceConfiguration (dict) --
The configuration information to connect to Salesforce as your data source.
crawlerConfiguration (dict) --
The configuration of the Salesforce content. For example, configuring specific types of Salesforce content.
filterConfiguration (dict) --
The configuration of filtering the Salesforce content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your Salesforce data source.
authType (string) --
The supported authentication type to authenticate and connect to your Salesforce instance.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your Salesforce instance URL. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see Salesforce connection configuration.
hostUrl (string) --
The Salesforce host URL or instance URL.
sharePointConfiguration (dict) --
The configuration information to connect to SharePoint as your data source.
crawlerConfiguration (dict) --
The configuration of the SharePoint content. For example, configuring specific types of SharePoint content.
filterConfiguration (dict) --
The configuration of filtering the SharePoint content. For example, configuring regular expression patterns to include or exclude certain content.
patternObjectFilter (dict) --
The configuration of filtering certain objects or content types of the data source.
filters (list) --
The configuration of specific filters applied to your data source content. You can filter out or include certain content.
(dict) --
The specific filters applied to your data source content. You can filter out or include certain content.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain object types that adhere to the pattern. If you specify an inclusion and exclusion filter/pattern and both match a document, the exclusion filter takes precedence and the document isn’t crawled.
(string) --
objectType (string) --
The supported object type or content type of the data source.
type (string) --
The type of filtering that you want to apply to certain objects or content of the data source. For example, the PATTERN type is regular expression patterns you can apply to filter your content.
sourceConfiguration (dict) --
The endpoint information to connect to your SharePoint data source.
authType (string) --
The supported authentication type to authenticate and connect to your SharePoint site or sites.
credentialsSecretArn (string) --
The Amazon Resource Name (ARN) of a Secrets Manager secret that stores your authentication credentials for your SharePoint site or sites. For more information on the key-value pairs that must be included in your secret, depending on your authentication type, see SharePoint connection configuration.
domain (string) --
The domain of your SharePoint instance or site URLs.
hostType (string) --
The supported host type, whether online/cloud or server/on-premises.
siteUrls (list) --
A list of one or more SharePoint site URLs.
(string) --
tenantId (string) --
The identifier of your Microsoft 365 tenant.
type (string) --
The type of data source.
webConfiguration (dict) --
The configuration of web URLs to crawl for your data source. You should be authorized to crawl the URLs.
crawlerConfiguration (dict) --
The Web Crawler configuration details for the web data source.
crawlerLimits (dict) --
The configuration of crawl limits for the web URLs.
maxPages (integer) --
The maximum number of web pages to crawl from your source URLs, up to 25,000 pages. If the number of web pages exceeds this limit, the data source sync fails and no web pages are ingested.
rateLimit (integer) --
The maximum rate at which pages are crawled, up to 300 pages per minute per host.
exclusionFilters (list) --
A list of one or more exclusion regular expression patterns to exclude certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
inclusionFilters (list) --
A list of one or more inclusion regular expression patterns to include certain URLs. If you specify an inclusion and exclusion filter/pattern and both match a URL, the exclusion filter takes precedence and the web content of the URL isn’t crawled.
(string) --
scope (string) --
The scope of what is crawled for your URLs.
You can choose to crawl only web pages that belong to the same host or primary domain. For example, only web pages whose URLs contain the seed URL "https://docs.aws.amazon.com/bedrock/latest/userguide/" are crawled, and no other domains. You can also choose to include subdomains in addition to the host or primary domain. For example, a crawl of web pages that contain "aws.amazon.com" can also include the subdomain "docs.aws.amazon.com".
userAgent (string) --
A string used for identifying the crawler or a bot when it accesses a web server. By default, this is set to bedrockbot_UUID for your crawler. You can optionally append a custom string to bedrockbot_UUID to allowlist a specific user agent permitted to access your source URLs.
sourceConfiguration (dict) --
The source configuration details for the web data source.
urlConfiguration (dict) --
The configuration of the seed URL or URLs.
seedUrls (list) --
One or more seed or starting point URLs.
(dict) --
The seed or starting point URL. You should be authorized to crawl the URL.
url (string) --
A seed or starting point URL.
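The precedence rule for the Web Crawler's inclusionFilters and exclusionFilters (exclusion wins when both match) can be illustrated locally. This sketch is not the service's implementation. It assumes Python's re dialect with search semantics, and it assumes that a URL matching no inclusion pattern is skipped whenever inclusion patterns are present. The example.com URLs are placeholders.

```python
import re
from typing import Sequence


def is_crawled(
    url: str,
    inclusion_filters: Sequence[str] = (),
    exclusion_filters: Sequence[str] = (),
) -> bool:
    """Approximate the documented filter precedence for a single URL."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, url) for pattern in exclusion_filters):
        return False
    # Assumption: when inclusion patterns exist, the URL must match at least one.
    if inclusion_filters:
        return any(re.search(pattern, url) for pattern in inclusion_filters)
    # No filters apply, so the URL is eligible for crawling.
    return True


inclusion = [r".*/userguide/.*"]
exclusion = [r".*\.pdf$"]

html_page = is_crawled("https://example.com/userguide/intro.html", inclusion, exclusion)
pdf_page = is_crawled("https://example.com/userguide/guide.pdf", inclusion, exclusion)
blog_page = is_crawled("https://example.com/blog/post.html", inclusion, exclusion)
```

Here the PDF under /userguide/ matches both lists and is therefore skipped, which is the case the field descriptions call out.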
dataSourceId (string) --
The unique identifier of the data source.
description (string) --
The description of the data source.
failureReasons (list) --
The detailed reasons for the failure to delete the data source.
(string) --
knowledgeBaseId (string) --
The unique identifier of the knowledge base to which the data source belongs.
name (string) --
The name of the data source.
serverSideEncryptionConfiguration (dict) --
Contains details about the configuration of the server-side encryption.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key used to encrypt the resource.
status (string) --
The status of the data source. The following statuses are possible:
Available – The data source has been created and is ready for ingestion into the knowledge base.
Deleting – The data source is being deleted.
Delete unsuccessful – The attempt to delete the data source failed. See failureReasons for details.
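A caller will typically branch on the status value and consult failureReasons when deletion did not succeed. The sketch below works on a plain dict shaped like the Response Syntax. The helper name, the sample response, and the message wording are illustrative assumptions.

```python
from typing import Any, Dict


def describe_data_source_status(response: Dict[str, Any]) -> str:
    """Turn the status fields of a response into a one-line message."""
    data_source = response["dataSource"]
    name = data_source.get("name") or data_source.get("dataSourceId", "<unknown>")
    status = data_source["status"]
    if status == "AVAILABLE":
        return f"{name}: ready for ingestion"
    if status == "DELETING":
        return f"{name}: deletion in progress"
    if status == "DELETE_UNSUCCESSFUL":
        reasons = "; ".join(data_source.get("failureReasons", []))
        return f"{name}: deletion failed ({reasons or 'no reason reported'})"
    # Guard against status values this sketch does not know about.
    return f"{name}: unrecognized status {status!r}"


# Hand-written sample shaped like the Response Syntax, not real service output.
sample_response = {
    "dataSource": {
        "dataSourceId": "DS12345678",
        "name": "docs-crawler",
        "status": "DELETE_UNSUCCESSFUL",
        "failureReasons": ["Vector store could not be reached"],
    }
}
message = describe_data_source_status(sample_response)
```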
updatedAt (datetime) --
The time at which the data source was last updated.
vectorIngestionConfiguration (dict) --
Contains details about how to ingest the documents in the data source.
chunkingConfiguration (dict) --
Details about how to chunk the documents in the data source. A chunk refers to an excerpt from a data source that is returned when the knowledge base that it belongs to is queried.
chunkingStrategy (string) --
The strategy that the knowledge base uses to split your source data into chunks. You have the following options for chunking your data:
FIXED_SIZE – Amazon Bedrock splits your source data into chunks of the approximate size that you set in the fixedSizeChunkingConfiguration.
HIERARCHICAL – Split documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
SEMANTIC – Split documents into chunks based on groups of similar content derived with natural language processing.
NONE – Amazon Bedrock treats each file as one chunk. If you choose this option, you may want to pre-process your documents by splitting them into separate files.
fixedSizeChunkingConfiguration (dict) --
Configurations for when you choose fixed-size chunking. If you set the chunkingStrategy as NONE, exclude this field.
maxTokens (integer) --
The maximum number of tokens to include in a chunk.
overlapPercentage (integer) --
The percentage of overlap between adjacent chunks of a data source.
hierarchicalChunkingConfiguration (dict) --
Settings for hierarchical document chunking for a data source. Hierarchical chunking splits documents into layers of chunks where the first layer contains large chunks, and the second layer contains smaller chunks derived from the first layer.
levelConfigurations (list) --
Token settings for each layer.
(dict) --
Token settings for a layer in a hierarchical chunking configuration.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain in this layer.
overlapTokens (integer) --
The number of tokens to repeat across chunks in the same layer.
semanticChunkingConfiguration (dict) --
Settings for semantic document chunking for a data source. Semantic chunking splits a document into smaller documents based on groups of similar content derived from the text with natural language processing.
breakpointPercentileThreshold (integer) --
The dissimilarity threshold for splitting chunks.
bufferSize (integer) --
The buffer size.
maxTokens (integer) --
The maximum number of tokens that a chunk can contain.
customTransformationConfiguration (dict) --
A custom document transformer for parsed data source documents.
intermediateStorage (dict) --
An S3 bucket path for input and output objects.
s3Location (dict) --
An S3 bucket path.
uri (string) --
The location's URI. For example, s3://my-bucket/chunk-processor/.
transformations (list) --
A Lambda function that processes documents.
(dict) --
A custom processing step for documents moving through a data source ingestion pipeline. To process documents after they have been converted into chunks, set the step to apply to POST_CHUNKING.
stepToApply (string) --
When the service applies the transformation.
transformationFunction (dict) --
A Lambda function that processes documents.
transformationLambdaConfiguration (dict) --
The Lambda function.
lambdaArn (string) --
The function's ARN identifier.
parsingConfiguration (dict) --
Configurations for a parser to use for parsing documents in your data source. If you exclude this field, the default parser will be used.
bedrockDataAutomationConfiguration (dict) --
If you specify BEDROCK_DATA_AUTOMATION as the parsing strategy for ingesting your data source, use this object to modify configurations for using the Amazon Bedrock Data Automation parser.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including both text and images.
bedrockFoundationModelConfiguration (dict) --
If you specify BEDROCK_FOUNDATION_MODEL as the parsing strategy for ingesting your data source, use this object to modify configurations for using a foundation model to parse documents.
modelArn (string) --
The ARN of the foundation model or inference profile to use for parsing.
parsingModality (string) --
Specifies whether to enable parsing of multimodal data, including both text and images.
parsingPrompt (dict) --
Instructions for interpreting the contents of a document.
parsingPromptText (string) --
Instructions for interpreting the contents of a document.
parsingStrategy (string) --
The parsing strategy for the data source.
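Because vectorIngestionConfiguration and its children may be absent from a response, reading them defensively avoids KeyError. A minimal sketch follows, assuming a response dict shaped like the Response Syntax. The helper name is hypothetical, and a return value of None only means the field was not present in the response.

```python
from typing import Any, Dict, Optional, Tuple


def summarize_ingestion(response: Dict[str, Any]) -> Tuple[Optional[str], Optional[str]]:
    """Return (chunkingStrategy, parsingStrategy), using None for absent fields."""
    ingestion = response.get("dataSource", {}).get("vectorIngestionConfiguration", {})
    chunking = ingestion.get("chunkingConfiguration", {}).get("chunkingStrategy")
    parsing = ingestion.get("parsingConfiguration", {}).get("parsingStrategy")
    return chunking, parsing


# Hand-written sample shaped like the Response Syntax, not real service output.
sample = {
    "dataSource": {
        "vectorIngestionConfiguration": {
            "chunkingConfiguration": {"chunkingStrategy": "SEMANTIC"},
        }
    }
}
strategies = summarize_ingestion(sample)
```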