AWSKendraFrontendService

2021/12/01 - AWSKendraFrontendService - 12 new, 4 updated API methods

Changes  Experience Builder allows customers to build search applications without writing code. Analytics Dashboard provides quality and usability metrics for Kendra indexes. Custom Document Enrichment allows customers to build a custom ingestion pipeline to pre-process documents and generate metadata.

CreateExperience (new)

Creates an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.create_experience(
    Name='string',
    IndexId='string',
    RoleArn='string',
    Configuration={
        'ContentSourceConfiguration': {
            'DataSourceIds': [
                'string',
            ],
            'FaqIds': [
                'string',
            ],
            'DirectPutContent': True|False
        },
        'UserIdentityConfiguration': {
            'IdentityAttributeName': 'string'
        }
    },
    Description='string',
    ClientToken='string'
)
type Name:

string

param Name:

[REQUIRED]

A name for your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of a role with permission to access Query operations, QuerySuggestions operations, SubmitFeedback operations, and Amazon Web Services SSO that stores your user and group information. For more information, see IAM roles for Amazon Kendra.

type Configuration:

dict

param Configuration:

Provides the configuration information for your Amazon Kendra experience. This includes ContentSourceConfiguration, which specifies the data source IDs and/or FAQ IDs, and UserIdentityConfiguration, which specifies the user or group information to grant access to your Amazon Kendra experience.

  • ContentSourceConfiguration (dict) --

    The identifiers of your data sources and FAQs. Or, you can specify that you want to use documents indexed via the BatchPutDocument operation. This is the content you want to use for your Amazon Kendra experience.

    • DataSourceIds (list) --

      The identifiers of the data sources you want to use for your Amazon Kendra experience.

      • (string) --

    • FaqIds (list) --

      The identifiers of the FAQs that you want to use for your Amazon Kendra experience.

      • (string) --

    • DirectPutContent (boolean) --

      TRUE to use documents you indexed directly using the BatchPutDocument operation.

  • UserIdentityConfiguration (dict) --

    The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails.

    • IdentityAttributeName (string) --

      The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails. This is used for user context filtering and for granting access to your Amazon Kendra experience. You must set up Amazon Web Services SSO with Amazon Kendra. You must include your users and groups in your Access Control List when you ingest documents into your index. For more information, see Getting started with an Amazon Web Services SSO identity source.

type Description:

string

param Description:

A description for your Amazon Kendra experience.

type ClientToken:

string

param ClientToken:

A token that you provide to identify the request to create your Amazon Kendra experience. Multiple calls to the CreateExperience operation with the same client token create only one Amazon Kendra experience.

This field is autopopulated if not provided.

rtype:

dict

returns:

Response Syntax

{
    'Id': 'string'
}

Response Structure

  • (dict) --

    • Id (string) --

      The identifier for your created Amazon Kendra experience.
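
The request shape above can be assembled programmatically. The following is a minimal sketch, not an official helper: `create_search_experience` is a hypothetical name, and `client` is assumed to be a boto3 Amazon Kendra client (or any object exposing a compatible `create_experience` method). Optional parameters are sent only when a value is supplied, and ClientToken is left out because it is autopopulated.

```python
def create_search_experience(client, name, index_id, data_source_ids=None,
                             faq_ids=None, identity_attribute=None,
                             role_arn=None, description=None):
    """Call CreateExperience and return the new experience Id.

    Hypothetical helper: only Name and IndexId are always sent; every
    optional field is included only when the caller provides it.
    """
    content = {}
    if data_source_ids:
        content["DataSourceIds"] = list(data_source_ids)
    if faq_ids:
        content["FaqIds"] = list(faq_ids)

    configuration = {}
    if content:
        configuration["ContentSourceConfiguration"] = content
    if identity_attribute:
        configuration["UserIdentityConfiguration"] = {
            "IdentityAttributeName": identity_attribute
        }

    kwargs = {"Name": name, "IndexId": index_id}  # the two required parameters
    if configuration:
        kwargs["Configuration"] = configuration
    if role_arn:
        kwargs["RoleArn"] = role_arn
    if description:
        kwargs["Description"] = description

    response = client.create_experience(**kwargs)
    return response["Id"]
```

With boto3 installed and credentials configured, `client` would typically come from `boto3.client("kendra")`.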

DisassociateEntitiesFromExperience (new)

Prevents users or groups in your Amazon Web Services SSO identity source from accessing your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.disassociate_entities_from_experience(
    Id='string',
    IndexId='string',
    EntityList=[
        {
            'EntityId': 'string',
            'EntityType': 'USER'|'GROUP'
        },
    ]
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type EntityList:

list

param EntityList:

[REQUIRED]

Lists users or groups in your Amazon Web Services SSO identity source.

  • (dict) --

    Provides the configuration information of users or groups in your Amazon Web Services SSO identity source to grant access to your Amazon Kendra experience.

    • EntityId (string) -- [REQUIRED]

      The identifier of a user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

    • EntityType (string) -- [REQUIRED]

      Specifies whether you are configuring a User or a Group.

rtype:

dict

returns:

Response Syntax

{
    'FailedEntityList': [
        {
            'EntityId': 'string',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • FailedEntityList (list) --

      Lists the users or groups in your Amazon Web Services SSO identity source that failed to properly remove access to your Amazon Kendra experience.

      • (dict) --

        Information on the users or groups in your Amazon Web Services SSO identity source that failed to properly configure with your Amazon Kendra experience.

        • EntityId (string) --

          The identifier of the user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • ErrorMessage (string) --

          The reason the user or group in your Amazon Web Services SSO identity source failed to properly configure with your Amazon Kendra experience.
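
Because the call reports per-entity failures in FailedEntityList instead of failing as a whole, callers usually need to separate the entities that were removed from those that were not. A minimal sketch, assuming `client` is a boto3 Amazon Kendra client; `revoke_access` is a hypothetical helper name:

```python
def revoke_access(client, experience_id, index_id, entities):
    """Remove access for (entity_id, entity_type) pairs.

    entity_type must be 'USER' or 'GROUP'. Returns a tuple
    (removed_ids, failed) where failed maps EntityId -> ErrorMessage.
    """
    entity_list = [
        {"EntityId": entity_id, "EntityType": entity_type}
        for entity_id, entity_type in entities
    ]
    response = client.disassociate_entities_from_experience(
        Id=experience_id, IndexId=index_id, EntityList=entity_list
    )
    # Treat a missing FailedEntityList as "no failures".
    failed = {
        item["EntityId"]: item.get("ErrorMessage", "")
        for item in response.get("FailedEntityList", [])
    }
    removed = [e["EntityId"] for e in entity_list if e["EntityId"] not in failed]
    return removed, failed
```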

GetSnapshots (new)

Retrieves search metrics data. The data provides a snapshot of how your users interact with your search application and how effective the application is.

See also: AWS API Documentation

Request Syntax

client.get_snapshots(
    IndexId='string',
    Interval='THIS_MONTH'|'THIS_WEEK'|'ONE_WEEK_AGO'|'TWO_WEEKS_AGO'|'ONE_MONTH_AGO'|'TWO_MONTHS_AGO',
    MetricType='QUERIES_BY_COUNT'|'QUERIES_BY_ZERO_CLICK_RATE'|'QUERIES_BY_ZERO_RESULT_RATE'|'DOCS_BY_CLICK_COUNT'|'AGG_QUERY_DOC_METRICS'|'TREND_QUERY_DOC_METRICS',
    NextToken='string',
    MaxResults=123
)
type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index to get search metrics data.

type Interval:

string

param Interval:

[REQUIRED]

The time interval or time window to get search metrics data. The time interval uses the time zone of your index. You can view data in the following time windows:

  • THIS_WEEK: The current week, starting on the Sunday and ending on the day before the current date.

  • ONE_WEEK_AGO: The previous week, starting on the Sunday and ending on the following Saturday.

  • TWO_WEEKS_AGO: The week before the previous week, starting on the Sunday and ending on the following Saturday.

  • THIS_MONTH: The current month, starting on the first day of the month and ending on the day before the current date.

  • ONE_MONTH_AGO: The previous month, starting on the first day of the month and ending on the last day of the month.

  • TWO_MONTHS_AGO: The month before the previous month, starting on the first day of the month and ending on the last day of the month.

type MetricType:

string

param MetricType:

[REQUIRED]

The metric you want to retrieve. You can specify only one metric per call.

For more information about the metrics you can view, see Gaining insights with search analytics.

type NextToken:

string

param NextToken:

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of search metrics data.

type MaxResults:

integer

param MaxResults:

The maximum number of results to return for the metric.

rtype:

dict

returns:

Response Syntax

{
    'SnapShotTimeFilter': {
        'StartTime': datetime(2015, 1, 1),
        'EndTime': datetime(2015, 1, 1)
    },
    'SnapshotsDataHeader': [
        'string',
    ],
    'SnapshotsData': [
        [
            'string',
        ],
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • SnapShotTimeFilter (dict) --

      The date-time for the beginning and end of the time window for the search metrics data.

      • StartTime (datetime) --

        The UNIX datetime of the beginning of the time range.

      • EndTime (datetime) --

        The UNIX datetime of the end of the time range.

    • SnapshotsDataHeader (list) --

      The column headers for the search metrics data.

      • (string) --

    • SnapshotsData (list) --

      The search metrics data. The data returned depends on the metric type you requested.

      • (list) --

        • (string) --

    • NextToken (string) --

      If the response is truncated, Amazon Kendra returns this token, which you can use in a later request to retrieve the next set of search metrics data.
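
SnapshotsData is a list of rows and SnapshotsDataHeader names the columns, so the two are easiest to consume when zipped together while following NextToken. A minimal sketch, assuming `client` is a boto3 Amazon Kendra client; `fetch_snapshot_rows` is a hypothetical helper, and reusing the most recent non-empty header for later pages is an assumption about paging behaviour that the text above does not confirm.

```python
def fetch_snapshot_rows(client, index_id, interval, metric_type, page_size=None):
    """Collect all pages of GetSnapshots as a list of {column: value} dicts."""
    kwargs = {"IndexId": index_id, "Interval": interval, "MetricType": metric_type}
    if page_size is not None:
        kwargs["MaxResults"] = page_size

    header = []
    rows = []
    while True:
        page = client.get_snapshots(**kwargs)
        # Assumption: if a page omits the header, keep using the last one seen.
        header = page.get("SnapshotsDataHeader") or header
        for values in page.get("SnapshotsData", []):
            rows.append(dict(zip(header, values)))
        token = page.get("NextToken")
        if not token:
            return rows
        kwargs["NextToken"] = token  # request the next set of metrics data
```

A call such as `fetch_snapshot_rows(client, index_id, "THIS_WEEK", "QUERIES_BY_COUNT")` requests one metric for one time window, matching the one-metric-per-call rule.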

ListEntityPersonas (new)

Lists specific permissions of users and groups with access to your Amazon Kendra experience.

See also: AWS API Documentation

Request Syntax

client.list_entity_personas(
    Id='string',
    IndexId='string',
    NextToken='string',
    MaxResults=123
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type NextToken:

string

param NextToken:

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of users or groups.

type MaxResults:

integer

param MaxResults:

The maximum number of returned users or groups.

rtype:

dict

returns:

Response Syntax

{
    'SummaryItems': [
        {
            'EntityId': 'string',
            'Persona': 'OWNER'|'VIEWER',
            'CreatedAt': datetime(2015, 1, 1),
            'UpdatedAt': datetime(2015, 1, 1)
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • SummaryItems (list) --

      An array of summary information for one or more users or groups.

      • (dict) --

        Summary information for users or groups in your Amazon Web Services SSO identity source. This applies to users and groups with specific permissions that define their level of access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

        • EntityId (string) --

          The identifier of a user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • Persona (string) --

          The persona that defines the specific permissions of the user or group in your Amazon Web Services SSO identity source. The available personas or access roles are Owner and Viewer. For more information on these personas, see Providing access to your search page.

        • CreatedAt (datetime) --

          The date-time the summary information was created.

        • UpdatedAt (datetime) --

          The date-time the summary information was last updated.

    • NextToken (string) --

      If the response is truncated, Amazon Kendra returns this token, which you can use in a later request to retrieve the next set of users or groups.
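
A common use of this listing is to answer "who are the owners and who are the viewers?". The sketch below pages through the results and groups entity IDs by persona. It assumes `client` is a boto3 Amazon Kendra client; `entities_by_persona` is a hypothetical helper name.

```python
from collections import defaultdict


def entities_by_persona(client, experience_id, index_id):
    """Return {persona: [entity_id, ...]} across all result pages."""
    grouped = defaultdict(list)
    kwargs = {"Id": experience_id, "IndexId": index_id}
    while True:
        page = client.list_entity_personas(**kwargs)
        for item in page.get("SummaryItems", []):
            grouped[item["Persona"]].append(item["EntityId"])
        token = page.get("NextToken")
        if not token:
            return dict(grouped)
        kwargs["NextToken"] = token  # continue with the next page
```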

AssociateEntitiesToExperience (new)

Grants users or groups in your Amazon Web Services SSO identity source access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.associate_entities_to_experience(
    Id='string',
    IndexId='string',
    EntityList=[
        {
            'EntityId': 'string',
            'EntityType': 'USER'|'GROUP'
        },
    ]
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type EntityList:

list

param EntityList:

[REQUIRED]

Lists users or groups in your Amazon Web Services SSO identity source.

  • (dict) --

    Provides the configuration information of users or groups in your Amazon Web Services SSO identity source to grant access to your Amazon Kendra experience.

    • EntityId (string) -- [REQUIRED]

      The identifier of a user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

    • EntityType (string) -- [REQUIRED]

      Specifies whether you are configuring a User or a Group.

rtype:

dict

returns:

Response Syntax

{
    'FailedEntityList': [
        {
            'EntityId': 'string',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • FailedEntityList (list) --

      Lists the users or groups in your Amazon Web Services SSO identity source that failed to properly configure with your Amazon Kendra experience.

      • (dict) --

        Information on the users or groups in your Amazon Web Services SSO identity source that failed to properly configure with your Amazon Kendra experience.

        • EntityId (string) --

          The identifier of the user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • ErrorMessage (string) --

          The reason the user or group in your Amazon Web Services SSO identity source failed to properly configure with your Amazon Kendra experience.

UpdateExperience (new)

Updates your Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.update_experience(
    Id='string',
    Name='string',
    IndexId='string',
    RoleArn='string',
    Configuration={
        'ContentSourceConfiguration': {
            'DataSourceIds': [
                'string',
            ],
            'FaqIds': [
                'string',
            ],
            'DirectPutContent': True|False
        },
        'UserIdentityConfiguration': {
            'IdentityAttributeName': 'string'
        }
    },
    Description='string'
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience you want to update.

type Name:

string

param Name:

The name of your Amazon Kendra experience you want to update.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience you want to update.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of a role with permission to access Query operations, QuerySuggestions operations, SubmitFeedback operations, and Amazon Web Services SSO that stores your user and group information. For more information, see IAM roles for Amazon Kendra.

type Configuration:

dict

param Configuration:

Provides the user configuration information. This includes the Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails.

  • ContentSourceConfiguration (dict) --

    The identifiers of your data sources and FAQs. Or, you can specify that you want to use documents indexed via the BatchPutDocument operation. This is the content you want to use for your Amazon Kendra experience.

    • DataSourceIds (list) --

      The identifiers of the data sources you want to use for your Amazon Kendra experience.

      • (string) --

    • FaqIds (list) --

      The identifiers of the FAQs that you want to use for your Amazon Kendra experience.

      • (string) --

    • DirectPutContent (boolean) --

      TRUE to use documents you indexed directly using the BatchPutDocument operation.

  • UserIdentityConfiguration (dict) --

    The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails.

    • IdentityAttributeName (string) --

      The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails. This is used for user context filtering and for granting access to your Amazon Kendra experience. You must set up Amazon Web Services SSO with Amazon Kendra. You must include your users and groups in your Access Control List when you ingest documents into your index. For more information, see Getting started with an Amazon Web Services SSO identity source.

type Description:

string

param Description:

The description of your Amazon Kendra experience you want to update.

returns:

None
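
Since UpdateExperience returns nothing and takes the whole Configuration structure, one cautious way to change a single field is to read the current configuration with DescribeExperience (documented later in this changelog), modify a copy, and send it back. This is a sketch under a stated assumption: the text above does not say whether the Configuration you send is merged with or replaces the stored one, so the sketch sends the complete structure to be safe. `add_data_source` is a hypothetical helper and `client` is assumed to be a boto3 Amazon Kendra client.

```python
import copy


def add_data_source(client, experience_id, index_id, data_source_id):
    """Add one data source ID to an experience; return True if an update was sent."""
    current = client.describe_experience(Id=experience_id, IndexId=index_id)
    # Work on a copy so the describe response is left untouched.
    configuration = copy.deepcopy(current.get("Configuration", {}))
    content = configuration.setdefault("ContentSourceConfiguration", {})
    ids = content.setdefault("DataSourceIds", [])
    if data_source_id in ids:
        return False  # already configured, nothing to send
    ids.append(data_source_id)
    # Assumption: sending the full Configuration back is acceptable.
    client.update_experience(
        Id=experience_id, IndexId=index_id, Configuration=configuration
    )
    return True
```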

AssociatePersonasToEntities (new)

Defines the specific permissions of users or groups in your Amazon Web Services SSO identity source with access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.associate_personas_to_entities(
    Id='string',
    IndexId='string',
    Personas=[
        {
            'EntityId': 'string',
            'Persona': 'OWNER'|'VIEWER'
        },
    ]
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type Personas:

list

param Personas:

[REQUIRED]

The personas that define the specific permissions of users or groups in your Amazon Web Services SSO identity source. The available personas or access roles are Owner and Viewer. For more information on these personas, see Providing access to your search page.

  • (dict) --

    Provides the configuration information of users or groups in your Amazon Web Services SSO identity source for access to your Amazon Kendra experience. Specific permissions are defined for each user or group once they are granted access to your Amazon Kendra experience.

    • EntityId (string) -- [REQUIRED]

      The identifier of a user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

    • Persona (string) -- [REQUIRED]

      The persona that defines the specific permissions of the user or group in your Amazon Web Services SSO identity source. The available personas or access roles are Owner and Viewer. For more information on these personas, see Providing access to your search page.

rtype:

dict

returns:

Response Syntax

{
    'FailedEntityList': [
        {
            'EntityId': 'string',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • FailedEntityList (list) --

      Lists the users or groups in your Amazon Web Services SSO identity source that failed to properly configure with your Amazon Kendra experience.

      • (dict) --

        Information on the users or groups in your Amazon Web Services SSO identity source that failed to properly configure with your Amazon Kendra experience.

        • EntityId (string) --

          The identifier of the user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • ErrorMessage (string) --

          The reason the user or group in your Amazon Web Services SSO identity source failed to properly configure with your Amazon Kendra experience.
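
The Personas list pairs each entity with exactly one of the two documented personas, which maps naturally onto a dictionary. A minimal sketch, assuming `client` is a boto3 Amazon Kendra client; `assign_personas` and `VALID_PERSONAS` are hypothetical names. Values are checked locally before the request is made.

```python
VALID_PERSONAS = {"OWNER", "VIEWER"}  # the two values listed in the request syntax


def assign_personas(client, experience_id, index_id, persona_by_entity):
    """Send {entity_id: persona} as a Personas list; return FailedEntityList."""
    invalid = sorted({p for p in persona_by_entity.values() if p not in VALID_PERSONAS})
    if invalid:
        raise ValueError(f"unsupported personas: {invalid}")
    personas = [
        {"EntityId": entity_id, "Persona": persona}
        for entity_id, persona in persona_by_entity.items()
    ]
    response = client.associate_personas_to_entities(
        Id=experience_id, IndexId=index_id, Personas=personas
    )
    return response.get("FailedEntityList", [])
```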

DescribeExperience (new)

Gets information about your Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.describe_experience(
    Id='string',
    IndexId='string'
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience you want to get information on.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience you want to get information on.

rtype:

dict

returns:

Response Syntax

{
    'Id': 'string',
    'IndexId': 'string',
    'Name': 'string',
    'Endpoints': [
        {
            'EndpointType': 'HOME',
            'Endpoint': 'string'
        },
    ],
    'Configuration': {
        'ContentSourceConfiguration': {
            'DataSourceIds': [
                'string',
            ],
            'FaqIds': [
                'string',
            ],
            'DirectPutContent': True|False
        },
        'UserIdentityConfiguration': {
            'IdentityAttributeName': 'string'
        }
    },
    'CreatedAt': datetime(2015, 1, 1),
    'UpdatedAt': datetime(2015, 1, 1),
    'Description': 'string',
    'Status': 'CREATING'|'ACTIVE'|'DELETING'|'FAILED',
    'RoleArn': 'string',
    'ErrorMessage': 'string'
}

Response Structure

  • (dict) --

    • Id (string) --

      Shows the identifier of your Amazon Kendra experience.

    • IndexId (string) --

      Shows the identifier of the index for your Amazon Kendra experience.

    • Name (string) --

      Shows the name of your Amazon Kendra experience.

    • Endpoints (list) --

      Shows the endpoint URLs for your Amazon Kendra experiences. The URLs are unique and fully hosted by Amazon Web Services.

      • (dict) --

        Provides the configuration information of the endpoint for your Amazon Kendra experience.

        • EndpointType (string) --

          The type of endpoint for your Amazon Kendra experience. The type currently available is HOME, which is a unique and fully hosted URL to the home page of your Amazon Kendra experience.

        • Endpoint (string) --

          The endpoint of your Amazon Kendra experience.

    • Configuration (dict) --

      Shows the configuration information for your Amazon Kendra experience. This includes ContentSourceConfiguration, which specifies the data source IDs and/or FAQ IDs, and UserIdentityConfiguration, which specifies the user or group information to grant access to your Amazon Kendra experience.

      • ContentSourceConfiguration (dict) --

        The identifiers of your data sources and FAQs. Or, you can specify that you want to use documents indexed via the BatchPutDocument operation. This is the content you want to use for your Amazon Kendra experience.

        • DataSourceIds (list) --

          The identifiers of the data sources you want to use for your Amazon Kendra experience.

          • (string) --

        • FaqIds (list) --

          The identifiers of the FAQs that you want to use for your Amazon Kendra experience.

          • (string) --

        • DirectPutContent (boolean) --

          TRUE to use documents you indexed directly using the BatchPutDocument operation.

      • UserIdentityConfiguration (dict) --

        The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails.

        • IdentityAttributeName (string) --

          The Amazon Web Services SSO field name that contains the identifiers of your users, such as their emails. This is used for user context filtering and for granting access to your Amazon Kendra experience. You must set up Amazon Web Services SSO with Amazon Kendra. You must include your users and groups in your Access Control List when you ingest documents into your index. For more information, see Getting started with an Amazon Web Services SSO identity source.

    • CreatedAt (datetime) --

      Shows the date-time your Amazon Kendra experience was created.

    • UpdatedAt (datetime) --

      Shows the date-time your Amazon Kendra experience was last updated.

    • Description (string) --

      Shows the description for your Amazon Kendra experience.

    • Status (string) --

      The current processing status of your Amazon Kendra experience. When the status is ACTIVE, your Amazon Kendra experience is ready to use. When the status is FAILED, the ErrorMessage field contains the reason for the failure.

    • RoleArn (string) --

      Shows the Amazon Resource Name (ARN) of a role with permission to access Query operations, QuerySuggestions operations, SubmitFeedback operations, and Amazon Web Services SSO that stores your user and group information.

    • ErrorMessage (string) --

      The reason your Amazon Kendra experience could not be processed.
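
The Status and ErrorMessage fields make it possible to wait for a newly created experience to become usable. The following is a polling sketch, not a built-in waiter: `wait_until_active`, its default delay, and its attempt limit are all hypothetical choices, and `client` is assumed to be a boto3 Amazon Kendra client.

```python
import time


def wait_until_active(client, experience_id, index_id,
                      delay_seconds=5.0, max_attempts=60, sleep=time.sleep):
    """Poll DescribeExperience until Status is ACTIVE and return the response.

    Raises RuntimeError on any status other than CREATING or ACTIVE
    (FAILED or DELETING), and TimeoutError if the attempt limit is reached.
    """
    for _ in range(max_attempts):
        info = client.describe_experience(Id=experience_id, IndexId=index_id)
        status = info["Status"]
        if status == "ACTIVE":
            return info
        if status != "CREATING":
            message = info.get("ErrorMessage", "no error message returned")
            raise RuntimeError(f"{status}: {message}")
        sleep(delay_seconds)
    raise TimeoutError(
        f"experience {experience_id} not ACTIVE after {max_attempts} checks"
    )
```

The `sleep` parameter exists only so the delay can be replaced in tests.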

ListExperiences (new)

Lists one or more Amazon Kendra experiences. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.list_experiences(
    IndexId='string',
    NextToken='string',
    MaxResults=123
)
type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type NextToken:

string

param NextToken:

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of Amazon Kendra experiences.

type MaxResults:

integer

param MaxResults:

The maximum number of returned Amazon Kendra experiences.

rtype:

dict

returns:

Response Syntax

{
    'SummaryItems': [
        {
            'Name': 'string',
            'Id': 'string',
            'CreatedAt': datetime(2015, 1, 1),
            'Status': 'CREATING'|'ACTIVE'|'DELETING'|'FAILED',
            'Endpoints': [
                {
                    'EndpointType': 'HOME',
                    'Endpoint': 'string'
                },
            ]
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • SummaryItems (list) --

      An array of summary information for one or more Amazon Kendra experiences.

      • (dict) --

        Summary information for your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

        • Name (string) --

          The name of your Amazon Kendra experience.

        • Id (string) --

          The identifier of your Amazon Kendra experience.

        • CreatedAt (datetime) --

          The date-time your Amazon Kendra experience was created.

        • Status (string) --

          The processing status of your Amazon Kendra experience.

        • Endpoints (list) --

          The endpoint URLs for your Amazon Kendra experiences. The URLs are unique and fully hosted by Amazon Web Services.

          • (dict) --

            Provides the configuration information of the endpoint for your Amazon Kendra experience.

            • EndpointType (string) --

              The type of endpoint for your Amazon Kendra experience. The type currently available is HOME, which is a unique and fully hosted URL to the home page of your Amazon Kendra experience.

            • Endpoint (string) --

              The endpoint of your Amazon Kendra experience.

    • NextToken (string) --

      If the response is truncated, Amazon Kendra returns this token, which you can use in a later request to retrieve the next set of Amazon Kendra experiences.
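
Each summary item nests its URLs under Endpoints, so picking out the home page of every ready experience takes a small amount of unpacking. The function below is a sketch that operates on a single ListExperiences response dictionary and makes no API calls itself; `home_urls` is a hypothetical name.

```python
def home_urls(list_experiences_response, status="ACTIVE"):
    """Map experience Id -> HOME endpoint URL for items with the given status."""
    urls = {}
    for item in list_experiences_response.get("SummaryItems", []):
        if item.get("Status") != status:
            continue  # skip experiences that are not in the requested state
        for endpoint in item.get("Endpoints", []):
            if endpoint.get("EndpointType") == "HOME":
                urls[item["Id"]] = endpoint["Endpoint"]
    return urls
```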

ListExperienceEntities (new)

Lists users or groups in your Amazon Web Services SSO identity source that are granted access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.list_experience_entities(
    Id='string',
    IndexId='string',
    NextToken='string'
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type NextToken:

string

param NextToken:

If the previous response was incomplete (because there is more data to retrieve), Amazon Kendra returns a pagination token in the response. You can use this pagination token to retrieve the next set of users or groups.

rtype:

dict

returns:

Response Syntax

{
    'SummaryItems': [
        {
            'EntityId': 'string',
            'EntityType': 'USER'|'GROUP',
            'DisplayData': {
                'UserName': 'string',
                'GroupName': 'string',
                'IdentifiedUserName': 'string',
                'FirstName': 'string',
                'LastName': 'string'
            }
        },
    ],
    'NextToken': 'string'
}

Response Structure

  • (dict) --

    • SummaryItems (list) --

      An array of summary information for one or more users or groups.

      • (dict) --

        Summary information for users or groups in your Amazon Web Services SSO identity source with granted access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

        • EntityId (string) --

          The identifier of a user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • EntityType (string) --

          Shows the type as User or Group.

        • DisplayData (dict) --

          Information about the user entity.

          • UserName (string) --

            The name of the user.

          • GroupName (string) --

            The name of the group.

          • IdentifiedUserName (string) --

            The user name of the user.

          • FirstName (string) --

            The first name of the user.

          • LastName (string) --

            The last name of the user.

    • NextToken (string) --

      If the response is truncated, Amazon Kendra returns this token, which you can use in a later request to retrieve the next set of users or groups.
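
DisplayData carries different name fields for users and groups, and any of them may be absent, so a display label needs a fallback order. The order used below is an illustrative choice, not something the API prescribes; `display_label` is a hypothetical helper that takes one element of SummaryItems.

```python
def display_label(item):
    """Return a human-readable label for one ListExperienceEntities summary item."""
    data = item.get("DisplayData", {})
    if item.get("EntityType") == "GROUP":
        return data.get("GroupName") or item["EntityId"]
    # For users: prefer "First Last", then the user-name fields, then the raw ID.
    full_name = " ".join(
        part for part in (data.get("FirstName"), data.get("LastName")) if part
    )
    return (
        full_name
        or data.get("UserName")
        or data.get("IdentifiedUserName")
        or item["EntityId"]
    )
```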

DisassociatePersonasFromEntities (new)

Removes the specific permissions of users or groups in your Amazon Web Services SSO identity source with access to your Amazon Kendra experience. You can create an Amazon Kendra experience such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.disassociate_personas_from_entities(
    Id='string',
    IndexId='string',
    EntityIds=[
        'string',
    ]
)
type Id:

string

param Id:

[REQUIRED]

The identifier of your Amazon Kendra experience.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for your Amazon Kendra experience.

type EntityIds:

list

param EntityIds:

[REQUIRED]

The identifiers of users or groups in your Amazon Web Services SSO identity source. For example, user IDs could be user emails.

  • (string) --

rtype:

dict

returns:

Response Syntax

{
    'FailedEntityList': [
        {
            'EntityId': 'string',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • FailedEntityList (list) --

      Lists the users or groups in your Amazon Web Services SSO identity source whose access to your Amazon Kendra experience could not be removed.

      • (dict) --

        Information on a user or group in your Amazon Web Services SSO identity source that could not be configured properly for your Amazon Kendra experience.

        • EntityId (string) --

          The identifier of the user or group in your Amazon Web Services SSO identity source. For example, a user ID could be an email.

        • ErrorMessage (string) --

          The reason the user or group in your Amazon Web Services SSO identity source could not be configured properly for your Amazon Kendra experience.
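
The request and response shapes above can be exercised with a small wrapper. This is a sketch, not a definitive implementation: client is assumed to be a boto3 Kendra client (for example, one created with boto3.client("kendra")), and the function name and return shape are illustrative choices.

```python
from typing import Any, Dict, List


def disassociate_entities(
    client: Any, experience_id: str, index_id: str, entity_ids: List[str]
) -> Dict[str, str]:
    """Call disassociate_personas_from_entities and map failures by entity.

    client is assumed to be a boto3 Kendra client. Returns a dict of
    {EntityId: ErrorMessage} for each entry in FailedEntityList; an empty
    dict means no failures were reported.
    """
    response = client.disassociate_personas_from_entities(
        Id=experience_id,
        IndexId=index_id,
        EntityIds=entity_ids,
    )
    return {
        failed["EntityId"]: failed.get("ErrorMessage", "")
        for failed in response.get("FailedEntityList", [])
    }
```

Keying the failures by EntityId makes it straightforward to retry or report only the entities that were not removed.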

DeleteExperience (new)

Deletes your Amazon Kendra experience, such as a search application. For more information on creating a search application experience, see Building a search experience with no code.

See also: AWS API Documentation

Request Syntax

client.delete_experience(
    Id='string',
    IndexId='string'
)
type Id:

string

param Id:

[REQUIRED]

The identifier of the Amazon Kendra experience you want to delete.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index for the Amazon Kendra experience you want to delete.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
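
A short sketch of the call, assuming client is a boto3 Kendra client; the wrapper name is illustrative. Because the documented response is an empty dict, the wrapper returns nothing.

```python
from typing import Any


def delete_experience(client: Any, experience_id: str, index_id: str) -> None:
    """Delete one experience; the documented response body is empty.

    client is assumed to be a boto3 Kendra client. Failures surface as
    exceptions raised by the client, so there is nothing to return.
    """
    client.delete_experience(Id=experience_id, IndexId=index_id)
```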

BatchPutDocument (updated)
Changes (request)
{'CustomDocumentEnrichmentConfiguration': {'InlineConfigurations': [{'Condition': {'ConditionDocumentAttributeKey': 'string',
                                                                                   'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                        'LongValue': 'long',
                                                                                                        'StringListValue': ['string'],
                                                                                                        'StringValue': 'string'},
                                                                                   'Operator': 'GreaterThan '
                                                                                               '| '
                                                                                               'GreaterThanOrEquals '
                                                                                               '| '
                                                                                               'LessThan '
                                                                                               '| '
                                                                                               'LessThanOrEquals '
                                                                                               '| '
                                                                                               'Equals '
                                                                                               '| '
                                                                                               'NotEquals '
                                                                                               '| '
                                                                                               'Contains '
                                                                                               '| '
                                                                                               'NotContains '
                                                                                               '| '
                                                                                               'Exists '
                                                                                               '| '
                                                                                               'NotExists '
                                                                                               '| '
                                                                                               'BeginsWith'},
                                                                     'DocumentContentDeletion': 'boolean',
                                                                     'Target': {'TargetDocumentAttributeKey': 'string',
                                                                                'TargetDocumentAttributeValue': {'DateValue': 'timestamp',
                                                                                                                 'LongValue': 'long',
                                                                                                                 'StringListValue': ['string'],
                                                                                                                 'StringValue': 'string'},
                                                                                'TargetDocumentAttributeValueDeletion': 'boolean'}}],
                                           'PostExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                       'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                            'LongValue': 'long',
                                                                                                                            'StringListValue': ['string'],
                                                                                                                            'StringValue': 'string'},
                                                                                                       'Operator': 'GreaterThan '
                                                                                                                   '| '
                                                                                                                   'GreaterThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'LessThan '
                                                                                                                   '| '
                                                                                                                   'LessThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'Equals '
                                                                                                                   '| '
                                                                                                                   'NotEquals '
                                                                                                                   '| '
                                                                                                                   'Contains '
                                                                                                                   '| '
                                                                                                                   'NotContains '
                                                                                                                   '| '
                                                                                                                   'Exists '
                                                                                                                   '| '
                                                                                                                   'NotExists '
                                                                                                                   '| '
                                                                                                                   'BeginsWith'},
                                                                               'LambdaArn': 'string',
                                                                               'S3Bucket': 'string'},
                                           'PreExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                      'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                           'LongValue': 'long',
                                                                                                                           'StringListValue': ['string'],
                                                                                                                           'StringValue': 'string'},
                                                                                                      'Operator': 'GreaterThan '
                                                                                                                  '| '
                                                                                                                  'GreaterThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'LessThan '
                                                                                                                  '| '
                                                                                                                  'LessThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'Equals '
                                                                                                                  '| '
                                                                                                                  'NotEquals '
                                                                                                                  '| '
                                                                                                                  'Contains '
                                                                                                                  '| '
                                                                                                                  'NotContains '
                                                                                                                  '| '
                                                                                                                  'Exists '
                                                                                                                  '| '
                                                                                                                  'NotExists '
                                                                                                                  '| '
                                                                                                                  'BeginsWith'},
                                                                              'LambdaArn': 'string',
                                                                              'S3Bucket': 'string'},
                                           'RoleArn': 'string'}}

Adds one or more documents to an index.

The BatchPutDocument operation enables you to ingest inline documents or a set of documents stored in an Amazon S3 bucket. Use this operation to ingest your text and unstructured text into an index, add custom attributes to the documents, and attach an access control list to the documents added to the index.

The documents are indexed asynchronously. You can see the progress of the batch using Amazon CloudWatch. Any error messages related to processing the batch are sent to your Amazon CloudWatch log.

See also: AWS API Documentation

Request Syntax

client.batch_put_document(
    IndexId='string',
    RoleArn='string',
    Documents=[
        {
            'Id': 'string',
            'Title': 'string',
            'Blob': b'bytes',
            'S3Path': {
                'Bucket': 'string',
                'Key': 'string'
            },
            'Attributes': [
                {
                    'Key': 'string',
                    'Value': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
            ],
            'AccessControlList': [
                {
                    'Name': 'string',
                    'Type': 'USER'|'GROUP',
                    'Access': 'ALLOW'|'DENY',
                    'DataSourceId': 'string'
                },
            ],
            'HierarchicalAccessControlList': [
                {
                    'PrincipalList': [
                        {
                            'Name': 'string',
                            'Type': 'USER'|'GROUP',
                            'Access': 'ALLOW'|'DENY',
                            'DataSourceId': 'string'
                        },
                    ]
                },
            ],
            'ContentType': 'PDF'|'HTML'|'MS_WORD'|'PLAIN_TEXT'|'PPT'
        },
    ],
    CustomDocumentEnrichmentConfiguration={
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
)
type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index to add the documents to. You need to create the index first using the CreateIndex operation.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of a role that is allowed to run the BatchPutDocument operation. For more information, see IAM Roles for Amazon Kendra.

type Documents:

list

param Documents:

[REQUIRED]

One or more documents to add to the index.

Documents can include custom attributes. For example, 'DataSourceId' and 'DataSourceSyncJobId' are custom attributes that provide information on the synchronization of documents running on a data source. Note that 'DataSourceSyncJobId' is optional, because Amazon Kendra uses the ID of a running sync job.

Documents have the following file size limits.

  • 5 MB total size for inline documents

  • 50 MB total size for files from an S3 bucket

  • 5 MB extracted text for any file

For more information about file size and transaction per second quotas, see Quotas.
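
The inline limit above can be checked on the client side before sending a batch. This is a sketch under two assumptions that the text does not settle: "MB" is read as 1024 * 1024 bytes, and the limit is applied to each document's Blob individually. The function name is illustrative.

```python
from typing import Any, Dict, List

# Assumption: "MB" is read as 1024 * 1024 bytes; the source does not say.
MB = 1024 * 1024
INLINE_LIMIT_BYTES = 5 * MB


def oversized_inline_documents(documents: List[Dict[str, Any]]) -> List[str]:
    """Return the Ids of documents whose inline Blob exceeds the 5 MB limit.

    Documents that reference an S3Path carry no Blob and are skipped here;
    their 50 MB limit cannot be checked without asking S3 for the size.
    """
    return [
        doc["Id"]
        for doc in documents
        if len(doc.get("Blob", b"")) > INLINE_LIMIT_BYTES
    ]
```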

  • (dict) --

    A document in an index.

    • Id (string) -- [REQUIRED]

      A unique identifier of the document in the index.

    • Title (string) --

      The title of the document.

    • Blob (bytes) --

      The contents of the document.

      Documents passed to the Blob parameter must be base64 encoded. Your code might not need to encode the document file bytes if you're using an Amazon Web Services SDK to call Amazon Kendra operations. If you are calling the Amazon Kendra endpoint directly using REST, you must base64 encode the contents before sending.

    • S3Path (dict) --

      Information required to find a specific file in an Amazon S3 bucket.

      • Bucket (string) -- [REQUIRED]

        The name of the S3 bucket that contains the file.

      • Key (string) -- [REQUIRED]

        The name of the file.

    • Attributes (list) --

      Custom attributes to apply to the document. Use the custom attributes to provide additional information for searching, to provide facets for refining searches, and to provide additional information in the query response.

      • (dict) --

        A custom attribute value assigned to a document.

        • Key (string) -- [REQUIRED]

          The identifier for the attribute.

        • Value (dict) -- [REQUIRED]

          The value of the attribute.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • AccessControlList (list) --

      Information on user and group access rights, which is used for user context filtering.

      • (dict) --

        Provides user and group information for document access filtering.

        • Name (string) -- [REQUIRED]

          The name of the user or group.

        • Type (string) -- [REQUIRED]

          The type of principal.

        • Access (string) -- [REQUIRED]

          Whether to allow or deny access to the principal.

        • DataSourceId (string) --

          The identifier of the data source the principal should access documents from.

    • HierarchicalAccessControlList (list) --

      The list of principal lists that define the hierarchy determining which documents users should have access to.

      • (dict) --

        Information that defines the hierarchy determining which documents users should have access to.

        • PrincipalList (list) -- [REQUIRED]

          A list of principal lists that define the hierarchy determining which documents users should have access to. Each hierarchical list specifies which user or group has allow or deny access for each document.

          • (dict) --

            Provides user and group information for document access filtering.

            • Name (string) -- [REQUIRED]

              The name of the user or group.

            • Type (string) -- [REQUIRED]

              The type of principal.

            • Access (string) -- [REQUIRED]

              Whether to allow or deny access to the principal.

            • DataSourceId (string) --

              The identifier of the data source the principal should access documents from.

    • ContentType (string) --

      The file type of the document in the Blob field.
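
The document structure above can be assembled as a plain dict. The sketch below is illustrative only: the attribute keys 'Department' and 'ReviewedAt', the group name, and both function names are hypothetical. It passes raw bytes as Blob, which is what an SDK client is handed, and shows the base64 step separately for direct REST calls.

```python
import base64
from datetime import datetime, timezone
from typing import Any, Dict


def build_inline_document(doc_id: str, title: str, text: str) -> Dict[str, Any]:
    """Build one entry for the Documents list from plain text."""
    return {
        "Id": doc_id,
        "Title": title,
        # Raw bytes; see the note on Blob about when base64 is needed.
        "Blob": text.encode("utf-8"),
        "ContentType": "PLAIN_TEXT",
        "Attributes": [
            # Hypothetical custom attribute names, for illustration only.
            {"Key": "Department", "Value": {"StringValue": "Finance"}},
            {
                "Key": "ReviewedAt",
                # Timezone-aware, so the offset is part of the value.
                "Value": {
                    "DateValue": datetime(2012, 3, 25, 12, 30, 10, tzinfo=timezone.utc)
                },
            },
        ],
        "AccessControlList": [
            {"Name": "finance-team", "Type": "GROUP", "Access": "ALLOW"},
        ],
    }


def blob_for_rest(document: Dict[str, Any]) -> str:
    """Base64-encode the Blob, as required when calling the endpoint directly."""
    return base64.b64encode(document["Blob"]).decode("ascii")
```

Assuming client is a boto3 Kendra client, the result could be passed as client.batch_put_document(IndexId=index_id, Documents=[doc]).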

type CustomDocumentEnrichmentConfiguration:

dict

param CustomDocumentEnrichmentConfiguration:

Configuration information for altering your document metadata and content during the document ingestion process when you use the BatchPutDocument operation.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

  • InlineConfigurations (list) --

    Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.

    • (dict) --

      Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic allows, see HookConfiguration.

      For more information, see Customizing document metadata during the ingestion process.

      • Condition (dict) --

        Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.

        • ConditionDocumentAttributeKey (string) -- [REQUIRED]

          The identifier of the document attribute used for the condition.

          For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

          Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

        • Operator (string) -- [REQUIRED]

          The condition operator.

          For example, you can use 'Contains' to partially match a string.

        • ConditionOnValue (dict) --

          The value used by the operator.

          For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • Target (dict) --

        Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.

        • TargetDocumentAttributeKey (string) --

          The identifier of the target document attribute or metadata field.

          For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.

        • TargetDocumentAttributeValueDeletion (boolean) --

          TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.

        • TargetDocumentAttributeValue (dict) --

          The target value you want to create for the target attribute.

          For example, 'Finance' could be the target value for the target attribute key 'Department'.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • DocumentContentDeletion (boolean) --

        TRUE to delete content if the condition used for the target attribute is met.

  • PreExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition that determines when a Lambda function should be invoked.

      For example, you can specify a condition so that, if there are empty date-time values, Amazon Kendra invokes a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • PostExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition that determines when a Lambda function should be invoked.

      For example, you can specify a condition so that, if there are empty date-time values, Amazon Kendra invokes a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • RoleArn (string) --

    The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
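
The configuration described above can be built as nested dicts. The sketch below reuses the 'Source_URI', 'financial', 'Department', and 'Finance' examples from the parameter text; the helper names are hypothetical, and only string values are handled. It also enforces the stated rule that a target value cannot be created while TargetDocumentAttributeValueDeletion is TRUE.

```python
from typing import Any, Dict, List, Optional


def inline_rule(
    condition_key: str,
    operator: str,
    condition_value: str,
    target_key: str,
    target_value: Optional[str] = None,
    delete_target_value: bool = False,
) -> Dict[str, Any]:
    """Build one InlineConfigurations entry using string values only."""
    if delete_target_value and target_value is not None:
        # The parameter text says a target value cannot be created
        # while TargetDocumentAttributeValueDeletion is TRUE.
        raise ValueError("cannot set a target value and delete it in one rule")
    target: Dict[str, Any] = {
        "TargetDocumentAttributeKey": target_key,
        "TargetDocumentAttributeValueDeletion": delete_target_value,
    }
    if target_value is not None:
        target["TargetDocumentAttributeValue"] = {"StringValue": target_value}
    return {
        "Condition": {
            "ConditionDocumentAttributeKey": condition_key,
            "Operator": operator,
            "ConditionOnValue": {"StringValue": condition_value},
        },
        "Target": target,
        "DocumentContentDeletion": False,
    }


def hook(lambda_arn: str, s3_bucket: str) -> Dict[str, str]:
    """Build a hook configuration with its two required keys."""
    return {"LambdaArn": lambda_arn, "S3Bucket": s3_bucket}


def enrichment_config(
    rules: List[Dict[str, Any]],
    role_arn: Optional[str] = None,
    pre_hook: Optional[Dict[str, str]] = None,
) -> Dict[str, Any]:
    """Assemble a CustomDocumentEnrichmentConfiguration dict."""
    config: Dict[str, Any] = {"InlineConfigurations": rules}
    if pre_hook is not None:
        config["PreExtractionHookConfiguration"] = pre_hook
    if role_arn is not None:
        config["RoleArn"] = role_arn
    return config
```

For example, inline_rule("Source_URI", "Contains", "financial", "Department", "Finance") sets 'Department' to 'Finance' for documents whose source URI contains 'financial'.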

rtype:

dict

returns:

Response Syntax

{
    'FailedDocuments': [
        {
            'Id': 'string',
            'ErrorCode': 'InternalError'|'InvalidRequest',
            'ErrorMessage': 'string'
        },
    ]
}

Response Structure

  • (dict) --

    • FailedDocuments (list) --

      A list of documents that were not added to the index because they failed a validation check. Each entry contains an error message that indicates why the document couldn't be added to the index.

      If there was an error adding a document to an index, the error is reported in your Amazon CloudWatch log. For more information, see Monitoring Amazon Kendra with Amazon CloudWatch Logs.

      • (dict) --

        Provides information about a document that could not be indexed.

        • Id (string) --

          The unique identifier of the document.

        • ErrorCode (string) --

          The type of error that caused the document to fail to be indexed.

        • ErrorMessage (string) --

          A description of the reason why the document could not be indexed.
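As a rough illustration of how a caller might inspect a response of this shape, the sketch below groups failed documents by ErrorCode. The helper name group_failed_documents and the sample response are hypothetical; the sample is hand-written to match the response syntax and is not real service output.

```python
from collections import defaultdict


def group_failed_documents(response):
    """Group (Id, ErrorMessage) pairs by ErrorCode."""
    grouped = defaultdict(list)
    # Treat a missing or empty FailedDocuments list as "nothing failed".
    for failure in response.get("FailedDocuments", []):
        grouped[failure["ErrorCode"]].append((failure["Id"], failure["ErrorMessage"]))
    return dict(grouped)


# Hand-written sample in the shape of the response syntax.
sample_response = {
    "FailedDocuments": [
        {"Id": "doc-1", "ErrorCode": "InvalidRequest", "ErrorMessage": "example message A"},
        {"Id": "doc-2", "ErrorCode": "InternalError", "ErrorMessage": "example message B"},
        {"Id": "doc-3", "ErrorCode": "InvalidRequest", "ErrorMessage": "example message C"},
    ]
}

for code, failures in group_failed_documents(sample_response).items():
    print(code, [doc_id for doc_id, _ in failures])
```

Grouping by error code can help separate documents worth retrying from those whose request needs to be corrected first; which codes are retryable is left to the caller.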

CreateDataSource (updated)
Changes (request)
{'CustomDocumentEnrichmentConfiguration': {'InlineConfigurations': [{'Condition': {'ConditionDocumentAttributeKey': 'string',
                                                                                   'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                        'LongValue': 'long',
                                                                                                        'StringListValue': ['string'],
                                                                                                        'StringValue': 'string'},
                                                                                   'Operator': 'GreaterThan | GreaterThanOrEquals | LessThan | LessThanOrEquals | Equals | NotEquals | Contains | NotContains | Exists | NotExists | BeginsWith'},
                                                                     'DocumentContentDeletion': 'boolean',
                                                                     'Target': {'TargetDocumentAttributeKey': 'string',
                                                                                'TargetDocumentAttributeValue': {'DateValue': 'timestamp',
                                                                                                                 'LongValue': 'long',
                                                                                                                 'StringListValue': ['string'],
                                                                                                                 'StringValue': 'string'},
                                                                                'TargetDocumentAttributeValueDeletion': 'boolean'}}],
                                           'PostExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                       'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                            'LongValue': 'long',
                                                                                                                            'StringListValue': ['string'],
                                                                                                                            'StringValue': 'string'},
                                                                                                       'Operator': 'GreaterThan | GreaterThanOrEquals | LessThan | LessThanOrEquals | Equals | NotEquals | Contains | NotContains | Exists | NotExists | BeginsWith'},
                                                                               'LambdaArn': 'string',
                                                                               'S3Bucket': 'string'},
                                           'PreExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                      'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                           'LongValue': 'long',
                                                                                                                           'StringListValue': ['string'],
                                                                                                                           'StringValue': 'string'},
                                                                                                      'Operator': 'GreaterThan | GreaterThanOrEquals | LessThan | LessThanOrEquals | Equals | NotEquals | Contains | NotContains | Exists | NotExists | BeginsWith'},
                                                                              'LambdaArn': 'string',
                                                                              'S3Bucket': 'string'},
                                           'RoleArn': 'string'}}

Creates a data source that you want to use with an Amazon Kendra index.

You specify a name, data source connector type and description for your data source. You also specify configuration information for the data source connector.

CreateDataSource is a synchronous operation. The operation returns 200 if the data source was successfully created. Otherwise, an exception is raised.

Amazon S3 and custom data sources are the only supported data sources in the Amazon Web Services GovCloud (US-West) region.
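A minimal sketch of the call flow described above, assuming a client object that behaves like boto3.client("kendra"). The wrapper name create_s3_data_source, the FakeKendraClient stand-in, and all identifiers and ARNs are hypothetical. The 'Id' key read from the response is an assumption taken from the CreateDataSource response syntax, which is not reproduced in this part of the document.

```python
def create_s3_data_source(client, index_id, name, bucket_name, role_arn):
    """Create an S3 data source and return its identifier.

    Any exception raised by the underlying call propagates to the caller.
    """
    response = client.create_data_source(
        Name=name,
        IndexId=index_id,
        Type="S3",
        Configuration={"S3Configuration": {"BucketName": bucket_name}},
        RoleArn=role_arn,
    )
    return response["Id"]


class FakeKendraClient:
    """Stand-in used only to show the call shape without contacting AWS."""

    def __init__(self):
        self.calls = []

    def create_data_source(self, **kwargs):
        # Record the arguments and return a canned response.
        self.calls.append(kwargs)
        return {"Id": "example-data-source-id"}


fake = FakeKendraClient()
data_source_id = create_s3_data_source(
    fake,
    index_id="example-index-id",
    name="example-s3-source",
    bucket_name="example-bucket",
    role_arn="arn:aws:iam::123456789012:role/ExampleKendraRole",
)
print(data_source_id)
```

Passing the client in as a parameter keeps the sketch runnable without credentials; in real use a boto3 Kendra client would take the place of the fake.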

See also: AWS API Documentation

Request Syntax

client.create_data_source(
    Name='string',
    IndexId='string',
    Type='S3'|'SHAREPOINT'|'DATABASE'|'SALESFORCE'|'ONEDRIVE'|'SERVICENOW'|'CUSTOM'|'CONFLUENCE'|'GOOGLEDRIVE'|'WEBCRAWLER'|'WORKDOCS',
    Configuration={
        'S3Configuration': {
            'BucketName': 'string',
            'InclusionPrefixes': [
                'string',
            ],
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'DocumentsMetadataConfiguration': {
                'S3Prefix': 'string'
            },
            'AccessControlListConfiguration': {
                'KeyPath': 'string'
            }
        },
        'SharePointConfiguration': {
            'SharePointVersion': 'SHAREPOINT_2013'|'SHAREPOINT_2016'|'SHAREPOINT_ONLINE',
            'Urls': [
                'string',
            ],
            'SecretArn': 'string',
            'CrawlAttachments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DocumentTitleFieldName': 'string',
            'DisableLocalGroups': True|False,
            'SslCertificateS3Path': {
                'Bucket': 'string',
                'Key': 'string'
            }
        },
        'DatabaseConfiguration': {
            'DatabaseEngineType': 'RDS_AURORA_MYSQL'|'RDS_AURORA_POSTGRESQL'|'RDS_MYSQL'|'RDS_POSTGRESQL',
            'ConnectionConfiguration': {
                'DatabaseHost': 'string',
                'DatabasePort': 123,
                'DatabaseName': 'string',
                'TableName': 'string',
                'SecretArn': 'string'
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'ColumnConfiguration': {
                'DocumentIdColumnName': 'string',
                'DocumentDataColumnName': 'string',
                'DocumentTitleColumnName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'ChangeDetectingColumns': [
                    'string',
                ]
            },
            'AclConfiguration': {
                'AllowedGroupsColumnName': 'string'
            },
            'SqlConfiguration': {
                'QueryIdentifiersEnclosingOption': 'DOUBLE_QUOTES'|'NONE'
            }
        },
        'SalesforceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'StandardObjectConfigurations': [
                {
                    'Name': 'ACCOUNT'|'CAMPAIGN'|'CASE'|'CONTACT'|'CONTRACT'|'DOCUMENT'|'GROUP'|'IDEA'|'LEAD'|'OPPORTUNITY'|'PARTNER'|'PRICEBOOK'|'PRODUCT'|'PROFILE'|'SOLUTION'|'TASK'|'USER',
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
            ],
            'KnowledgeArticleConfiguration': {
                'IncludedStates': [
                    'DRAFT'|'PUBLISHED'|'ARCHIVED',
                ],
                'StandardKnowledgeArticleTypeConfiguration': {
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
                'CustomKnowledgeArticleTypeConfigurations': [
                    {
                        'Name': 'string',
                        'DocumentDataFieldName': 'string',
                        'DocumentTitleFieldName': 'string',
                        'FieldMappings': [
                            {
                                'DataSourceFieldName': 'string',
                                'DateFieldFormat': 'string',
                                'IndexFieldName': 'string'
                            },
                        ]
                    },
                ]
            },
            'ChatterFeedConfiguration': {
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'IncludeFilterTypes': [
                    'ACTIVE_USER'|'STANDARD_USER',
                ]
            },
            'CrawlAttachments': True|False,
            'StandardObjectAttachmentConfiguration': {
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'IncludeAttachmentFilePatterns': [
                'string',
            ],
            'ExcludeAttachmentFilePatterns': [
                'string',
            ]
        },
        'OneDriveConfiguration': {
            'TenantDomain': 'string',
            'SecretArn': 'string',
            'OneDriveUsers': {
                'OneDriveUserList': [
                    'string',
                ],
                'OneDriveUserS3Path': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DisableLocalGroups': True|False
        },
        'ServiceNowConfiguration': {
            'HostUrl': 'string',
            'SecretArn': 'string',
            'ServiceNowBuildVersion': 'LONDON'|'OTHERS',
            'KnowledgeArticleConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'FilterQuery': 'string'
            },
            'ServiceCatalogConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AuthenticationType': 'HTTP_BASIC'|'OAUTH2'
        },
        'ConfluenceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'Version': 'CLOUD'|'SERVER',
            'SpaceConfiguration': {
                'CrawlPersonalSpaces': True|False,
                'CrawlArchivedSpaces': True|False,
                'IncludeSpaces': [
                    'string',
                ],
                'ExcludeSpaces': [
                    'string',
                ],
                'SpaceFieldMappings': [
                    {
                        'DataSourceFieldName': 'DISPLAY_URL'|'ITEM_TYPE'|'SPACE_KEY'|'URL',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'PageConfiguration': {
                'PageFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_STATUS'|'CREATED_DATE'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'MODIFIED_DATE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'BlogConfiguration': {
                'BlogFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'PUBLISH_DATE'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AttachmentConfiguration': {
                'CrawlAttachments': True|False,
                'AttachmentFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_TYPE'|'CREATED_DATE'|'DISPLAY_URL'|'FILE_SIZE'|'ITEM_TYPE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ]
        },
        'GoogleDriveConfiguration': {
            'SecretArn': 'string',
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'ExcludeMimeTypes': [
                'string',
            ],
            'ExcludeUserAccounts': [
                'string',
            ],
            'ExcludeSharedDrives': [
                'string',
            ]
        },
        'WebCrawlerConfiguration': {
            'Urls': {
                'SeedUrlConfiguration': {
                    'SeedUrls': [
                        'string',
                    ],
                    'WebCrawlerMode': 'HOST_ONLY'|'SUBDOMAINS'|'EVERYTHING'
                },
                'SiteMapsConfiguration': {
                    'SiteMaps': [
                        'string',
                    ]
                }
            },
            'CrawlDepth': 123,
            'MaxLinksPerPage': 123,
            'MaxContentSizePerPageInMegaBytes': ...,
            'MaxUrlsPerMinuteCrawlRate': 123,
            'UrlInclusionPatterns': [
                'string',
            ],
            'UrlExclusionPatterns': [
                'string',
            ],
            'ProxyConfiguration': {
                'Host': 'string',
                'Port': 123,
                'Credentials': 'string'
            },
            'AuthenticationConfiguration': {
                'BasicAuthentication': [
                    {
                        'Host': 'string',
                        'Port': 123,
                        'Credentials': 'string'
                    },
                ]
            }
        },
        'WorkDocsConfiguration': {
            'OrganizationId': 'string',
            'CrawlComments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ]
        }
    },
    Description='string',
    Schedule='string',
    RoleArn='string',
    Tags=[
        {
            'Key': 'string',
            'Value': 'string'
        },
    ],
    ClientToken='string',
    LanguageCode='string',
    CustomDocumentEnrichmentConfiguration={
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
)
type Name:

string

param Name:

[REQUIRED]

A unique name for the data source. A data source name can't be changed without deleting and recreating the data source.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index that should be associated with this data source.

type Type:

string

param Type:

[REQUIRED]

The type of repository that contains the data source.

type Configuration:

dict

param Configuration:

The connector configuration information that is required to access the repository.

You can't specify the Configuration parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException.

The Configuration parameter is required for all other data sources.
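The rule above can be checked locally before a request is sent. The sketch below is one possible way to do that; the helper name build_create_data_source_kwargs is hypothetical, and the local ValueError is a stand-in for, not a replica of, the service-side ValidationException.

```python
def build_create_data_source_kwargs(name, index_id, source_type, configuration=None):
    """Return keyword arguments for create_data_source, enforcing the CUSTOM rule."""
    if source_type == "CUSTOM" and configuration is not None:
        raise ValueError("Configuration must not be specified when Type is CUSTOM")
    if source_type != "CUSTOM" and configuration is None:
        raise ValueError(f"Configuration is required when Type is {source_type}")
    kwargs = {"Name": name, "IndexId": index_id, "Type": source_type}
    if configuration is not None:
        kwargs["Configuration"] = configuration
    return kwargs


# A CUSTOM data source carries no Configuration key at all.
custom_kwargs = build_create_data_source_kwargs("example-custom", "example-index-id", "CUSTOM")

# Every other type must carry one.
s3_kwargs = build_create_data_source_kwargs(
    "example-s3",
    "example-index-id",
    "S3",
    configuration={"S3Configuration": {"BucketName": "example-bucket"}},
)
print(sorted(custom_kwargs), sorted(s3_kwargs))
```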

  • S3Configuration (dict) --

    Provides information to create a data source connector for a document repository in an Amazon S3 bucket.

    • BucketName (string) -- [REQUIRED]

      The name of the bucket that contains the documents.

    • InclusionPrefixes (list) --

      A list of S3 prefixes for the documents that should be included in the index.

      • (string) --

    • InclusionPatterns (list) --

      A list of glob patterns for documents that should be indexed. If a document that matches an inclusion pattern also matches an exclusion pattern, the document is not indexed.

      Some examples are:

      • *.txt will include all text files in a directory (files with the extension .txt).

      • **/*.txt will include all text files in a directory and its subdirectories.

      • *tax* will include all files in a directory that contain 'tax' in the file name, such as 'tax', 'taxes', 'income_tax'.

      • (string) --

    • ExclusionPatterns (list) --

      A list of glob patterns for documents that should not be indexed. If a document that matches an inclusion prefix or inclusion pattern also matches an exclusion pattern, the document is not indexed.

      Some examples are:

      • *.png , *.jpg will exclude all PNG and JPEG image files in a directory (files with the extensions .png and .jpg).

      • *internal* will exclude all files in a directory that contain 'internal' in the file name, such as 'internal', 'internal_only', 'company_internal'.

      • **/*internal* will exclude all internal-related files in a directory and its subdirectories.

      • (string) --

    • DocumentsMetadataConfiguration (dict) --

      Document metadata files that contain information such as the document access control information, source URI, document author, and custom attributes. Each metadata file contains metadata about a single document.

      • S3Prefix (string) --

        A prefix used to filter metadata configuration files in the Amazon Web Services S3 bucket. The S3 bucket might contain multiple metadata files. Use S3Prefix to include only the desired metadata files.

    • AccessControlListConfiguration (dict) --

      Provides the path to the S3 bucket that contains the user context filtering files for the data source. For the format of the file, see Access control for S3 data sources.

      • KeyPath (string) --

        Path to the Amazon Web Services S3 bucket that contains the ACL files.
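The precedence between the inclusion and exclusion settings of S3Configuration can be approximated locally. The sketch below is only an approximation under stated assumptions: it uses Python's fnmatch module, whose * also matches '/', so it is not guaranteed to match Amazon Kendra's own glob handling. It also assumes that a key is eligible when it matches any inclusion prefix or any inclusion pattern, and that everything is eligible when no inclusion filter is set. The helper name would_be_indexed, the bucket name, and the object keys are hypothetical.

```python
from fnmatch import fnmatchcase


def would_be_indexed(key, s3_configuration):
    """Approximate the documented precedence: an exclusion match always wins."""
    prefixes = s3_configuration.get("InclusionPrefixes", [])
    includes = s3_configuration.get("InclusionPatterns", [])
    excludes = s3_configuration.get("ExclusionPatterns", [])

    if any(fnmatchcase(key, pattern) for pattern in excludes):
        return False
    if not prefixes and not includes:
        # Assumption: no inclusion filters means every key is eligible.
        return True
    return any(key.startswith(prefix) for prefix in prefixes) or any(
        fnmatchcase(key, pattern) for pattern in includes
    )


s3_configuration = {
    "BucketName": "example-bucket",
    "InclusionPatterns": ["*.txt"],
    "ExclusionPatterns": ["*internal*"],
}

for key in ["reports/taxes.txt", "reports/internal_only.txt", "images/logo.png"]:
    print(key, would_be_indexed(key, s3_configuration))
```

A dry run like this over a bucket listing can show which keys a pattern set is likely to select, but the result of an actual sync is the authority.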

  • SharePointConfiguration (dict) --

    Provides information necessary to create a data source connector for a Microsoft SharePoint site.

    • SharePointVersion (string) -- [REQUIRED]

      The version of Microsoft SharePoint that you are using as a data source.

    • Urls (list) -- [REQUIRED]

      The URLs of the Microsoft SharePoint site that contains the documents that should be indexed.

      • (string) --

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. If you use SharePoint Server, you also need to provide the server domain name as part of the credentials. For more information, see Using a Microsoft SharePoint Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

    • CrawlAttachments (boolean) --

      Set to TRUE to include attachments to documents stored in your Microsoft SharePoint site in the index; otherwise, set to FALSE.

    • UseChangeLog (boolean) --

      Set to TRUE to use the Microsoft SharePoint change log to determine the documents that need to be updated in the index. Depending on the size of the SharePoint change log, it may take longer for Amazon Kendra to use the change log than to determine the changed documents using the Amazon Kendra document crawler.

    • InclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The regex is applied to the display URL of the SharePoint document.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

      The regex is applied to the display URL of the SharePoint document.

      • (string) --

    • VpcConfiguration (dict) --

      Provides information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Microsoft SharePoint attributes to custom fields in the Amazon Kendra index. You must first create the index fields using the UpdateIndex operation before you map SharePoint attributes. For more information, see Mapping Data Source Fields.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • DocumentTitleFieldName (string) --

      The Microsoft SharePoint attribute field that contains the title of the document.

    • DisableLocalGroups (boolean) --

      A Boolean value that specifies whether local groups are disabled (True) or enabled (False).

    • SslCertificateS3Path (dict) --

      Information required to find a specific file in an Amazon S3 bucket.

      • Bucket (string) -- [REQUIRED]

        The name of the S3 bucket that contains the file.

      • Key (string) -- [REQUIRED]

        The name of the file.
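
As a rough sketch of how the SharePoint fields above fit together, the dictionary below fills in the three required keys plus a few optional ones. Every value (site URL, secret ARN, field names, patterns) is a hypothetical placeholder, and the missing_required helper is an illustration, not part of the SDK.

```python
# Hypothetical values throughout; replace them with your own site, secret, and fields.
sharepoint_configuration = {
    "SharePointVersion": "SHAREPOINT_ONLINE",
    "Urls": ["https://example.sharepoint.com/sites/handbook"],
    "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-sharepoint",
    "CrawlAttachments": True,
    "UseChangeLog": False,
    # Regular expressions applied to the display URL of each document.
    "InclusionPatterns": [r".*\.docx$"],
    "ExclusionPatterns": [r".*/drafts/.*"],
    # The index field must already exist before it is mapped.
    "FieldMappings": [
        {"DataSourceFieldName": "Author", "IndexFieldName": "sharepoint_author"},
    ],
    "DisableLocalGroups": False,
}

REQUIRED_KEYS = {"SharePointVersion", "Urls", "SecretArn"}


def missing_required(configuration):
    """Return the sorted names of required keys that are absent."""
    return sorted(REQUIRED_KEYS - set(configuration))


print(missing_required(sharepoint_configuration))
```

A data source request would carry this dictionary under the SharePointConfiguration key of its Configuration parameter; checking required keys locally simply surfaces mistakes before the request is sent.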

  • DatabaseConfiguration (dict) --

    Provides information necessary to create a data source connector for a database.

    • DatabaseEngineType (string) -- [REQUIRED]

      The type of database engine that runs the database.

    • ConnectionConfiguration (dict) -- [REQUIRED]

      The information necessary to connect to a database.

      • DatabaseHost (string) -- [REQUIRED]

        The name of the host for the database. Can be either a string (host.subdomain.domain.tld) or an IPv4 or IPv6 address.

      • DatabasePort (integer) -- [REQUIRED]

        The port that the database uses for connections.

      • DatabaseName (string) -- [REQUIRED]

        The name of the database containing the document data.

      • TableName (string) -- [REQUIRED]

        The name of the table that contains the document data.

      • SecretArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. For more information, see Using a Database Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

    • VpcConfiguration (dict) --

      Provides information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • ColumnConfiguration (dict) -- [REQUIRED]

      Information about where the index should get the document information from the database.

      • DocumentIdColumnName (string) -- [REQUIRED]

        The column that provides the document's unique identifier.

      • DocumentDataColumnName (string) -- [REQUIRED]

        The column that contains the contents of the document.

      • DocumentTitleColumnName (string) --

        The column that contains the title of the document.

      • FieldMappings (list) --

        An array of objects that map database column names to the corresponding fields in an index. You must first create the fields in the index using the UpdateIndex operation.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • ChangeDetectingColumns (list) -- [REQUIRED]

        One to five columns that indicate when a document in the database has changed.

        • (string) --

    • AclConfiguration (dict) --

      Information about the database column that provides information for user context filtering.

      • AllowedGroupsColumnName (string) -- [REQUIRED]

        A list of groups, separated by semicolons, that filters a query response based on user context. The document is only returned to users that are in one of the groups specified in the UserContext field of the Query operation.

    • SqlConfiguration (dict) --

      Provides information about how Amazon Kendra uses quote marks around SQL identifiers when querying a database data source.

      • QueryIdentifiersEnclosingOption (string) --

        Determines whether Amazon Kendra encloses SQL identifiers for tables and column names in double quotes (") when making a database query.

        By default, Amazon Kendra passes SQL identifiers the way that they are entered into the data source configuration. It does not change the case of identifiers or enclose them in quotes.

        PostgreSQL internally converts uppercase characters to lowercase characters in identifiers unless they are quoted. Choosing this option encloses identifiers in quotes so that PostgreSQL does not convert the character's case.

        For MySQL databases, you must enable the ansi_quotes option when you set this field to DOUBLE_QUOTES.
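
The database fields above can be assembled with a small helper. This is a minimal sketch under stated assumptions: build_database_configuration is a hypothetical function, the host, table, column names, and ARN are placeholders, and the one-to-five check only mirrors the ChangeDetectingColumns description rather than any service-side validation.

```python
def build_database_configuration(host, port, database, table, secret_arn,
                                 id_column, data_column, change_columns,
                                 engine="RDS_POSTGRESQL", quote_identifiers=False):
    """Assemble a DatabaseConfiguration dict from plain values."""
    # The description above allows one to five change-detecting columns.
    if not 1 <= len(change_columns) <= 5:
        raise ValueError("ChangeDetectingColumns needs one to five column names")
    configuration = {
        "DatabaseEngineType": engine,
        "ConnectionConfiguration": {
            "DatabaseHost": host,
            "DatabasePort": port,
            "DatabaseName": database,
            "TableName": table,
            "SecretArn": secret_arn,
        },
        "ColumnConfiguration": {
            "DocumentIdColumnName": id_column,
            "DocumentDataColumnName": data_column,
            "ChangeDetectingColumns": list(change_columns),
        },
    }
    if quote_identifiers:
        # Quoting keeps PostgreSQL from folding identifiers to lowercase.
        configuration["SqlConfiguration"] = {
            "QueryIdentifiersEnclosingOption": "DOUBLE_QUOTES"
        }
    return configuration


# Placeholder connection details for illustration only.
config = build_database_configuration(
    host="db.example.com", port=5432, database="docs", table="Articles",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example-db",
    id_column="Id", data_column="Body", change_columns=["UpdatedAt"],
    quote_identifiers=True,
)
print(config["SqlConfiguration"])
```

Mixed-case names such as Articles and UpdatedAt are the case where DOUBLE_QUOTES matters on PostgreSQL, since unquoted identifiers would be folded to lowercase.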

  • SalesforceConfiguration (dict) --

    Provides configuration information for data sources that connect to a Salesforce site.

    • ServerUrl (string) -- [REQUIRED]

      The instance URL for the Salesforce site that you want to index.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Salesforce instance. The secret must contain a JSON structure with the following keys:

      • authenticationUrl - The OAuth endpoint that Amazon Kendra connects to in order to get an OAuth token.

      • consumerKey - The application public key generated when you created your Salesforce application.

      • consumerSecret - The application private key generated when you created your Salesforce application.

      • password - The password associated with the user logging in to the Salesforce instance.

      • securityToken - The token associated with the user account logging in to the Salesforce instance.

      • username - The user name of the user logging in to the Salesforce instance.

    • StandardObjectConfigurations (list) --

      Specifies the Salesforce standard objects that Amazon Kendra indexes.

      • (dict) --

        Specifies configuration information for indexing a single standard object.

        • Name (string) -- [REQUIRED]

          The name of the standard object.

        • DocumentDataFieldName (string) -- [REQUIRED]

          The name of the field in the standard object table that contains the document contents.

        • DocumentTitleFieldName (string) --

          The name of the field in the standard object table that contains the document title.

        • FieldMappings (list) --

          One or more objects that map fields in the standard object to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) -- [REQUIRED]

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The type of data stored in the column or attribute.

            • IndexFieldName (string) -- [REQUIRED]

              The name of the field in the index.

    • KnowledgeArticleConfiguration (dict) --

      Specifies configuration information for the knowledge article types that Amazon Kendra indexes. Amazon Kendra indexes standard knowledge articles and the standard fields of knowledge articles, or the custom fields of custom knowledge articles, but not both.

      • IncludedStates (list) -- [REQUIRED]

        Specifies the document states that should be included when Amazon Kendra indexes knowledge articles. You must specify at least one state.

        • (string) --

      • StandardKnowledgeArticleTypeConfiguration (dict) --

        Provides configuration information for standard Salesforce knowledge articles.

        • DocumentDataFieldName (string) -- [REQUIRED]

          The name of the field that contains the document data to index.

        • DocumentTitleFieldName (string) --

          The name of the field that contains the document title.

        • FieldMappings (list) --

          One or more objects that map fields in the knowledge article to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) -- [REQUIRED]

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The type of data stored in the column or attribute.

            • IndexFieldName (string) -- [REQUIRED]

              The name of the field in the index.

      • CustomKnowledgeArticleTypeConfigurations (list) --

        Provides configuration information for custom Salesforce knowledge articles.

        • (dict) --

          Provides configuration information for indexing Salesforce custom articles.

          • Name (string) -- [REQUIRED]

            The name of the configuration.

          • DocumentDataFieldName (string) -- [REQUIRED]

            The name of the field in the custom knowledge article that contains the document data to index.

          • DocumentTitleFieldName (string) --

            The name of the field in the custom knowledge article that contains the document title.

          • FieldMappings (list) --

            One or more objects that map fields in the custom knowledge article to fields in the Amazon Kendra index.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) -- [REQUIRED]

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The type of data stored in the column or attribute.

              • IndexFieldName (string) -- [REQUIRED]

                The name of the field in the index.

    • ChatterFeedConfiguration (dict) --

      Specifies configuration information for Salesforce chatter feeds.

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the column in the Salesforce FeedItem table that contains the content to index. Typically this is the Body column.

      • DocumentTitleFieldName (string) --

        The name of the column in the Salesforce FeedItem table that contains the title of the document. This is typically the Title column.

      • FieldMappings (list) --

        Maps fields from a Salesforce chatter feed into Amazon Kendra index fields.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • IncludeFilterTypes (list) --

        Filters the documents in the feed based on the status of the user. When you specify ACTIVE_USER, only documents from users who have an active account are indexed. When you specify STANDARD_USER, only documents for Salesforce standard users are indexed. You can specify both.

        • (string) --

    • CrawlAttachments (boolean) --

      Indicates whether Amazon Kendra should index attachments to Salesforce objects.

    • StandardObjectAttachmentConfiguration (dict) --

      Provides configuration information for processing attachments to Salesforce standard objects.

      • DocumentTitleFieldName (string) --

        The name of the field used for the document title.

      • FieldMappings (list) --

        One or more objects that map fields in attachments to Amazon Kendra index fields.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

    • IncludeAttachmentFilePatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The regex is applied to the name of the attached file.

      • (string) --

    • ExcludeAttachmentFilePatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

      The regex is applied to the name of the attached file.

      • (string) --
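
The JSON structure that the Salesforce SecretArn description calls for can be sketched as follows. The helper name and all credential values are hypothetical placeholders; the code only serializes the six listed keys and does not contact Secrets Manager or Salesforce.

```python
import json

# The six keys listed in the SecretArn description for a Salesforce secret.
SALESFORCE_SECRET_KEYS = (
    "authenticationUrl", "consumerKey", "consumerSecret",
    "password", "securityToken", "username",
)


def salesforce_secret_string(**values):
    """Serialize the credentials to JSON, refusing to omit a listed key."""
    missing = [key for key in SALESFORCE_SECRET_KEYS if key not in values]
    if missing:
        raise ValueError(f"missing secret keys: {missing}")
    return json.dumps(values, sort_keys=True)


# Placeholder credentials only; never hard-code real ones.
secret_string = salesforce_secret_string(
    authenticationUrl="https://login.salesforce.com/services/oauth2/token",
    consumerKey="example-consumer-key",
    consumerSecret="example-consumer-secret",
    password="example-password",
    securityToken="example-security-token",
    username="indexer@example.com",
)
print(sorted(json.loads(secret_string)))
```

The resulting string is what would be stored as the secret value, and the ARN of that secret is what SecretArn refers to.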

  • OneDriveConfiguration (dict) --

    Provides configuration for data sources that connect to Microsoft OneDrive.

    • TenantDomain (string) -- [REQUIRED]

      The Azure Active Directory domain of the organization.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password to connect to OneDrive. The user name should be the application ID for the OneDrive application, and the password is the application key for the OneDrive application.

    • OneDriveUsers (dict) -- [REQUIRED]

      A list of user accounts whose documents should be indexed.

      • OneDriveUserList (list) --

        A list of users whose documents should be indexed. Specify the user names in email format, for example, username@tenantdomain. If you need to index the documents of more than 100 users, use the OneDriveUserS3Path field to specify the location of a file containing a list of users.

        • (string) --

      • OneDriveUserS3Path (dict) --

        The S3 bucket location of a file containing a list of users whose documents should be indexed.

        • Bucket (string) -- [REQUIRED]

          The name of the S3 bucket that contains the file.

        • Key (string) -- [REQUIRED]

          The name of the file.

    • InclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the pattern are included in the index. Documents that don't match the pattern are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The inclusion pattern is applied to the file name.

      • (string) --

    • ExclusionPatterns (list) --

      List of regular expressions applied to documents. Items that match the exclusion pattern are not indexed. If you provide both an inclusion pattern and an exclusion pattern, any item that matches the exclusion pattern isn't indexed.

      The exclusion pattern is applied to the file name.

      • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Microsoft OneDrive fields to custom fields in the Amazon Kendra index. You must first create the index fields before you map OneDrive fields.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • DisableLocalGroups (boolean) --

      A Boolean value that specifies whether local groups are disabled (True) or enabled (False).
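
The choice between OneDriveUserList and OneDriveUserS3Path can be expressed as a small helper. This is a sketch, assuming a hypothetical one_drive_users function; the bucket and key are placeholders, and the code does not upload the user list file itself.

```python
MAX_INLINE_USERS = 100  # threshold named in the OneDriveUserList description


def one_drive_users(user_emails, bucket=None, key=None):
    """Return a OneDriveUsers dict, switching to an S3 file for long lists."""
    if len(user_emails) <= MAX_INLINE_USERS:
        return {"OneDriveUserList": list(user_emails)}
    if not (bucket and key):
        raise ValueError(
            "more than 100 users: supply the S3 bucket and key of a user list file"
        )
    return {"OneDriveUserS3Path": {"Bucket": bucket, "Key": key}}


# Hypothetical users in email format (username@tenantdomain).
print(one_drive_users(["ana@example.com", "raj@example.com"]))
```

With more than 100 addresses, the same call needs the bucket and key of a file that already lists the users.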

  • ServiceNowConfiguration (dict) --

    Provides configuration for data sources that connect to ServiceNow instances.

    • HostUrl (string) -- [REQUIRED]

      The ServiceNow instance that the data source connects to. The host endpoint should look like the following: {instance}.service-now.com.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of the Secrets Manager secret that contains the user name and password required to connect to the ServiceNow instance.

    • ServiceNowBuildVersion (string) -- [REQUIRED]

      The identifier of the release that the ServiceNow host is running. If the host is not running the LONDON release, use OTHERS.

    • KnowledgeArticleConfiguration (dict) --

      Provides configuration information for crawling knowledge articles in the ServiceNow site.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra should index attachments to knowledge articles.

      • IncludeAttachmentFilePatterns (list) --

        List of regular expressions applied to knowledge articles. Items that don't match the inclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

        • (string) --

      • ExcludeAttachmentFilePatterns (list) --

        List of regular expressions applied to knowledge articles. Items that match the exclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

        • (string) --

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

      • DocumentTitleFieldName (string) --

        The name of the ServiceNow field that is mapped to the index document title field.

      • FieldMappings (list) --

        Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • FilterQuery (string) --

        A query that selects the knowledge articles to index. The query can return articles from multiple knowledge bases, and the knowledge bases can be public or private.

        The query string must be one generated by the ServiceNow console. For more information, see Specifying documents to index with a query.

    • ServiceCatalogConfiguration (dict) --

      Provides configuration information for crawling service catalogs in the ServiceNow site.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra should crawl attachments to the service catalog items.

      • IncludeAttachmentFilePatterns (list) --

        A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

        The regex is applied to the file name of the attachment.

        • (string) --

      • ExcludeAttachmentFilePatterns (list) --

        A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

        The regex is applied to the file name of the attachment.

        • (string) --

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

      • DocumentTitleFieldName (string) --

        The name of the ServiceNow field that is mapped to the index document title field.

      • FieldMappings (list) --

        Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

    • AuthenticationType (string) --

      Determines the type of authentication used to connect to the ServiceNow instance. If you choose HTTP_BASIC, Amazon Kendra is authenticated using the user name and password provided in the Secrets Manager secret in the SecretArn field. When you choose OAUTH2, Amazon Kendra is authenticated using the OAuth token and secret provided in the Secrets Manager secret, and the user name and password are used to determine which information Amazon Kendra has access to.

      When you use OAUTH2 authentication, you must generate a token and a client secret using the ServiceNow console. For more information, see Using a ServiceNow data source.
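
The include/exclude rule for attachment file names can be tried out locally before a configuration is submitted. This is an illustrative sketch only: the patterns are hypothetical, the use of re.search (match anywhere in the name) is an assumption about matching semantics, and treating an empty inclusion list as "include everything" is also an assumption.

```python
import re


def attachment_is_indexed(file_name, include_patterns, exclude_patterns):
    """Apply the include/exclude rule to one attachment file name."""
    # A match on any exclusion pattern removes the attachment from the index.
    if any(re.search(pattern, file_name) for pattern in exclude_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not include_patterns:
        return True
    return any(re.search(pattern, file_name) for pattern in include_patterns)


# Hypothetical patterns for IncludeAttachmentFilePatterns and ExcludeAttachmentFilePatterns.
include = [r"\.pdf$", r"\.docx$"]
exclude = [r"^draft_"]

for name in ["guide.pdf", "draft_guide.pdf", "image.png"]:
    print(name, attachment_is_indexed(name, include, exclude))
```

The file draft_guide.pdf matches both lists and is rejected, which is the precedence the pattern descriptions state.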

  • ConfluenceConfiguration (dict) --

    Provides configuration information for connecting to a Confluence data source.

    • ServerUrl (string) -- [REQUIRED]

      The URL of your Confluence instance. Use the full URL of the server. For example, https://server.example.com:port/. You can also use an IP address, for example, https://192.168.1.113/.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Confluence server. The secret must contain a JSON structure with the following keys:

      • username - The user name or email address of a user with administrative privileges for the Confluence server.

      • password - The password associated with the user logging in to the Confluence server.

    • Version (string) -- [REQUIRED]

      Specifies the version of the Confluence installation that you are connecting to.

    • SpaceConfiguration (dict) --

      Specifies configuration information for indexing Confluence spaces.

      • CrawlPersonalSpaces (boolean) --

        Specifies whether Amazon Kendra should index personal spaces. Users can add restrictions to items in personal spaces. If personal spaces are indexed, queries without user context information may return restricted items from a personal space in their results. For more information, see Filtering on user context.

      • CrawlArchivedSpaces (boolean) --

        Specifies whether Amazon Kendra should index archived spaces.

      • IncludeSpaces (list) --

        A list of space keys for Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are indexed. Spaces that aren't in the list aren't indexed. A space in the list must exist. Otherwise, Amazon Kendra logs an error when the data source is synchronized. If a space is in both the IncludeSpaces and the ExcludeSpaces list, the space is excluded.

        • (string) --

      • ExcludeSpaces (list) --

        A list of space keys of Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are not indexed. If a space is in both the ExcludeSpaces and the IncludeSpaces list, the space is excluded.

        • (string) --

      • SpaceFieldMappings (list) --

        Defines how space metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the SpaceFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.
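
How IncludeSpaces and ExcludeSpaces combine can be sketched with a short helper. The space keys are hypothetical, and effective_spaces is an illustration of the stated rule, not something the SDK provides.

```python
def effective_spaces(include_spaces, exclude_spaces):
    """Return the space keys left to index after exclusions are applied."""
    excluded = set(exclude_spaces)
    # A key present in both lists is excluded, per the descriptions above.
    return [key for key in include_spaces if key not in excluded]


# Hypothetical space keys for illustration.
space_configuration = {
    "CrawlPersonalSpaces": False,
    "CrawlArchivedSpaces": False,
    "IncludeSpaces": ["ENG", "HR", "LEGAL"],
    "ExcludeSpaces": ["LEGAL"],
}

indexed = effective_spaces(
    space_configuration["IncludeSpaces"],
    space_configuration["ExcludeSpaces"],
)
print(indexed)
```

Listing a key in both places is harmless but redundant, so a check like this is mainly useful for spotting configuration mistakes.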

    • PageConfiguration (dict) --

      Specifies configuration information for indexing Confluence pages.

      • PageFieldMappings (list) --

        Defines how page metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the PageFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • BlogConfiguration (dict) --

      Specifies configuration information for indexing Confluence blogs.

      • BlogFieldMappings (list) --

        Defines how blog metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the BlogFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a blog field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • AttachmentConfiguration (dict) --

      Specifies configuration information for indexing attachments to Confluence blogs and pages.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra indexes attachments to the pages and blogs in the Confluence data source.

      • AttachmentFieldMappings (list) --

        Defines how attachment metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the AttachmentFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

            You must first create the index field using the UpdateIndex operation.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field, you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • VpcConfiguration (dict) --

      Specifies the information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • InclusionPatterns (list) --

      A list of regular expression patterns that apply to a URL on the Confluence server. An inclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the patterns are included in the index. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, the item isn't included in the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns that apply to a URL on the Confluence server. An exclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the pattern are excluded from the index. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, the item isn't included in the index.

      • (string) --
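
The precedence rule for InclusionPatterns and ExclusionPatterns can be checked locally before starting a sync. The following is a minimal sketch, not Amazon Kendra's implementation: the helper name is_indexed and the URLs are hypothetical, and it assumes that a pattern may match anywhere in the URL and that an empty inclusion list filters nothing.

```python
import re


def is_indexed(url, inclusion_patterns, exclusion_patterns):
    """Return True if the URL would be indexed under the rule described above."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(pattern, url) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion patterns, nothing is filtered out.
    if not inclusion_patterns:
        return True
    # Otherwise the URL must match at least one inclusion pattern.
    return any(re.search(pattern, url) for pattern in inclusion_patterns)


# Hypothetical patterns for a Confluence space and its drafts area.
include = [r"/display/ENG/"]
exclude = [r"/display/ENG/Drafts"]

for candidate in (
    "https://wiki.example.com/display/ENG/Design",
    "https://wiki.example.com/display/ENG/Drafts/Idea",
    "https://wiki.example.com/display/HR/Policy",
):
    print(candidate, is_indexed(candidate, include, exclude))
```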

  • GoogleDriveConfiguration (dict) --

    Provides configuration for data sources that connect to Google Drive.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the credentials required to connect to Google Drive. For more information, see Using a Google Workspace Drive data source.

    • InclusionPatterns (list) --

      A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are included in the index from both shared drives and users' My Drives. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, it is excluded from the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are excluded from the index from both shared drives and users' My Drives. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, it is excluded from the index.

      • (string) --

    • FieldMappings (list) --

      Defines the mapping between a field in Google Drive and an Amazon Kendra index field.

      If you are using the console, you can define index fields when creating the mapping. If you are using the API, you must first create the field using the UpdateIndex operation.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • ExcludeMimeTypes (list) --

      A list of MIME types to exclude from the index. All documents matching the specified MIME type are excluded.

      For a list of MIME types, see Using a Google Workspace Drive data source.

      • (string) --

    • ExcludeUserAccounts (list) --

      A list of email addresses of the users. Documents owned by these users are excluded from the index. Documents shared with excluded users are indexed unless they are excluded in another way.

      • (string) --

    • ExcludeSharedDrives (list) --

      A list of identifiers of shared drives to exclude from the index. All files and folders stored on the shared drive are excluded.

      • (string) --
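
As an illustration of how the GoogleDriveConfiguration fields fit together, the sketch below assembles one such dictionary in Python. Every value is a placeholder (the ARN, patterns, field names, date format, MIME type, email address, and drive identifier are all assumptions), and the helper check_field_mappings is hypothetical; it only verifies the two keys marked [REQUIRED] above.

```python
def check_field_mappings(mappings):
    """Verify that each mapping carries the keys marked [REQUIRED] above."""
    for mapping in mappings:
        for key in ("DataSourceFieldName", "IndexFieldName"):
            if key not in mapping:
                raise ValueError(f"field mapping is missing {key}: {mapping}")
    return mappings


google_drive_configuration = {
    # Placeholder ARN of the Secrets Manager secret holding the credentials.
    "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-gdrive",
    "InclusionPatterns": [r"^Reports/"],
    # An item matching both lists is excluded.
    "ExclusionPatterns": [r"^Reports/Archive/"],
    "FieldMappings": check_field_mappings([
        {
            "DataSourceFieldName": "createdTime",   # hypothetical source field
            "DateFieldFormat": "yyyy-MM-dd'T'HH:mm:ss'Z'",  # hypothetical format
            "IndexFieldName": "gdrive_created_at",  # must already exist in the index
        },
    ]),
    "ExcludeMimeTypes": ["application/vnd.google-apps.form"],
    "ExcludeUserAccounts": ["former.employee@example.com"],
    "ExcludeSharedDrives": ["example-shared-drive-id"],
}

# The dictionary would be passed under the 'GoogleDriveConfiguration' key
# of the Configuration parameter.
configuration = {"GoogleDriveConfiguration": google_drive_configuration}
```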

  • WebCrawlerConfiguration (dict) --

    Provides the configuration information required for Amazon Kendra Web Crawler.

    • Urls (dict) -- [REQUIRED]

      Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.

      You can include website subdomains. You can list up to 100 seed URLs and up to three sitemap URLs.

      You can only crawl websites that use the secure communication protocol, Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling.

      When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own webpages, or webpages that you have authorization to index.

      • SeedUrlConfiguration (dict) --

        Provides the configuration of the seed or starting point URLs of the websites you want to crawl.

        You can choose to crawl only the website host names, or the website host names with subdomains, or the website host names with subdomains and other domains that the webpages link to.

        You can list up to 100 seed URLs.

        • SeedUrls (list) -- [REQUIRED]

          The list of seed or starting point URLs of the websites you want to crawl.

          The list can include a maximum of 100 seed URLs.

          • (string) --

        • WebCrawlerMode (string) --

          You can choose one of the following modes:

          • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

          • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

          • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

          The default mode is set to HOST_ONLY.

      • SiteMapsConfiguration (dict) --

        Provides the configuration of the sitemap URLs of the websites you want to crawl.

        Only URLs belonging to the same website host names are crawled. You can list up to three sitemap URLs.

        • SiteMaps (list) -- [REQUIRED]

          The list of sitemap URLs of the websites you want to crawl.

          The list can include a maximum of three sitemap URLs.

          • (string) --

    • CrawlDepth (integer) --

      Specifies the number of levels in a website that you want to crawl.

      The first level begins from the website seed or starting point URL. For example, if a website has 3 levels – index level (i.e. seed in this example), sections level, and subsections level – and you are only interested in crawling information up to the sections level (i.e. levels 0-1), you can set your depth to 1.

      The default crawl depth is set to 2.

    • MaxLinksPerPage (integer) --

      The maximum number of URLs on a webpage to include when crawling a website. This number is per webpage.

      As a website’s webpages are crawled, any URLs the webpages link to are also crawled. URLs on a webpage are crawled in order of appearance.

      The default maximum links per page is 100.

    • MaxContentSizePerPageInMegaBytes (float) --

      The maximum size (in MB) of a webpage or attachment to crawl.

      Files larger than this size (in MB) are skipped and not crawled.

      The default maximum size of a webpage or attachment is set to 50 MB.

    • MaxUrlsPerMinuteCrawlRate (integer) --

      The maximum number of URLs crawled per website host per minute.

      A minimum of one URL is required.

      The default maximum number of URLs crawled per website host per minute is 300.

    • UrlInclusionPatterns (list) --

      A list of regular expression patterns for URLs to include in the crawl.

      If there is a regular expression pattern to exclude certain URLs that conflicts with the include pattern, the exclude pattern takes precedence.

      • (string) --

    • UrlExclusionPatterns (list) --

      A list of regular expression patterns for URLs to exclude from the crawl.

      If there is a regular expression pattern to include certain URLs that conflicts with the exclude pattern, the exclude pattern takes precedence.

      • (string) --

    • ProxyConfiguration (dict) --

      Provides configuration information required to connect to your internal websites via a web proxy.

      You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.

      Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication. To store web proxy credentials, you use a secret in Secrets Manager.

      • Host (string) -- [REQUIRED]

        The name of the website host you want to connect to via a web proxy server.

        For example, the host name of https://a.example.com/page1.html is "a.example.com".

      • Port (integer) -- [REQUIRED]

        The port number of the website host you want to connect to via a web proxy server.

        For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

      • Credentials (string) --

        Your secret ARN, which you can create in Secrets Manager.

        The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.

    • AuthenticationConfiguration (dict) --

      Provides configuration information required to connect to websites using authentication.

      You can connect to websites using basic authentication of user name and password.

      You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS. You use a secret in Secrets Manager to store your authentication credentials.

      • BasicAuthentication (list) --

        The list of configuration information that's required to connect to and crawl a website host using basic authentication credentials.

        The list includes the name and port number of the website host.

        • (dict) --

          Provides the configuration information to connect to websites that require basic user authentication.

          • Host (string) -- [REQUIRED]

            The name of the website host you want to connect to using authentication credentials.

            For example, the host name of https://a.example.com/page1.html is "a.example.com".

          • Port (integer) -- [REQUIRED]

            The port number of the website host you want to connect to using authentication credentials.

            For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

          • Credentials (string) -- [REQUIRED]

            Your secret ARN, which you can create in Secrets Manager.

            You use a secret if basic authentication credentials are required to connect to a website. The secret stores your credentials of user name and password.
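
The Host and Port values used by ProxyConfiguration and BasicAuthentication can be derived from a seed URL. The sketch below is one way to do that with the standard library and to assemble a WebCrawlerConfiguration dictionary around it. The helper names and the secret ARN are hypothetical, and the numeric settings simply spell out the defaults stated above.

```python
from urllib.parse import urlsplit

WEB_CRAWLER_MODES = ("HOST_ONLY", "SUBDOMAINS", "EVERYTHING")


def host_and_port(url):
    """Split an HTTPS URL into the Host and Port values described above."""
    parts = urlsplit(url)
    if parts.scheme != "https":
        raise ValueError(f"only HTTPS websites can be crawled: {url}")
    # 443 is the standard HTTPS port when the URL does not name one.
    return parts.hostname, parts.port or 443


def build_web_crawler_configuration(seed_urls, secret_arn, mode="HOST_ONLY"):
    """Assemble a WebCrawlerConfiguration dict that uses basic authentication."""
    if len(seed_urls) > 100:
        raise ValueError("the list can include a maximum of 100 seed URLs")
    if mode not in WEB_CRAWLER_MODES:
        raise ValueError(f"unknown WebCrawlerMode: {mode}")
    # One BasicAuthentication entry per distinct (host, port) pair.
    endpoints = sorted({host_and_port(url) for url in seed_urls})
    return {
        "Urls": {
            "SeedUrlConfiguration": {
                "SeedUrls": list(seed_urls),
                "WebCrawlerMode": mode,
            }
        },
        "CrawlDepth": 2,                           # documented default
        "MaxLinksPerPage": 100,                    # documented default
        "MaxContentSizePerPageInMegaBytes": 50.0,  # documented default
        "MaxUrlsPerMinuteCrawlRate": 300,          # documented default
        "AuthenticationConfiguration": {
            "BasicAuthentication": [
                {"Host": host, "Port": port, "Credentials": secret_arn}
                for host, port in endpoints
            ]
        },
    }


# Placeholder ARN; the secret would store the user name and password.
SECRET_ARN = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example-site"
web_crawler_configuration = build_web_crawler_configuration(
    ["https://a.example.com/page1.html", "https://a.example.com/page2.html"],
    SECRET_ARN,
)
```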

  • WorkDocsConfiguration (dict) --

    Provides the configuration information to connect to WorkDocs as your data source.

    • OrganizationId (string) -- [REQUIRED]

      The identifier of the directory corresponding to your Amazon WorkDocs site repository.

      You can find the organization ID in the Directory Service by going to Active Directory, then Directories. Your Amazon WorkDocs site directory has an ID, which is the organization ID. You can also set up a new Amazon WorkDocs directory in the Directory Service console and enable an Amazon WorkDocs site for the directory in the Amazon WorkDocs console.

    • CrawlComments (boolean) --

      TRUE to include comments on documents in your index. Including comments in your index means each comment is a document that can be searched on.

      The default is set to FALSE.

    • UseChangeLog (boolean) --

      TRUE to use the change logs to update documents in your index instead of scanning all documents.

      If you are syncing your Amazon WorkDocs data source with your index for the first time, all documents are scanned. After your first sync, you can use the change logs to update your documents in your index for future syncs.

      The default is set to FALSE.

    • InclusionPatterns (list) --

      A list of regular expression patterns to include certain files in your Amazon WorkDocs site repository. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns to exclude certain files in your Amazon WorkDocs site repository. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

      • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Amazon WorkDocs field names to custom index field names in Amazon Kendra. You must first create the custom index fields using the UpdateIndex operation before you map to Amazon WorkDocs fields. For more information, see Mapping Data Source Fields. The Amazon WorkDocs data source field names need to exist in your Amazon WorkDocs custom metadata.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.
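
A WorkDocsConfiguration could be assembled as in the following sketch. The helper name, the organization ID, the patterns, and the field names are placeholders chosen for illustration; only the key names come from the structure described above.

```python
def build_workdocs_configuration(organization_id, field_mappings=(),
                                 crawl_comments=False, use_change_log=False):
    """Return a Configuration dict containing a WorkDocsConfiguration."""
    config = {
        "OrganizationId": organization_id,
        # Both flags default to FALSE, as stated above.
        "CrawlComments": crawl_comments,
        "UseChangeLog": use_change_log,
    }
    mappings = []
    for mapping in field_mappings:
        # DataSourceFieldName and IndexFieldName are marked [REQUIRED].
        if "DataSourceFieldName" not in mapping or "IndexFieldName" not in mapping:
            raise ValueError(f"incomplete field mapping: {mapping}")
        mappings.append(dict(mapping))
    if mappings:
        config["FieldMappings"] = mappings
    return {"WorkDocsConfiguration": config}


configuration = build_workdocs_configuration(
    "d-1234567890",  # placeholder directory (organization) ID
    field_mappings=[
        # Hypothetical custom metadata field mapped to a custom index field
        # that was created beforehand with UpdateIndex.
        {"DataSourceFieldName": "project", "IndexFieldName": "workdocs_project"},
    ],
    use_change_log=True,  # only takes effect after the first full sync
)
configuration["WorkDocsConfiguration"]["InclusionPatterns"] = [r".*\.pdf$"]
configuration["WorkDocsConfiguration"]["ExclusionPatterns"] = [r".*/drafts/.*"]
```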

type Description:

string

param Description:

A description for the data source.

type Schedule:

string

param Schedule:

Sets the frequency with which Amazon Kendra checks the documents in your repository and updates the index. If you don't set a schedule, Amazon Kendra will not periodically update the index. You can call the StartDataSourceSyncJob operation to update the index.

You can't specify the Schedule parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of a role with permission to access the data source. For more information, see IAM Roles for Amazon Kendra.

You can't specify the RoleArn parameter when the Type parameter is set to CUSTOM. If you do, you receive a ValidationException exception.

The RoleArn parameter is required for all other data sources.
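
The interaction between Type, Schedule, and RoleArn can be checked before the request is sent. This is a minimal client-side sketch of the rules stated above; the function name is hypothetical, a local ValueError stands in for the service's ValidationException, and the schedule string and role ARN are placeholders.

```python
def check_schedule_and_role(data_source_type, schedule=None, role_arn=None):
    """Return the optional keyword arguments if they satisfy the stated rules."""
    if data_source_type == "CUSTOM":
        # Neither parameter may be specified for CUSTOM data sources.
        if schedule is not None:
            raise ValueError("Schedule can't be specified when Type is CUSTOM")
        if role_arn is not None:
            raise ValueError("RoleArn can't be specified when Type is CUSTOM")
        return {}
    # RoleArn is required for all other data source types.
    if role_arn is None:
        raise ValueError(f"RoleArn is required when Type is {data_source_type}")
    kwargs = {"RoleArn": role_arn}
    if schedule is not None:
        kwargs["Schedule"] = schedule
    return kwargs


optional_args = check_schedule_and_role(
    "WEBCRAWLER",
    schedule="cron(0 12 * * ? *)",  # placeholder schedule expression
    role_arn="arn:aws:iam::111122223333:role/example-kendra-role",
)
```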

type Tags:

list

param Tags:

A list of key-value pairs that identify the data source. You can use the tags to identify and organize your resources and to control access to resources.

  • (dict) --

    A list of key/value pairs that identify an index, FAQ, or data source. Tag keys and values can consist of Unicode letters, digits, white space, and any of the following symbols: _ . : / = + - @.

    • Key (string) -- [REQUIRED]

      The key for the tag. Keys are not case sensitive and must be unique for the index, FAQ, or data source.

    • Value (string) -- [REQUIRED]

      The value associated with the tag. The value may be an empty string but it can't be null.
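
The tag constraints above can be approximated with a small local check. In this sketch the function names are hypothetical, and Python's str.isalpha, str.isdigit, and str.isspace are used as an approximation of "Unicode letters, digits, white space"; the service's exact character classes may differ.

```python
ALLOWED_SYMBOLS = set("_.:/=+-@")


def _uses_allowed_characters(text):
    """Approximate the allowed character set with Python's Unicode predicates."""
    return all(
        ch.isalpha() or ch.isdigit() or ch.isspace() or ch in ALLOWED_SYMBOLS
        for ch in text
    )


def validate_tags(tags):
    """Raise ValueError if a tag breaks one of the rules described above."""
    seen_keys = set()
    for tag in tags:
        key, value = tag["Key"], tag["Value"]
        # The value may be an empty string, but it can't be null.
        if value is None:
            raise ValueError(f"tag {key!r} has a null value")
        if not _uses_allowed_characters(key) or not _uses_allowed_characters(value):
            raise ValueError(f"tag {key!r} contains a disallowed character")
        # Keys are not case sensitive, so compare them case-folded.
        folded = key.casefold()
        if folded in seen_keys:
            raise ValueError(f"duplicate tag key: {key!r}")
        seen_keys.add(folded)
    return tags


tags = validate_tags([
    {"Key": "team", "Value": "search-platform"},
    {"Key": "cost-center", "Value": ""},
])
```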

type ClientToken:

string

param ClientToken:

A token that you provide to identify the request to create a data source. Multiple calls to the CreateDataSource operation with the same client token will create only one data source.

This field is autopopulated if not provided.
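
To benefit from this behavior when your own code retries a request, the same token has to be sent on every attempt. The sketch below generates one token and reuses it across retries. The wrapper name is hypothetical, create_fn stands for a callable such as client.create_data_source, and the built-in ConnectionError is only a stand-in for whichever transient error you decide to retry on.

```python
import uuid


def create_with_client_token(create_fn, request, attempts=3):
    """Call create_fn with one ClientToken that is reused on every retry."""
    if attempts < 1:
        raise ValueError("attempts must be at least 1")
    request = dict(request)
    # Generate the token once so repeated calls create only one data source.
    request.setdefault("ClientToken", str(uuid.uuid4()))
    last_error = None
    for _ in range(attempts):
        try:
            return create_fn(**request)
        except ConnectionError as error:  # stand-in for a transient failure
            last_error = error
    raise last_error


# A fake create function that fails once, used here instead of a real client.
seen_tokens = []


def fake_create_data_source(**kwargs):
    seen_tokens.append(kwargs["ClientToken"])
    if len(seen_tokens) == 1:
        raise ConnectionError("simulated network failure")
    return {"Id": "example-data-source-id"}


response = create_with_client_token(
    fake_create_data_source,
    {"Name": "example", "IndexId": "example-index-id", "Type": "CUSTOM"},
)
```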

type LanguageCode:

string

param LanguageCode:

The code for a language. This allows you to support a language for all documents when creating the data source. English is supported by default. For more information on supported languages, including their codes, see Adding documents in languages other than English.

type CustomDocumentEnrichmentConfiguration:

dict

param CustomDocumentEnrichmentConfiguration:

Configuration information for altering document metadata and content during the document ingestion process when you create a data source.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

  • InlineConfigurations (list) --

    Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.

    • (dict) --

      Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic, to go beyond what you can do with basic logic, see HookConfiguration.

      For more information, see Customizing document metadata during the ingestion process.

      • Condition (dict) --

        Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.

        • ConditionDocumentAttributeKey (string) -- [REQUIRED]

          The identifier of the document attribute used for the condition.

          For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

          Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

        • Operator (string) -- [REQUIRED]

          The condition operator.

          For example, you can use 'Contains' to partially match a string.

        • ConditionOnValue (dict) --

          The value used by the operator.

          For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • Target (dict) --

        Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.

        • TargetDocumentAttributeKey (string) --

          The identifier of the target document attribute or metadata field.

          For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.

        • TargetDocumentAttributeValueDeletion (boolean) --

          TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value ( TargetDocumentAttributeValue), set this to FALSE.

        • TargetDocumentAttributeValue (dict) --

          The target value you want to create for the target attribute.

          For example, 'Finance' could be the target value for the target attribute key 'Department'.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • DocumentContentDeletion (boolean) --

        TRUE to delete content if the condition used for the target attribute is met.

  • PreExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function in Lambda on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition used for when a Lambda function should be invoked.

      For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • PostExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function in Lambda on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition used for when a Lambda function should be invoked.

      For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • RoleArn (string) --

    The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
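
The pieces of CustomDocumentEnrichmentConfiguration can be put together as in the following sketch. The helper names, the 'Last_Reviewed' attribute, the ARNs, and the bucket name are hypothetical; the operator names are those listed for the Operator field in the DescribeDataSource response changes. The date value uses a time-zone-aware datetime, in line with the note on ISO 8601 values.

```python
from datetime import datetime, timedelta, timezone

OPERATORS = {
    "GreaterThan", "GreaterThanOrEquals", "LessThan", "LessThanOrEquals",
    "Equals", "NotEquals", "Contains", "NotContains", "Exists", "NotExists",
    "BeginsWith",
}


def condition(key, operator, value=None):
    """Build a Condition / InvocationCondition dict."""
    if key == "_document_body":
        raise ValueError("_document_body is not supported as a condition key")
    if operator not in OPERATORS:
        raise ValueError(f"unknown operator: {operator}")
    result = {"ConditionDocumentAttributeKey": key, "Operator": operator}
    if value is not None:
        result["ConditionOnValue"] = value
    return result


def inline_rule(cond, target_key, target_value=None, delete_target_value=False):
    """Build one InlineConfigurations entry."""
    # A target value can't be created while deletion is set to TRUE.
    if delete_target_value and target_value is not None:
        raise ValueError("set a target value or delete it, not both")
    target = {
        "TargetDocumentAttributeKey": target_key,
        "TargetDocumentAttributeValueDeletion": delete_target_value,
    }
    if target_value is not None:
        target["TargetDocumentAttributeValue"] = target_value
    return {"Condition": cond, "Target": target}


# March 25th 2012, 12:30:10 at UTC+01:00, with the offset attached.
cutoff = datetime(2012, 3, 25, 12, 30, 10, tzinfo=timezone(timedelta(hours=1)))

enrichment_configuration = {
    "InlineConfigurations": [
        # If Source_URI contains 'financial', set Department to 'Finance'.
        inline_rule(
            condition("Source_URI", "Contains", {"StringValue": "financial"}),
            "Department",
            {"StringValue": "Finance"},
        ),
    ],
    "PreExtractionHookConfiguration": {
        # 'Last_Reviewed' is a hypothetical date attribute.
        "InvocationCondition": condition(
            "Last_Reviewed", "LessThan", {"DateValue": cutoff}
        ),
        "LambdaArn": "arn:aws:lambda:us-east-1:111122223333:function:example-hook",
        "S3Bucket": "example-enrichment-bucket",
    },
    "RoleArn": "arn:aws:iam::111122223333:role/example-kendra-enrichment",
}
```

The resulting dictionary would be passed as the CustomDocumentEnrichmentConfiguration parameter when creating the data source.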

rtype:

dict

returns:

Response Syntax

{
    'Id': 'string'
}

Response Structure

  • (dict) --

    • Id (string) --

      A unique identifier for the data source.

DescribeDataSource (updated) Link ¶
Changes (response)
{'CustomDocumentEnrichmentConfiguration': {'InlineConfigurations': [{'Condition': {'ConditionDocumentAttributeKey': 'string',
                                                                                   'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                        'LongValue': 'long',
                                                                                                        'StringListValue': ['string'],
                                                                                                        'StringValue': 'string'},
                                                                                   'Operator': 'GreaterThan '
                                                                                               '| '
                                                                                               'GreaterThanOrEquals '
                                                                                               '| '
                                                                                               'LessThan '
                                                                                               '| '
                                                                                               'LessThanOrEquals '
                                                                                               '| '
                                                                                               'Equals '
                                                                                               '| '
                                                                                               'NotEquals '
                                                                                               '| '
                                                                                               'Contains '
                                                                                               '| '
                                                                                               'NotContains '
                                                                                               '| '
                                                                                               'Exists '
                                                                                               '| '
                                                                                               'NotExists '
                                                                                               '| '
                                                                                               'BeginsWith'},
                                                                     'DocumentContentDeletion': 'boolean',
                                                                     'Target': {'TargetDocumentAttributeKey': 'string',
                                                                                'TargetDocumentAttributeValue': {'DateValue': 'timestamp',
                                                                                                                 'LongValue': 'long',
                                                                                                                 'StringListValue': ['string'],
                                                                                                                 'StringValue': 'string'},
                                                                                'TargetDocumentAttributeValueDeletion': 'boolean'}}],
                                           'PostExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                       'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                            'LongValue': 'long',
                                                                                                                            'StringListValue': ['string'],
                                                                                                                            'StringValue': 'string'},
                                                                                                       'Operator': 'GreaterThan '
                                                                                                                   '| '
                                                                                                                   'GreaterThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'LessThan '
                                                                                                                   '| '
                                                                                                                   'LessThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'Equals '
                                                                                                                   '| '
                                                                                                                   'NotEquals '
                                                                                                                   '| '
                                                                                                                   'Contains '
                                                                                                                   '| '
                                                                                                                   'NotContains '
                                                                                                                   '| '
                                                                                                                   'Exists '
                                                                                                                   '| '
                                                                                                                   'NotExists '
                                                                                                                   '| '
                                                                                                                   'BeginsWith'},
                                                                               'LambdaArn': 'string',
                                                                               'S3Bucket': 'string'},
                                           'PreExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                      'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                           'LongValue': 'long',
                                                                                                                           'StringListValue': ['string'],
                                                                                                                           'StringValue': 'string'},
                                                                                                      'Operator': 'GreaterThan '
                                                                                                                  '| '
                                                                                                                  'GreaterThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'LessThan '
                                                                                                                  '| '
                                                                                                                  'LessThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'Equals '
                                                                                                                  '| '
                                                                                                                  'NotEquals '
                                                                                                                  '| '
                                                                                                                  'Contains '
                                                                                                                  '| '
                                                                                                                  'NotContains '
                                                                                                                  '| '
                                                                                                                  'Exists '
                                                                                                                  '| '
                                                                                                                  'NotExists '
                                                                                                                  '| '
                                                                                                                  'BeginsWith'},
                                                                              'LambdaArn': 'string',
                                                                              'S3Bucket': 'string'},
                                           'RoleArn': 'string'}}

Gets information about an Amazon Kendra data source.

See also: AWS API Documentation

Request Syntax

client.describe_data_source(
    Id='string',
    IndexId='string'
)
type Id:

string

param Id:

[REQUIRED]

The unique identifier of the data source to describe.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index that contains the data source.
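
A minimal sketch of calling this operation and reading a few top-level fields from the response. The helper name `get_data_source_summary` is hypothetical, and `client` is assumed to be a Kendra client (for example, one created with `boto3.client('kendra')`).

```python
def get_data_source_summary(client, data_source_id, index_id):
    """Call describe_data_source and return a few top-level response fields."""
    response = client.describe_data_source(Id=data_source_id, IndexId=index_id)
    # .get() is used because optional fields may be absent from a response.
    return {
        "Name": response.get("Name"),
        "Type": response.get("Type"),
        "Status": response.get("Status"),
    }
```

With a real client, the call would look like `get_data_source_summary(boto3.client('kendra'), '<data-source-id>', '<index-id>')`, where both identifiers are placeholders for your own values.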

rtype:

dict

returns:

Response Syntax

{
    'Id': 'string',
    'IndexId': 'string',
    'Name': 'string',
    'Type': 'S3'|'SHAREPOINT'|'DATABASE'|'SALESFORCE'|'ONEDRIVE'|'SERVICENOW'|'CUSTOM'|'CONFLUENCE'|'GOOGLEDRIVE'|'WEBCRAWLER'|'WORKDOCS',
    'Configuration': {
        'S3Configuration': {
            'BucketName': 'string',
            'InclusionPrefixes': [
                'string',
            ],
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'DocumentsMetadataConfiguration': {
                'S3Prefix': 'string'
            },
            'AccessControlListConfiguration': {
                'KeyPath': 'string'
            }
        },
        'SharePointConfiguration': {
            'SharePointVersion': 'SHAREPOINT_2013'|'SHAREPOINT_2016'|'SHAREPOINT_ONLINE',
            'Urls': [
                'string',
            ],
            'SecretArn': 'string',
            'CrawlAttachments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DocumentTitleFieldName': 'string',
            'DisableLocalGroups': True|False,
            'SslCertificateS3Path': {
                'Bucket': 'string',
                'Key': 'string'
            }
        },
        'DatabaseConfiguration': {
            'DatabaseEngineType': 'RDS_AURORA_MYSQL'|'RDS_AURORA_POSTGRESQL'|'RDS_MYSQL'|'RDS_POSTGRESQL',
            'ConnectionConfiguration': {
                'DatabaseHost': 'string',
                'DatabasePort': 123,
                'DatabaseName': 'string',
                'TableName': 'string',
                'SecretArn': 'string'
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'ColumnConfiguration': {
                'DocumentIdColumnName': 'string',
                'DocumentDataColumnName': 'string',
                'DocumentTitleColumnName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'ChangeDetectingColumns': [
                    'string',
                ]
            },
            'AclConfiguration': {
                'AllowedGroupsColumnName': 'string'
            },
            'SqlConfiguration': {
                'QueryIdentifiersEnclosingOption': 'DOUBLE_QUOTES'|'NONE'
            }
        },
        'SalesforceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'StandardObjectConfigurations': [
                {
                    'Name': 'ACCOUNT'|'CAMPAIGN'|'CASE'|'CONTACT'|'CONTRACT'|'DOCUMENT'|'GROUP'|'IDEA'|'LEAD'|'OPPORTUNITY'|'PARTNER'|'PRICEBOOK'|'PRODUCT'|'PROFILE'|'SOLUTION'|'TASK'|'USER',
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
            ],
            'KnowledgeArticleConfiguration': {
                'IncludedStates': [
                    'DRAFT'|'PUBLISHED'|'ARCHIVED',
                ],
                'StandardKnowledgeArticleTypeConfiguration': {
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
                'CustomKnowledgeArticleTypeConfigurations': [
                    {
                        'Name': 'string',
                        'DocumentDataFieldName': 'string',
                        'DocumentTitleFieldName': 'string',
                        'FieldMappings': [
                            {
                                'DataSourceFieldName': 'string',
                                'DateFieldFormat': 'string',
                                'IndexFieldName': 'string'
                            },
                        ]
                    },
                ]
            },
            'ChatterFeedConfiguration': {
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'IncludeFilterTypes': [
                    'ACTIVE_USER'|'STANDARD_USER',
                ]
            },
            'CrawlAttachments': True|False,
            'StandardObjectAttachmentConfiguration': {
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'IncludeAttachmentFilePatterns': [
                'string',
            ],
            'ExcludeAttachmentFilePatterns': [
                'string',
            ]
        },
        'OneDriveConfiguration': {
            'TenantDomain': 'string',
            'SecretArn': 'string',
            'OneDriveUsers': {
                'OneDriveUserList': [
                    'string',
                ],
                'OneDriveUserS3Path': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DisableLocalGroups': True|False
        },
        'ServiceNowConfiguration': {
            'HostUrl': 'string',
            'SecretArn': 'string',
            'ServiceNowBuildVersion': 'LONDON'|'OTHERS',
            'KnowledgeArticleConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'FilterQuery': 'string'
            },
            'ServiceCatalogConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AuthenticationType': 'HTTP_BASIC'|'OAUTH2'
        },
        'ConfluenceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'Version': 'CLOUD'|'SERVER',
            'SpaceConfiguration': {
                'CrawlPersonalSpaces': True|False,
                'CrawlArchivedSpaces': True|False,
                'IncludeSpaces': [
                    'string',
                ],
                'ExcludeSpaces': [
                    'string',
                ],
                'SpaceFieldMappings': [
                    {
                        'DataSourceFieldName': 'DISPLAY_URL'|'ITEM_TYPE'|'SPACE_KEY'|'URL',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'PageConfiguration': {
                'PageFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_STATUS'|'CREATED_DATE'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'MODIFIED_DATE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'BlogConfiguration': {
                'BlogFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'PUBLISH_DATE'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AttachmentConfiguration': {
                'CrawlAttachments': True|False,
                'AttachmentFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_TYPE'|'CREATED_DATE'|'DISPLAY_URL'|'FILE_SIZE'|'ITEM_TYPE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ]
        },
        'GoogleDriveConfiguration': {
            'SecretArn': 'string',
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'ExcludeMimeTypes': [
                'string',
            ],
            'ExcludeUserAccounts': [
                'string',
            ],
            'ExcludeSharedDrives': [
                'string',
            ]
        },
        'WebCrawlerConfiguration': {
            'Urls': {
                'SeedUrlConfiguration': {
                    'SeedUrls': [
                        'string',
                    ],
                    'WebCrawlerMode': 'HOST_ONLY'|'SUBDOMAINS'|'EVERYTHING'
                },
                'SiteMapsConfiguration': {
                    'SiteMaps': [
                        'string',
                    ]
                }
            },
            'CrawlDepth': 123,
            'MaxLinksPerPage': 123,
            'MaxContentSizePerPageInMegaBytes': ...,
            'MaxUrlsPerMinuteCrawlRate': 123,
            'UrlInclusionPatterns': [
                'string',
            ],
            'UrlExclusionPatterns': [
                'string',
            ],
            'ProxyConfiguration': {
                'Host': 'string',
                'Port': 123,
                'Credentials': 'string'
            },
            'AuthenticationConfiguration': {
                'BasicAuthentication': [
                    {
                        'Host': 'string',
                        'Port': 123,
                        'Credentials': 'string'
                    },
                ]
            }
        },
        'WorkDocsConfiguration': {
            'OrganizationId': 'string',
            'CrawlComments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ]
        }
    },
    'CreatedAt': datetime(2015, 1, 1),
    'UpdatedAt': datetime(2015, 1, 1),
    'Description': 'string',
    'Status': 'CREATING'|'DELETING'|'FAILED'|'UPDATING'|'ACTIVE',
    'Schedule': 'string',
    'RoleArn': 'string',
    'ErrorMessage': 'string',
    'LanguageCode': 'string',
    'CustomDocumentEnrichmentConfiguration': {
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
}
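
The `Status` field in the response syntax above can be polled to find out when a data source is ready. The following is a sketch only: the helper name `wait_for_data_source`, the polling interval, and the decision to treat `CREATING` and `UPDATING` as the only in-progress states are assumptions, not part of the API.

```python
import time


def wait_for_data_source(client, data_source_id, index_id, delay=5.0, max_attempts=60):
    """Poll describe_data_source until the data source reports ACTIVE."""
    for _ in range(max_attempts):
        response = client.describe_data_source(Id=data_source_id, IndexId=index_id)
        status = response["Status"]
        if status == "ACTIVE":
            return response
        if status in ("FAILED", "DELETING"):
            # ErrorMessage may be absent, so fall back to the status itself.
            raise RuntimeError(response.get("ErrorMessage", status))
        # Assumption: CREATING and UPDATING mean work is still in progress.
        time.sleep(delay)
    raise TimeoutError("data source did not become ACTIVE in time")
```

The function returns the full response once the status is `ACTIVE`, so the caller can inspect `Configuration` without a second request.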

Response Structure

  • (dict) --

    • Id (string) --

      The identifier of the data source.

    • IndexId (string) --

      The identifier of the index that contains the data source.

    • Name (string) --

      The name that you gave the data source when it was created.

    • Type (string) --

      The type of the data source.

    • Configuration (dict) --

      Information that describes where the data source is located and how the data source is configured. The specific information in the description depends on the data source provider.

      • S3Configuration (dict) --

        Provides information to create a data source connector for a document repository in an Amazon S3 bucket.

        • BucketName (string) --

          The name of the bucket that contains the documents.

        • InclusionPrefixes (list) --

          A list of S3 prefixes for the documents that should be included in the index.

          • (string) --

        • InclusionPatterns (list) --

          A list of glob patterns for documents that should be indexed. If a document that matches an inclusion pattern also matches an exclusion pattern, the document is not indexed.

          Some examples are:

          • *.txt will include all text files in a directory (files with the extension .txt).

          • **/*.txt will include all text files in a directory and its subdirectories.

          • *tax* will include all files in a directory that contain 'tax' in the file name, such as 'tax', 'taxes', 'income_tax'.

          • (string) --

        • ExclusionPatterns (list) --

          A list of glob patterns for documents that should not be indexed. If a document that matches an inclusion prefix or inclusion pattern also matches an exclusion pattern, the document is not indexed.

          Some examples are:

          • *.png, *.jpg will exclude all PNG and JPEG image files in a directory (files with the extensions .png and .jpg).

          • *internal* will exclude all files in a directory that contain 'internal' in the file name, such as 'internal', 'internal_only', 'company_internal'.

          • **/*internal* will exclude all internal-related files in a directory and its subdirectories.

          • (string) --
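
The precedence rule for S3 inclusion prefixes, inclusion patterns, and exclusion patterns can be sketched locally. This is an approximation, not Amazon Kendra's matcher: it uses Python's `fnmatch`, whose `*` also matches `/`, and it assumes that every key is included when no inclusion prefix or pattern is given. The helper name `is_indexed` and the sample keys are hypothetical.

```python
from fnmatch import fnmatchcase


def is_indexed(key, inclusion_prefixes=(), inclusion_patterns=(), exclusion_patterns=()):
    """Return True if an S3 object key would be indexed under the rule described above."""
    # An exclusion match always wins, even when the key is also included.
    if any(fnmatchcase(key, pattern) for pattern in exclusion_patterns):
        return False
    # Assumption: with no inclusion prefixes or patterns, every key is included.
    if not inclusion_prefixes and not inclusion_patterns:
        return True
    return (
        any(key.startswith(prefix) for prefix in inclusion_prefixes)
        or any(fnmatchcase(key, pattern) for pattern in inclusion_patterns)
    )
```

For example, `is_indexed('reports/internal_tax.txt', inclusion_patterns=['*.txt'], exclusion_patterns=['*internal*'])` is `False` because the exclusion pattern takes precedence over the inclusion pattern.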

        • DocumentsMetadataConfiguration (dict) --

          Document metadata files that contain information such as the document access control information, source URI, document author, and custom attributes. Each metadata file contains metadata about a single document.

          • S3Prefix (string) --

            A prefix used to filter metadata configuration files in the Amazon Web Services S3 bucket. The S3 bucket might contain multiple metadata files. Use S3Prefix to include only the desired metadata files.

        • AccessControlListConfiguration (dict) --

          Provides the path to the S3 bucket that contains the user context filtering files for the data source. For the format of the file, see Access control for S3 data sources.

          • KeyPath (string) --

            Path to the Amazon Web Services S3 bucket that contains the ACL files.

      • SharePointConfiguration (dict) --

        Provides information necessary to create a data source connector for a Microsoft SharePoint site.

        • SharePointVersion (string) --

          The version of Microsoft SharePoint that you are using as a data source.

        • Urls (list) --

          The URLs of the Microsoft SharePoint site that contains the documents that should be indexed.

          • (string) --

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. If you use SharePoint Server, you also need to provide the server domain name as part of the credentials. For more information, see Using a Microsoft SharePoint Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

        • CrawlAttachments (boolean) --

          TRUE to include attachments to documents stored in your Microsoft SharePoint site in the index; otherwise, FALSE.

        • UseChangeLog (boolean) --

          Set to TRUE to use the Microsoft SharePoint change log to determine the documents that need to be updated in the index. Depending on the size of the SharePoint change log, it may take longer for Amazon Kendra to use the change log than to determine the changed documents using the Amazon Kendra document crawler.

        • InclusionPatterns (list) --

          A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

          The regex is applied to the display URL of the SharePoint document.

          • (string) --

        • ExclusionPatterns (list) --

          A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

          The regex is applied to the display URL of the SharePoint document.

          • (string) --
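
The interaction between the SharePoint inclusion and exclusion patterns can be illustrated with Python's `re` module. This is a sketch under assumptions: it uses unanchored `re.search` with Python regex syntax, which may differ from how Amazon Kendra evaluates the patterns, and it treats an empty inclusion list as including everything. The helper name `is_url_indexed` and the URLs are hypothetical.

```python
import re


def is_url_indexed(display_url, inclusion_patterns=(), exclusion_patterns=()):
    """Apply inclusion and exclusion regexes to a document's display URL."""
    # A document that matches any exclusion pattern is never indexed.
    if any(re.search(pattern, display_url) for pattern in exclusion_patterns):
        return False
    # Assumption: no inclusion patterns means nothing is filtered out.
    if not inclusion_patterns:
        return True
    return any(re.search(pattern, display_url) for pattern in inclusion_patterns)
```

A URL that matches both lists, such as a `.tmp` file under an included site path, is rejected because the exclusion check runs first.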

        • VpcConfiguration (dict) --

          Provides information for connecting to an Amazon VPC.

          • SubnetIds (list) --

            A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

            • (string) --

          • SecurityGroupIds (list) --

            A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

            • (string) --

        • FieldMappings (list) --

          A list of DataSourceToIndexFieldMapping objects that map Microsoft SharePoint attributes to custom fields in the Amazon Kendra index. You must first create the index fields using the UpdateIndex operation before you map SharePoint attributes. For more information, see Mapping Data Source Fields.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) --

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The type of data stored in the column or attribute.

            • IndexFieldName (string) --

              The name of the field in the index.
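
A field mapping is a plain dict with the three keys listed above. The sketch below builds such dicts; the helper name `field_mapping`, the field names, and the date format string are hypothetical values chosen for illustration, and the index fields are assumed to have been created already with the UpdateIndex operation.

```python
def field_mapping(source_field, index_field, date_format=None):
    """Build one field-mapping dict, omitting DateFieldFormat when it is not given."""
    mapping = {
        "DataSourceFieldName": source_field,
        "IndexFieldName": index_field,
    }
    if date_format is not None:
        mapping["DateFieldFormat"] = date_format
    return mapping


# Hypothetical SharePoint attributes mapped to hypothetical index fields.
field_mappings = [
    field_mapping("Author", "sp_author"),
    field_mapping("Modified", "sp_modified", date_format="yyyy-MM-dd"),
]
```

The resulting list has the same shape as the `FieldMappings` entries shown in the response syntax.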

        • DocumentTitleFieldName (string) --

          The Microsoft SharePoint attribute field that contains the title of the document.

        • DisableLocalGroups (boolean) --

          A Boolean value that specifies whether local groups are disabled (True) or enabled (False).

        • SslCertificateS3Path (dict) --

          Information required to find a specific file in an Amazon S3 bucket.

          • Bucket (string) --

            The name of the S3 bucket that contains the file.

          • Key (string) --

            The name of the file.

      • DatabaseConfiguration (dict) --

        Provides information necessary to create a data source connector for a database.

        • DatabaseEngineType (string) --

          The type of database engine that runs the database.

        • ConnectionConfiguration (dict) --

          The information necessary to connect to a database.

          • DatabaseHost (string) --

            The name of the host for the database. Can be either a string (host.subdomain.domain.tld) or an IPv4 or IPv6 address.

          • DatabasePort (integer) --

            The port that the database uses for connections.

          • DatabaseName (string) --

            The name of the database containing the document data.

          • TableName (string) --

            The name of the table that contains the document data.

          • SecretArn (string) --

            The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. For more information, see Using a Database Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

        • VpcConfiguration (dict) --

          Provides information for connecting to an Amazon VPC.

          • SubnetIds (list) --

            A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

            • (string) --

          • SecurityGroupIds (list) --

            A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

            • (string) --

        • ColumnConfiguration (dict) --

          Information about where the index should get the document information from the database.

          • DocumentIdColumnName (string) --

            The column that provides the document's unique identifier.

          • DocumentDataColumnName (string) --

            The column that contains the contents of the document.

          • DocumentTitleColumnName (string) --

            The column that contains the title of the document.

          • FieldMappings (list) --

            An array of objects that map database column names to the corresponding fields in an index. You must first create the fields in the index using the UpdateIndex operation.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

              • IndexFieldName (string) --

                The name of the field in the index.

          • ChangeDetectingColumns (list) --

            One to five columns that indicate when a document in the database has changed.

            • (string) --

        • AclConfiguration (dict) --

          Information about the database column that provides information for user context filtering.

          • AllowedGroupsColumnName (string) --

            A list of groups, separated by semi-colons, that filters a query response based on user context. The document is only returned to users that are in one of the groups specified in the UserContext field of the Query operation.

        • SqlConfiguration (dict) --

          Provides information about how Amazon Kendra uses quote marks around SQL identifiers when querying a database data source.

          • QueryIdentifiersEnclosingOption (string) --

            Determines whether Amazon Kendra encloses SQL identifiers for tables and column names in double quotes (") when making a database query.

            By default, Amazon Kendra passes SQL identifiers the way that they are entered into the data source configuration. It does not change the case of identifiers or enclose them in quotes.

            PostgreSQL internally converts uppercase characters to lower case characters in identifiers unless they are quoted. Choosing this option encloses identifiers in quotes so that PostgreSQL does not convert the character's case.

            For MySQL databases, you must enable the ansi_quotes option when you set this field to DOUBLE_QUOTES.
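As a rough illustration of how the DatabaseConfiguration fields above fit together, the following sketch assembles the dict in plain Python. The helper name build_database_configuration and every value passed to it (host, ARN, table and column names) are hypothetical placeholders; only the dict keys and the DOUBLE_QUOTES value come from this reference.

```python
def build_database_configuration(engine, host, port, database, table,
                                 secret_arn, id_column, data_column,
                                 change_columns, double_quotes=False):
    """Assemble a DatabaseConfiguration dict using the field names documented above."""
    # The reference allows one to five change-detecting columns.
    if not 1 <= len(change_columns) <= 5:
        raise ValueError("ChangeDetectingColumns must list one to five columns")
    config = {
        "DatabaseEngineType": engine,
        "ConnectionConfiguration": {
            "DatabaseHost": host,
            "DatabasePort": port,
            "DatabaseName": database,
            "TableName": table,
            "SecretArn": secret_arn,
        },
        "ColumnConfiguration": {
            "DocumentIdColumnName": id_column,
            "DocumentDataColumnName": data_column,
            "ChangeDetectingColumns": list(change_columns),
        },
    }
    # Only add SqlConfiguration when identifiers should be wrapped in double quotes,
    # for example to stop PostgreSQL from lowercasing them.
    if double_quotes:
        config["SqlConfiguration"] = {
            "QueryIdentifiersEnclosingOption": "DOUBLE_QUOTES"
        }
    return config


# All values below are placeholders for illustration.
database_configuration = build_database_configuration(
    engine="RDS_POSTGRESQL",
    host="db.example.com",
    port=5432,
    database="docs",
    table="Articles",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    id_column="Id",
    data_column="Body",
    change_columns=["UpdatedAt"],
    double_quotes=True,
)
```

The resulting dict would be passed as the DatabaseConfiguration entry of the Configuration parameter. For a MySQL engine, the ansi_quotes option would also need to be enabled on the database side, as noted above.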

      • SalesforceConfiguration (dict) --

        Provides configuration information for data sources that connect to a Salesforce site.

        • ServerUrl (string) --

          The instance URL for the Salesforce site that you want to index.

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Salesforce instance. The secret must contain a JSON structure with the following keys:

          • authenticationUrl - The OAuth endpoint that Amazon Kendra connects to in order to get an OAuth token.

          • consumerKey - The application public key generated when you created your Salesforce application.

          • consumerSecret - The application private key generated when you created your Salesforce application.

          • password - The password associated with the user logging in to the Salesforce instance.

          • securityToken - The token associated with the user account logging in to the Salesforce instance.

          • username - The user name of the user logging in to the Salesforce instance.

        • StandardObjectConfigurations (list) --

          Specifies the Salesforce standard objects that Amazon Kendra indexes.

          • (dict) --

            Specifies configuration information for indexing a single standard object.

            • Name (string) --

              The name of the standard object.

            • DocumentDataFieldName (string) --

              The name of the field in the standard object table that contains the document contents.

            • DocumentTitleFieldName (string) --

              The name of the field in the standard object table that contains the document title.

            • FieldMappings (list) --

              One or more objects that map fields in the standard object to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

              • (dict) --

                Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

                • DataSourceFieldName (string) --

                  The name of the column or attribute in the data source.

                • DateFieldFormat (string) --

                  The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

                • IndexFieldName (string) --

                  The name of the field in the index.

        • KnowledgeArticleConfiguration (dict) --

          Specifies configuration information for the knowledge article types that Amazon Kendra indexes. Amazon Kendra indexes standard knowledge articles and the standard fields of knowledge articles, or the custom fields of custom knowledge articles, but not both.

          • IncludedStates (list) --

            Specifies the document states that should be included when Amazon Kendra indexes knowledge articles. You must specify at least one state.

            • (string) --

          • StandardKnowledgeArticleTypeConfiguration (dict) --

            Provides configuration information for standard Salesforce knowledge articles.

            • DocumentDataFieldName (string) --

              The name of the field that contains the document data to index.

            • DocumentTitleFieldName (string) --

              The name of the field that contains the document title.

            • FieldMappings (list) --

              One or more objects that map fields in the knowledge article to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

              • (dict) --

                Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

                • DataSourceFieldName (string) --

                  The name of the column or attribute in the data source.

                • DateFieldFormat (string) --

                  The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

                • IndexFieldName (string) --

                  The name of the field in the index.

          • CustomKnowledgeArticleTypeConfigurations (list) --

            Provides configuration information for custom Salesforce knowledge articles.

            • (dict) --

              Provides configuration information for indexing Salesforce custom articles.

              • Name (string) --

                The name of the configuration.

              • DocumentDataFieldName (string) --

                The name of the field in the custom knowledge article that contains the document data to index.

              • DocumentTitleFieldName (string) --

                The name of the field in the custom knowledge article that contains the document title.

              • FieldMappings (list) --

                One or more objects that map fields in the custom knowledge article to fields in the Amazon Kendra index.

                • (dict) --

                  Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

                  • DataSourceFieldName (string) --

                    The name of the column or attribute in the data source.

                  • DateFieldFormat (string) --

                    The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

                  • IndexFieldName (string) --

                    The name of the field in the index.

        • ChatterFeedConfiguration (dict) --

          Specifies configuration information for Salesforce chatter feeds.

          • DocumentDataFieldName (string) --

            The name of the column in the Salesforce FeedItem table that contains the content to index. Typically this is the Body column.

          • DocumentTitleFieldName (string) --

            The name of the column in the Salesforce FeedItem table that contains the title of the document. This is typically the Title column.

          • FieldMappings (list) --

            Maps fields from a Salesforce chatter feed into Amazon Kendra index fields.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

              • IndexFieldName (string) --

                The name of the field in the index.

          • IncludeFilterTypes (list) --

            Filters the documents in the feed based on the status of the user. When you specify ACTIVE_USER, only documents from users who have an active account are indexed. When you specify STANDARD_USER, only documents for Salesforce standard users are indexed. You can specify both.

            • (string) --

        • CrawlAttachments (boolean) --

          Indicates whether Amazon Kendra should index attachments to Salesforce objects.

        • StandardObjectAttachmentConfiguration (dict) --

          Provides configuration information for processing attachments to Salesforce standard objects.

          • DocumentTitleFieldName (string) --

            The name of the field used for the document title.

          • FieldMappings (list) --

            One or more objects that map fields in attachments to Amazon Kendra index fields.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

              • IndexFieldName (string) --

                The name of the field in the index.

        • IncludeAttachmentFilePatterns (list) --

          A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

          The regex is applied to the name of the attached file.

          • (string) --

        • ExcludeAttachmentFilePatterns (list) --

          A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

          The regex is applied to the name of the attached file.

          • (string) --
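The interaction between IncludeAttachmentFilePatterns and ExcludeAttachmentFilePatterns described above can be sketched as a small predicate. This is only a local approximation: the function name is hypothetical, the use of re.search (partial matching) is an assumption because this reference does not state how Amazon Kendra anchors the patterns, and treating an empty inclusion list as "include everything" is likewise an assumption.

```python
import re


def is_attachment_indexed(file_name, include_patterns, exclude_patterns):
    """Approximate the documented precedence: an exclusion match always wins."""
    # A file matching any exclusion pattern is never indexed,
    # even if it also matches an inclusion pattern.
    if any(re.search(pattern, file_name) for pattern in exclude_patterns):
        return False
    # When inclusion patterns are given, the file must match at least one.
    if include_patterns:
        return any(re.search(pattern, file_name) for pattern in include_patterns)
    # Assumption: with no inclusion patterns, everything not excluded is indexed.
    return True


include = [r"\.pdf$"]
exclude = [r"^draft"]
results = {
    name: is_attachment_indexed(name, include, exclude)
    for name in ["report.pdf", "draft-report.pdf", "notes.txt"]
}
```

Checking candidate patterns this way before saving them to the data source may help catch a pattern pair that unintentionally excludes every attachment.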

      • OneDriveConfiguration (dict) --

        Provides configuration for data sources that connect to Microsoft OneDrive.

        • TenantDomain (string) --

          The Azure Active Directory domain of the organization.

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password to connect to OneDrive. The user name should be the application ID for the OneDrive application, and the password is the application key for the OneDrive application.

        • OneDriveUsers (dict) --

          A list of user accounts whose documents should be indexed.

          • OneDriveUserList (list) --

            A list of users whose documents should be indexed. Specify the user names in email format, for example, username@tenantdomain. If you need to index the documents of more than 100 users, use the OneDriveUserS3Path field to specify the location of a file containing a list of users.

            • (string) --

          • OneDriveUserS3Path (dict) --

            The S3 bucket location of a file containing a list of users whose documents should be indexed.

            • Bucket (string) --

              The name of the S3 bucket that contains the file.

            • Key (string) --

              The name of the file.

        • InclusionPatterns (list) --

          A list of regular expression patterns. Documents that match the pattern are included in the index. Documents that don't match the pattern are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

          The inclusion pattern is applied to the file name.

          • (string) --

        • ExclusionPatterns (list) --

          List of regular expressions applied to documents. Items that match the exclusion pattern are not indexed. If you provide both an inclusion pattern and an exclusion pattern, any item that matches the exclusion pattern isn't indexed.

          The exclusion pattern is applied to the file name.

          • (string) --

        • FieldMappings (list) --

          A list of DataSourceToIndexFieldMapping objects that map Microsoft OneDrive fields to custom fields in the Amazon Kendra index. You must first create the index fields before you map OneDrive fields.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) --

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

            • IndexFieldName (string) --

              The name of the field in the index.

        • DisableLocalGroups (boolean) --

          A Boolean value that specifies whether local groups are disabled (True) or enabled (False).
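The OneDriveUsers entry above takes either an inline list or an S3 location, depending on how many users are involved. A minimal sketch of that choice follows; the helper name, bucket, key, and email addresses are hypothetical, and the 100-user threshold is the one stated in the OneDriveUserList description.

```python
MAX_INLINE_USERS = 100  # threshold stated in the OneDriveUserList description


def build_onedrive_users(users, bucket=None, key=None):
    """Return a OneDriveUsers dict, switching to an S3 path for large user lists."""
    if len(users) <= MAX_INLINE_USERS:
        return {"OneDriveUserList": list(users)}
    # Above the threshold, the user list has to live in a file in S3.
    if not (bucket and key):
        raise ValueError("More than 100 users: provide the S3 bucket and key of the user file")
    return {"OneDriveUserS3Path": {"Bucket": bucket, "Key": key}}


# Placeholder addresses in the username@tenantdomain format.
small_team = build_onedrive_users(["ana@example.com", "li@example.com"])
large_team = build_onedrive_users(
    [f"user{i}@example.com" for i in range(250)],
    bucket="example-bucket",
    key="onedrive/users.txt",
)
```

Uploading the user file to the bucket is a separate step that this sketch does not cover.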

      • ServiceNowConfiguration (dict) --

        Provides configuration for data sources that connect to ServiceNow instances.

        • HostUrl (string) --

          The ServiceNow instance that the data source connects to. The host endpoint should look like the following: {instance}.service-now.com.

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of the Secrets Manager secret that contains the user name and password required to connect to the ServiceNow instance.

        • ServiceNowBuildVersion (string) --

          The identifier of the release that the ServiceNow host is running. If the host is not running the LONDON release, use OTHERS.

        • KnowledgeArticleConfiguration (dict) --

          Provides configuration information for crawling knowledge articles in the ServiceNow site.

          • CrawlAttachments (boolean) --

            Indicates whether Amazon Kendra should index attachments to knowledge articles.

          • IncludeAttachmentFilePatterns (list) --

            List of regular expressions applied to knowledge articles. Items that don't match the inclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

            • (string) --

          • ExcludeAttachmentFilePatterns (list) --

            List of regular expressions applied to knowledge articles. Items that match the exclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

            • (string) --

          • DocumentDataFieldName (string) --

            The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

          • DocumentTitleFieldName (string) --

            The name of the ServiceNow field that is mapped to the index document title field.

          • FieldMappings (list) --

            Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

              • IndexFieldName (string) --

                The name of the field in the index.

          • FilterQuery (string) --

            A query that selects the knowledge articles to index. The query can return articles from multiple knowledge bases, and the knowledge bases can be public or private.

            The query string must be one generated by the ServiceNow console. For more information, see Specifying documents to index with a query.

        • ServiceCatalogConfiguration (dict) --

          Provides configuration information for crawling service catalogs in the ServiceNow site.

          • CrawlAttachments (boolean) --

            Indicates whether Amazon Kendra should crawl attachments to the service catalog items.

          • IncludeAttachmentFilePatterns (list) --

            A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

            The regex is applied to the file name of the attachment.

            • (string) --

          • ExcludeAttachmentFilePatterns (list) --

            A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

            The regex is applied to the file name of the attachment.

            • (string) --

          • DocumentDataFieldName (string) --

            The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

          • DocumentTitleFieldName (string) --

            The name of the ServiceNow field that is mapped to the index document title field.

          • FieldMappings (list) --

            Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The date format for the column or attribute, for use when the field named in DataSourceFieldName holds a date.

              • IndexFieldName (string) --

                The name of the field in the index.

        • AuthenticationType (string) --

          Determines the type of authentication used to connect to the ServiceNow instance. If you choose HTTP_BASIC, Amazon Kendra is authenticated using the user name and password provided in the Secrets Manager secret in the SecretArn field. When you choose OAUTH2, Amazon Kendra is authenticated using the OAuth token and secret provided in the Secrets Manager secret, and the user name and password are used to determine which information Amazon Kendra has access to.

          When you use OAUTH2 authentication, you must generate a token and a client secret using the ServiceNow console. For more information, see Using a ServiceNow data source.
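To show how the ServiceNow settings above combine, here is a hedged sketch that builds a ServiceNowConfiguration dict and rejects values other than those named in this reference (LONDON or OTHERS, HTTP_BASIC or OAUTH2). The helper name, host, ARN, and the "text" field name are hypothetical placeholders.

```python
BUILD_VERSIONS = {"LONDON", "OTHERS"}
AUTHENTICATION_TYPES = {"HTTP_BASIC", "OAUTH2"}


def build_servicenow_configuration(host_url, secret_arn, data_field,
                                   release="OTHERS", auth="HTTP_BASIC",
                                   filter_query=None):
    """Assemble a ServiceNowConfiguration dict from the fields documented above."""
    if release not in BUILD_VERSIONS:
        raise ValueError(f"ServiceNowBuildVersion must be one of {sorted(BUILD_VERSIONS)}")
    if auth not in AUTHENTICATION_TYPES:
        raise ValueError(f"AuthenticationType must be one of {sorted(AUTHENTICATION_TYPES)}")
    knowledge_articles = {
        "CrawlAttachments": True,
        "DocumentDataFieldName": data_field,
    }
    # FilterQuery is optional; it should be a query string generated by the ServiceNow console.
    if filter_query is not None:
        knowledge_articles["FilterQuery"] = filter_query
    return {
        "HostUrl": host_url,
        "SecretArn": secret_arn,
        "ServiceNowBuildVersion": release,
        "AuthenticationType": auth,
        "KnowledgeArticleConfiguration": knowledge_articles,
    }


# Placeholder values for illustration.
servicenow_configuration = build_servicenow_configuration(
    host_url="example.service-now.com",
    secret_arn="arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    data_field="text",
    release="LONDON",
    auth="OAUTH2",
)
```

With OAUTH2, the secret referenced by SecretArn would also need the token and client secret generated in the ServiceNow console, as described above.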

      • ConfluenceConfiguration (dict) --

        Provides configuration information for connecting to a Confluence data source.

        • ServerUrl (string) --

          The URL of your Confluence instance. Use the full URL of the server. For example, https://server.example.com:port/. You can also use an IP address, for example, https://192.168.1.113/.

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Confluence server. The secret must contain a JSON structure with the following keys:

          • username - The user name or email address of a user with administrative privileges for the Confluence server.

          • password - The password associated with the user logging in to the Confluence server.

        • Version (string) --

          Specifies the version of the Confluence installation that you are connecting to.

        • SpaceConfiguration (dict) --

          Specifies configuration information for indexing Confluence spaces.

          • CrawlPersonalSpaces (boolean) --

            Specifies whether Amazon Kendra should index personal spaces. Users can add restrictions to items in personal spaces. If personal spaces are indexed, queries without user context information may return restricted items from a personal space in their results. For more information, see Filtering on user context.

          • CrawlArchivedSpaces (boolean) --

            Specifies whether Amazon Kendra should index archived spaces.

          • IncludeSpaces (list) --

            A list of space keys for Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are indexed. Spaces that aren't in the list aren't indexed. A space in the list must exist. Otherwise, Amazon Kendra logs an error when the data source is synchronized. If a space is in both the IncludeSpaces and the ExcludeSpaces list, the space is excluded.

            • (string) --

          • ExcludeSpaces (list) --

            A list of space keys of Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are not indexed. If a space is in both the ExcludeSpaces and the IncludeSpaces list, the space is excluded.

            • (string) --

          • SpaceFieldMappings (list) --

            Defines how space metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

            If you specify the SpaceFieldMappings parameter, you must specify at least one field mapping.

            • (dict) --

              Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

              You must first create the index field using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the field in the data source.

              • DateFieldFormat (string) --

                The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

              • IndexFieldName (string) --

                The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

        • PageConfiguration (dict) --

          Specifies configuration information for indexing Confluence pages.

          • PageFieldMappings (list) --

            Defines how page metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

            If you specify the PageFieldMappings parameter, you must specify at least one field mapping.

            • (dict) --

              Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

              You must first create the index field using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the field in the data source.

              • DateFieldFormat (string) --

                The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

              • IndexFieldName (string) --

                The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

        • BlogConfiguration (dict) --

          Specifies configuration information for indexing Confluence blogs.

          • BlogFieldMappings (list) --

            Defines how blog metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

            If you specify the BlogFieldMappings parameter, you must specify at least one field mapping.

            • (dict) --

              Defines the mapping between a blog field in the Confluence data source and an Amazon Kendra index field.

              You must first create the index field using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the field in the data source.

              • DateFieldFormat (string) --

                The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

              • IndexFieldName (string) --

                The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

        • AttachmentConfiguration (dict) --

          Specifies configuration information for indexing attachments to Confluence blogs and pages.

          • CrawlAttachments (boolean) --

            Indicates whether Amazon Kendra indexes attachments to the pages and blogs in the Confluence data source.

          • AttachmentFieldMappings (list) --

            Defines how attachment metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

            If you specify the AttachmentFieldMappings parameter, you must specify at least one field mapping.

            • (dict) --

              Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

              You must first create the index field using the UpdateIndex operation.

              • DataSourceFieldName (string) --

                The name of the field in the data source.

              • DateFieldFormat (string) --

                The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field you must specify the date format. If the field is not a date field, an exception is thrown.

              • IndexFieldName (string) --

                The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

        • VpcConfiguration (dict) --

          Specifies the information for connecting to an Amazon VPC.

          • SubnetIds (list) --

            A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

            • (string) --

          • SecurityGroupIds (list) --

            A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

            • (string) --

        • InclusionPatterns (list) --

          A list of regular expression patterns that apply to a URL on the Confluence server. An inclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the patterns are included in the index. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, the item isn't included in the index.

          • (string) --

        • ExclusionPatterns (list) --

          A list of regular expression patterns that apply to a URL on the Confluence server. An exclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the pattern are excluded from the index. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, the item isn't included in the index.

          • (string) --
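The IncludeSpaces and ExcludeSpaces rules in the SpaceConfiguration above can be approximated locally before a sync. The sketch below is an assumption-laden illustration: the function name and space keys are hypothetical, and treating an empty IncludeSpaces list as "all spaces are candidates" is an assumption not stated in this reference.

```python
def spaces_to_index(available_spaces, include_spaces=None, exclude_spaces=None):
    """Return (space keys that would be indexed, included keys that do not exist)."""
    available = set(available_spaces)
    include = set(include_spaces or [])
    exclude = set(exclude_spaces or [])
    # Included keys that do not exist would be logged as errors during synchronization.
    missing = sorted(include - available)
    # Assumption: with no IncludeSpaces list, every available space is a candidate.
    candidates = (include & available) if include else available
    # A space listed in both IncludeSpaces and ExcludeSpaces is excluded.
    return sorted(candidates - exclude), missing


indexed, missing = spaces_to_index(
    available_spaces=["ENG", "HR", "OPS"],
    include_spaces=["ENG", "HR", "GONE"],
    exclude_spaces=["HR"],
)
```

Here HR appears in both lists and is dropped, while GONE is reported as a key that would cause a logged error.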

      • GoogleDriveConfiguration (dict) --

        Provides configuration for data sources that connect to Google Drive.

        • SecretArn (string) --

          The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the credentials required to connect to Google Drive. For more information, see Using a Google Workspace Drive data source.

        • InclusionPatterns (list) --

          A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are included in the index from both shared drives and users' My Drives. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, it is excluded from the index.

          • (string) --

        • ExclusionPatterns (list) --

          A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are excluded from the index from both shared drives and users' My Drives. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, it is excluded from the index.

          • (string) --

        • FieldMappings (list) --

          Defines the mapping between a field in Google Drive and an Amazon Kendra index field.

          If you are using the console, you can define index fields when creating the mapping. If you are using the API, you must first create the field using the UpdateIndex operation.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) --

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The type of data stored in the column or attribute.

            • IndexFieldName (string) --

              The name of the field in the index.

        • ExcludeMimeTypes (list) --

          A list of MIME types to exclude from the index. All documents matching the specified MIME type are excluded.

          For a list of MIME types, see Using a Google Workspace Drive data source.

          • (string) --

        • ExcludeUserAccounts (list) --

          A list of email addresses of the users. Documents owned by these users are excluded from the index. Documents shared with excluded users are indexed unless they are excluded in another way.

          • (string) --

        • ExcludeSharedDrives (list) --

          A list of identifiers of shared drives to exclude from the index. All files and folders stored on the shared drive are excluded.

          • (string) --
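
The inclusion and exclusion rules described for GoogleDriveConfiguration can be modeled locally as a quick sanity check before you submit patterns. The following is a minimal sketch, not Amazon Kendra's implementation: it assumes patterns are matched with Python's re.search, and that an empty inclusion list means no inclusion filter is applied. The ARN, patterns, and email address are hypothetical placeholders.

```python
import re

# Hypothetical placeholder values; replace them with your own resources.
google_drive_configuration = {
    "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    "InclusionPatterns": [r".*\.pdf$"],
    "ExclusionPatterns": [r".*/drafts/.*"],
    "ExcludeUserAccounts": ["former.employee@example.com"],
}


def is_indexed(path, inclusion_patterns, exclusion_patterns):
    """Local model of the documented rule: exclusion wins over inclusion.

    Assumption: patterns are applied with re.search; the service may
    match differently.
    """
    # An item that matches any exclusion pattern is never indexed.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # With inclusion patterns present, only matching items are indexed.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    # Assumption: no inclusion patterns means everything else is indexed.
    return True
```

For example, with the placeholder patterns above, "/reports/q1.pdf" would be indexed, while "/drafts/q1.pdf" matches both lists and is therefore excluded.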

      • WebCrawlerConfiguration (dict) --

        Provides the configuration information required for Amazon Kendra Web Crawler.

        • Urls (dict) --

          Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.

          You can include website subdomains. You can list up to 100 seed URLs and up to three sitemap URLs.

          You can only crawl websites that use the secure communication protocol, Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling.

          When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own webpages, or webpages that you have authorization to index.

          • SeedUrlConfiguration (dict) --

            Provides the configuration of the seed or starting point URLs of the websites you want to crawl.

            You can choose to crawl only the website host names, or the website host names with subdomains, or the website host names with subdomains and other domains that the webpages link to.

            You can list up to 100 seed URLs.

            • SeedUrls (list) --

              The list of seed or starting point URLs of the websites you want to crawl.

              The list can include a maximum of 100 seed URLs.

              • (string) --

            • WebCrawlerMode (string) --

              You can choose one of the following modes:

              • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

              • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

              • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

              The default mode is set to HOST_ONLY.

          • SiteMapsConfiguration (dict) --

            Provides the configuration of the sitemap URLs of the websites you want to crawl.

            Only URLs belonging to the same website host names are crawled. You can list up to three sitemap URLs.

            • SiteMaps (list) --

              The list of sitemap URLs of the websites you want to crawl.

              The list can include a maximum of three sitemap URLs.

              • (string) --
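
The limits and crawl modes described for Urls can be checked locally before calling the API. This is a rough sketch under stated assumptions: check_urls and in_scope are hypothetical helpers (not part of boto3), URLs are assumed to carry an explicit scheme, and EVERYTHING is simplified to "any linked URL is in scope".

```python
from urllib.parse import urlparse


def check_urls(urls_config):
    """Return a list of problems found against the documented limits."""
    problems = []
    seeds = urls_config.get("SeedUrlConfiguration", {}).get("SeedUrls", [])
    sitemaps = urls_config.get("SiteMapsConfiguration", {}).get("SiteMaps", [])
    if len(seeds) > 100:
        problems.append("more than 100 seed URLs")
    if len(sitemaps) > 3:
        problems.append("more than three sitemap URLs")
    # Only HTTPS websites can be crawled.
    for url in seeds + sitemaps:
        if urlparse(url).scheme != "https":
            problems.append(f"not HTTPS: {url}")
    return problems


def in_scope(url, seed_host, mode="HOST_ONLY"):
    """Local model of the three WebCrawlerMode values (HOST_ONLY is the default)."""
    host = urlparse(url).hostname or ""
    if mode == "HOST_ONLY":
        return host == seed_host
    if mode == "SUBDOMAINS":
        return host == seed_host or host.endswith("." + seed_host)
    if mode == "EVERYTHING":
        # Simplification: any URL that a crawled page links to is in scope.
        return True
    raise ValueError(f"unknown mode: {mode}")
```

With the seed host "abc.example.com", a URL on "a.abc.example.com" is out of scope under HOST_ONLY and in scope under SUBDOMAINS, which mirrors the examples in the mode descriptions.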

        • CrawlDepth (integer) --

          Specifies the number of levels in a website that you want to crawl.

          The first level begins from the website seed or starting point URL. For example, if a website has 3 levels – index level (i.e. seed in this example), sections level, and subsections level – and you are only interested in crawling information up to the sections level (i.e. levels 0-1), you can set your depth to 1.

          The default crawl depth is set to 2.

        • MaxLinksPerPage (integer) --

          The maximum number of URLs on a webpage to include when crawling a website. This number is per webpage.

          As a website’s webpages are crawled, any URLs the webpages link to are also crawled. URLs on a webpage are crawled in order of appearance.

          The default maximum links per page is 100.

        • MaxContentSizePerPageInMegaBytes (float) --

          The maximum size (in MB) of a webpage or attachment to crawl.

          Files larger than this size (in MB) are skipped and not crawled.

          The default maximum size of a webpage or attachment is set to 50 MB.

        • MaxUrlsPerMinuteCrawlRate (integer) --

          The maximum number of URLs crawled per website host per minute.

          A minimum of one URL is required.

          The default maximum number of URLs crawled per website host per minute is 300.

        • UrlInclusionPatterns (list) --

          A list of regular expression patterns that select the URLs to include in the crawl.

          If there is a regular expression pattern to exclude certain URLs that conflicts with the include pattern, the exclude pattern takes precedence.

          • (string) --

        • UrlExclusionPatterns (list) --

          A list of regular expression patterns that select the URLs to exclude from the crawl.

          If there is a regular expression pattern to include certain URLs that conflicts with the exclude pattern, the exclude pattern takes precedence.

          • (string) --

        • ProxyConfiguration (dict) --

          Provides configuration information required to connect to your internal websites via a web proxy.

          You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.

          Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication. To store web proxy credentials, you use a secret in Secrets Manager.

          • Host (string) --

            The name of the website host you want to connect to via a web proxy server.

            For example, the host name of https://a.example.com/page1.html is "a.example.com".

          • Port (integer) --

            The port number of the website host you want to connect to via a web proxy server.

            For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

          • Credentials (string) --

            Your secret ARN, which you can create in Secrets Manager.

            The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.

        • AuthenticationConfiguration (dict) --

          Provides configuration information required to connect to websites using authentication.

          You can connect to websites using basic authentication of user name and password.

          You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS. You use a secret in Secrets Manager to store your authentication credentials.

          • BasicAuthentication (list) --

            The list of configuration information that's required to connect to and crawl a website host using basic authentication credentials.

            The list includes the name and port number of the website host.

            • (dict) --

              Provides the configuration information to connect to websites that require basic user authentication.

              • Host (string) --

                The name of the website host you want to connect to using authentication credentials.

                For example, the host name of https://a.example.com/page1.html is "a.example.com".

              • Port (integer) --

                The port number of the website host you want to connect to using authentication credentials.

                For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

              • Credentials (string) --

                Your secret ARN, which you can create in Secrets Manager.

                You use a secret if basic authentication credentials are required to connect to a website. The secret stores your credentials of user name and password.
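
Both ProxyConfiguration and BasicAuthentication ask for a host name and port taken from the website URL. The sketch below shows one way to derive them, falling back to the standard port when the URL does not state one. The helper host_and_port and the secret ARN are hypothetical, and the numeric settings simply restate the defaults listed above.

```python
from urllib.parse import urlparse

# Standard ports used when the URL does not specify one.
DEFAULT_PORTS = {"https": 443, "http": 80}


def host_and_port(url):
    """Split a URL into the host name and port expected by the configuration."""
    parts = urlparse(url)
    port = parts.port or DEFAULT_PORTS[parts.scheme]
    return parts.hostname, port


host, port = host_and_port("https://a.example.com/page1.html")

# Placeholder ARN for a Secrets Manager secret that holds the credentials.
secret_arn = "arn:aws:secretsmanager:us-east-1:111122223333:secret:example"

web_crawler_settings = {
    "CrawlDepth": 2,                           # documented default
    "MaxLinksPerPage": 100,                    # documented default
    "MaxContentSizePerPageInMegaBytes": 50.0,  # documented default
    "MaxUrlsPerMinuteCrawlRate": 300,          # documented default
    "ProxyConfiguration": {
        "Host": host,
        "Port": port,
        "Credentials": secret_arn,  # optional for a proxy
    },
    "AuthenticationConfiguration": {
        "BasicAuthentication": [
            {"Host": host, "Port": port, "Credentials": secret_arn},
        ]
    },
}
```

These keys would sit alongside Urls inside WebCrawlerConfiguration; they are shown separately here only to keep the sketch short.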

      • WorkDocsConfiguration (dict) --

        Provides the configuration information to connect to WorkDocs as your data source.

        • OrganizationId (string) --

          The identifier of the directory corresponding to your Amazon WorkDocs site repository.

          You can find the organization ID in the Directory Service by going to Active Directory, then Directories. Your Amazon WorkDocs site directory has an ID, which is the organization ID. You can also set up a new Amazon WorkDocs directory in the Directory Service console and enable an Amazon WorkDocs site for the directory in the Amazon WorkDocs console.

        • CrawlComments (boolean) --

          TRUE to include comments on documents in your index. Including comments in your index means each comment is a document that can be searched on.

          The default is set to FALSE.

        • UseChangeLog (boolean) --

          TRUE to use the change logs to update documents in your index instead of scanning all documents.

          If you are syncing your Amazon WorkDocs data source with your index for the first time, all documents are scanned. After your first sync, you can use the change logs to update your documents in your index for future syncs.

          The default is set to FALSE.

        • InclusionPatterns (list) --

          A list of regular expression patterns to include certain files in your Amazon WorkDocs site repository. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

          • (string) --

        • ExclusionPatterns (list) --

          A list of regular expression patterns to exclude certain files in your Amazon WorkDocs site repository. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

          • (string) --

        • FieldMappings (list) --

          A list of DataSourceToIndexFieldMapping objects that map Amazon WorkDocs field names to custom index field names in Amazon Kendra. You must first create the custom index fields using the UpdateIndex operation before you map to Amazon WorkDocs fields. For more information, see Mapping Data Source Fields. The Amazon WorkDocs data source field names need to exist in your Amazon WorkDocs custom metadata.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) --

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The type of data stored in the column or attribute.

            • IndexFieldName (string) --

              The name of the field in the index.
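
As an illustration of the WorkDocsConfiguration shape, the sketch below assembles a configuration with the documented defaults for CrawlComments and UseChangeLog. The helper field_mapping, the organization ID, the patterns, and the field names are hypothetical; the index field is assumed to have been created beforehand with the UpdateIndex operation.

```python
def field_mapping(data_source_field, index_field, date_format=None):
    """Build one DataSourceToIndexFieldMapping entry.

    DateFieldFormat is only added when a format is supplied.
    """
    mapping = {
        "DataSourceFieldName": data_source_field,
        "IndexFieldName": index_field,
    }
    if date_format is not None:
        mapping["DateFieldFormat"] = date_format
    return mapping


work_docs_configuration = {
    "OrganizationId": "d-1234567890",  # placeholder directory ID
    "CrawlComments": False,            # documented default
    "UseChangeLog": False,             # documented default
    "InclusionPatterns": [r".*\.docx$"],
    "ExclusionPatterns": [r".*archive.*"],
    # Assumes a custom index field named "Department" already exists.
    "FieldMappings": [field_mapping("department", "Department")],
}
```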

    • CreatedAt (datetime) --

      The Unix timestamp of when the data source was created.

    • UpdatedAt (datetime) --

      The Unix timestamp of when the data source was last updated.

    • Description (string) --

      The description of the data source.

    • Status (string) --

      The current status of the data source. When the status is ACTIVE the data source is ready to use. When the status is FAILED, the ErrorMessage field contains the reason that the data source failed.

    • Schedule (string) --

      The schedule on which Amazon Kendra updates the data source.

    • RoleArn (string) --

      The Amazon Resource Name (ARN) of the role that enables the data source to access its resources.

    • ErrorMessage (string) --

      When the Status field value is FAILED, the ErrorMessage field contains a description of the error that caused the data source to fail.

    • LanguageCode (string) --

      The code for a language. This shows a supported language for all documents in the data source. English is supported by default. For more information on supported languages, including their codes, see Adding documents in languages other than English.

    • CustomDocumentEnrichmentConfiguration (dict) --

      Configuration information for altering document metadata and content during the document ingestion process when you describe a data source.

      For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

      • InlineConfigurations (list) --

        Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.

        • (dict) --

          Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic allows, see HookConfiguration.

          For more information, see Customizing document metadata during the ingestion process.

          • Condition (dict) --

            Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.

            • ConditionDocumentAttributeKey (string) --

              The identifier of the document attribute used for the condition.

              For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

              Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

            • Operator (string) --

              The condition operator.

              For example, you can use 'Contains' to partially match a string.

            • ConditionOnValue (dict) --

              The value used by the operator.

              For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

              • StringValue (string) --

                A string, such as "department".

              • StringListValue (list) --

                A list of strings.

                • (string) --

              • LongValue (integer) --

                A long integer value.

              • DateValue (datetime) --

                A date expressed as an ISO 8601 string.

                It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

          • Target (dict) --

            Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.

            • TargetDocumentAttributeKey (string) --

              The identifier of the target document attribute or metadata field.

              For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.

            • TargetDocumentAttributeValueDeletion (boolean) --

              TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.

            • TargetDocumentAttributeValue (dict) --

              The target value you want to create for the target attribute.

              For example, 'Finance' could be the target value for the target attribute key 'Department'.

              • StringValue (string) --

                A string, such as "department".

              • StringListValue (list) --

                A list of strings.

                • (string) --

              • LongValue (integer) --

                A long integer value.

              • DateValue (datetime) --

                A date expressed as an ISO 8601 string.

                It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

          • DocumentContentDeletion (boolean) --

            TRUE to delete content if the condition used for the target attribute is met.
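
The Condition and Target fields can be read together as "when the condition holds, set or delete the target attribute". The following sketch restates the 'Source_URI' / 'Department' example from the descriptions above and models it locally. It is an assumption-laden illustration, not Amazon Kendra's implementation: apply_inline and validate_target are hypothetical helpers, and only the 'Contains' operator on string values is handled.

```python
inline_configuration = {
    "Condition": {
        "ConditionDocumentAttributeKey": "Source_URI",
        "Operator": "Contains",
        "ConditionOnValue": {"StringValue": "financial"},
    },
    "Target": {
        "TargetDocumentAttributeKey": "Department",
        "TargetDocumentAttributeValueDeletion": False,
        "TargetDocumentAttributeValue": {"StringValue": "Finance"},
    },
    "DocumentContentDeletion": False,
}


def validate_target(target):
    """Reject a target that both deletes and creates a value."""
    deleting = target.get("TargetDocumentAttributeValueDeletion", False)
    if deleting and "TargetDocumentAttributeValue" in target:
        raise ValueError("cannot create a target value while deletion is TRUE")


def apply_inline(config, attributes):
    """Return a copy of attributes after applying one inline configuration."""
    condition = config["Condition"]
    if condition["Operator"] != "Contains":
        raise NotImplementedError("this sketch only models 'Contains'")
    validate_target(config["Target"])
    result = dict(attributes)
    current = attributes.get(condition["ConditionDocumentAttributeKey"], "")
    if condition["ConditionOnValue"]["StringValue"] not in current:
        return result  # condition not met, nothing changes
    target = config["Target"]
    key = target["TargetDocumentAttributeKey"]
    if target.get("TargetDocumentAttributeValueDeletion"):
        result.pop(key, None)
    else:
        result[key] = target["TargetDocumentAttributeValue"]["StringValue"]
    return result
```

A document whose Source_URI contains "financial" would come out with Department set to "Finance"; any other document is returned unchanged.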

      • PreExtractionHookConfiguration (dict) --

        Configuration information for invoking a Lambda function on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

        • InvocationCondition (dict) --

          The condition used for when a Lambda function should be invoked.

          For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

          • ConditionDocumentAttributeKey (string) --

            The identifier of the document attribute used for the condition.

            For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

            Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

          • Operator (string) --

            The condition operator.

            For example, you can use 'Contains' to partially match a string.

          • ConditionOnValue (dict) --

            The value used by the operator.

            For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

            • StringValue (string) --

              A string, such as "department".

            • StringListValue (list) --

              A list of strings.

              • (string) --

            • LongValue (integer) --

              A long integer value.

            • DateValue (datetime) --

              A date expressed as an ISO 8601 string.

              It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

        • LambdaArn (string) --

          The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

        • S3Bucket (string) --

          Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

      • PostExtractionHookConfiguration (dict) --

        Configuration information for invoking a Lambda function on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

        • InvocationCondition (dict) --

          The condition used for when a Lambda function should be invoked.

          For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

          • ConditionDocumentAttributeKey (string) --

            The identifier of the document attribute used for the condition.

            For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

            Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

          • Operator (string) --

            The condition operator.

            For example, you can use 'Contains' to partially match a string.

          • ConditionOnValue (dict) --

            The value used by the operator.

            For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

            • StringValue (string) --

              A string, such as "department".

            • StringListValue (list) --

              A list of strings.

              • (string) --

            • LongValue (integer) --

              A long integer value.

            • DateValue (datetime) --

              A date expressed as an ISO 8601 string.

              It is important for the time zone to be included in the ISO 8601 date-time format. For example, 2012-03-25T12:30:10+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30 PM (plus 10 seconds) in Central European Time.

        • LambdaArn (string) --

          The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

        • S3Bucket (string) --

          Stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

      • RoleArn (string) --

        The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
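
Because DateValue must carry a time zone, it can help to build the value as a timezone-aware datetime and reject naive ones before assembling a hook configuration. The sketch below is illustrative only: require_timezone is a hypothetical helper, the ARN and bucket strings are placeholders, and the '_created_at' attribute key is an assumption about which date field the condition targets.

```python
from datetime import datetime, timedelta, timezone

# Central European Time as a fixed +01:00 offset.
cet = timezone(timedelta(hours=1))
cutoff = datetime(2012, 3, 25, 12, 30, 10, tzinfo=cet)


def require_timezone(value):
    """Return value unchanged, or raise if it has no time zone information."""
    if value.tzinfo is None or value.utcoffset() is None:
        raise ValueError("DateValue needs an explicit time zone")
    return value


custom_document_enrichment_configuration = {
    "PreExtractionHookConfiguration": {
        "InvocationCondition": {
            "ConditionDocumentAttributeKey": "_created_at",  # assumed date attribute
            "Operator": "GreaterThan",
            "ConditionOnValue": {"DateValue": require_timezone(cutoff)},
        },
        "LambdaArn": "EXAMPLE_LAMBDA_ARN",         # placeholder
        "S3Bucket": "example-enrichment-bucket",   # placeholder
    },
    "RoleArn": "EXAMPLE_ROLE_ARN",                 # placeholder
}

print(cutoff.isoformat())  # 2012-03-25T12:30:10+01:00
```

The printed string matches the ISO 8601 example used in the DateValue descriptions; a datetime created without tzinfo would be rejected by require_timezone.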

UpdateDataSource (updated)
Changes (request)
{'CustomDocumentEnrichmentConfiguration': {'InlineConfigurations': [{'Condition': {'ConditionDocumentAttributeKey': 'string',
                                                                                   'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                        'LongValue': 'long',
                                                                                                        'StringListValue': ['string'],
                                                                                                        'StringValue': 'string'},
                                                                                   'Operator': 'GreaterThan '
                                                                                               '| '
                                                                                               'GreaterThanOrEquals '
                                                                                               '| '
                                                                                               'LessThan '
                                                                                               '| '
                                                                                               'LessThanOrEquals '
                                                                                               '| '
                                                                                               'Equals '
                                                                                               '| '
                                                                                               'NotEquals '
                                                                                               '| '
                                                                                               'Contains '
                                                                                               '| '
                                                                                               'NotContains '
                                                                                               '| '
                                                                                               'Exists '
                                                                                               '| '
                                                                                               'NotExists '
                                                                                               '| '
                                                                                               'BeginsWith'},
                                                                     'DocumentContentDeletion': 'boolean',
                                                                     'Target': {'TargetDocumentAttributeKey': 'string',
                                                                                'TargetDocumentAttributeValue': {'DateValue': 'timestamp',
                                                                                                                 'LongValue': 'long',
                                                                                                                 'StringListValue': ['string'],
                                                                                                                 'StringValue': 'string'},
                                                                                'TargetDocumentAttributeValueDeletion': 'boolean'}}],
                                           'PostExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                       'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                            'LongValue': 'long',
                                                                                                                            'StringListValue': ['string'],
                                                                                                                            'StringValue': 'string'},
                                                                                                       'Operator': 'GreaterThan '
                                                                                                                   '| '
                                                                                                                   'GreaterThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'LessThan '
                                                                                                                   '| '
                                                                                                                   'LessThanOrEquals '
                                                                                                                   '| '
                                                                                                                   'Equals '
                                                                                                                   '| '
                                                                                                                   'NotEquals '
                                                                                                                   '| '
                                                                                                                   'Contains '
                                                                                                                   '| '
                                                                                                                   'NotContains '
                                                                                                                   '| '
                                                                                                                   'Exists '
                                                                                                                   '| '
                                                                                                                   'NotExists '
                                                                                                                   '| '
                                                                                                                   'BeginsWith'},
                                                                               'LambdaArn': 'string',
                                                                               'S3Bucket': 'string'},
                                           'PreExtractionHookConfiguration': {'InvocationCondition': {'ConditionDocumentAttributeKey': 'string',
                                                                                                      'ConditionOnValue': {'DateValue': 'timestamp',
                                                                                                                           'LongValue': 'long',
                                                                                                                           'StringListValue': ['string'],
                                                                                                                           'StringValue': 'string'},
                                                                                                      'Operator': 'GreaterThan '
                                                                                                                  '| '
                                                                                                                  'GreaterThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'LessThan '
                                                                                                                  '| '
                                                                                                                  'LessThanOrEquals '
                                                                                                                  '| '
                                                                                                                  'Equals '
                                                                                                                  '| '
                                                                                                                  'NotEquals '
                                                                                                                  '| '
                                                                                                                  'Contains '
                                                                                                                  '| '
                                                                                                                  'NotContains '
                                                                                                                  '| '
                                                                                                                  'Exists '
                                                                                                                  '| '
                                                                                                                  'NotExists '
                                                                                                                  '| '
                                                                                                                  'BeginsWith'},
                                                                              'LambdaArn': 'string',
                                                                              'S3Bucket': 'string'},
                                           'RoleArn': 'string'}}

Updates an existing Amazon Kendra data source.

See also: AWS API Documentation

Request Syntax

client.update_data_source(
    Id='string',
    Name='string',
    IndexId='string',
    Configuration={
        'S3Configuration': {
            'BucketName': 'string',
            'InclusionPrefixes': [
                'string',
            ],
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'DocumentsMetadataConfiguration': {
                'S3Prefix': 'string'
            },
            'AccessControlListConfiguration': {
                'KeyPath': 'string'
            }
        },
        'SharePointConfiguration': {
            'SharePointVersion': 'SHAREPOINT_2013'|'SHAREPOINT_2016'|'SHAREPOINT_ONLINE',
            'Urls': [
                'string',
            ],
            'SecretArn': 'string',
            'CrawlAttachments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DocumentTitleFieldName': 'string',
            'DisableLocalGroups': True|False,
            'SslCertificateS3Path': {
                'Bucket': 'string',
                'Key': 'string'
            }
        },
        'DatabaseConfiguration': {
            'DatabaseEngineType': 'RDS_AURORA_MYSQL'|'RDS_AURORA_POSTGRESQL'|'RDS_MYSQL'|'RDS_POSTGRESQL',
            'ConnectionConfiguration': {
                'DatabaseHost': 'string',
                'DatabasePort': 123,
                'DatabaseName': 'string',
                'TableName': 'string',
                'SecretArn': 'string'
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'ColumnConfiguration': {
                'DocumentIdColumnName': 'string',
                'DocumentDataColumnName': 'string',
                'DocumentTitleColumnName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'ChangeDetectingColumns': [
                    'string',
                ]
            },
            'AclConfiguration': {
                'AllowedGroupsColumnName': 'string'
            },
            'SqlConfiguration': {
                'QueryIdentifiersEnclosingOption': 'DOUBLE_QUOTES'|'NONE'
            }
        },
        'SalesforceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'StandardObjectConfigurations': [
                {
                    'Name': 'ACCOUNT'|'CAMPAIGN'|'CASE'|'CONTACT'|'CONTRACT'|'DOCUMENT'|'GROUP'|'IDEA'|'LEAD'|'OPPORTUNITY'|'PARTNER'|'PRICEBOOK'|'PRODUCT'|'PROFILE'|'SOLUTION'|'TASK'|'USER',
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
            ],
            'KnowledgeArticleConfiguration': {
                'IncludedStates': [
                    'DRAFT'|'PUBLISHED'|'ARCHIVED',
                ],
                'StandardKnowledgeArticleTypeConfiguration': {
                    'DocumentDataFieldName': 'string',
                    'DocumentTitleFieldName': 'string',
                    'FieldMappings': [
                        {
                            'DataSourceFieldName': 'string',
                            'DateFieldFormat': 'string',
                            'IndexFieldName': 'string'
                        },
                    ]
                },
                'CustomKnowledgeArticleTypeConfigurations': [
                    {
                        'Name': 'string',
                        'DocumentDataFieldName': 'string',
                        'DocumentTitleFieldName': 'string',
                        'FieldMappings': [
                            {
                                'DataSourceFieldName': 'string',
                                'DateFieldFormat': 'string',
                                'IndexFieldName': 'string'
                            },
                        ]
                    },
                ]
            },
            'ChatterFeedConfiguration': {
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'IncludeFilterTypes': [
                    'ACTIVE_USER'|'STANDARD_USER',
                ]
            },
            'CrawlAttachments': True|False,
            'StandardObjectAttachmentConfiguration': {
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'IncludeAttachmentFilePatterns': [
                'string',
            ],
            'ExcludeAttachmentFilePatterns': [
                'string',
            ]
        },
        'OneDriveConfiguration': {
            'TenantDomain': 'string',
            'SecretArn': 'string',
            'OneDriveUsers': {
                'OneDriveUserList': [
                    'string',
                ],
                'OneDriveUserS3Path': {
                    'Bucket': 'string',
                    'Key': 'string'
                }
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'DisableLocalGroups': True|False
        },
        'ServiceNowConfiguration': {
            'HostUrl': 'string',
            'SecretArn': 'string',
            'ServiceNowBuildVersion': 'LONDON'|'OTHERS',
            'KnowledgeArticleConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ],
                'FilterQuery': 'string'
            },
            'ServiceCatalogConfiguration': {
                'CrawlAttachments': True|False,
                'IncludeAttachmentFilePatterns': [
                    'string',
                ],
                'ExcludeAttachmentFilePatterns': [
                    'string',
                ],
                'DocumentDataFieldName': 'string',
                'DocumentTitleFieldName': 'string',
                'FieldMappings': [
                    {
                        'DataSourceFieldName': 'string',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AuthenticationType': 'HTTP_BASIC'|'OAUTH2'
        },
        'ConfluenceConfiguration': {
            'ServerUrl': 'string',
            'SecretArn': 'string',
            'Version': 'CLOUD'|'SERVER',
            'SpaceConfiguration': {
                'CrawlPersonalSpaces': True|False,
                'CrawlArchivedSpaces': True|False,
                'IncludeSpaces': [
                    'string',
                ],
                'ExcludeSpaces': [
                    'string',
                ],
                'SpaceFieldMappings': [
                    {
                        'DataSourceFieldName': 'DISPLAY_URL'|'ITEM_TYPE'|'SPACE_KEY'|'URL',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'PageConfiguration': {
                'PageFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_STATUS'|'CREATED_DATE'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'MODIFIED_DATE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'BlogConfiguration': {
                'BlogFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'DISPLAY_URL'|'ITEM_TYPE'|'LABELS'|'PUBLISH_DATE'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'AttachmentConfiguration': {
                'CrawlAttachments': True|False,
                'AttachmentFieldMappings': [
                    {
                        'DataSourceFieldName': 'AUTHOR'|'CONTENT_TYPE'|'CREATED_DATE'|'DISPLAY_URL'|'FILE_SIZE'|'ITEM_TYPE'|'PARENT_ID'|'SPACE_KEY'|'SPACE_NAME'|'URL'|'VERSION',
                        'DateFieldFormat': 'string',
                        'IndexFieldName': 'string'
                    },
                ]
            },
            'VpcConfiguration': {
                'SubnetIds': [
                    'string',
                ],
                'SecurityGroupIds': [
                    'string',
                ]
            },
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ]
        },
        'GoogleDriveConfiguration': {
            'SecretArn': 'string',
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ],
            'ExcludeMimeTypes': [
                'string',
            ],
            'ExcludeUserAccounts': [
                'string',
            ],
            'ExcludeSharedDrives': [
                'string',
            ]
        },
        'WebCrawlerConfiguration': {
            'Urls': {
                'SeedUrlConfiguration': {
                    'SeedUrls': [
                        'string',
                    ],
                    'WebCrawlerMode': 'HOST_ONLY'|'SUBDOMAINS'|'EVERYTHING'
                },
                'SiteMapsConfiguration': {
                    'SiteMaps': [
                        'string',
                    ]
                }
            },
            'CrawlDepth': 123,
            'MaxLinksPerPage': 123,
            'MaxContentSizePerPageInMegaBytes': ...,
            'MaxUrlsPerMinuteCrawlRate': 123,
            'UrlInclusionPatterns': [
                'string',
            ],
            'UrlExclusionPatterns': [
                'string',
            ],
            'ProxyConfiguration': {
                'Host': 'string',
                'Port': 123,
                'Credentials': 'string'
            },
            'AuthenticationConfiguration': {
                'BasicAuthentication': [
                    {
                        'Host': 'string',
                        'Port': 123,
                        'Credentials': 'string'
                    },
                ]
            }
        },
        'WorkDocsConfiguration': {
            'OrganizationId': 'string',
            'CrawlComments': True|False,
            'UseChangeLog': True|False,
            'InclusionPatterns': [
                'string',
            ],
            'ExclusionPatterns': [
                'string',
            ],
            'FieldMappings': [
                {
                    'DataSourceFieldName': 'string',
                    'DateFieldFormat': 'string',
                    'IndexFieldName': 'string'
                },
            ]
        }
    },
    Description='string',
    Schedule='string',
    RoleArn='string',
    LanguageCode='string',
    CustomDocumentEnrichmentConfiguration={
        'InlineConfigurations': [
            {
                'Condition': {
                    'ConditionDocumentAttributeKey': 'string',
                    'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                    'ConditionOnValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'Target': {
                    'TargetDocumentAttributeKey': 'string',
                    'TargetDocumentAttributeValueDeletion': True|False,
                    'TargetDocumentAttributeValue': {
                        'StringValue': 'string',
                        'StringListValue': [
                            'string',
                        ],
                        'LongValue': 123,
                        'DateValue': datetime(2015, 1, 1)
                    }
                },
                'DocumentContentDeletion': True|False
            },
        ],
        'PreExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'PostExtractionHookConfiguration': {
            'InvocationCondition': {
                'ConditionDocumentAttributeKey': 'string',
                'Operator': 'GreaterThan'|'GreaterThanOrEquals'|'LessThan'|'LessThanOrEquals'|'Equals'|'NotEquals'|'Contains'|'NotContains'|'Exists'|'NotExists'|'BeginsWith',
                'ConditionOnValue': {
                    'StringValue': 'string',
                    'StringListValue': [
                        'string',
                    ],
                    'LongValue': 123,
                    'DateValue': datetime(2015, 1, 1)
                }
            },
            'LambdaArn': 'string',
            'S3Bucket': 'string'
        },
        'RoleArn': 'string'
    }
)
type Id:

string

param Id:

[REQUIRED]

The unique identifier of the data source to update.

type Name:

string

param Name:

The name of the data source to update. The name itself can't be changed: to rename a data source, you must delete it and re-create it.

type IndexId:

string

param IndexId:

[REQUIRED]

The identifier of the index that contains the data source to update.
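The top-level arguments described so far can be assembled as in the following minimal sketch. The helper name build_update_kwargs and the identifiers are hypothetical placeholders. The helper only enforces what the parameter list states (Id and IndexId are required), and Name is omitted because the data source name cannot be changed.

```python
def build_update_kwargs(data_source_id, index_id, **optional):
    """Assemble keyword arguments for update_data_source.

    Hypothetical helper: it checks only that the two required
    identifiers are present and drops optional fields left as None.
    """
    if not data_source_id or not index_id:
        raise ValueError("Id and IndexId are both required")
    kwargs = {"Id": data_source_id, "IndexId": index_id}
    # Send only the optional fields that were actually set.
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


# Placeholder identifiers, not real resources.
kwargs = build_update_kwargs(
    "example-data-source-id",
    "example-index-id",
    Description="Nightly crawl of the docs bucket",
    Schedule=None,
)
print(kwargs)
# With boto3 installed and credentials configured, the call would be:
#   boto3.client("kendra").update_data_source(**kwargs)
```

Building the dict separately from the call keeps the request easy to inspect or log before it is sent.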

type Configuration:

dict

param Configuration:

Configuration information for an Amazon Kendra data source.

  • S3Configuration (dict) --

    Provides information to create a data source connector for a document repository in an Amazon S3 bucket.

    • BucketName (string) -- [REQUIRED]

      The name of the bucket that contains the documents.

    • InclusionPrefixes (list) --

      A list of S3 prefixes for the documents that should be included in the index.

      • (string) --

    • InclusionPatterns (list) --

      A list of glob patterns for documents that should be indexed. If a document that matches an inclusion pattern also matches an exclusion pattern, the document is not indexed.

      Some examples are:

      • *.txt will include all text files in a directory (files with the extension .txt).

      • **/*.txt will include all text files in a directory and its subdirectories.

      • *tax* will include all files in a directory that contain 'tax' in the file name, such as 'tax', 'taxes', 'income_tax'.

      • (string) --

    • ExclusionPatterns (list) --

      A list of glob patterns for documents that should not be indexed. If a document that matches an inclusion prefix or inclusion pattern also matches an exclusion pattern, the document is not indexed.

      Some examples are:

      • *.png , *.jpg will exclude all PNG and JPEG image files in a directory (files with the extensions .png and .jpg).

      • *internal* will exclude all files in a directory that contain 'internal' in the file name, such as 'internal', 'internal_only', 'company_internal'.

      • **/*internal* will exclude all internal-related files in a directory and its subdirectories.

      • (string) --

    • DocumentsMetadataConfiguration (dict) --

      Document metadata files that contain information such as the document access control information, source URI, document author, and custom attributes. Each metadata file contains metadata about a single document.

      • S3Prefix (string) --

        A prefix used to filter metadata configuration files in the Amazon Web Services S3 bucket. The S3 bucket might contain multiple metadata files. Use S3Prefix to include only the desired metadata files.

    • AccessControlListConfiguration (dict) --

      Provides the path to the S3 bucket that contains the user context filtering files for the data source. For the format of the file, see Access control for S3 data sources.

      • KeyPath (string) --

        Path to the Amazon Web Services S3 bucket that contains the ACL files.
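The precedence rule stated for S3Configuration (a document that matches both an inclusion pattern and an exclusion pattern is not indexed) can be illustrated locally. This is a sketch under two assumptions that the text does not confirm: Python's fnmatch stands in for the service's glob matching, and an empty inclusion list is treated as "include everything". The bucket name and object keys are placeholders.

```python
from fnmatch import fnmatchcase


def is_indexed(key, inclusion_patterns, exclusion_patterns):
    """Return True if an S3 object key would be indexed under the stated rule.

    Assumptions: fnmatch approximates the service's glob matching, and
    an empty inclusion list means every key is a candidate.
    """
    included = (not inclusion_patterns) or any(
        fnmatchcase(key, pattern) for pattern in inclusion_patterns
    )
    excluded = any(fnmatchcase(key, pattern) for pattern in exclusion_patterns)
    # Exclusion wins when a key matches both lists.
    return included and not excluded


s3_configuration = {
    "BucketName": "example-bucket",  # placeholder bucket name
    "InclusionPatterns": ["*.txt"],
    "ExclusionPatterns": ["*internal*"],
}

for key in ["taxes.txt", "internal_notes.txt", "logo.png"]:
    decision = is_indexed(
        key,
        s3_configuration["InclusionPatterns"],
        s3_configuration["ExclusionPatterns"],
    )
    print(key, decision)
```

Here taxes.txt is indexed, internal_notes.txt is dropped because the exclusion pattern takes precedence, and logo.png is dropped because it matches no inclusion pattern.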

  • SharePointConfiguration (dict) --

    Provides information necessary to create a data source connector for a Microsoft SharePoint site.

    • SharePointVersion (string) -- [REQUIRED]

      The version of Microsoft SharePoint that you are using as a data source.

    • Urls (list) -- [REQUIRED]

      The URLs of the Microsoft SharePoint site that contains the documents that should be indexed.

      • (string) --

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. If you use SharePoint Server, you also need to provide the server domain name as part of the credentials. For more information, see Using a Microsoft SharePoint Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

    • CrawlAttachments (boolean) --

      Set to TRUE to index attachments to documents stored in your Microsoft SharePoint site; otherwise, FALSE.

    • UseChangeLog (boolean) --

      Set to TRUE to use the Microsoft SharePoint change log to determine the documents that need to be updated in the index. Depending on the size of the change log, this can take longer than determining the changed documents with the Amazon Kendra document crawler.

    • InclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The regex is applied to the display URL of the SharePoint document.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

      The regex is applied to the display URL of the SharePoint document.

      • (string) --

    • VpcConfiguration (dict) --

      Provides information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Microsoft SharePoint attributes to custom fields in the Amazon Kendra index. You must first create the index fields using the UpdateIndex operation before you map SharePoint attributes. For more information, see Mapping Data Source Fields.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • DocumentTitleFieldName (string) --

      The Microsoft SharePoint attribute field that contains the title of the document.

    • DisableLocalGroups (boolean) --

      A Boolean value that specifies whether local groups are disabled (True) or enabled (False).

    • SslCertificateS3Path (dict) --

      Information required to find a specific file in an Amazon S3 bucket.

      • Bucket (string) -- [REQUIRED]

        The name of the S3 bucket that contains the file.

      • Key (string) -- [REQUIRED]

        The name of the file.
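For SharePointConfiguration, the inclusion and exclusion patterns are regular expressions applied to the document's display URL, with exclusion taking precedence. A minimal local sketch of that rule follows. It assumes unanchored matching with re.search (the text does not say whether patterns are anchored) and treats an empty inclusion list as "include everything". The URLs and patterns are made-up examples.

```python
import re


def sharepoint_document_included(display_url, inclusion_patterns, exclusion_patterns):
    """Apply the documented precedence to a SharePoint display URL.

    Assumptions: re.search (unanchored) approximates the service's
    matching, and an empty inclusion list includes every document.
    """
    matches_inclusion = (not inclusion_patterns) or any(
        re.search(pattern, display_url) for pattern in inclusion_patterns
    )
    matches_exclusion = any(
        re.search(pattern, display_url) for pattern in exclusion_patterns
    )
    # A document matching both lists is not included in the index.
    return matches_inclusion and not matches_exclusion


inclusion = [r"/sites/hr/"]
exclusion = [r"\.tmp$"]

urls = [
    "https://example.sharepoint.com/sites/hr/policy.docx",
    "https://example.sharepoint.com/sites/hr/draft.tmp",
    "https://example.sharepoint.com/sites/eng/design.docx",
]
for url in urls:
    print(url, sharepoint_document_included(url, inclusion, exclusion))
```

Only the first URL is kept: the second matches both lists, and the third matches no inclusion pattern.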

  • DatabaseConfiguration (dict) --

    Provides information necessary to create a data source connector for a database.

    • DatabaseEngineType (string) -- [REQUIRED]

      The type of database engine that runs the database.

    • ConnectionConfiguration (dict) -- [REQUIRED]

      The information necessary to connect to a database.

      • DatabaseHost (string) -- [REQUIRED]

        The name of the host for the database. Can be either a string (host.subdomain.domain.tld) or an IPv4 or IPv6 address.

      • DatabasePort (integer) -- [REQUIRED]

        The port that the database uses for connections.

      • DatabaseName (string) -- [REQUIRED]

        The name of the database containing the document data.

      • TableName (string) -- [REQUIRED]

        The name of the table that contains the document data.

      • SecretArn (string) -- [REQUIRED]

        The Amazon Resource Name (ARN) of credentials stored in Secrets Manager. The credentials should be a user/password pair. For more information, see Using a Database Data Source. For more information about Secrets Manager, see What Is Secrets Manager in the Secrets Manager user guide.

    • VpcConfiguration (dict) --

      Provides information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • ColumnConfiguration (dict) -- [REQUIRED]

      Information about which database columns supply the document information for the index.

      • DocumentIdColumnName (string) -- [REQUIRED]

        The column that provides the document's unique identifier.

      • DocumentDataColumnName (string) -- [REQUIRED]

        The column that contains the contents of the document.

      • DocumentTitleColumnName (string) --

        The column that contains the title of the document.

      • FieldMappings (list) --

        An array of objects that map database column names to the corresponding fields in an index. You must first create the fields in the index using the UpdateIndex operation.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The type of data stored in the column or attribute.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • ChangeDetectingColumns (list) -- [REQUIRED]

        One to five columns that indicate when a document in the database has changed.

        • (string) --

    • AclConfiguration (dict) --

      Information about the database column that provides information for user context filtering.

      • AllowedGroupsColumnName (string) -- [REQUIRED]

        A list of groups, separated by semicolons, that filters a query response based on user context. The document is only returned to users who are in one of the groups specified in the UserContext field of the Query operation.

    • SqlConfiguration (dict) --

      Provides information about how Amazon Kendra uses quote marks around SQL identifiers when querying a database data source.

      • QueryIdentifiersEnclosingOption (string) --

        Determines whether Amazon Kendra encloses SQL identifiers for tables and column names in double quotes (") when making a database query.

        By default, Amazon Kendra passes SQL identifiers the way that they are entered into the data source configuration. It does not change the case of identifiers or enclose them in quotes.

        PostgreSQL internally converts uppercase characters to lowercase in identifiers unless they are quoted. Choosing this option encloses identifiers in quotes so that PostgreSQL does not change their case.

        For MySQL databases, you must enable the ansi_quotes option when you set this field to DOUBLE_QUOTES.
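The DatabaseConfiguration fields above can be put together as in the sketch below. The helper build_column_configuration is hypothetical. It enforces only the "one to five columns" constraint stated for ChangeDetectingColumns. The host, table, column names, and ARN are placeholders, and the custom index field used in FieldMappings is assumed to have been created beforehand with UpdateIndex.

```python
def build_column_configuration(id_column, data_column, change_columns,
                               title_column=None, field_mappings=None):
    """Build a ColumnConfiguration dict (hypothetical helper)."""
    if not 1 <= len(change_columns) <= 5:
        raise ValueError("ChangeDetectingColumns needs one to five columns")
    config = {
        "DocumentIdColumnName": id_column,
        "DocumentDataColumnName": data_column,
        "ChangeDetectingColumns": list(change_columns),
    }
    if title_column:
        config["DocumentTitleColumnName"] = title_column
    if field_mappings:
        # Each pair maps a database column to an existing index field.
        config["FieldMappings"] = [
            {"DataSourceFieldName": source, "IndexFieldName": target}
            for source, target in field_mappings
        ]
    return config


database_configuration = {
    "DatabaseEngineType": "RDS_POSTGRESQL",
    "ConnectionConfiguration": {
        "DatabaseHost": "db.example.internal",  # placeholder host
        "DatabasePort": 5432,
        "DatabaseName": "docs",
        "TableName": "Articles",
        # Placeholder ARN, not a real secret.
        "SecretArn": "arn:aws:secretsmanager:us-east-1:111122223333:secret:example",
    },
    "ColumnConfiguration": build_column_configuration(
        "id", "body", ["updated_at"],
        title_column="title",
        field_mappings=[("department", "department")],
    ),
    # Quote identifiers so PostgreSQL keeps the mixed case of "Articles".
    "SqlConfiguration": {"QueryIdentifiersEnclosingOption": "DOUBLE_QUOTES"},
}
print(database_configuration["ColumnConfiguration"])
```

The dict would be passed under the 'DatabaseConfiguration' key of Configuration. Validating the column count locally surfaces the constraint before a request is made.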

  • SalesforceConfiguration (dict) --

    Provides configuration information for data sources that connect to a Salesforce site.

    • ServerUrl (string) -- [REQUIRED]

      The instance URL for the Salesforce site that you want to index.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Salesforce instance. The secret must contain a JSON structure with the following keys:

      • authenticationUrl - The OAUTH endpoint that Amazon Kendra connects to in order to get an OAUTH token.

      • consumerKey - The application public key generated when you created your Salesforce application.

      • consumerSecret - The application private key generated when you created your Salesforce application.

      • password - The password associated with the user logging in to the Salesforce instance.

      • securityToken - The token associated with the user account logging in to the Salesforce instance.

      • username - The user name of the user logging in to the Salesforce instance.
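The JSON structure expected in the Salesforce secret can be sketched as follows. The helper build_salesforce_secret is hypothetical and only checks that the six keys listed above are present. Every value shown is a placeholder, and storing the resulting string in Secrets Manager is left out.

```python
import json

# The six keys listed for the Salesforce secret.
REQUIRED_SECRET_KEYS = {
    "authenticationUrl",
    "consumerKey",
    "consumerSecret",
    "password",
    "securityToken",
    "username",
}


def build_salesforce_secret(**values):
    """Return the secret as a JSON string, or raise if a key is missing."""
    missing = REQUIRED_SECRET_KEYS - values.keys()
    if missing:
        raise ValueError(f"missing secret keys: {sorted(missing)}")
    return json.dumps({key: values[key] for key in sorted(REQUIRED_SECRET_KEYS)})


# Placeholder values only; never hard-code real credentials.
secret_string = build_salesforce_secret(
    authenticationUrl="https://login.example.com/services/oauth2/token",
    consumerKey="example-consumer-key",
    consumerSecret="example-consumer-secret",
    password="example-password",
    securityToken="example-security-token",
    username="crawler@example.com",
)
print(sorted(json.loads(secret_string)))
```

The ARN of the secret that holds this string is what SecretArn refers to.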

    • StandardObjectConfigurations (list) --

      Specifies the Salesforce standard objects that Amazon Kendra indexes.

      • (dict) --

        Specifies configuration information for indexing a single standard object.

        • Name (string) -- [REQUIRED]

          The name of the standard object.

        • DocumentDataFieldName (string) -- [REQUIRED]

          The name of the field in the standard object table that contains the document contents.

        • DocumentTitleFieldName (string) --

          The name of the field in the standard object table that contains the document title.

        • FieldMappings (list) --

          One or more objects that map fields in the standard object to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) -- [REQUIRED]

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The format for date fields in the data source.

            • IndexFieldName (string) -- [REQUIRED]

              The name of the field in the index.

    • KnowledgeArticleConfiguration (dict) --

      Specifies configuration information for the knowledge article types that Amazon Kendra indexes. Amazon Kendra indexes standard knowledge articles and the standard fields of knowledge articles, or the custom fields of custom knowledge articles, but not both.

      • IncludedStates (list) -- [REQUIRED]

        Specifies the document states that should be included when Amazon Kendra indexes knowledge articles. You must specify at least one state.

        • (string) --

      • StandardKnowledgeArticleTypeConfiguration (dict) --

        Provides configuration information for standard Salesforce knowledge articles.

        • DocumentDataFieldName (string) -- [REQUIRED]

          The name of the field that contains the document data to index.

        • DocumentTitleFieldName (string) --

          The name of the field that contains the document title.

        • FieldMappings (list) --

          One or more objects that map fields in the knowledge article to Amazon Kendra index fields. The index field must exist before you can map a Salesforce field to it.

          • (dict) --

            Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

            • DataSourceFieldName (string) -- [REQUIRED]

              The name of the column or attribute in the data source.

            • DateFieldFormat (string) --

              The format for date fields in the data source.

            • IndexFieldName (string) -- [REQUIRED]

              The name of the field in the index.

      • CustomKnowledgeArticleTypeConfigurations (list) --

        Provides configuration information for custom Salesforce knowledge articles.

        • (dict) --

          Provides configuration information for indexing Salesforce custom articles.

          • Name (string) -- [REQUIRED]

            The name of the configuration.

          • DocumentDataFieldName (string) -- [REQUIRED]

            The name of the field in the custom knowledge article that contains the document data to index.

          • DocumentTitleFieldName (string) --

            The name of the field in the custom knowledge article that contains the document title.

          • FieldMappings (list) --

            One or more objects that map fields in the custom knowledge article to fields in the Amazon Kendra index.

            • (dict) --

              Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

              • DataSourceFieldName (string) -- [REQUIRED]

                The name of the column or attribute in the data source.

              • DateFieldFormat (string) --

                The format for date fields in the data source.

              • IndexFieldName (string) -- [REQUIRED]

                The name of the field in the index.

    • ChatterFeedConfiguration (dict) --

      Specifies configuration information for Salesforce chatter feeds.

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the column in the Salesforce FeedItem table that contains the content to index. Typically this is the Body column.

      • DocumentTitleFieldName (string) --

        The name of the column in the Salesforce FeedItem table that contains the title of the document. This is typically the Title column.

      • FieldMappings (list) --

        Maps fields from a Salesforce chatter feed into Amazon Kendra index fields.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • IncludeFilterTypes (list) --

        Filters the documents in the feed based on the status of the user. When you specify ACTIVE_USER, only documents from users who have an active account are indexed. When you specify STANDARD_USER, only documents for Salesforce standard users are indexed. You can specify both.

        • (string) --

    • CrawlAttachments (boolean) --

      Indicates whether Amazon Kendra should index attachments to Salesforce objects.

    • StandardObjectAttachmentConfiguration (dict) --

      Provides configuration information for processing attachments to Salesforce standard objects.

      • DocumentTitleFieldName (string) --

        The name of the field used for the document title.

      • FieldMappings (list) --

        One or more objects that map fields in attachments to Amazon Kendra index fields.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

    • IncludeAttachmentFilePatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The regex is applied to the name of the attached file.

      • (string) --

    • ExcludeAttachmentFilePatterns (list) --

      A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

      The regex is applied to the name of the attached file.

      • (string) --
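
As a rough illustration of the Salesforce settings above, the sketch below checks a secret's JSON body for the six documented keys and assembles a minimal SalesforceConfiguration dict. The helper names, the object name "ACCOUNT", the field names "Description" and "Industry", and the index field "sf_industry" are placeholders chosen for this example, not values confirmed by this reference. The index field is assumed to exist already.

```python
import json

# Keys that the Salesforce secret must contain, per the SecretArn description.
REQUIRED_SECRET_KEYS = frozenset({
    "authenticationUrl", "consumerKey", "consumerSecret",
    "password", "securityToken", "username",
})


def missing_secret_keys(secret_string):
    """Return the required keys absent from a secret's JSON body, sorted."""
    return sorted(REQUIRED_SECRET_KEYS - set(json.loads(secret_string)))


def salesforce_configuration(server_url, secret_arn):
    """Build a minimal SalesforceConfiguration dict; values are placeholders."""
    return {
        "ServerUrl": server_url,
        "SecretArn": secret_arn,
        "StandardObjectConfigurations": [{
            "Name": "ACCOUNT",                       # assumed object name
            "DocumentDataFieldName": "Description",  # assumed field name
            "FieldMappings": [{
                "DataSourceFieldName": "Industry",   # assumed Salesforce field
                "IndexFieldName": "sf_industry",     # hypothetical index field
            }],
        }],
        "ChatterFeedConfiguration": {
            "DocumentDataFieldName": "Body",    # typical column, per the text
            "DocumentTitleFieldName": "Title",  # typical column, per the text
            "IncludeFilterTypes": ["STANDARD_USER"],
        },
        "CrawlAttachments": True,
        # Applied to the name of the attached file.
        "IncludeAttachmentFilePatterns": [r".*\.pdf$"],
    }
```

The resulting dict would be supplied under the SalesforceConfiguration key of the data source configuration.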

  • OneDriveConfiguration (dict) --

    Provides configuration for data sources that connect to Microsoft OneDrive.

    • TenantDomain (string) -- [REQUIRED]

      The Azure Active Directory domain of the organization.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the user name and password to connect to OneDrive. The user name should be the application ID for the OneDrive application, and the password is the application key for the OneDrive application.

    • OneDriveUsers (dict) -- [REQUIRED]

      A list of user accounts whose documents should be indexed.

      • OneDriveUserList (list) --

        A list of users whose documents should be indexed. Specify the user names in email format, for example, username@tenantdomain. If you need to index the documents of more than 100 users, use the OneDriveUserS3Path field to specify the location of a file containing a list of users.

        • (string) --

      • OneDriveUserS3Path (dict) --

        The S3 bucket location of a file containing a list of users whose documents should be indexed.

        • Bucket (string) -- [REQUIRED]

          The name of the S3 bucket that contains the file.

        • Key (string) -- [REQUIRED]

          The name of the file.

    • InclusionPatterns (list) --

      A list of regular expression patterns. Documents that match the pattern are included in the index. Documents that don't match the pattern are excluded from the index. If a document matches both an inclusion pattern and an exclusion pattern, the document is not included in the index.

      The pattern is applied to the file name.

      • (string) --

    • ExclusionPatterns (list) --

      List of regular expressions applied to documents. Items that match the exclusion pattern are not indexed. If you provide both an inclusion pattern and an exclusion pattern, any item that matches the exclusion pattern isn't indexed.

      The exclusion pattern is applied to the file name.

      • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Microsoft OneDrive fields to custom fields in the Amazon Kendra index. You must first create the index fields before you map OneDrive fields.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The format for date fields in the data source.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • DisableLocalGroups (boolean) --

      A Boolean value that specifies whether local groups are disabled (True) or enabled (False).
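
The choice between OneDriveUserList and OneDriveUserS3Path described above can be sketched as a small helper. The function name and the error message are illustrative assumptions. Only the 100-user threshold and the dict shapes come from the parameter descriptions.

```python
MAX_INLINE_USERS = 100  # above this, the text says to use OneDriveUserS3Path


def one_drive_users(users, bucket=None, key=None):
    """Return a OneDriveUsers dict, choosing the inline list or the S3 file."""
    if len(users) <= MAX_INLINE_USERS:
        # User names are expected in email format, e.g. username@tenantdomain.
        return {"OneDriveUserList": list(users)}
    if not (bucket and key):
        raise ValueError(
            "more than 100 users: provide the S3 bucket and key of a user-list file"
        )
    return {"OneDriveUserS3Path": {"Bucket": bucket, "Key": key}}
```

The returned dict is meant to be used as the value of the required OneDriveUsers field.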

  • ServiceNowConfiguration (dict) --

    Provides configuration for data sources that connect to ServiceNow instances.

    • HostUrl (string) -- [REQUIRED]

      The ServiceNow instance that the data source connects to. The host endpoint should look like the following: {instance}.service-now.com.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of the Secrets Manager secret that contains the user name and password required to connect to the ServiceNow instance.

    • ServiceNowBuildVersion (string) -- [REQUIRED]

      The identifier of the release that the ServiceNow host is running. If the host is not running the LONDON release, use OTHERS.

    • KnowledgeArticleConfiguration (dict) --

      Provides configuration information for crawling knowledge articles in the ServiceNow site.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra should index attachments to knowledge articles.

      • IncludeAttachmentFilePatterns (list) --

        List of regular expressions applied to knowledge articles. Items that don't match the inclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

        • (string) --

      • ExcludeAttachmentFilePatterns (list) --

        List of regular expressions applied to knowledge articles. Items that match the exclusion pattern are not indexed. The regex is applied to the field specified in the PatternTargetField.

        • (string) --

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

      • DocumentTitleFieldName (string) --

        The name of the ServiceNow field that is mapped to the index document title field.

      • FieldMappings (list) --

        Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

      • FilterQuery (string) --

        A query that selects the knowledge articles to index. The query can return articles from multiple knowledge bases, and the knowledge bases can be public or private.

        The query string must be one generated by the ServiceNow console. For more information, see Specifying documents to index with a query.

    • ServiceCatalogConfiguration (dict) --

      Provides configuration information for crawling service catalogs in the ServiceNow site.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra should crawl attachments to the service catalog items.

      • IncludeAttachmentFilePatterns (list) --

        A list of regular expression patterns. Documents that match the patterns are included in the index. Documents that don't match the patterns are excluded from the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

        The regex is applied to the file name of the attachment.

        • (string) --

      • ExcludeAttachmentFilePatterns (list) --

        A list of regular expression patterns. Documents that match the patterns are excluded from the index. Documents that don't match the patterns are included in the index. If a document matches both an exclusion pattern and an inclusion pattern, the document is not included in the index.

        The regex is applied to the file name of the attachment.

        • (string) --

      • DocumentDataFieldName (string) -- [REQUIRED]

        The name of the ServiceNow field that is mapped to the index document contents field in the Amazon Kendra index.

      • DocumentTitleFieldName (string) --

        The name of the ServiceNow field that is mapped to the index document title field.

      • FieldMappings (list) --

        Mapping between ServiceNow fields and Amazon Kendra index fields. You must create the index field before you map the field.

        • (dict) --

          Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

          • DataSourceFieldName (string) -- [REQUIRED]

            The name of the column or attribute in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source.

          • IndexFieldName (string) -- [REQUIRED]

            The name of the field in the index.

    • AuthenticationType (string) --

      Determines the type of authentication used to connect to the ServiceNow instance. If you choose HTTP_BASIC, Amazon Kendra is authenticated using the user name and password provided in the Secrets Manager secret in the SecretArn field. When you choose OAUTH2, Amazon Kendra is authenticated using the OAuth token and secret provided in the Secrets Manager secret, and the user name and password are used to determine which information Amazon Kendra has access to.

      When you use OAUTH2 authentication, you must generate a token and a client secret using the ServiceNow console. For more information, see Using a ServiceNow data source.
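
A minimal sketch of a ServiceNowConfiguration follows, assuming a hypothetical helper that maps a release name to the LONDON or OTHERS build version and picks the authentication type. The ServiceNow field name "text" and the attachment pattern are placeholders, not values taken from this reference.

```python
def servicenow_configuration(host_url, secret_arn, release, use_oauth2=False):
    """Build a minimal ServiceNowConfiguration dict; values are placeholders."""
    is_london = release.strip().upper() == "LONDON"
    return {
        "HostUrl": host_url,  # expected form: {instance}.service-now.com
        "SecretArn": secret_arn,
        # Any release other than LONDON is reported as OTHERS.
        "ServiceNowBuildVersion": "LONDON" if is_london else "OTHERS",
        "AuthenticationType": "OAUTH2" if use_oauth2 else "HTTP_BASIC",
        "KnowledgeArticleConfiguration": {
            "CrawlAttachments": True,
            "DocumentDataFieldName": "text",  # assumed ServiceNow field name
            "ExcludeAttachmentFilePatterns": [r".*\.tmp$"],
        },
    }
```

With OAUTH2, the secret is also expected to hold the token and client secret generated in the ServiceNow console, as noted above.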

  • ConfluenceConfiguration (dict) --

    Provides configuration information for connecting to a Confluence data source.

    • ServerUrl (string) -- [REQUIRED]

      The URL of your Confluence instance. Use the full URL of the server. For example, https://server.example.com:port/. You can also use an IP address, for example, https://192.168.1.113/.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the key/value pairs required to connect to your Confluence server. The secret must contain a JSON structure with the following keys:

      • username - The user name or email address of a user with administrative privileges for the Confluence server.

      • password - The password associated with the user logging in to the Confluence server.

    • Version (string) -- [REQUIRED]

      Specifies the version of the Confluence installation that you are connecting to.

    • SpaceConfiguration (dict) --

      Specifies configuration information for indexing Confluence spaces.

      • CrawlPersonalSpaces (boolean) --

        Specifies whether Amazon Kendra should index personal spaces. Users can add restrictions to items in personal spaces. If personal spaces are indexed, queries without user context information may return restricted items from a personal space in their results. For more information, see Filtering on user context.

      • CrawlArchivedSpaces (boolean) --

        Specifies whether Amazon Kendra should index archived spaces.

      • IncludeSpaces (list) --

        A list of space keys for Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are indexed. Spaces that aren't in the list aren't indexed. A space in the list must exist. Otherwise, Amazon Kendra logs an error when the data source is synchronized. If a space is in both the IncludeSpaces and the ExcludeSpaces list, the space is excluded.

        • (string) --

      • ExcludeSpaces (list) --

        A list of space keys of Confluence spaces. If you include a key, the blogs, documents, and attachments in the space are not indexed. If a space is in both the ExcludeSpaces and the IncludeSpaces list, the space is excluded.

        • (string) --

      • SpaceFieldMappings (list) --

        Defines how space metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the SpaceFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field, you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • PageConfiguration (dict) --

      Specifies configuration information for indexing Confluence pages.

      • PageFieldMappings (list) --

        Defines how page metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the PageFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field, you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • BlogConfiguration (dict) --

      Specifies configuration information for indexing Confluence blogs.

      • BlogFieldMappings (list) --

        Defines how blog metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the BlogFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a blog field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field, you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • AttachmentConfiguration (dict) --

      Specifies configuration information for indexing attachments to Confluence blogs and pages.

      • CrawlAttachments (boolean) --

        Indicates whether Amazon Kendra indexes attachments to the pages and blogs in the Confluence data source.

      • AttachmentFieldMappings (list) --

        Defines how attachment metadata fields should be mapped to index fields. Before you can map a field, you must first create an index field with a matching type using the console or the UpdateIndex operation.

        If you specify the AttachmentFieldMappings parameter, you must specify at least one field mapping.

        • (dict) --

          Defines the mapping between a field in the Confluence data source and an Amazon Kendra index field.

          You must first create the index field using the UpdateIndex operation.

          • DataSourceFieldName (string) --

            The name of the field in the data source.

          • DateFieldFormat (string) --

            The format for date fields in the data source. If the field specified in DataSourceFieldName is a date field, you must specify the date format. If the field is not a date field, an exception is thrown.

          • IndexFieldName (string) --

            The name of the index field to map to the Confluence data source field. The index field type must match the Confluence field type.

    • VpcConfiguration (dict) --

      Specifies the information for connecting to an Amazon VPC.

      • SubnetIds (list) -- [REQUIRED]

        A list of identifiers for subnets within your Amazon VPC. The subnets should be able to connect to each other in the VPC, and they should have outgoing access to the Internet through a NAT device.

        • (string) --

      • SecurityGroupIds (list) -- [REQUIRED]

        A list of identifiers of security groups within your Amazon VPC. The security groups should enable Amazon Kendra to connect to the data source.

        • (string) --

    • InclusionPatterns (list) --

      A list of regular expression patterns that apply to a URL on the Confluence server. An inclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the patterns are included in the index. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, the item isn't included in the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns that apply to a URL on the Confluence server. An exclusion pattern can apply to a blog post, a page, a space, or an attachment. Items that match the pattern are excluded from the index. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, the item isn't included in the index.

      • (string) --
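
The interaction between IncludeSpaces and ExcludeSpaces described above can be sketched as follows. This is an illustration of the documented precedence, not the service's implementation. The function name is hypothetical, and treating an empty IncludeSpaces list as "no include filter" is an assumption that the text does not state.

```python
def spaces_to_index(existing_spaces, include_spaces, exclude_spaces):
    """Return the space keys that would be indexed under the documented rules."""
    include, exclude = set(include_spaces), set(exclude_spaces)
    selected = []
    for key in existing_spaces:
        if key in exclude:
            continue  # a space in both lists is excluded
        if include and key not in include:
            continue  # spaces that aren't in the include list aren't indexed
        selected.append(key)
    return selected
```

For example, with spaces ENG, HR, and OPS, an include list of ENG and HR, and an exclude list of HR, only ENG remains.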

  • GoogleDriveConfiguration (dict) --

    Provides configuration for data sources that connect to Google Drive.

    • SecretArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a Secrets Manager secret that contains the credentials required to connect to Google Drive. For more information, see Using a Google Workspace Drive data source.

    • InclusionPatterns (list) --

      A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are included in the index from both shared drives and users' My Drives. Items that don't match the pattern are excluded from the index. If an item matches both an inclusion pattern and an exclusion pattern, it is excluded from the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns that apply to the path on Google Drive. Items that match the pattern are excluded from the index from both shared drives and users' My Drives. Items that don't match the pattern are included in the index. If an item matches both an exclusion pattern and an inclusion pattern, it is excluded from the index.

      • (string) --

    • FieldMappings (list) --

      Defines the mapping between a field in Google Drive and an Amazon Kendra index field.

      If you are using the console, you can define index fields when creating the mapping. If you are using the API, you must first create the field using the UpdateIndex operation.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The format for date fields in the data source.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.

    • ExcludeMimeTypes (list) --

      A list of MIME types to exclude from the index. All documents matching the specified MIME type are excluded.

      For a list of MIME types, see Using a Google Workspace Drive data source.

      • (string) --

    • ExcludeUserAccounts (list) --

      A list of email addresses of the users. Documents owned by these users are excluded from the index. Documents shared with excluded users are indexed unless they are excluded in another way.

      • (string) --

    • ExcludeSharedDrives (list) --

      A list of identifiers of shared drives to exclude from the index. All files and folders stored on the shared drive are excluded.

      • (string) --
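
The precedence between InclusionPatterns and ExclusionPatterns described above can be approximated locally before a sync. The sketch below is an assumption-laden illustration: the function name is hypothetical, `re.search` is used although the reference does not say whether patterns must match the whole path, and an empty inclusion list is treated as "include everything".

```python
import re


def is_indexed(path, inclusion_patterns, exclusion_patterns):
    """Approximate whether an item at `path` would be indexed."""
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, path) for p in exclusion_patterns):
        return False
    # With inclusion patterns present, the item must match at least one.
    if inclusion_patterns:
        return any(re.search(p, path) for p in inclusion_patterns)
    return True
```

A check such as `is_indexed("reports/q1.tmp", [r"^reports/"], [r"\.tmp$"])` shows the exclusion pattern taking precedence over the inclusion pattern.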

  • WebCrawlerConfiguration (dict) --

    Provides the configuration information required for Amazon Kendra Web Crawler.

    • Urls (dict) -- [REQUIRED]

      Specifies the seed or starting point URLs of the websites or the sitemap URLs of the websites you want to crawl.

      You can include website subdomains. You can list up to 100 seed URLs and up to three sitemap URLs.

      You can only crawl websites that use the secure communication protocol, Hypertext Transfer Protocol Secure (HTTPS). If you receive an error when crawling a website, it could be that the website is blocked from crawling.

      When selecting websites to index, you must adhere to the Amazon Acceptable Use Policy and all other Amazon terms. Remember that you must only use Amazon Kendra Web Crawler to index your own webpages, or webpages that you have authorization to index.

      • SeedUrlConfiguration (dict) --

        Provides the configuration of the seed or starting point URLs of the websites you want to crawl.

        You can choose to crawl only the website host names, or the website host names with subdomains, or the website host names with subdomains and other domains that the webpages link to.

        You can list up to 100 seed URLs.

        • SeedUrls (list) -- [REQUIRED]

          The list of seed or starting point URLs of the websites you want to crawl.

          The list can include a maximum of 100 seed URLs.

          • (string) --

        • WebCrawlerMode (string) --

          You can choose one of the following modes:

          • HOST_ONLY – crawl only the website host names. For example, if the seed URL is "abc.example.com", then only URLs with host name "abc.example.com" are crawled.

          • SUBDOMAINS – crawl the website host names with subdomains. For example, if the seed URL is "abc.example.com", then "a.abc.example.com" and "b.abc.example.com" are also crawled.

          • EVERYTHING – crawl the website host names with subdomains and other domains that the webpages link to.

          The default mode is set to HOST_ONLY.

      • SiteMapsConfiguration (dict) --

        Provides the configuration of the sitemap URLs of the websites you want to crawl.

        Only URLs belonging to the same website host names are crawled. You can list up to three sitemap URLs.

        • SiteMaps (list) -- [REQUIRED]

          The list of sitemap URLs of the websites you want to crawl.

          The list can include a maximum of three sitemap URLs.

          • (string) --
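
To make the three WebCrawlerMode values concrete, the sketch below decides whether a candidate host falls within scope of a seed host. It illustrates the descriptions above rather than the crawler's actual logic. The function name is hypothetical, and EVERYTHING is simplified to accept any host, whereas the crawler only reaches other domains that the crawled webpages link to.

```python
def host_in_scope(seed_host, candidate_host, mode="HOST_ONLY"):
    """Return True if candidate_host would be crawled for the given mode."""
    seed, host = seed_host.lower(), candidate_host.lower()
    if mode == "HOST_ONLY":
        return host == seed
    if mode == "SUBDOMAINS":
        # The seed host itself plus any host nested beneath it.
        return host == seed or host.endswith("." + seed)
    if mode == "EVERYTHING":
        return True  # simplification: linked domains are also in scope
    raise ValueError("unknown WebCrawlerMode: %r" % mode)
```

The default argument mirrors the documented default mode, HOST_ONLY.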

    • CrawlDepth (integer) --

      Specifies the number of levels in a website that you want to crawl.

      The first level begins from the website seed or starting point URL. For example, if a website has 3 levels – index level (i.e. seed in this example), sections level, and subsections level – and you are only interested in crawling information up to the sections level (i.e. levels 0-1), you can set your depth to 1.

      The default crawl depth is set to 2.

    • MaxLinksPerPage (integer) --

      The maximum number of URLs on a webpage to include when crawling a website. This number is per webpage.

      As a website’s webpages are crawled, any URLs the webpages link to are also crawled. URLs on a webpage are crawled in order of appearance.

      The default maximum links per page is 100.

    • MaxContentSizePerPageInMegaBytes (float) --

      The maximum size (in MB) of a webpage or attachment to crawl.

      Files larger than this size (in MB) are skipped and not crawled.

      The default maximum size of a webpage or attachment is set to 50 MB.

    • MaxUrlsPerMinuteCrawlRate (integer) --

      The maximum number of URLs crawled per website host per minute.

      A minimum of one URL is required.

      The default maximum number of URLs crawled per website host per minute is 300.

    • UrlInclusionPatterns (list) --

      The regular expression pattern to include certain URLs to crawl.

      If there is a regular expression pattern to exclude certain URLs that conflicts with the include pattern, the exclude pattern takes precedence.

      • (string) --

    • UrlExclusionPatterns (list) --

      The regular expression pattern to exclude certain URLs to crawl.

      If there is a regular expression pattern to include certain URLs that conflicts with the exclude pattern, the exclude pattern takes precedence.

      • (string) --

    • ProxyConfiguration (dict) --

      Provides configuration information required to connect to your internal websites via a web proxy.

      You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS.

      Web proxy credentials are optional and you can use them to connect to a web proxy server that requires basic authentication. To store web proxy credentials, you use a secret in Secrets Manager.

      • Host (string) -- [REQUIRED]

        The name of the website host you want to connect to via a web proxy server.

        For example, the host name of https://a.example.com/page1.html is "a.example.com".

      • Port (integer) -- [REQUIRED]

        The port number of the website host you want to connect to via a web proxy server.

        For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

      • Credentials (string) --

        Your secret ARN, which you can create in Secrets Manager.

        The credentials are optional. You use a secret if web proxy credentials are required to connect to a website host. Amazon Kendra currently supports basic authentication to connect to a web proxy server. The secret stores your credentials.

    • AuthenticationConfiguration (dict) --

      Provides configuration information required to connect to websites using authentication.

      You can connect to websites using basic authentication with a user name and password.

      You must provide the website host name and port number. For example, the host name of https://a.example.com/page1.html is "a.example.com" and the port is 443, the standard port for HTTPS. You use a secret in Secrets Manager to store your authentication credentials.

      • BasicAuthentication (list) --

        The list of configuration information that's required to connect to and crawl a website host using basic authentication credentials.

        The list includes the name and port number of the website host.

        • (dict) --

          Provides the configuration information to connect to websites that require basic user authentication.

          • Host (string) -- [REQUIRED]

            The name of the website host you want to connect to using authentication credentials.

            For example, the host name of https://a.example.com/page1.html is "a.example.com".

          • Port (integer) -- [REQUIRED]

            The port number of the website host you want to connect to using authentication credentials.

            For example, the port for https://a.example.com/page1.html is 443, the standard port for HTTPS.

          • Credentials (string) -- [REQUIRED]

            Your secret ARN, which you can create in Secrets Manager.

            You use a secret if basic authentication credentials are required to connect to a website. The secret stores your user name and password.

  • WorkDocsConfiguration (dict) --

    Provides the configuration information to connect to WorkDocs as your data source.

    • OrganizationId (string) -- [REQUIRED]

      The identifier of the directory corresponding to your Amazon WorkDocs site repository.

      You can find the organization ID in the Directory Service by going to Active Directory, then Directories. Your Amazon WorkDocs site directory has an ID, which is the organization ID. You can also set up a new Amazon WorkDocs directory in the Directory Service console and enable an Amazon WorkDocs site for the directory in the Amazon WorkDocs console.

    • CrawlComments (boolean) --

      TRUE to include comments on documents in your index. When comments are included, each comment is indexed as its own searchable document.

      The default is set to FALSE.

    • UseChangeLog (boolean) --

      TRUE to use the change logs to update documents in your index instead of scanning all documents.

      If you are syncing your Amazon WorkDocs data source with your index for the first time, all documents are scanned. After your first sync, you can use the change logs to update your documents in your index for future syncs.

      The default is set to FALSE.

    • InclusionPatterns (list) --

      A list of regular expression patterns to include certain files in your Amazon WorkDocs site repository. Files that match the patterns are included in the index. Files that don't match the patterns are excluded from the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

      • (string) --

    • ExclusionPatterns (list) --

      A list of regular expression patterns to exclude certain files in your Amazon WorkDocs site repository. Files that match the patterns are excluded from the index. Files that don’t match the patterns are included in the index. If a file matches both an inclusion pattern and an exclusion pattern, the exclusion pattern takes precedence and the file isn’t included in the index.

      • (string) --

    • FieldMappings (list) --

      A list of DataSourceToIndexFieldMapping objects that map Amazon WorkDocs field names to custom index field names in Amazon Kendra. You must first create the custom index fields using the UpdateIndex operation before you map to Amazon WorkDocs fields. For more information, see Mapping Data Source Fields. The Amazon WorkDocs data source field names need to exist in your Amazon WorkDocs custom metadata.

      • (dict) --

        Maps a column or attribute in the data source to an index field. You must first create the fields in the index using the UpdateIndex operation.

        • DataSourceFieldName (string) -- [REQUIRED]

          The name of the column or attribute in the data source.

        • DateFieldFormat (string) --

          The type of data stored in the column or attribute.

        • IndexFieldName (string) -- [REQUIRED]

          The name of the field in the index.
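
The precedence rule stated for UrlInclusionPatterns/UrlExclusionPatterns and for the WorkDocs InclusionPatterns/ExclusionPatterns can be approximated locally before you submit a configuration. The following is a minimal sketch, not Amazon Kendra's actual matcher: the helper name is_selected and the sample patterns are hypothetical, it assumes that an empty inclusion list means "include everything", and it uses Python's re.search, whose regular expression dialect and anchoring may differ from the service's.

```python
import re


def is_selected(name, inclusion_patterns=(), exclusion_patterns=()):
    """Return True if `name` would be kept under the documented precedence.

    Assumption: an empty inclusion list means "include everything".
    Assumption: patterns are tested with re.search; the service may
    anchor or interpret patterns differently.
    """
    # An exclusion match always wins, even if an inclusion pattern also matches.
    if any(re.search(p, name) for p in exclusion_patterns):
        return False
    # With no inclusion patterns, nothing else filters the item out.
    if not inclusion_patterns:
        return True
    # Otherwise the item must match at least one inclusion pattern.
    return any(re.search(p, name) for p in inclusion_patterns)


# Hypothetical patterns, for illustration only.
include = [r"\.pdf$", r"\.docx$"]
exclude = [r"^drafts/"]

kept = is_selected("reports/q3.pdf", include, exclude)
dropped_by_exclusion = is_selected("drafts/q3.pdf", include, exclude)
dropped_by_no_match = is_selected("reports/q3.png", include, exclude)
```

A check like this is only a quick sanity test for pattern lists; the result of an actual sync remains the authority on what is indexed.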

type Description:

string

param Description:

The new description for the data source.

type Schedule:

string

param Schedule:

The new update schedule for the data source.

type RoleArn:

string

param RoleArn:

The Amazon Resource Name (ARN) of the new role to use when the data source is accessing resources on your behalf.

type LanguageCode:

string

param LanguageCode:

The code for a language. This allows you to support a language for all documents when updating the data source. English is supported by default. For more information on supported languages, including their codes, see Adding documents in languages other than English.

type CustomDocumentEnrichmentConfiguration:

dict

param CustomDocumentEnrichmentConfiguration:

Configuration information for altering document metadata and content during the document ingestion process when you update a data source.

For more information on how to create, modify and delete document metadata, or make other content alterations when you ingest documents into Amazon Kendra, see Customizing document metadata during the ingestion process.

  • InlineConfigurations (list) --

    Configuration information to alter document attributes or metadata fields and content when ingesting documents into Amazon Kendra.

    • (dict) --

      Provides the configuration information for applying basic logic to alter document metadata and content when ingesting documents into Amazon Kendra. To apply advanced logic that goes beyond what basic logic allows, see HookConfiguration.

      For more information, see Customizing document metadata during the ingestion process.

      • Condition (dict) --

        Configuration of the condition used for the target document attribute or metadata field when ingesting documents into Amazon Kendra.

        • ConditionDocumentAttributeKey (string) -- [REQUIRED]

          The identifier of the document attribute used for the condition.

          For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

          Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

        • Operator (string) -- [REQUIRED]

          The condition operator.

          For example, you can use 'Contains' to partially match a string.

        • ConditionOnValue (dict) --

          The value used by the operator.

          For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • Target (dict) --

        Configuration of the target document attribute or metadata field when ingesting documents into Amazon Kendra. You can also include a value.

        • TargetDocumentAttributeKey (string) --

          The identifier of the target document attribute or metadata field.

          For example, 'Department' could be an identifier for the target attribute or metadata field that includes the department names associated with the documents.

        • TargetDocumentAttributeValueDeletion (boolean) --

          TRUE to delete the existing target value for your specified target attribute key. You cannot create a target value and set this to TRUE. To create a target value (TargetDocumentAttributeValue), set this to FALSE.

        • TargetDocumentAttributeValue (dict) --

          The target value you want to create for the target attribute.

          For example, 'Finance' could be the target value for the target attribute key 'Department'.

          • StringValue (string) --

            A string, such as "department".

          • StringListValue (list) --

            A list of strings.

            • (string) --

          • LongValue (integer) --

            A long integer value.

          • DateValue (datetime) --

            A date expressed as an ISO 8601 string.

            It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

      • DocumentContentDeletion (boolean) --

        TRUE to delete content if the condition used for the target attribute is met.

  • PreExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function on the original or raw documents before extracting their metadata and text. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition used for when a Lambda function should be invoked.

      For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      The S3 bucket that stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • PostExtractionHookConfiguration (dict) --

    Configuration information for invoking a Lambda function on the structured documents with their metadata and text extracted. You can use a Lambda function to apply advanced logic for creating, modifying, or deleting document metadata and content. For more information, see Advanced data manipulation.

    • InvocationCondition (dict) --

      The condition used for when a Lambda function should be invoked.

      For example, you can specify a condition that if there are empty date-time values, then Amazon Kendra should invoke a function that inserts the current date-time.

      • ConditionDocumentAttributeKey (string) -- [REQUIRED]

        The identifier of the document attribute used for the condition.

        For example, 'Source_URI' could be an identifier for the attribute or metadata field that contains source URIs associated with the documents.

        Amazon Kendra currently does not support _document_body as an attribute key used for the condition.

      • Operator (string) -- [REQUIRED]

        The condition operator.

        For example, you can use 'Contains' to partially match a string.

      • ConditionOnValue (dict) --

        The value used by the operator.

        For example, you can specify the value 'financial' for strings in the 'Source_URI' field that partially match or contain this value.

        • StringValue (string) --

          A string, such as "department".

        • StringListValue (list) --

          A list of strings.

          • (string) --

        • LongValue (integer) --

          A long integer value.

        • DateValue (datetime) --

          A date expressed as an ISO 8601 string.

          It is important for the time zone to be included in the ISO 8601 date-time format. For example, 20120325T123010+01:00 is the ISO 8601 date-time format for March 25th 2012 at 12:30PM (plus 10 seconds) in Central European Time.

    • LambdaArn (string) -- [REQUIRED]

      The Amazon Resource Name (ARN) of a role with permission to run a Lambda function during ingestion. For more information, see IAM roles for Amazon Kendra.

    • S3Bucket (string) -- [REQUIRED]

      The S3 bucket that stores the original, raw documents or the structured, parsed documents before and after altering them. For more information, see Data contracts for Lambda functions.

  • RoleArn (string) --

    The Amazon Resource Name (ARN) of a role with permission to run PreExtractionHookConfiguration and PostExtractionHookConfiguration for altering document metadata and content during the document ingestion process. For more information, see IAM roles for Amazon Kendra.
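
As an illustration of how the nested structure above fits together, the following sketch builds a CustomDocumentEnrichmentConfiguration value as a plain Python dictionary. It is an example under stated assumptions rather than a recommended configuration: the attribute key 'Reviewed_On', the helper check_target, and every ARN and bucket placeholder are hypothetical, and the helper only mirrors the rule stated above that a target value cannot be created while TargetDocumentAttributeValueDeletion is TRUE.

```python
from datetime import datetime, timedelta, timezone


def check_target(target):
    """Reject a Target that both deletes the existing value and sets a new one."""
    deleting = target.get("TargetDocumentAttributeValueDeletion", False)
    if deleting and "TargetDocumentAttributeValue" in target:
        raise ValueError("cannot set a target value while deleting the existing one")
    return target


# Basic logic: tag documents whose Source_URI contains 'financial'.
tag_finance_rule = {
    "Condition": {
        "ConditionDocumentAttributeKey": "Source_URI",
        "Operator": "Contains",
        "ConditionOnValue": {"StringValue": "financial"},
    },
    "Target": check_target({
        "TargetDocumentAttributeKey": "Department",
        "TargetDocumentAttributeValueDeletion": False,
        "TargetDocumentAttributeValue": {"StringValue": "Finance"},
    }),
    "DocumentContentDeletion": False,
}

# A DateValue should carry its time zone; this is the example instant from
# the text above (March 25th 2012, 12:30:10, UTC+01:00).
review_date = datetime(2012, 3, 25, 12, 30, 10, tzinfo=timezone(timedelta(hours=1)))

stamp_review_date_rule = {
    "Target": check_target({
        "TargetDocumentAttributeKey": "Reviewed_On",  # hypothetical attribute key
        "TargetDocumentAttributeValueDeletion": False,
        "TargetDocumentAttributeValue": {"DateValue": review_date},
    }),
}

# Advanced logic: hook configuration with placeholder values only.
pre_extraction_hook = {
    "InvocationCondition": {
        "ConditionDocumentAttributeKey": "Source_URI",
        "Operator": "Contains",
        "ConditionOnValue": {"StringValue": "financial"},
    },
    "LambdaArn": "<lambda-arn-placeholder>",
    "S3Bucket": "<s3-bucket-placeholder>",
}

custom_document_enrichment_configuration = {
    "InlineConfigurations": [tag_finance_rule, stamp_review_date_rule],
    "PreExtractionHookConfiguration": pre_extraction_hook,
    "RoleArn": "<role-arn-placeholder>",
}
```

Passing a timezone-aware datetime object keeps the time zone explicit, which is the property the DateValue description asks for.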

returns:

None
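
The optional parameters above can be assembled into a single call. The sketch below assumes this section documents the update_data_source method and that the data source and index are identified by Id and IndexId; neither name appears in the parameter list shown here, so treat both as assumptions. The helper update_data_source_kwargs and the RecordingClient stand-in are hypothetical and exist only so the example runs without AWS credentials.

```python
def update_data_source_kwargs(data_source_id, index_id, **optional):
    """Build keyword arguments, dropping any option left as None."""
    kwargs = {"Id": data_source_id, "IndexId": index_id}  # assumed identifier names
    kwargs.update({k: v for k, v in optional.items() if v is not None})
    return kwargs


class RecordingClient:
    """Offline stand-in for a Kendra client; it only records the call."""

    def __init__(self):
        self.calls = []

    def update_data_source(self, **kwargs):
        self.calls.append(kwargs)


client = RecordingClient()

kwargs = update_data_source_kwargs(
    "<data-source-id-placeholder>",
    "<index-id-placeholder>",
    Description="Quarterly reports crawl",  # example text only
    LanguageCode="en",
    Schedule=None,  # left unset, so it is omitted from the call
)
client.update_data_source(**kwargs)
```

Omitting unset options, rather than passing None, keeps the request limited to the fields you intend to change. With real credentials, a client created with boto3.client("kendra") would take the place of RecordingClient.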