Amazon Transcribe Service

2021/03/31 - Amazon Transcribe Service - 3 updated api methods

Changes: Amazon Transcribe now supports creating custom language models in the following languages: British English (en-GB), Australian English (en-AU), Indian Hindi (hi-IN), and US Spanish (es-US).

CreateLanguageModel (updated)
Changes (both)
{'LanguageCode': {'en-AU', 'en-GB', 'es-US', 'hi-IN'}}

Creates a new custom language model. Use Amazon S3 prefixes to provide the location of your input files. The time it takes to create your model depends on the size of your training data.

See also: AWS API Documentation

Request Syntax

client.create_language_model(
    LanguageCode='en-US'|'hi-IN'|'es-US'|'en-GB'|'en-AU',
    BaseModelName='NarrowBand'|'WideBand',
    ModelName='string',
    InputDataConfig={
        'S3Uri': 'string',
        'TuningDataS3Uri': 'string',
        'DataAccessRoleArn': 'string'
    }
)

type LanguageCode

string

param LanguageCode

[REQUIRED]

The language of the input text you're using to train your custom language model.

type BaseModelName

string

param BaseModelName

[REQUIRED]

The Amazon Transcribe standard language model, or base model, used to create your custom language model.

If you want to use your custom language model to transcribe audio with a sample rate of 16 kHz or greater, choose WideBand.

If you want to use your custom language model to transcribe audio with a sample rate that is less than 16 kHz, choose NarrowBand.

type ModelName

string

param ModelName

[REQUIRED]

The name you choose for your custom language model when you create it.

type InputDataConfig

dict

param InputDataConfig

[REQUIRED]

Contains the data access role and the Amazon S3 prefixes to read the required input files to create a custom language model.

  • S3Uri (string) -- [REQUIRED]

    The Amazon S3 prefix you specify to access the plain text files that you use to train your custom language model.

  • TuningDataS3Uri (string) --

    The Amazon S3 prefix you specify to access the plain text files that you use to tune your custom language model.

  • DataAccessRoleArn (string) -- [REQUIRED]

    The Amazon Resource Name (ARN) that uniquely identifies the permissions you've given Amazon Transcribe to access your Amazon S3 buckets containing your media files or text data.

rtype

dict

returns

Response Syntax

{
    'LanguageCode': 'en-US'|'hi-IN'|'es-US'|'en-GB'|'en-AU',
    'BaseModelName': 'NarrowBand'|'WideBand',
    'ModelName': 'string',
    'InputDataConfig': {
        'S3Uri': 'string',
        'TuningDataS3Uri': 'string',
        'DataAccessRoleArn': 'string'
    },
    'ModelStatus': 'IN_PROGRESS'|'FAILED'|'COMPLETED'
}

Response Structure

  • (dict) --

    • LanguageCode (string) --

      The language code of the text you've used to create a custom language model.

    • BaseModelName (string) --

      The Amazon Transcribe standard language model, or base model, you've used to create a custom language model.

    • ModelName (string) --

      The name you've chosen for your custom language model.

    • InputDataConfig (dict) --

      The data access role and Amazon S3 prefixes you've chosen to create your custom language model.

      • S3Uri (string) --

        The Amazon S3 prefix you specify to access the plain text files that you use to train your custom language model.

      • TuningDataS3Uri (string) --

        The Amazon S3 prefix you specify to access the plain text files that you use to tune your custom language model.

      • DataAccessRoleArn (string) --

        The Amazon Resource Name (ARN) that uniquely identifies the permissions you've given Amazon Transcribe to access your Amazon S3 buckets containing your media files or text data.

    • ModelStatus (string) --

      The status of the custom language model. When the status is COMPLETED, the model is ready to use.
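
As a rough illustration of the request described above, the sketch below assembles the arguments for create_language_model and picks BaseModelName from the audio sample rate. The helper names (choose_base_model, build_create_request, submit_language_model) are hypothetical and not part of the SDK, and the bucket, role, and model names in the usage comment are placeholders. The client is assumed to be a boto3 Transcribe client.

```python
from typing import Any, Dict, Optional


def choose_base_model(sample_rate_hz: int) -> str:
    """Map an audio sample rate in Hz to a BaseModelName value."""
    # 16 kHz or greater -> WideBand; anything lower -> NarrowBand.
    return "WideBand" if sample_rate_hz >= 16_000 else "NarrowBand"


def build_create_request(
    model_name: str,
    language_code: str,
    sample_rate_hz: int,
    s3_uri: str,
    data_access_role_arn: str,
    tuning_data_s3_uri: Optional[str] = None,
) -> Dict[str, Any]:
    """Assemble keyword arguments for client.create_language_model()."""
    input_data_config = {
        "S3Uri": s3_uri,
        "DataAccessRoleArn": data_access_role_arn,
    }
    # TuningDataS3Uri is optional, so include it only when provided.
    if tuning_data_s3_uri is not None:
        input_data_config["TuningDataS3Uri"] = tuning_data_s3_uri
    return {
        "LanguageCode": language_code,
        "BaseModelName": choose_base_model(sample_rate_hz),
        "ModelName": model_name,
        "InputDataConfig": input_data_config,
    }


def submit_language_model(client: Any, **kwargs: Any) -> str:
    """Send the request and return the ModelStatus from the response."""
    response = client.create_language_model(**build_create_request(**kwargs))
    return response["ModelStatus"]


# Typical use (needs boto3 and AWS credentials; every value is a placeholder):
#   import boto3
#   status = submit_language_model(
#       boto3.client("transcribe"),
#       model_name="example-clm",
#       language_code="en-GB",
#       sample_rate_hz=16_000,
#       s3_uri="s3://example-bucket/training/",
#       data_access_role_arn="arn:aws:iam::123456789012:role/ExampleRole",
#   )
```

Keeping the request construction separate from the call makes it possible to inspect or log the arguments before any model training starts.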

DescribeLanguageModel (updated)
Changes (response)
{'LanguageModel': {'LanguageCode': {'en-AU', 'en-GB', 'es-US', 'hi-IN'}}}

Gets information about a single custom language model. Use this information to see details about the language model in your AWS account. You can also see whether the base language model used to create your custom language model has been updated. If Amazon Transcribe has updated the base model, you can create a new custom language model using the updated base model. If the language model wasn't created, you can use this operation to understand why Amazon Transcribe couldn't create it.

See also: AWS API Documentation

Request Syntax

client.describe_language_model(
    ModelName='string'
)

type ModelName

string

param ModelName

[REQUIRED]

The name of the custom language model that you want information about.

rtype

dict

returns

Response Syntax

{
    'LanguageModel': {
        'ModelName': 'string',
        'CreateTime': datetime(2015, 1, 1),
        'LastModifiedTime': datetime(2015, 1, 1),
        'LanguageCode': 'en-US'|'hi-IN'|'es-US'|'en-GB'|'en-AU',
        'BaseModelName': 'NarrowBand'|'WideBand',
        'ModelStatus': 'IN_PROGRESS'|'FAILED'|'COMPLETED',
        'UpgradeAvailability': True|False,
        'FailureReason': 'string',
        'InputDataConfig': {
            'S3Uri': 'string',
            'TuningDataS3Uri': 'string',
            'DataAccessRoleArn': 'string'
        }
    }
}

Response Structure

  • (dict) --

    • LanguageModel (dict) --

      Information about the custom language model you requested.

      • ModelName (string) --

        The name of the custom language model.

      • CreateTime (datetime) --

        The time the custom language model was created.

      • LastModifiedTime (datetime) --

        The most recent time the custom language model was modified.

      • LanguageCode (string) --

        The language code you used to create your custom language model.

      • BaseModelName (string) --

        The Amazon Transcribe standard language model, or base model, used to create the custom language model.

      • ModelStatus (string) --

        The creation status of a custom language model. When the status is COMPLETED, the model is ready for use.

      • UpgradeAvailability (boolean) --

        Whether a newer base model is available for the custom language model. If this field is false, your custom language model already uses the most up-to-date version of the base model. If it is true, Amazon Transcribe has updated the base model since your custom language model was created.

      • FailureReason (string) --

        The reason why the custom language model couldn't be created.

      • InputDataConfig (dict) --

        The data access role and Amazon S3 prefixes for the input files used to train the custom language model.

        • S3Uri (string) --

          The Amazon S3 prefix you specify to access the plain text files that you use to train your custom language model.

        • TuningDataS3Uri (string) --

          The Amazon S3 prefix you specify to access the plain text files that you use to tune your custom language model.

        • DataAccessRoleArn (string) --

          The Amazon Resource Name (ARN) that uniquely identifies the permissions you've given Amazon Transcribe to access your Amazon S3 buckets containing your media files or text data.
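
Because model creation can take a while, one common pattern is to poll DescribeLanguageModel until ModelStatus leaves IN_PROGRESS. The following is a minimal sketch of that pattern, not an SDK feature: wait_for_language_model is a hypothetical helper, the polling interval and attempt limit are arbitrary defaults, and the client is assumed to be a boto3 Transcribe client.

```python
import time
from typing import Any, Callable, Dict


def wait_for_language_model(
    client: Any,
    model_name: str,
    poll_seconds: float = 60.0,
    max_attempts: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Poll describe_language_model until the model leaves IN_PROGRESS."""
    for _ in range(max_attempts):
        response = client.describe_language_model(ModelName=model_name)
        model = response["LanguageModel"]
        status = model["ModelStatus"]
        if status == "COMPLETED":
            return model
        if status == "FAILED":
            # FailureReason explains why the model could not be created.
            reason = model.get("FailureReason", "no reason given")
            raise RuntimeError(f"{model_name} failed: {reason}")
        # Still IN_PROGRESS: wait before asking again.
        sleep(poll_seconds)
    raise TimeoutError(
        f"{model_name} still IN_PROGRESS after {max_attempts} polls"
    )


# Typical use (needs boto3 and AWS credentials; the model name is a placeholder):
#   import boto3
#   model = wait_for_language_model(boto3.client("transcribe"), "example-clm")
#   print(model["BaseModelName"], model["LanguageCode"])
```

The sleep parameter is injectable so the loop can be exercised in tests without real delays.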

ListLanguageModels (updated)
Changes (response)
{'Models': {'LanguageCode': {'en-AU', 'en-GB', 'es-US', 'hi-IN'}}}

Provides more information about the custom language models you've created. You can use the information in this list to find a specific custom language model. You can then use the DescribeLanguageModel operation to get more information about it.

See also: AWS API Documentation

Request Syntax

client.list_language_models(
    StatusEquals='IN_PROGRESS'|'FAILED'|'COMPLETED',
    NameContains='string',
    NextToken='string',
    MaxResults=123
)

type StatusEquals

string

param StatusEquals

When specified, returns only custom language models with the specified status. Language models are ordered by creation date, with the newest models first. If you don't specify a status, Amazon Transcribe returns all custom language models ordered by date.

type NameContains

string

param NameContains

When specified, the custom language model names returned contain the substring you've specified.

type NextToken

string

param NextToken

When included, fetches the next set of language models if the result of the previous request was truncated.

type MaxResults

integer

param MaxResults

The maximum number of language models to return in the response. If there are fewer results in the list, the response contains only the actual results.

rtype

dict

returns

Response Syntax

{
    'NextToken': 'string',
    'Models': [
        {
            'ModelName': 'string',
            'CreateTime': datetime(2015, 1, 1),
            'LastModifiedTime': datetime(2015, 1, 1),
            'LanguageCode': 'en-US'|'hi-IN'|'es-US'|'en-GB'|'en-AU',
            'BaseModelName': 'NarrowBand'|'WideBand',
            'ModelStatus': 'IN_PROGRESS'|'FAILED'|'COMPLETED',
            'UpgradeAvailability': True|False,
            'FailureReason': 'string',
            'InputDataConfig': {
                'S3Uri': 'string',
                'TuningDataS3Uri': 'string',
                'DataAccessRoleArn': 'string'
            }
        },
    ]
}

Response Structure

  • (dict) --

    • NextToken (string) --

      The operation returns a page of language models at a time. The maximum size of the page is set by the MaxResults parameter. If there are more language models in the list than the page size, Amazon Transcribe returns the NextToken token. Include the token in the next request to the ListLanguageModels operation to return the next page of language models.

    • Models (list) --

      A list of objects containing information about custom language models.

      • (dict) --

        The structure used to describe a custom language model.

        • ModelName (string) --

          The name of the custom language model.

        • CreateTime (datetime) --

          The time the custom language model was created.

        • LastModifiedTime (datetime) --

          The most recent time the custom language model was modified.

        • LanguageCode (string) --

          The language code you used to create your custom language model.

        • BaseModelName (string) --

          The Amazon Transcribe standard language model, or base model, used to create the custom language model.

        • ModelStatus (string) --

          The creation status of a custom language model. When the status is COMPLETED, the model is ready for use.

        • UpgradeAvailability (boolean) --

          Whether a newer base model is available for the custom language model. If this field is false, your custom language model already uses the most up-to-date version of the base model. If it is true, Amazon Transcribe has updated the base model since your custom language model was created.

        • FailureReason (string) --

          The reason why the custom language model couldn't be created.

        • InputDataConfig (dict) --

          The data access role and Amazon S3 prefixes for the input files used to train the custom language model.

          • S3Uri (string) --

            The Amazon S3 prefix you specify to access the plain text files that you use to train your custom language model.

          • TuningDataS3Uri (string) --

            The Amazon S3 prefix you specify to access the plain text files that you use to tune your custom language model.

          • DataAccessRoleArn (string) --

            The Amazon Resource Name (ARN) that uniquely identifies the permissions you've given Amazon Transcribe to access your Amazon S3 buckets containing your media files or text data.
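
The NextToken handling described for ListLanguageModels can be sketched as a small generator that keeps requesting pages until no token is returned. The helper name iter_language_models and its parameter names are hypothetical. The client is assumed to be a boto3 Transcribe client, and the loop uses only the request and response fields documented above.

```python
from typing import Any, Dict, Iterator, Optional


def iter_language_models(
    client: Any,
    status_equals: Optional[str] = None,
    name_contains: Optional[str] = None,
    page_size: Optional[int] = None,
) -> Iterator[Dict[str, Any]]:
    """Yield every entry of Models across all pages of list_language_models."""
    kwargs: Dict[str, Any] = {}
    # Only send the optional filters that the caller actually set.
    if status_equals is not None:
        kwargs["StatusEquals"] = status_equals
    if name_contains is not None:
        kwargs["NameContains"] = name_contains
    if page_size is not None:
        kwargs["MaxResults"] = page_size
    while True:
        page = client.list_language_models(**kwargs)
        yield from page.get("Models", [])
        token = page.get("NextToken")
        if not token:
            break
        # Pass the token back to fetch the next page.
        kwargs["NextToken"] = token


# Typical use (needs boto3 and AWS credentials):
#   import boto3
#   for model in iter_language_models(
#       boto3.client("transcribe"), status_equals="COMPLETED"
#   ):
#       print(model["ModelName"], model["LanguageCode"])
```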