Amazon Bedrock

2024/08/19 - Amazon Bedrock - 4 new API methods

Changes: Amazon Bedrock Batch Inference (Model Invocation) is a feature that lets customers asynchronously run inference on a large set of records or files stored in S3.

CreateModelInvocationJob (new)

Creates a job to invoke a model on multiple prompts (batch inference). Format your data as described in Format your inference data and upload it to an Amazon S3 bucket. For more information, see Create a batch inference job.

The response returns a jobArn that you can use to stop or get details about the job. You can check the status of the job by sending a GetModelInvocationJob request.
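
For reference, each line of the JSONL input file is expected to be a single JSON object with a recordId and a modelInput body, per the Format your inference data guide. The sketch below assumes an Anthropic Claude model in the Messages format; the modelInput body must match the inference parameters of whichever model you choose:

{"recordId": "RECORD0000001", "modelInput": {"anthropic_version": "bedrock-2023-05-31", "max_tokens": 256, "messages": [{"role": "user", "content": "Summarize the quarterly report."}]}}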

See also: AWS API Documentation

Request Syntax

client.create_model_invocation_job(
    jobName='string',
    roleArn='string',
    clientRequestToken='string',
    modelId='string',
    inputDataConfig={
        's3InputDataConfig': {
            's3InputFormat': 'JSONL',
            's3Uri': 'string'
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 'string',
            's3EncryptionKeyId': 'string'
        }
    },
    timeoutDurationInHours=123,
    tags=[
        {
            'key': 'string',
            'value': 'string'
        },
    ]
)
type jobName:

string

param jobName:

[REQUIRED]

A name to give the batch inference job.

type roleArn:

string

param roleArn:

[REQUIRED]

The Amazon Resource Name (ARN) of the service role with permissions to carry out and manage batch inference. You can use the console to create a default service role or follow the steps at Create a service role for batch inference.

type clientRequestToken:

string

param clientRequestToken:

A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

This field is autopopulated if not provided.

type modelId:

string

param modelId:

[REQUIRED]

The unique identifier of the foundation model to use for the batch inference job.

type inputDataConfig:

dict

param inputDataConfig:

[REQUIRED]

Details about the location of the input to the batch inference job.

  • s3InputDataConfig (dict) --

    Contains the configuration of the S3 location of the input data.

    • s3InputFormat (string) --

      The format of the input data.

    • s3Uri (string) -- [REQUIRED]

      The S3 location of the input data.

type outputDataConfig:

dict

param outputDataConfig:

[REQUIRED]

Details about the location of the output of the batch inference job.

  • s3OutputDataConfig (dict) --

    Contains the configuration of the S3 location of the output data.

    • s3Uri (string) -- [REQUIRED]

      The S3 location of the output data.

    • s3EncryptionKeyId (string) --

      The unique identifier of the key that encrypts the S3 location of the output data.

type timeoutDurationInHours:

integer

param timeoutDurationInHours:

The number of hours after which to force the batch inference job to time out.

type tags:

list

param tags:

Any tags to associate with the batch inference job. For more information, see Tagging Amazon Bedrock resources.

  • (dict) --

    Definition of the key/value pair for a tag.

    • key (string) -- [REQUIRED]

      Key for the tag.

    • value (string) -- [REQUIRED]

      Value for the tag.

rtype:

dict

returns:

Response Syntax

{
    'jobArn': 'string'
}

Response Structure

  • (dict) --

    • jobArn (string) --

      The Amazon Resource Name (ARN) of the batch inference job.
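
Putting the request and response together, a minimal sketch of starting a batch inference job with boto3 follows. The bucket names, role ARN, and model ID are hypothetical placeholders:

import boto3

# The batch inference operations live on the 'bedrock' control-plane
# client, not on 'bedrock-runtime'.
bedrock = boto3.client('bedrock')

response = bedrock.create_model_invocation_job(
    jobName='my-batch-job',                                # placeholder
    roleArn='arn:aws:iam::123456789012:role/MyBatchRole',  # placeholder
    modelId='anthropic.claude-3-haiku-20240307-v1:0',      # placeholder
    inputDataConfig={
        's3InputDataConfig': {
            's3InputFormat': 'JSONL',
            's3Uri': 's3://my-input-bucket/input.jsonl'    # placeholder
        }
    },
    outputDataConfig={
        's3OutputDataConfig': {
            's3Uri': 's3://my-output-bucket/output/'       # placeholder
        }
    },
    timeoutDurationInHours=24
)

job_arn = response['jobArn']
print(job_arn)

The returned jobArn is the jobIdentifier expected by the stop and get operations described below.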

StopModelInvocationJob (new)

Stops a batch inference job. You're only charged for tokens that were already processed. For more information, see Stop a batch inference job.

See also: AWS API Documentation

Request Syntax

client.stop_model_invocation_job(
    jobIdentifier='string'
)
type jobIdentifier:

string

param jobIdentifier:

[REQUIRED]

The Amazon Resource Name (ARN) of the batch inference job to stop.

rtype:

dict

returns:

Response Syntax

{}

Response Structure

  • (dict) --
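
For illustration, stopping a running job is a single call; the ARN below is a hypothetical placeholder:

import boto3

bedrock = boto3.client('bedrock')

# jobIdentifier is the jobArn returned by CreateModelInvocationJob.
bedrock.stop_model_invocation_job(
    jobIdentifier='arn:aws:bedrock:us-east-1:123456789012:model-invocation-job/abc123def456'
)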

ListModelInvocationJobs (new)

Lists all batch inference jobs in the account. For more information, see View details about a batch inference job.

See also: AWS API Documentation

Request Syntax

client.list_model_invocation_jobs(
    submitTimeAfter=datetime(2015, 1, 1),
    submitTimeBefore=datetime(2015, 1, 1),
    statusEquals='Submitted'|'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped'|'PartiallyCompleted'|'Expired'|'Validating'|'Scheduled',
    nameContains='string',
    maxResults=123,
    nextToken='string',
    sortBy='CreationTime',
    sortOrder='Ascending'|'Descending'
)
type submitTimeAfter:

datetime

param submitTimeAfter:

Specify a time to filter for batch inference jobs that were submitted after the time you specify.

type submitTimeBefore:

datetime

param submitTimeBefore:

Specify a time to filter for batch inference jobs that were submitted before the time you specify.

type statusEquals:

string

param statusEquals:

Specify a status to filter for batch inference jobs whose statuses match the string you specify.

type nameContains:

string

param nameContains:

Specify a string to filter for batch inference jobs whose names contain the string.

type maxResults:

integer

param maxResults:

The maximum number of results to return. If there are more results than the number that you specify, a nextToken value is returned. Use the nextToken in a request to return the next batch of results.

type nextToken:

string

param nextToken:

If there were more results than the value you specified in the maxResults field in a previous ListModelInvocationJobs request, the response would have returned a nextToken value. To see the next batch of results, send the nextToken value in another request.

type sortBy:

string

param sortBy:

An attribute by which to sort the results.

type sortOrder:

string

param sortOrder:

Specifies whether to sort the results by ascending or descending order.

rtype:

dict

returns:

Response Syntax

{
    'nextToken': 'string',
    'invocationJobSummaries': [
        {
            'jobArn': 'string',
            'jobName': 'string',
            'modelId': 'string',
            'clientRequestToken': 'string',
            'roleArn': 'string',
            'status': 'Submitted'|'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped'|'PartiallyCompleted'|'Expired'|'Validating'|'Scheduled',
            'message': 'string',
            'submitTime': datetime(2015, 1, 1),
            'lastModifiedTime': datetime(2015, 1, 1),
            'endTime': datetime(2015, 1, 1),
            'inputDataConfig': {
                's3InputDataConfig': {
                    's3InputFormat': 'JSONL',
                    's3Uri': 'string'
                }
            },
            'outputDataConfig': {
                's3OutputDataConfig': {
                    's3Uri': 'string',
                    's3EncryptionKeyId': 'string'
                }
            },
            'timeoutDurationInHours': 123,
            'jobExpirationTime': datetime(2015, 1, 1)
        },
    ]
}

Response Structure

  • (dict) --

    • nextToken (string) --

      If there are more results than can fit in the response, a nextToken is returned. Use the nextToken in a request to return the next batch of results.

    • invocationJobSummaries (list) --

      A list of items, each of which contains a summary about a batch inference job.

      • (dict) --

        A summary of a batch inference job.

        • jobArn (string) --

          The Amazon Resource Name (ARN) of the batch inference job.

        • jobName (string) --

          The name of the batch inference job.

        • modelId (string) --

          The unique identifier of the foundation model used for model inference.

        • clientRequestToken (string) --

          A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

        • roleArn (string) --

          The Amazon Resource Name (ARN) of the service role with permissions to carry out and manage batch inference. You can use the console to create a default service role or follow the steps at Create a service role for batch inference.

        • status (string) --

          The status of the batch inference job.

        • message (string) --

          If the batch inference job failed, this field contains a message describing why the job failed.

        • submitTime (datetime) --

          The time at which the batch inference job was submitted.

        • lastModifiedTime (datetime) --

          The time at which the batch inference job was last modified.

        • endTime (datetime) --

          The time at which the batch inference job ended.

        • inputDataConfig (dict) --

          Details about the location of the input to the batch inference job.

          • s3InputDataConfig (dict) --

            Contains the configuration of the S3 location of the input data.

            • s3InputFormat (string) --

              The format of the input data.

            • s3Uri (string) --

              The S3 location of the input data.

        • outputDataConfig (dict) --

          Details about the location of the output of the batch inference job.

          • s3OutputDataConfig (dict) --

            Contains the configuration of the S3 location of the output data.

            • s3Uri (string) --

              The S3 location of the output data.

            • s3EncryptionKeyId (string) --

              The unique identifier of the key that encrypts the S3 location of the output data.

        • timeoutDurationInHours (integer) --

          The number of hours after which the batch inference job was set to time out.

        • jobExpirationTime (datetime) --

          The time at which the batch inference job timed out or is set to time out.
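
A sketch of walking the full result set with the documented nextToken mechanism follows; a manual loop is shown so that only the calls documented here are used:

import boto3

bedrock = boto3.client('bedrock')

# Collect summaries of all failed jobs, following nextToken until exhausted.
kwargs = {'statusEquals': 'Failed', 'maxResults': 50}
summaries = []
while True:
    page = bedrock.list_model_invocation_jobs(**kwargs)
    summaries.extend(page.get('invocationJobSummaries', []))
    token = page.get('nextToken')
    if not token:
        break
    kwargs['nextToken'] = token

for job in summaries:
    print(job['jobName'], job['status'])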

GetModelInvocationJob (new)

Gets details about a batch inference job. For more information, see View details about a batch inference job.

See also: AWS API Documentation

Request Syntax

client.get_model_invocation_job(
    jobIdentifier='string'
)
type jobIdentifier:

string

param jobIdentifier:

[REQUIRED]

The Amazon Resource Name (ARN) of the batch inference job.

rtype:

dict

returns:

Response Syntax

{
    'jobArn': 'string',
    'jobName': 'string',
    'modelId': 'string',
    'clientRequestToken': 'string',
    'roleArn': 'string',
    'status': 'Submitted'|'InProgress'|'Completed'|'Failed'|'Stopping'|'Stopped'|'PartiallyCompleted'|'Expired'|'Validating'|'Scheduled',
    'message': 'string',
    'submitTime': datetime(2015, 1, 1),
    'lastModifiedTime': datetime(2015, 1, 1),
    'endTime': datetime(2015, 1, 1),
    'inputDataConfig': {
        's3InputDataConfig': {
            's3InputFormat': 'JSONL',
            's3Uri': 'string'
        }
    },
    'outputDataConfig': {
        's3OutputDataConfig': {
            's3Uri': 'string',
            's3EncryptionKeyId': 'string'
        }
    },
    'timeoutDurationInHours': 123,
    'jobExpirationTime': datetime(2015, 1, 1)
}

Response Structure

  • (dict) --

    • jobArn (string) --

      The Amazon Resource Name (ARN) of the batch inference job.

    • jobName (string) --

      The name of the batch inference job.

    • modelId (string) --

      The unique identifier of the foundation model used for model inference.

    • clientRequestToken (string) --

      A unique, case-sensitive identifier to ensure that the API request completes no more than one time. If this token matches a previous request, Amazon Bedrock ignores the request, but does not return an error. For more information, see Ensuring idempotency.

    • roleArn (string) --

      The Amazon Resource Name (ARN) of the service role with permissions to carry out and manage batch inference. You can use the console to create a default service role or follow the steps at Create a service role for batch inference.

    • status (string) --

      The status of the batch inference job.

    • message (string) --

      If the batch inference job failed, this field contains a message describing why the job failed.

    • submitTime (datetime) --

      The time at which the batch inference job was submitted.

    • lastModifiedTime (datetime) --

      The time at which the batch inference job was last modified.

    • endTime (datetime) --

      The time at which the batch inference job ended.

    • inputDataConfig (dict) --

      Details about the location of the input to the batch inference job.

      • s3InputDataConfig (dict) --

        Contains the configuration of the S3 location of the input data.

        • s3InputFormat (string) --

          The format of the input data.

        • s3Uri (string) --

          The S3 location of the input data.

    • outputDataConfig (dict) --

      Details about the location of the output of the batch inference job.

      • s3OutputDataConfig (dict) --

        Contains the configuration of the S3 location of the output data.

        • s3Uri (string) --

          The S3 location of the output data.

        • s3EncryptionKeyId (string) --

          The unique identifier of the key that encrypts the S3 location of the output data.

    • timeoutDurationInHours (integer) --

      The number of hours after which the batch inference job was set to time out.

    • jobExpirationTime (datetime) --

      The time at which the batch inference job timed out or is set to time out.
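
Finally, a minimal polling sketch that waits for a job to reach a terminal status. The ARN is a placeholder, the polling interval is arbitrary, and the set of terminal statuses is an assumption drawn from the status enum above:

import time

import boto3

bedrock = boto3.client('bedrock')

# Placeholder ARN; use the jobArn returned by CreateModelInvocationJob.
job_arn = 'arn:aws:bedrock:us-east-1:123456789012:model-invocation-job/abc123def456'

# Assumed terminal statuses, taken from the status enum documented above.
TERMINAL_STATUSES = {'Completed', 'Failed', 'Stopped', 'PartiallyCompleted', 'Expired'}

status = None
while status not in TERMINAL_STATUSES:
    job = bedrock.get_model_invocation_job(jobIdentifier=job_arn)
    status = job['status']
    if status not in TERMINAL_STATUSES:
        time.sleep(60)  # arbitrary polling interval

print(status, job.get('message', ''))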