2026/01/05 - AWS Clean Rooms ML - 4 updated api methods
Changes: AWS Clean Rooms ML now supports advanced Spark configurations to optimize SQL performance when creating an MLInputChannel or an audience generation job.
{'inputChannel': {'dataSource': {'protectedQueryInputParameters': {'computeConfiguration': {'worker': {'properties': {'spark': {'string': 'string'}}}}}}}}
Provides the information to create an ML input channel. An ML input channel is the result of a query that can be used for ML modeling.
See also: AWS API Documentation
Request Syntax
client.create_ml_input_channel(
membershipIdentifier='string',
configuredModelAlgorithmAssociations=[
'string',
],
inputChannel={
'dataSource': {
'protectedQueryInputParameters': {
'sqlParameters': {
'queryString': 'string',
'analysisTemplateArn': 'string',
'parameters': {
'string': 'string'
}
},
'computeConfiguration': {
'worker': {
'type': 'CR.1X'|'CR.4X',
'number': 123,
'properties': {
'spark': {
'string': 'string'
}
}
}
},
'resultFormat': 'CSV'|'PARQUET'
}
},
'roleArn': 'string'
},
name='string',
retentionInDays=123,
description='string',
kmsKeyArn='string',
tags={
'string': 'string'
}
)
membershipIdentifier (string) -- [REQUIRED]
The membership ID of the member that is creating the ML input channel.
configuredModelAlgorithmAssociations (list) -- [REQUIRED]
The associated configured model algorithms that are necessary to create this ML input channel.
(string) --
inputChannel (dict) -- [REQUIRED]
The input data that is used to create this ML input channel.
dataSource (dict) -- [REQUIRED]
The data source that is used to create the ML input channel.
protectedQueryInputParameters (dict) --
Provides information necessary to perform the protected query.
sqlParameters (dict) -- [REQUIRED]
The parameters for the SQL type Protected Query.
queryString (string) --
The query string to be submitted.
analysisTemplateArn (string) --
The Amazon Resource Name (ARN) associated with the analysis template within a collaboration.
parameters (dict) --
The protected query SQL parameters.
(string) --
(string) --
computeConfiguration (dict) --
Provides configuration information for the workers that will perform the protected query.
worker (dict) --
The worker instances that will perform the compute work.
type (string) --
The instance type of the compute workers that are used.
number (integer) --
The number of compute workers that are used.
properties (dict) --
The configuration properties for the worker compute environment. These properties allow you to customize the compute settings for your Clean Rooms workloads.
spark (dict) --
The Spark configuration properties for SQL workloads. This map contains key-value pairs that configure Apache Spark settings to optimize performance for your data processing jobs. You can specify up to 50 Spark properties, with each key being 1-200 characters and each value being 0-500 characters. These properties allow you to adjust compute capacity for large datasets and complex workloads.
(string) --
(string) --
resultFormat (string) --
The format in which the query results should be returned. If not specified, defaults to CSV.
roleArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the role used to run the query specified in the dataSource field of the input channel.
Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an AccessDeniedException error.
name (string) -- [REQUIRED]
The name of the ML input channel.
retentionInDays (integer) -- [REQUIRED]
The number of days that the data in the ML input channel is retained.
description (string) --
The description of the ML input channel.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that is used to access the input channel.
tags (dict) --
The optional metadata that you apply to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.
The following basic restrictions apply to tags:
Maximum number of tags per resource - 50.
For each resource, each tag key must be unique, and each tag key can have only one value.
Maximum key length - 128 Unicode characters in UTF-8.
Maximum value length - 256 Unicode characters in UTF-8.
If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.
Tag keys and values are case sensitive.
Do not use aws:, AWS:, or any uppercase or lowercase combination of these as a prefix for keys, because the prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Clean Rooms ML considers it a user tag and counts it against the limit of 50 tags. Tags with only the key prefix of aws do not count against your tags-per-resource limit.
(string) --
(string) --
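The tag restrictions listed above can be checked on the client side before a request is sent. The following is a minimal sketch, not part of the SDK: the helper name validate_tags is hypothetical, and it approximates "Unicode characters" with Python's len(), which counts code points. The service remains the authority on what it accepts.

```python
# Limits taken from the tag restrictions described above.
MAX_TAGS = 50
MAX_TAG_KEY_LENGTH = 128
MAX_TAG_VALUE_LENGTH = 256


def validate_tags(tags):
    """Return a list of problems found in a tags dict (empty list means OK)."""
    problems = []
    if len(tags) > MAX_TAGS:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS}")
    for key, value in tags.items():
        # The aws: prefix is reserved in any upper/lowercase combination.
        if key.lower().startswith("aws:"):
            problems.append(f"key {key!r} uses the reserved aws: prefix")
        if len(key) > MAX_TAG_KEY_LENGTH:
            problems.append(f"key {key[:20]!r}... is longer than {MAX_TAG_KEY_LENGTH}")
        if len(value) > MAX_TAG_VALUE_LENGTH:
            problems.append(f"value for key {key[:20]!r} is longer than {MAX_TAG_VALUE_LENGTH}")
    return problems
```

A caller could refuse to send the request when the returned list is non-empty, or log the problems and let the service reject the call.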
dict
Response Syntax
{
'mlInputChannelArn': 'string'
}
Response Structure
(dict) --
mlInputChannelArn (string) --
The Amazon Resource Name (ARN) of the ML input channel.
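As an illustration of where the new spark map sits in the request, the sketch below assembles the keyword arguments for create_ml_input_channel and checks the documented limits (up to 50 properties, keys of 1-200 characters, values of 0-500 characters). The helper names, the placeholder identifiers and ARNs, and the choice of spark.sql.shuffle.partitions as a property are assumptions for illustration only; this document does not say which Spark settings Clean Rooms ML accepts.

```python
MAX_SPARK_PROPERTIES = 50   # "up to 50 Spark properties"
MAX_KEY_LENGTH = 200        # each key is 1-200 characters
MAX_VALUE_LENGTH = 500      # each value is 0-500 characters


def check_spark_properties(spark):
    """Raise ValueError if the map breaks the documented size limits."""
    if len(spark) > MAX_SPARK_PROPERTIES:
        raise ValueError(f"{len(spark)} Spark properties given, limit is {MAX_SPARK_PROPERTIES}")
    for key, value in spark.items():
        if not 1 <= len(key) <= MAX_KEY_LENGTH:
            raise ValueError(f"Spark property key has invalid length {len(key)}")
        if len(value) > MAX_VALUE_LENGTH:
            raise ValueError(f"value for {key!r} is longer than {MAX_VALUE_LENGTH}")
    return spark


def build_ml_input_channel_request(membership_id, associations, query_string,
                                   role_arn, name, retention_in_days,
                                   worker_type, worker_count, spark):
    """Build keyword arguments for create_ml_input_channel (hypothetical helper)."""
    return {
        "membershipIdentifier": membership_id,
        "configuredModelAlgorithmAssociations": list(associations),
        "inputChannel": {
            "dataSource": {
                "protectedQueryInputParameters": {
                    "sqlParameters": {"queryString": query_string},
                    "computeConfiguration": {
                        "worker": {
                            "type": worker_type,
                            "number": worker_count,
                            "properties": {"spark": check_spark_properties(spark)},
                        }
                    },
                    "resultFormat": "PARQUET",
                }
            },
            "roleArn": role_arn,
        },
        "name": name,
        "retentionInDays": retention_in_days,
    }


# All identifiers below are placeholders, not real resources.
request = build_ml_input_channel_request(
    membership_id="example-membership-id",
    associations=["example-association"],
    query_string="SELECT * FROM example_table",
    role_arn="arn:aws:iam::111122223333:role/ExampleRole",
    name="example-channel",
    retention_in_days=30,
    worker_type="CR.4X",
    worker_count=16,
    spark={"spark.sql.shuffle.partitions": "400"},
)
```

With credentials configured, the dictionary could then be passed as boto3.client("cleanroomsml").create_ml_input_channel(**request); the response carries mlInputChannelArn.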
{'seedAudience': {'sqlComputeConfiguration': {'worker': {'properties': {'spark': {'string': 'string'}}}}}}
Returns information about an audience generation job.
See also: AWS API Documentation
Request Syntax
client.get_audience_generation_job(
audienceGenerationJobArn='string'
)
audienceGenerationJobArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the audience generation job that you are interested in.
dict
Response Syntax
{
'createTime': datetime(2015, 1, 1),
'updateTime': datetime(2015, 1, 1),
'audienceGenerationJobArn': 'string',
'name': 'string',
'description': 'string',
'status': 'CREATE_PENDING'|'CREATE_IN_PROGRESS'|'CREATE_FAILED'|'ACTIVE'|'DELETE_PENDING'|'DELETE_IN_PROGRESS'|'DELETE_FAILED',
'statusDetails': {
'statusCode': 'string',
'message': 'string'
},
'configuredAudienceModelArn': 'string',
'seedAudience': {
'dataSource': {
's3Uri': 'string'
},
'roleArn': 'string',
'sqlParameters': {
'queryString': 'string',
'analysisTemplateArn': 'string',
'parameters': {
'string': 'string'
}
},
'sqlComputeConfiguration': {
'worker': {
'type': 'CR.1X'|'CR.4X',
'number': 123,
'properties': {
'spark': {
'string': 'string'
}
}
}
}
},
'includeSeedInOutput': True|False,
'collaborationId': 'string',
'metrics': {
'relevanceMetrics': [
{
'audienceSize': {
'type': 'ABSOLUTE'|'PERCENTAGE',
'value': 123
},
'score': 123.0
},
],
'recallMetric': 123.0
},
'startedBy': 'string',
'tags': {
'string': 'string'
},
'protectedQueryIdentifier': 'string'
}
Response Structure
(dict) --
createTime (datetime) --
The time at which the audience generation job was created.
updateTime (datetime) --
The most recent time at which the audience generation job was updated.
audienceGenerationJobArn (string) --
The Amazon Resource Name (ARN) of the audience generation job.
name (string) --
The name of the audience generation job.
description (string) --
The description of the audience generation job.
status (string) --
The status of the audience generation job.
statusDetails (dict) --
Details about the status of the audience generation job.
statusCode (string) --
The status code that was returned. The status code is intended for programmatic error handling. Clean Rooms ML will not change the status code for existing error conditions.
message (string) --
The error message that was returned. The message is intended for human consumption and can change at any time. Use the statusCode for programmatic error handling.
configuredAudienceModelArn (string) --
The Amazon Resource Name (ARN) of the configured audience model used for this audience generation job.
seedAudience (dict) --
The seed audience that was used for this audience generation job. This field will be null if the account calling the API is the account that started this audience generation job.
dataSource (dict) --
Defines the Amazon S3 bucket where the seed audience for the generated audience is stored. A valid data source is a JSON Lines file in the following format:
{"user_id": "111111"}
{"user_id": "222222"}
...
s3Uri (string) --
The Amazon S3 location URI.
roleArn (string) --
The ARN of the IAM role that can read the Amazon S3 bucket where the seed audience is stored.
sqlParameters (dict) --
The protected SQL query parameters.
queryString (string) --
The query string to be submitted.
analysisTemplateArn (string) --
The Amazon Resource Name (ARN) associated with the analysis template within a collaboration.
parameters (dict) --
The protected query SQL parameters.
(string) --
(string) --
sqlComputeConfiguration (dict) --
Provides configuration information for the instances that will perform the compute work.
worker (dict) --
The worker instances that will perform the compute work.
type (string) --
The instance type of the compute workers that are used.
number (integer) --
The number of compute workers that are used.
properties (dict) --
The configuration properties for the worker compute environment. These properties allow you to customize the compute settings for your Clean Rooms workloads.
spark (dict) --
The Spark configuration properties for SQL workloads. This map contains key-value pairs that configure Apache Spark settings to optimize performance for your data processing jobs. You can specify up to 50 Spark properties, with each key being 1-200 characters and each value being 0-500 characters. These properties allow you to adjust compute capacity for large datasets and complex workloads.
(string) --
(string) --
includeSeedInOutput (boolean) --
Configure whether the seed users are included in the output audience. By default, Clean Rooms ML removes seed users from the output audience. If you specify TRUE, the seed users will appear first in the output. Clean Rooms ML does not explicitly reveal whether a user was in the seed, but the recipient of the audience will know that the first minimumSeedSize count of users are from the seed.
collaborationId (string) --
The identifier of the collaboration that this audience generation job is associated with.
metrics (dict) --
The relevance scores for different audience sizes and the recall score of the generated audience.
relevanceMetrics (list) --
The relevance scores of the generated audience.
(dict) --
The relevance score of a generated audience.
audienceSize (dict) --
The size of the generated audience. Must match one of the sizes in the configured audience model.
type (string) --
Whether the audience size is defined in absolute terms or as a percentage. Use the ABSOLUTE type to configure output audience sizes as a count of identifiers in the output. Use the PERCENTAGE type to configure sizes in the range 1-100 percent.
value (integer) --
Specify an audience size value.
score (float) --
The relevance score of the generated audience.
recallMetric (float) --
The recall score of the generated audience. Recall is the percentage of the most similar users (by default, the most similar 20%) from a sample of the training data that are included in the seed audience by the audience generation job. Values range from 0 to 1; larger values indicate a better audience. A recall value approximately equal to the maximum bin size indicates that the audience model is equivalent to random selection.
startedBy (string) --
The AWS account that started this audience generation job.
tags (dict) --
The tags that are associated with this audience generation job.
(string) --
(string) --
protectedQueryIdentifier (string) --
The unique identifier of the protected query for this audience generation job.
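Because status moves through CREATE_PENDING and CREATE_IN_PROGRESS before reaching ACTIVE or a failure state, callers typically poll get_audience_generation_job. The sketch below is one way to do that. The function name, the polling interval, and the attempt limit are assumptions, and the client is passed in so that the function works with a boto3 "cleanroomsml" client or with a stub.

```python
import time

# Statuses that mean the job is still being created.
_IN_FLIGHT = {"CREATE_PENDING", "CREATE_IN_PROGRESS"}


def wait_for_audience_job(client, job_arn, poll_seconds=30, max_attempts=40,
                          sleep=time.sleep):
    """Poll until the job is ACTIVE; raise on any other terminal status."""
    for _ in range(max_attempts):
        job = client.get_audience_generation_job(audienceGenerationJobArn=job_arn)
        status = job["status"]
        if status == "ACTIVE":
            return job
        if status not in _IN_FLIGHT:
            # statusCode is meant for programmatic handling, message for people.
            details = job.get("statusDetails", {})
            raise RuntimeError(
                f"job ended in {status}: "
                f"{details.get('statusCode')} {details.get('message')}"
            )
        sleep(poll_seconds)
    raise TimeoutError(f"job {job_arn} not ACTIVE after {max_attempts} polls")
```

Once the job is returned, fields such as metrics["recallMetric"] and seedAudience["sqlComputeConfiguration"] can be read from the result when they are present in the response.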
{'inputChannel': {'dataSource': {'protectedQueryInputParameters': {'computeConfiguration': {'worker': {'properties': {'spark': {'string': 'string'}}}}}}}}
Returns information about an ML input channel.
See also: AWS API Documentation
Request Syntax
client.get_ml_input_channel(
mlInputChannelArn='string',
membershipIdentifier='string'
)
mlInputChannelArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the ML input channel that you want to get.
membershipIdentifier (string) -- [REQUIRED]
The membership ID of the membership that contains the ML input channel that you want to get.
dict
Response Syntax
{
'membershipIdentifier': 'string',
'collaborationIdentifier': 'string',
'mlInputChannelArn': 'string',
'name': 'string',
'configuredModelAlgorithmAssociations': [
'string',
],
'status': 'CREATE_PENDING'|'CREATE_IN_PROGRESS'|'CREATE_FAILED'|'ACTIVE'|'DELETE_PENDING'|'DELETE_IN_PROGRESS'|'DELETE_FAILED'|'INACTIVE',
'statusDetails': {
'statusCode': 'string',
'message': 'string'
},
'retentionInDays': 123,
'numberOfRecords': 123,
'privacyBudgets': {
'accessBudgets': [
{
'resourceArn': 'string',
'details': [
{
'startTime': datetime(2015, 1, 1),
'endTime': datetime(2015, 1, 1),
'remainingBudget': 123,
'budget': 123,
'budgetType': 'CALENDAR_DAY'|'CALENDAR_MONTH'|'CALENDAR_WEEK'|'LIFETIME',
'autoRefresh': 'ENABLED'|'DISABLED'
},
],
'aggregateRemainingBudget': 123
},
]
},
'description': 'string',
'syntheticDataConfiguration': {
'syntheticDataParameters': {
'epsilon': 123.0,
'maxMembershipInferenceAttackScore': 123.0,
'columnClassification': {
'columnMapping': [
{
'columnName': 'string',
'columnType': 'CATEGORICAL'|'NUMERICAL',
'isPredictiveValue': True|False
},
]
}
},
'syntheticDataEvaluationScores': {
'dataPrivacyScores': {
'membershipInferenceAttackScores': [
{
'attackVersion': 'DISTANCE_TO_CLOSEST_RECORD_V1',
'score': 123.0
},
]
}
}
},
'createTime': datetime(2015, 1, 1),
'updateTime': datetime(2015, 1, 1),
'inputChannel': {
'dataSource': {
'protectedQueryInputParameters': {
'sqlParameters': {
'queryString': 'string',
'analysisTemplateArn': 'string',
'parameters': {
'string': 'string'
}
},
'computeConfiguration': {
'worker': {
'type': 'CR.1X'|'CR.4X',
'number': 123,
'properties': {
'spark': {
'string': 'string'
}
}
}
},
'resultFormat': 'CSV'|'PARQUET'
}
},
'roleArn': 'string'
},
'protectedQueryIdentifier': 'string',
'numberOfFiles': 123.0,
'sizeInGb': 123.0,
'kmsKeyArn': 'string',
'tags': {
'string': 'string'
}
}
Response Structure
(dict) --
membershipIdentifier (string) --
The membership ID of the membership that contains the ML input channel.
collaborationIdentifier (string) --
The collaboration ID of the collaboration that contains the ML input channel.
mlInputChannelArn (string) --
The Amazon Resource Name (ARN) of the ML input channel.
name (string) --
The name of the ML input channel.
configuredModelAlgorithmAssociations (list) --
The configured model algorithm associations that were used to create the ML input channel.
(string) --
status (string) --
The status of the ML input channel.
statusDetails (dict) --
Details about the status of a resource.
statusCode (string) --
The status code that was returned. The status code is intended for programmatic error handling. Clean Rooms ML will not change the status code for existing error conditions.
message (string) --
The error message that was returned. The message is intended for human consumption and can change at any time. Use the statusCode for programmatic error handling.
retentionInDays (integer) --
The number of days to keep the data in the ML input channel.
numberOfRecords (integer) --
The number of records in the ML input channel.
privacyBudgets (dict) --
Returns the privacy budgets that control access to this Clean Rooms ML input channel. Use these budgets to monitor and limit resource consumption over specified time periods.
accessBudgets (list) --
A list of access budgets that apply to resources associated with this Clean Rooms ML input channel.
(dict) --
An access budget that defines consumption limits for a specific resource within defined time periods.
resourceArn (string) --
The Amazon Resource Name (ARN) of the resource that this access budget applies to.
details (list) --
A list of budget details for this resource. Contains active budget periods that apply to the resource.
(dict) --
The detailed information for a specific budget period, including time boundaries and budget amounts.
startTime (datetime) --
The start time of this budget period.
endTime (datetime) --
The end time of this budget period. If not specified, the budget period continues indefinitely.
remainingBudget (integer) --
The amount of budget remaining in this period.
budget (integer) --
The total budget amount allocated for this period.
budgetType (string) --
The type of budget period. Calendar-based types reset automatically at regular intervals, while LIFETIME budgets never reset.
autoRefresh (string) --
Specifies whether this budget automatically refreshes when the current period ends.
aggregateRemainingBudget (integer) --
The total remaining budget across all active budget periods for this resource.
description (string) --
The description of the ML input channel.
syntheticDataConfiguration (dict) --
The synthetic data configuration for this ML input channel, including parameters for generating privacy-preserving synthetic data and evaluation scores for measuring the privacy of the generated data.
syntheticDataParameters (dict) --
The parameters that control how synthetic data is generated, including privacy settings, column classifications, and other configuration options that affect the data synthesis process.
epsilon (float) --
The epsilon value for differential privacy, which controls the privacy-utility tradeoff in synthetic data generation. Lower values provide stronger privacy guarantees but may reduce data utility.
maxMembershipInferenceAttackScore (float) --
The maximum acceptable score for membership inference attack vulnerability. Synthetic data generation fails if the score for the resulting data exceeds this threshold.
columnClassification (dict) --
Classification details for data columns that specify how each column should be treated during synthetic data generation.
columnMapping (list) --
A mapping that defines the classification of data columns for synthetic data generation and specifies how each column should be handled during the privacy-preserving data synthesis process.
(dict) --
Properties that define how a specific data column should be handled during synthetic data generation, including its name, type, and role in predictive modeling.
columnName (string) --
The name of the data column as it appears in the dataset.
columnType (string) --
The data type of the column, which determines how the synthetic data generation algorithm processes and synthesizes values for this column.
isPredictiveValue (boolean) --
Indicates if this column contains predictive values that should be treated as target variables in machine learning models. This affects how the synthetic data generation preserves statistical relationships.
syntheticDataEvaluationScores (dict) --
Evaluation scores that assess the quality and privacy characteristics of the generated synthetic data, providing metrics on data utility and privacy preservation.
dataPrivacyScores (dict) --
Privacy-specific evaluation scores that measure how well the synthetic data protects individual privacy, including assessments of potential privacy risks such as membership inference attacks.
membershipInferenceAttackScores (list) --
Scores that evaluate the vulnerability of the synthetic data to membership inference attacks, which attempt to determine whether a specific individual was a member of the original dataset.
(dict) --
A score that measures the vulnerability of synthetic data to membership inference attacks and provides both the numerical score and the version of the attack methodology used for evaluation.
attackVersion (string) --
The version of the membership inference attack, which consists of the attack type and its version number, used to generate this privacy score.
score (float) --
The numerical score representing the vulnerability to membership inference attacks.
createTime (datetime) --
The time at which the ML input channel was created.
updateTime (datetime) --
The most recent time at which the ML input channel was updated.
inputChannel (dict) --
The input channel that was used to create the ML input channel.
dataSource (dict) --
The data source that is used to create the ML input channel.
protectedQueryInputParameters (dict) --
Provides information necessary to perform the protected query.
sqlParameters (dict) --
The parameters for the SQL type Protected Query.
queryString (string) --
The query string to be submitted.
analysisTemplateArn (string) --
The Amazon Resource Name (ARN) associated with the analysis template within a collaboration.
parameters (dict) --
The protected query SQL parameters.
(string) --
(string) --
computeConfiguration (dict) --
Provides configuration information for the workers that will perform the protected query.
worker (dict) --
The worker instances that will perform the compute work.
type (string) --
The instance type of the compute workers that are used.
number (integer) --
The number of compute workers that are used.
properties (dict) --
The configuration properties for the worker compute environment. These properties allow you to customize the compute settings for your Clean Rooms workloads.
spark (dict) --
The Spark configuration properties for SQL workloads. This map contains key-value pairs that configure Apache Spark settings to optimize performance for your data processing jobs. You can specify up to 50 Spark properties, with each key being 1-200 characters and each value being 0-500 characters. These properties allow you to adjust compute capacity for large datasets and complex workloads.
(string) --
(string) --
resultFormat (string) --
The format in which the query results should be returned. If not specified, defaults to CSV.
roleArn (string) --
The Amazon Resource Name (ARN) of the role used to run the query specified in the dataSource field of the input channel.
Passing a role across AWS accounts is not allowed. If you pass a role that isn't in your account, you get an AccessDeniedException error.
protectedQueryIdentifier (string) --
The ID of the protected query that was used to create the ML input channel.
numberOfFiles (float) --
The number of files in the ML input channel.
sizeInGb (float) --
The size, in GB, of the ML input channel.
kmsKeyArn (string) --
The Amazon Resource Name (ARN) of the KMS key that was used to create the ML input channel.
tags (dict) --
The optional metadata that you applied to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.
The following basic restrictions apply to tags:
Maximum number of tags per resource - 50.
For each resource, each tag key must be unique, and each tag key can have only one value.
Maximum key length - 128 Unicode characters in UTF-8.
Maximum value length - 256 Unicode characters in UTF-8.
If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.
Tag keys and values are case sensitive.
Do not use aws:, AWS:, or any uppercase or lowercase combination of these as a prefix for keys, because the prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Clean Rooms ML considers it a user tag and counts it against the limit of 50 tags. Tags with only the key prefix of aws do not count against your tags-per-resource limit.
(string) --
(string) --
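The get_ml_input_channel response is deeply nested, so small accessor functions can make it easier to read back the Spark settings and the privacy budgets. The sketch below works on the response dictionary only. The function names are hypothetical, and it assumes that optional sections may be absent from a response, which is why it uses dict.get with empty defaults.

```python
def spark_properties_of(channel):
    """Return the Spark property map stored on an ML input channel, or {}."""
    worker = (
        channel.get("inputChannel", {})
        .get("dataSource", {})
        .get("protectedQueryInputParameters", {})
        .get("computeConfiguration", {})
        .get("worker", {})
    )
    return worker.get("properties", {}).get("spark", {})


def summarize_access_budgets(channel):
    """Map each resourceArn to its aggregate and per-period remaining budget."""
    summary = {}
    budgets = channel.get("privacyBudgets", {}).get("accessBudgets", [])
    for budget in budgets:
        summary[budget["resourceArn"]] = {
            "aggregateRemainingBudget": budget.get("aggregateRemainingBudget"),
            # One (budgetType, remainingBudget, budget) tuple per budget period.
            "periods": [
                (d.get("budgetType"), d.get("remainingBudget"), d.get("budget"))
                for d in budget.get("details", [])
            ],
        }
    return summary
```

Both functions take the dictionary returned by get_ml_input_channel and make no further API calls.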
{'seedAudience': {'sqlComputeConfiguration': {'worker': {'properties': {'spark': {'string': 'string'}}}}}}
Information necessary to start the audience generation job.
See also: AWS API Documentation
Request Syntax
client.start_audience_generation_job(
name='string',
configuredAudienceModelArn='string',
seedAudience={
'dataSource': {
's3Uri': 'string'
},
'roleArn': 'string',
'sqlParameters': {
'queryString': 'string',
'analysisTemplateArn': 'string',
'parameters': {
'string': 'string'
}
},
'sqlComputeConfiguration': {
'worker': {
'type': 'CR.1X'|'CR.4X',
'number': 123,
'properties': {
'spark': {
'string': 'string'
}
}
}
}
},
includeSeedInOutput=True|False,
collaborationId='string',
description='string',
tags={
'string': 'string'
}
)
name (string) -- [REQUIRED]
The name of the audience generation job.
configuredAudienceModelArn (string) -- [REQUIRED]
The Amazon Resource Name (ARN) of the configured audience model that is used for this audience generation job.
seedAudience (dict) -- [REQUIRED]
The seed audience that is used to generate the audience.
dataSource (dict) --
Defines the Amazon S3 bucket where the seed audience for the generated audience is stored. A valid data source is a JSON Lines file in the following format:
{"user_id": "111111"}
{"user_id": "222222"}
...
s3Uri (string) -- [REQUIRED]
The Amazon S3 location URI.
roleArn (string) -- [REQUIRED]
The ARN of the IAM role that can read the Amazon S3 bucket where the seed audience is stored.
sqlParameters (dict) --
The protected SQL query parameters.
queryString (string) --
The query string to be submitted.
analysisTemplateArn (string) --
The Amazon Resource Name (ARN) associated with the analysis template within a collaboration.
parameters (dict) --
The protected query SQL parameters.
(string) --
(string) --
sqlComputeConfiguration (dict) --
Provides configuration information for the instances that will perform the compute work.
worker (dict) --
The worker instances that will perform the compute work.
type (string) --
The instance type of the compute workers that are used.
number (integer) --
The number of compute workers that are used.
properties (dict) --
The configuration properties for the worker compute environment. These properties allow you to customize the compute settings for your Clean Rooms workloads.
spark (dict) --
The Spark configuration properties for SQL workloads. This map contains key-value pairs that configure Apache Spark settings to optimize performance for your data processing jobs. You can specify up to 50 Spark properties, with each key being 1-200 characters and each value being 0-500 characters. These properties allow you to adjust compute capacity for large datasets and complex workloads.
(string) --
(string) --
includeSeedInOutput (boolean) --
Whether the seed audience is included in the audience generation output.
collaborationId (string) --
The identifier of the collaboration that contains the audience generation job.
description (string) --
The description of the audience generation job.
tags (dict) --
The optional metadata that you apply to the resource to help you categorize and organize it. Each tag consists of a key and an optional value, both of which you define.
The following basic restrictions apply to tags:
Maximum number of tags per resource - 50.
For each resource, each tag key must be unique, and each tag key can have only one value.
Maximum key length - 128 Unicode characters in UTF-8.
Maximum value length - 256 Unicode characters in UTF-8.
If your tagging schema is used across multiple services and resources, remember that other services may have restrictions on allowed characters. Generally allowed characters are: letters, numbers, and spaces representable in UTF-8, and the following characters: + - = . _ : / @.
Tag keys and values are case sensitive.
Do not use aws:, AWS:, or any uppercase or lowercase combination of these as a prefix for keys, because the prefix is reserved for AWS use. You cannot edit or delete tag keys with this prefix. Values can have this prefix. If a tag value has aws as its prefix but the key does not, Clean Rooms ML considers it a user tag and counts it against the limit of 50 tags. Tags with only the key prefix of aws do not count against your tags-per-resource limit.
(string) --
(string) --
dict
Response Syntax
{
'audienceGenerationJobArn': 'string'
}
Response Structure
(dict) --
audienceGenerationJobArn (string) --
The Amazon Resource Name (ARN) of the audience generation job.
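For a SQL-based seed audience, the Spark properties go under seedAudience.sqlComputeConfiguration.worker.properties.spark. The sketch below starts a job that way and returns the job ARN. The function name and all argument values are hypothetical, and it assumes that a seed audience defined by sqlParameters can omit dataSource, since only roleArn is marked required at that level of the request.

```python
def start_sql_audience_job(client, name, model_arn, role_arn, query_string,
                           worker_type, worker_count, spark,
                           include_seed=False):
    """Start an audience generation job with a SQL seed and Spark properties."""
    response = client.start_audience_generation_job(
        name=name,
        configuredAudienceModelArn=model_arn,
        seedAudience={
            "roleArn": role_arn,
            "sqlParameters": {"queryString": query_string},
            "sqlComputeConfiguration": {
                "worker": {
                    "type": worker_type,      # 'CR.1X' or 'CR.4X'
                    "number": worker_count,
                    "properties": {"spark": dict(spark)},
                }
            },
        },
        includeSeedInOutput=include_seed,
    )
    return response["audienceGenerationJobArn"]
```

The client argument is expected to be a boto3 "cleanroomsml" client; passing it in keeps the function easy to exercise with a stub before running it against a real collaboration.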