Amazon Voice ID

2022/08/29 - Amazon Voice ID - 1 updated api methods

Changes  Amazon Connect Voice ID now detects voice spoofing. When a prospective fraudster tries to spoof caller audio using audio playback or synthesized speech, Voice ID returns a risk score and outcome indicating how likely it is that the voice is spoofed.

EvaluateSession (updated)
Changes (response)
{'FraudDetectionResult': {'Reasons': {'VOICE_SPOOFING'},
                          'RiskDetails': {'VoiceSpoofingRisk': {'RiskScore': 'integer'}}}}
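
This response change adds VOICE_SPOOFING as a possible Reasons value and a VoiceSpoofingRisk entry under RiskDetails. The following is a minimal sketch, not part of boto3, of reading both from an evaluate_session response dict. The helper name voice_spoofing_result and the sample score are made up for illustration. Every key is read defensively because the fraud detection result can be empty while Voice ID is still collecting audio.

```python
from typing import Any, Dict, Optional, Tuple


def voice_spoofing_result(response: Dict[str, Any]) -> Tuple[bool, Optional[int]]:
    """Return (flagged_as_spoofed, spoofing_risk_score) from an EvaluateSession response.

    Hypothetical helper: missing keys are treated as "not flagged" and "no score".
    """
    fraud = response.get("FraudDetectionResult") or {}
    # Reasons is only populated when the fraud detection Decision is HIGH_RISK.
    flagged = "VOICE_SPOOFING" in (fraud.get("Reasons") or [])
    # VoiceSpoofingRisk.RiskScore is the integer score added in this update.
    spoofing = (fraud.get("RiskDetails") or {}).get("VoiceSpoofingRisk") or {}
    return flagged, spoofing.get("RiskScore")


# Made-up sample response, trimmed to the fields this helper reads.
sample = {
    "FraudDetectionResult": {
        "Decision": "HIGH_RISK",
        "Reasons": ["VOICE_SPOOFING"],
        "RiskDetails": {"VoiceSpoofingRisk": {"RiskScore": 91}},
    }
}
print(voice_spoofing_result(sample))  # (True, 91)
```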

Evaluates a specified session based on audio data accumulated during a streaming Amazon Connect Voice ID call.

See also: AWS API Documentation

Request Syntax

client.evaluate_session(
    DomainId='string',
    SessionNameOrId='string'
)
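
A minimal sketch of making this call from Python. The wrapper function evaluate_session below is hypothetical. It takes the client as an argument (for example, one created with boto3.client("voice-id")) so that a stub or fake client can be substituted in tests.

```python
from typing import Any, Dict


def evaluate_session(client: Any, domain_id: str, session_name_or_id: str) -> Dict[str, Any]:
    """Call EvaluateSession and return the raw response dict.

    `client` is expected to be a boto3 Voice ID client; any object with a
    compatible evaluate_session(**kwargs) method works (useful for testing).
    """
    # Both parameters are required. In the Amazon Connect integration,
    # the session name is the contact ID.
    return client.evaluate_session(
        DomainId=domain_id,
        SessionNameOrId=session_name_or_id,
    )
```

Usage, assuming configured AWS credentials and an existing domain: `response = evaluate_session(boto3.client("voice-id"), "<domain-id>", "<contact-id>")`.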
type DomainId:

string

param DomainId:

[REQUIRED]

The identifier of the domain where the session started.

type SessionNameOrId:

string

param SessionNameOrId:

[REQUIRED]

The session identifier, or name of the session, that you want to evaluate. In Voice ID integration, this is the Contact-Id.

rtype:

dict

returns:

Response Syntax

{
    'AuthenticationResult': {
        'AudioAggregationEndedAt': datetime(2015, 1, 1),
        'AudioAggregationStartedAt': datetime(2015, 1, 1),
        'AuthenticationResultId': 'string',
        'Configuration': {
            'AcceptanceThreshold': 123
        },
        'CustomerSpeakerId': 'string',
        'Decision': 'ACCEPT'|'REJECT'|'NOT_ENOUGH_SPEECH'|'SPEAKER_NOT_ENROLLED'|'SPEAKER_OPTED_OUT'|'SPEAKER_ID_NOT_PROVIDED'|'SPEAKER_EXPIRED',
        'GeneratedSpeakerId': 'string',
        'Score': 123
    },
    'DomainId': 'string',
    'FraudDetectionResult': {
        'AudioAggregationEndedAt': datetime(2015, 1, 1),
        'AudioAggregationStartedAt': datetime(2015, 1, 1),
        'Configuration': {
            'RiskThreshold': 123
        },
        'Decision': 'HIGH_RISK'|'LOW_RISK'|'NOT_ENOUGH_SPEECH',
        'FraudDetectionResultId': 'string',
        'Reasons': [
            'KNOWN_FRAUDSTER'|'VOICE_SPOOFING',
        ],
        'RiskDetails': {
            'KnownFraudsterRisk': {
                'GeneratedFraudsterId': 'string',
                'RiskScore': 123
            },
            'VoiceSpoofingRisk': {
                'RiskScore': 123
            }
        }
    },
    'SessionId': 'string',
    'SessionName': 'string',
    'StreamingStatus': 'PENDING_CONFIGURATION'|'ONGOING'|'ENDED'
}
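
As the field descriptions below note, Score is only present for ACCEPT or REJECT decisions, and CustomerSpeakerId is only present when a speaker ID was provided for the session. Code that reads the authentication result should therefore expect missing keys. The following is a hedged sketch. The name authentication_summary and its output keys are hypothetical and were chosen for illustration.

```python
from typing import Any, Dict, Optional


def authentication_summary(response: Dict[str, Any]) -> Dict[str, Optional[Any]]:
    """Pick commonly used authentication fields out of an EvaluateSession response.

    Fields that Voice ID omits (for example, Score on a NOT_ENOUGH_SPEECH
    decision) come back as None instead of raising KeyError.
    """
    auth = response.get("AuthenticationResult") or {}
    return {
        "decision": auth.get("Decision"),
        "score": auth.get("Score"),
        "threshold": (auth.get("Configuration") or {}).get("AcceptanceThreshold"),
        "customer_speaker_id": auth.get("CustomerSpeakerId"),
    }
```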

Response Structure

  • (dict) --

    • AuthenticationResult (dict) --

      Details resulting from the authentication process, such as the authentication decision and authentication score.

      • AudioAggregationEndedAt (datetime) --

        A timestamp indicating when audio aggregation ended for this authentication result.

      • AudioAggregationStartedAt (datetime) --

        A timestamp indicating when audio aggregation started for this authentication result.

      • AuthenticationResultId (string) --

        The unique identifier for this authentication result. Because there can be multiple authentications for a given session, this field helps identify whether the returned result comes from previous streaming activity or is a new result. Note that in the absence of any new streaming activity, AcceptanceThreshold changes, or SpeakerId changes, Voice ID always returns the cached authentication result for this API.

      • Configuration (dict) --

        The AuthenticationConfiguration used to generate this authentication result.

        • AcceptanceThreshold (integer) --

          The minimum threshold needed to successfully authenticate a speaker.

      • CustomerSpeakerId (string) --

        The client-provided identifier for the speaker whose authentication result is produced. Only present if a SpeakerId is provided for the session.

      • Decision (string) --

        The authentication decision produced by Voice ID, processed against the current session state and streamed audio of the speaker.

      • GeneratedSpeakerId (string) --

        The service-generated identifier for the speaker whose authentication result is produced.

      • Score (integer) --

        The authentication score for the speaker whose authentication result is produced. This value is only present if the authentication decision is either ACCEPT or REJECT.

    • DomainId (string) --

      The identifier of the domain containing the session.

    • FraudDetectionResult (dict) --

      Details resulting from the fraud detection process, such as the fraud detection decision and risk score.

      • AudioAggregationEndedAt (datetime) --

        A timestamp indicating when audio aggregation ended for this fraud detection result.

      • AudioAggregationStartedAt (datetime) --

        A timestamp indicating when audio aggregation started for this fraud detection result.

      • Configuration (dict) --

        The FraudDetectionConfiguration used to generate this fraud detection result.

        • RiskThreshold (integer) --

          Threshold value for determining whether the speaker is a fraudster. If the detected risk score calculated by Voice ID is higher than the threshold, the speaker is considered a fraudster.

      • Decision (string) --

        The fraud detection decision produced by Voice ID, processed against the current session state and streamed audio of the speaker.

      • FraudDetectionResultId (string) --

        The unique identifier for this fraud detection result. Because there can be multiple fraud detections for a given session, this field helps identify whether the returned result comes from previous streaming activity or is a new result. Note that in the absence of any new streaming activity or risk threshold changes, Voice ID always returns the cached fraud detection result for this API.

      • Reasons (list) --

        The reasons the speaker was flagged by the fraud detection system. This list is only populated if the fraud detection Decision is HIGH_RISK. Possible values are KNOWN_FRAUDSTER and VOICE_SPOOFING.

        • (string) --

      • RiskDetails (dict) --

        Details about each risk analyzed for this speaker. Currently, this contains KnownFraudsterRisk and VoiceSpoofingRisk details.

        • KnownFraudsterRisk (dict) --

          The details resulting from 'Known Fraudster Risk' analysis of the speaker.

          • GeneratedFraudsterId (string) --

            The identifier of the fraudster that is the closest match to the speaker. If there are no fraudsters registered in a given domain, or if there are no fraudsters with a non-zero RiskScore, this value is null.

          • RiskScore (integer) --

            The score indicating the likelihood that the speaker is a known fraudster.

        • VoiceSpoofingRisk (dict) --

          The details resulting from 'Voice Spoofing Risk' analysis of the speaker.

          • RiskScore (integer) --

            The score indicating the likelihood that the speaker’s voice is spoofed.

    • SessionId (string) --

      The service-generated identifier of the session.

    • SessionName (string) --

      The client-provided name of the session.

    • StreamingStatus (string) --

      The current status of audio streaming for this session. This field is useful for inferring next steps when the authentication or fraud detection results are empty or the decision is NOT_ENOUGH_SPEECH. In this situation, if StreamingStatus is ONGOING or PENDING_CONFIGURATION, the client should call the API again later, once Voice ID has enough audio to produce a result. If the decision remains NOT_ENOUGH_SPEECH even after StreamingStatus is ENDED, the previously streamed session did not have enough speech to perform an evaluation, and a new streaming session is needed to try again.
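
The StreamingStatus guidance above can be sketched as a retry loop. This is an illustration under stated assumptions, not an AWS-provided helper. The name poll_evaluation, the fixed two-second delay, and the ten-attempt cap are all hypothetical choices. The caller chooses which result to wait for (fraud detection by default) and supplies the call itself, so the loop does not depend on how the client is built.

```python
import time
from typing import Any, Callable, Dict


def poll_evaluation(
    call: Callable[[], Dict[str, Any]],
    result_key: str = "FraudDetectionResult",
    max_attempts: int = 10,
    delay_seconds: float = 2.0,
    sleep: Callable[[float], None] = time.sleep,
) -> Dict[str, Any]:
    """Re-call EvaluateSession until `result_key` holds a usable decision.

    Stops early once StreamingStatus is ENDED: a NOT_ENOUGH_SPEECH decision at
    that point means a new streaming session is needed, so retrying won't help.
    """
    response: Dict[str, Any] = {}
    for attempt in range(max_attempts):
        response = call()
        result = response.get(result_key) or {}
        if result and result.get("Decision") != "NOT_ENOUGH_SPEECH":
            return response  # a usable decision is available
        if response.get("StreamingStatus") == "ENDED":
            return response  # this session will not produce more audio
        # Otherwise (ONGOING or PENDING_CONFIGURATION): wait, then try again.
        if attempt < max_attempts - 1:
            sleep(delay_seconds)
    return response
```

Usage, assuming a boto3 Voice ID client named `client`: `poll_evaluation(lambda: client.evaluate_session(DomainId=domain_id, SessionNameOrId=contact_id))`. The `sleep` parameter exists only so tests can skip the real delay.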
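
Because Voice ID returns the cached result when nothing relevant has changed, comparing AuthenticationResultId and FraudDetectionResultId across calls is one way to tell a fresh result from a repeated one. The following is a minimal sketch. ResultTracker is a hypothetical helper, and it assumes only that a new result carries a new identifier, as the identifier descriptions in this section indicate.

```python
from typing import Any, Dict, Optional


class ResultTracker:
    """Remembers the last seen result IDs to tell new results from cached ones."""

    def __init__(self) -> None:
        self._last_auth_id: Optional[str] = None
        self._last_fraud_id: Optional[str] = None

    def update(self, response: Dict[str, Any]) -> Dict[str, bool]:
        """Return which results in `response` are new since the previous call."""
        auth_id = (response.get("AuthenticationResult") or {}).get("AuthenticationResultId")
        fraud_id = (response.get("FraudDetectionResult") or {}).get("FraudDetectionResultId")
        changed = {
            "authentication": auth_id is not None and auth_id != self._last_auth_id,
            "fraud_detection": fraud_id is not None and fraud_id != self._last_fraud_id,
        }
        # Only remember IDs that were actually present in this response.
        if auth_id is not None:
            self._last_auth_id = auth_id
        if fraud_id is not None:
            self._last_fraud_id = fraud_id
        return changed
```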