SpeechRecognitionResult

A speech recognition result corresponding to a portion of the audio.

{
  "alternatives": [
    {
      object (SpeechRecognitionAlternative)
    }
  ],
  "channelTag": integer
}
Fields Description
alternatives[] object (SpeechRecognitionAlternative)
May contain one or more recognition hypotheses (up to the maximum specified in maxAlternatives). The alternatives are ordered from most to least probable, as ranked by the recognizer, so the first alternative is the top hypothesis.
channelTag integer
For multi-channel audio, this is the channel number corresponding to the recognized result for the audio from that channel. For audioChannelCount = N, its output values can range from '1' to 'N'.
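
As a minimal sketch of how a client might consume these two fields, the Python below collects the first (most probable) transcript of each result and groups it by channelTag. It assumes the response has already been parsed into a list of SpeechRecognitionResult dictionaries; the helper name `top_transcripts_by_channel` and the sample data are hypothetical and are not part of the API.

```python
from collections import defaultdict


def top_transcripts_by_channel(results):
    """Group the first (most probable) transcript of each result by channelTag.

    `results` is assumed to be a list of SpeechRecognitionResult dicts.
    Results without a channelTag are grouped under the key None.
    """
    by_channel = defaultdict(list)
    for result in results:
        alternatives = result.get("alternatives", [])
        if not alternatives:
            continue  # no hypothesis for this portion of the audio
        top = alternatives[0]  # alternatives are ordered most probable first
        by_channel[result.get("channelTag")].append(top.get("transcript", ""))
    return dict(by_channel)


# Hypothetical sample data for a two-channel recording.
sample_results = [
    {"alternatives": [{"transcript": "hello"}, {"transcript": "hallo"}], "channelTag": 1},
    {"alternatives": [{"transcript": "good morning"}], "channelTag": 2},
    {"alternatives": [{"transcript": "world"}], "channelTag": 1},
]

print(top_transcripts_by_channel(sample_results))
# → {1: ['hello', 'world'], 2: ['good morning']}
```

Only the first alternative is read because the ordering places the top hypothesis first; lower-ranked alternatives are ignored in this sketch.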

SpeechRecognitionAlternative

Alternative hypotheses (a.k.a. n-best list).

{
  "transcript": string,
  "confidence": number,
  "words": [
    {
      object (WordInfo)
    }
  ]
}
Fields Description
transcript string
Transcript text representing the words that the user spoke.
confidence number
The confidence estimate, between 0.0 and 1.0. A higher number indicates a greater estimated likelihood that the recognized words are correct. This field is set only for the top alternative of a non-streaming result, or of a streaming result where isFinal=true. This field is not guaranteed to be accurate, and users should not rely on it always being provided. The default of 0.0 is a sentinel value indicating that confidence was not set.
words[] object (WordInfo)
A list of word-specific information for each recognized word. Note: When enableSpeakerDiarization is true, you will see all the words from the beginning of the audio.
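
Because 0.0 is a sentinel and not a real score, client code may want to separate "not set" from "low confidence". The following Python is one possible sketch, assuming the alternative is a parsed dictionary; the helper name `confidence_or_none` is hypothetical, and treating a missing key the same as 0.0 is a choice made for this example.

```python
from typing import Optional


def confidence_or_none(alternative: dict) -> Optional[float]:
    """Return the confidence estimate, or None when it was not set.

    A missing field and the 0.0 sentinel are both treated as "not set".
    """
    value = alternative.get("confidence", 0.0)
    return value if value > 0.0 else None


print(confidence_or_none({"transcript": "hi", "confidence": 0.92}))  # → 0.92
print(confidence_or_none({"transcript": "hi", "confidence": 0.0}))   # → None
print(confidence_or_none({"transcript": "hi"}))                      # → None
```

Returning None keeps callers from averaging or thresholding a value that was never produced, for example on lower-ranked alternatives.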

WordInfo

Word-specific information for recognized words.

{
  "startTime": string,
  "endTime": string,
  "word": string,
  "speakerTag": integer
}
Fields Description
startTime string (Duration format)
Time offset relative to the beginning of the audio, and corresponding to the start of the spoken word. This field is only set if enableWordTimeOffsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.
endTime string (Duration format)
Time offset relative to the beginning of the audio, and corresponding to the end of the spoken word. This field is only set if enableWordTimeOffsets=true and only in the top hypothesis. This is an experimental feature and the accuracy of the time offset can vary.
word string
The word corresponding to this set of information.
speakerTag integer
Output only. A distinct integer value is assigned for every speaker within the audio. This field specifies which one of those speakers was detected to have spoken this word. Value ranges from '1' to diarizationSpeakerCount. speakerTag is set if enableSpeakerDiarization = 'true' and only in the top alternative.
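
The word-level fields can be combined to rebuild speaker turns. The sketch below assumes both enableWordTimeOffsets and enableSpeakerDiarization were enabled, so every word carries startTime, endTime, and speakerTag, and it assumes Duration strings are decimal seconds followed by `s` (for example `"1.300s"`). The helper names `parse_duration` and `speaker_turns` and the sample words are hypothetical.

```python
from decimal import Decimal


def parse_duration(value: str) -> Decimal:
    """Convert a Duration string such as "1.300s" into seconds."""
    if not value.endswith("s"):
        raise ValueError(f"not a Duration string: {value!r}")
    return Decimal(value[:-1])


def speaker_turns(words):
    """Merge consecutive words with the same speakerTag into turns.

    Returns a list of (speakerTag, start_seconds, end_seconds, text) tuples.
    """
    turns = []
    for info in words:
        tag = info.get("speakerTag")
        start = parse_duration(info["startTime"])
        end = parse_duration(info["endTime"])
        if turns and turns[-1][0] == tag:
            # Same speaker as the previous word: extend the current turn.
            _, first_start, _, text = turns[-1]
            turns[-1] = (tag, first_start, end, f"{text} {info['word']}")
        else:
            turns.append((tag, start, end, info["word"]))
    return turns


# Hypothetical sample: the words[] list of a top alternative.
sample_words = [
    {"startTime": "0s", "endTime": "0.400s", "word": "hi", "speakerTag": 1},
    {"startTime": "0.400s", "endTime": "0.900s", "word": "there", "speakerTag": 1},
    {"startTime": "1.200s", "endTime": "1.700s", "word": "hello", "speakerTag": 2},
]

for tag, start, end, text in speaker_turns(sample_words):
    print(f"speaker {tag}: {start}s to {end}s: {text}")
```

Decimal is used here so the offsets are not subject to binary floating-point rounding; plain float would also work where that precision does not matter.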