The timer parameters determine when a phrase (recognition session) ends. In addition, the speech recognition system (ASR) can independently signal that the client has finished speaking. It does this with a trained neural model that reacts to intonation, language constructions, and other patterns learned from its training data.
If the ASR returns the end-of-phrase flag, the timer parameters are ignored. Even if the client finishes one phrase, starts a second one almost instantly, and keeps speaking within all the time limits, the system no longer listens. It has received the end-of-phrase signal and ended the recognition session, so none of the parameters listed below apply.
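The precedence described above can be illustrated with a minimal sketch. The event names and the `recognized_phrases` function are hypothetical and invented for this illustration; they are not part of the platform's API. The sketch only shows that once the end-of-phrase flag arrives, anything the client says afterwards is not processed.

```python
from typing import Iterable, List, Tuple

# Hypothetical event kinds, invented for this illustration only.
SPEECH = "speech"
END_OF_PHRASE = "end_of_phrase"  # ASR signals that the client finished speaking
TIMER_EXPIRED = "timer_expired"  # one of the session timers ran out

Event = Tuple[str, str]  # (kind, recognized text or empty string)


def recognized_phrases(events: Iterable[Event]) -> List[str]:
    """Return the speech fragments heard before the session ended."""
    heard: List[str] = []
    for kind, text in events:
        if kind in (END_OF_PHRASE, TIMER_EXPIRED):
            # The session ends here; later events are never processed.
            break
        if kind == SPEECH:
            heard.append(text)
    return heard


events = [
    (SPEECH, "yes, that works"),
    (END_OF_PHRASE, ""),
    (SPEECH, "but call me after six"),  # spoken almost instantly, still ignored
]
print(recognized_phrases(events))  # ['yes, that works']
```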
The duration of the speech recognition session is determined by the following parameters:
| Level | sint (ms) | nit (ms) | t (ms) |
|---|---|---|---|
| Monosyllabic answer | 100 | 4000 | 7000 |
| Very very short | 300 | 2000 | 5000 |
| Very short | 300 | 3000 | 5000 |
| Short | 400 | 3000 | 5000 |
| Normal | 960 | 3000 | 7000 |
| Normal (5 sec.) | 960 | 5000 | 7000 |
| Normal (180 sec.) | 1200 | 3000 | 180000 |
| Long | 1200 | 4000 | 10000 |
| Very long | 3000 | 4000 | 15000 |
| Very long (180 sec.) | 3000 | 4000 | 180000 |
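For scripts that need these values programmatically, the table above can be mirrored as a lookup structure. This is only a sketch: the `TimerPreset` class, the `PRESETS` dictionary, and the `get_preset` helper are hypothetical names, and the numbers are copied from the table as-is, in milliseconds.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TimerPreset:
    """One row of the table; all values are in milliseconds."""
    sint_ms: int
    nit_ms: int
    t_ms: int


# Values copied from the table above.
PRESETS = {
    "Monosyllabic answer": TimerPreset(100, 4000, 7000),
    "Very very short": TimerPreset(300, 2000, 5000),
    "Very short": TimerPreset(300, 3000, 5000),
    "Short": TimerPreset(400, 3000, 5000),
    "Normal": TimerPreset(960, 3000, 7000),
    "Normal (5 sec.)": TimerPreset(960, 5000, 7000),
    "Normal (180 sec.)": TimerPreset(1200, 3000, 180000),
    "Long": TimerPreset(1200, 4000, 10000),
    "Very long": TimerPreset(3000, 4000, 15000),
    "Very long (180 sec.)": TimerPreset(3000, 4000, 180000),
}


def get_preset(level: str) -> TimerPreset:
    """Look up a preset by its level name, failing loudly on a typo."""
    try:
        return PRESETS[level]
    except KeyError:
        raise ValueError(f"Unknown level: {level!r}") from None


print(get_preset("Short"))  # TimerPreset(sint_ms=400, nit_ms=3000, t_ms=5000)
```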
The time before the bot responds is not the same as the duration of the recognition session.
The time to the bot's response is the sum of the time to receive a response from the ASR and the time to make a decision. Other sounds after the client's main speech (before the silence) and a pause at the start of the bot's audio file can increase the response time.
For example, suppose the recognition session duration is Very short, so the session ends after 0.3 seconds of silence (sint = 300 ms). After the session ended, it took 0.1 seconds to receive the full response from the ASR and another 0.3 seconds for the bot to make a decision. The bot's response audio file contained a 0.1-second pause before the speech started.
Thus, the total pause before the bot responds is 0.3 + 0.1 + 0.3 + 0.1 = 0.8 seconds.
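The same arithmetic can be written as a small helper for estimating the pause in other scenarios. The function name and parameter names are hypothetical; the sketch simply adds the four components from the example, using whole milliseconds to avoid floating-point rounding.

```python
def response_pause_ms(
    silence_ms: int,
    asr_response_ms: int,
    decision_ms: int,
    audio_lead_in_ms: int = 0,
) -> int:
    """Total pause between the end of the client's speech and the bot's speech.

    silence_ms       -- silence that ended the recognition session
    asr_response_ms  -- time to receive the full response from the ASR
    decision_ms      -- time the bot needs to make a decision
    audio_lead_in_ms -- pause at the start of the bot's audio file
    """
    return silence_ms + asr_response_ms + decision_ms + audio_lead_in_ms


# Values from the example above: 0.3 + 0.1 + 0.3 + 0.1 seconds.
total_ms = response_pause_ms(300, 100, 300, 100)
print(total_ms / 1000)  # 0.8
```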
When testing a script inside the editor in voice mode, a simplified version of ASR is used.
Recognition quality in the simplified version may be lower than in the full version. In addition, the simplified version supports only Russian, Turkish, English, and Ukrainian.