Active listening is a voice bot feature that lets the bot react to what the user is saying, making the dialogue more natural and engaging. To react, the bot plays short synthesized or pre-recorded phrases, imitating the behavior of a live conversation partner.
The purpose of active listening is to make communication with a voice bot more human and natural. When the bot inserts confirmation phrases, users feel that they are being heard and understood. This increases their satisfaction and trust in the system and makes the dialogue smoother and more continuous.
Active listening starts when the bot detects that the user has begun speaking. The start can be triggered by a trigger word, a command, or a form of address. From then on, the bot keeps listening throughout the dialogue, tracking pauses and changes in the user's speech so that the interaction stays smooth.
During the conversation, the bot inserts short phrases such as "uh-huh", "yes, I understand", or "go on". Depending on the settings, these phrases are either synthesized or played from a pre-recorded voice. They show the user that the bot is still listening and following the dialogue, which makes the exchange feel like live communication. How it works:
After the bot plays the first confirmation, the following timers start:
You can adjust how often confirmation phrases are played and what they say to suit the context of the dialogue and the goals of the script. For example, a formal scenario may call for less frequent, more neutral phrases, while an informal conversation can use livelier and more frequent confirmations.
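The per-scenario tuning described above could be modeled as a small settings object. The sketch below is a hypothetical Python model, not the platform's configuration format: `ListeningProfile`, `PhrasePicker`, the phrase lists, and the interval values are all illustrative assumptions.

```python
from __future__ import annotations

import random
from dataclasses import dataclass


@dataclass
class ListeningProfile:
    """Hypothetical settings for one scenario style (not the platform's real schema)."""
    name: str
    phrases: list[str]      # confirmation phrases (or labels of recorded clips)
    min_interval_s: float   # assumed minimum gap between two confirmations, in seconds


# Illustrative profiles: a formal script confirms rarely and neutrally,
# an informal one confirms more often and more casually.
FORMAL = ListeningProfile("formal", ["yes", "I understand", "please go on"], min_interval_s=6.0)
INFORMAL = ListeningProfile("informal", ["uh-huh", "yeah", "go on", "right"], min_interval_s=3.0)


class PhrasePicker:
    """Chooses the next confirmation phrase, avoiding an immediate repeat."""

    def __init__(self, profile: ListeningProfile, seed: int | None = None):
        self.profile = profile
        self.rng = random.Random(seed)
        self.last: str | None = None

    def next_phrase(self) -> str:
        # Exclude the previous phrase; fall back to the full list if only one exists.
        candidates = [p for p in self.profile.phrases if p != self.last] or self.profile.phrases
        self.last = self.rng.choice(candidates)
        return self.last


if __name__ == "__main__":
    picker = PhrasePicker(INFORMAL, seed=1)
    print([picker.next_phrase() for _ in range(5)])
```

Skipping an immediate repeat is just one simple way to keep confirmations from sounding mechanical; the interval value is where the "less frequent" versus "more frequent" choice would live in this model.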
The bot does not play confirmation phrases when the user is completely silent. This keeps the bot from sounding unnatural or seeming "stuck". Confirmations are always triggered by recognized speech or by intermediate recognition results.
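The rule that confirmations depend only on recognized speech, combined with a timer between confirmations, can be sketched as a small gate over recognizer events. This is a hypothetical Python illustration, not the platform's implementation: `RecognitionEvent`, `ConfirmationGate`, and `min_interval_s` are assumed names, the single interval stands in for the platform's own timers, and the "no new words, no confirmation" check is an added assumption.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class RecognitionEvent:
    """Hypothetical recognizer event: time in seconds and the text recognized so far.

    An empty text means nothing was recognized (silence).
    """
    t: float
    text: str


class ConfirmationGate:
    """Confirm only on recognized speech, never on silence, and no more often
    than once per `min_interval_s` seconds (a hypothetical stand-in timer)."""

    def __init__(self, min_interval_s: float):
        self.min_interval_s = min_interval_s
        self.last_confirm_t: Optional[float] = None
        self.last_text = ""

    def should_confirm(self, ev: RecognitionEvent) -> bool:
        text = ev.text.strip()
        # Complete silence: nothing was recognized, so there is nothing to react to.
        if not text:
            return False
        # Assumption: no new words since the previous event means no new speech.
        if text == self.last_text:
            return False
        self.last_text = text
        # Timer check: too soon after the previous confirmation.
        if self.last_confirm_t is not None and ev.t - self.last_confirm_t < self.min_interval_s:
            return False
        self.last_confirm_t = ev.t
        return True


if __name__ == "__main__":
    events = [
        RecognitionEvent(0.0, ""),
        RecognitionEvent(1.0, "I would like"),
        RecognitionEvent(2.0, "I would like to change"),
        RecognitionEvent(4.5, "I would like to change my plan"),
        RecognitionEvent(6.0, "I would like to change my plan"),
        RecognitionEvent(9.0, ""),
    ]
    gate = ConfirmationGate(min_interval_s=3.0)
    print([gate.should_confirm(e) for e in events])
    # → [False, True, False, True, False, False]
```

In this sketch, the silent events at 0.0 s and 9.0 s never produce a confirmation, and the event at 2.0 s is suppressed only because the timer has not yet elapsed.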
Active listening is available in the following script blocks:
For a more natural, higher-quality dialogue between the voice bot and the user, pre-recorded phrases are recommended over speech synthesis, because:
- synthesized speech can be too loud;
- synthesis cannot reproduce interjections and other short emotional reactions, such as "uh-huh", which are important for keeping a conversation going.
As a result, synthesized responses may sound unnatural, so it is preferable to use pre-recorded phrases.
Active listening also works when interruptions are enabled, which makes the bot even more interactive and lets it respond to changes in the dialogue in real time.