Voicing is the process of giving a bot a voice. There are two types of voicing: speech synthesis and audio recordings.
You can select the voicing type when creating or editing a script in the editor. To do this, select the audio type under Speech in the Starting block settings.
Speech synthesis is an ML-based speech technology used to automatically convert text into sound files that mimic human speech. When this type of voicing is selected, the text of phrases written in script blocks will be converted into speech.
For details on how to work with speech synthesis, see Speech Synthesis.
Voicing with audio recordings involves preparing pre-recorded phrases and linking them to the appropriate blocks in the script. This type of voicing helps give the client the feeling of a live conversation with a person.
Audio recording does not require special equipment, because recordings are heavily compressed when played over a telephone connection. A headset with a built-in microphone and a phone or computer with audio recording software is enough.
For smartphones: standard voice recorder.
For computers and laptops: GoldWave (Linux/macOS/Windows), Sound Forge Pro (macOS/Windows).
Online services for audio recording and editing: Online Voice Recorder, Voice Audio Recorder.
When choosing a recording program, make sure that it supports the .wav format, which is the format the script editor recognizes. The files must meet the following requirements:
File format: WAV.
Channels: Mono.
Sampling rate: 16 kHz (16,000 Hz).
Bit depth: 16 bits.
If the audio files have a different format, convert them to the required format using an online converter.
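The file requirements above can also be checked locally before uploading. The following is a minimal sketch in Python that uses only the standard-library wave module; the function name check_wav and the constant names are illustrative assumptions and are not part of the script editor.

```python
import wave

# Requirements taken from the list above: WAV, mono, 16 kHz, 16-bit.
REQUIRED_CHANNELS = 1
REQUIRED_SAMPLE_RATE = 16_000   # Hz
REQUIRED_SAMPLE_WIDTH = 2       # bytes per sample, i.e. 16 bits


def check_wav(path):
    """Return a list of problems; an empty list means the file conforms.

    check_wav is a hypothetical helper written for this example.
    """
    try:
        with wave.open(str(path), "rb") as wav:
            channels = wav.getnchannels()
            rate = wav.getframerate()
            width = wav.getsampwidth()
    except (wave.Error, EOFError) as exc:
        # The wave module only reads uncompressed PCM WAV files.
        return [f"not a readable PCM WAV file: {exc}"]

    problems = []
    if channels != REQUIRED_CHANNELS:
        problems.append(f"channels: {channels}, expected {REQUIRED_CHANNELS} (mono)")
    if rate != REQUIRED_SAMPLE_RATE:
        problems.append(f"sampling rate: {rate} Hz, expected {REQUIRED_SAMPLE_RATE} Hz")
    if width != REQUIRED_SAMPLE_WIDTH:
        problems.append(f"bit depth: {width * 8} bits, expected {REQUIRED_SAMPLE_WIDTH * 8} bits")
    return problems
```

For example, calling check_wav("greeting.wav") (a hypothetical file name) returns an empty list for a conforming file, and otherwise a list with one message per parameter that needs to be converted.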
Before recording, export all the phrases from the script blocks and study them. Record them branch by branch, following the logic of the script: this will make it easier to get into character.
Record audio in a quiet room, with no extraneous sounds or echoes.
To improve the quality of your recording and eliminate unwanted noise, use Enhance speech from Adobe.
Record all phrases in one session. This keeps the emotional tone and dynamics of the voice consistent. Allow enough time for this: if you are doing it for the first time, set aside at least three hours.
Record while standing or sitting with good posture so that your voice sounds clear and intelligible. Before recording, you can warm up with a few articulation exercises or tongue twisters.
When recording, hold the microphone 15-20 centimetres in front of your mouth, or to the side, as with a headset. If you bring the microphone too close to your mouth, hissing, breathing and similar sounds will be picked up in the audio.
Do not read the text word for word: this is always audible. Grasp the essence of the phrase, then just glance over it while recording, allowing small deviations. This will help you sound like a live operator.
Put clear emphasis on dates, names, times and other important points that the client needs to hear correctly. You can even pronounce them syllable by syllable or repeat them. This saves you from later adding blocks with repeated phrases throughout the script for clients who did not hear them the first time.
Don't neglect emotion:
Gesticulate. This will help you place logical pauses and emphasis in your speech.
Smile or frown while recording, depending on the context of the phrase. This will give your voice the right pitch.
Adjust the speed of your speech. The answer to a complex question, for example, should be spoken slowly, as if you were thinking it over. A phrase such as ‘When can I call you back?’ is better said quickly and clearly, so that the client has time to hear you before hanging up.
Use props while recording: keyboard sounds, the rustling of documents and the like can create the feeling of real communication with an operator.