Hybrid synthesis lets the robot voice a script from audio recordings while speaking the value of a variable in the same voice as the original audio file. For details on what hybrid synthesis is, along with its limitations and usage recommendations, see the article Hybrid Synthesis.
When recording the audio file, keep in mind that the synthesis matches the narrator's pace: if the narrator speaks too fast or too slowly, the variable values are pronounced at the same speed, which may distort them.
Process the audio file in any audio editor, such as Audacity:
2.1. Remove silence at the beginning and end of the recording. This will help avoid unwanted noise during the synthesis process.
2.2. Identify the fragment of the audio recording to be replaced by the variable value.
2.3. Determine the exact start time and duration of this fragment in milliseconds. To do this, select the fragment and set the measurement mode to Begin and length of selection. You will need these values later to mark up the variable in the script editor.
If the measurement is off by more than about 50 ms, the synthesis may contain distortion and pauses.
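Audio editors often report a selection in seconds or in samples rather than in milliseconds. The sketch below shows the conversion and a check against the roughly 50 ms tolerance mentioned above. The function names and the sample values are hypothetical illustrations and are not part of the platform.

```python
def seconds_to_ms(seconds: float) -> int:
    """Convert a time in seconds to whole milliseconds."""
    return round(seconds * 1000)


def samples_to_ms(samples: int, sample_rate: int) -> int:
    """Convert a sample offset to whole milliseconds for a given sample rate."""
    return round(samples * 1000 / sample_rate)


def within_tolerance(measured_ms: int, actual_ms: int, tolerance_ms: int = 50) -> bool:
    """Return True if the measurement error does not exceed the tolerance."""
    return abs(measured_ms - actual_ms) <= tolerance_ms


# Hypothetical selection: starts at 2.5 s and lasts 1.0 s.
start_ms = seconds_to_ms(2.5)
length_ms = seconds_to_ms(1.0)
print(start_ms, length_ms)
```

Working in whole milliseconds from the start avoids rounding surprises when the values are later typed into the script editor.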
Go to the Bots section in your personal account on the platform.
Open the required script in the script editor.
Click on the starting block. A window with its settings will open on the right.
Under Speech, set Audio mode to Audio Recordings and Voice Type to Hybrid Synthesis.
Create a variable whose value you want to synthesise. For example, the value of the variable can be obtained from the client's response. In this case, do the following:
5.1. Click on the arrow coming out of the corresponding Question block. A window with the arrow properties will open on the right.
5.2. Enter the desired variable name in the Variable name field under Save answer. The name is specified without curly brackets.
Click on the block in which you want to voice the variables. A window with its settings will open on the right.
Write the full transcription of the phrase spoken in this block in the Message field and replace the required word with the variable name in curly brackets. If the transcription is different from the audio recording, the robot will still attempt to voice the text, which may result in inaccurate synthesis. If a word in the transcription is omitted, it will not be voiced.
Turn on the Hybrid synthesis switch further down in the block settings. The Record field will become active.
Click on the Record field and then on the upload icon.
Select the previously prepared audio file and click the Open button. The uploaded audio file will be attached to the block.
Mark up the audio track in Variable management using the values obtained in step 2.3 of Prepare audio file. To do this:
11.1. Specify the start time of the synthesis of the variable value in the Start field. The value is specified relative to the beginning of the audio track in milliseconds.
11.2. Specify the duration of the variable synthesis in milliseconds in the Length field.
When marking up the audio track, leave a 50 ms margin before and after the variable to be synthesised. For example, if the variable in a phrase starts at 2500 ms and is spoken for 1000 ms, enter 2450 and 1050 in the Start and Length fields respectively.
If the text in the block is changed, these parameters will be reset and will need to be re-entered.
If both values are zero, an error will occur when the script is saved.
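The markup arithmetic can be sketched as follows. This is a minimal illustration that reproduces the worked example in the text (2500 ms and 1000 ms become 2450 and 1050): Start moves 50 ms earlier and Length grows by the same amount. The function name, the clamping at zero, and the raised error are assumptions made for illustration; the platform's own validation may behave differently.

```python
MARGIN_MS = 50  # margin taken from the worked example in the text


def markup_with_margin(start_ms: int, length_ms: int,
                       margin_ms: int = MARGIN_MS) -> tuple[int, int]:
    """Return (Start, Length) values adjusted as in the worked example."""
    # Assumption: Start cannot be negative, so clamp it at the track beginning.
    new_start = max(0, start_ms - margin_ms)
    # Length grows by however far Start actually moved.
    new_length = length_ms + (start_ms - new_start)
    # The text says that saving fails when both values are zero.
    if new_start == 0 and new_length == 0:
        raise ValueError("Start and Length must not both be zero")
    return new_start, new_length


print(markup_with_margin(2500, 1000))  # values from the example in the text
```

Because the values are reset whenever the block text changes, keeping the measured start and duration somewhere outside the editor makes it easier to re-enter them.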
If the synthesis sounds unnatural, contains unnecessary pauses or, conversely, runs into the natural speech without any pause, try editing the variable markup in the Hybrid Synthesis - Variable Management section of the settings of the block where hybrid synthesis is used.
Shift the markup in steps of no more than 100 ms, then save the script and listen to how the synthesis behaves.
For fine tuning, you can use offsets of 10-20 ms.
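The tuning loop above can be tracked with a small helper. This is only a sketch: it assumes that the adjustment is applied to the Start value, and the function name and the 100 ms guard are hypothetical conveniences rather than platform behaviour.

```python
MAX_STEP_MS = 100  # largest single shift recommended in the text


def shift_start(start_ms: int, delta_ms: int, max_step_ms: int = MAX_STEP_MS) -> int:
    """Return a new Start value shifted by delta_ms (may be negative)."""
    if abs(delta_ms) > max_step_ms:
        raise ValueError(f"shift by at most {max_step_ms} ms per attempt")
    # Assumption: Start cannot precede the beginning of the audio track.
    return max(0, start_ms + delta_ms)


start = 2450
start = shift_start(start, -100)  # coarse step, then save and listen
start = shift_start(start, 20)    # fine-tuning step
print(start)
```

Changing one value at a time and listening after each save makes it easier to tell which shift improved the result.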