Voice Properties
Each Audio Description event exposes two voice-related properties that directly affect how the rendered audio sounds: Rate and Voice Style. Used together, they let you fine-tune the delivery of each narration line so that it fits naturally within its timing window and matches the tone of the content being described.
Rate
Rate controls the speaking speed of the synthetic voice for a given event, expressed as a percentage relative to the voice's default tempo. A value of 100% represents the voice's natural speaking pace. Increasing the rate above 100% makes the voice speak faster, while lowering it below 100% slows the delivery down.
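Assuming the percentage behaves like a speed multiplier (100% is 1.0x), a line's length shrinks or grows roughly in inverse proportion to the rate. The sketch below applies that simple inverse relationship. Real voices do not stretch pauses and punctuation perfectly evenly, so treat the result as an estimate. The helper name `estimated_duration` is hypothetical.

```python
def estimated_duration(duration_at_100: float, rate_percent: float) -> float:
    """Estimate a line's length at a given rate (hypothetical helper).

    Assumes speaking time is inversely proportional to rate; pauses and
    punctuation may not scale exactly this way in a real voice.
    """
    if rate_percent <= 0:
        raise ValueError("rate_percent must be positive")
    multiplier = rate_percent / 100.0  # 100% -> 1.0x, the voice's natural pace
    return duration_at_100 / multiplier


# A line that takes 6.0 s at 100%, rendered faster and slower.
print(round(estimated_duration(6.0, 120), 2))  # 5.0
print(round(estimated_duration(6.0, 80), 2))   # 7.5
```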
Rate is especially useful when the rendered audio runs longer than the event window. After the initial render, a red error indicator on an event means its audio does not fit. Rather than always extending the out-time, try increasing the rate modestly: a few percentage points is often enough to bring a slightly overlong line within the window without making it sound rushed. Conversely, if a window is unusually long relative to its description, lowering the rate slightly can make the delivery sound more natural and keep the narration from ending early and leaving an awkward silence.
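To estimate how large a "modest" increase needs to be, you can work backwards from the rendered duration and the window length. This is a minimal sketch assuming the same inverse relationship between rate and duration. The helper `rate_to_fit` is hypothetical, and its result is a starting value to try before force-rendering, not a guarantee.

```python
import math


def rate_to_fit(audio_seconds: float, window_seconds: float,
                current_rate: float = 100.0) -> int:
    """Smallest whole-percent rate expected to fit the audio in its window.

    Hypothetical helper. Assumes duration is inversely proportional to
    rate, so the result is an estimate to try, not a guarantee.
    """
    if audio_seconds <= 0 or window_seconds <= 0:
        raise ValueError("durations must be positive")
    exact = current_rate * audio_seconds / window_seconds
    # Round up so the estimated duration does not exceed the window.
    return math.ceil(exact)


# A line rendered at 100% runs 5.25 s in a 5.0 s window.
print(rate_to_fit(5.25, 5.0))  # 105
```

When the window is longer than the audio, the same formula returns a rate below 100%. In that case a slight reduction is usually preferable to slowing the line enough to fill the whole window.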
The rate slider is located within the event card in the Event List. It applies only to the specific event where it is set, allowing different events within the same group to have independent speaking speeds. After adjusting the rate, click the force-render button on the event to regenerate the audio at the new speed.
Automatic Rate
For text-to-speech voices, each AD event also includes an automatic rate button marked with a speedometer icon. Use it when you want Closed Caption Creator to calculate a speaking rate that fits the rendered narration inside the event's current start and end times.
Automatic Rate is available when the event has a supported TTS voice, valid timing, and description text; it is not used for manual recordings. If the event has not been rendered yet, Closed Caption Creator renders it first. It then compares the audio duration with the event duration, adjusts the event's rate, and renders again. The process can retry up to two times, and it stops when the rendered audio is within about half a second of the event window or when the provider's rate limit is reached.
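The render, compare, adjust loop described above can be sketched as follows. This is a hypothetical illustration, not Closed Caption Creator's code: `render(rate)` is a stand-in that returns the audio length in seconds, rates are multipliers, the new rate is estimated by assuming duration is inversely proportional to rate, and "retry up to two times" is read as one adjustment plus up to two further attempts.

```python
from typing import Callable


def auto_rate(render: Callable[[float], float], window: float,
              rate: float = 1.0, min_rate: float = 0.5, max_rate: float = 3.0,
              tolerance: float = 0.5, max_retries: int = 2) -> float:
    """Sketch of a fit-to-window rate loop (hypothetical, not the product's code).

    `render(rate)` stands in for a TTS render and returns the audio length in
    seconds. Rates are multipliers (1.0 = natural pace).
    """
    if window <= 0:
        raise ValueError("event must have valid timing")
    duration = render(rate)  # stands in for the existing or first render
    # One adjustment plus up to `max_retries` further attempts (assumed reading).
    for _ in range(1 + max_retries):
        if abs(duration - window) <= tolerance:
            break  # close enough to the event window
        target = rate * duration / window  # assumes duration ~ 1 / rate
        new_rate = min(max_rate, max(min_rate, target))
        if new_rate == rate:
            break  # already pinned at the provider limit
        rate = new_rate
        duration = render(rate)
    return rate


# Fake renderer: a line that lasts 6.0 s at 1.0x, fitted to a 5.0 s window.
print(round(auto_rate(lambda r: 6.0 / r, window=5.0), 2))  # 1.2
```

Stopping as soon as the clamped rate stops changing mirrors the "rate limit is reached" condition: once a provider bound is hit, further renders would only repeat the same result.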
Provider limits are respected automatically. ElevenLabs voices are limited to 0.7x–1.2x, Deepgram voices are limited to 0.5x–1.5x for automatic rate, and other providers are limited to 0.5x–3x. The operation is recorded in the undo history so you can revert it if the result does not fit the creative intent.
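The provider limits amount to a clamp on the requested multiplier. In this small sketch the ranges come from the paragraph above, while the provider keys and the `clamp_rate` helper are hypothetical names chosen for illustration.

```python
# Hypothetical provider keys; limits copied from the text above (multipliers).
RATE_LIMITS = {
    "elevenlabs": (0.7, 1.2),
    "deepgram": (0.5, 1.5),  # applies to automatic rate
}
DEFAULT_LIMITS = (0.5, 3.0)  # all other providers


def clamp_rate(provider: str, rate: float) -> float:
    """Clamp a requested rate multiplier to the provider's allowed range."""
    low, high = RATE_LIMITS.get(provider.lower(), DEFAULT_LIMITS)
    return min(high, max(low, rate))


print(clamp_rate("ElevenLabs", 1.6))  # 1.2
print(clamp_rate("Deepgram", 0.3))    # 0.5
print(clamp_rate("Azure", 2.5))       # 2.5
```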
Voice Style
Voice Style is an expressive modifier available on supported Microsoft Azure Neural voices. A standard synthetic voice delivers text in a neutral, consistent tone; a voice style shapes how the narration is performed. Available styles vary by voice but typically include options such as cheerful, empathetic, newscast, or documentary. Not all Microsoft voices support every style, and Amazon and Google voices do not currently support styles.
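For reference, Azure's Speech service itself selects a style through the SSML `mstts:express-as` element. The sketch below builds such a document in Python. It only illustrates the service-level markup and is not a description of how Closed Caption Creator communicates with Azure. The voice name `en-US-AriaNeural` is an example; confirm which styles a given voice supports in Azure's voice list.

```python
from xml.sax.saxutils import escape, quoteattr


def styled_ssml(text: str, voice: str, style: str, lang: str = "en-US") -> str:
    """Wrap narration text in Azure SSML with an express-as style."""
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" '
        f"xml:lang={quoteattr(lang)}>"
        f"<voice name={quoteattr(voice)}>"
        # The style only changes delivery; the spoken text is unchanged.
        f"<mstts:express-as style={quoteattr(style)}>{escape(text)}</mstts:express-as>"
        "</voice></speak>"
    )


print(styled_ssml("Sam & Lee cross the street.", "en-US-AriaNeural", "cheerful"))
```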
In the Virtual Voice Manager, available styles are shown as tags beneath the voice name. Once you have assigned a Microsoft voice to an event, the style options for that voice appear as a dropdown menu in the event editor, above the notes field. Selecting a style does not change the stored audio; the event must be re-rendered before the style takes effect. The render button flashes green after a style change to indicate that a new render is needed.
Because voice style is stored per event, different events within the same AD group can use different styles. This is useful when a project includes both a straightforward narration track and moments where a warmer or more dramatic delivery suits the scene better.
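Because a style is stored per event and only takes effect on re-render, each event effectively tracks two things: the style currently selected and the style its stored audio was rendered with. The dataclass below is a hypothetical model of that bookkeeping (the names `AdEvent`, `rendered_style`, and `needs_render` are invented for illustration). It shows why changing one event's style flags only that event for re-rendering.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class AdEvent:
    """Hypothetical model of one AD event's style state."""
    text: str
    voice: str
    style: Optional[str] = None           # style currently selected in the editor
    rendered_style: Optional[str] = None  # style baked into the stored audio
    rendered: bool = False

    @property
    def needs_render(self) -> bool:
        # Mirrors the flashing render button: the selection differs from the audio.
        return not self.rendered or self.style != self.rendered_style

    def render(self) -> None:
        # Stand-in for a real TTS render; only records which style was used.
        self.rendered_style = self.style
        self.rendered = True


intro = AdEvent("A quiet street at dawn.", "en-US-AriaNeural")
party = AdEvent("Balloons fill the room.", "en-US-AriaNeural")
intro.render()
party.render()
party.style = "cheerful"  # change one event only
print(intro.needs_render, party.needs_render)  # False True
party.render()
print(party.needs_render)  # False
```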