Automatic AD/DV
Automatic AD/DV is an AI-powered pipeline that generates a complete audio description script directly from your video. The system combines automated speech transcription, computer vision-based scene change detection, frame extraction, and a large language model (LLM) to produce a timed AD script that fits within the natural gaps in dialogue. The result is imported into your project as ready-to-edit Event Groups, which you can then assign voices to and render using the built-in Text-to-Speech engine.
This feature requires the desktop version of Closed Caption Creator and an active internet connection. Processing is cloud-based and is billed per minute of video content.
Getting Started​
Create a new project and import your video file. Then go to AI Tools > Automatic AD to open the job submission form.

Dialogue Source​
The first setting is the Dialogue Source — where the system gets the spoken dialogue it uses to identify gaps for your AD events.
| Option | Description |
|---|---|
| Automatic Transcription | The system transcribes the audio itself using your chosen provider |
| Existing Event Group | Use a caption or transcription Event Group already in your project |
If you already have an accurate caption file imported into your project, selecting it as the dialogue source avoids a redundant transcription step and reduces cost. If no caption file exists, select Automatic Transcription and the system will transcribe the audio as part of the job.
When Automatic Transcription is selected, additional settings appear for Transcription Provider and Language. Supported providers include Speechmatics, Deepgram, and Assembly AI. Your provider and language choices are saved automatically for future sessions.
LLM Provider​
Select the large language model service used to generate the AD script. Currently supported: Anthropic (Claude). The LLM is given the dialogue transcript, the list of detected shot changes, and the context you provide below to write scene descriptions that fit naturally between lines of dialogue.
Script Settings​
Minimum Event Duration​
The shortest AD event the system will generate, in seconds. Events shorter than this value are discarded as too brief to be spoken clearly. The default is 2 seconds.
Maximum Event Duration​
The longest AD event the system will generate, in seconds. This caps the length of any individual description. The default is 12 seconds. Adjusting this down produces shorter, more punchy descriptions; adjusting it up allows the system to write longer, more detailed passages where longer gaps exist in the dialogue.
Context Fields​
These fields tell the LLM what your content is about. More detailed context produces more accurate, tonally appropriate descriptions.
Content Description​
A one-to-two sentence summary of the programme. Include the genre, setting, and overall tone. For example: "A family drama set in rural Ireland in the 1980s. The tone is understated and melancholic."
Main Characters​
A comma-separated list of character names as they should appear in the script. The model uses these names when referring to on-screen people, improving consistency and reducing generic descriptions like "a man" or "a woman."
Additional Instructions​
Any further guidance for the LLM about tone, style, or specific requirements. For example: "Follow Netflix audio description guidelines. Describe actions neutrally without interpretation." This field can also be used to exclude certain types of content from description.
Your settings in all three context fields are saved to localStorage and pre-populated the next time you open the form.
Estimated Cost and Submission​
The estimated cost is calculated based on video duration. The minimum charge is 3 credits; above the minimum the cost is 1 credit per minute of video. The estimate is shown near the bottom of the form before you submit.
When you are ready, click Submit AD Job. A confirmation dialog displays the estimated cost; confirm to proceed.
After confirming, the application:
- Extracts audio from your video using FFmpeg
- (If using automatic transcription) Uploads the audio for transcription
- Detects scene changes
- Extracts keyframes at 1 fps and deduplicates near-identical frames
- Registers the job in Firestore and uploads all assets to cloud storage
- Hands off to the cloud pipeline, which runs the LLM and assembles the output CCPRJ file
A progress bar and status message show each step as it completes. When submission finishes, the AD Import Dashboard opens automatically.
Monitoring Jobs in the AD Import Dashboard​
The AD Import Dashboard is your live view of all submitted AD jobs. It refreshes automatically every 10 seconds. Each row shows the project name, submission date, status, progress percentage, and cost.
You can filter jobs by date range (last 24 hours, week, month, year, or custom) and by status (All, Submitted, Processing, In Progress, Passed, Failed).
Once your job reaches 100 % / Passed, select it in the table to activate the import controls.
You can also access the dashboard at any time via AI Tools > AD Import.
Importing Results​
With a completed job selected, click Import to download the output CCPRJ file from cloud storage and add it to your current project. Three new Event Groups are created:
| Event Group | Contents |
|---|---|
| Dialogue | The transcribed (or provided) dialogue, timed to the original audio |
| Shot Changes | One event per detected scene transition |
| AD Script | The generated audio description events, positioned in dialogue gaps |
The project's selected Event Group is automatically updated to the last imported group (the AD Script). Review the events, make any editorial changes, and correct any errors before proceeding to rendering.
Archiving and Deleting Jobs​
To keep your dashboard tidy, select a completed job and click Archive to hide it from the default view, or Delete to remove it permanently (including source files from cloud storage). A confirmation dialog is shown before permanent deletion.
You can also export a CSV summary of all visible jobs for reporting purposes using the Export CSV button.
Rendering the AD Script​
After importing and reviewing the AD Script Event Group, assign a voice to your events using the Voice Manager (AI Tools > Voice Manager). Then select AI Tools > Render All Audio to synthesise speech for every event in the group. Rendering is processed in batches and may take a few moments depending on the total number of events.
Once rendering is complete, review the audio in the player, adjust timing or wording as needed, and use the ADR Preview panel to check the final mix against the dialogue.