Requesting Auto Captions

This topic explains how to request the generation of captions from audio tracks in your video using the Dynamic Ingest API.

Introduction

Brightcove Auto Captioning is a platform-level service that allows you to automatically generate captions for new or existing videos in 37 languages (see Supported languages), provided you have an audio track for the language specified. Like all speech-to-text services, Auto Captioning is not 100% accurate, but it provides a quick and easy way to generate captions right in Video Cloud.

Video Cloud uses the following process to determine the source that will be used to generate the captions.

  • If the video has a default audio track, that will be used as the captions source file (supported by default in the Media module)
  • If the video has no default audio track but a master/mezzanine file exists, that will be used as the source file (supported by default in the Media module)
  • If the video has no default audio track or master/mezzanine files, but audio tracks are specified in the Dynamic Ingest call, the specified audio track will be used (not yet supported in the Media module)
  • If the video has no default audio track, no master/mezzanine files, and no audio track is specified, captions cannot be generated
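The order of precedence above can be sketched as a small function. This is an illustration only, not Video Cloud's implementation; the function name, its boolean parameters, and the returned labels are hypothetical.

```python
from typing import Optional


def select_caption_source(
    has_default_audio_track: bool,
    has_master: bool,
    audio_track_specified: bool,
) -> Optional[str]:
    """Return a label for the source that would be used, or None if there is none.

    Hypothetical helper that mirrors the documented order of precedence.
    """
    if has_default_audio_track:
        return "default_audio_track"
    if has_master:
        return "master"
    if audio_track_specified:
        return "specified_audio_track"
    return None  # no usable source: captions cannot be generated
```

For example, a video with no default audio track but with a master file would resolve to "master", regardless of whether an audio track is also specified in the call.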

Setup

The setup for Dynamic Ingest requests is the same, whether you are ingesting a video, images, audio tracks, WebVTT files, requesting auto captions, or all of these:

Request URL
https://ingest.api.brightcove.com/v1/accounts/{{account_id}}/videos/{{video_id}}/ingest-requests
        
Authentication
Authentication requires an access token passed as a Bearer token in an Authorization header:
          Authorization: Bearer {access_token}
        

To get access tokens, you will need client credentials (see below). For the process of obtaining an access token, see Get Access Tokens.
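As a minimal sketch, the request URL and Authorization header described above could be assembled in Python as follows. The helper name is hypothetical, and the POST method and Content-Type header are assumptions not stated in this section. The function only builds the request object; it does not send it.

```python
import json
import urllib.request

INGEST_BASE = "https://ingest.api.brightcove.com/v1"


def build_ingest_request(account_id, video_id, access_token, body):
    """Build (but do not send) a Dynamic Ingest request.

    Hypothetical helper; POST and Content-Type are assumptions.
    """
    url = f"{INGEST_BASE}/accounts/{account_id}/videos/{video_id}/ingest-requests"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        method="POST",
        headers={
            "Authorization": f"Bearer {access_token}",
            "Content-Type": "application/json",
        },
    )
```

The returned object can be passed to urllib.request.urlopen() once you have a valid access token and a request body such as those shown under Use cases.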

Note on S3

If your source files will be pulled from a protected S3 bucket, you will need to set a bucket policy to allow Video Cloud to access the files. See Using Dynamic Ingest with S3 for details.

Getting Credentials

To get a client_id and client_secret, you will need to go to the OAuth UI and register this app:

These are the permissions you will need:

Dynamic Ingest Permissions

You can also get your credentials via curl, Postman, or our online app.

If you are getting credentials directly from the API, these are the permissions you need:

[
  "video-cloud/video/all",
  "video-cloud/ingest-profiles/profile/read",
  "video-cloud/ingest-profiles/account/read",
  "video-cloud/upload-urls/read"
]
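If you keep a record of the operations granted to a set of credentials, a quick check like the following can confirm that nothing from the list above is missing. This is a sketch only; the function name is hypothetical, and how you obtain the granted list is up to you.

```python
REQUIRED_OPERATIONS = [
    "video-cloud/video/all",
    "video-cloud/ingest-profiles/profile/read",
    "video-cloud/ingest-profiles/account/read",
    "video-cloud/upload-urls/read",
]


def missing_operations(granted):
    """Return the required operations absent from `granted`, in list order."""
    granted_set = set(granted)
    return [op for op in REQUIRED_OPERATIONS if op not in granted_set]
```

An empty result means the credentials cover every operation listed in this section.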

Use cases

Here are some typical use cases:

Create auto-captions for new ingestions or retranscoding

Request body
{
  "master": {
    "use_archived_master": true
  },
  "profile": "multi-platform-standard-static-with-mp4",
  "transcriptions": [
    {
      "srclang": "en-US",
      "kind": "captions",
      "label": "English",
      "status": "published",
      "default": true
    }
  ],
  "priority": "normal"
}

Create auto-captions when ingesting an audio track

Request body
{
  "audio_tracks": {
    "merge_with_existing": true,
    "masters": [
      {
        "language": "fr-FR",
        "variant": "alternate",
        "url": "https://support.brightcove.com/test-assets//audio/celtic_lullaby.m4a"
      }
    ]
  },
  "transcriptions": [
    {
      "srclang": "fr-FR",
      "kind": "captions",
      "label": "french-FR",
      "status": "published",
      "default": false,
      "input_audio_track": {
        "language": "fr-FR",
        "variant": "alternate"
      }
    }
  ]
}
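In the request body above, the input_audio_track of the transcription has the same language and variant as the audio track being ingested. A small check like the one below can flag transcriptions that reference a track not present in the same request. It is a sketch with a hypothetical function name; a non-empty result is only a hint, since the referenced track may already exist on the video.

```python
def unmatched_input_tracks(body):
    """Return (language, variant) pairs that transcriptions reference
    but that are not among audio_tracks.masters in the same request body."""
    ingested = {
        (master.get("language"), master.get("variant"))
        for master in body.get("audio_tracks", {}).get("masters", [])
    }
    unmatched = []
    for transcription in body.get("transcriptions", []):
        track = transcription.get("input_audio_track")
        if track is None:
            continue  # no explicit track: nothing to cross-check
        key = (track.get("language"), track.get("variant"))
        if key not in ingested:
            unmatched.append(key)
    return unmatched
```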

Create auto-captions for an existing video using the digital master

Request body
{
  "transcriptions": [
    {
      "srclang": "fr-FR",
      "kind": "captions",
      "label": "french-FR",
      "default": false
    }
  ]
}

Create auto-captions for an existing video defining the audio tracks

Request body
{
  "transcriptions": [
    {
      "srclang": "en-US",
      "kind": "captions",
      "label": "english-EN",
      "default": false,
      "input_audio_track": {
        "language": "en-US",
        "variant": "main"
      }
    },
    {
      "srclang": "fr-FR",
      "kind": "captions",
      "label": "french-FR",
      "default": false,
      "input_audio_track": {
        "language": "fr-FR",
        "variant": "alternate"
      }
    }
  ]
}
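When a video has several audio tracks, a request body like the one above can be generated rather than written by hand. The following is a minimal sketch; the function name and the tuple layout of its argument are hypothetical.

```python
def transcriptions_for_tracks(tracks):
    """Build a request body with one captions entry per audio track.

    `tracks` is an iterable of (language, variant, label) tuples.
    """
    return {
        "transcriptions": [
            {
                "srclang": language,
                "kind": "captions",
                "label": label,
                "default": False,
                "input_audio_track": {"language": language, "variant": variant},
            }
            for language, variant, label in tracks
        ]
    }
```

Calling it with [("en-US", "main", "english-EN"), ("fr-FR", "alternate", "french-FR")] yields the same structure as the request body shown above.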

Request body fields for auto captions

The table below shows the request body fields for auto captions.

Fields for Auto Captions and Transcripts
Field Type Required Description
autodetect boolean no true to auto-detect language from audio source. false to use srclang specifying the audio language.
default boolean no If true, srclang will be ignored, and the main audio track will be used - language will be auto-detected.
input_audio_track object no For multiple audio tracks, defines the audio track to extract the captions from. It is composed of language and variant (both required).
kind string no The kind of output to generate. Allowed values:
  • captions
  • transcripts
Notes:
  1. If the kind is transcripts, and the url for the transcripts file is included, a transcript file will be ingested, and no auto captions will be generated. See Ingesting Transcriptions for more details.
  2. If the kind is transcripts, and the url for the transcripts file is not included, a transcript file and captions will be generated.
  3. If the kind is captions, captions will be generated, but not a transcript file.
label string no Human-readable label. Defaults to the BCP-47 style language code.
srclang string no BCP-47 style language code for the text tracks (en-US, fr-FR, es-ES, etc.); see supported languages
status string no Indicates the current state of the captions: published or draft.
url string no The URL where a transcript file is located. Include it only if the kind is transcripts and you are ingesting an existing transcript file (see the notes for kind). Must not be included if the kind is captions.
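Some of the rules in the table can be checked on the client before a request is sent. The sketch below covers only the allowed kind values, the rule that url must not accompany captions, and the boolean type of autodetect and default. The function name and message texts are hypothetical, and the API may enforce further rules not covered here.

```python
ALLOWED_KINDS = {"captions", "transcripts"}


def transcription_problems(entry):
    """Return a list of human-readable problems found in one transcriptions entry."""
    problems = []
    kind = entry.get("kind")
    if kind is not None and kind not in ALLOWED_KINDS:
        problems.append(f"kind must be one of {sorted(ALLOWED_KINDS)}, got {kind!r}")
    if kind == "captions" and "url" in entry:
        problems.append("url must not be included when kind is captions")
    for flag in ("autodetect", "default"):
        if flag in entry and not isinstance(entry[flag], bool):
            problems.append(f"{flag} must be a boolean")
    return problems
```

An empty list means the entry passed these particular checks, not that the request is guaranteed to be accepted.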

input_audio_track fields

input_audio_track Fields
Field Type Required Description
language string yes BCP-47 style language code for the audio track (en-US, fr-FR, es-ES, etc.); see supported languages
variant string yes Specifies the variant to use:
  • main
  • alternate
  • dub
  • commentary
  • descriptive
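Because both fields are required and variant is limited to the values listed, an input_audio_track object is easy to sanity-check locally. This is a minimal sketch with a hypothetical function name; it does not verify that the track actually exists on the video.

```python
ALLOWED_VARIANTS = {"main", "alternate", "dub", "commentary", "descriptive"}


def input_audio_track_problems(track):
    """Return a list of problems found in an input_audio_track object."""
    problems = []
    for field in ("language", "variant"):
        if not track.get(field):
            problems.append(f"{field} is required")
    variant = track.get("variant")
    if variant and variant not in ALLOWED_VARIANTS:
        problems.append(f"unknown variant {variant!r}")
    return problems
```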

Supported languages

Currently, auto captions are limited to the following languages:

  • Australian English (en-AU)
  • Afrikaans (af-ZA)
  • Brazilian Portuguese (pt-BR)
  • British English (en-GB)
  • Canadian French (fr-CA)
  • Danish (da-DK)
  • Dutch (nl-NL)
  • Farsi Persian (fa-IR)
  • French (fr-FR)
  • German (de-DE)
  • Gulf Arabic (ar-AE)
  • Hebrew (he-IL)
  • Indian English (en-IN)
  • Indian Hindi (hi-IN)
  • Indonesian (id-ID)
  • Irish English (en-IE)
  • Italian (it-IT)
  • Japanese (ja-JP)
  • Korean (ko-KR)
  • Malay (ms-MY)
  • Mandarin Chinese, Mainland (zh-CN)
  • Mandarin Chinese, Taiwan (zh-TW)
  • Modern Standard Arabic (ar-SA)
  • New Zealand English (en-NZ)
  • Portuguese (pt-PT)
  • Russian (ru-RU)
  • Scottish English (en-AB)
  • South African English (en-ZA)
  • Spanish (es-ES)
  • Swiss German (de-CH)
  • Tamil (ta-IN)
  • Telugu (te-IN)
  • Thai (th-TH)
  • Turkish (tr-TR)
  • US English (en-US)
  • US Spanish (es-US)
  • Welsh English (en-WL)
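The codes above can be kept in a set to check a srclang value before submitting a request. The sketch below also normalizes casing, so that an input such as "EN-us" maps to the listed form "en-US". The function name is hypothetical, and the set simply copies the list in this section, so it needs updating if the list changes.

```python
SUPPORTED_LANGUAGES = {
    "en-AU", "af-ZA", "pt-BR", "en-GB", "fr-CA", "da-DK", "nl-NL", "fa-IR",
    "fr-FR", "de-DE", "ar-AE", "he-IL", "en-IN", "hi-IN", "id-ID", "en-IE",
    "it-IT", "ja-JP", "ko-KR", "ms-MY", "zh-CN", "zh-TW", "ar-SA", "en-NZ",
    "pt-PT", "ru-RU", "en-AB", "en-ZA", "es-ES", "de-CH", "ta-IN", "te-IN",
    "th-TH", "tr-TR", "en-US", "es-US", "en-WL",
}

# Case-insensitive lookup from any casing to the form used in the list.
_CANONICAL = {code.lower(): code for code in SUPPORTED_LANGUAGES}


def normalize_srclang(code):
    """Return the listed form of `code` if it is supported, otherwise None."""
    return _CANONICAL.get(code.strip().lower())
```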