Case Study - Transcription and Translation

Jordan is starting to work with indigenous elders around the world.

Jordan had a meeting with elders from Sweden and South America, in English.

They then needed to communicate the information to elders in Spanish and Hindi.

Jordan asked Pete, who suggested using MacWhisper, using at least the Large model: https://goodsnooze.gumroad.com/l/macwhisper

Jordan downloaded the Large (V3) model.

It took about 7 minutes to transcribe an hour-long meeting.

Find and Replace in Transcript

Jordan noticed that there were common transcription errors repeated throughout, such as the misspelling of certain names spoken with accents.

Need: Rapid find and replace to remedy transcription errors.

After the transcription was done in English, Jordan searched around and found a button near the top with two arrows, and the description "Find and Replace Strings in the Document". Trying this, it was exactly what was needed.
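For anyone working with an exported transcript outside MacWhisper, the same kind of batch correction can be sketched in a few lines of Python. This is only an illustration: the `CORRECTIONS` entries and the `apply_corrections` helper are made-up examples, not names from the actual meeting or part of MacWhisper.

```python
import re

# Hypothetical corrections: mis-transcribed spelling -> intended spelling.
CORRECTIONS = {
    "Jordon": "Jordan",
    "Mac Whisper": "MacWhisper",
}

def apply_corrections(text: str, corrections: dict[str, str]) -> str:
    """Replace every whole-word occurrence of each known mis-transcription."""
    for wrong, right in corrections.items():
        # \b stops "Jordon" from matching inside a longer word.
        pattern = re.compile(r"\b" + re.escape(wrong) + r"\b")
        # A function replacement keeps `right` literal (no backslash escapes).
        text = pattern.sub(lambda match, right=right: right, text)
    return text

sample = "Jordon opened Mac Whisper. Jordon's notes were ready."
print(apply_corrections(sample, CORRECTIONS))
```

Keeping the corrections in one dictionary means the same list can be reused on the next meeting's transcript.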

Transcript Text (English) to Google Docs

Jordan tried the copy button and pasted into Google Docs - all the text ran together with no line breaks.

Jordan tried exporting to a text file. Same result.

Jordan manually highlighted in MacWhisper and copied, and manually pasted in Google Docs - that worked.

Translation to Spanish

Several team members speak primarily Spanish. Jordan noticed a Translate button up top in MacWhisper.

Clicking the button showed that MacWhisper uses DeepL for translation.

Rather than using the API, Jordan decided to copy and paste into the DeepL web translator. Only 1,500 of the 46,000 characters were translated.

Starter package for 5 translations a month on web is $9.

API Pro is $5.50 / month, plus $25 / million characters. Assuming 50k characters / meeting, that is $25 / 20 meetings, or $1.25 / meeting / language, before the monthly fee. Amazing.
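The arithmetic can be checked with a short sketch. The prices and the 50k-characters-per-meeting figure are the ones quoted in these notes; the function names and the example of 20 meetings in one language are assumptions for illustration only.

```python
def usage_cost(characters: int, usd_per_million: float) -> float:
    """Usage charge in USD for translating `characters` characters."""
    return characters * usd_per_million / 1_000_000

def monthly_cost(meetings: int, languages: int, chars_per_meeting: int,
                 usd_per_million: float, monthly_fee: float) -> float:
    """Monthly fee plus usage for every meeting translated into every language."""
    total_chars = meetings * languages * chars_per_meeting
    return monthly_fee + usage_cost(total_chars, usd_per_million)

# Figures from the notes: 50k characters per meeting, $25 per million characters.
per_meeting = usage_cost(50_000, 25.0)
print(f"${per_meeting:.2f} per meeting per language")

# Hypothetical month: 20 meetings, one target language, $5.50 monthly fee.
print(f"${monthly_cost(20, 1, 50_000, 25.0, 5.50):.2f} for the month")
```

Adding a second target language doubles the usage part of the bill but not the monthly fee.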

Riverside FM...

Jordan is already paying for Riverside FM for podcasting... so...

Jordan uploaded video to new Studio called Jordan's Meetings.

  • A few min to upload the video file on a fast Mac with good internet.
  • A few min of Riverside "processing"
  • Once done processing, clicked into the recording, and the transcription was already automatically occurring
  • A button suggested I "Generate Magic Audio" into a polished professional sound using the latest AI... so - I pushed it... and it started a spinning wheel around "High Quality"
  • Doing some research on Riverside Transcription, I found the following: "Riverside is a high-quality remote recording platform. Although, beyond 4K video resolution it offers highly accurate AI transcriptions. Transcriptions are available in over 100 languages and you can download them instantly after recording. We use Open AI's transcription software so you can expect 99% accuracy."
  • Also noted new AI features such as: Full Episode: With one click AI will clean up your episode, add captions, and get it ready to share. Looks super cool.
  • Also Magic Clips - Powered by AI. Instantly get clips of the best moments from your recording.
  • Also - "Let AI write your show notes"
  • After ~20 minutes, transcription complete. Slower than MacWhisper.
  • Jordan looked for the ability to translate. Couldn't find it.
  • Reached out to Riverside FM help, got an agent on chat within 10 minutes.
  • Agent confirmed that feature doesn't currently exist.
  • Agent suggested downloading the SRT files from the bottom right of the user interface.
  • SRT files are the format of subtitles and captions for video editors.
  • So - back to square one.
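Since an SRT file is just numbered cues with timing lines, the spoken text can be pulled back out with a small script. The following is a minimal sketch, assuming a well-formed SRT file; the `srt_to_text` helper and the sample cues are invented for illustration and are not a Riverside feature.

```python
import re

# An SRT timing line looks like: 00:00:01,000 --> 00:00:04,000
TIMING = re.compile(r"^\d{2}:\d{2}:\d{2},\d{3} --> \d{2}:\d{2}:\d{2},\d{3}")

def srt_to_text(srt: str) -> str:
    """Return only the spoken text of an SRT file, one cue per line."""
    lines = []
    # Cues are separated by blank lines.
    for block in re.split(r"\n\s*\n", srt.strip()):
        rows = [row.strip() for row in block.splitlines() if row.strip()]
        # Drop the leading cue number, then the timing line, if present.
        if rows and rows[0].isdigit():
            rows = rows[1:]
        if rows and TIMING.match(rows[0]):
            rows = rows[1:]
        if rows:
            lines.append(" ".join(rows))
    return "\n".join(lines)

# Made-up sample cues.
SAMPLE_SRT = """1
00:00:01,000 --> 00:00:04,000
Welcome, everyone.

2
00:00:04,500 --> 00:00:08,000
Thank you for
joining us today.
"""

print(srt_to_text(SAMPLE_SRT))
```

Each cue ends up on its own line, so the result keeps line breaks when pasted into a document.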

Descript

  • Jordan also pays for Descript as part of podcasting workflow
  • May no longer be required?
  • Descript recommends transcribing
  • Then translating using Google Translate (Free) or hiring someone.

Google Translate vs. DeepL

  • https://www.weglot.com/guides/deepl-vs-google-translate
  • Multiple websites seem to regard DeepL as superior to Google Translate.
  • Plus, it is already integrated into MacWhisper as the sole option, and since MacWhisper seems pretty thoughtful, I am guessing they did a reasonable amount of research to make the choice.
  • So - for now my workflow will utilize MacWhisper and DeepL

DeepL

  • DeepL free is too limited
  • Starter is $9 / month, limited to 5 files a month, and billed annually
  • DeepL API is $5 / month plus usage, cancelable monthly, with no usage restrictions
  • DeepL API Free allows up to 500,000 characters... upgradable to Pro at any time
  • So - I am going with MacWhisper's suggestion and using DeepL free to start, then upgrading to Pro if required
  • DeepL API Free - requires credit card to verify identity
  • Obtained API Key from Account Page
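MacWhisper handles the API call itself once it has the key, but for reference, a direct request to the DeepL API can be sketched with the Python standard library. This assumes a DeepL API Free key (Free keys use the `api-free.deepl.com` host); the helper names are illustrative, and how MacWhisper calls DeepL internally is not something these notes cover.

```python
import json
import urllib.request

# Endpoint for DeepL API Free keys; Pro keys use api.deepl.com instead.
DEEPL_FREE_URL = "https://api-free.deepl.com/v2/translate"

def build_translate_request(api_key: str, texts: list[str],
                            target_lang: str) -> urllib.request.Request:
    """Build (but do not send) a translate request for the given texts."""
    body = json.dumps({"text": texts, "target_lang": target_lang}).encode("utf-8")
    return urllib.request.Request(
        DEEPL_FREE_URL,
        data=body,
        headers={
            "Authorization": f"DeepL-Auth-Key {api_key}",
            "Content-Type": "application/json",
        },
        method="POST",
    )

def translate(api_key: str, texts: list[str], target_lang: str = "ES") -> list[str]:
    """Send the request and return the translated strings, in order."""
    request = build_translate_request(api_key, texts, target_lang)
    with urllib.request.urlopen(request) as response:
        payload = json.load(response)
    return [item["text"] for item in payload["translations"]]
```

With a real key, `translate(key, ["Welcome, everyone."], "ES")` would send one short string for Spanish translation and count its characters against the monthly allowance.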

Conditions of your plan

Your subscription to DeepL API Free is completely free.

You can translate up to 500,000 characters per month.

You can upgrade to DeepL API Pro at any time.

Features of your plan

Access to DeepL REST API

DeepL translation quality

Back to MacWhisper

  • Pasted API Key from DeepL "Account Page" into MacWhisper
  • It read out accurately that 500k of 500k characters were remaining
  • Translated to Spanish in less than 10 seconds.
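The remaining-characters readout can also be checked outside MacWhisper, since the DeepL API has a usage endpoint that reports the character count and limit. A minimal sketch, again assuming a DeepL API Free key; the helper names and the sample payload are illustrative.

```python
import json
import urllib.request

# Usage endpoint for DeepL API Free keys; Pro keys use api.deepl.com instead.
DEEPL_FREE_USAGE_URL = "https://api-free.deepl.com/v2/usage"

def fetch_usage(api_key: str) -> dict:
    """Ask DeepL how many characters have been used this billing period."""
    request = urllib.request.Request(
        DEEPL_FREE_USAGE_URL,
        headers={"Authorization": f"DeepL-Auth-Key {api_key}"},
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)

def describe_usage(usage: dict) -> str:
    """Format a usage payload as 'X of Y characters remaining'."""
    remaining = usage["character_limit"] - usage["character_count"]
    return f"{remaining:,} of {usage['character_limit']:,} characters remaining"

# Illustrative payload: a fresh Free plan after one 46,000-character transcript.
print(describe_usage({"character_count": 46_000, "character_limit": 500_000}))
```

At roughly 46,000 characters per meeting, the Free plan's 500,000 characters cover about ten single-language translations a month.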

Things to Check Out / Notes From Pete and Jordan

  • When you export text - export to VTT or SRT
    • These preserve timecodes
    • Clumsy, but still text format so if you had to you could read it.
    • There are converter programs that will strip out time codes
  • VTT or SRT will be read by YouTube and things like that...
  • As well as by a video player called VLC, which is free, high quality, and useful, and can put closed captions on the screen.
  • Can we provide simple instructions so that someone in Mexico can take a video meeting in English, and sync up the Spanish translation with time codes or even closed captions?
    • If you export to VTT, save with same file name, different extension
    • Instructions:
      • Go to Google Drive link
      • Download the video file and your preferred transcript language
        • Or can you have all in a package?
  • What are rules on Google for when YouTube transcribes and when it doesn't
    • and how it translates, and into what languages?
    • Can we upload multiple transcripts in different languages to Google
      • and then instruct people how to select?
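If only an SRT file is available, converting it to VTT and naming it after the video is a small job. The sketch below assumes a well-formed SRT file: WebVTT needs a `WEBVTT` header line and uses a period instead of a comma in timestamps. The helper names and file names are hypothetical.

```python
from pathlib import Path

def srt_to_vtt(srt: str) -> str:
    """Convert SRT text to WebVTT: add the header, use '.' in timestamps."""
    converted = []
    for line in srt.strip().splitlines():
        # Only timing lines contain "-->"; leave spoken text untouched.
        if "-->" in line:
            line = line.replace(",", ".")
        converted.append(line)
    return "WEBVTT\n\n" + "\n".join(converted) + "\n"

def sidecar_path(video: Path) -> Path:
    """Same file name as the video, with a .vtt extension."""
    return video.with_suffix(".vtt")

# Made-up sample cue and file name.
sample = "1\n00:00:01,000 --> 00:00:04,000\nWelcome, everyone.\n"
print(srt_to_vtt(sample))
print(sidecar_path(Path("meeting.mp4")))
```

Because the commas in the spoken text are left alone, only the timing lines change between the two formats.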

Next Steps

  • Maybe Jordan and Pete will figure out the workflow from a Zoom or Riverside FM recording to transcripts in multiple languages, uploaded to YouTube or another platform that enables viewers to display closed captions from the translated transcript in their own language

Spirit

