Today we release our first self-hosted Auphonic Speech Recognition Engine using the open-source Whisper model by OpenAI!
With Whisper, you can now integrate automatic speech recognition in 99 languages into your Auphonic audio post-production workflow, without creating an external account and at no extra cost!

Whisper Speech Recognition in Auphonic

Until now, Auphonic users had to choose one of our integrated external service providers (Wit.ai, Google Cloud Speech, Amazon Transcribe, Speechmatics) for speech recognition: audio files were transferred to an external server and processed on external computing resources, which users had to pay for through their external accounts.

The new Auphonic Speech Recognition uses Whisper, which OpenAI published as an open-source project: the publicly shared GitHub repository contains the complete Whisper package, including source code, examples, and research results.
However, automatic speech recognition is a very time- and hardware-intensive process that can be extremely slow on a standard home computer without dedicated GPUs. We therefore decided to integrate Whisper ourselves and offer automatic speech recognition (ASR) processed on our own hardware, just like any other Auphonic processing task, which gives you several benefits:

  • No external account is needed anymore to run ASR in Auphonic.
  • Your data doesn't leave our Auphonic servers for ASR processing.
  • No extra costs for external ASR services.
  • Additional Auphonic pre- and post-processing for more accurate ASR, especially for Multitrack Productions.
  • The quality of Whisper ASR is fully comparable to the “best” services in our comparison table.

How to use Whisper?

To use the Auphonic Whisper integration, just create a production or preset as usual and select “Auphonic Whisper ASR” as the “Service” in the Speech Recognition section.
This option automatically appears for Beta and paying users. If you are a free user and want to try Whisper, please just ask for access!

When your Auphonic speech recognition is done, you can download your transcript in various formats and edit or share it with the Auphonic Transcript Editor.
For more details about all our integrated speech recognition services, please visit our Speech Recognition Help, and watch this channel for upcoming Whisper updates.
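
If you automate your workflow with the Auphonic API, the same setting can in principle be applied programmatically. The following Python sketch creates and starts a production via the public API endpoints (productions.json and start.json); however, the speech_recognition fields shown here ("type" and "language") are illustrative assumptions rather than confirmed parameter names, so please check the Speech Recognition Help and the API documentation before relying on them.

```python
# Minimal sketch: create and start an Auphonic production with Whisper ASR.
# The endpoint paths and basic authentication follow the public Auphonic API;
# the "speech_recognition" fields below are illustrative assumptions only.
import requests

API_BASE = "https://auphonic.com/api"
AUTH = ("your_username", "your_password")  # Auphonic account credentials

payload = {
    "metadata": {"title": "My Podcast Episode"},
    "input_file": "https://example.com/episode.mp3",
    # Assumed structure for selecting the built-in Whisper service:
    "speech_recognition": {
        "type": "whisper",   # hypothetical value, check the API docs
        "language": "en",
    },
}

# Create the production and read its UUID from the JSON response.
resp = requests.post(f"{API_BASE}/productions.json", json=payload, auth=AUTH)
resp.raise_for_status()
uuid = resp.json()["data"]["uuid"]

# Start audio processing, including the speech recognition step.
requests.post(f"{API_BASE}/production/{uuid}/start.json", auth=AUTH).raise_for_status()
```

Once processing has finished, the transcript can be downloaded and edited with the Auphonic Transcript Editor just as with productions created in the web interface.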

Why Beta?

We decided to launch Whisper for Beta and paying users only, as Whisper was only published at the end of September and there has not been enough time to test every use case sufficiently.
Another issue is the required computing power: to scale our GPU infrastructure appropriately, we need a beta phase in which we can test the service while monitoring hardware usage, to make sure there are no server overloads.

Conclusion

Automatic speech recognition services are evolving quickly, and we've seen major improvements over the past few years.
With Whisper, we can now perform speech recognition on our own GPU hardware at no extra cost; no external services are required anymore.

Auphonic Whisper ASR is available for Beta and paying users now; free users can ask for Beta access.
You are very welcome to send us feedback (directly in the production interface or via email), whether you notice something that works particularly well or discover any problems.
Your feedback is a great help in improving the system!



