Whisper

Additional information

Licence	Commercial
Model	Audio
Unit/Currency	Tokens

OpenAI

Whisper is an automatic speech recognition (ASR) system that transcribes speech in multiple languages and translates it into English. You can use it to turn conversations and meetings into text for future reference. If you need to communicate with someone who speaks a different language, Whisper can also translate speech from many languages into English.

Whisper is a general-purpose speech recognition model trained on a large dataset of diverse audio. It is a multitask model that can perform multilingual speech recognition, speech translation, and language identification. The speech-to-text API has two endpoints: transcriptions and translations. File uploads are currently limited to 25 MB, and the following input file types are supported: mp3, mp4, mpeg, mpga, m4a, wav, and webm.
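
The upload limits described above can be checked locally before a file is sent. The following is a minimal sketch in Python, not part of any official client: the function name check_upload and the constants are hypothetical, and "25 MB" is assumed to mean 25 * 1024 * 1024 bytes, which the description does not specify.

```python
from pathlib import Path

# Values taken from the product description. The byte interpretation of
# "25 MB" is an assumption, not a documented figure.
MAX_UPLOAD_BYTES = 25 * 1024 * 1024
SUPPORTED_EXTENSIONS = {"mp3", "mp4", "mpeg", "mpga", "m4a", "wav", "webm"}
ENDPOINTS = {"transcriptions", "translations"}


def check_upload(path, size_bytes, endpoint="transcriptions"):
    """Return a list of problems; an empty list means the file looks acceptable.

    Hypothetical helper for illustration only. It inspects the file name and
    a caller-supplied size, and never contacts the API.
    """
    problems = []
    if endpoint not in ENDPOINTS:
        problems.append(f"unknown endpoint: {endpoint}")
    # Compare the extension case-insensitively, without the leading dot.
    ext = Path(path).suffix.lower().lstrip(".")
    if ext not in SUPPORTED_EXTENSIONS:
        problems.append(f"unsupported file type: {ext or '(none)'}")
    if size_bytes > MAX_UPLOAD_BYTES:
        problems.append(f"file too large: {size_bytes} bytes")
    return problems
```

For a file on disk, the size could be read with Path(path).stat().st_size and passed as size_bytes. Files that exceed the limit would need to be compressed or split before upload.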

0.006

Related products

text-embedding-ada-002

Babbage Instruct model

GPT-4 32K context