xai/speech-to-text/v1
xAI's Grok Speech-to-Text — fast, accurate transcription across 25 languages with speaker diarization, word-level timestamps, multichannel audio, and inverse text normalization.
Inference
Commercial use
Streaming
Partner
When a request is deemed to violate xAI's terms, it will still be charged...
Input
Provide an audio file (uploaded, dragged and dropped, or pasted from the clipboard) or a URL. Accepted file types: mp3, ogg, wav, m4a, aac.
Pricing: $0.001667 per minute of audio. Duration is rounded up to the next full minute (e.g., 10s and 50s each count as 1 minute; 1m 1s counts as 2 minutes).
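
The rounding rule can be sketched in a few lines of Python. This is only an illustration of the arithmetic stated above, not an official billing calculator: the function names `billed_minutes` and `estimated_cost` are hypothetical, and rejecting non-positive durations is an assumption, since the page does not say how empty audio is handled.

```python
import math
from decimal import Decimal

# Per-minute price as listed on this page.
PRICE_PER_MINUTE = Decimal("0.001667")


def billed_minutes(duration_seconds: float) -> int:
    """Round an audio duration up to the next full minute."""
    if duration_seconds <= 0:
        # Assumption: the page does not describe billing for empty audio.
        raise ValueError("duration_seconds must be positive")
    return math.ceil(duration_seconds / 60)


def estimated_cost(duration_seconds: float) -> Decimal:
    """Estimated charge in USD for a single transcription request."""
    return PRICE_PER_MINUTE * billed_minutes(duration_seconds)


if __name__ == "__main__":
    # The durations used as examples in the pricing note: 10s, 50s, 1m 1s.
    for seconds in (10, 50, 61):
        print(seconds, billed_minutes(seconds), estimated_cost(seconds))
```

Using `Decimal` keeps the per-minute price exact; whether the provider rounds the final charge further is not stated here.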