xai/speech-to-text/v1
xAI's Grok Speech-to-Text — fast, accurate transcription across 25 languages with speaker diarization, word-level timestamps, multichannel audio, and inverse text normalization.
Inference
Commercial use
Streaming
Partner
Input
Accepted file types: mp3, ogg, wav, m4a, aac. Audio can be supplied as an uploaded file or as a URL.
$0.001667 per minute of audio. Duration is rounded up to the next full minute, so 10 s and 50 s are each billed as 1 minute, and 1 min 1 s is billed as 2 minutes.
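The rounding rule can be sketched in a few lines of Python to estimate cost before submitting audio. This is a minimal illustration, not an official billing implementation: the names `billed_minutes` and `estimated_cost` are hypothetical, the per-minute price is taken from the pricing line above, and treating zero-length audio as 0 billed minutes is an assumption, since the page does not say how empty input is charged.

```python
import math
from decimal import Decimal

# Price in USD per billed minute, as stated on the pricing line.
PRICE_PER_MINUTE = Decimal("0.001667")


def billed_minutes(duration_seconds: float) -> int:
    """Round an audio duration up to whole minutes (hypothetical helper)."""
    if duration_seconds < 0:
        raise ValueError("duration_seconds must be non-negative")
    # Assumption: 0 seconds bills as 0 minutes; the source does not cover this case.
    return math.ceil(duration_seconds / 60)


def estimated_cost(duration_seconds: float) -> Decimal:
    """Estimated charge in USD for one audio file (hypothetical helper)."""
    # Decimal keeps the price exact instead of accumulating float error.
    return PRICE_PER_MINUTE * billed_minutes(duration_seconds)


if __name__ == "__main__":
    # The three durations used as examples on the pricing line.
    for seconds in (10, 50, 61):
        print(seconds, billed_minutes(seconds), estimated_cost(seconds))
```

Because every file is rounded up on its own, many short clips cost more than one long recording of the same total length: sixty 10-second clips bill as 60 minutes, while a single 10-minute file bills as 10.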