The models handle live reasoning, speech translation across 70+ languages, and real-time transcription via the Realtime API.