Show HN: MiraTTS, a 48kHz Open-Source TTS at 100x Real-Time Speed

I’ve been working on MiraTTS, a fine-tune of Spark-TTS designed for realistic, stable text-to-speech. The goal was to create a model that is very fast but still high quality.

Most open TTS models are either computationally heavy or limited to 16-24 kHz audio. MiraTTS achieves both high fidelity and speed by combining two things:

FlashSR: For generating crisp, clear 48 kHz audio output.

LMDeploy: Heavily optimized inference, allowing for 100x real-time speed and low latency (roughly 150 ms).
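To put those numbers in context, here is a rough back-of-the-envelope sketch in Python. It only does arithmetic on the figures quoted above (100x real-time, 48 kHz, 16 kHz); the function names are made up for illustration and are not part of the MiraTTS API, and actual speed will depend on hardware and batch size.

```python
def compute_seconds(audio_seconds: float, speedup: float) -> float:
    """Wall-clock time needed to generate `audio_seconds` of audio
    at a given real-time speedup (100 means 100x real time)."""
    if speedup <= 0:
        raise ValueError("speedup must be positive")
    return audio_seconds / speedup


def nyquist_hz(sample_rate_hz: float) -> float:
    """Highest frequency a given sample rate can represent."""
    return sample_rate_hz / 2


# Assumed figures, taken from the post: 100x real time, 48 kHz output.
one_minute = compute_seconds(60.0, 100.0)   # about 0.6 s of compute
band_48k = nyquist_hz(48_000)               # 24000.0 Hz
band_16k = nyquist_hz(16_000)               # 8000.0 Hz

print(f"60 s of audio at 100x real time: ~{one_minute:.2f} s of compute")
print(f"48 kHz covers up to {band_48k:.0f} Hz; 16 kHz covers up to {band_16k:.0f} Hz")
```

Note that the ~150 ms latency figure is a separate measurement from throughput: it describes how long you wait before audio is available, not how fast audio is produced overall.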

I built this so that local users have access to a high-quality text-to-speech model that works for any use case. It’s still in its early stages, and I'm experimenting with multilingual and multi-speaker versions. Streaming is coming soon as well.

Repo: https://github.com/ysharma3501/MiraTTS

Model: https://huggingface.co/YatharthS/MiraTTS

I also wrote a breakdown of how these LLM-based TTS models work: https://huggingface.co/blog/YatharthS/llm-tts-models


Comments URL: https://news.ycombinator.com/item?id=46314749
