March 9, 2025
O. Wolfson
Coqui TTS is an open-source text-to-speech (TTS) system that lets you generate speech from text. You can also train it on recordings of your own voice to create a personalized TTS model. This guide covers installing Coqui TTS on a Mac, running a first test, training a model on your own voice, generating speech from it, and serving it through a local API.
Ensure you have the following installed: Homebrew, ffmpeg, and Python 3.

Install Homebrew if you don't have it yet:

```sh
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
```

Then install ffmpeg:

```sh
brew install ffmpeg
```
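You can verify both installs before moving on:

```sh
brew --version
ffmpeg -version
```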
Using a virtual environment ensures that all packages are installed in an isolated directory, preventing conflicts with system-wide dependencies.
Create a virtual environment:
```sh
python -m venv coqui-venv
```
Activate the virtual environment:
On macOS/Linux:

```sh
source coqui-venv/bin/activate
```

On Windows:

```sh
coqui-venv\Scripts\activate
```
Once activated, your shell prompt may change, indicating you are inside the virtual environment.
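To double-check, `which python` should resolve inside the environment:

```sh
which python
# Expected: .../coqui-venv/bin/python
```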
Install PyTorch inside the virtual environment:

```sh
pip install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cpu
```
Run the following command to check if Metal (MPS) is available:
```python
import torch
print(torch.backends.mps.is_available())  # Should return True on M1/M2/M3
```
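If this prints False, everything still works on the CPU, just slower. A common PyTorch pattern (not Coqui-specific) is to select the device with a CPU fallback:

```python
import torch

# Prefer Apple's Metal backend when available, otherwise fall back to CPU
device = torch.device("mps" if torch.backends.mps.is_available() else "cpu")
print(f"Using device: {device}")
```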
Install Coqui TTS:

```sh
pip install TTS
```
Check available models:
```sh
tts --list_models
```
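The list is long. Model names embed the language code, so you can narrow it with grep, for example to English models:

```sh
tts --list_models | grep "/en/"
```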
To exit the virtual environment, run:
```sh
deactivate
```
Ensure your virtual environment is activated before running:
```sh
source coqui-venv/bin/activate
```
Run a basic TTS model to check if everything works:
shtts --text "Hello, this is a test of Coqui TTS." --model_name tts_models/en/ljspeech/tacotron2-DDC
This will generate a speech WAV file using a built-in model.
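The CLI writes its output to the current directory (tts_output.wav by default); you can choose the name with --out_path and play it right away:

```sh
tts --text "Hello, this is a test of Coqui TTS." \
    --model_name tts_models/en/ljspeech/tacotron2-DDC \
    --out_path test.wav
afplay test.wav
```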
You need a dataset of short, clean WAV recordings of your voice, plus a metadata.csv that maps each file to its transcript:

```text
/my-dataset/
├── wavs/
│   ├── audio_001.wav
│   ├── audio_002.wav
│   ├── ...
├── metadata.csv
```

metadata.csv uses pipe-separated filename|transcript rows:

```text
audio_001.wav|Hello, this is my voice.
audio_002.wav|I am training my own TTS model.
```
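Before training, it's worth checking that every row of metadata.csv points at a real file. A minimal sanity-check sketch (it assumes the layout above and is not part of Coqui TTS itself):

```python
import csv
from pathlib import Path

dataset = Path("/my-dataset")

# Each metadata.csv row is pipe-separated: filename|transcript
with open(dataset / "metadata.csv", newline="") as f:
    for row in csv.reader(f, delimiter="|"):
        if len(row) < 2:
            print(f"Malformed row: {row}")
        elif not (dataset / "wavs" / row[0]).exists():
            print(f"Missing audio file: {row[0]}")
        elif not row[1].strip():
            print(f"Empty transcript for: {row[0]}")
```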
Run the training script (the dataset location is declared inside the config file; see the sketch below):

```sh
python -m TTS.bin.train_tts --config_path configs/your_config.json
```
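For reference, here is roughly what the dataset section of the config looks like; the field names follow Coqui's BaseDatasetConfig, the ljspeech formatter matches the pipe-separated metadata format above, and the paths are placeholders:

```json
{
  "datasets": [
    {
      "formatter": "ljspeech",
      "meta_file_train": "metadata.csv",
      "path": "/my-dataset/"
    }
  ]
}
```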
For fine-tuning an existing model:
```sh
python -m TTS.bin.train_tts --config_path configs/your_config.json \
    --restore_path path/to/pretrained/model.pth
```
For Apple Silicon Macs, enable GPU acceleration (MPS):
Open your_config.json and change:

```json
"device": "mps"
```
Then start training again:

```sh
python -m TTS.bin.train_tts --config_path configs/your_config.json
```
Ensure your virtual environment is activated before running:
```sh
source coqui-venv/bin/activate
```
Once the model is trained, you can generate speech files:
shtts --text "This is my custom trained voice." \
    --model_path path/to/your/trained_model.pth \
    --config_path path/to/config.json \
    --out_path output.wav
For batch processing multiple sentences, use the Python API:

```python
from TTS.api import TTS

# Load the trained model (pass the checkpoint and its config file)
tts = TTS(model_path="path/to/your/trained_model.pth",
          config_path="path/to/config.json")

# Generate speech and save to a file
tts.tts_to_file(text="Hello, this is my voice.", file_path="output.wav")

# Batch: one WAV per sentence
for i, sentence in enumerate(["First sentence.", "Second sentence."]):
    tts.tts_to_file(text=sentence, file_path=f"output_{i}.wav")
```
To play the audio on Mac:
```sh
afplay output.wav
```
You can turn Coqui TTS into an API to generate speech via HTTP requests.
Install FastAPI and Uvicorn:

```sh
pip install fastapi uvicorn
```
Create server.py:

```python
from fastapi import FastAPI
from TTS.api import TTS

app = FastAPI()

# Load the trained model once at startup
tts = TTS(model_path="path/to/your/trained_model.pth",
          config_path="path/to/config.json")

@app.get("/synthesize/")
async def synthesize(text: str):
    output_file = "output.wav"
    tts.tts_to_file(text=text, file_path=output_file)
    return {"message": "Speech generated", "file": output_file}

if __name__ == "__main__":
    import uvicorn
    uvicorn.run(app, host="0.0.0.0", port=8000)
```
Start the server:

```sh
python server.py
```
shcurl "http://localhost:8000/synthesize/?text=Hello%20world"
This will generate a WAV file and return its path.
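If you'd rather have the endpoint return the audio itself instead of a path, FastAPI's FileResponse can stream the WAV back. A small variant you could add to server.py (it reuses the `app` and `tts` objects defined above):

```python
from fastapi.responses import FileResponse

@app.get("/synthesize-file/")
async def synthesize_file(text: str):
    output_file = "output.wav"
    tts.tts_to_file(text=text, file_path=output_file)
    # Send the WAV itself rather than its path
    return FileResponse(output_file, media_type="audio/wav", filename="output.wav")
```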
If you need cloud deployment, you can run the same API on a cloud GPU instance instead of your Mac.
Coqui TTS allows you to train and run a text-to-speech model on a Mac, including custom voice training. Apple Silicon Macs can leverage MPS acceleration, but if training is too slow, cloud GPUs are an option.
With this setup, you can generate custom TTS audio files, deploy a local API, or even build your own AI voice assistant.