2025-03-09 Programming, Technology, Productivity
How to Download, Train, and Run Coqui TTS on a Mac (Text-to-Speech with Custom Voice)
By O. Wolfson
Coqui TTS is an open-source text-to-speech (TTS) system that allows you to generate speech from text. You can also train it with your own voice to create a personalized TTS model. This guide covers:
- Installing Coqui TTS on a Mac (Apple Silicon and Intel)
- Running pre-trained models
- Training a custom voice model
- Generating speech from text locally
1. Install Coqui TTS on a Mac
1.1 Prerequisites
Ensure you have the following installed:
- Python 3.9 to 3.11 for recent TTS releases (3.10 recommended)
- Homebrew (for package management)
- ffmpeg (for audio processing)
- PyTorch with Metal support (for Apple Silicon GPUs)
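A quick way to see which of these are already present is a short standard-library check. This is only a sketch: the function name `check_prerequisites` and the minimum Python version are assumptions made for illustration, and the check only confirms that the `brew` and `ffmpeg` executables are on your `PATH`, not that they work.

```python
import shutil
import sys


def check_prerequisites(tools=("brew", "ffmpeg"), min_python=(3, 9)):
    """Return a dict mapping each prerequisite name to True (found) or False."""
    # min_python is an assumed floor; set it to whatever your TTS release requires.
    report = {"python": sys.version_info[:2] >= min_python}
    for tool in tools:
        # shutil.which returns None when the executable is not on PATH.
        report[tool] = shutil.which(tool) is not None
    return report


for name, ok in check_prerequisites().items():
    print(f"{name}: {'found' if ok else 'MISSING'}")
```

Anything reported as MISSING can be installed with the steps that follow.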
Step 1: Install Homebrew (if not installed)
/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"
Step 2: Install Dependencies
brew install ffmpeg
Step 3: Set Up a Virtual Environment
Using a virtual environment ensures that all packages are installed in an isolated directory, preventing conflicts with system-wide dependencies.
- Create a virtual environment (on macOS the interpreter is usually invoked as python3):
python3 -m venv coqui-venv
- Activate the virtual environment:
  - On macOS/Linux:
  source coqui-venv/bin/activate
  - On Windows (if applicable):
  coqui-venv\Scripts\activate
Once activated, your shell prompt may change, indicating you are inside the virtual environment.
Step 4: Install PyTorch (for Apple Silicon Macs)
pip install torch torchvision torchaudio
Step 5: Verify GPU Support (Apple Silicon only)
Run the following command to check if Metal (MPS) is available:
import torch
print(torch.backends.mps.is_available())  # Should print True on Apple Silicon (M1/M2/M3)
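In scripts it is often handier to turn that check into a device choice with a CPU fallback. The helper below is a minimal sketch, not part of Coqui TTS: `pick_device` is a hypothetical name, and the function falls back to "cpu" when PyTorch is missing or too old to expose the MPS backend.

```python
def pick_device():
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed, so there is nothing to accelerate
    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print(f"Using device: {pick_device()}")
```

On an Intel Mac this is expected to report "cpu", since Metal acceleration through MPS targets Apple Silicon GPUs in this guide.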
Step 6: Install Coqui TTS
pip install TTS
Step 7: Verify Installation
Check available models:
tts --list_models
To exit the virtual environment, run:
deactivate
2. Running a Pre-Trained Model (Quick Test)
Ensure your virtual environment is activated before running:
source coqui-venv/bin/activate
Run a basic TTS model to check if everything works:
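tts --text "Hello, this is a test of Coqui TTS." --model_name tts_models/en/ljspeech/tacotron2-DDC
The first run downloads the model, then generates a speech WAV file with it. Unless you pass --out_path, the file is written as tts_output.wav in the current directory.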
3. Training Coqui TTS with Your Own Voice
3.1 Prepare Your Voice Dataset
You need:
- 1–5 hours of high-quality recordings (WAV format, preferably 22.05 kHz or 44.1 kHz, with the same sample rate across all files).
- A transcript (CSV or JSON) matching the speech.
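Before training, it helps to confirm how much audio you actually have and whether the sample rates are consistent. The sketch below uses only Python's built-in `wave` module; the function name `summarize_wavs` is an assumption for illustration, and it handles plain PCM WAV files only.

```python
import wave
from pathlib import Path


def summarize_wavs(wav_dir):
    """Return (total_seconds, set_of_sample_rates) for the .wav files in wav_dir."""
    total_seconds = 0.0
    sample_rates = set()
    for path in sorted(Path(wav_dir).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one file = number of frames / frames per second.
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
    return total_seconds, sample_rates
```

Calling `summarize_wavs("my-dataset/wavs")` and dividing the first value by 3600 gives the dataset length in hours; more than one entry in the returned set means the recordings should be resampled to a common rate first.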
Dataset Folder Structure
my-dataset/
├── wavs/
│   ├── audio_001.wav
│   ├── audio_002.wav
│   ├── ...
├── metadata.csv
Example metadata.csv format
audio_001.wav|Hello, this is my voice.
audio_002.wav|I am training my own TTS model.
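A mismatch between the transcript and the audio folder is easy to miss, so a small consistency check can save a failed training run. This sketch assumes exactly the layout and the `file|transcript` line format shown above; `find_metadata_problems` is a hypothetical helper, not a Coqui TTS function.

```python
from pathlib import Path


def find_metadata_problems(dataset_dir):
    """Return a list of human-readable problems found in metadata.csv."""
    dataset = Path(dataset_dir)
    problems = []
    lines = (dataset / "metadata.csv").read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        filename, separator, text = line.partition("|")
        if not separator or not text.strip():
            problems.append(f"line {number}: expected 'file|transcript'")
            continue
        if not (dataset / "wavs" / filename.strip()).is_file():
            problems.append(f"line {number}: missing audio file {filename.strip()}")
    return problems
```

An empty list from `find_metadata_problems("my-dataset")` means every transcript line points at an existing file in wavs/.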
3.2 Train the Model
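The tts command handles synthesis only; training is launched through the training script that ships with the package. The dataset location is read from the datasets section of the config file, so point its path at your my-dataset/ folder, then run:
python -m TTS.bin.train_tts --config_path configs/your_config.json
For fine-tuning an existing model:
python -m TTS.bin.train_tts --config_path configs/your_config.json --restore_path path/to/pretrained/model.pth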
For Apple Silicon Macs, enable GPU acceleration (MPS):
- Open your_config.json and change:
"device": "mps"
- Start training:
python -m TTS.bin.train_tts --config_path configs/your_config.json
4. Generating Speech from a Trained Model
Ensure your virtual environment is activated before running:
shsource coqui-venv/bin/activate
Once the model is trained, you can generate speech files:
tts --text "This is my custom trained voice." \
    --model_path path/to/your/trained_model.pth \
    --config_path path/to/config.json \
    --out_path output.wav
For batch processing multiple sentences, use Python:
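from TTS.api import TTS

# Load the trained model together with its config file
tts = TTS(model_path="path/to/your/trained_model.pth", config_path="path/to/config.json")

sentences = ["Hello, this is my voice.", "I am training my own TTS model."]
# Generate one numbered WAV file per sentence
for i, sentence in enumerate(sentences, start=1):
    tts.tts_to_file(text=sentence, file_path=f"output_{i:03d}.wav")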
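For longer batches, such as one sentence per line of a text file, it can help to plan the output file names before synthesizing anything. The helper below is a sketch using only the standard library; `plan_batch` and the `batch_out` folder name are assumptions for illustration, and the actual synthesis call is left as a comment because it needs a loaded model.

```python
from pathlib import Path


def plan_batch(lines, out_dir="batch_out"):
    """Pair each non-empty line with a numbered WAV path inside out_dir."""
    sentences = [line.strip() for line in lines if line.strip()]
    return [
        (sentence, Path(out_dir) / f"line_{index:04d}.wav")
        for index, sentence in enumerate(sentences, start=1)
    ]


jobs = plan_batch(["Hello, this is my voice.", "", "I am training my own TTS model."])
for sentence, wav_path in jobs:
    # With a loaded model, this is where
    # tts.tts_to_file(text=sentence, file_path=str(wav_path)) would be called.
    print(f"{wav_path} <- {sentence}")
```

Numbered names keep the output files in the same order as the input lines, which makes it easier to re-run only the sentences that failed.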
To play the audio on Mac:
afplay output.wav
5. Deploying a Local TTS API
You can turn Coqui TTS into an API to generate speech via HTTP requests.
Step 1: Install FastAPI & Uvicorn
pip install fastapi uvicorn
Step 2: Create server.py
from fastapi import FastAPI
from TTS.api import TTS

app = FastAPI()

# Load the model once at startup; a custom checkpoint needs its config file too
tts = TTS(model_path="path/to/your/trained_model.pth", config_path="path/to/config.json")


@app.get("/synthesize/")
def synthesize(text: str):
    # A plain def lets FastAPI run this blocking call in its thread pool
    output_file = "output.wav"
    tts.tts_to_file(text=text, file_path=output_file)
    return {"message": "Speech generated", "file": output_file}


if __name__ == "__main__":
    import uvicorn

    uvicorn.run(app, host="0.0.0.0", port=8000)
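Because the handler always writes to the same output.wav, two overlapping requests would overwrite each other's audio. One possible refinement, sketched here with the standard library only, is to validate the text and give each request its own file name. The names `prepare_request`, `OUTPUT_DIR`, and `MAX_CHARS`, and the 500-character limit, are all hypothetical choices rather than anything Coqui TTS or FastAPI prescribes.

```python
import uuid
from pathlib import Path

OUTPUT_DIR = Path("tts_outputs")  # hypothetical folder for generated audio
MAX_CHARS = 500  # hypothetical cap to keep a single request short


def prepare_request(text):
    """Validate the text and return (cleaned_text, unique_output_path)."""
    cleaned = text.strip()
    if not cleaned:
        raise ValueError("text must not be empty")
    if len(cleaned) > MAX_CHARS:
        raise ValueError(f"text longer than {MAX_CHARS} characters")
    # A random UUID keeps concurrent requests from sharing a file name.
    return cleaned, OUTPUT_DIR / f"{uuid.uuid4().hex}.wav"
```

Inside the handler, the returned path would replace the fixed "output.wav", after creating the folder once with `OUTPUT_DIR.mkdir(exist_ok=True)`.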
Step 3: Run the API
python server.py
Step 4: Test the API
curl "http://localhost:8000/synthesize/?text=Hello%20world"
This will generate a WAV file and return its path.
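The same request can be made from Python, which takes care of percent-encoding text that contains spaces or punctuation. This is a minimal client sketch using only the standard library; `synthesize_url` and `request_speech` are hypothetical helper names, and the base URL assumes the server from Step 3 is running locally on port 8000.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen


def synthesize_url(text, base="http://localhost:8000/synthesize/"):
    """Build the request URL, percent-encoding the text (spaces become %20)."""
    return f"{base}?{urlencode({'text': text}, quote_via=quote)}"


def request_speech(text):
    """Call the running server and return its decoded JSON reply."""
    with urlopen(synthesize_url(text)) as response:
        return json.load(response)


print(synthesize_url("Hello world"))
```

With the server running, `request_speech("Hello world")` should return the same JSON message and file path that the curl command prints.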
6. Deploying to the Cloud
If you need cloud deployment, you can:
- Use Google Colab for training (free GPU access)
- Deploy on RunPod.io / Lambda Labs for cheap GPU rentals
- Use AWS / GCP for production-grade hosting
- Host a web app using Hugging Face Spaces
Conclusion
Coqui TTS allows you to train and run a text-to-speech model on a Mac, including custom voice training. Apple Silicon Macs can leverage MPS acceleration, but if training is too slow, cloud GPUs are an option.
With this setup, you can generate custom TTS audio files, deploy a local API, or even build your own AI voice assistant.