Oliver Wolfson | Technical Product Builder

Coqui TTS is an open-source text-to-speech (TTS) system that allows you to generate speech from text. You can also train it with your own voice to create a personalized TTS model. This guide covers:

- Installing Coqui TTS on a Mac (Apple Silicon and Intel)
- Running pre-trained models
- Training a custom voice model
- Generating speech from text locally

1. Install Coqui TTS on a Mac

1.1 Prerequisites

Ensure you have the following installed:

- Python 3.9 to 3.11 (3.10 recommended; recent Coqui TTS releases do not install on 3.12 or later)
- Homebrew (for package management)
- ffmpeg (for audio processing)
- PyTorch with Metal support (for Apple Silicon GPUs)

Step 1: Install Homebrew (if not installed)

/bin/bash -c "$(curl -fsSL https://raw.githubusercontent.com/Homebrew/install/HEAD/install.sh)"

Step 2: Install Dependencies

brew install ffmpeg

Step 3: Set Up a Virtual Environment

Using a virtual environment ensures that all packages are installed in an isolated directory, preventing conflicts with system-wide dependencies.

Create a virtual environment:
```
python3 -m venv coqui-venv
```
Activate the virtual environment:
- On macOS/Linux:
```
source coqui-venv/bin/activate
```
- On Windows: (if applicable)
```
coqui-venv\Scripts\activate
```

Once activated, your shell prompt may change, indicating you are inside the virtual environment.
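If the prompt does not change, the active environment can also be checked from Python itself. The snippet below is a minimal sketch; the helper name `in_virtualenv` is made up for this example.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the interpreter runs inside a venv-style environment."""
    # Inside a venv, sys.prefix points at the environment directory while
    # sys.base_prefix still points at the base Python installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("virtual environment active:", in_virtualenv())
print("interpreter:", sys.executable)
```

When the environment is active, the interpreter path should sit inside the coqui-venv directory.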

Step 4: Install PyTorch (for Apple Silicon Macs)

The standard macOS wheels on PyPI already include Metal (MPS) support, so no extra index URL is needed:

pip install torch torchvision torchaudio

Step 5: Verify GPU Support (Apple Silicon only)

Run the following in a Python interpreter to check whether the Metal Performance Shaders (MPS) backend is available:

import torch
print(torch.backends.mps.is_available())  # Should print True on Apple Silicon (M1/M2/M3)
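The same check can be wrapped in a small helper that picks a device string and falls back to the CPU when MPS is unavailable. This is a sketch, not part of Coqui TTS; the function name `pick_device` is an assumption made for illustration.

```python
def pick_device() -> str:
    """Return "mps", "cuda", or "cpu" depending on what PyTorch reports."""
    try:
        import torch
    except ImportError:
        return "cpu"  # PyTorch is not installed in this environment

    # Older PyTorch builds have no torch.backends.mps attribute at all.
    mps = getattr(torch.backends, "mps", None)
    if mps is not None and mps.is_available():
        return "mps"
    if torch.cuda.is_available():
        return "cuda"
    return "cpu"


print("selected device:", pick_device())
```

On an Intel Mac this is expected to report the CPU, since MPS acceleration targets Apple Silicon GPUs in this guide.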

Step 6: Install Coqui TTS

pip install TTS

Step 7: Verify Installation

Check available models:

tts --list_models
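If the command is not found, it can help to confirm which packages actually landed in the active environment. The sketch below uses only the standard library; the helper name `installed_versions` is hypothetical.

```python
from importlib import metadata


def installed_versions(packages):
    """Map each distribution name to its installed version, or None if missing."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


for name, version in installed_versions(["TTS", "torch", "torchaudio"]).items():
    print(f"{name}: {version or 'not installed'}")
```

A "not installed" entry usually means the install ran outside the virtual environment.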

To exit the virtual environment, run:

deactivate

2. Running a Pre-Trained Model (Quick Test)

Ensure your virtual environment is activated before running:

source coqui-venv/bin/activate

Run a basic TTS model to check if everything works:

tts --text "Hello, this is a test of Coqui TTS." --model_name tts_models/en/ljspeech/tacotron2-DDC

On the first run this downloads the model, then writes the speech to tts_output.wav in the current directory. Pass --out_path to choose a different file.
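The model name passed to --model_name reads as four slash-separated parts. The sketch below splits it, assuming the layout is type/language/dataset/model as suggested by the names that `tts --list_models` prints; `ModelName` and `parse_model_name` are illustrative names, not part of the library.

```python
from typing import NamedTuple


class ModelName(NamedTuple):
    model_type: str
    language: str
    dataset: str
    model: str


def parse_model_name(name: str) -> ModelName:
    """Split a name such as 'tts_models/en/ljspeech/tacotron2-DDC' into its parts."""
    parts = name.split("/")
    if len(parts) != 4 or not all(parts):
        raise ValueError(f"expected type/language/dataset/model, got {name!r}")
    return ModelName(*parts)


print(parse_model_name("tts_models/en/ljspeech/tacotron2-DDC"))
```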

3. Training Coqui TTS with Your Own Voice

3.1 Prepare Your Voice Dataset

You need:

- 1–5 hours of high-quality recordings (WAV format, preferably 22.05 kHz or 44.1 kHz, with the same sample rate across all files).
- A transcript (CSV or JSON) that matches the recorded speech exactly.
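Before training, it is worth measuring how much audio you have and whether the sample rates agree. The following is a minimal sketch using the standard-library `wave` module, which reads PCM WAV files only; the function name `summarize_wavs` is made up for this example.

```python
import wave
from pathlib import Path


def summarize_wavs(folder):
    """Return (total_seconds, sample_rates) for the .wav files in a folder."""
    total_seconds = 0.0
    sample_rates = set()
    for path in sorted(Path(folder).glob("*.wav")):
        with wave.open(str(path), "rb") as wav:
            rate = wav.getframerate()
            # Duration of one file is its frame count divided by the frame rate.
            total_seconds += wav.getnframes() / rate
            sample_rates.add(rate)
    return total_seconds, sample_rates
```

Calling `summarize_wavs("my-dataset/wavs")` and dividing the first value by 3600 gives the total in hours. More than one entry in the returned set means the recordings should be resampled to a common rate first.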

Dataset Folder Structure

/my-dataset/
├── wavs/
│   ├── audio_001.wav
│   ├── audio_002.wav
│   ├── ...
├── metadata.csv

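Example metadata.csv format (one clip per line: file name, a pipe, then the transcript). Check this layout against the dataset formatter named in your training config. Coqui's built-in ljspeech formatter, for example, expects clip IDs without the .wav extension and a third column holding the normalized text.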

audio_001.wav|Hello, this is my voice.
audio_002.wav|I am training my own TTS model.
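A quick consistency check between metadata.csv and the wavs folder can catch missing files before a long training run. This sketch assumes the two-column layout shown above (file name, pipe, transcript); the function name `check_metadata` is hypothetical.

```python
from pathlib import Path


def check_metadata(dataset_dir, metadata_name="metadata.csv"):
    """Return a list of human-readable problems found in the metadata file."""
    root = Path(dataset_dir)
    problems = []
    lines = (root / metadata_name).read_text(encoding="utf-8").splitlines()
    for number, line in enumerate(lines, start=1):
        if not line.strip():
            continue  # ignore blank lines
        file_name, separator, transcript = line.partition("|")
        if not separator or not transcript.strip():
            problems.append(f"line {number}: missing '|' or empty transcript")
            continue
        if not (root / "wavs" / file_name).is_file():
            problems.append(f"line {number}: wavs/{file_name} not found")
    return problems
```

An empty list from `check_metadata("/my-dataset")` means every transcript line points at an existing recording.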

3.2 Train the Model

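Training is started with Coqui's training script rather than the tts synthesis command. The dataset location is read from the datasets entry in the config file:

python -m TTS.bin.train_tts --config_path configs/your_config.json

For fine-tuning an existing model:

python -m TTS.bin.train_tts --config_path configs/your_config.json --restore_path path/to/pretrained/model.pth

For Apple Silicon Macs, MPS acceleration during training depends on the installed TTS and trainer versions. If your version accepts a device setting in your_config.json, set:

"device": "mps"

Some PyTorch operations are not implemented on MPS. Setting PYTORCH_ENABLE_MPS_FALLBACK=1 lets them fall back to the CPU:

PYTORCH_ENABLE_MPS_FALLBACK=1 python -m TTS.bin.train_tts --config_path configs/your_config.json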
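When training is launched from a script, the command line can be assembled in Python. The sketch below assumes the training entry point is the `TTS.bin.train_tts` module with its `--config_path` and `--restore_path` flags; verify this against your installed version. The helper name `build_train_command` is made up for this example.

```python
import sys


def build_train_command(config_path, restore_path=None):
    """Build the argument list for a training (or fine-tuning) run."""
    # sys.executable keeps the run inside the currently active virtual environment.
    command = [sys.executable, "-m", "TTS.bin.train_tts", "--config_path", str(config_path)]
    if restore_path is not None:
        # Fine-tuning starts from an existing checkpoint.
        command += ["--restore_path", str(restore_path)]
    return command


print(" ".join(build_train_command("configs/your_config.json")))
```

The resulting list can be passed to `subprocess.run(command, check=True)` to start the run.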

4. Generating Speech from a Trained Model

Ensure your virtual environment is activated before running:

source coqui-venv/bin/activate

Once the model is trained, you can generate speech files:

tts --text "This is my custom trained voice." \
    --model_path path/to/your/trained_model.pth \
    --config_path path/to/config.json \
    --out_path output.wav

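For batch processing, load the model once in Python and reuse it for each sentence:

from TTS.api import TTS

# Load the trained model. A local checkpoint needs both paths as keyword
# arguments, because the first positional argument is a model name, not a file path.
tts = TTS(model_path="path/to/your/trained_model.pth", config_path="path/to/config.json")

# Generate speech and save to a file
tts.tts_to_file(text="Hello, this is my voice.", file_path="output.wav")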
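For several sentences, a small loop can give each one its own numbered output file. This is a sketch under the assumption that the synthesis callable accepts `text` and `file_path` keyword arguments, as `tts.tts_to_file` does above; the helper name `batch_synthesize` and the file naming scheme are illustrative.

```python
from pathlib import Path


def batch_synthesize(sentences, synthesize, out_dir="batch_output"):
    """Call synthesize(text=..., file_path=...) once per non-empty sentence."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # Drop blank entries so the numbering has no gaps.
    cleaned = [s.strip() for s in sentences if s.strip()]
    written = []
    for index, sentence in enumerate(cleaned, start=1):
        target = out / f"sentence_{index:03d}.wav"
        synthesize(text=sentence, file_path=str(target))
        written.append(target)
    return written
```

With a loaded model, `batch_synthesize(["First line.", "Second line."], tts.tts_to_file)` would write sentence_001.wav and sentence_002.wav into batch_output.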

To play the audio on Mac:

afplay output.wav

5. Deploying a Local TTS API

You can turn Coqui TTS into an API to generate speech via HTTP requests.

Step 1: Install FastAPI & Uvicorn

pip install fastapi uvicorn

Step 2: Create `server.py`

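from fastapi import FastAPI
from TTS.api import TTS

app = FastAPI()
# Load the model once at startup; a local checkpoint needs model_path and config_path
tts = TTS(model_path="path/to/your/trained_model.pth", config_path="path/to/config.json")

@app.get("/synthesize/")
def synthesize(text: str):
    # Plain def (not async): FastAPI runs it in a worker thread, so the
    # blocking synthesis call does not stall the event loop
    output_file = "output.wav"  # overwritten on every request
    tts.tts_to_file(text=text, file_path=output_file)
    return {"message": "Speech generated", "file": output_file}

if __name__ == "__main__":
    import uvicorn
    # Bind to localhost only; use "0.0.0.0" to accept connections from other machines
    uvicorn.run(app, host="127.0.0.1", port=8000)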

Step 3: Run the API

python server.py

Step 4: Test the API

curl "http://localhost:8000/synthesize/?text=Hello%20world"

This writes output.wav on the server and returns a JSON response containing its path. Each request overwrites the previous file.
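The same request can be made from Python, which also takes care of URL-encoding the text. The sketch below uses only the standard library and assumes the server from Step 2 is running on localhost port 8000; the helper names `build_synthesize_url` and `request_synthesis` are made up for this example.

```python
import json
from urllib.parse import quote, urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000/synthesize/"


def build_synthesize_url(text, base_url=BASE_URL):
    """Return the request URL with the text safely percent-encoded."""
    query = urlencode({"text": text}, quote_via=quote)
    return f"{base_url}?{query}"


def request_synthesis(text, base_url=BASE_URL, timeout=120):
    """Call the running API and return its decoded JSON response."""
    with urlopen(build_synthesize_url(text, base_url), timeout=timeout) as response:
        return json.load(response)


print(build_synthesize_url("Hello world"))
```

`request_synthesis("Hello world")` only succeeds while the server is running; the generous timeout is a guess, since synthesis time grows with text length.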

6. Deploying to the Cloud

If you need cloud deployment, you can:

- Use Google Colab for training (free GPU access)
- Deploy on RunPod.io / Lambda Labs for cheap GPU rentals
- Use AWS / GCP for production-grade hosting
- Host a web app using Hugging Face Spaces

Conclusion

Coqui TTS allows you to train and run a text-to-speech model on a Mac, including custom voice training. Apple Silicon Macs can leverage MPS acceleration, but if training is too slow, cloud GPUs are an option.

With this setup, you can generate custom TTS audio files, deploy a local API, or even build your own AI voice assistant.

