2024-09-30 Web Development
How to Convert Text to Speech Using Google Cloud Text-to-Speech API
By O Wolfson
Google Cloud Text-to-Speech API allows developers to synthesize natural-sounding speech from text. This guide will walk you through the process of setting up the API, obtaining the necessary credentials, and writing a Node.js script to convert text to speech.
Step 1: Set Up a Google Cloud Project
- Create a Google Cloud Project:
  - Go to the Google Cloud Console.
  - Click the project dropdown at the top of the page and select "New Project."
  - Enter a name for your project and click "Create."
- Enable the Text-to-Speech API:
  - Once your project is created, navigate to the Text-to-Speech API page.
  - Click "Enable" to enable the API for your project.
Step 2: Set Up Service Account Credentials
- Create a Service Account:
  - In the Google Cloud Console, go to the Service Accounts page.
  - Click "Create Service Account."
  - Enter a name and description for your service account, then click "Create."
- Grant the Service Account Access:
  - On the next screen, select the "Text-to-Speech API User" role from the dropdown.
  - Click "Continue" and then "Done."
- Create a Key for the Service Account:
  - Click the newly created service account to open its details.
  - Go to the "Keys" tab and click "Add Key" -> "Create New Key."
  - Choose the JSON key type and click "Create."
  - Save the JSON file to a secure location on your computer.
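Before wiring the key into a script, it can help to confirm that the downloaded file really is a service account key. The following is a minimal sketch of such a check, not part of the official setup. It assumes the key was saved as `./tts-key.json` unless the `GOOGLE_APPLICATION_CREDENTIALS` environment variable points elsewhere, and the helper name `findKeyProblems` and its field list are hypothetical choices made for this example.

```javascript
const fs = require("node:fs");

// Fields this sketch expects to find in a service account key file.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"];

// Hypothetical helper: returns a list of problems found in a parsed key object.
function findKeyProblems(key) {
  const problems = REQUIRED_FIELDS
    .filter((field) => typeof key[field] !== "string" || key[field] === "")
    .map((field) => `missing field: ${field}`);
  if (typeof key.type === "string" && key.type !== "service_account") {
    problems.push(`unexpected type: ${key.type}`);
  }
  return problems;
}

// Assumed path; adjust to wherever the JSON key was saved.
const keyPath = process.env.GOOGLE_APPLICATION_CREDENTIALS || "./tts-key.json";

if (fs.existsSync(keyPath)) {
  const problems = findKeyProblems(JSON.parse(fs.readFileSync(keyPath, "utf8")));
  console.log(problems.length === 0 ? "Key file looks usable." : problems.join("\n"));
} else {
  console.log(`No key file found at ${keyPath}`);
}
```

The check only inspects the file's shape; it does not contact Google Cloud, so a key that passes can still be rejected if it has been revoked or the API is not enabled.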
Step 3: Write the Node.js Script
Install the required Node.js packages:
```bash
npm install @google-cloud/text-to-speech
```
Create a script (`synthesize.js`) with the following content:
```javascript
const textToSpeech = require("@google-cloud/text-to-speech");
const fs = require("node:fs/promises");

// Initialize the Text-to-Speech client with the service account key file
// (the JSON key saved in Step 2, copied or renamed to ./tts-key.json)
const client = new textToSpeech.TextToSpeechClient({
  keyFilename: "./tts-key.json",
});

// Function to synthesize speech from text and save it to an MP3 file
async function synthesizeSpeech(text, outputFile) {
  // Define the request payload
  const request = {
    input: { text },
    voice: {
      languageCode: "en-US",
      name: "en-US-Neural2-D",
    },
    audioConfig: { audioEncoding: "MP3" },
  };

  // Make the API request to synthesize speech
  const [response] = await client.synthesizeSpeech(request);

  // Write the audio content to a file
  await fs.writeFile(outputFile, response.audioContent, "binary");
  console.log(`Audio content written to file: ${outputFile}`);
}

// Sample text to convert to speech
const text = "This is a generic sentence intended for testing text-to-speech.";

// Output file path
const outputFile = "output.mp3";

// Call the function and report any failure instead of leaving the rejection unhandled
synthesizeSpeech(text, outputFile).catch((err) => {
  console.error("Speech synthesis failed:", err);
  process.exitCode = 1;
});
```
In this script:
- We initialize the Text-to-Speech client using the service account key file.
- We define a function `synthesizeSpeech` that takes text and an output file path as arguments.
- The function makes a request to the Text-to-Speech API to synthesize speech and saves the audio content to an MP3 file.
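The request object in the script hard-codes the voice and encoding. As a rough sketch of how those settings could be made configurable, the helper below builds the same request shape from options. The names `buildRequest`, `outputFileFor`, and the `EXTENSIONS` mapping are hypothetical and exist only for this example; `speakingRate` is an optional `audioConfig` field, set here to its neutral value of 1.0.

```javascript
// File extensions for a few encodings; the mapping is this sketch's own convention.
const EXTENSIONS = { MP3: ".mp3", LINEAR16: ".wav", OGG_OPUS: ".ogg" };

// Hypothetical helper: builds a synthesizeSpeech request from text and options.
function buildRequest(text, options = {}) {
  const {
    languageCode = "en-US",
    voiceName = "en-US-Neural2-D",
    audioEncoding = "MP3",
    speakingRate = 1.0,
  } = options;

  if (typeof text !== "string" || text.trim() === "") {
    throw new Error("text must be a non-empty string");
  }
  if (!Object.hasOwn(EXTENSIONS, audioEncoding)) {
    throw new Error(`unsupported encoding in this sketch: ${audioEncoding}`);
  }

  return {
    input: { text },
    voice: { languageCode, name: voiceName },
    audioConfig: { audioEncoding, speakingRate },
  };
}

// Hypothetical helper: picks an output file name that matches the encoding.
function outputFileFor(baseName, audioEncoding = "MP3") {
  return baseName + EXTENSIONS[audioEncoding];
}

const request = buildRequest("Hello from the request builder.", {
  languageCode: "en-GB",
  voiceName: "en-GB-Neural2-B",
  audioEncoding: "OGG_OPUS",
});
console.log(JSON.stringify(request, null, 2));
console.log(outputFileFor("output", "OGG_OPUS")); // prints "output.ogg"
```

The returned object can be passed to `client.synthesizeSpeech` in place of the inline `request` in the script. Keeping the builder free of API calls makes it easy to check the payload before spending any quota.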
Step 4: Run the Script
To run the script, execute the following command in your terminal:
```bash
node synthesize.js
```
If everything is set up correctly, you should see the message "Audio content written to file: output.mp3" and an MP3 file will be generated with the synthesized speech.
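The sample sentence is short, but longer inputs can run into the API's per-request size cap. The sketch below shows one possible way to split text into sentence-aligned chunks before synthesis. The 5,000-byte figure is an assumption to verify against the current quota documentation, and `splitIntoChunks` is a hypothetical helper written for this example.

```javascript
// Assumed per-request input limit in bytes; confirm against the current quota documentation.
const MAX_BYTES = 5000;

// Hypothetical helper: splits text into chunks whose UTF-8 size stays within maxBytes,
// breaking only between sentences.
function splitIntoChunks(text, maxBytes = MAX_BYTES) {
  // Split after sentence-ending punctuation that is followed by whitespace.
  const sentences = text.split(/(?<=[.!?])\s+/).filter((s) => s.length > 0);
  const chunks = [];
  let current = "";

  for (const sentence of sentences) {
    if (Buffer.byteLength(sentence, "utf8") > maxBytes) {
      throw new Error("a single sentence exceeds the byte limit");
    }
    const candidate = current === "" ? sentence : `${current} ${sentence}`;
    if (Buffer.byteLength(candidate, "utf8") <= maxBytes) {
      current = candidate;
    } else {
      chunks.push(current);
      current = sentence;
    }
  }
  if (current !== "") chunks.push(current);
  return chunks;
}

// A tiny limit makes the splitting visible with a short input.
const chunks = splitIntoChunks("First sentence. Second sentence! Third one?", 32);
console.log(chunks);
```

Each chunk could then be passed to `synthesizeSpeech` with its own output file name. Joining the resulting audio files is outside the scope of this sketch.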
Voice and Language Options
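The Google Cloud Text-to-Speech API provides a variety of voices and languages to choose from. The lists below cover some of the English options. Available voices and their gender labels change over time, so confirm a voice name against the API's current voice list before relying on it.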
English (United States) Neural2 Voices
- `en-US-Neural2-A` (Female)
- `en-US-Neural2-B` (Male)
- `en-US-Neural2-C` (Female)
- `en-US-Neural2-D` (Male)
- `en-US-Neural2-E` (Female)
- `en-US-Neural2-F` (Male)
- `en-US-Neural2-G` (Female)
- `en-US-Neural2-H` (Male)
- `en-US-Neural2-I` (Female)
- `en-US-Neural2-J` (Male)
English (United States) WaveNet Voices
- `en-US-Wavenet-A` (Female)
- `en-US-Wavenet-B` (Male)
- `en-US-Wavenet-C` (Female)
- `en-US-Wavenet-D` (Male)
- `en-US-Wavenet-E` (Male)
- `en-US-Wavenet-F` (Female)
- `en-US-Wavenet-G` (Male)
- `en-US-Wavenet-H` (Female)
English (United Kingdom) Neural2 Voices
- `en-GB-Neural2-A` (Female)
- `en-GB-Neural2-B` (Male)
- `en-GB-Neural2-C` (Female)
- `en-GB-Neural2-D` (Male)
English (United Kingdom) WaveNet Voices
- `en-GB-Wavenet-A` (Female)
- `en-GB-Wavenet-B` (Male)
- `en-GB-Wavenet-C` (Female)
- `en-GB-Wavenet-D` (Male)
English (Australian) Neural2 Voices
- `en-AU-Neural2-A` (Female)
- `en-AU-Neural2-B` (Male)
- `en-AU-Neural2-C` (Female)
- `en-AU-Neural2-D` (Male)
English (Australian) WaveNet Voices
- `en-AU-Wavenet-A` (Female)
- `en-AU-Wavenet-B` (Male)
- `en-AU-Wavenet-C` (Female)
- `en-AU-Wavenet-D` (Male)
English (Indian) Neural2 Voices
- `en-IN-Neural2-A` (Female)
- `en-IN-Neural2-B` (Male)
- `en-IN-Neural2-C` (Female)
- `en-IN-Neural2-D` (Male)
English (Indian) WaveNet Voices
- `en-IN-Wavenet-A` (Female)
- `en-IN-Wavenet-B` (Male)
- `en-IN-Wavenet-C` (Female)
- `en-IN-Wavenet-D` (Male)
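Because voice catalogs change, it may be more reliable to ask the API which voices exist than to depend on a static list. The sketch below wraps the client's `listVoices` method and returns name and gender pairs. The helper name `listVoiceSummaries` is hypothetical, and the `fakeClient` with its `xx-XX` voices is made-up stand-in data so the example runs without credentials.

```javascript
// Hypothetical helper: lists voices for a language and returns name/gender pairs.
// `client` is expected to be a TextToSpeechClient, or any object with a compatible
// listVoices method.
async function listVoiceSummaries(client, languageCode) {
  const [result] = await client.listVoices({ languageCode });
  return result.voices
    .map((voice) => ({ name: voice.name, gender: voice.ssmlGender }))
    .sort((a, b) => a.name.localeCompare(b.name));
}

// Stand-in client with made-up data so the sketch runs without credentials.
const fakeClient = {
  async listVoices() {
    return [
      {
        voices: [
          { name: "xx-XX-Example-B", ssmlGender: "MALE" },
          { name: "xx-XX-Example-A", ssmlGender: "FEMALE" },
        ],
      },
    ];
  },
};

listVoiceSummaries(fakeClient, "xx-XX").then((voices) => {
  for (const { name, gender } of voices) {
    console.log(`${name} (${gender})`);
  }
});
```

To query the real service, pass the `client` created in `synthesize.js` instead of `fakeClient`, along with a language code such as `"en-US"`.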
Pricing
Google Cloud Text-to-Speech API pricing is based on the number of characters synthesized per month, with rates that depend on the voice type. Here’s an overview of the costs at the time of writing:
- Free Tier:
  - The first 1 million characters each month for WaveNet voices are free.
- Paid Usage:
  - Standard voices: $4.00 per 1 million characters.
  - WaveNet voices: $16.00 per 1 million characters.
  - Neural2 voices: $16.00 per 1 million characters.
  - Studio voices: $160.00 per 1 million characters.
New users also get $300 in free credits for the first 90 days to explore Google Cloud services.
For more details, visit the Google Cloud Pricing page.
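To get a feel for what these rates mean in practice, the sketch below estimates a monthly bill from a character count. It uses the per-million rates listed above as plain constants, so it goes stale if pricing changes. The helper name `estimateMonthlyCost` is hypothetical, and the free allowance is passed in by the caller rather than assumed.

```javascript
// Rates in USD per 1 million characters, copied from the list in this article.
const RATE_PER_MILLION = { standard: 4, wavenet: 16, neural2: 16, studio: 160 };

// Hypothetical helper: estimates a monthly bill for one voice type.
// freeCharacters is whatever free allowance applies (for example 1,000,000 for WaveNet).
function estimateMonthlyCost(characters, voiceType, freeCharacters = 0) {
  if (!Object.hasOwn(RATE_PER_MILLION, voiceType)) {
    throw new Error(`unknown voice type: ${voiceType}`);
  }
  const billable = Math.max(0, characters - freeCharacters);
  return (billable / 1_000_000) * RATE_PER_MILLION[voiceType];
}

// 2.5 million WaveNet characters with 1 million free leaves 1.5 million billable.
console.log(estimateMonthlyCost(2_500_000, "wavenet", 1_000_000)); // prints 24
console.log(estimateMonthlyCost(250_000, "studio")); // prints 40
```

The estimate ignores the $300 trial credit and any mixing of voice types within a month, so treat it as a rough planning figure rather than a billing calculation.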
Conclusion
Using the Google Cloud Text-to-Speech API, you can easily convert text to natural-sounding speech in various languages and voices. This guide walked you through the process of setting up the API, obtaining credentials, and writing a Node.js script to synthesize speech. You can now integrate this functionality into your applications for enhanced user interactions.