July 11, 2024
O Wolfson
The Google Cloud Text-to-Speech API lets developers synthesize natural-sounding speech from text. This guide walks through setting up the API, obtaining the necessary credentials, and writing a Node.js script that converts text to speech.
Create a Google Cloud Project:
Enable the Text-to-Speech API:
Create a Service Account:
Grant the Service Account Access:
Create a Key for the Service Account:
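Before wiring the downloaded key into a script, it can help to confirm that the file looks like a service account key. The sketch below is not part of the Google client library: `checkServiceAccountKey` is a hypothetical helper, and the path `./tts-key.json` is assumed to match the file name used later in this guide. It only checks for a few fields that service account key files normally contain, so treat it as a sanity check rather than real validation.

```javascript
const fs = require("node:fs");

// Fields that a service account key file is expected to contain.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"];

// Hypothetical helper: returns a list of problems found in a parsed key object.
// An empty list means none of the basic checks failed.
function checkServiceAccountKey(key) {
  if (key === null || typeof key !== "object") {
    return ["key file is not a JSON object"];
  }
  const problems = [];
  for (const field of REQUIRED_FIELDS) {
    if (typeof key[field] !== "string" || key[field].length === 0) {
      problems.push(`missing field: ${field}`);
    }
  }
  if (key.type !== undefined && key.type !== "service_account") {
    problems.push(`unexpected type: ${key.type}`);
  }
  return problems;
}

// Example usage; the path is an assumption.
const keyPath = "./tts-key.json";
if (fs.existsSync(keyPath)) {
  const key = JSON.parse(fs.readFileSync(keyPath, "utf8"));
  const problems = checkServiceAccountKey(key);
  console.log(problems.length === 0 ? "Key file looks valid" : problems.join("\n"));
} else {
  console.log(`No key file found at ${keyPath}`);
}
```

Keep the key file out of version control, since anyone who holds it can call the API on the project's behalf.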
Install the required Node.js package:
npm install @google-cloud/text-to-speech
Create a script (synthesize.js) with the following content:
const textToSpeech = require("@google-cloud/text-to-speech");
const fs = require("node:fs");
const util = require("node:util");

// Initialize the Text-to-Speech client with the service account key file
const client = new textToSpeech.TextToSpeechClient({
  keyFilename: "./tts-key.json",
});

// Function to synthesize speech from text and save it to an MP3 file
async function synthesizeSpeech(text, outputFile) {
  // Define the request payload
  const request = {
    input: { text: text },
    voice: {
      languageCode: "en-US",
      name: "en-US-Neural2-D",
    },
    audioConfig: { audioEncoding: "MP3" },
  };

  // Make the API request to synthesize speech
  const [response] = await client.synthesizeSpeech(request);

  // Write the audio content to a file
  const writeFile = util.promisify(fs.writeFile);
  await writeFile(outputFile, response.audioContent, "binary");
  console.log(`Audio content written to file: ${outputFile}`);
}

// Sample text to convert to speech
const text = `This is a generic sentence intended for testing text-to-speech.`;
const outputFile = "output.mp3";

// Run the synthesis and report any error (for example, bad credentials)
synthesizeSpeech(text, outputFile).catch(console.error);
In this script, the client is initialized with the service account key file, and the function synthesizeSpeech takes text and an output file path as arguments.
To run the script, execute the following command in your terminal:
node synthesize.js
If everything is set up correctly, you should see the message "Audio content written to file: output.mp3" and an MP3 file will be generated with the synthesized speech.
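The script hard-codes the voice name en-US-Neural2-D. Since the set of voices offered by the API changes over time, one option is to look names up at runtime with the client's `listVoices` method. The following is a minimal sketch: `findVoiceNames` is a hypothetical helper, and it assumes the client reports each voice's gender as a string such as "MALE" or "FEMALE" in the `ssmlGender` field.

```javascript
// Hypothetical helper: asks the API which voices exist for a language and
// returns the names of those matching the requested gender.
// `client` is expected to be a TextToSpeechClient, or any object with a
// compatible listVoices method.
async function findVoiceNames(client, languageCode, ssmlGender) {
  const [result] = await client.listVoices({ languageCode });
  return result.voices
    .filter((voice) => voice.ssmlGender === ssmlGender)
    .map((voice) => voice.name);
}

// Usage with the real client (needs credentials and network access):
//   const textToSpeech = require("@google-cloud/text-to-speech");
//   const client = new textToSpeech.TextToSpeechClient({
//     keyFilename: "./tts-key.json",
//   });
//   findVoiceNames(client, "en-US", "MALE").then(console.log);
```

Any name returned this way can be placed in the `voice.name` field of the request in synthesize.js.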
The Google Cloud Text-to-Speech API provides a variety of voices and languages to choose from. Here are some of the available options:
en-US-Neural2-A (Female)
en-US-Neural2-B (Male)
en-US-Neural2-C (Female)
en-US-Neural2-D (Male)
en-US-Neural2-E (Female)
en-US-Neural2-F (Male)
en-US-Neural2-G (Female)
en-US-Neural2-H (Male)
en-US-Neural2-I (Female)
en-US-Neural2-J (Male)
en-US-Wavenet-A (Female)
en-US-Wavenet-B (Male)
en-US-Wavenet-C (Female)
en-US-Wavenet-D (Male)
en-US-Wavenet-E (Male)
en-US-Wavenet-F (Female)
en-US-Wavenet-G (Male)
en-US-Wavenet-H (Female)
en-GB-Neural2-A (Female)
en-GB-Neural2-B (Male)
en-GB-Neural2-C (Female)
en-GB-Neural2-D (Male)
en-GB-Wavenet-A (Female)
en-GB-Wavenet-B (Male)
en-GB-Wavenet-C (Female)
en-GB-Wavenet-D (Male)
en-AU-Neural2-A (Female)
en-AU-Neural2-B (Male)
en-AU-Neural2-C (Female)
en-AU-Neural2-D (Male)
en-AU-Wavenet-A (Female)
en-AU-Wavenet-B (Male)
en-AU-Wavenet-C (Female)
en-AU-Wavenet-D (Male)
en-IN-Neural2-A (Female)
en-IN-Neural2-B (Male)
en-IN-Neural2-C (Female)
en-IN-Neural2-D (Male)
en-IN-Wavenet-A (Female)
en-IN-Wavenet-B (Male)
en-IN-Wavenet-C (Female)
en-IN-Wavenet-D (Male)
Google Cloud Text-to-Speech API offers a flexible pricing structure based on the number of characters synthesized per month. Here’s an overview of the costs:
Free Tier:
Paid Usage:
New users also get $300 in free credits for the first 90 days to explore Google Cloud services.
For more details, visit the Google Cloud Pricing page.
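Because billing is based on the number of characters synthesized per month, a rough estimate can be made before sending any text. The sketch below uses a hypothetical helper, `estimateMonthlyCost`. The `freeCharacters` and `pricePerMillion` parameters are placeholders to be filled in from the pricing page, and the numbers in the example call are made up for illustration, not Google's actual rates. It also assumes that each Unicode code point, including spaces, counts as one character.

```javascript
// Hypothetical helper: estimates a monthly bill from the texts to be synthesized.
// freeCharacters and pricePerMillion are placeholders; take the real values
// for the chosen voice type from the pricing page.
function estimateMonthlyCost(texts, { freeCharacters, pricePerMillion }) {
  // Count Unicode code points, spaces included (an assumption about billing).
  const totalCharacters = texts.reduce(
    (sum, text) => sum + [...text].length,
    0
  );
  // Only the characters beyond the free allowance are billed.
  const billable = Math.max(0, totalCharacters - freeCharacters);
  const cost = (billable / 1_000_000) * pricePerMillion;
  return { totalCharacters, billable, cost };
}

// Illustrative numbers only, not actual Google Cloud prices.
const estimate = estimateMonthlyCost(["Hello, world!", "Second sentence."], {
  freeCharacters: 10,
  pricePerMillion: 1,
});
console.log(estimate);
```

Tracking the running character count per month in the application itself makes it easier to notice when usage is about to leave the free tier.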
Using the Google Cloud Text-to-Speech API, you can easily convert text to natural-sounding speech in various languages and voices. This guide walked you through the process of setting up the API, obtaining credentials, and writing a Node.js script to synthesize speech. You can now integrate this functionality into your applications for enhanced user interactions.