OWolf


2024-09-09 web, development, javascript

Using Google Cloud Text-to-Speech API

By O. Wolfson

Introduction

In this article, we walk through using the Google Cloud Text-to-Speech API to convert text to speech in a Node.js application. The guide covers setting up Google Cloud credentials, preparing the source data, running a script that converts phrases from a JSON file into audio files, and managing billing.

Prerequisites

  • Basic knowledge of JavaScript and Node.js.
  • Node.js installed on your system.
  • Google Cloud account.

Step 1: Setting up Google Cloud Console Credentials

  1. Create a Google Cloud Project: Log in to the Google Cloud Console and create a new project.
  2. Enable Text-to-Speech API: Navigate to the "APIs & Services" dashboard and enable the Text-to-Speech API for your project.
  3. Create Credentials:
    • Go to the "Credentials" page.
    • Click "Create credentials" and select "Service account".
    • Follow the steps to create a service account.
    • Once created, click on your new service account and go to the "Keys" tab.
    • Add a new key and select JSON. The JSON file will be downloaded to your machine.
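
Before wiring the downloaded key into the application, it can help to confirm that the file is a service account key and not some other credential type. The following is a minimal sketch: the function name checkServiceAccountKey and the list of fields it checks are choices made for this illustration, based on the fields a service account key file normally contains.

```javascript
// Fields a service account key file is expected to contain.
const REQUIRED_KEY_FIELDS = ["type", "project_id", "private_key", "client_email"];

// Returns a list of problems found in the parsed key.
// An empty list means the key looks usable.
function checkServiceAccountKey(key) {
  if (key === null || typeof key !== "object") {
    return ["key is not a JSON object"];
  }
  const problems = [];
  for (const field of REQUIRED_KEY_FIELDS) {
    if (typeof key[field] !== "string" || key[field].length === 0) {
      problems.push(`missing field: ${field}`);
    }
  }
  if (typeof key.type === "string" && key.type !== "service_account") {
    problems.push(`unexpected type: ${key.type}`);
  }
  return problems;
}

// Example with a made-up, incomplete key object.
console.log(checkServiceAccountKey({ type: "service_account", project_id: "demo" }));
```

In practice you would pass in the result of JSON.parse on the downloaded file's contents and stop if the returned list is not empty.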

Step 2: Setting up Your Node.js Application

  1. Initialize a Node.js Project: Create a new directory for your project and initialize it with npm init.
  2. Install Dependencies: Install the required packages (fs, util, and path are built into Node.js and need no installation) by running:
    bash
    npm install @google-cloud/text-to-speech dotenv
    
  3. Set Up Environment Variables:
    • Create a .env file in your project root.
    • Add the following line, replacing YOUR_CREDENTIALS with the content of the downloaded JSON file collapsed onto a single line (the script later passes this value to JSON.parse):
      GOOGLE_CREDENTIALS_CONTENT='YOUR_CREDENTIALS'
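
The downloaded key file is pretty-printed across many lines, so it has to be compacted before it fits on one .env line. Here is a rough sketch of one way to do that. The helper name toEnvLine is hypothetical, and the sample key is made up and shortened. It assumes, as I understand dotenv's parsing rules, that a single-quoted value is read literally, so the escaped \n sequences inside private_key survive until JSON.parse turns them back into line breaks.

```javascript
// Turns the text of the downloaded key file into a single-line .env entry.
function toEnvLine(keyFileText, name = "GOOGLE_CREDENTIALS_CONTENT") {
  // Parsing and re-serializing drops the pretty-printing line breaks
  // and fails early if the text is not valid JSON.
  const compact = JSON.stringify(JSON.parse(keyFileText));
  if (compact.includes("'")) {
    // A single quote inside the value would end the quoted .env value early.
    throw new Error("Key contains a single quote and cannot be single-quoted as-is");
  }
  return `${name}='${compact}'`;
}

// Example with a made-up, shortened key.
const sampleKeyText =
  '{\n  "type": "service_account",\n  "client_email": "demo@example.iam.gserviceaccount.com"\n}';
console.log(toEnvLine(sampleKeyText));
```

To use it on the real file, read the key with fs.readFileSync(pathToKey, "utf8") and paste the printed line into .env.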
      

Step 3: Preparing the Source Data

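Your source data should be a JSON array containing the phrases you want to convert. Each entry needs an id, which becomes the audio file name, and a phrase, which is the text to synthesize. Here is an example (/data/phrases.json). The // line only marks where further entries go and must be removed in the real file, because JSON does not allow comments: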

json
[
  {
    "id": "79546973-07ee-4b13-ae8e-2ebf07db7b2d",
    "phrase": "Potremmo vederci domani?",
    "translation": "Could we meet tomorrow?"
  }
  // Additional phrases...
]
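
Because every entry triggers a billed API request and its id becomes a file name, it may be worth checking the data before converting anything. The sketch below is one possible approach. The function names findInvalidPhrases and findDuplicateIds are hypothetical, and the rules they apply (non-empty id and phrase strings, unique ids) are assumptions drawn from how the script in the next step uses the data.

```javascript
// Returns the entries that cannot be converted: not an object, or missing id or phrase.
function findInvalidPhrases(phrases) {
  if (!Array.isArray(phrases)) {
    throw new TypeError("Expected an array of phrase objects");
  }
  const isNonEmptyString = (v) => typeof v === "string" && v.trim().length > 0;
  return phrases.filter(
    (p) =>
      p === null ||
      typeof p !== "object" ||
      !isNonEmptyString(p.id) ||
      !isNonEmptyString(p.phrase)
  );
}

// Returns ids that appear more than once.
// Entries sharing an id would overwrite each other's audio file.
function findDuplicateIds(phrases) {
  const seen = new Set();
  const duplicates = new Set();
  for (const p of phrases) {
    const id = p?.id;
    if (id === undefined) continue; // entries without an id are reported by findInvalidPhrases
    if (seen.has(id)) duplicates.add(id);
    seen.add(id);
  }
  return [...duplicates];
}

// Example with made-up data: the second entry has no phrase, the third reuses an id.
const samplePhrases = [
  { id: "a1", phrase: "Potremmo vederci domani?" },
  { id: "b2", phrase: "" },
  { id: "a1", phrase: "Buongiorno" },
];
console.log(findInvalidPhrases(samplePhrases).length, findDuplicateIds(samplePhrases));
```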

Step 4: The Script

The following script converts each phrase in the JSON file into an audio file using the Google Cloud Text-to-Speech API.

js
const textToSpeech = require("@google-cloud/text-to-speech");
const fs = require("fs");
const util = require("util");
const path = require("path");
require("dotenv").config();

const phrases = require("../data/phrases.json");

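// The service account key is stored in .env as a single-line JSON string
const google_credentials_content = process.env.GOOGLE_CREDENTIALS_CONTENT;
if (!google_credentials_content) {
  throw new Error("GOOGLE_CREDENTIALS_CONTENT is not set; check your .env file");
}
const credentials = JSON.parse(google_credentials_content);
const client = new textToSpeech.TextToSpeechClient({ credentials });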

async function convertTextToAudioFile(obj) {
  // console.log("convertTextToAudioFile:", obj.phrase);

  const request = {
    input: { text: obj.phrase },
    voice: { languageCode: "it-IT", ssmlGender: "FEMALE" },
    audioConfig: { audioEncoding: "MP3" },
  };

  const [response] = await client.synthesizeSpeech(request);

  // Specify the path where the audio should be saved
  const outputPath = path.join(
    __dirname,
    "../public/audio/",
    `${obj.id}.audio.mp3`
  );

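  // Create the output directory if needed, then write the audio content to the file
  await fs.promises.mkdir(path.dirname(outputPath), { recursive: true });
  await fs.promises.writeFile(outputPath, response.audioContent, "binary");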
}

async function convertAll() {
  for (const phraseObj of phrases) {
    await convertTextToAudioFile(phraseObj);
  }
}

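// Report failures and exit with a non-zero code instead of an unhandled rejection
convertAll().catch((err) => {
  console.error("Conversion failed:", err);
  process.exitCode = 1;
});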

Script Breakdown

  • Load Dependencies: The script loads the @google-cloud/text-to-speech, fs, path, and dotenv modules. It also requires util, which is never used and can be removed.
  • Load Phrases: It reads the phrases from phrases.json.
  • Convert Text to Audio: For each phrase, the script sends a request to the Text-to-Speech API to generate audio and then saves it as an MP3 file.
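
As written, the script re-synthesizes every phrase on each run, including those that already have an audio file. The sketch below shows one way to split the request building and file naming into small helpers and skip phrases that are already done. The helper names buildRequest, outputPathFor, and pendingPhrases are hypothetical and not part of the original script. The request fields and the file naming scheme are copied from it.

```javascript
const fs = require("fs");
const path = require("path");

// Builds the same request shape the script uses, with the language made configurable.
function buildRequest(text, languageCode = "it-IT") {
  return {
    input: { text },
    voice: { languageCode, ssmlGender: "FEMALE" },
    audioConfig: { audioEncoding: "MP3" },
  };
}

// Mirrors the script's naming scheme: <id>.audio.mp3 inside the audio directory.
function outputPathFor(id, audioDir) {
  return path.join(audioDir, `${id}.audio.mp3`);
}

// Keeps only the phrases whose audio file is not on disk yet.
// `exists` is injectable so the filter can be tested without touching the file system.
function pendingPhrases(phrases, audioDir, exists = fs.existsSync) {
  return phrases.filter((p) => !exists(outputPathFor(p.id, audioDir)));
}
```

Inside convertAll, looping over pendingPhrases(phrases, audioDir) instead of phrases would then send requests only for missing files, which also keeps the billed character count down on repeat runs.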

Running the Script

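Place the script in a subdirectory of your project (e.g., as utils/process-audio.js), so that its relative paths ../data/phrases.json and ../public/audio/ resolve from the project root, and run it from the project root using Node.js: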

bash
node utils/process-audio.js

Step 5: Managing Billing

Be aware of the billing for the Text-to-Speech API:

  • Check Usage in Google Cloud Console: Regularly monitor your usage and charges in the Google Cloud Console.
  • Set Alerts and Budgets: To prevent unexpected charges, set up billing alerts and budgets.
  • Understand the Billing Cycle: The Text-to-Speech API is billed monthly, and charges are based on the number of characters sent for synthesis.
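
Since charges depend on character count, a rough estimate can be made before running the conversion. This is a minimal sketch under stated assumptions: the function names are hypothetical, the price per million characters is a parameter you must take from the current pricing page (the value in the example is a placeholder, not a real rate), and counting by string length only approximates how Google counts billable characters.

```javascript
// Sums the length of every phrase string.
// Note: .length counts UTF-16 code units, which is only an approximation of billed characters.
function countCharacters(phrases) {
  return phrases.reduce((total, p) => total + p.phrase.length, 0);
}

// Converts a character count into a cost, given a price per one million characters.
function estimateCost(characterCount, pricePerMillionChars) {
  return (characterCount / 1e6) * pricePerMillionChars;
}

// Example with made-up data and a placeholder price.
const demoPhrases = [{ phrase: "abc" }, { phrase: "de" }];
const PLACEHOLDER_PRICE_PER_MILLION = 4; // hypothetical value, check the pricing page
console.log(countCharacters(demoPhrases)); // 5
console.log(estimateCost(500000, PLACEHOLDER_PRICE_PER_MILLION)); // 2
```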

Conclusion

By following these steps, you can use Google Cloud Text-to-Speech in a Node.js application to generate an MP3 file for each phrase in a JSON data set. Remember to monitor your usage so that it stays within your budget, and update your application as needed.

