2023-11-23

Using Google Cloud Text-to-Speech API


    In this article, we will walk through the process of using the Google Cloud Text-to-Speech API to convert text to speech in a Node.js application. This guide covers setting up Google Cloud credentials, managing billing, preparing the source data, and running a script that converts phrases from JSON data into audio files.


    Prerequisites

    • Basic knowledge of JavaScript and Node.js.
    • Node.js installed on your system.
    • Google Cloud account.

    Step 1: Setting up Google Cloud Console Credentials

    1. Create a Google Cloud Project: Log in to the Google Cloud Console and create a new project.
    2. Enable the Text-to-Speech API: Navigate to the "APIs & Services" dashboard and enable the Text-to-Speech API for your project.
    3. Create Credentials:
      • Go to the "Credentials" page.
      • Click "Create credentials" and select "Service account".
      • Follow the steps to create a service account.
      • Once created, click on your new service account and go to the "Keys" tab.
      • Add a new key and select JSON. The JSON file will be downloaded to your machine.
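
    Before moving on, it can help to confirm that the downloaded file really is a service account key. The sketch below is one possible check and is not part of the original script: checkServiceAccountKey is a made-up helper name, and the field list covers only a small subset of the fields such key files normally contain.

```javascript
// Hypothetical helper: checks that a JSON string looks like a service
// account key. The field list is a minimal subset, not exhaustive.
const REQUIRED_FIELDS = ["type", "project_id", "private_key", "client_email"];

function checkServiceAccountKey(jsonText) {
  const key = JSON.parse(jsonText); // throws SyntaxError on invalid JSON
  const missing = REQUIRED_FIELDS.filter(
    (field) => typeof key[field] !== "string" || key[field] === ""
  );
  if (missing.length > 0) {
    throw new Error(`Key file is missing fields: ${missing.join(", ")}`);
  }
  if (key.type !== "service_account") {
    throw new Error(`Unexpected key type: ${key.type}`);
  }
  return key;
}

// Example with a fake key (all values are placeholders).
const fakeKey = JSON.stringify({
  type: "service_account",
  project_id: "my-project",
  private_key: "-----BEGIN PRIVATE KEY-----\nFAKE\n-----END PRIVATE KEY-----\n",
  client_email: "tts@my-project.iam.gserviceaccount.com",
});
console.log(checkServiceAccountKey(fakeKey).client_email);
```

    Treat the key file like a password: anyone who has it can make billable requests against your project.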

    Step 2: Setting up Your Node.js Application

    1. Initialize a Node.js Project: Create a new directory for your project and initialize it with npm init.
    2. Install Dependencies: Install the necessary packages by running the command below. The fs, util, and path modules ship with Node.js and do not need to be installed.
      npm install @google-cloud/text-to-speech dotenv
    3. Set Up Environment Variables:
      • Create a .env file in your project root.
      • Add the following line, replacing YOUR_CREDENTIALS with the content of the downloaded JSON file, collapsed onto a single line:
      GOOGLE_CREDENTIALS_CONTENT=YOUR_CREDENTIALS
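
    A .env entry has to fit on one line, so the multi-line key file needs to be collapsed first. Here is a minimal sketch of that step, assuming the variable name GOOGLE_CREDENTIALS_CONTENT that the script in Step 4 reads. toEnvLine is a hypothetical helper, and wrapping the value in single quotes relies on dotenv treating single-quoted values literally, which is worth confirming against the dotenv version you install.

```javascript
// Hypothetical helper: turns a parsed service account key into one .env line.
// JSON.stringify without an indent argument yields single-line JSON, and the
// newlines inside private_key stay escaped as \n.
function toEnvLine(key, name = "GOOGLE_CREDENTIALS_CONTENT") {
  const json = JSON.stringify(key);
  if (json.includes("'")) {
    throw new Error("Key contains a single quote; quote the value another way");
  }
  return `${name}='${json}'`;
}

// Example with placeholder values only.
const line = toEnvLine({ type: "service_account", private_key: "a\nb" });
console.log(line);
```

    In practice the key object would come from parsing the downloaded file. Keep the resulting .env file out of version control.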

    Step 3: Preparing the Source Data

    Your source data should be in JSON format containing phrases you want to convert. Here is an example (/data/phrases.json):

    [
      {
        "id": "79546973-07ee-4b13-ae8e-2ebf07db7b2d",
        "phrase": "Potremmo vederci domani?",
        "translation": "Could we meet tomorrow?"
      }
      // Additional phrases...
    ]
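
    Since every entry in the file leads to an API request, it may be worth checking the data before running the conversion. The following is a rough sketch rather than part of the original workflow: findInvalidPhrases is a hypothetical helper, and it assumes that each entry needs a non-empty string for both "id" and "phrase".

```javascript
// Hypothetical helper: returns a list of problems found in the phrase data.
// It assumes each entry needs a non-empty string "id" and "phrase".
function findInvalidPhrases(phrases) {
  if (!Array.isArray(phrases)) {
    return ["top-level value must be an array"];
  }
  const problems = [];
  phrases.forEach((entry, index) => {
    if (entry === null || typeof entry !== "object") {
      problems.push(`entry ${index}: not an object`);
      return;
    }
    for (const field of ["id", "phrase"]) {
      if (typeof entry[field] !== "string" || entry[field].trim() === "") {
        problems.push(`entry ${index}: missing "${field}"`);
      }
    }
  });
  return problems;
}

const sample = [
  {
    id: "79546973-07ee-4b13-ae8e-2ebf07db7b2d",
    phrase: "Potremmo vederci domani?",
    translation: "Could we meet tomorrow?",
  },
  { id: "", phrase: "Ciao" }, // deliberately broken entry
];
console.log(findInvalidPhrases(sample));
```

    An empty result means every entry has the fields the conversion script reads.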

    Step 4: The Script

    The following script converts each phrase in the JSON file into an audio file using the Google Cloud Text-to-Speech API.

    // Load variables from .env into process.env
    require("dotenv").config();
    const textToSpeech = require("@google-cloud/text-to-speech");
    const fs = require("fs");
    const util = require("util");
    const path = require("path");
    const phrases = require("../data/phrases.json");

    // Parse the service account key stored in the .env file
    const google_credentials_content = process.env.GOOGLE_CREDENTIALS_CONTENT;
    const credentials = JSON.parse(google_credentials_content);
    const client = new textToSpeech.TextToSpeechClient({ credentials });

    async function convertTextToAudioFile(obj) {
      // console.log("convertTextToAudioFile:", obj.phrase);
      const request = {
        input: { text: obj.phrase },
        voice: { languageCode: "it-IT", ssmlGender: "FEMALE" },
        audioConfig: { audioEncoding: "MP3" },
      };
      const [response] = await client.synthesizeSpeech(request);
      // Specify the path where the audio should be saved.
      // NOTE: the path arguments are missing from the original listing; the
      // "../audio" directory and the id-based file name are assumptions.
      const outputPath = path.join(__dirname, "../audio", `${obj.id}.mp3`);
      await fs.promises.mkdir(path.dirname(outputPath), { recursive: true });
      // Write the audio content to the file
      await fs.promises.writeFile(outputPath, response.audioContent, "binary");
    }

    async function convertAll() {
      // Convert the phrases one at a time
      for (const phraseObj of phrases) {
        await convertTextToAudioFile(phraseObj);
      }
    }

    convertAll().catch(console.error);

    Script Breakdown

    • Load Dependencies: The script uses the @google-cloud/text-to-speech, fs, util, path, and dotenv modules.
    • Load Phrases: It reads the phrases from phrases.json.
    • Convert Text to Audio: For each phrase, the script sends a request to the Text-to-Speech API to generate audio and then saves it as an MP3 file.
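
    The request-building step can be separated from the API call so that it is easy to check without network access. Here is a rough sketch of that idea. The names buildRequest and audioFileName are hypothetical, the request fields mirror the ones in the script above, and naming files after the phrase id is an assumption rather than the author's scheme.

```javascript
// Hypothetical helpers; neither one calls the Text-to-Speech API.

// Builds a request object in the same shape the script passes to
// client.synthesizeSpeech().
function buildRequest(phraseObj, languageCode = "it-IT") {
  return {
    input: { text: phraseObj.phrase },
    voice: { languageCode, ssmlGender: "FEMALE" },
    audioConfig: { audioEncoding: "MP3" },
  };
}

// Derives a file name from the phrase id (an assumption), replacing
// characters that are unsafe in file names with underscores.
function audioFileName(phraseObj) {
  return `${String(phraseObj.id).replace(/[^A-Za-z0-9_-]/g, "_")}.mp3`;
}

const phraseObj = {
  id: "79546973-07ee-4b13-ae8e-2ebf07db7b2d",
  phrase: "Potremmo vederci domani?",
};
console.log(buildRequest(phraseObj).input.text);
console.log(audioFileName(phraseObj));
```

    Keeping these pieces pure makes it possible to change the language code or the naming scheme and verify the result without spending any API quota.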

    Running the Script

    Place the script in your project directory (e.g., as utils/process-audio.js) and run it using Node.js:

    node utils/process-audio.js

    Step 5: Managing Billing

    Be aware of the billing for the Text-to-Speech API:

    • Check Usage in Google Cloud Console: Regularly monitor your usage and charges in the Google Cloud Console.
    • Set Alerts and Budgets: To prevent unexpected charges, set up billing alerts and budgets.
    • Understand the Billing Cycle: Google Cloud TTS API follows a monthly billing cycle, and charges are based on the number of characters processed.
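
    Because charges depend on the number of characters, a rough estimate can be made before running the script. The sketch below is illustrative only. countCharacters and estimateCost are hypothetical helpers, the price is passed in as a parameter instead of being hard-coded because rates differ by voice type and change over time, and counting Unicode code points is only an approximation of how the API counts characters.

```javascript
// Hypothetical helper: counts characters across all phrases.
// Spreading a string iterates by Unicode code point, so a character outside
// the Basic Multilingual Plane counts once rather than twice.
function countCharacters(phrases) {
  return phrases.reduce((total, p) => total + [...p.phrase].length, 0);
}

// pricePerMillion is supplied by the caller; take the current rate from the
// official pricing page instead of trusting a hard-coded number.
function estimateCost(phrases, pricePerMillion) {
  return (countCharacters(phrases) / 1e6) * pricePerMillion;
}

const phrases = [
  { phrase: "Potremmo vederci domani?" },
  { phrase: "Ciao!" },
];
console.log(countCharacters(phrases));
```

    Running the same count over the full phrases.json before each batch gives an upper bound to compare against the budget you set in the console.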


    By following these steps, you can integrate Google Cloud Text-to-Speech into your Node.js application and generate audio files from your text data. Remember to monitor your usage so that it stays within your budget, and update your application as needed.
