2024-12-06 Web Development
Setting up Google Cloud Text to Speech App
By O. Wolfson
This article will guide you through the process of setting up a simple TTS web app using Google Cloud Text-to-Speech (TTS) and Next.js. We'll include server-side integration, client-side rendering, and Google Cloud API configuration.
Prerequisites

- Node.js and npm installed on your system.

- Google Cloud Project:

  - Create a project on the Google Cloud Console.
  - Enable the Text-to-Speech API.
  - Generate and download a service account key (JSON file).

- Next.js Project:

  - Initialize a Next.js project:

    ```bash
    npx create-next-app@latest my-tts-app
    cd my-tts-app
    ```

  - Install the necessary dependency:

    ```bash
    npm install @google-cloud/text-to-speech
    ```
Step 1: Configure Google Cloud Text-to-Speech API

- Service Account Key: Place the downloaded service account key file (e.g., `tts-key.json`) in a secure location in your project directory, such as `./keys`.

- Environment Variables: Securely reference your key file using environment variables, especially if deploying:

  - Add the path to `.env.local`:

    ```env
    GOOGLE_APPLICATION_CREDENTIALS=./keys/tts-key.json
    ```

  - Access the variable in the code:

    ```javascript
    process.env.GOOGLE_APPLICATION_CREDENTIALS;
    ```
Step 2: Build the Server-Side TTS API

We’ll use a route handler in Next.js to handle API requests.

- Create a file at `app/api/tts/route.js`:

  ```javascript
  import { TextToSpeechClient } from "@google-cloud/text-to-speech";

  const client = new TextToSpeechClient();

  export async function POST(request) {
    try {
      const { text } = await request.json();

      if (!text) {
        return new Response(JSON.stringify({ error: "Text is required" }), {
          status: 400,
          headers: { "Content-Type": "application/json" },
        });
      }

      const requestPayload = {
        input: { text },
        voice: {
          languageCode: "en-US",
          name: "en-US-Neural2-D",
        },
        audioConfig: { audioEncoding: "MP3" },
      };

      const [response] = await client.synthesizeSpeech(requestPayload);

      return new Response(response.audioContent, {
        status: 200,
        headers: {
          "Content-Type": "audio/mpeg",
          "Content-Disposition": "inline; filename=speech.mp3",
        },
      });
    } catch (error) {
      console.error("Error synthesizing speech:", error);
      return new Response(
        JSON.stringify({ error: "Failed to generate speech" }),
        { status: 500, headers: { "Content-Type": "application/json" } }
      );
    }
  }
  ```
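Google caps synthesis input per request (the documented limit is 5,000 bytes; check the current quotas for your account). If you expect longer texts, you could extend the route's validation with a byte-length check. A sketch, where `validateTtsInput` is a hypothetical helper, not part of the route above:

```javascript
// Rejects empty input (as the route already does) and input over the
// per-request size limit, before any billable API call is made.
function validateTtsInput(text, maxBytes = 5000) {
  if (typeof text !== "string" || text.trim().length === 0) {
    return { ok: false, error: "Text is required" };
  }
  if (Buffer.byteLength(text, "utf8") > maxBytes) {
    return { ok: false, error: `Text exceeds ${maxBytes} bytes` };
  }
  return { ok: true };
}
```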
- Test the API locally: Start the Next.js development server:

  ```bash
  npm run dev
  ```

  Send a `POST` request to `/api/tts` using a tool like Postman or curl, with a JSON body:

  ```json
  { "text": "Hello, world!" }
  ```
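For example, with curl (assuming the dev server is running on the default port 3000), you can save the response straight to an MP3 file:

```shell
curl -X POST http://localhost:3000/api/tts \
  -H "Content-Type: application/json" \
  -d '{"text": "Hello, world!"}' \
  --output speech.mp3
```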
Step 3: Build the Client-Side UI

Create a React component for interacting with the API.

- In `app/page.js`:

  ```javascript
  "use client";

  import { useState } from "react";

  export default function TextToSpeechPage() {
    const [text, setText] = useState("This is a test sentence.");
    const [isLoading, setIsLoading] = useState(false);
    const [audioUrl, setAudioUrl] = useState(null);

    const handleGenerateSpeech = async () => {
      setIsLoading(true);
      setAudioUrl(null);

      try {
        const response = await fetch("/api/tts", {
          method: "POST",
          headers: { "Content-Type": "application/json" },
          body: JSON.stringify({ text }),
        });

        if (!response.ok) {
          throw new Error("Failed to generate speech.");
        }

        const audioBlob = await response.blob();
        const audioUrl = URL.createObjectURL(audioBlob);
        setAudioUrl(audioUrl);
      } catch (error) {
        console.error("Error:", error);
      } finally {
        setIsLoading(false);
      }
    };

    return (
      <div style={{ padding: "20px", textAlign: "center" }}>
        <h1>Text-to-Speech</h1>
        <textarea
          value={text}
          onChange={(e) => setText(e.target.value)}
          rows={4}
          cols={40}
          style={{ display: "block", margin: "10px auto" }}
        />
        <button
          type="button"
          onClick={handleGenerateSpeech}
          disabled={isLoading}
          style={{ padding: "10px 20px", fontSize: "16px" }}
        >
          {isLoading ? "Generating..." : "Generate Speech"}
        </button>
        {audioUrl && (
          <audio controls src={audioUrl} style={{ marginTop: "20px" }} />
        )}
      </div>
    );
  }
  ```
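One refinement worth noting: each call to `URL.createObjectURL` keeps its blob alive until the URL is revoked, so repeated generations accumulate memory. A sketch of a small helper you could call from `handleGenerateSpeech` (the name `swapAudioUrl` is ours, not a browser API):

```javascript
// Revokes the previous object URL (if any) before minting one for the new
// audio blob, so old blobs can be garbage-collected.
function swapAudioUrl(prevUrl, audioBlob) {
  if (prevUrl) {
    URL.revokeObjectURL(prevUrl);
  }
  return URL.createObjectURL(audioBlob);
}
```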
Step 4: Deploy Your Application

- Deploy your Next.js app to a platform like Vercel or Netlify.
- If using Vercel, do not commit `tts-key.json` to your repository; provide the credentials securely through the platform's environment variable settings instead.
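On serverless platforms where you can't ship a key file, a common pattern is to paste the JSON key's contents into an environment variable and pass parsed credentials to the client, e.g. `new TextToSpeechClient({ credentials })`. A sketch under that assumption (the variable name `GCP_SA_KEY` is our choice):

```javascript
// Parses a service-account key stored as JSON in an environment variable
// and checks the fields the Google auth library needs.
function loadCredentialsFromEnv(raw = process.env.GCP_SA_KEY) {
  if (!raw) {
    throw new Error("GCP_SA_KEY is not set");
  }
  const creds = JSON.parse(raw);
  if (!creds.client_email || !creds.private_key) {
    throw new Error("Key is missing client_email or private_key");
  }
  return creds;
}
```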
Summary

You now have a simple text-to-speech web app running on Next.js and Google Cloud TTS. With this foundation, you can extend the application by:
- Supporting multiple languages and voices.
- Adding audio file downloads.
- Including user authentication for personalized services.
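As a starting point for multi-language support, the voice in the route handler could come from a small catalog keyed by language. A sketch — the non-English voice names below are assumptions; confirm what is actually available with the client's `listVoices()` call before relying on them:

```javascript
// Illustrative voice catalog. Names follow Google's
// "<lang>-<region>-Neural2-<variant>" convention.
const VOICES = {
  "en-US": "en-US-Neural2-D", // the voice used in the route above
  "de-DE": "de-DE-Neural2-B", // assumed; verify with listVoices()
  "ja-JP": "ja-JP-Neural2-C", // assumed; verify with listVoices()
};

// Returns a voice config for the request payload, falling back to the
// en-US default when the language has no entry.
function pickVoice(languageCode) {
  const name = VOICES[languageCode] ?? VOICES["en-US"];
  return { languageCode: name.slice(0, 5), name };
}
```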