August 11, 2024
O Wolfson
Google's Text-to-Speech (TTS) service supports a variety of SSML (Speech Synthesis Markup Language) tags that allow you to control the pronunciation, pitch, rate, volume, and other aspects of speech synthesis. Below is an overview of the key SSML tags and attributes supported by Google TTS, along with examples of how to use them:
<speak>
The root element of every SSML document; all other tags go inside it.
<speak>
Welcome to our service!
</speak>
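SSML is XML, so characters such as &, <, and > in the text to be spoken need escaping before they go inside <speak>. A minimal Python sketch using only the standard library; the helper name to_ssml is hypothetical and not part of any Google API:

```python
from xml.sax.saxutils import escape


def to_ssml(text: str) -> str:
    """Wrap plain text in a <speak> root, escaping XML special characters."""
    return f"<speak>{escape(text)}</speak>"


print(to_ssml("Welcome to our Q&A service!"))
# <speak>Welcome to our Q&amp;A service!</speak>
```

Escaping only the text, and not the tags you add yourself, keeps user-supplied content from being parsed as markup.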
<emphasis>
level: Can be "strong", "moderate", "none", or "reduced".
<speak>
This is <emphasis level="strong">very important</emphasis> information.
</speak>
<break>
time: Specifies the duration of the pause (e.g., "500ms").
strength: Specifies the strength of the pause ("none", "x-weak", "weak", "medium", "strong", "x-strong").
<speak>
Please wait <break time="500ms"/> before continuing.
</speak>
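When pauses are generated from code, it can help to reject malformed values before they reach the service. The sketch below is one way to do that, assuming durations are written as a number followed by "s" or "ms" and using the strength values listed above; break_tag is a hypothetical helper name:

```python
import re

# Strength values as listed in this section.
STRENGTHS = {"none", "x-weak", "weak", "medium", "strong", "x-strong"}
# Durations such as "500ms", "1s", or "1.5s".
_TIME_RE = re.compile(r"^\d+(\.\d+)?(ms|s)$")


def break_tag(time=None, strength=None):
    """Return a <break/> tag, validating whichever attributes are given."""
    attrs = []
    if time is not None:
        if not _TIME_RE.match(time):
            raise ValueError(f"invalid break time: {time!r}")
        attrs.append(f'time="{time}"')
    if strength is not None:
        if strength not in STRENGTHS:
            raise ValueError(f"invalid break strength: {strength!r}")
        attrs.append(f'strength="{strength}"')
    return "<break" + "".join(f" {a}" for a in attrs) + "/>"


print(break_tag(time="500ms"))     # <break time="500ms"/>
print(break_tag(strength="weak"))  # <break strength="weak"/>
```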
<prosody>
pitch: Changes the pitch of the speech (e.g., "+10%", "high", "low").
rate: Changes the speed of the speech (e.g., "slow", "fast", "medium", "x-slow", "x-fast").
volume: Adjusts the volume (e.g., "soft", "loud", "x-loud", "-10dB").
<speak>
<prosody pitch="+10%" rate="slow" volume="loud">
This text is spoken slowly, with a higher pitch and louder volume.
</prosody>
</speak>
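Instead of concatenating strings, the same markup can be built with Python's xml.etree.ElementTree, which escapes the text automatically. This is a sketch only: prosody_ssml is a hypothetical helper, and it deliberately accepts just the three attributes covered above:

```python
import xml.etree.ElementTree as ET

# Only the attributes described in this section are accepted.
ALLOWED = {"pitch", "rate", "volume"}


def prosody_ssml(text, **attrs):
    """Build <speak><prosody ...>text</prosody></speak> as a string."""
    unknown = set(attrs) - ALLOWED
    if unknown:
        raise ValueError(f"unsupported prosody attributes: {sorted(unknown)}")
    speak = ET.Element("speak")
    prosody = ET.SubElement(speak, "prosody", attrs)
    prosody.text = text  # special characters are escaped on serialization
    return ET.tostring(speak, encoding="unicode")


print(prosody_ssml("Slow & loud", pitch="+10%", rate="slow", volume="loud"))
```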
<say-as>
interpret-as: Can be "date", "time", "telephone", "characters", "fraction", etc.
<speak>
The meeting is scheduled for <say-as interpret-as="date">2024-08-15</say-as>.
</speak>
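Dates usually come from data rather than being typed by hand. The sketch below formats a Python date in the same YYYY-MM-DD form as the example above; say_as_date is a hypothetical helper, and how the service reads the date aloud is not something this code controls:

```python
from datetime import date


def say_as_date(d: date) -> str:
    """Wrap a date in <say-as interpret-as="date"> using ISO 8601 form."""
    return f'<say-as interpret-as="date">{d.isoformat()}</say-as>'


print(say_as_date(date(2024, 8, 15)))
# <say-as interpret-as="date">2024-08-15</say-as>
```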
<sub>
alias: The text that is spoken in place of the enclosed text.
<speak>
Read the abbreviation as <sub alias="National Aeronautics and Space Administration">NASA</sub>.
</speak>
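For longer texts, abbreviations can be wrapped in <sub> automatically from a lookup table. A rough sketch, assuming whole-word matches are what you want; add_sub_tags and the alias table are illustrative names, not part of any library:

```python
import re
from xml.sax.saxutils import escape, quoteattr


def add_sub_tags(text, aliases):
    """Wrap each known abbreviation in <sub alias="...">, escaping the rest."""
    if not aliases:
        return escape(text)
    # Longest keys first so that overlapping abbreviations match greedily.
    keys = sorted(aliases, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(re.escape(k) for k in keys) + r")\b")
    out = []
    # re.split with one capturing group alternates: plain text, match, plain text...
    for i, part in enumerate(pattern.split(text)):
        if i % 2 == 1:
            out.append(f"<sub alias={quoteattr(aliases[part])}>{escape(part)}</sub>")
        else:
            out.append(escape(part))
    return "".join(out)


aliases = {"NASA": "National Aeronautics and Space Administration"}
print("<speak>" + add_sub_tags("Read about NASA & ESA.", aliases) + "</speak>")
```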
<audio>
src: URL of the audio file.
<speak>
Welcome to the tutorial. <audio src="https://www.example.com/audio/welcome.mp3" />
</speak>
<p> and <s>
<p> defines a paragraph, and <s> defines a sentence.
<speak>
<p><s>This is the first paragraph.</s> <s>It contains two sentences.</s></p>
<p>This is the second paragraph.</p>
</speak>
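Plain text can be converted to this structure mechanically. The sketch below treats blank lines as paragraph breaks and splits sentences on ., !, or ? followed by whitespace; that rule is an assumption and will mis-split abbreviations such as "Dr.", so treat it as a starting point. text_to_paragraphs is a hypothetical name:

```python
import re
from xml.sax.saxutils import escape


def text_to_paragraphs(text: str) -> str:
    """Wrap blank-line-separated blocks in <p> and each sentence in <s>."""
    blocks = [b.strip() for b in re.split(r"\n\s*\n", text) if b.strip()]
    out = []
    for block in blocks:
        # Naive sentence split: end punctuation followed by whitespace.
        sentences = re.split(r"(?<=[.!?])\s+", block)
        body = "".join(f"<s>{escape(s)}</s>" for s in sentences)
        out.append(f"<p>{body}</p>")
    return "<speak>" + "".join(out) + "</speak>"


print(text_to_paragraphs("First one. Second one.\n\nNew paragraph."))
```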
<voice>
name: The name of the voice to use.
<speak>
<voice name="en-US-Wavenet-D">This part is spoken by a different voice.</voice>
</speak>
<lang>
xml:lang: The language code (e.g., "en-US", "fr-FR").
<speak>
<lang xml:lang="fr-FR">Bonjour tout le monde.</lang>
</speak>
The following example combines several of these tags:
<speak>
<p>
<emphasis level="strong">Attention!</emphasis> Please note that the event is on
<say-as interpret-as="date">2024-12-01</say-as>.
</p>
<p>
<prosody pitch="+5%" rate="slow" volume="soft">
Make sure you <emphasis level="moderate">arrive</emphasis> on time.
</prosody>
<break time="1s"/>
The doors will close promptly at <say-as interpret-as="time">09:00 AM</say-as>.
</p>
</speak>
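With several nested tags, an unclosed element or a mistyped tag name is easy to miss. The sketch below runs a quick local check before anything is sent to the service: it confirms the markup is well-formed XML, that the root is <speak>, and that only the tags covered in this article appear. check_ssml and KNOWN_TAGS are illustrative names, and the tag set is this article's list, not the full set the service accepts:

```python
import xml.etree.ElementTree as ET

# Tags covered in this article; extend as needed.
KNOWN_TAGS = {
    "speak", "emphasis", "break", "prosody", "say-as",
    "sub", "audio", "p", "s", "voice", "lang",
}


def check_ssml(ssml: str) -> list:
    """Return a list of problems; an empty list means the checks passed."""
    try:
        root = ET.fromstring(ssml)
    except ET.ParseError as exc:
        return [f"not well-formed XML: {exc}"]
    problems = []
    if root.tag != "speak":
        problems.append(f"root element is <{root.tag}>, expected <speak>")
    for element in root.iter():  # includes the root itself
        if element.tag not in KNOWN_TAGS:
            problems.append(f"unknown tag <{element.tag}>")
    return problems


print(check_ssml('<speak><p>Hi <break time="1s"/></p></speak>'))  # []
print(check_ssml("<speak><blink>Hi</blink></speak>"))  # ['unknown tag <blink>']
```

This only checks structure; attribute values and service-specific limits still need to be verified against the official documentation.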
For the most up-to-date and detailed information on supported SSML tags and their usage, refer to the Google Cloud Text-to-Speech SSML documentation.
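To hear how any of the snippets in this article are rendered, the SSML string can be sent to the service from code. The sketch below assumes the google-cloud-texttospeech Python package is installed and that Google Cloud credentials are already configured in the environment. The function name synthesize_ssml, the output path, and the choice of voice are illustrative:

```python
def synthesize_ssml(ssml: str, out_path: str = "output.mp3") -> None:
    """Send an SSML string to Cloud Text-to-Speech and save the returned MP3."""
    # Imported here so this module loads even if the client library is missing.
    from google.cloud import texttospeech

    client = texttospeech.TextToSpeechClient()
    response = client.synthesize_speech(
        # Passing ssml= (rather than text=) tells the service to parse the markup.
        input=texttospeech.SynthesisInput(ssml=ssml),
        voice=texttospeech.VoiceSelectionParams(
            language_code="en-US",
            name="en-US-Wavenet-D",  # voice name taken from the <voice> example
        ),
        audio_config=texttospeech.AudioConfig(
            audio_encoding=texttospeech.AudioEncoding.MP3
        ),
    )
    with open(out_path, "wb") as audio_file:
        audio_file.write(response.audio_content)
```

A call such as synthesize_ssml('<speak>Please wait <break time="500ms"/> before continuing.</speak>') would then write the audio to output.mp3.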