Speech Synthesis
This is the 23rd project of WesBos's JS30 series. To see the whole 30 part series, click here Today we'll learn how to do speech synthesis (text to speech) with JavaScript.
Video -
Starter Code -
Speech synthesis is one half of the Web Speech API, the other being the speech recognition API we dealt with earlier.
Speech synthesis is accessed via the SpeechSynthesis
interface, a text-to-speech component that allows programs to read out their text content (normally via the device's default speech synthesizer.) Different voice types are represented by SpeechSynthesisVoice
objects, and different parts of text that you want to be spoken are represented by SpeechSynthesisUtterance
objects. You can get these spoken by passing them to the SpeechSynthesis.speak()
method.
What we already have -
const msg = new SpeechSynthesisUtterance()
let voices = []
const voicesDropdown = document.querySelector('[name="voice"]')
const options = document.querySelectorAll('[type="range"], [name="text"]')
const speakButton = document.querySelector('#speak')
const stopButton = document.querySelector('#stop')
We have handles to the various DOM elements, and we have msg - a SpeechSynthesisUtterance object.
The SpeechSynthesisUtterance
represents a speech request. It contains the content the speech service should read and information about how to read it (e.g. language, pitch and volume.) The SpeechSynthesisUtterance
object has various properties that can be set, to manipulate the speech generated.
.lang
- Gets and sets the language of the utterance..pitch
- Gets and sets the pitch at which the utterance will be spoken at..rate
- Gets and sets the speed at which the utterance will be spoken at..text
- Gets and sets the text that will be synthesized when the utterance is spoken..voice
- Gets and sets the voice that will be used to speak the utterance..volume
- Gets and sets the volume that the utterance will be spoken at.
This is the most basic thing you need to do to generate speech -
const msg = new SpeechSynthesisUtterance()
// .text should be set, all other properties have defaults
msg.text = "Hello There"
// speak() does the actual synthesis - uses default voice
speechSynthesis.speak(msg)
Now for the various features of our app!
Get the default text
We want the app to speak whatever text we have in the text area, so lets set the initial value of the msg text to that.
msg.text = document.querySelector('[name="text"]').value
Select a custom voice
Setting the voice property - we use voiceschanged
event to listen to when all the voices have been loaded.
function populateVoices(){
// array of SpeechSynthesisVoice objects
voices = this.getVoices()
voicesDropdown.innerHTML = voices
.filter(voice => voice.lang.includes('en'))
.map(voice => `<option value="${voice.name}">${voice.name} (${voice.lang})</option>`)
.join('')
}
speechSynthesis.addEventListener('voiceschanged', populateVoices)
We get the voices from the speechSynthesis
object, then filter only those which are english. Then create option tags (as strings) and add them to the innerHTML
of the select component.
Now when the user selects the voice, we want to change the utterance to use that,
function setVoice() {
msg.voice = voices.find(voice => voice.name === this.value)
}
voicesDropdown.addEventListener('change', setVoice)
We search through the array of voices to find the correct SpeechSynthesisVoice
. We then set the msg.voice
to that object.
// A example SpeechSynthesisVoice object
{
default:true, lang:"en-IN",
localService:true, name:"Veena",
voiceURI:"Veena"
}
Rate, pitch, user entered text
Our options
variable is a NodeList of three elements, the rate, pitch range
elements and the text area.
We can deal with all three inputs in one function
const options = document.querySelectorAll('[type="range"], [name="text"]')
function setOption() {
msg[this.name] = this.value
}
options.forEach(option => option.addEventListener('change', setOption))
The name of each of the input elements match with the corresponding property name in msg
, and the value can be extracted in the same way using this.value
.
Start and stop the speech
We have a start and stop button already, let us write a function to be triggered on click
function toggle(startOver = true) {
speechSynthesis.cancel()
if (startOver) {
speechSynthesis.speak(msg)
}
}
speakButton.addEventListener('click', toggle)
stopButton.addEventListener('click', () => toggle(false))
toggle()
cancels anything that is already playing, and in case startOver
is true (viz the default case), then it replays the current text with the current settings. startOver
is set to false in the stop button, so it doesn't start playing anything after is cancels the current speech.
We'll also use toggle in setOption
to re-start the speech after a change in settings.
function setOption() {
msg[this.name] = this.value
toggle()
}
This completes our tiny app, here is the final code -