[text-to-speech] Google Text-To-Speech API

I want to know how I can use the Google Text-to-Speech API in my .NET project. I think I need to call a URL to use the web service, but the idea is not clear to me. Can anyone help?

This question is related to text-to-speech google-text-to-speech



I used the URL as above: http://translate.google.com/translate_tts?tl=en&q=Hello%20World

I requested it with a Python library, but I was getting HTTP 403 Forbidden.

In the end, I had to spoof the User-Agent header with a browser's one to succeed.
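
A minimal sketch of that fix, using only Python's standard library. The function names are hypothetical, and the browser User-Agent string is just an example value; whether the endpoint still accepts such requests is not guaranteed.

```python
import urllib.request

TTS_URL = "http://translate.google.com/translate_tts?tl=en&q=Hello%20World"
# Example browser User-Agent; any current browser string could be used instead.
BROWSER_UA = "Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0"


def build_request(url=TTS_URL, user_agent=BROWSER_UA):
    """Return a request that identifies itself as a browser."""
    return urllib.request.Request(url, headers={"User-Agent": user_agent})


def download(path="hello.mp3"):
    """Fetch the audio and save it to disk (performs a network call)."""
    with urllib.request.urlopen(build_request()) as response, open(path, "wb") as out:
        out.write(response.read())
```

Calling download() performs the actual request; build_request() alone only prepares it.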


Google text to speech

<!DOCTYPE html>
<html>
    <head>
        <script>
            function play(id) {
                var text = document.getElementById(id).value;
                // Encode the text so spaces and special characters survive in the URL.
                var url = 'http://translate.google.com/translate_tts?tl=en&q=' + encodeURIComponent(text);
                var a = new Audio(url);
                a.play();
            }
        </script>
    </head>
    <body>
        <input type="text" id="text" />
        <button onclick="play('text');"> Speak it </button>
    </body>
</html>

As of now, Google's official Text-to-Speech service is available at https://cloud.google.com/text-to-speech/

It's free for the first 4 million characters per month with standard voices.
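
As a rough sketch of what a call to the official service could look like over REST, the snippet below prepares a request for the text:synthesize method and decodes the base64 audio in the reply. It assumes authentication with an API key passed as the key query parameter; the function names are hypothetical, and the request is only built here, not sent.

```python
import base64
import json
import urllib.parse
import urllib.request

SYNTHESIZE_URL = "https://texttospeech.googleapis.com/v1/text:synthesize"


def build_synthesize_request(text, api_key, language_code="en-US"):
    """Prepare a POST request asking for MP3 audio of the given text."""
    body = {
        "input": {"text": text},
        "voice": {"languageCode": language_code},
        "audioConfig": {"audioEncoding": "MP3"},
    }
    url = SYNTHESIZE_URL + "?" + urllib.parse.urlencode({"key": api_key})
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def decode_audio(response_json):
    """Extract the raw audio bytes from the JSON response text."""
    return base64.b64decode(json.loads(response_json)["audioContent"])
```

Passing the prepared request to urllib.request.urlopen and feeding the response text to decode_audio would yield bytes that can be written to an .mp3 file.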


Because this came up in chat here, and this page was the first search result, I decided to share what I found after googling some more.

You really don't need to go to any great lengths to make it work anymore; simply stand on the shoulders of giants.

There is a standard:

https://dvcs.w3.org/hg/speech-api/raw-file/tip/webspeechapi.html

and an example:

http://html5-examples.craic.com/google_chrome_text_to_speech.html

At least for your web projects (e.g. ASP.NET), this should work.


Expanding on Chris' answer. I managed to reverse engineer the token generation process.

The token for the request is based on the text and a global TKK variable set in the page script. These are hashed in JavaScript thus resulting in the tk param.

Somewhere in the page script you will find something like this:

TKK='403413';

This is the number of hours that have passed since the Unix epoch.
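
To make the arithmetic concrete, here is a small sketch (function names are hypothetical) that computes such an hour count and converts a TKK value back into a UTC timestamp.

```python
import time
from datetime import datetime, timezone


def hours_since_epoch(timestamp=None):
    """Whole hours elapsed since 1970-01-01 00:00 UTC."""
    if timestamp is None:
        timestamp = time.time()
    return int(timestamp // 3600)


def tkk_to_datetime(tkk):
    """Start of the hour that a TKK value such as '403413' refers to."""
    return datetime.fromtimestamp(int(tkk) * 3600, tz=timezone.utc)


print(hours_since_epoch())        # the value the page would use right now
print(tkk_to_datetime("403413"))  # the hour in which the TKK above was issued
```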

The text is pumped in the following function (somewhat deobfuscated):

var query = "Hello person";
var cM = function(a) {
    return function() {
        return a
    }
};
var of = "=";
var dM = function(a, b) {
    for (var c = 0; c < b.length - 2; c += 3) {
        var d = b.charAt(c + 2),
            d = d >= t ? d.charCodeAt(0) - 87 : Number(d),
            d = b.charAt(c + 1) == Tb ? a >>> d : a << d;
        a = b.charAt(c) == Tb ? a + d & 4294967295 : a ^ d
    }
    return a
};

var eM = null;
var cb = 0;
var k = "";
var Vb = "+-a^+6";
var Ub = "+-3^+b+-f";
var t = "a";
var Tb = "+";
var dd = ".";
var hoursBetween = Math.floor(Date.now() / 3600000);
window.TKK = hoursBetween.toString();

fM = function(a) {
    var b;
    if (null === eM) {
        var c = cM(String.fromCharCode(84)); // char 84 is T
        b = cM(String.fromCharCode(75)); // char 75 is K
        c = [c(), c()];
        c[1] = b();
        // So basically we're getting window.TKK
        eM = Number(window[c.join(b())]) || 0
    }
    b = eM;

    // This piece of code is used to convert d into the utf-8 encoding of a
    var d = cM(String.fromCharCode(116)),
        c = cM(String.fromCharCode(107)),
        d = [d(), d()];
    d[1] = c();
    for (var c = cb + d.join(k) +
            of, d = [], e = 0, f = 0; f < a.length; f++) {
        var g = a.charCodeAt(f);

        128 > g ? d[e++] = g : (2048 > g ? d[e++] = g >> 6 | 192 : (55296 == (g & 64512) && f + 1 < a.length && 56320 == (a.charCodeAt(f + 1) & 64512) ? (g = 65536 + ((g & 1023) << 10) + (a.charCodeAt(++f) & 1023), d[e++] = g >> 18 | 240, d[e++] = g >> 12 & 63 | 128) : d[e++] = g >> 12 | 224, d[e++] = g >> 6 & 63 | 128), d[e++] = g & 63 | 128)
    }

    a = b || 0;
    for (e = 0; e < d.length; e++) a += d[e], a = dM(a, Vb);
    a = dM(a, Ub);
    0 > a && (a = (a & 2147483647) + 2147483648);
    a %= 1E6;
    return a.toString() + dd + (a ^ b)
};

var token = fM(query);
var url = "https://translate.google.com/translate_tts?ie=UTF-8&q="  + encodeURI(query) + "&tl=en&total=1&idx=0&textlen=12&tk=" + token + "&client=t";
document.write(url);

I managed to successfully port this to Python in my fork of gTTS, so I know this works.

Edit: By now, the token generation code used by gTTS has been moved into gTTS-token.

Edit 2: Google has changed the API (somewhere around 2016-05-10), so this method requires some modification. I'm currently working on this. In the meantime, changing the client to tw-ob seems to work.

Edit 3:

The changes are minor, yet annoying to say the least. The TKK now has two parts, looking something like 406986.2817744745. As you can see, the first part has remained the same. The second part is the sum of two seemingly random numbers:

TKK=eval('((function(){var a\x3d2680116022;var b\x3d137628723;return 406986+\x27.\x27+(a+b)})())');

Here \x3d means = and \x27 is '. Both a and b change every UTC minute. In one of the final steps of the algorithm, the token is XORed with the second part.
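
The composition of the two-part TKK can be checked with a few lines of Python. This is only a sketch of the arithmetic described above; the function names are hypothetical, and the sample values for a and b are the ones quoted in the eval snippet.

```python
def compose_tkk(hours, a, b):
    """Join the hour count and the sum of a and b with a dot."""
    return "{}.{}".format(hours, a + b)


def split_tkk(tkk):
    """Return (hours, second_part); a missing second part counts as 0."""
    first, _, second = tkk.partition(".")
    return int(first or 0), int(second or 0)


tkk = compose_tkk(406986, 2680116022, 137628723)
print(tkk)             # 406986.2817744745
print(split_tkk(tkk))  # (406986, 2817744745)
```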

The new token generation code is:

var xr = function(a) {
    return function() {
        return a
    }
};
var yr = function(a, b) {
    for (var c = 0; c < b.length - 2; c += 3) {
        var d = b.charAt(c + 2)
          , d = "a" <= d ? d.charCodeAt(0) - 87 : Number(d)
          , d = "+" == b.charAt(c + 1) ? a >>> d : a << d;
        a = "+" == b.charAt(c) ? a + d & 4294967295 : a ^ d
    }
    return a
};
var zr = null;
var Ar = function(a) {
    var b;
    if (null  !== zr)
        b = zr;
    else {
        b = xr(String.fromCharCode(84));
        var c = xr(String.fromCharCode(75));
        b = [b(), b()];
        b[1] = c();
        b = (zr = window[b.join(c())] || "") || ""
    }
    var d = xr(String.fromCharCode(116))
      , c = xr(String.fromCharCode(107))
      , d = [d(), d()];
    d[1] = c();
    c = "&" + d.join("") + 
    "=";
    d = b.split(".");
    b = Number(d[0]) || 0;
    for (var e = [], f = 0, g = 0; g < a.length; g++) {
        var l = a.charCodeAt(g);
        128 > l ? e[f++] = l : (2048 > l ? e[f++] = l >> 6 | 192 : (55296 == (l & 64512) && g + 1 < a.length && 56320 == (a.charCodeAt(g + 1) & 64512) ? (l = 65536 + ((l & 1023) << 10) + (a.charCodeAt(++g) & 1023),
        e[f++] = l >> 18 | 240,
        e[f++] = l >> 12 & 63 | 128) : e[f++] = l >> 12 | 224,
        e[f++] = l >> 6 & 63 | 128),
        e[f++] = l & 63 | 128)
    }
    a = b;
    for (f = 0; f < e.length; f++)
        a += e[f],
        a = yr(a, "+-a^+6");
    a = yr(a, "+-3^+b+-f");
    a ^= Number(d[1]) || 0;
    0 > a && (a = (a & 2147483647) + 2147483648);
    a %= 1E6;
    return c + (a.toString() + "." + (a ^ b))
}
;
Ar("test");

Of course I can't generate a valid url anymore, since I don't know how a and b are generated.


Old answer:

Try using this URL: http://translate.google.com/translate_tts?tl=en&q=Hello%20World It will automatically generate an audio file (MP3), which you can easily fetch with an HTTP request from any .NET language.

Edit:

Ohh Google, you thought you could prevent people from using your wonderful service with flimsy http header verification.

Here is how to get a response using several languages and tools (I'll try to add more as we go):

NodeJS

// npm install `request`
const fs = require('fs');
const request = require('request');
const text = 'Hello World';

const options = {
    url: `https://translate.google.com/translate_tts?ie=UTF-8&q=${encodeURIComponent(text)}&tl=en&client=tw-ob`,
    headers: {
        'Referer': 'http://translate.google.com/',
        'User-Agent': 'stagefright/1.2 (Linux;Android 5.0)'
    }
}

request(options)
    .pipe(fs.createWriteStream('tts.mp3'))

Curl

curl 'https://translate.google.com/translate_tts?ie=UTF-8&q=Hello%20Everyone&tl=en&client=tw-ob' -H 'Referer: http://translate.google.com/' -H 'User-Agent: stagefright/1.2 (Linux;Android 5.0)' > google_tts.mp3

Note that the headers are based on @Chris Cirefice's example. If they stop working at some point, I'll attempt to recreate the conditions for this code to function. All credit for the current headers goes to him and the wonderful tool that is Wireshark. (Also, thanks to Google for not patching this.)


All right, so Google has introduced tokens (see the tk parameter in the new URL) and the old solution doesn't seem to work. I've found an alternative, which I think even sounds better and has more voices! The command isn't pretty, but it works. Please note that this is for testing purposes only (I use it for a little home automation project); use the real version from acapela-group if you're planning on using this commercially.

curl $(curl --data 'MyLanguages=sonid10&MySelectedVoice=Sharon&MyTextForTTS=Hello%20World&t=1&SendToVaaS=' 'http://www.acapela-group.com/demo-tts/DemoHTML5Form_V2.php' | grep -o "http.*mp3") > tts_output.mp3

Some of the supported voices are:

  • Sharon
  • Ella (genuine child voice)
  • EmilioEnglish (genuine child voice)
  • Josh (genuine child voice)
  • Karen
  • Kenny (artificial child voice)
  • Laura
  • Micah
  • Nelly (artificial child voice)
  • Rod
  • Ryan
  • Saul
  • Scott (genuine teenager voice)
  • Tracy
  • ValeriaEnglish (genuine child voice)
  • Will
  • WillBadGuy (emotive voice)
  • WillFromAfar (emotive voice)
  • WillHappy (emotive voice)
  • WillLittleCreature (emotive voice)
  • WillOldMan (emotive voice)
  • WillSad (emotive voice)
  • WillUpClose (emotive voice)

It also supports multiple languages and more voices; for those, I refer you to their website: http://www.acapela-group.com/


In an update to Schahriar SaffarShargh's answer, Google has recently implemented a 'Google abuse' feature, making it impossible to send just any regular old HTTP GET to a URL such as:

http://translate.google.com/translate_tts?tl=en&q=Hello%20World

which worked just fine and dandy previously. Now, following such a link presents you with a CAPTCHA. This also affects HTTP GET requests out-of-browser (such as with cURL), because using that URL gives a redirect to the abuse protection page (the CAPTCHA).

To start, you have to add the query parameter client to the request URL:

http://translate.google.com/translate_tts?tl=en&q=Hello%20World&client=t

Google Translate sends &client=t, so you should too.

Before you make that HTTP request, make sure that you set the Referer header:

Referer: http://translate.google.com/

Evidently, the User-Agent header is also required, but interestingly enough it can be blank:

User-Agent:

Edit: NOTE - on some user-agents, such as Android 4.X, the custom User-Agent header is not sent, meaning that Google will not service the request. In order to solve that problem, I simply set the User-Agent to a valid one, such as stagefright/1.2 (Linux;Android 5.0). Use Wireshark to debug requests (as I did) if Google's servers are not responding, and ensure that these headers are being set properly in the GET! Google will respond with a 503 Service Unavailable if the request fails, followed by a redirect to the CAPTCHA page.
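
Put together, the three requirements above (the client parameter, the Referer header, and a non-empty User-Agent) could be assembled in Python roughly as follows. The function name is hypothetical, and this sketch only builds the request; it does not send it.

```python
import urllib.parse
import urllib.request


def build_tts_request(text, lang="en"):
    """Build a GET request carrying client=t plus both required headers."""
    # urlencode takes care of escaping spaces and special characters.
    query = urllib.parse.urlencode({"tl": lang, "q": text, "client": "t"})
    url = "http://translate.google.com/translate_tts?" + query
    headers = {
        "Referer": "http://translate.google.com/",
        "User-Agent": "stagefright/1.2 (Linux;Android 5.0)",
    }
    return urllib.request.Request(url, headers=headers)


request = build_tts_request("Hello World")
print(request.full_url)
```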

This solution is a bit brittle; it is entirely possible that Google will change the way they handle these requests in the future, so in the end I would suggest asking Google to make a real API endpoint (free or paid) that we can use without feeling dirty for faking HTTP headers.


Edit 2: For those interested, this cURL command should work perfectly fine to download an mp3 of Hello in English:

curl 'http://translate.google.com/translate_tts?ie=UTF-8&q=Hello&tl=en&client=t' -H 'Referer: http://translate.google.com/' -H 'User-Agent: stagefright/1.2 (Linux;Android 5.0)' > google_tts.mp3

As you may notice, I have set both the Referer and User-Agent headers in the request, as well as added the client=t parameter to the querystring. You may use https instead of http, your choice!


Edit 3: Google now requires a token for each GET request (noted by tk in the querystring). Below is the revised cURL command that will correctly download a TTS mp3:

curl 'https://translate.google.com/translate_tts?ie=UTF-8&q=hello&tl=en&tk=995126.592330&client=t' -H 'user-agent: stagefright/1.2 (Linux;Android 5.0)' -H 'referer: https://translate.google.com/' > google_tts.mp3

Notice the &tk=995126.592330 in the querystring; this is the new token. I obtained this token by pressing the speaker icon on translate.google.com and looking at the GET request. I simply added this querystring parameter to the previous cURL command, and it works.

NOTE: obviously this solution is very frail, and breaks at the whim of the architects at Google who introduce new things like tokens required for the requests. This token may not work tomorrow (though I will check and report back)... the point is, it is not wise to rely on this method; instead, one should turn to a commercial TTS solution, especially if using TTS in production.

For further explanation of the token generation and what you might be able to do about it, see Boude's answer.


If this solution breaks any time in the future, please leave a comment on this answer so that we can attempt to find a fix for it!



You can download the audio using wget:

wget -q -U Mozilla "http://translate.google.com/translate_tts?tl=en&q=Hello"

To save the output to an MP3 file:

wget -q -U Mozilla "http://translate.google.com/translate_tts?tl=en&q=Hello" -O hello.mp3

Enjoy !!



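#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import sys
import urllib.parse
from subprocess import Popen, PIPE


def run(cmd):
    """Run a shell command and echo its standard output line by line."""
    print(cmd)
    proc = Popen(cmd, stdin=None, stdout=PIPE, stderr=None, shell=True)
    while True:
        data = proc.stdout.readline()   # Alternatively proc.stdout.read(1024)
        if len(data) == 0:
            print("Finished process")
            break
        # The pipe yields bytes, so decode before writing to the text stream.
        sys.stdout.write(data.decode(errors="replace"))


msg = 'Hello pretty world'
# URL-encode the message so it is safe inside the query string.
msg = urllib.parse.quote_plus(msg)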
# -v verbosity
cmd='curl '+ \
    '--output tts_responsivevoice.mp2 '+ \
    "\""+'https://code.responsivevoice.org/develop/getvoice.php?t='+msg+'&tl=en-US&sv=g2&vn=&pitch=0.5&rate=0.5&vol=1'+"\""+ \
    ' -H '+"\""+'User-Agent: Mozilla/5.0 (X11; Linux x86_64; rv:44.0) Gecko/20100101 Firefox/44.0'+"\""+ \
    ' -H '+"\""+'Accept: audio/webm,audio/ogg,audio/wav,audio/*;q=0.9,application/ogg;q=0.7,video/*;q=0.6,*/*;q=0.5'+"\""+ \
    ' -H '+"\""+'Accept-Language: pl,en-US;q=0.7,en;q=0.3'+"\""+ \
    ' -H '+"\""+'Range: bytes=0-'+"\""+ \
    ' -H '+"\""+'Referer: http://code.responsivevoice.org/develop/examples/example2.html'+"\""+ \
    ' -H '+"\""+'Cookie: __cfduid=ac862i73b6a61bf50b66713fdb4d9f62c1454856476; _ga=GA1.2.2126195996.1454856480; _gat=1'+"\""+ \
    ' -H '+"\""+'Connection: keep-alive'+"\""+ \
    ''
print('***************************')
print(cmd)
print('***************************')
run(cmd)

The line

/getvoice.php?t='+msg+'&tl=en-US&sv=g2&vn=&pitch=0.5&rate=0.5&vol=1'+"\""+ \

is responsible for the language, which is set by this parameter:

tl=en-US

There is another pretty interesting site with TTS engines that can be used in this manner (replace each 0 with the letter o): iv0na.c0m

Have a nice day.


Go to console.developer.google.com, log in, and get an API key, or use Microsoft Bing's API:
https://msdn.microsoft.com/en-us/library/?f=255&MSPPError=-2147217396

Or, even better, use AT&T's speech API at developer.att.com (a paid one).

For voice recognition:

Imports System.IO
Imports System.Net
Imports System.Windows.Forms

Public Class Voice_recognition

    ' Posts a FLAC audio file to the speech endpoint and returns the raw response text.
    ' The output parameter is unused; it is kept so existing callers still compile.
    Public Function convertTotext(ByVal path As String, ByVal output As String) As String
        Dim request As HttpWebRequest = DirectCast(HttpWebRequest.Create("https://www.google.com/speech-api/v1/recognize?xjerr=1&client=speech2text&lang=en-US&maxresults=10"), HttpWebRequest)
        'path = Application.StartupPath & "curinputtmp.mp3"
        request.Timeout = 60000
        request.Method = "POST"
        request.KeepAlive = True
        request.ContentType = "audio/x-flac; rate=8000"
        request.UserAgent = "speech2text"

        ' Read the whole audio file into memory.
        Dim data As Byte() = File.ReadAllBytes(path)

        ' Send the audio as the request body.
        Using wrStream As Stream = request.GetRequestStream()
            wrStream.Write(data, 0, data.Length)
        End Using

        Try
            Using response As HttpWebResponse = DirectCast(request.GetResponse(), HttpWebResponse)
                Using sr As New StreamReader(response.GetResponseStream())
                    Dim result As String = sr.ReadToEnd()
                    MessageBox.Show(result)
                    Return result
                End Using
            End Using
        Catch ex As System.Exception
            MessageBox.Show(ex.Message)
        End Try

        ' Reached only when the request failed.
        Return String.Empty
    End Function
End Class

And for text to speech: use this.

I think you'll understand this. If you don't, use a VBScript to VB/C# converter. If you still don't, contact me.

I have done this before but can't find the code now, which is why I'm not giving you the code directly.


Another alternative is responsivevoice.org. A simple example JSFiddle is here.

HTML

<div id="container">
<input type="text" name="text">
<button id="gspeech" class="say">Say It</button>
<audio id="player1" src="" class="speech" hidden></audio>
</div>

JQuery

$(document).ready(function () {

    $('#gspeech').on('click', function () {
        var text = $('input[name="text"]').val();
        // See http://responsivevoice.org/
        responsiveVoice.speak(text);
    });

});

External Resource:

https://code.responsivevoice.org/responsivevoice.js