[dba-VB] Google Translate API

Salakhetdinov Shamil mcp2004 at mail.ru
Sat Feb 18 12:14:49 CST 2012


Thank you, Gustav,

Yes, your function worked well.

Here is a slightly shorter version using regular expressions:

        /// <summary>
        /// Translate text using Google Translate web page.
        /// <para>Google URL: "http://www.google.com/translate_t?hl=en(et)ie=UTF8(et)text=InputText(et)langpair=LanguagePair"</para>
        /// </summary>
        /// <param name="inputText">Text to be translated.</param>
        /// <param name="languagePair">Two-letter language pair, delimited by "|".
        /// <para>E.g. "en|fr" language pair will translate from English to French.</para></param>
        /// <returns>The translated text.</returns>
        public static string TranslateText(string inputText, string languagePair)
        {
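            // Percent-encode the text so that characters such as '&' or '#' cannot break the query string.
            string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", System.Uri.EscapeDataString(inputText), languagePair);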
            WebClient webClient = new WebClient();
            webClient.Encoding = System.Text.Encoding.UTF8;
            string translationPage = webClient.DownloadString(url);

            string startTag = "onmouseout=\"this.style.backgroundColor='#fff'\">";
            string endTag = "</span>";

            System.Text.RegularExpressions.Match match = System.Text.RegularExpressions.Regex.Match(translationPage,
                string.Format("(?<={0})(.*?)(?={1})", startTag, endTag));

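            // The page returns HTML-encoded text, so decode it before returning.
            return System.Net.WebUtility.HtmlDecode(match.Value); // translatedText;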
        }

Or my initial version, corrected using your approach:

        public static string Translate(
            string sentence,
            string sourceLanguageCode,
            string targetLanguageCode)
        {
            string url = string.Format(
                    "http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}|{2}",
                    System.Uri.EscapeDataString(sentence), // percent-encode for the query string; HTML encoding is not URL encoding
                    sourceLanguageCode,
                    targetLanguageCode
                    );

            System.Net.WebClient webClient = new System.Net.WebClient();
            webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
            webClient.Encoding = System.Text.Encoding.UTF8;
            string html = webClient.DownloadString(url);

            string startTag = "onmouseout=\"this.style.backgroundColor='#fff'\">";
            string endTag = "</span>";

            System.Text.RegularExpressions.Match match = System.Text.RegularExpressions.Regex.Match(html, 
                string.Format("(?<={0})(.*?)(?={1})", startTag, endTag));

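            // The page returns HTML-encoded text, so decode it before returning.
            return System.Net.WebUtility.HtmlDecode(match.Value);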
        }
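
For reference, the two network-free steps shared by both versions above (building the request URL and cutting the translation out of the returned page) can be sketched and tested offline. This is only a sketch under assumptions: the names TranslateSketch, BuildUrl and ExtractTranslation are made up for illustration, the markers are simply copied from the functions above, and the translate_t page may no longer answer with this markup.

```csharp
using System;
using System.Net;
using System.Text.RegularExpressions;

// Hypothetical helper class, not part of any Google library.
public static class TranslateSketch
{
    // Markers copied from the functions above.
    private const string StartTag = "onmouseout=\"this.style.backgroundColor='#fff'\">";
    private const string EndTag = "</span>";

    // Builds the request URL. The text is percent-encoded so that characters
    // such as '&' or '#' cannot break the query string.
    public static string BuildUrl(string text, string sourceLanguageCode, string targetLanguageCode)
    {
        return string.Format(
            "http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}|{2}",
            Uri.EscapeDataString(text),
            sourceLanguageCode,
            targetLanguageCode);
    }

    // Returns the HTML-decoded text between the markers,
    // or an empty string when the markers are not found.
    public static string ExtractTranslation(string html)
    {
        // Regex.Escape makes the markers match literally inside the pattern.
        string pattern = string.Format(
            "(?<={0})(.*?)(?={1})",
            Regex.Escape(StartTag),
            Regex.Escape(EndTag));

        Match match = Regex.Match(html, pattern, RegexOptions.Singleline);
        return match.Success
            ? WebUtility.HtmlDecode(match.Value.Trim())
            : string.Empty;
    }
}
```

With a WebClient at hand, the full call would then be ExtractTranslation(webClient.DownloadString(BuildUrl(sentence, "en", "fr"))).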

The URL I used before doesn't work.
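
A likely explanation: in a URL of the form http://translate.google.com/#en|ru|text, everything after "#" is a fragment, and HTTP clients do not send the fragment to the server. WebClient would therefore only ever have downloaded the plain start page, which fits the guess that the page fills in the translation with JavaScript. A small sketch with System.Uri shows the split (the class name FragmentCheck and its methods are made up for illustration):

```csharp
using System;

// Hypothetical helper class for illustration only.
public static class FragmentCheck
{
    // Returns the part of the URL that is actually requested from the server
    // (path and query). The fragment after '#' stays on the client.
    public static string RequestedPart(string url)
    {
        return new Uri(url).PathAndQuery;
    }

    // Returns the fragment, including the leading '#', or an empty string.
    public static string ClientOnlyPart(string url)
    {
        return new Uri(url).Fragment;
    }
}
```

For the old URL, RequestedPart gives just "/", while for the translate_t URL the text and language pair sit in the query string and so do reach the server.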

Thank you.

-- Shamil


18 February 2012, 16:22, from "Gustav Brock" <gustav at cactus.dk>:
> Hi Shamil
> 
> I found the code for my small demo.
> The trap I encountered was that the returned string is html encoded, thus it has to be decoded.
> As you can see, I feed the source string as is to the Google translator - no encoding:
> 
> <C#>
> using System;
> using System.Windows.Forms;
> using System.Net;
> 
> namespace GoogleTranslation
> {
>     public partial class FormTranslate : Form
>     {
>         string defaultText = "People of Copenhagen like silver bicycles!";
>         string languagePair = "en|fr";
> 
>         public FormTranslate()
>         {
>             InitializeComponent();
>             this.textBoxSource.Text = defaultText;
>             this.labelLanguage.Text = languagePair;
>         }
> 
>         private void buttonTranslate_Click(object sender, EventArgs e)
>         {
>             string inputText = this.textBoxSource.Text;
>             if (inputText.Equals(string.Empty))
>             {
>                 this.textBoxSource.Text = defaultText;
>                 inputText = defaultText;
>             }
>             string translatedText = TranslateText(inputText, languagePair);
>             this.textBoxTranslate.Text = translatedText;
>         }
> 
>         /// <summary>
>         /// Translate text using Google Translate web page.
>         /// <para>Google URL: "http://www.google.com/translate_t?hl=en(et)ie=UTF8(et)text=InputText(et)langpair=LanguagePair"</para>
>         /// </summary>
>         /// <param name="inputText">Text to be translated.</param>
>         /// <param name="languagePair">Two-letter language pair, delimited by "|".
>         /// <para>E.g. "en|fr" language pair will translate from English to French.</para></param>
>         /// <returns>The translated text.</returns>
>         public string TranslateText(string inputText, string languagePair)
>         {
>             string url = String.Format("http://www.google.com/translate_t?hl=en&ie=UTF8&text={0}&langpair={1}", inputText, languagePair);
>             WebClient webClient = new WebClient();
>             webClient.Encoding = System.Text.Encoding.UTF7;
>             string translationPage = webClient.DownloadString(url);
>             // Example for en|fr:
>             // <span id=result_box class="short_text">
>             // <span
>             //      title="People of Copenhagen like silver bicycles!"
>             //      onmouseover="this.style.backgroundColor='#ebeff9'"
>             //      onmouseout="this.style.backgroundColor='#fff'">
>             //      Les gens de Copenhague comme les bicyclettes d'argent!
>             // </span>
> 
>             /// <Search String="id=result_box">Value to be searched on the results for cutting the string looking for the translated text</Search>
>             int pos0;
>             pos0 = translationPage.IndexOf("id=result_box");
>             pos0 = translationPage.IndexOf("<span title=", pos0);
>             pos0 = translationPage.IndexOf(">", pos0) + 1;
> 
>             /// <Search String="</span>">This is found right after the translated text</Search>
>             int pos1;
>             pos1 = translationPage.IndexOf("</span>", pos0);
>             string translatedText = string.Empty;
>             if (pos1 > pos0)
>             {
>                 //  Decode text like:
>                 //      Les gens de Copenhague comme les bicyclettes d&#39;argent!
>                 //  to:
>                 //      Les gens de Copenhague comme les bicyclettes d'argent!
>                 translatedText = WebUtility.HtmlDecode(translationPage.Substring(pos0, pos1 - pos0).Trim());
>             }
>             return translatedText;
>         }
>     }
> }
> </C#>
> 
> /gustav
> 
> >>> Salakhetdinov Shamil <mcp2004 at mail.ru> 17-02-12 13:20 >>>
> Hi Gustav --
> 
> Google may have blocked use of the translation service via the "grabbing approach" - here is the code that returns nothing:
> 
> public static string Translate(
>     string sentence,
>     string sourceLanguageCode,
>     string targetLanguageCode)
> {
>     string url = string.Format(
>             "http://translate.google.com/#{0}|{1}|{2}",
>             sourceLanguageCode,
>             targetLanguageCode,
>             System.Web.HttpUtility.HtmlEncode(sentence)
>             .Replace(" ", "%20"));
> 
>     System.Net.WebClient webClient = new System.Net.WebClient();
>     webClient.Headers.Add("user-agent", "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.2; .NET CLR 1.0.3705;)");
>     string html = webClient.DownloadString(url);
> 
>     string startTag = "<span class=\"hps\">";
>     string endTag = "</span>";
> 
>     System.Text.RegularExpressions.Match match = System.Text.RegularExpressions.Regex.Match(html,
>         string.Format("(?<={0})(.*?)(?={1})", startTag, endTag));
> 
>     return match.Value;
> }
> 
> They probably fill the HTML text box with the translation using JavaScript, or my code above is wrong...
> 
> If you have code that works, it would be interesting to see it here.
> 
> Anyway, I will use the Google API for my batch translation.
> 
> Thank you.
> 
> -- Shamil
> 
> 17 February 2012, 13:21, from "Gustav Brock" <Gustav at cactus.dk>:
> > Hi Shamil
> >
> > You can "attack" the web interface via code. It's very easy - you post the wording to be translated, grab the returned html stream, and parse it for the returned translation. Much slower than the API, of course, but totally free as far as I understand. I didn't check for upper/lower case issues however.
> >
> > I have code for it somewhere should anyone be interested.
> >
> > /gustav
> >
> > >>> Salakhetdinov Shamil <mcp2004 at mail.ru> 16-02-2012 14:50 >>>
> > Hi All --
> >
> > JFYI: I tried the .NET Google Translate API today (http://code.google.com/p/google-api-dotnet-client/wiki/Setup) and it worked rather well. Only "rather well" because it somehow failed to translate the words "Event Viewer" into Russian, although it translated them well into French. And http://translate.google.com/ is able to translate "Event Viewer" well into Russian...
> >
> > The Google Translate API also translated the lowercase words "event viewer" well into Russian.
> >
> > The Google Translate API is a paid service (USD 20 per 1 million translated characters), so to avoid wasting money (I need to translate about 800,000 characters) I'd probably need to submit only lowercase words to the Google Translate API. We will see.
> >
> > Thank you.
> >
> > -- Shamil
> 