MediaWiki talk:Gadget-dictionaryLookupHover.js/how to adapt to another language

How to adapt the wiktionary lookup xslt to other languages
The basic idea is to take the English script and translate the variables marked "translate these". (English is the easiest script to translate from, so I don't recommend starting from another language's script. Whatever you do, do not start from the French script, as it's formatted in a way that is harder to translate than the other scripts.)

This more or less works. However, this script basically screen-scrapes the HTML, and the HTML differs between language editions, so sometimes you have to change more than this.

It is assumed you have a basic knowledge of regex and JavaScript (a very basic knowledge; a lot of this can be done without it).

P.S. I know this is an ugly script. It's actually mostly using XSLT as a vector to execute JavaScript on the results of the API, rather than actually using XSLT.

The easy parts to translate (part 1; xslt)
Let's start with the XSLT variables. Look for:

The part you translate is the part enclosed by the tag or by the select attribute (marked "Translate this part"). Note: for the variables that use select, it is important that they keep both the double and the single quotes, with the double quotes enclosing the single quotes.

dir (text direction)
For left-to-right languages (English, French, Spanish, etc.): ltr. For right-to-left languages (Hebrew, Arabic, etc.): rtl.

more
This variable is used for the text of the link to display more info (i.e. the link to the full definition). In English we use: » More

error
This introduces an error message. It is displayed in case of an API error (most commonly if someone tries to look up an illegal title, such as <). It is important to have a space at the end of this variable, as the API error message (which is not translated) is added directly after it. In English we use: Error: 
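
The reason for the trailing space can be sketched as follows. This is only an illustration, assuming the two strings are joined with nothing in between; showError and the sample API message are hypothetical and not taken from the script.

```javascript
// Hypothetical illustration: the translated prefix is joined directly
// to the untranslated API message, with nothing inserted between them.
var errorPrefix = 'Error: ';   // translated text, note the trailing space
var apiMessage = 'Bad title';  // sample API text, assumed for this sketch

function showError(prefix, message) {
  return prefix + message;
}

// With the trailing space the two parts stay readable.
console.log(showError(errorPrefix, apiMessage));
// Without it, the API message runs straight into the prefix.
console.log(showError('Error:', apiMessage));
```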

copyright
This is the copyright statement. It is one of the more complicated messages. It is important to have a space before the copyright (©) sign. Be careful when translating the links: rel="license copyright" should not be translated, nor should href or a (basically, don't translate anything enclosed by < and >). However, the URLs may need to be changed (to your local Wiktionary site, and to the translated CC license). Here's what we use on English (shown as rendered, without the markup): © Wiktionary. Released under CC-BY-SA 3.0

And in French:

<xsl:variable name="copyright"> © <a href="http://fr.wiktionary.org/wiki/"> Wiktionnaire</a>. Paru en <a href="http://creativecommons.org/licenses/by-sa/3.0/deed.fr" rel="license copyright"> CC-BY-SA 3.0 </a></xsl:variable>

contentLang
Two or three letter language code. This is used as an attribute value in the output, and is also assumed to be the start of the URL for the project (i.e. http://<contentLang>.wiktionary.org).

Note: it is important that this has both the double quotes and the single quotes, and that the double quotes enclose the single quotes. For English we use:

<xsl:variable name="contentLang" select="'en'"/>
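
As a rough sketch of the URL assumption, here is how a page address could be derived from the language code. projectUrl is a hypothetical helper, not a function in the script; the /wiki/ path is taken from the French copyright link above, and encoding the title with encodeURIComponent is an assumption.

```javascript
// Hypothetical helper: builds a page URL on the assumption that the
// project lives at http://<contentLang>.wiktionary.org.
function projectUrl(contentLang, title) {
  return 'http://' + contentLang + '.wiktionary.org/wiki/' +
    encodeURIComponent(title);
}

console.log(projectUrl('en', 'cat'));
console.log(projectUrl('fr', 'chat'));
```

If your Wiktionary's address does not start with its language code, this assumption breaks and the script needs more than a translation.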

The easy parts to translate (part 2; JavaScript)
Look for:

If you find this section confusing, just give a translation for the text of the create link (the text shown in place of "more" if the article does not exist) and for the not found message (in English, "Could not retrieve definition of").

preferLang
This is an associative array (or object, in JS speak) mapping language codes to language names, in whatever language you're working in. Keep the  and make sure you have a comma after each pair, except for the last one. It's important that it can map your language code to your language. It's a good idea to also map some other common languages, but it's not critical. This step can generally be done with Google Translate. (Note: some language projects, like French, need a different mapping scheme; almost every other language does it this way.) Here's what it looks like in English:

var preferLang = {'en': 'English', 'fr': 'French', 'de': 'German', 'es': 'Spanish', 'it': 'Italian', 'pt': 'Portuguese', 'ja': 'Japanese', 'pl': 'Polish', 'ru': 'Russian', 'nl': 'Dutch', 'qqqAny': null}; //for now.

And in Dutch (this should probably include more languages):

var preferLang = {'nl': 'Nederlands', 'en': 'Engels', 'qqqAny': null};
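
A quick way to check that the table can map your own language code is sketched below, using the Dutch table. lookupLangName and ownLang are hypothetical names for illustration; the script itself may read the table differently.

```javascript
var preferLang = {'nl': 'Nederlands', 'en': 'Engels', 'qqqAny': null};

// Hypothetical helper (not part of the gadget): returns the language
// name for a code, or null when there is no usable mapping.
function lookupLangName(map, code) {
  var hasKey = Object.prototype.hasOwnProperty.call(map, code);
  return hasKey && typeof map[code] === 'string' ? map[code] : null;
}

// The code of the Wiktionary being adapted (assumed here to be Dutch).
var ownLang = 'nl';
if (lookupLangName(preferLang, ownLang) === null) {
  throw new Error('preferLang has no name for ' + ownLang);
}
console.log(lookupLangName(preferLang, 'en'));
```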

createLink
This one is easy. This is the text of the create link (which replaces the more link if the article does not exist). Note that this is treated as plain text, so feel free to use < without escaping if you so desire. In English we use:

var createLink = '» Create'; // text only.

extractSeeAlso and see_also_process
This part is fairly technical and requires knowledge of regex and HTML; see the bottom of this page. If you're not sure about it, leave it to user:Bawolff.

not_found
The text shown when the word you clicked on could not be found. $1 is replaced with the word in question. For example, in English we use:

var not_found = "Could not retrieve definition of $1.";
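
The $1 substitution can be sketched like this. notFoundMessage is a hypothetical helper and the script may do the replacement differently; a function is used as the replacement so that a looked-up word containing $ is inserted literally.

```javascript
var not_found = "Could not retrieve definition of $1.";

// Hypothetical helper: fills the $1 placeholder with the word.
// A replacement function is used so that "$" sequences inside the
// word are not interpreted as special replacement patterns.
function notFoundMessage(template, word) {
  return template.replace('$1', function () { return word; });
}

console.log(notFoundMessage(not_found, 'cat'));
```

When translating, you can move $1 anywhere in the sentence to fit your language's word order.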

The hard part to translate
Note: this part is technical and requires knowledge of regex and HTML. If you are not familiar with these things, that's OK; you can leave this part for user:Bawolff. Generally this is adapting to different formatting, and not really translating.

There is quite a variety of formatting differences between Wiktionary editions. Sometimes you need to do more than what is listed above (however, often you don't). This requires a fairly decent knowledge of regex, as well as a limited knowledge of HTML. Look for the section:

extractSeeAlso and see_also_process
Note: This part is technically from the section before, but is included here as it is more technical.

This is used for extracting the text from the see also box, which varies with almost every language. (This is the part you need regex knowledge for, and one of the harder parts to adapt.) Most languages use a see also box that looks like either the French one or the English one. extractSeeAlso is a variable containing a regex that should match the see also text. Keep see_also_process as it is, unless you need to further process the result of the regex (for example, if you use subexpressions, to get the first subexpression). Here is what it looks like for English:

var extractSeeAlso = /<div class="disambig-see-also(?:-2)?">[\s\S]*?<\/div>/; //no subexpressions!
var see_also_process = function (sa) {return sa;};

For nl (which is like fr), where we use subexpressions and require further processing:

var extractSeeAlso = /<table class="bandeau-voir"[^>]*>([\s\S]*?)<\/td>[\s\S]*?<\/table>/; //Modified elsewhere!
var see_also_process = function (sa) { return sa[1].replace(/<a(?:[\t\r\n ][^>]*)?><img(?:\/|(?:[\t\r\n ][^>]*)?)><\/a>/, ''); };
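
To try a regex before putting it in the script, something like the following can be run in a browser console. It uses the English pair from above; the sample HTML is made up, and getSeeAlso is a hypothetical stand-in for however the script actually calls the two variables (it is assumed here that see_also_process receives the match result and that the outcome is used as a string).

```javascript
var extractSeeAlso = /<div class="disambig-see-also(?:-2)?">[\s\S]*?<\/div>/;
var see_also_process = function (sa) { return sa; };

// Made-up page fragment, only for testing the regex.
var sampleHtml = '<p>intro</p>' +
  '<div class="disambig-see-also">See also: <a href="/wiki/Cat">Cat</a></div>' +
  '<p>rest of the page</p>';

// Hypothetical wrapper: runs the regex, hands the match result to the
// processing function, and converts the outcome to a string.
function getSeeAlso(html, regex, process) {
  var match = html.match(regex);
  if (match === null) {
    return null; // no see also box on this page
  }
  return String(process(match));
}

console.log(getSeeAlso(sampleHtml, extractSeeAlso, see_also_process));
console.log(getSeeAlso('<p>no box here</p>', extractSeeAlso, see_also_process));
```

Under that assumption, the "no subexpressions" comment makes sense: a match result with a single element turns into the matched text when used as a string, while one with subexpressions would not.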

subSectRegex
This deletes everything before the language section we're interested in. If the Wiktionary uses a different scheme for organizing sections than the normal one (like fr), then you might need to change this (or if the MediaWiki parser changes; this part isn't the most robust).
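
The idea can be sketched as below. This is not the regex from the script: the heading markup (an h2 containing a span with the language name), and the names escapeRegex and dropBeforeSection, are all assumptions made for illustration.

```javascript
// Escapes characters that have a special meaning in a regex.
function escapeRegex(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Hypothetical sketch: removes everything before the heading of the
// wanted language section. ASSUMES headings look like
// <h2><span ...>Language name</span></h2>, which may not be true
// for a given Wiktionary or MediaWiki version.
function dropBeforeSection(html, langName) {
  var before = new RegExp(
    '^[\\s\\S]*?(?=<h2[^>]*><span[^>]*>' + escapeRegex(langName) + '</span>)'
  );
  return html.replace(before, '');
}

var page = '<h2><span class="mw-headline">French</span></h2><p>le chat</p>' +
  '<h2><span class="mw-headline">English</span></h2><p>to chat</p>';
console.log(dropBeforeSection(page, 'English'));
```

If the heading is not found, the sketch returns the page unchanged, which is one reason this kind of scraping is fragile.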

extractCurLangName
Extracts the full language name. Might have to change if the Wiktionary uses fancy templates for the lang name.

Other notes
If a language edition formats its pages very differently from English Wiktionary, some other things might have to be changed (for example, on ru we strip out examples that are on the same line as the definition).

Also:

<meta name="generator" content="Wiktionary Extract XSLT 1.08-EN"/>

should have the EN changed to your language code. (This is not very important; it is just to keep track of the different versions of this script.)