diekershoff.net


Python and gettext

software #python #gettext #translations #transifex #friendica #gitea #github
Estimated time to read: 5 min.
Thu 07 June 2018

Since SourceForge committed suicide, at the latest, it has been obvious that the companies hosting your code may be part of the problem. Having central places for code and FLOSS projects is nice, but not really necessary, as long as the project has a homepage that links to a copy of the code and is listed in the search engine of your choice. Nevertheless, such places are nice to have and easy to use, which is partly why the issue in the Friendica tracker about no longer using GitHub slept for such a long time. With the recent turn of events, GitHub being acquired by Microsoft, that issue got a boost on the priority to-do list. Microsoft's love for Linux and FLOSS is questionable, to say the least.

And in fact, the preparations to solve this issue started some months ago. After evaluating self-hosted alternatives over the last year or so, Gitea was the winner for its ease of setup and the features we really need. So for some months now there has been a Gitea instance of the Friendica project that mainly serves as a mirror of the repositories outside of GitHub. However, other issues were more pressing for the recent release, so the remaining steps to solve the GitHub issue stayed at a low priority.

One of them is making the landing page available in more languages than just English. We have a wonderful community of translators who do a great job at Transifex translating the UI of Friendica and Hubzilla, so I wanted to tap into that resource for a multilingual landing page as well. Hence, I looked into the template in use and the gettext support of Python 3 to automate the generation of the template file for the project's Gitea instance.

Extraction of the translatable strings

For a string to be detected for translation, the Python convention is to wrap it in _('this is the string'), where _ is an alias for the gettext.gettext function. In the template for Gitea, all translations are bundled up in a cascade of

{{if eq .Lang "de-DE"}}
    ... do something in German ...
{{else if eq .Lang "it-IT"}}
    ... do the same in Italian ...
{{else}}
    ... do the same in the default language for the page (English)
{{end}}

So for the script I decided to define the translatable strings at the top and then combine them with some self-made templates while looping over all available translations to generate the Gitea template file. Following the deferred translation example from the Python docs, I created a block of strings and marked them for translation.

# constant strings for the webpage
str_collection = _("Collection of the git repositories of the <a href='https://friendi.ca' target='_new'>Friendica</a> project.")
# and so on...
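The deferred pattern works because, at definition time, _ does not have to translate anything: it only has to mark the string so pygettext picks it up. Here is a minimal sketch following the example in the Python gettext docs. The second string and the helper translate_all are hypothetical, and gettext.NullTranslations stands in for a loaded catalog so the snippet runs without any .mo file.

```python
import gettext


# Deferred translation: at definition time _() only marks the string for
# pygettext (which looks for _ calls by default) and returns it unchanged.
def _(message):
    return message


str_collection = _("Collection of the git repositories of the <a href='https://friendi.ca' target='_new'>Friendica</a> project.")
str_hello = _("Hello")  # hypothetical second string
del _  # drop the marker so it is not used by accident later on


def translate_all(trans, strings):
    """Translate the marked strings with a gettext translations object (hypothetical helper)."""
    return [trans.gettext(s) for s in strings]


# NullTranslations returns every message unchanged; a catalog loaded with
# gettext.translation(...) would return the translated text instead.
print(translate_all(gettext.NullTranslations(), [str_hello]))
```

The actual translation then happens later, once per language, which is exactly what the generation loop further down needs.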

These strings are extracted with the pygettext tool that comes with Python, by running the following on the command line

pygettext -p ./lang/C/ generate_home_tmpl.py

which creates a file called messages.pot in the lang/C directory of the project. This file is then uploaded to Transifex. You can also give Transifex a URL to check regularly for updates of the translation base file, or upload the file manually whenever it changes. Or just pass the file to your translators by email or other means.

Generation of the multi-lingual template

Once the file has been translated, it has to be converted into a binary format. This can be done with the tool msgfmt. For gettext to find the file, it has to be placed into a subdirectory of the locales directory (I chose the lang directory of my project as that locales directory). The directory structure must look something like

+ lang
   + de-DE
   |   + LC_MESSAGES
   |       + messages.mo
   |
   + it-IT
   |   + LC_MESSAGES
   |       + messages.mo
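Since the LC_MESSAGES level is easy to overlook, here is a small standard-library sketch that creates the layout above in a temporary directory and asks gettext.find() where it expects the catalog. The helper make_layout and the empty placeholder files are assumptions for illustration only: find() merely checks that the file exists, while actually loading translations needs real .mo catalogs.

```python
import gettext
import os
import tempfile


def make_layout(localedir, codes, domain="messages"):
    """Create <localedir>/<code>/LC_MESSAGES/<domain>.mo placeholders (hypothetical helper)."""
    for code in codes:
        msgdir = os.path.join(localedir, code, "LC_MESSAGES")
        os.makedirs(msgdir, exist_ok=True)
        # Empty placeholder: enough for gettext.find(), not a usable catalog.
        open(os.path.join(msgdir, domain + ".mo"), "wb").close()


with tempfile.TemporaryDirectory() as lngdir:
    make_layout(lngdir, ["de-DE", "it-IT"])
    # Found: <lngdir>/it-IT/LC_MESSAGES/messages.mo
    print(gettext.find("messages", localedir=lngdir, languages=["it-IT"]))

    # A .mo file placed directly in the language directory is not found.
    os.makedirs(os.path.join(lngdir, "nl-NL"))
    open(os.path.join(lngdir, "nl-NL", "messages.mo"), "wb").close()
    print(gettext.find("messages", localedir=lngdir, languages=["nl-NL"]))  # None
```

The second lookup returns None because gettext only searches localedir/language/LC_MESSAGES/domain.mo.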

These messages.mo files are generated from the translated .po file of each language with the tool mentioned above

msgfmt -o messages.mo messages.po

but before you can execute that program, make sure that the charset in the Content-Type line of the file's header is set to something useful like UTF-8.
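To illustrate what msgfmt produces and why the charset header matters, here is a minimal sketch that writes a GNU .mo catalog in pure Python and loads it with gettext.GNUTranslations. It follows the simple layout also used by CPython's Tools/i18n/msgfmt.py (no hash table, no plural forms). build_mo is a hypothetical helper for illustration, not a replacement for msgfmt. Python decodes the catalog entries with the charset from the Content-Type header and falls back to ASCII when there is none, so that header entry is what makes translations with umlauts and the like work.

```python
import gettext
import io
import struct


def build_mo(messages, charset="UTF-8"):
    """Return a minimal GNU .mo catalog as bytes (hypothetical helper)."""
    # The entry with the empty msgid is the header carrying the charset.
    catalog = {"": f"Content-Type: text/plain; charset={charset}\n", **messages}
    keys = sorted(catalog)  # entries sorted by msgid
    ids = strs = b""
    entries = []  # (msgid length, msgid offset, msgstr length, msgstr offset)
    for key in keys:
        msgid, msgstr = key.encode(charset), catalog[key].encode(charset)
        entries.append((len(msgid), len(ids), len(msgstr), len(strs)))
        ids += msgid + b"\0"
        strs += msgstr + b"\0"
    n = len(keys)
    ids_table = 7 * 4               # the file header is seven 32-bit integers
    strs_table = ids_table + 8 * n  # each table entry is (length, offset)
    ids_start = strs_table + 8 * n
    strs_start = ids_start + len(ids)
    # magic, revision, count, table offsets, hash table size and offset
    out = struct.pack("<7I", 0x950412DE, 0, n, ids_table, strs_table, 0, 0)
    for id_len, id_off, str_len, str_off in entries:
        out += struct.pack("<2I", id_len, ids_start + id_off)
    for id_len, id_off, str_len, str_off in entries:
        out += struct.pack("<2I", str_len, strs_start + str_off)
    return out + ids + strs


mo_bytes = build_mo({"Translations": "Übersetzungen"})
trans = gettext.GNUTranslations(io.BytesIO(mo_bytes))
print(trans.charset())                  # UTF-8
print(trans.gettext("Translations"))    # Übersetzungen
print(trans.gettext("Not in catalog"))  # untranslated strings come back as-is
```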

With these files in place, it is time to tell Python to use the translations. My script pre-loads all translations into a dictionary; later it takes the needed language from that dictionary and writes the translations into the right spot of the template file.

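import gettext
import os

# the lang directory next to this script serves as the locales directory
lngdir = os.path.join(os.path.abspath(os.path.dirname(__file__)), 'lang')
translations = ['de-DE', 'nl-NL', 'it-IT']
languages = {}
for lng in translations:
    # loads lang/<lng>/LC_MESSAGES/messages.mo, raises FileNotFoundError if it is missing
    languages[lng] = gettext.translation(domain='messages', localedir=lngdir, languages=[lng])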

So at the moment there are three translations ready to use: Dutch, German and Italian. They are collected in the translations list. For each language, gettext.translation loads the translation from the domain messages, which is essentially the file name without the .mo suffix, and looks for the file in the language-specific subdirectories (lngdir/<language>/LC_MESSAGES/) of the lngdir directory passed as the localedir parameter.
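The loop above stops with a FileNotFoundError as soon as one language lacks its .mo file. A hedged variation, not what the script does, is to pass fallback=True, in which case gettext.translation() returns a gettext.NullTranslations object that simply returns the original English strings. The helper name load_languages is hypothetical; the temporary directory is deliberately empty to show the fallback.

```python
import gettext
import tempfile


def load_languages(lngdir, codes):
    """Like the loop above, but tolerant of missing catalogs (hypothetical helper)."""
    languages = {}
    for lng in codes:
        # fallback=True returns NullTranslations instead of raising
        # FileNotFoundError when lngdir/<lng>/LC_MESSAGES/messages.mo is missing.
        languages[lng] = gettext.translation(domain="messages", localedir=lngdir,
                                             languages=[lng], fallback=True)
    return languages


with tempfile.TemporaryDirectory() as lngdir:  # deliberately empty
    languages = load_languages(lngdir, ["de-DE", "nl-NL", "it-IT"])
    nl = languages["nl-NL"]
    print(type(nl).__name__, nl.gettext("Hello"))  # NullTranslations Hello
```

Which behaviour is preferable is a matter of taste: the strict version stops the generation early, which makes a missing LC_MESSAGES directory obvious, while the fallback silently produces an untranslated block.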

Later in the script, when the translations are needed, they are read from the dictionary and applied. For the first branch of the if $lang elseif $lang … else … end construction mentioned above, it looks like this:

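if translations:
    l = languages[translations[0]]
    _ = l.gettext   # translate the strings below with this language
    l.install()     # additionally binds _ in builtins for other modules
    # outFile is the template file being written, opened elsewhere in the script
    outFile.write('\n{{if eq .Lang "{{lngcode}}"}}\n'.replace('{{lngcode}}', translations[0]))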

This is followed by a similar block looping over the rest of the translations (translations[1:]) and finalized with the untranslated block. Thus, the template for Gitea is generated from the different messages.mo files and some constant blocks of HTML code with placeholders (e.g. the {{lngcode}} in the last line of the code block above) that are replaced by the language code or the translated text.
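Put together, the generation step could look roughly like the following sketch. Since the script is not public yet, the HTML snippet, the helper names and the DictTranslations class (a dict-backed stand-in for the catalogs loaded from the .mo files, so the example runs without them) are all hypothetical.

```python
import gettext


class DictTranslations(gettext.NullTranslations):
    """Hypothetical stand-in for a catalog loaded with gettext.translation()."""

    def __init__(self, messages):
        super().__init__()
        self._messages = messages

    def gettext(self, message):
        return self._messages.get(message, message)


# Hypothetical constant HTML block with a placeholder for the translated text.
BLOCK = '<p>{{text}}</p>\n'


def render(trans, strings):
    """Fill the HTML block with every string translated by trans."""
    return ''.join(BLOCK.replace('{{text}}', trans.gettext(s)) for s in strings)


def build_template(languages, strings):
    """Build the {{if}} / {{else if}} / {{else}} / {{end}} cascade for Gitea."""
    out = ''
    for i, (lngcode, trans) in enumerate(languages.items()):
        keyword = 'if' if i == 0 else 'else if'
        out += '{{%s eq .Lang "%s"}}\n' % (keyword, lngcode)
        out += render(trans, strings)
    if languages:
        out += '{{else}}\n'
    # The untranslated (English) block; it is the whole output when there
    # are no translations at all.
    out += render(gettext.NullTranslations(), strings)
    if languages:
        out += '{{end}}\n'
    return out


print(build_template({'de-DE': DictTranslations({'Hello': 'Hallo'}),
                      'it-IT': DictTranslations({'Hello': 'Ciao'})},
                     ['Hello']), end='')
```

The translated strings are inserted without HTML escaping. The str_collection string with its embedded link needs exactly that, but it also means the translations are trusted as HTML.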

At the moment the script is full of comments that are most likely unneeded, which is why I have not pushed it to a public repository yet. But it already works nicely and will be published once it is cleaned up.

The one bump on the ride was that I did not realize there has to be an LC_MESSAGES subdirectory for every translation. It took me some time reading other articles online (see below) to spot the missing piece in the directory structure. Once that was added, everything worked out fine.

Links

While diving into the topic, I enjoyed reading the following articles from the web.


Theme last updated

November 2017

License

Unless otherwise noted, the contents of this homepage are licensed under a Creative Commons license (CC-BY), which essentially means you may remix my content into your work as long as you name me.

Contact

You can send me an email to tobiasdiekershoff.net or see the imprint for further contact channels.

Made with

Powered by Pelican. Theme inspired by Bootply using the Sandstone color scheme.