Scottish Gaelic Information Technology

May 25, 2015

Minority languages like Scottish Gaelic got 99 problems and information technology is one. Linguistically speaking, information technology (IT) is dominated by English. How on earth can minority language users carve out a space for their own technology needs and desires without having to use English or another dominant language? How can IT be used in minority-language immersion teaching, for example, if the interface is English? Who will identify the IT needs and desires of minority language users and work to fulfill them?

Surely, you think, there is some kind of expert committee figuring these things out for Gaelic?

While you might expect nonprofit or educational organizations to be doing the work, quite a lot of this effort is individual. Here I present an interview with Michael Bauer, the human being behind many recent achievements and efforts in the world of Scottish Gaelic-medium IT. He gives his insights on the challenges of Gaelic software localization and the problems of English-Gaelic machine translation:

EM: It seems both obligatory and, to me as a non-native Gaelic speaker, obnoxious to start off an interview with biographical questions when it’s really about your work with minority language IT. But here goes. Where are you from, and how did you get interested in Gaelic?

MB: I was born and brought up in Munich, Germany. And the thing with Gaelic just happened… I took an interest in Basque and Irish at the tender age of 16 or 17. Irish was the first language I ‘got into’, spending many happy months in Ireland to acquire some fluency. I then threw myself into Basque and – though rusty now – became highly fluent. Scottish Gaelic had initially interested me when I started Irish but because the only class in Munich at the time was for Irish, I dropped it and didn’t learn it until I switched to Edinburgh University because I’d met someone special via the Internet. So hand on heart, it was a coincidence.


Michael Bauer of Akerbeltz and iGàidhlig with his cat Squiggles

EM: You are the person behind Akerbeltz, which has given us Blas na Gàidhlig and Gaelic grammar resources online. What exactly is Akerbeltz? How does that work fit in with your IT work for Gaelic?

MB: I get asked that one a lot! Akerbeltz is one of the old Basque gods, it literally means ‘black billy goat’ – a bit like the Greek Pan without the nasty bits. I had gone by that handle in chat rooms for a long time (too many people called Michael) and when I put my Gaelic resources site together, I needed a domain name and I didn’t want anything ostentatious as it was just going to be my own ramblings (this was before blogs). So I went for Akerbeltz as it was ‘me’. I have since used it in all sorts of variations, including my micro-publishing house Foillseachadh Akerbeltz as it’s the closest thing to a brand name I have. I went for a different name with iGàidhlig because I wanted it to be more obviously about Gaelic and IT. Plus it’s a neat multi-level pun. I did learn something from watching the early episodes of the Apprentice it would seem!

EM: So iGàidhlig (“Teicneòlas Gàidhlig dhan 21mh linn” [“Gaelic technology for the 21st century”]) is your new project that can guide people through obtaining Gaelic-language versions of software and “Gaelicizing” their computer usage to whatever extent they desire? Can you tell me more about it?

MB: It’s now officially a charity actually. It started out by me falling into localization (the translation of software) via Google (before they went over to the dark side) and then Mozilla. It has two main aims – one is to continue working on projects (both localization and development of tools) which would be of benefit to Gaelic speakers and learners. The second is to encourage people to use these and to help them get ‘set up’ with Gaelic software. Oddly enough, the hardest part of all this is NOT localizing software or developing tools but getting folk to use them. Not because they don’t like them when someone shows them what they do and how to get them and use them. But because the majority of users don’t feel comfortable tinkering with their systems and for many, something like installing Firefox to replace Internet Explorer is a real challenge.

The reason I registered it as a charity is because I want to give it more sustainability, hopefully by bringing on board some more hands and to ensure that when I pop my clogs, the work will continue. It is one of the greatest failings of small languages that they rarely consider the long-term future of IT projects.


iGàidhlig - Gaelic language software

EM: Can you please give a quick list of your various IT projects for Scottish Gaelic – all of the Gaelic versions of software and platforms that you have created, as well as projects like the online dictionary Faclair Beag?

MB: Urgh, all of them? Let’s see (my contribution is the percentage; if no percentage is shown, then it’s all me):

All the Mozilla stuff (99%)
Ubuntu/Linux (75%)
Microsoft (Windows/Office etc… about 65%)
Faclair Beag (the IT side aside, 98% me)
Google (when it was running)
WordPress (98%)
Webfeud (word game like Scrabble)
GCompris (50%) (educational games and software)
MediaWiki (hard to judge.. a high %)
An Dearbhair Beag
TuxPaint (65%)

Technical expertise is often provided by other people, such as Adaptxt; all the spellchecker/wordlist geekery, for instance, comes mostly via a tool which Kevin Scannell helped me set up.

EM: What are you working on right now (May 2015)?

MB: Gaelic Text to Speech with Cereproc; maintenance of existing projects…

EM: Who are the main users of the Gaelic products and services you’ve created?

MB: Good question. Very hard to know. The only reliable data we get is from Mozilla, which suggests uptake is low but comparably so with other small languages. As to where they are or who they are, I can only make educated guesses. Not schools, because all IT is outsourced in Scotland and the third party providers refuse to supply anything in Gaelic. Even a better keyboard setting.

I can usually tell from a dip in Firefox users when SMO [Sabhal Mòr Ostaig, the Gaelic college on Skye] is off. I feel it’s about a 65/35 split between learners/young people and native/older people. Weaning folks off pre-installed software is hard (which is why I’m setting up iGàidhlig as a charity). And it’s a ‘one step forward, two to the side’ game; just when we had a good offering of desktop stuff, suddenly the world lurches onto mobile devices and they are such linguistic fortresses, it would make you weep.

EM: What do you mean by ‘linguistic fortresses’?

MB: How can I put this without turning it into a rant… I’ll give you two examples. Number one, the WordPress app. I’ve been maintaining the translation for a few years now but unfortunately, almost all apps use something called ‘force locale’ to determine the interface language of your apps (because the phone manufacturers all assume we’re lazy monolinguals…). That means if your Android is set to French, then all your apps will be in French. Even if the app has been translated into, say, Tahitian, you can’t get at it unless you set your whole phone to Tahitian. Which is where the next hurdle is – it is up to the manufacturers to decide which locales they ship with a handset. Usually, only the big boys and girls get in. French, sure, Spanish, sure, Chinese, of course… Tahitian, forget it. It wasn’t until I finally convinced WordPress to add a language option in the settings that you could finally get your hands on the Gaelic translation.

Second example – predictive texting. We teamed up with a company called Adaptxt to allow people to swype and all that. Worked on Android straight away because Android happens to allow you to use “third party text entry methods” aka “predictive texting not created by Google”. Guess what… yup, Apple used to block such apps until very recently. So even though Adaptxt had all the tech in place to work on iPhones and iPads, you couldn’t use it. Only with iOS 8 did they finally allow third party apps like Adaptxt.

For some reason, desktops tend to give the user a lot of freedom in terms of choosing things like how to enter text or picking the language of your software. But on mobile devices, most things to do with language make attacking Mordor seem like a cake walk.
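
To make the ‘force locale’ problem concrete, here is a minimal Python sketch of the two behaviours Michael describes: an app that can only follow the device language, and an app that also offers its own language setting. The function name, its parameters and the list of translations are hypothetical illustrations, not the actual Android or WordPress app API; the only real-world detail is that `gd` is the ISO 639-1 code for Scottish Gaelic.

```python
from typing import Optional, Sequence


def pick_interface_language(
    app_translations: Sequence[str],
    device_locale: str,
    user_choice: Optional[str] = None,
    fallback: str = "en",
) -> str:
    """Return the language code the app should display (hypothetical helper)."""
    # An explicit in-app language setting wins, if the app ships that translation.
    if user_choice is not None and user_choice in app_translations:
        return user_choice
    # Otherwise the app simply follows the device locale, if it ships that translation.
    if device_locale in app_translations:
        return device_locale
    # Nothing matched: fall back to the app's default language.
    return fallback


# Assumed example: the app ships English, French and Scottish Gaelic translations.
translations = ["en", "fr", "gd"]

# 'Force locale' behaviour: the phone is set to French, so the Gaelic
# translation exists but cannot be reached.
forced = pick_interface_language(translations, device_locale="fr")

# With an in-app language option, the user can choose Gaelic regardless
# of the device locale.
with_setting = pick_interface_language(
    translations, device_locale="fr", user_choice="gd"
)

print(forced, with_setting)
```

In this sketch the second call is the situation after a language option is added to an app's settings: the Gaelic translation becomes reachable even though the handset itself offers no Gaelic locale.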

EM: Have you been paid for creating these resources? (Nosy question I know!)

MB: Yesno. The main commercial client is Microsoft. Occasionally I will pick up a paid job in relation to software localization such as OpenCart for a Gaelic body. And the BBC (at least so far) has paid each year for a license to re-use the Faclair Beag on LearnGaelic. It just about pays for all the unpaid projects or rather, I haven’t starved yet. 🙂

Software translation and development (in a very loose sense of the word, i.e. I work with developers providing linguistic expertise, I can’t programme for toffee) oddly enough IS my main line of work.

EM: Why is it that you, as an independent individual, were able to create a true online Gaelic dictionary where Gaelic organizations and publishers had not (with the exception of Sabhal Mòr Ostaig’s Stor-Data which is really just a wordlist database)?

MB: It’s a combination of things (including the congenital madness in my family). 🙂

I tend to see something that needs fixing or is missing and if I deem it a worthwhile need, then set about creating a solution without waiting for permission, encouragement or funding. Or paying attention to people who tell me it can’t be done. I usually figure that it will either somehow pay for itself in the long run or won’t but as long as I enjoy doing it, that to me is payment in kind. Some people climb mountains, some enjoy shopping. I get a kick out of seeing the dictionary grow.

And it mostly just ‘happened’, it wasn’t a dictionary to begin with but grew out of a wordlist I kept at university because I wanted something searchable (English to Gaelic) which Dwelly wasn’t – and the other dictionaries were too small and ‘slow’ in the sense that you’d often have to rely on a good deal of guesswork to tease out the right word or expression. So all the words I looked up, all my notes from language classes (regarding words and idioms), all the stuff I was reading, went into a spreadsheet. Which then became the Faclair nan Gnàthasan-cainnte initially and eventually the Faclair Beag. I just keep adding to it, one word or phrase at a time without thinking about the size of the task – that would get depressing. And every now and then you look back and go ‘gosh, not doing too badly’.

Not having kids helps and the ability to put in crazy hours, admittedly. And a job that leaves me just about enough time to do all this stuff though I’m probably near full-capacity just now.

EM: As we’ve seen, bad things happen when people try to do their own translations for tattoos in both Irish and Scottish Gaelic, whether with a dictionary or online resources. Are there currently (May 2015) any online machine translation services for Scottish Gaelic?

MB: Fortunately not. Well, there is a version that translates from Scottish Gaelic to Irish (which works fairly well), done by Prof. Kevin Scannell in the US, the über-guru of Irish and other minority language technology.

EM: Why are there no online machine translation services for Gaelic yet, in your opinion?

MB: First, 60,000 speakers is not big business (so the likes of Google, Apple and Microsoft don’t really give a fig unless you happen to know someone on the inside). Second, all speakers are bilingual (by contrast, Haitian Creole made it onto Google Translate in the wake of their hurricane disaster [in fact the 2010 earthquake], which created a need for a tool enabling aid workers to communicate with the locals, many of whom are monolingual Creole speakers). Third, Gaelic has no real lobby (not the way Māori has, for example).

EM: And why do you say “fortunately not” (as to the current lack of online machine translation services for Gaelic)?

MB: Don’t get me wrong, I like technology and I’m always keeping a weather eye on the e-horizon for anything that might be useful to Gaelic speakers and learners. But machine translation simply isn’t one of those things. At best, it serves the non-Gaelic world, at worst, it will harm the Gaelic world. Let me explain.

First, there is feasibility and quality. MT works best:

(a) for closely related languages

(b) going from a morphologically complex language (hard grammar) to a simple one (easy grammar). Mostly that means INTO English works better than INTO something else.

(c) if you have a truly massive bilingual aligned corpus (same text in two languages, side by side). We’re talking billions of words, ideally.

None of (a), (b) or (c) applies in the Gaelic context. Not even for Irish, where there is a lot more data. And we all know how useless Google is going from English to Irish. So whatever we do, without a truly gargantuan effort, at best we can hope for something that’s as bad as Irish<>English on Google Translate. At worst, well, it will be worse. ‘All your base are belong to us’ kind of bad. Does the world really need more bad translations?

And no matter what the designers of such systems think regarding ‘responsible use’, that’s a pipe dream. Give people a tool, and they will misuse it. Like our former joiner who used to hammer in screws. People have used and will use Google Translate for official signs (has happened in Ireland, China and elsewhere), official websites (has happened in Ireland and probably elsewhere), homework/coursework (has happened in many places). Sure, machine translation when used responsibly can be useful, say when I’m trying to make sense of a Russian news article or a scientific article in Japanese. To get the gist in English.

Which brings me onto the wider question of ‘what for’? If we have a system that will never create good Gaelic from English, the only direction it might be of use is the other way, Gaelic to English. Really? There are few enough domains which are only accessible via Gaelic, with song, stories and poetry (much as I dislike the stuff) probably being the last bastion. How does it serve Gaelic, Gaelic speakers or learners if we take away one of the last things which you can only access if you have a command of the language? So a second grader in New Zealand can write a paper on Calum Sgàire in English?

Google Translate is a running joke in Ireland. Do we really need to replicate the joke?

EM: Recently in a Facebook group I read an exchange between yourself and a Gaelic linguist about the pros and cons of submitting Gaelic texts to Google in order to help create Scottish Gaelic functionality in Google Translate. You were not in favour of doing so, while he was. So there is some difference of opinion on this issue?

MB: What I’ve already said aside, my view is that even if we decide that we want Gaelic machine translation, then let’s go with something we can at least partly control. There are decent Open Source MT platforms out there like Apertium. Google is a bit like a black hole, you chuck something in and you have no idea where it goes or when and how it’s going to hit you coming back the other way. And you certainly lose control straight away. No, I’m not a control freak. I’m a *quality* control freak, before you ask.

I don’t hate MT per se and I’m honestly not worried about my income – but why oh why Google Translate and why oh why English<>Gaelic? I’ve helped Kevin Scannell some with his MT project in the past (he’s working on Irish<>Gaelic) which produces some decent results even at this stage (it fulfils condition (a) above, after all).

It’s just another one of those sexy-sounding projects which I wish people would think through all the way, including the unintended consequences.

EM: So ideally we shouldn’t use machine translation, but should rather hire real people, professional translators, luchd-eadar-theangachaidh, in order to get the best results, support the integrity of the language, and support the actual minority language community. What are the greatest challenges in the professional translation field?

MB: How long have you got? 😉
– No orthographic framework that isn’t full of glaring holes and errors;
– not enough native speakers who are confident in writing/reading;
– not enough learners who have good enough Gaelic;
– and not enough of either who have any training in translation.

EM: You’ve also mentioned in Facebook posts how UK organizations outsource minority language translation to firms in India, which then come back to you wanting to pay you a low rate compared to what they would earn for your work.

MB: Fortunately the outsourcing issue is more of an annoyance than a problem. It is an odd by-product of being so small a language – the price goes up and usually you can insist on getting *your* rate because you know they won’t be able to find someone else in a hurry.

EM: I remember hearing discussions about the need for professional translator training programs for Scottish Gaelic (like the ones for Irish) as far back as 2000 during my Ph.D. fieldwork, when the Scottish Parliament was newly restarted and people were suddenly pressed into service for simultaneous interpretation. Are there any professional translator training programs in existence now?

MB: Agi Hammerthief is writing that brief as we speak. Or for those who haven’t read Terry Pratchett, yes, there is talk now and then, but nothing really ever happens. I believe UHI [University of the Highlands and Islands] were looking at it last year but from what I can recall of the questionnaire I saw, I’m not holding my breath. Especially since it seemed to involve membership fees which would be rather… exorbitant given that most translators are part-timers.

EM: About how many professional Gaelic translators in Scotland do you know of besides yourself?

MB: About a dozen or so. But I think Bòrd na Gàidhlig had a full-ish list some years ago which had some 50 names on it. If there are more than a dozen who ever had any formal training I’d be surprised. Not that you can’t learn on the job, many people do and end up doing a very good job. But there are probably better and easier ways of reaching that level.

EM: Tapadh leibh, a Mhìcheil! [Thank you, Michael!] If readers have additional questions for Michael, please submit them via blog comments or e-mail. If we have enough questions, perhaps we can do a part 2!

