This post originally appeared on DZone on September 1, 2020.
messageformat is an OpenJS Foundation project that handles both pluralization and gender in applications. It helps keep messages in human-friendly formats, and supports the tone and accuracy that applications depend on. Pluralization and gender are not simple problems, and the choice of message format often gets pushed down the priority list as development teams decide where to spend resources. That can lead to tougher transitions later on, with both technology and vendor lock-in playing a role.
Quick note: The upstream spec is called ICU MessageFormat. ICU stands for International Components for Unicode: a set of portable libraries that are meant to make working with i18n easier for Java and C/C++ developers. If you’ve worked on a project with i18n/l10n, you may have used the ICU MessageFormat without knowing it.
How do formats deal with nuances in language?
It’s all about choices. Variance, e.g. how the greeting used by a program could vary from one instance to the next, is handled by having the messaging format you’re using support choices. So you can have some random number coming in and, depending on its value, select one of a number of alternatives. This functionality isn’t directly built into ICU MessageFormat, but it’s very easy to implement in a way that gets you results.
We need to decide how we deal with choices and whether you can have just a set number of different choice types. Is it a sort of generic function that you can define and then use? It’s an interesting question, but ICU MessageFormat doesn’t yet provide an easy, clear answer to that. But it provides a way of getting what you want.
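As a rough illustration of the "random number picks a greeting" idea, here is a minimal TypeScript sketch. It does not use the messageformat library; the `select` helper, the `greetings` table, and the `greet` function are hypothetical names made up for this example. They only mimic the general shape of a select-style choice with an `other` fallback.

```typescript
// Hypothetical sketch: a select-style choice with an `other` fallback.
// None of these names come from messageformat or ICU.
type Choices = { other: string } & Record<string, string>;

function select(key: string, choices: Choices): string {
  // Only accept keys defined on the table itself, otherwise fall back.
  const hit = Object.prototype.hasOwnProperty.call(choices, key)
    ? choices[key]
    : undefined;
  return hit ?? choices.other;
}

const greetings: Choices = {
  "0": "Hello, {name}!",
  "1": "Hi there, {name}!",
  other: "Welcome back, {name}!",
};

// `random` is injectable so the choice can be made deterministic in tests.
function greet(name: string, random: () => number = Math.random): string {
  const variant = String(Math.floor(random() * 3)); // "0", "1" or "2"
  return select(variant, greetings).replace("{name}", () => name);
}

console.log(greet("Ada"));
```

The caller decides how the variant is chosen; the message table only has to list the alternatives and a fallback.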
What are the biggest problems with messaging formats?
Perhaps the biggest problem is that while ICU MessageFormat is the closest we have to a standard, that doesn’t mean it is in standard use by everyone. There are a number of other formats, and various versions of them are used by different localization tools, workflows, and processes. The biggest challenge is that when you have some kind of interface and you want to present some messages in that interface, there isn’t one clear solution that’s always the right one for you.
And then it also becomes challenging because, for the most part, almost any solution that you end up with will solve most of the problems that you have. This is where it’s easy to get locked in. Effectively, if you have a workflow that works with one standard or one set of tools or one format, then you have accepted some sort of limitation. Eventually, at some point, you will want to do something that your systems don’t support. Changing that system can feel like a big cost, so you make do with what you have, and then you get a suboptimal workflow and a suboptimal result. Eventually, your interface and whole project may not work as well.
It’s easy to look at messageformat and go, “That’s too complicated for us, let’s pick something simpler.” You end up being stuck with “that something simpler” for the duration of whatever it is that you’re working on.
You’re forced to make a decision between two bad options. So the biggest challenge is it would be nice to have everyone agree that “this is the right thing to do” and do it from the start! (laughs)
But of course that is never going to happen. When you start building an interface like that, you start with just having a JSON file with keys and messages. That will work for a long time, for a really great variety of interfaces, but it starts breaking at some point, and then you start fixing it, and then your fix has become your own custom bespoke localization system.
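To make the "JSON file with keys and messages" breaking point concrete, here is a small TypeScript sketch. The `flat` and `messages` tables and the `formatPlural` helper are hypothetical, invented for this example. The only real API used is the standard `Intl.PluralRules`, which returns a plural category such as `one` or `other` for a given locale and number.

```typescript
// A flat key-to-string file works until a message depends on a number.
const flat: Record<string, string> = {
  inbox: "You have {count} new messages",
};
// With count = 1 this reads "You have 1 new messages".
console.log(flat.inbox.replace("{count}", "1"));

// Hypothetical fix: one pattern per plural category, with `other` required.
type PluralForms = { other: string } & { [category: string]: string };

const messages: Record<string, PluralForms> = {
  inbox: {
    one: "You have {count} new message",
    other: "You have {count} new messages",
  },
};

function formatPlural(locale: string, key: string, count: number): string {
  const forms = messages[key];
  if (!forms) return key; // unknown key: return the key itself
  // Intl.PluralRules picks the category for this locale and number.
  const category = new Intl.PluralRules(locale).select(count);
  const pattern = forms[category] ?? forms.other;
  return pattern.replace("{count}", String(count));
}

console.log(formatPlural("en", "inbox", 1));
console.log(formatPlural("en", "inbox", 5));
```

This is exactly the kind of fix that tends to grow into a custom localization system: the next steps are usually gender, nested choices, and number formatting, which an existing format already covers.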
Is technology lock-in a bigger problem than vendor lock-in?
Technology lock-in is the largest challenge. Of course there is vendor lock-in as well, because there are plenty of companies offering their solutions and tools and systems for making all of this work, and once you’ve started using them, you’re committed. Many of them use their own custom formats rather than messageformat.
In the Unicode working group where I’m active, we are essentially talking about messageformat 2. How do we take the existing ICU MessageFormat specification and improve upon it? How do we make it easier to use? How do we make sure there’s better tooling around it? What sorts of features do we want to add or even remove from the language as we’re doing this work?
Does using TypeScript help or hurt with localization?
But of course in TypeScript you need to be much more clear about what the shape of that message is. And, if for whatever reason not everything is a string, then it gets complicated.
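One way to picture what "the shape of that message" means is a typed message catalog. The sketch below is only an illustration: the `Messages` interface and the `en` catalog are hypothetical and are not generated by or taken from messageformat.

```typescript
// Hypothetical typed catalog: each entry declares whether it is a plain
// string or a function, and which parameters that function needs.
interface Messages {
  title: string;
  greeting: (params: { name: string }) => string;
  items: (params: { count: number }) => string;
}

const en: Messages = {
  title: "Inbox",
  greeting: ({ name }) => `Hello, ${name}!`,
  items: ({ count }) => (count === 1 ? `${count} item` : `${count} items`),
};

console.log(en.title);
console.log(en.greeting({ name: "Ada" }));
console.log(en.items({ count: 3 }));

// The compiler rejects calls that do not match the declared shape, e.g.:
// en.items({ count: "3" });  // error: string is not assignable to number
// en.greeting({});           // error: property 'name' is missing
```

The checks are useful, but every message that is not a plain string now needs its parameters spelled out, which is where the extra complexity comes from.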
Should open source projects build their own solution for localization?
There are a number – like three or four – whole stacks of tooling for various environments for localization. And these are the sorts of things that you should be looking at, rather than writing your own.
How is the OpenJS Foundation helping with localization?
Well, along with messageformat, OpenJS hosts Globalize, which uses the official Unicode CLDR JSON data.
The greatest benefit that I or the messageformat project is currently getting from the OpenJS Foundation is that the Standards Working Group is quite active. And with their support, I’m actively participating in the Unicode Consortium working group I mentioned earlier where we are effectively developing the next version of the specification for messageformat.
How far off is the next specification for messageformat?
It’s definitely a work in progress. We have regular monthly video calls and are making good progress otherwise. I would guess that we might get something in actual code maybe next year. But it may be actually longer than that for the messageformat to become standard and ready.
How will localization be handled differently in 3-5 years?
Effectively, this is also coming back to what the OpenJS Foundation is supporting. What I’m primarily trying to push with messageformat is to make the whole project obsolete! Right now we’re working on messageformat 3, which is a refactoring with some breaking changes. But hopefully a later version will be a polyfill for the actual Intl.MessageFormat functionality that will come out at some point.
On a larger scale, it’s hard to predict how much non-textual interfaces are going to become a more active part of our lives. When you’re developing an application that uses an interface that isn’t textual, what considerations do you really need to bring in and how do you really design everything to work around that? When we’re talking about Google Assistant, Siri, Amazon Echo, their primary interface is really language, messages. Those need to be supported by some sort of backing structure. So can that be messageformat?
Some of the people working on these systems are actively participating in the messageformat 2 specifications work. And through that, we are definitely keeping that within the scope of what we’re hoping to do.
Try it out now
To install the core messageformat package, use:
npm install --save-dev messageformat@next
This includes the MessageFormat compiler and a runtime accessor class that provides a slightly nicer API for working with larger numbers of messages. More information: messageformat.github.io/messageformat/v3