messageformat Is Working Hard to Make Itself Obsolete


This post originally appeared on DZone on September 1, 2020.

messageformat is an OpenJS Foundation project that handles both pluralization and gender in applications. It helps keep messages in human-friendly formats, and can be the basis for the tone and accuracy that are critical for applications. Handling pluralization and gender is not simple, and the decision about which message format to implement often gets pushed down the priority list as development teams allocate resources. However, putting it off can lead to tougher transitions later on, with both technology and vendor lock-in playing a role.

Quick note: The upstream spec is called ICU MessageFormat. ICU stands for International Components for Unicode: a set of portable libraries that are meant to make working with i18n easier for Java and C/C++ developers. If you’ve worked on a project with i18n/l10n, you may have used the ICU MessageFormat without knowing it. 

To find out more about messageformat, I spoke with Eemeli Aro, Software Developer at Vincit, and OpenJS Cross Project Council (CPC) member. Aro maintains the messageformat libraries, and actively participates in various efforts to improve JavaScript localization. Aro spoke on “The State of the Art in Localization” at last year’s Node+JS Interactive. Aro is an active participant in ECMA-402 processes, runs the monthly HelsinkiJS meetups, and helps organise React Finland conferences. 

How do formats deal with nuances in language? 

It’s all about choices. Variance, such as how the greeting used by a program could vary from one instance to the next, is handled by having the message format you’re using support choices. So you can have some random number coming in and, depending on its value, you select one of a number of choices. This functionality isn’t directly built into ICU MessageFormat, but it’s very easily implementable in a way that gets you results.
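As a rough illustration of the "choices" idea, here is a minimal TypeScript sketch of picking a greeting from a random number. It does not use the messageformat library; the variant keys, the `selectGreeting` and `randomGreeting` helpers, and the greeting texts are all hypothetical. The ICU message in the comment shows how the same choice could be written with `select`, which always falls back to its `other` case.

```typescript
type Greeting = (name: string) => string;

// Hypothetical variants, keyed the way the cases of an ICU `select` would be:
// {variant, select, hello{Hello, {name}!} hi{Hi there, {name}!} other{Welcome back, {name}!}}
const variants = new Map<string, Greeting>([
  ["hello", (name) => `Hello, ${name}!`],
  ["hi", (name) => `Hi there, ${name}!`],
]);
const other: Greeting = (name) => `Welcome back, ${name}!`;

// Unknown keys fall through to `other`, mirroring how `select` behaves.
function selectGreeting(variant: string, name: string): string {
  const greeting = variants.get(variant) ?? other;
  return greeting(name);
}

// Map a random number onto one of the variant keys.
// `rng` is injectable so the choice can be made deterministic in tests.
function randomGreeting(name: string, rng: () => number = Math.random): string {
  const keys = ["hello", "hi", "other"];
  const key = keys[Math.floor(rng() * keys.length)] ?? "other";
  return selectGreeting(key, name);
}

console.log(randomGreeting("Ada")); // one of the three greetings
```

Keeping the random draw outside the message itself means the message format only needs a generic "pick a case by key" feature, which is what `select` already provides.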

We need to decide how we deal with choices and whether you can have just a set number of different choice types. Is it a sort of generic function that you can define and then use? It’s an interesting question, but ICU MessageFormat doesn’t yet provide an easy, clear answer to that. But it provides a way of getting what you want. 

What are the biggest problems with messaging formats?

Perhaps the biggest problem is that while ICU MessageFormat is the closest we have to a standard, that doesn’t mean it is in standard use by everyone. There are a number of other standards, and various versions of them are used by a number of localization tools, workflows and other processes. The biggest challenge is that when you have some kind of interface and you want to present some messages in that interface, there isn’t one clear solution that’s always the right one for you.

And then it also becomes challenging because, for the most part, almost any solution that you end up with will solve most of the problems that you have. This is the scope in which it’s easy to get lock-in. Effectively, if you have a workflow that works with one standard or one set of tools or one format that you’re using, then you have some sort of limitation. Eventually, at some point, you will want to do something that your systems aren’t supporting. You can feel like it’s a big cost to change that system, and therefore you make do with what you have, and then you get a suboptimal workflow and a suboptimal result. Eventually, your interface and whole project may not work as well. 

It’s easy to look at messageformat and go, “That’s too complicated for us, let’s pick something simpler.” You end up being stuck with “that something simpler” for the duration of whatever it is that you’re working on. 

You’re forced to make a decision between two bad options. So the biggest challenge is it would be nice to have everyone agree that “this is the right thing to do” and do it from the start! (laughs) 

But of course that is never going to happen. When you start building an interface like that, you start with just having a JSON file with keys and messages. That will work for a long time, for a really great variety of interfaces, but it starts breaking at some point, and then you start fixing it, and then your fix has become your own custom bespoke localization system. 
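To make that progression concrete, here is a small TypeScript sketch of where a flat keys-and-messages table typically starts to break, and what a first home-grown fix tends to look like. The message keys, the `interpolate` and `formatPlural` helpers, and the table layout are assumptions made up for this illustration; only `Intl.PluralRules` is a real built-in API. The comment shows how the same message could be written once in ICU MessageFormat syntax.

```typescript
type Params = Record<string, string | number>;

// Replace {placeholders} with values; unknown placeholders are left as-is.
function interpolate(template: string, params: Params): string {
  return template.replace(/\{(\w+)\}/g, (match: string, name: string) => {
    const value = params[name];
    return value === undefined ? match : String(value);
  });
}

// Stage 1: a hypothetical flat table, as loaded from a JSON file.
const flatMessages: Record<string, string> = {
  inbox: "You have {count} new messages",
};

// Stage 2: the bespoke fix, one variant per plural category.
// The same message in ICU MessageFormat syntax would be:
// {count, plural, one{You have # new message} other{You have # new messages}}
const pluralMessages: Record<string, Record<string, string>> = {
  inbox: {
    one: "You have {count} new message",
    other: "You have {count} new messages",
  },
};

function formatPlural(locale: string, key: string, count: number): string {
  const variants = pluralMessages[key] ?? {};
  // Intl.PluralRules returns the plural category for the locale ("one", "other", ...).
  const category = new Intl.PluralRules(locale).select(count);
  const template = variants[category] ?? variants["other"] ?? key;
  return interpolate(template, { count });
}

console.log(interpolate(flatMessages["inbox"] ?? "", { count: 1 })); // You have 1 new messages
console.log(formatPlural("en", "inbox", 1)); // You have 1 new message
console.log(formatPlural("en", "inbox", 5)); // You have 5 new messages
```

The second table already has its own implicit format, which is how a quick fix turns into a custom localization system that translators and tools then have to learn.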

Is technology lock-in a bigger problem than vendor lock-in? 

Technology lock-in is the largest challenge. Of course there is vendor lock-in as well, because there are plenty of companies offering their solutions and tools and systems for making all of this work and once you’ve started using them, you’re committed. Many of them use different standards than messageformat, their own custom ones. 

In the Unicode working group where I’m active, we are essentially talking about messageformat 2. How do we take the existing ICU MessageFormat specification and improve upon it? How do we make it easier to use? How do we make sure there’s better tooling around it? What sorts of features do we want to add or even remove from the language as we’re doing this work? 

messageformat, the library that I maintain and that is an OpenJS project, is a JavaScript implementation of ICU MessageFormat. It tries to follow the specification as closely as it can.

Does using TypeScript help or hurt with localization? 

For the most part, it works pretty well. TypeScript brings in an interesting question of “How do you type these messages that you’re getting out of whatever system you’re using?” TypeScript itself doesn’t provide for plugins at the parser level, so you can’t define that. When there’s input in JavaScript, for example, for a specific file, then you can use specific tools for the different types that are coming out of it. Messages don’t usually come one by one; you have messages in collections, so if you get one message out of a collection in JavaScript, you can make very safe assumptions about what the shape of that message is going to be.

But of course in TypeScript you need to be much more clear about what the shape of that message is. And, if for whatever reason not everything is a string, then it gets complicated. 
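One way to picture this is a hand-written type for a message collection. The sketch below is an assumption-laden illustration, not how messageformat itself types its output: the `Messages` interface, the `LinkPart` and `RichMessage` types, and the `toPlainText` helper are all hypothetical names. It shows the easy case, where every message is a function returning a string, and then the kind of extra work that appears once a message can be something other than a string.

```typescript
// Hypothetical shape of a compiled message collection:
// each message is a function from its parameters to a string.
interface Messages {
  greeting: (params: { name: string }) => string;
  inbox: (params: { count: number }) => string;
}

// Stand-in for what a loader or compiler would hand back.
const messages: Messages = {
  greeting: ({ name }) => `Hello, ${name}!`,
  inbox: ({ count }) =>
    count === 1 ? "You have 1 new message" : `You have ${count} new messages`,
};

console.log(messages.greeting({ name: "Ada" })); // Hello, Ada!
// messages.inbox({ name: "Ada" }); // rejected at compile time: `count` is missing

// Once a message is not always a string, callers need a union type
// and have to narrow it before use.
type LinkPart = { type: "link"; text: string; href: string };
type RichMessage = Array<string | LinkPart>;

function toPlainText(message: string | RichMessage): string {
  if (typeof message === "string") return message;
  return message
    .map((part) => (typeof part === "string" ? part : part.text))
    .join("");
}

console.log(
  toPlainText(["See the ", { type: "link", text: "docs", href: "https://example.com" }, "."])
); // See the docs.
```

In plain JavaScript the same code would simply assume the shape; in TypeScript the `Messages` interface has to be written or generated somewhere, which is the extra step described above.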

It’s entirely manageable. You can use JavaScript tools for localization in a TypeScript environment. There are just these edge cases that could have better solutions than we currently have, but work on those kind of requires some work on TypeScript’s side as well.

Should open source projects build their own solution for localization? 

I think this is one of those cases where it’s good to realize that this is JavaScript. If there’s a problem you can express briefly, you go look and you’ll find five competing solutions that are all valid in one way or another. Whatever your problem or issue is, it is highly likely that someone else has already solved it for you; you just need to figure out how to adapt their solution to your exact problem.

There are a number – like three or four – whole stacks of tooling for various environments for localization. And these are the sorts of things that you should be looking at, rather than writing your own. 

How is the OpenJS Foundation helping with localization?

Well, along with messageformat, OpenJS hosts Globalize, which utilizes the official Unicode CLDR JSON data.

The greatest benefit that I or the messageformat project is currently getting from the OpenJS Foundation is that the Standards Working Group is quite active. And with their support, I’m actively participating in the Unicode Consortium working group I mentioned earlier where we are effectively developing the next version of the specification for messageformat. 

How far off is the next specification for messageformat?

It’s definitely a work in progress. We have regular monthly video calls and are making good progress otherwise. I would guess that we might get something in actual code maybe next year. But it may actually take longer than that for the new messageformat to become standard and ready.

How will localization be handled differently in 3-5 years? 

The messageformat working group didn’t start out under Unicode; it started out under ECMA-402. That whole work started from looking to see what we should do about adding support for messageformat to JavaScript. And this is one of the main expected benefits to come out of the Unicode messageformat working group. In the scope of 3-5 years, it is reasonable to assume that we are going to have something like Intl.MessageFormat as a core component in JavaScript, which will be great!

Effectively, this is also coming back to what the OpenJS Foundation is supporting. What I’m primarily trying to push with messageformat is to make the whole project obsolete! Right now we’re working on messageformat 3, which is a refactoring with some breaking changes. But hopefully a later version will be a polyfill for the actual Intl.MessageFormat functionality that will come out at some point.

On a larger scale, it’s hard to predict how much non-textual interfaces are going to become a more active part of our lives. When you’re developing an application that uses an interface that isn’t textual, what considerations do you really need to bring in and how do you really design everything to work around that? When we’re talking about Google Assistant, Siri, Amazon Echo, their primary interface is really language, messages. Those need to be supported by some sort of backing structure. So can that be messageformat? 

Some of the people working on these systems are actively participating in the messageformat 2 specifications work. And through that, we are definitely keeping that within the scope of what we’re hoping to do. 

Try it out now

To install the core messageformat package, use:

npm install --save-dev messageformat@next

This includes the MessageFormat compiler and a runtime accessor class that provides a slightly nicer API for working with larger numbers of messages. More information: messageformat.github.io/messageformat/v3