Case Study

How the Wikimedia Foundation Balances Security and Open Information in Web Development

Background

The Wikimedia Foundation is the non-profit that hosts Wikipedia and other free knowledge and open data projects. These projects are made possible by a global community who, together with the Foundation, comprise the “Wikimedia movement”. The Wikimedia movement is united by a vision: to bring about a world in which every single human being can freely share in the sum of all knowledge.

We talked with Timo Tijhof, Principal Engineer at the Wikimedia Foundation, to find out how the organization approaches security and performance at scale. Timo has worked at the Wikimedia Foundation for over 10 years, starting as a front-end developer and later joining the Performance Team.

The Wikimedia movement is rooted in the culture of freely licensed software. The MediaWiki application that Wikipedia runs on, and all other software developed at the Foundation, is open source. “That includes the configuration and datacenter automation of our web servers, databases, and CDN service,” said Timo. The Wikimedia community and any other individual or organization may inspect, contribute to, reuse for themselves, or fork any aspect of the platform at any time. This philosophy is also the basis of long-standing security practices which support visibility and openness.

Increased Security is about Increased Visibility and Trust

We live in an incredible world. Today, most online devices are powered by open source. Whether it is the data centers of video streaming giants and social media sites or your smartphone, these devices likely run an open source operating system like Linux or a BSD derivative. The vast majority of websites are also built with open source tools, or run on open source platforms. When you build on existing software that is developed by another organization or community, that software is called an “upstream”.

The Wikimedia Foundation relies heavily on upstream technology to power its platforms. This allows the organization to focus on its core mission of providing free knowledge to the world, rather than on developing and maintaining technology from scratch. Additionally, by collaborating with other open source projects, the Foundation is able to give back to the broader free software ecosystem and help advance the state of technology for everyone.

The Wikimedia Foundation is notable for operating exclusively with upstreams that are also open source. This ensures the community’s freedom principles (to freely inspect, modify, reuse, and fork) are not hindered by proprietary components.

New Wikimedia production software components or dependencies must pass certain fitness checks and have an established chain of trust for the software’s security and integrity. When the Wikimedia community creates software that is peer-reviewed during development, this trust follows implicitly from its public policies and standards. When adding a new third-party package or dependency (“upstream”), this chain needs to be established by other means.

The Wikimedia Foundation extends its chain of trust to several credible upstream vendors and communities. For example, Debian, known for its Linux operating system, is host to the highly trusted and curated Debian package repository. When a package is present in the Debian repository, this signals trust, stability, and confidence to the industry. Timo adds, “While we usually don’t audit source code of Debian packages, installing a Debian package may still require a concept review to validate and verify that the package actually intends to meet our scale, threat model, and performance requirements.”

When considering PHP or JavaScript libraries from an anonymous and open registry like npm or Packagist, the Wikimedia Foundation audits the code as if it were its own. The Wikimedia Foundation keeps ongoing costs to a minimum by adopting upstream packages only in areas that solve non-trivial problems, have stable external requirements, and sit behind a module boundary. “Dependencies should reduce cost, not increase it. In practice, we only consider packages with few or no transitive dependencies, written for a stable runtime,” said Timo.

As an added precaution, the Wikimedia Foundation prohibits networking to third-party services in its production realm. When deploying or installing the MediaWiki application, it does not download JavaScript or PHP packages from npm or Composer. Instead, each upstream package is downloaded ahead of time as a file, verified against an integrity hash, and checked into Git. This approach implements the organization’s security requirements, allowing for transparent auditing, patch-ability, and independent offline deployment. “It also helps with faster onboarding, consistent and reproducible development, and creates a natural space for auditing upstream changes,” said Timo.
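The idea of pinning an upstream file to an integrity hash can be illustrated with a short Python sketch. This is not the Foundation's tooling: the function names, the sample bytes, and the choice of an SRI-style "sha384-<base64>" string are assumptions made for illustration only.

```python
import base64
import hashlib
import hmac


def compute_integrity(data: bytes, algorithm: str = "sha384") -> str:
    """Return an SRI-style string such as 'sha384-<base64 digest>'."""
    digest = hashlib.new(algorithm, data).digest()
    return f"{algorithm}-{base64.b64encode(digest).decode('ascii')}"


def verify_integrity(data: bytes, expected: str) -> bool:
    """Check downloaded bytes against a pinned integrity string."""
    algorithm, _, _ = expected.partition("-")
    if algorithm not in {"sha256", "sha384", "sha512"}:
        raise ValueError(f"unsupported hash algorithm: {algorithm!r}")
    actual = compute_integrity(data, algorithm)
    # Constant-time comparison; not strictly needed here, but a safe default.
    return hmac.compare_digest(actual, expected)


# Hypothetical bytes standing in for a downloaded upstream package file.
package_bytes = b"/*! example-lib v1.0.0 */ function hello() {}"

# The hash is recorded once, when the package is reviewed and committed.
pinned = compute_integrity(package_bytes)

# Unchanged bytes pass; any modification to the file fails the check.
assert verify_integrity(package_bytes, pinned)
assert not verify_integrity(package_bytes + b" // tampered", pinned)
```

Because the reviewed file and its hash live in version control together, a later update to the package shows up as an ordinary diff that can be audited like any other change.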

The Most Localized Software in the World

With over 300 language editions, Wikipedia might be among the most-translated literature in the world. Wikipedia editors usually write or translate articles manually, and in recent years, the ContentTranslation tool has helped editors do this more efficiently, producing over 1 million articles through this new tool alone. 

The MediaWiki platform underneath it all recognizes and localizes its user interface in over 400 languages, including grammatical gender, pluralization rules (“10 new messages”), and sort order through ICU collations. “We contribute to the Unicode CLDR standard on behalf of Wikipedia’s language communities. These contributions flow downstream to other Unicode customers such as Linux, Apple, and Microsoft,” said Timo.
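To show why pluralization needs per-language rules, here is a minimal Python sketch. It is not MediaWiki's implementation: the function names and message table are hypothetical, and the rules are a simplified rendering of the CLDR cardinal plural categories for English and Polish, restricted to non-negative integers.

```python
def plural_category_en(n: int) -> str:
    """English has two categories for integers: 'one' and 'other'."""
    return "one" if n == 1 else "other"


def plural_category_pl(n: int) -> str:
    """Polish has three categories for integers: 'one', 'few', 'many'."""
    if n == 1:
        return "one"
    if 2 <= n % 10 <= 4 and not 12 <= n % 100 <= 14:
        return "few"
    return "many"


# Hypothetical message table: one translation per plural category.
MESSAGES = {
    "en": {"one": "{n} new message", "other": "{n} new messages"},
    "pl": {
        "one": "{n} nowa wiadomość",
        "few": "{n} nowe wiadomości",
        "many": "{n} nowych wiadomości",
    },
}

RULES = {"en": plural_category_en, "pl": plural_category_pl}


def new_messages(lang: str, n: int) -> str:
    """Pick the translation that matches the plural category of n."""
    category = RULES[lang](n)
    return MESSAGES[lang][category].format(n=n)


assert new_messages("en", 10) == "10 new messages"
assert new_messages("pl", 22) == "22 nowe wiadomości"
assert new_messages("pl", 12) == "12 nowych wiadomości"
```

The interface code only asks for "the new-messages text for n"; which form is grammatical is decided by the language's rule, so adding a language does not require touching the calling code.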

Languages like Arabic and Hebrew are written from right to left. CSSJanus takes stylesheets designed and developed for left-to-right languages like English, and automatically converts them into right-to-left layouts. “We deploy the MediaWiki platform on a weekly basis. Each change to functionality is deployed to all supported languages from day 1, every time. CSSJanus is part of what makes this feasible and with little to no developer training,” said Timo.
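The kind of transformation described here can be approximated in a few lines of Python. This sketch is not CSSJanus itself and covers far fewer cases: `flip_css` is a hypothetical helper that only swaps the keywords "left" and "right" and reorders four-value margin and padding shorthands.

```python
import re

_SWAP = {"left": "right", "right": "left"}

# Matches the standalone keywords, including inside names like "margin-left".
_DIRECTION = re.compile(r"\b(left|right)\b")

# Four-value shorthand is ordered: top right bottom left.
_FOUR_VALUES = re.compile(
    r"\b(margin|padding)\s*:\s*"
    r"([^\s;}]+)\s+([^\s;}]+)\s+([^\s;}]+)\s+([^\s;}]+)"
)


def flip_css(css: str) -> str:
    """Naively mirror a left-to-right stylesheet for right-to-left layouts."""
    # Step 1: swap direction keywords in property names and values.
    flipped = _DIRECTION.sub(lambda m: _SWAP[m.group(1)], css)
    # Step 2: swap the right (2nd) and left (4th) values of shorthands.
    flipped = _FOUR_VALUES.sub(
        lambda m: f"{m.group(1)}: {m.group(2)} {m.group(5)} {m.group(4)} {m.group(3)}",
        flipped,
    )
    return flipped


ltr = ".sidebar { float: left; margin-left: 1em; padding: 1px 2px 3px 4px; }"
rtl = flip_css(ltr)

assert rtl == ".sidebar { float: right; margin-right: 1em; padding: 1px 4px 3px 2px; }"
# Flipping twice returns the original stylesheet for this input.
assert flip_css(rtl) == ltr
```

A production tool has to handle much more than this sketch does, for example selectors or URLs that happen to contain "left", and declarations that should be excluded from flipping. The point is only that developers write one left-to-right stylesheet and the mirrored version is derived mechanically.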

Not all issues are that easy! During VisualEditor development, extensive effort went into localizing the bold and italic toolbar buttons. The familiar “B” and “I” buttons usually make way for an equivalent abbreviation, such as F (Fett) and K (Kursiv) in German, with a stylized A for language communities that have no accepted standard. But early adoption of English-centric software led to “B” and “I” becoming the established and culturally familiar design pattern in some languages. In Hebrew, Czech, and Malayalam, “correcting” these with a translation actually created confusion.

No Profit Motive Means Better Support

Large corporations, driven by profit motives, regularly drop support for older devices and browsers. The Wikimedia Foundation, however, has an imperative to make information more accessible, not less.

How does the organization pull that off without the resources of a large corporation? “Through equal parts being aggressively lean and aggressively uncompromising,” said Timo.

The organization saves development and testing costs by writing and deploying native JavaScript that targets only modern browsers. Through an approach inspired by BBC News’ “cutting the mustard” technique, the Foundation enables millions of people (1% of its 2 billion monthly users) to access Wikipedia through a JavaScript-free experience. This is the same experience that every page view starts with, prior to the (optional) arrival of JavaScript code.

The Wikimedia Foundation’s development principles and browser support policy reflect this by emphasizing the importance of progressive enhancement.

Viewing Wikipedia through a web browser is the most common access method, but Wikipedia’s knowledge is consumed far beyond the canonical experience at Wikipedia.org. “Wikipedia content goes everywhere. It’s distributed offline through Kiwix and IPFS, rendered in native apps like Apple Dictionary, and even shared peer-to-peer through USB sticks,” said Timo. What these environments have in common is that they may not involve JavaScript, as they require high security and high privacy. This is possible at no extra cost because the APIs offer complete content HTML-first, with CSS and embedded media based on open formats only.

Summary

The Wikimedia Foundation prioritizes both security and openness. To achieve this balance, it implements a number of practices and policies that ensure that it protects both the freedoms and the privacy of its audience, all while sharing information transparently.

For example, the Foundation publishes a transparency report twice per year detailing its response to information and takedown requests. The Wikimedia Foundation’s Board positions are largely held by community members, who are selected by public election through anonymous and cryptographically verifiable votes from any eligible Wikipedia account. Its Governance Wiki publishes the Foundation’s bylaws, board decisions, and meetings.

The Foundation participates in an ecosystem of organizations that collaborate on freely licensed information and open-source software. Overall, the organization balances exceptional security and openness by implementing strong security practices and providing transparency about its policies and procedures.

——

Timo is currently helping with the Open Source Security Foundation (OpenSSF) Alpha-Omega project and the OpenJS Foundation to reduce potential security risks for jQuery. OpenSSF has committed close to $350,000 to reduce potential security incidents for jQuery by helping modernize its infrastructure and its code. The goal for 2023 is to update jQuery infrastructure, identify potential security risks and pain points for end-users, and understand the factors that influence the adoption of new software versions.