Ajv Version 7, Big changes and improvements

Ajv Version 7, Big changes and improvements

The following post was written by Evgeny Poberezkin, lead maintainer of Ajv (incubation project of the OpenJS Foundation).

It’s been over a month since Ajv version 7 was released, and in this time many users have migrated to the new version. Ajv v7 is a complete rewrite that both changed the language to TypeScript and also changed the library design. I’m happy to share that it has been relatively smooth, without any major issues.

What’s new

I’ve written previously about what has changed in version 7, to summarize:

1. Support of the JSON Schema draft 2019-09 – users have been asking specifically for `unevaluatedProperties` keyword, which adds flexibility to the valuation scenarios, even if at a performance cost.

2. More secure code generation in case untrusted schemas are used. Execution of code that might be embedded in untrusted schemas is now prevented by design, on a compiler type system level (and you don’t need to use TypeScript to benefit from it, unless you are defining your own keywords).

3. Standalone validation code generation is now comprehensively supported, for all schemas.

4. Strict mode protecting users from common mistakes when writing JSON Schemas.

That is a big list of improvements that was possible thanks to Mozilla’s MOSS program grant.

Better for community

I’m also excited to share that Ajv v7 has grown contribution interest from its users, with some cases when independent users are interested to collaborate between them on some new features.

There are several reasons for that, I believe:

– the code is better organised and written on a higher level – it is easier to read and to change than before.

– documentation is now better structured with additional sections specifically for contributors – code components and code generation.

I am really looking forward to all the new ideas and features coming from Ajv users.

What’s changed and removed

These improvements came at a cost of a full library redesign, that requires being aware of these changes during migration:

– Importing from your code

– Installation

– Code generation performance

– Validation of JSON Schema formats

– Migrating from JSON Schema draft 4

Below these changes are covered in detail – they were causing migration difficulties to some users.

Importing Ajv from your code

To import Ajv in typescript you can still use default import:

 ```typescript
 import Ajv from "ajv"
 const ajv = new Ajv()
 ```

But to import in JavaScript you now need to use `default` property of the exported object:

 ```javascript
 const Ajv = require("ajv").default
 // or const {default: Ajv} = require("ajv")
 const ajv = new Ajv()
 ```

And if you use JavaScript modules you need to import Ajv this way:

 ```javascript
 // from .mjs file
 import Ajv from "ajv"
 const ajv = new Ajv.default()
 ```

This is a compromise approach that leads to a bit simpler compiled JavaScript code size of Ajv, and, what is more important, allows to export additional things alongside Ajv and does not force dependencies to use `esModuleInterop` setting of TypeScript. Possibly, there is a better way to export Ajv – please share any ideas to this issue.

Ajv installation

Several users, in particular those who use `yarn` rather than `npm`, had issues related to version conflicts between old and new versions. Because Ajv is a dependency of many JavaScript tools, the users can have both version 6 and version 7 installed at the same time.

When version 6 was released 2 years ago there were a lot of version conflicts. Since then npm seems to have improved – it handles multiple versions correctly when performing a clean installation – at least I have not seen any example that shows version conflicts in this scenario. But when performing incremental installations version conflicts still happened to a few users.

This situation should resolve itself as dependencies migrate, and in all cases clean installation resolved the problem.

Code generation performance

Validation code that Ajv v7 generates is at least as efficient as code generated by v6, and in many cases it is faster – version 7 introduced several tree optimisations and other improvements for it. The primary objective to re-design code generation was to improve its security when using untrusted schemas and to make the code more maintainable.

As a side effect, it also led to the reduction of Ajv bundle size.

The downside that may be affecting some users though is that the code generation itself is 4-5 times slower.

For most users it won’t have an impact on the application performance, as schema compilation should only happen once, when the application is started, or when the schema is used for the first time. But there are several scenarios when it can be important:

1. When using schemas in short lived environments when validation is performed only once or few times per compilation – it may include serverless environments, short-lived web pages, etc. In this case you should explore the possibility of using the standalone validation code to compile all your schemas at build time. Ajv v7 improved the stability of generating standalone validation code and it is now supported for all schemas.

2. When a schema is generated dynamically for each validation (or to perform a small number of validations). There is no solution for such scenarios – Ajv (and any other validator that compiles schemas to code) is simply not a good fit for such scenarios, if the performance is critical. The main advantage of schema compilation is that the produced validation code is much faster than it would have been to interpret the schema. But if the schema is dynamic, then there is no benefit to the compilation – a validator that interprets the schema in the process of validation could be a better fit. While it may be 50-100 times slower to validate, it may still be faster than compiling schema to code. You need to run your own benchmarks and decide what is better for your application.

3. When you used Ajv incorrectly and compiled schema for each validation. The correct usage is either to use the same Ajv instance to manage both schemas and compiled validation functions, or to manage (cache) them in your application code. In Ajv v7 this incorrect usage is more likely to be noticeable both because of slower compilation speed and also because Ajv caches the functions using schema itself as a key and not it’s serialised presentation.

To summarize, if you use Ajv correctly, as it is intended, it will be both safer and faster, but if you use(d) it incorrectly it may become slower.

Caching compiled schemas

Ajv compiles schemas to validation code that is very fast, but the compilation itself is costly, so it is important to reuse compiled validation functions.

There are 2 possible approaches:

1. Compile schemas either at start time or on demand, lazily, and manage how validation functions are re-used in your application code:

 ```javascript
 const schema = require("mu_schema.json")
 const validate = ajv.compile(schema)
 // ...
 // in this case schema compilation happens
 // when app is started, before any request is processed
 async function processRequest(req) {
   if (!validate(req.body)) throw Error("bad request")
   // ...
 }
 ```

It is important that `ajv.compile` is used outside of any API endpoints, as otherwise ajv may recompile the schema every time it is used (depending whether you pass the same schema reference or not).

2. Add all schemas to Ajv instance, using it as a cache of compiled validation functions, later retrieve them using either the schema `$id` attribute from the schema or the key passed to `addSchema` method.

File `./my_schema.json`:

 ```json
 {
   "$id": "https://example.com/schemas/my_schema",
   "type": "object",
   "properties": {
     "foo": {
       "type": "string"
     }
   }
 }
 ```

Code:

 ```javascript
 const schema = require("./my_schema.json")
 ajv.addSchema(schema, "my_schema")
 // ...
 // schema compilation happens on demand
 // but only the first time the schema is used
 async function processRequest(req) {
   const validate = ajv.getSchema("https://example.com/schemas/my_schema")
   // or
   // const validate = ajv.getSchema("my_schema")
   if (!validate(req.body)) throw Error("bad request")
   // ...
 }
 ```

If you are passing exactly the same (and not just deeply equal) schema object to ajv, ajv would use a cached validation function anyway, using schema object reference as a key.

But if you pass the new instance of the schema, even if the contents of the object is deeply equal, ajv would compile it again. In version 6 Ajv used a serialized schema as a cache key, and it partially protected from the incorrect usage of compiled validation functions, but it had both performance and memory costs. Some users had this problem (https://github.com/ajv-validator/ajv/issues/1413) when migrating to version 7.

Validation of JSON Schema formats

Format validation has always been a difficult area, as it is not possible to find an optimal balance between validation performance, correctness and security – these objectives are contradictory, and, depending on your application, you would need a different approach to validate the same format.

JSON Schema specification evolved to the point of declaring format validation as an optional, opt-in behaviour, and Ajv v7 made the same choice – formats are now released as a separate package ajv-formats.

Unlike JSON Schema specifies, Ajv does not just quietly ignore formats – it would have been error-prone – you have to explicitly configure it for the desired behaviour (or do not use formats in the schemas).

You have several options:

1. Fully disable format validation with the option `validateFormats: false`. In this case, even if you use formats in the schema, they will be ignored.

2. Define the list of formats that you want to be ignored by passing `true` values for some formats in `formats` option:

 ```javascript
 new Ajv({formats: {email: true}})
 ```

The configuration above would allow and ignore `email` format in your schemas, but would still throw an exception if any other format is used. This approach is more performant than passing regular expressions `/.*/` or function `() => true` because they would have to be executed, and in case of `true` no validation code is generated when this format is used.

3. Use ajv-formats package – it includes all formats previously shipped as part of Ajv, some of the formats have two options – more performant and more correct (`fast` and `full` – see the docs to ajv-formats).

4. Define your own functions (or use some a 3rd party library) to validate formats that suit your application – you can pass functions to ajv for each format you use and you can even use asynchronous validation if, for example, you want to validate the existence and/or configuration of domain name as part of `email` or `hostname` validation.

The last approach to validate formats – defining your own functions or using a library – is strongly recommended as it allows you to achieve the right balance between validation security, speed and correctness that fits your application.

JSON Schema draft 4 should be used with version 6

Draft 4 of the JSON Schema is the first version that Ajv supported, and since then there were several important changes in the specification that made supporting multiple versions of JSON Schema in the same code unnecessarily complex.

JSON Schema draft 2019-09 has introduced further complexity, so the support for draft 4 was removed.

You can either continue using JSON Schema draft 4 with Ajv version 6, or if you want to have all the advantages of using Ajv version 7 you need to migrate your schemas – it is very simple with ajv-cli command line utility.

What is next

Thanks for continuing sponsorship from Mozilla, many new improvements are coming in Ajv – the new major version 8 will be released in a few months.

The most exciting new feature that was just released in version 7.1.0 is the support for the alternative specification for JSON validation – JSON Type Definition – it was approved as RFC8927 in November 2020. This is a much simpler and more restrictive standard than JSON Schema, and it enforces better data design for JSON APIs, prevents user mistakes and maps well to type systems of all major languages. See Choosing schema language section in Ajv readme for a detailed comparison.

Ajv version 8 will bring many additional features and stability improvements and also will support the changes in the most recent JSON Schema draft 2020-12.

The second exciting change that is coming soon is a new website for Ajv – to make the documentation more accessible and discoverable, and to make contributions easier.

Thanks a lot for supporting Ajv!