Master JSON Schema's Subtleties
Don’t be Surprised by JSON Schema’s Surprising Surprises (Part I)
Before returning to API design with OpenAPI, I want to present a detailed post (well, two actually) on the more subtle nuances of JSON Schema as a language for designing the data of APIs. Knowing these details will help you understand other APIs’ schemas and help you construct your own.
Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers/visitors to start by reading Your Guide to The Language of API Design and scanning previous posts in the series.
It’s ~~turtles~~ schemas all the way down
Before we move on to the fun stuff, I want to reiterate the point I made in my last post Composing API Models with JSON Schema.
JSON Schema is a highly recursive specification. Schemas are primarily built by combining smaller schemas into larger schemas in many different ways. Within the `properties` of an `object` schema, all of the named subelements are schemas. The `items` in an `array` schema are defined by a subschema. Other elements (some described below) also use nested subschemas to define them. Grasping/grokking this very important concept is key to understanding how JSON Schemas are constructed and interpreted: it’s schemas all the way down.
With that, let’s see what surprises JSON Schema holds for those initiates seeking fluency!
Defining `properties` does not imply `type: object`
JSON Schema has a very important keyword for when you want to define a JSON object that has a set of known and defined properties. Fortunately for us, this keyword has an easy-to-remember name: `properties`.
Let’s consider the simplest example of a schema that defines three numeric properties, `x`, `y`, and `z`:
components:
  schemas:
    threeDimensionalPoint:
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number
For this schema, the JSON value `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` will be valid, as will `{ "x": 1.618033, "y": 2.71828 }` and even the value `{}` (because none of the properties are `required`).
The value `{ "x": "1.618033", "y": true, "z": [] }` is invalid because the values of the `x`, `y`, and `z` properties are not numbers. OK, that fits our expectation, no surprises, right?
Well, no. Surprisingly, all of the following values are also valid against this schema:
{ "a": 1.618033, "b": 2.71828, "c": -3.14159 }
1.618033
[1.618033, 2.71828, -3.14159]
null
"z"
"Can you believe I’m valid, too?"
false
Why? Let’s investigate. The core specification defines the `properties` keyword as follows:
The value of "properties" MUST be an object. Each value of this object MUST be a valid JSON Schema.
Validation succeeds if, for each name that appears in both the instance and as a name within this keyword's value, the child instance for that name successfully validates against the corresponding schema.
This means that the validation enforced by the `properties` keyword is validation defined by the subschemas of the named properties. However, nothing here says that the instance (the JSON value being validated) must be an object. The above is only conditionally applied: if the JSON value is not an object, the `properties` keyword imposes no constraints at all.
In other words, using `properties` does not imply `type: object`. The core specification also says:
A missing keyword MUST NOT produce a false assertion result
This means no defaults are implied. If the `type` keyword is missing, validation MUST NOT assume the missing keyword is in force.
(Note that the ajv tool, popularly used to validate JSON data against JSON Schema, has a strict option which helps detect and prevent common schema composition errors, such as omitting a `type` constraint as in our first weak schema. Strict mode is enabled by default: ajv will mark a schema invalid if it has a `properties` object but does not define a `type` that includes `object`. However, this is a schema definition “sanity check” provided by that tool; ajv is not enforcing any “strictness” rule of the JSON Schema specification.)
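For example, here is a rough sketch of checking an instance against a schema with the ajv CLI. The file names are hypothetical, and the `--spec` and `--strict` options are touched on again in the notes at the end of this post:

# point-schema.json holds the threeDimensionalPoint schema above as a
# standalone JSON Schema document; point.json holds the instance to check.
ajv validate --spec=draft2020 -s point-schema.json -d point.json

# Disabling strict mode applies only the rules of the JSON Schema
# specification, without ajv's additional schema sanity checks.
ajv validate --spec=draft2020 --strict=false -s point-schema.json -d point.json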
The gist of all this is that, when defining an object schema for API data, be sure to include `type: object` along with the `properties` keyword. It need only be listed once when you compose multiple schemas, but there is no harm in including it in each subschema; in other words, most places where you use `properties`, you can safely add `type: object`. (In Part 2, however, we’ll see when this is not the case!)
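For example, adding `type: object` to the earlier schema rejects the non-object values listed above (`1.618033`, `null`, `false`, and the rest) while still accepting objects with numeric `x`, `y`, and `z` values:

components:
  schemas:
    threeDimensionalPoint:
      type: object
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number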
This “no implicit type” rule also extends to many other JSON Schema keywords. For example, the `minimum` keyword does not imply `type: number` or `type: integer`; the specification only says that if an instance is a number, its value must not be less than the `minimum` to be valid. Surprise!
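For example, in the following sketch (the schema name is only illustrative), a bare `minimum` makes a number such as `-1` invalid, but a string like `"hello"` or the value `null` still passes until a `type` keyword is added:

components:
  schemas:
    nonNegativeQuantity:
      # type: number    # without this line, non-numeric values are not constrained at all
      minimum: 0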
`additionalProperties: false` is not what you think
Our next fun fact of JSON Schema is that `additionalProperties: false` is not what you think.
`additionalProperties` is a keyword that applies to `object` schemas and specifies how to handle additional properties in the object that are not defined with the `properties` keyword of the object schema. (Like the `properties` keyword, it does not imply `type: object`.)
Let’s consider our example, expanded a bit. Say we want to reject any JSON document that has properties other than `x`, `y`, and `z`:
components:
  schemas:
    threeDimensionalPoint:
      type: object
      required:
        - x
        - y
        - z
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number
      additionalProperties: false
It’s easy to interpret this schema as “a `threeDimensionalPoint` is an object that has 3 properties (`x`, `y`, and `z`) and no additional properties.” Cool, the value `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` is valid, while the value `{ "w": 0, "x": 1, "y": 2, "z": 3 }` is invalid because it contains an additional property, `w`. Easy to interpret that way, but not really correct.
Although `additionalProperties` looks like a boolean flag, that’s the wrong interpretation. Rather than a flag, its value is a nested JSON Schema that defines how to validate any properties that appear in an object but don’t match one of the defined properties. That’s right, the right-hand side, `false`, is a schema. (Yup, it’s schemas all the way down!) The JSON Schema core specification says:
A JSON Schema MUST be an object or a boolean.
The schema `false` is a special case: it is a schema that rejects all JSON values as invalid. In the example, `w: 0` is interpreted as invalid because the value `0` does not satisfy the schema `false`. Thus, the entire object is invalid. In fact, any value for `w` or any other property not named `x`, `y`, or `z` is invalid.
Conversely, `additionalProperties: true` means that any and all additional properties that satisfy the schema `true` are valid (as long as no other constraints are violated). And just as `false` means any value is invalid, `true` means that any JSON value is valid.
Instead of using `true` or `false` with `additionalProperties`, one can use another schema. For example, if you want an object which maps all names to a `threeDimensionalPoint`, you can use a schema similar to the following:
components:
  schemas:
    mapOfThreeDimensionalPoints:
      title: Map of names to 3-D cartesian points
      description: >-
        A map object whose names are any strings
        and whose values are 3-D cartesian (x, y, z) points.
      additionalProperties:
        $ref: '#/components/schemas/threeDimensionalPoint'
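For example, the following JSON value is valid against `mapOfThreeDimensionalPoints`; the property names `origin` and `unit` are arbitrary (any names work), but each value must itself be a valid `threeDimensionalPoint`:

{
  "origin": { "x": 0, "y": 0, "z": 0 },
  "unit": { "x": 1.0, "y": 1.0, "z": 1.0 }
}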
`additionalProperties: false` does not do what you want
OK, now that `additionalProperties` is crystal clear (it is crystal clear now, isn’t it?), I’ll let you know that `additionalProperties: false` still does not do what you want… sometimes.
In simple cases that do not involve schema composition, `additionalProperties: false` does what you want: any object with additional properties is deemed invalid. However, when composing schemas (see Composing API Models with JSON Schema for details), `additionalProperties: false` may not give you the desired effect.
Consider defining the above `threeDimensionalPoint` via composition from a simpler `twoDimensionalPoint` schema that defines properties `x` and `y`:
components:
  schemas:
    twoDimensionalPoint:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
      additionalProperties: false
    threeDimensionalPoint:
      type: object
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPoint'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
          additionalProperties: false
This looks like it should work: a `twoDimensionalPoint` has properties `x` and `y`, and a `threeDimensionalPoint` has those properties and `z`. But recall that schema composition with `allOf` is context free:
Schema keywords typically operate independently, without affecting each other's outcomes.
The schema keywords within a schema (or all the schemas within an `allOf` array) must validate independently of each other. Thus, when a value such as `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` is validated against the `threeDimensionalPoint` schema, it must be validated against both of the items in the `allOf` array. Unfortunately, it fails against both of them:
- It satisfies the constraint on `z` in the second subschema, but the additional properties `x` and `y` must also be validated against that same subschema, and its `additionalProperties: false` says any properties other than `z` are not allowed.
- In the `twoDimensionalPoint` schema, `x` and `y` are valid, but the additional property `z` is not allowed.
Exercise for the reader: Would this still fail if we removed `additionalProperties: false` and added it only to `threeDimensionalPoint`?
Fortunately, JSON Schema has a construct to support the intended semantics. The `unevaluatedProperties` keyword is like `additionalProperties`, but it extends across the subschemas. (Here, “unevaluated” means values which have not been evaluated against other adjacent schema keywords.) The core JSON Schema specification is a bit hard to understand here (I won’t even quote it), so let’s clarify how `unevaluatedProperties` works with our two- and three-dimensional point schema examples.
First, it does not work to just replace `additionalProperties` with `unevaluatedProperties`. The same problem above will hold. Instead, `unevaluatedProperties` applies to the effective aggregation of all the nested properties within the subschemas of a schema. So to use it for our `threeDimensionalPoint`, we need the following:
components:
  schemas:
    twoDimensionalPoint:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
    threeDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPoint'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
This now works: an object with only numeric `x`, `y`, and `z` properties is valid; an object with fewer or more properties is invalid, as is an object with non-numeric values for `x`, `y`, and `z`. (Of course, like `additionalProperties` discussed above, the value of `unevaluatedProperties` is a schema, which includes the boolean schema `false`.)
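For example, `{ "x": 1, "y": 2, "z": 3 }` is valid against this `threeDimensionalPoint`, while the following value is invalid because `w` is not evaluated by either `allOf` subschema, so `unevaluatedProperties: false` applies to it:

{ "w": 0, "x": 1, "y": 2, "z": 3 }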
However, this is not the end of the story. If you want to reject objects with additional properties when validating against the `twoDimensionalPoint` schema, you can’t just add `unevaluatedProperties: false` to it. You need to adjust the schema composition and use a mixin schema. That is, use a simple `Fields` schema as described in Composing API Models with JSON Schema, as follows:
components:
  schemas:
    twoDimensionalPointFields:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
    twoDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPointFields'
      required:
        - x
        - y
    threeDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPointFields'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
This example uses a mixin schema named `twoDimensionalPointFields` and defines both `twoDimensionalPoint` and `threeDimensionalPoint` with it. Both `twoDimensionalPoint` and `threeDimensionalPoint` use `unevaluatedProperties: false`, but none of the schemas nested within them do. This is key to making this keyword work as intended when composing schemas.
The primary lesson here is that one must design schemas carefully, with composition in mind: avoid using `unevaluatedProperties: false` in schemas that you use to compose other schemas (and conversely, don’t compose using mixin schemas that have `unevaluatedProperties: false`).
`type: string` and `application/json` is not suitable for plain text
When using JSON Schema to define OpenAPI definitions, most will start by defining the request and response bodies with the content type `application/json`. Note that a valid JSON document can be an object, an array, or an instance of a primitive type (`boolean`, `null`, `number`, `integer`, `string`). However, a valid JSON string value must be quoted with double quotes, such as:
""
"this is a string"
"12.3"
If you want to pass non-JSON string data to an API request body and not quote it, don’t use the content type `application/json`. If you use `application/json`, the request body `this is a string` will not parse as valid JSON (because it’s not valid JSON, that’s why!) and your API won’t work as intended.
Instead, define your request or response body using `text/plain`:
responses:
  '200':
    description: OK. The operation succeeded.
    content:
      text/plain:
        schema:
          title: The plain text rendering content of a chain link.
          description:
          type: string
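A plain text request body is declared the same way. Here is a minimal sketch (not tied to any particular operation in this series):

requestBody:
  required: true
  content:
    text/plain:
      schema:
        type: string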
The JSON Schema specification is robust and extensive but sometimes a bit hard to interpret. That’s not uncommon for tech specifications and RFCs. But knowing how to interpret the language of API data will make you a better API designer and make your APIs more robust.
The kind folks in the JSON Schema community pointed out a couple things I should share:
When using OpenAPI, the `unevaluatedProperties` feature described in this post is only available with OpenAPI 3.1 (which uses JSON Schema 2020-12 by default).
The behavior of `unevaluatedProperties` and `additionalProperties` is also affected by the presence of the `patternProperties` keyword.
Also, the JSON Schema team holds that ajv is non-compliant because it enables strict mode by default. You can turn that off with `--strict=false` to be more compliant.
One also needs to use the `--spec=draft2020` option with the ajv CLI when using JSON Schemas that use these later JSON Schema drafts.