Master JSON Schema's Subtleties
Don’t be Surprised by JSON Schema’s Surprising Surprises (Part I)
Before returning to API design with OpenAPI, I want to present a detailed post (well, two actually) on the more subtle nuances of JSON Schema as a language for designing the data of APIs. Knowing these details will help you understand other APIs’ schemas and help you construct your own.
Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers/visitors to start by reading Your Guide to The Language of API Design and scanning previous posts in the series.
It’s ~~turtles~~ schemas all the way down
Before we move on to the fun stuff, I want to reiterate the point I made in my last post Composing API Models with JSON Schema.
JSON Schema is a highly recursive specification. Schemas are primarily built by combining smaller schemas into larger schemas in many different ways. Within the `properties` of an `object` schema, all of the named subelements are schemas. The `items` in an `array` schema are defined by a subschema. Other elements (some described below) also use nested subschemas to define them. Grasping/grokking this very important concept is key to understanding how JSON Schemas are constructed and interpreted: it’s schemas all the way down.
With that, let’s see what surprises JSON Schema holds for those initiates seeking fluency!
Defining `properties` does not imply `type: object`
JSON Schema has a very important keyword for when you want to define a JSON object that has a set of known and defined properties. Fortunately for us, this keyword has an easy-to-remember name: `properties`.
Let’s consider the simplest example of a schema that defines three numeric properties, `x`, `y`, and `z`:
components:
  schemas:
    threeDimensionalPoint:
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number
For this schema, the JSON value `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` will be valid, as will `{ "x": 1.618033, "y": 2.71828 }` and even the value `{}` (because none of the properties are `required`).
The value `{ "x": "1.618033", "y": true, "z": [] }` is invalid because the values of the `x`, `y`, and `z` properties are not numbers. OK, that fits our expectation, no surprises, right?
Well, no. Surprisingly, all of the following values are also valid against this schema:
{ "a": 1.618033, "b": 2.71828, "c": -3.14159 }
1.618033
[1.618033, 2.71828, -3.14159]
null
"z"
"Can you believe I’m valid, too?"
false
Why? Let’s investigate. The core specification defines the `properties` keyword as follows:
The value of "properties" MUST be an object. Each value of this object MUST be a valid JSON Schema.
Validation succeeds if, for each name that appears in both the instance and as a name within this keyword's value, the child instance for that name successfully validates against the corresponding schema.
This means that the validation enforced by the `properties` keyword is validation defined by the subschemas of the named properties. However, nothing here says that the instance (the JSON value being validated) must be an object. The above is only conditionally applied: if the JSON value is not an object, the `properties` keyword imposes no constraints at all.
In other words, using `properties` does not imply `type: object`. The core specification also says:
A missing keyword MUST NOT produce a false assertion result
This means no defaults are implied. If the `type` keyword is missing, validation MUST NOT assume the missing keyword is in force.
(Note that the ajv tool, popularly used to validate JSON data against JSON Schema, has a strict option which helps detect and prevent common schema composition errors, such as omitting a `type` constraint as in our first weak schema. Strict mode is enabled by default: ajv will mark a schema invalid if it has a `properties` object but does not define a `type` that includes `object`. However, this is a schema definition “sanity check” provided by that tool; ajv is not enforcing any “strictness” rule of the JSON Schema specification.)
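For example, here is a rough sketch of checking an instance against a schema with the ajv CLI. The file names are hypothetical, and the `--spec` and `--strict` options are touched on again in the notes at the end of this post:

# point-schema.json holds the threeDimensionalPoint schema above as a
# standalone JSON Schema document; point.json holds the instance to check.
ajv validate --spec=draft2020 -s point-schema.json -d point.json

# Disabling strict mode applies only the rules of the JSON Schema
# specification, without ajv's additional schema sanity checks.
ajv validate --spec=draft2020 --strict=false -s point-schema.json -d point.json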
The gist of all this is that, when defining an object schema for API data, be sure to include `type: object` along with the `properties` keyword. It need only be listed once when you compose multiple schemas, but there is no harm in including it in each subschema; in other words, most places where you use `properties`, you can safely add `type: object`. (In Part 2, however, we’ll see when this is not the case!)
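For example, adding `type: object` to the earlier schema rejects the non-object values listed above (`1.618033`, `null`, `false`, and the rest) while still accepting objects with numeric `x`, `y`, and `z` values:

components:
  schemas:
    threeDimensionalPoint:
      type: object
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number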
This “no implicit type” rule also extends to many other JSON Schema keywords. For example, the `minimum` keyword does not imply `type: number` or `type: integer`; the specification only says that if an instance is a number, its value must not be less than the `minimum` to be valid. Surprise!
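For example, in the following sketch (the schema name is only illustrative), a bare `minimum` makes a number such as `-1` invalid, but a string like `"hello"` or the value `null` still passes until a `type` keyword is added:

components:
  schemas:
    nonNegativeQuantity:
      # type: number    # without this line, non-numeric values are not constrained at all
      minimum: 0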
`additionalProperties: false` is not what you think
Our next fun fact of JSON Schema is that `additionalProperties: false` is not what you think.
`additionalProperties` is a keyword that applies to `object` schemas and specifies how to handle additional properties in the object that are not defined with the `properties` keyword of the object schema. (Like the `properties` keyword, it does not imply `type: object`.)
Let’s consider our example, expanded a bit. Say we want to reject any JSON document that has properties other than `x`, `y`, and `z`:
components:
  schemas:
    threeDimensionalPoint:
      type: object
      required:
        - x
        - y
        - z
      properties:
        x:
          type: number
        y:
          type: number
        z:
          type: number
      additionalProperties: false
It’s easy to interpret this schema as “a `threeDimensionalPoint` is an object that has 3 properties (`x`, `y`, and `z`) and no additional properties.” Cool, the value `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` is valid, while the value `{ "w": 0, "x": 1, "y": 2, "z": 3 }` is invalid because it contains an additional property, `w`. Easy to interpret that way, but not really correct.
Although `additionalProperties` looks like a boolean flag, that’s the wrong interpretation. Rather than a flag, its value is a nested JSON Schema that defines how to validate any properties that appear in an object but don’t match one of the defined properties. That’s right, the right-hand side, `false`, is a schema. (Yup, it’s schemas all the way down!) The JSON Schema core specification says:
A JSON Schema MUST be an object or a boolean.
The schema `false` is a special case: it is a schema that rejects all JSON values as invalid. In the example, `w: 0` is interpreted as invalid because the value `0` does not satisfy the schema `false`. Thus, the entire object is invalid. In fact, any value for `w` or any other property not named `x`, `y`, or `z` is invalid.
Conversely, `additionalProperties: true` means that any and all additional properties that satisfy the schema `true` are valid (as long as no other constraints are violated). And just as `false` means any value is invalid, `true` means that any JSON value is valid.
Instead of using `true` or `false` with `additionalProperties`, one can use another schema. For example, if you want an object which maps all names to a `threeDimensionalPoint`, you can use a schema similar to the following:
components:
  schemas:
    mapOfThreeDimensionalPoints:
      title: Map of names to 3-D cartesian points
      description: >-
        A map object whose names are any strings
        and whose values are 3-D cartesian (x, y, z) points.
      additionalProperties:
        $ref: '#/components/schemas/threeDimensionalPoint'
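For example, the following JSON value is valid against `mapOfThreeDimensionalPoints`; the property names `origin` and `unit` are arbitrary (any names work), but each value must itself be a valid `threeDimensionalPoint`:

{
  "origin": { "x": 0, "y": 0, "z": 0 },
  "unit": { "x": 1.0, "y": 1.0, "z": 1.0 }
}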
`additionalProperties: false` does not do what you want
OK, now that `additionalProperties` is crystal clear (it is crystal clear now, isn’t it?), I’ll let you know that `additionalProperties: false` still does not do what you want… sometimes.
In simple cases that do not involve schema composition, `additionalProperties: false` does what you want: any object with additional properties is deemed invalid. However, when composing schemas (see Composing API Models with JSON Schema for details), `additionalProperties: false` may not give you the desired effect.
Consider defining the above `threeDimensionalPoint` via composition from a simpler `twoDimensionalPoint` schema that defines properties `x` and `y`:
components:
  schemas:
    twoDimensionalPoint:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
      additionalProperties: false
    threeDimensionalPoint:
      type: object
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPoint'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
          additionalProperties: false
This looks like it should work: a `twoDimensionalPoint` has properties `x` and `y`, and a `threeDimensionalPoint` has those properties and `z`. But recall that schema composition with `allOf` is context free:
Schema keywords typically operate independently, without affecting each other's outcomes.
The schema keywords within a schema (or all the schemas within an `allOf` array) must validate independently of each other. Thus, when a value such as `{ "x": 1.618033, "y": 2.71828, "z": -3.14159 }` is validated against the `threeDimensionalPoint` schema, it must be validated against both of the items in the `allOf` array. Unfortunately, it fails against both of them:
- It satisfies the constraint on `z` in the second subschema, but the additional properties `x` and `y` must also be validated against that same subschema, and its `additionalProperties: false` says any properties other than `z` are not allowed.
- In the `twoDimensionalPoint` schema, `x` and `y` are valid, but the additional property `z` is not allowed.
Exercise for the reader: Would this still fail if we removed `additionalProperties: false` and added it only to `threeDimensionalPoint`?
Fortunately, JSON Schema has a construct to support the intended semantics. The `unevaluatedProperties` keyword is like `additionalProperties`, but it extends across the subschemas. (Here, “unevaluated” means values which have not been evaluated against other adjacent schema keywords.) The core JSON Schema specification is a bit hard to understand here (I won’t even quote it), so let’s clarify how `unevaluatedProperties` works with our two- and three-dimensional point schema examples.
First, it does not work to just replace `additionalProperties` with `unevaluatedProperties`. The same problem above will hold. Instead, `unevaluatedProperties` applies to the effective aggregation of all the nested properties within the subschemas of a schema. So to use it for our `threeDimensionalPoint`, we need the following:
components:
  schemas:
    twoDimensionalPoint:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
    threeDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPoint'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
This now works: an object with only numeric `x`, `y`, and `z` properties is valid; an object with fewer or more properties is invalid, as is an object with non-numeric values for `x`, `y`, and `z`. (Of course, like `additionalProperties` discussed above, the value of `unevaluatedProperties` is a schema, which includes the boolean schema `false`.)
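For example, `{ "x": 1, "y": 2, "z": 3 }` is valid against this `threeDimensionalPoint`, while the following value is invalid because `w` is not evaluated by either `allOf` subschema, so `unevaluatedProperties: false` applies to it:

{ "w": 0, "x": 1, "y": 2, "z": 3 }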
However, this is not the end of the story. If you want to reject objects with additional properties when validating against the `twoDimensionalPoint` schema, you can’t just add `unevaluatedProperties: false` to it. You need to adjust the schema composition and use a mixin schema. That is, use a simple `Fields` schema as described in Composing API Models with JSON Schema, as follows:
components:
  schemas:
    twoDimensionalPointFields:
      type: object
      required:
        - x
        - y
      properties:
        x:
          type: number
        y:
          type: number
    twoDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPointFields'
      required:
        - x
        - y
    threeDimensionalPoint:
      type: object
      unevaluatedProperties: false
      allOf:
        - $ref: '#/components/schemas/twoDimensionalPointFields'
        - type: object
          required:
            - z
          properties:
            z:
              type: number
This example uses a mixin schema named `twoDimensionalPointFields` and defines both `twoDimensionalPoint` and `threeDimensionalPoint` with it. Both `twoDimensionalPoint` and `threeDimensionalPoint` use `unevaluatedProperties: false`, but none of the schemas nested within them do. This is key to making this keyword work as intended when composing schemas.
The primary lesson here is that one must design schemas carefully, with composition in mind: avoid using `unevaluatedProperties: false` in schemas that you use to compose other schemas (and conversely, don’t compose using mixin schemas that have `unevaluatedProperties: false`).
`type: string` and `application/json` is not suitable for plain text
When using JSON Schema to define OpenAPI definitions, most will start by defining the request and response bodies with the content type `application/json`. Note that a valid JSON document can be an object, an array, or an instance of a primitive type (`boolean`, `null`, `number`, `integer`, `string`). However, a valid JSON string value must be quoted with double quotes, such as:
""
"this is a string"
"12.3"
If you want to pass non-JSON string data to an API request body and not quote it, don’t use the content type `application/json`. If you use `application/json`, the request body `this is a string` will not parse as valid JSON (because it’s not valid JSON, that’s why!) and your API won’t work as intended.
Instead, define your request or response body using `text/plain`:
responses:
  '200':
    description: OK. The operation succeeded.
    content:
      text/plain:
        schema:
          title: The plain text rendering content of a chain link.
          description:
          type: string
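A plain text request body is declared the same way. Here is a minimal sketch (not tied to any particular operation in this series):

requestBody:
  required: true
  content:
    text/plain:
      schema:
        type: string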
The JSON Schema specification is robust and extensive but sometimes a bit hard to interpret. That’s not uncommon for tech specifications and RFCs. But knowing how to interpret the language of API data will make you a better API designer and make your APIs more robust.
The kind folks in the JSON Schema community pointed out a couple things I should share:
When using OpenAPI, the `unevaluatedProperties` feature described in this post is only available with OpenAPI 3.1 (which uses JSON Schema 2020-12 by default).
The behavior of `unevaluatedProperties` and `additionalProperties` is also affected by the presence of the `patternProperties` keyword.
Also, the JSON Schema team holds that ajv is non-compliant because it enables strict mode by default. You can turn that off with `--strict=false` to be more compliant.
One also needs to use the `--spec=draft2020` option with the ajv CLI when using JSON Schemas that use these later JSON Schema drafts.