Master More JSON Schema's Subtleties
Don’t be Surprised by JSON Schema’s Surprising Surprises (Part II)
Today, I present Part II of the Master JSON Schema's Subtleties article, imaginatively titled Master More JSON Schema’s Subtleties.
Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers and visitors to start by reading Your Guide to The Language of API Design and scanning previous posts in the series.
unevaluatedProperties, Revisited
In the last article, I explained how the unevaluatedProperties keyword can be used when composing JSON schemas with allOf to prevent clients from sending unexpected data in an object: only the properties defined in each of the subschemas within the allOf array are allowed.
This is great, but it does not go far enough. Although schemas themselves are recursive structures, the behavior of a schema’s keywords is not: unevaluatedProperties is not recursive, so it does not apply to nested schemas/objects. What we really want is something akin to Jean-Luc Picard speaking of the Borg: “The line must be drawn here. This far. No further.”
Let’s put this in the context of our Chain Links social media app, which consists of chains, chain links, authors, universes, characters, etc. A simplified resource model for a character object may include two sub-object properties: the character’s mother and father. (Ignore for the moment the fantasy universes with asexual reproduction, cloning, etc.) The Picard character may be represented as
name: Jean-Luc Picard
species: human
mother:
  id: ch-fjk4i9f3jk4-4hkd
  name: Yvette Picard
father:
  id: ch-489jkexbcsl-348dk
  name: Maurice Picard
We can define a schema named character for this object, then define a schema named characterReference for the mother and father properties. (Again, we only show skeletal schemas here to reveal the structure.)
components:
  schemas:
    character:
      type: object
      unevaluatedProperties: false
      required:
        - name
        - species
      properties:
        name:
          type: string
        species:
          type: string
        mother:
          $ref: '#/components/schemas/characterReference'
        father:
          $ref: '#/components/schemas/characterReference'
    characterReference:
      type: object
      required:
        - id
        - name
      properties:
        id:
          type: string
        name:
          type: string
Adding the unevaluatedProperties: false assertion to the character schema disallows any properties other than name, species, mother, and father. Thus, the following request body would be rejected because the id property is not allowed:
name: Jean-Luc Picard
id: ch-2305ncc-1701-d
species: human
mother:
  id: ch-fjk4i9f3jk4-4hkd
  name: Yvette Picard
father:
  id: ch-489jkexbcsl-348dk
  name: Maurice Picard
However, the above schemas still allow additional properties in the nested mother and father objects, because objects with extra properties still satisfy the characterReference schema:
name: Jean-Luc Picard
species: human
mother:
  id: ch-fjk4i9f3jk4-4hkd
  name: Yvette Picard
  species: human
father:
  id: ch-489jkexbcsl-348dk
  name: Maurice Picard
  species: human
This is because the unevaluatedProperties: false assertion in the character schema does not extend to the schemas of its child properties. Instead, we must explicitly declare this in the schema for those properties:
characterReference:
  type: object
  unevaluatedProperties: false
  required:
    - id
    - name
  properties:
    ...
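To make the non-recursive behavior concrete, here is a minimal Python sketch. It is a deliberately tiny model and not a real JSON Schema validator: it understands only properties, $ref, and unevaluatedProperties: false, and because no allOf is involved it treats the latter like additionalProperties: false. The SCHEMAS dictionary and the unexpected_properties helper are hypothetical names invented for this illustration.

```python
# Toy model of "closed" object schemas (NOT a JSON Schema validator).
# Assumption: schema names stand in for '#/components/schemas/...' references.
SCHEMAS = {
    "character": {
        "unevaluatedProperties": False,
        "properties": {
            "name": {},
            "species": {},
            "mother": {"$ref": "characterReference"},
            "father": {"$ref": "characterReference"},
        },
    },
    # Initially open: no unevaluatedProperties assertion here.
    "characterReference": {"properties": {"id": {}, "name": {}}},
}


def unexpected_properties(value, schema, path=""):
    """Return the paths of properties that a closed schema rejects."""
    if "$ref" in schema:
        schema = SCHEMAS[schema["$ref"]]
    if not isinstance(value, dict):
        return []
    known = schema.get("properties", {})
    closed = schema.get("unevaluatedProperties") is False
    rejected = []
    for key, child in value.items():
        child_path = f"{path}/{key}"
        if key in known:
            # A nested value is judged only by its own schema;
            # the parent's "closed" flag is not inherited.
            rejected += unexpected_properties(child, known[key], child_path)
        elif closed:
            rejected.append(child_path)
    return rejected


picard = {
    "name": "Jean-Luc Picard",
    "species": "human",
    "mother": {
        "id": "ch-fjk4i9f3jk4-4hkd",
        "name": "Yvette Picard",
        "species": "human",
    },
}

print(unexpected_properties(picard, SCHEMAS["character"]))  # []

# Closing the nested schema as well is what rejects the extra property.
SCHEMAS["characterReference"]["unevaluatedProperties"] = False
print(unexpected_properties(picard, SCHEMAS["character"]))  # ['/mother/species']
```

The first call reports nothing because only the outer schema is closed; the second reports the nested species property once characterReference carries its own assertion.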
unevaluatedProperties || ^unevaluatedProperties
Let’s move on to a bit more of the rationale for using additionalProperties: false or unevaluatedProperties: false in the first place. As noted above, this prevents an API consumer from sending in unexpected data. There are several reasons an API may want to enforce this, and two of them are related to designing robust and secure APIs:
Disallowing additional properties can prevent malicious clients from flooding your APIs with tons of data (a form of Denial of Service attack). By employing API edge security, such as a highly scalable API gateway that performs JSON Schema validation at the edge, your services can detect and reject such malicious use before the request makes its way to the more important API business logic tier.
More importantly, such safety measures help prevent Broken Object Property Level Authorization (a category that absorbed what was formerly listed as Mass Assignment), a well-known API vulnerability. This vulnerability is #3 on the OWASP API Security Top Ten list (2023), described in API3:2023 Broken Object Property Level Authorization. If unprotected, this vulnerability allows a malicious actor to alter data that it should not be allowed to change, or to cause other effects if, for example, the resource server blindly writes all the properties it receives in a request to the persistent store. See this scenario for an example.
These constraints also improve the developer experience (DX) for those coding to your APIs. These keywords help detect “syntax” errors and coding mistakes, such as when developers misspell your property names.
Thus, adding unevaluatedProperties: false assertions to your API’s schemas can enhance your API’s security and DX. Score another point for JSON Schema!
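The mass-assignment risk described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not a real server: the authorId property, the WRITABLE set, and both update functions are hypothetical, and the guarded version merely mimics what a closed request schema would enforce at the edge.

```python
# Hypothetical stored resource: "authorId" must never be changed by a client.
stored = {"name": "Jean-Luc Picard", "species": "human", "authorId": "au-1"}

# Assumption: the only properties a client may write.
WRITABLE = {"name", "species"}


def blind_update(resource, body):
    """Vulnerable: copies every property the client sent into the resource."""
    return {**resource, **body}


def guarded_update(resource, body):
    """Rejects unknown properties first, as a closed request schema would."""
    unexpected = sorted(set(body) - WRITABLE)
    if unexpected:
        raise ValueError(f"unexpected properties: {unexpected}")
    return {**resource, **body}


malicious = {"name": "Locutus", "authorId": "au-666"}

print(blind_update(stored, malicious)["authorId"])  # au-666

try:
    guarded_update(stored, malicious)
except ValueError as error:
    print(error)  # unexpected properties: ['authorId']
```

The same guard also catches honest mistakes, such as a misspelled property name, which is the DX benefit mentioned above.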
However, there is a tradeoff between security and the API’s evolution and forwards/backwards compatibility.
Consider a client that uses the above character and characterReference schemas, which were part of version 1.5.0 of the Chain Links API; the species property was added in version 1.5.0. Client SDKs may perform client-side schema validation when constructing API requests. If that client sent a request containing species to a server running version 1.4.0 of the API, the request would be rejected, because the 1.4.0 schema does not define species. This scenario is rare but quite possible if an API definition has multiple implementations, such as an open banking API that multiple financial institutions implement independently. Some institutions may support version 1.4.0 and others may have adopted version 1.5.0. This situation is hard for clients to manage with one code base.
A complementary backwards compatibility issue arises if an API uses unevaluatedProperties: false assertions in response schemas as well as in request schemas. If the client is built against the schemas for version 1.4.0, but version 1.5.0 added the new species property, a client that validates responses against its schema will fail when it receives a character object from the 1.5.0 server. Thus, when a response schema has the unevaluatedProperties: false assertion, simply adding a new property to an object schema constitutes a breaking change for clients. This means the API version (if it follows Semantic Versioning) should have been bumped from 1.4.0 to 2.0.0 instead of to 1.5.0.
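The response-side compatibility problem can be sketched with a toy check in Python. The property sets below are assumptions for illustration (the article only states that species arrived in 1.5.0), and the accepts helper is hypothetical; it models nothing more than whether a schema is open or closed.

```python
# Assumption: the 1.4.0 character schema knew these properties; 1.5.0 added species.
V1_4_PROPERTIES = {"name", "mother", "father"}
V1_5_PROPERTIES = V1_4_PROPERTIES | {"species"}


def accepts(body, known, closed):
    """True if a toy object schema with these known properties accepts body."""
    return (not closed) or set(body) <= known


response_from_1_5 = {"name": "Jean-Luc Picard", "species": "human"}

# A client built against a closed 1.4.0 response schema breaks...
print(accepts(response_from_1_5, V1_4_PROPERTIES, closed=True))   # False
# ...while an open 1.4.0 response schema tolerates the new property.
print(accepts(response_from_1_5, V1_4_PROPERTIES, closed=False))  # True
# A client built against 1.5.0 is fine either way.
print(accepts(response_from_1_5, V1_5_PROPERTIES, closed=True))   # True
```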
Why you should avoid format: uuid
The number 1 vulnerability in the OWASP API Security Top Ten is API1:2023 Broken Object Level Authorization. If the resource ID in a resource’s URL path is a sequential integer database primary key, a hacker can take a valid integer resource ID (such as .../path/to/resources/11478) and increment/decrement that integer to probe for other resources and possibly gain access to other users’ data that they should not see. For better security, all API resource IDs should be opaque strings which cannot be decoded to yield (sequential) integers. Often, back end services use Universally Unique Identifiers (UUIDs, also called Globally Unique Identifiers or GUIDs) or some other string form that includes a significant number of random bytes. That’s a useful implementation practice but rarely belongs in an interface contract.
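As one possible way to produce such opaque IDs, here is a minimal Python sketch using the standard library's secrets module. The new_resource_id function and its prefix parameter are hypothetical; the ch default simply mirrors the example IDs used in this article, and 16 random bytes is an arbitrary choice.

```python
import secrets


def new_resource_id(prefix="ch"):
    """Return an opaque ID: a short prefix plus 16 random bytes, URL-safe encoded."""
    # token_urlsafe(16) encodes 16 random bytes as 22 URL-safe Base64 characters.
    return f"{prefix}-{secrets.token_urlsafe(16)}"


first = new_resource_id()
second = new_resource_id()
print(len(first))       # 25
print(first != second)  # True: neighbors cannot be guessed by incrementing
```

Nothing about this scheme needs to appear in the API contract; clients only see a string.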
JSON Schema defines a format: uuid constraint for string properties; this looks like a useful approach to close this vulnerability. However, while it is useful to use a UUID for a resource ID or path parameter, declaring the uuid format has two negative implications:
It exposes implementation details of your service, whereas an API should be about the interface.
It overly constrains the API implementation. If the property or path parameter is defined with a format: uuid constraint, then it must always be a UUID. You cannot later optimize the API with a shorter encoding of the same data (reducing a 36-character UUID to a shorter Base64 string encoding) or enhance the ID with a resource type identifier prefix, as suggested in Designing APIs for humans: Object IDs. [1]
Instead of using format: uuid, use just type: string, augmented with a reasonable maxLength (such as 48, which gives a little wiggle room) and a pattern that constrains the set of allowed characters in an ID. A pattern can allow alphanumeric characters and a few special characters like _ and - while disallowing characters that require URL encoding when used in a URL element:
components:
  schemas:
    resourceId:
      title: Resource Identifier
      description: >-
        An immutable opaque string that uniquely identifies a resource.
      type: string
      minLength: 6
      maxLength: 48
      pattern: ^[-_.~a-zA-Z0-9]{6,48}$
      examples:
        - ch-2305ncc-1701-d
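The pattern in the resourceId schema is easy to try out. The following Python sketch applies the same regular expression with the standard re module; the is_valid_resource_id helper is a hypothetical name, and Python's regex dialect is assumed to agree with JSON Schema's for this simple pattern.

```python
import re

# Same pattern as the resourceId schema above.
RESOURCE_ID = re.compile(r"^[-_.~a-zA-Z0-9]{6,48}$")


def is_valid_resource_id(value):
    """True if value is a string of 6 to 48 URL-safe characters."""
    return isinstance(value, str) and RESOURCE_ID.fullmatch(value) is not None


print(is_valid_resource_id("ch-2305ncc-1701-d"))                     # True
print(is_valid_resource_id("f47ac10b-58cc-4372-a567-0e02b2c3d479"))  # True
print(is_valid_resource_id("11478"))                                 # False (too short)
print(is_valid_resource_id("path/to/resource"))                      # False ('/' not allowed)
```

Note that a UUID still satisfies the schema, so a service can keep using UUIDs today without promising them in the contract.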
"null" is a type
Some APIs support JSON Merge Patch [RFC7386, since obsoleted by RFC 7396] semantics for updating resources. With JSON Merge Patch, “Null values in the merge patch are given special meaning to indicate the removal of existing values in the target.”
Thus, a client can send a null value to remove a property from a resource. [2] To indicate that Jean-Luc Picard’s species is unknown (or unset) rather than "human" (and raise the ire of Star Trek fans worldwide), one could PATCH the resource with the request
{ "species": null }
However, this request is not allowed if the type of the species property is type: string; the JSON value null is not a valid string value.
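For readers unfamiliar with the removal semantics quoted above, here is a minimal Python sketch of the merge algorithm as the RFC describes it, operating on parsed JSON (Python dicts, with None standing in for JSON null). The merge_patch function name is an assumption of this sketch, not part of any library.

```python
def merge_patch(target, patch):
    """Apply a JSON Merge Patch to a parsed JSON value and return the result."""
    if not isinstance(patch, dict):
        return patch  # a non-object patch replaces the target entirely
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            result.pop(name, None)  # null means "remove this member"
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


picard = {
    "name": "Jean-Luc Picard",
    "species": "human",
    "mother": {"id": "ch-fjk4i9f3jk4-4hkd", "name": "Yvette Picard"},
}

print(merge_patch(picard, {"species": None}))
# {'name': 'Jean-Luc Picard', 'mother': {'id': 'ch-fjk4i9f3jk4-4hkd', 'name': 'Yvette Picard'}}
```

The patch only works if the request schema lets null through in the first place, which is the problem the rest of this section addresses.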
OpenAPI Specification 3.0 used a subset of an earlier draft of JSON Schema and employed the nullable keyword to indicate that a property supported a null value in addition to the other values allowed for that schema.
properties:
  species:
    type: string
    nullable: true
OpenAPI 3.1 uses JSON Schema 2020-12, which does not use the nullable keyword to augment other type constraints. Instead, the string "null" [3] is the name of a special schema type constraint that allows the JSON value null and nothing else. How does this help us?
properties:
  species:
    type: "null"
Of course, we cannot simply change the type of the species property to "null". Such a schema allows the above request but fails validation when a JSON string value such as "human" is sent.
Instead, we can employ the oneOf construct of JSON Schema: a value is valid if it matches exactly one of the alternate schemas in an array of schemas:
properties:
  species:
    oneOf:
      - type: string
      - type: 'null'
Fortunately, JSON Schema provides a more concise way to represent this scenario. A schema’s type may be an array:
properties:
  species:
    type: [ string, 'null' ]
This schema will accept both PATCH requests for our friend, Jean-Luc:
{ "species": null }
and
{ "species": "human" }
Good, we’ve restored sanity to the universe!
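To see how an array-valued type treats the two requests above, here is a small Python sketch. It is a toy model of the type keyword only, not a validator: the JSON_TYPES table and matches_type helper are hypothetical, and numeric types are left out to keep the mapping simple.

```python
import json

# Assumption: a partial mapping from JSON Schema type names to Python types.
JSON_TYPES = {
    "string": str,
    "object": dict,
    "array": list,
    "boolean": bool,
    "null": type(None),  # the JSON value null parses to Python's None
}


def matches_type(value, type_keyword):
    """Check a parsed JSON value against a type keyword (a name or a list of names)."""
    names = [type_keyword] if isinstance(type_keyword, str) else type_keyword
    return any(isinstance(value, JSON_TYPES[name]) for name in names)


for body in ('{ "species": null }', '{ "species": "human" }'):
    species = json.loads(body)["species"]
    print(matches_type(species, "string"), matches_type(species, ["string", "null"]))
# False True
# True True
```

The string "null" inside a request would still count as a string, not as the null value, which mirrors the distinction between the type name and the JSON value.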
Always define a type constraint, except when you shouldn’t
Let’s tie together some of this knowledge of JSON schemas to see how API data may be modeled. As mentioned in my previous article (see Defining properties does not imply type: object), one should always add a type constraint when defining JSON schemas, because without it, the schema will allow values of other types.
However, in some cases, defining the type constraint too early can work against you. Let’s consider composing schemas using mixin schemas, such as a mutableCharacterFields schema that is mixed into our character and characterReference schemas introduced above.
components:
  schemas:
    mutableCharacterFields:
      description: >-
        A mixin schema to define mutable properties
        of other Character instances.
      type: object
      properties:
        name:
          type: string
          description: The full name of this character
          minLength: 1
          maxLength: 64
          pattern: '^[\p{L}\p{N}\p{M}\p{Zs}\p{P}]{1,64}$'
        # other mutable properties here...
    characterReference:
      description: >-
        A reference to another existing character
      type: object
      unevaluatedProperties: false
      required:
        - id
        - name
      allOf:
        - $ref: '#/components/schemas/mutableCharacterFields'
        - properties:
            id:
              $ref: '#/components/schemas/resourceId'
    character:
      description: >-
        A character in a chain link universe that
        appears in chains and chain links.
      type: object
      unevaluatedProperties: false
      required:
        - id
        - name
        - species
      allOf:
        - $ref: '#/components/schemas/mutableCharacterFields'
        - properties:
            id:
              $ref: '#/components/schemas/resourceId'
            species:
              type: string
              # other constraints
            mother:
              description: >-
                The character's biological mother.
              $ref: '#/components/schemas/characterReference'
            father:
              description: >-
                The character's biological father.
              $ref: '#/components/schemas/characterReference'
Such mixins help your API design follow the DRY principle by eliminating copy/paste of non-trivial schema constraints, such as those on a character’s name. [4]
This schema composition works well to define the schemas… until we want to support JSON Merge Patch to PATCH a character. As defined, we can patch a character’s mother or father, but we can’t unset those properties by sending a null:
{ "mother": null, "father": null }
Changing the type of the characterReference schema from type: object to type: [ object, 'null' ] does not work, because the semantics of allOf require the value to match every subschema. Unfortunately, while this type constraint on characterReference allows the null value, the effective type constraints from the schema composition are
allOf:
  - type: object
  - type: [ object, 'null' ]
A character reference object satisfies both type constraints, but a null value only satisfies the second.
Thus, it may be useful to omit the type: object constraint on mixin schemas such as mutableCharacterFields so that the concrete (non-mixin) schemas that use the mixins can define the correct type constraint.
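The effect of a typed versus an untyped mixin can be checked with a short Python sketch. It models only how allOf combines type constraints; the JSON_TYPES table and both helper functions are hypothetical names for this illustration.

```python
# Assumption: a partial mapping from JSON Schema type names to Python types.
JSON_TYPES = {"object": dict, "string": str, "null": type(None)}


def type_allows(value, type_keyword):
    """True if a parsed JSON value satisfies a type keyword (name or list of names)."""
    names = [type_keyword] if isinstance(type_keyword, str) else type_keyword
    return any(isinstance(value, JSON_TYPES[name]) for name in names)


def all_types_allow(value, type_keywords):
    """allOf semantics: every collected type constraint must accept the value."""
    return all(type_allows(value, keyword) for keyword in type_keywords)


reference = {"id": "ch-fjk4i9f3jk4-4hkd", "name": "Yvette Picard"}
with_typed_mixin = ["object", ["object", "null"]]  # mixin declares type: object
with_untyped_mixin = [["object", "null"]]          # mixin omits type

print(all_types_allow(reference, with_typed_mixin))  # True
print(all_types_allow(None, with_typed_mixin))       # False: the mixin vetoes null
print(all_types_allow(None, with_untyped_mixin))     # True
```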
The other solution is to use the oneOf constraint on the properties instead of using the array type in the characterReference schema:
character:
  ...
  allOf:
    - $ref: '#/components/schemas/mutableCharacterFields'
    - properties:
        ...
        mother:
          description: The character's biological mother.
          oneOf:
            - $ref: '#/components/schemas/characterReference'
            - type: 'null'
        father:
          description: The character's biological father.
          oneOf:
            - $ref: '#/components/schemas/characterReference'
            - type: 'null'
Alas, many OpenAPI SDK generation tools do not handle the oneOf keyword well.
Summary
This concludes the API Design Matters articles on the subtleties of JSON Schema. There is more to cover, but we’ll deal with those subtleties later as they arise in practical use.
[1] The examples I gave above use a ch- prefix for resource IDs for character resources.
[2] This is only valid for object properties which are not required.
[3] Note: This uses the string value "null" for the type name, not the JSON null value. This is an important distinction! You’ll get a schema error if you use type: null.
[4] The regular expression pattern ^[\p{L}\p{N}\p{M}\p{Zs}\p{P}]{1,64}$ allows Unicode letters, numeric digits, accent marks, a space, and punctuation, but not control characters, line separators, paragraph separators, etc.