Master More JSON Schema's Subtleties
Don’t be Surprised by JSON Schema’s Surprising Surprises (Part II)
Today, I present Part II of the Master JSON Schema's Subtleties article, imaginatively titled Master More JSON Schema’s Subtleties.
Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers and visitors to start by reading Your Guide to The Language of API Design and scanning previous posts in the series.
unevaluatedProperties, Revisited
In the last article, I explained how the unevaluatedProperties keyword can be used when composing JSON schemas with allOf to prevent clients from sending unexpected data in an object: only the properties defined in each of the subschemas within the allOf array are allowed.
This is great, but it does not go far enough. Although schemas themselves are recursive structures, the behavior of a schema’s keywords is not: unevaluatedProperties does not apply to nested schemas/objects. What we really want is something akin to Jean-Luc Picard speaking of the Borg: “The line must be drawn here. This far. No further.”
Let’s put this in the context of our Chain Links social media app, which consists of chains, chain links, authors, universes, characters, etc. A simplified resource model for a character object may include two sub-object properties: the character’s mother and father. (Ignore for the moment the fantasy universes with asexual reproduction, cloning, etc.) The Picard character may be represented as:

    name: Jean-Luc Picard
    species: human
    mother:
      id: ch-fjk4i9f3jk4-4hkd
      name: Yvette Picard
    father:
      id: ch-489jkexbcsl-348dk
      name: Maurice Picard

We can define a schema named character for this object, then define a schema named characterReference for the mother and father properties. (Again, we only show skeletal schemas here to reveal the structure.)

    components:
      schemas:
        character:
          type: object
          unevaluatedProperties: false
          required:
            - name
            - species
          properties:
            name:
              type: string
            species:
              type: string
            mother:
              $ref: '#/components/schemas/characterReference'
            father:
              $ref: '#/components/schemas/characterReference'
        characterReference:
          type: object
          required:
            - id
            - name
          properties:
            id:
              type: string
            name:
              type: string

Adding the unevaluatedProperties: false assertion to the character schema would disallow any properties other than name, species, mother, and father. Thus, the following request body would be rejected because the id property is not allowed:

    name: Jean-Luc Picard
    id: ch-2305ncc-1701-d
    species: human
    mother:
      id: ch-fjk4i9f3jk4-4hkd
      name: Yvette Picard
    father:
      id: ch-489jkexbcsl-348dk
      name: Maurice Picard

However, the above schemas would allow additional properties in the nested mother and father objects, because those objects still satisfy the characterReference schema:

    name: Jean-Luc Picard
    species: human
    mother:
      id: ch-fjk4i9f3jk4-4hkd
      name: Yvette Picard
      species: human
    father:
      id: ch-489jkexbcsl-348dk
      name: Maurice Picard
      species: human

This is because the unevaluatedProperties: false assertion in the character schema does not extend to schemas of its child properties. Instead, we must explicitly declare this in the schema for those properties:

    characterReference:
      type: object
      unevaluatedProperties: false
      required:
        - id
        - name
      properties:
        ...

unevaluatedProperties || ^unevaluatedProperties
Let’s move on to a bit more of the rationale for using additionalProperties: false or unevaluatedProperties: false in the first place. As noted above, this prevents an API consumer from sending in unexpected data. There are several reasons an API may want to enforce this, and two of them are related to designing robust and secure APIs:
Disallowing additional properties can prevent malicious clients from flooding your APIs with tons of data (a form of Denial of Service attack). By employing API edge security, such as a highly scalable API Gateway that performs JSON Schema validation at the edge, your services can detect and reject such malicious use before the request makes its way to the more important API business logic tier.
More importantly, such safety measures prevent Broken Object Property Level Authorization (formerly known as Mass Assignment), a well-known API vulnerability. This vulnerability is #3 on the OWASP API Security Top Ten list (2023), described in API3:2023 Broken Object Property Level Authorization. If unprotected, this vulnerability allows a malicious actor to alter data that it should not be allowed to change, or to cause other effects if, for example, the resource server blindly writes all the properties it receives in a request to the persistent store. See this scenario for an example.
These constraints also improve the developer experience (DX) for those coding to your APIs. These keywords help detect “syntax” errors and coding mistakes, such as when developers misspell your property names.
Thus, adding unevaluatedProperties: false assertions to your API’s schemas can enhance your API’s security and DX. Score another point for JSON Schema!
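To make the idea of rejecting unexpected data concrete, here is a minimal Python sketch. It is not a JSON Schema validator: the CHARACTER and CHARACTER_REFERENCE structures and the unexpected_properties function are hypothetical names invented for illustration, and they only mimic what a real validator would report when both the character and characterReference schemas are closed.

```python
# Hypothetical, hand-rolled stand-in for schema validation at the API edge.
# Each "closed" shape lists the properties it allows; "nested" names the
# shape that applies to a sub-object property.
CHARACTER_REFERENCE = {"allowed": {"id", "name"}, "nested": {}}
CHARACTER = {
    "allowed": {"name", "species", "mother", "father"},
    "nested": {"mother": CHARACTER_REFERENCE, "father": CHARACTER_REFERENCE},
}


def unexpected_properties(value, shape, path=""):
    """Return the paths of properties that the closed shape does not declare."""
    found = []
    for key, item in value.items():
        here = f"{path}/{key}"
        if key not in shape["allowed"]:
            found.append(here)
        elif key in shape["nested"] and isinstance(item, dict):
            # Each nested shape must be closed in its own right.
            found.extend(unexpected_properties(item, shape["nested"][key], here))
    return found


request_body = {
    "name": "Jean-Luc Picard",
    "id": "ch-2305ncc-1701-d",
    "species": "human",
    "mother": {
        "id": "ch-fjk4i9f3jk4-4hkd",
        "name": "Yvette Picard",
        "species": "human",
    },
}

# Reject the request before it reaches business logic if anything is reported.
print(unexpected_properties(request_body, CHARACTER))
```

In practice, a real JSON Schema validator at the gateway would do this work; the sketch only shows why a service that never sees undeclared properties cannot blindly persist them.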
However, there is a tradeoff between security and the API’s evolution and forwards/backwards compatibility.
Consider a client that uses the above character and characterReference schemas, which were part of version 1.5.0 of the Chain Link API; the species property was added in version 1.5.0. Client SDKs may perform client-side schema validation when constructing API requests. If that client happened to send such a request to a server that was running version 1.4.0 of the API, the request would be rejected. This scenario is rare but quite possible if an API definition has multiple implementations, such as an open banking API that multiple financial institutions implement independently. Some institutions may support version 1.4.0 and others may have adopted version 1.5.0. This situation is hard for clients to manage with one code base.
A complementary backwards compatibility issue arises if an API uses unevaluatedProperties: false assertions in response schemas as well as in request schemas. If the client is built against the schemas for version 1.4.0, but version 1.5.0 added the new species property, a client that validates responses against its 1.4.0 schema will fail when it receives a character object from the 1.5.0 server. Thus, when a response schema has the unevaluatedProperties: false assertion, simply adding a new property to an object schema constitutes a breaking change for clients. This means the API version (if it follows Semantic Versioning) should have been bumped from 1.4.0 to 2.0.0 instead of to 1.5.0.
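The versioning argument above can be written down as a small decision rule. The following Python sketch is only an illustration: required_bump is a hypothetical helper, not part of any tool, and it considers nothing but the set of property names in a response object. The rule for removed properties is my own addition, on the assumption that removing a property always breaks clients.

```python
def required_bump(old_props, new_props, closed_responses):
    """Return the Semantic Versioning bump implied by a change in the
    properties of a response object (hypothetical helper)."""
    removed = old_props - new_props
    added = new_props - old_props
    if removed:
        # Assumption: removing a property breaks clients that read it.
        return "major"
    if added:
        # With a closed response schema, clients validating against the
        # old schema reject the new property, so the change is breaking.
        return "major" if closed_responses else "minor"
    return "patch"


v1_4_0 = {"name", "mother", "father"}
v1_5_0 = {"name", "species", "mother", "father"}

print(required_bump(v1_4_0, v1_5_0, closed_responses=True))
print(required_bump(v1_4_0, v1_5_0, closed_responses=False))
```

With closed response schemas the added species property calls for a major bump; with open response schemas the same change is a minor one.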
Why you should avoid format: uuid
The number 1 vulnerability in the OWASP API Top Ten is API1:2023 Broken Object Level Authorization: if the resource ID used in a resource’s URL path is a sequential integer database primary key, a hacker can take a valid integer resource ID (such as .../path/to/resources/11478) and increment/decrement that integer to probe for other resources and possibly gain access to other users’ data that they should not see. For better security, all API resource IDs should be opaque strings which cannot be decoded to yield (sequential) integers. Often, back end services use Universally Unique IDs (UUIDs, also called Globally Unique IDs or GUIDs) or some other string form that includes a significant number of random bytes. That’s a useful implementation practice but rarely belongs in an interface contract.
JSON Schema defines a format: uuid constraint for string properties; this looks like a useful approach to close this vulnerability. However, while it is useful to use a UUID for a resource ID or path parameter, declaring the uuid format has two negative implications:
It exposes implementation details of your service, whereas an API should be about the interface.
It overly constrains the API implementation. If the property or path parameter is defined with a format: uuid constraint, then it must always be a UUID. You cannot later optimize the API with a shorter encoding of the same data (reducing a 36-byte UUID to a shorter Base64 byte string encoding) or enhance the ID with a resource type identifier prefix (as suggested in Designing APIs for humans: Object IDs).[1]
Instead of using a format: uuid, use just type: string, augmented with a reasonable maxLength (such as 48, which gives a little wriggle room) and a pattern that constrains the set of allowed characters in an ID. That is useful for specifying alphanumeric characters and a few special characters like _ and -, but disallowing characters that require URL encoding when used in a URL element:

    components:
      schemas:
        resourceId:
          title: Resource Identifier
          description: >-
            An immutable opaque string that uniquely identifies a resource.
          type: string
          minLength: 6
          maxLength: 48
          pattern: ^[-_.~a-zA-Z0-9]{6,48}$
          examples:
            - ch-2305ncc-1701-d

"null" is a type
Some APIs support JSON Merge Patch [RFC7386] semantics for updating resources. With JSON Merge Patch,
Null values in the merge patch are given special meaning to indicate the removal of existing values in the target.
Thus, a client can send a null value to remove a property from a resource.[2] To indicate that Jean-Luc Picard’s species is unknown (or unset) rather than "human" (and raise the ire of Star Trek fans worldwide), one could PATCH the resource with the request:

    { "species": null }

However, this request is not allowed if the type of the species property is type: string; the JSON value null is not a valid string value.
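For readers less familiar with the removal semantics quoted above, here is a minimal Python sketch of the merge algorithm as I understand it from the RFC. The function name merge_patch and the sample data are assumptions for illustration; JSON null arrives as Python None after parsing.

```python
import json


def merge_patch(target, patch):
    """Apply a JSON Merge Patch to target and return the patched value."""
    if not isinstance(patch, dict):
        # A non-object patch replaces the target outright.
        return patch
    # Work on a shallow copy so the caller's resource is left untouched.
    result = dict(target) if isinstance(target, dict) else {}
    for name, value in patch.items():
        if value is None:
            # null means "remove this member if it exists".
            result.pop(name, None)
        else:
            result[name] = merge_patch(result.get(name), value)
    return result


picard = {"name": "Jean-Luc Picard", "species": "human"}
patch = json.loads('{ "species": null }')
patched = merge_patch(picard, patch)
print(patched)
```

Note that the null has to survive request validation before this merge can run at all, which is why the schema for species matters.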
OpenAPI Specification 3.0 used an earlier draft of JSON Schema and employed the nullable keyword to indicate that a property supported a null value in addition to the other values allowed for that schema.

    properties:
      species:
        type: string
        nullable: true

OpenAPI 3.1 uses JSON Schema 2020-12, which does not use the nullable keyword to augment other type constraints. Instead, the string "null"[3] is the name of a special schema type constraint that allows the JSON value null and nothing else. How does this help us?

    properties:
      species:
        type: "null"

Of course, we cannot simply change the type of the species property to "null". Such a schema allows the above request but fails validation when a JSON string value such as "human" is sent.
Instead, we can employ the oneOf construct of JSON schema—a value is valid if it matches exactly one of the alternate schemas in an array of schemas:

    properties:
      species:
        oneOf:
          - type: string
          - type: 'null'

Fortunately, JSON Schema provides a more concise way to represent this scenario. A schema’s type may be an array:

    properties:
      species:
        type: [ string, 'null' ]

This schema will accept both PATCH requests for our friend, Jean-Luc:

    { "species": null }

and

    { "species": "human" }

Good, we’ve restored sanity to the universe!
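The difference between the species schemas above can be checked with a small Python sketch. It is a hand-rolled simplification rather than a JSON Schema validator: the CHECKS table and the satisfies_type function are hypothetical, and they cover only the three type names used in this article.

```python
# Hypothetical mapping from JSON Schema type names to Python checks.
# Only the type names used in this article are covered.
CHECKS = {
    "null": lambda value: value is None,
    "string": lambda value: isinstance(value, str),
    "object": lambda value: isinstance(value, dict),
}


def satisfies_type(value, type_constraint):
    """Return True if value matches a type name or any name in a type array."""
    if isinstance(type_constraint, str):
        names = [type_constraint]
    else:
        names = list(type_constraint)
    return any(CHECKS[name](value) for name in names)


print(satisfies_type(None, "string"))               # type: string
print(satisfies_type("human", "null"))              # type: "null"
print(satisfies_type(None, ["string", "null"]))     # type: [ string, 'null' ]
print(satisfies_type("human", ["string", "null"]))  # type: [ string, 'null' ]
```

Only the array form accepts both the null and the "human" values.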
Always define a type constraint, except when you shouldn’t
Let’s tie together some of this knowledge of JSON Schema to see how API resources may be modeled. As mentioned in my previous article (see Defining properties does not imply type: object), one should always add a type constraint when defining JSON schemas, because without it, the schema will allow values of other types.
However, in some cases, defining the type constraint too early can work against you. Let’s consider composing schemas using mixin schemas, such as a mutableCharacterFields schema that is mixed into our character and characterReference schemas introduced above.

    components:
      schemas:
        mutableCharacterFields:
          description: >-
            A mixin schema to define mutable properties
            of other Character instances.
          type: object
          properties:
            name:
              type: string
              description: The full name of this character
              minLength: 1
              maxLength: 64
              pattern: '^[\p{L}\p{N}\p{M}\p{Zs}\p{P}]{1,64}$'
            # other mutable properties here...
        characterReference:
          description: >-
            A reference to another existing character
          type: object
          unevaluatedProperties: false
          required:
            - id
            - name
          allOf:
            - $ref: '#/components/schemas/mutableCharacterFields'
            - properties:
                id:
                  $ref: '#/components/schemas/resourceId'
        character:
          description: >-
            A character in a chain link universe that
            appears in chains and chain links.
          type: object
          unevaluatedProperties: false
          required:
            - id
            - name
            - species
          allOf:
            - $ref: '#/components/schemas/mutableCharacterFields'
            - properties:
                id:
                  $ref: '#/components/schemas/resourceId'
                species:
                  type: string
                  # other constraints
                mother:
                  description: >-
                    The character's biological mother.
                  $ref: '#/components/schemas/characterReference'
                father:
                  description: >-
                    The character's biological father.
                  $ref: '#/components/schemas/characterReference'

Such mixins help your API design follow the DRY principle by eliminating copy/paste of non-trivial schema constraints, such as those on a character’s name.[4]
This schema composition works well to define the schemas… until we want to support JSON Merge Patch to PATCH a character. As defined, we can patch a character’s mother or father, but we can’t unset those properties by sending a null:

    { "mother": null, "father": null }

Changing the type of the characterReference schema from

    type: object

to

    type: [ object, 'null' ]

does not work, because allOf requires the value to match every subschema. While this type constraint on characterReference allows the null value, the mutableCharacterFields mixin still declares type: object, so the effective type constraints from the schema composition are

    allOf:
      - type: object
      - type: [ object, 'null' ]

A character reference object satisfies both type constraints, but a null value only satisfies the second.
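One way to see this is to treat each type constraint as a set of allowed type names and intersect them, since allOf requires every subschema to match. The Python sketch below is an illustration only: effective_types is a hypothetical helper, and the plain set intersection is valid only for type names that do not overlap, such as object and null.

```python
def effective_types(*type_constraints):
    """Intersect the type names allowed by each subschema of an allOf.

    Assumes the type names are disjoint (object, null, string, ...); it
    does not model overlaps such as integer being a kind of number.
    """
    sets = [
        {constraint} if isinstance(constraint, str) else set(constraint)
        for constraint in type_constraints
    ]
    return set.intersection(*sets)


# The mixin declares type: object, so null is squeezed out.
with_mixin_type = effective_types("object", ["object", "null"])
print(sorted(with_mixin_type))

# If the mixin declares no type, only the concrete schema's constraint applies.
without_mixin_type = effective_types(["object", "null"])
print(sorted(without_mixin_type))
```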
Thus, it may be useful to omit the type: object constraint on mixin schemas such as in mutableCharacterFields so that the concrete (non-mixin) schemas that use the mixins can define the correct type constraint.
The other solution is to use the oneOf constraint on the properties instead of using the array type in the characterReference schema:

    character:
      ...
      allOf:
        - $ref: '#/components/schemas/mutableCharacterFields'
        - properties:
            ...
            mother:
              description: The character's biological mother.
              oneOf:
                - $ref: '#/components/schemas/characterReference'
                - type: 'null'
            father:
              description: The character's biological father.
              oneOf:
                - $ref: '#/components/schemas/characterReference'
                - type: 'null'

Alas, many OpenAPI SDK generation tools do not handle the oneOf keyword well.
Summary
This concludes the API Design Matters articles on the subtleties of JSON Schema. There are more subtleties to cover, but we’ll deal with them later as they arise in practical use.

[1] The examples I gave above use a ch- prefix for resource IDs for character resources.
[2] This is only valid for object properties which are not required.
[3] Note: This uses the string value "null" for the type name, not the JSON null value. This is an important distinction! You’ll get a schema error if you use type: null.
[4] The regular expression pattern ^[\p{L}\p{N}\p{M}\p{Zs}\p{P}]{1,64}$ allows Unicode letters, numeric digits, accent marks, a space, and punctuation, but not control characters, line separators, paragraph separators, etc.




