Composing API Models with JSON Schema
Use JSON Schema effectively to build real API request and response bodies
In today’s post, I expand upon API request and response data modeling with JSON Schema. We saw last time how constructing a deeply nested schema does not result in a good developer experience when generating a Software Development Kit (SDK) for our Chain Links API. We’ll learn how restructuring the models improves the generated SDKs and also allows us to keep our API definitions DRY.
Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers and visitors to start by reading Your Guide to The Language of API Design and scanning previous posts in the series.
Previously, I showed the nested JSON schema for the listChainLinks operation response. That value is a JSON object with a property named items, which is an array of objects describing chain links. Here is the structural outline of the chainLinks schema:
components:
  schemas:
    chainLinks:
      type: object
      properties:
        items:
          type: array
          items:
            type: object
            properties:
              id:
                type: string
              type:
                type: string
              authorId:
                type: string
              createdAt:
                type: string
However, when we generated an SDK, the TypeScript type for the chainLinks schema was a disappointment.
There is another way to express this, and it also provides additional benefits to the API definition—composition and reuse, as I’ll explain below.
The first step is to refactor the schema by lifting the anonymous (that is, unnamed) schema for the array items and giving it a name. I’ll name it chainLinkItem. This is a standard pattern I use for schemas which define the items in a containing array of API resources. Here is the skeleton to see the structure:
components:
  schemas:
    chainLinks:
      type: object
      properties:
        items:
          type: array
          items:
            $ref: '#/components/schemas/chainLinkItem'
    chainLinkItem:
      type: object
      properties:
        id:
          type: string
        type:
          type: string
        authorId:
          type: string
        createdAt:
          type: string
The above omits the useful description and other schema details, so you can see the structure more clearly. A more complete version of the chainLinks and chainLinkItem schema is at this gist.
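To make the $ref mechanics concrete, here is a minimal TypeScript sketch of how a tool might resolve a local reference such as '#/components/schemas/chainLinkItem' inside an already-parsed OpenAPI document. The function name resolveLocalRef and the abbreviated openapi object are my own illustrations, not part of any real tool, and the sketch ignores external files and URI percent-decoding.

```typescript
// Minimal sketch (hypothetical helper): resolve a local JSON Reference
// within an already-parsed document by walking its JSON Pointer tokens.
function resolveLocalRef(doc: unknown, ref: string): unknown {
  if (ref.indexOf("#/") !== 0) {
    throw new Error("Only local references are handled in this sketch: " + ref);
  }
  let node: unknown = doc;
  for (const rawToken of ref.slice(2).split("/")) {
    // JSON Pointer (RFC 6901) escapes: "~1" means "/", "~0" means "~".
    const token = rawToken.replace(/~1/g, "/").replace(/~0/g, "~");
    if (typeof node !== "object" || node === null || !(token in node)) {
      throw new Error("Cannot resolve " + ref + " at '" + token + "'");
    }
    node = (node as Record<string, unknown>)[token];
  }
  return node;
}

// The skeleton from above as a parsed object (abbreviated for illustration).
const openapi = {
  components: {
    schemas: {
      chainLinks: {
        type: "object",
        properties: {
          items: {
            type: "array",
            items: { $ref: "#/components/schemas/chainLinkItem" },
          },
        },
      },
      chainLinkItem: {
        type: "object",
        properties: { id: { type: "string" }, type: { type: "string" } },
      },
    },
  },
};

const itemRef = openapi.components.schemas.chainLinks.properties.items.items.$ref;
const itemSchema = resolveLocalRef(openapi, itemRef);
console.log(itemSchema === openapi.components.schemas.chainLinkItem); // true
```

Because the array items now point at a named schema, a code generator has a name to give the corresponding type, which is what the next step relies on.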
If we try code generation on the updated OpenAPI document
$ swagger-codegen generate \
    -l typescript-axios \
    -o typescript-axios \
    -i openapi.yaml
we see the ChainLinkItem interface defined in the generated source typescript-axios/models/chain-link-item.ts:
export interface ChainLinkItem {
    /**
     * This chain link's unique resource identifier.
     * @type {string}
     * @memberof ChainLinkItem
     */
    id: string;
    /**
     * Describes what type of chain link this is.
     * @type {string}
     * @memberof ChainLinkItem
     */
    type: string;
    /**
     * The ID of the author who created this chain link.
     * @type {string}
     * @memberof ChainLinkItem
     */
    authorId: string;
    /**
     * The [date-time](https://www.rfc-editor.org/rfc/rfc3339#section-5)
     * when this chain link was created.
     * @type {Date}
     * @memberof ChainLinkItem
     */
    createdAt: Date;
}
and the revised chainLinks schema is translated to this models/chain-links.ts:
import { ChainLinkItem } from './chain-link-item';
export interface ChainLinks {
    /**
     * A list of chain links in this page
     * @type {Array<ChainLinkItem>}
     * @memberof ChainLinks
     */
    items: Array<ChainLinkItem>;
}
Success! We’ve solved the problem from our last attempt which only gave us the nearly useless construct:
export interface ChainLinks {
    items: Array<any>;
}
We also see similar improvements in Java code generation. For example, ChainLinks.java has this code:
import io.chainlinks.models.ChainLinkItem;
public class ChainLinks {
    private List<ChainLinkItem> items = new ArrayList<ChainLinkItem>();
    ...
}
It’s important to note how this transformation leads to a better developer experience: the properties of the target programming language class/interface/structure are strongly typed rather than untyped. This adds type safety (even in dynamic languages), enables IDE code completion, aids code readability, and supports static analysis, refactoring tools, and much more.
And that’s just by using a JSON Schema language construct to define useful types for fields…
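As a small illustration of that benefit, here is a sketch of client code written against the typed models. The interfaces are hand-written stand-ins for the generated ones (abbreviated, without the doc comments), and the authorIds helper and the sample data are hypothetical.

```typescript
// Hand-written stand-ins for the generated models (assumed shapes, abbreviated).
interface ChainLinkItem {
  id: string;
  type: string;
  authorId: string;
  createdAt: Date;
}

interface ChainLinks {
  items: Array<ChainLinkItem>;
}

// Hypothetical helper: collect the distinct author IDs in one page of results.
function authorIds(page: ChainLinks): string[] {
  // Because items is Array<ChainLinkItem>, a typo such as `item.authorID`
  // is a compile-time error. With Array<any> it would silently be undefined.
  return page.items
    .map((item) => item.authorId)
    .filter((id, index, all) => all.indexOf(id) === index);
}

// Made-up sample data for illustration only.
const samplePage: ChainLinks = {
  items: [
    { id: "cl-1", type: "text", authorId: "au-1", createdAt: new Date("2024-01-01T00:00:00Z") },
    { id: "cl-2", type: "image", authorId: "au-1", createdAt: new Date("2024-01-02T00:00:00Z") },
    { id: "cl-3", type: "text", authorId: "au-2", createdAt: new Date("2024-01-03T00:00:00Z") },
  ],
};

console.log(authorIds(samplePage)); // the two distinct IDs: au-1 and au-2
```

With the earlier Array<any> version, the same function would compile no matter which property name I typed, and any mistake would only surface at run time.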
Composition
As promised above, we can do much more with schema composition and schema construction while applying the Don’t Repeat Yourself principle.
For example, in APIs that support CRUDL (Create, Read, Update, Delete, List) operations, there is a great deal of similarity across the request and response bodies for the various POST/PUT/GET/PATCH API operations. Consider the Chain Link resource in the Chain Links API. Here’s a recap of what is in a Chain Link, from Chain Links Domain Model:
author
a list of chains in which this chain link is used
type - plain text, rich text, image with optional caption, branch (to define alternate, non-sequential story lines)
content - the actual block of (Markdown) text, or the image data.
creation timestamp
Different combinations of these values show up in different API model schemas.
A schema to define the items within the list of chain links returned from the listChainLinks operation. This is the chainLinkItem schema from above. The chainLinkItem schema has four properties: type, authorId, createdAt, and id [1].
A schema to represent the full chain link resource from the getChainLink (GET /chainLinks/{chainLinkId}) operation. We’ll name this schema chainLink. This has six properties, four of which should look very familiar: type, authorId, createdAt, id, content, and chains_url [2].
Importantly, notice that these schemas build upon each other. A chainLink has all the properties from the chainLinkItem schema, plus two additional properties. Here is a diagram which shows this structural composition:
Luckily, JSON Schema has a direct language construct for this type of structural composition. The allOf keyword of JSON Schema declares that a schema is composed of one or more other subschemas. The value of allOf is an array of schemas or references to the subschemas.
The formal definition of allOf is:
This keyword's value MUST be a non-empty array. Each item of the array MUST be a valid JSON Schema.
An instance validates successfully against this keyword if it validates successfully against all schemas defined by this keyword's value.
This reflects JSON Schema’s origins for validating JSON, so it is not evident how this helps us build API data models. Patience, dear reader!
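Before moving on, the validation semantics quoted above can be sketched in a few lines of TypeScript. This is only an illustration in which a "schema" is reduced to a predicate function; the names Validator, allOf, and optionalString are mine and do not come from any JSON Schema library.

```typescript
// In this sketch a "schema" is just a predicate: does the value validate?
type Validator = (value: unknown) => boolean;

// allOf: an instance is valid only if every subschema accepts it.
function allOf(...subschemas: Validator[]): Validator {
  return (value) => subschemas.every((validate) => validate(value));
}

const isObject = (v: unknown): v is Record<string, unknown> =>
  typeof v === "object" && v !== null && !Array.isArray(v);

// Illustrative subschema: an object whose named property, if present, is a string.
function optionalString(name: string): Validator {
  return (v) => isObject(v) && (v[name] === undefined || typeof v[name] === "string");
}

// Two small schemas, composed the same way allOf composes subschemas.
const itemLike = allOf(optionalString("id"), optionalString("authorId"));
const chainLinkLike = allOf(itemLike, optionalString("chains_url"));

console.log(chainLinkLike({ id: "cl-1", chains_url: "https://api.example.com/chains" })); // true
console.log(chainLinkLike({ id: 42, chains_url: "https://api.example.com/chains" }));     // false
console.log(chainLinkLike({ id: "cl-1", chains_url: 7 }));                                // false
```

The second and third calls fail because one subschema rejects the value, even though the other subschema accepts it. Nothing in allOf lets one subschema relax another.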
Let us define the chainLink object schema as a composition of two subschemas:
The chainLinkItem schema
An object schema with additional properties, chains_url and content
Here is a skeleton of the chainLinkItem and chainLink schemas. (Again, I’ve omitted the important titles, descriptions etc. so that the structure and schema composition is clearer. See this gist for a more complete definition. [3])
components:
  schemas:
    chainLink:
      type: object
      allOf:
        - $ref: '#/components/schemas/chainLinkItem'
        - type: object
          properties:
            content:
              $ref: '#/components/schemas/chainLinkContent'
            chains_url:
              type: string
Within the allOf definition, the first array item is a schema reference. This allows us to reuse existing schemas rather than reverting to copy/paste, thus keeping our API definition DRY.
The allOf keyword means that all of the subschemas apply when the JSON Schema is used to validate a JSON document instance: the instance is valid if and only if it validates against all of the subschemas in the array.
When used to define an API model, however, code generation tools interpret the allOf construct as a model composition construct of the target language. This interpretation (or translation) varies by target language and by the tool used to define the API model.
When we use swagger-codegen to build a client SDK using this schema composition, we get a ChainLinkItem.java model definition that contains the following (comments and annotations removed):
public class ChainLinkItem {
    private String id = null;
    private String type = null;
    private String authorId = null;
    private OffsetDateTime createdAt = null;
    ...
}
and a ChainLink.java model definition:
import apiDesignMatters.ChainLinkItem;
import apiDesignMatters.ChainLinkContent;
public class ChainLink extends ChainLinkItem {
    private ChainLinkContent content = null;
    private String chainsUrl = null;
    ...
}
We can see that the code generation tool interprets the schema composition construct as inheritance in the Java target language. For TypeScript, the story is different. We saw the interface ChainLinkItem above. [4] Here is an excerpt from chain-link.ts (with comments and annotations removed):
import { ChainLinkItem } from './chain-link-item';
export interface ChainLink extends ChainLinkItem {
    chainsUrl: string;
}
Things get a bit trickier when the allOf array contains more than one reference. For example, Java does not support multiple class inheritance. swagger-codegen employs a strategy of using class inheritance for the first $ref schema, then it inlines the properties of other schemas. For example, if we define the chainLink schema as follows, using a new chainLinkContentField schema (because that schema will be useful when defining other schemas):
chainLinkContentField:
  type: object
  properties:
    content:
      $ref: '#/components/schemas/chainLinkContent'
chainLink:
  allOf:
    - $ref: '#/components/schemas/chainLinkItem'
    - $ref: '#/components/schemas/chainLinkContentField'
    - type: object
      properties:
        chains_url:
          description: >-
            A link to the API operation to return the paginated
            list of chains in which this chain link is used.
          type: string
          format: uri
          maxLength: 256
then swagger-codegen defines the ChainLink.java class as follows:
import apiDesignMatters.ChainLinkContent;
import apiDesignMatters.ChainLinkContentField;
import apiDesignMatters.ChainLinkItem;
public class ChainLink extends ChainLinkItem {
    private ChainLinkContent content = null;
    private String chainsUrl = null;
    ...
}
(The code imports but does not use or reference the ChainLinkContentField class.)
I recommend you try reordering the items in the allOf schema to see the effect on the generated code. (Here is a source OpenAPI document to work with.) Can you spot how such changes can introduce breaking changes in client code?
Defining Elemental Schemas for Composition
This approach of defining tiny schemas, such as the chainLinkContentField, is a useful design pattern for keeping API definitions DRY. This is analogous to creating small, atomic, highly reusable functions in a programming language, rather than long, complex single-purpose functions.
If you find yourself repeating a property definition across multiple schemas, consider lifting that property into its own field definition schema. If there are groups of related properties that you use together in multiple schemas, you can group them together into a reusable {group}Fields schema (with a descriptive schema name that indicates its purpose, such as mutableChainLinkFields), then mix them into other schemas using the allOf schema composition construct.
This practice is one form of refactoring that is useful when defining and refining schemas.
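The same idea can be sketched on the client side with TypeScript intersection types. This is only a hand-written analogue under my own assumptions, not what any particular generator emits: the split into ReadOnlyChainLinkFields and MutableChainLinkFields, the ChainLinkPatch type, the applyPatch helper, and the placeholder ChainLinkContent type are all hypothetical.

```typescript
// Placeholder: the real chainLinkContent schema is not shown in this article.
type ChainLinkContent = string;

// Small field-group types, analogous to elemental {group}Fields schemas.
interface ChainLinkContentField {
  content: ChainLinkContent;
}
interface ReadOnlyChainLinkFields {   // hypothetical grouping
  id: string;
  authorId: string;
  createdAt: Date;
}
interface MutableChainLinkFields {    // hypothetical grouping
  type: string;
}

// Mix the groups together, much as allOf mixes subschemas.
type ChainLink = ReadOnlyChainLinkFields &
  MutableChainLinkFields &
  ChainLinkContentField & { chains_url: string };

// A PATCH-style body can reuse only the mutable groups.
type ChainLinkPatch = Partial<MutableChainLinkFields & ChainLinkContentField>;

// Hypothetical helper: apply a patch without touching read-only fields.
function applyPatch(link: ChainLink, patch: ChainLinkPatch): ChainLink {
  return { ...link, ...patch };
}

// Made-up sample data for illustration only.
const original: ChainLink = {
  id: "cl-1",
  authorId: "au-1",
  createdAt: new Date("2024-01-01T00:00:00Z"),
  type: "text",
  content: "Once upon a time",
  chains_url: "https://api.example.com/chainLinks/cl-1/chains",
};

const patched = applyPatch(original, { content: "It was a dark and stormy night" });
console.log(patched.id, patched.content); // id is unchanged, content is replaced
```

The design choice mirrors the schema refactoring: each small type is defined once, and the larger types are assembled from them instead of repeating property definitions.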
Composition is not Inheritance
Warning: this JSON Schema composition construct is not inheritance… even if some tools translate this into inheritance in a target language. This means that, as the “allOf” name implies, all the schema constraints apply: you cannot override or replace constraints.
For example, note that the full schema definition for the authorId property in chainLinkItem contains the following constraints:
authorId:
  description: >-
    The ID of the author resource
    who created this chain link.
  type: string
  minLength: 4
  maxLength: 48
  pattern: ^[-_a-zA-Z0-9:+$]{4,48}$
  readOnly: true
Suppose we want to use schema composition but loosen those constraints in the chainLink schema’s authorId to allow 2 to 64 characters. We might try to express this as follows:
chainLink:
  title: A Chain Link
  allOf:
    - $ref: '#/components/schemas/chainLinkItem'
    - type: object
      properties:
        authorId:
          description: >-
            The ID of the author resource
            who created this chain link.
          type: string
          minLength: 2
          maxLength: 64
          pattern: ^[-_a-zA-Z0-9:+$]{2,64}$
          readOnly: false
However, we won’t get the desired effect. While the code generation may not change, the schema validation of a JSON object containing an authorId still enforces all of the subschema constraints. Thus, the effective schema constraints for the authorId property are the union of all the constraints:
type: string
minLength: 4
maxLength: 48
pattern: ^[-_a-zA-Z0-9:+$]{4,48}$
type: string
minLength: 2
maxLength: 64
pattern: ^[-_a-zA-Z0-9:+$]{2,64}$
While a value such as “C289D6F7-6B30-4788-9C70-4274730FAFCA” satisfies all 8 of these constraints, a shorter author ID string of 3 characters, “A00”, fails constraints #2 and #4 (minLength: 4 and the {4,48} pattern), and thus that JSON would be rejected.
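Here is a minimal TypeScript sketch that checks both sample values against the two sets of authorId constraints. It hand-codes the three keywords involved rather than using a JSON Schema validator, and the names StringConstraints, meets, and validAuthorId are my own. It also uses the JavaScript string length, which matches minLength/maxLength for these ASCII values but can differ for other characters.

```typescript
// The three string keywords used by the authorId schemas.
interface StringConstraints {
  minLength: number;
  maxLength: number;
  pattern: RegExp;
}

// Constraints contributed by chainLinkItem (the first allOf subschema).
const fromChainLinkItem: StringConstraints = {
  minLength: 4,
  maxLength: 48,
  pattern: /^[-_a-zA-Z0-9:+$]{4,48}$/,
};

// The "loosened" constraints from the second allOf subschema.
const fromChainLink: StringConstraints = {
  minLength: 2,
  maxLength: 64,
  pattern: /^[-_a-zA-Z0-9:+$]{2,64}$/,
};

// Does the value satisfy one subschema's constraints (including type: string)?
function meets(value: unknown, c: StringConstraints): boolean {
  return (
    typeof value === "string" &&
    value.length >= c.minLength &&
    value.length <= c.maxLength &&
    c.pattern.test(value)
  );
}

// allOf semantics: every subschema's constraints must hold.
function validAuthorId(value: unknown): boolean {
  return [fromChainLinkItem, fromChainLink].every((c) => meets(value, c));
}

console.log(validAuthorId("C289D6F7-6B30-4788-9C70-4274730FAFCA")); // true
console.log(meets("A00", fromChainLink));                           // true
console.log(validAuthorId("A00"));                                  // false
```

The middle call shows that “A00” would be fine if only the looser subschema applied. Under allOf, the stricter chainLinkItem constraints still apply, so the combined check rejects it.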
A final note on schema composition. The behavior of the allOf construct is well defined within JSON Schema for JSON validation. However, there is no standard for how this JSON schema construct must be treated in other tools such as code generation for client or server SDKs. Thus, each tool vendor (SmartBear’s swagger-codegen, the OSS openapi-generator, ApiMatic, Speakeasy, Fern, Kiota, etc.) may interpret JSON Schema in individual and different ways. It is useful to try them out to see if their interpretation and implementation meets your needs and expectations.
JSON Schema provides other keywords for schema composition (the oneOf, anyOf, and not keywords, as well as conditionals with an if/then/else construct), but these are more advanced topics and I’ve already asked too much of you to read this far. Stay tuned, however; there is more to come!
[1] The id property is the unique resource ID of a chain link instance, a concept I introduced in From Domain Model to OpenAPI. The id is used to reference the chain link instance for the GET /chainLinks/{chainLinkId}, PATCH /chainLinks/{chainLinkId}, and DELETE /chainLinks/{chainLinkId} operations.
[2] The chains_url property contains an API URL to list all the chains that contain this chain link.
[3] The source files I provide as GitHub gists use openapi: 3.0.3 because, sadly, the swagger-codegen tool still does not support OpenAPI 3.1. C’mon SmartBear: OpenAPI 3.1 was released in 2021!
[4] TypeScript interfaces may define properties; interfaces in Java cannot define properties/fields, only methods, hence the code generation differences.