Improve the Security of Your API's Data

Employ the strength of JSON Schema in OpenAPI to increase your API's security

Feb 07, 2024

The OpenAPI Specification is a language for expressing the design of an API. But contained within OpenAPI is another language: JSON Schema. Like OpenAPI, JSON Schema is defined by a standards specification (published at json-schema.org). You saw the tip of the JSON Schema iceberg in What Am I Getting Out of This?.

Fortunately for us, JSON Schema also contains constructs that assist in keeping your API secure.

Welcome to the next article in the Language of API Design. Rather than jumping into the middle of this series, I encourage new subscribers/visitors start by reading Your Guide to The Language of API Design and scanning previous posts in the series.

As a motivating example, I encourage you to read “How Spoutible’s Leaky API Spurted out a Deluge of Personal Data“ by Troy Hunt, or simply search for “top API data leaks” and read a sampling. But don’t get sucked down that rabbit hole.

None of us want our APIs to be called out in someone’s blog post on leaky APIs, or showing up in an “API Hall of Shame” or even making into national news.

A common theme in such leaky API stories is excessive data exposure - an API that returns too much sensitive data that can then be exploited. We covered the most obvious steps to securing APIs by ensuring you have adequate security requirements guarding who can call each operation: see “Understanding the Language of API Security: How OpenAPI expresses API security.... and how it does not”:

Understanding the Language of API Security

David Biesack

November 7, 2023

Understanding the Language of API Security

In today’s issue, we begin a deeper dive into keeping APIs secure. I hope you do not mind me assuming that you know why API security is important. The how is harder, and there are many facets of API security. We begin by exploring what can and cannot be expressed with

Read full story

But there are more ways that poorly designed APIs can be exploited, even if they have good security requirements on all operations and are implemented and configured correctly. So how does JSON Schema help us plug those holes?

Let’s revisit some of the schemas we defined earlier—see What Am I Getting Out of This? for the JSON schema that describes the response from the getChainLinks operation. Here is that schema (and YAML example) for reference:

title: Chain Links
description: A page of chain link items from a
  collection of chain links.
type: object
properties:
  items:
    title: Chain Link Items
    description: A list of chain links in this page.
    type: array
    maxItems: 10000
    items:
      title: Chain Link Item
      description: A concise representation of a chain link item
        in a list of chain links.
      type: object
      properties:
        id:
          description: This chain links unique resource identifier.
          type: string
          minLength: 4
          maxLength: 48
          pattern: ^[-_a-zA-Z0-9:+$]{4,48}$
        type:
          description: Describes what type of chain link this is.
          type: string
        authorId:
          description: The ID of the author who created this chain link.
          type: string
          minLength: 4
          maxLength: 48
          pattern: ^[-_a-zA-Z0-9:+$]{4,48}$
        createdAt:
          description: The RFC 3339 `date-time`
            when this chain link was created.
          type: string
          format: date-time
example:
  items:
    - id: cl-489fjkd-49d9d
      type: text
      authorId: au-4639fjk3-fjkf
      createdAt: 2023-03-08T20:22:50Z
    - id: cl-f89jf-3jkdkh
      type: text
      authorId: au-4639fjk3-fjkf
      createdAt: 2023-03-08T21:44:05Z
    - id: cl-d9h4d83-dh49dhe
      type: image
      authorId: au-4639fjk3-fjkf
      createdAt: 2023-03-08T22:58:37Z
    - id: cl-478d9d-4hjdhj93
      type: text
      authorId: au-4639fjk3-fjkf
      createdAt: 2023-03-08T23:52:19Z

Let’s explore some refinement allowed by JSON Schema.

This is a fairly well constrained schema definition. Adding validation constraints is important for API data schemas, both for informing your consumers what values are valid, and also for API security. An API that allows unconstrained values to be passed is an API that likely has security vulnerabilities: it is open to Denial of Service attacks by malicious clients that pump really large data values (such as 100,000 character strings) into the API. By adding constraints such as maxItems to array and maxLength to strings, you can help reduce the API’s risk. (Other strategies, such as using middleware the rejects unreasonably large request bodies can also help)

An API that allows unconstrained values to be passed is an API that likely has security vulnerabilities

Some development tools, like the Spectral API linter and its OWASP ruleset (both open-source software from Stoplight) the can check your OpenAPI definition and schemas and highlight where you can tighten up security. (See the second link for some installation instructions for those who already have Node.js installed.) The Spectral OWASP ruleset applies some linting checks based on the OWASP API Security Top Ten, 2019 edition. An update to the Open Worldwide Application Security Project (OWASP) list was released in late 2023; as of this writing, an update to the Spectral OWASP ruleset for the 2023 edition is in the works.

See also the OpenAPI Editor by 42Crunch; it also helps “shift left” and identifies potential security risks in an OpenAPI definition by using42Crunch’s API Audit tool. Other API security vendors have similar static OpenAPI scanning tools.

Running Spectral+OWASP ruleset against the early drafts of our Chain Links OpenAPI definition shows a few schema related candidates for tightening up the API definition:

error  owasp:api4:2019-string-limit                Schema of type string must specify maxLength, enum, or const.           paths./chainLinks.get.responses[200].content.application/json.schema.properties.items.items.properties.type
error  owasp:api4:2019-string-restricted           Schema of type string must specify a format, pattern, enum, or const.   paths./chainLinks.get.responses[200].content.application/json.schema.properties.items.items.properties.type
error  owasp:api4:2019-string-limit                Schema of type string must specify maxLength, enum, or const.           paths./chainLinks.get.responses[200].content.application/json.schema.properties.items.items.properties.createdAt

The owasp:api4:2019-string-limit error is based on API4:2019 Lack of Resources & Rate Limiting which recommends

Define and enforce maximum size of data on all incoming parameters and payloads such as maximum length for strings and maximum number of elements in arrays.

The owasp:api4:2019-string-restricted error is based on the same OWASP rule, but with a different twist. By specifying a format annotation or pattern assertion on string properties (when enum or const do not apply), you can also reject some malicious requests. An example of a suitable string format is using format: date or format: date-time for properties that convey a date or timestamp value.

The OWASP rule applies to incoming payloads, but Spectral does not do deep analysis of the API and JSON models to determine if a field is confined to only response schemas or may be used in a request schema, so it errs on the side of caution and assumes the latter.

By adding a maxLength constraint to string schemas (or using an enum list of allowed values or a single const allowed value, bad (malicious) requests can be blocked early in the API request processing—when validating the request. This work can be pushed to the “edge” of your network such as an API gateway (where it is often cheaper to detect such problems), rather than deferring checks until the request has made its way into your back end service tier where such invalid requests can negatively impact response time and scalability needed for valid traffic.

This implies an API implementation and deployment that performs full request validation. That is, a secure API implementation must treat all input data as untrusted, and reject any data that it deems to be invalid. Since all the data in a request—response bodies, even query and header parameters—are defined with JSON Schema in OpenAPI, look for infrastructure and middleware that performs JSON Schema validation of such request data, including support of the Format-Assertion vocabulary which mandates checking strings against the defined OpenAPI and JSON schema formats. This will prevent a number of OWASP vulnerabilities.

A secure API implementation must treat all input data as untrusted, and reject any data that it deems to be invalid.

To go further, use the unevaluatedProperties: false assertion in your object schema definitions, as explained in Master More JSON Schema's Subtleties. Place this assertion above/next to your properties object. This assertion in a schema causes a validator to reject any properties that are not evaluated (not matched) by other assertions, such as properties in your schema composition. (See Master JSON Schema's Subtleties for some tips on composing JSON Schemas.)

Part of Spoutible’s Leaky API could have been plugged if it defined a strict JSON schema, not just for validating the API’s input, but for also validating its response bodies:

By employing unevaluatedProperties: false in your API’s request schemas (coupled with an accurate and restrictive set of properties you explicitly want the caller to send), you can prevent Mass Assignment vulnerabilities.
By employing unevaluatedProperties: false in your API’s response schemas, you can also prevent Excessive Data Exposure.

The Excessive Data Exposure OWASP API Security page advises:

Avoid using generic methods such as to_json() and to_string(). Instead, cherry-pick specific properties you really want to return

That is, simply returning all the data properties stored for a resource risks including highly sensitive or exploitable data in API responses. If the OpenAPI definition and schema define a response schema to include only properties a, b and c, and your implementation validates response bodies against the defined schema, it will catch a poor implementation which tries to return other properties not defined by the properties object(s) for the response, such as an em_code or a password property. Applying JSON Schema validation to your response bodies will cause validation to fail if the service implementation is or becomes leaky.

JSON Schema is nice for modeling your API data, but it goes well beyond that by providing an API language for securing your API in multiple dimensions. Master these Language of APIs skills to protect your customer’s data, your job, and your company’s future… Stay safe out there, folks.

Improve the Security of Your API's Data

Employ the strength of JSON Schema in OpenAPI to increase your API's security

Understanding the Language of API Security

Discussion about this post