Pagination of List Resources

https://blueprints.launchpad.net/craton/+spec/pagination-of-resources

Craton is intended to manage large quantities of devices and other objects without sacrificing performance. Craton needs to add pagination support in order to efficiently handle queries on large collections.

Problem description

In the current implementation, a request to one of our collection resources attempts to return every value the caller is allowed to see (based on authentication, etc.). For example, if a user and project have access to 5000 hosts, a GET request against /v1/hosts returns all 5000. Result sets this large can, and likely will, slow down Craton’s response times and make the API unusable.

Proposed change

We propose adding pagination query parameters to all collection endpoints. The new parameters would fall back to defaults if the user does not include them.

We specifically propose that:

  1. Craton choose a default page size of 30 and limit it to being at least 10 items and at most 100 items,
  2. Craton make the next page both discoverable and calculable; that is, responses use “link” hypermedia relations to indicate the first, previous, next, and last page URLs, which the server generates for the client, and
  3. Craton assume the defaults for requests that have no pagination query parameters. For example, a GET request to /v1/hosts would imply a page size of 30 and return the first 30 results.
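The defaults and bounds listed above can be sketched as a small request-parsing helper. This is a minimal sketch, not Craton’s code: the function name pagination_params is hypothetical, and it assumes out-of-range limits are clamped into the 10 to 100 range. Rejecting them with a 400 error would be an equally reasonable reading of “limit it to”.

```python
from typing import Mapping, Optional, Tuple

DEFAULT_LIMIT = 30  # default page size proposed in this spec
MIN_LIMIT = 10      # smallest allowed page size
MAX_LIMIT = 100     # largest allowed page size


def pagination_params(query: Mapping[str, str]) -> Tuple[int, Optional[str]]:
    """Return (limit, marker) for a request's query parameters (hypothetical helper).

    Missing parameters fall back to the defaults. Out-of-range limits are
    clamped into [MIN_LIMIT, MAX_LIMIT] (an assumption; a server could reject them).
    """
    raw_limit = query.get("limit")
    if raw_limit is None:
        limit = DEFAULT_LIMIT
    else:
        limit = int(raw_limit)  # raises ValueError for non-integer input
        limit = max(MIN_LIMIT, min(MAX_LIMIT, limit))
    marker = query.get("marker")  # None means "no last ID seen yet"
    return limit, marker


print(pagination_params({}))                               # (30, None)
print(pagination_params({"limit": "500", "marker": "42"}))  # (100, '42')
```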

To provide pagination to users, we suggest using limit and marker parameters to indicate the page size and the last ID seen, respectively. This lets users begin pagination after a specific item rather than at a particular page. For example, a user checking for new hosts who knows the ID of the last host they encountered can provide marker=:id&limit=30 to get the newer hosts. If we instead used page and per_page, they could miss items, because deleting hosts changes the page on which the last-seen host appears.

This implies that the default limit value would be 30 and the default marker would be null (to indicate that no last ID is seen).
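At the database level, limit and marker usually map to “keyset” pagination: order by a unique key and return only rows after the marker. The sketch below illustrates this with an in-memory SQLite table. The table, its columns, the function list_host_ids, and the choice to sort only by integer id are all assumptions made for illustration. Craton’s real query layer is not shown here.

```python
import sqlite3
from typing import List, Optional


def list_host_ids(conn: sqlite3.Connection, limit: int = 30,
                  marker: Optional[int] = None) -> List[int]:
    """Hypothetical keyset query: up to `limit` ids strictly after `marker`."""
    if marker is None:
        # No marker: start from the beginning of the ordered collection.
        rows = conn.execute("SELECT id FROM hosts ORDER BY id LIMIT ?", (limit,))
    else:
        # Resume strictly after the last id the client saw.
        rows = conn.execute(
            "SELECT id FROM hosts WHERE id > ? ORDER BY id LIMIT ?", (marker, limit)
        )
    return [row[0] for row in rows]


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE hosts (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany("INSERT INTO hosts (id, name) VALUES (?, ?)",
                 [(i, f"host{i}") for i in range(1, 101)])

first = list_host_ids(conn)                      # ids 1..30 (default limit)
second = list_host_ids(conn, marker=first[-1])   # ids 31..60
```

Because the query filters on the marker value rather than on a row position, deleting earlier rows does not shift where the next page begins.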

This combination of parameters is effectively the standard in OpenStack. Operators who know OpenStack’s existing Compute, Images, etc. APIs will already be familiar with these parameters.

In addition to pagination parameters, this spec proposes adding link relations to the response body, as defined by JSON Hyper-Schema and favored by the API WG.

This makes the API easier to use for everyone, including people using the API directly and people writing API wrappers such as python-cratonclient. It does, however, have the downside of changing our response bodies and JSON Schema.

Finally, I’d like to strongly propose that we include these links in every response. Which relation types we include would depend on where the user is within the pagination:

  1. Include a self relation for every page that tells the user exactly what page they’re presently on.

  2. If there is a page prior to the current one, we would include the prev and first relations. These tell the user what the previous page is and what the first page is.

  3. If there is a page after the current one, we would include the next and last relations. These are the counterparts of prev and first, respectively.

    Without properly implemented caching, the last relation could become computationally expensive to calculate for every pagination query.
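The rules above can be sketched for marker-based pages over an already sorted, in-memory list of IDs. This is an illustrative sketch only. The helper names build_links and _href are hypothetical, and the base URL is the example host used later in this spec. The sketch assumes that “last” means the final limit items. A real server would compute these markers with database queries, and that cost is exactly why the last relation may need caching.

```python
from bisect import bisect_right
from typing import Dict, List, Optional, Sequence
from urllib.parse import urlencode

BASE_URL = "https://craton.environment.com/v1/hosts"  # example URL from this spec


def _href(limit: int, marker: Optional[int]) -> str:
    params = {"limit": limit}
    if marker is not None:
        params["marker"] = marker
    return f"{BASE_URL}?{urlencode(params)}"


def build_links(ids: Sequence[int], limit: int,
                marker: Optional[int]) -> List[Dict[str, str]]:
    """Hypothetical helper: link relations for the page that follows `marker`.

    `ids` is the full, ascending list of IDs visible to the caller.
    """
    start = 0 if marker is None else bisect_right(ids, marker)
    links = [{"rel": "self", "href": _href(limit, marker)}]

    if start > 0:  # at least one item precedes this page
        prev_start = max(0, start - limit)
        prev_marker = ids[prev_start - 1] if prev_start > 0 else None
        links.append({"rel": "first", "href": _href(limit, None)})
        links.append({"rel": "prev", "href": _href(limit, prev_marker)})

    if start + limit < len(ids):  # at least one item follows this page
        last_start = len(ids) - limit  # last page = the final `limit` items
        links.append({"rel": "next", "href": _href(limit, ids[start + limit - 1])})
        links.append({"rel": "last", "href": _href(limit, ids[last_start - 1])})

    return links


ids = list(range(1, 101))
for link in build_links(ids, 30, 30):  # second page: items 31..60
    print(link["rel"], link["href"])
```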

Alternatives

Alternative query parameters to limit and marker are:

  1. Use page and per_page parameters to indicate the 1-indexed “page number” and number of items on each page respectively. This means that users can change how many items they get on each page request and can resume in arbitrary places by specifying the page parameter.

    This would imply that the default page value would be 1 and the default per_page would be 30.

    These two parameters are used by a significant number of large APIs but are not common in OpenStack itself. Their appeal is simplicity: an API user who wants the next page can just increment the page number, without calculating the next value from a combination of values in the previous response.

    This does, however, prevent users from resuming iteration from the last item they received in a list. It also means users may miss objects because of deletions or other changes in the corresponding collection. Finally, these parameters give users only an opaque idea of where in a paginated resource they are and how to resume pagination.

  2. Use limit and offset parameters to provide similar functionality and opacity to per_page and page respectively.

    The default limit would, again, be 30 and the default offset would be 0.

    This combination of parameters is also present in a small number of OpenStack projects but has some of the same negative implications as the page and per_page parameters when compared to limit and marker.
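The risk of missed items described above can be shown with a toy, in-memory comparison. The functions offset_page and marker_page are hypothetical stand-ins for the two parameter styles. They are not meant to reflect any real OpenStack implementation.

```python
from typing import List, Optional


def offset_page(ids: List[int], limit: int, offset: int) -> List[int]:
    # limit/offset (equivalently page/per_page): slice by position
    return ids[offset:offset + limit]


def marker_page(ids: List[int], limit: int, marker: Optional[int]) -> List[int]:
    # limit/marker: resume strictly after the last ID the client saw
    after = ids if marker is None else [i for i in ids if i > marker]
    return after[:limit]


hosts = list(range(1, 11))                     # host IDs 1..10, sorted
first_page = offset_page(hosts, 3, 0)          # [1, 2, 3]
hosts.remove(2)                                # host 2 is deleted between requests
print(offset_page(hosts, 3, 3))                # [5, 6, 7]: host 4 is silently skipped
print(marker_page(hosts, 3, first_page[-1]))   # [4, 5, 6]: nothing is missed
```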

An alternative way to provide pagination links is:

  1. Link headers, as defined in RFC 5988, using relation types from the registry that RFC 5988 established (and that RFC 6903 extends).

    These are also commonly used outside of OpenStack and were popular before APIs began including the relations in the response body. The benefit to Craton of this method is that it doesn’t affect our JSON Schema or existing response bodies. A major problem with this approach, however, is that a Link header may repeat a relation type, and Requests, the HTTP library used by the majority of the Python world, does not parse such repeated links correctly. Further, the author of this specification is not aware of widespread support for parsing these header values.
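The parsing problem comes from indexing links by relation type. Requests’ Response.links property stores parsed links in a dictionary keyed by rel, so a repeated rel keeps only one entry. The sketch below does not use Requests. It imitates that keyed indexing with a deliberately simplified, stdlib-only parser to show what gets lost. The regular expression assumes quoted rel values and no commas inside URLs, and the example.com URLs are placeholders.

```python
import re
from typing import Dict, List

# Simplified link-value pattern: <url>; rel="value" (not a full RFC parser)
LINK_RE = re.compile(r'<([^>]*)>\s*;\s*rel="([^"]*)"')


def parse_links(header: str) -> List[Dict[str, str]]:
    """Return every link-value in a Link header, preserving repeats."""
    return [{"url": url, "rel": rel} for url, rel in LINK_RE.findall(header)]


def links_by_rel(header: str) -> Dict[str, Dict[str, str]]:
    """Index links by rel, as a keyed lookup does: later repeats overwrite earlier ones."""
    return {link["rel"]: link for link in parse_links(header)}


header = ('<https://a.example/hosts>; rel="alternate", '
          '<https://b.example/hosts>; rel="alternate"')
print(len(parse_links(header)))   # 2 links in the header
print(links_by_rel(header))       # only the second "alternate" link survives
```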

Data model impact

This should have no impact on our data model.

REST API impact

This specification will have two impacts on our REST API:

  1. It will add identical limit and marker query parameters to a number of existing and future endpoints.

  2. It will change the fundamental structure of our list responses in order to accommodate the link relations.

    At the moment, for example, a GET request made to /v1/hosts has a response body that looks like:

    [
      {
         "active": true,
         "cell_id": null,
         "device_type": "Computer",
         "id": 1,
         "ip_address": "12.12.12.15",
         "name": "foo2Host",
         "note": null,
         "parent_id": null,
         "region_id": 1
      },
      {
         "active": true,
         "cell_id": null,
         "device_type": "Phone",
         "id": 2,
         "ip_address": "11.11.11.14",
         "name": "fooHost",
         "note": null,
         "parent_id": null,
         "region_id": 1
      }
    ]
    

    This would need to change to:

      {
        "items": [
          {
             "active": true,
             "cell_id": null,
             "device_type": "Computer",
             "id": 1,
             "ip_address": "12.12.12.15",
             "name": "foo2Host",
             "note": null,
             "parent_id": null,
             "region_id": 1
          },
          {
             "active": true,
             "cell_id": null,
             "device_type": "Phone",
             "id": 2,
             "ip_address": "11.11.11.14",
             "name": "fooHost",
             "note": null,
             "parent_id": null,
             "region_id": 1
          }
        ],
        "links": [
          {
            "rel": "first",
            "href": "https://craton.environment.com/v1/hosts?limit=30"
          },
          {
            "rel": "next",
            "href": "https://craton.environment.com/v1/hosts?limit=30&marker=2"
          },
          {
            "rel": "self",
            "href": "https://craton.environment.com/v1/hosts?limit=30&marker=1"
          }
        ]
    }
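On the server side, the structural change amounts to wrapping the former bare list in an object with items and links keys. The helper list_response below is a hypothetical sketch of that envelope. The host fields are trimmed from the example above for brevity.

```python
import json
from typing import Any, Dict, List


def list_response(items: List[Dict[str, Any]], links: List[Dict[str, str]]) -> str:
    """Serialize one page of resources in the proposed envelope format."""
    # Previously the body was the bare `items` list; now it is an object
    # carrying the page of items together with its link relations.
    return json.dumps({"items": items, "links": links}, indent=2, sort_keys=True)


hosts = [{"id": 1, "name": "foo2Host"}, {"id": 2, "name": "fooHost"}]
links = [{"rel": "self", "href": "https://craton.environment.com/v1/hosts?limit=30"}]
print(list_response(hosts, links))
```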
    

Security impact

Pagination support reduces the potential attack surface for denial of service attacks aimed at Craton. On its own, however, it is not sufficient to prevent DoS attacks, and deployers should take additional measures to further mitigate them.

Notifications impact

Craton does not yet have notifications.

Other end user impact

This will have a minor effect on python-cratonclient. The list calls it implements will need to become smarter so they can handle pagination for the user automatically.
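One way a client could handle pagination automatically is to keep following the next relation until a page no longer offers one. The sketch below shows this approach. The names iter_items and fetch are hypothetical: fetch stands in for whatever performs the GET and decodes the JSON body. The fake pages use plain integers in place of host objects.

```python
from typing import Any, Callable, Dict, Iterator, Optional

Page = Dict[str, Any]  # an envelope with "items" and "links" keys


def iter_items(fetch: Callable[[str], Page], first_url: str) -> Iterator[Any]:
    """Yield every item across pages by following "next" links."""
    url: Optional[str] = first_url
    while url is not None:
        page = fetch(url)
        yield from page["items"]
        # Stop when the server omits the "next" relation (no further pages).
        url = next((link["href"] for link in page["links"]
                    if link["rel"] == "next"), None)


BASE = "https://craton.environment.com/v1/hosts"
pages = {
    f"{BASE}?limit=30": {
        "items": list(range(1, 31)),
        "links": [{"rel": "next", "href": f"{BASE}?limit=30&marker=30"}],
    },
    f"{BASE}?limit=30&marker=30": {"items": list(range(31, 41)), "links": []},
}
all_ids = list(iter_items(pages.__getitem__, f"{BASE}?limit=30"))
print(len(all_ids))  # 40
```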

Performance Impact

The pagination code itself should not add a meaningful performance cost to the service, even though it will run on every list request.

Other deployer impact

None

Developer impact

None

Implementation

Assignee(s)

Primary assignee: - icordasc

Other contributors: - None

Work Items

  • Add basic pagination support with tests to ensure that functionality works independent of the other features proposed in this specification
  • Add link relation support to response bodies

Dependencies

N/A

Testing

This should be tested on different levels, but at a minimum on a functional level.

Documentation Impact

This will impact our API reference documentation.