API pagination best practices

Question

I'd love some help handling a strange edge case with a paginated API I'm building.

Like many APIs, this one paginates large results. If you query /foos, you'll get 100 results (i.e. foo #1-100), and a link to /foos?page=2 which should return foo #101-200.

Unfortunately, if foo #10 is deleted from the data set before the API consumer makes the next query, /foos?page=2 will offset by 100 and return foos #102-201.

This is a problem for API consumers who are trying to pull all foos: they will not receive foo #101.

What's the best practice to handle this? We'd like to make it as lightweight as possible (i.e. avoiding handling sessions for API requests). Examples from other APIs would be greatly appreciated!
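To make the failure concrete, here is a small sketch that reproduces it with an in-memory list standing in for the data set. The function name `offset_page` and the 300-item list are assumptions made for illustration only.

```python
def offset_page(rows, page, per_page):
    """Return one page using plain offset/limit slicing."""
    start = (page - 1) * per_page
    return rows[start:start + per_page]

foos = list(range(1, 301))          # foo #1 .. #300
page1 = offset_page(foos, 1, 100)   # foo #1-100
foos.remove(10)                     # foo #10 is deleted between the two requests
page2 = offset_page(foos, 2, 100)   # the offset now lands one item too far

seen = set(page1) | set(page2)
missed_101 = 101 not in seen        # the consumer never receives foo #101
```

The second page starts at foo #102 because every row after the deleted one moved up by one position while the offset stayed at 100.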

User · Accepted Answer

I'm not completely sure how your data is handled, so this may or may not work, but have you considered paginating with a timestamp field?

When you query /foos you get 100 results. Your API should then return something like this (assuming JSON, but if it needs XML the same principles can be followed):

{
        "data" : [
                { data item 1 with all relevant fields },
                { data item 2 },
                ...
                { data item 100 }
        ],
        "paging" : {
                "previous" : "http://api.example.com/foo?since=TIMESTAMP1",
                "next" : "http://api.example.com/foo?since=TIMESTAMP2"
        }
}

Just a note: only using one timestamp relies on an implicit 'limit' in your results. You may want to add an explicit limit or also use an until property.

The timestamp can be dynamically determined using the last data item in the list. This seems to be more or less how Facebook paginates in its Graph API (scroll down to the bottom to see the pagination links in the format I gave above).

One problem may be if you add a data item, but based on your description it sounds like they would be added to the end (if not, let me know and I'll see if I can improve on this).
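A minimal sketch of that idea, assuming each item carries an integer `created` timestamp and the list is already sorted by it. The field name, the explicit `limit` parameter and the helper `page_since` are illustrative assumptions, not part of any real API.

```python
from urllib.parse import urlencode

BASE_URL = "http://api.example.com/foo"  # taken from the example links above

def page_since(items, since, limit=100):
    """Return up to `limit` items created after `since`, plus a `next` link."""
    data = [it for it in items if it["created"] > since][:limit]
    paging = {}
    if data:
        # The next timestamp is derived from the last item on this page.
        query = urlencode({"since": data[-1]["created"], "limit": limit})
        paging["next"] = f"{BASE_URL}?{query}"
    return {"data": data, "paging": paging}

items = [{"id": i, "created": 1000 + i} for i in range(1, 251)]
first = page_since(items, since=0)
# Deleting an already delivered item does not shift the following page.
items = [it for it in items if it["id"] != 10]
second = page_since(items, since=first["data"][-1]["created"])
```

Because the next page is defined by a value rather than a position, the deletion of item 10 has no effect on where the second page begins. The sketch assumes timestamps are unique; ties need extra handling.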

User · Answer

Just to add to this answer by Kamilk: https://www.stackoverflow.com/a/13905589

It depends a lot on how large a dataset you are working with. Small data sets work effectively with offset pagination, but large real-time datasets require cursor pagination.

I found a wonderful article on how Slack evolved its API's pagination as their datasets increased, explaining the positives and negatives at every stage: https://slack.engineering/evolving-api-pagination-at-slack-1c1f644f8e12

User · Answer

You have several problems.

First, you have the example that you cited.

You also have a similar problem if rows are inserted, but in this case the user gets duplicate data (arguably easier to manage than missing data, but still an issue).

If you are not snapshotting the original data set, then this is just a fact of life.

You can have the user make an explicit snapshot:

POST /createquery
filter.firstName=Bob&filter.lastName=Eubanks

Which results in:

HTTP/1.1 301 Here's your query
Location: http://www.example.org/query/12345

Then you can page that all day long, since it's now static. This can be reasonably lightweight, since you can just capture the actual document keys rather than the entire rows.

If the use case is simply that your users want (and need) all of the data, then you can simply give it to them:

GET /query/12345?all=true

and just send the whole kit.
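The snapshot idea can be sketched as follows, with an in-memory dictionary standing in for whatever store would hold the frozen key lists. The names `create_query` and `get_page` are hypothetical, and the counter starts at 12345 only to mirror the example above.

```python
import itertools

_snapshots = {}                      # query id -> frozen list of document keys
_next_id = itertools.count(12345)    # mirrors the id used in the example above

def create_query(rows, predicate):
    """Freeze the keys of the matching rows and return a query id."""
    query_id = next(_next_id)
    _snapshots[query_id] = [row["id"] for row in rows if predicate(row)]
    return query_id

def get_page(query_id, rows_by_id, page, per_page=100):
    """Page through the frozen key list; rows deleted since then are skipped."""
    keys = _snapshots[query_id]
    start = (page - 1) * per_page
    page_keys = keys[start:start + per_page]
    return [rows_by_id[k] for k in page_keys if k in rows_by_id]

rows = {i: {"id": i, "firstName": "Bob"} for i in range(1, 251)}
qid = create_query(rows.values(), lambda r: r["firstName"] == "Bob")
del rows[10]                         # deleted after the snapshot was taken
page1 = get_page(qid, rows, page=1)
page2 = get_page(qid, rows, page=2)
```

Only keys are captured, so the snapshot stays small. A page may come back short when rows have been deleted, but page boundaries never move.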

User · Answer

Referring to API Pagination Design, we could design the pagination API around a cursor.

They have this concept, called a cursor: it's a pointer to a row. So you can say to a database "return me 100 rows after that one". And it's much easier for a database to do, since there is a good chance that you'll identify the row by a field with an index. And suddenly you don't need to fetch and skip those rows; you'll go directly past them.

An example:

GET /api/products
{"items": [...100 products], "cursor": "qWe"}

The API returns an (opaque) string, which you can then use to retrieve the next page:

GET /api/products?cursor=qWe
{"items": [...100 products], "cursor": "qWr"}

Implementation-wise there are many options. Generally, you have some ordering criteria, for example, product id. In this case, you'll encode your product id with some reversible algorithm (let's say hashids). And on receiving a request with the cursor you decode it and generate a query like WHERE id > :cursor LIMIT 100.

Advantages:

The query performance of the database can be improved through the cursor.
New content inserted into the database while querying is handled well.

Disadvantage:

It's impossible to generate a previous page link with a stateless API.
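A small sketch of the encode/decode round trip. It swaps in URL-safe base64 from the standard library for hashids, purely to stay dependency-free; the function names and the in-memory `products` list (assumed sorted by id) are illustrative assumptions.

```python
import base64

def encode_cursor(last_id: int) -> str:
    """Turn the last seen id into an opaque, URL-safe string."""
    return base64.urlsafe_b64encode(str(last_id).encode()).decode()

def decode_cursor(cursor: str) -> int:
    """Reverse of encode_cursor."""
    return int(base64.urlsafe_b64decode(cursor.encode()).decode())

def get_products(products, cursor=None, limit=100):
    """In-memory equivalent of: WHERE id > :cursor ORDER BY id LIMIT :limit."""
    after = decode_cursor(cursor) if cursor else 0
    items = [p for p in products if p["id"] > after][:limit]
    next_cursor = encode_cursor(items[-1]["id"]) if items else None
    return {"items": items, "cursor": next_cursor}

products = [{"id": i} for i in range(1, 251)]
page1 = get_products(products)
products = [p for p in products if p["id"] != 10]   # deleted between requests
page2 = get_products(products, cursor=page1["cursor"])
```

Base64 is only obfuscation, not protection: clients can decode it, so the server should still validate whatever comes back.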

User · Answer

I think your API is currently responding the way it should: it returns the first 100 records on the page in the overall order of the objects you are maintaining. Your explanation suggests that you are using some kind of ordering ids to define the order of your objects for pagination.

Now, if you want page 2 to always start from 101 and end at 200, then you must make the number of entries on the page variable, since they are subject to deletion.

You should do something like the below pseudocode:

page_max = 100

def get_page_results(page_no):
    # Each page covers a fixed id range, so it may hold fewer than page_max rows.
    start = (page_no - 1) * page_max + 1
    end = page_no * page_max
    return fetch_results_by_id_between(start, end)
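A runnable variant of that pseudocode, assuming ids are dense integers and using a dictionary in place of the database lookup. The `rows_by_id` argument is an assumption made for illustration.

```python
PAGE_MAX = 100

def get_page_results(rows_by_id, page_no):
    """Return the rows whose ids fall inside this page's fixed id range."""
    start = (page_no - 1) * PAGE_MAX + 1
    end = page_no * PAGE_MAX
    return [rows_by_id[i] for i in range(start, end + 1) if i in rows_by_id]

rows = {i: f"foo{i}" for i in range(1, 301)}
del rows[10]
page1 = get_page_results(rows, 1)   # one row short after the deletion
page2 = get_page_results(rows, 2)   # still starts at foo101
```

The trade-off is visible in the page sizes: page 1 shrinks to 99 rows, and with sparse ids a page could be nearly empty.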

User · Answer

Option A: Keyset Pagination with a Timestamp

In order to avoid the drawbacks of offset pagination you have mentioned, you can use keyset-based pagination. Usually, the entities have a timestamp that states their creation or modification time. This timestamp can be used for pagination: just pass the timestamp of the last element as the query parameter for the next request. The server, in turn, uses the timestamp as a filter criterion (e.g. WHERE modificationDate >= receivedTimestampParameter).

{
        "elements": [
                {"data": "data", "modificationDate": 1512757070},
                {"data": "data", "modificationDate": 1512757071},
                {"data": "data", "modificationDate": 1512757072}
        ],
        "pagination": {
                "lastModificationDate": 1512757072,
                "nextPage": "https://domain.de/api/elements?modifiedSince=1512757072"
        }
}

This way, you won't miss any element. This approach should be good enough for many use cases. However, keep the following in mind:

You may run into endless loops when all elements of a single page have the same timestamp.
You may deliver many elements multiple times to the client when elements with the same timestamp are overlapping two pages.

You can make those drawbacks less likely by increasing the page size and using timestamps with millisecond precision.

Option B: Extended Keyset Pagination with a Continuation Token

To handle the mentioned drawbacks of the normal keyset pagination, you can add an offset to the timestamp and use a so-called "Continuation Token" or "Cursor". The offset is the position of the element relative to the first element with the same timestamp. Usually, the token has a format like Timestamp_Offset. It's passed to the client in the response and can be submitted back to the server in order to retrieve the next page.

{
        "elements": [
                {"data": "data", "modificationDate": 1512757070},
                {"data": "data", "modificationDate": 1512757072},
                {"data": "data", "modificationDate": 1512757072}
        ],
        "pagination": {
                "continuationToken": "1512757072_2",
                "nextPage": "https://domain.de/api/elements?continuationToken=1512757072_2"
        }
}

The token "1512757072_2" points to the last element of the page and states "the client already got the second element with the timestamp 1512757072". This way, the server knows where to continue.

Please mind that you have to handle cases where the elements got changed between two requests. This is usually done by adding a checksum to the token. This checksum is calculated over the IDs of all elements with this timestamp. So we end up with a token format like this: Timestamp_Offset_Checksum.

For more information about this approach check out the blog post "Web API Pagination with Continuation Tokens". A drawback of this approach is the tricky implementation, as there are many corner cases that have to be taken into account. That's why libraries like continuation-token can be handy (if you are using Java/a JVM language). Disclaimer: I'm the author of the post and a co-author of the library.
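A simplified sketch of Option B without the checksum part. It is not the continuation-token library's API; `parse_token` and `get_page` are hypothetical helpers, and `elements` is assumed to be sorted by `modificationDate` with ties kept in a stable order.

```python
def parse_token(token):
    """Split 'Timestamp_Offset' into two integers."""
    ts, offset = token.split("_")
    return int(ts), int(offset)

def get_page(elements, token=None, limit=3):
    """Return one page plus the continuation token for the following page."""
    ts, offset = parse_token(token) if token else (0, 0)
    # Keep everything at or after the token's timestamp, then skip the
    # elements with that timestamp the client has already received.
    candidates = [e for e in elements if e["modificationDate"] >= ts]
    page = candidates[offset:offset + limit]
    if not page:
        return {"elements": [], "pagination": {}}
    last_ts = page[-1]["modificationDate"]
    delivered = sum(1 for e in page if e["modificationDate"] == last_ts)
    if last_ts == ts:
        delivered += offset  # the same timestamp spans more than one page
    return {"elements": page,
            "pagination": {"continuationToken": f"{last_ts}_{delivered}"}}

stamps = [1512757070, 1512757072, 1512757072, 1512757072, 1512757073]
elements = [{"id": i + 1, "modificationDate": t} for i, t in enumerate(stamps)]
page1 = get_page(elements)
page2 = get_page(elements, page1["pagination"]["continuationToken"])
```

The first page ends in the middle of a run of equal timestamps, and the offset in the token lets the second page resume inside that run without repeating or skipping elements. If elements with the token's timestamp change between requests, the offset becomes unreliable, which is the case the checksum is meant to detect.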

User · Answer

Another option for pagination in RESTful APIs is to use the Link header introduced here. For example, GitHub uses it as follows:

Link: <https://api.github.com/user/repos?page=3&per_page=100>; rel="next",
  <https://api.github.com/user/repos?page=50&per_page=100>; rel="last"

The possible values for rel are: first, last, next, and prev (GitHub's name for the previous page). However, when using the Link header it may not be possible to specify total_count (the total number of elements).
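On the client side, a header like the one above can be read with a few lines of code. This is a deliberately simplified sketch: `parse_link_header` is a hypothetical helper that only understands the `<url>; rel="name"` shape shown here and ignores other link parameters.

```python
import re

def parse_link_header(value):
    """Map each rel value to its URL for headers shaped like the example above."""
    links = {}
    for url, rel in re.findall(r'<([^>]*)>\s*;\s*rel="([^"]*)"', value):
        links[rel] = url
    return links

header = (
    '<https://api.github.com/user/repos?page=3&per_page=100>; rel="next", '
    '<https://api.github.com/user/repos?page=50&per_page=100>; rel="last"'
)
links = parse_link_header(header)
next_url = links.get("next")   # None on the last page, when no "next" link is sent
```

A client that pulls everything would keep following `next_url` until it is missing.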

User · Answer

Pagination is generally a "user" operation, and to prevent overload both on computers and the human brain you generally give a subset. However, rather than thinking that we don't get the whole list, it may be better to ask: does it matter?

If an accurate live scrolling view is needed, REST APIs, which are request/response in nature, are not well suited for this purpose. For this you should consider WebSockets or HTML5 Server-Sent Events to let your front end know when dealing with changes.

Now if there's a need to get a snapshot of the data, I would just provide an API call that provides all the data in one request with no pagination. Mind you, you would need something that would do streaming of the output without temporarily loading it in memory if you have a large data set.

For my case I implicitly designate some API calls to allow getting the whole information (primarily reference table data). You can also secure these APIs so it won't harm your system.
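The streaming point can be sketched with a generator that emits a JSON array piece by piece. `stream_json_array` is a hypothetical helper; how the chunks reach the client depends on the web framework, which is left out here.

```python
import json

def stream_json_array(rows):
    """Yield a JSON array chunk by chunk so the full body is never held in memory."""
    yield "["
    for i, row in enumerate(rows):
        if i:
            yield ","
        yield json.dumps(row)
    yield "]"

def all_rows():
    """Stand-in for a database cursor that produces rows lazily."""
    for i in range(1, 4):
        yield {"id": i}

body = "".join(stream_json_array(all_rows()))   # joined here only to inspect the result
```

In a real endpoint the generator would be handed to the framework's streaming response instead of being joined into one string.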

User · Answer

There may be two approaches depending on your server side logic.

Approach 1: When server is not smart enough to handle object states.

You could send the unique ids of all cached records to the server, for example ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"], and a boolean parameter that indicates whether you are requesting new records (pull to refresh) or old records (load more).

Your server should be responsible for returning new records (load more records or new records via pull to refresh) as well as the ids of deleted records from ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"].

Example:- If you are requesting load more then your request should look something like this:-

{
        "isRefresh" : false,
        "cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10"]
}

Now suppose you are requesting old records (load more), and suppose the "id2" record has been updated by someone and the "id5" and "id8" records have been deleted from the server. Then your server response should look something like this:-

{
        "records" : [
{"id" :"id2","more_key":"updated_value"},
{"id" :"id11","more_key":"more_value"},
{"id" :"id12","more_key":"more_value"},
{"id" :"id13","more_key":"more_value"},
{"id" :"id14","more_key":"more_value"},
{"id" :"id15","more_key":"more_value"},
{"id" :"id16","more_key":"more_value"},
{"id" :"id17","more_key":"more_value"},
{"id" :"id18","more_key":"more_value"},
{"id" :"id19","more_key":"more_value"},
{"id" :"id20","more_key":"more_value"}],
        "deleted" : ["id5","id8"]
}

But in this case, if you have a lot of locally cached records, say 500, then your request string will be too long, like this:-

{
        "isRefresh" : false,
        "cached" : ["id1","id2","id3","id4","id5","id6","id7","id8","id9","id10",………,"id500"]//Too long request
}

Approach 2: When server is smart enough to handle object states according to date.

You could send the id of the first record, the id of the last record, and the epoch time of the previous request. This way your request is always small, even if you have a large number of cached records.

Example:- If you are requesting load more then your request should look something like this:-

{
        "isRefresh" : false,
        "firstId" : "id1",
        "lastId" : "id10",
        "last_request_time" : 1421748005
}

Your server is responsible for returning the ids of the records deleted after last_request_time, as well as the records between "id1" and "id10" that were updated after last_request_time.

{
        "records" : [
{"id" :"id2","more_key":"updated_value"},
{"id" :"id11","more_key":"more_value"},
{"id" :"id12","more_key":"more_value"},
{"id" :"id13","more_key":"more_value"},
{"id" :"id14","more_key":"more_value"},
{"id" :"id15","more_key":"more_value"},
{"id" :"id16","more_key":"more_value"},
{"id" :"id17","more_key":"more_value"},
{"id" :"id18","more_key":"more_value"},
{"id" :"id19","more_key":"more_value"},
{"id" :"id20","more_key":"more_value"}],
        "deleted" : ["id5","id8"]
}
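One way the server side of Approach 2 might compute such a response is sketched below. It rests on several assumptions that are not in the answer: ids are integers rather than strings, deletions are soft (a `deleted_at` time is kept), every record has an `updated_at` time, and `load_more` is a hypothetical function name.

```python
def load_more(store, first_id, last_id, last_request_time, limit=10):
    """Return changes inside the cached id range plus the next batch of records."""
    in_range = [r for r in store.values() if first_id <= r["id"] <= last_id]
    # Cached records that were deleted since the previous request.
    deleted = sorted(r["id"] for r in in_range
                     if r["deleted_at"] is not None
                     and r["deleted_at"] > last_request_time)
    # Cached records that were updated since the previous request.
    updated = [r for r in in_range
               if r["deleted_at"] is None and r["updated_at"] > last_request_time]
    # The next batch beyond the cached range.
    older = sorted((r for r in store.values()
                    if r["id"] > last_id and r["deleted_at"] is None),
                   key=lambda r: r["id"])[:limit]
    return {"records": updated + older, "deleted": deleted}

store = {i: {"id": i, "updated_at": 100, "deleted_at": None} for i in range(1, 31)}
store[2]["updated_at"] = 200
store[5]["deleted_at"] = 200
store[8]["deleted_at"] = 200
response = load_more(store, first_id=1, last_id=10, last_request_time=150)
```

With this data the response mirrors the example above: record 2 comes back as updated, records 11 to 20 are the new batch, and 5 and 8 are reported as deleted.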

Pull To Refresh:- [two images in the original post]

User · Answer

If you've got pagination, you also sort the data by some key. Why not let API clients include the key of the last element of the previously returned collection in the URL and add a WHERE clause to your SQL query (or something equivalent, if you're not using SQL) so that it returns only those elements for which the key is greater than this value?
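A sketch of that WHERE clause using Python's built-in sqlite3 module and an in-memory database. The table name `foos`, its columns and the helper `fetch_after` are assumptions made for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE foos (id INTEGER PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO foos (id, name) VALUES (?, ?)",
    [(i, f"foo{i}") for i in range(1, 251)],
)

def fetch_after(conn, last_key, limit=100):
    """Return the next `limit` rows whose key is greater than `last_key`."""
    cur = conn.execute(
        "SELECT id, name FROM foos WHERE id > ? ORDER BY id LIMIT ?",
        (last_key, limit),
    )
    return cur.fetchall()

page1 = fetch_after(conn, 0)
conn.execute("DELETE FROM foos WHERE id = 10")   # deleted between requests
page2 = fetch_after(conn, page1[-1][0])          # client sends back the last key it saw
```

The ORDER BY on the same key is what makes the "greater than" filter meaningful; without a stable sort the boundary would not be well defined.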

User · Answer

It may be tough to find best practices since most systems with APIs don't accommodate for this scenario, because it is an extreme edge, or they don't typically delete records (Facebook, Twitter). Facebook actually says each "page" may not have the number of results requested due to filtering done after pagination. https://developers.facebook.com/blog/post/478/

If you really need to accommodate this edge case, you need to "remember" where you left off. jandjorgensen's suggestion is just about spot on, but I would use a field guaranteed to be unique, like the primary key. You may need to use more than one field.

Following Facebook's flow, you can (and should) cache the pages already requested and just return those with deleted rows filtered if they request a page they had already requested.

User · Answer

I've thought long and hard about this and finally ended up with the solution I'll describe below. It's a pretty big step up in complexity, but if you do make this step, you'll end up with what you are really after, which is deterministic results for future requests.

Your example of an item being deleted is only the tip of the iceberg. What if you are filtering by color=blue but someone changes item colors in between requests? Fetching all items in a paged manner reliably is impossible... unless... we implement revision history.

I've implemented it and it's actually less difficult than I expected. Here's what I did:

I created a single table changelogs with an auto-increment ID column.
My entities have an id field, but this is not the primary key.
The entities have a changeId field which is both the primary key as well as a foreign key to changelogs.
Whenever a user creates, updates or deletes a record, the system inserts a new record in changelogs, grabs the id and assigns it to a new version of the entity, which it then inserts in the DB.
My queries select the maximum changeId (grouped by id) and self-join that to get the most recent versions of all records.
Filters are applied to the most recent records.
A state field keeps track of whether an item is deleted.
The max changeId is returned to the client and added as a query parameter in subsequent requests.

Because only new changes are created, every single changeId represents a unique snapshot of the underlying data at the moment the change was created.

This means that you can cache the results of requests that have the parameter changeId in them forever. The results will never expire because they will never change.

This also opens up exciting features such as rollback/revert, syncing client cache, etc. Any features that benefit from change history.
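The steps above can be sketched with sqlite3 and an in-memory database. The table and column names follow the answer; the `color` column, the `'deleted'` state value and the helpers `write_version` and `query_as_of` are assumptions made for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE changelogs (id INTEGER PRIMARY KEY AUTOINCREMENT);
CREATE TABLE entities (
    changeId INTEGER PRIMARY KEY REFERENCES changelogs(id),
    id       INTEGER NOT NULL,
    color    TEXT,
    state    TEXT NOT NULL
);
""")

def write_version(entity_id, color, state="active"):
    """Record a change and insert a new version of the entity under it."""
    change_id = conn.execute("INSERT INTO changelogs DEFAULT VALUES").lastrowid
    conn.execute(
        "INSERT INTO entities (changeId, id, color, state) VALUES (?, ?, ?, ?)",
        (change_id, entity_id, color, state),
    )
    return change_id

def query_as_of(change_id, color):
    """Filter the most recent versions that existed at `change_id`."""
    return conn.execute(
        """
        SELECT e.id, e.color
        FROM entities e
        JOIN (SELECT id, MAX(changeId) AS changeId
              FROM entities WHERE changeId <= ? GROUP BY id) latest
          ON latest.changeId = e.changeId
        WHERE e.state != 'deleted' AND e.color = ?
        ORDER BY e.id
        """,
        (change_id, color),
    ).fetchall()

write_version(1, "blue")
snapshot = write_version(2, "blue")      # the client remembers this changeId
write_version(1, "red")                  # color changed later
latest = write_version(2, "blue", state="deleted")

as_of_snapshot = query_as_of(snapshot, "blue")
as_of_latest = query_as_of(latest, "blue")
```

Querying with the remembered changeId keeps returning the same two blue items even after one was recolored and the other deleted, which is what makes the result cacheable.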
