Querying paginated API endpoints the Rails way
18 Aug 2020
Recently I was working with an API that didn’t have a Ruby SDK, so I had the opportunity to write a thin service wrapper for it. The API accepted a page number and the number of records to return per page.
Say, for example, the API returns a list of people. A response would look like this (the code in this post treats meta.page.total as the total number of pages):
{
  "meta": {
    "page": { "total": 200 }
  },
  "items": [
    {
      "type": "people",
      // ...
    },
    {
      "type": "people",
      // ...
    }
  ]
}
First pass
I made a first attempt at writing a PeopleService PORO (plain old Ruby object) that is responsible for querying those records:
class PeopleService
  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)
    make_request(params)
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end
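For reference, here is one way make_request could be filled in using only Ruby’s standard library. This is a sketch under assumptions: the base URL https://api.example.com/people is a placeholder, the API is assumed to take page and count as query parameters and to need no authentication, and the helper names build_uri and parse_body are made up for illustration. The detail that matters most is symbolize_names: true, because the rest of this post reads the response with symbol keys such as result[:items].

```ruby
require "json"
require "net/http"
require "uri"

class HttpPeopleService
  # Placeholder endpoint (an assumption); substitute the real API's URL.
  BASE_URL = "https://api.example.com/people"

  def where(params = {})
    default_params = { page: 1, count: 25 }
    make_request(default_params.merge(params))
  end

  # Builds the request URI with the params encoded as a query string.
  def build_uri(params)
    uri = URI(BASE_URL)
    uri.query = URI.encode_www_form(params)
    uri
  end

  # Parses a JSON body into a Hash with symbol keys.
  def parse_body(body)
    JSON.parse(body, symbolize_names: true)
  end

  private

  def make_request(params)
    response = Net::HTTP.get_response(build_uri(params))
    raise "Request failed with status #{response.code}" unless response.is_a?(Net::HTTPSuccess)

    parse_body(response.body)
  end
end
```

Keeping the URI building and the parsing in their own methods means both can be checked without touching the network.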
The PeopleService above does the job as long as the caller writes their own iteration logic. For instance, if we want to retrieve all the people from the API:
page_count = 0
people = []
loop do
  page_count += 1
  result = PeopleService.new.where(page: page_count)
  # concat keeps the list flat; << would append each page as a nested array
  people.concat(result[:items])
  # Stop once the last page has been fetched
  break if page_count >= result.dig(:meta, :page, :total).to_i
end
That feels a bit messy, though. The API contract is leaking out of the PeopleService abstraction layer that we just created.
Let’s make it Rails-y
I want my service object to follow more Rails-like conventions. In other words, I’d like to be able to iterate over the results from the PeopleService with an ActiveRecord-like syntax. For example: PeopleService.new.all.each { |person| ... }
Enumeration
To achieve that, we can make use of Ruby’s Enumerator class:
class PeopleService
  def initialize
    # Setup API auth params
  end

  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)
    make_request(params)
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        result = where(params.merge(page: page))
        result[:items].each { |item| yielder << item }
        # loop rescues StopIteration and treats it as a break
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i
        page += 1
      end
    end.lazy
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end
That gets us closer to what we are looking for. Wrapping the loop in an Enumerator is what gives us the ability to iterate over the results returned from the all method. The method returns an enumerator, which includes the Enumerable module.
That unlocks the ability to chain enumerable methods together and pass blocks to them, which makes our service highly composable.
For example, if we wanted to group the people by their location, we could call group_by on the result (note that grouping has to read every page):
PeopleService.new.all.group_by { |person| person.location }
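To see the enumerator working without a real API, here is a small self-contained sketch. FakePeopleService and its canned data are made up for illustration: make_request is replaced by an in-memory lookup that returns two people per page, and each person is a plain Hash, so the location is read with person[:location] rather than person.location.

```ruby
class FakePeopleService
  # Made-up records standing in for the API's data.
  PEOPLE = [
    { name: "Ann",  location: "Berlin" },
    { name: "Bob",  location: "Lisbon" },
    { name: "Cleo", location: "Berlin" },
    { name: "Dev",  location: "Lisbon" },
    { name: "Eve",  location: "Oslo" }
  ].freeze

  def where(params = {})
    default_params = { page: 1, count: 2 }
    make_request(default_params.merge(params))
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        result = where(params.merge(page: page))
        result[:items].each { |item| yielder << item }
        # loop rescues StopIteration and treats it as a break
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i
        page += 1
      end
    end.lazy
  end

  private

  # Stand-in for the external API call: slices the canned data into pages.
  def make_request(params)
    pages = PEOPLE.each_slice(params[:count]).to_a
    { meta: { page: { total: pages.size } }, items: pages[params[:page] - 1] || [] }
  end
end

service = FakePeopleService.new
names = service.all.map { |person| person[:name] }.to_a
by_location = service.all.group_by { |person| person[:location] }
```

Here names collects people from all three pages in order, and by_location is a regular Hash keyed by location, because group_by has to consume the whole enumerator before it can return.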
Lastly, you might have noticed we tacked a .lazy onto the end of the enumerator. That turns it into an instance of Enumerator::Lazy, so that chained methods such as map and select are also evaluated only for the items we actually consume.
Say this API had 1,000 pages of results. The Enumerator block only runs as items are requested, so merely calling PeopleService.new.all does not make any requests. Without .lazy, however, chaining something like map or select before taking the first few results would walk through all 1,000 pages first. That would be extremely slow and resource-intensive, and we could easily hit a rate limit set by the API provider. What we want instead is to query only the pages that we actually enumerate over.
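The difference can be measured by counting requests. In this sketch, CountingService is a made-up stand-in with ten pages of three numbers each, and the lazy: keyword on all is a hypothetical switch that exists only so the two behaviours can be compared side by side.

```ruby
class CountingService
  TOTAL_PAGES = 10
  attr_reader :request_count

  def initialize
    @request_count = 0
  end

  # Stand-in for the API call: returns three integers per page.
  def where(page:)
    @request_count += 1
    first = (page - 1) * 3 + 1
    { meta: { page: { total: TOTAL_PAGES } }, items: [first, first + 1, first + 2] }
  end

  def all(lazy: true)
    enum = Enumerator.new do |yielder|
      page = 1
      loop do
        result = where(page: page)
        result[:items].each { |item| yielder << item }
        raise StopIteration if page >= result.dig(:meta, :page, :total)
        page += 1
      end
    end
    lazy ? enum.lazy : enum
  end
end

eager = CountingService.new
eager_result = eager.all(lazy: false).map { |n| n * 2 }.first(4)
eager_requests = eager.request_count # => 10, map walked every page first

lazy = CountingService.new
lazy_result = lazy.all.map { |n| n * 2 }.first(4)
lazy_requests = lazy.request_count # => 2, only the pages needed for four items

plain = CountingService.new
plain.all(lazy: false).first(4)
plain_requests = plain.request_count # => 2, first stops early even without .lazy
```

Both chains return the same four values; the only difference is how many pages were requested to produce them.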
For example, if we use find to look for the person with a specific email, the enumerator stops querying the API as soon as it reaches the page that contains Jon Doe:
PeopleService.new.all.find { |person| person.email == '[email protected]' }
Caching
Right now, calling the all method again will query pages it has already queried, even though it returns a lazy enumerator. For example:
ps = PeopleService.new
# This will iterate through the pages until we find Jon Doe
person = ps.all.find { |person| person.email == '[email protected]' }
# Calling this again **should not** query the same pages again. The results should already be stored.
person = ps.all.find { |person| person.email == '[email protected]' }
Similar to ActiveRecord’s query cache, we also want to cache the results from our query for performance. This is where one of the most underrated features of the Hash class comes into play.
If you instantiate a Hash with a block, it calls that block whenever a missing key is looked up, and the block computes the value for that key. In our case, we can tell the hash to call the API to fetch the results of the page we are looking for.
The beauty of this feature is that, as long as the block stores the value in the hash, it is only called once per key. Once the key has been assigned a value, the block is not called again:
pages = Hash.new do |hash, key|
  hash[key] = where(page: key)
end
pages[1]
# Fetches results for page 1 from the API
# => (500.0ms) [{...},{...},{...}]
# The key is already assigned on the next call, so the block isn't executed
pages[1]
# => (Cached 0.0ms) [{...},{...},{...}]
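The same behaviour can be checked in isolation. In this sketch, the lambda fetch_page is a made-up stand-in for the API call, and the counters exist only to show how often each block runs.

```ruby
calls = 0
# Hypothetical stand-in for where(page: key); counts how often it runs.
fetch_page = lambda do |page|
  calls += 1
  { items: ["person #{page}a", "person #{page}b"] }
end

page_cache = Hash.new do |hash, key|
  # Storing the value is what prevents the block from running again.
  hash[key] = fetch_page.call(key)
end

page_cache[1] # block runs, page 1 is fetched and stored
page_cache[1] # already stored, block is skipped
page_cache[2] # new key, block runs again

# Without the assignment, the block runs on every lookup and nothing is stored.
uncached_calls = 0
uncached = Hash.new do |_hash, key|
  uncached_calls += 1
  "page #{key}"
end
uncached[1]
uncached[1]
```

After these lookups, calls is 2 (one per distinct page) while uncached_calls is also 2 for a single key, which is why the hash[key] = ... assignment inside the block matters.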
When using the Hash approach in our class, we also want to memoize the Hash itself (using the ||= operator) in an instance variable called @all_pages.
This allows us to call the all method multiple times on the same instance without the cached results being overwritten:
class PeopleService
  # ...
  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        @all_pages ||= Hash.new do |h, key|
          h[key] = where(params.merge(page: key))
        end
        result = @all_pages[page]
        result[:items].each { |item| yielder << item }
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i
        page += 1
      end
    end.lazy
  end
  # ...
end
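One caveat with memoizing a single Hash: its block captures the params of whichever call to all created it, so a later call with different params would be served the first call’s pages. A possible variation, sketched here under assumptions, keeps one page cache per params hash. ParamAwareService, its canned data, the team parameter, and the pages_for helper are all made up for illustration.

```ruby
class ParamAwareService
  attr_reader :request_count

  def initialize
    @request_count = 0
    @page_caches = {}
  end

  # Stand-in for the API call: two made-up records per page, three pages.
  def where(params = {})
    @request_count += 1
    params = { page: 1, team: "all" }.merge(params)
    items = [1, 2].map { |n| "#{params[:team]}-#{params[:page]}-#{n}" }
    { meta: { page: { total: 3 } }, items: items }
  end

  def all(params = {})
    pages = pages_for(params)
    Enumerator.new do |yielder|
      page = 1
      loop do
        result = pages[page]
        result[:items].each { |item| yielder << item }
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i
        page += 1
      end
    end.lazy
  end

  private

  # One self-filling page cache per distinct params hash.
  def pages_for(params)
    @page_caches[params] ||= Hash.new do |hash, page|
      hash[page] = where(params.merge(page: page))
    end
  end
end

service = ParamAwareService.new
service.all(team: "red").first(3) # fetches pages 1 and 2 for "red"
service.all(team: "red").first(3) # served from the cache
requests_after_red = service.request_count
blue = service.all(team: "blue").first(1) # different params, separate cache
```

If all is only ever called with the same params, the single memoized Hash is enough and this extra layer is unnecessary.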
Final form
Here’s what our finished product looks like after leveraging the key features of the Enumerator and Hash classes. Our all method’s interface is now very similar to the one provided by ActiveRecord:
class PeopleService
  def where(params = {})
    default_params = { page: 1, count: 25 }
    params = default_params.merge(params)
    make_request(params)
  end

  def all(params = {})
    Enumerator.new do |yielder|
      page = 1
      loop do
        @all_pages ||= Hash.new do |h, key|
          h[key] = where(params.merge(page: key))
        end
        result = @all_pages[page]
        result[:items].each { |item| yielder << item }
        raise StopIteration if page >= result.dig(:meta, :page, :total).to_i
        page += 1
      end
    end.lazy
  end

  private

  def make_request(params)
    # Make external API call using the params
  end
end
Usage:
ps = PeopleService.new
ps.all.each do |person|
  # some operation on the person object
end
Let me know if that was useful. Would love to hear about any other techniques that you’ve found particularly interesting when querying external APIs.