Querying JSON fields in PostgreSQL using ActiveRecord
24 Jan 2016I’ve written before about how to use PostgreSQL with Rails to store hashes using the hstore extension. Since then I’ve moved on to using the JSON/B column which offers the flexibility of storing much more complex JSON objects.
While querying hstore data is lot more straight forward, it took me a while to figure out how to query more complex data structures in the JSON column. Besides the official documentation here, I had a hard time coming across good examples. So I’ll go ahead and document my experience with it here for what it’s worth.
Setup
So first let’s start by creating a migration which would allow us to add JSON data to our table:
class AddArbitraryDataToRecord < ActiveRecord::Migration
def change
add_column :record, :arbitrary_data, :jsonb
end
end
You can choose to store the data as either json
or jsonb
(supported in PostgreSQL 9.4 & Rails 4.2). The main difference being that the latter stores the data in a binary form which will lend itself to some speed improvements if you are planning to do more complex operations such as sorting, slicing, splicing, etc. inside the DB. At this point, there is probably no reason not to use jsonb
if it’s available.
Next let’s add some dummy data which is basically a 2d matrix of rows and columns. It looks something like this:
Record.create(arbitrary_data: {
"rows": [
{
"id": 1,
"columns": [
{"id": 1, "value": true},
{"id": 2, "value": false},
{"id": 3, "value": true},
{"id": 4, "value": false},
{"id": 5, "value": true}
]
},
{
"id": 2,
"columns": [
{"id": 1, "value": false},
{"id": 2, "value": false},
{"id": 3, "value": true},
{"id": 4, "value": false},
{"id": 5, "value": false}
]
}
]
})
Raw SQL Query
Now say for example, we wanted to fetch all the records that had a true value in their second row and third column. How would we get that?
At the time of writing, there is no ActiveRecord DSL which lets us do that, so we’ll just have to make do with a raw SQL query like this:
Record.where("arbitrary_data #> '{rows,?}' #> '{columns,?}' ->> 'value' = 'true'", 1, 2)
Let’s break down the key operators in that query:
#>
: Get the JSON object at that path->>
: Get the JSON object at that path as text{obj, n}
: Get the nth item in that object
So in plain English that means, look into all the arbitrary_data
columns and get the 1st item in the rows
object as JSON, then get the 2nd item from the columns
object as JSON and then get the value in the value
attribute as text and select the row if it equals 'true'
.
For more insight into other operators check out Matthew Schinckel’s excellent post which provides a variery of different examples.
Refactoring to use Arel
Now let’s see if we can make this query bit more composable by taking an object oriented approach using Arel which is the underlying querying engine for ActiveRecord:
class RecordQuery
attr_accessor :rows
attr_accessor :columns
attr_accessor :value
def initialize(rows, columns, value)
@rows = rows
@columns = columns
@value = value
end
def result
table.where(query.eq(@value))
end
private
def table
Arel::Table.new(:records)
end
def column_op
Arel::Nodes::InfixOperation.new('->>', Arel::Nodes.build_quoted("{columns,#{@columns}}"), Arel::Nodes.build_quoted('value'))
end
def row_op
Arel::Nodes::InfixOperation.new('#>', Arel::Nodes.build_quoted("{rows,#{@rows}}"), column_op)
end
def query
Arel::Nodes::InfixOperation.new('#>', table[:arbitrary_data], row_op)
end
end
Calling this in the console should generate identical SQL to the raw query we had constructed earlier:
> RecordQuery.new(1, 2, 'true').result.to_sql
=> "SELECT FROM \"records\" WHERE \"records\".\"arbitrary_data\" #> '{rows,1}' #> '{columns,2}' ->> 'value' = 'true'"
While creating a separate RecordQuery
object may look like an overkill for this particular example, for more complex queries this is the recommended approach given it lends itself to better readability and reuse in the long run.
You could further expand RecordQuery
to accept an argument hash and construct a more complex query based on if we just want to query by a single attribute or a combination of two or more.
Going forward
I’m hoping ActiveRecord will expand their DSL for better JSON querying but until then we can either build our own using Arel or take advantage of some gems like surus and postgres_ext
Happy hacking!