Querying JSON fields in PostgreSQL using ActiveRecord

I’ve written before about how to use PostgreSQL with Rails to store hashes using the hstore extension. Since then I’ve moved on to using the JSON/B column which offers the flexibility of storing much more complex JSON objects.

While querying hstore data is lot more straight forward, it took me a while to figure out how to query more complex data structures in the JSON column. Besides the official documentation here, I had a hard time coming across good examples. So I’ll go ahead and document my experience with it here for what it’s worth.

Setup

So first let’s start by creating a migration which would allow us to add JSON data to our table:

class AddArbitraryDataToRecord < ActiveRecord::Migration
  def change
    add_column :record, :arbitrary_data, :jsonb
  end
end

You can choose to store the data as either json or jsonb (supported in PostgreSQL 9.4 & Rails 4.2). The main difference being that the latter stores the data in a binary form which will lend itself to some speed improvements if you are planning to do more complex operations such as sorting, slicing, splicing, etc. inside the DB. At this point, there is probably no reason not to use jsonb if it’s available.

Next let’s add some dummy data which is basically a 2d matrix of rows and columns. It looks something like this:

    Record.create(arbitrary_data: {
      "rows": [
        {
          "id": 1,
          "columns": [
          {"id": 1, "value": true}, 
          {"id": 2, "value": false}, 
          {"id": 3, "value": true},
          {"id": 4, "value": false},
          {"id": 5, "value": true}
          ]
        },
        {
          "id": 2,
          "columns": [
          {"id": 1, "value": false}, 
          {"id": 2, "value": false}, 
          {"id": 3, "value": true},
          {"id": 4, "value": false},
          {"id": 5, "value": false}
          ]
        }
      ]
    })

Raw SQL Query

Now say for example, we wanted to fetch all the records that had a true value in their second row and third column. How would we get that?

At the time of writing, there is no ActiveRecord DSL which lets us do that, so we’ll just have to make do with a raw SQL query like this:

Record.where("arbitrary_data #> '{rows,?}' #> '{columns,?}' ->> 'value' = 'true'", 1, 2)

Let’s break down the key operators in that query:

So in plain English that means, look into all the arbitrary_data columns and get the 1st item in the rows object as JSON, then get the 2nd item from the columns object as JSON and then get the value in the value attribute as text and select the row if it equals 'true'.

For more insight into other operators check out Matthew Schinckel’s excellent post which provides a variery of different examples.

Refactoring to use Arel

Now let’s see if we can make this query bit more composable by taking an object oriented approach using Arel which is the underlying querying engine for ActiveRecord:

class RecordQuery
  attr_accessor :rows
  attr_accessor :columns
  attr_accessor :value

  def initialize(rows, columns, value)
    @rows = rows
    @columns = columns
    @value = value
  end

  def result
    table.where(query.eq(@value))
  end

  private

  def table
    Arel::Table.new(:records)
  end

  def column_op
    Arel::Nodes::InfixOperation.new('->>', Arel::Nodes.build_quoted("{columns,#{@columns}}"), Arel::Nodes.build_quoted('value'))
  end

  def row_op
    Arel::Nodes::InfixOperation.new('#>', Arel::Nodes.build_quoted("{rows,#{@rows}}"), column_op)
  end

  def query
    Arel::Nodes::InfixOperation.new('#>', table[:arbitrary_data], row_op)
  end
end

Calling this in the console should generate identical SQL to the raw query we had constructed earlier:

> RecordQuery.new(1, 2, 'true').result.to_sql
=> "SELECT FROM \"records\" WHERE \"records\".\"arbitrary_data\" #> '{rows,1}' #> '{columns,2}' ->> 'value' = 'true'"

While creating a separate RecordQuery object may look like an overkill for this particular example, for more complex queries this is the recommended approach given it lends itself to better readability and reuse in the long run.

You could further expand RecordQuery to accept an argument hash and construct a more complex query based on if we just want to query by a single attribute or a combination of two or more.

Going forward

I’m hoping ActiveRecord will expand their DSL for better JSON querying but until then we can either build our own using Arel or take advantage of some gems like surus and postgres_ext

Happy hacking!

If you liked this post, 🗞 subscribe to my newsletter and follow me on 𝕏!