Tagging With ActiveRecord and Postgres
Postgres doesn’t have a secret tagging feature, it does however support arrays as a column type. This creates interesting opportunities for modeling complex data. We will be using arrays to build efficient tagging into a Rails application. We will cover some of Postgres’ the basic array operators, and some strategies for using them in ActiveRecord.
Read the SQL carefully, once you understand what is happening in Postgres, you will be able to apply it much more easily in ActiveRecord.
Tagging With Postgres
As an example, we will create a table of pet hedgehogs, and store their tags in an array column:
CREATE TABLE hedgehogs (
id integer primary key,
name text,
age integer,
tags text[]
);
You might take a second and ponder, “Why hedgehogs?”, well why not? Next notice the syntax for an array, we use text[]
to denote a text array.
You might be tempted to use a varchar
array, don’t. Postgres assumes that ARRAY['spiny', 'cuddly']
is of type text[]
. This means that if you try to do tags && ARRAY['spiny', 'cuddly']
, you will get operator mismatch errors. Just store your tags as text[]
, internally Postgres treats varchar
and text
the same, so there should not be any performance impact.
There are two operators we will use to implement tagging:
A @> B
: Does A contain all of B?A && B
: Does A overlaps any of B?
For our purposes, the contains operator can be used to query for records that contain all the tags, while the overlap operator will return records that contain any of the tags.
Using these two operators we can now do some interesting queries. For instance if we want to find all the hedgehogs that are either spiny or prickly, we can use the &&
operator:
SELECT name, tags FROM hedgehogs
WHERE tags && ARRAY['spiny', 'prickly']
This queries for all the hedgehogs whose tags overlap either 'spiny'
or 'prickly'
:
name | tags
---------|--------------------------------
Marty | spiny, prickly, cute
Quilby | cuddly, prickly, hungry
Thomas | grumpy, prickly, sleepy, spiny
Franklin | spiny, round, tiny
Now what if we want to reduce the results, and only show the hedgehogs that are both spiny and prickly? To do this, we can use the contains operator, which requires that the array on the left hand side contains all the elements on the right:
SELECT name, tags FROM hedgehogs
WHERE tags @> ARRAY['spiny', 'prickly']
As you can see, we only get back a subset of the previous results:
name | tags
---------|--------------------------------
Marty | spiny, prickly, cute
Thomas | grumpy, prickly, sleepy, spiny
Excluding tags is simply a matter of using a NOT
, so if you wanted to find all the hedgehogs that are not spiny, but are cuddly, you can construct the query as follows:
SELECT name, tags FROM hedgehogs
WHERE NOT tags @> ARRAY['spiny']
AND tags @> ARRAY['cuddly']
This yields a new list of hedgehogs:
name | tags
---------|-------------------------
Horrace | cuddly, cute
Quilby | cuddly, prickly, hungry
Combining these set operations provides a flexible and convenient way to query tags.
Tagging in ActiveRecord
ActiveRecord 4 now supports SQL arrays in Postgres. This means that if you have an array column in your database, it will be exposed as a ruby array when you access it in your model. Even if you are not using the most recent version of ActiveRecord, you can still benefit from querying against arrays using the techniques we have already covered.
The ActiveRecord equivalent to our previous hedgehog example would be:
create_table :hedgehogs do |t|
t.string :name
t.integer :age
t.text :tags, array: true
end
The array
option tells ActiveRecord to create SQL array.
We can of course simply use the conditions from the previous queries:
Hedgehog.where "tags @> ARRAY[?]", ['spiny', 'prickly']
In this case we would return the models for all the hedgehogs that are both spiny and prickly since we used the contains operator. ActiveRecord will replace the ?
with properly quoted values, but will not turn the array ['spiny', 'prickly']
into a SQL array literal, so you need to wrap it in the array constructor ARRAY[...]
. While this approach works fine for one off queries, it makes sense to wrap this logic up in a scope:
class Hedgehog < ActiveRecord::Base
scope :any_tags, -> (tags){where('tags && ARRAY[?]', tags)}
scope :all_tags, -> (tags){where('tags @> ARRAY[?]', tags)}
end
These two scopes are parameterized by the tags we want to match against. Now you can query your models more easily:
Hedgehog.all_tags(['spiny', 'large'])
Best of all, using scopes to encapsulate these operations lets us chain them with other ActiveRecord calls. For instance you could pluck out all the ids of the large, spiny hedgehogs:
Hedgehog.all_tags(['spiny', 'large']).pluck(:id)
Performance
If you intend on querying against tags often, then you may want to create an index on them. Postgres has two special types of indexes that speed up the overlap and contains operators. These are GIN and GiST indexes. As a rule of thumb, if your tags are not being updated often you will want to use a GIN since it is faster to query. If your tags will be updated often, a GiST index will have less overhead. When in doubt, benchmark both.
A GIN index can be created in SQL as follows:
CREATE INDEX hedgehogs_tags_index
ON hedgehogs
USING gin(tags)
Should you want to use a GiST index instead, simply replace USING gin(tags)
with USING gist(tags)
. ActiveRecord 4 has introduced helpers for GIN and GiST indexes, so in a migration you can now use:
class AddTagsIndex < ActiveRecord::Migration
def change
add_index(:hedgehogs, :tags, :using => 'gin')
end
end
Arrays in ActiveRecord < 4
If you are not using ActiveRecord 4, don’t despair. The postgres_ext gem provides helpers for Postgres migrations and accessors for array types in ActiveRecord 3.2.
Need support for older rails applications? None of this magic, you can create your tables without using ActiveRecord’s helpers:
class CreateHedgehogsTable < ActiveRecord::Migration
def up
execute <<-SQL
CREATE TABLE hedgehogs (
id integer primary key,
name text,
age integer,
tags text[]
)
SQL
end
When using an older version of ActiveRecord, array columns will be returned as strings that look like ‘{1,2,3}’. You can parse these with pg_array_parser.
Further Reading
- Postgres Array Documentation
- Postgres Array Operators
- Perspectives on LedgerSMB has two articles on arrays and indexing
- ActiveRecord Scopes
If you know of any other useful libraries or techniques for dealing with tagging in SQL and ActiveRecord, let me know.