Reading Rails - Migrations


Today we’re going to talk about an often-ignored workhorse of Rails, the Migrator. How does it find your migrations and run them? We will amble through the Rails source and pick up whatever bits of knowledge we find along the way.

To follow along, open each library in your editor with qwandry, or just look it up on GitHub.

In the Beginning

In the beginning, there’s nothing. Maybe you have your database, but it’s empty. If you call rake db:migrate, all the pending migrations will run. Let’s start off by looking at that Rake task in databases.rake:

desc "Migrate the database (options: VERSION=x, VERBOSE=false, SCOPE=blog)."
task :migrate => [:environment, :load_config] do
  ActiveRecord::Migration.verbose = ENV["VERBOSE"] ? ENV["VERBOSE"] == "true" : true
  ActiveRecord::Migrator.migrate(ActiveRecord::Migrator.migrations_paths, ENV["VERSION"] ? ENV["VERSION"].to_i : nil)
  #...
end

We’ll gloss over the details of how Rake itself works, but for the time being notice that migrate requires two other tasks, [:environment, :load_config], to be run first. This ensures that the Rails environment and your database.yml have been loaded.

The body of the Rake task configures ActiveRecord::Migration and ActiveRecord::Migrator using environment variables. Environment variables are a useful way of passing information to your application. Many variables, such as USER, are set by default. They can also be set on a per-command basis. For instance, if you invoked Rake with VERBOSE=false rake db:migrate, then ENV["VERBOSE"] would be the string "false".

# Invoke irb with an environment variable:
# > FOOD=cake irb
ENV['FOOD']     #=> 'cake'
ENV['USER']     #=> 'adam'
ENV['WAFFLES']  #=> nil

The actual migration gets kicked off with ActiveRecord::Migrator.migrate, which is being given a set of paths where migrations might exist, and an optional version to migrate to.
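
To make the two ternaries in the Rake task concrete, here is a minimal sketch of the same logic. The helper names verbose? and target_version are hypothetical (they do not exist in Rails), and they take a plain Hash in place of ENV so they can be tried in irb without touching the real environment:

```ruby
# Hypothetical helpers mirroring the two ternaries in the Rake task.
# A Hash stands in for ENV, whose values are always strings or nil.
def verbose?(env)
  # Unset means verbose; otherwise only the exact string "true" enables it.
  env["VERBOSE"] ? env["VERBOSE"] == "true" : true
end

def target_version(env)
  # Unset means "no target"; otherwise the string is converted to an integer.
  env["VERSION"] ? env["VERSION"].to_i : nil
end

verbose?({})                                       #=> true
verbose?({ "VERBOSE" => "false" })                 #=> false
verbose?({ "VERBOSE" => "yes" })                   #=> false
target_version({})                                 #=> nil
target_version({ "VERSION" => "20131127051346" })  #=> 20131127051346
```

Note that any value other than the exact string "true" turns verbose output off, because environment variables are compared as strings.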

Finding Migrations

Pop open ActiveRecord’s migration.rb. Before we dig into this, take a moment to look over the exceptions defined at the top of the file. It is very easy to define custom exceptions, and migration.rb has a few good examples of them:

module ActiveRecord
  # Exception that can be raised to stop migrations from going backwards.
  class IrreversibleMigration < ActiveRecordError
  end
  
  #...
  class IllegalMigrationNameError < ActiveRecordError#:nodoc:
    def initialize(name)
      super("Illegal name for migration file: #{name}\n\t(only lower case letters, numbers, and '_' allowed)")
    end
  end
  
  #...

Custom exceptions can be specially handled, as we saw in the previous article about how Rails handles exceptions. In this case, IrreversibleMigration signals that a migration cannot be backed out. Another reason to define your own exceptions is to generate consistent error messages, as IllegalMigrationNameError does by overriding initialize. Just be sure that you call super.
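
The same pattern works outside of Rails. The sketch below uses made-up class names (RecipeError, IllegalIngredientError) and a made-up check_ingredient method purely for illustration. It shows both ideas: a shared base class that callers can rescue, and an initialize that builds a consistent message before calling super:

```ruby
# Hypothetical error classes, modeled on the pattern in migration.rb.
class RecipeError < StandardError
end

class IllegalIngredientError < RecipeError
  def initialize(name)
    # Build the message in one place, then hand it to StandardError.
    super("Illegal ingredient: #{name} (only lower case letters allowed)")
  end
end

def check_ingredient(name)
  raise IllegalIngredientError.new(name) unless name =~ /\A[a-z]+\z/
  name
end

# Rescuing the base class also catches the more specific subclass.
message = begin
  check_ingredient("Waffles!")
rescue RecipeError => e
  e.message
end

message #=> "Illegal ingredient: Waffles! (only lower case letters allowed)"
```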

Now scroll down and let’s look at Migrator.migrate:

class Migrator
  class << self
    def migrate(migrations_paths, target_version = nil, &block)
      case
      when target_version.nil?
        up(migrations_paths, target_version, &block)
      #...
      when current_version > target_version
        down(migrations_paths, target_version, &block)
      else
        up(migrations_paths, target_version, &block)
      end
    end
  #...

Depending on the target_version we will either migrate up or down. Both methods follow the same pattern: they scan the migrations_paths for available migrations, and then instantiate a new Migrator. Let’s see how those migrations are located:

class Migrator
  class << self
    def migrations(paths)
      paths = Array(paths)

      files = Dir[*paths.map { |p| "#{p}/**/[0-9]*_*.rb" }]

      migrations = files.map do |file|
        version, name, scope = file.scan(/([0-9]+)_([_a-z0-9]*)\.?([_a-z0-9]*)?\.rb\z/).first

        raise IllegalMigrationNameError.new(file) unless version
        version = version.to_i
        name = name.camelize

        MigrationProxy.new(name, version, file, scope)
      end

      migrations.sort_by(&:version)
    end

This method is chock full of useful examples, so let’s settle down for a few minutes and read it carefully. We start off with a little trick used to ensure arguments are always arrays, the Array() method. Method you say? Although unorthodox, it is valid to define CamelCase methods, even if they share the name of a class:

class Flummox
end

def Flummox()
  "confusing"
end

Flummox       #=> Flummox
Flummox.new   #=> #<Flummox:0x0000000bf0b5d0>
Flummox()     #=> "confusing"

Ruby uses this to define an Array() method, which always returns an Array instance:

Array(nil)                #=> []
Array([])                 #=> []
Array(1)                  #=> [1]
Array("Hello")            #=> ["Hello"]
Array(["Hello", "World"]) #=> ["Hello", "World"]

This is similar to to_a, but can be called on any object. Rails uses this with paths = Array(paths) to ensure that paths will always be an array.

Next Rails searches those paths and filters them all in one impressive line:

files = Dir[*paths.map { |p| "#{p}/**/[0-9]*_*.rb" }]

Let’s unpack that from the inside out. paths.map { |p| "#{p}/**/[0-9]*_*.rb" } converts each path into a [shell glob](http://en.wikipedia.org/wiki/Glob_(programming)). A path like "db/migrate" becomes "db/migrate/**/[0-9]*_*.rb", which will match any file inside "db/migrate" or any of its subdirectories, as long as its name starts with a digit, contains an underscore, and ends in .rb. Those patterns are then splatted with the * operator and passed to Dir[].

Dir[] is extremely useful. It takes patterns like "db/migrate/**/[0-9]*_*.rb", and returns an array of matching files. Keep Dir[] at hand whenever you need to find files based on a path. The ** will recursively match subdirectories, and * is a wildcard for zero or more characters, so this pattern will match migrations like 20131127051346_create_people.rb.
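
To see the glob in action without a Rails app, the sketch below builds a throwaway directory and runs the same pattern against it. The file names are invented for the example:

```ruby
require "tmpdir"
require "fileutils"

# Dir.mktmpdir yields a fresh directory, removes it afterwards,
# and returns the value of the block.
matches = Dir.mktmpdir do |root|
  FileUtils.mkdir_p(File.join(root, "nested"))
  %w[
    20131127051346_create_people.rb
    nested/20131128000000_add_email_to_people.rb
    README.md
    helper.rb
  ].each { |name| FileUtils.touch(File.join(root, name)) }

  # The same pattern Rails builds for each migrations path.
  Dir["#{root}/**/[0-9]*_*.rb"].map { |path| File.basename(path) }.sort
end

matches
#=> ["20131127051346_create_people.rb", "20131128000000_add_email_to_people.rb"]
```

helper.rb is skipped because it does not start with a digit, and README.md is skipped because it does not end in .rb.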

Rails iterates over each matching file, and plucks out information using a regular expression with String#scan. If you’re not familiar with regular expressions, drop everything and learn them now. String#scan returns all the matches in a given string. If the expression contains capturing groups, those are returned in subarrays. For example:

s = "123 abc 456"
# No capturing groups:
s.scan(/\d+/)           #=> ["123", "456"]
s.scan(/\d+\s\w+/)      #=> ["123 abc"]

# Capturing a number and then a word:
s.scan(/(\d+)\s+(\w+)/) #=> [["123", "abc"]]

So file.scan will match the version ([0-9]+), a name ([_a-z0-9]*), and then optionally a scope ([_a-z0-9]*)?. Since String#scan always returns an array, and we know this pattern will only appear once, Rails just plucks off the first match. Rails assigns version, name, scope = ... all at once. This is done with array destructuring:

version, name, scope = ["20131127051346", "create_people"]
version #=> "20131127051346"
name    #=> "create_people"
scope   #=> nil

Notice that if there are more variables than array elements, the remaining variables will be assigned nil. This is a handy shortcut when assigning values from a regular expression.
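
Putting the regular expression and the destructuring together, here is a small sketch that can be run in plain Ruby. The pattern is copied from the method above, while the constant MIGRATION_FILENAME and the helper parse_migration_filename are hypothetical names introduced for the example. Where Rails raises IllegalMigrationNameError, this helper simply returns nil:

```ruby
# The pattern from Migrator.migrations, given a name for readability.
MIGRATION_FILENAME = /([0-9]+)_([_a-z0-9]*)\.?([_a-z0-9]*)?\.rb\z/

# Hypothetical helper: returns [version, name, scope], or nil when
# the file name does not match.
def parse_migration_filename(file)
  version, name, scope = file.scan(MIGRATION_FILENAME).first
  return nil unless version
  [version.to_i, name, scope]
end

scoped = parse_migration_filename("db/migrate/20131127051346_create_people.blog.rb")
scoped #=> [20131127051346, "create_people", "blog"]

plain = parse_migration_filename("db/migrate/20131127051346_create_people.rb")
plain.first(2) #=> [20131127051346, "create_people"]

# Upper case letters are not allowed in the name, so nothing matches.
parse_migration_filename("db/migrate/20131127051346_CreatePeople.rb") #=> nil
```

When scan finds nothing, .first returns nil, and destructuring nil leaves all three variables nil. That is why the single unless version check is enough.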

The version is converted to an integer using to_i (a Fixnum or Bignum on the Rubies of the time; Ruby 2.4 and later only have Integer), and the name is reformatted with name.camelize. String#camelize is defined by ActiveSupport, and refers to the conventions of snake_case vs CamelCase. This method will convert a string "create_people" into "CreatePeople".
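
If you want a feel for what camelize does without loading ActiveSupport, the following is a rough plain-Ruby approximation. It is not ActiveSupport's implementation, and rough_camelize is a hypothetical name. The real method also handles cases this sketch ignores, such as turning "/" into "::":

```ruby
# A rough approximation of String#camelize for simple snake_case input.
# Not ActiveSupport's implementation.
def rough_camelize(snake)
  snake.split("_").map(&:capitalize).join
end

rough_camelize("create_people")       #=> "CreatePeople"
rough_camelize("add_email_to_people") #=> "AddEmailToPeople"
```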

Let’s save MigrationProxy for a moment and look at the final part of this method, migrations.sort_by(&:version). This expression sorts all the migrations using their version. How it sorts them though is rather interesting.

When an argument is prefixed with & in a method call, Ruby passes it as the block, calling to_proc on it first if it is not already a Proc. Symbol#to_proc, which is built into Ruby as of 1.8.7 and 1.9, returns a Proc which calls the method named by the symbol. So &:version evaluates to something along the lines of {|obj| obj.version }.

Library = Struct.new(:name, :version)
libraries = [
  Library.new("Rails", "4.0.1"), 
  Library.new("Rake", "10.1.0")
]

libraries.map{|lib| lib.version } #=> ["4.0.1", "10.1.0"]

# &:version => Proc.new{|lib| lib.version } (Roughly)
libraries.map(&:version)          #=> ["4.0.1", "10.1.0"]

This is often used when sorting or mapping in Rails. As with all shorthands, make sure your team is comfortable with this syntax. When in doubt, the alternative is not much longer, and is clearer.
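
Since the Migrator uses this shorthand for sorting rather than mapping, here is a sketch of sort_by(&:version) on integer versions. MigrationStub is a hypothetical Struct standing in for MigrationProxy:

```ruby
# Hypothetical stand-in for MigrationProxy with just the fields we need.
MigrationStub = Struct.new(:name, :version)

stubs = [
  MigrationStub.new("AddEmailToPeople", 20131128000000),
  MigrationStub.new("CreatePeople",     20131127051346)
]

by_symbol = stubs.sort_by(&:version)
by_block  = stubs.sort_by { |m| m.version }

by_symbol.map(&:name) #=> ["CreatePeople", "AddEmailToPeople"]
by_symbol == by_block #=> true

# The Proc can also be built and called by hand:
version_of = :version.to_proc
version_of.call(stubs.first) #=> 20131128000000
```

Because the versions are integers after to_i, they sort numerically. Version strings would instead be compared character by character.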

The Migration

Now, return to MigrationProxy. As its name implies, this is a proxy for Migration instances. Proxy objects are a common design pattern used to transparently replace one object with another object. In this case the MigrationProxy is a stand-in for a real Migration object, but defers actually loading the migration’s source until it is needed. MigrationProxy achieves this by delegating methods:

class MigrationProxy
  #...
  delegate :migrate, :announce, :write, :disable_ddl_transaction, to: :migration

  private

    def migration
      @migration ||= load_migration
    end

    def load_migration
      require(File.expand_path(filename))
      name.constantize.new
    end

end

The delegate method defines a forwarding method for each name it is given. Each one calls the method of the same name on the object returned by the to: option, which is migration in this case. migration will lazily call load_migration if @migration has not yet been set. load_migration in turn requires the Ruby source, and then creates an instance using name.constantize.new. String#constantize is defined by ActiveSupport, and returns the constant named by a string:

"Person".constantize       #=> Person
"Person".constantize.class #=> Class
"person".constantize       #=> NameError: wrong constant name person

This can be very helpful when you want to dynamically reference a class.

Using MigrationProxy, Rails only loads and instantiates migrations when they are actually needed, which speeds up the migration process and saves some memory.
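
The lazy-loading idea can be sketched in plain Ruby. Everything below is hypothetical and simplified: LazyMigrationProxy and its loaded? method are invented for the example, a hand-written forwarding method stands in for ActiveSupport's delegate, and Object.const_get stands in for constantize on a top-level constant. The class is also defined inline instead of being required from a file:

```ruby
# Hypothetical stand-in for a real migration class.
class CreatePeople
  def migrate(direction)
    "CreatePeople migrated #{direction}"
  end
end

# A stripped-down proxy: it knows the migration's name and version up
# front, but only builds the real object when migrate is called.
class LazyMigrationProxy
  attr_reader :name, :version

  def initialize(name, version)
    @name = name
    @version = version
  end

  # Hand-written forwarding, standing in for `delegate`.
  def migrate(direction)
    migration.migrate(direction)
  end

  # Added for the example so we can observe the laziness.
  def loaded?
    !@migration.nil?
  end

  private

  def migration
    @migration ||= load_migration
  end

  def load_migration
    # Look up the top-level constant by name, then instantiate it.
    Object.const_get(name).new
  end
end

proxy = LazyMigrationProxy.new("CreatePeople", 20131127051346)
before = proxy.loaded?      #=> false
result = proxy.migrate(:up) #=> "CreatePeople migrated up"
after  = proxy.loaded?      #=> true
```

Sorting and listing only ever touch name and version, so the real object is never built for migrations that are not run.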

The actual Migration class gets called by the Migrator when the proxy delegates the migrate method. This in turn calls either Migration#up or Migration#down depending on whether the migration is being applied or rolled back.
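
As a heavily simplified sketch of that last step, the code below dispatches on a direction. SketchMigration and CreateWaffles are hypothetical classes. The real ActiveRecord::Migration#migrate does considerably more than this, so treat it only as an illustration of the up/down split:

```ruby
# Hypothetical, heavily simplified stand-in for ActiveRecord::Migration.
class SketchMigration
  def migrate(direction)
    case direction
    when :up   then up
    when :down then down
    else raise ArgumentError, "unknown direction: #{direction.inspect}"
    end
  end
end

class CreateWaffles < SketchMigration
  def up
    "create waffles table"
  end

  def down
    "drop waffles table"
  end
end

CreateWaffles.new.migrate(:up)   #=> "create waffles table"
CreateWaffles.new.migrate(:down) #=> "drop waffles table"
```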

Recap

We have only scratched the surface of Rails’ migration code, but we’ve learned some interesting things all the same. Migrations are started with a Rake task, which invokes the Migrator. The Migrator in turn finds our migrations and wraps them with MigrationProxy objects until the real Migration is needed.

As always, we have come across a number of interesting methods, idioms, and tricks:

  • Environment variables can be accessed via the ENV constant.
  • Defining custom exceptions is a common idiom for error handling.
  • Array() converts any object into an array.
  • Dir[] uses the shell glob syntax to search for files.
  • String#scan returns all the matches in a string, and supports capturing groups.
  • String#camelize converts snake_case strings to CamelCase.
  • The & operator turns a symbol into a Proc (via to_proc) when passing it as a block.
  • delegate can be used to implement the proxy design pattern.
  • You can dynamically load constants with String#constantize.

Next time perhaps we can figure out exactly how the Migrator knows which migrations have been applied to your database.
