Getting to Know the Ruby Standard Library – TSort
TSort
is an interesting part of the ruby standard library that performs topological sorting. Topological sorting is used in package management, analyzing source code, and evaluating prerequisites. RubyGems uses
TSort
to determine what order to install gems in when there are multiple dependencies. Let’s take a look at
TSort
in action.
Here is an example of a
JobRunner
which will take a bunch of tasks with prerequisites, and then tell you which order they should be performed in:
require 'tsort'
class JobRunner
include TSort
Job = Struct.new(:name, :dependencies)
def initialize()
@jobs = Hash.new{|h,k| h[k] = []}
end
alias_method :execute, :tsort
def add(name, dependencies=[])
@jobs[name] = dependencies
end
def tsort_each_node(&block)
@jobs.each_key(&block)
end
def tsort_each_child(node, &block)
@jobs[node].each(&block)
end
end
if __FILE__ == $0
runner = JobRunner.new
runner.add('breakfast', ['serve'])
runner.add('serve', ['cook'])
runner.add('cook', ['buy eggs','buy bacon'])
puts runner.execute
end
Running this file will show that you need to buy the eggs and bacon before you can cook and then serve breakfast.
buy eggs
buy bacon
cook
serve
breakfast
Hardly an achievement of complex reasoning, but as the number of prerequisites grow, this becomes vastly more useful.
TSort
Let’s take a look at the source (qw tsort
if you have
qwandry). The first thing to notice is that TSort is a module instead of a class. This means that instead of extending it, or calling it directly we will include it in another class. Notice that in the
JobRunner
above we called
include TSort
at the very beginning. This means our class will now include
TSort
’s functionality, but in order for it to work we need to implement a few methods. From the TSort rdoc:
TSort requires two methods to interpret an object as a graph,
tsort_each_node and tsort_each_child.
* tsort_each_node is used to iterate for all nodes over a graph.
* tsort_each_child is used to iterate for child nodes of a given node.
Our implementation of
tsort_each_node
iterates over each job, while
tsort_each_child
will iterate over each prerequisite for the job. These methods allow
TSort
to provide two useful methods. The first is
strongly_connected_components
, which will return each node or sets of nodes forming a circular dependency. The second is
tsort
which we used above to return the nodes in a sorted order so that each of their prerequisites are satisfied. Lets take a look at what each of these does:
def tsort
result = []
tsort_each {|element| result << element}
result
end
This is simple enough, it just collects all the results from
tsort_each
and returns them. Following this thread we see that
tsort_each
then iterates over the results of
each_strongly_connected_component
, so lets take a closer look at that, and perhaps we can figure out how
TSort
works.
def each_strongly_connected_component # :yields: nodes
id_map = {}
stack = []
tsort_each_node {|node|
unless id_map.include? node
each_strongly_connected_component_from(node, id_map, stack) {|c|
yield c
#...
From this snippet of code we can see that
TSort
is going to iterate over each of the nodes (jobs in our case), while doing this it will keep track of the position each node has in the stack with the
id_map
hash. Next let’s see what
each_strongly_connected_component_from
does with the node.
def each_strongly_connected_component_from(node, id_map={}, stack=[])
minimum_id = node_id = id_map[node] = id_map.size
stack_length = stack.length
stack << node
#...
We can see that
id_map
and
stack
are passed around to keep track of what has been seen already. Jumping to the end of the method, we see that when we finish,
TSort
is going to yield back an array of nodes:
if node_id == minimum_id
component = stack.slice!(stack_length .. -1)
component.each {|n| id_map[n] = nil}
yield component
end
We know from our example above that this eventually returns all the nodes in their proper order, and
tsort_each
expects the arrays to have a length of 1 so we can assume that if everything goes correctly
stack.slice!(stack_length .. -1)
should return an array with a length of 1. After it returns this array, it clears the entries from the
id_map
. We can deduce that
TSort
sorts the nodes by pushing items onto the stack, and then returning the top of it when there are no remaining prerequisites for the node.
Now let’s look at the main part of this method:
#...
tsort_each_child(node) {|child|
if id_map.include? child
child_id = id_map[child]
minimum_id = child_id if child_id && child_id < minimum_id
else
sub_minimum_id =
each_strongly_connected_component_from(child, id_map, stack) {|c|
yield c
}
minimum_id = sub_minimum_id if sub_minimum_id < minimum_id
end
}
#...
For each child we see that if we have already evaluated the node. If its index in the stack is less than our current
minimum_id
we treat it as the new minimum. Otherwise we recursively call
each_strongly_connected_component_from
for this node, which will push the child onto the stack and push its prerequisites onto the stack as well. Once there are no more nodes left to push on, the loop exits and the node will get popped off. If there is a cycle (nodes that depend on each other), all the nodes in the cycle will be returned. If this is starting to look familiar to you, this essentially amounts to a depth first search of all the nodes.
Recap
Our exploration of
TSort
has revealed a useful library, and shown us one way to perform a depth first search on a data structure that may include cycles, which is pretty neat. Curious about other applications of
TSort
? The
strongly_connected_components
that
TSort
returns also happens to be useful for detecting tightly coupled methods and classes which depend on each other which is useful when doing refactoring and static analysis of source code. This is just one of the handy things we can do with ruby’s standard library.
More articles in this series
- Getting to Know the Ruby Standard Library – Delegator
- Getting to Know the Ruby Standard Library – WeakRef
- Getting to Know the Ruby Standard Library – Timeout
- Getting to Know the Ruby Standard Library – Pathname
- Getting to Know the Ruby Standard Library – Abbrev
- Getting to Know the Ruby Standard Library – TSort
- Getting to Know the Ruby Standard Library – MiniTest::Mock
- Getting to Know the Ruby Standard Library – Shellwords
- Getting to Know the Ruby Standard Library – MiniTest