Getting to Know the Ruby Standard Library – Abbrev
We’re going to take a look at another little piece of ruby’s standard library, this time it is
Abbrev
, a tiny library that generates abbreviations for a set of words. We will expand ever so slightly on the
one-liner from my last post to show an example of
Abbrev
in action:
require 'abbrev'
commands = Dir[*ENV['PATH'].split(':').map{|p| p+"/*"}].select{|f| File.executable? f}.map{|f| File.basename f}.uniq.abbrev
commands['ls'] #=> 'ls'
commands['spli'] #=> 'split'
commands['spl'] #=> nil
This will match anything on your path, or any substring that will match only one longer string. Combine this with Shellwords, and you have the pieces for an auto completing console. It could also be used for matching rake tasks, tests, or giving suggestions for mistyped methods.
How it Works
So that is what
Abbrev
does, but how does it work? If you open up the library (qw abbrev
if you have
Qwandry), you will see that it is pretty small, there’s just one method, and then a helper that extends array.
Starting from the beginning, we see that it takes an array of words and an optional pattern. It stores the abbreviations in
table
, and tracks of occurrences of each abbreviation in
seen
using the counting idiom I mentioned in
Hash Tricks.
def abbrev(words, pattern = nil)
table = {}
seen = Hash.new(0)
...
The pattern can be a RegularExpression or a String:
if pattern.is_a?(String)
pattern = /^#{Regexp.quote(pattern)}/ # regard as a prefix
end
If it’s a String, it is converted to a RegularExpression. Notice that
Regexp.quote(pattern)
is used so that any characters that have special meanings as RegularExpressions will get escaped. If this pattern is present, it is used to ignore any abbreviations that don’t match it. Next we see how the abbreviations are generated for each word:
#...
words.each do |word|
next if (abbrev = word).empty?
while (len = abbrev.rindex(/[\w\W]\z/)) > 0
abbrev = word[0,len]
next if pattern && pattern !~ abbrev
case seen[abbrev] += 1
when 1
table[abbrev] = word
when 2
table.delete(abbrev)
else
break
end
end
end
#...
The first part of this sets the current word to
abbrev
, but skips the word if it is blank. The next part of the loop is a little more confusing, what does
abbrev.rindex(/[\w\W]\z/)
do? It gives you the index of the last character in the String, as far as I can tell in ruby 1.9 this is equivalent to
String#length - 1
. So the inner
while
loop is going to use
abbrev = word[0,len]
to chop off a character each time until the String is empty. The hash
seen
is incremented by 1 for this substring. If this is the first time the word has been seen, then the word is recorded. If this is the second time the word has been seen, the word is removed because it is not unique. If the word has been seen more than twice, then not only has this word been seen, but we know that all the substrings of this word have been seen and removed, so the loop exits.
#...
words.each do |word|
next if pattern && pattern !~ word
table[word] = word
end
table
end
Finally
Abbrev
loops through the original words and inserts them. This means that if the array contained "look"
and "lookout"
they both get added as matches for themselves even though "look"
is a substring of "lookout"
.
So there you have it, ruby’s
Abbrev
library explained, go forth and shorten words.
More articles in this series
- Getting to Know the Ruby Standard Library – Delegator
- Getting to Know the Ruby Standard Library – WeakRef
- Getting to Know the Ruby Standard Library – Timeout
- Getting to Know the Ruby Standard Library – Pathname
- Getting to Know the Ruby Standard Library – Abbrev
- Getting to Know the Ruby Standard Library – TSort
- Getting to Know the Ruby Standard Library – MiniTest::Mock
- Getting to Know the Ruby Standard Library – Shellwords
- Getting to Know the Ruby Standard Library – MiniTest