The Test That Cried Fail
Let’s start with a story.
Once there was a developer who was eager and new. He wrote a feature. He wrote a test. He ran the test suite.
His test passed, and the feature seemed to work, but one test failed. Then it passed. His coworkers nodded sagely, “Yes”, they said, “that happens sometimes.” So he committed and carried on.
Weeks later, he fixed a bug, and lo, the test failed once again. But he knew this test; it had a penchant for failure. So he carried on.
That evening, news came from on high: there was a horrible flaw that must be fixed. He wrote a fix, and ran his tests. All but one test passed, but he knew this fickle test, and heeded it not.
Eager to prove his worth, he deployed the code. When the clock struck midnight, every user account was deleted.
There was much gnashing of teeth.
Hyperbole? Perhaps a little, but indeterminate failures are costly. When taken seriously, they rob us of time; when ignored, they lull us into false security. So why do tests fail intermittently, and what can we do about it?
There are four major causes of undue headaches: time, external state, internal state, and unspecified behavior. By isolating our code from these, we can prevent many intermittent test failures.
Isolating Time
If a test depends on time, then time’s passage may introduce unexpected results. Let’s consider a simple function that returns the number of weekdays remaining this week.
def weekdays_remaining
  w = Date.today.wday # Sunday = 0, Saturday = 6
  w < 5 ? 5 - w : 0
end
So if today is Wednesday, then Thursday and Friday remain in this week.
def test_weekdays_remaining
  remaining = weekdays_remaining
  assert_equal 2, remaining
end
This test passes today, but will fail tomorrow, and then pass again next Wednesday. Although contrived, there are lots of situations where we may use Time.now or Date.today; for instance, you might send out emails on Fridays, or prevent some operations on weekends.
To test time-dependent code, you need to control time. Actually, you need to control Ruby’s two clocks, Time.now and Date.today. Controlling just one means that if your code, or any of your dependencies, calls the other, you’ll end up in a very strange world where Time.now.to_date no longer equals Date.today.
Luckily there are plenty of libraries that can help you isolate time.
Let’s rewrite this test with ActiveSupport’s time helper:
def test_weekdays_remaining_on_wednesday
  travel_to(Date.new(2015, 10, 7)) do
    remaining = weekdays_remaining
    assert_equal 2, remaining
  end
end
Now we have isolated our test from time, and if we run it on Thursday, the test will pass. Since we are now masters of time, we can easily add tests for other days like Saturday and Sunday.
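Time helpers are one option; a lighter-weight alternative, sketched below rather than taken from the article, is to make the date injectable so each test simply passes the day it cares about:

```ruby
require "date"

# Same function as above, but accepting the date as a parameter
# (defaulting to today) so tests need no time-travel helpers at all.
def weekdays_remaining(today = Date.today)
  w = today.wday # Sunday = 0, Saturday = 6
  w < 5 ? 5 - w : 0
end

def test_weekdays_remaining_on_wednesday
  # 2015-10-07 was a Wednesday: Thursday and Friday remain.
  remaining = weekdays_remaining(Date.new(2015, 10, 7))
  raise "expected 2" unless remaining == 2
end

def test_weekdays_remaining_on_saturday
  # 2015-10-10 was a Saturday: no weekdays remain.
  remaining = weekdays_remaining(Date.new(2015, 10, 10))
  raise "expected 0" unless remaining == 0
end

test_weekdays_remaining_on_wednesday
test_weekdays_remaining_on_saturday
```

The trade-off is a changed signature; the travel_to approach keeps the production code untouched.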
Symptoms: Tests fail when run at specific times.
Solution: Control time in your tests.
Isolating External State
Databases, file systems, and web services are all external to your code, and all of them can cause test failures if not controlled.
Consider this set of fictitious tests:
def test_items_should_have_names
  item = Item.find(13)
  assert !item.name.blank?
end

def test_destroying_items_is_logged
  Item.find(13).destroy
  assert LogEntry.where(item_id: 13, action: "destroyed").exists?
end
If the tests execute in the order listed, and your database actually has an Item 13, the tests will pass. What happens if the tests run in the reverse order?
If you use ActiveRecord, you have probably never run into this problem. Let’s consider why. By default, Rails runs every test in a database transaction, and then rolls that transaction back at the end of the test. So when test_destroying_items_is_logged finishes, Item 13 shows up again.
Your database is only one source of state, consider this insidious test:
def test_creating_users_increments_cached_user_count
  User.create! # Assume this increments "user_count"
  count = User.count
  cached_count = redis.get("user_count").to_i # Redis returns strings
  assert_equal count, cached_count
end
Run this test once, and it will pass. Run the test again, and you may see a message like Expected 1 to equal 2. Although the database rolls back between tests, Redis is a separate system which does not roll back.
If the data you store in Redis is ephemeral, you can probably get away with clearing it between tests, bringing it back to a known state. In Rails you can achieve this with a setup hook:
setup do
  redis.flushdb
end
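The same reset-before-each-test principle applies to any store you own. A minimal self-contained sketch, with a plain Hash standing in for Redis:

```ruby
# A Hash standing in for Redis: state that survives from one test to the next.
CACHE = {}

# Mirrors the setup hook above: wipe the store so every test starts clean.
def flush_cache
  CACHE.clear
end

def create_user
  # Assume creating a user increments the cached count, as in the article.
  CACHE["user_count"] = CACHE.fetch("user_count", 0) + 1
end

def test_creating_users_increments_cached_user_count
  flush_cache # without this line, the second run would see a stale count
  create_user
  raise "expected 1" unless CACHE["user_count"] == 1
end
```

Run test_creating_users_increments_cached_user_count twice in a row; it passes both times only because of the flush.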
In a similar vein, consider the file system. If your code interacts with the file system, you may run into some unexpected behavior. Imagine running this test:
def test_generating_reports
  report = SampleReport.new
  report.run
  assert File.exist?("reports/sample.csv")
end
What happens if you change SampleReport to generate JSON, and write to "reports/sample.json" instead? The test will continue to pass on your machine until you delete "reports/sample.csv", but developers with a fresh checkout will see a failing test. In the simple case, you can just clean up after your test:
def test_generating_reports
  expected_path = "reports/sample.csv"
  begin
    report = SampleReport.new
    report.run
    assert File.exist?(expected_path)
  ensure
    FileUtils.rm_f(expected_path)
  end
end
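Where the report class lets you choose the output location (the constructor argument here is an assumption, not the article’s API), a temporary directory handles cleanup for you:

```ruby
require "tmpdir"

# A stand-in report that writes a CSV into a configurable directory.
class SampleReport
  def initialize(output_dir)
    @output_dir = output_dir
  end

  def run
    File.write(File.join(@output_dir, "sample.csv"), "id,name\n")
  end
end

def test_generating_reports
  # Dir.mktmpdir yields a fresh directory and deletes it, contents and all,
  # when the block exits, so nothing lingers between runs or checkouts.
  Dir.mktmpdir do |dir|
    SampleReport.new(dir).run
    raise "expected report" unless File.exist?(File.join(dir, "sample.csv"))
  end
end
```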
Symptoms: Tests fail when run multiple times.
Solution: Reset servers or filesystem to a known good state.
In each of these cases, you own the source of the external state. It’s your database or filesystem, so you can safely reset it to a known state. It is becoming increasingly common, though, to rely on third party services. If you’re lucky, the service may provide a developer sandbox to experiment with, but even a sandbox makes for a poor testing situation: it is shared, remote, and outside your control.
Consider the following test:
def test_charging_customer_records_balance
  user = User.find(13)
  user.charge(10.00) # issues HTTP POST to example.com
  user.charge(20.00) # issues HTTP POST to example.com
  balance = user.balance
  assert_equal 30.00, balance
end
What happens if this test is run twice? What happens if another developer runs this test while you run it? What if the service is down for maintenance?
Just as we controlled time, controlling the network can isolate your tests from this external state. Libraries such as WebMock let you declare which requests your code will make, and what responses they should yield. Instead of relying on the real service, you declare how the service should respond:
def test_charging_customer_records_balance
  stub_request(:post, "www.example.com/api/user/13/charge")
    .to_return(
      {status: 200, body: '{"id": 1, "amount": 10.0}'},
      {status: 200, body: '{"id": 2, "amount": 20.0}'}
    )
  user = User.find(13)
  user.charge(10.00) # issues HTTP POST to example.com
  user.charge(20.00) # issues HTTP POST to example.com
  balance = user.balance
  assert_equal 30.00, balance
end
Assuming the responses are realistic, you can now test your application’s logic without relying on external state. This also makes testing failure cases simpler:
def test_balance_only_changes_on_successful_charges
  stub_request(:post, "www.example.com/api/user/13/charge")
    .to_return(
      {status: 200, body: '{"id": 1, "amount": 10.0}'},
      {status: 422}
    )
  user = User.find(13)
  user.charge(10.00)     # issues HTTP POST to example.com
  user.charge("$30.00")  # issues HTTP POST to example.com, rejected with a 422
  balance = user.balance
  assert_equal 10.00, balance
end
Isolating tests from third party services avoids unpredictable behavior.
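Dependency injection is another way to get the same isolation without intercepting HTTP at the library level. In this sketch, User and FakeGateway are illustrative stand-ins, not the article’s real classes:

```ruby
# The test hands User a fake gateway instead of a real HTTP client,
# so no network traffic ever happens.
class User
  attr_reader :balance

  def initialize(gateway)
    @gateway = gateway
    @balance = 0.0
  end

  def charge(amount)
    response = @gateway.post("/charge", amount: amount)
    @balance += amount if response[:status] == 200
  end
end

# Replays a list of canned responses, one per request, much like
# WebMock's to_return list.
class FakeGateway
  def initialize(responses)
    @responses = responses
  end

  def post(_path, _params)
    @responses.shift
  end
end

user = User.new(FakeGateway.new([{status: 200}, {status: 422}]))
user.charge(10.00)
user.charge(30.00) # the fake service rejects this one
raise "unexpected balance" unless user.balance == 10.0
```

In production you would pass a real HTTP client exposing the same post interface; the test swaps in canned responses without touching the network.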
Symptoms: Tests fail when multiple people run them at the same time.
Solution: Mock out HTTP requests.
Isolating Internal State
Just because something is within the boundaries of your code, doesn’t mean it can’t introduce intermittent test failures. Here’s an example of a failure we recently had a whole lot of fun tracking down.
It’s a good idea to set limits. Imagine you have a config file describing those limits. There’s a simple way to test them: just create a bunch of things:
def test_items_have_a_limit
  max_item_count = MyApp::Application.config.limits.max_items
  max_item_count.times { Item.create! }
  assert_raises(LimitException) do
    Item.create!
  end
end
Sooner or later, someone will gripe. Why spend several minutes creating tons of pointless objects? It would be more efficient to simply change the limit:
def test_items_have_a_limit
  MyApp::Application.config.limits.max_items = 1
  Item.create!
  assert_raises(LimitException) do
    Item.create!
  end
end
Brilliant! Now your tests run much more quickly. Except there’s one problem: other tests seem to fail randomly now. Unlike your database, Redis, or that web service you mocked out, MyApp::Application.config is shared global state, and by changing it in a test, you have changed it for every test that runs after it. This is a particularly sneaky cause of indeterminate failures because the new test itself will never fail.
This isn’t limited to configuration data; you may run into the same issues if you change any object that persists beyond a given test case, such as object caches or dynamically generated code.
To avoid this problem, you’ll want to make sure you always set any shared state back. Here’s a handy method you could use:
# Temporarily override an attribute
def temporarily_set(object, attribute, new_value)
  # Store the original value
  old_value = object.send(attribute)
  begin
    # Set the new value, and yield to the test
    object.send("#{attribute}=", new_value)
    yield
  ensure
    # Ensure that the old value is replaced
    object.send("#{attribute}=", old_value)
  end
end
Rewriting the test so that it doesn’t pollute other tests is simple:
def test_items_have_a_limit
  temporarily_set(MyApp::Application.config.limits, :max_items, 1) do
    Item.create!
    assert_raises(LimitException) do
      Item.create!
    end
  end
end
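Outside of Rails, the same helper works on any object with attribute accessors. Here’s a self-contained check, using a made-up Limits struct in place of the real config:

```ruby
Limits = Struct.new(:max_items)

# Same helper as above: override an attribute for the duration of a block,
# restoring the original value even if the block raises.
def temporarily_set(object, attribute, new_value)
  old_value = object.send(attribute)
  begin
    object.send("#{attribute}=", new_value)
    yield
  ensure
    object.send("#{attribute}=", old_value)
  end
end

limits = Limits.new(100)

temporarily_set(limits, :max_items, 1) do
  raise "override not visible" unless limits.max_items == 1
end
raise "not restored" unless limits.max_items == 100

# The ensure clause restores the value even when the block blows up.
begin
  temporarily_set(limits, :max_items, 1) { raise "boom" }
rescue RuntimeError
end
raise "not restored after error" unless limits.max_items == 100
```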
Alternatively, this is a great time to use a generalized mocking library like Mocha or FlexMock. Let’s rewrite this test using Mocha’s stubs method to temporarily override the limit:
def test_items_have_a_limit
  MyApp::Application.config.limits.stubs(max_items: 1)
  Item.create!
  assert_raises(LimitException) do
    Item.create!
  end
end
This might look similar to our initial test, but Mocha isolates these changes by tracking the objects it has modified, and undoing the magic at the end of each test.
Symptoms: Adding a test caused other tests to fail.
Solution: Isolate changes to objects that persist beyond the test.
Avoiding Unspecified Behavior
Most operations are specified. Asking your computer to add the integers 1 and 2 will result in 3, or at least it should. There are, however, operations that are valid but unspecified. These are the source of some maddening intermittent failures.
Consider this simple test verifying that items send an email when marked done:
def test_marking_item_done_sends_email
  item = Item.first
  item.done!
  sent = ActionMailer::Base.deliveries.present?
  assert sent, "Expected marking an item done to send an email"
end
Run this a hundred times on your machine, and it may well pass. Run it on your build server, and maybe it fails. Perhaps most of the time the item fetched is not done, but once in a while it is already marked done, and no email is generated. How could this be? Well, if you’re using Postgres (or most SQL databases), it’s because we didn’t specify a sort order:
If sorting is not chosen, the rows will be returned in an unspecified order. — PostgreSQL 9.4.5 Documentation
It happens that this unspecified order tends to be the insertion order. Things that “tend” to behave one way should set off alarm bells when writing tests.
There are a few ways to make our test more resilient. One is to be more specific: we want an item that is not done, so say so:
def test_marking_item_done_sends_email
  item = Item.where(done: false).first
  # ...
end
That’s better, but it might be clearer to use either a fixture or a test factory like Factory Girl. Both approaches help ensure you test a known object. Rails’ fixture data lets you reference records by name, so you know exactly what you’re getting:
def test_marking_item_done_sends_email
  item = items(:undone_item)
  # ...
end
Test factories, like Factory Girl, make it easy to build records suited to your test:
def test_marking_item_done_sends_email
  item = create(:item, done: false)
  # ...
end
Both approaches have merits, and most importantly, both help you avoid unspecified behavior.
A big theme in unspecified behavior is ordering, be it sorting records with non-unique values, or the completion order of asynchronous jobs. If you can, make your tests resilient; otherwise, be explicit and avoid unspecified behavior altogether.
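For the sorting case, a unique tie-breaker turns “tends to” into “always”. A plain-Ruby sketch, where the Item struct stands in for a database record:

```ruby
# Three records sharing the same priority: sorting by priority alone leaves
# their relative order unspecified (SQL ORDER BY makes no promise for ties).
Item = Struct.new(:id, :priority)
items = [Item.new(3, 1), Item.new(1, 1), Item.new(2, 1)]

# Adding a unique column (id) as a tie-breaker fully specifies the order,
# so every run, on every machine, yields the same result.
sorted = items.sort_by { |item| [item.priority, item.id] }
raise "unexpected order" unless sorted.map(&:id) == [1, 2, 3]
```

The SQL equivalent is ORDER BY priority, id rather than ORDER BY priority alone.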
Symptoms: Tests tend to pass in one environment, and fail in another.
Solution: Avoid unspecified behavior, especially ordering.
Recap
There are a lot of reasons why tests may fail one day, but not the next.
- Isolate time when tests depend on the system clock.
- Isolate resources you control by resetting them to a known good state.
- Isolate resources you don’t control by mocking them out.
- Mock out changes to shared values (config, constants, etc.)
- Prefer explicit tests, and avoid unspecified behavior.
These are just a few common causes of intermittent failures. If you’re interested, we can explore some more horrible examples in the future. Until then, do you have any favorite examples?