
Test all levels, god dammit!


Recently I heard a discussion about how testing of a system should be implemented. The system follows a standard architecture found in many, many places.


One of my colleagues proposed the following solution:

Let’s test at the GUI level. If we do that, we will have everything tested, as all layers have to work correctly in order to give us a properly rendered page.

Drawing on my modest testing experience, I can say this is a very bad idea. It is very tempting, but the approach has several flaws, and hopefully you won’t make this mistake.

Observable State

The first major flaw is that at the highest level of abstraction (the GUI) we only observe a small piece of reality. Let’s say you have a table in your HTML with aggregated data (sum of all income, etc.). There is quite a lot of data, so you conclude that since the values match your expected state, everything is in order.

Invalid scenario

This is what really happened.

  1. Database was created
  2. Database was filled with test input vector
  3. System executed several test script commands
  4. PHP retrieved aggregated values from tables
  5. Defective PHP command removed half of the values from the database
  6. HTML contains properly calculated data (which was captured in step 4)
  7. Database is torn down
  8. New test begins
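
The scenario above can be reproduced in miniature. This is only a sketch under stated assumptions: Python with an in-memory SQLite database stands in for the PHP/database stack, and the table `income`, the function `render_summary` and the element id `total` are all hypothetical names invented for illustration.

```python
import sqlite3

def set_up():
    """Steps 1-2: create the database and load the test input vector."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE income (id INTEGER PRIMARY KEY, amount INTEGER)")
    db.executemany("INSERT INTO income (amount) VALUES (?)",
                   [(100,), (200,), (300,), (400,)])
    return db

def render_summary(db):
    """Hypothetical page handler standing in for the PHP layer."""
    # Step 4: the aggregate is read while the data is still intact.
    total = db.execute("SELECT SUM(amount) FROM income").fetchone()[0]
    # Step 5: the defect, half of the rows are removed afterwards.
    db.execute("DELETE FROM income WHERE id % 2 = 0")
    # Step 6: the page shows the value captured in step 4.
    return f"<td id='total'>{total}</td>"

db = set_up()
html = render_summary(db)

# GUI-level check: only looks at the rendered page, so it passes.
gui_check_passes = "<td id='total'>1000</td>" in html

# Database-level check: looks at the state the GUI never shows, so it fails.
rows_left = db.execute("SELECT COUNT(*) FROM income").fetchone()[0]
db_check_passes = rows_left == 4
```

The GUI-level check is satisfied even though two of the four rows are gone; only a check at the database level notices the damage before tear-down hides it.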

Covering Infections

This time, let’s presume that virtually all state that is relevant to the system is visible in the GUI. What if there is not one, but two defects, when one covers the other?

Invalid scenario

  1. SetUp() logic completed successfully
  2. Defective DB stored procedure, which was supposed to add $10 to every account, only added $5
  3. Defective (due to outdated logic) PHP function applies a bonus of $5 to every value it reads
  4. Correct values are observed in HTML
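
A minimal sketch of two defects covering each other, again in Python as a stand-in for the real stack. The functions `add_deposit`, `read_balance` and `render` are hypothetical and exist only to mirror the steps above.

```python
def add_deposit(balance):
    """Stands in for the stored procedure: should add 10, adds only 5 (defect)."""
    return balance + 5

def read_balance(stored):
    """Stands in for the PHP read function: applies an outdated bonus of 5 (defect)."""
    return stored + 5

def render(balance):
    """Stands in for the HTML layer."""
    return f"<span class='balance'>{balance}</span>"

start = 100
stored = add_deposit(start)      # 105 in the "database", should be 110
shown = read_balance(stored)     # 110 again, by accident
page = render(shown)

# GUI-level test: the two defects cancel out, so it passes.
gui_test_passes = page == "<span class='balance'>110</span>"

# Per-layer tests: each defect is caught on its own.
db_layer_test_passes = stored == start + 10
read_layer_test_passes = read_balance(100) == 100
```

Fix either defect alone and the GUI test starts failing, which is a confusing way to learn that there were two bugs all along.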

Silent Defects

Remember, the more you rely on functional/system tests, the more you are tied to a specific environment and specific input data. This time we have a low-level function that returns the week of the year, called YearWeek(). It is called many times, and during our GUI tests we never encountered any problems with it.

Unfortunately, after some time a QA engineer finds out that in one part of the system the difference between week number A and week number B (where B always falls later in the year) is negative! How come? Assumption: B > A. Result: B - A < 0!?

What happened is that border weeks in December/January of one year are treated as weeks of the previous/next year. To fix this problem you have to return both the year and the week. Let’s assume that this function is used across the whole system. Do you think such a change of logic will be easy to incorporate in multiple places in the system?
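
The effect is easy to reproduce. The sketch below assumes ISO 8601 week numbering, which may or may not be the rule the original YearWeek() used; `year_week_number`, `year_and_week` and `weeks_between` are hypothetical names. A unit test with dates around New Year would have caught this long before any GUI test did.

```python
from datetime import date, timedelta

def year_week_number(d):
    """Week of the year only, the shape of the problematic function."""
    return d.isocalendar()[1]

def year_and_week(d):
    """Returns both the (ISO) year and the week, as the fix requires."""
    iso = d.isocalendar()
    return (iso[0], iso[1])

def weeks_between(a, b):
    """Whole weeks from the week containing a to the week containing b."""
    monday_a = a - timedelta(days=a.weekday())
    monday_b = b - timedelta(days=b.weekday())
    return (monday_b - monday_a).days // 7

a = date(2021, 1, 1)   # a Friday that belongs to ISO week 53 of 2020
b = date(2021, 3, 1)   # later in the same calendar year, ISO week 9 of 2021

# B is later than A, yet the difference of the bare week numbers is negative.
naive_difference = year_week_number(b) - year_week_number(a)

# With the year carried along, the weeks can be compared and subtracted safely.
fixed_difference = weeks_between(a, b)
```

Note that the return type changes from a number to a pair, which is exactly why every caller across the system has to be touched.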

Every insufficiently tested piece of functionality is bound to show its weakness some day, which might require a lot of costly changes and delays.

Debugging Paranoia

Dang it! Your test environment reported that one test has failed. Value A in textbox T is invalid, as we expected B. You know what you will do for the next n hours? You will dig from the very top of your system (GUI) to the very bottom (database). Madness!

With tests for every layer, one or more tests will fail. Based on which non-GUI tests failed, you can quickly determine at which layer the error appeared. If all layers reported errors, someone introduced a defect in the database. If only the GUI tests failed, you know to take a closer look at the HTML/JavaScript/etc.
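
That reasoning can be written down as a tiny helper. This is a sketch only: the layer names and the function `suspect_layer` are hypothetical, and it assumes a defect in a lower layer makes the tests of every layer above it fail as well.

```python
# Layers ordered from the bottom of the system to the top (assumed layout).
LAYERS = ["database", "business logic", "gui"]

def suspect_layer(results):
    """Return the lowest layer whose tests failed, or None if all passed.

    results maps a layer name to True (its tests passed) or False (failed).
    """
    for layer in LAYERS:
        if not results[layer]:
            return layer
    return None

# All layers report errors: start looking in the database.
all_failed = suspect_layer(
    {"database": False, "business logic": False, "gui": False})

# Only the GUI tests failed: look at the HTML/JavaScript.
only_gui_failed = suspect_layer(
    {"database": True, "business logic": True, "gui": False})
```

With GUI-only testing there is a single result to feed in, so there is nothing to narrow the search down with.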


Testing the GUI itself is tiresome. First, you probably need to learn and use some framework before you can even start testing. Second, such tests aren’t the fastest ones, so you will need more worker machines if you wish to follow the continuous integration path. Third, these tests are very brittle, especially when comparing screen-to-screen. Any change in the design made by an artist, or a change in the environment settings (resolution, bpp, browser settings), can lead to an avalanche of failed tests.