Monday, June 1, 2009

When once is not enough




The D.R.Y. principle of software design is based on one of the most natural and intuitive ideas: once is enough. D.R.Y. stands for "Don't Repeat Yourself". It applies less to code than to the way ideas and concepts are represented in software systems. Every piece of knowledge, decision, policy, algorithm, etc., should be represented exactly once. Andy Hunt and Dave Thomas introduced the principle in their book "The Pragmatic Programmer", first published in 1999.
However, the D.R.Y. principle is extremely hard to follow consistently in practice. Commonly, the same algorithm is implemented independently in different parts of the system, for different objects, and the same concept is implemented differently (with the intent of producing the same results) in multiple places. Many bugs appear when a concept needs to change but not all of its representations are modified, or they are not all modified in the same way.
A common case is a simple list of things. It is natural to store such a list as rows in a database table, which also makes report generation easier when working strictly from the database data.
However, when writing code, it is desirable to have each item in the list available as an enumerated type. Working with an enumeration simplifies the code greatly. Because the list rarely changes, it also saves on database accesses.
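As a rough illustration, an enumerated type that mirrors a database list might look like the sketch below. `OrderStatus`, its ids and labels, and the `get(int id)` lookup are all hypothetical. The assumption is that each constant carries the primary key of its matching row in a status table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical example: a list of order statuses that is also stored
// as rows in a database table, mirrored here as a Java enum.
enum OrderStatus {
    PENDING(1, "Pending"),
    SHIPPED(2, "Shipped"),
    DELIVERED(3, "Delivered");

    // Primary key of the matching row in the (hypothetical) status table.
    final int id;
    final String label;

    OrderStatus(int id, String label) {
        this.id = id;
        this.label = label;
    }

    // Lookup table built once, when the enum class is initialized.
    private static final Map<Integer, OrderStatus> BY_ID = new HashMap<>();
    static {
        for (OrderStatus s : values()) {
            BY_ID.put(s.id, s);
        }
    }

    // Returns the constant for a database id, or null if there is none.
    static OrderStatus get(int id) {
        return BY_ID.get(id);
    }
}
```

With this in place, `OrderStatus.get(2)` returns `SHIPPED` and `OrderStatus.get(99)` returns `null`. Code can switch on the constants instead of querying the table. The ids, however, now live in two places.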
But storing the list both as a database table and as an enumerated type directly violates D.R.Y.: the list is now defined in two places. More importantly, this is a very likely source of bugs, because either definition can be updated without the other. Depending on how the list is used, these bugs can be extremely hard to track down.
While no good solution restores the D.R.Y. principle here, the check that the two implementations stay in sync can be automated. A unit test can verify that the two lists match and flag any mismatch.
To verify that the two lists are the same, the unit test must retrieve all items from the database and ensure that each item has a corresponding enumerated type. It should also run through all defined enumerated types and verify that each has a corresponding database entry. Here's the pseudo code:

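// Every database row must map to an enumerated type.
List<DatabaseItem> dbItems = retrieveAllFromDB();
for (DatabaseItem item : dbItems) {
    EnumeratedType type = EnumeratedType.get(item.id);
    assertNotNull("No enumerated type for DB id " + item.id, type);
}

// Every enumerated type must map to a database row.
for (EnumeratedType type : EnumeratedType.getAll()) {
    DatabaseItem dbItem = retrieveFromDB(type.id);
    assertNotNull("No DB row for " + type, dbItem);
}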
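Below is a self-contained sketch of the same two-way check. It can run without a database because of several assumptions: an in-memory `Map` of id to label stands in for `retrieveAllFromDB()`, and `OrderStatus` and `findMismatches` are hypothetical names. Rather than failing at the first problem, the sketch collects every mismatch so that a single run reports all of them.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class EnumSyncCheck {

    // Hypothetical enum mirroring a database lookup table.
    enum OrderStatus {
        PENDING(1), SHIPPED(2), DELIVERED(3);

        final int id;

        OrderStatus(int id) {
            this.id = id;
        }

        // A linear scan is fine for a short, rarely changing list.
        static OrderStatus get(int id) {
            for (OrderStatus s : values()) {
                if (s.id == id) {
                    return s;
                }
            }
            return null;
        }
    }

    // Compares database rows (id -> label) against the enum in both
    // directions and returns one message per mismatch; empty means in sync.
    static List<String> findMismatches(Map<Integer, String> dbRows) {
        List<String> problems = new ArrayList<>();
        // Every database row needs a matching enum constant.
        for (Integer id : dbRows.keySet()) {
            if (OrderStatus.get(id) == null) {
                problems.add("DB row " + id + " has no enum constant");
            }
        }
        // Every enum constant needs a matching database row.
        for (OrderStatus s : OrderStatus.values()) {
            if (!dbRows.containsKey(s.id)) {
                problems.add("Enum " + s + " has no DB row");
            }
        }
        return problems;
    }

    public static void main(String[] args) {
        // Stand-in for retrieveAllFromDB(): a row with id 4 was added to the
        // table and DELIVERED's row was deleted, but the enum was not updated.
        Map<Integer, String> dbRows = new TreeMap<>();
        dbRows.put(1, "Pending");
        dbRows.put(2, "Shipped");
        dbRows.put(4, "Returned");
        for (String problem : findMismatches(dbRows)) {
            System.out.println(problem);
        }
        // Prints:
        // DB row 4 has no enum constant
        // Enum DELIVERED has no DB row
    }
}
```

In an actual unit test, the last step would be a single assertion that the returned list is empty, using the list itself as the failure message.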
While this is not as good as truly following the D.R.Y. philosophy, it prevents many errors and makes bug hunting a lot simpler and more predictable. One substantial drawback is that it requires more code and more maintenance. Still, it is miles better than simply defining the same objects in multiple places.