This is exactly the feeling I got when I took a peek at the database I’m working with.
I thought, oh gosh, I made a mistake when I chose this work… then I thought, well, look at the positive side:
there is lots of room for improvement here. To be honest, I can’t wait to do some database refactoring, if they let me get my hands on it.
Whenever you mention database refactoring, you can’t help but notice goosebumps on some project leaders.
It is like mentioning a curse. We are all so afraid of Murphy’s Law and of the main engineering principle,
“If it’s working, don’t touch it!”, that we endure whatever we have for the sake of safety.
As usual I will keep this on my blog as a reminder for future use.
The list below is taken from Scott Ambler’s site: http://www.agiledata.org/essays/databaseRefactoringSmells.html
1. Multi-purpose column. If a column is being used for several purposes it is very likely that extra code exists to ensure that the source data is being used the “right way”, often by checking the values of one or more other columns. An example is a column used to store either someone’s birth date if they’re a customer or their start date if they’re an employee. Worse yet, you are very likely constrained in the functionality that you can now support. For example, how would you store the birth date of an employee?
2. Multi-purpose table. Similarly, when a table is being used to store several types of entities there is likely a design flaw. An example would be a generic Customer table that is used to store information about both people and corporations. The problem with this approach is that data structures for people and corporations are different – people have a first, middle, and last name for example whereas a corporation simply has a legal name. A generic Customer table would have columns which are NULL for some kinds of customers but not others.
3. Redundant data. Redundant data is one of many serious problems in operational databases because when data is stored in several places the opportunity for inconsistency occurs. For example, it is quite common to discover that customer information is stored in many different places within your organization; in fact, many companies are unable to put together an accurate list of who their customers actually are. The problem is that in one table John Smith lives at 123 Main Street and in another table at 456 Elm Street. In this case this is actually one person who used to live at 123 Main Street but who moved last year; unfortunately, John didn’t submit two change of address forms to your company, one for each application which knew about him.
4. Tables with many columns. When a table has many columns it is indicative that the table lacks cohesion, that it’s trying to store data from several entities. Perhaps your Customer table contains columns to store three different addresses (shipping, billing, seasonal) or several phone numbers (home, work, cell, …). You likely need to normalize this structure by adding Address and PhoneNumber tables.
5. Tables with many rows. Large tables are indicative of performance problems; for example, it’s very time-consuming to search a table with millions of rows. You may want to split the table vertically by moving some columns into another table, or split it horizontally by moving some rows into another table. Both strategies reduce the size of the table, potentially improving performance.
6. “Smart” columns. A “smart column” is one in which different positions within the data represent different concepts. For example, if the first four digits of the client ID indicate the client’s home branch, then client ID is a smart column because you can parse it to discover more granular information (e.g. home branch ID). Another example includes a text column used to store XML data structures; clearly you can parse the XML data structure for smaller data fields. Smart columns often need to be reorganized into their constituent data fields at some point so that the database can easily deal with them as separate elements.
7. Fear of change. If you’re afraid to change your database schema because you’re afraid to break something, for example the fifty applications which access it, then that’s the surest sign that you need to refactor your schema. Fear of change is a very good indication that you have a serious technical risk on your hands, one that will only get worse over time. My advice is to embrace change.
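
To make smell 6 a bit more concrete for my future self, here is a minimal sketch of breaking a “smart” client ID into its constituent field. It is not from Scott Ambler’s essay. The table name `client`, the column `home_branch_id`, the sample IDs, and the helper `split_smart_client_id` are all hypothetical, and I am assuming the ID is stored as text so its first four characters are the home branch. It uses Python’s built-in sqlite3 module with an in-memory database.

```python
import sqlite3


def split_smart_client_id(conn):
    """Give the home branch its own column instead of parsing client_id.

    Assumption (hypothetical schema): client.client_id is TEXT and its
    first four characters encode the client's home branch.
    """
    # Add the new column, then backfill it from the smart column.
    conn.execute("ALTER TABLE client ADD COLUMN home_branch_id TEXT")
    # SQLite's substr() is 1-indexed: start at position 1, take 4 characters.
    conn.execute("UPDATE client SET home_branch_id = substr(client_id, 1, 4)")
    conn.commit()


# Throwaway in-memory database with made-up sample data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE client (client_id TEXT PRIMARY KEY, name TEXT)")
conn.executemany(
    "INSERT INTO client (client_id, name) VALUES (?, ?)",
    [("0042000187", "John Smith"), ("0107000033", "Jane Doe")],
)

split_smart_client_id(conn)

rows = conn.execute(
    "SELECT client_id, home_branch_id FROM client ORDER BY client_id"
).fetchall()
print(rows)
```

The old `client_id` column is left untouched here, so anything that still parses it keeps working while queries move over to `home_branch_id`. On a real schema you would also have to keep the two in sync during the transition, which this sketch does not attempt.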