Friday, September 28, 2007

Never Start With the Data Model

My bias is for object oriented design and domain modeling. I am convinced that you don't build great software by starting design with a data model. I think starting with the data model is just about the worst idea and pretty much guarantees mediocre software. The best systems start with a domain model and work through the various layers of the system down to the database, not the other way around.

The database should be left until as late in development as possible. In practice that rarely happens, because creating the database is usually one of the first things on a project task list. Instead we should be asking: how soon do we really need a database? The unit tests will be better and faster when they don't need a database. You know that data access should be encapsulated behind an interface. Why not leave the database until we know what the data requirements of an at least partially working system are?
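To make the "behind an interface" point concrete, here is a minimal sketch in Java. Every name in it (Account, AccountRepository, InMemoryAccountRepository, WithdrawalService) is invented for illustration and does not come from any real project. It assumes a toy banking domain and shows one way the business code could be written and tested before any database exists.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Optional;

// Hypothetical domain object: business behavior only, no persistence code.
final class Account {
    private final String id;
    private long balanceCents;

    Account(String id, long balanceCents) {
        this.id = id;
        this.balanceCents = balanceCents;
    }

    String id() { return id; }
    long balanceCents() { return balanceCents; }

    // A business rule expressed in domain terms.
    void withdraw(long amountCents) {
        if (amountCents <= 0) {
            throw new IllegalArgumentException("amount must be positive");
        }
        if (amountCents > balanceCents) {
            throw new IllegalStateException("insufficient funds");
        }
        balanceCents -= amountCents;
    }
}

// Hypothetical interface: the only thing the rest of the system knows about storage.
interface AccountRepository {
    Optional<Account> findById(String id);
    void save(Account account);
}

// Stand-in implementation backed by a map; usable until a real database is needed.
final class InMemoryAccountRepository implements AccountRepository {
    private final Map<String, Account> accounts = new HashMap<>();

    @Override
    public Optional<Account> findById(String id) {
        return Optional.ofNullable(accounts.get(id));
    }

    @Override
    public void save(Account account) {
        accounts.put(account.id(), account);
    }
}

// Hypothetical application service, written against the interface only.
final class WithdrawalService {
    private final AccountRepository accounts;

    WithdrawalService(AccountRepository accounts) {
        this.accounts = accounts;
    }

    void withdraw(String accountId, long amountCents) {
        Account account = accounts.findById(accountId)
            .orElseThrow(() -> new IllegalArgumentException("unknown account: " + accountId));
        account.withdraw(amountCents);
        accounts.save(account);
    }
}
```

A database-backed implementation of the same interface could be swapped in later without touching Account or WithdrawalService, and the unit tests can keep using the in-memory version.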

Honestly, I can't think of a harder way to try to conceptualize a system than through a data model. Do your users understand a database schema? Is a database schema the model of your system you want to carry around in your head? Great software requires a model that can be used to communicate with users, developers, product owners, and domain experts. I don't think a data model is how any business person conceptualizes their business processes. What you need is a consistent, coherent domain model that can be used to make sense of a system at any level. You must be able to "peel the onion" to discover details as needed. Design by data model immediately forces all the nasty, gnarly implementation details front and center. It is a well understood user interface design principle that bad designs expose details of the data model via data entry screens. The worst designs percolate the data model from top to bottom.

The best software solutions model a problem in the problem domain. A data model is a solution in the database domain, not the problem domain. So why is the database schema where design so often starts? Maybe it's the mainframe mentality in big corporations, or naive developers who don't really understand object orientation, or just a general ignorance of object oriented design and domain modeling. Eric Evans wrote a great book on domain driven design that is not as widely read as it deserves to be. In that book he talks about the "ubiquitous language" of a software project -
"A domain model can be the core of a common language for a software project. The model is a set of concepts built up in the heads of people on the project, with terms and relationships that reflect domain insight. These terms and interrelationships provide the semantics of a language that is tailored to the domain ..."
I don't see such a language ever coming from a data model.

What prompted me to write this was a post touting the new release of an "active record" object relational mapping framework for Java and some controversy it generated in this post. As you can probably guess, I'm not really a fan of the active record design pattern. But reading the posts and the comments, and knowing how popular Ruby on Rails is, it's clear that lots of developers are very big fans.

I don't get it. Maybe it's because active record is so easy to understand. I suppose active record might work for small systems. But I would never want to use it on any even moderately sized enterprise software project. I see enough ugly code, highly coupled, tangled classes, and mangled hierarchies without database access code mixed into my business classes and immediately coupling my design to the database. I shudder to imagine the maintenance headaches it would cause. At least I'm in good company -
"Another argument against Active Record is the fact that it couples the object design to the database design. This makes it more difficult to refactor either design as a project goes forward."

Martin Fowler, Patterns of Enterprise Application Architecture
"In an object oriented program, UI, database, and other support code often gets written directly into the business objects. Additional business logic is embedded in the behavior of the UI widgets and database scripts. This happens because it is the easiest way to make things work, in the short run.

When the domain-related code is diffused through such a large amount of code, it becomes extremely difficult to see and to reason about. Superficial changes to the UI can actually change business logic. To change a business rule may require meticulous tracing of UI code, database code, or other program elements."

Eric Evans, Domain-Driven Design
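The coupling these quotes describe can be avoided by keeping schema knowledge out of the business class. Here is a rough sketch in Java, in the spirit of what Fowler calls a Data Mapper. The Customer class, the CustomerRowMapper class, and the column names CUST_NM and PREF_FLG are all made up for illustration, and a plain Map stands in for a database row so the example runs without a database.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical domain class: shaped by the business, with no column names in sight.
final class Customer {
    private final String name;
    private boolean preferred;

    Customer(String name, boolean preferred) {
        this.name = name;
        this.preferred = preferred;
    }

    String name() { return name; }
    boolean isPreferred() { return preferred; }

    // Business behavior lives here, not in persistence code.
    void promote() { preferred = true; }
}

// Hypothetical mapper: the only class that knows the (invented) column names.
// A Map stands in for a database row to keep the sketch self-contained.
final class CustomerRowMapper {
    static final String NAME_COLUMN = "CUST_NM";
    static final String PREFERRED_COLUMN = "PREF_FLG";

    Map<String, Object> toRow(Customer customer) {
        Map<String, Object> row = new HashMap<>();
        row.put(NAME_COLUMN, customer.name());
        // The schema's Y/N flag convention is translated here and nowhere else.
        row.put(PREFERRED_COLUMN, customer.isPreferred() ? "Y" : "N");
        return row;
    }

    Customer fromRow(Map<String, Object> row) {
        String name = (String) row.get(NAME_COLUMN);
        boolean preferred = "Y".equals(row.get(PREFERRED_COLUMN));
        return new Customer(name, preferred);
    }
}
```

With this split, renaming a column or changing the flag encoding touches only the mapper, and refactoring Customer does not force a schema change. In an active record design both concerns would sit in the same class.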

I hope the fans of active record know what they could be getting themselves into, but as usual I doubt that is the case.


Anonymous said...

There are data models elsewhere too, like XML. The SOA people have a reason to promote the contract-first approach but this may result in issues similar to first creating your data model.

Richard Hansen said...

Good point about the possibility of other data models. Having done a bunch of web service development, I think I'd probably consider that interface development. Interfaces are generally good to start developing early on. But I would still hope that more thought has been put into the web service xml than just copying a database schema.

Shaghouri said...

I totally agree with what you are saying. The current project that I have been working on for the past couple years is a living example of the failure of the "data model" first mentality. We suffer from all the symptoms that you mentioned, starting from rigidness and tight-coupling to the loss of the ubiquitous language. Needless to say, it has been a nightmare to maintain or upgrade such a system.

This is probably a "job security" thing for some workers, but man, this security comes at a very expensive cost (both to the company, and to the rest of us who are trying to maintain their sanity!!)

Richard Hansen said...

Yes, not all ways to achieve job security are equally satisfying! A big problem I have seen is that what you can end up with is a business system that the business users don't understand and find hard to use. The system is supposed to solve a problem in the business domain but the only people who understand it are developers. The developers only understand the system from long intimate association with it. So development ends up owning a process that is supposed to belong to the business.

Anthony Lewandowski said...

Excellent article. I’ve been working for years to get developers to understand that how one starts a project very often dictates how the project finishes, or how it is maintained. Almost all software products are about solving business problems - not about solving data problems. Users think in terms of business concepts. The product will be organized and displayed to users along business concept lines. It will be extended by addressing additional business problems. The data model is a behind-the-scenes thing that only developers/DBAs should worry about. I think a better path to job security is to deliver software that addresses the business needs of customers, extending it when they need more functionality, and reusing all but the UI for the next customer.