The Select Star Problem

The thing that always bugs me about using any sort of data layer in Java is what I call the Select * problem. Whether you're using a roll-your-own DAO pattern or Entity Beans or an O/R bridge like JDO or Hibernate or even some other type of mapping layer like iBatis, you always end up doing "select * from tableName" somewhere deep inside your app, then throwing the stuff you don't need away. Sure you can try to design around this and you can write your code to try to make sure that you don't get more data than you need to get from a DB, but generally I think developers always end up defaulting to wasteful general purpose methods when something more exact is needed.

Take for example the classic Data Access Object pattern. You basically have a class that sits in front of your database and passes collections of Value Objects filled with data (or whatever they're called now... Data Transfer Objects? Who knows.) The Value Objects can be as simple as an ArrayList of HashMap name-value pairs, or you can go the full-on JavaBean route and pass back collections of custom JavaBeans (that match your tables) that you've populated with data from a database.

As an aside - what I'm doing (as I've written about in the past) is using XML instead of traditional Value Objects. I load up a DOM in my middle-tier and use that to carry data before passing it off to the presentation layer. I do this for a variety of reasons that have to do with supporting multiple mobile devices and using tools like XPath and XSL to help augment JSP/JSTL. I call these XML Access Objects (XAOs) but regardless, it's the same concept.

Back to my point. Say you need the email address from a customer. Well, because you're using an DAO or Hibernate or something, instead of doing a more precise "select email from users where id=1243", your SQL is separated away from your main logic and you end up calling the more generic getUser(int id) method. This method then returns an fully-populated object of some sort with all the values from the user table for that id: "select * from users where id=1234". Any of you who've done any server-side Java programming using a middle tier should find this familiar. It's horribly wasteful, but it's also quick to use and very object oriented - I don't have to think about the DB, I just think about some magical object that has the data I need.

This drives me insane. Just the wastefulness of it. I love writing "shallow" server-side apps like this weblog because I can be very precise with the data I need and get. But as the layers on the back end become more pronounced as complexity enters the system, the data access methods become more and more generic and you end up throwing away a lot of the data you queried for. Maybe it's just me wanting to "pre-optimize" a solution, but the surest way to create a non-scalable web app is to beat your DB to death with extraneous queries, no?

I think most people's response would be to "not worry about it" or let some magic silver-bullet middleware deal with it. Or better yet, wait until later to look for pain-points and then refactor the queries to be more efficient. But I just can't get over all the thrashing I'm doing on the back end. Trying to solve this problem, however, means lots and lots of methods for every conceivable use case (getUserEmail(id)) and that starts to become a maintenance nightmare. I don't know about you, but I keep forgetting what methods I've already written are. Add some more developers to the project and stir and suddenly separating the SQL from Java has now become exponentially more complex.

I wonder if there's a good solution to this? My druthers would be to just throw the damn SQL code in the code where ever I needed it. If Java wasn't so hostile to the long parameterized Strings that SQL entails I may have just done that. But I really can't. If maintaining dozens of tiny DAO methods are bad, routing through a hundred classes to find embedded SQL is even worse.

Anyways, I thought I'd just rant about this since it's on my mind (and on my screen). Someday someone will figure out an object/data mapping nirvana... Maybe it's already out there but I don't know about it because of my resistance to 30 different XMl config files and SQL-like querying languages. Tip for all O/R writers: SQL is perfect. Don't try to replace it, because you can't.

What's your thoughts? What am I missing?


Update: Three days later... Okay, I'm over it. On the drive to work this morning I realized I was overly sensitive to Michael and Gavin's comments (below) and I shouldn't have been. I think I was just grumpy that day (and then stubborn for three days after). I apologize to both of them.

< Previous         Next >