Saturday, March 03, 2007

Optimizing the Message Tree with JPA

The trouble with recursive tree structures is that databases tend to give poor performance implementing them. Further, JPA doesn't really natively do trees (so far as I can tell). Fortunately, morons.org solved this problem a very long time ago. My solution was to associate each collection of messages with a single "root" (though it isn't a root in the true sense of the word) and with each other. If you happened to have "connect by" syntax with your database, you could take advantage of it by virtue of the parent_id column. If you didn't, you could always do a fairly quick pass through the full collection of messages you got off the single root. The latter is exactly what Tally Ho will be doing with JPA.

Key to making this work is denoting the 1:1 relationship to the parent as fetch=FetchType.LAZY.

When I first tried this with Glassfish Persistence (aka Toplink Essentials) it seemed to be ignoring my request to use FetchType.LAZY. As soon as the first object loaded, it loaded its parent. And the parent of that object. And so on, recursively. Toplink would issue one SELECT statement for every single object in the database for the "root" in question. Obviously the performance of this strategy really sucked.

It turns out that for proper behaviour you need to turn on something the Toplink folks call "weaving", which I suspect is actually just a fancy Oracle term for Java bytecode instrumentation. To do this, you add the property toplink.weaving to persistence.xml and set its value to true. You must also add the JVM argument -javaagent:{path to libs}/toplink-essentials-agent.jar to your running configuration.

Once I did this, Toplink would happily wait to fetch any parent objects until they were asked for. And by virtue of all parent/child relationships being collected inside the same "root", when you first ask for a parent, it has already been loaded in memory, so an additional trip to the database is not required.

The downside is that the "children" List on the Message won't get populated, and if you let JPA handle the population, it will again make a lot of trips to the database to figure out which children belong to which parent... it has no way of knowing my rule that all of the relationships among the children are contained within the confines of that one root.

Fortunately for the needs of Tally Ho, this problem is easily mitigated. We simply mark the children List with @Transient to prevent JPA from trying to do anything with it at all. Then we pull this trick in MessageRoot:

public List getMessages() {
if (!messagesFixed) {
fixMessages();
}
return messages;
}

private void fixMessages() {
for (Message message : messages) {
if (message.getParent() != null) {
message.getParent().getChildren().add(message);
}
}
messagesFixed = true;
}


The only reason we can get away with this is that we never look at a collection of Messages without getting it from the MessageRoot. In the cases where we do look at a single Message, we don't look at its children.

It's a tiny bit hackish, but the performance improvement is tremendous, so it's worth it. Over the Internet, through my ssh tunnel to my server, 180 message objects were loaded and linked together correctly in about 650ms. I suspect this will improve substantially when the database is on the local host.

Labels: , , ,


This page is powered by Blogger. Isn't yours?

Subscribe to Posts [Atom]