Sunday, November 04, 2007

Time for a Divorce

I've been using Toplink Essentials as the JPA provider for Tally Ho almost exclusively since the project began, except for a quick look at OpenJPA. This is in part because of my experience with the commercial Toplink product: I know that Oracle's Toplink is a mature product (having started out in the early '90s as a Smalltalk persistence provider), and I am comfortable working with it. Unfortunately, the open source Toplink Essentials product does not live up to the promise of Toplink. I've reached the point where I'm tired of coding around its bugs, and now that there are other, healthier projects out there, I shouldn't have to.

That got me to thinking: a lot of us developers use a lot of open source software. How do we choose which packages we want to use? Obviously whatever we choose has to be a good technical fit for our needs... what's the point if it doesn't do the job we're after? But now it occurs to me that open source software has to meet a particular social need as well. One way we can gauge a project's health is by how strong it is socially. How interested are people? Is the project active? Are people excited about the project? Excited enough to fix bugs?

It's a little hard to compare apples to apples in this case, but let's look at a couple things and try to relate them as best we can. Toplink Essentials is maintained as part of the Glassfish project.

In the last 30 days at the time of this writing, the folks working on it have fixed 10 bugs. In that same amount of time, 15 bugs were opened (or changed and left in an opened state). About 53 messages have been posted on the discussion forum. The oldest unresolved bug has been open for about a year and 10 months.

In that same amount of time, the OpenJPA folks have resolved 19 bugs while 20 have been opened. The mailing list has had about 215 posts. The oldest unresolved bug is a year and 4 months old (though it was touched 3 months ago). OpenJPA uses Jira, which makes meaningful metrics a bit easier to produce: the average age of an unresolved bug over the last month is about 3 months, and that figure has been fairly consistent.

(I gave up trying to compute the average unresolved age of bugs for Toplink Essentials. It's just too annoying to figure out if the bug tracking tool doesn't do it for you.)
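For a rough side-by-side, the 30-day counts quoted above can be boiled down to a single "resolved per opened" ratio. The sketch below is only arithmetic on those numbers; the class and method names are made up for illustration, and the Toplink Essentials "opened" count also includes bugs that were changed and left open, so the two ratios are not strictly comparable. It works out to roughly 0.67 for Toplink Essentials and 0.95 for OpenJPA.

```java
import java.util.Locale;

// Hypothetical helper, not part of any bug tracker's API.
class BugFlow {

    /** Ratio of bugs resolved to bugs opened over the same window. */
    static double resolvedPerOpened(int resolved, int opened) {
        if (opened <= 0) {
            throw new IllegalArgumentException("opened must be positive");
        }
        return (double) resolved / opened;
    }

    public static void main(String[] args) {
        // Counts taken from the 30-day figures quoted in the post.
        double toplink = resolvedPerOpened(10, 15);
        double openJpa = resolvedPerOpened(19, 20);
        System.out.println(String.format(Locale.ROOT,
                "Toplink Essentials: %.2f resolved per opened", toplink));
        System.out.println(String.format(Locale.ROOT,
                "OpenJPA:            %.2f resolved per opened", openJpa));
    }
}
```

A ratio near 1.0 means a project is roughly keeping pace with its incoming bugs; it says nothing about how old the backlog already is.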

It is probably the case that most open source projects (and probably closed ones too) have a few ancient bugs gathering dust. I think that it's more interesting to look at what a project has been doing recently, like in the last 30-180 days. Are they keeping up with their bug backlog? Is there an active community? Are you likely to get help if you ask for it? Of the bugs that come in, what percentage get fixed and what percentage get dumped in the attic?

And perhaps the most important criterion of all: are they fixing MY bug?

While I wasn't watching, OpenJPA reached a 1.0.0 release. It's available under the Apache 2 license from a Maven 2 repository. They fixed the bug I opened earlier this year (within a day even). It is full-featured and even has an extensive manual. Though, like Toplink, their ant task doesn't work very well.

I used to be concerned about the large number of dependencies that OpenJPA has, but now that the project is building with Maven 2, it's much less of a concern for me. It isn't necessary to go manually fetch anything to build the project, since Maven 2 takes care of all the direct and transitive dependencies. One thing I did have to manually tweak was to force inclusion of commons-collections 3.2 in my pom.xml, because something else in my project depends on an earlier version of commons-collections, and OpenJPA needs a later version.

So it's time to give Toplink one final heave-ho. My reasons for sticking with it have now been outweighed by my need for compile-time weaving that works and a project where problems are likely to be fixed within my lifetime. It's time for Toplink and me to start seeing other people.

New releases of Tally-Ho will be using OpenJPA as the persistence provider... just as soon as I get all the unit tests passing.



Tuesday, August 14, 2007

Toplink Query Deficiencies

In any relational setting, it is wise to avoid this situation if you can:

TABLE_ONE
---------
object_id serial primary key not null
foo_id integer not null references TABLE_TWO(object_id)
bar_id integer not null references TABLE_TWO(object_id)

TABLE_TWO
---------
object_id serial primary key not null
some_field varchar(80) not null

A single join from table_one to table_two cannot return the complete set of information, because foo_id and bar_id can reference different rows in table_two; you end up joining table_two twice. It's probably better to reorganize the relationship if you can.

The exception is if you don't necessarily need both fields. For example, if you can do without knowing anything about bar_id in most cases, you could always lazy load that field. That is, if lazy loading for 1:1 fields works in your JPA provider.

It still doesn't work with Toplink, and these bugs are STILL open:

https://glassfish.dev.java.net/issues/show_bug.cgi?id=2546
https://glassfish.dev.java.net/issues/show_bug.cgi?id=2554

The consequence is that I cannot statically weave my model, so my only choice for now is to mark the fields as @Transient and ignore them. (I have a feature that tracks the changes made to every entry in a table with a changer that references an Account... I just won't track the changer until I have time to come up with a better way of doing it or the glassfish folks fix their shit.)



Sunday, March 11, 2007

JPA + J2SE Servlet Containers = Impossible

Well, somewhat impossible depending on what you need to do.

If you need dynamic weaving of your classes at runtime, give up. It won't work, not with a J2SE servlet container.

The problem seems to be that the javaagent (for both OpenJPA and for Toplink Essentials) is incapable of coping with classes that are loaded by a different classloader. If you manage to get your javaagent JVM argument to work (which will require annoying experiments with quotation marks and adding stuff to your classpath), then all of your Entity classes will fail to load because the agent can't find them. They don't exist as far as it is concerned, because they are loaded by a different classloader.
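The general effect is easy to reproduce without any JPA provider. The sketch below is not the agent's actual code path (that is an assumption about what goes wrong); it only shows that a class visible to one classloader simply does not exist for a loader that cannot reach it. The class name and the canSee helper are made up for the illustration.

```java
// Illustration of classloader visibility in general, not of any JPA agent.
class ClassLoaderVisibility {

    /** Returns true if the given loader can resolve the named class. */
    static boolean canSee(ClassLoader loader, String className) {
        try {
            // initialize=false: we only care whether the class can be found.
            Class.forName(className, false, loader);
            return true;
        } catch (ClassNotFoundException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        String name = ClassLoaderVisibility.class.getName();

        // The loader that actually loaded this class can find it again.
        ClassLoader owner = ClassLoaderVisibility.class.getClassLoader();

        // A loader with no parent delegates only to the bootstrap loader,
        // so application classes are invisible to it.
        ClassLoader isolated = new ClassLoader(null) { };

        System.out.println("owner sees it:    " + canSee(owner, name));
        System.out.println("isolated sees it: " + canSee(isolated, name));
    }
}
```

In a servlet container the roles are reversed in an unhelpful way: the webapp's entity classes live in the webapp classloader, while anything started via a JVM argument lives closer to the system classloader and cannot look "down" into the webapp.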

So if you want to use lazy loading of 1:1 fields with a J2SE container, you must either do static weaving or give up hope. Your other option is to use a full-blown J2EE container in all its bloated glory.

Screw this. I'm going to bed.



Tuesday, March 06, 2007

Toplink Essentials: Not Ready for Prime Time

I really really like the Toplink commercial product. I think it's tremendously powerful, flexible, well-supported, and rock-solid. It does exactly what it is supposed to do, every time.

I wish I could say the same for Toplink Essentials. It feels like the open source branching was done as a rush-job, leaving large portions of the necessary code and tooling undone. My simple test case to take advantage of "weaving" (bytecode instrumentation) fails. My attempt to make the instrumentation work using static weaving (post-compile and pre-run) also fails, but in a different way. These are not elaborate test cases. My code isn't terribly complex yet, and my schema is fairly straightforward. Yet Toplink falls down and goes splat. Some of their tools can't even handle spaces in the pathname.

For now I'll keep using OpenJPA until the Glassfish people can make Toplink Essentials stop sucking so damn much. OpenJPA is a bit more pedantic anyway, and that's a good thing. My code should end up tighter as a result.

Much of this JPA stuff just isn't ready for the real world yet, though it's been a final spec since last May and a proposed final draft since December of 2005. They don't even have support for @OneToOne in Eclipse WTP 2.0 yet, and every edit to the persistence.xml file brings up some lame dialog box whining about a ConcurrentModificationException. Granted, it's just a milestone build. But this is the type of frustration one encounters when trying to use JPA.

And no, I'm not going to use sucky Hibernate and their dozens of goddamn dependencies, lack of integrated object caching, and mental-defective nerd community.



Saturday, March 03, 2007

Optimizing the Message Tree with JPA

The trouble with recursive tree structures is that databases tend to give poor performance implementing them. Further, JPA doesn't really natively do trees (so far as I can tell). Fortunately, morons.org solved this problem a very long time ago. My solution was to associate each collection of messages with a single "root" (though it isn't a root in the true sense of the word) and with each other. If you happened to have "connect by" syntax with your database, you could take advantage of it by virtue of the parent_id column. If you didn't, you could always do a fairly quick pass through the full collection of messages you got off the single root. The latter is exactly what Tally Ho will be doing with JPA.

Key to making this work is denoting the 1:1 relationship to the parent as fetch=FetchType.LAZY.

When I first tried this with Glassfish Persistence (aka Toplink Essentials) it seemed to be ignoring my request to use FetchType.LAZY. As soon as the first object loaded, it loaded its parent. And the parent of that object. And so on, recursively. Toplink would issue one SELECT statement for every single object in the database for the "root" in question. Obviously the performance of this strategy really sucked.

It turns out that for proper behaviour you need to turn on something the Toplink folks call "weaving", which I suspect is actually just a fancy Oracle term for Java bytecode instrumentation. To do this, you add the property toplink.weaving to persistence.xml and set its value to true. You must also add the JVM argument -javaagent:{path to libs}/toplink-essentials-agent.jar to your running configuration.

Once I did this, Toplink would happily wait to fetch any parent objects until they were asked for. And by virtue of all parent/child relationships being collected inside the same "root", when you first ask for a parent, it has already been loaded in memory, so an additional trip to the database is not required.
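To make the idea concrete, here is a hand-rolled sketch of what a lazily fetched reference amounts to: a holder that defers the expensive lookup until the first call to get(). This is only an analogy, assuming nothing about how Toplink's weaving is implemented internally; LazyRef and the counter standing in for SQL queries are invented for the example.

```java
import java.util.function.Supplier;

// Analogy for a lazily loaded 1:1 field; not Toplink's real mechanism.
final class LazyRef<T> {
    private Supplier<T> loader;   // stands in for "go run the SELECT"
    private T value;
    private boolean loaded;

    LazyRef(Supplier<T> loader) {
        this.loader = loader;
    }

    /** Loads the value on first access, then serves it from memory. */
    T get() {
        if (!loaded) {
            value = loader.get();
            loaded = true;
            loader = null;        // the loader is no longer needed
        }
        return value;
    }

    boolean isLoaded() {
        return loaded;
    }

    public static void main(String[] args) {
        int[] queries = {0};      // counts simulated database trips
        LazyRef<String> parent = new LazyRef<>(() -> {
            queries[0]++;
            return "parent message";
        });

        System.out.println("queries before access: " + queries[0]);
        parent.get();
        parent.get();             // second access hits memory only
        System.out.println("queries after two accesses: " + queries[0]);
    }
}
```

Without instrumentation the provider has nowhere to hide this kind of indirection behind a plain field, which is consistent with the eager, recursive loading described above.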

The downside is that the "children" List on the Message won't get populated, and if you let JPA handle the population, it will again make a lot of trips to the database to figure out which children belong to which parent... it has no way of knowing my rule that all of the relationships among the children are contained within the confines of that one root.

Fortunately for the needs of Tally Ho, this problem is easily mitigated. We simply mark the children List with @Transient to prevent JPA from trying to do anything with it at all. Then we pull this trick in MessageRoot:

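    public List<Message> getMessages() {
        // Link children to parents lazily, the first time the list is requested.
        if (!messagesFixed) {
            fixMessages();
        }
        return messages;
    }

    private void fixMessages() {
        // One pass over everything under this root; each parent is already in memory.
        for (Message message : messages) {
            if (message.getParent() != null) {
                message.getParent().getChildren().add(message);
            }
        }
        messagesFixed = true;
    }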


The only reason we can get away with this is that we never look at a collection of Messages without getting it from the MessageRoot. In the cases where we do look at a single Message, we don't look at its children.
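For anyone who wants to try the trick outside of a JPA provider, here is a self-contained sketch. The Message and MessageRoot classes below are simplified stand-ins written for this example (plain fields, no annotations, no database), not the actual Tally Ho entities; in the real model the children list would be the @Transient field.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

class MessageTreeDemo {

    /** Simplified stand-in for the entity. */
    static class Message {
        final String subject;
        final Message parent;                               // null for a top-level message
        final List<Message> children = new ArrayList<>();   // filled in by MessageRoot

        Message(String subject, Message parent) {
            this.subject = subject;
            this.parent = parent;
        }
    }

    /** Holds every message of one discussion as a flat list. */
    static class MessageRoot {
        private final List<Message> messages;
        private boolean messagesFixed;

        MessageRoot(List<Message> messages) {
            this.messages = messages;
        }

        List<Message> getMessages() {
            if (!messagesFixed) {
                fixMessages();
            }
            return messages;
        }

        // Single pass: attach each message to its parent's child list.
        private void fixMessages() {
            for (Message message : messages) {
                if (message.parent != null) {
                    message.parent.children.add(message);
                }
            }
            messagesFixed = true;
        }
    }

    public static void main(String[] args) {
        Message top = new Message("Hello", null);
        Message reply = new Message("Re: Hello", top);
        Message nested = new Message("Re: Re: Hello", reply);

        MessageRoot root = new MessageRoot(Arrays.asList(top, reply, nested));
        for (Message m : root.getMessages()) {
            System.out.println(m.subject + " -> " + m.children.size() + " direct replies");
        }
    }
}
```

The messagesFixed flag matters: without it, every call to getMessages() would append the same children again.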

It's a tiny bit hackish, but the performance improvement is tremendous, so it's worth it. Over the Internet, through my ssh tunnel to my server, 180 message objects were loaded and linked together correctly in about 650ms. I suspect this will improve substantially when the database is on the local host.


