Tuesday, October 31, 2006

A View to an Article

Today I decided that rather than renumber everything so that all my object IDs are unique, it would be easier to create a few views in PostgreSQL. That would solve my earlier problem of doing a join with JPA between the article table and the message roots table.

Fortunately, this is easy in PostgreSQL:

CREATE VIEW article_message_root(root_source, root_id, posting_permitted, post_count, post_count_24hr, last_post)
AS SELECT root_source, root_id, posting_permitted='y', post_count, post_count_24hr, last_post
FROM roots
WHERE root_type='article';

As you can see, a view can even convert one type to another. When I originally made this table, eons ago, either the boolean datatype wasn't available (the schema started out on an ancient version of MySQL) or I was having some other problem with it. Now I can compensate for that in the view: comparing posting_permitted to 'y' yields a real boolean. PostgreSQL infers the data type of each resulting column from the expressions in the SELECT.

The view appears to work by merging the conditions in its WHERE clause into any query issued against it:

=> explain select * from article_message_root where root_source = 1234;
Index Scan using roots_root_source_key on roots (cost=0.00..5.99 rows=1 width=29)
Index Cond: ((root_source = 1234) AND ((root_type)::text = 'article'::text))

Compare that to:

=> explain select * from roots where root_source = 1234 and root_type='article';
Index Scan using roots_root_source_key on roots (cost=0.00..5.99 rows=1 width=40)
Index Cond: ((root_source = 1234) AND ((root_type)::text = 'article'::text))

Eventually I'll restructure the whole site codebase and these views will go away in favor of a system where every object has a unique identity across the entire system.

When that time comes, I'm also pondering using an insert trigger to record, in a centralized table, the type of object associated with each object identity. That would make it easier to hunt these things down should it become necessary.
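The bookkeeping that a shared sequence plus such a trigger would provide can be modeled in a few lines of plain Java. This is a minimal in-memory sketch, not database code: ObjectRegistry and its methods are hypothetical names, a single counter stands in for the shared sequence, and a map stands in for the centralized table.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical in-memory stand-in for one shared sequence plus a central
// "object_id -> type" table; the real thing would live in PostgreSQL.
public class ObjectRegistry {
    // One counter for every kind of object, so IDs never collide across types.
    private long sequence = 0;
    // Plays the role of the centralized table an insert trigger would fill.
    private final Map<Long, String> typeById = new HashMap<>();

    /** Allocates the next system-wide object_id and records its type. */
    public synchronized long register(String type) {
        long id = ++sequence;
        typeById.put(id, type);
        return id;
    }

    /** Answers "what kind of thing is object N?", or null if unknown. */
    public synchronized String typeOf(long objectId) {
        return typeById.get(objectId);
    }

    public static void main(String[] args) {
        ObjectRegistry registry = new ObjectRegistry();
        long article = registry.register("article");
        long account = registry.register("account");
        System.out.println(article + " is an " + registry.typeOf(article));
        System.out.println(account + " is an " + registry.typeOf(account));
    }
}
```

Because the ID alone identifies an object, finding out what object 2 is becomes one lookup instead of a search through every table.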


Thursday, October 26, 2006

Form Validation Made Simple

The next thing I wanted to figure out was the "Wicket Way" of doing form validation. I've made the mistake in the past of going off and doing things on my own when an existing framework already provided a particular best practice for accomplishing the task. Fortunately, Wicket makes it quite easy to add standard form validators and to add your own.

I decided to start with my "headline" field from my simple article form. A headline can be from 1 to 80 characters, must be provided, and must not contain HTML code. I decided to bite off the hard part first: Wicket does not provide a validator class for rejecting input that contains HTML.

I created a new package, net.spatula.news.ui.validators, figuring I'll want to reuse these and keep them all in one place. I looked at one of the standard validators in the Wicket API to see what it extended and/or implemented, and it turns out that it both extended AbstractValidator and implemented IValidator. Mind you, I still think it's dumb to start Interface class names with an I, especially since nobody who does this does it consistently. Nobody shortens "Abstract" to "A" or spells out the word "Interface." Nobody starts every normal class name with a "C" or the word "Class." This is all superfluous fluff. I know by the fact that the name follows the "implements" keyword that it's a bloody interface. But I digress.

I asked Eclipse to create for me a class called NoHtmlValidator, and have it extend AbstractValidator and implement (gag) IValidator. I was pleased to see that all I had to do was supply the method body for public void validate(FormComponent component). Easy.

FormComponents have a method called getInput(), which returns the value you might get from request.getParameter("name"). It's the String value of the component's data. Fortunately for me, a TextField is just one value, so for now, this will suffice. I coded up something really simple:

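public void validate(FormComponent component) {
    String data = component.getInput();
    // Crude check: any "<tag ...>" anywhere in the input counts as HTML
    Pattern pattern = Pattern.compile(".*<[-\\w]+[^>]*>.*", Pattern.DOTALL);
    Matcher matcher = pattern.matcher(data);
    if (matcher.matches()) {
        error(component); // the message text comes from a .properties file
    }
}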

I'll refactor the Pattern to be a static member of the class later, so it doesn't have to be compiled every time. As you can see, the match is pretty dumb, but if HTML is detected, I call error(component). Notice that I don't actually give the error message here. It turns out that Wicket wants to get your error text from a resource (.properties) file.
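The static-Pattern refactoring mentioned above might look roughly like this. It is a sketch in plain Java with no Wicket dependency; HtmlDetector and containsHtml are hypothetical names. The pattern is just as crude as the one in the validator, except that it also catches closing tags such as </b> and uses find(), so the leading and trailing .* are unnecessary.

```java
import java.util.regex.Pattern;

// Hypothetical helper: the regex is compiled once, when the class loads,
// instead of on every call to validate().
public final class HtmlDetector {
    // An opening or closing tag: "<", optional "/", a tag name, anything up to ">".
    private static final Pattern TAG = Pattern.compile("</?[-\\w]+[^>]*>");

    private HtmlDetector() {
    }

    /** Returns true if the input contains anything that looks like a tag. */
    public static boolean containsHtml(String input) {
        // A missing value contains no HTML; find() scans for a match anywhere.
        return input != null && TAG.matcher(input).find();
    }

    public static void main(String[] args) {
        String[] samples = {"Plain headline", "<b>Bold</b> headline", "x < y and y > z"};
        for (String s : samples) {
            System.out.println(s + " -> " + containsHtml(s));
        }
    }
}
```

Inside NoHtmlValidator.validate(), the body would then shrink to checking HtmlDetector.containsHtml(component.getInput()) and calling error(component) when it returns true.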

Having had a little experience with Wicket so far, I figured that surely this meant that the name of the resource file it would want would be the class name of my page, followed by .properties. Sure enough, that's what it is looking for. Inside this file, Wicket wants a line like this:

formName.componentName.ValidatorClassName=Message to use when validation error occurs

After fighting with it initially, I found that the formName is the name you used when creating the form component. It is NOT the form's class name (if you created a separate form class). Now you might be thinking, "you're going to end up having to create one of these for every single field that doesn't allow HTML. You'll have messages coming out your ears in this file!"

Not so! Wicket provides a fantastic way of generalizing your messages. You can create a properties file for your application (unsurprisingly named ApplicationClassName.properties) and say something like this:

NoHtmlValidator=HTML is not allowed for ${label}.

Now any time the NoHtmlValidator issues an error in the application, if there isn't a more specific error for that field, the generic "HTML is not allowed for ${label}." is used. ${label} is substituted with the name of the form component.

But Nick, you say, sometimes I use weird names for my components, and I would rather send a more friendly version of the name to the end user! For example, I might want to say "the headline" instead of "headline." Wicket provides for this too. Inside your PageClassName.properties file, you can specify user-friendly names for your components:

formName.componentName=friendly name

For example:

articleForm.headline=the headline
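To make the lookup order concrete, here is a toy model of the behavior described above, written against java.util.Properties. It is not Wicket's actual resource-loading code; MessageLookup and its method are hypothetical, and it models only the two levels discussed here: the page's formName.componentName.ValidatorClassName key first, then the application-wide ValidatorClassName key, with ${label} replaced by the friendly name if one exists and by the component name otherwise.

```java
import java.util.Properties;

// Toy model of the message lookup described in this post; not Wicket's code.
public class MessageLookup {
    private final Properties page;        // e.g. contents of PageClassName.properties
    private final Properties application; // e.g. contents of ApplicationClassName.properties

    public MessageLookup(Properties page, Properties application) {
        this.page = page;
        this.application = application;
    }

    /** Resolves the error text for one validator failing on one component. */
    public String message(String form, String component, String validator) {
        // 1. Most specific: formName.componentName.ValidatorClassName on the page.
        String text = page.getProperty(form + "." + component + "." + validator);
        // 2. Fallback: the application-wide ValidatorClassName entry.
        if (text == null) {
            text = application.getProperty(validator);
        }
        if (text == null) {
            return null;
        }
        // ${label} becomes the friendly name (formName.componentName), if any.
        String label = page.getProperty(form + "." + component, component);
        return text.replace("${label}", label);
    }

    public static void main(String[] args) {
        Properties page = new Properties();
        page.setProperty("articleForm.headline", "the headline");
        Properties app = new Properties();
        app.setProperty("NoHtmlValidator", "HTML is not allowed for ${label}.");

        MessageLookup lookup = new MessageLookup(page, app);
        System.out.println(lookup.message("articleForm", "headline", "NoHtmlValidator"));
    }
}
```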

Now that we've created the error, how do we get it on the page? There are two good ways. You can either put all your feedback at the top of the page, or you can put your errors adjacent to the form components to which they're related.

The first method is the easiest. In your HTML page, you create a span or a paragraph with a wicket:id as a placeholder for your error messages. For example:

<span wicket:id="messages">Errors and messages go here</span>

Then you add that component to your page. Inside your page class:

add(new FeedbackPanel("messages"));

That's really all you have to do. When a validator calls error(component), the error will show up in a list where your placeholder is in the HTML.

But suppose you'd rather put the headline errors with the headlines. This turns out to be fairly simple as well. There's a subclass of FeedbackPanel called ComponentFeedbackPanel. As you might imagine, you need to put a span or paragraph tag next to your component in the HTML and give it a wicket ID, just as you would with a regular FeedbackPanel. The only difference is that you're putting it where you want it to appear.

Next, add the ComponentFeedbackPanel to your form. Here's how all of this stuff looks in my code:

TextField headline = new TextField("headline", new PropertyModel(article, "headline"));
headline.add(new NoHtmlValidator());                 // reject anything that looks like HTML
headline.add(StringValidator.lengthBetween(1, 80));  // 1 to 80 characters
add(new ComponentFeedbackPanel("headlineErrors", headline)); // shows only the headline's errors

That's all there is to it. The messages now show where they belong.

One other thing I discovered is that RequiredValidator is deprecated in favor of calling setRequired(true) on the component, but a field that fails that validation still uses the resource text for RequiredValidator.

Want that error message to show up in red? Add this to your page's CSS:

.feedbackPanelERROR { color: red }


Sunday, October 22, 2006

Wicket + Sitemesh = feces nocturnus

Everything was going so well... I could make Wicket say "hello world" and get Sitemesh to decorate the page into my style, and then I thought, "now I should try a simple form!" That way, I could fail fast if form handling wasn't going to work, and I could learn what the "Wicket way" is for handling form submission and validation.

Since the first thing I want to replace on morons.org is article submission, I decided to come up with a simple article form. After one tiny hitch (if you include a wicket:id attribute in the HTML, you must define a corresponding component in your WebPage; this is a good thing because it means you can't forget to define everything properly) I got the form displayed.

Then disaster struck when I hit the submit button! Not only did the processed form not render, but only a small amount of the decorator showed up. What the hell? Not only that, but hitting the Reload button made the page display as expected!

After much digging, it came to light that the problem was the Content-Length. The number of bytes actually in the HTTP response buffer did not match the number the header claimed, differing by about 9KB. As it happened, the claimed size roughly matched the size of the rendered Wicket form, and the difference was about the size of the decorator. I posted to the Wicket Users mailing list and the Sitemesh forum and went to bed, wondering whether I could work around the problem with my SevenBitFilter (which turns literal UTF-8 characters into their HTML entity equivalents so the whole page is 7-bit ASCII), since that filter resets the Content-Length anyway (the conversion changes the length in most cases).
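The core of a filter like SevenBitFilter is a simple character conversion. The following is a sketch of that idea, not the filter's actual code (SevenBit and toSevenBit are hypothetical names): every code point above 127 becomes a decimal numeric character reference, which is also why the output length, and therefore the Content-Length, usually changes.

```java
import java.nio.charset.StandardCharsets;

public class SevenBit {
    /** Replaces every non-ASCII code point with a decimal entity such as &#233;. */
    public static String toSevenBit(String text) {
        StringBuilder out = new StringBuilder(text.length());
        // codePoints() keeps surrogate pairs together, so astral characters become one entity.
        text.codePoints().forEach(cp -> {
            if (cp < 128) {
                out.appendCodePoint(cp);                 // plain ASCII passes through
            } else {
                out.append("&#").append(cp).append(';'); // e.g. U+00E9 -> &#233;
            }
        });
        return out.toString();
    }

    public static void main(String[] args) {
        String original = "Caf\u00e9 \u20ac5";
        String converted = toSevenBit(original);
        // The byte count changes, so a filter doing this must set its own Content-Length.
        System.out.println(converted + " (" + original.getBytes(StandardCharsets.UTF_8).length
                + " UTF-8 bytes before, "
                + converted.getBytes(StandardCharsets.US_ASCII).length + " after)");
    }
}
```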

In the morning, I was pleased to find the answer in my email. Wicket uses different Render Strategies to deliver a rendered page to the end user. The default strategy is to render to a buffer stored in the user's session, then send the client a redirect to another URL; when the browser follows the redirect, Wicket sends it the contents of that buffer. The reason for doing this is apparently to prevent the user from double-posting, for example by hitting Reload after submitting a form.

For some reason, Sitemesh doesn't like that strategy, though I'm at a bit of a loss to understand why it would notice or care. Probably there is some characteristic particular to a redirect that upsets it.

The solution is to change the render strategy for the application by calling getRequestCycleSettings().setRenderStrategy(IRequestCycleSettings.ONE_PASS_RENDER) from the WebApplication's init() method. This does mean that I'll have to handle double-posts myself, but I was planning to do that anyway, so it's not really a setback.
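Since ONE_PASS_RENDER gives up that redirect-based protection, double-posts have to be caught some other way. One common technique, sketched here as an assumption rather than anything Wicket provides, is a one-time form token: each rendered form carries a fresh random token in a hidden field, and a submission is accepted only if its token is still outstanding. FormTokens and its methods are hypothetical; in a real application the token set would live in the user's session.

```java
import java.util.Collections;
import java.util.HashSet;
import java.util.Set;
import java.util.UUID;

// Hypothetical one-time token store; in a web app this would hang off the session.
public class FormTokens {
    private final Set<String> outstanding = Collections.synchronizedSet(new HashSet<>());

    /** Called when rendering a form: returns a token to embed in a hidden field. */
    public String issue() {
        String token = UUID.randomUUID().toString();
        outstanding.add(token);
        return token;
    }

    /** Called on submit: true only the first time a given token comes back. */
    public boolean redeem(String token) {
        // Set.remove returns false if the token was never issued or was already used.
        return token != null && outstanding.remove(token);
    }

    public static void main(String[] args) {
        FormTokens tokens = new FormTokens();
        String token = tokens.issue();
        System.out.println("first submit accepted:  " + tokens.redeem(token));
        System.out.println("second submit accepted: " + tokens.redeem(token));
    }
}
```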

I'm starting to build up a set of settings that I like for my applications, so next I may factor out my own WebApplication parent class (a subclass of WebApplication that my other applications extend) that applies those settings by default.

I'm also thinking about coding up a servlet filter, insertable before and/or after any other servlet filter, that dumps the state of the Request and Response objects. Then maybe I could see what it was that caused Sitemesh its intestinal discomfort and provide a more detailed bug report. Unfortunately, development on Sitemesh seems to have all but ground to a halt 14 months ago despite the promise of an impending new release. That's a bit worrisome.
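The heart of such a dumping filter would be a wrapper that counts the bytes actually written, so they can be compared against the Content-Length the response claims. Here is a plain java.io sketch of just that piece; CountingOutputStream and mismatches are hypothetical names, and wiring it into a servlet response wrapper is left out.

```java
import java.io.ByteArrayOutputStream;
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Counts bytes on their way to the real stream, so a filter could compare the
// actual body size with the Content-Length header that was set.
public class CountingOutputStream extends FilterOutputStream {
    private long count;

    public CountingOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(int b) throws IOException {
        out.write(b);
        count++;
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // bypass FilterOutputStream's byte-at-a-time loop
        count += len;
    }

    public long getCount() {
        return count;
    }

    /** True if a known declared length (-1 meaning unknown) disagrees with what was written. */
    public boolean mismatches(long declaredLength) {
        return declaredLength >= 0 && declaredLength != count;
    }

    public static void main(String[] args) throws IOException {
        CountingOutputStream body = new CountingOutputStream(new ByteArrayOutputStream());
        body.write("<html>decorated page</html>".getBytes(StandardCharsets.UTF_8));
        long declared = 6; // pretend an upstream component claimed this length
        System.out.println("wrote " + body.getCount() + " bytes, declared " + declared
                + ", mismatch: " + body.mismatches(declared));
    }
}
```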


Tuesday, October 17, 2006

Wicket and Sitemesh: installed and working

Today I decided to divert my attention for a little while from JPA and focus on installing Tomcat, the Tomcat Eclipse plugin and then Wicket and Sitemesh. I decided to use Tomcat for development mainly because of the Tomcat plugin support. Normally I hate Tomcat because of the standard Apache Open Sores way of handling errors: e.printStackTrace(). If I had a nickel for every time Tomcat shat out 300 lines of stack trace because one line was malformed in a JSP, I'd have a lot of nickels. Fortunately, it looks like a lot of the performance problems I'm accustomed to feeling with Tomcat are greatly mitigated in version 5.5.20. It no longer takes the better part of a day to start up, for example... now it's down to about 5 seconds, which I can tolerate. The first page load is still a little sluggish, but nowhere near as bad as it used to be.

My next task was to get Wicket going. For some reason the 'binary' distribution of Wicket 1.2.2 contains all of the source to it, including all of its tests. I don't really need all of that crap. Fortunately, in the root of the zip file is wicket-1.2.2.jar. I wish I had noticed that before I unpacked the entire thing.

It was, quite literally, more difficult to set up the directory structure for my web apps than it was to get Wicket to display my first Wicket application. I'm really impressed with how well it works out of the box and how few dependencies it has (basically none). Granted, my first attempt was your typical "Hello, World" application, but I'm really impressed at how easy it was and how I didn't have to wade through 60 exceptions generated by my not knowing what I'm doing. It's really easy to do something simple, just as it should be!

Next it was Sitemesh's turn. I elected to grab the Sitemesh example war file for version 2.2.1. Since I was just experimenting, I dumped the whole thing, except the web.xml file, into my web app root. (I think I forgot to turn on auto-deployment so I just unzipped the whole thing myself.) Then I copied the bits of web.xml that I needed, reordered web.xml (though why Sun decided that the order of web.xml should be significant I'll never understand), restarted Tomcat and ... it just worked! My "hello" now appeared inside the Sitemesh example page.

My only complaint about Sitemesh is its relative lack of thorough documentation. Yes, the basic features all have their basic documentation, but what would really be helpful is a Sitemesh Cookbook, showing common layout problems and their solutions. Maybe if I knew what I was doing, this would be a good project to undertake.


Monday, October 16, 2006

Fun with JPA

I recently undertook a project to begin sorting out the mess that the morons.org back-end code has become. Much of it was written when I had little idea what I was doing in Java, and a lot of it was written to the limitations present in Java 1.3 (including the absence of regular expressions!).

I decided to learn from some of my past mistakes and separate the model code, the form code, and the business logic. As it stands today, some of the model (which loosely follows the DAO pattern) contains code for form validation... and some of it doesn't. There's code in the JSPs to decide whether a form is needed, whether validation was successful, and whether inserting/updating/deleting happened correctly, as well as to dump the errors to the page should anything go wrong. Basically, the thing's a mess, and I've learned a lot since those days.

For several years at my day job, I've been working with Toplink. Prior to that, I did some work with Hibernate and found it frustrating and not ready for serious work (though in fairness to Hibernate, this was with early version 2 stuff). It required patching just to make it barely functional in our environment. Toplink, on the other hand, is a very mature product, having its roots in Smalltalk in the early 1990s, and its mapping workbench is many orders of magnitude easier to work with than Hibernate's XML configuration files. (I despise XML configuration files passionately. They are hideous to look upon and even harder to read and impossible to work with. Or at least impossible for me. I have no tolerance for that garbage at my age.)

As an aside, the Hibernate documentation once answered the FAQ, "Why doesn't Hibernate have a mapping GUI?" with this:

Because you don't need one. The Hibernate mapping format was designed to be edited by hand; Hibernate mappings are both extremely terse and readable. Sometimes, particularly for casual / new users, GUI tools can be nice. But most of the time they slow you down. Arguably, the need for a special tool to edit mapping documents is a sign of a badly-designed mapping format. vi should be a perfectly sufficient tool.

And if you believe that, I have a nice bridge for sale that I'd like to tell you about. Incidentally, I like vi as much as the next guy, but the statement "vi should be a perfectly sufficient tool" calls into question the very sanity of whoever wrote that answer. Twiddling with XML configuration files by hand is not a reasonable or sane way to manage an object model. Maybe there are some people who get a little ego stroking and feel terribly studly and elite working that way, but I am waaaaay too old for that nonsense.

But I digress.

The main thing holding me back from using Toplink in a restructuring of the morons.org back-end was its rather expensive commercial license. Luckily, EJB3 Persistence, now better known as JPA (the Java Persistence API), achieved its first release this year, and the reference implementation was done by Oracle (which bought Toplink a few years back) and is largely Toplink code. Around the same time, FreeBSD got an official binary Java 1.5 release, which made annotations available. Annotations aren't required for working with JPA (you can use... a big honking XML configuration file!), but they're by far the easier and cleaner way.

So it was easy to decide to use JPA to manage the object model. I also chose Wicket to handle the view and controller components of the system. Wicket is surprisingly non-stupid, which is a heck of a lot more than I can say for frameworks like Struts. The trouble with a lot of these MVC frameworks is that for the sake of saving you perhaps two if statements, they introduce 30 lines of convoluted XML configuration with a similarly complex API to go with it. I've always believed that frameworks should save you time. If a framework's design is such that complexity is increased-- that is, you replace a few lines of flow control logic written in Java with a large API with interfaces you must implement and XML configuration you must write-- ostensibly for cleanliness that in the end is not realized, then you haven't got a good framework... you've got an exercise in neurotic time-wastery that would turn off even most obsessive-compulsive autistic people. Somehow that doesn't stop some folks.

But I digress.

I also decided to use Sitemesh to handle the overall site layout, abandoning my old system of a customized XSLT filter. Realistically, only two main categories of browser access my site anymore: Microsoft Internet Explorer 6 and Firefox 1.5. There is a trickle of other browsers, like Safari, K-Meleon, a couple of Opera users, and the occasional text browser or HTML fetcher, and I get a rare hit from an old version of Netscape. The bottom line is that I no longer see value in designing customized XSLT stylesheets for ancient browsers that rarely hit my site. The XSLT is also (relatively) slow. Sitemesh offers a way of using different decorator files for IE and for Firefox. Currently Firefox is the only browser I know of that can handle a layout built exclusively with DIVs and that handles display: block correctly. So I will take advantage of Sitemesh to give table-based layout to IE and DIV-based layout to Firefox. The latter saves a bit of bandwidth and is really the better way to do things.
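Sitemesh selects decorators through its own configuration, which this sketch does not attempt to reproduce. It models only the decision itself: DecoratorChooser and the two decorator names are hypothetical, and, following the reasoning above, only Firefox gets the pure-DIV layout while IE and anything unrecognized get tables.

```java
import java.util.Locale;

public class DecoratorChooser {
    // Hypothetical decorator names; real ones would come from the Sitemesh setup.
    public static final String TABLE_LAYOUT = "table-layout";
    public static final String DIV_LAYOUT = "div-layout";

    /** Picks a decorator from the User-Agent header. */
    public static String choose(String userAgent) {
        // Only Firefox is trusted with the DIV-only layout; everyone else gets tables.
        if (userAgent != null && userAgent.toLowerCase(Locale.ROOT).contains("firefox")) {
            return DIV_LAYOUT;
        }
        return TABLE_LAYOUT;
    }

    public static void main(String[] args) {
        String[] agents = {
            "Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1)",
            "Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.0.7) Gecko/20060909 Firefox/1.5.0.7",
            "Lynx/2.8.5rel.1"
        };
        for (String agent : agents) {
            System.out.println(choose(agent) + " <- " + agent);
        }
    }
}
```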

For my code structure I decided I'd put my model in one package, my form handlers in another package, and a service layer in a third package. Two questions come up with every new project: "where does the business logic go?" and "what operations will exist in the object model?" The trouble with putting intelligence in your object model is that these classes tend to be auto-generated by mapping tools, and you have to figure out some way to get the business code re-injected each time you regenerate the model. Further compounding the issue, many business operations involve complex relationships between multiple classes from the object model; how do you decide which part of the model should hold these operations? Moreover, do operations of that nature even belong in the object model?

I elected to find the most reasonable compromise I could, which was to put these business operations in a service layer, which acts as an intermediary between things like GUI forms and the data model. In this way, business operations are kept out of the GUI, and things which wish to interact with the model are decoupled from directly operating on it.
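In code, the arrangement looks roughly like this. Everything here is a hypothetical sketch in plain Java (Article, ArticleStore, ArticleService and submit are made-up names, and an in-memory list stands in for JPA): the form calls one service method, the service applies the business rules (here, the 1 to 80 character headline rule), and only the service touches the model and its storage.

```java
import java.util.ArrayList;
import java.util.List;

public class ServiceLayerSketch {
    /** Model: plain data, no business rules. */
    static class Article {
        final String headline;

        Article(String headline) {
            this.headline = headline;
        }
    }

    /** Storage boundary; a JPA-backed version would wrap an EntityManager. */
    interface ArticleStore {
        void save(Article article);
    }

    /** Service layer: the only place the business rules live. */
    static class ArticleService {
        private final ArticleStore store;

        ArticleService(ArticleStore store) {
            this.store = store;
        }

        /** Returns null on success, or an error message for the form to show. */
        String submit(String headline) {
            if (headline == null || headline.isEmpty() || headline.length() > 80) {
                return "A headline of 1 to 80 characters is required.";
            }
            store.save(new Article(headline));
            return null;
        }
    }

    public static void main(String[] args) {
        List<Article> saved = new ArrayList<>();
        ArticleService service = new ArticleService(saved::add); // in-memory store
        System.out.println(service.submit(""));             // prints the error message
        System.out.println(service.submit("Hello, world")); // prints "null": accepted
        System.out.println(saved.size() + " article(s) stored");
    }
}
```

A GUI form never sees ArticleStore; swapping the in-memory list for a JPA-backed store would not change the form code at all.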

(Mind you, this is not a service layer implemented as a web service, though it would be easy to tack on a web service using the service layer classes. Web services are nice for some things, but they would add an unnecessary level of indirection, complexity and latency.)

Having made all of these decisions, it was time to begin. I decided to start with the model, since that was the area where the most things could go wrong. And go wrong they did! The trouble with using a bleeding-edge technology like JPA is that sometimes things are bleeding because they've been cut by a sharp object, which will cut you if you aren't careful.

I decided to try the Dali plug-in for Eclipse. Dali helps you create object-relational mappings by marking up your code with annotations, providing a simple interface for choosing the right annotations and the right data to go in them. Unfortunately, Dali is at version 0.5 and doesn't always play as nicely as it could with Eclipse and with persistence. Dali also lacks support for PostgreSQL, a glaring oversight considering the popularity of that open-source database (which happens to be the one I use). No matter; it does support generic JDBC. I was able to create entities based on the tables in my database, with the caveat that some column types did not translate well to native Java types (they probably would have if Dali supported PostgreSQL). For example, a few char and boolean columns became Java longs. These I edited by hand.

I had to struggle a little with the mapping between an Article and a MessageBoardRoot. In the previous incarnation of the morons.org backend code, entities are associated with their corresponding message boards through a table called "roots," which records the entity's type and primary key. For example, a board with the root_id 23654 might be associated with an "article" with the primary key "6745". The combination of "article" and "6745" is unique and is associated with all of the messages bearing the root_id "23654." I wanted to map this as a 1:1 relationship so I could ask an Article for its MessageBoardRoot. Unfortunately, JPA does not allow you to create a join based on a primary key and a constant; only persisted fields can participate in the join. So the choice was either to include in every single article a field containing the word "article," or to give every object on the entire system a unique ID and drop the "root_type" field from the roots table. I decided on the latter: every object on the system would share a single sequence generator and name its primary key the same thing, object_id. This meant I would have to renumber some old data, but I think the change is worth it. I can also still record the creating class name in the MessageBoardRoot objects so the corresponding class is easy to locate for administrative purposes later.

Then I tried to persist the simplest record I could: an Account, which corresponds to a user login account. JPA exploded immediately with a NullPointerException:

Exception in thread "main" java.lang.NullPointerException
at oracle.toplink.essentials.ejb.cmp3.EntityManagerFactoryProvider.createEntityManagerFactory(EntityManagerFactoryProvider.java:120)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:83)
at javax.persistence.Persistence.createEntityManagerFactory(Persistence.java:60)

Ah, I just love the intuitive errors provided by Open Sores code sometimes. After failing to find a conclusive answer with Google, I came to realize that this error occurs when JPA is unable to locate the persistence.xml file, which normally resides inside the META-INF directory of a jar file but may also reside in a META-INF directory inside a directory on your classpath. After a lot of digging, and some help from FileMon from Sysinternals.com, I discovered the problem: JPA effectively prepends META-INF to each of your classpath entries when it looks for persistence.xml, so if you think you're going to add your META-INF directory itself to your classpath, you're barking up the wrong tree. You actually need the directory one level above it on the classpath.
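A quick way to see this for yourself is to ask a class loader where it finds the resource, assuming the provider locates persistence.xml through an ordinary class-loader lookup of META-INF/persistence.xml, as is typical. In the sketch below (PersistenceXmlLocator is a hypothetical name; ClassLoader.getResources is standard JDK), a throwaway directory containing META-INF/persistence.xml is put on a class loader's path two ways: the parent directory, which works, and the META-INF directory itself, which does not.

```java
import java.io.IOException;
import java.net.URL;
import java.net.URLClassLoader;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.Collections;
import java.util.List;

public class PersistenceXmlLocator {
    // The classpath-relative name a JPA provider looks for.
    static final String RESOURCE = "META-INF/persistence.xml";

    /** Lists every classpath location where the loader can see META-INF/persistence.xml. */
    public static List<URL> find(ClassLoader loader) throws IOException {
        return Collections.list(loader.getResources(RESOURCE));
    }

    public static void main(String[] args) throws IOException {
        // Build a throwaway layout: root/META-INF/persistence.xml
        Path root = Files.createTempDirectory("jpa-demo");
        Path metaInf = Files.createDirectories(root.resolve("META-INF"));
        Files.write(metaInf.resolve("persistence.xml"),
                "<persistence/>".getBytes(StandardCharsets.UTF_8));

        // A null parent means only the directory we pass (plus the JDK's own
        // bootstrap classes, which have no persistence.xml) is searched.
        try (URLClassLoader right = new URLClassLoader(new URL[] {root.toUri().toURL()}, null);
             URLClassLoader wrong = new URLClassLoader(new URL[] {metaInf.toUri().toURL()}, null)) {
            System.out.println("parent dir on classpath: " + find(right).size());
            System.out.println("META-INF on classpath:   " + find(wrong).size());
        }
    }
}
```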

Normally, Dali would have taken care of this for me by putting META-INF in the right place. What happened was that I had added Java Persistence to the Eclipse project before I added a source folder to the project. Eclipse apparently drops the project root as a source folder after you add your first source folder to a project, which meant that my classpath entry for source went from net.spatula.news to net.spatula.news/src/java, but the META-INF directory did not move along with that change. I elected to just have Dali create a new persistence.xml file for me by removing the persistence nature from .classpath and adding Java Persistence to the project again. This got me past the NullPointerException-- progress!

The next trouble was with sequences. Whereas Dali does not support PostgreSQL, Toplink Essentials (aka Glassfish persistence) does... sort of. It turns out that primary key sequence generation for PostgreSQL is currently broken in Toplink Essentials. If you specify a sequence generator for a primary key field that isn't of type serial, Toplink completely ignores sequence generation and attempts to insert a null for the primary key. The workaround is not to specify a sequence generator and instead set up a table for the default sequence generation strategy. This means issuing these commands:

create table sequence(seq_name varchar(40) not null primary key, seq_count integer not null);
insert into sequence values('SEQ_GEN', 1);
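Table-based generation works because the provider can reserve a whole block of IDs with one update and then hand them out from memory. The following is an in-memory simulation of that idea, not Toplink's code: TableSequence is hypothetical, a map stands in for the sequence table, the block size of 50 in the usage below is an assumption (it matches the default allocationSize of JPA's @TableGenerator), and the exact numbering is this simulation's own convention.

```java
import java.util.HashMap;
import java.util.Map;

// In-memory simulation of table-based ID generation with preallocation.
public class TableSequence {
    private final Map<String, Long> sequenceTable = new HashMap<>(); // seq_name -> seq_count
    private final String seqName;
    private final int allocationSize;
    private long next = 1;
    private long blockEnd = 0; // empty block forces a reservation on first use

    public TableSequence(String seqName, long startCount, int allocationSize) {
        this.seqName = seqName;
        this.allocationSize = allocationSize;
        sequenceTable.put(seqName, startCount); // like: insert into sequence values('SEQ_GEN', 1)
    }

    /** Returns the next primary key, touching the "table" only once per block. */
    public synchronized long nextId() {
        if (next > blockEnd) {
            // Like: update sequence set seq_count = seq_count + N where seq_name = ...
            long oldCount = sequenceTable.get(seqName);
            long newCount = oldCount + allocationSize;
            sequenceTable.put(seqName, newCount);
            next = oldCount + 1;
            blockEnd = newCount;
        }
        return next++;
    }

    /** The value currently stored in the simulated seq_count column. */
    public long storedCount() {
        return sequenceTable.get(seqName);
    }

    public static void main(String[] args) {
        TableSequence seq = new TableSequence("SEQ_GEN", 1, 50);
        for (int i = 0; i < 3; i++) {
            System.out.println("id " + seq.nextId() + ", seq_count " + seq.storedCount());
        }
    }
}
```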

I anxiously await Toplink correcting sequence generation for PostgreSQL.

My next challenge was a new exception: "cannot persist detached object." In my Account model, an Account has a member called "changer," which is also an Account. The idea is that whoever caused the change in the model is recorded as the changer. (I use triggers on the database to keep an audit trail for every record change in the account table.) The mapping is fairly straightforward: the changer field maps to the object_id field in the same table. I had created a row in the account table for an account called "System" with an object_id of 0 to bootstrap the table and handle initial inserts. It turned out that my choice of object_id for this special account was the culprit. Dali mapped object_id as a Java primitive long rather than the wrapper type Long. That means the only way JPA can tell whether an object already has a primary key is whether that key is non-zero, since primitive numeric fields default to 0 (booleans default to false). Because System's primary key was 0, JPA thought I was trying to persist a new "changer" object by reachability. I haven't checked yet, but I suspect that if the primary key were mapped as a Long instead, that determination would be based on whether the reference was null, and 0 would then be a legitimate primary key. The solution was to give System an object_id of 1 rather than 0.
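The difference between the two mappings is easy to demonstrate without JPA at all. The classes below are hypothetical stand-ins for the Account entity, and isNew() models the check described above (a guess at the provider's logic, not Toplink's actual code): with a primitive long, an unassigned key and a deliberate key of 0 are indistinguishable, while a Long can use null to mean "not yet assigned."

```java
public class KeyDefaults {
    /** Primitive key, as Dali generated it: defaults to 0. */
    static class PrimitiveKeyAccount {
        long objectId;

        boolean isNew() {
            return objectId == 0; // 0 is the only available "unset" signal
        }
    }

    /** Wrapper key: defaults to null, leaving 0 free as a real value. */
    static class BoxedKeyAccount {
        Long objectId;

        boolean isNew() {
            return objectId == null;
        }
    }

    public static void main(String[] args) {
        PrimitiveKeyAccount system = new PrimitiveKeyAccount();
        system.objectId = 0;                     // the "System" row, loaded from the database
        System.out.println(system.isNew());      // true: mistaken for an unsaved object

        BoxedKeyAccount boxedSystem = new BoxedKeyAccount();
        boxedSystem.objectId = 0L;
        System.out.println(boxedSystem.isNew()); // false: 0 is an ordinary key
    }
}
```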

I also discovered that JPA has one weird deficiency: there's no way to annotate that an entity is read-only. This is desirable in situations like the audit trail example above; one might want to read these audit records but prevent the developer from accidentally writing back a change or inserting into the audit table directly. One partial workaround is to specify that the primary key field is read-only.

After all of this, I managed to get a row inserted in the database. The next stop: getting it back out again.

