Sunday, July 27, 2008
Tally-Ho Resumes!
One bit of weirdness to note is that data caching doesn't seem to work properly in OpenJPA using Java 1.6 runtime class retransformation. I discovered this the hard way by forgetting to include the -javaagent VM argument when running my unit tests. On 1.5, this didn't have as weird a failure mode... OpenJPA seemed to cope with it. With 1.6, it seems to lose track of object identity, such that when queries are run that should be returning the same object (or at least the newly queried version of an updated object), sometimes different objects or old versions of objects are returned.
The moral of the story is to specify -javaagent with OpenJPA at runtime if you're not doing static instrumentation of your classes. Class retransformation doesn't work properly yet. If I can find some time, I'll create a simple testcase for the OpenJPA folks demonstrating the problem.
Left to go on Tally-Ho is completing the arbitrary content with attachments feature, and then looking at Spring to see if / what can be leveraged to make life a little bit easier/simpler.
In other news, this is my 200th blog post. Woohoo?
Labels: tally-ho
Friday, November 23, 2007
A Major Milestone for Tally-Ho: Arbitrary HTML Pages
The whole mess integrates directly into the Tally-Ho BinaryResourceService locator/localizer system, just as the existing Wicket integration via the BinaryResourceStreamLocator does now, so it automatically takes advantage of localization and caching.
One of the more cumbersome challenges in this effort was getting a nice URL. Wicket has many URL coding strategies for Bookmarkable pages; I use IndexedParamUrlCodingStrategy quite a bit. But IndexedParamUrlCodingStrategy wasn't going to work in this case. It takes each element of a path and associates it with an index number. What I needed was the path itself as a parameter, not chopped up and indexed.
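To make the difference concrete, here is a minimal sketch in plain Java with no Wicket classes involved. The class and method names are hypothetical; they only illustrate the two shapes of parameter map: one keyed by position, the way an indexed strategy slices up a path, and one that keeps the whole path under a single "uri" key.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical illustration only; these are not Wicket classes.
final class PathParamShapes {

    // Mimics the indexed approach: each path element is keyed by its position.
    static Map<String, String> indexed(String path) {
        Map<String, String> params = new LinkedHashMap<>();
        int index = 0;
        for (String part : path.split("/+")) {
            if (!part.isEmpty()) {
                params.put(String.valueOf(index++), part);
            }
        }
        return params;
    }

    // Keeps the whole path as one parameter named "uri".
    static Map<String, String> wholePath(String path) {
        Map<String, String> params = new LinkedHashMap<>();
        params.put("uri", path);
        return params;
    }

    public static void main(String[] args) {
        String path = "/docs/guide/intro.html";
        System.out.println(indexed(path));   // {0=docs, 1=guide, 2=intro.html}
        System.out.println(wholePath(path)); // {uri=/docs/guide/intro.html}
    }
}
```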
It turns out that writing one of these coding strategies from scratch for Wicket is difficult if you don't know what you're doing (like me), and the Javadoc for the various classes involved is sadly a bit lacking in direction. So I took a different approach: I stole a bunch of code from IndexedParamUrlCodingStrategy and modified it to fit my needs. Behold, UriPathUrlCodingStrategy (with comments snipped for space... they're in CVS, though, and rest assured they credit Igor for writing IndexedParamUrlCodingStrategy):
package net.spatula.tally_ho.wicket;

import java.io.UnsupportedEncodingException;
import java.util.Map;

import wicket.Application;
import wicket.PageMap;
import wicket.PageParameters;
import wicket.WicketRuntimeException;
import wicket.protocol.http.request.WebRequestCodingStrategy;
import wicket.request.target.coding.BookmarkablePageRequestTargetUrlCodingStrategy;
import wicket.settings.IRequestCycleSettings;
import wicket.util.string.AppendingStringBuffer;
import wicket.util.value.ValueMap;

public class UriPathUrlCodingStrategy extends BookmarkablePageRequestTargetUrlCodingStrategy {

    public UriPathUrlCodingStrategy(String mountPath, Class bookmarkablePageClass) {
        super(mountPath, bookmarkablePageClass, PageMap.DEFAULT_NAME);
    }

    public UriPathUrlCodingStrategy(String mountPath, Class bookmarkablePageClass, String pageMapName) {
        super(mountPath, bookmarkablePageClass, pageMapName);
    }

    protected void appendParameters(AppendingStringBuffer url, Map parameters) {
        if (parameters.containsKey("uri")) {
            String[] pathParts = ((String) parameters.get("uri")).split("/+");
            for (String string : pathParts) {
                if (string == null || "".equals(string)) {
                    continue;
                }
                try {
                    Application app = Application.get();
                    IRequestCycleSettings settings = app.getRequestCycleSettings();
                    url.append("/").append(java.net.URLEncoder.encode(string, settings.getResponseRequestEncoding()));
                } catch (UnsupportedEncodingException e) {
                    throw new WicketRuntimeException(e);
                }
            }
        }
        String pageMap = (String) parameters.get(WebRequestCodingStrategy.PAGEMAP);
        if (pageMap != null) {
            url.append("/").append(WebRequestCodingStrategy.PAGEMAP).append("/").append(urlEncode(pageMap));
        }
    }

    protected ValueMap decodeParameters(String urlFragment, Map urlParameters) {
        PageParameters params = new PageParameters();
        if (urlFragment == null) {
            return params;
        }
        if (urlFragment.startsWith("/")) {
            urlFragment = urlFragment.substring(1);
        }
        String[] parts = urlFragment.split("/");
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (WebRequestCodingStrategy.PAGEMAP.equals(parts[i])) {
                i++;
                params.put(WebRequestCodingStrategy.PAGEMAP, parts[i]);
            } else {
                builder.append("/").append(parts[i]);
            }
        }
        params.put("uri", builder.toString());
        return params;
    }
}
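As a rough illustration of the round trip a strategy like this is meant to perform, the sketch below encodes a path segment by segment and then decodes it back, using only java.net.URLEncoder and java.net.URLDecoder. It is a standalone approximation, not the Wicket API: the class name, the hard-coded UTF-8 encoding, and the explicit decoding step are all assumptions made for the example (the strategy above reads its encoding from the application settings and does not show a decoding step).

```java
import java.io.UnsupportedEncodingException;
import java.net.URLDecoder;
import java.net.URLEncoder;

// Hypothetical standalone sketch; not part of Wicket or Tally-Ho.
final class UriPathRoundTrip {

    private static final String ENCODING = "UTF-8"; // assumption for the example

    // Encodes each non-empty path segment and joins the results with "/".
    static String encode(String uri) {
        StringBuilder url = new StringBuilder();
        for (String part : uri.split("/+")) {
            if (part.isEmpty()) {
                continue;
            }
            try {
                url.append('/').append(URLEncoder.encode(part, ENCODING));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
        return url.toString();
    }

    // Rebuilds the original path from an encoded URL fragment.
    static String decode(String fragment) {
        StringBuilder uri = new StringBuilder();
        for (String part : fragment.split("/+")) {
            if (part.isEmpty()) {
                continue;
            }
            try {
                uri.append('/').append(URLDecoder.decode(part, ENCODING));
            } catch (UnsupportedEncodingException e) {
                throw new IllegalStateException(e);
            }
        }
        return uri.toString();
    }

    public static void main(String[] args) {
        String encoded = encode("/about us/faq.html");
        System.out.println(encoded);         // /about+us/faq.html
        System.out.println(decode(encoded)); // /about us/faq.html
    }
}
```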
The next steps will be update capability for the HTML pages and then an attachment selector/uploader tool to handle the association of other resources to the HTML page.
This will be a giant leap forward for Tally-Ho and allow for the conversion of dozens of old morons.org pages to the new system.
Labels: software, tally-ho, wicket
Sunday, November 11, 2007
What's New in Tally-Ho?
Well, I made good on my threat to rip out Toplink Essentials and replace it with OpenJPA. OpenJPA is a bit more pedantic about some things. For example, this code would run fine in Toplink but would throw an IllegalStateException in OpenJPA:
entityManager.getTransaction().begin();
entityManager.close();
While I was working on dropping in OpenJPA, I decided that I really wanted my tests to pass from within Maven, so I could be sure that run-time enhanced (woven) classes were all going to work nicely. I also wanted to make sure that none of the tests depended on any data to be in the database that wasn't put there by the SQL scripts to initialize the DB. So I modified my base test class to perform one-time database wiping/initialization prior to running any tests. This exposed a great many flaws in tests that I wrote in a fairly lazy fashion to assume that certain objects were already present.
After fixing all of that, I decided to let Eclipse clean up a lot of other code for me. Eclipse's Source > Clean Up feature is very powerful, allowing you to "final" gobs of things and implement the default serial ID for Serializable classes in one fell swoop.
Then I got to work on what the next major project / feature is for Tally-Ho: arbitrary HTML pages. For quite a while I agonized over how to manage associations between pages. In most of the world, if a page changes its name or location, links to it break. I also needed the capability to attach images, PDFs or other documents to arbitrary HTML. It turns out that the solution to both of these problems is the same. In a massive refactoring of BinaryResource, any BinaryResourceReference can now be attached to any other BinaryResourceReference. BinaryResource is gone, and instead the relationship is now: one BinaryResourceReference has many BinaryResourceReferenceLocales, each of which has one BinaryResourceContent. A BinaryResourceReference may also have many Attachments, each of which has a sequence number and a reference to the attached BinaryResourceReference. An HtmlPage is just a subclass of BinaryResourceReference with some bits added for the title, keywords, whether to include a message board, etc.
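The relationships described above can be sketched as plain Java classes. This is only an outline under assumptions: the class names come from the description, but every field name and type is hypothetical, the attach method is invented for the example, and the persistence annotations are left out entirely.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical outline; class names follow the description, fields are invented.
final class ResourceModelDemo {
    public static void main(String[] args) {
        HtmlPage page = new HtmlPage();
        page.title = "About";
        page.attach(new BinaryResourceReference()); // becomes attachment #1
        page.attach(new BinaryResourceReference()); // becomes attachment #2
        System.out.println(page.attachments.size()); // 2
    }
}

class BinaryResourceContent {
    byte[] data;
}

class BinaryResourceReferenceLocale {
    String locale;                 // e.g. "en_US"
    BinaryResourceContent content; // each locale has exactly one content
}

class Attachment {
    final int sequence;                     // 1, 2, 3, ...
    final BinaryResourceReference attached; // the resource being attached

    Attachment(int sequence, BinaryResourceReference attached) {
        this.sequence = sequence;
        this.attached = attached;
    }
}

class BinaryResourceReference {
    final List<BinaryResourceReferenceLocale> locales = new ArrayList<>();
    final List<Attachment> attachments = new ArrayList<>();

    // Attaches another reference using the next sequence number.
    Attachment attach(BinaryResourceReference other) {
        Attachment attachment = new Attachment(attachments.size() + 1, other);
        attachments.add(attachment);
        return attachment;
    }
}

// An HTML page is a reference with a few extra properties.
class HtmlPage extends BinaryResourceReference {
    String title;
    String keywords;
    boolean includeMessageBoard;
}
```

Because an HtmlPage is itself a BinaryResourceReference, one page can be attached to another in exactly the same way as an image or a PDF.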
Attachments are numbered in sequence (1, 2, 3). Inside the HtmlPageService, references to attachments are converted by an AttachmentUrlProvider (an interface) and Velocity to their URLs. So if you want to refer to the URL for attachment #1, you use ${1} in the HTML. (Roughly... this bit isn't done yet.) It is up to the AttachmentUrlProvider to decide how to make the URL, given the scope, path and extension.
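Since this bit isn't done yet, what follows is only a rough sketch of the idea, with a plain regular expression standing in for Velocity. AttachmentUrlProvider is the interface named above, but its method signature here is a guess; the PlaceholderResolver class and the example URL layout are invented for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Stand-in for the Velocity step: replaces ${n} with the URL of attachment n.
final class PlaceholderResolver {

    private static final Pattern PLACEHOLDER = Pattern.compile("\\$\\{(\\d+)\\}");

    static String resolve(String html, AttachmentUrlProvider provider) {
        Matcher matcher = PLACEHOLDER.matcher(html);
        StringBuffer result = new StringBuffer();
        while (matcher.find()) {
            int sequence = Integer.parseInt(matcher.group(1));
            // quoteReplacement keeps "$" or "\" in a URL from being treated specially
            matcher.appendReplacement(result, Matcher.quoteReplacement(provider.urlFor(sequence)));
        }
        matcher.appendTail(result);
        return result.toString();
    }

    public static void main(String[] args) {
        // Hypothetical URL layout, purely for the demo.
        AttachmentUrlProvider provider = sequence -> "/resources/attachment-" + sequence;
        System.out.println(resolve("<img src=\"${1}\">", provider));
        // <img src="/resources/attachment-1">
    }
}

// Hypothetical signature; the real interface may look different.
interface AttachmentUrlProvider {
    String urlFor(int sequence);
}
```

Keeping the URL construction behind the interface means the HTML only ever refers to sequence numbers, so an attachment can move without the page content changing.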
This refactoring is probably 75% complete. I'm too burnt out on code and too tired to work on it any more this weekend.
So to summarize what's changed:
1. OpenJPA replaced Toplink Essentials
2. Everything builds and tests in Maven (including compile time bytecode instrumentation)
3. Refactoring of binary resources
4. Initial HtmlPage work
5. Binary resource attachment support.
A couple other things I learned today about JPA:
1. If you JOIN multiple things, at least with OpenJPA you need to alias each thing you join, i.e., JOIN x.foo foo JOIN foo.bar bar. The parser will complain if you leave off that last "bar" in that example.
2. Your ability to lazy load ends once the EntityManager you used to load your object is closed. I knew this before and subsequently forgot, and then learned it again the hard way. Merging the entity with a new EntityManager doesn't work either. You need to keep the original one open until you're done navigating your object graph.
Labels: openjpa, software, tally-ho, toplink essentials, velocity
Sunday, November 04, 2007
Time for a Divorce
That got me to thinking: a lot of us developers use a lot of open source software. How do we choose which packages we want to use? Obviously whatever we choose has to be a good technical fit for our needs... what's the point if it doesn't do the job we're after? But now it occurs to me that open source software has to meet a particular social need as well. One way we can gauge a project's health is by how strong it is socially. How interested are people? Is the project active? Are people excited about the project? Excited enough to fix bugs?
It's a little hard to compare apples to apples in this case, but let's look at a couple things and try to relate them as best we can. Toplink Essentials is maintained as part of the Glassfish project.
In the last 30 days at the time of this writing, the folks working on it have fixed 10 bugs. In that same amount of time, 15 bugs were opened (or changed and left in an opened state). About 53 messages have been posted on the discussion forum. The oldest unresolved bug has been open for about a year and 10 months.
In that same amount of time, the OpenJPA folks have resolved 19 bugs while 20 have been opened. The mailing list has had about 215 posts. The oldest unresolved bug is a year and 4 months old (though it was touched 3 months ago). OpenJPA uses Jira, which makes it a bit easier to produce meaningful metrics: the average age of unresolved bugs over the last month is about 3 months, and that figure has been fairly consistent.
(I gave up trying to compute the average unresolved age of bugs for Toplink Essentials. It's just too annoying to figure out if the bug tracking tool doesn't do it for you.)
It is probably the case that most open source projects (and probably closed ones too) have a few ancient bugs gathering dust. I think that it's more interesting to look at what a project has been doing recently, like in the last 30-180 days. Are they keeping up with their bug backlog? Is there an active community? Are you likely to get help if you ask for it? Of the bugs that come in, what percentage get fixed and what percentage get dumped in the attic?
And perhaps the most important criterion of all: are they fixing MY bug?
While I wasn't watching, OpenJPA reached a 1.0.0 release. It's available under the Apache 2 license from a Maven 2 repository. They fixed the bug I opened earlier this year (within a day even). It is full-featured and even has an extensive manual. Though, like Toplink, their ant task doesn't work very well.
I used to be concerned about the large number of dependencies that OpenJPA has, but now that the project is building with Maven 2, it's much less of a concern for me. It isn't necessary to go manually fetch anything to build the project, since Maven 2 takes care of all the direct and transitive dependencies. One thing I did have to manually tweak was to force inclusion of commons-collections 3.2 in my pom.xml, because something else in my project depends on an earlier version of commons-collections, and OpenJPA needs a later version.
So it's time to give Toplink one final heave-ho. My reasons for sticking with it have now been outweighed by my need for compile-time weaving that works and a project where problems are likely to be fixed within my lifetime. It's time for Toplink and me to start seeing other people.
New releases of Tally-Ho will be using OpenJPA as the persistence provider... just as soon as I get all the unit tests passing.
Labels: JPA, open JPA, open source, openjpa, software, tally-ho, toplink, toplink essentials, weaving
Tuesday, October 16, 2007
Tally-Ho Turns 1
I guess I didn't figure on it taking well over a year to port the whole site over to a new architecture, one that would scale nicely and could be maintained without agony. Life has this tendency to get complicated.
When I first started thinking about doing the massive morons.org refactoring (before it even had a name), I didn't even have a boyfriend. I hadn't yet started taking classes in the Hendrickson Method of Orthopedic Massage. I certainly hadn't decided to aim myself in the direction of Chiropractic School and to begin walking.
Yeah, life gets complicated. And as we get older, our priorities change too. I think large, open-source projects may be better suited for kids in their 20s, who still have lots of energy, free time, and don't get laid. What a perfect combination for free software development!
Now that I'm on in years, I want other things out of life. I want to travel, to see new things. I want to climb Mount Shasta. I want to maintain a healthy relationship. I want to always be learning things, especially things unrelated to computer science, so I'm always challenged. I want to spend time in the gym, working on my strength, cardiovascular fitness and endurance. I want to watch many more years of Doctor Who. I want to try to compose some music, even if I don't share it with anyone.
Now don't take this to mean that Tally-Ho is dead or won't be completed. I just committed some code yesterday to bring back the Partners system. Instead, take this to mean that Tally-Ho is taking longer than expected, because life gets complicated.
Happy first birthday, Tally-Ho! Maybe we'll get to version 1.0 within another year!
Labels: life, software, tally-ho
Tuesday, April 03, 2007
Making MD5 Fuzzy, Redux
I struggled with the solution to this for quite a while, and then it dawned on me: I was looking at the problem the wrong way. It's fine if an off-by-one changes the outcome, if we're prepared to handle it.
The answer is to produce two checksums, not one! In the first, we begin at the beginning, and skip the last n/2 characters for an averaging length of n. In the second, we begin n/2 characters from the beginning and work all the way to the end.
Then instead of comparing one sum to another sum, we perform four comparisons:
object1.sum1 == object2.sum1
object1.sum2 == object2.sum1
object1.sum1 == object2.sum2
object1.sum2 == object2.sum2
If any of these statements returns true, we consider the objects to be "similar".
Here's the code. I've also simplified the way the distance between words is calculated and left room for non-English words to be handled at some point in the future (i.e., there's no longer any special significance given to vowels).
package net.spatula.tally_ho.utils;

public class FuzzySum {

    private static final int SLOP = 3;
    private static FuzzySum instance;
    private static final int SAMPLE_SIZE = 10;

    private FuzzySum() {
    }

    public static synchronized FuzzySum getInstance() {
        if (instance == null) {
            instance = new FuzzySum();
        }
        return instance;
    }

    // Returns two checksums: one that skips the tail of the text and one that skips the head.
    public String[] getSums(String text) {
        text = TextUtils.stripTags(text).toLowerCase().replaceAll("[^\\w\\s]", "").trim();
        if (text.length() < SAMPLE_SIZE * 1.5) {
            // Too short to average; fall back to a plain MD5 for both sums.
            String md5 = TextUtils.md5(text);
            return new String[] { md5, md5 };
        }
        String[] words = text.split("(?s)\\s+");
        String md5_1 = calculateFuzzyMd5(words, 0, words.length - 1 - (SAMPLE_SIZE / 2));
        String md5_2 = calculateFuzzyMd5(words, SAMPLE_SIZE / 2, words.length - 1);
        return new String[] { md5_1, md5_2 };
    }

    private String calculateFuzzyMd5(String[] input, int startIndex, int endIndex) {
        StringBuilder builder = new StringBuilder();
        int distanceSum = 0;
        for (int i = startIndex + 1; i <= endIndex; i++) {
            String thisWord = input[i];
            String lastWord = input[i - 1];
            distanceSum += calculateDistance(thisWord, lastWord);
            if (i % SAMPLE_SIZE == 0) {
                if (builder.length() > 0) {
                    builder.append("\n");
                }
                builder.append(distanceSum / SAMPLE_SIZE);
                distanceSum = 0;
            }
        }
        if (distanceSum != 0) {
            builder.append("\n");
            // Note: % binds tighter than -, so the divisor is endIndex + 1 - (startIndex % SAMPLE_SIZE).
            builder.append(distanceSum / (endIndex + 1 - startIndex % SAMPLE_SIZE));
        }
        return TextUtils.md5(builder.toString());
    }

    private int calculateDistance(String word1, String word2) {
        int word1Sum = calculateWordSum(word1);
        int word2Sum = calculateWordSum(word2);
        return Math.abs(word1Sum - word2Sum) / SLOP;
    }

    // Averages the absolute differences between adjacent characters in the word.
    private int calculateWordSum(String word) {
        if (word.length() == 1) {
            return (int) (word.charAt(0)) & 0xffff;
        }
        int wordSum = 0;
        for (int i = 1; i < word.length(); i++) {
            int prevChar = (int) (word.charAt(i - 1)) & 0xffff;
            int thisChar = (int) (word.charAt(i)) & 0xffff;
            wordSum += Math.abs(thisChar - prevChar);
        }
        return SLOP * wordSum / word.length();
    }
}
As you can see, this code has been committed as part of the Tally-Ho project, https://tally-ho.dev.java.net/
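The four comparisons described earlier can be sketched as a small helper. FuzzyCompare and isSimilar are hypothetical names and are not part of the committed code; the only assumption about the inputs is that each array holds the two checksums for one object, as returned by getSums.

```java
// Hypothetical helper; not part of the code in this post.
final class FuzzyCompare {

    // Returns true if any checksum of the first object equals any checksum of the second.
    static boolean isSimilar(String[] sums1, String[] sums2) {
        for (String a : sums1) {
            for (String b : sums2) {
                if (a.equals(b)) {
                    return true;
                }
            }
        }
        return false;
    }

    public static void main(String[] args) {
        String[] first = { "aaa", "bbb" };
        String[] second = { "ccc", "aaa" };
        String[] third = { "ddd", "eee" };
        System.out.println(isSimilar(first, second)); // true
        System.out.println(isSimilar(first, third));  // false
    }
}
```

With two sums per object the nested loop performs exactly the four comparisons listed, and it stops at the first match.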
Labels: checksum, fuzzy, java, md5, software, tally-ho