Monday, September 17, 2007

Why Should a Heartbeat be Regular?

Consider a common polling loop:

while (true) {
    runSomeQuery();
    doSomeStuff();
    sleepBeforeNextIteration();
}

I've seen and used this pattern probably a thousand times, without really thinking about it. It just stood to reason that the sleeping bit should be some constant number of seconds, or maybe in a fancy case, a computed number of seconds taking into account the time that was spent running a query and doing stuff.

But why should a heartbeat be so regular? It may be worth taking some time to consider the characteristics of doSomeStuff(), especially when doSomeStuff() might be altering the results of runSomeQuery().

I recently had a situation where this was true: the stuff-doing was creating a new row that would be returned by the query. Yet even knowing that I had just placed something into my queue, I was still going to wait the full regular sleep duration before querying again.

As it happens, where this loop lives I don't necessarily get to know whether I will be producing new rows, or whether new rows have been produced (without going into the details, it's a highly decoupled architecture). But a fairly simple optimization can improve performance significantly without increasing the average number of queries per minute: instead of one regular sleep duration, I introduced two that alternate.
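To make that concrete, here's a minimal sketch of the shape of the change. The class, the millisecond values, and the stub methods are mine for illustration; only the idea of alternating a long and a short sleep comes from the real system.

public class AlternatingPoller {
    // The average interval is still 3 seconds, so the query rate over a
    // minute is the same as with a fixed 3-second sleep.
    private static final long LONG_SLEEP_MS = 4000;
    private static final long SHORT_SLEEP_MS = 2000;

    public static void main(String[] args) throws InterruptedException {
        boolean longSleepNext = true;
        while (true) {
            runSomeQuery();  // placeholder: poll the queue
            doSomeStuff();   // placeholder: process whatever came back
            Thread.sleep(longSleepNext ? LONG_SLEEP_MS : SHORT_SLEEP_MS);
            longSleepNext = !longSleepNext;
        }
    }

    private static void runSomeQuery() { /* stub */ }
    private static void doSomeStuff() { /* stub */ }
}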

To see how this works, consider a job that creates a second job:

time (s)   event
0          All is quiet
1          Job 1 is inserted
3          *heartbeat* Job 1 is pulled from the queue
3.01       Job 1 executes and places Job 2 in the queue
6          *heartbeat* Job 2 is pulled from the queue and executes


Now consider what happens with a rhythmic, yet non-regular heartbeat:

time (s)   event
0          All is quiet
1          Job 1 is inserted
4          *heartbeat* Job 1 is pulled from the queue
4.01       Job 1 executes and places Job 2 in the queue
6          *heartbeat* Job 2 is pulled from the queue and executes


As you can see, in this case, both methods perform exactly the same way. But consider what happens in the case where the job happens to get inserted just before the first query runs:

time (s)   event
0          All is quiet
3          Job 1 is inserted
3          *heartbeat* Job 1 is pulled from the queue
3.01       Job 1 executes and places Job 2 in the queue
6          *heartbeat* Job 2 is pulled from the queue and executes


And here's the same lucky scenario with the rhythmic, yet non-regular heartbeat:

time (s)   event
0          All is quiet
4          Job 1 is inserted
4          *heartbeat* Job 1 is pulled from the queue
4.01       Job 1 executes and places Job 2 in the queue
6          *heartbeat* Job 2 is pulled from the queue and executes


In the lucky case, a full second is chopped off the time from insertion of the first job to execution of the second: 3 seconds with the regular heartbeat versus 2 seconds with the alternating one.

Then there's the most unlucky case:

time (s)   event
4          All is quiet
4.1        Job 1 is inserted
6          *heartbeat* Job 1 is pulled from the queue
6.01       Job 1 executes and places Job 2 in the queue
10         *heartbeat* Job 2 is pulled from the queue and executes


In this case, the first job happens to arrive right after the heartbeat that followed the long sleep, but the second job still gets executed within 6 seconds. The regular heartbeat is also guaranteed to complete within 6 seconds, but could go as quickly as just over 3 seconds in its most lucky case.

By alternating between a long sleep and a short sleep, we can shave off 1 second in the most lucky case, but in the least lucky case, we're no worse off than we would be with a regular heartbeat. In the real world, jobs will come in at any random time, so sometimes you'll get lucky and sometimes you won't... but when you do get lucky, the payout is considerable. This is especially true given that the cost is effectively nothing: the same number of queries will be performed over a minute (or indeed over 6 seconds in this case) and the cost of alternating the sleep times is negligible.
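If you want to convince yourself of this, a quick simulation (mine, not part of the original system) makes the comparison easy: drop job 1 at a random time, find the heartbeat that pulls it, then the heartbeat that pulls job 2, and average the total latency. With a 3-second regular heartbeat versus an alternating 4-and-2-second schedule, it should converge to roughly 4.5 seconds versus about 4.3 seconds on average, with the same 6-second worst case for both.

import java.util.Random;

public class HeartbeatSim {
    // First heartbeat strictly after time t, for a repeating cycle of sleeps
    // (heartbeats fire at the end of each sleep, starting from time 0).
    static double nextBeat(double t, double[] sleeps) {
        double beat = 0;
        int i = 0;
        while (beat <= t) {
            beat += sleeps[i % sleeps.length];
            i++;
        }
        return beat;
    }

    // Time from inserting job 1 at t until job 2 (created when job 1 runs) executes.
    static double chainLatency(double t, double[] sleeps) {
        double firstBeat = nextBeat(t, sleeps);          // job 1 is pulled here
        double secondBeat = nextBeat(firstBeat, sleeps); // job 2 is pulled here
        return secondBeat - t;
    }

    public static void main(String[] args) {
        Random rnd = new Random(42);
        double[] regular = {3.0};          // heartbeat every 3 seconds
        double[] alternating = {4.0, 2.0}; // long sleep, then short sleep
        int trials = 1_000_000;
        double regTotal = 0, altTotal = 0;
        for (int i = 0; i < trials; i++) {
            double t = rnd.nextDouble() * 60.0; // job 1 arrives at a random time
            regTotal += chainLatency(t, regular);
            altTotal += chainLatency(t, alternating);
        }
        System.out.printf("regular:     %.2f s average%n", regTotal / trials);
        System.out.printf("alternating: %.2f s average%n", altTotal / trials);
    }
}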

Remember, of course, that this won't help in every situation. It only applied in my case because processing something returned by the query can itself cause something new to be returned the next time the query runs.

There might be additional tweaks that could be made to this general idea; for example, speeding up the heartbeat when rows are found by the query, then slowing it back down when they aren't, perhaps bounded by some rules to ensure that over a given time interval, some maximum number of queries are performed.
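As a rough illustration of that idea (again, a sketch of mine rather than anything I've actually deployed), an adaptive loop might halve the sleep whenever the query finds rows and double it when it doesn't, clamped to a minimum so there's a hard cap on how often the query can run:

public class AdaptivePoller {
    // The minimum sleep caps the query rate at one per second; the maximum
    // keeps an idle system from backing off forever.
    private static final long MIN_SLEEP_MS = 1000;
    private static final long MAX_SLEEP_MS = 10000;

    public static void main(String[] args) throws InterruptedException {
        long sleepMs = 3000; // start at the old regular interval
        while (true) {
            int rows = runSomeQuery();  // placeholder: returns how many rows were found
            doSomeStuff(rows);          // placeholder: process them
            if (rows > 0) {
                sleepMs = Math.max(MIN_SLEEP_MS, sleepMs / 2); // speed up while there's work
            } else {
                sleepMs = Math.min(MAX_SLEEP_MS, sleepMs * 2); // slow back down when it's quiet
            }
            Thread.sleep(sleepMs);
        }
    }

    private static int runSomeQuery() { return 0; /* stub */ }
    private static void doSomeStuff(int rows) { /* stub */ }
}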

The moral of the story is to know what your workload tends to look like and make reasonable and simple trade-offs that might improve your performance.
