The Quest for Solar - part 1

We hear a lot of news lately about dramatic growth in solar power generation in the US, and for good reason: solar prices are the lowest they've ever been, and there's a handsome 30% tax credit available until the end of 2016 to help defray the cost of the installation.  With the tax credit going away in just over 18 months from now, and with the price of installation finally falling to levels many people can afford, it's a terrific time to drastically reduce your personal carbon footprint and ultimately save yourself a lot of money, too.

Out of necessity, I've learned a lot about solar generation in the last few months, and I'll spend the rest of this post sharing it.  If you're thinking of solar, I hope this will be a helpful read for you.  As an anti-disclaimer, I am not affiliated with any solar company and don't (to my knowledge) own stock in any solar manufacturers or providers.  If you should decide to use Sun Power, there's a nice referral program, but that's not why I'm sharing our experiences.  I'd rather we all saved some money and put less carbon into our atmosphere.

Cost

One of the most misleading figures you'll hear in news articles about solar power is the "cost per watt."  While this number is useful for describing trends in the solar industry, it's almost completely worthless when considering the cost to install a solar system at your house.  The reason for this is that not all solar systems are created equal, not all installations are equal, and the size of the system you ultimately purchase is not based on your instantaneous power consumption.  Let that last point sink in just a little bit, and I'll explain.

If you discuss solar with your friends and neighbors and Twitter followers enough, eventually some smart aleck is going to point out that solar only generates power during the daytime, when the sun is up.  Usually this will be pointed out in such a way as to suggest that this is the most amazing revelation you're likely to hear, and that this fact renders solar power completely worthless.  I recommend responding to such people with as much sarcasm as you can possibly muster.  Beat them senseless with it, because they have it coming.

It's true, of course, that you're only going to be generating electricity during certain hours of the day, and you are also not going to be generating peak wattage that entire time (the "cost per watt" is actually relative to the peak wattage of a panel under laboratory conditions).  In most cases, this means you're still going to be connected to the electrical grid so that at nighttime you're getting power the old-fashioned way.

When you sign up for solar, you will switch your billing with your utility company from the traditional metering you have today to what's called "net metering."  The idea is to size your solar system so that, on average, the amount of electricity, measured in kilowatt-hours (kWh), that you generate during the daytime is roughly equal to the amount that you use overnight, so that your net usage is roughly zero.  Or, if you want to size your system a little smaller, you can aim for all of your net usage to fit inside the "tier 1" rate, so you're paying the cheapest possible rate for what net power you do have to buy.

Naturally some days are sunnier than others, and days are longer in the summer than in the winter, and our power consumption patterns vary throughout the year.  So to take all of that into consideration, the utility company does a "true up" on an annual basis, because your massive amount of over-generation in June is expected to be offset by the considerably larger amount of electricity you need to buy in December.  What happens is in June you'll end the month with a massively negative power bill, or to put it another way, with a very large bill credit.  As the days get shorter, cloudier, and wetter, you'll start burning through that bill credit.

The common theme here is that we're talking about cost in terms of kilowatt-hours, not watts.  You want to size your system to generate roughly as many kWh per year as you expect to consume per year, and what matters is the cost of a system sized to do that at your house, not the theoretical peak wattage of a single panel in a laboratory.  So the more relevant figure when comparing bids is the "cost per kilowatt-hour-per-year."
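
To make that concrete with hypothetical numbers: a $16,000 bid projected to produce 6,000 kWh per year works out to about $2.67 per annual kWh, while a $14,000 bid projected to produce only 4,800 kWh per year works out to about $2.92 per annual kWh.  The nominally cheaper bid is actually the more expensive one by the measure that matters.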

Panels

There is tremendous choice in the market today for solar panels, and they are certainly not all created equal.  One of the numbers you'll hear discussed the most is the efficiency of a panel.  This is a measure of the electrical energy that exits the panel relative to the amount of energy, in the form of sunlight, that strikes it.  The theoretical maximum one can ever expect from a single-junction panel (for various physics reasons) is about 33%.  In practice, no panels come close to that.  Efficiency ratings below 20% are more common, with the most efficient Sun Power panels being just above 20%.  This small difference in percentage actually matters even for small home installations, as it can make the difference between needing 12 panels or 14 panels.  More panels can translate to more cost, though the less-efficient panels are also generally cheaper.
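
To illustrate with hypothetical numbers: to build roughly 4 kW of capacity, you would need about 12 panels rated at 335 W each, but about 14 panels rated at 285 W each -- more roof space, more racking, and more labor for the same nameplate capacity.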

A number you'll hear less often is actually more relevant: the total kWh a panel can be expected to generate over a given time period.  (Here we go back to kWh again.)  Again, all panels are not created equal, and all installations are not created equal.  Some of the nicer panels have a lower threshold of sunlight at which they start generating enough electricity to supply your house and the grid, which means that for a system of the same rated kilowatts, you'll actually generate more kilowatt-hours of electricity, because the number of hours of generation is greater.  This factors back into your cost: perhaps with better panels, you can get away with a smaller system, because the greater efficiency and lower threshold mean that you're generating the same kWh of energy with fewer panels.

Degradation is another value to keep in mind.  Like everything else, solar panels are subject to entropy.  They heat up, they cool down.  They experience weather.  Many panels are rated to degrade in performance by between 0.5% and 1% per year.  Really nice panels may degrade at a rate closer to 0.25% per year.  This number will affect your total cost of ownership, and you can expect a good solar bid to include a chart showing just how many kWh you can expect to generate from year 1 all the way through year 25.

Other factors to consider include the warranty -- how long is it, and does it include labor? -- and how your solar panels were manufactured in the first place.  This is another point where your smart aleck friend will think he's discovered the most amazing flaw in your plan that has never before been discussed.  Energy is, in fact, consumed to produce your solar panels, and there will be inevitable waste products as a consequence of that manufacturing.  The amount pales in comparison to the energy you currently consume by burning coal or natural gas to power your house, and the waste products that produces, but if this stuff matters to you, then by all means look into how your panels are manufactured.  You might also want to look into where they are manufactured, again, if that's a thing that matters to you.

Inverters, Microinverters, Bypasses

Solar panels generate DC power, but your home and the grid use AC.  Converting from DC to AC requires at least one inverter, which switches the direction of the current back and forth (60 complete cycles per second in the US) in an approximation of a sine wave.  The simplest system is a single-inverter system, in which all the panels are wired together in series, and at the end of the chain is a single inverter box converting the total output to AC.  Some single-inverter systems allow for the connection of a power outlet that still works during a power outage.

Panels are affected to varying degrees by shading, and arrays of panels are negatively affected when some panels are shaded more than others.  To put it another way, a solar system is most efficient when all of the panels in that system are receiving a uniform amount of light.  A panel which is receiving less light can actually be detrimental to the total power output of the system.

Some installers work around this problem by converting to AC not at the end of the chain, but at every panel using microinverters.  An advantage of this approach is that it also allows for the monitoring of each individual panel.  The downside to this approach is the introduction of more points of failure in the system, increased complexity, and possibly increased cost.

Another approach is to give the system the capability to completely bypass a shaded panel automatically, so that if a panel is behaving more like a resistor than a generator, it can be temporarily routed around.

Yet another approach is to build tolerance for this situation into the panels themselves, such that their performance in this scenario is still acceptable.

There are good points and bad points to each approach, which you'll need to consider when deciding which type of system you want to buy.

Your Roof

Solar panels typically will be installed on your roof.  In a perfect world, you have a roof whose peak runs east and west such that it has a slope facing the south (in the northern hemisphere, or north in the southern hemisphere).  You can still generate some power with a roof that faces east or west, but for just about half the day instead of the entire day.    It's best if you're installing on a roof that still has 10-15 years of life left in it, because it will cost you roughly an extra $1000/kW to remove and reinstall a solar system when replacing a roof.  On the other hand, part of your roof will be shaded by the panels, extending its life underneath the panels somewhat.  A good solar installer will warrant their roof penetrations for leaks.

It's not a bad idea if you're thinking of going solar to have an experienced roofer come by and inspect your roof to get some idea of how much life it has remaining.  If it's less than 10 years, consider re-roofing, for which you may also be able to collect some tax incentives.

Tax Incentives

Currently in the United States there exists a federal tax incentive of 30% for the cost of a solar installation, with no cap on the size of the credit.  It is the opinion of many tax advisers that this 30% credit can also be applied to the cost of re-roofing, provided that re-roofing needs to be done as part of the solar installation and is done at the same time.  I am not a tax adviser, and I am merely reporting what my own research has turned up.  Do your own research and ask your own tax adviser if you plan to claim this credit.  The tax credit applies to the cost of the panels, hardware, and installation of your system, and can mean a very large discount, though you'll have to wait until you file your taxes to get the money.  Some installers are offering no-interest loans to help people defer the cost until they get their tax refund.
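
To put that in concrete terms with a hypothetical figure: a $20,000 installation yields a $6,000 credit, for a net cost of $14,000 -- but you front the full $20,000 and see the $6,000 only when your refund arrives.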

The solar tax credit expires at the end of 2016, so expect next year to be a very busy year for solar installers.  Take that into consideration, and remember there may be a lead time for your bid and your job.

Buy vs Lease

We opted to buy our system, because for us it made better economic sense.  If you lease, you don't get any of the tax incentives (your installer does).  You also have to worry about lease transfers if you should sell your home.  On the other hand, if the solar industry is to be believed, the ROI for a solar installation is at least 100%, meaning that $20,000 spent installing solar on a home will increase the value of that home by at least $20,000, because of the desirability of solar power.  Installing solar also typically does not increase the assessed value of your home for property tax purposes, meaning you won't be taxed on the increase to your home's value because you installed solar (check the rules where you live).

Of course the downside to buying is the immediate capital outlay.  It is expensive.  Look into what financing options you may have at your disposal, and whether the interest may be tax-deductible.

Caveats

Unless you're designing for a more expensive "off the grid" solution, your solar system is tied directly to the power grid, and you will not have electricity during a power outage.  This is because your system must disconnect itself to prevent potential injury to power workers and equipment during repairs. As I mentioned above, some inverters do have the capability of providing electricity to a single outlet during a power outage, so at least you can keep your cell phones charged or power some other essential pieces of equipment.

Installation

From the time you sign your contract to the time you are enjoying the benefits of solar power could be a couple months.  Your job is likely to be subcontracted to an installer, so there's some lead time to get on their schedule.  Permits need to be pulled with your local jurisdiction.  The installation has to be inspected before it can be powered on.  The utility company has to issue a "permission to operate" for you to tie your system into the electrical grid.  Each of these activities takes time to wind its way through bureaucracy.  The actual installation itself will only take a day or three, depending on the size of the installation and any complications.

I'll have more to say about installation as ours progresses.  We opted to go with a 4kW Sun Power system for a variety of reasons, and our installation process has just begun.  The first step will be scheduled soon-- a survey of the installation site to determine where the panels will be installed, how the cabling will be routed, etc.

Conclusion

I hope this brain-dump of everything I've learned so far has been helpful.  Be sure you do your research, get several bids, and consider your options carefully.  It may take you several years to make back the money you spend, but you'll be cutting your carbon footprint immediately.

---Nick
Like many application developers, we use RoboGuice to make a lot of things nicer and easier to deal with under Android.  One of the nice bits we use is SafeAsyncTask, which gives you quite a nice way of running a background task with convenient hooks for dealing with success, failure, pre-execution and finally/post-execution.

SafeAsyncTask in turn wants to run your onSuccess, onException, and onFinally via a Handler you provide.  This is done for you under the hood, but the gist of it is that it posts a Callable to that Handler; the Callable decrements a CountDownLatch in its finally block, and meanwhile the calling method waits on the latch (so it blocks until the callback has completed on your handler).

In our Activity, we have a polling task that we want to run every 2 seconds, and we achieve this by starting a HandlerThread, connecting a Handler to it, and using sendMessageDelayed with a simple Message carrying a constant int indicating what we want done.  Then in our handleMessage method, we fire off a subclass of SafeAsyncTask to do the poll, passing in the polling Handler and the Message we got.  Inside the SafeAsyncTask's onFinally(), we recycle our Message using Message.obtain on the old message, and then we invoke handler.sendMessageDelayed using the new (recycled) message instance.

But this did not give us in any way the behavior we expected or wanted.  Instead of seeing this sequence of events:

onFinally() [two second pause]
handleMessage()
onFinally() [two second pause]
handleMessage()

we in fact saw THIS sequence of events:

onFinally() [two second pause]
onFinally() [from the handler thread this time, two second pause]
handleMessage()

It appears that although Message.obtain should have left us with a clean Message, indistinguishable (as far as we are concerned) from the original, it in fact gave us a Message whose callback runs the current onFinally() method.  That callback was carried over from the same Message that SafeAsyncTask used to run onFinally in a Handler, when it really should have been nulled.

I'm somewhat inclined to call this a defect in Android, but I would need to research the semantics of how Message.obtain(message) is expected to work, look at the Android source in a little more detail, and see precisely what went wrong inside the obtained message to fire onFinally() instead of handleMessage() via normal message delivery before I would be willing to make that claim strongly.

Our workaround was to instead call handler.obtainMessage(message.what), which does give us a nice, clean message that doesn't attempt to call back to onFinally() again, and it results in ordinary message delivery as we expected.
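
Here's a rough sketch of the polling arrangement and the workaround as described above.  The class and constant names (Poller, PollTask, MSG_POLL) are our own hypothetical stand-ins, and the actual poll body is elided:

import android.os.Handler;
import android.os.HandlerThread;
import android.os.Message;
import roboguice.util.SafeAsyncTask;

public class Poller {
    private static final int MSG_POLL = 1;              // hypothetical "what" constant
    private static final long POLL_INTERVAL_MS = 2000;

    private final HandlerThread pollThread = new HandlerThread("poller");
    private Handler pollHandler;

    public void start() {
        pollThread.start();
        pollHandler = new Handler(pollThread.getLooper()) {
            @Override
            public void handleMessage(Message msg) {
                if (msg.what == MSG_POLL) {
                    // Run the poll in the background; callbacks are delivered via this Handler.
                    new PollTask(this, msg.what).execute();
                }
            }
        };
        pollHandler.sendMessageDelayed(pollHandler.obtainMessage(MSG_POLL), POLL_INTERVAL_MS);
    }

    private static class PollTask extends SafeAsyncTask<Void> {
        private final Handler handler;
        private final int what;

        PollTask(Handler handler, int what) {
            super(handler);
            this.handler = handler;
            this.what = what;
        }

        @Override
        public Void call() throws Exception {
            // ... perform the actual poll here ...
            return null;
        }

        @Override
        protected void onFinally() {
            // handler.obtainMessage(what) hands back a clean Message with no stale callback,
            // so it is delivered to handleMessage() normally; Message.obtain(oldMessage)
            // gave us one that re-ran this onFinally() instead.
            Message next = handler.obtainMessage(what);
            handler.sendMessageDelayed(next, POLL_INTERVAL_MS);
        }
    }
}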

The Perils of Legacy Android Development

One of the least awesome things to do is to make last-minute changes to an ecosystem that is known to be working when installing on a customer site.  Nonetheless, sometimes you're stuck and you have to scramble.  This was the case for us yesterday when we needed an inexpensive Android tablet to use at a customer site.  So we acquired a Samsung Galaxy Tab 3 "Lite".  This is a fairly unspectacular, dated tablet, which runs Android 4.2.2, and probably always will.  Samsung appears to have abandoned support for this device, so it is not likely to graduate from Jelly Bean to KitKat without rooting the device and installing a custom ROM.

Still, our app isn't horribly fancy, so we anticipated that things would probably work fine.  We were mistaken.

Even installation from the Play Store failed.  We tracked this down to the signing algorithm we had specified: SHA256withRSA.  This works on later versions of Android (or maybe just on other devices) but not on the tablet we picked up; the whole app downloads, and then the installation fails with an error about the package file not being signed correctly.  The fix here is to specify -sigalg SHA1withRSA instead.

The next problem we encountered (after manually installing the app) was in annotation processing within Jackson.  We share our model objects between our cloud REST service and our Android application.  Android makes using the XML annotations that Jackson handles somewhat of a pain; these annotations are in the javax.xml.bind.annotation package, which Android itself does not supply.  Further, dexing forbids providing this package yourself unless you specify the --core-library switch... which itself is cautioned against by Google in the strongest possible terms.

The curious thing is that this never seemed to matter in the Android 4.4-and-later world: though we did nothing to strip out annotations, our app always ran fine, aside from an occasional exception the first time Jackson scanned for annotations.  On 4.2, we got the exception every time, which prevented deserialization from working at all.  It isn't clear why it works on >= 4.4 and not on <= 4.3, though there are some potential indicators that earlier versions of Android were a little more aggressive about classloading annotation classes.

Fortunately, Jackson > 1.8 provides the means to disable annotation scanning entirely to avoid this whole JAXB binding issue.  For Jackson 2.4, it's just a matter of calling objectMapper.disable(MapperFeature.USE_ANNOTATIONS).  Of course this means that your JAXB annotations won't do anything and Jackson won't know anything about them, but in our case that was acceptable, because we are deserializing only, and Jackson is fairly intelligent about its default deserialization behavior in the absence of annotations.
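
A minimal sketch of that configuration against the Jackson 2.4-era API; "MyModel" and the json string in the usage comment are hypothetical stand-ins for our real model objects:

import com.fasterxml.jackson.databind.MapperFeature;
import com.fasterxml.jackson.databind.ObjectMapper;

public final class Mappers {
    // Build an ObjectMapper that never scans for annotations, so Jackson never tries
    // to load the javax.xml.bind annotation classes on older Android versions.
    public static ObjectMapper androidSafeMapper() {
        ObjectMapper mapper = new ObjectMapper();
        mapper.disable(MapperFeature.USE_ANNOTATIONS);
        return mapper;
    }
}

// Usage (deserialization falls back to Jackson's reflection-based defaults):
// MyModel model = Mappers.androidSafeMapper().readValue(json, MyModel.class);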

Speaking of annotations, the next problem was a curious error coming out of Joda Time, again caused by an annotation, org.joda.convert.ToString.  This appears to be an annotation that Joda uses internally on its AbstractInstant, and again, it was something that we never had any problems handling on 4.4 and above.  We got a NoClassDefFoundError for the annotation class when deserializing an object using a DateTime type, giving some credence to our weak hypothesis about Android 4.2 being somewhat more aggressive about loading annotation classes than later versions of Android (and the Sun and OpenJDK JVMs as well).  The fix in this case was to add the dependency on joda-convert, which contains the classes needed to resolve those annotations.

Once we got past these three problems, we hit the dreaded NetworkOnMainThreadException.  In response to certain events, we post messages to RoboGuice's event bus, and it turns out that these events may be received on the UI thread, at least in Android 4.2.2.  (It's also possible that they can be received on the UI thread in later versions of Android as well, but performing a network task from the UI thread is not a fatal exception that kills your app in later Android releases.)  In our case, when Preferences change, we want to refresh some data from our server, which we fetch with a REST GET request, and that would kill our app on 4.2.2.  The easiest workaround for us is to publish the event to the event bus from inside a RoboGuice SafeAsyncTask, thereby ensuring it is never received on the UI thread.  (A possibly better solution would be for the entity that subscribes to the event to perform its work inside a SafeAsyncTask, but this proved to be far more work than we wanted to do in the near term.)
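
A rough sketch of that workaround, assuming RoboGuice's injected roboguice.event.EventManager and its fire(Object) method; the class and event names here are our own hypothetical stand-ins:

import roboguice.event.EventManager;
import roboguice.util.SafeAsyncTask;

public class PreferencesChangePublisher {
    // A trivial marker event; in the real app this carries whatever the subscriber needs.
    public static class PreferencesChangedEvent {}

    private final EventManager eventManager;   // injected by RoboGuice in the real app

    public PreferencesChangePublisher(EventManager eventManager) {
        this.eventManager = eventManager;
    }

    public void publishChange() {
        // Fire the event from a background SafeAsyncTask so a subscriber that performs
        // network work (our REST GET) is never invoked on the UI thread.
        new SafeAsyncTask<Void>() {
            @Override
            public Void call() throws Exception {
                eventManager.fire(new PreferencesChangedEvent());
                return null;
            }
        }.execute();
    }
}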

Then, just when everything appeared to finally be working, one last problem snuck up on us.  It seems that Android 4.2.2 has some issues with HTTP(S) Keep-Alive management.  Our app makes frequent GET requests, polling the server for new data (eventually we'll switch to a more event-driven model).  It seems that Jelly Bean isn't terribly inclined to close sockets when it is done with them if keep-alives are turned on, even if those connections are never reused.  The end result is a lot of sockets left open that aren't doing anything and aren't needed, which eventually become so numerous that new requests cannot be created, causing an exception to be thrown about too many open files (EMFILE).  We use Jersey to handle our web resources, and working around this problem is a matter of adding webResource.header("Connection","close"), which asks the server NOT to keep the connection alive so the sockets will end up getting closed in a timely manner.
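
A minimal sketch of the workaround against the Jersey 1.x client API; the URL is a placeholder:

import com.sun.jersey.api.client.Client;
import com.sun.jersey.api.client.WebResource;

public class PollingClient {
    private final Client client = Client.create();

    public String fetchLatest() {
        WebResource resource = client.resource("https://example.com/api/items");  // placeholder URL
        // Asking the server not to keep the connection alive means Android 4.2.2
        // releases the socket promptly instead of leaking it until EMFILE.
        return resource.header("Connection", "close")
                       .get(String.class);
    }
}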

After a lot of head-scratching, visits to StackOverflow, and many changes, we have a functional app on Android 4.2.2.  In the end, it was a useful exercise, because some abnormalities were uncovered... so there is probably something to be said for keeping around a weird old piece of hardware for use in testing; chances are you'll eventually have at least one customer try to deploy on something less than optimal.
Quite often when multiple components of a system are combined, one can experience unexpected side-effects and unexpected failures, the cause of which seems completely unrelated to the consequence.  I recently experienced a situation in which a bit of configuration to jDeb (a Maven plug-in that assists in the creation of Debian .deb packages) caused an application to stall out completely after about 4 minutes of run-time.

The Backstory

Our app is a Java program that runs from a .jar file, and we run it in production wrapped inside YAJSW (Yet Another Java Service Wrapper).  YAJSW allows us to do a few nice things like run as a daemon and run as a different, non-root user, along with some other nice features.  We log using slf4j with log4j, and YAJSW also intercepts System.out and System.err to log those messages to its own log file.  The whole thing gets wrapped up in a .deb package using jDeb, which lets us install smoothly, installing the daemon and updating the rc.d files.

The Failure

After installation, our program started up fine and began doing its customary work.  Success! Almost!  After about four minutes, it stopped working, no longer making any of the API calls we expected, and generally behaving in an unresponsive manner.

The first thing I did was run jstack against the running Java process to see what all the threads were doing.  Almost all the threads in the application were in a BLOCKED state on org.apache.log4j.Category.callAppenders. A quick Google search revealed a lot of complaints about similar behavior, even finding one comment that described exactly what we were seeing: one thread seemed to be stuck in java.io.FileOutputStream.writeBytes forever, and all the other threads waiting for a lock so they could write to the appender too.

So what could be causing a thread to block forever trying to perform a simple write?  Cursory checks for excessive garbage collection or impending disk failure revealed no problems.

Things seemed to run okay when we launched the jar file from the command line as our effective user, just using sudo -u and java -jar, so the problem seemed in some way related to running our application from inside YAJSW.

A little more Google searching found a YAJSW thread which mentioned the need to create several memory-mapped files to which the wrapped process writes instead of writing to stdout and stderr; the wrapping process then 'gobbles' from these memory-mapped files for the purposes of logging.

Aha! Another thing I had noticed was that we didn't seem to get any console messages from our app in the YAJSW log like we would have expected, and a quick look revealed that in fact, no memory-mapped files for stdout and stderr were created in tmp/ for the wrapped app to write to.

So at this point it was starting to look like log4j was trying to write to its redirected System.err or System.out (for the console appender), and once some buffer filled, further writes blocked, while holding the appender lock, preventing any other threads from making any further progress, blocked trying to log.

Closer examination of the permissions on the tmp/ directory revealed that dpkg had created it with mode 600 instead of mode 700, and creating a new file is not possible on a POSIX system without execute permission on the directory.  Because of the inadequate permissions, the memory-mapped files for System.out and System.err redirection couldn't be created, a failure which YAJSW couldn't log, because the files were needed to set up logging.

The culprit turned out to be my own misunderstanding of how jDeb's permissions work when applying a template data type in a dataSet. The perm mapper allows you to specify a user, group, filemode and dirmode for use when creating directories and files, but when using the template data type, the mode that gets used for creating the paths is not the dirmode, but the filemode. Consequently, the filemode needed to be 700 rather than 600 for the mapper for the template paths.

Conclusion

The chain of failure ended up looking like this:

  1. I misunderstood how jDeb/Debian package permission mappings worked when creating new, empty directories in a .deb package. Intuitively it seemed like dirmode permissions would be used when creating an empty directory, but in fact filemode permissions get used.
  2. This caused dpkg to create the new tmp/ directory with mode 600, excluding execute permission on the directory.
  3. Because of the exclusion of the execute bit, the YAJSW wrapper could not create the memory-mapped files it needed for writing using its overridden System.out and System.err.
  4. Because the files didn't get created, the YAJSW wrapper process could not consume the output of System.out and System.err, and the wrapped process had nowhere to write once whatever internal buffering existed was full.
  5. Because the buffer filled up and writes blocked, one log4j appender blocked waiting to write while inside a synchronized block.
  6. Because the write inside the synchronized block was blocked forever, all of the other threads that wanted to log anything then blocked waiting to enter the synchronized block.
  7. The whole process stopped working because eventually everything was waiting on a lock or waiting for IO to complete.

And that is how a permission mistake in a configuration file can lead to an application locking up after about 4 minutes (the time it took to fill up some buffer somewhere).

PostgreSQL + ActiveMQ = (auto)Vacuum Hell

This is one of those "I had this weird problem that others are likely to encounter as well, so here's a detailed explanation that will hopefully come up in a search engine for those people when they are trying to figure it out" posts.

As most PostgreSQL administrators know, vacuuming is a necessity for the long-term survival of your database.  This is because the PostgreSQL Multi-Version Concurrency Control strategy replaces changed/deleted rows with their new versions, but keeps the old tuples around for as long as some other existing transaction might need to see them.  Eventually these old rows are no longer visible to any transaction, and they can be cleaned up for re-use or discarded completely once an entire page is empty.

This works out really well, provided you have fairly straightforward database access going on: transactions start, stuff happens, transactions complete; repeat.  Things can go radically wrong if the "transactions complete" part doesn't happen, however.

Because PostgreSQL has no idea about what any given transaction might do next, it is not able to vacuum out any dead rows newer than the oldest transaction still in progress.  If you think about it, that makes sense.  Any given transaction could look at an older version of any row that was around at the time that transaction began, in any table in the database.

The symptom of this is that vacuum tells you about a large number of dead tuples, but says that it removed none of them.  Doing a "vacuum verbose" tells you there was a large number of "dead row versions" which "cannot be removed yet".  If you turn on autovacuum logging, you'll see messages like this:

Feb 27 02:57:41 slurm postgres[12345]: [3-1] LOG:  automatic vacuum of table "activemq.public.activemq_msgs": index scans: 0
Feb 27 02:57:41 slurm postgres[12345]: [3-2] #011pages: 0 removed, 14002 remain
Feb 27 02:57:41 slurm postgres[12345]: [3-3] #011tuples: 0 removed, 58897 remain

The number that remain will continue increasing forever, and the number removed is always 0.  Like me, you might naively check for transactions running against activemq_msgs, and find none, or find only ones which are short-lived.  And that would be your mistake.  While autovacuum runs per-table, running transactions are per-database.  You may well have a long-running transaction issuing statements against some other table that prevents rows from being removed from the table you're watching.  Again, this is because PostgreSQL cannot predict the future; that long-running transaction might run a query against the table you're watching two seconds from now, and as long as that could happen, those old tuples cannot be removed.

How does this relate to ActiveMQ, you ask?  If you're running ActiveMQ and using PostgreSQL as your backing/persistent store (and you may well have reasons to do this), and you don't do anything to change it, the default failover locking strategy is for the master to acquire a JDBC lock at startup, and hold onto it forever.  This translates into a transaction that starts when ActiveMQ starts, and never completes until ActiveMQ exits.  You can see this in progress from the command line:

activemq=# select xact_start, query from pg_stat_activity where xact_start is not null and datname='activemq';
          xact_start           |                                                query                                                
-------------------------------+-----------------------------------------------------------------------------------------------------
 2014-02-27 01:24:13.677693+00 | UPDATE ACTIVEMQ_LOCK SET TIME = $1 WHERE ID = 1

If you look at the xact_start timestamp, you'll see that this query has been running since ActiveMQ started.  You can also see the locks it creates:


activemq=# select n_live_tup, n_dead_tup, relname, relid from pg_stat_user_tables order by n_dead_tup desc;
 n_live_tup | n_dead_tup |    relname    | relid 
------------+------------+---------------+-------
        628 |      58903 | activemq_msgs | 16387
          4 |          0 | activemq_acks | 16398
          1 |          0 | activemq_lock | 16419
activemq=# select locktype, mode from pg_locks where relation = 16419;
 locktype |       mode       
----------+------------------
 relation | RowShareLock
 relation | RowExclusiveLock


Again, as long as this transaction is running holding the ActiveMQ lock, (auto)vacuum cannot reclaim any dead tuples for this entire database.

Fortunately, ActiveMQ has a workable solution for this problem in version 5.7 and later in the form of the Lease Database Locker.  Instead of starting a transaction and blocking forever, the master creates a short-lived transaction and lock, just long enough to try to acquire a leased lock, which it will periodically renew (with timing that you specify in the configuration; see the ActiveMQ documentation for an example).  So long as the lock keeps renewing, the slave won't try to take over.  Your failover time, then, depends on the duration of the lease; it won't be nearly instantaneous, as it would be with a lock that is released the moment ActiveMQ exits (though it could be faster than waiting for a transaction to end after a socket times out due to an unclean exit).

Because the locking transactions come and go, rather than persisting forever, the autovacuum process is able to reap your dead tuples.


So the moral of the story is this: if you're using PostgreSQL as the persistent store for ActiveMQ, make sure you configure the Lease Database Locker in your persistenceAdapter configuration.  Otherwise, PostgreSQL will never be able to vacuum out old tuples and you may suffer performance degradation and a database that bloats in size forever (or until you stop ActiveMQ, run a vacuum, and restart it).

Five More Bloody Signs You Aren't Bloody Agile

This is getting old, isn't it?  I still think I might try to turn this into a book of some kind, though, so we press on.

5. You can't touch some code because someone "owns" it.  Or there's some piece of code that only one person can work on.  Sometimes this can happen because what a unit of code does is so complicated and so specialized that there's only one person on your team capable of understanding it.  But most of the time what is happening is some combination of a person's ego becoming involved, a lack of sufficient unit testing, and insufficient cross-pollination of code modules among the members of the team.  Consider this situation very carefully, because it means your code base now has a single point of potential failure.  If only one person understands it, what happens if that person gets sick or wants to take a vacation?  If someone's ego gets involved, it's almost always to the detriment of the rest of the team. Or are people just afraid to touch it because something might break?  That's the easiest case to deal with: keep adding unit tests until people feel comfortable with that safety net.  Pair Programming can help with the cross-pollination, as can inflating an estimate, biting the bullet, and giving the work to anyone BUT the person who "owns" that code.  Collective code ownership may be a little painful and a little slow in the short term, but in the long run, you'll be glad you did it.

4. Your team lives from one crisis to the next crisis.  Management is freaking out regularly.  Flailing, weeping, wailing, and gnashing of teeth.  What could this possibly have to do with Agile, you ask? Quite a lot: it's an indication that your Agile process has completely broken down or your stakeholders are not playing the game.  If Agile were working properly, why would Management need to freak out?  They'd have a predictable set of deliverables for each iteration and each release.  They'd have a good idea of the team's velocity.  They'd have a decent idea of when things would be "done".  They'd be seeing progress in the form of demos regularly.  If they're completely disengaged and not participating, you're going to need a heart-to-heart.  If they are flailing because your team has given them timelines, and they just don't like what you had to say, then you have a trust issue between your stakeholders and your team, and again, you need to have a heart-to-heart.  If Agile is new to your organization, you need to set expectations accordingly, and make sure everyone understands how the game is played.  Remember, though, out of crisis can arise opportunity: if you can hold your team together and execute cleanly, you can demonstrate how Agile can provide the predictability and response to change that has your reporting chain in an uproar.

3. Your team has more than 6-8 developers.  Agile performs best with small teams who can meet regularly and communicate easily.  If your product is really a product of products, consider breaking teams apart along product boundaries.  You may be tempted to break teams apart across some other boundaries, but try not to do that.  Remember, you're trying to satisfy a product owner; the key word is "product."  How can you do end-to-end iterative development if you've aligned "vertically" instead of "horizontally"?  Give it some thought, because there's a decent amount of overhead in bootstrapping a new team.  Reflect on the values of Agile, and form your teams in such a way that you can satisfy the various roles most effectively and play the game by the rules.

2. Chickens are being pigs.  Pigs are being chickens.  Or to put it another way, someone is regularly stepping outside the boundaries of their role in your meetings.  This could be a product owner trying to change the rules of your Scrum, a Scrum Master trying to influence the implementation of your code, a developer trying to inject a feature into the product that wasn't requested, etc.  It can also take the form of "Some Guy" trying to do anything at all in your meetings.  When this happens it is the responsibility of the Scrum Master to call it out, explain why it isn't allowed, and remind everyone what their roles are. If you're playing soccer out on a pitch, you can't have a handful of players decide they want to play rugby, nor can you have your goalie decide he wants to play forward.  For the game to work, the players must first agree to the rules, and then play by those rules.

1. The rules of your game keep changing, and nobody asked your team.  Someone from outside the team is issuing edicts to the team and expecting them to be obeyed.  Agile teams should be self-organizing, with the aid of the Scrum Master as a servant-leader.  It's a negotiation within the team to set things like meeting times and iteration length, and a negotiation with the Product Owner to set things like the number of iterations to a release, release dates, demo agendas, etc.  If someone from outside the team is issuing directives (like "Some Guy"), they are cheating.  This, sadly, is an organizational problem and often an indicator of a lack of trust of or lack of respect for the team.  It's a tough nut to crack to figure out why this lack of trust or lack of respect persists, and even more difficult to remedy the problem.

I seriously hope I'm done this time, and that anything else I think of will be a straightforward repetition of one of my previous articles.



Yet Another 5 Signs You Are Not Agile

Boy, these just never get old, do they?  Given the span of time it's started taking me to collect another 5 signs, though, I'm hopeful that I'm running out of material.  Or maybe I'm just being optimistic.

5. Your iteration runs Monday through Friday.  Everyone checks in on Friday.  Test is always one week behind.  Acceptance is always delayed. I think I've seen this particular behavior in every Agile project to which I've ever contributed, and the cause is very simple: your stories (and probably your tasks) are too big.  If you're working on the same story through the entire iteration, either you're making no progress day-to-day, or your story was so big it took the entire week to accomplish.  The former problem can be addressed in daily stand-ups by asking what tasks each person completed on the previous day.  If the answer is "nothing" then you have a blockage or some other problem that needs to be addressed immediately.  The latter problem means you need to spend more time in story decomposition and task breakdown.

4. Something changes. Chaos ensues trying to hide the change.  Agile, when properly applied, excels at coping with change.  If you've kept your iterations short, decomposed your stories, broken down your tasks, considered your velocity in planning, etc., you can estimate the impact of whatever changed and adjust accordingly.  We've all had the key team member falling ill, business priorities changing, last-minute items coming from security auditors, etc.  These things will impact your velocity. And that's okay!  What you don't want to do is to try to hide the impact of the change, treating your velocity like it's a stick with which to beat people, pulling out all the stops to try to "meet your numbers" for the iteration.  That's not what velocity is for; rather, the impact of many small changes over time will reduce your average velocity slightly, and this will ultimately aid you in planning.

3. Rather than decomposing stories, your team lengthens its iterations.  Key to being Agile is having frequent opportunities for adjustment and collecting frequent data points about team velocity. Consider this: if you need four data points to start to get a good feel for your team velocity, with one-week iterations, that takes you a month.  With two-week iterations, it takes two months.  With three-week iterations, it takes three months just to figure out what your velocity is.  It's like trying to learn to pilot a jet ski, fishing boat, or cruise ship.  Moreover, when adjustments are needed, you're trying to steer a jet ski, fishing boat, or cruise ship.  Granted, a jet ski isn't the appropriate watercraft for every situation, but if you're planning to change to a fishing boat or a cruise ship just because you're unwilling to leave some of your gear on the shore between trips, you're not making a change for a good reason.

2. Related to #3, you survive many planning meetings without ever decomposing a single user story. Maybe it's possible that you entered all your planning meetings with all your user stories perfectly decomposed into chunks that can be delivered end-to-end inside of one iteration, but it's not particularly likely to happen iteration after iteration (unless your product owner is exceedingly well-trained).  Odds are your stories are going in too big, and you're about to experience #5 from this list.  You need to have a discussion about the right size for a user story, the difference between user stories and tasks, and how to write user stories correctly.  You may need to allow extra time for planning for several iterations until the right way becomes a healthy habit, and the wrong way starts to feel weird.

1. Your team throws out Agile practices willy-nilly whenever they become more than mildly inconvenient, yet still expects to reap the benefits of Agile.  I'm the last person to say that teams must absolutely adhere to Agile dogma (though one could get that impression based on my blog postings), but when you're making tweaks and changes to your process, you really must think carefully about what you're changing, why you're changing it, and what the potential unintended (and direct) consequences of your change may be.  If you choose three-week iterations, it's going to take months to establish a velocity.  If you don't track or use your velocity in planning, your estimates aren't going to be very useful.  Be very cautious here: it's easy to fall into the trap of the "compressed waterfall."  And don't complain that Agile failed when you've thrown the baby out with the bathwater, adopting a process that looks nothing like Agile (but still calling it "Agile").

Bonus: Nobody from the team ever acts as a dissenting voice in your commits, retrospectives, or estimation exercises.  Nothing says "just get me out of this meeting" like everyone giving a "five" when you call for a "fist of five" to commit for the iteration.  Similarly, nothing says "I don't really care about the size of this story" like a handful of people giving a different estimate, yet automatically giving in to the more-common estimate without any discussion.  Your team is getting burned out, they don't understand the process, or they have no incentive to care.  Take the time to really understand the underlying problem here, rather than just having a knee-jerk reaction to it.  You may find valuable insight.

Like most things in life, what you get out of Agile is directly proportional to what you put into it.  Know your Agile practices, but also know why the practices are what they are, and understand the consequences of abandoning, changing, or tweaking the practices in your organization.  Make sure these consequences are clearly communicated to your stakeholders, too!

Travelocity, Why Are You Terrible?

Quite often when I say I will never do business with a particular company ever again, it's because they have engaged in some socially egregious behaviour rather than directly affecting me personally.  This is not the case with Travelocity.  I hope I never have to deal with them again, because they are awful.  Calling them "incompetent" would be an insult to the incompetent.

My partner and I spent hours planning out and booking a fairly complicated itinerary for what is effectively a honeymoon in Asia for us.  It is a plan that involves stays in three cities, and transport using three different airlines.  We booked the majority of this via Travelocity's site.  It went reasonably well once we had figured out our flight choices (which we did using Google Flight Search, not Travelocity).

But he decided that he'd like to spend one more week with his family before coming home.  I had to return to work.  So he wanted to change his ticket home to leave one week later.  My ticket would be unchanged.

The way Travelocity handled this, you would think we were the first couple in the history of powered flight to ever attempt such a change.

First the outsourced Travelocity customer service rep changed MY ticket instead of his.  This happened even after my partner told him at least THREE times that my ticket was to be left alone, and only his should be changed.  Then the rep wanted another $40 fee to change it back, for his own mistake!  To their credit, this fee was waived.  But this behaviour demonstrates a fundamental lack of comprehension on the part of the outsourced service drones.

After fixing the booking and performing the requested change, we found that all of the legs of my flights were now no longer visible on Travelocity.com.  The total price was still correct, and the number of passengers was correct, but none of my details were present.

He called back.  This time you would think we were trying to explain to someone how to assemble a nuclear reactor over the phone.  The customer service rep assured us that all would be well, and we ended the call.  Of course, all was not well.  The rep didn't actually DO anything.  My flight legs were still not visible on the site, and the confirmation email we got also contained only his flights.

So I called back, and this time I was informed by the outsourced customer service drone that this is a natural consequence of "splitting" the ticket so that one could be changed and not the other.  Yes, because booking airfare and changing tickets is a completely new thing in 2013, referential integrity is completely lost if you change anything.  The rep assured me that I could still get to my itinerary by visiting each airline's website individually and using different reservation codes with each site.  She also promised to send me emails containing all the reservation details.

Of course, that makes using Travelocity sort of pointless, doesn't it?

About 4 hours later, the emails finally arrived, and I discovered that at the very bottom of the emails was a code that could be used to pull up all the flight legs.  This is denoted by the string "***".  Because when you see "***" you naturally think, "oh, here is a single reservation code I could use on either airline's web site to see all the legs of my flight."  It's obvious, you see.

But of course, Travelocity's own web site, which I used to book these tickets, lacks this capability. Because, again, nobody has ever changed a ticket before, ever, so that's not really a feature anyone ever thought might be nice to have.

Travelocity, why are you terrible?  Are you trying to be terrible deliberately?

Five More Signs You Are Not Agile

In a previous blog posting I wrote about my top 5 signs that an organization is not really agile.  That posting was more about the mechanics of Agile.  In this posting, I present 5 signs that are more social or cultural indicators that agile either is not working in your organization, or is doomed to fail.  Or flail. 

Honorable mention, because I thought of this after I already finished the top 5: the expectation of heroic effort to meet unreasonable expectations.  Sometimes this is coming from Some Guy (see #5, next) or it is indicative of a disengagement in your planning or demo process between your development team and your product owner; perhaps your product owner is expecting more than is reasonable to expect based on your velocity, or there's an expectation that change can happen without any consequences.  We embrace change, and we can try to predict how much change is going to affect the schedule, quickly and with reasonable certainty, but we can't ever predict that the cost is 0.  Also see item #2 in my previous post.

5. There is someone in your meetings who can't tell you why he or she is there (or what Agile role he or she is playing).  For a proper game of Agile Development, the roles need to be clearly defined, just as in any well-organized game.  Games of American Football are not played with a Center, Quarterback, Guards, Tackles, Running Backs, Wide Receivers, Tight End, Linebackers, Referees, Coaches, and Some Guy.  There's no place for "Some Guy" in this process.  Similarly, in Agile, there are well-defined roles, and no role called "Some Guy who shows up to your meetings and injects noise."  And that's what happens when someone who is not properly involved becomes involved: noise in your signal.  It's a distraction and an interference with your team's ability to self-organize to meet its goals.  If Some Guy is trying to control, manipulate, or micromanage your process from the outside, it could be a sign of serious trust issues that I can't help you with.  (Though you might want to skip to my last paragraph below.)

4. Your team frequently demonstrates misconceptions about Agile.  Part of implementing Agile correctly is that the players have to understand their roles, and they also need to understand something about the game they're playing.  I could do an entire blog posting on Agile Misconceptions, but for the sake of brevity I'll cite just a few common ones here: the idea that "agile" is synonymous with "fast", the idea that planning is minimal to nonexistent, and the idea that somehow Agile has no or low overhead.  The remedy to this is education: you must consistently reinforce with people the reasons for being agile: delivering the right product with decent estimates and responding well to change.

3. You are a slave to your tools.  Usually this means being a slave to Rally or Pivotal or Jira or even Excel or your favorite Gantt chart nightmare.  You spend a lot of time in your planning sessions fighting with or updating these tools.  You find yourself changing your own processes that work well for your team so that they fit into the model provided by the tool, or you spend a lot of time faking-out the tool in weird ways to make your processes work.  In the worst case, someone from #5 above sees pretty charts in your tool and starts using the tool as a mechanism for bean counting or performance tracking.  If you must use a tool, at least kick it out of your meetings.  Sticky notes if everyone is present, or your favorite text editor shared in an online session if you have remote people, will speed your process along, and then you can go update your tool later when you don't have a room full of expensive engineers sitting around wasting their time.

2. You are not doing Retrospectives. Or worse, you are doing retrospectives, and the team is offering few, if any, "starts, stops and keeps."  (Worse because their time is expensive, and it's a waste of money to have them sitting around doing nothing.)  This is a sign that your team is not engaged or invested in the process.  Take a team out to lunch, and they can think of 20 different ways the restaurant could have improved their performance.  It's hard to believe, then, that after a week of meetings and development time, they can't think of anything they'd change.  To fix this, you have to figure out why the team is disengaged.  Is it something you said?  Are they afraid to speak up?  Do they think their previous retrospective items have been ignored or gone unimplemented?  Are there other problems going on within your organization that have impacted your team's morale?  This is by no means an exhaustive list of questions, but rather a place to start.  Lack of engagement will kill your process, so figure it out and fix it.

1. You are not doing demos.  Your product owner should want to see progress.  Your team should want to show off their achievements.  Moreover, demos offer a unique opportunity to "force" any integration work that might otherwise be put off until it's far too late.  (Integration should be happening more frequently than your demos as well, but nothing forces the issue like having to give a demo to your product owner.)  A lack of demos can have many causes.  You may have a disengaged team (see #2).  You may not be truly performing iterative development.  You may be putting off integration work that you shouldn't be putting off.  You may have planning problems with stories that are badly written or far too big or you may have no proper product owner (see #5 in my previous post).  Whatever the case may be, if you're not doing demos, find out why, quick!  A failure to demonstrate work achieved on a regular basis (ideally every iteration) is a tremendous red flag that something has gone wildly off the rails.  If you can change or fix only one thing, fix this one, because this will tend to expose and correct many other, smaller problems under the hood.

Remember: Changing code is easy.  Changing behavior is hard.  Changing culture is almost impossible.  If you're having these problems, the fix may not be trivial.  It might be very difficult.  Start with education and get the support of your management chain if the onus is on you to correct the problem.  (And if you're a self-organizing team, the onus IS on you.)

And I hate to say it, but I would be remiss if I did not mention Martin Fowler's well-known quote: "if you can't change your organization, change your organization."  In other words, if nothing you can ever say or do will make your organization "get well", and you want to be someone involved in an agile development process, you might be better off making a career adjustment.  I also take this statement in a way that Fowler probably didn't intend: if you have people in your organization who are holding it back and sabotaging your agile development efforts, maybe you need to have their careers adjusted so that you can get on with the business at hand.

The "Privilege" Fallacy

Let me be very clear: I have never disputed that some people in western culture have it easier than others.  To deny that heterosexual, Christian white males in the United States have the deck stacked in their favor would be to deny reality.  Though things have gotten better in the last few decades, and we're on a trajectory toward a more inclusive culture overall, there is still much progress to be made.  I don't deny any of these things, and in fact I am one of the people working to even the odds a little.  Being a gay atheist has also given me some insight into what it means to have to work a little harder because you're not automatically included.

Having said all of that, I have seen and experienced firsthand a disturbing trend among the more sanctimonious hipster crowd in which this unfortunate facet of current social norms is not used as a reminder that we must do more to level the playing field, but rather as a mechanism to attack others, insist upon their guilt, and, perhaps most egregiously of all, to exclude them from the conversation entirely.  If you should dare to attempt to express an opinion, play the devil's advocate, or otherwise question the conventional wisdom or party line, you can almost set your watch by how long it'll take before someone hisses "privilege" at you, should you happen to be any combination of white, male, heterosexual, or Christian (in the US at least).

The implication, of course, is that you cannot possibly have anything to add to the discussion, nor is it even possible that you might have a point of value to make or something thought-provoking to say, because, simply by virtue of your genitalia, you are such a sociopathic monster that you are utterly without empathy for anyone else and incapable of seeing the world from another person's perspective.  By any standard, this is an ad hominem, a logical fallacy of the most trivial order.  Your idea cannot be discussed, because you are de facto "bad" and probably not even a human being.  You belong to the "privileged" few, like it or not, and so nothing you could possibly say (unless it toes the party line) could possibly have any value whatsoever.

You could be volunteering in women's shelters, fighting AIDS in South Africa, feeding the homeless in soup kitchens, voting for candidates who support equality for everyone, contributing money to help open schools for underprivileged girls in far corners of the world while yodeling and drinking a glass of water at the same time, and none of that would count for anything.  You're "privileged" so you're not invited to the discussion (if you could call it a discussion; most real discussions have more than one side or at least consider multiple points of view) and nothing you could ever say could possibly be considered for any reason. You are a monster by birth.

"Privilege" isn't even the right word to use, but those who spit it at others enjoy it for the negative connotations it has.  The word conjures up ideas of being fed with a silver spoon, having everything handed to you with little or nothing asked in return, living a life of luxury and ease.  And while it's true to say that heterosexual white male Christians may be playing the game of life on the easiest setting, it's not correct to say that anyone growing up in middle-class or poor America is really living a life of "privilege."  It's not easy, even on the easiest setting.  Sure, it's definitely harder if you're, say, a dark-skinned woman, but that doesn't mean it's trivial otherwise.

Of course people puking out this sort of attack know that, but it wouldn't be as much fun to say something like, "I question whether you're able to empathize about this situation due to your background."  Why?  Because that's an affirmative statement that someone can actually be asked to defend.  Oh really?  Why do you think I'm incapable of experiencing empathy, exactly?  Conversely, the projectile-diarrhea of "privilege" just means that you have light skin or male genitals, to which the accused (presumed guilty) can't really mount a defense.  That is, of course, the intention.  It's a cheat, not a thoughtful response.

(Using a word like "privilege" in this manner also allows its users to avoid being accused of any impropriety themselves.  Imagine if, instead of shouting "privilege," someone shouted, "you're only saying that because you're white" or "you just think that because you're a man."  When you put it like that, it sounds like the mean-spirited, shallow, sweeping generalization that it is.)

It is among the most intellectually lazy of fallacious arguments, but more egregiously, it is used to prevent a conversation from happening, rather than having a conversation with someone who has already expressed an interest in hearing different points of view.  It is saying, "I cannot provide (or can't be bothered to provide) an intelligent or reasonable response to what you've said, therefore let's just assume you're horrible and you lose."

In effect, this blanket, targeted exclusion of certain people from even being invited to participate in the discourse demonstrates the very height of hypocrisy in many cases: in response to people being treated unfairly because of their gender, skin color, sexual orientation, etc., those who assert "privilege" as a mechanism of exclusion are, in fact, targeting others to treat them unfairly based solely on their gender, skin color, sexual orientation, etc.

Now some might argue that this is only fair; after all, isn't it just a question of the tables being turned?  That depends on whether you want a discussion or you want revenge.  It depends on whether adding more wrongs to the pile makes the world more fair.

So if you want revenge, and if you think the best way to resolve egregious behavior by others is more egregious behavior on your part, then by all means, accuse someone of having "privilege". It'll make you feel better, because launching personal attacks on others always does.

But if you don't want to be intellectually lazy, avoiding questions to which you may not have good answers, perhaps you could try listening to what the other person has to say, evaluating whether they might have something worth thinking about, and providing an adequate and thoughtful response.  Yes, even if you feel like you've said it a thousand times before, maybe you haven't said it to this person before, or maybe there's a better way to say it.  This is how civil discourse is supposed to work.  It's supposed to be civil (maybe even free from personal attacks) and to involve discourse (multiple people being permitted to contribute to the discussion).

Yes, it's more effort to explain to another person your different perspective and to elucidate how your own experiences led you to have a different point of view.  Sorry.

Think about it carefully: when you already have someone's ear, and they're receptive to communicating with you on a topic, why would you deliberately shut them out with such a thoughtless accusation?  You have a teaching and a learning opportunity when you engage with another person.  Don't throw it away out of spite.


Postscript: Let's open a betting pool on how long it'll be before someone says, "well this is exactly what I would expect a privileged person to say!"  (rather than actually offer any counterpoint to anything I've said, of course).