Resin, OS X, wrappers and image-generating servlets
Switching to the Java Service Wrapper has given us better insight into what goes wrong when image-generation servlets fail on OS X. It looks like a manifestation of the "CFMessagePortCreateLocal failed" problem. Here is a post on Apple's support forum describing the problem with Tomcat. Googling around shows that the problem is reasonably well attested and documented, but answers are thin on the ground. As Apple use Java application servers for their support forums and, of course, for WebObjects applications (including the iTunes Music Store), I presume there must be some understanding of the problem within Apple, along with best-practice information on how to remedy it.
In the absence of that information being in the public domain, we can venture a theory. We believe the Tomcat, Resin, or Java Service Wrapper process respawns the JVM too quickly after the JVM dies (why the JVM dies once or twice a week on even lightly loaded Xserves is another matter entirely), so the OS resources the old JVM was using have not all been freed by the time the new one starts. Critically, some resources connected with "headless" operation have not been freed. This is what causes "CFMessagePortCreateLocal failed" to be logged. Thereafter, anything the JVM does that requires the window server causes the JVM to crash again.
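For readers unfamiliar with what "headless" operation looks like in an image-generating servlet, here is a minimal sketch of the kind of code involved. It is an illustration only, not our servlet: the class name HeadlessImageSketch and the method renderPng are hypothetical, and setting the java.awt.headless property is an assumption about how such a servlet would typically be run. It does nothing to address the crash itself.

```java
import java.awt.Color;
import java.awt.Graphics2D;
import java.awt.GraphicsEnvironment;
import java.awt.image.BufferedImage;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import javax.imageio.ImageIO;

// Hypothetical example class; not part of Resin, Tomcat or the Java Service Wrapper.
class HeadlessImageSketch {
    static {
        // Assumption: headless mode is requested before any AWT class is initialised.
        // The same property can be passed on the command line as -Djava.awt.headless=true.
        System.setProperty("java.awt.headless", "true");
    }

    // Draws a trivial image off-screen and returns it encoded as PNG bytes.
    static byte[] renderPng(int width, int height) throws IOException {
        BufferedImage image = new BufferedImage(width, height, BufferedImage.TYPE_INT_RGB);
        Graphics2D g = image.createGraphics();
        try {
            g.setColor(Color.WHITE);
            g.fillRect(0, 0, width, height);
            g.setColor(Color.BLACK);
            g.drawLine(0, 0, width - 1, height - 1);
        } finally {
            // Release the graphics context as soon as drawing is finished.
            g.dispose();
        }
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        ImageIO.write(image, "png", out);
        return out.toByteArray();
    }

    public static void main(String[] args) throws IOException {
        System.out.println("headless: " + GraphicsEnvironment.isHeadless());
        System.out.println("PNG bytes: " + renderPng(200, 100).length);
    }
}
```

A servlet would write the returned bytes to the response with a content type of image/png. The point of the sketch is simply that even off-screen drawing goes through AWT, which is where the JVM's dealings with the window server come in.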
We chose the Java Service Wrapper to replace Resin's own wrapper because it's highly configurable. We've raised the wrapper.restart.delay parameter from its default of 5 seconds to 10. Although this means there will be visible service failures when the JVM crashes, we hope it will leave enough time for whatever OS resources are not currently being released to actually be released. If this is successful, we will reduce the restart delay one second at a time until we find the shortest delay that still works. Conversely, we will increase the delay one second at a time if we find 10 seconds is not enough.
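For reference, the change amounts to a single line in the wrapper's configuration file. The fragment below is a sketch that assumes the file is the usual wrapper.conf; the rest of the configuration is omitted.

```
# wrapper.conf (Java Service Wrapper)
# Seconds to wait after the JVM exits before launching a replacement.
# The default is 5; we have raised it to 10 while we test the theory.
wrapper.restart.delay=10
```

Tuning then means editing this one value, restarting the wrapper, and watching the logs for "CFMessagePortCreateLocal failed" after the next JVM crash.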
With luck we'll rapidly converge on a platform that's stable enough for production use.