I'm chewing through my email here at work, and I see a notice that our Cloud platform (built on ESX/Surgient) is going down for upgrades this weekend, a nearly 24-hour outage of the platform.  It got me thinking - when do Amazon Web Services or Google App Engine ever shut down their clouds?

People who know me well know that I really hate technology.  Not that I'm a Luddite, but most technology is actually crap - poorly engineered, prone to failure, difficult to fix.  I'm in the midst of building a webapp with a Lua plugin for Adobe Lightroom, and I will have to deal with the fact that a user may install my plugin and never update it.  I may have to service version 1 of my plugin forever.  My web service is the type that needs to be robust - taking the whole cloud of servers down to upgrade just isn't in the cards.  I have to build my data model to be resilient - I can't change it midstream, and I can't do massive database upgrades that require me to take the system offline for a day at a time.  I must plan for cases where I can only upgrade my infrastructure one component at a time.
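One way to live with "version 1 forever" is to have the server upgrade old requests to the current shape on the way in, so only one code path has to understand the data. What follows is only a sketch of that idea in Python, not the actual service: the field names (`name`, `title`, `tags`), the `version` key, and the helper names are all made up for illustration.

```python
# Sketch: accept payloads from any plugin version and upgrade them in memory.
# All field names and version numbers here are hypothetical.

CURRENT_VERSION = 2


def _v1_to_v2(payload):
    """Hypothetical change: v2 renamed "name" to "title" and added "tags"."""
    upgraded = dict(payload)  # never mutate the caller's data
    upgraded["title"] = upgraded.pop("name", "")
    upgraded.setdefault("tags", [])
    upgraded["version"] = 2
    return upgraded


# One upgrader per old version; each moves a payload exactly one step forward.
UPGRADERS = {1: _v1_to_v2}


def normalize(payload):
    """Return a copy of the payload in the current shape."""
    # Assumption: clients that send no version at all are version 1.
    version = payload.get("version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported payload version: {version}")
    current = dict(payload)
    current["version"] = version
    while version < CURRENT_VERSION:
        current = UPGRADERS[version](current)
        version = current["version"]
    return current
```

A version 1 plugin that sends `{"name": "Sunset"}` comes out the other side as a version 2 record with a `title` and an empty `tags` list, and the old client never has to know. Adding version 3 later means adding one more upgrader, not touching the old ones.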

Things I must be able to tell about my infrastructure: 

1. What I'm running, and its versions.

2. The health of what I'm running.

3. The status of any dependencies.
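Those three questions can be answered by a single status report. Here is a rough sketch in Python, under the assumption that each piece of the infrastructure can be described by a name, a version, a health flag, and a list of the components it depends on. The `Component` class and `status_report` function are invented for this example, not part of any real tool.

```python
# Sketch: one report covering versions, health, and dependency status.
# Component and status_report are hypothetical names for illustration.
from dataclasses import dataclass, field


@dataclass
class Component:
    name: str
    version: str
    healthy: bool
    dependencies: list = field(default_factory=list)  # names of other components


def status_report(components):
    """Build a dict keyed by component name with version, health, and deps."""
    by_name = {c.name: c for c in components}
    report = {}
    for c in components:
        deps = {}
        for dep in c.dependencies:
            target = by_name.get(dep)
            if target is None:
                deps[dep] = "unknown"  # dependency we have no record of
            else:
                deps[dep] = "ok" if target.healthy else "down"
        report[c.name] = {
            "version": c.version,
            "healthy": c.healthy,
            "dependencies": deps,
        }
    return report
```

For example, a `web` component that depends on a failing `db` and an unregistered `cache` would report `db` as "down" and `cache` as "unknown", which is exactly the kind of thing I'd want to see before upgrading anything.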

 

Truth be told, I must write my software and my app as if the Cloud can't be rebooted.  And the fact that my day job requires outages like this one on a regular basis is borderline criminal - we should hold our software vendors to higher standards.  I shouldn't put 100% of my cloud users out of commission when I might be able to limit it to a fraction - or none at all.