Rebooting the Cloud??
I'm chewing through my email here at work, and I see a notice that our Cloud platform (built on ESX/Surgient) is going down for upgrades this weekend: a nearly 24-hour outage of the platform. It got me thinking: when do Amazon Web Services or Google App Engine ever shut down their clouds?
People who know me well know that I really hate technology. Not that I'm a Luddite; it's that most technology is actually crap: poorly engineered, prone to failure, and difficult to fix.
I'm in the midst of building a webapp with a Lua plugin for Adobe Lightroom, and I will have to deal with the fact that a user may install my plugin and never update it. I may have to support version 1 of my plugin forever. My web service is the type that needs to be robust; taking the whole cloud of servers down to upgrade just isn't in the cards. I have to build my data model to be resilient: I can't change it midstream, and I can't do massive database upgrades that take the system offline for a day at a time. I must plan for cases where I can only upgrade some components of my infrastructure at a time.
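One way to keep serving a version 1 plugin forever is to have the web service accept every request format it has ever shipped and upgrade old requests to the current format on arrival, so a plugin that is never updated keeps working. Here is a minimal sketch in Python, assuming (hypothetically) that each plugin request arrives as a dictionary with a `protocol_version` field; the field names and both upgrade steps are invented for illustration and are not part of any real Lightroom plugin API.

```python
from typing import Any, Callable, Dict

Request = Dict[str, Any]

# Hypothetical: the newest request format the service understands.
CURRENT_VERSION = 3


def upgrade_v1_to_v2(req: Request) -> Request:
    # Hypothetical change: v1 sent a single "photo"; v2 sends a list of "photos".
    out = dict(req)
    out["photos"] = [out.pop("photo")]
    out["protocol_version"] = 2
    return out


def upgrade_v2_to_v3(req: Request) -> Request:
    # Hypothetical change: v3 adds an optional "tags" list, empty by default.
    out = dict(req)
    out.setdefault("tags", [])
    out["protocol_version"] = 3
    return out


# Maps each old version to the function that lifts it exactly one version forward.
UPGRADERS: Dict[int, Callable[[Request], Request]] = {
    1: upgrade_v1_to_v2,
    2: upgrade_v2_to_v3,
}


def normalize(req: Request) -> Request:
    """Upgrade a request from any supported version to CURRENT_VERSION."""
    # Assumption: the very first plugin release may not send a version at all.
    version = req.get("protocol_version", 1)
    if version > CURRENT_VERSION:
        raise ValueError(f"unsupported protocol_version {version}")
    while version < CURRENT_VERSION:
        req = UPGRADERS[version](req)
        version = req["protocol_version"]
    return req


if __name__ == "__main__":
    old = {"photo": "IMG_0001.jpg"}  # what a never-updated v1 plugin might send
    print(normalize(old))
```

Chaining upgrades one version at a time means each new release only needs one new upgrade function, and the rest of the service only ever sees the current format.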
Things I must be able to tell about my infrastructure:
1. What I'm running, and its versions.
2. The health of what I'm running.
3. The status of any dependencies.
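Those three questions can be answered by a single status report. Below is one minimal way to build it in Python, assuming each component can be described by a name, a version string, a health-check function, and a set of dependency checks. The `Component` class and the check functions are hypothetical stand-ins for whatever real probes (an HTTP ping, a database query) an actual deployment would use.

```python
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List

Check = Callable[[], bool]


@dataclass
class Component:
    # Hypothetical description of one piece of the infrastructure.
    name: str
    version: str
    health_check: Check
    dependencies: Dict[str, Check] = field(default_factory=dict)


def run_check(check: Check) -> bool:
    """Run a check, treating any exception as a failure rather than crashing."""
    try:
        return bool(check())
    except Exception:
        return False


def status_report(components: List[Component]) -> Dict[str, Dict[str, Any]]:
    """Report each component's version, health, and dependency status."""
    report: Dict[str, Dict[str, Any]] = {}
    for comp in components:
        report[comp.name] = {
            "version": comp.version,
            "healthy": run_check(comp.health_check),
            "dependencies": {
                dep: run_check(check) for dep, check in comp.dependencies.items()
            },
        }
    return report


if __name__ == "__main__":
    api = Component("api", "1.4.2", lambda: True, {"database": lambda: True})
    worker = Component("worker", "0.9.0", lambda: 1 / 0)  # a check that blows up
    for name, info in status_report([api, worker]).items():
        print(name, info)
```

Catching exceptions inside `run_check` keeps one broken probe from hiding the status of everything else.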
Truth be told, I must write my software and my app as if the Cloud can't be rebooted. And the fact that my day job requires taking the whole platform down on a regular basis is borderline criminal; we should hold our software vendors to higher standards. I shouldn't have to put 100% of my cloud users out of commission when I might be able to limit the outage to a fraction of them, or to none at all.
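Limiting an upgrade to a fraction of users usually means a rolling upgrade: upgrade a small batch of servers, confirm they are healthy, and only then move on to the next batch. Here is a minimal Python sketch of that batching idea, assuming hypothetical `upgrade` and `is_healthy` callables supplied by the caller; it is an illustration, not any vendor's actual upgrade procedure.

```python
import math
from typing import Callable, List, Sequence


def rolling_upgrade(
    servers: Sequence[str],
    max_fraction: float,
    upgrade: Callable[[str], None],
    is_healthy: Callable[[str], bool],
) -> List[str]:
    """Upgrade servers in batches of about max_fraction of the fleet (minimum one).

    Stops before touching the next batch if any server in the current batch
    fails its health check, so a bad release never reaches everyone.
    """
    if not 0 < max_fraction <= 1:
        raise ValueError("max_fraction must be in (0, 1]")
    # At least one server per batch, even for tiny fleets.
    batch_size = max(1, math.floor(len(servers) * max_fraction))
    upgraded: List[str] = []
    for start in range(0, len(servers), batch_size):
        batch = list(servers[start:start + batch_size])
        for server in batch:
            upgrade(server)  # hypothetical: drain, upgrade, and restart one server
        unhealthy = [s for s in batch if not is_healthy(s)]
        if unhealthy:
            raise RuntimeError(f"halting rollout, unhealthy after upgrade: {unhealthy}")
        upgraded.extend(batch)
    return upgraded


if __name__ == "__main__":
    fleet = [f"web{i}" for i in range(10)]
    done = rolling_upgrade(fleet, 0.2, lambda s: print("upgrading", s), lambda s: True)
    print("upgraded", len(done), "servers")
```

With `max_fraction=0.2`, only two of ten servers are ever out of rotation at once, and a failed health check stops the rollout before the rest of the fleet is touched.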