Design to fail
I was chatting to a customer earlier about a solution they had built for Azure. They had implemented a thingythangy that stored a few hundred requests in memory before dumping them into a blob. My immediate reaction was – “What happens when your role gets recycled? Do you lose the cached requests?”
Windows Azure does not have an SLA for restarting your service, but if it did it would be 100%. Restarting is just a reality. Hardware fails, OS’s get patched, etc. At some point you will be restarted, maybe with a warning, but maybe not.
This, by the way, is no different from when you run your service on the server under your desk. At some point it will get restarted, lose power, or suffer some other such calamity.
One thing you need to think about when writing good code for the cloud is how to deal with this. There are a few choices to think about:
Ignore it and carry on
My Dad used to say “nothing to see here, move along” – sometimes it really doesn’t matter. You can ignore some things that happen twice. As an example, if you were counting web site hits, an extra “count” here or there isn’t really going to change the outcome or purpose. However, if the action is generating a patient’s prescription, that extra Hydrocodone tablet will probably make a significant difference.
Write code to handle failure
Write code to detect whether the action has already been completed, as well as code to recover incomplete actions. You can do things like:
- Check how many times a message has been dequeued – a message that has been dequeued more than once is either a poison message, or was the subject of a failure (or both).
- Check the ETag & timestamp on data from Windows Azure storage. Has it been updated recently? Is the message you are trying to process older than the last update?
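As a rough sketch of the checks above: the snippet below uses plain in-memory stand-ins rather than the real storage client, so `QueueMessage`, `StoredRecord`, `decide` and the `MAX_DEQUEUE_COUNT` threshold are all made-up names and values for illustration. It only looks at the dequeue count and the timestamp, and leaves ETag handling out.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Optional

# Assumed threshold: past this many attempts we treat the message as poison.
MAX_DEQUEUE_COUNT = 3


@dataclass
class QueueMessage:
    """Hypothetical stand-in for a message read from a queue."""
    body: str
    dequeue_count: int      # how many times the message has been handed out
    inserted_at: datetime   # when the message was put on the queue


@dataclass
class StoredRecord:
    """Hypothetical stand-in for the data the message wants to update."""
    last_modified: datetime


def decide(message: QueueMessage, record: Optional[StoredRecord]) -> str:
    """Return 'poison', 'skip', 'recover' or 'process' for a message."""
    # Dequeued too many times: park it somewhere for a human to look at.
    if message.dequeue_count > MAX_DEQUEUE_COUNT:
        return "poison"
    # The stored data was updated after this message was queued,
    # so the message is older than the last update.
    if record is not None and message.inserted_at < record.last_modified:
        return "skip"
    # Dequeued more than once: an earlier attempt failed part way through,
    # so check for half-finished work before carrying on.
    if message.dequeue_count > 1:
        return "recover"
    return "process"
```

A worker would call `decide` before doing any real work and branch on the answer. What "recover" actually involves depends entirely on your own data.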
Essentially you are trying to write code that is idempotent. Idempotent code is code that can be executed multiple times without changing the outcome. There are a bunch of techniques which I’ll cover over the coming weeks (with code), but the bottom line is:
Your code will fail – make sure you handle that.
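To make "idempotent" a little more concrete, here is a minimal sketch of one common approach: give every operation an ID and refuse to apply the same ID twice. The `PrescriptionStore` class and its methods are hypothetical, and the dictionary here lives in memory only to keep the example short. In a real service the record of applied operations would have to sit in durable storage, otherwise it disappears on the very restart you are trying to survive.

```python
class PrescriptionStore:
    """Hypothetical store that applies each operation at most once."""

    def __init__(self):
        # operation_id -> number of tablets issued by that operation
        self._applied = {}

    def issue(self, operation_id, tablets):
        """Apply the operation once. Returns True if it was newly applied,
        False if this operation_id had already been applied."""
        if operation_id in self._applied:
            return False  # replay after a failure: nothing changes
        self._applied[operation_id] = tablets
        return True

    def total_tablets(self):
        """Total tablets issued across all distinct operations."""
        return sum(self._applied.values())


store = PrescriptionStore()
store.issue("rx-1001", 10)
store.issue("rx-1001", 10)  # same message processed again after a restart
total = store.total_tablets()  # still 10, not 20
```

Running the same message twice leaves the total unchanged, which is the whole point.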
THIS POSTING IS PROVIDED “AS IS” WITH NO WARRANTIES, AND CONFERS NO RIGHTS, EVEN IF YOU SAY PLEASE
Scaling out Azure
<Rant Warning/>
In previous blog posts, I’ve talked about some of the patterns you can use to build your apps for the cloud, including Task-Queue-Task and de-normalizing your data using that pattern. But now something on scaling out.
When you are building apps in the cloud, you have to remember you are running in a shared environment and have no control over the hardware.
Let’s think about that for a moment.
In Windows Azure and SQL Azure we run you on hardware. Information on roughly what to expect can be found here (scroll down and expand compute instances), but here is a table of the compute part of Windows Azure:
So how fast is the memory? What kind of CPU caching do we have? How fast are the drives? What about the network?
For SQL Azure we don’t even tell you what it’s running on, although you can watch this to get a better idea of how “shared” you are.
The point I’m going to make is that when you control the hardware, you can figure out lots of things like the throughput of disk controllers, CPU and memory. Based on that knowledge you can create filegroups for databases that span multiple drives, install more cores, faster drives, more memory, faster networking – all to improve performance. You are scaling up.
In the cloud, things work differently – you have to scale out. You have lots of little machines doing little chunks of work. No more 32-way servers at your disposal to crank through that huge workload. Instead you need 32 one-way servers to crank through that workload. There are no filegroups, no 15,000 rpm drives. Just lots of cheap little servers ready for you whenever you need them.
I get a lot of questions about the performance of this, that and the other, and while I understand that information can be useful, I think those questions somewhat miss the point, and the potential, of using the cloud.
I don’t need 15,000 rpm drives and 8 cores to handle my anticipated peak workload. Instead I can have 20 servers working my data at peak times, and 5 servers the rest of the time. So stop thinking about how fast the memory is, and start figuring out how you can use as many servers as you need – when you need them.
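The arithmetic behind that is simple enough to sketch. The function name and the numbers below are made up for illustration: I'm assuming each small instance copes with about 50 requests per second, which you would have to measure for your own workload.

```python
import math


def instances_needed(requests_per_second, per_instance_capacity, minimum=1):
    """Number of small instances needed to carry a given load.

    per_instance_capacity is an assumed, measured-by-you figure for how
    many requests per second one instance can handle.
    """
    if per_instance_capacity <= 0:
        raise ValueError("per_instance_capacity must be positive")
    # Round up: a fraction of a server still means one more server.
    return max(minimum, math.ceil(requests_per_second / per_instance_capacity))


peak = instances_needed(1000, 50)      # 20 instances at peak
off_peak = instances_needed(250, 50)   # 5 instances the rest of the time
```

The interesting number is not how fast one box is, but how the instance count moves as the load moves.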
Remember OUT not UP.
Remember to check your framework version
I’ve just recently installed the final version of Microsoft Visual Web Developer 2010 Express – which is my tool of choice for building for Windows Azure.
When you open a project from an older version, you get the option to upgrade the projects. My default response to this dialog box is to click on Finish and not walk through the wizard. One thing I have noticed is that if there are any web projects, you will be asked whether you want to leave them as framework 3.5 projects or upgrade them. Right now Windows Azure does not support .NET 4 applications – so you should choose to leave them at framework 3.5.
This is great, but if you have any class libraries or other projects, those seem to get upgraded automatically to framework 4.0. The easy fix is to check the project properties of each project and make sure the framework version is set to 3.5. Fortunately most of the projects I’ve converted have thrown up warnings – but it’s always good to check.
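If you have more than a handful of projects, checking each one by hand gets old quickly. Here is a small sketch of a script that does it for you. It assumes C# projects (`*.csproj`) in the usual MSBuild format, where the version lives in a `TargetFrameworkVersion` element. The function names and the `v3.5` default are my own choices for illustration.

```python
import xml.etree.ElementTree as ET
from pathlib import Path

# XML namespace used by MSBuild project files of this era.
MSBUILD_NS = "{http://schemas.microsoft.com/developer/msbuild/2003}"
EXPECTED_VERSION = "v3.5"  # assumed target for this example


def target_framework(project_xml):
    """Return the TargetFrameworkVersion in a project file's XML
    (given as str or bytes), or None if the element is missing."""
    root = ET.fromstring(project_xml)
    element = root.find(f".//{MSBUILD_NS}TargetFrameworkVersion")
    if element is None or not element.text:
        return None
    return element.text.strip()


def mismatched_projects(solution_dir, expected=EXPECTED_VERSION):
    """Map each .csproj path under solution_dir to its framework version
    when that version differs from the expected one."""
    mismatches = {}
    for path in sorted(Path(solution_dir).rglob("*.csproj")):
        version = target_framework(path.read_bytes())
        if version != expected:
            mismatches[str(path)] = version
    return mismatches
```

Calling `mismatched_projects(".")` from the solution folder gives you a dictionary of the projects that need their framework version set back. An empty dictionary means there is nothing to fix.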