Archive for April, 2010

Scaling out Azure

<Rant Warning/>

In previous blog posts, I’ve talked about some of the patterns you can use to build your apps for the cloud, including Task-Queue-Task and using that pattern to de-normalize your data. This time, something on scaling out.

When you are building apps in the cloud, you have to remember you are running in a shared environment and have no control over the hardware.

Let’s think about that for a moment.

In Windows Azure and SQL Azure we run you on hardware. Information on roughly what to expect can be found here (scroll down and expand compute instances), but here is a table of the compute part of Windows Azure:

Compute Instance Size | CPU         | Memory  | Instance Storage | I/O Performance
Small                 | 1.6 GHz     | 1.75 GB | 225 GB           | Moderate
Medium                | 2 x 1.6 GHz | 3.5 GB  | 490 GB           | High
Large                 | 4 x 1.6 GHz | 7 GB    | 1,000 GB         | High
Extra large           | 8 x 1.6 GHz | 14 GB   | 2,040 GB         | High

So how fast is the memory? What kind of CPU caching do we have? How fast are the drives? What about the network?

For SQL Azure we don’t even tell you what it’s running on, although you can watch this to get a better idea of how “shared” you are.

The point I’m going to make is that when you control the hardware, you can measure things like the throughput of the disk controllers, CPU and memory. Based on that knowledge you can create filegroups for databases that span multiple drives, or install more cores, faster drives, more memory and faster networking, all to improve performance. You are scaling up.

In the cloud, things work differently: you have to scale out. You have lots of little machines doing little chunks of work. No more 32-way servers at your disposal to crank through that huge workload. Instead you need 32 one-way servers to crank through it. There are no filegroups, no 15,000 rpm drives. Just lots of cheap little servers ready for you whenever you need them.
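To make "little chunks of work" concrete, here is a minimal sketch in Python of carving one big workload into near-equal chunks, one per worker. The function name `split_work` and the 100-job workload are made up for illustration; this is not code from Windows Azure or any SDK, and in a real app the chunks would most likely be handed out as queue messages rather than list slices.

```python
def split_work(items, worker_count):
    """Split items into worker_count chunks whose sizes differ by at most one."""
    if worker_count < 1:
        raise ValueError("worker_count must be at least 1")
    base, extra = divmod(len(items), worker_count)
    chunks, start = [], 0
    for index in range(worker_count):
        # The first `extra` workers each take one additional item.
        size = base + (1 if index < extra else 0)
        chunks.append(items[start:start + size])
        start += size
    return chunks


jobs = list(range(100))        # 100 hypothetical units of work
chunks = split_work(jobs, 32)  # one chunk per single-core worker
print(len(chunks), [len(chunk) for chunk in chunks[:5]])
```

Each worker only ever sees its own small chunk, which is the whole point: no single machine needs to be big.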

I get a lot of questions about the performance of this, that and the other and while I understand that information can be useful, I think they somewhat miss one of the points and the potential of using the cloud.

I don’t need 15,000 rpm drives and 8 cores to handle my anticipated peak workload. Instead I can have 20 servers working my data at peak times, and 5 servers the rest of the time. So stop thinking about how fast the memory is, and start figuring out how you can use as many servers as you need, when you need them.
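As a rough sketch of that "20 at peak, 5 the rest of the time" thinking, the arithmetic might look like the following Python. Everything here is an assumption for illustration: the function name, the idea of sizing from a backlog of jobs, and the throughput figure of 50 jobs per instance per hour. Only the floor of 5 and the ceiling of 20 come from the numbers above.

```python
import math


def instances_needed(pending_jobs, jobs_per_instance_per_hour, target_hours,
                     minimum=5, maximum=20):
    """Estimate how many instances clear the backlog within target_hours."""
    capacity_per_instance = jobs_per_instance_per_hour * target_hours
    required = math.ceil(pending_jobs / capacity_per_instance)
    # Never drop below the quiet-time floor or exceed the peak ceiling.
    return max(minimum, min(maximum, required))


quiet = instances_needed(pending_jobs=100, jobs_per_instance_per_hour=50, target_hours=1)
busy = instances_needed(pending_jobs=900, jobs_per_instance_per_hour=50, target_hours=1)
print(quiet, busy)
```

Notice that nothing in the calculation asks how fast any one server is beyond a single throughput number you can measure yourself.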

Remember OUT not UP.

THIS POSTING IS PROVIDED “AS IS” WITH NO WARRANTIES, AND CONFERS NO RIGHTS, EVEN IF YOU SAY PLEASE

Remember to check your framework version

I’ve just recently installed the final version of Microsoft Visual Web Developer 2010 Express – which is my tool of choice for building for Windows Azure.

When you open a project from an older version, you get the option to upgrade the projects. My default response to this dialog box is to click on Finish and not walk through the wizard. One thing I noticed is that if there are any web projects, you will be asked whether you want to leave them as framework 3.5 projects or upgrade them. Right now Windows Azure does not support .NET 4 applications, so you should choose to leave them at framework 3.5.

This is great, but if you have any class libraries or other projects, those seem to get upgraded automatically to framework 4.0. The easy fix is to check the project properties of each project and make sure the framework version is set to 3.5. Fortunately most of the projects I’ve converted have thrown up warnings – but it’s always good to check.
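If you have a lot of projects, checking each one by hand gets old. Here is a small Python sketch that reads the `TargetFrameworkVersion` element from `.csproj` and `.vbproj` files and reports any that are not `v3.5`. The function names are my own, and it assumes your project files declare the version in that element, so treat it as a starting point rather than a replacement for looking at the project properties.

```python
import re
from pathlib import Path

# Matches e.g. <TargetFrameworkVersion>v4.0</TargetFrameworkVersion>
TARGET_RE = re.compile(
    r"<TargetFrameworkVersion>\s*(v[\d.]+)\s*</TargetFrameworkVersion>")


def declared_versions(project_xml):
    """Return every framework version declared in a project file's text."""
    return TARGET_RE.findall(project_xml)


def scan_projects(root):
    """Map each .csproj/.vbproj file under root to its declared versions."""
    found = {}
    for path in sorted(Path(root).rglob("*")):
        if path.suffix.lower() in {".csproj", ".vbproj"}:
            text = path.read_text(encoding="utf-8", errors="ignore")
            found[str(path)] = declared_versions(text)
    return found


def mismatches(versions_by_project, expected="v3.5"):
    """Keep only projects that declare a version other than the expected one."""
    return {project: versions
            for project, versions in versions_by_project.items()
            if any(version != expected for version in versions)}
```

Calling `mismatches(scan_projects("."))` from the solution folder would list the projects that still need to be set back to 3.5.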

De-Normalizing your data

I mentioned yesterday how we use some trickery to maintain the top views table in Bid Now.

Items are stored in a table, but to generate a quick top views view, we maintain a separate table. This is done using the Task/Queue/Task pattern described yesterday here.

Since Windows Azure table storage has no way to ask for the TOP rows ordered by a view count (results always come back in PartitionKey and RowKey order), we have to use some trickery.

Items are stored in the AuctionItems table. But when you visit the home page, you get several different lists of items. Each of these lists is in fact backed by a separate table, which is kept up to date using the Task/Queue/Task pattern.

[Image: the home page with its lists of items]

For the Most Viewed list we use the MostViewedItems table. The table contains PartitionKey, RowKey, Timestamp, Title, EndDate, ItemId, PhotoUrl, ShortDescription and ThumbnailUrl: just enough information to display the list and enable a click through to the item details.

The query to return the top 5 items is simple: we return the first 5 items from that table, which is super fast. How, you say? Well, we use the PartitionKey to order the table!
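Here is a small in-memory Python sketch of why "take the first 5" works. It is not the Bid Now code and does not call table storage; it only mimics rows coming back ordered by PartitionKey and then RowKey. The key layout is an assumption on my part: since keys are compared as strings, I start from a large number, subtract the view count, and zero-pad to a fixed width so that string order matches numeric order. The names `KEY_WIDTH`, `START_VALUE`, `key_for_views` and `first_n`, and the sample titles and counts, are all made up.

```python
KEY_WIDTH = 10                     # assumed fixed key width
START_VALUE = 10 ** KEY_WIDTH - 1  # assumed key for an item with zero views


def key_for_views(views):
    """Build a partition key that sorts earlier as the view count grows."""
    return str(START_VALUE - views).zfill(KEY_WIDTH)


def first_n(rows, n=5):
    """Mimic taking the first n rows of a table ordered by PartitionKey, RowKey."""
    return sorted(rows, key=lambda r: (r["PartitionKey"], r["RowKey"]))[:n]


views = {"Football": 3, "Tennis": 12, "Golf": 7, "Chess": 1, "Rugby": 9, "Darts": 0}
rows = [{"PartitionKey": key_for_views(count), "RowKey": title, "Title": title}
        for title, count in views.items()]
top5 = [row["Title"] for row in first_n(rows)]
print(top5)
```

Without the padding, a key of "10" would sort before "9" as a string, which is why the fixed width matters in this sketch.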

Take a look at the Most Viewed from the home page.

[Image: the Most Viewed list on the home page]

Here is the MostViewedItems table. Note the partition key is numerical. Every time an item is viewed we read the row, decrement the number in the partition key, save the new row and delete the old one (since you cannot update the partition key of an existing row).

[Image: the MostViewedItems table with its numerical partition keys]

Note that Football is second from the bottom. If we view Football, the update runs and we end up with this on the home page.

[Image: the Most Viewed list on the home page after viewing Football]

The partition key has been updated as well. Note that the number has been decremented by one.

[Image: the MostViewedItems table with the decremented partition key]

You can use this pattern to keep your own data de-normalized and provide super fast queries.
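The read, decrement, save and delete steps can be sketched in Python against a plain list standing in for the MostViewedItems table. This is an illustration under assumptions, not the Bid Now implementation: the fixed-width key format, the use of the item id as RowKey, the sample key values and the function names are all mine. In real table storage the insert and the delete touch two different partitions, so they would not happen as one atomic operation.

```python
KEY_WIDTH = 10  # assumed fixed width so string order matches numeric order


def record_view(table, item_id):
    """Move an item's row to a partition key that is one lower."""
    old_row = next(row for row in table if row["ItemId"] == item_id)
    new_row = dict(old_row)
    new_row["PartitionKey"] = str(int(old_row["PartitionKey"]) - 1).zfill(KEY_WIDTH)
    table.append(new_row)  # save the new row first...
    table.remove(old_row)  # ...then delete the old one
    return new_row


def most_viewed(table, n=5):
    """Return the first n rows in PartitionKey, RowKey order."""
    return sorted(table, key=lambda r: (r["PartitionKey"], r["RowKey"]))[:n]


table = [
    {"PartitionKey": "9999999997", "RowKey": "item-1", "ItemId": "item-1", "Title": "Tennis"},
    {"PartitionKey": "9999999998", "RowKey": "item-2", "ItemId": "item-2", "Title": "Football"},
]
record_view(table, "item-2")
record_view(table, "item-2")
print([row["Title"] for row in most_viewed(table)])
```

Saving the new row before deleting the old one means a failure in between leaves a duplicate rather than a missing item, which seemed the safer choice for a sketch.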
