Everyone understands the concept of scaling out compute when building applications for Windows Azure, but many times I’m seeing storage left out of the equation. The results are applications that don’t scale because they are tied to a single database. In this post I will explain how you need to start thinking about storage in the cloud.
Storage on-premises
If you think about how you build big applications on premises and how you approach data in general is very different. Typically you have a database or storage standard. This could be SQL Server, or some other relational database. The law of the enterprise is usually “all data shall be stored in <insert your enterprise database here>”. There is no other choice.
The second big difference is how you approach scale. Typically to get scale, you build the biggest most elaborate database server you can. I remember years ago, tuning everything from the hardware, through to how the disks were partitioned and where the file groups in SQL Server were located and used. Machines typically had multiple CPU’s with multiple cores, multiple disk controllers, disks and a bucket full of RAM.
If you also needed high availability, that was easy. You typically built a second monster machine and configured an Active/Passive setup, or hot standby,
How the cloud changes storage
The first thing about using storage in the cloud is that there are zero setup costs. You can just as easily create a SQL Azure database or 7. Create multiple storage accounts, tables etc. You pay for what you consume. Setup is free. This means you have less reason to put all your data in one place. You can start thinking about which data source is right for what data. It is not unusual to have multiple data sources used.
The second big thing is that we don’t have that big huge server with lots of cores and ram and fancy stuff. At least not tuned to the extent you would for your workload. Instead we have “commodity hardware”. To achieve “scale” you have to think about using lots of little storage things, rather than a single huge one. This is the exact same thing you do with compute.
If you have 50GB of data, you may be better served by 50 x 1GB databases than 1 x 50GB. The cost is comparable too. Except you now have 50 of those servers running your database. Guess which model will scale better?
There is a tax in thinking about, designing and implementing this – however if done right, you can scale out as far as your compute (and credit card) will let you.
This might have been a rant. Apologies.
THIS POSTING IS PROVIDED “AS IS” WITH NO WARRANTIES, AND CONFERS NO RIGHTS