Scalability
From Epowiki
Scalability is the ability to keep solving a problem as the size of the problem increases.
Scale is measured relative to your requirements. As long as you can scale enough to solve your problem then you have scale. If you can handle the number of objects and events required for your application then you can scale. It doesn't really matter what the numbers are.
Scaling often creates a difference in kind for potential solutions. The solution you need to handle a small problem is not the same as the one you need to handle a large problem. If you try to evolve one into the other incrementally you can be in for a rude surprise, because the design stops working as you pass through different points of discontinuity.
Scale is not language or framework specific. It is a matter of approach and design.
The Two Classes of How to Handle Scalability
I've come to think there are two classes of scalability problems:
- Scalability under fixed resources.
- Scalability under expandable resources.
The two different classes lead to solutions using completely different approaches.
Scalability Under Fixed Resources
In this class of scalability problem you have a fixed set of resources, yet you have to deal with ever-increasing loads.
For example, if you are an embedded system like a router or switch, you are not likely ever to get more CPU, more RAM, more disk, or a faster network. Yet you will be asked to handle:
- more and more functionality in new upgrade images
- more and more load from clients
The techniques for dealing with load in this scenario are far different from the techniques you use when you can expand your resources.
Scalability Under Expandable Resources
In this class of scalability problems you have the ability to add more resources to handle more work. In general this is called horizontal scaling.
The new era of cheap yet powerful computers has made horizontal scaling possible for virtually anyone. Many companies can afford to keep a grid of hundreds of machines to solve problems.
This is the approach Google has taken to handle their search systems, for example, and it's very different from a fixed resource approach. In a fixed resource approach we would be squeezing every cycle of performance out of the resources, and we would be spending a lot of time developing new approaches and tuning existing code to fit the exact problem.
When resources are available, and your approach is right, you can just add more machines. You start to figure out ways to solve your problem assuming horizontal scaling.
In general this area is called data parallel algorithms.
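As a rough illustration of the data parallel idea, here is a minimal Python sketch. It assumes the work can be split into chunks that need nothing from each other, and it uses threads in one process as a stand-in for separate machines. The function names (split_into_chunks, count_errors, parallel_count) are made up for this example.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, n_chunks):
    """Split items into n_chunks independent slices (round-robin)."""
    return [items[i::n_chunks] for i in range(n_chunks)]


def count_errors(chunk):
    """Work done on one chunk; it needs no data from any other chunk."""
    return sum(1 for line in chunk if "ERROR" in line)


def parallel_count(lines, n_workers=4):
    """Count matching lines by processing each chunk independently."""
    chunks = split_into_chunks(lines, n_workers)
    # Threads stand in for machines here (an assumption of this sketch).
    with ThreadPoolExecutor(max_workers=n_workers) as pool:
        partial_counts = list(pool.map(count_errors, chunks))
    # The only shared step is a cheap combine of the partial results.
    return sum(partial_counts)
```

Because each chunk is handled on its own, adding workers means handing out more chunks. The combine step at the end is the only place where results come together, so it should stay cheap.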
For example, Terrascale (http://terrascale.net/) has an amazing storage grid called Terragrid that allows you to scale up by incrementally adding commodity machines. With the availability of 10Gb Ethernet interfaces these approaches become quite powerful.
Of course, your approach has to be right. If you select an architecture with single points of serialization then you won't be able to scale by adding more machines.
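One common way to put a number on the cost of a serialization point is Amdahl's law, which is not named in the text above and is brought in here only as an illustration. The sketch below assumes a fixed fraction of the work must run serially. The function name max_speedup is made up for this example.

```python
def max_speedup(serial_fraction, machines):
    """Amdahl's law: best-case speedup when part of the work is serial.

    serial_fraction is the share of the work that cannot be spread
    across machines (0.0 means fully parallel, 1.0 means fully serial).
    """
    if not 0.0 <= serial_fraction <= 1.0:
        raise ValueError("serial_fraction must be between 0 and 1")
    if machines < 1:
        raise ValueError("machines must be at least 1")
    parallel_fraction = 1.0 - serial_fraction
    return 1.0 / (serial_fraction + parallel_fraction / machines)
```

With a serial fraction of 0.1, the speedup can never reach 10x no matter how many machines you add, which is why a single point of serialization defeats horizontal scaling.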
Examples of Dealing with Scale
Here are a few examples of how different people have dealt with scale.
Real Life Web Site Architectures
- LiveJournal's Backend 2007, video, 2005
- Mixi.jp Scaling Out with Open Source
- Wikimedia Architecture
- The eBay Architecture
- MySpace Architecture
- Wikipedia Architecture
- Root Cluster Architecture for Bioinformatics
- The Architecture of Mailinator
- Friendster's Architecture
- Feedburner Architecture
- Google papers on their file system and cluster
- Travelocity's Multi-Terabyte Data Warehouse and MySQL
- Early Amazon: Splitting the website, Early Amazon: The end
- Lessons from an Interactive Environment
- Scalable Distributed Web Sessions With MySQL
- UIE Podcast: Christian Rohrer - eBay's Transactions on a Massive Scale
- Zimbra Collaboration Suite Architectural Overview, benchmark
Databases
- C-Store: A Column-oriented DBMS - is column-oriented (values for a column are stored contiguously) instead of row-oriented like most databases. It is optimized for reads. It is designed for sparse table structures and compresses data.
- Bigtable: A Distributed Storage System for Structured Data - Bigtable is a distributed storage system for managing structured data that is designed to scale to a very large size: petabytes of data across thousands of commodity servers. Many projects at Google store data in Bigtable, including web indexing, Google Earth, and Google Finance.
- Distributed Caching Essential Lessons - by Cameron Purdy from Tangosol
- Avoiding Two Phase Commit
- MySQL Scaling and High Availability Architectures
- The Perfect DB Storage Array
- MySql Cluster on Xeon Using Dolphin Express
- Use highly partitioned databases instead of large read caches, as LiveJournal does. Have each database server serve its data out of memory rather than from disk.
- Google Video on Designing Large Systems
- MonetDB/X100: Hyper-Pipelining Query Execution
- Cache-Conscious Radix-Decluster Projections
- How do you support 500,000 users with Drupal?
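The "highly partitioned databases" item in the list above can be sketched as a simple routing function. This is a minimal illustration that assumes hash-based partitioning by user id. It is not a description of how LiveJournal or any other site listed here actually routes requests, and the name shard_for is made up for this example.

```python
import hashlib


def shard_for(user_id, n_shards):
    """Map a user id to one of n_shards database partitions.

    A stable hash (not Python's built-in hash(), which can differ
    between processes) keeps the mapping the same on every web server.
    """
    if n_shards < 1:
        raise ValueError("n_shards must be at least 1")
    digest = hashlib.md5(str(user_id).encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % n_shards
```

Every web server can compute the shard locally, so there is no central lookup to become a bottleneck. The trade-off of plain modulo hashing is that changing n_shards moves most users to a different shard, so real systems often use a directory table or consistent hashing instead.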
Networks
- InfiniBand and 10-Gigabit Ethernet for I/O in Cluster Computing
- TCP Offloading Engine
- Trouble Shooting Networks
File System
- Isilon for Large Scale Storage Virtualization
- Lustre - Lustre is a scalable, secure, robust, highly-available cluster file system. It is designed, developed and maintained by Cluster File Systems, Inc.
- GFS - The open source cluster file system for enterprise deployments
- OpenAFS - AFS is a distributed filesystem product, pioneered at Carnegie Mellon University
- MogileFS - MogileFS is our open source distributed filesystem (used by Live Journal)
- Terragrid
- Better Characterization of Disk Activity on Linux
- Iozone - filesystem benchmark tool.
- GPFS - IBM General Parallel File System
- Star Fish
One thing to think about when building such a system from a large number of hard disks is that disks will fail, all the time. The argument is fairly convincing:
Suppose each disk has an MTBF (mean time between failures) of 500,000 hours. That means the average disk is expected to fail about once every 57 years. Sounds good, right? Now suppose you have 1000 disks. How long before the first one fails? Chances are, not 57 years. If you assume that failures are spread evenly across time, a 1000-disk system will have a failure every 500 hours, or about every 3 weeks!
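The arithmetic above is easy to redo for your own disk count. The sketch below makes the same simplifying assumption as the text, that failures are spread evenly over time. The function names are made up for this example.

```python
HOURS_PER_YEAR = 24 * 365


def hours_between_failures(mtbf_hours, n_disks):
    """Expected hours between failures somewhere in a pool of disks."""
    if n_disks < 1:
        raise ValueError("n_disks must be at least 1")
    return mtbf_hours / n_disks


def hours_to_years(hours):
    """Convert hours to years, ignoring leap days."""
    return hours / HOURS_PER_YEAR


def hours_to_days(hours):
    """Convert hours to days."""
    return hours / 24
```

For example, hours_to_years(500_000) gives a bit over 57 years for a single disk, while hours_to_days(hours_between_failures(500_000, 1000)) gives just under 21 days for the 1000-disk system.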
- WHAT is going to be done (database, file storage?)
- HOW will it be accessed? (One large file, many smaller files)
- WHEN will it be accessed? (During business hours, distributed over the day?)
- AVERAGE TRANSFERS - will the whole schmear come over, selected parts?
- SECURITY a concern? (Sensitive data, protected network)
- BACKUP - a petabyte of tape storage is expensive, and takes quite a while to do.
- POWER - do you have enough?
- COOLING - ditto
- SPACE - ditto
Storage
- Large Scale Data Repository: Petabox
- Ready NAS
- Ibrix - Scalable File Serving Solutions
- 3par
- SATA Beast
- HW RAID vs. ZFS software RAID, more
Petabox claims 40 watts per terabyte. That is pretty low. If you try to come up with your own solution from off-the-shelf parts, it will be hard to match that. If you can't pay for 40 watts per terabyte for a petabyte, maybe you should reconsider whether you need the petabyte for now.
- Let's say $0.07 per kWh,
- Then a 50 kW load would be:
- 50*24*31*$0.07 = $2,604/month
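The same power calculation as a small Python sketch, so you can plug in your own numbers. It assumes a constant load around the clock and a flat electricity price. The function names are made up for this example.

```python
def load_kw(watts_per_terabyte, terabytes):
    """Total electrical load in kW for a store of the given size."""
    return watts_per_terabyte * terabytes / 1000


def monthly_power_cost(kw, price_per_kwh, days=31):
    """Cost of running a constant load of kw kilowatts for one month."""
    return kw * 24 * days * price_per_kwh
```

monthly_power_cost(50, 0.07) reproduces the roughly $2,604 per month figure above. A petabyte at the claimed 40 watts per terabyte is load_kw(40, 1000), or 40 kW, before cooling and other overhead.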
Misc
- Designing Of A Large Scale Streaming Event System
- On the Performance and Use of Dense Servers
- Unorthodox approach to database design - this is Flickr's architecture
- You Scaled Your What? - talks about the dimensions of scalability: transactional, data, operational, deployability, productivity, and feature time to market (TTM). These are testable metrics.
Monitoring
- Munin - Munin the monitoring tool surveys all your computers and remembers what it saw. It presents all the information in graphs through a web interface.
Grid
Platforms
- Apache, Perl
- TurboGears, Python
- Java, *
- Windows, .net
