Editorials

Sharding for Scale, Cost and Performance

Sharding data is a good way to increase performance for persistence and containing the cost of your software, depending on the engine you select. Sharding data is accomplished by using an algorithm to separate data, placing it on different storage instances for the purposes of balance, redundancy, increased capacity or any combination of the above.

You can shard data using a Sql Relational engines or many NoSql engines. Some engines require you to write or purchase your own sharding capability directing data to the correct storage instance(s). Many NoSql engines have sharding built into the engine itself.

MySql is one of the popular relational engines to shard. It doesn’t have huge performance capabilities in a single instance with large scale data. But, when you add sharding to the architecture, many smaller instances of MySql can outperform some of the largest instances of other engines on a single massive platform.

There is nothing keeping you from using a similar architecture on commercial engines such as Oracle or SQL Server. The difference would be the cost involved. So, it is common for implementations to utilize MySql as a low cost engine on a number of commodity servers, the combination of which can perform as well or better than an expensive implementation on a single massive server.

Are you sharding your data today? Is it sharded in NoSql or relational data engines. Is sharding accomplishing the goals for which you chose this architecture? Share your thoughts or experiences with sharding here or by Email to btaylor@sswug.org.

Cheers,

Ben