Editorials

Sharding With a Purpose

If you have looked into sharding, you know there are a number of techniques you can use. You can shard data of the same type across multiple servers. You can replicate data, duplicating the same records across multiple servers. You can place data of different types on different servers. Or you can use any combination of these techniques.
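As a rough illustration of the first technique, sharding data of the same type across servers, here is a minimal sketch in Python. The server names and the `shard_for` helper are hypothetical, and a stable hash of the key is only one of several ways to choose a shard.

```python
import hashlib

# Hypothetical server names, used only for illustration.
SHARDS = ["server-a", "server-b", "server-c"]


def shard_for(key: str, shards=SHARDS) -> str:
    """Pick a shard for a record key using a stable hash.

    hashlib is used instead of the built-in hash() because hash() of a
    string can differ between Python processes.
    """
    digest = hashlib.sha256(key.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[index]


if __name__ == "__main__":
    for customer_id in ("customer-1", "customer-2", "customer-3"):
        print(customer_id, "->", shard_for(customer_id))
```

A simple modulo scheme like this moves most keys when the number of shards changes, so treat it as a starting point rather than a recommendation.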

How you choose to shard your data depends on how the data is used. For example, social networks tend to put data from friends on the same server where possible. That way, when an update occurs that is shared with others, the data can be pulled from a single server instead of from multiple machines.

For sales engines, you might see customers and their orders sharded across servers by region, distribution center, country, or some other factor chosen mainly to balance the load.
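Region-based placement for a sales engine might be sketched like this in Python. The region codes, shard names, default shard, and record layout are all assumptions made for the example. The point is only that an order follows its customer, so both land on the same server.

```python
# Hypothetical mapping of sales regions to shards.
REGION_TO_SHARD = {
    "EU": "shard-eu",
    "US": "shard-us",
    "APAC": "shard-apac",
}
DEFAULT_SHARD = "shard-us"  # assumed fallback for regions not listed above


def shard_for_customer(customer: dict) -> str:
    """Route a customer record by its region."""
    return REGION_TO_SHARD.get(customer["region"], DEFAULT_SHARD)


def shard_for_order(order: dict, customers_by_id: dict) -> str:
    """Route an order to the shard that holds its customer."""
    customer = customers_by_id[order["customer_id"]]
    return shard_for_customer(customer)


if __name__ == "__main__":
    customers = {1: {"id": 1, "region": "EU"}}
    order = {"id": 100, "customer_id": 1}
    print(shard_for_order(order, customers))
```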

The goal of sharding is to separate data for performance and scale, and to reduce the number of servers you must access to complete a unit of work. For this reason, some data may be duplicated across multiple servers, making it available to different shards.
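To make the duplication trade-off concrete, here is a small Python sketch that lists which records a unit of work would have to fetch from another server. The key names, shard names, and the `remote_lookups` helper are hypothetical. `placement` maps each record key to the set of shards holding a copy.

```python
def remote_lookups(home_shard: str, needed_keys, placement: dict) -> list:
    """Return the keys that are not available on the home shard."""
    return sorted(k for k in needed_keys if home_shard not in placement[k])


if __name__ == "__main__":
    needed = ["order:1", "product:9"]

    # Without duplication, the product record lives only on another shard.
    no_copies = {"order:1": {"shard-1"}, "product:9": {"shard-2"}}
    print(remote_lookups("shard-1", needed, no_copies))

    # With the product record duplicated, everything is local to shard-1.
    with_copies = {"order:1": {"shard-1"}, "product:9": {"shard-1", "shard-2"}}
    print(remote_lookups("shard-1", needed, with_copies))
```

The cost of the second layout is that every copy of the duplicated record has to be kept up to date.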

Are you sharding for performance, scalability, or cost? Share your experience here online or by email to btaylor@sswug.org.

Cheers,

Ben