What is Sharding?

Put simply, sharding is breaking a single database into smaller, more manageable chunks, and distributing those chunks across multiple servers, in order to spread the load and maintain a high throughput.

There are two main ways of cutting up a database — vertical partitioning and horizontal partitioning.

Vertical partitioning stores different tables from the same database in different instances, and each table is a distinct shard. A simple example would be keeping a table of North American transactions on one instance and a table of European transactions on another.

Horizontal partitioning splits a database table into separate sets of rows and stores each set in a different database instance. Each set of rows is a shard. A simple example would be “Server 1” taking rows 1 to 10,000, “Server 2” taking rows 10,001 to 20,000, and so on.
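The row-range example can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the function name shard_for_row and the fixed shard size of 10,000 rows are hypothetical, chosen to match the example, and real systems often route by a hash of a key rather than by contiguous row ranges.

```python
ROWS_PER_SHARD = 10_000  # assumed shard size, taken from the example in the text


def shard_for_row(row_id: int, rows_per_shard: int = ROWS_PER_SHARD) -> int:
    """Return the 1-based server number that holds a 1-based row id."""
    if row_id < 1:
        raise ValueError("row ids start at 1")
    # Rows 1..10,000 map to server 1, rows 10,001..20,000 to server 2, and so on.
    return (row_id - 1) // rows_per_shard + 1


print(shard_for_row(1))       # 1
print(shard_for_row(10_000))  # 1
print(shard_for_row(10_001))  # 2
```

Any client that knows the rule can work out which server to ask without consulting the other servers.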

Horizontal partitioning has two advantages over vertical partitioning:

  1. You do not need to plan how to split your data up in advance;
  2. Your database is much less “lumpy”: you can dynamically scale how much of the whole database any one server needs to service.

Why is Sharding Important?

Sharding matters because it spreads the work of a single host across many. For example, if we split a single table into two partitions, each host needs to handle only about half of the operations of the original host. As the number of operation requests increases, the database can be split into more and more parts.
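As a rough back-of-the-envelope sketch of this arithmetic, assuming operations are spread evenly across partitions (which real workloads rarely are), the per-host load and the number of partitions needed can be estimated as follows. The function names and the capacity figures are hypothetical.

```python
def ops_per_host(total_ops_per_sec: int, partitions: int) -> float:
    """Average load per host, assuming operations are spread evenly."""
    if partitions < 1:
        raise ValueError("need at least one partition")
    return total_ops_per_sec / partitions


def partitions_needed(total_ops_per_sec: int, host_capacity: int) -> int:
    """Smallest number of partitions that keeps every host within capacity."""
    if host_capacity < 1:
        raise ValueError("host capacity must be positive")
    # Ceiling division without floating point; always at least one partition.
    return max(1, -(-total_ops_per_sec // host_capacity))


print(ops_per_host(10_000, 2))           # 5000.0
print(partitions_needed(10_000, 6_000))  # 2
print(partitions_needed(30_000, 6_000))  # 5
```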

Sharding is a well-established scaling solution for systems with large data sets and/or high-throughput operations.

What is Distributed Ledger Technology (DLT)?

DLT networks are peer-to-peer, meaning there is neither a centralized administrator function nor a centralized data store. Public blockchains such as Bitcoin and Ethereum are, perhaps, the best-known examples of DLTs.

DLTs require independent machines, or nodes, to record, synchronize and share transactions in their respective ledgers.

Distributed ledgers can be permissioned or permissionless depending on whether or not nodes require permission to change the ledger. Ledgers are public or private depending on whether anyone (as opposed to just nodes in the network) can access the ledger.

What are DLTs used for?

  • digital currencies;
  • digitally unique objects;
  • digital objects with intrinsic value (like tickets).

We can extrapolate many related use-cases from these, including cryptocurrency, notary services, prediction markets, smart contracts, audit trails, regulatory reporting, streamlining clearing and settlement, digital identity, and so forth. And that’s just the beginning.

Basically, distributed ledgers can be used in some form wherever securing digital relationships via a system of record is required, and control by a central authority is not desired. In other words, DLTs are useful where there is a need for a trustless, peer-to-peer network for the exchange of incentives and value.

How does sharding a DLT introduce the potential for fraud?

If a transaction that should happen only once is recorded across an entire network as happening more than once, then you run into a situation known as a double spend.

Consider this: by splitting up a DLT into multiple shards, it is possible for a bad actor to take over a small part of the network with fewer computing resources. For example, if you divide a DLT into two shards, with half of your nodes working on one shard and half on another, an attacker now needs only a little over 25% of the hash power of the whole network to control a majority of one of the two shards. With enough nodes reflecting a false spend, a double spend is now possible.
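The 25% figure generalizes to any number of shards. The sketch below assumes that hash power is split evenly between shards, that an attacker can point all of their power at a single shard, and that controlling more than half of a shard's power is enough to attack it. The function name is hypothetical, and the result is a threshold the attacker must exceed rather than an exact cost.

```python
def network_share_to_attack_one_shard(num_shards: int, majority: float = 0.5) -> float:
    """Fraction of total network hash power an attacker must exceed to
    hold a majority of one shard, assuming an even split across shards."""
    if num_shards < 1:
        raise ValueError("need at least one shard")
    # Each shard holds 1/num_shards of the network; the attacker needs
    # more than `majority` of that slice.
    return majority / num_shards


for shards in (1, 2, 4, 10):
    share = network_share_to_attack_one_shard(shards)
    print(f"{shards} shard(s): more than {share:.1%} of the network")
```

Under these assumptions the threshold shrinks in direct proportion to the number of shards, which is why naive sharding weakens a proof-of-work style network.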

When a double spend is possible on a DLT, it allows a form of fraud. In late May of 2018, a double spend attack was launched against the Bitcoin Gold network: in this attack, a bad actor was able to acquire over 50% of the network’s total hashpower. By doing so, they were able to modify and omit transactions of their own coins from blocks. Subsequently, they could reverse transactions that were previously confirmed.

Understanding how a double spend works reminds us how sharding a DLT introduces this potential for fraud: ledgers are a transaction record, and they exist to make sure that each spend happens only once. In a distributed ledger, the shards that form the ledger are not all in the same place, and thus are not going to be immediately consistent.
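A toy model makes the consistency problem concrete. In the sketch below, each shard only remembers the spends it has seen itself, so the same coin can be accepted once by each shard before they have had a chance to synchronize. The Shard class, the try_spend method and the coin identifier are all hypothetical; this is not how any particular DLT is implemented.

```python
class Shard:
    """A toy shard that tracks only the spends it has seen locally."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.spent: set[str] = set()

    def try_spend(self, coin_id: str) -> bool:
        """Accept the spend unless this shard has already recorded it."""
        if coin_id in self.spent:
            return False
        self.spent.add(coin_id)
        return True


shard_a = Shard("A")
shard_b = Shard("B")

first = shard_a.try_spend("coin-42")   # accepted: shard A has not seen this coin
second = shard_b.try_spend("coin-42")  # also accepted: shard B never heard of the first spend
third = shard_a.try_spend("coin-42")   # rejected: shard A already recorded it

print(first, second, third)  # True True False
```

Each shard behaves correctly on its own, yet the coin has been spent twice across the network. Preventing that requires some way for a spend to be checked against the whole ledger, not just the local shard.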

Conclusion

Put more simply, you need to make sure a spend has happened only once on the entire network, and having a lot of shards makes this difficult.

In the next article, we’ll review how a 51% attack presents specific problems when sharding a blockchain.

Stay tuned for more on these specific problems, especially sharding, some attack tools and vectors, and how they are solved in Radix DLT.

Originally published at www.radixdlt.com.

Have questions about Radix? Join The Conversation …

1) Telegram: https://t.me/radixdlt
2) Discord: https://discord.gg/fS6F9NP