Data Availability Sampling
Last updated
As the validator set and block size grow, it becomes inefficient for every node to download entire blocks to check data availability. Nubit therefore integrates Data Availability Sampling (DAS), which lets the network scale using full storage nodes and light clients.
Before turning to DAS itself, Nubit must address three challenges: erasure coding, encoding correctness, and block dispersion.
To conclude that a block's data is fully available by sampling only some of its pieces (i.e., chunks), the block must first be encoded with an erasure code. Otherwise, withholding even a single chunk would destroy the block's integrity, and random sampling would be unlikely to notice the gap.
Erasure coding adds redundant data so that the original data can be recovered even if part of it is lost, which makes it valuable for both storing and transmitting data. A widely used erasure code is Reed-Solomon (RS) coding. With RS coding, each network participant needs to store only a small section of the block, called a chunk, while the data as a whole remains recoverable. Concretely, the leader divides the block's data into a grid of chunks, and this grid is then extended through 2D-RS coding into a larger grid of coded chunks.
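The grid extension described above can be sketched in Python. This is a minimal toy under stated assumptions, not Nubit's implementation: the field modulus `P = 65537`, the evaluation points x = 0, 1, 2, ..., and the doubling of each dimension (k × k to 2k × 2k) are all assumptions chosen for illustration, and each chunk is reduced to a single field element. Each row is treated as evaluations of a polynomial of degree below k and extended by Lagrange interpolation; the columns of the row-extended grid are then extended the same way.

```python
# Toy 2D Reed-Solomon extension over a small prime field (illustrative only).
P = 65537  # hypothetical field modulus; real systems use a much larger field


def lagrange_eval(xs, ys, x, p=P):
    """Evaluate at x the unique polynomial of degree < len(xs) through (xs, ys), mod p."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x - xm) % p
                den = den * (xj - xm) % p
        total = (total + yj * num * pow(den, -1, p)) % p
    return total


def extend(values, n, p=P):
    """Treat values as evaluations at x = 0..k-1; evaluate the same polynomial at x = 0..n-1."""
    xs = list(range(len(values)))
    return [lagrange_eval(xs, values, x, p) for x in range(n)]


def extend_2d(grid, p=P):
    """Extend a k x k grid to 2k x 2k: RS-extend every row, then every column."""
    k = len(grid)
    rows = [extend(row, 2 * k, p) for row in grid]            # k rows, 2k columns
    cols = [extend([rows[r][c] for r in range(k)], 2 * k, p)  # 2k columns, 2k entries each
            for c in range(2 * k)]
    return [[cols[c][r] for c in range(2 * k)] for r in range(2 * k)]


grid = [[1, 2], [3, 4]]
ext = extend_2d(grid)
for row in ext:
    print(row)
# [1, 2, 3, 4]
# [3, 4, 5, 6]
# [5, 6, 7, 8]
# [7, 8, 9, 10]
```

Because RS coding is linear, every row and every column of the extended grid, including the newly added rows, is itself an RS codeword. That property is what allows missing chunks to be rebuilt from a large enough subset of any row or column.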
To guarantee that the encoding is correct, a commitment to the coded data must be generated and included in the block header.
Following the erasure-coding step, the leader creates a KZG commitment to the chunk data together with a proof for each chunk. This prevents chunks from being modified in transit: the commitment makes it computationally infeasible to produce a valid proof for an altered chunk. The KZG commitment is included in the block header, also known as the Data Attestation. As a result, each chunk can be verified directly against the header, leaving no room for incorrect encoding.
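For reference, the standard KZG scheme works as follows, where $[a]_1$ and $[a]_2$ denote $a$ times the generators $G_1$ and $G_2$ of the two pairing groups, $e$ is the pairing, and $\tau$ is the secret from the trusted setup. Treating chunk $i$ as an evaluation $\phi(z_i)$ of the committed polynomial $\phi$ is an assumption made here for illustration; the exact mapping Nubit uses is not described in this section.

$$
\begin{aligned}
&\text{Setup: } \big([1]_1, [\tau]_1, \dots, [\tau^{d}]_1\big),\ \big([1]_2, [\tau]_2\big) \\
&\text{Commit: } C = [\phi(\tau)]_1 \\
&\text{Open at } z:\quad y = \phi(z),\quad q(X) = \frac{\phi(X) - y}{X - z},\quad \pi = [q(\tau)]_1 \\
&\text{Verify: } e\big(C - [y]_1,\ [1]_2\big) \stackrel{?}{=} e\big(\pi,\ [\tau]_2 - [z]_2\big)
\end{aligned}
$$

An honest proof passes because both sides equal $e(G_1, G_2)^{\phi(\tau) - y}$, using $q(\tau)(\tau - z) = \phi(\tau) - y$. Producing a proof that passes for some $y' \neq \phi(z)$ would break the binding property of the commitment, which is why an altered chunk cannot be paired with a valid proof.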
For a deeper explanation of KZG commitments, refer to the KZG commitment page.
Blocks can be large, and sending every block in full to every node would incur heavy communication costs. To mitigate this, each block is dispersed as follows:
The process starts with the leader distributing the block data so that every node in the network can access it. Having every validator hold a complete copy of the block would be too costly, so the leader instead broadcasts the coded chunks described above. These chunks are divided among different groups of validators. Each group stores and serves specific chunks, which its members receive by subscribing to the corresponding topics. Reputation scores and slashing penalties keep enough active validators in every group for communication to remain stable and reliable.
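A minimal sketch of how chunks might be split across validator groups. The round-robin policy (chunk i goes to group i mod G), the topic naming scheme `das/chunks/group-{id}`, and all helper names are hypothetical; Nubit's actual assignment rule and topic names are not given here, and reputation scores and slashing are not modeled beyond flagging groups that are understaffed.

```python
from collections import defaultdict


def assign_chunks(num_chunks, num_groups):
    """Hypothetical policy: chunk i is stored and served by group i mod num_groups."""
    groups = defaultdict(list)
    for i in range(num_chunks):
        groups[i % num_groups].append(i)
    return dict(groups)


def topic_for(group_id):
    """Hypothetical pub/sub topic a validator subscribes to in order to receive its group's chunks."""
    return f"das/chunks/group-{group_id}"


def assign_validators(validators, num_groups):
    """Spread validators round-robin over the groups (hypothetical)."""
    members = defaultdict(list)
    for n, v in enumerate(sorted(validators)):
        members[n % num_groups].append(v)
    return dict(members)


def understaffed(members, num_groups, min_size):
    """Groups with fewer than min_size validators; these would need more members."""
    return [g for g in range(num_groups) if len(members.get(g, [])) < min_size]


chunk_groups = assign_chunks(16, 4)
print(chunk_groups[1], topic_for(1))  # [1, 5, 9, 13] das/chunks/group-1
members = assign_validators([f"v{i}" for i in range(10)], 4)
print(understaffed(members, 4, 3))    # [2, 3]
```

Round-robin keeps group loads within one chunk of each other; a real deployment would also need to rebalance membership as validators join, leave, or are slashed.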
With these three challenges addressed, two protocols support data availability sampling: the sampling protocol and the decoding protocol.
The sampling protocol runs between a verifier and a data source (the validators or a full storage node). Starting from the KZG commitment in the block header, the verifier requests a sufficient number of randomly selected chunks from the source. If every requested chunk is received and verifies against the KZG commitment, the check succeeds.
The decoding protocol is carried out by a decoder working with the validator group. It also starts from the KZG commitment and requests chunks from the validators. Once it has collected more than a certain fraction of all chunks, each verified against the commitment, the decoder reconstructs the full block through RS decoding.
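The sketch below illustrates the sampling protocol and why a modest number of samples suffices. If a fraction f of the chunks is withheld, s independent uniform samples all miss the gap with probability (1 - f)^s; drawing distinct indices, as the code does, only lowers that probability. As an assumed example, in a 2D-RS grid extended from k × k to 2k × 2k, making the block unrecoverable requires withholding at least (k + 1)^2 of the 4k^2 chunks, roughly a quarter, so f = 0.25 is a natural illustration. The `fetch` and `verify` callbacks are hypothetical, and the SHA-256 check in the demo is only a stand-in for verifying a KZG proof against the header commitment.

```python
import hashlib
import math
import random


def samples_needed(withheld_fraction, confidence):
    """Smallest s with (1 - f)^s <= 1 - confidence: a withholding attack on a
    fraction f of the chunks is then detected with probability >= confidence."""
    return math.ceil(math.log(1 - confidence) / math.log(1 - withheld_fraction))


def sample(num_chunks, num_samples, fetch, verify, rng=random):
    """Request distinct random chunk indices; succeed only if every chunk arrives and verifies."""
    for idx in rng.sample(range(num_chunks), min(num_samples, num_chunks)):
        chunk = fetch(idx)
        if chunk is None or not verify(idx, chunk):
            return False
    return True


# Demo with a hash-based stand-in for KZG verification (hypothetical).
chunks = [bytes([i]) * 32 for i in range(64)]
commitments = [hashlib.sha256(c).digest() for c in chunks]


def verify(idx, chunk):
    return hashlib.sha256(chunk).digest() == commitments[idx]


available = dict(enumerate(chunks))
print(samples_needed(0.25, 0.99))                                # 17
print(sample(64, 17, available.get, verify, random.Random(1)))   # True
print(sample(64, 17, lambda i: None, verify, random.Random(1)))  # False
```

The required sample count depends only on f and the target confidence, not on block size, which is what lets light clients check large blocks cheaply.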
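RS decoding can be sketched in one dimension: with k data values extended to n = 2k coded values, any k verified chunks determine the polynomial of degree below k, so in this toy setting collecting half of the chunks suffices. The field modulus, the evaluation points, and the 2x expansion are assumptions; the actual fraction Nubit's decoder waits for is not stated above, and a 2D grid would be decoded by repeatedly applying this step to its rows and columns.

```python
P = 65537  # hypothetical toy field modulus


def lagrange_eval(xs, ys, x, p=P):
    """Evaluate at x the polynomial of degree < len(xs) passing through (xs, ys), mod p."""
    total = 0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        num, den = 1, 1
        for m, xm in enumerate(xs):
            if m != j:
                num = num * (x - xm) % p
                den = den * (xj - xm) % p
        total = (total + yj * num * pow(den, -1, p)) % p
    return total


def encode(data, n, p=P):
    """Systematic RS encoding: data are evaluations at x = 0..k-1; return evaluations at 0..n-1."""
    xs = list(range(len(data)))
    return [lagrange_eval(xs, data, x, p) for x in range(n)]


def decode(received, k, p=P):
    """Rebuild the k original values from any k verified chunks given as {index: value}."""
    if len(received) < k:
        raise ValueError(f"need at least {k} chunks, got {len(received)}")
    xs = sorted(received)[:k]
    ys = [received[x] for x in xs]
    return [lagrange_eval(xs, ys, x, p) for x in range(k)]


data = [5, 1, 7, 3]                              # k = 4 original chunks
coded = encode(data, 8)                          # n = 2k = 8 coded chunks
received = {i: coded[i] for i in (1, 4, 6, 7)}   # any 4 of the 8
print(decode(received, 4))                       # [5, 1, 7, 3]
```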
Three types of nodes participate in DAS:
Validator: Validators are responsible for running the sampling protocol within the validator set. A validator will sign the block header only if the output of the sampling protocol is successful, thereby ensuring the data’s availability within the validator group.
Full Storage Node: Once a block is finalized, full storage nodes obtain the entire block by running the decoding protocol with the validator set. These nodes then respond to chunk requests from light clients.
Light Client: Light clients obtain the block header from a validator and engage in the sampling protocol with a full storage node. If the protocol is successful after sufficient sampling, the block is considered available, and the full storage node’s reputation is enhanced.
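How the three roles use these protocols can be sketched as follows. The `FullStorageNode` class, the `+1` reputation reward, and the SHA-256 stand-in for KZG verification are all hypothetical simplifications, not Nubit's API; the point is only the control flow: a validator signs only after sampling succeeds, and a light client credits the full storage node that served it.

```python
import hashlib
import random
from dataclasses import dataclass


@dataclass
class FullStorageNode:
    """Hypothetical full storage node holding chunks rebuilt via the decoding protocol."""
    chunks: dict         # chunk index -> bytes
    reputation: int = 0  # hypothetical integer score

    def serve(self, idx):
        return self.chunks.get(idx)


def sampling_ok(commitments, fetch, num_samples, rng):
    """Sampling protocol; a SHA-256 comparison stands in for KZG proof verification."""
    n = len(commitments)
    for idx in rng.sample(range(n), min(num_samples, n)):
        chunk = fetch(idx)
        if chunk is None or hashlib.sha256(chunk).digest() != commitments[idx]:
            return False
    return True


def validator_signs(commitments, fetch, num_samples, rng):
    """A validator signs the block header only if sampling within the validator set succeeds."""
    return sampling_ok(commitments, fetch, num_samples, rng)


def light_client_check(commitments, node, num_samples, rng):
    """A light client samples a full storage node and credits it on success."""
    ok = sampling_ok(commitments, node.serve, num_samples, rng)
    if ok:
        node.reputation += 1  # hypothetical reward rule
    return ok


chunks = [bytes([i]) * 32 for i in range(16)]
commitments = [hashlib.sha256(c).digest() for c in chunks]
honest = FullStorageNode(dict(enumerate(chunks)))
empty = FullStorageNode({})
print(light_client_check(commitments, honest, 8, random.Random(0)), honest.reputation)  # True 1
print(light_client_check(commitments, empty, 8, random.Random(0)), empty.reputation)    # False 0
```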
In the next three sections, we will summarize the functionalities of Validators, Full Storage Nodes, and Light Clients.