I recently came across the routing algorithm of Distributed Hash Table, a distributed data structure in which the access and the storage of data is collectively supported by members of a virtual community. A member who respects Distributed Hash Table design is responsible for reducing the cost of searching and maintaining the availability of values. Individual characteristics of each member build up the efficiency of the whole system.
Within a distributed network, each member has limited knowledge of his surroundings. Due to the fact that the distribution of useful resources is not deterministic, a peer without sufficient knowledge has to search desperately in a gigantic space. If I only know of one peer, who knows stuff but is unwilling to help, brute-force scanning the network is the only way to discover more data, which requires a lot of energy and time.
Knowledge by Sharing
Sharing knowledge is the key to know more and search less.
Fortunately, I found peer C who does not know stuff but is willing to help. He gave me a list of n = 10 people who may or may not have what I want. Okay, I will ask them. The time complexity of finding the right person who has what I want is O(n).
I asked peer D and he told me that peers E, F, G are most likely to have what I want, while the other people in my list are unlikely to help. At this point, peer D helped me reduce my search space by more than one half. Hopefully, everyone will guide me towards the right direction. The time complexity of finding the right person now becomes O(log(n)).
Interest and Responsibility
How does peer D know who are the most helpful people? It must be the case that either peer D used to retrieve the same files from peers E, F, G, or peers E, F, G are supposed to be responsible for saving the files I want.
Furthermore, I can verify if peer D told me the truth by asking peers E, F, G for the file. If any of them has the file I want, or brings me closer to my destination, I can be confident that peer D indeed helped me, and that peers E, F, G are also trustworthy. If none of them has the file I want, and most of them point me to a farther location, I will be skeptical about the trustworthiness of peer D. Perhaps peer D is innocent but peers E, F, G do not fulfill their responsibilities, so I will also avoid asking peers E, F, G for help in the future.
Finally, I have the file I am interested in. Most importantly, I know who has the file. I can share my knowledge with others, upon request.
At the heart of a DHT network is peer responsibility. However, ZeroNet works slightly differently. In ZeroNet, no peer has a predefined responsibility. We host data according to our interests, so that unsolicited content does not come into our computers without permission. However, responsibility can be defined anyway. In all of the files one is interested in, a subset of them should be kept longer.
The principle of sharing knowledge in a DHT network is to spread the most helpful information. By giving out a ordered list of addresses sorted by the possibility of having the desired file, a peer helps others by reducing time complexity.
Why does DHT work?
Everyone is responsible for keeping the system running. There must be people to help you store things. There must be people to help you find things.
At the beginning, everyone decides his own responsibility in such a way that if one peer fails, the network is still working. By hashing the data received, one knows if saving those data fulfills his responsibility. By comparing the other people’s responsibilities with one’s own responsibility, one knows who to keep in touch with in order to make parts of the network reachable.
Hashing also helps rank the usefulness of routing information. Time complexity cannot be reduced unless the returned list of peers is sorted by usefulness. Useful routing information makes one closer to his destination. The closeness to destination can be measured mathematically by computing
requested_hash xor responsibility_of_any_person_I_know_of.
ZeroNet has a reputation mechanism in development. In addition to closeness, usefulness of routing information can be defined as to the best of the responder’s knowledge, mentioned people having high reputations. Reputation can be measured by the success rate of retrieving the desired data in untampered form. This does not utilize hashing functions, but is consistent with the principles of a DHT network.
How to fulfill routing responsibilities
Knowledge - Ask people about their responsibilities. Remember their addresses and responsibilities. Sort the addresses by closeness. - Bind the peer addresses to our interested files. - Remember Peer Exchange results.
Ranking - Evaluate the peers I know and yield reputation values. - The initial value of reputation should be 0 (neutral) and should change as the result of evaluation. - Impression fades due to infrequent contact. More complex algorithms can be applied to reset the reputations of peers we do not contact with.
Being Helpful 1. Closest: Given a file request, if I have the file, I say I have the file. 2. Second Closest: If I do not have the file, but I know who has given me the file, I return a list of peers. I put their addresses at the beginning of the list. I only include people with high reputations. 3. Closer: If I do not have the file, but I know who is responsible for saving the file, I append a list of peers to the result. I only put people with high reputations there. 4. Closer: If I do not have the file, and I do not know who is responsible for saving the file, I append a list of closest peers to the result. Hopefully, people there will help you find your file.
How to fulfill storage responsibilities
Our garbage recycler should value peer responsibility. Optional files that matches a node’s responsibility should be kept longer.
- Integrity of values should be verified by checking digital signatures.
- Key size does not need to be 160 bits. It can be any reasonable size. For instance, MORPHiS uses a bigger key size to fit a SHA-512 hash into a key.
- DHT protocol does not need to rely on UDP. All we have to do is to get information around. For instance, MORPHiS uses TCP to transfer DHT protocol payload.
- Responsibility of a node should be changeable by modifying configuration files.
!Uniqueness of node responsibility allows potential fingerprinting attacks.