We have networks
connecting nodes; we have files distributed across nodes in the network. But without the
ability to search them efficiently over the network we will not be benefited
much. So searching is a very basic task that needs to be performed in networks.
We already know about some search techniques such as Page Rank, HITS, and
Elimination Generation.
In today’s world peer-to-peer
(P2P) traffic constitutes approximately 35-40% of all internet traffic. We use
them heavily for the purpose of sharing files (documents, photos, videos etc). Considering
the vast size and frequent use of p2p networks we need a very efficient and
scalable search technique. Distributed Hash Table (DHT) is very efficient and
scalable way to search for nodes that have a particular file.
DHT principle is based
on Chord*, a distributed lookup protocol. Chord supports the basic operation of
mapping key to a node in the network. How the key is computed? Given the name
of a file a consistent hashing** function is used to compute the key for that
particular file name.
The simplest form such
file searching network can be that every node maintains the “key and node”
mapping for every file. But this is clearly not feasible for large file sharing
network as the number of files is very high.
Chord
improves the scalability of consistent hashing by avoiding the requirement that
every node know about every other node. A Chord node needs only a small amount
of “routing” information about other nodes. Because this information is
distributed, a node resolves the hash function by communicating with a few
other nodes. In an N-node network, each node maintains information only about O(logN) other nodes, and a lookup
requires O(logN) messages.
In this
approach a Key always resides at the successor node. Each node need only be
aware of its successor node on the circle. Queries for a given identifier can
be passed around the circle via these successor pointers until they first
encounter a node that succeeds the identifier; this is the node the query maps
to.
In the above picture
identifier node 0, 1 and 3 are active. Successor of a key is defined as the
node with the same key value or the active node with value immediately after
the key. Successor of node 1 is 1 itself but successor of node 2 is 3.
We already said that
for a N node network each node maintains information about O(logN) number of
nodes. It stores the ranges of key value the node contains in a table known as
finger table. Suppose m is the number of bits in the key value. Clearly
N<<2m<<K, where N is the number of nodes and K is the
number of files. Naturally each node contains at an average K/N number of
files. Finger table of a node tells about the range of keys stored in a node.
In the below figure we see the finger tables associated with the active node numbered 0, 1 and 3. Every finger tables tells about the range and the successor node.
During searching the
node who initiates the search first finds the key value associated with the
user given file name and then finds the range in which the key falls and then
returns the corresponding successor.
One thing is very
interesting to note that a node contains a key index does not mean that the node
contains the file also. It may happen so that the file is located in some other
node, obviously the node with the key have that information available with it.
The DHT techniques can
automatically detect any change in the network topology and updates it
accordingly. When node fails- it can fail after informing its successor or can
fail silently. In the first case the node first moves all its indexes/keys to
its successor and then fails or leave. In the second case those files needs to
be rehashed.
As the node containing
the files and node containing keys can be different it may happen that some
keys become orphan after a node containing files leave. To tolerate this up to
some level some implementation of DHT keeps copies of files in the successor
node also depending on the feasibility. So if the node containing the file and its
successor both fail/leave we lose both keys and files which is rather a
consistent situation. In that case also some files may need to be rehashed.
In steady state Chord
based DHT offers very good performance. It can stabilize very quickly when a new node joins or a node fails. Its strength lies in its simplicity.
Chord is very simple, scalable algorithm for distributed p2p network.
References:
* Chord: A Scalable Peer
to peer Lookup Service for Internet Applications, Ion Stoica, Robert Morris,
David Karger, M. Frans Kaashoek, Hari Balakrishnan, MIT Laboratory for Computer
Science.
** KARGER, D., LEHMAN,
E., LEIGHTON, F., LEVINE, M., LEWIN, D., AND PANIGRAHY, R.
Consistent hashing and
random trees: Distributed caching protocols for relieving hot spots on the World
Wide Web. In Proceedings of the 29th Annual ACM Symposium on Theory of
Computing.
No comments:
Post a Comment