Friday, September 7, 2007

Concepts in peer to peer computing: A hierarchical model of file distribution in peer to peer networks through designation of sub-hubs



Abstract

Using an algorithm to designate sub-hubs within a peer-to-peer network can lessen some of the problems such networks face, chiefly excessive traffic that leads to upstream congestion and, in turn, degrades the performance of other software uploading data to the Internet. Because the network is then relatively free of congestion, the sub-hubs also help ensure that the file keeps being distributed throughout it. The algorithm is as follows:

1) Determine the status of the user node in terms of the percentage of the file downloaded;
2) Scan the network for other nodes whose download status lies within a certain percentage range above the user's, e.g. up to 10% greater;
3) Generate a distribution curve (frequency distribution) of the other nodes over this percentage range;
4) Determine the highest frequency of nodes over the range (if the frequency of nodes over the range is too low, repeat steps 2 to 4 with a wider range, e.g. 20% instead of 10%);
5) Designate these nodes as sub-hubs;
6) The user downloads from these sub-hubs;
7) Otherwise, if no sub-hubs can be designated, the user downloads from any node that has the required part of the file, regardless of its download status.
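The steps above can be sketched in Python. This is only a minimal illustration under stated assumptions, not a tested implementation: the function name designate_sub_hubs, the parameters step, min_count and bin_width, the tie-breaking rule, and the idea that each node already knows the download percentage of its peers are all hypothetical choices made for the example.

```python
from collections import Counter

def designate_sub_hubs(own_pct, peers, step=10, min_count=2, bin_width=10):
    """Return the peer ids chosen as sub-hubs, or [] to signal the fallback.

    peers maps a peer id to the percentage of the file it holds.
    step, min_count and bin_width are illustrative assumptions.
    """
    width = step
    while True:
        upper = min(own_pct + width, 100)
        # Step 2: peers ahead of this node, within the current range.
        ahead = {p: pct for p, pct in peers.items() if own_pct < pct <= upper}
        # Step 3: frequency distribution over the range, grouped into bins.
        hist = Counter(pct // bin_width for pct in ahead.values())
        if hist:
            # Step 4: the most populated bin (ties go to the lower bin).
            peak_bin, count = min(hist.items(), key=lambda kv: (-kv[1], kv[0]))
            if count >= min_count:
                # Step 5: nodes in the peak bin become sub-hubs.
                return sorted(p for p, pct in ahead.items()
                              if pct // bin_width == peak_bin)
        if upper >= 100:
            # Step 7: no sub-hubs found; the caller falls back to any peer
            # that holds the required part of the file.
            return []
        # Frequency too low: widen the range and repeat steps 2 to 4.
        width += step
```

A caller would download from the returned peers (step 6) and treat an empty list as the signal to use any peer that has the part in demand (step 7).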

Introduction

Peer-to-peer networks are a popular way to transfer files, and they are quite likely here to stay for a long time. They differ from the traditional client-server system in adopting a more decentralised mode of file distribution, in which peers act as equals and merge the roles of server and client. A number of peer-to-peer programs are in popular use, e.g. BitTorrent, eMule and many others. This is not to say that peer-to-peer file transfer is free of problems of its own. The growth in peer-to-peer traffic results in network congestion. Upstream congestion can occur when a particular user within the network is uploading the file to a large number of peers. This produces a large upstream-to-downstream ratio, which is likely to congest the upstream link. Interestingly, it has also been reported that 60% of all Internet traffic is driven by peer-to-peer traffic (Briere and Hurley, 2007).

A hypothetical scenario and the solution

Consider the network in Figure A. One seed (red node) with 100% of the file is being swarmed by downloaders: 2 downloaders with 90% of the file (green nodes), 3 downloaders with 70% of the file (blue nodes), 4 downloaders with 20% of the file (brown nodes) and 5 downloaders with 10% of the file (magenta nodes). There is a high likelihood that the seed will experience upstream congestion, with dire consequences for its own network. The algorithm described above serves to reduce the likelihood of congestion by designating sub-hubs. Figure B shows the frequency distribution of the nodes in terms of the percentage of the file downloaded. Figure C shows what the algorithm does, i.e. designates sub-hubs in a hierarchical fashion. Instead of the seed supplying the file to everyone, it supplies the file only to the 90% completed downloaders. These supply the file to the 70% completed downloaders, who supply it to the 20% completed downloaders, who in turn supply it to the 10% completed downloaders. Seen as a hierarchy, nodes lower in the hierarchy that hold a certain percentage of the file act as sub-hubs, relieving upload traffic from the main hub (the seed, or those nodes that have nearly finished downloading the file). It should be noted that nodes progress up the hierarchy over time as they download a greater percentage of the file.
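The effect on the seed in this scenario can be worked through with a few lines of Python. The sketch below is illustrative only: the names tiers, flat_load and hierarchical_links are hypothetical, and it assumes, as a simplification not stated above, that downloaders spread evenly across the uploaders in the tier directly above them.

```python
# Tiers from the scenario: percentage of file held -> number of nodes.
tiers = {100: 1, 90: 2, 70: 3, 20: 4, 10: 5}

def flat_load(tiers):
    """Downloaders the seed serves when every node pulls from it directly."""
    return sum(count for pct, count in tiers.items() if pct < 100)

def hierarchical_links(tiers):
    """Pair each tier with the tier below it.

    Returns (uploader_pct, downloader_pct, downloaders_per_uploader),
    assuming downloaders are shared evenly among the uploaders above them.
    """
    levels = sorted(tiers, reverse=True)
    return [(up, down, tiers[down] / tiers[up])
            for up, down in zip(levels, levels[1:])]

print(flat_load(tiers))            # the seed alone serves all 14 downloaders
for up, down, load in hierarchical_links(tiers):
    print(f"{up}% tier -> {down}% tier: {load:.2f} downloaders per uploader")
```

Under these assumptions the seed's load drops from 14 downloaders to 2, and no node in the hierarchy serves more than 2 downloaders on average.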

Discussion

This algorithm, which could be adopted in peer-to-peer software, is suggested as one possible way to ease some of the problems inherent in networks with heavy peer-to-peer traffic. It is not yet known how its download speed compares with that of the peer-to-peer software currently in use. The prediction, however, is that download speed should be comparable, for two reasons. Firstly, it should be relatively free of the congestion problems that affect current peer-to-peer software. Secondly, at each stage of the download, a pool of sub-hubs is available to supply the file, which should increase download speed. It has also been pointed out that one of the main concerns with peer-to-peer networks is that download speed slows towards the end of a transfer. It is still not known whether this algorithm can address that problem. Finally, the algorithm retains the essential feature of mainstream peer-to-peer software, i.e. users upload and download files simultaneously: they download from sub-hubs above them in the hierarchy (in terms of percentage of file completed) while acting as uploaders to users below them.

Citations
Briere, D. and Hurley, P. (2007). The Bleeding Edge. Networkworld.com








