Saturday, March 17, 2012

Fortune sometime favors the Poor

In complex Network, 'Preferential attachment', 'Assortativity' - these frequent use of terms make the readers convinced that there is no place for "Poor" in this world. But if the thing goes in this way, no one becomes rich from the scratch. Fortunately, God gives this opportunity by creating a concept "Randomness" so that Poor eventually gets little bit of luck, turns back and survives in this competing world.
      In his paper,  Pennock et al.(2002) brought out an important observation that though the world (WWW) shows the characteristics of Power Law with a relatively small number of sites receiving a disproportionately large share of hyperlink references and traffic, the microscopic observations state that the components of WWW behave link Log-normal degree distribution considering both "Preferential" and "Uniform" attachments.
        Several studies (Albert et al.,1999;  Broder et al., 2000) find that the probability that a randomly selected web page has k links is proportional to some power of negative constant (alpha) of k for large k , empirically determined the value of constant as roughly 2.1 for inbound links and 2.72  for outbound links. When displayed on a log-log plot, this so-called power law distribution appears linear with slope of (- alpha). A power law distribution has a heavy tail, which drops off much more slowly than the tail of a Gaussian distribution. As a result, although the vast majority of web pages have relatively small numbers of links, a few pages have enormous numbers of links—enough to skew the mean well above the median. But if we consider the small fragments of the Internet, little different think has been observed. If we look at the results of Pennock et al.(2002), it is seen that the graphs of individual component (company network, university network, power-grid network etc.) obey  some kind of mixture distribution instead if power low degree distribution. 

                                                                         Figure 1

    It is an open question exactly how peaked distributions for subsets of the web like those in Figure 1 sum together to produce the nearly pure power law for the web as a whole. There should be complementary things going on across the fragments of the internet which is responsible for  building ultimate power low structure. Pennock at al. (2002) also suggested that the vast majority of subsets (or subsets containing the vast majority of pages) exhibit a nearly zero mode and dominate this sum.   this portion needs to be investigated more.

Through a broad survey of observations given by Pennock et al., I would like to share my opinion:
1. In real world, it may not the case that any node can get arbitrary link from other node. I mean, the randomness in real world graph is not likely to be possible theoretically. If we think of a real example like new website building, someone wants to link his newly launched web page to the most popular site like Twitter or Facebook (Preferential attachment) and some other sites which are, I think, somehow related to him by any means (which is called here as random attachment) i.e. their profiles are strongly semantically related. The relation may be one of the followings: the field of interest (research areas) is similar to the linking page, or the profile is somehow matching (colleagues or partners), or he would be attached with that site previously (the site is the institute where he studied earlier) etc. So the profile study or background information of both the linking and linked nodes may be very useful analysis to answer the actual cause of randomness.
2. Another important observation is slightly beyond this study. If we take a gentle look at the power low curve, it is seen that it has a heavy tail which drops of much more slower. If we identify those nodes of high degrees, we can see that some of them are spammers. Now-a-days, spammers are getting populated by making several web pages and recursively linking them together and getting high rank. It is also observed that some of the highly popular nodes in social network like famous politicians, actors are pointing to the spammers only for getting more or more popularity. Now if we identify the spammers, delete them from the graph, penalized the high degree node for pointing to the spammers and redistribute that penalized rank to the authentic web pages are pointing to that high degree nodes, the low degree nodes may become richer.

        So thinks are not so easy as it looks like. Over the time, I will try to edit this blogs if any ideas related to describing the different curve like structures of the components aggregated together to make the power low curve come up. I am requesting all the readers to post their suggesting which they think about to describe this peculiarity and any other observation if they have noticed.

 "winner's don't take it all: characterizing the competitions for links in web", Pennock et al. (2002).


No comments:

Post a Comment