Friday, April 13, 2012

Hyperlink-Induced Topic Search (HITS)

Hyperlink-Induced Topic Search (HITS), also known as Hubs and Authorities, is a link analysis algorithm for rating Web pages, developed by Jon Kleinberg. It was developed at around the same time as PageRank.
The idea behind Hubs and Authorities came from an insight into how web pages were created when the Internet was first forming. Certain web pages, known as hubs, served as large directories. They were not themselves authoritative about the information they held. Instead, they compiled a broad catalog of links that led users directly to other, authoritative pages. In other words, a good hub is a page that points to many other pages, and a good authority is a page that is linked to by many different hubs.

It computes two values for each page:
1. An authority score, which estimates the value of the content of the page.
2. A hub score, which estimates the value of its links to other pages.

First, HITS retrieves the set of results for the search query. The computation is then performed only on this result set, not across all Web pages.
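In Kleinberg's formulation, the initial results (often called the root set) are usually expanded into a larger base set. This is done by adding the pages they link to and some of the pages that link to them, and HITS then runs on the links among those pages. The Python sketch below illustrates that expansion under stated assumptions. The web is represented as a plain dictionary that maps each page to the set of pages it links to. The names `build_base_set` and `induced_subgraph` are hypothetical helpers chosen for illustration, not part of any library, and the cap `max_in_links` (50 here) is an illustrative parameter.

```python
from typing import Dict, Iterable, Set

# Assumed representation: each page maps to the set of pages it links to.
Graph = Dict[str, Set[str]]


def build_base_set(graph: Graph, root_set: Iterable[str], max_in_links: int = 50) -> Set[str]:
    """Expand a query's root set into a base set (hypothetical helper)."""
    root = set(root_set)
    base = set(root)
    for page in root:
        # Add every page that this root page links to.
        base.update(graph.get(page, set()))
        # Add up to `max_in_links` pages that link to this root page
        # (the cap is an illustrative assumption).
        in_links = sorted(src for src, targets in graph.items() if page in targets)
        base.update(in_links[:max_in_links])
    return base


def induced_subgraph(graph: Graph, nodes: Set[str]) -> Graph:
    """Keep only the links whose source and target are both in `nodes`."""
    return {n: {t for t in graph.get(n, set()) if t in nodes} for n in nodes}


web = {"a": {"b"}, "b": {"c"}, "c": set(), "d": {"a"}, "e": {"d"}}
base = build_base_set(web, ["a"])
print(sorted(base))  # ['a', 'b', 'd']
```

Page `e` is left out because it neither is linked from the root page `a` nor links to it. HITS would then run on `induced_subgraph(web, base)`.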

The algorithm performs a series of iterations, each consisting of two basic steps:
Authority Update: Update every node's authority score to be equal to the sum of the hub scores of every node that points to it. That is, a node is given a high authority score by being linked to by pages that are recognized as hubs for information.

Hub Update: Update every node's hub score to be equal to the sum of the authority scores of every node that it points to. That is, a node is given a high hub score by linking to nodes that are considered to be authorities on the subject.
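Written as formulas, the two update rules are shown below. Here $a(p)$ and $h(p)$ denote the authority and hub scores of page $p$, and $q \to p$ means "page $q$ links to page $p$".

```latex
a(p) \leftarrow \sum_{q \,:\, q \to p} h(q)
\qquad\qquad
h(p) \leftarrow \sum_{q \,:\, p \to q} a(q)
```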
The hub and authority scores for each node are computed with the following algorithm:
1. Start with every node having a hub score and an authority score of 1.
2. Run the Authority Update rule.
3. Run the Hub Update rule.
4. Normalize the values. Divide every hub score by the square root of the sum of the squares of all hub scores, and divide every authority score by the square root of the sum of the squares of all authority scores.
5. Repeat from the second step until the scores converge (or for a fixed number of iterations).
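The five steps can be sketched in Python as follows. This is a minimal, non-authoritative sketch. It assumes the link graph (for example, the query's subgraph) is a dictionary that maps each page to the set of pages it links to. The function name `hits` and the `iterations` and `tol` stopping parameters are illustrative choices. The hub update uses the authority scores computed in the same iteration, and normalization uses the Euclidean (L2) length.

```python
import math
from typing import Dict, Set, Tuple

# Assumed representation: each page maps to the set of pages it links to.
Graph = Dict[str, Set[str]]


def hits(graph: Graph, iterations: int = 50, tol: float = 1e-9) -> Tuple[Dict[str, float], Dict[str, float]]:
    """Return (hub, authority) scores; a sketch of the steps above."""
    # Every page that appears as a link source or a link target.
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    if not nodes:
        return {}, {}
    # Step 1: start with hub and authority scores of 1.
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Step 2 (authority update): sum the hub scores of pages linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, targets in graph.items():
            for t in targets:
                new_auth[t] += hub[src]
        # Step 3 (hub update): sum the new authority scores of linked pages.
        new_hub = {n: sum(new_auth[t] for t in graph.get(n, ())) for n in nodes}
        # Step 4: divide by the square root of the sum of squares (L2 norm).
        a_norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        new_auth = {n: v / a_norm for n, v in new_auth.items()}
        new_hub = {n: v / h_norm for n, v in new_hub.items()}
        # Step 5: repeat until the scores stop changing (or iterations run out).
        delta = max(abs(new_auth[n] - auth[n]) + abs(new_hub[n] - hub[n]) for n in nodes)
        auth, hub = new_auth, new_hub
        if delta < tol:
            break
    return hub, auth


example = {"h1": {"x", "y"}, "h2": {"x", "y"}, "h3": {"x"}}
hub_scores, auth_scores = hits(example)
print(max(auth_scores, key=auth_scores.get))  # x
```

On this small graph, `x` is linked to by all three hubs and comes out as the top authority. `h1` and `h2` link to both `x` and `y`, so they score higher as hubs than `h3`.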

A Comparison between PageRank and Hyperlink-Induced Topic Search (HITS) Algorithms

Hyperlink-Induced Topic Search:
  • HITS is based on two quality values per page, the authority score and the hub score, which are updated by the Authority Update and Hub Update rules. A page's authority score is calculated from the hub scores of the pages that link to it. Its hub score is calculated from the authority scores of the pages it links to. The overall HITS result depends on the interplay between these two values, so HITS calculates two scores per document.
  • HITS operates on a small subgraph of the web: the pages relevant to a query and the links between them, from which the hub and authority pages emerge.
  • In HITS, an increase in the authority weights of the pages a site links to increases that site's hub weight, and vice versa.
  • HITS calculates its scores at query time, on the retrieved results, rather than during indexing.
  • HITS is especially suited to analyzing the relationships between websites on a topic.
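The mutual reinforcement described in these bullets can be made precise. Let $A$ be the adjacency matrix of the query subgraph ($A_{ij} = 1$ if page $i$ links to page $j$). Let $\mathbf{a}^{(k)}$ and $\mathbf{h}^{(k)}$ be the authority and hub vectors after $k$ iterations, so that $\mathbf{a}^{(k+1)} \propto A^\top \mathbf{h}^{(k)}$ and $\mathbf{h}^{(k)} \propto A\,\mathbf{a}^{(k)}$. Substituting one update into the other, with the normalization absorbed into $\propto$, gives:

```latex
\mathbf{a}^{(k+1)} \propto A^\top A \, \mathbf{a}^{(k)}
\qquad\qquad
\mathbf{h}^{(k+1)} \propto A A^\top \, \mathbf{h}^{(k)}
```

The HITS loop is therefore power iteration. When the largest eigenvalue of each matrix is strictly larger than the rest, the authority vector converges to the principal eigenvector of $A^\top A$, and the hub vector converges to that of $A A^\top$. These are two separate score vectors, built only from the links in the small query subgraph.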
PageRank:
  • PageRank is based primarily on a page's back links, weighted by their quality. A quality back link is one that comes from a page that itself has a high PageRank. Each page's rank is divided among its outgoing links. PageRank therefore calculates one main score per document.
  • PageRank operates on the entire web graph, taking all back links into account.
  • In PageRank, a back link from a high-PageRank website increases the PageRank of the linked website more than a link from a low-ranked one.
  • PageRank calculates scores offline, as part of the indexing process, independently of any particular query.
  • PageRank can also rank things other than websites. One example is "Street rank", which ranks places on the basis of how often people visit them. Similarly, PageRank is used in many environments, from ranking institutions to guiding search engine crawlers.
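For contrast with HITS, the following Python sketch computes a single PageRank score per page by power iteration over the whole graph. Each page splits its rank evenly among its out-links, so a back link from a high-ranked page is worth more. The damping factor of 0.85 is the value commonly cited from Brin and Page. Spreading the rank of pages that have no out-links evenly over all pages is one common convention and is an assumption here. The function name `pagerank` is illustrative.

```python
from typing import Dict, Set

# Assumed representation: each page maps to the set of pages it links to.
Graph = Dict[str, Set[str]]


def pagerank(graph: Graph, damping: float = 0.85, iterations: int = 100, tol: float = 1e-10) -> Dict[str, float]:
    """Return one PageRank score per page (a power-iteration sketch)."""
    nodes = set(graph) | {t for targets in graph.values() for t in targets}
    n = len(nodes)
    if n == 0:
        return {}
    rank = {p: 1.0 / n for p in nodes}
    for _ in range(iterations):
        # Rank held by pages with no out-links is spread evenly (an assumption).
        dangling = sum(rank[p] for p in nodes if not graph.get(p))
        new_rank = {p: (1.0 - damping) / n + damping * dangling / n for p in nodes}
        # Each page passes its rank, split evenly, along its out-links.
        for src, targets in graph.items():
            if targets:
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
        delta = sum(abs(new_rank[p] - rank[p]) for p in nodes)
        rank = new_rank
        if delta < tol:
            break
    return rank


links = {"a": {"c"}, "b": {"c"}, "c": {"a"}, "d": {"c"}}
scores = pagerank(links)
print(max(scores, key=scores.get))  # c
```

Here `c` has three back links and receives the highest score. `a` outranks `b` and `d`, which have no back links, because its only back link comes from the high-ranked `c`.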

Both HITS and PageRank have their strengths, and each can be applied in different scenarios. PageRank is more popular because it can be used in many environments other than web search. HITS is very useful because it specifically categorizes websites as hubs and authorities.