The Implementation of an equation can sometimes change it in not-so-subtle ways
Last edited: 16th May 2002
In my paper PageRank Explained I claim that â€œChris Ridings of www.searchenginesystems.netâ€ has made a mistake in a paper on the same subject (see addendum below).
I didnâ€™t want to labour the point in the main paper as I have nothing in particular against Chris but because several people have asked for a clarification, and in the interest of open academic critique (Chris does invite that in the first paragraph of his paper), Iâ€™ll describe the problem here.
Chrisâ€™ paper has roughly the following structure:
- Title: â€œPageRank Explainedâ€, sub-title: â€œEverything youâ€™ve always wanted to know about PageRankâ€
- Introductory discussion of Google and PageRank.
- A quote from the Brin and Page Google paper, including the original algorithm.
- Introduces, for illustrative purposes, a ranking calculation called MiniRank to quote: â€œwhich is very similar to PageRank. This should help us understand it.â€ [PageRank]. The terms MiniRank and PageRank are freely mixed in several sections thereafter though.
- Gives advice to webmasters about how to structure links in real web sites based on observations of MiniRank behaviour.
This would be fine, and some of the more general comments in the paper are useful, but a fundamental flaw in the implementation of MiniRank means that it models an equation very different from Googleâ€™s PageRank! Most of the conclusion in the paper, therefore, are not applicable to real websites.
This is the Google PageRank equation:
We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:
PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
This is explained fully in my paper along with a correct method of implementation.
Chris Ridingsâ€™ paper uses an implementation of MiniRank that results in a very different equation though.
In the section â€œThe calculation of PageRankâ€, Chris uses English phrases like:
â€œso at the end of the process weâ€™re going to add 0.425 to Page Bâ€™s MiniRank and 0.425 to Page Câ€™s MiniRankâ€
and, in a diagram just below that, Page B has indeed jumped from 1 to 1.425. Page C has had an even more incredible jump from 1 to 3.125 (because of all the other MiniRank that has been added via other links).
When you work through the examples itâ€™s clear the MiniRank equation is something like
PR(A) =PR(Aâ€™) + (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))
The numbers Chris comes up with spiral ever upwards. This equation will never converge to settled values (unlike the original Brin and Page PageRank equation)!
Itâ€™s interesting to note that Ridings never describes a method for terminating the MiniRank algorithm!
The most important effect of altering the equation like this is that Ridingsâ€™ analysis of Feedback Loops is completely wrong. This is most clearly seen in the sections â€œLoopingâ€ and â€œExtensive Interlinkingâ€. The examples start with a MiniRank of 1, but after â€œa few iterationsâ€ they reach the dizzying heights of 469.5883 without any incoming links! A correct PageRank implementation would converge to 1.0 for all these pages.
The MiniRank model may be interesting, but it has no relevance to Google PageRank. Conclusions from the MiniRank model should not be used to influence website design in the real world.
Addendum (5 July 2002)
Chrisâ€™ original paper that I have been critiquing is available on the Alexa Wayback Machine. Once Chris was made aware of this page (after some denial) the paper was re-written (with no attributions), but itâ€™s not clear who the authors are now (the â€œBlack Box Groupâ€).