The Implementation of an equation can sometimes change it in not-so-subtle ways

Copyright Ian Rogers, 2002 onwards.

Last edited: 16th May 2002

In my paper PageRank Explained I claim that “Chris Ridings of www.searchenginesystems.net” has made a mistake in a paper on the same subject (see addendum below).

I didn’t want to labour the point in the main paper as I have nothing in particular against Chris but because several people have asked for a clarification, and in the interest of open academic critique (Chris does invite that in the first paragraph of his paper), I’ll describe the problem here.

Chris’ paper has roughly the following structure:

  • Title: “PageRank Explained”, sub-title: “Everything you’ve always wanted to know about PageRank”
  • Introductory discussion of Google and PageRank.
  • A quote from the Brin and Page Google paper, including the original algorithm.
  • Introduces, for illustrative purposes, a ranking calculation called MiniRank to quote: “which is very similar to PageRank. This should help us understand it.” [PageRank]. The terms MiniRank and PageRank are freely mixed in several sections thereafter though.
  • Gives advice to webmasters about how to structure links in real web sites based on observations of MiniRank behaviour.

This would be fine, and some of the more general comments in the paper are useful, but a fundamental flaw in the implementation of MiniRank means that it models an equation very different from Google’s PageRank! Most of the conclusion in the paper, therefore, are not applicable to real websites.

This is the Google PageRank equation:

    We assume page A has pages T1…Tn which point to it (i.e., are citations). The parameter d is a damping factor which can be set between 0 and 1. We usually set d to 0.85. There are more details about d in the next section. Also C(A) is defined as the number of links going out of page A. The PageRank of a page A is given as follows:

    PR(A) = (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

This is explained fully in my paper along with a correct method of implementation.

Chris Ridings’ paper uses an implementation of MiniRank that results in a very different equation though.

In the section “The calculation of PageRank”, Chris uses English phrases like:

    “so at the end of the process we’re going to add 0.425 to Page B’s MiniRank and 0.425 to Page C’s MiniRank”

and, in a diagram just below that, Page B has indeed jumped from 1 to 1.425. Page C has had an even more incredible jump from 1 to 3.125 (because of all the other MiniRank that has been added via other links).

When you work through the examples it’s clear the MiniRank equation is something like

    PR(A) =PR(A’) + (1-d) + d (PR(T1)/C(T1) + … + PR(Tn)/C(Tn))

The numbers Chris comes up with spiral ever upwards. This equation will never converge to settled values (unlike the original Brin and Page PageRank equation)!

It’s interesting to note that Ridings never describes a method for terminating the MiniRank algorithm!

The most important effect of altering the equation like this is that Ridings’ analysis of Feedback Loops is completely wrong. This is most clearly seen in the sections “Looping” and “Extensive Interlinking”. The examples start with a MiniRank of 1, but after “a few iterations” they reach the dizzying heights of 469.5883 without any incoming links! A correct PageRank implementation would converge to 1.0 for all these pages.

Conclusion

The MiniRank model may be interesting, but it has no relevance to Google PageRank. Conclusions from the MiniRank model should not be used to influence website design in the real world.

Addendum (5 July 2002)

Chris’ original paper that I have been critiquing is available on the Alexa Wayback Machine. Once Chris was made aware of this page (after some denial) the paper was re-written (with no attributions), but it’s not clear who the authors are now (the “Black Box Group”).