Skip to content →

Category: web

the google matrix

This morning there was an intriguing post on arXiv/math.RA
entitled A Note on
the Eigenvalues of the Google Matrix
. At first I thought it was a
joke but a quick Google revealed that the PageRank algorithm really
is at the heart of Google technology, so I simply had to find out more
about it. An extremely readable account of it can be found in The PageRank Citation Ranking: Bringing Order to the Web which is really the
start of Google. It is coauthored by the two founders : Larry Page and
Sergey Brin. A quote from the introduction

“To test the utility of PageRank for search, we built a web
search engine called Google (Section
5)”

Here is an intuitive idea of
_PageRank_ : a page has high rank if the sum of the ranks of its
_backlinks_ (that is, pages linking to the page in question) is
high and it is computed by the _Random Surfer Model_ (see
sections 2.5 and 2.6 of the paper). More formally (at least from my
quick browsing of some papers, maybe the following account is slightly
erroneous and I’ll have to spend some more time reading) let
N be the number of webpages (estimated between 3 and 4
billion) and consider the N x N matrix
A the so called GoogleMatrix where

A = cP  + (1-c)(v x
vec(1)) 

where P is the
column-stochastic matrix (meaning : all entries are zero or positive and
the sum of all entries in each column adds up to 1) with
entries

P(i,j) = 1/N(i) if i->j and 0
otherwise 

where i and j are webpages and i->j
denotes that page i has a link to page j and where N(i) is the total
number of pages linked to in page i (all this information is available
once we download page i). c is a constant 0 < c < 1 and
corresponds to the fraction of webpages containing an _outlink (that
is, a link to another page) by all webpages (it seems that Google uses
c=0.85 as an estimate). Finally, v is a column vector with zero or
positive numbers adding up to 1 and vec(1) is the constant row vector
(1,…,1). The idea behind this term is that in the _Random Surfer
Model_ to compute the PageRank the Googlebot (normally following
links randomly in pages it enters) jumps every (1-c)x100% links randomly
to an entirely different webpage where the chance that it will end up at
page i is given by the i-th entry of v (this is to avoid being trapped
in a web-loop). So, in Googles model the bot _teleports_ itself
randomly every 6th link or so. Now, the PageRank is a
column-eigenvector for the GoogleMatrix A with eigenvalue 1 which can be
approximated by the RandomSurfer model and the rate of convergence of
this process depends on the _second_ largest eigenvalue for A
(the largest being 1). Now, in the paper posted this morning a simple
proof is given that this eigenvalue is c (because the matrix P has
multiple eigenvalues equal to 1). According to a previous paper on the
subject The
Second Eigenvalue of the Google Matrix
, this statement has
implications for the convergence rate of the standard PageRank algorithm
as the web scales, for the stability of PageRank to perturbations to the
link structure of the web, for the detection of Google spammers, and for
the design of algorithms to speed up PageRank. But I’ll have to
read more to understand the Google spammers bit…

Leave a Comment

homemade .mac

The
other members of my family don’t understand what I am trying to do the
last couple of days with all those ethernet-cables, airport-stations,
computer-books and the like. ‘Improving our network’ doesn’t make
much of an impression. To them, our network is fine as it is : from
every computer one has access to the internet and to the only
house-printer and that is what they want. To them, my
computer-phase is just an occupational therapy while recovering
from the flu. Probably they are right but I am obstinate in
experimenting to prove them wrong. Not that there is much hope,
searching the web for possible fun uses of home-networks does not give
that many interesting pages. A noteworthy exception is a series of four
articles by Alan Graham for the macdevcenter
on the homemade dot-mac with OS X-project.

In
the first article Homemade Dot-Mac with OS X he explains how to
set-up a house-network (I will give a detailed account of our
home-network shortly) and firing up your Apache webserver. One nice
feature I learned from this is to connect a computer by ethernet to the
router and via an Airport card to the network (you can force this by
specifying the order of active network ports in the
SystemPreferences/Network/Show Network port configuration-pane :
first Built-in Ethernet and second Airport). This way you
get a faster connection to the internet while still connecting to the
other computers on the network. In the second part he explains how to
get yourself a free domain name even if you have (as we do) a dynamic
IP-address via a service like DynDNS. Indeed it is quite easy to set this up but
so far I failed to reach my new DNS-server from outside the network,
probably because of bad port-mapping of my old isb2lan-router.
This afternoon I just lost two hours trying to fix this (so far :
failed) as I didn’t even know how to talk to my router as I lost the
manual which is no longer online. A few Google-searches further I
learned that i just had to type http://192.168.0.1 to get at the set-up pages
(there is even a hidden page) but you shouldnt try these links
unless you are connected to one of these routers. Maybe I will need
another look at this review.

In the second
article, Homemade Dot-Mac with OS X, Part 2 he discusses in
length setting up a firewall with BrickHouse (shareware costing $25) compared to the
built-in firewall-pane in SystemPreferences/Sharing convincing me
to stay with the built-in option. Further he explains what tools one can
use to set up a homepage (stressing the iPhoto-option).Finally, and this
is the most interesting part (though a bit obscure), he hints at the
possibility of setting up your own iDisk facility either using
FTP (insecure) or WebDAV.

The third article in the
series is Homemade Dot Mac: Home Web Radio in which he
claims that one can turn the standard OS X-Apache server into an iTunes
streaming server. He uses for this purpose the QuickTime Streaming Sever which you can get for
free from the Apple site but which I think works only when you have an
X-server. It seems that all nice features require an X-server so
maybe I should consider buying one…

The (so far)
final article is Six Great Tips for Homemade Dot Mac Servers is
really interesting and I will come back to most op these possibilities
when (if) I get them to work. The for me most promising options are :
the central file server (which he synchronizes using the
shareware-product ExecutiveSync ($15 for an academic license) but
I’m experimenting also a bit with the freeware Lacie-program Silverkeeper which seems to be doing roughly the
same things. The iTunes central-hack is next on my ToDo-list as
is (at a later stage) the WebDav and the Rendezvous-idea. So it seems
I’ll prolong my occupational therapy a while…

Leave a Comment

combinatorial game software


As I am going to give a talk on Combinatorial Game Theory early
next month I have to update my rusty knowledge of canonical forms of
two-person game positions, their temperature theory and the like. As
most of the concepts in this field are recursive they are hard to work
out by humans but easy for computers. So it is nice to have a good
program to use. I remember that David Wolfe wrote a couple of years ago the Gamesman Toolkit but it seems he has taken it off
his website. Still, you can get it from the Software released by Michael Ernst page. So,
download the games.tar.gz-file and uncompress it on your Desktop.
Then do the following

cd Desktop/games sudo make sudo cp
games /usr/bin/ /usr/bin/games

to get it up and
running (for documentation of how to use it see the Gamesman
Toolkit-paper above. But, it seems that as of July 2003 there is a much
better alternative around : the Combinatorial Game Suite of Aaron Siegel. It is an open source Java-program so it
runs on many platforms (including Mac OSX). Here is the way to get it
going : first download it and you will get a
cgsuite-0.4-folder on your desktop. Then type

cd
Desktop/cgsuite-0.4 java -jar cgsuite.jar

and after a few
questions (including whether you want to be on the mailing list of the
project) the program starts up. It is very well documented with an
on-line manual which I have to read over the coming
days…

Leave a Comment