Thursday, April 12, 2012

Pólya's Urn

This is an interesting model named after Hungarian mathematician George Pólya ( wide range of fields where he contributed includes number theory, complex analysis, combinatorics, geometry ect.) which gives us some insight about some statistical distributions. Apart from his contribution to all these fields he is also famous for this books (there are four of them) on  'how to solve it' and his conjecture (Pólya's conjecture) which is proved false in 1958.

The simplest version of the model is like this, given an urn (a container) containing some fixed number (say x) of balls of colour-1 (say red) and some fixed number (say y) of balls of colour-2 (say blue) we perform the following experiment.

1. At each step we draw one ball randomly from the urn, note its colour.
2. Return the ball in the urn along with another ball of the same colour.

A more complex model is the one where we have balls of colour more than one and after each draw we return d number of balls of same colour instead of only one. This model is completely opposite to the model of sampling without replacement where number of balls decrease with time.

Some interesting questions which can arise is
1. Can we estimate the distribution of balls after a fixed number of trial ?
2. Can we determine the probability of choosing a particular sequence of coloured balls  if we know x and y. For example what is the probability that at 12th,13th and 14th trial we will see a red, a blue and a red ball respectively ?
3. If we observe repeatedly n number of red balls with what confidence we can infer that there is no blue ball ?

These kind of questions give rise to well known discrete probability distributions. For example distribution of the number of successful draws is beta-binomial distribution. When there are balls of more than two colours, the first question above gives rise to Dirichlet-multinomial distribution which is also known as multivariate Pólya distribution.

Pólya's Urn model is used to model preferential attachment in networks where nodes having high degree have the high probability of getting an edge when a new node enters into the system.

1 comment: