This page is prepared by Wentian Li of the North Shore LIJ Research Institute.

Zipf's law, named after the Harvard linguistics professor George Kingsley Zipf (1902-1950), is the observation that the frequency of occurrence $P$ of some event, as a function of its rank $i$ (where the rank is determined by that same frequency of occurrence), is a power-law function $P_i \sim 1/i^a$ with the exponent $a$ close to unity.
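As a small illustration, assuming a finite number $N$ of ranked events, the law can be written as a normalized distribution $P_i = i^{-a} / \sum_{j=1}^{N} j^{-a}$. The Python sketch below computes these probabilities; the function name `zipf_pmf` and the cutoff `n` are hypothetical choices for illustration only. The ratio $P_1/P_2$ equals $2^a$, so with $a = 1$ the top-ranked event is exactly twice as frequent as the second.

```python
def zipf_pmf(n, a=1.0):
    """Normalized Zipf probabilities P_1..P_n, with P_i proportional to 1/i**a.

    Hypothetical helper: assumes a finite number n of ranked events.
    """
    weights = [1.0 / i ** a for i in range(1, n + 1)]
    total = sum(weights)  # generalized harmonic number H_{n,a}
    return [w / total for w in weights]


probs = zipf_pmf(50, a=1.0)
print(sum(probs))            # the probabilities sum to (approximately) 1
print(probs[0] / probs[1])   # P_1 / P_2 = 2**a, i.e. 2.0 when a = 1
```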

The most famous example of Zipf's law is the frequency of English words. A count of the top 50 words in 423 TIME magazine articles (245,412 word occurrences in total; also available as a PDF file of the class note) has "the" at rank one (15,861 occurrences), "of" at rank two (7,239 occurrences), "to" at rank three (6,331 occurrences), and so on. When the number of occurrences is plotted as a function of the rank (1, 2, 3, etc.), the curve is a power-law function with exponent close to 1.
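A rough way to read the exponent off such a table is to fit a straight line to log(count) versus log(rank) by ordinary least squares; the negative of the slope estimates $a$. The sketch below uses a hypothetical helper, `zipf_exponent`, and is not necessarily how the TIME magazine counts were analyzed. Least squares on a log-log plot is a simple but crude estimator.

```python
import math


def zipf_exponent(counts):
    """Estimate a in P_i ~ 1/i**a from counts sorted by rank (rank 1 first).

    Hypothetical helper: ordinary least-squares fit of log(count)
    against log(rank); the fitted slope is -a.
    """
    xs = [math.log(rank) for rank in range(1, len(counts) + 1)]
    ys = [math.log(c) for c in counts]
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    return -sxy / sxx


# The top three counts quoted above: "the", "of", "to"
print(round(zipf_exponent([15861, 7239, 6331]), 2))
```

Applied to only these three counts, the fit gives about 0.87. Three points are far too few for a serious estimate; the full top-50 table would be the natural input.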

If you want to download English texts and analyze them yourself, you can get texts from Project Gutenberg (National Clearinghouse for Machine Readable Texts); one mirror site is at UIUC.
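Once a text is downloaded, building the rank-frequency table takes only a few lines. The sketch below assumes a crude tokenizer (lowercased runs of letters and apostrophes count as words); `rank_frequency` is a hypothetical helper, and a short sample sentence stands in for a real book.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (rank, word, count) triples, most frequent word first.

    Hypothetical helper with a crude tokenizer: lowercased runs of
    letters and apostrophes are treated as words.
    """
    words = re.findall(r"[a-z']+", text.lower())
    ranked = Counter(words).most_common()
    return [(rank, word, count) for rank, (word, count) in enumerate(ranked, start=1)]


sample = "the cat and the dog and the bird"
for rank, word, count in rank_frequency(sample):
    print(rank, word, count)
```

For a downloaded book, pass the file contents instead, e.g. `rank_frequency(open(path, encoding="utf-8").read())`, where `path` is wherever you saved the text. Project Gutenberg files carry a license header and footer that you may want to strip first so they do not distort the counts.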

The second example Zipf showed in his book was the population of cities (or of communities). The population of a city, plotted as a function of its rank (the most populous city is ranked number one, etc.), is a power-law function with exponent close to 1.
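Ranking by size works the same way as ranking words by frequency; only the ordering key changes. A minimal sketch with made-up numbers (not census data): when $a = 1$, $P_i \propto 1/i$, so rank times population stays roughly constant, which is often called the rank-size rule. The helper `rank_size` is hypothetical.

```python
def rank_size(populations):
    """Pair each population with its rank (largest city = rank 1).

    Hypothetical helper for illustration.
    """
    ordered = sorted(populations, reverse=True)
    return [(rank, size) for rank, size in enumerate(ordered, start=1)]


# Made-up populations built to follow the a = 1 rule: size = 1,000,000 / rank
cities = [1_000_000 // r for r in (3, 1, 4, 2, 5)]
for rank, size in rank_size(cities):
    print(rank, size, rank * size)  # rank * size is roughly constant when a = 1
```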

The income or revenue of a company as a function of its rank is also an example of Zipf's law (also in Zipf's book). This should also be called Pareto's law, because Pareto observed it at the end of the 19th century.
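The link between the two names can be seen with a short calculation, assuming the size $s$ (income or revenue) of the company at rank $r$ follows $s_r = C\,r^{-a}$ exactly, with $C$ a constant. The rank of a company of size $s$ is just the number of companies at least that large, so inverting the rank-size relation gives Pareto's form for the tail count $N(\ge s)$:

$$
\begin{aligned}
s &= C\,r^{-a} \\
r(s) &= \left(\frac{C}{s}\right)^{1/a} \\
N(\ge s) &= r(s) \;\propto\; s^{-1/a}
\end{aligned}
$$

Under this idealized assumption, a Zipf exponent $a$ corresponds to a Pareto exponent $1/a$, so an exponent close to 1 in one description is close to 1 in the other.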

Does Zipf's law describe rare or common events?

(new on sept-15-1999)

Well, both! It depends on the quantity used to order the events. If an event is ranked number 1 because it is the most frequent, Zipf's plot describes common events (e.g., the use of English words). If instead an event is ranked number 1 because it is unusual (biggest, highest, largest, ...), the plot describes rare events (e.g., city populations).

Actually, in Miller's preface to Zipf's book, he distinguished Zipf's "first law" and "second law", one for rare events and the other for common events. We don't make such a distinction here (it's hard to remember which is the first law and which is the second!).

Power-law or "stretched exponential" (Weibull) or "log-normal" or "Yule distribution"?

(new on dec-02-2002)

I have yet to find a complete list, so let me start compiling papers that question whether an apparent power-law function is really a power law...

Zipf's original work

pre-Zipf work: "Pareto-Estoup-Zipf law"

Mandelbrot's early work

Mandelbrot and Simon's debate

Zipf's law in natural languages

(updated on december-10-2001)

online reports (new on sept-15-1999)

Zipf's law in natural languages (papers written in non-English languages)

(new on feb-05-2002; I would like to thank Dr. Gabriel Altmann for this collection)

Zipf's law in monkey-typing texts
(updated on feb-12-2002)

Turing's formula?

Connection with information theory (added on may-10-2002)

Zipf's law discussed in popular books and tutorials

Zipf's law in city populations

(updated on jul-30-2001)

Zipf's law in Web Access Statistics and Internet Traffic

(updated on mar-07-2001)

See also Mark Crovella's publication list
Jakob Nielsen's column "Zipf curve and website popularity"
Jakob Nielsen's column "Traffic from referring sites"
Hewlett-Packard's information dynamics group

Zipf's law in bibliometrics, informetrics, scientometrics, and library science

(updated on mar-07-2001)

This is similar to Zipf's law in natural language, but discussed in the context of information retrieval and library science.

Some links to conferences:
7th International Conference on Scientometrics and Informetrics (July 5-9, 1999, Mexico)
6th International Conference on Scientometrics and Informetrics (June 16-19, 1997, Israel)

a collection of links on bibliometrics

Zipf's law in finance and business

(updated on sep-09-2001)

Zipf's law in ecological systems

(updated on dec-02-2002)

(Well, I haven't checked the original papers, so I'm not sure they are all in the right place...)

Zipf's law in earthquakes?

Biomolecular sequences, Genomics

(Note that I didn't use the words "Zipf's law" here, because these are not examples of it!)

Estimation issues


(updated on jul-05-2001)

Relation with Benford's Law (also called first-digit law)?

(new on sep-19-2001)

More links to Benford's law: