Zipf's law (Definition)

Zipf's law (named for Harvard linguistics professor George Kingsley Zipf) models the occurrence of distinct objects in particular sorts of collections. Zipf's law says that the $ i$th most frequent object will appear with $ 1/i^\theta$ times the frequency of the most frequent object, or equivalently that the $ i$th most frequent object from an object “vocabulary” of size $ V$ occurs

$\displaystyle O(i) = \frac{n}{i^{\theta} H_{\theta}(V) } $

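times in a collection of $ n$ objects, where $ H_{\theta}(V) = \sum_{k=1}^{V} k^{-\theta}$ is the harmonic number of order $ \theta$ of $ V$. Dividing by $ H_{\theta}(V)$ ensures that the occurrences of all $ V$ objects sum to $ n$.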
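As an illustration of the formula above, the following is a minimal Python sketch that evaluates $ O(i)$ for every rank. The function names `generalized_harmonic` and `zipf_expected_counts` are hypothetical and chosen only for this example; the harmonic number of order $ \theta$ is assumed to be the finite sum of $ 1/k^\theta$ for $ k = 1, \ldots, V$.

```python
def generalized_harmonic(V, theta):
    """Harmonic number of order theta of V: the sum of 1/k**theta for k = 1..V."""
    return sum(1.0 / k**theta for k in range(1, V + 1))


def zipf_expected_counts(n, V, theta):
    """Expected occurrences O(i) for ranks i = 1..V in a collection of n objects."""
    h = generalized_harmonic(V, theta)
    return [n / (i**theta * h) for i in range(1, V + 1)]


# Small worked case: n = 11 objects, vocabulary V = 3, theta = 1.
# Here H_1(3) = 1 + 1/2 + 1/3 = 11/6, so the counts are about 6, 3 and 2.
counts = zipf_expected_counts(n=11, V=3, theta=1.0)
print(counts)
```

In this small case the second-ranked object occurs half as often as the first and the third occurs a third as often, and the three counts add up to $ n = 11$.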

Figure: A typical Zipf-law rank distribution, generated by GNU Octave and gnuplot. The y-axis represents occurrence frequency, and the x-axis represents rank (highest at the left).

Zipf's law typically holds when the “objects” themselves have a property (such as length or size) which is modelled by an exponential distribution or other skewed distribution that places restrictions on how often “larger” objects can occur.

One example of where Zipf's law applies is the frequency of word occurrence in English texts. The commonality of English words follows an exponential distribution, and the nature of communication is such that it is more efficient to favor shorter words. Hence the most common words tend to be short and appear often, following Zipf's law.
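To make the word-frequency example concrete, here is a rough Python sketch that turns a text into a rank-ordered frequency table, the kind of data to which Zipf's law is applied. The function name `rank_frequency` and the simple tokenizing rule (lowercase runs of letters and apostrophes) are assumptions made for this illustration, not part of the definition.

```python
import re
from collections import Counter


def rank_frequency(text):
    """Return (word, count) pairs sorted from most to least frequent."""
    # Naive tokenizer (an assumption): lowercase letters and apostrophes only.
    words = re.findall(r"[a-z']+", text.lower())
    return Counter(words).most_common()


table = rank_frequency("The cat and the dog and the bird")
for rank, (word, count) in enumerate(table, start=1):
    print(rank, word, count)
```

A sample this short cannot show the law itself; on a large text the counts in the table would be compared against $ 1/i^\theta$ for rank $ i$.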

The value of $ \theta$ typically ranges between 1 and 2, and is between 1.5 and 2 for the English text case.
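One common heuristic for estimating $ \theta$ from observed counts, sketched below, is to fit a straight line to log frequency against log rank and take the negated slope. This is not a method given in this entry, only a rough estimator shown for illustration; the function name `estimate_theta` is hypothetical, and the synthetic data are generated to follow the law exactly rather than taken from any real collection.

```python
import math


def estimate_theta(counts):
    """Negated least-squares slope of log(count) against log(rank)."""
    ranked = sorted((c for c in counts if c > 0), reverse=True)
    if len(ranked) < 2:
        raise ValueError("need at least two positive counts")
    xs = [math.log(i) for i in range(1, len(ranked) + 1)]
    ys = [math.log(c) for c in ranked]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    return -sxy / sxx


# Synthetic counts that follow the law exactly with theta = 1.5.
synthetic = [1000.0 / i**1.5 for i in range(1, 101)]
print(estimate_theta(synthetic))
```

On exact synthetic data the estimate recovers the exponent used to generate it up to floating-point error; on real data the fit is typically noisier, especially at the low-frequency end of the ranking.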

Another example is the populations of cities. These follow Zipf's law, with a few very populous cities, falling off to a great many cities with small populations. In this case, societal forces supply the same type of “restrictions” that limit which lengths of English words are used most often.

A final example is the income of companies. Once again the ranked incomes follow Zipf's law, with competitive pressures limiting the range of incomes available to most companies and determining the few most successful ones.

The underlying theme is that efficiency, competition, or attention with regard to resources or information tends to result in Zipf's law holding for the ranking of the objects or data of concern.

"Zipf's law" is owned by akrowne.
This is version 4 of Zipf's law, born on 2002-09-05, modified 2003-01-24.
Object id is 3422, canonical name is ZipfsLaw.

Classification:
AMS MSC: 60E05 (Probability theory and stochastic processes :: Distribution theory :: Distributions: general theory)
 68P20 (Computer science :: Theory of data :: Information storage and retrieval)
 94A99 (Information and communication, circuits :: Communication, information :: Miscellaneous)
