All those files on your computer: what are they, exactly? Well, they are just approximations of that wonderful number, $\pi$. Some are good, some are bad, but each file is an approximation. What did I smoke? Nothing… Let me explain.

It is well known that the probability that two integers chosen at random are coprime is $\frac{6}{\pi^2}$. Now, if a file is structured as a list of lines, each line can be seen as a number (a string of bits is just a number written in base two). By checking whether each line is coprime with the next, any file therefore provides an approximation of $\pi$. All you have to do to obtain the approximation for a given file is to compute the gcd of some rather large numbers.
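Concretely, if $p$ out of the $n$ tested pairs of consecutive lines turn out to be coprime, then $p/n$ estimates $6/\pi^2$, so the $\pi$ number of a file is

$$\pi \approx \sqrt{\frac{6n}{p}},$$

which is exactly what the script below prints after each line.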

Here is a Perl script that uses dc to compute the $\pi$ number of a given file:

#!/usr/bin/perl -w

# without this line, the pi number of this program is 2.8784916685157

# computes the pi approximation of a file
# uses the fact that the probability for two integers to be coprime
# is 6/pi^2

use strict;

# pretend the line before the first one was 1: the first test is then
# always positive, so $p is nonzero before we ever divide by it
my ($n, $p, $l) = (0, 0, 1);

# for each line of the input (empty and all-zero lines are skipped below)
while (<>) {
  # conversion: returns (in hex) the bits of the line seen as a number
  # chomp the \n at the end of each line (otherwise every line ends with
  # the even byte 0x0A, so two consecutive lines are never coprime)
  # dc wants hex digits in upper case
  chomp;
  $_ = uc unpack "H*", $_;
  if (not /^0*$/) {
    # test the current line against the previous one; increase the number
    # of tests by one, and increase the number of positive tests if the
    # numbers are coprime
    # package Math::BigInt is so slow that it's more efficient to fork and
    # ask dc to compute the gcd
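    # the dc program reads the two numbers in hex (16i); the macro
    # [sblalbdsa%d0<.] then runs Euclid's algorithm (replace (a,b) by
    # (b, a mod b) while the remainder is nonzero) and "lap" loads and
    # prints the gcd, so dc outputs "1\n" exactly when the lines are coprime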
    $n++;
    $p++ if (`echo "16i$l $_ sa[sblalbdsa%d0<.]ds.xlap" | /usr/bin/dc` eq "1\n");
    $l = $_;
    # print the current approximation of pi
    # print "($p/$n)\n"; # for debug
    print sqrt(6.0 * $n / $p), "\n";
  }
}
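(To try it, save the script under a name of your choice, say pifile.pl, and run perl pifile.pl somefile.)

The comment about Math::BigInt has aged: on a recent Perl, with the optional Math::BigInt::GMP backend installed for speed, the fork to dc can be avoided entirely. Here is a minimal sketch of that variant (the module choice and structure are my assumptions, not part of the original script):

#!/usr/bin/perl

# same estimator, without forking dc; a sketch assuming Math::BigInt,
# optionally accelerated by the GMP backend when it is installed

use strict;
use warnings;
use Math::BigInt try => 'GMP';

my ($n, $p) = (0, 0);
my $l = Math::BigInt->new(1);   # pretend the previous line was 1
while (<>) {
  chomp;
  my $hex = uc unpack "H*", $_;
  next if $hex =~ /^0*$/;       # skip empty and all-zero lines
  my $x = Math::BigInt->from_hex($hex);
  $n++;
  # take the gcd on a copy so that $l itself is left untouched
  $p++ if $l->copy->bgcd($x)->is_one;
  $l = $x;
  print sqrt(6.0 * $n / $p), "\n";
}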

Of course, the larger and the more random a file is, the better the approximation. For instance, the $\pi$ number of my Ph.D. thesis (uuencoded PostScript) is 3.302, which suggests it was mostly (but not entirely) nonsense. I am currently writing a research paper whose $\pi$ number is (at present) 42.001, the stuff of geniuses.

The kernel of Windows 98 gives the outstanding approximation of 3.14159265358979323846264338327950, which proves that this program is randomly generated (as most of its users suspect). With a pseudo-random (rand48) file of 10 GB (41,943,040 lines) and an optimized C/GMP program computing the $\pi$ number instead of the Perl script, I only got 3.141543785 (and a $1,000 fine from Caltech for wasting 56 hours of CPU time…).

The current record (as of 2000-02-18) in the computation of $\pi$ is 206,158,430,000 digits. Interestingly, the computation only took 37 hours and 21 minutes. However, I don’t know what random file they used…

So, what’s the best $\pi$ file on your machine—besides the Windows OS?