2012-PHYS-339 Statistics / Geiger

From McGill University Physics Department Technical Services Wiki

News

Files

  • Geiger handout (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/geiger.pdf) - PDF version of the lab handout
  • Geiger Supplement (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/geiger-supplement.pdf) - A bunch of stuff which might be interesting or useful
  • sample.data (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/sample.data) - The self-consistent sample data which is characterized in the appendix of the handout
  • read_geiger.m (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/read_geiger.m) - MATLAB function to load geiger data
  • old-sample.data (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/old-sample.data) - The imfamous sample data as collected by defective acquisition software - can you figure out what went wrong?

Guide to running the provided software

The geiger software is installed on all 339 workstations, and is in the path. There is actually a little bit of trickery going on. The actually geiger software runs on the beige machine, while the command geiger on the workstation is a shell script to SSH in to the beige machine and run the real command in the appropriate subdirectory.

If this is the first time you run the geiger program, you may get asked if you really want to connect ... you do, type "yes".

mark.orchard-webb@ws339-17:~$ geiger
The authenticity of host 'lm339-17 (132.206.252.107)' can't be established.
RSA key fingerprint is 9d:52:72:9d:09:d4:5d:36:a3:f1:b0:f1:e9:67:a9:1f.
Are you sure you want to continue connecting (yes/no)?

If you have not set up public/private keys for SSH between machines, you will get asked for your password each time you run geiger. This is boring. If you have no idea what I'm talking about regarding public/private keys, just run "autossh" once from the command-line ... after this no password should be required.

mark.orchard-webb@ws339-17:~$ autossh 
Generating public/private rsa key pair.
Created directory '/home/morcha/.ssh'.
Your identification has been saved in /home/morcha/.ssh/id_rsa.
Your public key has been saved in /home/morcha/.ssh/id_rsa.pub.
The key fingerprint is:
a9:2b:af:b3:f5:99:32:50:ac:92:dc:15:f5:ee:36:a0 mark.orchard-webb@ws339-17
The key's randomart image is:
+--[ RSA 2048]----+
|       ..        |
|      .  .       |
|     . .  .      |
|      +  o       |
| . o +  S .      |
|  + +  o o       |
|   . .E   +      |
|    o.oo + .     |
|    o*oo+        |
+-----------------+

Responding to all those questions from geiger is also boring. You can run with no questions asked, which is very handy for running from a shell script.

With the Geiger Counter turned on and the gamma source placed close to the detector, try the following:

mark.orchard-webb@artemis:/mnt/data/orchard/geiger$ geiger -a -r 100 -i 100 -p 0.05 -o geiger.data


You should get a window which after a few seconds looks something like this:

Image:Geiger-2011-01-24.png

Structure of data

Check the documentation in the source tar file

christian.voyer@hades:~$ man ./geiger.5
Reformatting geiger.5, please wait...

Reads

GEIGER(5)                   Lab Programmers Manual                   GEIGER(5)

NAME
       geiger - a data collection tool - the output data format

DESCRIPTION
       The  data  saved by the geiger program consists of two major parts, the
       header and the data-body.

HEADER
       The header is a single line consisting of 4 numbers:  replicas,  inter-
       vals,  columns  and  period.  replicas, intervals and columns are inte-
       gers, period is a floating point number.  The definitions of the param-
       eters are as follows:

       replicas
              The number of replicas requested.

       intervals
              The number of intervals per replica.

       columns
              The number of columns in the histogram.

       period The period in seconds requested by the user.  If the true period
              differed from this because of quantization this a user  problem.

DATA-BODY
       The  data-body  section  consists of replicas rows, each row containing
       columns numbers, separated by tab characters.  The sum of numbers in  a
       given  row  should  equal intervals.  The number in the Nth column of a
       replica represents the frequency of observation of N counts per  period
       during the replica measurement.

SEE ALSO
       geiger(1)

BUGS
       The period problem mentioned above.

AUTHOR
       Mark Orchard-Webb (orchard@physics.mcgill.ca)

Chi-Square, χ2, distribution

The following is lifted from Appendix C4 of Bevington.

The probability distribution Px(x2,ν) for χ2 is given by

P_x(x^2,\nu) = \frac1{2^{\nu/2}\Gamma(\nu/2)}(x^2)^{1/2(\nu-2)}e^{-x^2/2}

The probability of observing a value of chi-square larger than χ2 for a random sample of N observations with ν degrees of freedom is the integral of this probability Pχ2,ν).

P_\chi(\chi^2,\nu) = \frac1{2^{\nu/2}\Gamma(\nu/2)}\int_{\chi^2}^\infty (x^2)^{1/2(\nu-2)}e^{-x^2/2} d(x^2)

Values of reduced chi-square \chi^2_\nu corresponding to various values of the integral probability Pχ2,ν). of exceeding χ2 in a measurement with \nu degrees of freedom are tabulated in Table C-4 for ν ranging from 1 to 200. These values were calculated with the subroutine PCHISQ of Program 10-1. The functional dependence of Pχ2,ν). corresponding to representative values of ν is graphed in Figure C-4 as a smooth variation with the reduced chi-square \chi^2_\nu.

The function which is tabulated and graphed is the shaded area under the tail of the probability curve for values larger than χ2 as indicated.

Figure C-4

Image:pchisq-2011-01-27.png

Table C-4

Image:Table-C4-2011-01-27.png

P0.990.980.950.900.800.700.600.500.400.300.200.100.050.020.010.001
ν================================================================================================
10.000160.000630.003930.01580.06420.1480.2750.4550.7081.0741.6422.7063.8425.4126.63510.827
20.01000.02020.05150.1050.2230.3570.5110.6930.9161.2041.6092.3032.9963.9124.6056.908
30.03830.06170.1170.1950.3350.4740.6230.7890.9821.2221.5472.0842.6053.2793.7825.422
40.07420.1070.1780.2660.4120.5490.6880.8391.0111.2201.4971.9452.3722.9173.3194.617
50.1110.1510.2290.3220.4690.6000.7310.8701.0261.2131.4581.8472.2142.6783.0174.103
-
60.1450.1890.2730.3670.5120.6380.7620.8911.0351.2051.4261.7742.0992.5062.8023.743
70.1770.2240.3090.4050.5460.6670.7850.9071.0401.1981.4001.7172.0102.3752.6393.475
80.2060.2540.3420.4360.5740.6910.8030.9181.0441.1911.3791.6701.9382.2712.5113.266
90.2320.2810.3700.4630.5980.7100.8180.9271.0461.1841.3601.6321.8802.1872.4073.097
100.2560.3060.3940.4860.6180.7270.8300.9341.0471.1781.3441.5991.8312.1162.3212.959
-
110.2770.3280.4160.5070.6350.7410.8400.9401.0481.1731.3301.5701.7892.0562.2482.842
120.2980.3480.4350.5250.6510.7530.8490.9451.0491.1681.3181.5461.7522.0052.1852.742
130.3170.3660.4530.5420.6640.7640.8560.9491.0491.1631.3071.5241.7201.9592.1302.655
140.3330.3840.4690.5560.6760.7730.8630.9531.0491.1591.2971.5051.6921.9192.0822.580
150.3490.3990.4840.5700.6870.7820.8690.9561.0491.1551.2871.4871.6661.8842.0392.462
-
160.3630.4130.4980.5820.6970.7890.8740.9591.0491.1511.2791.4711.6441.8522.0002.453
170.3770.4270.5100.5930.7060.7960.8790.9611.0481.1481.2721.4571.6231.8231.9652.222
180.3900.4390.5220.6040.7140.8020.8830.9631.0481.1451.2641.4441.6041.7971.9342.351
190.4020.4510.5320.6130.7220.8080.8870.9651.0481.1421.2581.4321.5871.7721.8952.016
200.4130.4620.5430.6220.7290.8130.8910.9671.0481.1391.2521.4211.5711.7511.8782.266
-
220.4340.4820.5610.6380.7420.8230.8970.9701.0471.1341.2411.4011.5421.7121.8312.194
240.4520.5000.5770.6530.7530.8310.9020.9721.0461.1291.2311.3831.5171.6781.7912.133
260.4690.5160.5920.6650.7620.8380.9070.9751.0451.1251.2231.3681.4961.6481.7562.079
280.4850.5300.6050.6760.7710.8450.9110.9761.0451.1211.2151.3541.4761.6221.7242.032
300.4980.5440.6170.6870.7790.8500.9150.9781.0441.1181.2081.3421.4591.5991.6971.990
-
320.5110.5560.6270.6960.7860.8550.9180.9791.0431.1151.2021.3311.4441.5781.6711.953
340.5230.5670.6370.7050.7920.8600.9210.9811.0421.1121.1961.3211.4291.5591.6491.919
360.5340.5770.6460.7120.7980.8640.9240.9821.0421.1091.1911.3121.4171.5411.6281.889
380.5450.5870.6550.7200.8040.8680.9260.9831.0411.1071.1861.3031.4051.5261.6101.861
400.5540.5960.6630.7260.8090.8720.9280.9831.0411.1041.1821.2951.3941.5111.5921.835
-
420.5630.6040.6700.7330.8130.8750.9300.9841.0401.1021.1781.2881.3841.4981.5761.812
440.5720.6120.6770.7380.8180.8780.9320.9851.0401.1001.1741.2811.3751.4851.5621.790
460.5800.6200.6830.7440.8220.8810.9340.9861.0391.0981.1701.2751.3661.4731.5481.770
480.5870.6270.6900.7490.8260.8840.9360.9861.0381.0961.1671.2691.3581.4631.5351.751
500.5940.6330.6950.7540.8290.8860.9370.9871.0381.0951.1631.2631.3501.4521.5231.733
-
600.6250.6620.7200.7740.8440.8970.9440.9891.0361.0871.1501.2401.3181.4101.4731.660
700.6490.6840.7390.7910.8560.9050.9490.9911.0341.0811.1391.2221.2931.3771.4351.605
800.6690.7030.7550.8030.8650.9120.9520.9921.0321.0771.1301.2071.2741.3511.4041.560
900.6860.7180.7680.8140.8730.9170.9560.9931.0311.0731.1231.1951.2571.3301.3791.525
1000.7010.7320.7790.8240.8800.9210.9580.9931.0301.0691.1171.1851.2441.3121.3581.495
-
1200.7240.7530.7980.8390.8900.9280.9620.9951.0281.0641.1071.1691.2211.2831.3251.447
1400.7430.7700.8120.8500.8980.9340.9650.9951.0261.0591.0991.1561.2041.2611.2991.410
1600.7590.7840.8240.8600.9050.9390.9680.9961.0241.0561.0931.1461.1911.2431.2781.381
1800.7710.7960.8330.8680.9100.9420.9700.9961.0231.0521.0881.1371.1791.2281.2621.358
2000.7820.8060.8410.8740.9150.9450.9720.9971.0221.0501.0831.1301.1701.2161.2471.338

Various codes

The fortran code for PCHISQ is shown below

      FUNCTION PCHISQ (CHISQ, NFREE)
      DOUBLE PRECISION Z, TERM, SUM
 11   IF (NFREE) 12, 12, 14
 12   PCHISQ = 0.
      GO TO 60
 14   FREE = NFREE
      Z = CHISQ*FREE/2
      NEVEN = 2*(NFREE/2)
      IF (NFREE - NEVEN) 21, 21, 41
C
C     NUMBER OF DEGREES OF FREEDOM IS EVEN
C     
 21   IMAX = NFREE/2
      TERM = 1.
      SUM = 0.
 31   DO 34 I=1, IMAX
      FI = I
      SUM = SUM + TERM
 34   TERM = -TERM * Z/FI
 35   PCHISQ = SUM * DEXP(-Z)
      GOTO 60
C     
C     NUMBER OF DEGREES OF FREEDOM IS ODD
C     
 41   IF (Z - 25) 44, 44, 42
 42   Z = CHISQ * (FREE-1.)/2.
      GO TO 21
 44   PWR = FREE/2
      TERM = 1.
      SUM = TERM/PWR
 51   DO 56 I=1, 1000
      FI = I
      TERM = TERM * Z/FI
      SUM = SUM + TERM/(PWR+FI)
 55   IF (DABS(TERM/SUM) - 0.00001) 57, 57, 56
 56   CONTINUE
 57   PCHISQ = 1. - (Z**PWR)*SUM/GAMMA(PWR)
 60   RETURN
      END

This is a little too ugly for me, so I rewrote it in C as follows

double pchisq (double chisq, int nfree)
{
  double free, z, pwr, term, sum, fi;
  int i, imax;
  if (nfree < 1) return 0;
  free = nfree;
  z = chisq * free / 2;
  if (nfree % 2) {		/* nfree is odd */
    if (z > 25) {
      z = chisq * (free-1.)/2.;
    } else {
      pwr = free/2;
      term = 1.;
      sum = term/pwr;
      for (i = 0; i<1000;) {
	i++;
	fi = i;
	term = -term * z/fi;
	sum = sum + term/(pwr+fi);
	if (fabs(term/sum) <= 0.00001) break;
      }
      return 1. - pow(z,pwr)*sum/exp(lgamma(pwr));
    }
  }
  imax = nfree/2;
  term = 1.;
  sum = 0.;
  for (i = 0; i < imax;) {
    i++;
    fi = i;
    sum = sum + term;
    term = term * z/fi;
  }
  return sum * exp(-z);
}

Using the following main

int main (int argc, char **argv)
{
  double rchisq;
  int nu;
  sscanf (argv[1], "%d", &nu);
  for (rchisq = 0; rchisq < 1.7; rchisq+=0.01) {
    fprintf (stdout, "%g %g\n", rchisq, pchisq (rchisq, nu));
  }
  return 0;
}

And the following bit of shell script

#! /bin/sh

# prepare the graph
cat << EOF > pchisq.gnu
set xlabel '{/Symbol C^2/n}'
set ylabel 'P_{/Symbol C}({/Symbol C}^2,{/Symbol n})'
EOF

for nu in 1 2 3 5 10 20 30
do
    ./pchisq $nu > pchisq-$nu.data
    if test $nu = 1
    then 
      echo -n "plot " >> pchisq.gnu
    else
      echo -n ", " >> pchisq.gnu
    fi
    echo -n "'pchisq-$nu.data' w l title '{/Symbol n} = $nu'" >> pchisq.gnu
done

I am able to recreate Bevingtons figure C-4.

Table C-4 was created with the following code, an then hand edited for the variable precision in the upper left corner. As can be seen it started out as an elegant binary search and degenerated in to a brute force linear search as time ran out.

#include <stdio.h>
#include <math.h>
#include <stdlib.h>

double pchisq (double chisq, int nfree)
{
  double free, z, pwr, term, sum, fi;
  int i, imax;
  if (nfree < 1) return 0;
  free = nfree;
  z = chisq * free / 2;
  if (nfree % 2) {		/* nfree is odd */
    if (z > 25) {
      z = chisq * (free-1.)/2.;
    } else {
      pwr = free/2;
      term = 1.;
      sum = term/pwr;
      for (i = 0; i<5000;) {
	i++;
	fi = i;
	term = -term * z/fi;
	sum = sum + term/(pwr+fi);
	if (fabs(term/sum) <= 0.00001) break;
      }
      return 1. - pow(z,pwr)*sum/exp(lgamma(pwr));
    }
  }
  imax = nfree/2;
  term = 1.;
  sum = 0.;
  for (i = 0; i < imax;) {
    i++;
    fi = i;
    sum = sum + term;
    term = term * z/fi;
  }
  return sum * exp(-z);
}

double binsearch(int nu, double target, double low, double high)
{
  static int depth = 1;
  double mid = 0.5 * (low + high);
  double P = pchisq(mid, nu);
  depth++;
  if (depth > 20000) {
    fprintf (stderr, "Tooo deeeeeep. nu = %d, target = %f, low = %f, high = %f, P = %f Abort\n", nu, target, low, high, P);
    return -1;
  }
  if ((fabs(target - P)/target) < 0.0001) return mid;
  if (P > target) return binsearch(nu, target, mid, high);
  if (P < target) return binsearch(nu, target, low, mid);
  fprintf (stderr, "No can not get here!!!!!\n");
  exit(1);
}

const char *pretty(double x)
{
  static char buf[16];
  int scale = log(x) / log(10);
  sprintf (buf, "%5.3f", x);
  return buf;
}

double ugly (int nu, double target, double start)
{
  int i;
  for (i = 10000*start; ; i++) {
    if (pchisq(1e-4*i, nu) < target) return 1e-4*i;
  }
  fprintf (stderr, "No can not get here!!!!!\n");
}

int main (int argc, char **argv)
{
  char *Ps[] = {"0.99", "0.98","0.95", "0.90", "0.80","0.70","0.60","0.50","0.40","0.30","0.20","0.10","0.05","0.02","0.01","0.001", 0};
  double *P;
  int ncols, i, nu;
  fprintf (stdout, "<table><tr><td></td><td>P</td>");
  for (i = 0; Ps[i]; i++) fprintf (stdout, "<td>%s</td>", Ps[i]);
  fprintf (stdout, "</tr>\n");
  fprintf (stdout, "<tr><td><math>\\nu</math></td><td></td>");
  ncols = i;			/* take advantage of knowing how many columns */
  P = malloc (ncols*sizeof(double));
  for (i = 0; i < ncols; i++) {
    sscanf (Ps[i], "%lf", &P[i]); /* string to double */
    fprintf (stdout, "<td>======</td>");
  }
  fprintf (stdout, "</tr>\n");  
  for (nu = 1; nu < 10; nu++) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(binsearch(nu,P[i],0,12/sqrt(nu))));
    }
    fprintf (stdout, "</tr>\n");
  }
  for (nu = 10; nu < 15; nu++) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(binsearch(nu,P[i],0.25,10/sqrt(nu))));
    }
    fprintf (stdout, "</tr>\n");
  }
  for (nu = 15; nu < 20; nu++) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(ugly(nu,P[i],0.2)));
    }
    fprintf (stdout, "</tr>\n");
  }
  for (nu = 20; nu < 50; nu+=2) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(ugly(nu,P[i],0.4)));
    }
    fprintf (stdout, "</tr>\n");
  }
  for (nu = 50; nu < 100; nu+=10) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(ugly(nu,P[i],0.59)));
    }
    fprintf (stdout, "</tr>\n");
  }
  for (nu = 100; nu < 201; nu+=20) {
    fprintf (stdout, "<tr><td>%d</td><td></td>", nu);
    for (i = 0; i < ncols; i++) {
      fprintf (stdout, "<td>%s</td>", pretty(ugly(nu,P[i],0.7)));
    }
    fprintf (stdout, "</tr>\n");
  }
  fprintf (stdout, "</table>\n");
  return 0;
}

Supplemental toys

  • chisq.c (http://www.ugrad.physics.mcgill.ca/resources/339/geiger/chisq.c) - This is an implementation of table C4. in Bevington. There are two functions of interest in this file:

double chisq_five (int dof)
returns the reduced Χ2 value which only 5% of replicas having dof are expected to exceed.
double chisq_ten (int dof)
returns the reduced Χ2 value which only 10% of replicas having dof are expected to exceed.
These functions are written in such a way that they can easily be pasted into another piece of code.

Code to generate Gnuplot script

Here is some delicious copy-pasta which will generate a gnuplot script, which plots nice smooth poisson and gaussian functions. It should look something like

geiger-gnuplot.jpg

The function expects 7 arguments:

  • graphname: this is a string, it will be the name of the gnuplot script to be produced.
  • dataname: this is the name of the data file to be plotted, it should contain the usual 3 columns.
  • A_p: The normalization constant for the poisson curve (intervals)
  • mu_p: The mean of the poisson curve (replica mean)
  • A_g: The normalization constant of the gaussian curve (GC * intervals, where GC is great than or equal to 1, due to lost area of gaussian for x < 0). "GC" can be calculated by symbolically evaluating the gaussian integral from 0 to \infty.
  • mu_g: The mean of the gaussian curve (replica mean)
  • sigma_g: The standard deviation of the gaussian curve (sqrt(replica variance))
int make_gnuplot (const char *graphname, const char *dataname, double A_p, double mu_p, double A_g, double mu_g, double sigma_g)
{
  FILE *graph;

  graph = fopen (graphname, "w"); /* try to open requested filename */
  if (graph == NULL) {		/* if it didn't work ... */
    fprintf (stderr, "Can't open '%s' for write", graphname); /* warn the user */
    exit(1);			/* crash and burn ... "#include <stdlib.h>" to avoid gcc warning */
  }

  /* Dump out the variables */
  fprintf (graph, "A_p = %g\n", A_p);
  fprintf (graph, "mu_p = %g\n", mu_p);
  fprintf (graph, "A_g = %g\n", A_g);
  fprintf (graph, "mu_g = %g\n", mu_g);
  fprintf (graph, "sigma_g = %g\n", sigma_g);

  /* Define the functions */
  fprintf (graph, "poisson(x,A,mu) = A*exp(x*log(mu)-lgamma(x+1)-mu)\n");
  fprintf (graph, "gaussian(x,A,mu,sigma) = A*exp(-0.5*((x-mu)/sigma)**2)/(sqrt(2*pi)*sigma)\n");

  /* Make some pretty labels */
  fprintf (graph, "set xlabel 'Events per 0.1 s interval'\n");
  fprintf (graph, "set ylabel 'Frequency of observation'\n");

  /* The big plot command */
  fprintf (graph, "plot [-1:] '%s' using 1:2:3 title 'sample replica' with error, \\\n", dataname);
  fprintf (graph, "(x<0)?1/0:poisson(x, A_p, mu_p) title 'Poisson' with line, \\\n"); /* See comments on wiki */
  fprintf (graph, "(x<0)?1/0:gaussian(x, A_g, mu_g, sigma_g) title 'Gaussian' with line\n"); /* regarding "(x<0)?1/0:" */
  fclose (graph);
  return 0;
}

Note: portion of code x<0)?1/0:poisson(x, A_p, mu_p) is a dirty (elegant) trick to avoid plotting the curve for domain values less than zero. Basically this reads "plot 1/0 if x < 0, otherwise plot poisson (x, A_p, mu_p)." 1/0 is of course undefined, so gnuplot plots nothing.

Writing out data from a replica to a file

This is a related problem. The following copy-pasta function accepts 4 arguments,

  • dataname: this is a string, which is the name of the datafile to be created.
  • columns: the number of columns in the replica
  • replica: this is the array of frequencies. This bit is tricky! Let's assume you allocated storage for the frequencies in an array as follows:
int data[rows][cols];

You can then pass the nth replica to this function as

make_graphdata ("sample-replica.data", cols, data[n], colrep);
  • colvar: this is the array of column variances to be used as error-bar.
int make_graphdata (const char *dataname, int columns, int *replica, double *colvar)
{
  FILE *datafile;
  int i;

  datafile = fopen (dataname, "w"); /* create a data stream */
  if (datafile == NULL)	{	/* if it failed */
    fprintf (stderr, "Can't open '%s' for write\n", dataname);
    exit(1);			/* going down with the ship  ... "#include <stdlib.h>" to avoid gcc warning */
  }
  for (i = 0; i < columns; i++) {
    if (colvar[i] > 0) {
      fprintf (datafile, "%d\t%d\t%g\n",
	       i, replica[i],
	       sqrt(colvar[i]));
    }
  }
  return 0;
}

Historical Archives

Log