Data Analysis of the Central Limit Theorem Using Excel and the TI-83

By

Dr. Emmett C. Dennis

 

 

Abstract

The Central Limit Theorem says: If random selections of different samples, each of size n, are drawn from a population with finite mean, μ, and standard deviation, σ, then, when n is large, the calculated sample mean,  from each sample will be approximately normally distributed with mean (  ) equal to μ and standard deviation (  ) equal to       .

As the sample size increases, (n>30) the distribution of sample means will approach a normal distribution, even if the original population is not normally distributed.

        

In simple terms, the Central Limit Theorem says that the sampling distribution of the means of different samples drawn from the same population approximates a Normal model when the samples are large enough.

 

 We don’t even care about the shape of the population distribution.  This surprising fact was proven in a fairly general form in 1810 by Pierre-Simon Laplace, and caused quite a stir (at least in the mathematics circles) because it is so unintuitive.

 

The object of this workshop is to engage participants in doing a simulation – with the TI-83 Plus calculator and an Excel spreadsheet – where we will draw random samples repeatedly, from populations that are not necessarily normally distributed, find their means and standard deviations,  make  histograms of the means to see if the histogram of the sample means get closer and closer to the Normal model as the sample size grows, and compare sample standard deviations with the calculations as per the Central Limit Theorem.

 

 

 

 

TI-83 Plus Calculator Model

 

Procedure:    

 

PART A

Create a simulation with your TI-83 calculators to generate 100 random samples of size 5 from a Normal distribution population with mean = 0 and standard deviation = 1.

OBJECTIVE:  Find the sample mean from each set of 5 numbers and investigate the distribution of these sample means.  Compare the mean of these sample means with the population mean.  Compare the histogram of these 100 sample means with the histogram of any column of 100 numbers from this normal distribution population.

 

A.  Put the 100 randomly chosen numbers in lists L1, L2, L3, L4  and L5

 

[math, prb,6:randNorm(0,1,100) sto 2nd L1 ][enter],(mean = 0 and standard deviation = 1)

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L2 ]  [ enter ];    

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L3 ]  [ enter ];

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L4 ]  [ enter ];

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L5 ]  [ enter ];

 

 

B. Find the sample mean of the 100 groups of 5 numbers and put in L6 as follows:

 (  2nd L1 + 2nd L2 + 2nd L3 + 2nd L4 + 2nd L5 ) / 5 [sto] [2nd ] [ L6 ] enter

 

To view this collection of numbers

[stat], [1:edit]

 

 

C. Find the mean of the 500 numbers (100 x 5) in lists  L1, L2, L3, L4  and L5       ( the “population” mean  ( μ ))  and compare with the mean of the 50 sample means  (  )  in L6.  

Blank screen

(Mean of the 500 numbers)

[ ( ] [2nd]  [stat] [ » ] [math]  [5:sum( ] [2nd ] [ L1 ] [ ) ] +

[2nd]  [stat] [ » ] [math]  [5:sum( ] [2nd ] [L2 ] [ ) ] +

[2nd]  [stat] [ » ] [math]  [5:sum( ] [ 2nd ] [L3 ] [ ) ]+

[2nd]  [stat] [ » ] [math]  [5:sum( ] [ 2nd ] [ L4 ] [ ) ]+

[2nd]  [stat] [ » ] [math]  [5:sum(  ] [ 2nd ] [ L5 ] [ ) ] [ )]

[/][500][ enter]

 

(Mean of the sample means)

[2nd]  [stat] [ » ] [math]  [3:mean( ] 2nd L6 ) enter

 

 

D. Create a series of histograms that describes the distribution of these sample data values.

[window]

[ Xmin = -10  Xmax = 10,  Xscl = 0.5,  Ymin = 0,  Ymax = 80,  Yscl = 5, Xres=1]

[2nd] [y=] [1:Plot1…On] [enter] [Type: histogram] [Xlist:L1] [Freq:1] [graph]

 

(repeat as above)…

[2nd] [y=] [1:Plot1…On] [enter] [Type: histogram] [Xlist:L2] [Freq:1] [graph]

[2nd] [y=] [enter] [Xlist:L3] [graph]

[2nd] [y=] [enter] [Xlist:L4] [graph]

[2nd] [y=] [enter] [Xlist:L5] [graph]

 

finally, change to  [Xlist:L6]  the distribution of the sample means  and graph

[2nd] [y=] [enter] [Xlist:L6] [graph]

 

E. Compare the sample standard deviation ( )  and standard deviation ( σ ) of these 500 numbers (100 x 5)  as per the Central Limit Theorem, which says, sample mean  (  ) =   population mean ( μ )    and sample standard deviation ( )  = 

(Standard Deviation of the sample means)

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ] enter

 

(Standard Deviation of the population means)

To find the population standard deviation, we must remove all data from the L1 thru L5 lists and “stack” them into the L6 storage cell using the “augment” function which is valid for only two lists at a time.  Then use the L6 cell to compute the population standard deviation.  Look in the “Catalog” to find the augment function.

 

Augment(L1,L2) sto L6

Augment(L6,L3) sto L6

Augment(L6,L4) sto L6

Augment(L6,L5) sto L6

 

With all of the data now in L6 the population standard deviation is:

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ] enter

 

 To compute         for n = 5

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ][ / ] [2nd ] [ x2 ][5 ][ ) ] enter

 

 

PART B

Use your TI-83 Plus calculators to generate 100 Random Samples of size 5

 (n = 5) each from the numbers 1 thru 6.  Pretend that this is a simulation of the tossing of five dice once (or 1 die tossed 5 times) and record the five numbers that show up.  This toss is repeated 100 times. The population of 500 numbers (100 x 5) comes from the digits  1, 2, 3, 4, 5, 6, which are randomly selected with replacement. 

(Assumption: we are randomly selecting samples of size n = 5 with replacement, from the numbers 1, 2, 3, 4, 5, 6).

 

OBJECTIVE:  Find the sample mean from each set of 5 numbers and investigate the distribution of these sample means.  Compare the mean of these sample means with the population mean.  Compare the histogram of these 100 sample means with the histogram of any column of 100 numbers representing the 100 repeated tosses of a single die.

 

A. Put the 100 randomly chosen numbers in lists L1, L2, L3, L4  and L5  as follows.  

 

[ math,  prb,  5:randInt(1,6,100) sto L1  ],  [ enter ]

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L2 ]  [ enter ];    

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L3 ]  [ enter ];

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L4 ]  [ enter ];

[ 2nd ]  [ enter ]  [ < ]  [ 2nd ] [L5 ]  [ enter ];

 

 

B. Find the sample mean of the 50 groups of 5 numbers and put in L6 as follows:

(  2nd L1 + 2nd L2 + 2nd L3 + 2nd L4 + 2nd L5 ) / 5 [sto] [2nd ] [ L6 ] enter

 

To view this collection of numbers

[stat], [1:edit]

 

C. Find the mean of the 500 numbers (100 x 5) in lists  L1, L2, L3, L4  and L5       ( the “population” mean  ( μ ))  and compare with the mean of the 100 sample means  (  )  in L6.  

Blank screen

(Mean of the 500 numbers)

[ ( ] [2nd]  [stat] [ » ] [math]  [5:sum( ] [2nd ] [ L1 ] [ ) ] +

[2nd]  [stat] [ » ] [math]  [5:sum( ] [2nd ] [L2 ] [ ) ] +

[2nd]  [stat] [ » ] [math]  [5:sum( ] [ 2nd ] [L3 ] [ ) ]+

[2nd]  [stat] [ » ] [math]  [5:sum( ] [ 2nd ] [ L4 ] [ ) ]+

[2nd]  [stat] [ » ] [math]  [5:sum(  ] [ 2nd ] [ L5 ] [ ) ] [ )]

[/][500][ enter]

 

 

(Mean of the sample means)

[2nd]  [stat] [ » ] [math]  [3:mean( ] 2nd L6 ) enter

 

D. Create a series of histograms that describes the distribution of these sample data values.

[window]

[ Xmin = 0  Xmax = 10,  Xscl = 1,  Ymin = 0,  Ymax = 80,  Yscl = 5, Xres=1]

[2nd] [y=] [1:Plot1…On] [enter] [Type: histogram] [Xlist:L1] [Freq:1] [graph]

 

(repeat as above)…

[2nd] [y=] [1:Plot1…On] [enter] [Type: histogram] [Xlist:L2] [Freq:1] [graph]

[2nd] [y=] [enter] [Xlist:L3] [graph]

[2nd] [y=] [enter] [Xlist:L4] [graph]

[2nd] [y=] [enter] [Xlist:L5] [graph]

 

finally, change to  [Xlist:L6]  the distribution of the sample means  and graph

[2nd] [y=] [enter] [Xlist:L6] [graph]

 

E. Compare the sample standard deviation ( )  and standard deviation ( σ ) of these 500 numbers (100 x 5)  as per the Central Limit Theorem, which says, sample mean  (  ) =   population mean ( μ )    and sample standard deviation ( )  = 

(Standard Deviation of the sample means)

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ] enter

 

(Standard Deviation of the population means)

To find the population standard deviation, we must remove all data from the L1 thru L5 lists and “stack” them into the L6 storage cell using the “augment” function which is valid for only two lists at a time.  Then use the L6 cell to compute the population standard deviation.  Look in the “Catalog” to find the augment function.

 

Augment(L1,L2) sto L6

Augment(L6,L3) sto L6

Augment(L6,L4) sto L6

Augment(L6,L5) sto L6

 

With all of the data now in L6 the population standard deviation is:

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ] enter

 

 To compute         for n = 5

[2nd]  [stat] [ » ] [math]  [7:stdDev( ] [2nd ] [L6 ][ ) ][ / ] [2nd ] [ x2 ][5 ][ ) ] enter

 

 

 

 

EXCEL Spreadsheet Model

 

 

Procedure:   

 

Use your Excel Spreadsheet to generate 50 Random Samples of size 5

 (n = 5) each from the numbers 1 thru 6.  Pretend that this is a simulation of the tossing of five dice once (or 1 die 5 times) and record the five numbers that show up.  This toss is repeated 50 times. The population of 250 numbers (50 x 5) comes from the digits  1, 2, 3, 4, 5, 6, which are randomly selected with replacement.  (Assumption: we are randomly selecting with replacement, samples of size n = 5 from the numbers 1, 2, 3, 4, 5, 6).

 

BACKGROUND

The two most important formulas used in the design of this spreadsheet are RANDBETWEEN and COUNTIF.  The RANDBETWEEN function can only be used after the Analysis Toolpak has been added to Excel.  (To add the Analysis Toolpak, under the Tools, menu, choose Add-ins…, and then select the Analysis Toolpak option.) 

 

The RANDBETWEEN function will randomly choose a number between a given maximum and minimum, inclusive (eg., RANDBETWEEN(1,6) chooses numbers from the integers 1 thru 6).

The COUNTIF function will search through a given range of cells and count a tally if a desired entry is found in a cell.  (e.g., COUNTIF(K5:K104, 2 ) will find every instance of the number 2 within the range of K5:K104).

 

 

Step one: (blank spreadsheet)

 

In the first 7 columns  of row1, (A1, B1, C1, D1, E1, F1, G1) enter the words die1, die2, die3, die4, die5, sum, average.

 

Go to row 3, slot A3, click on fx at the top of your spreadsheet table to find the RANDBETWEEN formula. (search under the category “all” to find it)

put the number 6 in “bottom” and the number 1 in “top”

 

Highlight the right lower corner of row 3 column 1  (A3)  scroll across to  row E3.  Random numbers between 1 and 6 should appear in rows B3, C3, D3, and E3. Then highlight the right lower corner of row E3 and scroll down to row 53 column E53.  Random numbers between 1 and 6 should appear in all slots from A4 to E53.

 

Go to row 3, slot F3, click fx on the top of your spreadsheet table to find the SUM formula. (search under the category “all” to find it).  Input it as SUM(A3:E3) in this slot.  A number representing the sum of the numbers in rows A3 thru E3 appears in slot F3.

 

Highlight the right corner of row 3 column 6  (F3)  scroll down to  row 53 slot F3.  Numbers representing the sum of each row should appear in those columns.

 

Go to row 3, slot G3, click fx on the top of your spreadsheet table to find the AVERAGE formula. (search under the category “all” to find it).  Input it as AVERAGE(A3:E3) in this slot.  A number representing the average of the numbers in rows A3 thru E3 appears in slot G3.

 

Highlight the right corner of row 3 column 7 (G3)  scroll down to row 53  slot G53.  Numbers representing the average of each row should appear in those columns.

 

In the 9th, 10th and 11th columns  of row1, (I1, J1, K1) enter the words frequency, psum, pmean.

 

Go to column 9, slot I3, click fx on the top of your spreadsheet table to find the COUNTIF formula. (search under the category “all” to find it).  Input it as COUNTIF(F3:F53, 5) in this slot.  The COUNTIF function will search through the range of cells (F3:F53) to count a tally if 5 is found in a cell within that range.

 

Repeat the same procedure for slot I4. input it as COUNTIF(F3:F53, 6).  Now continue for slots I5, …, I28 and input in respectively, COUNTIF(F3:F53, 7), COUNTIF(F3:F53, 8) , …, COUNTIF(F3:F53,30).  Remember, the largest sum that could occur from the 5 tosses is 30 which happens if all 5 tosses are the number 6.

 

Go to column 10, slot J3.  From slot J3 to slot J28, input the numbers corresponding to the possible sum (psum) that could occur on the 5 dice when tossed, namely the numbers, 5, 6, 7, 8, 9, …, 30.  (put 5 in slot J3, put +j3+1 in slot J4, highlight the right lower corner of slot J4, scroll down to J28.  The numbers 6, 7, …, 30 appears).

 

Go to column 11, slot K3.  From slot K3 to slot K28, input the numbers corresponding to the possible mean (pmean) that could occur when each of the possible sums (psum) values is divided by 5. .  (put +J3/5 in slot K3, highlight the right lower corner of slot K3, scroll down to K28.  The numbers 1, 1.2, 1.4. 1.6, … , 5.8, 6 appears).

 

THE HISTOGRAM

 

We will now use the frequency values and psum values to create  our first interactive histogram.  The second histogram will come from the frequency values and the pmean values.

 

We open the Chart Wizard using either the Chart entry on the insert menu or the Chart Wizard toolbar button. 

Step one, select the graph type.  Since we would like to create a histogram, we select the “Column” chart type.

Step two,  click “next”,  to get to the Chart Source Data screen.  We use the Data Range tab to check that the location of our data is in the frequency column (I3 to I28).  We can use the Series tab to name the data set.  Here, let us name this data series Frequency.  After entering the name for our data series in the Name box, the name will appear in the Series box as well as in the graph legend. The “Category (X) axis labels:” allows us to identify the data that will appear on the x-axis of the histogram.  The data values for this are located in the psum column (J3 to J28).  Highlight these values and input into the Categories (X) axis labels: slot.

Step three, click “next”, to get to the Chart Options screen.  For Chart title, input “Tosses of 5 dice”, for Category (X) axis:, input “Sum of numbers”, for Value (Y) axis:, input “frequency”.

Step four, click “next”, to get to the Chart Location screen.  We will place our chart “As object in” our spreadsheet.  (sheet1).  Click Finish.

 

The graph appears in our spreadsheet.  We can interactively, simulate a new set of data “tosses” by repeatedly pressing the F9 button on the keyboard.  The histogram changes each time.

 

To make the bar chart that shows up really look like a histogram, click on any of the bars in the chart to get to the “Format Data Series” screen.  Click the “options” button and then change the “Gap width” to 0.  This will give you a picture that looks more like a histogram.

 

This is the identical set of steps that you must do to create a histogram using the frequency values and the possible mean (pmean) values to view a distribution of the sample means.  Instead of choosing the data from the psum columns (J3 to J28), you choose the data from the pmean columns (k3 to K28).

 

COMPUTING THE MEAN AND STANDARD DEVIATION

 

Identify a cell below your histogram and type in “sample mean”.   Three cells to the right of this one type in “Population mean”.  Four cells below your “sample mean” entry, type in “Sample Standard Deviation”.  Also, four cells below your “Population Mean “ entry, type in “Population Standard Deviation”.

 

Place your cursor in the cell below your “sample mean” entry.  Click on fx at the top of your spreadsheet table to find the AVERAGE formula. (search under the category “all” to find it)

 fx .  When the Function Arguments screen appears, you must now select the data values in the “average” column, (G3 thru G53).  When you click OK, the mean of sample means for the 50 tosses appears in the cell below your “sample mean” entry. 

 

Next place your cursor in the cell below your “Population Mean” entry.  Click on fx again to locate the AVERAGE formula.  When the Function Arguments screen appears, you must now select the data values in the columns from A3 thru E53.  When you click OK, the mean of the population set of numbers for all 250 entries appears in the cell below your “population mean” entry. 

Place your cursor in the cell below your “Sample Standard Deviation” entry.  Click on fx at the top of your spreadsheet table to find the STDEV formula.   When the Function Arguments screen appears, you must now select the data values in the “average” column, (G3 thru G53).  When you click OK, the standard deviation of the sample means for the 50 tosses appears in the cell below your “Sample Standard Deviation” entry.

 

Next place your cursor in the cell below your “Population Standard deviation” entry.  Click on fx again to locate the STDEV formula.  When the Function Arguments screen appears, you must now select the data values in the columns from A3 thru E53.  When you click OK, the standard deviation of the population set of numbers for all 250 entries appears in the cell below your “population mean” entry.

 

You can now validate the Central limit Theorem concept  that states  

 standard deviation (  ) equal to       .

We can repeat this interactive exercise with any sample size that we want on our spreadsheet.  The procedure is still the same.  Only difference, the spreadsheet will be larger.