Data
Analysis of the Central Limit Theorem Using Excel and the TI-83
By
Dr. Emmett C. Dennis
Abstract
The Central Limit Theorem says: If
random selections of different samples, each of size n, are drawn from a population with finite
mean, μ, and standard
deviation, σ, then, when n is large, the calculated sample mean,
from each sample will
be approximately normally distributed with mean (
) equal to μ and standard deviation (
) equal to
.
As the sample size increases, (n>30) the distribution of sample means will
approach a normal distribution, even if the original population is not normally
distributed.
In simple terms, the Central Limit Theorem says that
the sampling distribution of the means of different samples drawn from the same
population approximates a
We don’t even
care about the shape of the population distribution. This surprising fact was proven in a fairly
general form in 1810 by Pierre-Simon Laplace, and caused quite a stir (at least
in the mathematics circles) because it is so unintuitive.
The object of this workshop is to engage participants
in doing a simulation – with the TI-83 Plus calculator and an Excel spreadsheet
– where we will draw random samples repeatedly, from populations that are not
necessarily normally distributed, find their means and standard deviations, make
histograms of the means to see if the histogram of the sample means get
closer and closer to the Normal model as the sample size grows, and compare
sample standard deviations with the calculations as per the Central Limit
Theorem.
TI-83
Plus Calculator Model
Procedure:
PART A
Create a simulation
with your TI-83 calculators to generate 100 random samples of size 5 from a
Normal distribution population with mean = 0 and standard deviation = 1.
OBJECTIVE: Find the sample mean from each set of 5
numbers and investigate the distribution of these sample means. Compare the mean of these sample means with
the population mean. Compare the
histogram of these 50 sample means with the histogram of any column of 50
numbers from this normal distribution population.
A. Put
the 100 randomly chosen numbers in lists L1, L2, L3,
L4 and L5
[math, prb,6:randNorm(0,1,100)
sto 2nd L1 ][enter],(mean = 0
and standard deviation = 1)
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L2 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L3 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L4 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L5 ] [ enter ];
B. Find the sample mean of the 50 groups of
5 numbers and put in L6 as follows:
( 2nd
L1 + 2nd L2 + 2nd L3 + 2nd
L4 + 2nd L5 ) / 5 [sto] [2nd ] [ L6
] enter
To view this collection of numbers
[stat], [1:edit]
C. Find the mean of the 500 numbers (100 x
5) in lists L1, L2,
L3, L4 and L5 ( the “population” mean ( μ )) and compare with the mean of the 50
sample means (
) in L6.
Blank screen
(Mean of the 500 numbers)
[ ( ] [2nd] [stat] [ » ] [math] [5:sum( ] [2nd ] [ L1 ]
[ ) ] +
[2nd] [stat] [ » ] [math] [5:sum( ] [2nd ] [L2 ]
[ ) ] +
[2nd] [stat] [ » ] [math] [5:sum( ] [ 2nd ] [L3 ]
[ ) ]+
[2nd] [stat] [ » ] [math] [5:sum( ] [ 2nd ] [ L4
] [ ) ]+
[2nd] [stat] [ » ] [math] [5:sum(
] [ 2nd ] [ L5 ] [ ) ] [ )]
[/][500][ enter]
(Mean of the sample means)
[2nd] [stat] [ » ] [math] [3:mean( ] 2nd L6 )
enter
D. Create a series of histograms that
describes the distribution of these sample data values.
[window]
[ Xmin = -10 Xmax = 10,
Xscl = 0.5, Ymin = 0, Ymax = 80,
Yscl = 5, Xres=1]
[2nd] [y=] [1:Plot1…On]
[enter] [Type: histogram] [Xlist:L1] [Freq:1] [graph]
(repeat as above)…
[2nd] [y=]
[1:Plot1…On] [enter] [Type: histogram] [Xlist:L2] [Freq:1] [graph]
[2nd] [y=] [enter]
[Xlist:L3] [graph]
[2nd] [y=] [enter]
[Xlist:L4] [graph]
[2nd] [y=] [enter]
[Xlist:L5] [graph]
finally, change to [Xlist:L6] the
distribution of the sample means and
graph
[2nd] [y=] [enter]
[Xlist:L6] [graph]
E. Compare the sample standard deviation (
) and standard deviation ( σ ) of these 500
numbers (100 x 5) as per the Central
Limit Theorem, which says, sample mean (
) = population mean ( μ ) and sample standard deviation (
)
= ![]()
(Standard Deviation of the
sample means)
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ] enter
(Standard Deviation of the
population means)
To find the population
standard deviation, we must remove all data from the L1 thru L5
lists and “stack” them into the L6 storage cell using the “augment”
function which is valid for only two lists at a time. Then use the L6 cell to compute
the population standard deviation. Look
in the “Catalog” to find the augment function.
Augment(L1,L2)
sto L6
Augment(L6,L3)
sto L6
Augment(L6,L4)
sto L6
Augment(L6,L5)
sto L6
With
all of the data now in L6 the population standard deviation is:
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ] enter
To compute
for n = 5
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ][ / ] [2nd ] [ x2 ][5 ][ ) ] enter
PART B
Use your TI-83
Plus calculators to generate 100 Random Samples of size 5
(n = 5) each from the numbers 1 thru 6. Pretend that this is a simulation of the tossing
of five dice once (or 1 die tossed 5 times) and record the five numbers that
show up. This toss is repeated 100
times. The population of 500 numbers (100 x 5) comes from the digits 1, 2, 3, 4, 5, 6, which are randomly selected
with replacement.
(Assumption: we
are randomly selecting samples of size n = 5 with replacement, from the numbers
1, 2, 3, 4, 5, 6).
OBJECTIVE: Find the sample mean from each set of 5
numbers and investigate the distribution of these sample means. Compare the mean of these sample means with
the population mean. Compare the
histogram of these 50 sample means with the histogram of any column of 50
numbers representing the 50 repeated tosses of a single die.
A. Put the 100 randomly chosen numbers in
lists L1, L2, L3, L4 and L5 as follows.
[ math, prb,
5:randInt(1,6,100) sto L1 ], [ enter ]
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L2 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L3 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L4 ] [ enter ];
[ 2nd ] [ enter ] [ < ] [ 2nd ] [L5 ] [ enter ];
B. Find the sample mean of the 50 groups of
5 numbers and put in L6 as follows:
( 2nd L1 + 2nd
L2 + 2nd L3 + 2nd L4 + 2nd
L5 ) / 5 [sto] [2nd ] [ L6 ] enter
To view this collection of numbers
[stat], [1:edit]
C. Find the mean of the 500 numbers (100 x 5)
in lists L1, L2, L3,
L4 and L5 ( the “population” mean ( μ )) and compare with the mean of the 50 sample
means (
) in L6.
Blank screen
(Mean of the 500 numbers)
[ ( ] [2nd] [stat] [ » ] [math] [5:sum( ] [2nd ] [ L1 ]
[ ) ] +
[2nd] [stat] [ » ] [math] [5:sum( ] [2nd ] [L2 ]
[ ) ] +
[2nd] [stat] [ » ] [math] [5:sum( ] [ 2nd ] [L3 ]
[ ) ]+
[2nd] [stat] [ » ] [math] [5:sum( ] [ 2nd ] [ L4
] [ ) ]+
[2nd] [stat] [ » ] [math] [5:sum(
] [ 2nd ] [ L5 ] [ ) ] [ )]
[/][500][ enter]
(Mean of the sample means)
[2nd] [stat] [ » ] [math] [3:mean( ] 2nd L6 )
enter
D. Create a series of histograms that
describes the distribution of these sample data values.
[window]
[ Xmin = 0 Xmax = 10,
Xscl = 1, Ymin = 0, Ymax = 80,
Yscl = 5, Xres=1]
[2nd] [y=]
[1:Plot1…On] [enter] [Type: histogram] [Xlist:L1] [Freq:1] [graph]
(repeat as above)…
[2nd] [y=]
[1:Plot1…On] [enter] [Type: histogram] [Xlist:L2] [Freq:1] [graph]
[2nd] [y=] [enter]
[Xlist:L3] [graph]
[2nd] [y=] [enter]
[Xlist:L4] [graph]
[2nd] [y=] [enter]
[Xlist:L5] [graph]
finally, change to [Xlist:L6] the
distribution of the sample means and
graph
[2nd] [y=] [enter]
[Xlist:L6] [graph]
E. Compare the sample standard deviation (
) and standard deviation ( σ ) of these 500
numbers (100 x 5) as per the Central
Limit Theorem, which says, sample mean (
) = population mean ( μ ) and sample standard deviation (
)
= ![]()
(Standard Deviation of the
sample means)
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ] enter
(Standard Deviation of the
population means)
To find the population
standard deviation, we must remove all data from the L1 thru L5
lists and “stack” them into the L6 storage cell using the “augment”
function which is valid for only two lists at a time. Then use the L6 cell to compute
the population standard deviation. Look
in the “Catalog” to find the augment function.
Augment(L1,L2)
sto L6
Augment(L6,L3)
sto L6
Augment(L6,L4)
sto L6
Augment(L6,L5)
sto L6
With
all of the data now in L6 the population standard deviation is:
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ] enter
To compute
for n = 5
[2nd] [stat] [ » ] [math] [7:stdDev( ] [2nd ] [L6
][ ) ][ / ] [2nd ] [ x2 ][5 ][ ) ] enter
EXCEL
Spreadsheet Model
Procedure:
Use your Excel
Spreadsheet to generate 50 Random Samples of size 5
(n = 5) each from the numbers 1 thru 6. Pretend that this is a simulation of the tossing
of five dice once (or 1 die 5 times) and record the five numbers that show up. This toss is repeated 50 times. The population
of 250 numbers (50 x 5) comes from the digits
1, 2, 3, 4, 5, 6, which are randomly selected with replacement. (Assumption: we are randomly selecting with
replacement, samples of size n = 5 from the numbers 1, 2, 3, 4, 5, 6).
BACKGROUND
The
two most important formulas used in the design of this spreadsheet are
RANDBETWEEN and COUNTIF. The RANDBETWEEN
function can only be used after the Analysis Toolpak has been added to
Excel. (To add the Analysis Toolpak,
under the Tools, menu, choose Add-ins…, and then select the Analysis Toolpak
option.)
The
RANDBETWEEN function will randomly choose a number between a given maximum and
minimum, inclusive (eg., RANDBETWEEN(1,6) chooses numbers from the integers 1
thru 6).
The
COUNTIF function will search through a given range of cells and count a tally
if a desired entry is found in a cell.
(e.g., COUNTIF(K5:K104, 2 ) will find every instance of the number 2
within the range of K5:K104).
Step
one: (blank spreadsheet)
In
the first 7 columns of row1, (A1, B1,
C1, D1, E1, F1, G1) enter the words die1, die2, die3, die4, die5, sum, average.
Go
to row 3, slot A3, click on fx at
the top of your spreadsheet table to find the RANDBETWEEN formula. (search
under the category “all” to find it)
put
the number 6 in “bottom” and the number 1 in “top”
Highlight the right lower corner
of row 3 column 1 (A3) scroll across to row E3.
Random numbers between 1 and 6 should appear in rows B3, C3, D3, and E3.
Then highlight the right lower corner of row E3 and scroll down to row 53
column E53. Random numbers between 1 and
6 should appear in all slots from A4 to E53.
Go
to row 3, slot F3, click fx on the
top of your spreadsheet table to find the SUM formula. (search under the
category “all” to find it). Input it as
SUM(A3:E3) in this slot. A number
representing the sum of the numbers in rows A3 thru E3 appears in slot F3.
Highlight the right corner
of row 3 column 6 (F3) scroll down to row 53 slot F3. Numbers representing the sum of each row
should appear in those columns.
Go
to row 3, slot G3, click fx on the
top of your spreadsheet table to find the AVERAGE formula. (search under the
category “all” to find it). Input it as
AVERAGE(A3:E3) in this slot. A number
representing the average of the numbers in rows A3 thru E3 appears in slot G3.
Highlight the right corner
of row 3 column 7 (G3) scroll down to row
53 slot G53. Numbers representing the average of each row
should appear in those columns.
In
the 9th, 10th and 11th columns of row1, (I1, J1, K1) enter the words
frequency, psum, pmean.
Go
to column 9, slot I3, click fx on
the top of your spreadsheet table to find the COUNTIF formula. (search under
the category “all” to find it). Input it
as COUNTIF(F3:F53, 5) in this slot. The
COUNTIF function will search through the range of cells (F3:F53) to count a
tally if 5 is found in a cell within that range.
Repeat
the same procedure for slot I4. input it as COUNTIF(F3:F53, 6). Now continue for slots I5, …, I28 and input
in respectively, COUNTIF(F3:F53, 7), COUNTIF(F3:F53, 8) , …, COUNTIF(F3:F53,30). Remember, the largest sum that could occur from
the 5 tosses is 30 which happens if all 5 tosses are the number 6.
Go
to column 10, slot J3. From slot J3 to
slot J28, input the numbers corresponding to the possible sum (psum) that could
occur on the 5 dice when tossed, namely the numbers, 5, 6, 7, 8, 9, …, 30.
(put 5 in slot J3, put +j3+1 in slot J4, highlight the right lower corner
of slot J4, scroll down to J28. The
numbers 6, 7, …, 30 appears).
Go
to column 11, slot K3. From slot K3 to
slot K28, input the numbers corresponding to the possible mean (pmean) that
could occur when each of the possible sums (psum) values is divided by 5. .
(put +J3/5 in slot K3, highlight the right lower corner of slot K3,
scroll down to K28. The numbers 1, 1.2,
1.4. 1.6, … , 5.8, 6 appears).
THE HISTOGRAM
We
will now use the frequency values and psum values to create our first interactive histogram. The second histogram will come from the
frequency values and the pmean values.
We
open the Chart Wizard using either the Chart entry on the insert menu or the Chart
Wizard toolbar button.
Step
one, select the graph type. Since we
would like to create a histogram, we select the “Column” chart type.
Step
two, click “next”, to get to the Chart Source Data screen. We use the
Step
three, click “next”, to get to the Chart Options screen. For Chart title, input “Tosses of 5 dice”,
for Category (X) axis:, input “Sum of numbers”, for Value (Y) axis:, input
“frequency”.
Step
four, click “next”, to get to the Chart Location screen. We will place our chart “As object in” our
spreadsheet. (sheet1). Click Finish.
The
graph appears in our spreadsheet. We can
interactively, simulate a new set of data “tosses” by repeatedly pressing the
F9 button on the keyboard. The histogram
changes each time.
To
make the bar chart that shows up really look like a histogram, click on any of
the bars in the chart to get to the “Format Data Series” screen. Click the “options” button and then change
the “Gap width” to 0. This will give you
a picture that looks more like a histogram.
This
is the identical set of steps that you must do to create a histogram using the
frequency values and the possible mean (pmean) values to view a distribution of
the sample means. Instead of choosing
the data from the psum columns (J3 to J28), you choose the data from the pmean
columns (k3 to K28).
COMPUTING THE MEAN AND
STANDARD DEVIATION
Identify
a cell below your histogram and type in “sample mean”. Three cells to the right of this one type in
“Population mean”. Four cells below your
“sample mean” entry, type in “Sample Standard Deviation”. Also, four cells below your “Population Mean
“ entry, type in “Population Standard Deviation”.
Place
your cursor in the cell below your “sample mean” entry. Click on fx
at the top of your spreadsheet table to find the AVERAGE formula. (search
under the category “all” to find it)
fx . When the Function Arguments screen appears,
you must now select the data values in the “average” column, (G3 thru
G53). When you click OK, the mean of sample
means for the 50 tosses appears in the cell below your “sample mean” entry.
Next
place your cursor in the cell below your “Population Mean” entry. Click on fx
again to locate the AVERAGE formula. When the Function Arguments screen
appears, you must now select the data values in the columns from A3 thru
E53. When you click OK, the mean of the
population set of numbers for all 250 entries appears in the cell below your
“population mean” entry.
Place
your cursor in the cell below your “Sample Standard Deviation” entry. Click on fx
at the top of your spreadsheet table to find the STDEV formula. When the Function Arguments screen appears,
you must now select the data values in the “average” column, (G3 thru
G53). When you click OK, the standard
deviation of the sample means for the 50 tosses appears in the cell below your
“Sample Standard Deviation” entry.
Next
place your cursor in the cell below your “Population Standard deviation” entry.
Click on fx again to locate the STDEV formula. When the Function
Arguments screen appears, you must now select the data values in the columns
from A3 thru E53. When you click OK, the
standard deviation of the population set of numbers for all 250 entries appears
in the cell below your “population mean” entry.
You
can now validate the Central limit Theorem concept that states
standard
deviation (
) equal to
.
We
can repeat this interactive exercise with any sample size that we want on our
spreadsheet. The procedure is still the
same. Only difference, the spreadsheet
will be larger.