Student’s Name
Instructor
Subject
Date of Submission
Statistics Project: Sampling
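11. Population: N = 828 claims
a.) SRS: a sample of n = 85 claims, each with 215 fields

Claims (fi)    1    1    4   22   57
Errors (yi)    4    3    2    1    0
fi yi          4    3    8   22    0

We estimate the population mean $\bar{Y}$ with the sample mean $\bar{y}$.
$n = \sum f_i = 85$, $\sum f_i y_i = 37$, $\sum f_i y_i^2 = 63$
$\bar{y} = \sum f_i y_i / \sum f_i = 37/85 \approx 0.4353$ errors per claim
The standard error of $\bar{y}$ is $\delta_{\bar{y}} = [\mathrm{var}(\bar{y})]^{1/2} = [(S^2/n)(1 - f)]^{1/2}$, where $f = n/N$.
$S^2 = \frac{1}{n-1} \sum f_i (y_i - \bar{y})^2$
$S^2 = \frac{1}{84}(63 - 37^2/85) \approx 0.5583$
Estimated $\delta_{\bar{y}} = [(0.5583/85)(1 - 85/828)]^{1/2} \approx 0.0768$
b.) Estimated total $\hat{Y} = N\bar{y} = 828 \times 0.4353 \approx 360.4$ errors
Standard error of $\hat{Y}$: $N\delta_{\bar{y}} = 828 \times 0.0768 \approx 63.6$
c.) Sample: n = 18,275 fields (85 × 215)
Population: N = 178,020 fields (828 × 215)
Number of errors in sample = 37
Estimated error rate per field = 37/18,275 ≈ 0.0020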
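The estimates in problem 11 can be cross-checked with a few lines of R. This is a minimal sketch, assuming the frequency table reads as 1, 1, 4, 22 and 57 claims with 4, 3, 2, 1 and 0 errors respectively; the variable names are chosen here for illustration.

```r
# Frequency table for the audited claims (assumed reading of the table)
f <- c(1, 1, 4, 22, 57)   # number of claims
y <- c(4, 3, 2, 1, 0)     # errors per claim
N <- 828                  # claims in the population

n        <- sum(f)                                        # sample size
ybar     <- sum(f * y) / n                                # mean errors per claim
s2       <- (sum(f * y^2) - sum(f * y)^2 / n) / (n - 1)   # sample variance
se       <- sqrt(s2 / n * (1 - n / N))                    # SE of the mean, with fpc
total    <- N * ybar                                      # estimated total errors
se_total <- N * se                                        # SE of the estimated total

print(c(n = n, ybar = ybar, s2 = s2, se = se,
        total = total, se_total = se_total))
```

The finite population correction `1 - n / N` matters little here because only about a tenth of the claims are sampled.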
14.
a.)
School             1     2     3     4
Smokers female   729   457   511   800
Smokers
Smokers           10     5     6    27
$n = \sum f_i = 46$, $\sum f_i y_i = 1495$, $\bar{y} = 1495/46 = 32.5$
$S^2 = \frac{1}{n-1}\left[\sum f_i y_i^2 - (\sum f_i y_i)^2 / \sum f_i\right]$
$= \frac{1}{99}\left[52525 - 1495^2/46\right] = 39.7727273$
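b.) The $(1-\alpha)100\%$ confidence interval for $\bar{y}$ is:
$\bar{y} \pm Z_{\alpha/2} \, \frac{S}{\sqrt{n}} \left[\frac{N-n}{N}\right]^{1/2}$
$\sum f_i = 46$, $\sum f_i y_i = 1495$, $\sum f_i y_i^2 = 52525$, $N = 2550$, $n = 100$, $S = 6.30656224$.
$32.5 \pm Z_{0.025} \, (6.30656224/\sqrt{100}) \, [(2550 - 100)/2550]^{1/2}$
$32.5 \pm 1.96 \, (6.30656224/\sqrt{100}) \, [(2550 - 100)/2550]^{1/2} = 32.5 \pm 1.2116$, i.e. $(31.29, 33.71)$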
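c.) The $100(1-\alpha)\%$ C.I. for the population total:
$N\bar{y} \pm Z_{\alpha/2} \, \frac{NS}{\sqrt{n}} \left[\frac{N-n}{N}\right]^{1/2}$
$2550 \times 32.5 \pm 1.96 \, (2550 \times 6.30656224/\sqrt{100}) \, [(2550 - 100)/2550]^{1/2} = 82875 \pm 3089.6$, i.e. $(79785.4, 85964.6)$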
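The interval formulas in parts b and c can be evaluated directly in R. This is a sketch only, taking the mean 32.5, S = 6.30656224, n = 100 and N = 2550 from the text; `half_width`, `ci_mean` and `ci_total` are names chosen here for illustration.

```r
ybar <- 32.5        # sample mean from part a
S    <- 6.30656224  # sample standard deviation from part a
n    <- 100         # sample size used in the interval
N    <- 2550        # population size

z <- qnorm(0.975)   # about 1.96 for a 95% interval

# Half-width of the interval for the mean, with finite population correction
half_width <- z * S / sqrt(n) * sqrt((N - n) / N)

ci_mean  <- ybar + c(-1, 1) * half_width  # interval for the mean
ci_total <- N * ci_mean                   # interval for the population total

print(ci_mean)
print(ci_total)
```

Scaling the interval for the mean by N gives the interval for the total, because the total estimate is N times the mean estimate.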
16.
> school1 <- read.table("c:/xyz/data.csv", sep=",", header=TRUE)
> school1
returnf
1 1
2 1
3 1
4 0
5 1
6 9
7 1
8 1
9 0
10 0
11 1
12 0
13 0
14 1
15 0
16 1
17 0
18 0
19 1
20 0
21 0
22 9
23 0
24 0
25 0
26 1
27 0
28 0
29 0
30 1
31 1
32 0
33 1
34 1
35 0
36 0
37 1
38 1
39 1
40 1
a.) > sum(school1)
[1] 37
Percentage of parents who returned the forms: 37/78 * 100 = 47.44%
> sum(read.table("c:/xyz/school2.csv", sep=",", header=TRUE))
[1] 37
Percentage of parents who returned the forms: 37/238 * 100 = 15.55%
> sum(read.table("c:/xyz/school3.csv", sep=",", header=TRUE))
[1] 31
Percentage of parents who returned the forms: 31/261 * 100 = 11.88%
> sum(read.table("c:/xyz/school4.csv", sep=",", header=TRUE))
[1] 18
Percentage of parents who returned the forms: 18/174 * 100 = 10.34%
> sum(read.table("c:/xyz/school5.csv", sep=",", header=TRUE))
[1] 48
Percentage of parents who returned the forms: 48/236 * 100 = 20.34%
> sum(read.table("c:/xyz/school6.csv", sep=",", header=TRUE))
[1] 22
Percentage of parents who returned the forms: 22/188 * 100 = 11.70%
> sum(read.table("c:/xyz/school7.csv", sep=",", header=TRUE))
[1] 24
Percentage of parents who returned the forms: 24/113 * 100 = 21.24%
> sum(read.table("c:/xyz/school8.csv", sep=",", header=TRUE))
[1] 84
Percentage of parents who returned the forms: 84/170 * 100 = 49.41%
> sum(read.table("c:/xyz/school9.csv", sep=",", header=TRUE))
[1] 50
Percentage of parents who returned the forms: 50/296 * 100 = 16.89%
> sum(read.table("c:/xyz/school10.csv", sep=",", header=TRUE))
[1] 43
Percentage of parents who returned the forms: 43/207 * 100 = 20.77%
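The ten percentages can also be computed in one step. This is a sketch using the counts reported for each school; the vectors `returned` and `distributed` are typed in by hand here rather than read from the CSV files.

```r
# Forms returned and forms distributed, schools 1 to 10 (counts from the text)
returned    <- c(37, 37, 31, 18, 48, 22, 24, 84, 50, 43)
distributed <- c(78, 238, 261, 174, 236, 188, 113, 170, 296, 207)

# Percentage returned per school, rounded to two decimals
pct <- round(100 * returned / distributed, 2)
names(pct) <- paste0("school", 1:10)

print(pct)
```

Working on vectors avoids repeating the same division ten times and makes rounding consistent across schools.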
c.) > sum(read.table("c:/xyz/consent.csv", sep=",", header=TRUE))
[1] 339
Percentage of parents who returned the forms: 339/9962 * 100 = 3.40%
0.95 * 339 ≈ 322
b.)
The procedure is as follows:
• The weights $w_i$ are the inverses of the selection probabilities $\psi_i$.
• The weighted estimator of the population total is $\hat{t}_\psi = \sum_i w_i t_i$.
• We calculate the estimate $\hat{t}_\psi$ for each.
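The weighting steps above can be sketched in R. The numbers are made up for illustration: the selection probabilities `psi` and unit totals `t` below are hypothetical, not taken from the consent data, and each `psi` value is treated as that unit's overall probability of selection.

```r
# Hypothetical selection probabilities and unit totals for three sampled units
psi <- c(0.10, 0.20, 0.25)
t   <- c(5, 12, 20)

w     <- 1 / psi       # weights: inverses of the selection probabilities
t_hat <- sum(w * t)    # weighted estimator of the population total

print(w)
print(t_hat)
```

Units that were unlikely to be selected receive large weights, so each one stands in for more of the population.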
Sample n = 18,275; population N = 178,020; $\mathrm{Var}(\hat{Y}) = \frac{N^2 S^2}{n} \cdot \frac{N-n}{N}$
9.)
Procedure
- Suppose the sample size n is greater than 1 and we sample with replacement.
- This implies $\pi_i = 1 - (1 - \psi_i)^n$.
- The probability that item i is selected on the first draw is the same as the probability that item i is selected on any other draw.
- Sampling with replacement gives us n independent estimates of the population total, one for each unit in the sample.
- We average these n estimates.
- The estimated variance is the variance of the estimates divided by n.
- N = 52 classes of states in the USA.
- $M_i$ students in class i (i = 1 to 52).
- Values of $M_i$ range from 1 to 3142.
- We want a sample of 10 states.
- In this case $\psi_i = M_i/3142$.
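The averaging steps in the list above can be sketched in R as follows. The sizes `M` and unit totals `t` of the sampled units are hypothetical values chosen for illustration; only the overall size 3142 comes from the text.

```r
M_total <- 3142              # total size, from the text
M <- c(75, 8, 67)            # sizes of the sampled units (illustrative)
t <- c(150, 20, 120)         # hypothetical unit totals
n <- length(t)               # number of draws

psi     <- M / M_total       # selection probability on each draw
pi_incl <- 1 - (1 - psi)^n   # probability a unit appears at least once

est    <- t / psi            # one estimate of the population total per draw
t_hat  <- mean(est)          # average of the n estimates
se_hat <- sqrt(var(est) / n) # sqrt(variance of the estimates / n)

print(pi_incl)
print(c(t_hat = t_hat, se_hat = se_hat))
```

Because the draws are independent, the spread of the n per-draw estimates is enough to estimate the variance of their average.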
units   size   Cumulative size   Y = Population
1       67     67                4137511
2       25     92                587766
3       15     107               3832368
4       75     182               2394253
5       58     240               30895356
6       63     303               3464675
7       8      311               3279116
8       3      314               690884
9       1      315               585221
10      67     382               13482716
.       .      .
.       .      .
.       .      .
52      159    3142              464736

Select a random number R between 1 and $T_N = 3142$ by using a random number table.
If $T_{i-1} < R \le T_i$, then the ith unit is selected with probability $M_i/3142$, i = 1, 2, …, 52.
Repeat the procedure 10 times to get a sample of size 10.
1. First Draw: Draw a random number between 1 and 3142.
Suppose it is 167.
$T_3 = 107 < 167 \le T_4 = 182$, so unit 4 is selected and $Y_4 = 2394253$ enters the sample.
2. Second Draw: Draw a random number between 1 and 3142.
Suppose it is 308.
$T_6 = 303 < 308 \le T_7 = 311$, so unit 7 is selected and $Y_7 = 3279116$ enters the sample, and so on.
This procedure is repeated until a sample of the required size is obtained.
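The cumulative-size rule can be sketched in R. For illustration the code is restricted to the ten units whose sizes are listed, so random draws run from 1 to 382 rather than 1 to 3142; `select_unit` is a helper name introduced here, not part of the original solution.

```r
size <- c(67, 25, 15, 75, 58, 63, 8, 3, 1, 67)  # first ten sizes from the table
cum  <- cumsum(size)                            # cumulative sizes T_1, ..., T_10

# Return the unit i with T_{i-1} < R <= T_i
select_unit <- function(R, cum) which(R <= cum)[1]

select_unit(167, cum)  # unit 4, as in the first draw
select_unit(308, cum)  # unit 7, as in the second draw

# Ten draws with replacement (assumption: only these ten units are eligible)
set.seed(1)
draws <- sample.int(max(cum), 10, replace = TRUE)
units <- vapply(draws, select_unit, integer(1), cum = cum)
print(units)
```

A unit with a larger size covers a wider range of the cumulative scale, so it is hit by the random number proportionally more often.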
10.)
units   size   Cumulative size   Y = Population
1       67     67                4137511
2       25     92                587766
3       15     107               3832368
4       75     182               2394253
5       58     240               30895356
6       63     303               3464675
7       8      311               3279116
8       3      314               690884
9       1      315               585221
10      67     382               13482716