Robert Tibshirani
The American Statistician, Vol. 51, No. 2. (May, 1997), pp. 106-111.
Abstract: I compare the world record sprint races of Donovan Bailey and
Michael Johnson in the 1996 Olympic Games, and try to answer two questions:
1. Who is faster? 2. Which performance was more remarkable? The statistical
methods used include cubic spline curve fitting, the parametric bootstrap, and
Keller's model of running.
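For readers who want to experiment, here is a minimal Python sketch of the
two main tools named here, cubic spline fitting plus a parametric bootstrap,
applied to hypothetical 10 m split times (the splits and the 0.02 s
timing-error level are invented for illustration, not Bailey's actual data):

    import numpy as np
    from scipy.interpolate import CubicSpline

    # Hypothetical cumulative times (s) at each 10 m mark of a 100 m race.
    dist = np.arange(0.0, 101.0, 10.0)
    time = np.array([0.00, 1.90, 3.10, 4.10, 5.00,
                     5.90, 6.80, 7.70, 8.60, 9.50, 10.40])

    def max_speed(t):
        """Fit a cubic spline distance(t); return the peak of its derivative."""
        spline = CubicSpline(t, dist)
        grid = np.linspace(t[0], t[-1], 1000)
        return spline(grid, 1).max()   # first derivative of distance = speed

    # Parametric bootstrap: perturb the splits with Gaussian timing error
    # and recompute the peak speed to get an interval for it.
    rng = np.random.default_rng(0)
    boot = [max_speed(time + np.concatenate(([0.0], rng.normal(0, 0.02, 10))))
            for _ in range(2000)]
    lo, hi = np.percentile(boot, [2.5, 97.5])
    print(f"peak speed {max_speed(time):.2f} m/s, 95% interval ({lo:.2f}, {hi:.2f})")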
Frederick Mosteller
The American Statistician, Vol. 51, No. 4. (Nov., 1997), pp. 305-310.
Abstract: The author reviews and comments on his work in sports statistics,
illustrating with problems of estimation in baseball's World Series and with a
model for the distribution of the number of runs in a baseball half inning.
Data on collegiate football scores have instructive distributions that indicate
more about the strengths of the teams playing than their absolute values would
suggest. A robust analysis of professional football scores led to widespread
publicity with the help of professional newswriters. Professional golf players
on the regular tour are so close in skill that a few rounds do little to
distinguish their abilities. A simple model for golf scoring is "base + X",
where the base is a small score for a round rarely achieved, such as 64, and
X is a Poisson distribution with mean about 8. In basketball, football,
and hockey the leader at the beginning of the final period wins about 80% of
the time, and in baseball the leader at the end of seven full innings wins 95%
of the time. Empirical experience with runs of even and odd numbers in tossing
a die millions of times fits closely the theoretical distributions.
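The "base + X" golf model is simple enough to simulate directly; a sketch
(player setup and round counts are illustrative) showing why a few rounds
barely separate players of near-equal skill:

    import numpy as np

    rng = np.random.default_rng(1)
    base, mean_x = 64, 8   # scores modeled as base + X with X ~ Poisson(8)

    # Four-round totals for two players with identical skill.
    a = rng.poisson(mean_x, (100_000, 4)).sum(axis=1) + 4 * base
    b = rng.poisson(mean_x, (100_000, 4)).sum(axis=1) + 4 * base
    print("P(A strictly beats an identical B over 4 rounds):", (a < b).mean())
    print("typical |margin| (50th, 90th percentile):",
          np.percentile(np.abs(a - b), [50, 90]))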
Traci Clemons; Marcello Pagano
The American Statistician, Vol. 53, No. 4. (Nov., 1999), pp. 298-302.
Abstract: The distribution of our weight is one of the standard examples
given of the normal distribution appearing in nature. If we verify this
assertion by looking at birth certificates to check our birth weight, we find
that birth certificates are instructive and provide even more: they provide a
good, real example of a large dataset with some of the problems associated with
large datasets. Plus they produce other interesting, and to us unexpected,
characteristics of recorded birth weights in the United States.
D. N. Joanes; C. A. Gill
The Statistician, Vol. 47, No. 1. (1998), pp. 183-189.
Abstract: Over the years, various measures of sample skewness and kurtosis
have been proposed. Comparisons are made between those measures adopted by
well-known statistical computing packages, focusing on bias and mean-squared
error for normal samples, and presenting some comparisons from simulation
results for non-normal samples.
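As a concrete companion, a sketch of three sample skewness estimators of the
kind such comparisons involve, in the g1 / G1 / b1 notation often used in
this literature (which estimator corresponds to which package should be
treated as an assumption here):

    import numpy as np

    def skewness_measures(x):
        x = np.asarray(x, dtype=float)
        n = len(x)
        d = x - x.mean()
        m2, m3 = (d ** 2).mean(), (d ** 3).mean()   # biased central moments
        g1 = m3 / m2 ** 1.5                          # moment estimator
        G1 = g1 * np.sqrt(n * (n - 1)) / (n - 2)     # adjusted (Fisher) form
        b1 = g1 * ((n - 1) / n) ** 1.5               # form using s, not sqrt(m2)
        return g1, G1, b1

    rng = np.random.default_rng(2)
    print(skewness_measures(rng.normal(size=30)))

Averaging each estimator over many simulated samples against the true value
of zero reproduces the kind of normal-sample bias comparison described above.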
Adrian Bowman; W. Harper Gilmour; Gillian Constable; Neville Davies;
Steven G. Gilmour; Edwin J. Redfern
The Statistician, Vol. 47, No. 2. (1998), pp. 349-364.
Abstract: Software which allows interactive textual and graphical material
to be created relatively easily has now existed for several years. This has
promoted considerable interest in computer-based learning. This paper describes
an approach which uses these tools to create problem-based material for use in
laboratory sessions in association with courses in statistics. Illustrations
are given, and the presentation of one particular problem, based on the design
and analysis of a simple clinical trial, is developed in detail. Important
issues of design, construction and evaluation are also discussed. It is argued
that, although it is expensive to produce, material of this type can provide
real benefits as a teaching resource.
Jan R. Magnus; Franc J. G. M. Klaassen
The Statistician, Vol. 48, No. 2. (1999), pp. 239-246.
Abstract: We analyse two often-heard hypotheses concerning tennis balls.
The first is: are new balls an advantage to the server? They are not (at
least not at Wimbledon).
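A back-of-the-envelope version of the first comparison, a two-proportion
z-test on the server's point-winning rate with new versus used balls (all
counts below are invented; the paper's Wimbledon data and modeling are
considerably richer):

    import numpy as np
    from scipy.stats import norm

    won = np.array([620, 2400])      # service points won: new balls, used balls
    played = np.array([960, 3760])   # service points played
    p = won / played
    pooled = won.sum() / played.sum()
    z = (p[0] - p[1]) / np.sqrt(pooled * (1 - pooled) * (1 / played).sum())
    print(f"new {p[0]:.3f} vs used {p[1]:.3f}: z = {z:.2f}, "
          f"two-sided p = {2 * norm.sf(abs(z)):.3f}")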
Hardeo Sahai; Satish Misra
The Statistician, Vol. 41, No. 1. (1992), pp. 55-64.
Abstract: This paper discusses some teaching problems encountered in using
S² or ŝ² as a definition of sample
variance, and how to overcome them, depending upon the teaching context and the
instructor's personal preference. In addition, several other definitions of
sample variance are introduced in the context of finding a good estimator of
the population variance as well as test statistics in hypothesis testing
procedures.
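The teaching point is easy to demonstrate numerically; a sketch comparing
the divide-by-(n-1) and divide-by-n definitions as estimators of a known
population variance (the simulation settings are arbitrary):

    import numpy as np

    rng = np.random.default_rng(3)
    sigma2, n = 4.0, 5
    x = rng.normal(0.0, np.sqrt(sigma2), size=(200_000, n))

    d = x - x.mean(axis=1, keepdims=True)
    s2 = (d ** 2).sum(axis=1) / (n - 1)      # S^2, unbiased
    s2_hat = (d ** 2).sum(axis=1) / n        # divide-by-n form, the MLE

    # The divide-by-n form is biased low by the factor (n-1)/n.
    print(s2.mean(), s2_hat.mean(), sigma2 * (n - 1) / n)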
Mark J. Schervish
The American Statistician, Vol. 50, No. 3. (Aug., 1996), pp. 203-206.
Stable URL: http://links.jstor.org/sici?sici=0003-1305%28199608%2950%3A3%3C203%3APVWTAA%3E2.0.CO%3B2-0
Abstract: P values (or significance probabilities) have been used in place
of hypothesis tests as a means of giving more information about the
relationship between the data and the hypothesis than does a simple reject/do
not reject decision. Virtually all elementary statistics texts cover the
calculation of P values for one-sided and point-null hypotheses concerning the
mean of a sample from a normal distribution. There is, however, a third case
that is intermediate to the one-sided and point-null cases, namely the interval
hypothesis, that receives no coverage in elementary
texts. We show that P values are continuous functions of the hypothesis for
fixed data. This allows a unified treatment of all three types of hypothesis
testing problems. It also leads to the discovery that a common informal use of
P values as measures of support or evidence for hypotheses has serious logical
flaws.
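To see the continuity claim in action, a sketch for a normal mean with known
variance; the interval-null value below is taken as the supremum of
pointwise two-sided P values over the null interval, which is an
illustrative construction rather than necessarily the paper's exact
definition:

    import numpy as np
    from scipy.stats import norm

    n, sigma, xbar = 25, 1.0, 0.35       # invented data summary
    se = sigma / np.sqrt(n)

    def p_point(theta0):                 # H0: theta = theta0
        return 2 * norm.sf(abs(xbar - theta0) / se)

    def p_one_sided(theta0):             # H0: theta <= theta0
        return norm.sf((xbar - theta0) / se)

    def p_interval(a, b):                # H0: a <= theta <= b
        return max(p_point(t) for t in np.linspace(a, b, 2001))

    # p_interval varies continuously in its endpoints and reduces to the
    # point-null value when a == b.
    print(p_point(0.0), p_one_sided(0.0), p_interval(-0.1, 0.1))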
Paul H. Kvam
The American Statistician, Vol. 50, No. 3. (Aug., 1996), pp. 238-242.
Stable URL: http://links.jstor.org/sici?sici=0003-1305%28199608%2950%3A3%3C238%3AUESTET%3E2.0.CO%3B2-C
Abstract: Instructors for introductory courses with large enrollments must
routinely work to curb cheating during exams. A method used for such purposes
is described here. Perhaps more interesting than the method's
effectiveness is the inherited ability to draw inference on unknown parameters
of interest, including the proportion of students who cheat (as opposed to
guess) when faced with not knowing how to obtain the solution to a
multiple-choice exam question. For estimation we consider the method of
maximum likelihood.
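A toy model in this spirit, with the caveat that it is an invented setup
rather than the article's actual design: suppose each student who cannot
solve a hard multiple-choice item either copies a neighbor (probability p)
or guesses uniformly among k choices, so the chance of matching the neighbor
is p + (1 - p)/k; the binomial MLE of the match rate then inverts to an MLE
for p:

    import numpy as np

    k = 5          # answer choices
    n = 400        # students judged not to know the item
    matches = 130  # answers agreeing with the adjacent student

    rate_hat = matches / n                       # binomial MLE of match rate
    p_hat = (rate_hat - 1 / k) / (1 - 1 / k)     # invert p + (1 - p)/k
    p_hat = float(np.clip(p_hat, 0.0, 1.0))      # keep the estimate in [0, 1]
    print(f"estimated proportion copying rather than guessing: {p_hat:.3f}")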
David J. Aldous
Statistical Science, Vol. 16, No. 1. (Feb., 2001), pp. 23-34.
Stable URL: http://links.jstor.org/sici?sici=0883-4237%28200102%2916%3A1%3C23%3ASMADSF%3E2.0.CO%3B2-Z
Abstract: In 1924 Yule observed that distributions of number of species per
genus were typically long-tailed, and proposed a stochastic model to fit these
data. Modern taxonomists often prefer to represent relationships between
species via phylogenetic trees; the counterpart to Yule's observation is that
actual reconstructed trees look surprisingly unbalanced. The imbalance can
readily be seen via a scatter diagram of the sizes of clades involved in the
splits of published large phylogenetic trees. Attempting
stochastic modeling leads to two puzzles. First, two somewhat opposite
possible biological descriptions of what dominates the macroevolutionary
process (adaptive radiation; "neutral" evolution) lead to exactly the
same mathematical model (Markov or Yule or coalescent). Second, neither this
nor any other simple stochastic model predicts the observed pattern of
imbalance. This essay represents a probabilist's musings on these puzzles,
complementing the more detailed survey of biological literature by Mooers and
Heard, Quart. Rev. Biol. 72 (1997), 31-54.
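The Markov/Yule model is easy to simulate: a clade of n species splits into
daughters of sizes (i, n - i) with each i in 1, ..., n - 1 equally likely. A
sketch that records every split of an (illustratively sized) 500-species
tree, the raw material for the scatter diagram described above:

    import random

    def yule_splits(n, rng, out):
        """Recursively split a clade of size n, recording each split."""
        if n < 2:
            return
        left = rng.randint(1, n - 1)            # uniform over 1..n-1
        out.append((n, min(left, n - left)))    # (clade size, smaller daughter)
        yule_splits(left, rng, out)
        yule_splits(n - left, rng, out)

    splits = []
    yule_splits(500, random.Random(4), splits)
    print(sorted(splits, reverse=True)[:10])    # splits of the largest clades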
Alan Julian Izenman
Statistical Science, Vol. 16, No. 1. (Feb., 2001), pp. 35-57.
Stable URL: http://links.jstor.org/sici?sici=0883-4237%28200102%2916%3A1%3C35%3ASALAOT%3E2.0.CO%3B2-Q
Abstract: Prosecuting those arrested for the unlawful possession,
distribution or importation of illicit drugs, such as powder cocaine, crack
cocaine, heroin and LSD, is usually both time-consuming and expensive,
primarily because of the need to determine "beyond a reasonable
doubt" the total amount of drugs seized in each case. Accuracy is
important since penalties (imprisonment or fines) depend upon the quantity
seized. Substantial backlogs in processing drug cases often develop as a
result. In some jurisdictions, complete testing and analysis of all substances
seized from a defendant are customary in support of a case, while in other
jurisdictions random sampling of drugs and the subsequent presentation of an
estimate of the total amount seized have been used for many years. Due to
pressure from crime laboratories and prosecutors, who point to major increases
in their caseloads as well as a trend toward decreasing funding and staffing
for the crime laboratories, jurisdictions which currently carry out a complete
census of all seized evidence are now seriously considering a change in their
methodology with a view to instituting new guidelines for the scientific
sampling of evidence. In this article, we discuss the statistical and legal
issues that have arisen in cases involving illicit drugs.
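One standard calculation in this setting, sketched here as an illustration
rather than as the article's prescribed procedure: if m units drawn at
random from N seized units all test positive, a hypergeometric argument
yields a lower confidence bound on the number of positive units:

    from math import comb

    N, m, alpha = 500, 30, 0.05   # seized units, sample size, 1 - confidence

    # If K of N units are positive, an all-positive sample of m has
    # probability C(K, m) / C(N, m), which increases with K.  The lower
    # 95% bound is the smallest K whose probability exceeds alpha.
    K_lower = next(K for K in range(m, N + 1)
                   if comb(K, m) / comb(N, m) > alpha)
    print(f"with all {m} sampled units positive: at least {K_lower} of {N} "
          f"units positive, at 95% confidence")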
Robert E. Kass
Statistical Science, Vol. 14, No. 2. (May, 1999), p. 149.
Stable URL: http://links.jstor.org/sici?sici=0883-4237%28199905%2914%3A2%3C149%3AIT%22TBC%3E2.0.CO%3B2-H
Brendan McKay; Dror Bar-Natan; Maya Bar-Hillel; Gil Kalai
Statistical Science, Vol. 14, No. 2. (May, 1999), pp. 150-173.
Stable URL: http://links.jstor.org/sici?sici=0883-4237%28199905%2914%3A2%3C150%3ASTBCP%3E2.0.CO%3B2-D
Abstract: A paper of Witztum, Rips and Rosenberg in this journal in 1994
made the extraordinary claim that the Hebrew text of the Book of Genesis
encodes events which did not occur until millennia after the text was
written.
Nien Fan Zhang
Technometrics, Vol. 40, No. 1. (Feb., 1998), pp. 24-38.
Stable URL: http://links.jstor.org/sici?sici=0040-1706%28199802%2940%3A1%3C24%3AASCCFS%3E2.0.CO%3B2-1
Abstract: In the statistical process control environment, a primary method
to deal with autocorrelated data is the use of a residual chart. Although this
methodology has the advantage that it can be applied to any autocorrelated
data, it needs some modeling effort in practice. In addition, the detection
capability of the residual chart is not always great. This article proposes a
statistical control chart for stationary process data. It is simple to
implement, and no modeling effort is required. Comparisons are made among the
proposed chart, the residual chart, and other charts. When the process
autocorrelation is not very strong and the mean changes are not large, the new
chart performs better than the residual chart and the other charts.
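For contrast with the proposed chart, a sketch of the residual-chart
baseline it is compared against, assuming an AR(1) process (the model,
parameters, and the injected shift are all illustrative):

    import numpy as np

    rng = np.random.default_rng(5)
    phi, n = 0.6, 300
    x = np.zeros(n)
    for t in range(1, n):                  # simulate an AR(1) series
        x[t] = phi * x[t - 1] + rng.normal()
    x[200:] += 4.0                         # inject a sustained mean shift

    # Fit AR(1) on the in-control portion, then chart the residuals.
    xc = x - x[:200].mean()
    phi_hat = (xc[1:200] @ xc[:199]) / (xc[:200] @ xc[:200])
    resid = xc[1:] - phi_hat * xc[:-1]
    sigma = resid[:199].std()              # in-control residual spread
    alarms = np.nonzero(np.abs(resid) > 3 * sigma)[0] + 1
    print(f"phi_hat = {phi_hat:.2f}; alarm times: {alarms[:5]}")
    # After the change point the residual mean reverts to (1 - phi) * shift,
    # one reason residual charts can be slow to flag sustained small shifts.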
Roger D. Vaughan; Melissa D. Begg
Journal of Educational and Behavioral Statistics, Vol. 24, No. 4. (Winter, 1999), pp. 367-383.
Stable URL: http://links.jstor.org/sici?sici=1076-9986%28199924%2924%3A4%3C367%3AMFTAOP%3E2.0.CO%3B2-%23
Abstract: In the evaluation of school-based intervention programs,
students' knowledge, behavior, and attitudes about a particular issue are
typically assessed before and after the intervention. The effectiveness of the
intervention can then be gauged by comparing these "pre-treatment"
and "post-treatment" responses. In the most rigorous evaluations,
students are randomized to the intervention or control group. However, instead
of randomizing individual students to treatment, most school-based studies rely
on clustered randomization schemes. This is generally operationalized as the
assignment of schools to treatment condition, although students within schools
serve as the units of observation. Because there tends to be positive
correlation between responses from students at the same school, the assumption
of statistical independence is violated; hence, application of statistical
tests that ignore this correlation can result in biased significance levels.
This paper explores two statistical methods that account for this correlation
in analyzing binary data. A proposal for adapting these methods for application
to matched pairs data is presented. The performance of the methods is evaluated
via simulation study.
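A sketch of one common correction in this family, deflating the usual
chi-square statistic by a design effect of 1 + (m - 1) * ICC (the counts,
cluster size, and ICC below are invented, and the paper's two methods may
differ in detail):

    import numpy as np
    from scipy.stats import chi2

    m, icc = 50, 0.02                 # students per school, assumed ICC
    successes = np.array([310, 255])  # intervention, control
    n = np.array([500, 500])          # 10 schools of 50 students per arm

    p = successes / n
    pooled = successes.sum() / n.sum()
    x2 = (p[0] - p[1]) ** 2 / (pooled * (1 - pooled) * (1 / n).sum())
    x2_adj = x2 / (1 + (m - 1) * icc)     # design-effect adjustment
    print(f"naive X2 = {x2:.2f}, adjusted X2 = {x2_adj:.2f}, "
          f"adjusted p = {chi2.sf(x2_adj, df=1):.4f}")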
Howard Wainer
Journal of Educational and Behavioral Statistics, Vol. 22, No. 1. (Spring, 1997), pp. 1-30.
Stable URL: http://links.jstor.org/sici?sici=1076-9986%28199721%2922%3A1%3C1%3AITDWNT%3E2.0.CO%3B2-L
Abstract: The modern world is rich with data; an inability to effectively
utilize these data is a real handicap. One common mode of data communication is
the printed data table. In this article we provide four guidelines, the use of
which can make tables more effective and evocative data displays. We use the
National Assessment of Educational Progress both to provide inspiration for the
development of these guidelines and to illustrate their operation. We also
discuss a theoretical structure to aid in the development of test items to tap
students' proficiency in extracting information from tables.
Shirley Pledger; Leigh Bullen
Biometrics, Vol. 54, No. 1. (Mar., 1998), pp. 61-66.
Stable URL: http://links.jstor.org/sici?sici=0006-341X%28199803%2954%3A1%3C61%3ATFMANF%3E2.0.CO%3B2-U
Abstract: In a study of little blue penguins (Eudyptula minor) in
Clint W. Coakley; Mark A. Heise
Biometrics, Vol. 52, No. 4. (Dec., 1996), pp. 1242-1251.
Stable URL: http://links.jstor.org/sici?sici=0006-341X%28199612%2952%3A4%3C1242%3AVOTSTI%3E2.0.CO%3B2-X
Abstract: The question of how to treat tied observations is often a problem
in nonparametric tests. In this article the many different procedures that have
been suggested for dealing with ties in the sign test are reviewed and
compared. We discuss a new version of the sign test that is especially
well-suited for situations in which the probability of an observation equalling
the hypothesized median is known to be very high. This test is based on the
test statistic formed by adding the number of positive observations and
two-thirds of the number of zeroes. Studies of the power and significance level
of many tests that have been proposed for this problem indicate that the
asymptotic uniformly most powerful (UMP) test performs very well for almost all
settings considered, even when the sample size is small. This asymptotic UMP
test, which has been little used to date, is recommended for use in practice. A
particular application of the use of the sign test in the study of fluctuating
asymmetry of meristic traits in ornamental goldfish is presented.
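The modified statistic described above, the number of positive observations
plus two-thirds of the zeroes, is easy to compute; a sketch with a simulated
null distribution (the data and the convention of holding the tie count
fixed are illustrative):

    import numpy as np

    def modified_sign_test(x, median0, n_sim=100_000, seed=6):
        """One-sided test of H0: median = median0 via T = n(+) + (2/3) n(0)."""
        d = np.asarray(x) - median0
        n_pos, n_zero = int((d > 0).sum()), int((d == 0).sum())
        n_nonzero = int((d != 0).sum())
        t_obs = n_pos + 2.0 / 3.0 * n_zero
        # Under H0 each nonzero difference is positive with probability 1/2.
        rng = np.random.default_rng(seed)
        t_null = rng.binomial(n_nonzero, 0.5, n_sim) + 2.0 / 3.0 * n_zero
        return t_obs, float((t_null >= t_obs).mean())

    x = [5, 5, 5, 6, 6, 6, 6, 7, 7, 5, 6, 6]   # meristic-style counts
    print(modified_sign_test(x, median0=5))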
K. G. Russell
Biometrics, Vol. 34, No. 1. (Mar., 1978), pp. 95-99.
Stable URL: http://links.jstor.org/sici?sici=0006-341X%28197803%2934%3A1%3C95%3AEOTPOT%3E2.0.CO%3B2-T
Abstract: Moment estimators of the parameters of the Thomas distribution
are set out, and the maximum likelihood estimators (MLE's) of these parameters
are derived. A new result is obtained which shows that the MLE of the Thomas
population density is the sample mean. This result is used to obtain a
one-variable Newton-Raphson procedure that provides estimates of the
distribution parameters with less computation than the usual two-variable
procedure.
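A sketch of the one-parameter reduction this result allows: since the MLE of
the population density m(1 + lambda) equals the sample mean, one can set
m = xbar / (1 + lambda) and maximize over lambda alone (a generic
one-dimensional optimizer stands in here for the Newton-Raphson step; the
quadrat counts are invented):

    import numpy as np
    from scipy.optimize import minimize_scalar
    from scipy.stats import poisson

    def thomas_pmf(x, m, lam):
        """P(X = x) when clusters ~ Poisson(m), cluster size = 1 + Poisson(lam)."""
        j = np.arange(0, x + 1)                 # possible cluster counts
        return np.sum(poisson.pmf(j, m) * poisson.pmf(x - j, j * lam))

    counts = np.array([0, 0, 1, 2, 0, 3, 0, 1, 5, 0, 2, 0, 0, 4, 1])
    xbar = counts.mean()

    def negloglik(lam):
        m = xbar / (1 + lam)                    # density MLE fixed at xbar
        return -sum(np.log(thomas_pmf(x, m, lam)) for x in counts)

    lam_hat = minimize_scalar(negloglik, bounds=(1e-6, 20), method="bounded").x
    print(f"lambda_hat = {lam_hat:.3f}, m_hat = {xbar / (1 + lam_hat):.3f}")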
Ann R. Miller
Journal of the American Statistical Association, Vol. 71, No. 354. (Jun., 1976), pp. 286-292.
Stable URL: http://links.jstor.org/sici?sici=0162-1459%28197606%2971%3A354%3C286%3ARDOWSI%3E2.0.CO%3B2-C
Abstract: Data from the 1970 census on work status of respondents in April
1965 are compared with those from the April 1965 Current Population Survey
(CPS) for specific age-sex cohorts to gain some insight into the reliability of
the retrospective data collected in 1970. The results indicate that on a net
difference basis, the retrospective inquiry yielded reasonable approximations
for most cohorts. The problem of gross versus net differences is reviewed
briefly with respect to its implications for analyzing retrospective data.