The course covers the essential concepts of probability and statistics, including the design and analysis of experiments, or trials. It deals with a range of popular statistical techniques, from classical tests based on the normal distribution to non-parametric methods. The emphasis is on the practical use of statistics to improve data analysis and enhance decision-making.
There are no particular prerequisites, although participants must be numerate. They will normally be educated to degree level and have A-Level mathematics, or equivalent.
The course includes:
Summarising Data and Exploratory Data Analysis:
Principles of good data analysis; the collection, analysis
and interpretation of data; different types of data. Ways of summarising data -
the mean, median and mode as measures of location; the standard deviation,
variance, mean absolute deviation and inter-quartile range as measures of
dispersion. Plotting data - scatter plots, bar charts, histograms, relative
frequency and cumulative relative frequency distributions. Exploratory data
analysis, including initial inspection of the data, outliers and
box-and-whisker plots.
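As a small illustration of the summary measures listed above, the following Python sketch computes them with the standard library. The sample values are made up for illustration, the helper name summarise is hypothetical, and the quartiles use the statistics module's default ("exclusive") method, which is only one of several conventions.

```python
import statistics


def summarise(data):
    """Return common measures of location and dispersion for a sample."""
    # Quartiles using the module's default 'exclusive' method (an assumption;
    # other quartile conventions give slightly different values).
    q1, _, q3 = statistics.quantiles(data, n=4)
    mean = statistics.mean(data)
    return {
        "mean": mean,
        "median": statistics.median(data),
        "mode": statistics.mode(data),
        "variance": statistics.variance(data),  # sample variance (n - 1 divisor)
        "std_dev": statistics.stdev(data),      # sample standard deviation
        "mean_abs_dev": sum(abs(x - mean) for x in data) / len(data),
        "iqr": q3 - q1,                         # inter-quartile range
    }


sample = [2, 4, 4, 5, 7, 9, 11]  # made-up illustrative data
summary = summarise(sample)
for name, value in summary.items():
    print(f"{name}: {value:.3f}")
```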
Probability:
Basic probability theory,
recognising mutual exclusivity, independence and dependence of events;
combining the probabilities of such events and using probability trees;
conditional probabilities and Bayes' Theorem; calculating permutations and
combinations.
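A minimal Python sketch of two of the calculations named above: a conditional probability obtained from Bayes' Theorem, and counts of permutations and combinations. The screening-test figures (1% prevalence, 95% sensitivity, 5% false-positive rate) and the function name bayes_posterior are assumptions chosen purely for illustration.

```python
from math import comb, perm


def bayes_posterior(prior, sensitivity, false_positive_rate):
    """P(condition | positive result) via Bayes' Theorem."""
    # Total probability of a positive result, over both branches of the tree.
    p_positive = sensitivity * prior + false_positive_rate * (1 - prior)
    return sensitivity * prior / p_positive


# Hypothetical figures, for illustration only.
posterior = bayes_posterior(prior=0.01, sensitivity=0.95, false_positive_rate=0.05)
print(f"P(condition | positive) = {posterior:.3f}")

arrangements = perm(5, 2)  # ordered selections of 2 items from 5
selections = comb(5, 2)    # unordered selections of 2 items from 5
print(arrangements, selections)
```

With a low prior, the posterior stays well below the test's sensitivity, which is the usual point of such an example.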
Probability Distributions:
Calculations for the
discrete binomial and Poisson distributions and for the continuous uniform,
normal and exponential distributions; possible uses of these and other
probability distributions; approximations to certain probability
distributions.
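One possible illustration of the calculations and approximations mentioned above is sketched below in Python: probability mass functions for the binomial and Poisson distributions, and a normal approximation to a binomial tail probability with a continuity correction. The parameter values (n = 100, p = 0.5) and the function names are assumptions chosen for illustration.

```python
from math import comb, exp, factorial, sqrt
from statistics import NormalDist


def binomial_pmf(k, n, p):
    """P(X = k) for X ~ Binomial(n, p)."""
    return comb(n, k) * p**k * (1 - p) ** (n - k)


def poisson_pmf(k, lam):
    """P(X = k) for X ~ Poisson(lam)."""
    return exp(-lam) * lam**k / factorial(k)


n, p = 100, 0.5  # illustrative parameters

# Exact P(X <= 55) from the binomial distribution.
exact = sum(binomial_pmf(k, n, p) for k in range(0, 56))

# Normal approximation with the same mean and variance, using a
# continuity correction (55 becomes 55.5).
approx = NormalDist(mu=n * p, sigma=sqrt(n * p * (1 - p))).cdf(55.5)

print(f"exact = {exact:.4f}, normal approximation = {approx:.4f}")
```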
Confidence Intervals and Significance Tests for Large
and Small Samples:
Sampling distributions
and the Central Limit Theorem; constructing confidence intervals for the
population mean given a large or small sample; performing a significance test
given a large or small sample, calculating the significance of the result (the
p-value), Type I and Type II errors and the power of a test, calculating the
minimum sample size needed to achieve a minimum level of precision. Two-sample
tests for large and small samples - comparing two sample means in order to look
for differences in two population means.
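The large-sample versions of these calculations can be sketched in a few lines of Python. The sketch assumes the normal approximation is adequate, so it does not cover the small-sample case based on the t-distribution. The data values and the function names are made up for illustration.

```python
from math import ceil, sqrt
from statistics import NormalDist, mean, stdev


def z_confidence_interval(data, confidence=0.95):
    """Large-sample confidence interval for the population mean."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    m = mean(data)
    half_width = z * stdev(data) / sqrt(len(data))
    return m - half_width, m + half_width


def two_sided_p_value(sample_mean, mu0, sigma, n):
    """p-value for H0: mean = mu0 against a two-sided alternative."""
    z = (sample_mean - mu0) / (sigma / sqrt(n))
    return 2 * (1 - NormalDist().cdf(abs(z)))


def min_sample_size(sigma, margin, confidence=0.95):
    """Smallest n giving a confidence-interval half-width of at most margin."""
    z = NormalDist().inv_cdf(0.5 + confidence / 2)
    return ceil((z * sigma / margin) ** 2)


# Made-up measurements, repeated to mimic a sample of 40 observations.
weights = [49.2, 50.1, 50.8, 49.5, 51.0, 50.3, 49.9, 50.6] * 5
low, high = z_confidence_interval(weights)
p_value = two_sided_p_value(sample_mean=52, mu0=50, sigma=10, n=100)
n_needed = min_sample_size(sigma=10, margin=2)
print(f"95% CI: ({low:.2f}, {high:.2f}); p = {p_value:.4f}; n >= {n_needed}")
```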
Tests of Consistency and Goodness of Fit:
Use of the F-distribution and the F-test to compare the consistency of samples (i.e.
their variances), and to construct confidence intervals for standard
deviations. Use of the chi-squared distribution to test the goodness of fit of
a data set to a particular distribution, and to test for the independence of
categorical variables in contingency tables. The Kolmogorov-Smirnov test for
the goodness of fit of a data set to a continuous distribution.
Non-Parametric Methods:
The nature of
non-parametric tests; the one-sample and two-sample sign tests; ranking methods
such as the Wilcoxon signed rank test for paired data and the
Mann-Whitney U test for unpaired data; computer-intensive methods such as Monte
Carlo generation of test statistics.
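As one example of a computer-intensive method, the Python sketch below runs a Monte Carlo permutation test for a difference between two group means. It is an illustration under stated assumptions, not necessarily the procedure taught on the course: the test statistic (absolute difference in means), the number of resamples, the fixed seed and the data are all choices made here for the example.

```python
import random
from statistics import mean


def permutation_test(a, b, n_resamples=2000, seed=0):
    """Estimate a two-sided p-value by randomly reshuffling group labels."""
    rng = random.Random(seed)  # fixed seed so the result is reproducible
    observed = abs(mean(a) - mean(b))
    pooled = list(a) + list(b)
    at_least_as_extreme = 0
    for _ in range(n_resamples):
        rng.shuffle(pooled)
        diff = abs(mean(pooled[: len(a)]) - mean(pooled[len(a):]))
        if diff >= observed:
            at_least_as_extreme += 1
    # Add-one adjustment so the estimated p-value is never exactly zero.
    return (at_least_as_extreme + 1) / (n_resamples + 1)


group_a = [1, 2, 3, 4, 5, 6, 7, 8]          # made-up data
group_b = [11, 12, 13, 14, 15, 16, 17, 18]  # made-up data
p_separated = permutation_test(group_a, group_b)
p_identical = permutation_test(group_a, group_a)
print(p_separated, p_identical)
```

No distributional assumption is needed here, which is what links this approach to the non-parametric tests listed above.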
Introduction to Experimental/Trials Design:
The principles of good
experimental design, including randomization, blocking and replication. Factors
and levels, main effects and interactions. Particular trial designs:
completely randomized, randomized block, complete factorial, 2^n, Latin square
and fractional factorials. Techniques to analyse the results from such
trials.
The Analysis of Variance and Regression:
Comparing several sample means -
how ANOVA works; one-way and two-way ANOVA; testing for significant main
effects and interaction effects; multiple comparison follow-up tests such as
the Least Significant Difference test and Tukey's test; ANOVA for the Latin
square. The principles of linear regression - calculating the best
straight-line fit to a data set; confidence intervals for the slope and the
intercept, and testing them for significance; extensions to curvilinear
regression and multiple regression; the coefficient of determination and the
correlation coefficient as measures of fit.
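The best straight-line fit mentioned above is usually obtained by least squares. A minimal Python sketch, assuming ordinary least squares with a single explanatory variable, is given below; the function name fit_line and the data are made up for illustration.

```python
def fit_line(x, y):
    """Ordinary least-squares fit of y = intercept + slope * x."""
    n = len(x)
    x_bar, y_bar = sum(x) / n, sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    syy = sum((yi - y_bar) ** 2 for yi in y)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    # Coefficient of determination; for a single explanatory variable it
    # equals the square of the correlation coefficient.
    r_squared = sxy**2 / (sxx * syy)
    return slope, intercept, r_squared


x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]  # made-up, roughly linear data
slope, intercept, r_squared = fit_line(x, y)
print(f"y = {intercept:.2f} + {slope:.2f} x, R^2 = {r_squared:.3f}")
```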
Statistics and Simulation:
How to generate representative
random samples from probability distributions; methods for reducing variance in
simulation output estimators and/or reducing the number of replications
required. Tests on proportions with one or two samples, e.g. comparing the
proportions of successful replications achieved by different strategies in a
simulation experiment.
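A two-sample test on proportions of the kind described above might be sketched as follows in Python. The sketch assumes the large-sample z-test with a pooled proportion under the null hypothesis; the replication counts are invented and the function name is hypothetical.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z_test(successes_a, n_a, successes_b, n_b):
    """Two-sided large-sample z-test of H0: the two proportions are equal."""
    p_a, p_b = successes_a / n_a, successes_b / n_b
    # Pooled estimate of the common proportion under H0.
    pooled = (successes_a + successes_b) / (n_a + n_b)
    std_error = sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / std_error
    p_value = 2 * (1 - NormalDist().cdf(abs(z)))
    return z, p_value


# Invented counts: strategy A succeeds in 60 of 100 replications,
# strategy B in 45 of 100.
z, p_value = two_proportion_z_test(60, 100, 45, 100)
print(f"z = {z:.3f}, p = {p_value:.4f}")
```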
Practicals:
These will involve the use of commercially
available statistical software. The aim of the practicals is to reinforce
students' understanding of the underlying principles behind the techniques, and
to make them aware of what a typical statistical software package has
to offer.
The course lectures will be given by the teaching and research staff of the Applied Mathematics and Operational Research Group under the direction of Dr T J Ringrose with the assistance of other colleagues. External speakers may give lectures on specialist topics.