This repository contains a collection of datasets for teaching R. Although it was created specifically for the Open Reproducible Science in R lecture series, the data is open for anyone to access and use, and is distributed under a Creative Commons Zero 1.0 Universal (CC0 1.0) license.
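
As a minimal sketch of how the files listed in the table below might be loaded into R, the helper here picks a base R reader from the file extension. The function name `read_teaching_data` is hypothetical and not part of this repository. It also assumes that the `.dat` files are whitespace-delimited text with a header row and that the `.csv` files are comma-separated with a header row; check a file in a text editor first if unsure.

```r
# Hypothetical helper (not part of the repository): choose a base R reader
# from the file extension.
# Assumptions: .dat files are whitespace-delimited text with a header row,
# and .csv files are comma-separated with a header row.
read_teaching_data <- function(path) {
  ext <- tolower(tools::file_ext(path))
  switch(
    ext,
    dat = read.table(path, header = TRUE),
    csv = read.csv(path),
    stop("Unsupported file type: ", ext)
  )
}

# Usage, assuming the working directory is the root of a local copy
# of the repository.
if (file.exists("fem.dat")) {
  fem <- read_teaching_data("fem.dat")
  str(fem)  # inspect variable names and types before any analysis
}
```

The `.xlsx` files are not handled by this sketch because base R has no Excel reader; an add-on package such as `readxl` would be one option.
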
File Name | File Type | File Description | Epidemiology/Statistics Usage |
---|---|---|---|
ba.dat | DAT | The dataset from Bland JM, Altman DG. Statistical Methods for Assessing Agreement Between Two Methods of Clinical Measurement. The Lancet. 1986;1: 307–310. | Useful for learning and practicing how to perform statistical methods to compare diagnostic tests using the Bland and Altman approach and the Bland and Altman plot |
bateman.dat | DAT | On Saturday, 21st April 1990, a luncheon was held in the home of Jean Bateman. There was a total of forty-five guests which included thirty-five members of the Department of Epidemiology and Population Sciences at the London School of Hygiene and Tropical Medicine. On Sunday morning, 22nd April 1990, Jean awoke with symptoms of gastrointestinal illness; her husband awoke with similar symptoms. The possibility of an outbreak related to the luncheon was strengthened when several of the guests telephoned Jean on Sunday and reported illness. On Monday, 23rd April 1990, there was an unusually large number of department members absent from work and reporting illness. Data from this outbreak is stored in this dataset. | Useful for learning and practicing how to perform logistic regression |
ca.dat | DAT | A dataset on the survival of cancer patients in two different treatment groups | Useful for learning and practicing how to perform survival analysis |
cover.dat | DAT | The dataset contains data from a coverage survey for a therapeutic feeding program (TFP) in central Malawi undertaken in March 2003. Data were collected using the centric systematic area sampling method to define sampling locations: A number of communities located closest to the centres of thirty 10 x 10 kilometre grid squares were sampled using active (investigative) case-finding. | Useful for learning and practicing how to analyse survey data and how to perform basic spatial analysis |
diets.dat | DAT | The dataset contains data from a trial of two different diets undertaken at an adult therapeutic feeding centre in Somalia. | Useful for learning and practicing statistical tests to show a difference in means between two groups |
fem.dat | DAT | A dataset from 118 female psychiatric patients | Useful for learning and practicing various statistical tests, linear regression, logistic regression, and linear modelling |
fem.xlsx | XLSX | A dataset from 118 female psychiatric patients | Useful for learning and practicing various statistical tests, linear regression, logistic regression, and linear modelling |
gudhiv.dat | DAT | This data is from a cross-sectional study of 435 male patients who presented with sexually transmitted infections at an outpatient clinic in The Gambia between August 1988 and June 1990. | Useful for learning and practicing logistic regression |
koko_plus_coverage.csv | CSV | Dataset from a coverage survey of Koko+ in Eastern Ghana | Useful for learning how to analyse survey data, perform basic spatial analysis, and perform comparative analysis for evaluating programme performance |
malaria.dat | DAT | A dataset that contains data on rainfall (in mm) and the number of cases of malaria reported from health centres in an administrative district of Ethiopia between July 1997 and July 1999 | Useful for learning and practicing time series analysis and plotting |
nut.dat | DAT | | Useful for learning and practicing how to analyse survey data |
octe.dat | DAT | This data is from a matched case-control study investigating the association between oral contraceptive use and thromboembolism. The cases are 175 women aged between 15 and 44 years admitted to hospital for thromboembolism and discharged alive. The controls are female patients admitted for conditions believed to be unrelated to oral contraceptive use. Cases and controls were matched on age, ethnic group, marital status, parity, income, place of residence, and date of hospitalisation. | Useful for learning and practicing how to perform analysis for a matched case-control study |
pop.dat | DAT | A dataset that contains data on the age (in months) and sex of 438 children aged between six and sixty months collected as part of a nutritional anthropometry survey of the Khosh Valley in Northeast Afghanistan. | Useful for learning and practicing how to create various plots including a population pyramid plot |
salex.dat | DAT | This data comes from a food-borne outbreak. On Saturday 17th October 1992, eighty-two people attended a buffet meal at a sports club. Within fourteen to twenty-four hours, fifty-one of the participants developed diarrhoea, with nausea, vomiting, abdominal pain and fever. | Useful for learning and practicing how to perform analysis for relative risk and odds ratios |
school_nutrition.csv | CSV | A dataset from a nutrition survey of school children 10 years and older from Pakistan. | Useful for learning how to analyse survey data |
school_nutrition.xlsx | XLSX | A dataset from a nutrition survey of school children 10 years and older from Pakistan. | Useful for learning how to analyse survey data |
south_wollo_coverage.csv | CSV | A dataset from a Community-based Management of Acute Malnutrition (CMAM) programme in South Wollo Zone, Ethiopia | Useful for learning how to analyse survey data and perform basic spatial analysis |
sssw.dat | DAT | This dataset contains data on the marital status, home circumstances, and ethnic group of 152 persons recruited into a study into the levels of stress experienced by student social workers in the United Kingdom. | Useful for learning how to analyse survey data and using various plots for exploratory data analysis |
tsstamp.dat | DAT | This data is from a matched case-control study investigating the association between the use of different brands of tampon and toxic shock syndrome undertaken during an outbreak. Only a subset of the original dataset is used here. | Useful for learning and practicing how to perform logistic regression and stratified analysis |
waste.dat | DAT | The dataset contains the locations of twenty-three recent cases of childhood cancer in a 5 by 5 km square surrounding an industrial waste disposal site. | Useful for learning and practicing computer simulation to test spatial clustering |
whz.dat | DAT |