From the CDC: "The second National Health and Nutrition Examination Survey, NHANES II, is a nationwide probability sample of 27,801 persons from 6 months - 74 years of age. From this sample, 25,286 people were interviewed and 20,322 people were examined, resulting in an overall response rate of 73 percent. Because children and persons classified as living at or below the poverty level were assumed to be at special risk of having nutritional problems, they were sampled at rates substantially higher than their proportions in the general population. Adjusted sampling weights were computed within 76 age-sex-income groups in order to inflate the sample to closely reflect the target population at the midpoint of the survey."

nhanes

Format

A data frame with 10351 rows and 58 variables. Key variables include:

height

height in inches

weight

weight in kilograms

sex

Male or Female

race

White, Black or Other

age

age in years

bpsystol

systolic blood pressure

heartatk

1 if the individual had a heart attack

diabetes

1 if the individual has diabetes

rural

1 if the individual lives in a rural county

...

Source

https://wwwn.cdc.gov/nchs/nhanes/nhanes2/default.aspx

Note

This data set is useful for teaching introductory topics. For teaching purposes it can be treated as a random sample of U.S. adults, although the survey oversampled persons living at or below the poverty level, so unweighted summaries only approximate the population. A number of the variables (e.g., weight, height) are approximately normally distributed. Regression is easier to teach with a model like lm(weight ~ height), because height can plausibly affect weight but not the other way round. For more recent NHANES data, see the R packages NHANES (https://cran.r-project.org/web/packages/NHANES/NHANES.pdf) and nhanesA (https://cran.r-project.org/web/packages/nhanesA/index.html).
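
Examples

The regression mentioned in the note can be sketched as follows. So that the snippet runs without the package attached, it fits the model to a simulated stand-in called demo, which is hypothetical and is not part of this data set. Its values are made up, and only the column names height and weight and their documented units are taken from the format above. With the package loaded, replacing demo with nhanes should give the corresponding fit on the real data.

```r
# Simulated stand-in for illustration only (NOT real NHANES II records).
# Assumption: columns are named `height` (inches) and `weight` (kilograms),
# as listed in the Format section. Swap `demo` for `nhanes` to use real data.
set.seed(1)
n <- 200
demo <- data.frame(height = rnorm(n, mean = 66, sd = 4))
demo$weight <- -60 + 2 * demo$height + rnorm(n, sd = 12)

# Simple linear regression of weight on height
fit <- lm(weight ~ height, data = demo)

# Estimated intercept and slope (kilograms per inch of height)
coef(fit)

# Proportion of the variance in weight explained by height
summary(fit)$r.squared

# Predicted weight for a hypothetical person who is 70 inches tall
predict(fit, newdata = data.frame(height = 70))
```

The slope is read as the expected difference in weight between two people whose heights differ by one unit.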