Survey Design for NHANES Data — nhanes

NHANES design objects are the data structure used in the cardioStatsUSA package for analysis of NHANES data.

Usage

nhanes_design(
  data,
  key,
  outcome_variable,
  outcome_quantiles = NULL,
  group_variable = NULL,
  group_cut_n = NULL,
  group_cut_type = NULL,
  stratify_variable = NULL,
  time_variable = "svy_year",
  time_values = NULL,
  pool = FALSE,
  run_checks = TRUE
)

Arguments

data

[data.frame] A set of NHANES data with one row per survey participant and one column per variable. See Details for specific requirements and nhanes_data for an example.

key

[data.frame] A data set with one row per variable and with columns that describe the variable. See Details for specific requirements and nhanes_key for an example.

outcome_variable

[character(1)] The name of the outcome variable to be summarized.

outcome_quantiles

[numeric(1+)] The quantiles to be summarized for a continuous outcome, given as probabilities between 0 and 1. The default is c(0.25, 0.50, 0.75). For example,

outcome_quantiles = c(.5) will compute the 50th percentile (i.e., the median).
outcome_quantiles = c(.25, .5, .75) will compute the 25th, 50th, and 75th percentiles.
outcome_quantiles = seq(.1, .9, by = .1) will compute every 10th percentile, except for the 0th and 100th.
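
To make these probabilities concrete, the base R sketch below computes the same three sets of quantiles on a small made-up vector. It is unweighted and only illustrates what each value of outcome_quantiles asks for; nhanes_design works with survey-weighted data, so its estimates will generally differ. The vector sbp is hypothetical and is not NHANES data.

```r
# Hypothetical systolic blood pressure readings (not NHANES data)
sbp <- c(100, 110, 120, 130, 140)

# The median only
q_median <- quantile(sbp, probs = c(.5))

# 25th, 50th, and 75th percentiles
q_quartiles <- quantile(sbp, probs = c(.25, .5, .75))

# Every 10th percentile from the 10th through the 90th
q_deciles <- quantile(sbp, probs = seq(.1, .9, by = .1))

print(q_quartiles)
```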

group_variable

[character(1)] The name of the group variable. See Details for a description of the group variable and the stratify variable.

group_cut_n

[integer(1)] The number of groups to form using the group variable. This is only relevant if the group variable is continuous and can be omitted otherwise. The default is 3.

group_cut_type

[character(1)] The method used to create groups with the group variable. This is only relevant if the group variable is continuous and can be omitted otherwise. Valid options are:

"interval": equal interval width, e.g., three groups with ages of 0 to <10, 10 to <20, and 20 to <30 years.
"frequency": equal frequency, e.g., three groups with ages of 0 to <q, q to <p, and p to <r, where q, p, and r are selected so that roughly the same number of people are in each group.

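The difference between the two group_cut_type options can be sketched in base R. This is an illustration of the general idea, not the package's internal code: the vector age is made up, and details such as which side of each interval is closed may differ from what nhanes_design does.

```r
# Hypothetical ages (not NHANES data); skewed so the two methods differ
age <- c(1, 2, 3, 4, 5, 6, 10, 20, 29)

# "interval"-style: three bins of equal width across the observed range
by_interval <- cut(age, breaks = 3)

# "frequency"-style: break at the tertiles so bins hold similar counts
tertiles <- quantile(age, probs = c(0, 1/3, 2/3, 1))
by_frequency <- cut(age, breaks = tertiles, include.lowest = TRUE)

# Compare how many observations land in each bin
print(table(by_interval))
print(table(by_frequency))
```

With skewed data like this, equal-width bins can end up nearly empty, while equal-frequency bins keep subgroup sizes balanced at the cost of uneven interval widths.
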
stratify_variable

[character(1)] The name of the stratify variable. See Details for a description of the group variable and the stratify variable.

time_variable

[character(1)] The name of the time variable. The default, svy_year, corresponds to the variable in nhanes_data that indicates the 2-year NHANES cycle in which an observation was collected.

time_values

[character(1+)] The time values that will be included in this design object. The default is to include all time values present in data. Valid options are:

'most_recent': includes the most recent time value.
'last_5': includes the 5 most recent time values.
'all': includes all time values present in data.
You can also give a vector of specific time values, e.g., c("2009-2010", "2011-2012", "2013-2014"), if these values are present in the time_variable column (they are for nhanes_data).

pool

[logical(1)] If FALSE (the default), results are presented separately for each time value. If TRUE, data from all included time values are pooled together. Note that only contiguous cycles should be pooled, e.g., using pool = TRUE with time_values = 'last_5' is okay, but using pool = TRUE with time_values = c("2009-2010", "2013-2014") is not recommended because the pooled result would be difficult to interpret.
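
The contiguity advice can be checked before pooling with a small base R helper. This is a sketch only: is_contiguous is a hypothetical function that is not part of cardioStatsUSA, and the cycles vector is an assumed, ordered list of cycle labels.

```r
# Hypothetical helper: TRUE when `chosen` is an unbroken run within `available`
is_contiguous <- function(chosen, available) {
  positions <- sort(match(chosen, available))
  # Unknown labels become NA and are dropped by sort(), so compare lengths
  length(positions) == length(chosen) && all(diff(positions) == 1)
}

# Example cycle labels, ordered from oldest to newest (assumed ordering)
cycles <- c("2009-2010", "2011-2012", "2013-2014", "2015-2016")

# An unbroken run of cycles: reasonable to pool
ok_to_pool <- is_contiguous(c("2009-2010", "2011-2012", "2013-2014"), cycles)

# A gap between cycles: pooling is not recommended
gap_in_cycles <- is_contiguous(c("2009-2010", "2013-2014"), cycles)

print(c(ok_to_pool = ok_to_pool, gap_in_cycles = gap_in_cycles))
```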

run_checks

[logical(1)]

If TRUE (the default), inputs will be checked for validity. If FALSE, checks of inputs are skipped.

Value

An nhanes_design object.

Details

Requirements for data

data should include, at a minimum, the following variables:

svy_weight_mec: mobile examination center weights
svy_psu: primary sampling unit
svy_strata: strata
svy_year: NHANES cycle
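
A quick way to confirm that a custom data set carries these columns is sketched below in base R. The helper missing_design_cols and the toy data frame are hypothetical and exist only for illustration; with run_checks = TRUE, nhanes_design performs its own input checks.

```r
# Design columns listed above
required_cols <- c("svy_weight_mec", "svy_psu", "svy_strata", "svy_year")

# Hypothetical helper: names of required columns absent from `data`
missing_design_cols <- function(data) {
  setdiff(required_cols, names(data))
}

# Toy data frame that lacks the strata column (values are made up)
toy <- data.frame(
  svy_weight_mec = c(1200.5, 980.2),
  svy_psu = c(1, 2),
  svy_year = c("2017-2020", "2017-2020")
)

print(missing_design_cols(toy))
```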

Requirements for key

The key data should have all of the required columns, and should not have any column that is not listed below.

Column name	Column type	Is this column required?	The purpose of this column
class	character	TRUE	divide variables into classes
variable	character	TRUE	variable name in NHANES data
label	character	TRUE	label to present in results
source	character	FALSE	indicate where this variable is from
type	character	TRUE	variable type impacts summary results
outcome	logical	TRUE	indicate if variable is an outcome
group	logical	TRUE	indicate if variable is a grouper
subset	logical	TRUE	indicate if variable is a subsetter
stratify	logical	TRUE	indicate if variable is a stratifier
module	character	FALSE	indicate what module this variable belongs to
description	character	TRUE	describe the variable in detail
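
The table above can be turned into a simple structural check for a custom key. The sketch below is base R only; check_key_columns and the toy key are hypothetical, and the class value "Blood pressure" is made up for the example. It reports required columns that are missing and columns that are not listed in the table.

```r
# Column specification copied from the table above
key_columns <- c("class", "variable", "label", "source", "type", "outcome",
                 "group", "subset", "stratify", "module", "description")
key_required <- c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
                  TRUE, TRUE, TRUE, FALSE, TRUE)

# Hypothetical helper: list missing required columns and unlisted columns
check_key_columns <- function(key) {
  list(
    missing = setdiff(key_columns[key_required], names(key)),
    extra = setdiff(names(key), key_columns)
  )
}

# Toy key with one variable; it omits `description` and adds `notes`
toy_key <- data.frame(
  class = "Blood pressure",
  variable = "bp_sys_mean",
  label = "Systolic blood pressure (SBP), mm Hg",
  type = "continuous",
  outcome = TRUE,
  group = FALSE,
  subset = FALSE,
  stratify = FALSE,
  notes = "not an allowed column"
)

key_problems <- check_key_columns(toy_key)
print(key_problems)
```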

Group and stratify

The group variable and the stratify variable are both used to summarize an outcome within subgroups. If the summary is returned as a table, then the group and stratify variables can be considered interchangeable. If the summary is returned as a plot, then the subgroups defined by the group variable are shown on the same graph, while the subgroups defined by the stratify variable are shown on different graphs (i.e., one graph per stratum). For example, if you wanted to estimate the race- and sex-specific prevalence of hypertension with one graph for men and one graph for women, you would use group_variable = 'demo_race' and stratify_variable = 'demo_gender'.
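
A design that uses both variables might be set up as in the sketch below. It reuses the outcome bp_sys_mean from the Examples section instead of a hypertension variable, and it assumes cardioStatsUSA is installed; the guard skips the call otherwise.

```r
# Assumes cardioStatsUSA is installed; the guard skips the call otherwise
if (requireNamespace("cardioStatsUSA", quietly = TRUE)) {
  library(cardioStatsUSA)

  # Group by race (shared graph), stratify by gender (one graph each)
  ds_by_race_sex <- nhanes_design(
    data = nhanes_data,
    key = nhanes_key,
    outcome_variable = "bp_sys_mean",
    group_variable = "demo_race",
    stratify_variable = "demo_gender",
    time_values = "most_recent"
  )

  print(ds_by_race_sex)
}
```

Swapping the two arguments leaves a tabular summary unchanged in content but, in a plot, would put men and women on the same graph with one graph per race group.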

Examples

library(cardioStatsUSA)

ds <- nhanes_design(data = nhanes_data,
                    key = nhanes_key, 
                    time_values = 'most_recent',
                    outcome_variable = 'bp_sys_mean')

print(ds)

## -------------------------------- NHANES design --------------------------------
## 
## Outcome variable: bp_sys_mean
## - label: Systolic blood pressure (SBP), mm Hg
## - type: continuous
## - description: Mean systolic blood pressure in mm Hg. This is based on the
##     average of up to 3 readings. Participants were required to have at least one
##     reading. Overall, >95% of participants with at least one systolic blood
##     pressure reading had three readings.  From 1999-2000 through 2015-2016,
##     systolic blood pressure was measured using a mercury sphygmomanometer.  In
##     2017-2020, systolic blood pressure was measured using an oscillometric device.
##     The systolic blood pressure in 2017-2020 was calibrated to the mercury device
##     by adding 1.5 mm Hg to the mean measured oscillometric value.
## 
## Group variable: None
## Stratify variable: None
## 
## N observations
## - Unweighted: 8,965
## - Weighted: 247,835,696
## --------------------------------------------------------------------------------