Survey Design for NHANES Data
nhanes_design.Rd
NHANES design objects are the data structure used in the cardioStatsUSA package for analysis of NHANES data.
Usage
nhanes_design(
  data,
  key,
  outcome_variable,
  outcome_quantiles = NULL,
  group_variable = NULL,
  group_cut_n = NULL,
  group_cut_type = NULL,
  stratify_variable = NULL,
  time_variable = "svy_year",
  time_values = NULL,
  pool = FALSE,
  run_checks = TRUE
)
Arguments
- data
[data.frame] A set of NHANES data with one row per survey participant and one column per variable. See Details for specific requirements. See nhanes_data for an example.
- key
[data.frame] A data set with one row per variable and with columns that describe the variable. See Details for specific requirements. See nhanes_key for an example.
- outcome_variable
[character(1)] The name of the outcome variable to be summarized.
- outcome_quantiles
[numeric(1+)] The quantiles to be summarized for a continuous outcome. The default is c(0.25, 0.50, 0.75). For example,
outcome_quantiles = c(.5) will compute the 50th percentile (i.e., the median).
outcome_quantiles = c(.25, .5, .75) will compute the 25th, 50th, and 75th percentiles.
outcome_quantiles = seq(.1, .9, by = .1) will compute every 10th percentile, except for the 0th and 100th.
- group_variable
[character(1)] The name of the group variable. See Details for a description of the group variable and the stratify variable.
- group_cut_n
[integer(1)] The number of groups to form using the group variable. This is only relevant if the group variable is continuous, and can be omitted. The default is 3.
- group_cut_type
[character(1)] The method used to create groups with the group variable. This is only relevant if the group variable is continuous, and can be omitted. Valid options are:
"interval": equal interval width, e.g., three groups with ages of 0 to <10, 10 to <20, and 20 to <30 years.
"frequency": equal frequency, e.g., three groups with ages of 0 to <q, q to <p, and p to <r, where q, p, and r are selected so that roughly the same number of people are in each group.
- stratify_variable
[character(1)] The name of the stratify variable. See Details for a description of the group variable and the stratify variable.
- time_variable
[character(1)] The name of the time variable. The default, svy_year, corresponds to the variable in nhanes_data that indicates which 2-year NHANES cycle an observation was collected in.
- time_values
[character(1+)] The time values that will be included in this design object. The default is to include all time values present in data. Valid options are:
'most_recent': includes the most recent time value.
'last_5': includes the 5 most recent time values.
'all': includes all time values present in data.
You can also give a vector of specific time values, e.g., c("2009-2010", "2011-2012", "2013-2014"), if these values are present in the time_variable column (they are for nhanes_data).
- pool
[logical(1)] If FALSE (the default), results are presented separately for each time value. If TRUE, data from all time values are pooled together. Note that only contiguous cycles should be pooled together, e.g., using pool = TRUE with time_values = 'last_5' is okay, but using pool = TRUE with time_values = c("2009-2010", "2013-2014") is not recommended (that would be a strange result to interpret).
- run_checks
[logical(1)] If TRUE (the default), inputs will be checked for validity. If FALSE, checks of inputs are skipped.
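The difference between the "interval" and "frequency" options for group_cut_type can be illustrated with base R alone. The sketch below is not the package's internal method; it only shows the general idea using cut() and quantile() on a small, made-up vector of ages (the vector and all object names are hypothetical).

```r
# Hypothetical ages; not taken from nhanes_data
age <- c(1, 3, 4, 6, 8, 12, 15, 21, 29)
n_groups <- 3

# "interval": breaks are equally spaced between the minimum and maximum
breaks_interval <- seq(min(age), max(age), length.out = n_groups + 1)
grp_interval <- cut(age, breaks = breaks_interval,
                    include.lowest = TRUE, right = FALSE)

# "frequency": breaks are quantiles, so group sizes are roughly equal
breaks_frequency <- quantile(age,
                             probs = seq(0, 1, length.out = n_groups + 1),
                             names = FALSE)
grp_frequency <- cut(age, breaks = breaks_frequency,
                     include.lowest = TRUE, right = FALSE)

# Compare how many observations fall in each group
interval_counts <- as.vector(table(grp_interval))
frequency_counts <- as.vector(table(grp_frequency))
print(interval_counts)
print(frequency_counts)
```

With equal-width intervals the group sizes depend on how the values are spread, whereas quantile-based breaks keep the groups about the same size. Note that this sketch is unweighted, while an analysis of survey data would normally account for the sampling weights.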
Details
Requirements for data
data should include, at a minimum, the following variables:
svy_weight_mec: mobile examination center weights
svy_psu: primary sampling unit
svy_strata: strata
svy_year: NHANES cycle
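Before building a design object from your own data, it may help to confirm that the four variables listed above are present. The following is a minimal base R sketch of such a check, assuming a made-up data frame (toy_data and its values are hypothetical); it is separate from whatever run_checks does internally.

```r
# Survey design variables that data is expected to contain (from the list above)
required_cols <- c("svy_weight_mec", "svy_psu", "svy_strata", "svy_year")

# Hypothetical data frame that is deliberately missing svy_year
toy_data <- data.frame(
  svy_weight_mec = c(1200.5, 980.2),
  svy_psu        = c(1, 2),
  svy_strata     = c(10, 10),
  bp_sys_mean    = c(118, 131)
)

# Names of required columns that are absent from the data
missing_cols <- setdiff(required_cols, names(toy_data))

if (length(missing_cols) > 0) {
  message("Missing required columns: ", paste(missing_cols, collapse = ", "))
}
```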
Requirements for key
The key data should have all of the required columns, and should not have any column that is not listed below.
Column name | Column type | Is this column required? | The purpose of this column |
class | character | TRUE | divide variables into classes |
variable | character | TRUE | variable name in NHANES data |
label | character | TRUE | label to present in results |
source | character | FALSE | indicate where this variable is from |
type | character | TRUE | variable type impacts summary results |
outcome | logical | TRUE | indicate if variable is an outcome |
group | logical | TRUE | indicate if variable is a grouper |
subset | logical | TRUE | indicate if variable is a subsetter |
stratify | logical | TRUE | indicate if variable is a stratifier |
module | character | FALSE | indicate what module this variable belongs to |
description | character | TRUE | describe the variable in detail |
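The table above can be turned into a simple check for a custom key. The sketch below uses base R only and assumes a made-up one-row key (toy_key, its values, and the extra notes column are hypothetical); it reports missing required columns, columns that are not in the table, and columns whose type differs from the table. It is an illustration, not the validation that the package performs.

```r
# Column specification copied from the table above
key_spec <- data.frame(
  column   = c("class", "variable", "label", "source", "type", "outcome",
               "group", "subset", "stratify", "module", "description"),
  type     = c("character", "character", "character", "character",
               "character", "logical", "logical", "logical", "logical",
               "character", "character"),
  required = c(TRUE, TRUE, TRUE, FALSE, TRUE, TRUE,
               TRUE, TRUE, TRUE, FALSE, TRUE),
  stringsAsFactors = FALSE
)

# Hypothetical key with one variable and one column (notes) that is not allowed
toy_key <- data.frame(
  class       = "Blood pressure",
  variable    = "bp_sys_mean",
  label       = "Systolic blood pressure (SBP), mm Hg",
  type        = "continuous",
  outcome     = TRUE,
  group       = FALSE,
  subset      = FALSE,
  stratify    = FALSE,
  description = "Mean systolic blood pressure in mm Hg.",
  notes       = "extra column",
  stringsAsFactors = FALSE
)

# Required columns that are absent, and columns not listed in the table
missing_required <- setdiff(key_spec$column[key_spec$required], names(toy_key))
unexpected <- setdiff(names(toy_key), key_spec$column)

# Compare the observed type of each recognized column with the expected type
known <- intersect(names(toy_key), key_spec$column)
observed_types <- vapply(toy_key[known], function(x) class(x)[1], character(1))
expected_types <- key_spec$type[match(known, key_spec$column)]
wrong_type <- known[observed_types != expected_types]

print(missing_required)
print(unexpected)
print(wrong_type)
```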
Group and stratify
The group variable and the stratify variable are both used to summarize
an outcome within subgroups. If the summary is returned as a table, then
the group and stratify variables can be considered interchangeable. If
the summary is returned as a plot, then the subgroups defined by the
group variable are shown on the same graph, while the subgroups defined
by the stratify variable are shown on different graphs (i.e., one graph
per stratum). For example, if you wanted to estimate the race- and
sex-specific prevalence of hypertension with one graph for men and one
graph for women, you would use group_variable = 'demo_race' and
stratify_variable = 'demo_gender'.
Examples
library(cardioStatsUSA)
ds <- nhanes_design(data = nhanes_data,
                    key = nhanes_key,
                    time_values = 'most_recent',
                    outcome_variable = 'bp_sys_mean')
print(ds)
## -------------------------------- NHANES design --------------------------------
##
## Outcome variable: bp_sys_mean
## - label: Systolic blood pressure (SBP), mm Hg
## - type: continuous
## - description: Mean systolic blood pressure in mm Hg. This is based on the
## average of up to 3 readings. Participants were required to have at least one
## reading. Overall, >95% of participants with at least one systolic blood
## pressure reading had three readings. From 1999-2000 through 2015-2016,
## systolic blood pressure was measured using a mercury sphygmomanometer. In
## 2017-2020, systolic blood pressure was measured using an oscillometric device.
## The systolic blood pressure in 2017-2020 was calibrated to the mercury device
## by adding 1.5 mm Hg to the mean measured oscillometric value.
##
## Group variable: None
## Stratify variable: None
##
## N observations
## - Unweighted: 8,965
## - Weighted: 247,835,696
## --------------------------------------------------------------------------------