Visualize summaries of NHANES
nhanes_visualize.Rd
Usage
nhanes_visualize(
data,
key,
outcome_variable,
outcome_quantiles = NULL,
outcome_stats = NULL,
group_variable = NULL,
group_cut_n = NULL,
group_cut_type = NULL,
stratify_variable = NULL,
time_variable = "svy_year",
time_values = NULL,
pool = FALSE,
subset_calls = list(),
standard_variable = "demo_age_cat",
standard_weights = NULL,
statistic_primary = NULL,
title = NULL,
geom = "bar",
reorder_cats = FALSE,
width = NULL,
height = NULL,
size_point = NULL,
size_error = NULL
)
Arguments
- data
[data.frame] A set of NHANES data with one row per survey participant and one column per variable. See Details for specific requirements and nhanes_data for an example.
- key
[data.frame] A data set with one row per variable and with columns that describe the variable. See Details for specific requirements and nhanes_key for an example.
- outcome_variable
[character(1)] The name of the outcome variable to be summarized.
- outcome_quantiles
[numeric(1+)] The quantiles to be summarized for a continuous outcome. The default is c(0.25, 0.50, 0.75). For example,
outcome_quantiles = c(.5) will compute the 50th percentile (i.e., the median).
outcome_quantiles = c(.25, .5, .75) will compute the 25th, 50th, and 75th percentiles.
outcome_quantiles = seq(.1, .9, by = .1) will compute every 10th percentile, except for the 0th and 100th.
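To make the three example vectors concrete, the sketch below applies them to a small made-up numeric vector with base R's quantile(). It is unweighted and only illustrates what each set of probabilities requests; it is not how nhanes_visualize computes its estimates, and the vector sbp is hypothetical.

```r
# Hypothetical outcome values (not NHANES data)
sbp <- c(102, 110, 115, 118, 121, 125, 130, 134, 142, 160)

# Median only
q_median <- quantile(sbp, probs = c(.5))

# 25th, 50th, and 75th percentiles
q_default <- quantile(sbp, probs = c(.25, .5, .75))

# Every 10th percentile from the 10th to the 90th
probs_decile <- seq(.1, .9, by = .1)
q_decile <- quantile(sbp, probs = probs_decile)

print(q_median)
print(q_default)
print(q_decile)
```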
- outcome_stats
[character(1+)]
The statistics that should be computed. Multiple statistics may be requested. Valid options depend on the type of outcome to be summarized. For continuous outcomes, valid options include
'mean': estimates the mean value of the outcome
'quantile': estimates the 25th, 50th, and 75th percentiles of the outcome.
For categorical outcomes, valid options include
'percentage': estimates the prevalence of the outcome
'percentage_kg': estimates the prevalence and uses Korn and Graubard's method to estimate a 95% confidence interval
'count': estimates the number of US adults with the outcome.
- group_variable
[character(1)] The name of the group variable. See Details for a description of the group variable and the stratify variable.
- group_cut_n
[integer(1)] The number of groups to form using the group variable. This is only relevant if the group variable is continuous, and can be omitted. The default is 3.
- group_cut_type
[character(1)] The method used to create groups with the group variable. This is only relevant if the group variable is continuous, and can be omitted. Valid options are:
"interval": equal interval width, e.g., three groups with ages of 0 to <10, 10 to <20, and 20 to <30 years.
"frequency": equal frequency, e.g., three groups with ages of 0 to <q, q to <p, and p to <r, where q, p, and r are selected so that roughly the same number of people are in each group.
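The difference between the two options can be sketched with base R's cut(). This is an illustration under assumptions, not the package's implementation: the age vector is made up, and the break points are built here with seq() and quantile().

```r
# Hypothetical ages (not NHANES data)
age <- c(0, 2, 3, 5, 6, 8, 9, 12, 19, 25, 28, 30)
n_groups <- 3

# "interval": break points are equally spaced over the range of the variable
breaks_interval <- seq(min(age), max(age), length.out = n_groups + 1)
group_interval <- cut(age, breaks = breaks_interval,
                      include.lowest = TRUE, right = FALSE)

# "frequency": break points are quantiles, so groups have similar sizes
breaks_frequency <- quantile(age, probs = seq(0, 1, length.out = n_groups + 1))
group_frequency <- cut(age, breaks = breaks_frequency,
                       include.lowest = TRUE, right = FALSE)

# Equal-width groups can be very unequal in size
print(table(group_interval))

# Equal-frequency groups hold roughly the same number of people
print(table(group_frequency))
```

With skewed data such as this, the interval groups have equal widths but uneven counts, while the frequency groups have uneven widths but even counts.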
- stratify_variable
[character(1)] The name of the stratify variable. See Details for a description of the group variable and the stratify variable.
- time_variable
[character(1)] The name of the time variable. The default, svy_year, corresponds to the variable in nhanes_data that indicates which 2-year NHANES cycle an observation was collected in.
- time_values
[character(1+)] The time values that will be included in this design object. The default is to include all time values present in data. Valid options are:
'most_recent': includes the most recent time value.
'last_5': includes the 5 most recent time values.
'all': includes all time values present in data.
You can also give a vector of specific time values, e.g., c("2009-2010", "2011-2012", "2013-2014"), if these values are present in the time_variable column (they are for nhanes_data).
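As a rough sketch of what the three keyword options select, the base R code below picks time values from a made-up character vector of survey cycles. The vector and the helper names (time_all, time_most_recent, time_last_5) are hypothetical, and the package may resolve these keywords differently.

```r
# Hypothetical time variable with repeated, unordered cycle labels
svy_year <- c("2011-2012", "2005-2006", "2013-2014", "2007-2008",
              "2009-2010", "2015-2016", "2011-2012")

# 'all': every distinct time value present in the data, in order
time_all <- sort(unique(svy_year))

# 'most_recent': the latest time value
time_most_recent <- tail(time_all, 1)

# 'last_5': the five latest time values
time_last_5 <- tail(time_all, 5)

print(time_all)
print(time_most_recent)
print(time_last_5)
```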
- pool
[logical(1)] If FALSE (the default), results are presented separately for each time value. If TRUE, data from each time value are pooled together. Note that only contiguous cycles should be pooled together, e.g., using pool = TRUE with time_values = 'last_5' is okay, but using pool = TRUE with time_values = c("2009-2010", "2013-2014") is not recommended (that would be a strange result to interpret).
- subset_calls
[named list(n)]
The names of subset_calls are variable names, and the values are values of the variable to include in the subsetted data. For example, subset_calls = list("demo_gender" = "Women") will subset the data to include rows where demo_gender is equal to "Women". Multiple entries are allowed and collapsed with the logical & operator. For example, subset_calls = list(demo_gender = "Women", bp_med_use = "Yes") will subset the data to include rows where demo_gender is equal to "Women" AND bp_med_use is equal to "Yes".
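The way multiple entries combine can be sketched in base R. This is a minimal illustration of the & logic described above, assuming a small made-up data frame; the objects dat, keep_each, and keep are hypothetical and do not show the package's internal code.

```r
# Hypothetical data using the variable names from the examples above
dat <- data.frame(
  demo_gender = c("Women", "Men", "Women", "Women", "Men"),
  bp_med_use  = c("Yes", "Yes", "No", "Yes", "No"),
  stringsAsFactors = FALSE
)

subset_calls <- list(demo_gender = "Women", bp_med_use = "Yes")

# One logical vector per entry: TRUE where the variable takes an included value
keep_each <- Map(
  function(variable, values) dat[[variable]] %in% values,
  names(subset_calls),
  subset_calls
)

# Collapse the per-variable conditions with the logical & operator
keep <- Reduce(`&`, keep_each)

# Rows that satisfy every condition
dat_subset <- dat[keep, , drop = FALSE]
print(dat_subset)
```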
- standard_variable
[character(1)]
The name of the variable used to create standardization groups. The default is to use demo_age_cat, which leads to age standardization.
- standard_weights
[numeric(n)]
The proportionate weights for each group defined by the standard variable. The number of weights should equal the number of groups defined by standard_variable, and all weights must be >0.
- statistic_primary
[character(1)]
The statistic that defines the geometric objects in the plot. Other statistics will be featured in the text that appears when the user's mouse hovers over the corresponding object.
- title
[character(1)]
The title that will appear above the plot. If this is not supplied, the title will be generated using the key data in x.
- geom
[character(1)]
The type of figure that will be made. Valid options are:
'bar' creates a bar plot with annotations on the bars.
'point' creates a scatter plot with 95% confidence interval error bars.
- reorder_cats
[logical(1)]
Whether to re-order the categorical group variable so that its levels are shown in increasing order by the expected outcome.
- width
[numeric(1)]
The width of the plot, in pixels.
- height
[numeric(1)]
The height of the plot, in pixels.
- size_point
[numeric(1)]
The size of points in the plot (only relevant if geom = 'point').
- size_error
[numeric(1)]
The size of error bars in the plot (only relevant if geom = 'point').