Replicating analysis from the NCHS data brief, 2017
replicate_nchs_2017.Rmd
This article shows how cardioStatsUSA
can replicate
results from a report from the National Center for Health Statistics
(NCHS) data brief. The results can be replicated with our shiny
application or with code.
Replicate with shiny application
The video below shows how Figure 1 can be replicated using our web application.
The video below shows how Figure 2 can be replicated using our web application.
The code used in the video to run the application with a different age variable is provided below for interested users:
library(cardioStatsUSA)
library(data.table)
library(magrittr)
# Make a copy of the NHANES data
nhanes_data_test <- copy(nhanes_data)
# modify the age groups prior to running the app
nhanes_data_test[
, demo_age_cat := cut(demo_age_years,
breaks = c(18, 39, 59, Inf),
labels = c("18-39", "40-59", "60+"),
include.lowest = TRUE)
]
#' Include the same cohort as the CDC report
nhanes_data_test <- nhanes_data_test %>%
# exclude pregnant women
.[demo_pregnant == 'No' | is.na(demo_pregnant)] %>%
# exclude participants missing both SBP and DBP
.[!(is.na(bp_dia_mean) & is.na(bp_sys_mean))]
# don't include cholesterol module (cholesterol is in development)
nhanes_key_test <- nhanes_key[module != 'chol']
# important: if 'module' is included as a column in your key,
# then the app will automatically subset your cohort to include
# NHANES participants in the module that your outcome variable is in.
# We don't want to do that here, since the cohort analyzed by the CDC
# didn't have the same exclusion criteria as the cohort we analyze by
# default in the app. So, the simple fix is to delete the module column:
nhanes_key_test$module <- NULL
app_run(nhanes_data = nhanes_data_test,
nhanes_key = nhanes_key_test)
Replicate with code
The code in this section does exactly what the videos above do, and provides a glimpse of the R code that is used on the back-end of the shiny application.
We modify our demo_age_cat
variable so that it has the
same groups that are analyzed in the CDC report.
# Make a copy here so that we don't modify the NHANES data
# (doing so would break tests that are run after this one)
nhanes_data_test <- copy(nhanes_data)
# match the age groups of CDC report
nhanes_data_test[
, demo_age_cat := cut(demo_age_years,
breaks = c(18, 39, 59, Inf),
labels = c("18-39", "40-59", "60+"),
include.lowest = TRUE)
]
Who is included in this test
For these tests, we include the same cohort as the CDC report. Full details on the inclusion criteria are given in the paper.
nhanes_data_test <- nhanes_data_test %>%
# exclude pregnant women
.[demo_pregnant == 'No' | is.na(demo_pregnant)] %>%
# exclude participants missing both SBP and DBP
.[!(is.na(bp_dia_mean) & is.na(bp_sys_mean))]
In most cases, we use nhanes_summarize()
or
nhanes_visualize()
when generating results with
cardioStatsUSA
. Here, we use the nhanes_design
family of functions. NHANES design functions are lower level functions
that support nhanes_summarize
and
nhanes_visualize
. Full details on this family can be found
at the help page for [nhanes_design].
ds <- nhanes_data_test %>%
nhanes_design(
key = nhanes_key,
outcome_variable = 'htn_jnc7',
group_variable = 'demo_age_cat',
time_values = '2015-2016'
)
We create a separate NHANES design object that will use age adjustment via direct standardization, copying the standard weights reported by the CDC
ds_standard <- ds %>%
nhanes_design_standardize(
standard_variable = 'demo_age_cat',
standard_weights = c(0.420263, 0.357202, 0.222535)
)
Our first test: do we have the same sample size as the CDC report?
expect_equal(
5504, # this is the sample size reported by CDC
nrow(ds$design$variables)
)
Test results without age adjustment
This test compares our answers versus Figure 1 of the CDC report. We begin by loading the data from Figure 1:
test_data <- cdc_db289_figure_1 %>%
setnames(old = c('estimate', 'std_error'),
new = c('cdc_estimate', 'cdc_std_error'))
Now we summarize our design object (ds
) with various
updates to create results by age group and by sex within age groups
shiny_answers_by_age <- ds %>%
nhanes_design_summarize(outcome_stats = 'percentage',
simplify_output = TRUE) %>%
.[htn_jnc7 == 'Yes'] %>%
.[, demo_gender := 'Overall']
shiny_answers_by_age_sex <- ds %>%
# this adds a stratify variable to the already existing group variable (age)
# to make results stratified by age and sex groups.
nhanes_design_update(stratify_variable = 'demo_gender') %>%
nhanes_design_summarize(outcome_stats = 'percentage',
simplify_output = TRUE) %>%
.[htn_jnc7 == 'Yes']
Putting all our answers together:
shiny_answers <- list(shiny_answers_by_age,
shiny_answers_by_age_sex) %>%
rbindlist(use.names = TRUE) %>%
.[, .(demo_gender,
demo_age_cat,
shiny_estimate = round(estimate,1),
shiny_std_error = round(std_error,1))]
Putting everything together:
test_results <- test_data %>%
merge(shiny_answers, by = c("demo_gender", "demo_age_cat"))
test_results
#> Key: <demo_gender, demo_age_cat>
#> demo_gender demo_age_cat cdc_estimate cdc_std_error shiny_estimate
#> <char> <char> <num> <num> <num>
#> 1: Men 18-39 9.2 1.4 9.2
#> 2: Men 40-59 37.2 2.9 37.2
#> 3: Men 60+ 58.5 2.2 58.5
#> 4: Overall 18-39 7.5 1.0 7.5
#> 5: Overall 40-59 33.2 1.7 33.2
#> 6: Overall 60+ 63.1 2.1 63.1
#> 7: Women 18-39 5.6 1.1 5.6
#> 8: Women 40-59 29.4 2.0 29.4
#> 9: Women 60+ 66.8 2.6 66.8
#> shiny_std_error
#> <num>
#> 1: 1.4
#> 2: 2.9
#> 3: 2.2
#> 4: 1.0
#> 5: 1.7
#> 6: 2.1
#> 7: 1.1
#> 8: 2.0
#> 9: 2.6
In the following tests, we will compare our application’s estimates to corresponding estimates reported by the CDC. To ensure all of these differences are exactly 0, we set the tolerance values to 0
est_diff_tolerance <- 0
se_diff_tolerance <- 0
Now we can formally check for equality:
test_that(
'cardioStatsUSA matches CDC report, Figure 1',
code = {
expect_equal(test_results$cdc_estimate,
test_results$shiny_estimate,
tolerance = est_diff_tolerance)
expect_equal(test_results$cdc_std_error,
test_results$shiny_std_error,
tolerance = se_diff_tolerance)
}
)
#> Test passed
Test results with age adjustment
This test compares our answers versus Figure 2 of the CDC report. We begin by loading the data from Figure 2:
test_data <- cdc_db289_figure_2 %>%
# sync levels of the race variable
.[, demo_race := factor(demo_race,
levels = levels(nhanes_data$demo_race))] %>%
setnames(old = c('estimate', 'std_error'),
new = c('cdc_estimate', 'cdc_std_error'))
using our standardized NHANES design, we create results in subgroups defined by race and race by sex.
shiny_answers_total <- ds_standard %>%
nhanes_design_update(group_variable = 'demo_race') %>%
nhanes_design_summarize(outcome_stats = 'percentage',
simplify_output = TRUE) %>%
as.data.table() %>%
.[, demo_gender := 'Total']
shiny_answers_by_gender <- ds_standard %>%
nhanes_design_update(group_variable = 'demo_race',
stratify_variable = 'demo_gender') %>%
nhanes_design_summarize(outcome_stats = 'percentage',
simplify_output = TRUE)
Putting all of our answers together:
shiny_answers <-
list(shiny_answers_total,
shiny_answers_by_gender) %>%
rbindlist(use.names = TRUE) %>%
.[htn_jnc7 == 'Yes'] %>%
.[, .(demo_gender,
demo_race,
shiny_estimate = round(estimate, 1),
shiny_std_error = round(std_error, 1))]
Merging our results with the CDC results:
Asserting equality between our results and the CDC results
#> Test passed
Make your own results
You can find the online application where customized graphs can be made here: https://bcjaeger.shinyapps.io/nhanesShinyBP/