Replicating analysis from the NCHS data brief, 2017 • cardioStatsUSA

This article shows how cardioStatsUSA can replicate results from a report from the National Center for Health Statistics (NCHS) data brief. The results can be replicated with our shiny application or with code.

Replicate with shiny application

The video below shows how Figure 1 can be replicated using our web application.

The video below shows how Figure 2 can be replicated using our web application.

The code used in the video to run the application with a different age variable is provided below for interested users:


library(cardioStatsUSA)
library(data.table)
library(magrittr)

# Make a copy of the NHANES data
nhanes_data_test <- copy(nhanes_data)

# modify the age groups prior to running the app
nhanes_data_test[
 , demo_age_cat := cut(demo_age_years,
                       breaks = c(18, 39, 59, Inf),
                       labels = c("18-39", "40-59", "60+"),
                       include.lowest = TRUE)
]

#' Include the same cohort as the CDC report

nhanes_data_test <- nhanes_data_test %>%
 # exclude pregnant women
 .[demo_pregnant == 'No' | is.na(demo_pregnant)] %>%
 # exclude participants missing both SBP and DBP
 .[!(is.na(bp_dia_mean) & is.na(bp_sys_mean))]


# don't include cholesterol module (cholesterol is in development)
nhanes_key_test <- nhanes_key[module != 'chol']

# important: if 'module' is included as a column in your key,
# then the app will automatically subset your cohort to include
# NHANES participants in the module that your outcome variable is in.
# We don't want to do that here, since the cohort analyzed by the CDC 
# didn't have the same exclusion criteria as the cohort we analyze by
# default in the app. So, the simple fix is to delete the module column:
nhanes_key_test$module <- NULL


app_run(nhanes_data = nhanes_data_test,
        nhanes_key = nhanes_key_test)

Replicate with code

The code in this section does exactly what the videos above do, and provides a glimpse of the R code that is used on the back-end of the shiny application.

We modify our demo_age_cat variable so that it has the same groups that are analyzed in the CDC report.


# Make a copy here so that we don't modify the NHANES data
# (doing so would break tests that are run after this one)
nhanes_data_test <- copy(nhanes_data)
# match the age groups of CDC report
nhanes_data_test[
 , demo_age_cat := cut(demo_age_years,
                       breaks = c(18, 39, 59, Inf),
                       labels = c("18-39", "40-59", "60+"),
                       include.lowest = TRUE)
]

Who is included in this test

For these tests, we include the same cohort as the CDC report. Full details on the inclusion criteria are given in the paper.


nhanes_data_test <- nhanes_data_test %>%
 # exclude pregnant women
 .[demo_pregnant == 'No' | is.na(demo_pregnant)] %>%
 # exclude participants missing both SBP and DBP
 .[!(is.na(bp_dia_mean) & is.na(bp_sys_mean))]

In most cases, we use nhanes_summarize() or nhanes_visualize() when generating results with cardioStatsUSA. Here, we use the nhanes_design family of functions. NHANES design functions are lower level functions that support nhanes_summarize and nhanes_visualize. Full details on this family can be found at the help page for [nhanes_design].


ds <- nhanes_data_test %>%
 nhanes_design(
  key = nhanes_key,
  outcome_variable = 'htn_jnc7',
  group_variable = 'demo_age_cat',
  time_values = '2015-2016'
 )

We create a separate NHANES design object that will use age adjustment via direct standardization, copying the standard weights reported by the CDC


ds_standard <- ds %>%
 nhanes_design_standardize(
  standard_variable = 'demo_age_cat',
  standard_weights = c(0.420263, 0.357202, 0.222535)
 )

Our first test: do we have the same sample size as the CDC report?


expect_equal(
 5504, # this is the sample size reported by CDC
 nrow(ds$design$variables)
)

Test results without age adjustment

This test compares our answers versus Figure 1 of the CDC report. We begin by loading the data from Figure 1:

test_data <- cdc_db289_figure_1 %>%
 setnames(old = c('estimate', 'std_error'),
          new = c('cdc_estimate', 'cdc_std_error'))

Now we summarize our design object (ds) with various updates to create results by age group and by sex within age groups


shiny_answers_by_age <- ds %>%
 nhanes_design_summarize(outcome_stats = 'percentage',
                         simplify_output = TRUE) %>%
 .[htn_jnc7 == 'Yes'] %>%
 .[, demo_gender := 'Overall']

shiny_answers_by_age_sex <- ds %>%
 # this adds a stratify variable to the already existing group variable (age)
 # to make results stratified by age and sex groups.
 nhanes_design_update(stratify_variable = 'demo_gender') %>%
 nhanes_design_summarize(outcome_stats = 'percentage',
                         simplify_output = TRUE) %>%
 .[htn_jnc7 == 'Yes']

Putting all our answers together:


shiny_answers <- list(shiny_answers_by_age,
                      shiny_answers_by_age_sex) %>%
 rbindlist(use.names = TRUE) %>%
 .[, .(demo_gender,
       demo_age_cat,
       shiny_estimate = round(estimate,1),
       shiny_std_error = round(std_error,1))]

Putting everything together:


test_results <- test_data %>%
 merge(shiny_answers, by = c("demo_gender", "demo_age_cat"))

test_results
#> Key: <demo_gender, demo_age_cat>
#>    demo_gender demo_age_cat cdc_estimate cdc_std_error shiny_estimate
#>         <char>       <char>        <num>         <num>          <num>
#> 1:         Men        18-39          9.2           1.4            9.2
#> 2:         Men        40-59         37.2           2.9           37.2
#> 3:         Men          60+         58.5           2.2           58.5
#> 4:     Overall        18-39          7.5           1.0            7.5
#> 5:     Overall        40-59         33.2           1.7           33.2
#> 6:     Overall          60+         63.1           2.1           63.1
#> 7:       Women        18-39          5.6           1.1            5.6
#> 8:       Women        40-59         29.4           2.0           29.4
#> 9:       Women          60+         66.8           2.6           66.8
#>    shiny_std_error
#>              <num>
#> 1:             1.4
#> 2:             2.9
#> 3:             2.2
#> 4:             1.0
#> 5:             1.7
#> 6:             2.1
#> 7:             1.1
#> 8:             2.0
#> 9:             2.6

In the following tests, we will compare our application’s estimates to corresponding estimates reported by the CDC. To ensure all of these differences are exactly 0, we set the tolerance values to 0

est_diff_tolerance <- 0
se_diff_tolerance <- 0

Now we can formally check for equality:

test_that(
 'cardioStatsUSA matches CDC report, Figure 1',
 code = {

  expect_equal(test_results$cdc_estimate,
               test_results$shiny_estimate,
               tolerance = est_diff_tolerance)

  expect_equal(test_results$cdc_std_error,
               test_results$shiny_std_error,
               tolerance = se_diff_tolerance)

 }
)
#> Test passed

Test results with age adjustment

This test compares our answers versus Figure 2 of the CDC report. We begin by loading the data from Figure 2:


test_data <- cdc_db289_figure_2 %>%
 # sync levels of the race variable
 .[, demo_race := factor(demo_race,
                         levels = levels(nhanes_data$demo_race))] %>%
 setnames(old = c('estimate', 'std_error'),
          new = c('cdc_estimate', 'cdc_std_error'))

using our standardized NHANES design, we create results in subgroups defined by race and race by sex.


shiny_answers_total <- ds_standard %>%
  nhanes_design_update(group_variable = 'demo_race') %>%
  nhanes_design_summarize(outcome_stats = 'percentage',
                          simplify_output = TRUE) %>%
  as.data.table() %>%
  .[, demo_gender := 'Total']

shiny_answers_by_gender <- ds_standard %>%
 nhanes_design_update(group_variable = 'demo_race',
                      stratify_variable = 'demo_gender') %>%
 nhanes_design_summarize(outcome_stats = 'percentage',
                         simplify_output = TRUE)

Putting all of our answers together:


shiny_answers <-
 list(shiny_answers_total,
      shiny_answers_by_gender) %>%
 rbindlist(use.names = TRUE) %>%
 .[htn_jnc7 == 'Yes'] %>%
 .[, .(demo_gender,
       demo_race,
       shiny_estimate  = round(estimate, 1),
       shiny_std_error = round(std_error, 1))]

Merging our results with the CDC results:


test_results <- test_data %>%
 merge(shiny_answers, by = c("demo_gender", "demo_race"))

Asserting equality between our results and the CDC results

#> Test passed

Make your own results

You can find the online application where customized graphs can be made here: https://bcjaeger.shinyapps.io/nhanesShinyBP/