--- title: "Example: National Ambulatory Medical Care Survey (NAMCS) tables" output: rmarkdown::html_vignette vignette: > %\VignetteIndexEntry{Example: National Ambulatory Medical Care Survey (NAMCS) tables} %\VignetteEngine{knitr::rmarkdown} %\VignetteEncoding{UTF-8} --- ```{r, include = FALSE} knitr::opts_chunk$set( collapse = TRUE # , comment = "#>" ) ``` This example uses the National Ambulatory Medical Care Survey (NAMCS) 2019 Public Use File (PUF) to replicate certain tables from the [National Ambulatory Medical Care Survey: 2019 National Summary Tables](https://www.cdc.gov/nchs/data/ahcd/namcs_summary/2019-namcs-web-tables-508.pdf). NAMCS is "an annual nationally representative sample survey of visits to non-federal office-based patient care physicians, excluding anesthesiologists, radiologists, and pathologists." Note that the unit of observation is visits, not patients -- this distinction is important since a single patient can make multiple visits. Selected variables from NAMCS 2019 come with the `surveytable` package, for use in examples, in an object called `namcs2019sv`. # Begin Begin by loading the `surveytable` package. When you do, it will print a message explaining how to specify the survey that you'd like to analyze. We are omitting that message here. ```{r, results=FALSE, message=FALSE, warning=FALSE} library(surveytable) ``` Now, specify the survey that you'd like to analyze. ```{r, results='asis'} set_survey(namcs2019sv) ``` Check the survey name, survey design variables, and the number of observations to verify that it all looks correct. For this example, we do want to turn on certain NCHS-specific options, such as identifying low-precision estimates. If you do not care about identifying low-precision estimates, you can skip this command. To turn on the NCHS-specific options: ```{r, results='asis'} set_opts(mode = "NCHS") ``` # Table 1 ## Counts and percentages This table shows the overall estimated count as well as the counts and percentages by type of doctor, physician specialty, and metropolitan statistical area. The variables that are necessary for creating this table are already in the survey, making the commands very straightforward. ```{r, results='asis'} total() tab("MDDO", "SPECCAT", "MSA") ``` ## Rates The published table also shows several rates. To calculate rates, in addition to the survey, we need a source of information with population estimates. You would typically use a function such as `read.csv()` to load the population estimates and get them into the correct format. The `surveytable` package comes with an object called `uspop2019` that contains several population estimates for use in these examples. ```{r} class(uspop2019) names(uspop2019) ``` Here is the overall population estimate: ```{r} uspop2019$total ``` Once we have the overall population estimate, the overall rate is: ```{r, results='asis'} total_rate(uspop2019$total) ``` To calculate the rates for a particular variable, we need to provide a data frame with a variable called `Level` that matches the levels of the variable in the survey, and a variable called `Population` that gives the population size (which is assumed to be a constant rather than a random variable). For `MSA`, we can see the levels of the variables just by using the `tab()` command, just as we did above. Thus, to calculate rates, we need a data frame as follows: ```{r} uspop2019$MSA ``` Now that we have the appropriate population estimates, the rate is: ```{r, results='asis'} tab_rate("MSA", uspop2019$MSA) ``` We can also calculate rates of a specific variable based on the entire population: ```{r, results='asis'} tab_rate("MDDO", uspop2019$total) tab_rate("SPECCAT", uspop2019$total) ``` # Table 3 ## Counts and percentages This table presents estimates for each age group, as well as for each age group by sex. ```{r, results='asis'} var_list("age") ``` The survey has a couple of relevant age-related variables. `AGE` is the patient age in years. `AGER` is a categorical variable based on `AGE`. However, for this table, in addition to `AGER`, we need another age group variable, with different age categories. We create it using the `var_cut` function. ```{r} var_cut("Age group", "AGE" , c(-Inf, 0, 4, 14, 64, Inf) , c("Under 1", "1-4", "5-14", "15-64", "65 and over") ) ``` Now that we've created the `Age group` variable, we can create the tables: ```{r, results='asis'} tab("AGER", "Age group", "SEX") tab_cross("AGER", "SEX") ``` ## Rates ```{r, results='asis'} tab_rate("AGER", uspop2019$AGER) tab_rate("Age group", uspop2019$`Age group`) tab_rate("SEX", uspop2019$SEX) ``` To calculate the rates for one variable (`AGER`) by another variable (`SEX`), we need population estimates in the following format: ```{r} uspop2019$`AGER x SEX` ``` Once we have these population estimates, the rates are: ```{r, results='asis'} tab_subset_rate("AGER", "SEX", uspop2019$`AGER x SEX`) ``` # Table 5 This table gives the expected sources of payment. We use the `PAY*` variables to create several new variables that are required by the table. Note that the `PAY*` variables are logical (`TRUE` or `FALSE`), which simplifies the workflow. (The survey was imported into R using the `importsurvey` package, which automatically detects binary variables and imports them as logical variables.) ```{r, results='asis'} # var_all("Medicare and Medicaid", c("PAYMCARE", "PAYMCAID")) # var_any("Payment used", c("PAYPRIV", "PAYMCARE", "PAYMCAID" , "PAYWKCMP", "PAYOTH", "PAYDK")) var_not("No other payment used", "Payment used") var_all("Self-pay", c("PAYSELF", "No other payment used")) var_all("No charge", c("PAYNOCHG", "No other payment used")) var_any("No insurance", c("Self-pay", "No charge")) # var_case("No pay", "NOPAY", "No categories marked") var_any("Unknown or blank", c("PAYDK", "No pay")) ## tab("PAYPRIV", "PAYMCARE", "PAYMCAID", "Medicare and Medicaid" , "No insurance", "Self-pay", "No charge" , "PAYWKCMP", "PAYOTH", "Unknown or blank") ``` Check the presentation standards flags! Under NCHS presentation standards rules, some of these estimates should not be shown. # Table 6 This table shows the primary care provider and referral status, by prior-visit status. In the table, the "Unknown" and "Blank" values are collapsed into a single value. We can collapse two or more levels of a factor into a single level using the `var_collapse` function. ```{r} var_collapse("PRIMCARE", "Unknown if PCP", c("Unknown", "Blank")) var_collapse("REFER", "Unknown if referred", c("Unknown", "Blank")) ``` Now, for the table: ```{r, results='asis'} tab("PRIMCARE", "REFER", "SENBEFOR") ``` The percentages within each subset that is defined by `SENBEFOR` add up to 100% -- for this reason, we want to use `tab_subset()`, not `tab_cross()`. ```{r, results='asis'} tab_subset("PRIMCARE", "SENBEFOR") tab_subset("REFER", "SENBEFOR") ``` # Table 11 This table shows the same information as Table 3, but only for preventive care visits. That is, estimates for each age group, as well as for each age group by sex, but only for preventive care visits. Let's create `Age group` from `AGE` and cross `AGER` and `SEX` to create a variable called `Age x Sex`: ```{r} var_cut("Age group", "AGE" , c(-Inf, 0, 4, 14, 64, Inf) , c("Under 1", "1-4", "5-14", "15-64", "65 and over") ) var_cross("Age x Sex", "AGER", "SEX") ``` To see the possible values of `MAJOR` (Major reason for this visit), and to estimate the total count for preventive care visits: ```{r, results='asis'} tab("MAJOR") ``` To create the tables of age, sex, and their interaction, and limit them to only the preventive care visits: ```{r, results='asis'} tab_subset("AGER", "MAJOR", "Preventive care") tab_subset("Age group", "MAJOR", "Preventive care") tab_subset("SEX", "MAJOR", "Preventive care") tab_subset("Age x Sex", "MAJOR", "Preventive care") ``` As each of the above commands is similar, and differs only in the first variable that is passed to the `tab_subset()` function, this code can be streamlined with a `for` loop: ```{r, results='asis'} for (vr in c("AGER", "Age group", "SEX", "Age x Sex")) { print( tab_subset(vr, "MAJOR", "Preventive care") ) } ``` Note that when called from inside a `for` loop, the `print()` function needs to be called explicitly. ## More advanced coding In addition, for each age-sex category, the published table shows the percentage of preventive care visits made to primary care physicians. To calculate these percentages, a slightly more involved `for` loop is needed. Below is the code, followed by an explanation: ```r set_opts(csv = "output.csv") ``` ```{r} for (vr in c("AGER", "Age group", "SEX", "Age x Sex")) { var_cross("tmp", "MAJOR", vr) for (lvl in levels(surveytable:::env$survey$variables[,vr])) { tab_subset("SPECCAT", "tmp", paste0("Preventive care: ", lvl)) } } set_opts(csv = "") ``` * Since `tab_subset()` is called from within a `for` loop, if we wanted to print to the screen, we would need to use `print( tab_subset(*) )`. Since we don't want to print to the screen, a call to `print()` is omitted. * Since so many tables are being produced, the output is being sent to a CSV file. * As before, the loop goes through the age, sex, and age / sex interaction variables, calling each of these variables `vr`. * MAJOR and `vr` are crossed, with the result stored in a variable called `tmp`. * Next, the inner loop goes through all levels of `vr`, calling each of these levels `lvl`. * The code tabulates `SPECCAT` (Type of specialty – Primary, Medical, Surgical) on a subset in which `tmp` (which is `MAJOR` crossed with `vr`) is restricted to `"Preventive care: "` followed by `lvl`, which is some level of `vr`, such as “Under 15 years” for `AGER`. * Finally, CSV output is turned off. If you run this code, all of the tables should be stored in the CSV file. To give you an idea of what the tables should look like, here is just one of the tables: ```{r, results='asis'} vr = "AGER" var_cross("tmp", "MAJOR", vr) lvl = levels(surveytable:::env$survey$variables[,vr])[1] tab_subset("SPECCAT", "tmp", paste0("Preventive care: ", lvl)) ``` To match the percentage in the published table, see the "Primary care specialty" row. Be sure to check the presentation standards flags.