
Nicely reports NA values according to existing guidelines. This function reports both absolute and percentage values of missing data for the specified column lists. Some authors recommend reporting item-level missingness per scale, as well as the maximum number of missing items per participant, by scale. For example, Parent (2013) writes:

I recommend that authors (a) state their tolerance level for missing data by scale or subscale (e.g., “We calculated means for all subscales on which participants gave at least 75% complete data”) and then (b) report the individual missingness rates by scale per data point (i.e., the number of missing values out of all data points on that scale for all participants) and the maximum by participant (e.g., “For Attachment Anxiety, a total of 4 missing data points out of 100 were observed, with no participant missing more than a single data point”).

See: Parent, M. C. (2013). Handling item-level missing data: Simpler is just as good. The Counseling Psychologist, 41(4), 568-600. https://doi.org/10.1177%2F0011000012445176
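The output columns map onto Parent's two recommendations: `na`, `cells` and `na_percent` give the missingness rate per scale, while `na_max` and `na_max_percent` give the maximum by participant. The base-R sketch below is one plausible reading of how these columns are computed, inferred from the example values rather than from the package source. `na_summary()` is a hypothetical helper, not part of the package.

```r
# Hypothetical helper, not part of the package: summarises missingness
# for one set of columns in the spirit of Parent (2013).
na_summary <- function(data, vars) {
  x <- data[, vars, drop = FALSE]
  items <- length(vars)
  na_per_row <- rowSums(is.na(x))  # missing items per participant
  na <- sum(na_per_row)            # missing data points on the scale
  cells <- nrow(x) * items         # all data points on the scale
  data.frame(
    var = paste0(vars[1], ":", vars[items]),
    items = items,
    na = na,
    cells = cells,
    na_percent = round(na / cells * 100, 2),
    na_max = max(na_per_row),      # participant with the most missing items
    na_max_percent = round(max(na_per_row) / items * 100, 2),
    all_na = sum(na_per_row == items)  # participants missing every item
  )
}

# Same columns as the first example; the figures should match its output
na_summary(airquality, names(airquality))
```

For `airquality`, 44 of 918 data points are missing (4.79%), and no participant is missing more than 2 of the 6 items.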

Usage

nice_na(data, vars, scales)

Arguments

data

The data frame.

vars

Variable (or list of variable vectors) to check for NAs. Each element of a list is reported as a separate row.

scales

The scale names to check for NAs (character vector). Use this when the items of each scale share the scale name as a common prefix.
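The `scales` argument appears to group columns by name prefix, as in the `scale1_Q1:scale1_Q3` rows of the last example. A minimal base-R sketch of that grouping, assuming simple prefix matching (the package may match differently); `scale_columns()` and `toy` are hypothetical names used only for illustration:

```r
# Hypothetical helper, not the package's own code. Assumption: a column
# belongs to a scale when its name starts with the scale name.
scale_columns <- function(data, scales) {
  groups <- lapply(scales, function(s) {
    names(data)[startsWith(names(data), s)]
  })
  names(groups) <- scales
  groups
}

toy <- data.frame(scale1_Q1 = 1, scale1_Q2 = 2, scale2_Q1 = 3)
scale_columns(toy, c("scale1", "scale2"))
```

Plain prefix matching has a pitfall: `"scale1"` would also capture columns such as `scale10_Q1`, so including the separator in the prefix (e.g. `"scale1_"`) is safer when scale numbers reach double digits.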

Examples

# Use whole data frame
nice_na(airquality)
#> Warning: Some variables are not numeric. They are ignored for calculating the `all_na` column.
#>         var items na cells na_percent na_max na_max_percent all_na
#> 1 Ozone:Day     6 44   918       4.79      2          33.33      0

# Use selected columns explicitly
nice_na(airquality,
  vars = list(
    c("Ozone", "Solar.R", "Wind"),
    c("Temp", "Month", "Day")
  )
)
#> Warning: Some variables are not numeric. They are ignored for calculating the `all_na` column.
#>          var items na cells na_percent na_max na_max_percent all_na
#> 1 Ozone:Wind     3 44   459       9.59      2          66.67      0
#> 2   Temp:Day     3  0   459       0.00      0           0.00      0
#> 3      Total     6 44   918       4.79      2          33.33      0
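Judging from the example outputs, the `Total` row pools every listed column: its `na` and `cells` equal the sums of the group rows (44 + 0 and 459 + 459 here), while `na_max` is recomputed per participant across all items. That is why it reaches 9 in the `scales` example rather than the per-scale maximum of 3. A small base-R check of this reading on hypothetical toy data:

```r
# Hypothetical toy data: participant 1 skipped every item.
toy <- data.frame(
  a1 = c(NA, 1, 2), a2 = c(NA, 3, NA),
  b1 = c(NA, 4, 5), b2 = c(NA, NA, 6)
)
groups <- list(a = c("a1", "a2"), b = c("b1", "b2"))

# Missing items per participant within each group (3 x 2 matrix)
per_group <- sapply(groups, function(v) rowSums(is.na(toy[, v])))
max(per_group)                 # 2: worst participant within any one group

# Missing items per participant pooled over all columns
pooled <- rowSums(is.na(toy[, unlist(groups)]))
max(pooled)                    # 4: participant 1 misses all 4 items
sum(pooled) == sum(per_group)  # TRUE: total NA counts simply add up
```

Taking the largest group `na_max` would understate the worst participant, so pooling before taking the maximum matters for the `Total` row.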

# If the questionnaire items start with the same name, e.g.,
set.seed(15)
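# Each column: 11 draws (with replacement) from NA and 1:10,
# followed by 3 rows that are missing on every item
fun <- function() {
  c(sample(c(NA, 1:10), replace = TRUE), NA, NA, NA)
}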
df <- data.frame(
  scale1_Q1 = fun(), scale1_Q2 = fun(), scale1_Q3 = fun(),
  scale2_Q1 = fun(), scale2_Q2 = fun(), scale2_Q3 = fun(),
  scale3_Q1 = fun(), scale3_Q2 = fun(), scale3_Q3 = fun()
)

# One can list the scale names directly:
nice_na(df, scales = c("scale1", "scale2", "scale3"))
#> Warning: Some variables are not numeric. They are ignored for calculating the `all_na` column.
#>                   var items na cells na_percent na_max na_max_percent all_na
#> 1 scale1_Q1:scale1_Q3     3 11    42      26.19      3            100      3
#> 2 scale2_Q1:scale2_Q3     3 17    42      40.48      3            100      3
#> 3 scale3_Q1:scale3_Q3     3 10    42      23.81      3            100      3
#> 4               Total     9 38   126      30.16      9            100      3