Identify outliers based on 3 median absolute deviations (MAD) from the median.
Arguments
- data
The data frame.
- col.list
List of variables to check for outliers.
- ID
ID variable if you would like the outliers to be identified as such.
- criteria
Number of MADs to use as the threshold (analogous to a number of standard deviations).
- mad.scores
Logical, whether to output robust z (MAD) scores (default) or raw scores. Defaults to TRUE.
Value
A list of data frames of outliers per variable, with row
numbers, based on the MAD. When printed, provides the number
of outliers, selected variables, and any outlier flagged for
more than one variable. More information can be obtained
by calling attributes() on the generated object.
Details
The function internally uses scale_mad() to "standardize" the data
based on the MAD and median, and then checks for any observation whose
absolute standardized value exceeds the specified criteria (e.g., beyond +/-3).
For the easystats equivalent, use:
performance::check_outliers(x, method = "zscore_robust", threshold = 3)
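The standardization described above can be sketched in base R. This is a minimal illustration, not the package's code: `robust_z()` is a hypothetical helper, and it assumes that scale_mad() centers on the median and divides by `stats::mad()` with its default scaling constant.

```r
# Hypothetical helper (not part of the package): robust z-score from base R.
# Assumes median centering and stats::mad() with its default constant.
robust_z <- function(x) {
  (x - stats::median(x, na.rm = TRUE)) / stats::mad(x, na.rm = TRUE)
}

z <- robust_z(mtcars$qsec)

# Flag observations whose absolute robust z-score exceeds the criteria.
criteria <- 3
flagged <- which(abs(z) > criteria)

data.frame(Row = flagged, qsec_mad = z[flagged])
```

Under these assumptions, the flagged row and score should agree with the `$qsec` entry shown in the Examples output. `which()` is used so that missing or undefined scores are dropped rather than flagged.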
References
Leys, C., Ley, C., Klein, O., Bernard, P., & Licata, L. (2013). Detecting outliers: Do not use standard deviation around the mean, use absolute deviation around the median. Journal of Experimental Social Psychology, 49(4), 764–766. https://doi.org/10.1016/j.jesp.2013.03.013
Examples
find_mad(
data = mtcars,
col.list = names(mtcars),
criteria = 3
)
#> 20 outlier(s) based on 3 median absolute deviations for variable(s):
#> mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> The following participants were considered outliers for more than one variable:
#>
#> Row n
#> 1 3 2
#> 2 9 2
#> 3 18 2
#> 4 19 2
#> 5 20 2
#> 6 26 2
#> 7 28 2
#> 8 31 2
#> 9 32 2
#>
#> Outliers per variable:
#>
#> $qsec
#> Row qsec_mad
#> 1 9 3.665557
#>
#> $vs
#> Row vs_mad
#> 1 3 Inf
#> 2 4 Inf
#> 3 6 Inf
#> 4 8 Inf
#> 5 9 Inf
#> 6 10 Inf
#> 7 11 Inf
#> 8 18 Inf
#> 9 19 Inf
#> 10 20 Inf
#> 11 21 Inf
#> 12 26 Inf
#> 13 28 Inf
#> 14 32 Inf
#>
#> $am
#> Row am_mad
#> 1 1 Inf
#> 2 2 Inf
#> 3 3 Inf
#> 4 18 Inf
#> 5 19 Inf
#> 6 20 Inf
#> 7 26 Inf
#> 8 27 Inf
#> 9 28 Inf
#> 10 29 Inf
#> 11 30 Inf
#> 12 31 Inf
#> 13 32 Inf
#>
#> $carb
#> Row carb_mad
#> 1 31 4.046945
#>
mtcars2 <- mtcars
mtcars2$car <- row.names(mtcars)
find_mad(
data = mtcars2,
col.list = names(mtcars),
ID = "car",
criteria = 3
)
#> 20 outlier(s) based on 3 median absolute deviations for variable(s):
#> mpg, cyl, disp, hp, drat, wt, qsec, vs, am, gear, carb
#>
#> The following participants were considered outliers for more than one variable:
#>
#> Row car n
#> 1 3 Datsun 710 2
#> 2 9 Merc 230 2
#> 3 18 Fiat 128 2
#> 4 19 Honda Civic 2
#> 5 20 Toyota Corolla 2
#> 6 26 Fiat X1-9 2
#> 7 28 Lotus Europa 2
#> 8 31 Maserati Bora 2
#> 9 32 Volvo 142E 2
#>
#> Outliers per variable:
#>
#> $qsec
#> Row car qsec_mad
#> 1 9 Merc 230 3.665557
#>
#> $vs
#> Row car vs_mad
#> 1 3 Datsun 710 Inf
#> 2 4 Hornet 4 Drive Inf
#> 3 6 Valiant Inf
#> 4 8 Merc 240D Inf
#> 5 9 Merc 230 Inf
#> 6 10 Merc 280 Inf
#> 7 11 Merc 280C Inf
#> 8 18 Fiat 128 Inf
#> 9 19 Honda Civic Inf
#> 10 20 Toyota Corolla Inf
#> 11 21 Toyota Corona Inf
#> 12 26 Fiat X1-9 Inf
#> 13 28 Lotus Europa Inf
#> 14 32 Volvo 142E Inf
#>
#> $am
#> Row car am_mad
#> 1 1 Mazda RX4 Inf
#> 2 2 Mazda RX4 Wag Inf
#> 3 3 Datsun 710 Inf
#> 4 18 Fiat 128 Inf
#> 5 19 Honda Civic Inf
#> 6 20 Toyota Corolla Inf
#> 7 26 Fiat X1-9 Inf
#> 8 27 Porsche 914-2 Inf
#> 9 28 Lotus Europa Inf
#> 10 29 Ford Pantera L Inf
#> 11 30 Ferrari Dino Inf
#> 12 31 Maserati Bora Inf
#> 13 32 Volvo 142E Inf
#>
#> $carb
#> Row car carb_mad
#> 1 31 Maserati Bora 4.046945
#>
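The `Inf` scores reported for `vs` and `am` come from dividing by a MAD of zero: when more than half of the values equal the median, as with these 0/1 variables, the median absolute deviation is 0. The sketch below reproduces this with base R only, assuming the same median and `stats::mad()` scaling as in Details; it does not call the package internals.

```r
# Base R only; an illustration of the zero-MAD case, not the package's code.
mad_vs <- stats::mad(mtcars$vs)  # most values equal the median, so this is 0

z_vs <- (mtcars$vs - stats::median(mtcars$vs)) / mad_vs

n_inf <- sum(is.infinite(z_vs))  # values that differ from the median: nonzero / 0
n_nan <- sum(is.nan(z_vs))       # values equal to the median: 0 / 0

c(mad = mad_vs, infinite = n_inf, undefined = n_nan)
```

Every observation that differs from the median is therefore flagged, so the results for binary or near-constant variables say little about outliers in the usual sense and such variables are probably best left out of col.list.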