How to Calculate Outliers: 7 Steps

An outlier is an observation whose value differs considerably from the other observations in a sample. The term is used in statistics, and an outlier can point to an anomaly in the dataset or to an error in measurement. Knowing how to identify outliers helps you understand the data properly and draw more accurate conclusions from a study. For a given set of observations, the process is fairly simple.

Steps

Step 1. Learn to recognize a potential outlier

Before calculating whether an observation is an outlier, it helps to look over the dataset and spot likely candidates. For example, consider a dataset of the temperatures of 12 different objects in a room. If 11 objects are at around 21 °C but the twelfth (perhaps an oven) is at 150 °C, a quick look suggests that the oven is an outlier.

Step 2. Sort the observations from smallest to largest

Continuing with the example above, consider the following dataset of object temperatures: {22, 21, 24, 21, 21, 20, 21, 23, 22, 150, 22, 20}. Sorted, it becomes: {20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 24, 150}.
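If the data is in a program rather than on paper, the sorting can be done in one line. The following is a minimal sketch in Python; the variable names are only illustrative.

```python
# Temperatures from the example, in the order they were recorded.
temperatures = [22, 21, 24, 21, 21, 20, 21, 23, 22, 150, 22, 20]

# sorted() returns a new list and leaves the original list untouched.
ordered = sorted(temperatures)

print(ordered)
# [20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 24, 150]
```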

Step 3. Calculate the median of the dataset

The median is the value that lies above the lower half of the data and below the upper half. If the dataset contains an even number of observations, average the two middle values. In the example above, the two middle values are 21 and 22, so the median is (21 + 22) / 2, or 21.5.
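The rule for the median can be written as a short function. This is a sketch for illustration; the function name is an assumption, and Python's standard library already offers statistics.median, which gives the same result.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd count: the single middle value is the median.
        return ordered[mid]
    # Even count: average the two middle values.
    return (ordered[mid - 1] + ordered[mid]) / 2


temperatures = [22, 21, 24, 21, 21, 20, 21, 23, 22, 150, 22, 20]
print(median(temperatures))  # 21.5
```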

Step 4. Calculate the lower quartile

This point, called Q1, is the value below which 25% of the observations lie. It is the median of the lower half of the sorted data, here {20, 20, 21, 21, 21, 21}. Two middle values must be averaged again, this time 21 and 21, which gives (21 + 21) / 2, or 21.

Step 5. Calculate the upper quartile

This point, called Q3, is the value above which 25% of the observations lie. It is the median of the upper half of the sorted data, here {22, 22, 22, 23, 24, 150}. Averaging the two middle values, 22 and 23, gives Q3 = 22.5.
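Both quartiles can be computed together by taking the median of each half of the sorted data. The sketch below follows that approach. The function name is illustrative, and the handling of an odd number of observations (leaving the middle value out of both halves) is an assumption, since the example only covers an even count. Library routines that interpolate between values, such as numpy.percentile with its default settings, may return slightly different quartiles.

```python
from statistics import median


def quartiles(values):
    """Return (Q1, Q3) as the medians of the lower and upper halves."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    half = len(ordered) // 2
    lower = ordered[:half]    # smallest half of the data
    upper = ordered[-half:]   # largest half (assumption: middle value skipped if count is odd)
    return median(lower), median(upper)


temperatures = [22, 21, 24, 21, 21, 20, 21, 23, 22, 150, 22, 20]
q1, q3 = quartiles(temperatures)
print(q1, q3)  # 21.0 22.5
```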

Step 6. Find the “inner fences” of the dataset

First, multiply the interquartile range (the difference between Q3 and Q1) by 1.5. In the example above, the interquartile range is 22.5 - 21 = 1.5. Multiplying this value by 1.5 gives 2.25. Add this number to Q3 and subtract it from Q1 to obtain the fences. In this example, the upper and lower inner fences are 24.75 and 18.75.

Any observation outside this range is considered at least a mild outlier. In the example dataset, only the oven temperature (150 °C) lies outside the inner fences.
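The arithmetic of this step can be checked with a few lines of code. This is a sketch only; the function name and the parameter k (the multiplier applied to the interquartile range) are illustrative, and Q1 and Q3 are entered by hand from Steps 4 and 5.

```python
def limits(q1, q3, k=1.5):
    """Return (lower, upper) limits placed k interquartile ranges beyond Q1 and Q3."""
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


ordered = [20, 20, 21, 21, 21, 21, 22, 22, 22, 23, 24, 150]
q1, q3 = 21, 22.5  # quartiles found in Steps 4 and 5

lower, upper = limits(q1, q3)
outside = [x for x in ordered if x < lower or x > upper]

print(lower, upper)  # 18.75 24.75
print(outside)       # [150]
```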

Step 7. Find the “outer fences” of the dataset

These are found in the same way as the inner fences of Step 6, except that the interquartile range is multiplied by 3 instead of 1.5. Multiplying the interquartile range above by 3 gives 1.5 × 3 = 4.5. Thus, the upper and lower outer fences are 22.5 + 4.5 = 27 and 21 - 4.5 = 16.5.

Any observation outside the outer fences is considered an extreme outlier. In this example, the oven temperature, 150 °C, lies above the upper outer fence and is therefore an extreme outlier.
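Putting the seven steps together, the whole procedure might look like the sketch below. The function name and the returned pair of lists are illustrative choices. The code also assumes one particular convention: a value beyond the outer limits is reported only as extreme, and a value between the inner and outer limits as mild. Quartiles are taken as medians of the two halves, as in Steps 4 and 5.

```python
from statistics import median


def classify_outliers(values):
    """Return (mild, extreme) lists of outliers using the 1.5 and 3 IQR rules."""
    if len(values) < 2:
        raise ValueError("need at least two observations")
    ordered = sorted(values)
    half = len(ordered) // 2
    q1 = median(ordered[:half])
    q3 = median(ordered[-half:])
    iqr = q3 - q1

    inner_low, inner_high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    outer_low, outer_high = q1 - 3 * iqr, q3 + 3 * iqr

    mild, extreme = [], []
    for x in ordered:
        if x < outer_low or x > outer_high:
            extreme.append(x)   # beyond the outer limits
        elif x < inner_low or x > inner_high:
            mild.append(x)      # between the inner and outer limits
    return mild, extreme


temperatures = [22, 21, 24, 21, 21, 20, 21, 23, 22, 150, 22, 20]
mild, extreme = classify_outliers(temperatures)
print(mild)     # []
print(extreme)  # [150]
```

For the example data, the limits work out to 18.75 and 24.75 (inner) and 16.5 and 27 (outer), so only the oven temperature is flagged.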
