Box Plots
A box plot is a graph that is useful for very large data sets that are too unwieldy for a stem-and-leaf plot or line plot. A box plot reduces the data to only five numbers: the median, the upper and lower quartiles, and the minimum and maximum values. It provides a quick visual summary that shows the center, spread, range, and any outliers. When we want to compare two or more sets of data, we make side-by-side box plots. This type of graph is an efficient way to compare the center and spread of two or more data sets, because we can immediately see the range, median, and "shape" of each data set.
5-Number Summary

The median is found by listing the data values in increasing order and
finding the center value. If there is an even number of data values,
find the average of the two center values. This number forms the interior
line of the box.
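
The median rule above can be sketched in a few lines of Python. This is only an illustrative sketch: the function name `median` and the sample lists are assumptions made for the example, not part of the original lesson.

```python
def median(values):
    """Return the median of a non-empty list of numbers."""
    ordered = sorted(values)      # list the data values in increasing order
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        # Odd number of values: take the single center value.
        return ordered[mid]
    # Even number of values: average the two center values.
    return (ordered[mid - 1] + ordered[mid]) / 2


print(median([7, 3, 9, 1, 5]))  # 5
print(median([7, 3, 9, 1]))     # 5.0  (average of 3 and 7)
```

Sorting first matters: the "center value" is the center of the ordered list, not of the data as originally recorded.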
The lower quartile (Q1) is found by considering only the bottom half of the data, below the median. Find the median, or middle value, of this part of the data. The lower quartile number forms the bottom line of the box.

The upper quartile (Q3) is the median of the upper half of the data, above the median. The upper quartile number forms the top line of the box. Connect these three lines to form the sides of the box.

The minimum and maximum values can be read right off the list of data values. If there are no outliers, these numbers form the ends of the whiskers of the graph, and are connected to the upper and lower quartile lines.

Outliers

Sometimes a data set will have one or
more outliers. To detect outliers, first find the interquartile range (IQR),
which is the upper quartile minus the lower quartile (Q3 - Q1), and multiply
it by 1.5. Subtract this number from the lower quartile and add it to the
upper quartile. The two results mark the farthest the whiskers of the graph
can reach, a theoretical "fence" around the data. Any data values falling
outside this "fence" are considered outliers and are labeled on the graph
with an asterisk. When outliers are present, each whisker ends at the most
extreme data value that is still inside the fence. A data set can have
outliers above the box, below it, or both, and more than one on either side.
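
As a rough illustration, the quartile and fence rules described above can be combined into one small Python function. The names `box_plot_stats` and `median`, the dictionary keys, and the sample data are all assumptions made for this sketch. It follows the method in this lesson, where the lower and upper halves leave out the median itself when the number of values is odd; software packages often compute quartiles slightly differently. Ending each whisker at the most extreme value inside the fence is a common convention, assumed here.

```python
def median(ordered):
    """Median of an already-sorted, non-empty list."""
    n = len(ordered)
    mid = n // 2
    if n % 2 == 1:
        return ordered[mid]
    return (ordered[mid - 1] + ordered[mid]) / 2


def box_plot_stats(values):
    """Five-number summary, fences, whisker ends, and outliers.

    Assumes at least two data values.
    """
    ordered = sorted(values)
    n = len(ordered)
    half = n // 2
    lower_half = ordered[:half]        # bottom half, below the median
    upper_half = ordered[n - half:]    # top half, above the median
    q1 = median(lower_half)
    q3 = median(upper_half)
    iqr = q3 - q1                      # interquartile range
    lower_fence = q1 - 1.5 * iqr
    upper_fence = q3 + 1.5 * iqr
    inside = [v for v in ordered if lower_fence <= v <= upper_fence]
    outliers = [v for v in ordered if v < lower_fence or v > upper_fence]
    return {
        "min": ordered[0],
        "q1": q1,
        "median": median(ordered),
        "q3": q3,
        "max": ordered[-1],
        "lower_fence": lower_fence,
        "upper_fence": upper_fence,
        "whisker_low": inside[0],      # lowest value inside the fence
        "whisker_high": inside[-1],    # highest value inside the fence
        "outliers": outliers,
    }


stats = box_plot_stats([2, 4, 5, 7, 8, 9, 11, 12, 30])
print(stats["q1"], stats["median"], stats["q3"])    # 4.5 8 11.5
print(stats["lower_fence"], stats["upper_fence"])   # -6.0 22.0
print(stats["outliers"])                            # [30]
print(stats["whisker_low"], stats["whisker_high"])  # 2 12
```

In this sample the IQR is 7, so the fences sit 10.5 below Q1 and 10.5 above Q3. The value 30 lies beyond the upper fence of 22, so it would be drawn as an asterisk, and the upper whisker would stop at 12.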