DISCUSS | BOXPLOTS IN EXCEL

Introduction

A boxplot, or box and whisker diagram, provides a simple graphical summary of a set of data. It shows a measure of central location (the median), two measures of dispersion (the range and inter-quartile range), the skewness (from the orientation of the median relative to the quartiles) and potential outliers (marked individually). Boxplots are especially useful when comparing two or more sets of data. Regrettably, there is currently no boxplot facility in Microsoft Excel. For simplicity, many recent statistics textbooks (for example, Daly et al, 1995) omit the fences used to identify possible outliers. These simplified boxplots, displaying most of the important features, can be drawn quite easily in Excel.  In the absence of any fences (see Devore and Peck (1990) for a definition), a simple rule is that a whisker which is longer than three times the length of the box probably indicates an outlier.

Method

Suppose we have data from three groups, A, B and C. Calculate the statistical functions QUARTILE(,1),  MIN, MEDIAN, MAX and QUARTILE(,3) in that order for each data set. Arrange the results on an Excel worksheet as shown below.

Statistic

Group A

Group B

Group C

q1

20

22

30

min

10

15

18

median

40

45

50

max

100

110

90

q3

70

75

57

In Excel 5/95:

  • Highlight the whole table, including figures and series labels.
  • Use Chart-Wizard - Line - Option 7 - Data in Rows - Finish to produce something like the chart below.
    Option 7 plots all the series as symbols without connecting lines, but also includes high-low lines which connect the maximum and minimum points for each group.

wpe3.gif (4172 bytes)

  • Now activate the chart and select Format - Chart Type - Options - Options - Up-Down Bars - OK
    The outcome should be a set of boxplots, as shown below.  The essential feature of up-down bars is that they connect the first and last series - hence the rather strange ordering of the statistics in the table!

wpe6.gif (4610 bytes)

In Excel 97/2000/2003:

  • Highlight the whole table, including figures and series labels, then click on the Chart Wizard.
  • Select a Line Chart.
  • At Step 2 plot by Rows, (the default is Columns), then Finish.
  • Select each data series in turn and use Format Data Series to remove the connecting lines.
  • Select any of the data series and Format Data Series; select the Options tab and switch on the checkboxes for High-Low lines and Up-Down bars.

In Excel 2007:

  • Highlight the whole table, including figures and series labels, then select Insert from the main menu.
  • Under Charts select a Line chart and choose the Line with Markers option.
  • Under Chart Tools select Design > Switch Row/Column.
  • Right-click on a data point from the first data series, and choose Format Data Series > Line Colour > No line to remove the connecting lines.  Repeat for the other four data series in turn.
  • Select any of the data series and under Chart Tools select Layout > Analysis > Lines > High-Low Lines, then Layout > Analysis > Lines > Up/Down Bars > Up/Down Bars.
  • Further customising can be carried out according to your own preferences by right-clicking on the relevant object and selecting the Format option on the shortcut menu.

In Excel 2008 (MAC):

  • Set up the table as in PC instructions above (q1, q0, q2, q4, q3).
  • Highlight the whole table, including figures and series labels, and then select the Charts menu.
  • Select a Line chart and choose the Marked Line option.
  • On the formatting palette choose Chart data, Edit ... Sort by rows.
  • Click on a data point from the first data series, and from the formatting palette choose Colors, Weights, and Fills.
  • Under Line Color, select No line. Repeat this process for the other four data series in turn.
  • Double click on any of the data series and you should get the Format Data Series menu.
  • Choose Options on this menu and check the boxes marked High-Low Lines and Up/Down Bars. Hit OK.

In Excel 2011 (MAC):

  • Set up the table as in PC instructions (q1, q0, q2, q4, q3)
  • Highlight the whole table, including figures and series labels, and then select Charts.
  • Select a Line chart and choose the Marked Line option.
  • Check to make sure that the data are plotted by row.  If they are not, select Switch Plot.
  • Click on a data point from the first data series, and under Chart Format, choose Line and select No line.
  • Repeat this process for the other four data series in turn.
  • Double click on any of the data series and you should get the Format Data Series menu.
  • Check the boxes marked High-low Lines and Up-down Bars. Hit OK.

References
Daly, F, Hand, D J, Jones, M C, Lunn A D and McConway, K J (1995). Elements of Statistics. Addison Wesley / The Open University.
Devore, J and Peck, R (1990). Introductory Statistics. West Publishing Co.

Last content update: 24 March 2003.