In this tutorial, you will learn about box and whisker plot google sheets.
Box and whisker plots are used to summarise the distribution of a large data set using 5 statistical numbers i.e minimum, first quartile, median, third quartile and maximum.
Many times the central tendency (mean, median, mode) does not tell us enough about the data and we need more information like its variability. In such cases, a box and whisker plot quickly shows us if our data is spread symmetrically or if it has any skewness. Also, it highlights outliers if any. Visualizing this makes it easier to make decisions about the data.
What is required to create box and whisker plot google sheets?
To create a box and whisker type of plot in google sheets, we need 5 important data points.
- Min (The smallest value)
- The first quartile (Q1) – Is the middle number between the Minimum and Median
- Median (It is a middle number when the data set is organized in ascending order)
- Third quartile – It is the middle number between Median and Maximum
- Max (The largest value of the data set)
Please note that google sheets do not have the option to plot box and whisker chart under chart type. But it has the candlestick chart type which we can use to mimic the box and whisker chart.
How to create a box and whisker plot google sheets
Step-1: Format the data. Organize it in one column
Let’s say we have the record of temperatures(in F) of Los Angeles city for 10 days as given below
34, 76, 29, 15, 41, 46, 25, 54, 38, 20, 32, 43, 22
We need to put these in one column or row in sheets. It will be helpful when selecting the data series and calculating statistical five numbers in the next step.
Step-2: Calculate the statistical five-number summary
Now that the data is arranged in a column, we can need to calculate the min, max, first quartile, third quartile and median.
Min:- To calculate the smallest value, we can use the in-built min() function
The first quartile (Q1):- To calculate the first quartile we can use the in-built Quartile formula [2] . This formula takes two parameters. One is for the data series that we want to calculate and the second is the numeric input of the quartile number. For this example, it will be Quartile(data series, 1)
Median:- To calculate the median, we can use the in-built median() function which will tell us the middle number of the series when sorted in either ascending or descending order
Third quartile (Q3):- Similar to the first quartile calculation, here we can use the Quartile() function and pass “3” as the numeric value. It will be Quartile(data series, 3)
Max:- It is the maximum value in the series. We can use the in-built max() function over the data series to get the largest number
Note:- We have added a cell named “Temperature” at the start of the row. A cell with type as text is required to plot the candlestick chart in google sheets.
Step-3: Eliminate outliers (optional)
An outlier is a value that is extremely small or extremely large as compared to the data series and these outliers influence the overall spread and skewness of the data and also change the mean.
We should ideally avoid these outliers to make a logically correct decision.
How to calculate an outlier while plotting box and whisker plots?
First, we need to calculate IQR which is called the interquartile range. This is the difference between Quartile 3 and Quartile 2.
IQR = Q3 – Q2 = 33 – 22.75 = 10.25
The range within which our data should lie is defined by this formula
[Q1 – 1.5 IQR, Q3 + 1.5IQR]
If we plug our values in this range would be [22.75 – 1.5*10.25, 42.5 + 1.5*10.25]
This comes to [7.375, 57.875]
We can round these, so the data range should be between 7 and 58
When we sort our data in ascending order, we can see that the lowest value is 15 which is within this range, whereas the largest value 76 is out of the range.
We can optionally eliminate 76 and plot the box and whisker chart.
Step-4: Select 5 number summary and insert a candlestick plot
Now it’s time to plot the chart. First, we will plot the data without eliminating the outlier.
We can select the 5 summary numbers and the text field and click on the insert menu and select chart. Many times, google sheets will automatically insert the candlestick chart for this kind of data selection, if not then we can open the chart editor by double-clicking on the chart area and then from the chart type dropdown, we can select chart type as candlestick from the other section.
Here is how it will look finally.
Let’s understand various terminology used in the above box and whisker plot.
Understanding a box and whisker plot google sheets
The box and whisker plot might look a little difficult to read at the first sight, but once you understand various sections of the plot area, it will be a piece of cake.
Median here also can be called as 50th percentile or Second quartile
Q3: The starting point of the second whisker which ends with the largest data point is called the third quartile. It’s also the endpoint of the box.
The third quartile is also known as the 75th percentile. And it’s represented by the middle number between the median and the largest number (not ‘maximum’) of the data set.
Max: The max is represented by the largest data point in our data set. It’s the endpoint of the second whisker.
IQR: The interquartile range is the difference between the third and first quartile i.e 25th to 75th percentile
Outliers in a box and whisker plot google sheets, how to plot them
As highlighted in step-4, in this data set, we found 76 to be an outlier. We saw how to calculate an outlier.
An outlier always represents additional skewness in the data.
For example here in the temperature reading, we can say that the highest temperature is 76 Fahrenheit whereas the second-highest temperature is 54.
This difference between the two highest temperatures is very much compared to the general difference between any other consecutive data when sorted in ascending order.
We should eliminate this data point to accurately represent the temperature distribution in Los Angeles for the past 10 days.
So our final data of 5 number summary would look like this:
Min | Q1 | Q3 | Max | |
Temperature | 15 | 22.75 | 42.5 | 54 |
Note that the Max is now 54. Now our box and whisker plot will look like this
When we compare this plot with the one with the outlier, this plot tells us that our data is mostly spread out symmetrically as compared to the first plot where it was looking like it was skewed towards the right.
Real-life example for data analysis – Box and whisker plot google sheets
Let’s look at a real-life problem and plot it in google sheets.
We have collected sample data from an Apparel company that sold women’s apparel in the past 1 week, using their 3 exclusive stores in 3 different cities in the USA.
Here is the data:
P.S:- Do note that this is just a sample collection. In an actual scenario, you might have thousands of such data for each city.
Problem statement:- The store owner has a sample list of buyers who have all spent more than $520 on a transaction. The store owner wants to know the city of these shoppers.
Before we plot the chart, let’s highlight in the below figure, what a box and whisker chart represents in terms of percentile.
The above data is the First example of the Los Angeles temperature distribution that we discussed at the start.
Now let’s plot the box and whisker chart for the Apparel company.
First, let us get the 5 number summary for each store.
It will look like this:-
Min | Q1 | Median | Q3 | Max | |
Chicago | 200 | 375 | 550 | 725 | 900 |
San Francisco | 200 | 400 | 600 | 1100 | 2000 |
New York | 200 | 500 | 800 | 1500 | 3000 |
If we plot the candlestick chart in google sheet to mimic the box and whisker plot, it will look like this:
The chart here compares all the store revenue in a single plot area.
From this plot, we can see that the Q1 of New York City is $500, which means 75% of the sales in NYC are greater than $500.
This is compared to San Francisco, only 50% has more than $500 revenue buyers. Chicago has more than 60% below $500.
So it’s clear that if we sample users who did a transaction greater than $525, then they will be most likely from New York.
Note that, here we have not yet considered outliers, but in a large data set of thousands or millions of records, outliers can influence this decision. So it’s always advisable to separate the outliers when making a decision.
Wrapping up
In this tutorial, you learned how to create box and whisker plot google sheets with real examples. You learned how to prepare data for creating the box and whisker plots. Also, you saw what are outliers when creating a box and whisker plot.
Now go ahead and start using this in your sheet.
Appendix
[1] Learn more about box plots – Link
[2] Quartile function – Link
Further Reading
New to google sheets ? Start here
Learn about other types of charts like bubble chart in google sheets or Line chart with multiple lines in google sheets.
New to google sheets ? Start here
More about Data analysis in google sheets: