Histogram

A histogram is a basic graphing tool that displays the relative frequency or occurrence of continuous data values, showing which values occur most and least frequently. It illustrates the shape, centering, and spread of a data distribution and indicates whether there are any outliers.

When should we use a histogram?

When you are unsure what to do with a large set of measurements presented in a table, you can use a histogram to organize and display the data in a more user-friendly format. A histogram will make it easy to see where the majority of values fall in a measurement scale, and how much variation there is. It is helpful to construct a histogram when you want to do the following:

  • Summarize large data sets graphically. When you look at Viewgraph 2, you can see that a set of data presented in a table is not easy to use. You can make it much easier to understand by summarizing it on a tally sheet (Viewgraph 3) and organizing it into a histogram (Viewgraph 8).

  • Compare process results with specification limits. If you add the process specification limits to your histogram, you can determine quickly whether the current process is able to produce “good” products. Specification limits may take the form of length, weight, density, quantity of materials to be delivered, or whatever is important to produce the required results of a given process.

  • Communicate information graphically. The team members can easily see the values which occur most frequently. When you use a histogram to summarize large data sets, or to compare measurements to specification limits, you are employing a powerful tool for communicating information.

  • Use a tool to assist in decision-making. As we move along, you will see that the shape, size, and spread of the data have meanings that can help you in investigating problems and making decisions. However, always bear in mind that if the data you have in hand are not the most recent, or you do not know how the data were collected, it is a waste of time trying to chart them. Measurements cannot be used for making decisions or predictions when they were produced by a process that is different from the current one, or were collected under unknown conditions.

What are the parts of a histogram?

As you can see in Viewgraph 1, a histogram is made up of five (5) parts:

  1. Title: The title briefly describes the information that is contained in the histogram.

  2. Horizontal or X-axis: The horizontal or X-axis shows you the scale of values into which the measurements fit. These measurements are generally grouped into intervals to help you summarize large data sets. Individual data points are not displayed.

  3. Bars: The bars have two important characteristics -- height and width. The height represents the number of times the values within an interval occurred. The width represents the length of the interval covered by the bar. It is the same for all bars.

  4. Vertical or Y-axis: The vertical or Y-axis is the scale that shows you the number of times the values within an interval occurred. The number of times is also referred to as “frequency.”

  5. Legend: The legend provides additional information that documents where the data came from and how the measurements were gathered.

How do we develop a histogram?

There are many different ways to organize data and build histograms. You can safely use any of them as long as you follow the basic rules.

The following scenario will be used as an example to provide data as we go through the process of building a histogram step by step:

During sea trials, a ship conducted test firings of its MK 75, 76mm gun. The ship fired 135 rounds at a target. An airborne spotter provided accurate rake data to assess the fall of shot both long and short of the target. The ship computed what constituted a hit for the test firing as:

FROM 60 yards short of the target TO 300 yards beyond the target
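The hit window can be written as a simple check on the miss distance. The following is a minimal sketch, not part of the original exercise: the name is_hit is hypothetical, and it assumes that negative distances mean "short of the target", positive distances mean "beyond the target", and that both limits themselves count as hits.

```python
# Hit window from the scenario: 60 yards short to 300 yards beyond the target.
# Assumed sign convention: negative = short, positive = beyond.
HIT_SHORT_LIMIT = -60   # yards
HIT_LONG_LIMIT = 300    # yards


def is_hit(miss_distance_yards):
    """Return True if a round fell inside the hit window.

    Assumption: both limits are inclusive.
    """
    return HIT_SHORT_LIMIT <= miss_distance_yards <= HIT_LONG_LIMIT


print(is_hit(-180))  # False: 180 yards short is outside the window
print(is_hit(220))   # True: 220 yards beyond is inside the window
```

These two limits play the role of specification limits when they are drawn on the finished histogram.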

 

Step 1 Count the total number of data points you have listed. Suppose your team collected data on the miss distance for the gunnery exercise described in the example. The data you collected were for the fall of shot both long and short of the target. The data are displayed in Viewgraph 2. Simply counting the total number of entries in the data set completes this step. In this example, there are 135 data points.
Step 2 Summarize your data on a tally sheet. You need to summarize your data to make it easy to interpret. You can do this by constructing a tally sheet.
 
  • First, identify all the different values found in Viewgraph 2 (-160, -010…030, 220, etc.). Organize these values from smallest to largest (-180, -120…380, 410).

  • Then, make a tally mark next to the value every time that value is present in the data set.

  • Alternatively, simply count the number of times each value is present in the data set and enter that number next to the value, as shown in Viewgraph 3.
This tally helped us organize 135 mixed numbers into a ranked sequence of 51 values. Moreover, we can see very easily the number of times that each value appeared in the data set. Forming intervals of values can summarize this data even further.
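
A tally sheet of this kind can also be produced programmatically. The sketch below uses Python's collections.Counter; the list named sample is a small hypothetical data set made up for illustration, not the 135 values from Viewgraph 2.

```python
from collections import Counter

# Hypothetical sample of miss distances in yards (NOT the Viewgraph 2 data).
sample = [-160, -10, 30, 220, 30, -10, 30, 410, -180, 220]

# Count how many times each distinct value appears.
tally = Counter(sample)

# List the distinct values from smallest to largest with their counts.
for value in sorted(tally):
    print(f"{value:>5}  {tally[value]}")
```

Sorting the keys gives the ranked sequence of values; the count next to each value replaces the hand-drawn tally marks.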
Step 3 Compute the range for the data set. Compute the range by subtracting the smallest value in the data set from the largest value. The range represents the extent of the measurement scale covered by the data; it is always a positive number. The range for the data in Viewgraph 8 is 590 yards, obtained by subtracting -180 from +410. The calculation, broken down in Viewgraph 4, is:

+410 – (-180) = 410 + 180 = 590

Remember that subtracting a negative (-) number is the same as adding the corresponding positive number.
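
The same calculation can be sketched in a few lines of Python. The function name data_range is hypothetical; only the two extreme values quoted in the text are used here, since they are all that is needed to reproduce the result.

```python
def data_range(values):
    """Range = largest value minus smallest value."""
    return max(values) - min(values)


# Smallest and largest miss distances quoted in the text, in yards.
print(data_range([-180, 410]))  # 590
```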

Step 4 Determine the number of intervals required. The number of intervals influences the pattern, shape, or spread of your Histogram. Use the following table (Viewgraph 5) to determine how many intervals (or bars on the bar graph) you should use.
  If you have this many data points:    Use this number of intervals:

  Less than 50                          5 to 7
  50 to 99                              6 to 10
  100 to 250                            7 to 12
  More than 250                         10 to 20

  In this example, 10 intervals were chosen as an appropriate number.
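
The table can be expressed as a small lookup function. This is only a sketch of the guideline above; the function name suggested_intervals is hypothetical, and the final choice within the suggested band remains a judgment call.

```python
def suggested_intervals(n_points):
    """Return the (low, high) number of intervals suggested for n_points."""
    if n_points < 50:
        return (5, 7)
    if n_points < 100:
        return (6, 10)
    if n_points <= 250:
        return (7, 12)
    return (10, 20)


low, high = suggested_intervals(135)
print(low, high)  # 7 12
```

With 135 data points the suggested band is 7 to 12 intervals, so the 10 intervals used in this example fall inside it.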
Step 5 Compute the interval width. To compute the interval width (Viewgraph 6), divide the range (590) by the number of intervals (10). Round the result up to the next higher whole number to get a value that is convenient to use. For example, if the range of the data is 17 and you have decided to use 9 intervals, the computed interval width is about 1.89, which you round up to 2.

In this example, you divide 590 yards by 10 intervals, which gives an interval width of 59 yards. To simplify later calculations, it is best to round this to a more convenient value. In this case, we will use 60, rather than 59, as the interval width.
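
The width calculation can be sketched as follows. The function interval_width and its step parameter are hypothetical: step is an assumption added here to express "round up to a convenient value", so that step=1 rounds up to the next whole number and step=10 rounds up to the next multiple of ten.

```python
import math


def interval_width(data_range, n_intervals, step=1):
    """Divide the range by the number of intervals, then round the
    result up to the next multiple of `step` (an assumed convenience)."""
    raw_width = data_range / n_intervals
    return math.ceil(raw_width / step) * step


print(interval_width(17, 9))              # 2  (1.89 rounded up)
print(interval_width(590, 10))            # 59 (already a whole number)
print(interval_width(590, 10, step=10))   # 60 (rounded up to a multiple of 10)
```

Choosing 60 instead of 59 is a convenience decision rather than a strict rule; a slightly wider interval simply makes the interval boundaries easier to work with.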

Step 6 Determine the starting point for each interval. Use the smallest data point in your measurements as the starting point of the first interval. The starting point for the second interval is the sum of the smallest data point and the interval width. For example, if the smallest data point is -180, and the interval width is 60, the starting point for the second interval is -120. Follow this procedure (Viewgraph 7) to determine all of the starting points (-180 + 60 = -120; -120 + 60 = -60; etc.).
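
The repeated addition can be sketched as a short function. The name interval_starts is hypothetical; the inputs are the smallest data point, the interval width, and the number of intervals from this example.

```python
def interval_starts(smallest, width, n_intervals):
    """Starting point of each interval: smallest, smallest + width, ..."""
    return [smallest + i * width for i in range(n_intervals)]


starts = interval_starts(-180, 60, 10)
print(starts)
# [-180, -120, -60, 0, 60, 120, 180, 240, 300, 360]
```

The last interval starts at 360 and ends at 420, so the largest value in the example (410) still falls inside it.
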
Step 7 Count the number of points that fall within each interval. These are the data points that are equal to or greater than the starting value and less than the ending value (also illustrated in Viewgraph 7). For example, if the first interval begins with -180 and ends with -120, all data points that are equal to or greater than -180, but still less than -120, will be counted in the first interval. Keep in mind that EACH DATA POINT can appear in only one interval.
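
The counting rule (equal to or greater than the start, less than the end) can be sketched as below. The function count_per_interval is hypothetical, and the list named sample is a small made-up data set, not the Viewgraph 2 data. Values outside all intervals are ignored in this sketch.

```python
def count_per_interval(values, smallest, width, n_intervals):
    """Count values in half-open intervals [start, start + width)."""
    counts = [0] * n_intervals
    for value in values:
        # Floor division places each value in exactly one interval.
        index = int((value - smallest) // width)
        if 0 <= index < n_intervals:
            counts[index] += 1
    return counts


# Hypothetical sample in yards (NOT the Viewgraph 2 data).
sample = [-180, -121, -120, -10, 30, 30, 220, 410]
counts = count_per_interval(sample, -180, 60, 10)
print(counts)
# [2, 1, 1, 2, 0, 0, 1, 0, 0, 1]
```

Note that -121 is counted in the first interval while -120 is counted in the second, which is how each data point ends up in only one interval.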
Step 8 Plot the data. A more precise and refined picture comes into view once you plot your data (Viewgraph 8). You bring all of the previous steps together when you construct the graph.

  • The horizontal scale across the bottom of the graph contains the intervals that were calculated previously.

  • The vertical scale contains the count or frequency of observations within each of the intervals.

  • A bar is drawn for each interval. The bars look like columns.

  • The number of observations or percentage of the total observations determines the height for each of the intervals.

  • The histogram may not be perfectly symmetrical. Variations will occur. Ask yourself whether the picture is reasonable and logical, but be careful not to let your preconceived ideas influence your decisions unfairly.
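
As a rough illustration of how the pieces fit together, the sketch below draws a text-only histogram, one row per interval, with the bar length equal to the frequency. The function text_histogram is hypothetical, the bars run sideways rather than as columns, and the counts are made up for illustration rather than taken from Viewgraph 8.

```python
def text_histogram(starts, width, counts):
    """Return one text row per interval: 'start to end | ### count'."""
    rows = []
    for start, count in zip(starts, counts):
        label = f"{start:>5} to {start + width:>5}"
        rows.append(f"{label} | {'#' * count} {count}")
    return "\n".join(rows)


# Hypothetical starting points and counts for three intervals.
chart = text_histogram([-180, -120, -60], 60, [2, 1, 4])
print(chart)
```

A plotting library would normally be used for a presentation-quality graph; the text version is only meant to show that the intervals supply the horizontal scale and the counts supply the bar heights.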
Step 9 Add the title and legend. A title and a legend provide the Who, What, When, Where, and Why (also illustrated in Viewgraph 8) that are important for understanding and interpreting the data. This additional information documents the nature of the data, where they came from, and when they were collected. The legend may include such things as the sample size, the dates and times involved, who collected the data, and identifiable equipment or work groups. It is important to include any information that helps clarify what the data describe.

     
