Data Shape
Copyright © 2010, doingstats.com
The Shape of Data
There are many ways to visualize the shape of data. Arguably, the best way to do it is by means of the frequency distribution.
Consider a categorical sample representing the Color Preferences mentioned on the Data Scope page:
{white,green,red,red,blue,yellow,blue,red,yellow,green,yellow,red,white,green,yellow,yellow,
yellow,white,red,yellow,white,blue,yellow,blue,white,yellow,blue,blue,white,yellow}
Just by looking at this sample, it is not easy to see how this variable is shaped:
  • Which color (more generally, which category) has the highest preference?
  • Which color has the lowest preference?
  • Are the preferences shaped evenly?
  • Is there a dominating color or group of colors?
  • Etc.
Sorting the sample may provide a richer view:
{blue,blue,blue,blue,blue,blue,green,green,green,red,red,red,red,red,white,white,white,white,white,white,
yellow,yellow,yellow,yellow,yellow,yellow,yellow,yellow,yellow,yellow}
Even a primitive aggregate view, showing frequencies of the categories, does a better job:
red|5|*****
blue|6|******
green|3|***
white|6|******
yellow|10|**********
Right away, it is easier to see which category dominates and which ones are neglected or less popular.
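The aggregate view above can be reproduced with a few lines of code. What follows is a minimal Python sketch, not part of the original page; the names `sample`, `frequency_table`, and `text_histogram` are illustrative assumptions. Categories are listed in the order they first appear in the sample, so the rows may come out in a different order than in the listing above, while the counts are the same.

```python
from collections import Counter

# The Color Preferences sample, copied from the text (30 observations).
sample = (
    "white,green,red,red,blue,yellow,blue,red,yellow,green,yellow,red,"
    "white,green,yellow,yellow,yellow,white,red,yellow,white,blue,yellow,"
    "blue,white,yellow,blue,blue,white,yellow"
).split(",")


def frequency_table(values):
    """Count how often each category occurs (hypothetical helper).

    Keys keep the order in which categories first appear in `values`.
    """
    return dict(Counter(values))


def text_histogram(freq):
    """Render each category as 'name|count|stars' (hypothetical helper)."""
    return [f"{category}|{count}|{'*' * count}" for category, count in freq.items()]


freq = frequency_table(sample)
lines = text_histogram(freq)
print("\n".join(lines))
```

Each printed row has the same `category|frequency|bar` layout as the listing above, so the two can be compared directly.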
Arranging the frequency distribution in descending order of frequency provides an even better insight into the shape of the data. A diagram based on such a distribution is referred to as a Pareto diagram:
yellow|10|**********
blue|6|******
white|6|******
red|5|*****
green|3|***
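The descending arrangement is a simple sort on the frequencies. The sketch below assumes the frequencies from the table above are already available as a dictionary; the name `pareto_order` and the alphabetical tie-breaking rule are assumptions made for illustration, since the text does not say how ties such as blue and white should be ordered.

```python
# Frequencies taken from the aggregate view in the text.
freq = {"red": 5, "blue": 6, "green": 3, "white": 6, "yellow": 10}


def pareto_order(freq):
    """Return (category, count) pairs sorted by descending count.

    Ties are broken alphabetically by category name (an assumption).
    """
    return sorted(freq.items(), key=lambda item: (-item[1], item[0]))


for category, count in pareto_order(freq):
    print(f"{category}|{count}|{'*' * count}")
```

With this tie-breaking rule the rows come out in the same order as the listing above: yellow, blue, white, red, green.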