Algorithm Details: Histogram of all Numeric Values
A histogram is an accurate representation of the distribution of numerical data. It differs from a bar graph in that a bar graph relates two variables, while a histogram relates only one. To construct a histogram, the first step is to divide the entire range of values into a series of intervals (bins) and then count how many values fall into each interval. The bins are usually specified as consecutive, non-overlapping intervals of the variable; they must be adjacent and are often, but not necessarily, of equal size. The portal CSM histogram algorithm:
- Examines each variable separately (shape of the distribution, its variability, central tendency, and other simple statistics) to detect systematic data-collection errors or fabricated data.
- Creates a histogram and calculates simple statistics for every selected variable: range, mean, standard deviation, number of missing values, and several others (see the sketch below).
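For illustration, a minimal Python sketch of this per-variable step is shown below. It is not the portal's actual implementation: the function name `summarize_numeric`, the use of NumPy, the fixed bin count, and treating NaN as the missing-value marker are all assumptions made for the example.

```python
import numpy as np

def summarize_numeric(values, n_bins=10):
    """Bin one numeric variable into adjacent, equal-width intervals and
    compute simple statistics (range, mean, std, missing count).
    Illustrative sketch only; names and defaults are assumptions."""
    values = np.asarray(values, dtype=float)
    missing = np.isnan(values)          # assume NaN marks a missing value
    present = values[~missing]

    # np.histogram splits [min, max] into n_bins consecutive,
    # non-overlapping, equal-width bins and counts values in each.
    counts, bin_edges = np.histogram(present, bins=n_bins)

    stats = {
        "n": int(present.size),
        "n_missing": int(missing.sum()),
        "min": float(present.min()),
        "max": float(present.max()),
        "range": float(present.max() - present.min()),
        "mean": float(present.mean()),
        "std": float(present.std(ddof=1)),
    }
    return counts, bin_edges, stats

# Example: one numeric variable with a single missing value.
counts, edges, stats = summarize_numeric([1.2, 3.4, 2.2, float("nan"), 5.0, 4.1])
print(counts)   # number of values falling into each bin
print(stats)    # simple statistics for the variable
```

Examining the resulting bin counts and statistics side by side is what allows unusual shapes (for example, too-narrow spread or implausible clustering) to stand out as possible data-collection errors or fabricated data.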