Algorithm Details: Scatter Plot
A scatter plot is a two-dimensional data visualization that uses dots to represent the values of two variables, one plotted along the x-axis and the other along the y-axis. It graphs pairs of numerical data to reveal any relationship between them. If the variables are correlated, the points fall along a line or curve; the stronger the correlation, the more tightly the points hug it.
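One common way to put a number on how tightly the points hug a straight line is the Pearson correlation coefficient. The sketch below is only an illustration: `pearson_r` is a hypothetical helper, the data is synthetic, and the coefficient captures linear relationships only, so a strong curved relationship can still score low.

```python
import numpy as np

def pearson_r(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float((dx * dy).sum() / np.sqrt((dx ** 2).sum() * (dy ** 2).sum()))

# Synthetic data (assumed for illustration): the same line with little or much noise.
rng = np.random.default_rng(0)
x = np.linspace(0.0, 10.0, 200)
tight = 2.0 * x + 1.0 + rng.normal(scale=0.5, size=x.size)
loose = 2.0 * x + 1.0 + rng.normal(scale=8.0, size=x.size)

r_tight = pearson_r(x, tight)
r_loose = pearson_r(x, loose)
print(f"tight cloud: r = {r_tight:.3f}")
print(f"loose cloud: r = {r_loose:.3f}")
```

The tighter cloud yields a coefficient closer to 1, which matches the visual impression of points hugging the line.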
The scatter plot helps to detect fraud by revealing correlations between variables and exposing outliers. Values of the label column are distinguished by the shapes of their points on the chart. If the label column contains numerical data with more than 25 distinct values, the values are displayed as a color gradient. If the label column contains non-numerical data with more than 25 distinct values, only the 25 top-grossing label values are displayed, and all others share the common label "Other".
The Scatter plot algorithm shows:
- A linear regression line reflecting the dependency between the variables;
- A kernel density estimate curve reflecting the density of points along each axis;
- Colored ticks near the coordinate axes reflecting the density of points for every label;
- A list of the outlier points under the plot.
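The overlays listed above can be sketched with NumPy alone. This document does not say how the regression line, the density curves, or the outliers are actually computed, so everything below is an assumption made for illustration: an ordinary least-squares fit, a Gaussian kernel with a normal-reference bandwidth rule, and outliers flagged by residuals more than three robust standard deviations from the fitted line. The function names and the synthetic data are hypothetical.

```python
import numpy as np

def fit_line(x, y):
    """Least-squares slope and intercept of y regressed on x."""
    slope, intercept = np.polyfit(x, y, deg=1)
    return float(slope), float(intercept)

def gaussian_kde_1d(values, grid, bandwidth=None):
    """Gaussian kernel density estimate of `values`, evaluated on `grid`."""
    values = np.asarray(values, dtype=float)
    grid = np.asarray(grid, dtype=float)
    if bandwidth is None:
        # Normal-reference rule of thumb (assumed; the source names no rule).
        bandwidth = 1.06 * values.std(ddof=1) * values.size ** (-1 / 5)
    z = (grid[:, None] - values[None, :]) / bandwidth
    kernel = np.exp(-0.5 * z ** 2) / np.sqrt(2.0 * np.pi)
    return kernel.sum(axis=1) / (values.size * bandwidth)

def residual_outliers(x, y, slope, intercept, threshold=3.0):
    """Indices of points whose residual lies beyond `threshold` robust SDs."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    residuals = y - (slope * x + intercept)
    center = np.median(residuals)
    # Median absolute deviation, scaled to match the SD of a normal distribution.
    scale = 1.4826 * np.median(np.abs(residuals - center))
    return np.flatnonzero(np.abs(residuals - center) > threshold * scale)

# Synthetic data (assumed): a noisy line with two injected outliers.
rng = np.random.default_rng(1)
x = rng.uniform(0.0, 10.0, 300)
y = 3.0 * x - 2.0 + rng.normal(scale=1.0, size=x.size)
y[[5, 42]] += 25.0

slope, intercept = fit_line(x, y)
grid = np.linspace(-5.0, 15.0, 2001)
x_density = gaussian_kde_1d(x, grid)  # density curve for the x-axis
outliers = residual_outliers(x, y, slope, intercept)

print(f"regression line: y = {slope:.2f} * x + {intercept:.2f}")
print("outlier indices:", outliers.tolist())
```

The same `gaussian_kde_1d` call applied to `y` would give the curve for the other axis, and applying it to the points of each label separately is one way to approximate per-label density marks.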
[1] Friendly, M. and Denis, D. (2005). The Early Origins and Development of the Scatterplot. Journal of the History of the Behavioral Sciences, 41(2), 103–130.
[2] https://asq.org/quality-resources/scatter-diagram
[3] https://www.texasgateway.org/resource/interpreting-scatterplots