This small project shows how to spot anomalies in a dataset using Benford's Law and a few lines of Python.

Requirements:
- Python 3.10
- NumPy 1.23.5
- pandas 1.5.2

Select the dataset, the column of interest, and the title to show in the plot:
benford.analyze('datasets\\gaia-dr2-rave-35.csv', 'r_distance', 'Distance to Earth of 250k stars')
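The README does not show what `benford.analyze` does internally. As a rough sketch, assuming it takes the first significant digit of every positive value in the column and compares the observed frequencies with Benford's expected probabilities, $P(d) = \log_{10}(1 + 1/d)$, the core computation might look like the following. The helper names `benford_expected`, `leading_digits` and `observed_frequencies` are hypothetical and not part of the project.

```python
import numpy as np
import pandas as pd


def benford_expected() -> np.ndarray:
    """Expected leading-digit probabilities for d = 1..9: log10(1 + 1/d)."""
    digits = np.arange(1, 10)
    return np.log10(1 + 1 / digits)


def leading_digits(values: pd.Series) -> pd.Series:
    """First significant digit of each positive, finite value (hypothetical helper)."""
    v = pd.to_numeric(values, errors="coerce").astype(float)
    v = v[np.isfinite(v) & (v > 0)]
    # Scientific notation starts with the first significant digit:
    # 0.042 -> '4.200000000000000e-02'
    return v.map(lambda x: int(f"{float(x):.15e}"[0]))


def observed_frequencies(values: pd.Series) -> np.ndarray:
    """Relative frequency of the leading digits 1..9."""
    counts = leading_digits(values).value_counts().reindex(range(1, 10), fill_value=0)
    return (counts / counts.sum()).to_numpy()


# Negative values and NaN are dropped before the digits are extracted.
sample = pd.Series([1200.0, 0.042, 19, 310, 1.5, -7, float("nan")])
print(leading_digits(sample).tolist())  # [1, 4, 1, 3, 1]
print(benford_expected().round(3))
```

With a real dataset, `sample` would be the selected column, for example `pd.read_csv(path)[column]`.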

Calculate the correlation:
benford.calculateCorrelation()
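The README does not say which correlation `calculateCorrelation` computes. A minimal sketch, assuming it is the Pearson correlation between the observed leading-digit frequencies and Benford's expected ones; `benford_correlation` is a hypothetical name used only for illustration.

```python
import numpy as np

# Benford's expected probabilities for the leading digits 1..9.
BENFORD = np.log10(1 + 1 / np.arange(1, 10))


def benford_correlation(observed: np.ndarray) -> float:
    """Pearson correlation between observed digit frequencies (1..9) and Benford's."""
    return float(np.corrcoef(observed, BENFORD)[0, 1])


# Frequencies that match Benford exactly give a correlation of 1 (up to rounding).
print(benford_correlation(BENFORD))
# Frequencies that grow with the digit instead of shrinking give a negative value.
print(benford_correlation(BENFORD[::-1]))
```

Under this assumption, a value close to 1 means the digit frequencies follow the expected shape, and a markedly lower value flags the column for a closer look.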

Implement your own manipulation method:
vect = [1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 9, 8, 9, 9]
self.data_vector = [random.choice(vect) * 1000 if random.randint(0, 10) % 2 == 0 else x for x in self.data_vector if not math.isnan(x) and x > 0]

Call the manipulation method and re-analyze the vector:
benford.manipulateV2()
benford.reanalyze()
print(benford.calculateCorrelation())
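To see why re-analyzing after a manipulation is informative, here is a self-contained sketch that applies the same overwrite rule to synthetic data and compares the correlation before and after. It makes several assumptions: the data is a log-uniform stand-in for a real column, the correlation is Pearson, and `digit_frequencies`, `correlation` and `manipulate` are hypothetical helpers rather than the project's methods.

```python
import math
import random

import numpy as np

# Benford's expected probabilities for the leading digits 1..9.
BENFORD = np.log10(1 + 1 / np.arange(1, 10))


def digit_frequencies(values):
    """Relative frequency of leading digits 1..9 among positive, finite values."""
    digits = [int(f"{x:.15e}"[0]) for x in values if math.isfinite(x) and x > 0]
    counts = np.bincount(digits, minlength=10)[1:10]
    return counts / counts.sum()


def correlation(values):
    """Pearson correlation of observed digit frequencies with Benford's (assumed metric)."""
    return float(np.corrcoef(digit_frequencies(values), BENFORD)[0, 1])


def manipulate(values, rng):
    """Same rule as the snippet above: overwrite a random subset of the values."""
    vect = [1, 2, 3, 4, 4, 5, 5, 5, 5, 6, 6, 6, 7, 7, 7, 9, 8, 9, 9]
    return [
        rng.choice(vect) * 1000 if rng.randint(0, 10) % 2 == 0 else x
        for x in values
        if not math.isnan(x) and x > 0
    ]


rng = random.Random(0)  # fixed seed so the run is repeatable
# Synthetic stand-in for a real column: log-uniform values follow Benford's Law.
data = [10 ** rng.uniform(0, 5) for _ in range(20_000)]

before = correlation(data)
after = correlation(manipulate(data, rng))
print(f"before: {before:.3f}, after: {after:.3f}")
```

Because the replacement values favor leading digits such as 5, 6 and 7, the correlation after the manipulation should come out clearly lower than before.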
Note that using Benford's Law to spot data anomalies is only suitable under particular conditions:
- Sample Size: if the sample size is too small, the distribution of the leading digits may not follow the expected pattern.
- Selection Bias: the dataset needs to be representative of the population it is drawn from, otherwise it may not follow Benford’s Law.
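- Leading Digit Preference: Benford’s Law assumes that the values are not shaped by a human preference for particular digits. The law itself predicts unequal frequencies (a leading 1 appears about 30% of the time, a leading 9 under 5%), so rounding habits or reporting thresholds that favor certain digits will distort the distribution even when no manipulation has taken place.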
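The sample-size condition can be illustrated with a short simulation. This is only a sketch under stated assumptions: the values are synthetic log-uniform numbers (which follow Benford's Law by construction), and `max_deviation` is a hypothetical helper, not part of the project.

```python
import numpy as np

# Benford's expected probabilities for the leading digits 1..9.
BENFORD = np.log10(1 + 1 / np.arange(1, 10))


def max_deviation(sample_size: int, seed: int = 0) -> float:
    """Largest gap between observed and expected digit frequency in one random sample."""
    rng = np.random.default_rng(seed)
    # Log-uniform values over five decades: Benford-distributed by construction.
    values = 10 ** rng.uniform(0, 5, size=sample_size)
    digits = [int(f"{v:.15e}"[0]) for v in values]
    counts = np.bincount(digits, minlength=10)[1:10]
    return float(np.abs(counts / sample_size - BENFORD).max())


for n in (30, 30_000):
    print(n, round(max_deviation(n), 3))
```

Even though both samples come from the same Benford-conforming source, the small one deviates far more from the expected frequencies, which is why a poor fit on a small sample says little about anomalies.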

The datasets used for the tests come from Kaggle:
- 6,000 Largest Companies Ranked by Market Cap by KANAWATTANACHAI
- Covid-19 Data Deaths and Vaccinations by DIGVIJAYSINH GOHIL
- Worldwide deaths by country/risk factors by ARPIT VERMA
- GoodReads 100k books by MANAV DHAMANI
- Stars from Gaia DR2 and RAVE DR5 by JOSÉ H. SOLÓRZANO
- World Population 1960–2018 by DEVAKUMAR K. P.

References:
- Frank Benford (March 1938). “The law of anomalous numbers”. Proc. Am. Philos. Soc. 78 (4): 551–572.
- J. Carlton Collins, CPA (April 1, 2017). “Using Excel and Benford’s Law to detect fraud”.
- Wendy K. Tam Cho and Brian J. Gaines. “Breaking the (Benford) Law: Statistical Fraud Detection in Campaign Finance”.
- Azar Khosravani and Constantin Rasinariu. “Emergence of Benford’s Law in Classical Music”.