A hybrid model based on machine learning and genetic algorithm for detecting fraud in financial statements

Document Type : Original Manuscript


1 Department of Accounting, Qaemshahr Branch, Islamic Azad University, Qaemshahr, Iran

2 Department of Accounting, Qaemshahr Branch, Islamic Azad University, Qaemshahr, Iran

3 Department of Computer Engineering, Sari Branch, Islamic Azad University, Sari, Iran


Financial statement fraud has become an increasingly serious problem for businesses, governments, and investors. It threatens the reliability of capital markets, corporate leadership, and even the audit profession. Auditors in particular have struggled to detect large-scale fraud. Most methods proposed to address this problem build on existing algorithms and rely either on human review or on simple data mining techniques, which are costly; the data mining methods presented so far have had either high computational overhead or low accuracy. This study presents a hybrid model that combines an improved ID3 decision tree with a support vector machine, and applies a genetic algorithm and multilayer perceptron neural networks to improve performance and accuracy. More efficient feature selection is used to reduce computational overhead. The decision tree in the proposed method has the lowest possible depth and therefore runs quickly with low computational overhead. The financial statements of 151 companies listed on the Tehran Stock Exchange during 2014-2015 were surveyed and 125 financial ratios were extracted; using an ANOVA test, 23 fraud-related ratios were selected as model input data. The proposed model achieves a prediction accuracy of about 80%, which is high compared with similar models.
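The ANOVA-based selection of fraud-related ratios can be illustrated with a minimal sketch. It computes a one-way ANOVA F-statistic for each ratio across the fraud and non-fraud groups and keeps the highest-scoring ones. The function names, the toy data, and the top-k rule are assumptions made for illustration; the study may instead have used a significance threshold, and this is not the authors' implementation.

```python
from statistics import mean


def anova_f(values, labels):
    """One-way ANOVA F-statistic of a single feature across class labels."""
    groups = {}
    for v, y in zip(values, labels):
        groups.setdefault(y, []).append(v)
    n, k = len(values), len(groups)
    if k < 2 or n <= k:
        raise ValueError("need at least two classes and more samples than classes")
    grand = mean(values)
    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand) ** 2 for g in groups.values())
    # Variation of the samples around their own group mean.
    ss_within = sum((v - mean(g)) ** 2 for g in groups.values() for v in g)
    if ss_within == 0:
        return float("inf") if ss_between > 0 else 0.0
    return (ss_between / (k - 1)) / (ss_within / (n - k))


def select_top_k(rows, labels, k):
    """Indices of the k features with the largest F-statistic (assumed rule)."""
    n_features = len(rows[0])
    scores = [anova_f([r[j] for r in rows], labels) for j in range(n_features)]
    ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
    return sorted(ranked[:k])


# Hypothetical toy data: 4 companies, 3 financial ratios, label 1 = fraudulent.
rows = [
    [0.9, 0.5, 1.2],
    [0.8, 0.4, 1.1],
    [0.2, 0.5, 1.3],
    [0.1, 0.6, 1.0],
]
labels = [1, 1, 0, 0]
selected = select_top_k(rows, labels, k=2)
```

In the study's setting, `rows` would hold the 125 extracted ratios for each of the 151 companies, and the selected columns would feed the downstream classifiers.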


