Article type: Applied research article

Authors

1 Department of Industrial Management, Faculty of Management and Accounting, Rasht Branch, Islamic Azad University, Rasht, Iran.

2 Department of Industrial Management, Faculty of Management and Accounting, Rasht Branch, Islamic Azad University, Rasht, Iran.

Abstract

Purpose: Clustering algorithms are useful tools for understanding the structure of data and classifying it across different datasets. Given the importance of applying these algorithms to financial-market data, which are large in volume and scope, this study applies several clustering algorithms to cluster companies listed on the Tehran Stock Exchange on financial criteria, evaluates the validity of these algorithms, and selects the best one.
Methodology: This research is applied in purpose, descriptive in method, and quantitative (mathematical modelling) in type. The statistical population comprises 403 companies listed on the Tehran Stock Exchange in 1398 (2019), whose performance was evaluated on four financial criteria.
Findings: After the companies were clustered with five algorithms (K-Means, EM, COBWEB, a density-based algorithm and Ward's method), seven indices were used to evaluate the algorithms: RS, DB, Dunn, SD, purity, entropy and time. Finally, the overall performance of the algorithms was analysed with TOPSIS, VIKOR and data envelopment analysis (DEA). According to the results, K-Means performs better than the other algorithms in clustering the companies on the financial dataset.
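Purity and entropy, two of the seven indices named in the findings, are external indices: they compare each cluster with a reference labelling of the items. The abstract does not say how reference labels were obtained for the companies, so the sketch below simply assumes one reference class per company is available. It uses the common textbook definitions (purity as the share of items in their cluster's majority class; entropy as the size-weighted mean of each cluster's class entropy, here in bits and unnormalised); the paper's exact formulas may differ, and the function name and labels are hypothetical.

```python
import math
from collections import Counter, defaultdict


def purity_and_entropy(cluster_labels, class_labels):
    """Compare a clustering with reference class labels (one pair per item).

    purity  : fraction of items belonging to the majority class of their cluster (higher is better)
    entropy : size-weighted mean of each cluster's class entropy, in bits (lower is better)
    """
    if len(cluster_labels) != len(class_labels) or not cluster_labels:
        raise ValueError("need two non-empty label lists of equal length")
    n = len(cluster_labels)

    # Collect the reference classes of the items assigned to each cluster.
    members = defaultdict(list)
    for cluster, cls in zip(cluster_labels, class_labels):
        members[cluster].append(cls)

    purity = 0.0
    entropy = 0.0
    for classes in members.values():
        counts = Counter(classes)
        size = len(classes)
        purity += max(counts.values()) / n
        cluster_entropy = -sum((m / size) * math.log2(m / size) for m in counts.values())
        entropy += (size / n) * cluster_entropy
    return purity, entropy


# Hypothetical labels: cluster ids from some algorithm vs. reference classes.
p, h = purity_and_entropy([0, 0, 0, 1, 1, 1], ["a", "a", "b", "b", "b", "b"])
print(round(p, 3), round(h, 3))  # → 0.833 0.459
```

A perfect clustering scores purity 1 and entropy 0; both indices need the reference labels, unlike the internal indices (RS, DB, Dunn, SD) that use only the data and the cluster assignment.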
Originality/value: Since no clustering algorithm can achieve the best performance on every measure for every dataset, this study combines multiple criteria to analyse clustering algorithms applied to data on companies' financial performance, and offers recommendations. Its results are of practical use to investors in finance and lead to an optimal choice of investment portfolio.

Article title [English]

Developing a hybrid model for comparative analysis of financial data clustering algorithms

Authors [English]

  • Mojtaba Movahedi 1
  • Mahdi Homayounfar 2
  • Mehdi Fadaei 2
  • Mansour Soufi 2

1 Department of Industrial Management, Faculty of Management and Accounting, Rasht Branch, Islamic Azad University, Rasht, Iran.

2 Department of Industrial Management, Faculty of Management and Accounting, Rasht Branch, Islamic Azad University, Rasht, Iran.

Abstract [English]

Purpose: Clustering algorithms are useful tools for understanding the structure of data and classifying it across different datasets. Given the importance of these algorithms for analysing financial-market data, which are large in volume and scope, this study applies several clustering algorithms to companies listed on the Tehran Stock Exchange on the basis of financial data, evaluates the validity of these algorithms, and selects the best one.
Methodology: This research is applied in purpose, descriptive in method, and quantitative (mathematical modelling) in type. The statistical population comprises 403 companies listed on the Tehran Stock Exchange in 2019, whose performance was evaluated on four financial criteria.
Findings: After the surveyed companies were clustered with five algorithms (K-means, EM, COBWEB, a density-based algorithm and Ward's method), seven indices, namely RS, DB, Dunn, SD, purity, entropy and time, were used to evaluate the algorithms. Finally, the overall performance of the algorithms was analysed with the TOPSIS, VIKOR and DEA methods. According to the results, K-means performs best when clustering the companies on the financial dataset.
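Davies-Bouldin (DB) and Dunn, also named in the findings, are internal indices computed from the data and the cluster assignment alone. Below is a minimal sketch of their usual definitions, assuming Euclidean distance, centroid-based scatter for DB, and closest-pair separation over largest cluster diameter for Dunn; the abstract does not specify which variants the paper used. The helper names and the toy data are hypothetical.

```python
import math
from itertools import combinations


def _group(points, labels):
    """Split points into lists by cluster label (insertion order is kept)."""
    clusters = {}
    for point, label in zip(points, labels):
        clusters.setdefault(label, []).append(point)
    return list(clusters.values())


def _centroid(cluster):
    """Coordinate-wise mean of the points in one cluster."""
    return [sum(coords) / len(cluster) for coords in zip(*cluster)]


def davies_bouldin(points, labels):
    """Lower is better. Assumes at least two clusters with distinct centroids."""
    clusters = _group(points, labels)
    cents = [_centroid(c) for c in clusters]
    # Scatter: mean distance from a cluster's members to its centroid.
    scatter = [sum(math.dist(p, m) for p in c) / len(c) for c, m in zip(clusters, cents)]
    k = len(clusters)
    # For each cluster, the worst (largest) scatter-to-separation ratio against any other.
    worst = [
        max((scatter[i] + scatter[j]) / math.dist(cents[i], cents[j]) for j in range(k) if j != i)
        for i in range(k)
    ]
    return sum(worst) / k


def dunn(points, labels):
    """Higher is better. Assumes at least two clusters and one cluster with 2+ points."""
    clusters = _group(points, labels)
    # Separation: closest pair of points lying in different clusters.
    min_gap = min(math.dist(p, q) for a, b in combinations(clusters, 2) for p in a for q in b)
    # Diameter: farthest pair of points lying in the same cluster.
    max_diam = max(math.dist(p, q) for c in clusters for p, q in combinations(c, 2))
    return min_gap / max_diam


# Hypothetical 2-D data: two tight groups far apart.
pts = [(0, 0), (0, 1), (10, 0), (10, 1)]
good, bad = [0, 0, 1, 1], [0, 1, 0, 1]
print(davies_bouldin(pts, good), dunn(pts, good))  # good split: small DB, large Dunn
print(davies_bouldin(pts, bad), dunn(pts, bad))    # poor split: large DB, small Dunn
```

Because DB is a cost criterion and Dunn a benefit criterion, the two point in opposite directions, which is one reason a multi-criteria method is needed to combine them.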
Originality/Value: Since no clustering algorithm can perform best on every measure for every dataset, this study combines multiple criteria to analyse clustering algorithms applied to data on companies' financial performance, and offers recommendations. Its results can be used effectively by investors in finance, leading to an optimal choice of investment portfolio.
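The abstract merges the seven indices into a single ranking with TOPSIS, VIKOR and DEA. As an illustration of the first of these, here is a minimal sketch of standard TOPSIS (vector normalisation, Euclidean distance to the ideal and anti-ideal solutions). The three algorithms, the four criteria, the equal weights and all scores are hypothetical placeholders, not the paper's data; benefit criteria (higher is better, e.g. Dunn, purity) and cost criteria (lower is better, e.g. DB, time) must be flagged explicitly.

```python
import math


def topsis(matrix, weights, benefit):
    """Return a closeness score in [0, 1] per row of `matrix` (higher is better).

    matrix  : one row per alternative (here, a clustering algorithm), one column per criterion
    weights : one non-negative weight per criterion
    benefit : per criterion, True if larger values are better, False if smaller are better
    """
    m = len(weights)
    # 1. Vector-normalise each column, then apply the criterion weights.
    norms = [math.sqrt(sum(row[j] ** 2 for row in matrix)) for j in range(m)]
    weighted = [[weights[j] * row[j] / norms[j] for j in range(m)] for row in matrix]
    # 2. Ideal (best) and anti-ideal (worst) value of each criterion.
    columns = list(zip(*weighted))
    best = [max(col) if benefit[j] else min(col) for j, col in enumerate(columns)]
    worst = [min(col) if benefit[j] else max(col) for j, col in enumerate(columns)]
    # 3. Relative closeness: distance to the worst over total distance to both ideals.
    scores = []
    for row in weighted:
        d_best = math.dist(row, best)
        d_worst = math.dist(row, worst)
        scores.append(d_worst / (d_best + d_worst))
    return scores


# Hypothetical decision matrix; columns are DB (cost), Dunn, purity (benefit), time in s (cost).
algorithms = ["alg_A", "alg_B", "alg_C"]
matrix = [
    [0.6, 0.30, 0.85, 1.2],
    [0.9, 0.20, 0.80, 0.4],
    [1.4, 0.10, 0.70, 3.0],
]
scores = topsis(matrix, [0.25] * 4, [False, True, True, False])
ranking = [name for _, name in sorted(zip(scores, algorithms), reverse=True)]
print(ranking)  # → ['alg_A', 'alg_B', 'alg_C']
```

The alternative with the highest closeness score ranks first; one that is worst on every criterion, like alg_C here, scores exactly 0. VIKOR and DEA would aggregate the same matrix differently, so agreement across the three methods is a useful robustness check.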

Keywords [English]

  • Clustering
  • Multi-criteria decision making
  • Financial performance evaluation