Document Type: Original Article

Authors

Department of Industrial and Systems Engineering, Isfahan University of Technology, Isfahan, Iran.

Abstract

Purpose: This paper presents a new methodology for statistical modeling that, unlike commonly used models and algorithms, maximizes the reliability of the results rather than their accuracy. Replacing the conventional modeling process with the proposed one yields a new class of statistical modeling approaches.
Methodology: Multiple linear regression is chosen to implement the proposed methodology. To evaluate the proposed regression model comprehensively, 10 standard datasets from the statistical modeling literature are considered.
Findings: Overall, the results show that in 65% of the studied datasets the proposed model generalizes better than conventional multiple linear regression. On average, the proposed regression model improves on its classic version by 5.571% in mean absolute error and 6.466% in mean square error. These results show that the reliability of the results, which conventional statistical modeling processes do not consider, has a significant effect on generalizability.
Originality/Value: Statistical modeling is one of the most important tools for simulating real-world systems and datasets, and it is often used to support decisions in a wide range of applications. Many approaches with different features have been developed in the literature to address real-world problems with the desired accuracy. However, these methods share the same underlying idea: all conventional statistical modeling approaches assume that maximum accuracy on test and unseen data is obtained from the model that minimizes the error on the training data. Although this is a logical and standard procedure in traditional statistical modeling, it is not the only way to achieve maximum generalizability. In other words, the generalizability of a model depends simultaneously on its accuracy and on the reliability of its results. The methodology presented in this paper therefore maximizes the reliability of the results rather than their accuracy.
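
The baseline side of the comparison reported in the Findings can be illustrated with a minimal sketch. It fits a classic multiple linear regression by ordinary least squares, scores it on held-out data with mean absolute error and mean square error, and defines a relative improvement in percent. The synthetic data, the 150/50 split, the function names, and the improvement formula (error reduction relative to the baseline) are all assumptions made for illustration; the abstract does not specify them, and the reliability-based estimator itself is not reproduced here.

```python
import numpy as np


def fit_mlr(X, y):
    """Classic multiple linear regression via ordinary least squares.

    Returns the coefficient vector with the intercept first.
    """
    A = np.column_stack([np.ones(len(X)), X])
    beta, *_ = np.linalg.lstsq(A, y, rcond=None)
    return beta


def predict_mlr(beta, X):
    """Predict responses for X using coefficients from fit_mlr."""
    A = np.column_stack([np.ones(len(X)), X])
    return A @ beta


def mae(y_true, y_pred):
    """Mean absolute error."""
    return float(np.mean(np.abs(y_true - y_pred)))


def mse(y_true, y_pred):
    """Mean square error."""
    return float(np.mean((y_true - y_pred) ** 2))


def improvement_pct(baseline_error, proposed_error):
    """Relative error reduction in percent (assumed definition)."""
    return 100.0 * (baseline_error - proposed_error) / baseline_error


# Synthetic data for illustration only; not one of the paper's datasets.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
y = 1.5 + X @ np.array([2.0, -1.0, 0.5]) + rng.normal(scale=0.1, size=200)

# Hypothetical train/test split: first 150 rows train, last 50 rows test.
X_train, X_test = X[:150], X[150:]
y_train, y_test = y[:150], y[150:]

beta = fit_mlr(X_train, y_train)
y_hat = predict_mlr(beta, X_test)
baseline_mae = mae(y_test, y_hat)
baseline_mse = mse(y_test, y_hat)
```

Given test errors from an alternative model, `improvement_pct(baseline_mae, proposed_mae)` would give a figure comparable in form to the percentages quoted above, under the assumed definition.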
