Shrinkage methods for linear regression were developed over the past decades to mitigate the weakness of ordinary least squares (OLS) regression with respect to prediction accuracy. High-dimensional data are growing rapidly in many fields, as technological advances make it easy to collect data with a large number of variables. In this paper, shrinkage methods were used to estimate the regression coefficients of a high-dimensional multiple regression model in which there are fewer samples than predictors. Regularization approaches have become the methods of choice for analyzing such high-dimensional data, and we used three regularization methods based on penalized regression to select an appropriate model. Ridge, Lasso, and Elastic Net have desirable features: they can simultaneously perform regularization and selection of appropriate predictor variables and estimate their effects. We compared the performance of these three penalized linear regression methods, using cross-validation to choose the optimal tuning parameter, and evaluated prediction accuracy with the mean squared error (MSE). In both a simulation study and an analysis of real data, we found that all three methods are capable of producing appropriate models, with the Elastic Net achieving the best prediction accuracy; in the simulation study in particular, it outperformed the other two methods with the lowest MSE.
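The comparison described in the abstract can be sketched in a few lines of scikit-learn. This is not the authors' code; it is a minimal illustration, under assumed simulation settings (n = 100 samples, p = 200 predictors, 10 truly active coefficients), of fitting Ridge, Lasso, and Elastic Net with cross-validated tuning parameters and scoring each by test-set MSE:

```python
# Minimal sketch (not the paper's code): compare penalized regressions on
# simulated high-dimensional data with fewer samples than predictors (n < p).
import numpy as np
from sklearn.linear_model import RidgeCV, LassoCV, ElasticNetCV
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
n, p, k = 100, 200, 10                # n < p; only k predictors are active
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[:k] = rng.uniform(1.0, 3.0, k)   # sparse true coefficient vector
y = X @ beta + rng.standard_normal(n) # linear model with Gaussian noise

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.3, random_state=0)

# Each *CV estimator picks its penalty strength by cross-validation.
models = {
    "Ridge": RidgeCV(alphas=np.logspace(-3, 3, 50)),
    "Lasso": LassoCV(cv=5, random_state=0),
    "ElasticNet": ElasticNetCV(l1_ratio=0.5, cv=5, random_state=0),
}
mse = {}
for name, model in models.items():
    model.fit(X_tr, y_tr)
    mse[name] = mean_squared_error(y_te, model.predict(X_te))
    print(f"{name:10s} test MSE = {mse[name]:.3f}")
```

With a sparse truth such as this, the L1-based methods typically shrink most of the irrelevant coefficients to exactly zero, while Ridge only shrinks them toward zero; the relative MSE ranking will depend on the simulation design.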
Published in: American Journal of Theoretical and Applied Statistics, Volume 8, Issue 5
DOI: 10.11648/j.ajtas.20190805.14
Page(s): 185-192
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2019. Published by Science Publishing Group.
Shrinkage Estimator, High Dimension, Cross-Validation, Ridge Regression, Elastic Net
APA Style
Zari Farhadi, Reza Arabi Belaghi, Ozlem Gurunlu Alma. (2019). Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data. American Journal of Theoretical and Applied Statistics, 8(5), 185-192. https://doi.org/10.11648/j.ajtas.20190805.14
ACS Style
Zari Farhadi; Reza Arabi Belaghi; Ozlem Gurunlu Alma. Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data. Am. J. Theor. Appl. Stat. 2019, 8(5), 185-192. doi: 10.11648/j.ajtas.20190805.14
AMA Style
Zari Farhadi, Reza Arabi Belaghi, Ozlem Gurunlu Alma. Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data. Am J Theor Appl Stat. 2019;8(5):185-192. doi: 10.11648/j.ajtas.20190805.14
@article{10.11648/j.ajtas.20190805.14, author = {Zari Farhadi and Reza Arabi Belaghi and Ozlem Gurunlu Alma}, title = {Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data}, journal = {American Journal of Theoretical and Applied Statistics}, volume = {8}, number = {5}, pages = {185-192}, doi = {10.11648/j.ajtas.20190805.14}, url = {https://doi.org/10.11648/j.ajtas.20190805.14}, eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.ajtas.20190805.14}, year = {2019} }
TY - JOUR T1 - Analysis of Penalized Regression Methods in a Simple Linear Model on the High-Dimensional Data AU - Zari Farhadi AU - Reza Arabi Belaghi AU - Ozlem Gurunlu Alma Y1 - 2019/10/16 PY - 2019 N1 - https://doi.org/10.11648/j.ajtas.20190805.14 DO - 10.11648/j.ajtas.20190805.14 T2 - American Journal of Theoretical and Applied Statistics JF - American Journal of Theoretical and Applied Statistics JO - American Journal of Theoretical and Applied Statistics SP - 185 EP - 192 PB - Science Publishing Group SN - 2326-9006 UR - https://doi.org/10.11648/j.ajtas.20190805.14 VL - 8 IS - 5 ER -