With air pollution becoming an increasingly serious problem, PM2.5 concentration, an effective indicator of air quality, has attracted extensive attention from all sectors of society. Accurate prediction of PM2.5 concentrations is of great significance because it can give the public early warning of air pollution and thereby help protect public health. Over the past decade, artificial intelligence has produced a variety of high-performance prediction models and, in particular, has brought new impetus to PM2.5 concentration prediction. In this study, a stacking-based ensemble model with self-adaptive hyper-parameter optimization is proposed for the PM2.5 concentration prediction problem. First, the raw data are normalized to reduce the influence of the input variables' different orders of magnitude on model performance. Second, Bayesian optimization is used to tune the hyper-parameters of the base predictors and improve their performance. Finally, a stacking ensemble method integrates the optimized base predictors into an ensemble model that produces the final prediction. In the experiments, two datasets from air quality stations in different areas are used, and four metrics are applied to evaluate the proposed model's PM2.5 prediction performance. The experimental results show that the proposed model outperforms the baseline models on the PM2.5 concentration prediction problem.
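The normalization and stacking steps described in the abstract can be illustrated with a minimal, standard-library Python sketch. It is not the authors' implementation: the abstract does not name the base predictors, the meta-learner, or the cross-validation scheme, so the k-nearest-neighbour and ridge regressors, the interleaved 5-fold split, the synthetic data, and every function name below (`fit_min_max`, `stacking_fit_predict`, and so on) are hypothetical choices made for illustration. The Bayesian hyper-parameter search is not shown; the sketch assumes fixed hyper-parameters (`k`, `alpha`) in the places where a tuned value would be supplied.

```python
import math
import random


def fit_min_max(rows):
    """Per-column (min, max), computed on the training rows only."""
    return [(min(col), max(col)) for col in zip(*rows)]


def apply_min_max(rows, bounds):
    """Rescale each column to [0, 1] using previously fitted bounds."""
    return [[(v - lo) / (hi - lo) if hi > lo else 0.0
             for v, (lo, hi) in zip(row, bounds)] for row in rows]


def solve(M, b):
    """Gauss-Jordan elimination with partial pivoting (small dense systems)."""
    n = len(b)
    aug = [list(row) + [bi] for row, bi in zip(M, b)]
    for col in range(n):
        piv = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[piv] = aug[piv], aug[col]
        for r in range(n):
            if r != col:
                f = aug[r][col] / aug[col][col]
                aug[r] = [x - f * y for x, y in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


class RidgeRegressor:
    """Hypothetical linear base/meta learner; alpha also penalises the bias, for brevity."""
    def __init__(self, alpha=1e-3):
        self.alpha = alpha

    def fit(self, X, y):
        A = [[1.0] + list(row) for row in X]  # prepend a bias column
        d = len(A[0])
        # Normal equations: (A^T A + alpha * I) w = A^T y
        M = [[sum(a[i] * a[j] for a in A) + (self.alpha if i == j else 0.0)
              for j in range(d)] for i in range(d)]
        b = [sum(a[i] * t for a, t in zip(A, y)) for i in range(d)]
        self.w = solve(M, b)
        return self

    def predict(self, X):
        return [self.w[0] + sum(w * x for w, x in zip(self.w[1:], row)) for row in X]


class KNNRegressor:
    """Hypothetical base learner: mean target of the k nearest training rows."""
    def __init__(self, k=5):
        self.k = k

    def fit(self, X, y):
        self.X, self.y = X, y
        return self

    def predict(self, X):
        out = []
        for x in X:
            nearest = sorted((math.dist(x, xi), yi) for xi, yi in zip(self.X, self.y))[: self.k]
            out.append(sum(yi for _, yi in nearest) / len(nearest))
        return out


def stacking_fit_predict(base_factories, X_train, y_train, X_test, n_folds=5):
    """Two-level stacking: out-of-fold base predictions train a ridge meta-learner."""
    n = len(X_train)
    oof = [[0.0] * len(base_factories) for _ in range(n)]
    for k in range(n_folds):
        held = list(range(k, n, n_folds))  # interleaved folds (an assumption)
        held_set = set(held)
        tr = [i for i in range(n) if i not in held_set]
        for m, make in enumerate(base_factories):
            model = make().fit([X_train[i] for i in tr], [y_train[i] for i in tr])
            for i, p in zip(held, model.predict([X_train[i] for i in held])):
                oof[i][m] = p
    # Test-time meta-features: each base model refit on the full training set.
    base_test = [make().fit(X_train, y_train).predict(X_test) for make in base_factories]
    meta = RidgeRegressor(alpha=1e-3).fit(oof, y_train)
    return meta.predict([list(r) for r in zip(*base_test)])


if __name__ == "__main__":
    random.seed(0)
    # Synthetic stand-in for station data: two inputs on very different scales.
    X = [[random.uniform(0, 100), random.uniform(900, 1050)] for _ in range(120)]
    y = [0.8 * a + 0.05 * (p - 900) + random.gauss(0, 2) for a, p in X]
    bounds = fit_min_max(X[:100])
    X_tr, X_te = apply_min_max(X[:100], bounds), apply_min_max(X[100:], bounds)
    bases = [lambda: KNNRegressor(k=5), lambda: RidgeRegressor(alpha=1e-2)]
    preds = stacking_fit_predict(bases, X_tr, y[:100], X_te)
    rmse = math.sqrt(sum((p - t) ** 2 for p, t in zip(preds, y[100:])) / len(preds))
    print(f"stacked RMSE on held-out rows: {rmse:.2f}")
```

The meta-learner is trained on out-of-fold predictions rather than in-sample ones, so it learns how much to trust each base model on data that model has not seen. Normalization bounds are fitted on the training rows only, to avoid leaking test information. For hourly PM2.5 series, a time-ordered split would likely be more appropriate than the interleaved folds used here.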
Published in: Applied and Computational Mathematics (Volume 10, Issue 6)
DOI: 10.11648/j.acm.20211006.14
Page(s): 156-162
Publication date: 2021/12/02
ISSN: 2328-5613
License: This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.
Copyright: © The Author(s), 2021. Published by Science Publishing Group
Keywords: Ensemble Model, Stacking, Bayesian Optimization, PM2.5 Concentrations, Prediction, Air Quality
APA Style
Zhang, H., Jin, Y., Shi, J., & Zhang, S. (2021). Predicting PM2.5 concentrations using stacking-based ensemble model. Applied and Computational Mathematics, 10(6), 156-162. https://doi.org/10.11648/j.acm.20211006.14