Peer-Reviewed

Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis

Received: 12 July 2020     Published: 22 August 2020
Abstract

This paper applies multivariate statistical methods to breast cancer detection data, studying the statistical regularities of multiple subjects measured on multiple correlated indicators. First, univariate and multivariate diagnostics were performed on the data. In examining the correlations between variables, HOMA was found to have a clear positive linear correlation with blood insulin content. Notably, some breast cancer patients show both a high degree of insulin resistance and a high blood insulin content, a feature not found in the samples without breast cancer. Next, a one-way analysis of variance indicated significant differences in blood test results, age, and BMI between samples of different health status. Principal component analysis was then used to reduce the dimension of the data: the differences in age, BMI, and blood component content between the two health groups can be summarized by two independent principal components. The absolute value of the MCP-1 (monocyte chemoattractant protein 1) coefficient in principal component 1 is large, reflecting the blood composition of the sample; the loadings of glucose and leptin in principal component 2 are large, reflecting similar characteristics. Then, assuming a factor model with m = 3 factors, the maximum likelihood method and the principal component method were applied to the original data and to the factor-rotated data, reducing the variables to three factors. For the factor-rotated data, the maximum likelihood estimates are used. The first factor is an insulin resistance factor attributed to the insulin and HOMA indicators; the second is a body fatness factor attributed to BMI and leptin; the third reflects the glucose content of the blood.
Finally, discriminant analysis with unequal misclassification costs gave an apparent error rate (APER) of 0.1638 and an estimated expected actual error rate (EAER) of 0.1872. The probability of misclassifying a breast cancer patient as not having breast cancer is 0.09375, a low misclassification rate, which suggests that the model established in this paper is effective.
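The error rates quoted in the abstract can be illustrated with a minimal sketch of how an apparent error rate and a per-class misclassification rate are computed from a confusion matrix. The abstract does not report the confusion matrix or the group sizes, so the counts below are hypothetical, chosen only so that the arithmetic lands on the reported figures. The function names are illustrative and not taken from the paper.

```python
def apparent_error_rate(confusion):
    """Fraction of all training samples that the fitted rule misclassifies.

    confusion[i][j] is the number of samples of true class i assigned to class j.
    """
    total = sum(sum(row) for row in confusion)
    correct = sum(confusion[i][i] for i in range(len(confusion)))
    return (total - correct) / total


def class_error_rate(confusion, true_class):
    """Fraction of samples of one true class that are assigned to another class."""
    row = confusion[true_class]
    return (sum(row) - row[true_class]) / sum(row)


# Hypothetical counts (an assumption, not data from the paper).
# Row 0: samples without breast cancer, row 1: breast cancer patients.
confusion = [
    [39, 13],  # 39 classified correctly, 13 flagged as patients
    [6, 58],   # 6 patients missed, 58 classified correctly
]

aper = apparent_error_rate(confusion)     # 19 / 116
miss_rate = class_error_rate(confusion, 1)  # 6 / 64

print(f"APER = {aper:.4f}")
print(f"P(patient classified as healthy) = {miss_rate:.5f}")
```

The APER is computed on the same samples used to fit the rule, so it tends to be optimistic. The EAER is typically estimated with a holdout (leave-one-out) procedure, which is consistent with the reported EAER being higher than the APER.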

Published in Applied and Computational Mathematics (Volume 9, Issue 4)
DOI 10.11648/j.acm.20200904.15
Page(s) 130-145
Creative Commons

This is an Open Access article, distributed under the terms of the Creative Commons Attribution 4.0 International License (http://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution and reproduction in any medium or format, provided the original work is properly cited.

Copyright

Copyright © The Author(s), 2020. Published by Science Publishing Group

Keywords

Data Diagnosis, One-Way MANOVA, Principal Component Analysis, Factor Analysis, Discriminant Analysis

Cite This Article
  • APA Style

    Ruixuan Dong. (2020). Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis. Applied and Computational Mathematics, 9(4), 130-145. https://doi.org/10.11648/j.acm.20200904.15

    ACS Style

    Ruixuan Dong. Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis. Appl. Comput. Math. 2020, 9(4), 130-145. doi: 10.11648/j.acm.20200904.15

    AMA Style

    Ruixuan Dong. Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis. Appl Comput Math. 2020;9(4):130-145. doi: 10.11648/j.acm.20200904.15

  • @article{10.11648/j.acm.20200904.15,
      author = {Ruixuan Dong},
      title = {Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis},
      journal = {Applied and Computational Mathematics},
      volume = {9},
      number = {4},
      pages = {130-145},
      doi = {10.11648/j.acm.20200904.15},
      url = {https://doi.org/10.11648/j.acm.20200904.15},
      eprint = {https://article.sciencepublishinggroup.com/pdf/10.11648.j.acm.20200904.15},
     year = {2020}
    }
    

  • TY  - JOUR
    T1  - Explore the Characteristics of Age, BMI and Blood Composition of Breast Cancer Patients Based on Multivariate Statistical Analysis
    AU  - Ruixuan Dong
    Y1  - 2020/08/22
    PY  - 2020
    N1  - https://doi.org/10.11648/j.acm.20200904.15
    DO  - 10.11648/j.acm.20200904.15
    T2  - Applied and Computational Mathematics
    JF  - Applied and Computational Mathematics
    JO  - Applied and Computational Mathematics
    SP  - 130
    EP  - 145
    PB  - Science Publishing Group
    SN  - 2328-5613
    UR  - https://doi.org/10.11648/j.acm.20200904.15
    VL  - 9
    IS  - 4
    ER  - 

Author Information
  • Department of Statistics, East China Normal University, Shanghai, China
