Anomalous values and missing data in clinical and experimental studies

Hélio Amante Miot

Review Article

J Vasc Bras, vol.18, e20190004, 2019. http://dx.doi.org/10.1590/1677-5449.190004



Abstract

During the analysis of scientific research data, it is common to encounter anomalous values or missing data. Anomalous values may result from recording, typing, or instrument measurement errors, or they may be true outliers. This review discusses concepts, examples, and methods for identifying and dealing with such contingencies. For missing data, it discusses techniques for imputing values so that the research subject does not have to be excluded when the information cannot be retrieved from the registration forms and the participant cannot be contacted again.
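The abstract mentions methods for identifying anomalous values but does not name them in this section. One widely used screening rule, shown here only as an illustration and not necessarily the method the review recommends, flags values lying outside Tukey's fences, Q1 - 1.5 x IQR and Q3 + 1.5 x IQR. The helper names `tukey_fences` and `flag_anomalous`, the multiplier k = 1.5, the "inclusive" quartile definition, and the blood-pressure readings are all assumptions made for this sketch.

```python
import statistics


def tukey_fences(values, k=1.5):
    """Return (lower, upper) Tukey fences: Q1 - k*IQR and Q3 + k*IQR.

    Quartiles come from statistics.quantiles with method="inclusive";
    other quartile definitions give slightly different fences on small samples.
    """
    q1, _median, q3 = statistics.quantiles(values, n=4, method="inclusive")
    iqr = q3 - q1
    return q1 - k * iqr, q3 + k * iqr


def flag_anomalous(values, k=1.5):
    """Return (index, value) pairs that fall outside the Tukey fences.

    Flagged values are candidates to check against the source records,
    not values to delete automatically.
    """
    lower, upper = tukey_fences(values, k)
    return [(i, v) for i, v in enumerate(values) if v < lower or v > upper]


if __name__ == "__main__":
    # Hypothetical systolic blood pressure readings (mmHg);
    # 1200 mimics a typing error (an extra zero).
    sbp = [118, 125, 130, 122, 140, 135, 128, 1200, 132]
    print(tukey_fences(sbp))    # fences computed from Q1 = 125 and Q3 = 135
    print(flag_anomalous(sbp))  # the reading at index 7 is flagged
```

Flagging only marks a value for review against the case report form; by itself it cannot tell a typing or instrument error apart from a true outlier.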

Keywords

data analysis, database, outlier, multiple imputation
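The abstract refers to imputation techniques, and multiple imputation is one of the listed keywords. The review's own procedure is not described in this section, so the following is only an illustrative sketch of the general idea for a single continuous variable. Each missing value is imputed several times with the approximate Bayesian bootstrap, the mean is estimated in each completed dataset, and the estimates are pooled with Rubin's rules (total variance T = W + (1 + 1/m)B, where W is the average within-imputation variance and B the between-imputation variance). The function names, the default m = 20, the seed, the example values, and the assumption that values are missing completely at random are choices made for this sketch.

```python
import random
import statistics


def abb_impute(values, rng):
    """Return a copy of `values` with None entries filled by the
    approximate Bayesian bootstrap (ABB).

    Step 1: resample the observed values with replacement.
    Step 2: draw each missing value from that resample.
    Step 1 lets the imputations reflect uncertainty about the observed
    distribution. This univariate version assumes values are missing
    completely at random and ignores all other variables.
    """
    observed = [v for v in values if v is not None]
    if len(observed) < 2:
        raise ValueError("need at least two observed values")
    boot = rng.choices(observed, k=len(observed))
    return [v if v is not None else rng.choice(boot) for v in values]


def pooled_mean(values, m=20, seed=1):
    """Impute `values` m times, estimate the mean in each completed
    dataset, and pool the m results with Rubin's rules.

    Returns (pooled mean, total variance T, within W, between B).
    """
    if m < 2:
        raise ValueError("Rubin's rules need m >= 2 imputations")
    rng = random.Random(seed)
    estimates, variances = [], []
    for _ in range(m):
        completed = abb_impute(values, rng)
        estimates.append(statistics.fmean(completed))
        # Squared standard error of the mean within this completed dataset.
        variances.append(statistics.variance(completed) / len(completed))
    q_bar = statistics.fmean(estimates)  # pooled point estimate
    w = statistics.fmean(variances)      # average within-imputation variance
    b = statistics.variance(estimates)   # between-imputation variance
    t = w + (1 + 1 / m) * b              # Rubin's total variance
    return q_bar, t, w, b


if __name__ == "__main__":
    # Hypothetical laboratory values with two missing entries (None).
    data = [4.1, None, 5.3, 6.0, None, 5.1, 4.8, 5.9]
    q_bar, t, w, b = pooled_mean(data, m=20, seed=1)
    print(f"pooled mean {q_bar:.2f}, SE {t ** 0.5:.2f}")
```

The between-imputation term B carries the extra uncertainty caused by the missing entries: with no missing values every completed dataset is identical, B is zero, and T equals W. Multiple imputation in clinical studies is usually based on a model that also includes other study variables, which this univariate sketch does not do.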


Sociedade Brasileira de Angiologia e Cirurgia Vascular (SBACV)
