A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis

Date
2024-11-12
Authors
Cabrera-Sánchez, Juan Francisco
Pereira, Ricardo Cardoso
Abreu, Pedro Henriques
Silva-Ramírez, Esther Lydia
Journal title
Journal ISSN
Volume title
Publisher
IEEE - Institute of Electrical and Electronics Engineers
Abstract
Progressively more advanced and complex models are proposed to address problems related to computer vision, forecasting, Internet of Things, Big Data and so on. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed at this stage is the presence of missing values. Understanding why missingness occurs helps in selecting the data imputation methods best suited to filling in these missing values. Synthetic generation of missingness under the Missing at Random mechanism presents challenges such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, this work proposes three new methods that generate synthetic missingness under the Missing at Random mechanism and compares them to a baseline model. The comparison uses a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, 80%). Seven data imputation methods, ranging from traditional methods to deep learning methods, are compared to evaluate the proposals. The results demonstrate that the proposals are aligned with the baseline method in terms of the performance and ranking of data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under Missing at Random are presented.
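For readers unfamiliar with synthetic Missing at Random generation, the following is a minimal illustrative sketch in Python with NumPy. It is not one of the three proposed methods, nor the baseline from the paper; it is a generic rank-weighted scheme used here as an assumption. The function name `ampute_mar` and its parameters (`target_col`, `driver_col`, `rate`) are hypothetical. The idea is that values in one column are removed with a probability that depends only on a second, fully observed column, which is the defining property of the mechanism.

```python
import numpy as np

def ampute_mar(X, target_col, driver_col, rate, rng):
    """Return a copy of X where a fraction `rate` of `target_col` is NaN.

    Missingness depends only on the fully observed `driver_col`
    (hypothetical, generic scheme; not the paper's methods).
    """
    X = np.asarray(X, dtype=float).copy()
    n = X.shape[0]
    n_missing = int(round(rate * n))
    # Rank the driver values (1 = smallest); a higher rank makes a row
    # more likely to lose its target value.
    ranks = np.argsort(np.argsort(X[:, driver_col])) + 1
    probs = ranks / ranks.sum()
    # Draw exactly n_missing distinct rows, weighted by the driver ranks.
    rows = rng.choice(n, size=n_missing, replace=False, p=probs)
    X[rows, target_col] = np.nan
    return X

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 3))
X_miss = ampute_mar(X, target_col=0, driver_col=1, rate=0.4, rng=rng)

mask = np.isnan(X_miss[:, 0])
print("missing fraction in target column:", mask.mean())
print("driver mean where target is missing:", X[mask, 1].mean())
print("driver mean where target is observed:", X[~mask, 1].mean())
```

Sampling a fixed number of rows keeps the missingness rate exact. A likely limitation of this simple scheme is that at very high rates almost every row is selected, so the dependence on the driver column weakens, which relates to the difficulty with extreme missingness rates that the abstract mentions.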
Description
Keywords
Citation
J. F. Cabrera-Sánchez, R. Cardoso Pereira, P. Henriques Abreu and E. L. Silva-Ramírez, "A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis," in IEEE Access, vol. 12, pp. 162399-162411, 2024, doi: 10.1109/ACCESS.2024.3490396