A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis

Cabrera-Sánchez, Juan Francisco; Pereira, Ricardo Cardoso; Abreu, Pedro Henriques; Silva-Ramírez, Esther Lydia

Date

2024-11-12

Authors

Cabrera-Sánchez, Juan Francisco

Pereira, Ricardo Cardoso

Abreu, Pedro Henriques

Silva-Ramírez, Esther Lydia

Publisher

IEEE - Institute of Electrical and Electronics Engineers

Abstract

Progressively more advanced and complex models are proposed to address problems in computer vision, forecasting, the Internet of Things, Big Data, and related areas. However, these disciplines require preprocessing steps to obtain meaningful results. One of the most common problems addressed at this stage is the presence of missing values. Understanding why missingness occurs helps in selecting the data imputation methods best suited to completing those values. Synthetic generation of missingness under the Missing at Random mechanism presents challenges, such as achieving extreme missingness rates and preserving the consistency of the mechanism. To address these shortcomings, this work proposes three new methods that generate synthetic missingness under the Missing at Random mechanism and compares them to a baseline method. The comparison uses a benchmark covering 33 data sets and five missingness rates (10%, 20%, 40%, 60%, and 80%). Seven data imputation methods, ranging from traditional methods to deep learning methods, are compared to evaluate the proposals. The results demonstrate that the proposals are aligned with the baseline method in terms of the performance and ranking of data imputation methods. Thus, three new feasible and consistent alternatives for synthetic missingness generation under Missing at Random are presented.
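For readers unfamiliar with the mechanism, the sketch below illustrates what generating synthetic Missing at Random values at a fixed rate can look like. It is a generic illustration and not one of the three methods proposed in the paper, nor its baseline; the function names (mar_mask, apply_mar), the rank-based weighting, and the choice of a single fully observed "driver" column are all assumptions made for the example. The key property shown is that the probability of a value being removed depends only on another, fully observed feature.

```python
import numpy as np


def mar_mask(X, target_col, driver_col, rate, seed=0):
    """Return a boolean mask (True = missing) for column `target_col`.

    Hypothetical helper: the chance of masking a row depends only on the
    fully observed column `driver_col`, which is what makes the pattern
    MAR rather than MCAR (no dependence) or MNAR (depends on the target).
    """
    rng = np.random.default_rng(seed)
    n = X.shape[0]
    n_missing = int(round(rate * n))
    # Rank-based weights (an assumption): rows with larger driver values
    # are more likely to lose their target value.
    ranks = np.argsort(np.argsort(X[:, driver_col])) + 1
    probs = ranks / ranks.sum()
    # Sampling without replacement yields exactly n_missing masked rows.
    idx = rng.choice(n, size=n_missing, replace=False, p=probs)
    mask = np.zeros(n, dtype=bool)
    mask[idx] = True
    return mask


def apply_mar(X, target_col, driver_col, rate, seed=0):
    """Return a float copy of X with NaN inserted in `target_col`."""
    X_miss = X.astype(float).copy()
    mask = mar_mask(X, target_col, driver_col, rate, seed)
    X_miss[mask, target_col] = np.nan
    return X_miss


if __name__ == "__main__":
    data = np.random.default_rng(42).normal(size=(200, 3))
    # The five missingness rates mentioned in the abstract.
    for rate in (0.1, 0.2, 0.4, 0.6, 0.8):
        missing = np.isnan(apply_mar(data, 0, 1, rate)[:, 0]).mean()
        print(f"requested rate={rate:.1f}, observed rate={missing:.2f}")
```

Sampling a fixed number of rows keeps the achieved rate exact even at 80%, which is one simple way to reach high rates; whether a scheme like this preserves the consistency of the mechanism at such rates is precisely the kind of question the paper examines.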

Citation

J. F. Cabrera-Sánchez, R. Cardoso Pereira, P. Henriques Abreu and E. L. Silva-Ramírez, "A Perspective on the Missing at Random Problem: synthetic generation and benchmark analysis," in IEEE Access, vol. 12, pp. 162399-162411, 2024, doi: 10.1109/ACCESS.2024.3490396