Volume 2021 (8),
Article ID 3789521,
Innovations in Artificial Intelligence, Machine Learning and Intelligent Systems: 37AI 2021
Abstract
Open data management has become increasingly common in recent years. One of the many techniques for processing open data is regression analysis. However, the data must be properly prepared before analysis. This article aims to diagnose and address the most common problems researchers encounter when preparing open data for analysis with a multiple regression model. Using the inquiry-based learning method, exemplary open data taken from Eurostat were processed. Issues that arose during data collection and analysis, such as missing values, outliers, and the comparability of records, are discussed. A crucial issue regarding open data was its accessibility. All calculations were performed in the R environment. A multiple linear regression model was estimated by the least-squares method, with predictors chosen by backward selection. The estimated model was validated with verification tests.
Keywords: Open Data, Outliers, Missing Values, Regression Analysis