Anna MULARCZYK

Silesian University of Technology, Poland

Abstract

In recent years, open data have been used increasingly often. Regression analysis is one of the many techniques for processing open data; however, the data must first be properly prepared for analysis. This article aims to identify, and propose ways of dealing with, the most common problems researchers encounter when preparing open data for analysis with a multiple regression model. Using the inquiry-based learning method, exemplary open data taken from Eurostat were processed. Issues such as missing values, outliers, and the comparability of records were discussed during data collection and analysis. A crucial issue concerning open data was their accessibility. All calculations were performed in the R environment. Multiple linear regression analysis with backward selection was carried out, with the model estimated by the least-squares method. The estimated model was then checked with verification tests.

Keywords: Open Data, Outliers, Missing Values, Regression Analysis
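The workflow summarised in the abstract (handle missing values, then fit a multiple linear regression by least squares and reduce it by backward selection) can be illustrated with a minimal sketch. It is written in Python with NumPy rather than the R code used in the article, and it rests on two assumptions that the abstract does not state: missing values are handled by listwise deletion, and backward selection removes predictors by AIC in the form used by R's `extractAIC` for linear models, n·log(RSS/n) + 2k. The helper names `complete_cases`, `aic`, and `backward_select` are hypothetical.

```python
import numpy as np


def complete_cases(X, y):
    """Listwise deletion (an assumed strategy): keep only rows without NaN."""
    mask = ~np.isnan(X).any(axis=1) & ~np.isnan(y)
    return X[mask], y[mask]


def aic(X, y):
    """AIC as n*log(RSS/n) + 2k for a least-squares fit of y on X.

    X must already contain the intercept column; k counts all coefficients.
    """
    n, k = X.shape
    beta, _, _, _ = np.linalg.lstsq(X, y, rcond=None)  # ordinary least squares
    resid = y - X @ beta
    rss = float(resid @ resid)
    return n * np.log(rss / n) + 2 * k


def backward_select(X, y, names):
    """Backward elimination: repeatedly drop the predictor whose removal
    lowers AIC the most; stop when no single removal lowers AIC."""
    n = len(y)
    keep = list(range(X.shape[1]))

    def design(cols):
        # Intercept column plus the currently retained predictors.
        return np.column_stack([np.ones(n)] + [X[:, c] for c in cols])

    current = aic(design(keep), y)
    while keep:
        trials = [(aic(design([c for c in keep if c != drop]), y), drop)
                  for drop in keep]
        best_aic, best_drop = min(trials)
        if best_aic >= current:
            break  # every removal would worsen (or not improve) AIC
        keep.remove(best_drop)
        current = best_aic
    return [names[c] for c in keep]


if __name__ == "__main__":
    # Synthetic data, not Eurostat data: x2 is pure noise by construction.
    rng = np.random.default_rng(42)
    n = 200
    X = rng.normal(size=(n, 3))
    y = 2 + 3 * X[:, 0] - 1.5 * X[:, 1] + rng.normal(scale=0.5, size=n)
    X[5, 2] = np.nan  # simulate one missing value
    Xc, yc = complete_cases(X, y)
    print(backward_select(Xc, yc, ["x0", "x1", "x2"]))
```

AIC is used here only because it needs no distribution tables; eliminating the predictor with the largest p-value above a fixed threshold is an equally common reading of "backward selection", and the sketch says nothing about which criterion the article applied.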