Introduction
Recently, all public services work on top of electronic platforms. Most of these platforms have a front-end (where available services are available to the general target audience), back-office (where services are configured and managed), and one or more databases. Information stored inside the databases is intensively used to provide all kinds of services to general citizens (Viscusi et al., 2010). Therefore, the acquisition and implementation of a database management system (DBMS) should be carefully planned, since it will have a significant real impact on the information availability. As the internet evolves, and consequently the cloud computing, the databases must cope with the large volumes of data efficiently, requiring good performance for both read and write operations (Han et al., 2011). Abadi, Boncz (Abadi et al., 2013) mention, in their study, that the performance of the database systems is directly related with the efficiency of the system to store data, and also its ability to move them to the CPU registers for processing. In all sectors, the databases have a critical role, especially in business, electronic commerce, engineering, health-care, genetics, education, library science and others (Elmasri, 2017). Sharing the before-mentioned point of view, Abramova, Bernardino (Abramova et al., 2014) consider the databases the organization’s vital point. The selection and decision of a database must follow strict approaches, methodologies and rules, which abide organizational growth without harming performance. Even though when data is structured, its size is small, and databases operate relatively well. The increase of data volume requires performance tuning and maintenance by specialized technicians, for instance, normalization and indexation (Sz´arnyas et al., 2018).
This paper presents a study of the most used databases in the Portuguese public town halls, taking into account the main reasons to select the adopted databases. Conclusions of this study consisted of several quizzes sent to the IT managers responsible for the IT town halls, aiming to find more about how and why they used their databases. Presented results allow to study the quality factors and pondered reasons to acquire a database management system, in the scope of the Portuguese government. Additionally, it allows concluding what the most adopted database management systems are.
This document contains five sections. Section 2 presents the state of the art review. Section 3 describes how the quiz was performed, organized and distributed to obtain answers regarding the motivations to select and use specific software databases. Section 4 shows the obtained results. Finally, in Section 5, conclusions are drawn, followed by the introduction of future work guidelines.
State of the Art
Information systems comprehend several organizational activities which concern data; these activities include: acquiring data; storage; recovering; manipulation; and transmitting information to other users (Lopes et al., 2005). Almost all information systems include data storage, which is done internally using databases, relational or non-relational. These databases present themselves as essential for managing modern society’s information (Abramova et al., 2014).
According to Silberschatz, Korth (Silberschatz et al., 1997), database management systems differ significantly from storage systems and also from programming environments. A DBMS consists of a collection of programs and services that allow the user to insert, update, delete and query the stored data, therefore, making it easier to maintain and access the data (Sharma et al., 2017). It is essential to underline that relational databases were created as a technology to store structured data as tables, allowing to perform queries using SQL (Structured Query Language) (Fayech & Ounalli, 2012). For the authors, Viscusi, Batini (Viscusi et al., 2010), each DBMS represents a logical data schema, divided into classes that are related. At the same time, the authors mention that relational databases within governmental institutions are constantly in use not only in the central government but also at local municipalities, electronic stores, sensors, city halls and mobile devices. For all cases where databases fit, vast amounts of data are being produced with different data-rates (Fotache & Strimbei, 2015). As a proof of the importance of databases, according to a study in 1989, the number of estimated databases in the world was already five million (Frawley et al., 1992). The most known and utilized database model is the relational, developed in 1969 by E. F. Codd, who was working as a researcher at IBM. The model assumes a set of theories and predicates logic (Litwin, 2003). Despite the familiarization with relational databases, in the last decades, a large number of NoSQL databases were adopted in many large organizations (Okman et al., 2011). According to (Pore & Pawar, 2015), NoSQL databases were developed around 2000 to deal with the limitations of traditional SQL databases, especially concerning scalability, replication and the storage of non-structured data. For the author Vaish (Vaish, 2013), NoSQL is the terminology for the database class that does not follow the same principles of relational databases, concerning ACID (Atomicity, Consistency, Isolation and Durability) properties. NoSQL is equipped to manage with scalability vs. speed for enterprises such as Google, Facebook, Yahoo and Twitter. When comparing NoSQL with traditional relational databases, NoSQL databases are projected to store data with simple structures, or similar to the object-oriented programming languages (Strauch et al., 2011). In studies carried out by Bassil (Bassil, 2012), the author tested and compared between the database engines of SQL Server, Oracle, IBM DB2, MySQL and the MS Access. In this comparative study, the author concludes that IBM DB2 shows a better performance, despite consuming more memory. At the same time, MS Access uses less CPU. These conclusions are supported by other authors, such as Sing, Sharma (Singh et al., 2015) who also tested query performance. The 2019 ranking “DB-engines” identifies 330 database engines. According to this ranking, in 2018 and 2019, the six most popular databases are Oracle, MySQL, SQL Server, PostgreSQL, MongoDB, and IBM DB2. Note that, among these top 6 databases, there are six relational databases, and only one non-relational (MongoDB). The selection of a database engine has direct implications on the daily performance and organization of both enterprises and research (Wang & Santos, 2011). The authors Abbasi, S´a (Abbasi et al., 2018), in the context of smart-cities, consider the performance of database engines, test and suggest the use of Cassandra, which is a NoSQL oriented engine. Other authors, such as Liu and Luo (Liu & Luo 2010), use PostgreSQL in their project, and eGovMon in the field of eGovernment, justifying the choice with the resources that PostgreSQL provides warehousing data. Then authors Sourouni, Lampathaki (Sourouni et al., 2008) use SQL Server to implement ontology oriented to represent concept relations in the domain of eGovernment and the development of associated services. Authors claim that the decision relies on performance and scalability.
In turn, for Signore, Chesi (Signore et al., 2005), based on the eGovernment project citepl, Oracle DBMS was the choice. Baraka and Madoukh (Baraka & Madoukh, 2012) do not mention which DBMS engine is used to improve the Palestinian eGovernment, but suggest that the SOA oriented model surpasses the actual in-place model in performance, interoperability, flexibility, and management. Tsai, Choi (Tsai et al. 2009), for the implementation of a GIS web based project, decide to use Oracle or SQL Server for licensing reasons, aiming to reuse the organization DBMS’s licenses and optimize costs.
The decision on which database to use is not consensual; it is not possible to say which is good or bad without analyzing the context first. The DBMS should be chosen very carefully, due to its cost and impact on the entire surrounding systems (Sibley, 1985). Based on all factors, this paper presents a study that allows assessing the most used DBMSs in the Portuguese government, taking into account the specificity of the city town halls.
Survey/Quiz
Based on a survey, the central objective of this study is to identify the most used DBMSs by the Portuguese city town halls. For Marconi and Lakatos (Marconi & Lakatos, 2004), a questionnaire is an instrument for collecting data with a set of ordered questions. These authors point out the negative aspect of having the questions and answers in a written format, without the presence of the interviewer. According to Gil (Gil, 2008), the survey is a technique based on a set of cross-questions to an individual with the aim to gather information regarding knowledge, beliefs, feelings, values, interests, expectations, aspirations, fears, behaviours, and others.
This study comprehends 308 Portuguese governmental city halls which were selected, aiming to maximize the number of answers and obtain a clear picture of the reality. Table 1 shows some statistical values collected from the obtained results regarding how many workers exist inside the inquired organizations.
Table 1: How many workers exist in the organization
The questionnaire was sent by email to all Portuguese city halls On May 15, 2019, directed to the encharged person of the department of Information Systems, and the reply to the questionnaire had to be sent before 10th of June, 2019.
The survey questions are designed using the application “survey do Zoho”. Based on the state-of-the-art literature, the questions are designed to the users.
- The first page contains key sociodemographic questions. After validating that the inquiry belongs to a city hall, it is essential to understand the number of inhabitants that influence the: amount of provided services; technological resources; and the number of IT workers.
- The next question is to understand the number of IT workers in the organizational IT department. The dimension of the IT department directly relates to the capacity to define, choose and identify work policies and data selection, and where to store it.
- The other question aims to gather the generic characteristics of DBMSs that participants value. By presenting a set of possible features to the user, participants can answer by selecting the most relevant ones. Table 2 resumes the list of state-of-the-art DBMS most relevant features. Additionally, an open field, “others”, is also available so that participants can include new features.
After the generic database characteristics are selected, participants must identify which DBMS they have in the governmental institution where they are associated. The list with possible databases to select is set, presenting the thirty most used databases present in the db-engines.com (2018). Anyway, to include all
Table 2: Database characteristics
database engines, the user can add additional database engines. According to the DBMSs identified by the user as being the most used in their organization, two further questions are performed individually.
- First (Q1): “What are the characteristics did you take into account to select/install/use the DBMS in the organization?”
- Second (Q2): “How do you evaluate the DBMS that is currently in use in the organization?”
Q1 reuses the listed databases shown in Table 2, with two extra options: “Used by a web-based software”; and “Used by an Enterprise Resource Planning (ERP) software”. These options exist to include specific database software requirements, such as ERP software’s.
Q2 evaluates individually the installation and utilization of each DBMS in use. This evaluation uses the Likert scale with five positions. The considered dimensions for the Likert range include technical assistance; global evaluation; price; documentation; usage easiness; maintenance; performance; and security. Additionally, it is also provided with an open option for users to include new dimensions and observations.
After acquiring all the necessary data for the proposed study, with SPSS software, data was imported for analysis.
Results
The obtained data for this result analysis, results from a quiz to 308 Portuguese town halls, from which, 104 (33, 7%) replied, allowing a trust level of 95% with an error of 8%. This sample is considered a valid reality representation of the DBMSs utilization in the Portuguese central government institutions. From the sample, 11% of the participants are graduates, 18% have certifications, 3% have had higher education, 19% have a BSc, 15% have an MSc, and 3% have a PhD. Note that IT departments are gradually hiring better professionals with more academic formation. Moreover, most of the regional city councils have an average of 3 technicians. Finally, results show that larger cities have more significant IT departments. However, most of the participants have less than 50 thousand inhabitants in their city hall (Table 3).
Regarding the answers to the question “what are the most valued characteristics in a DBMS”, 67% of the participants selected “recovery in case of disaster/crash”. This identifies the important role that information plays in an organization.
Table 3: Number of answers by city hall size
The second most relevant characteristic is the “technical support” with 58.3% of the answers. This choice demonstrates that public services consider the availability of their services as important. At the same time, because public services do not deal directly with the problems, having a good support service available in case of trouble is a critical factor.
Understandably, the performance is the third most relevant characteristic with 57.5% (the response time is directly related to users’ satisfaction). Compatibility and scalability got 50% of the votes. The price is the sixth identified characteristic, with 49% of the votes. Pleasantly, it is good to know that public institutions consider the cost of the tools. On the other side of the choices, characteristics such as NoSQL oriented, Global ranking, and memory oriented are the less selected options, with 0.9%, 1.8% and 1.8%, respectively.
Table 4 shows all the results, with a variance of 87.72, and a standard deviation of 9.26.
Table 5 shows the list of the most used databases. The gathered results present a variance of 39.67 and a standard deviation of 6.3. In general, the preferred database engine in this study is the SQL Server Express, which is present in 68.2% of the Portuguese town halls. Other results metrics show that participants evaluate the SQL Server Express with 3.51 points in a total of 5. It is important to highlight that the “price” is the characteristic with more relevance, showing an average of 4.17 points in 5.The characteristics of performance and security have an average evaluation of 3.4 points in 5. Performance and Technical Support have an evaluation of 3.5 points in 5.. Characteristics, easy to use and documentation are evaluated with 3.7 points in 5. In general, the features considered more relevant in this study when choosing the SQL Server Express, are the price, easy to install, and easy to use and integrate with an ERP software.
This study’s second most used DBMS is IBM Informix, which is present in 65.4% of the inquired town halls. Technical assistance and security have 4.01 points in 5. Informix global evaluation is 3.94 points in 5, followed by the performance with 3.85 points in 5. Licensing price, easiness to use, maintenance, and performance stand between 3 and 4 points (in a total of 5). It is worth noting that most Informix users select this database based on ERP software that is widely used among the town halls that require this DBMS. As already mentioned before, technical assistance and licensing are essential factors when pondering a DBMS section.
The third DBMS in the list is MySQL, with 61% of the votes. Regarding the price for licensing, it has an evaluation of 4.14 points in 5. Easiness to use, maintenance and performance have an average evaluation of 3.9 points in 5. Despite having a proper punctuation, the worst detected characteristic with 3.22 points in 5, is the technical assistance. The fourth DBMS in the list is the SQL Server (commercial version) with 56.7%.
Table 4: Most valued characteristics in a DBMS
Table 5: Most used databases in Portuguese town halls in 2018
of the votes. Although characteristics are very similar to the express version, the price has worse punctuation. On the other hand, technical assistance improves. The identified requirement to use this DBMS is the dependability of the web-based software.
Surprisingly, MS access is the fifth-ranked database, with 44.2% of participants’ preference. This position is due to the easiness to install and use. At last, in the fifth position is PostgreSQL with 37.5% of the votes.
The less used DBMS are ActionVector, ElasticSearch, Redis, SQLanyWhere, PostGIS and Disan. These last engines are used only by one town hall (Table 5 resumes the findings).
Tables 6, 7 and 8 show the top three, most used database engines (SQL Server Express, Informix, and MySQL), and how the inquired organizations evaluated each relevant characteristic.
Conclusions
This research work reflects an initial study that intends to identify the most used DBMS by the Portuguese government in the city town halls. Simultaneously, it also recognized the reasons which lead to such selection/implementation, as well as the satisfaction level of each participant in the study.
Table 6: Inside the organisation, how is SQL Server Express evaluated (1 unhappy – 5 very happy)
Inside the organisation, how is SQL Server evaluated (1 unhappy – 5 very happy)
Table 7: Inside the organisation, how is Informix evaluated (1 unhappy – 5 very happy)
Inside the organisation, how is Informix evaluated (1 unhappy – 5 very happy)
In this study, based on a set of questions sent to 308 Portuguese town halls by email, 104 valid answers were collected, from which, the ranking in this study results. Each database system is evaluated regarding satisfaction, technical assistance, globally, price, documentation, easiness to use, maintenance, performance, security, and performance. In the same time, the characteristics that justify the inquired participants were also collected.
The raking of the most used DBMSs present in Table 5 show that the DBMS with more votes are, SQL Server Express, Informix, MySQL, SQL Server (commercial version), Access, and PostgreSQL. The less-used databases, last in rank, are DiSAM, PostGIS, Redis, and ElastichSearch. In general, it is possible to conclude that the database administrators across the studied institutions consider Technical Assistance, Security and Performance, as critical aspects.
Additionally, it is also visible that because of software requirements such as ERP’s, specific DBMSs are required to be installed, obligating the municipalities to use/integrate such databases; Informix and SQL Server are examples of such cases.
Future Work
For future work, this study is going to be extended to understand the reasons that make users choose and acquire DBMSs. In line of this study’s fulfillment, the quiz will be repeated annually, to understand the evolution of the Portuguese local government. Moreover, based on the gathered information, this study aims to help IT departments in the acquisition and selection of a DBMS.
Table 8: Inside the organisation, how is MySQL evaluated (1 unhappy – 5 very happy)
Inside the organisation, how is MySQL evaluated (1 unhappy – 5 very happy)
Acknowledgements
“This work is funded by National Funds through the FCT – Foundation for Science and Technology, I.P., within the scope of the project Ref UIDB/05583/2020.
Furthermore, the authors of this paper would like to thank the Research Centre in Digital Services
(CISeD), and the Polytechnic of Viseu for their support”.
(adsbygoogle = window.adsbygoogle || []).push({});
References
- Abadi, D., Boncz, P., Harizopoulos, S., Idreos, S., Madden, S. et al. (2013), ‘The design and implementation of modern column-oriented database systems’, Foundations and Trends® in Databases 5(3), 197–280.
- Abbasi, M., S´a, F., Albuquerque, D., Wanzeller, C., Caldeira, F., Tom´e, P., Furtado, P. & Martins, P. (2018), Nosql scalability performance evaluation over cassandra, in ‘International Conference Europe Middle East & North Africa Information Systems and Technologies to Support Learning’, Springer, pp. 512–520.
- Abramova, V., Bernardino, J. & Furtado, P. (2014), ‘Which nosql database? a performance overview’, Open Journal of Databases (OJDB) 1(2), 17–24.
- Baraka, R. S. & Madoukh, S. M. (2012), A conceptual soa-based framework for e-government central database, in ‘2012 International Conference on Computer, Information and Telecommunication Systems (CITS)’, IEEE, pp. 1–5.
- Bassil, Y. (2012), ‘A comparative study on the performance of the top dbms systems’, arXiv preprint arXiv:1205.2889 .
- Elmasri, R. (2017), Fundamentals of database systems.
- Fayech, I. & Ounalli, H. (2012), ‘Towards a flexible database interrogation’, International Journal of Database Management Systems 4(3), 21.
- Fotache, M. & Strimbei, C. (2015), ‘Sql and data analysis. some implications for data analysits and higher education’, Procedia Economics and Finance 20, 243–251.
- Frawley, W. J., Piatetsky-Shapiro, G. & Matheus, C. J. (1992), ‘Knowledge discovery in databases: An overview’, AI magazine 13(3), 57–57.
- Gil, A. C. (2008), M´etodos e t´ecnicas de pesquisa social, 6. ed. Ediitora Atlas SA.
- Han, J., Haihong, E., Le, G. & Du, J. (2011), Survey on nosql database, in ‘2011 6th international conference on pervasive computing and applications’, IEEE, pp. 363–366.
- Litwin, P. (2003), Fundamentals of relational database design, in ‘NYU Symposium on Data Base Design Techniques’.
- Liu, X. & Luo, X. (2010), ‘A data warehouse solution for e-government’, International Journal of Research and Reviews in Applied Sciences 4(1), 101–105.
- Lopes, F. C., Morais, M. P. & Carvalho, A. J. (2005), Desenvolvimento de Sistemas de Informa¸c˜ao: m´etodos e t´ecnicas.
- Marconi, M. d. A. & Lakatos, E. M. (2004), Metodologia cient´ıfica, Vol. 4, Atlas S˜ao Paulo.
- Okman, L., Gal-Oz, N., Gonen, Y., Gudes, E. & Abramov, J. (2011), Security issues in nosql databases, in ‘2011IEEE 10th International Conference on Trust, Security and Privacy in Computing and Communications’, IEEE, pp. 541–547.
- Pore, S. S. & Pawar, S. B. (2015), ‘Comparative study of sql & nosql databases’, J. Adv. Res. Comput. Eng. Technol 4(5).
- Sharma, N. et al. (2017), ‘Overview of the database management system.’, International Journal of Advanced Research in Computer Science 8(4).
- Sibley, E. H. (1985), ‘How to select and evaluate a dbms’, INFORMATION SYSTEM MANAGEMENT 2(2), 40–49.
- Signore, O., Chesi, F. & Pallotti, M. (2005), E-government: challenges and opportunities, in ‘CMG Italy-XIX annual conference’, pp. 7–9.
- Silberschatz, A., Korth, H. F., Sudarshan, S. et al. (1997), Database system concepts, Vol. 4, McGraw-Hill New York.
- Singh, P., Sharma, S. & Kaur, S. (2015), ‘Performance analysis of different dbms systems’, International Journal of Advance Foundation And Research In Science & Engineering 1, 1–9.
- Sourouni, A.-M., Lampathaki, F., Mouzakitis, S., Charalabidis, Y. & Askounis, D. (2008), Paving the way to egovernment transformation: Interoperability registry infrastructure development, in ‘International Conference on Electronic Government’, Springer, pp. 340–351.
- Strauch, C., Sites, U.-L. S. & Kriha, W. (2011), ‘Nosql databases’, Lecture Notes, Stuttgart Media University 20.
- Sz´arnyas, G., Izs´o, B., R´ath, I. & Varr´o, D. (2018), ‘The train benchmark: crosstechnology performance evaluation of continuous model queries’, Software & Systems Modeling 17(4), 1365–1393.
- Tsai, N., Choi, B. & Perry, M. (2009), ‘Improving the process of e-government initiative: An in-depth case study of web-based gis implementation’, Government Information Quarterly 26(2), 368–376.
- Vaish, G. (2013), Getting started with NoSQL, Packt Publishing Ltd.
- Viscusi, G., Batini, C. & Mecella, M. (2010), Information systems for eGovernment: A quality-of-service perspective, Springer Science & Business Media.
- Wang, Z. & Santos, T. (2011), A study on acupressure points online database, in ‘Proceedings of the International Conference on Bioinformatics & Computational Biology (BIOCOMP)’, The Steering Committee of The World Congress in Computer Science, Computer …, p. 1.