Database Encryption for Balance Between Performance and Security

In an increasingly digital world, information security is a recurring theme and a growing concern for companies. It involves protecting a company's data and information, whether confidential or not, as it moves between the company's departments and between the organization and its stakeholders. The pillars of information security are integrity, confidentiality, and availability. However, this constant concern brings added complexity, which is reflected in system performance. To study the impact of security on system performance, operations were tested on databases with and without encryption, and the average execution times and occupied storage space were analyzed. Encryption had a substantial impact on the database's read and write performance, and this impact was directly proportional to the level of security used.

PostgreSQL was also chosen for its flexibility in choosing the encryption method, which made it possible to analyze issues such as the security level and to study the performance of insertions and queries. The total storage space was also taken into account.
We started with PGP encryption, which is known to be secure in email systems (Kar et al. 2019) and other messaging platforms (Ilmonen 2020); however, in our preliminary tests it proved inefficient for large volumes of data, so it was discarded. The best solution found was raw AES encryption (Bhange & Mathur 2019), which, being simpler, proved more efficient. This type of encryption is used in several current scenarios, for example in smart cities (Farahat et al. 2019). This choice naturally reduces security. Even so, performance still decreased compared with the tests without encryption.
The data read (query) results show that, with encryption, execution times increase because the data must be decrypted. As expected, for the same queries, using primary and foreign keys reduced execution times without encryption, but with encryption this did not happen for all tested queries. Another important aspect is the space occupied by the data, which was found to increase in the scenarios with encryption and with keys.
The rest of this document is organized into four sections. Section 2 reviews different forms of database encryption. Section 3 describes the experimental scenario. Section 4 presents the case study results. Finally, Section 5 concludes the work and outlines possible directions for future work.

Related work
This section analyzes previous work in this area, which helps identify alternatives and improvements to the current state of the art.

Asymmetric Keys
Since data security is an increasingly important issue, this case study tests the use of cryptographic algorithms. These algorithms are constantly being improved, which has increased their complexity and, with it, their execution times, reducing application performance. The results in (Boicea et al. 2017) show that encryption time increases with the number of encrypted characters, with the notable exception of RSA, whose encryption time varies very little as the number of characters grows. The algorithms were also found to have encryption times of the same order of magnitude, taking the encryption key length into account. For all algorithms, decryption time is dozens of times lower than encryption time and increases with key size; again, RSA stands out with the shortest times.
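To make the asymmetric setting concrete, the following Python sketch implements textbook RSA with the small primes often used in teaching examples (p = 61, q = 53, e = 17). It is a toy for illustration only: it has no padding, uses tiny keys, and is not the implementation benchmarked in (Boicea et al. 2017). The public pair (e, n) encrypts and the private pair (d, n) decrypts.

```python
from math import gcd
from typing import Tuple

def make_keys(p: int, q: int, e: int) -> Tuple[Tuple[int, int], Tuple[int, int]]:
    """Build a textbook RSA key pair from two primes and a public exponent."""
    n = p * q
    phi = (p - 1) * (q - 1)
    if gcd(e, phi) != 1:
        raise ValueError("e must be coprime with (p-1)(q-1)")
    d = pow(e, -1, phi)  # modular inverse of e (Python 3.8+)
    return (e, n), (d, n)

def rsa_encrypt(m: int, public: Tuple[int, int]) -> int:
    """c = m^e mod n; the message must be an integer smaller than n."""
    e, n = public
    if not 0 <= m < n:
        raise ValueError("message must be an integer in [0, n)")
    return pow(m, e, n)

def rsa_decrypt(c: int, private: Tuple[int, int]) -> int:
    """m = c^d mod n."""
    d, n = private
    return pow(c, d, n)

# Toy parameters only: real deployments use moduli of 2048 bits or more and padding.
public, private = make_keys(61, 53, 17)
ciphertext = rsa_encrypt(65, public)
print(public, private, ciphertext)        # (17, 3233) (2753, 3233) 2790
print(rsa_decrypt(ciphertext, private))   # 65
```

Both operations are modular exponentiations whose cost grows with the size of n, which is one reason key length weighs so heavily in the timings reported for asymmetric algorithms.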

AES in IoT
Security is also an important criterion in the IoT domain. Once received, the encrypted data from IoT sensors is stored in text format. However, if the integrity, validity, and security of the data are to be guaranteed, the encryption and decryption operations add overhead to data query time, which in turn causes additional operational delays and increases power consumption in wireless IoT devices. The works differ in the type of data used: relational fields and non-relational BJSON fields.
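Storing encrypted values in text format requires encoding the binary ciphertext as text. The sketch below assumes Base64, which the cited work does not specify, and uses random bytes as a stand-in for real ciphertext, since the Python standard library provides no AES. It shows the size growth of roughly 4/3 that the text encoding alone adds.

```python
import base64
import math
import os

def to_text_column(ciphertext: bytes) -> str:
    """Encode binary ciphertext as Base64 so it can be stored in a text field."""
    return base64.b64encode(ciphertext).decode("ascii")

def base64_len(n_bytes: int) -> int:
    """Length of the padded Base64 encoding of n_bytes of binary data."""
    return 4 * math.ceil(n_bytes / 3)

# Stand-in for an encrypted sensor reading: random bytes, NOT real ciphertext.
fake_ciphertext = os.urandom(32)
stored = to_text_column(fake_ciphertext)
print(len(fake_ciphertext), len(stored))  # 32 44
```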
Regarding the results in (Kokkonis et al. 2019) for BJSON data, the average insertion time was the same for encrypted and unencrypted documents. However, query processing time increased by 8% for encrypted documents compared with plain-text documents.

FNR encryption
FNR encryption preserves the length and format of the stored fields. To this end, an encrypted CryptDB database using the FNR encryption scheme was used. The scheme classifies each value's data type, encrypts it into a text of the same size, and converts the result back into data with the original format. Scenarios were tested to explore the feasibility of Format-Preserving Encryption (FPE) (Pérez-Resa et al. 2020). Using SQL-aware encryption, an encryption scheme is defined for each predefined set of SQL operations, so that every data item is encrypted in a way that allows those operations to run without decryption.
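The idea of format-preserving encryption can be illustrated with a toy construction: a keyed balanced Feistel network over a binary domain, combined with cycle walking so that a 9-digit number always encrypts to another 9-digit number. This is not the FNR scheme and is not secure for production use; the HMAC-SHA256 round function and the 8 rounds are arbitrary choices made for this sketch.

```python
import hashlib
import hmac

ROUNDS = 8  # arbitrary for this toy; not a parameter of FNR

def _round_fn(key: bytes, r: int, half: int, width: int) -> int:
    """Keyed round function: HMAC-SHA256 of (round number, half), cut to `width` bits."""
    msg = bytes([r]) + half.to_bytes(16, "big")
    digest = hmac.new(key, msg, hashlib.sha256).digest()
    return int.from_bytes(digest, "big") & ((1 << width) - 1)

def _feistel(key: bytes, x: int, bits: int, inverse: bool) -> int:
    """Balanced Feistel permutation of the integers in [0, 2**bits); bits must be even."""
    h = bits // 2
    left, right = x >> h, x & ((1 << h) - 1)
    if not inverse:
        for r in range(ROUNDS):
            left, right = right, left ^ _round_fn(key, r, right, h)
    else:
        for r in reversed(range(ROUNDS)):
            left, right = right ^ _round_fn(key, r, left, h), left
    return (left << h) | right

def _bits_for(n_digits: int) -> int:
    bits = (10 ** n_digits).bit_length()
    return bits + (bits % 2)  # round up to an even width

def fpe_encrypt(key: bytes, value: int, n_digits: int) -> int:
    """Map a value in [0, 10**n_digits) to another value in the same range."""
    domain, bits = 10 ** n_digits, _bits_for(n_digits)
    if not 0 <= value < domain:
        raise ValueError("value outside the digit domain")
    y = _feistel(key, value, bits, inverse=False)
    while y >= domain:  # cycle walking: re-encrypt until back inside the domain
        y = _feistel(key, y, bits, inverse=False)
    return y

def fpe_decrypt(key: bytes, value: int, n_digits: int) -> int:
    """Invert fpe_encrypt by walking the inverse permutation back into the domain."""
    domain, bits = 10 ** n_digits, _bits_for(n_digits)
    y = _feistel(key, value, bits, inverse=True)
    while y >= domain:
        y = _feistel(key, y, bits, inverse=True)
    return y

key = b"demo key, not for production"
plain = 123456789
cipher = fpe_encrypt(key, plain, 9)
print(f"{plain:09d} -> {cipher:09d}")  # the ciphertext still has nine digits
assert fpe_decrypt(key, cipher, 9) == plain
```

Because the output stays in the same digit range, it fits in the same column type and width as the plaintext, which is the storage property the FNR results below rely on.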
Compared with the present scenario, and besides the different applications and technologies, the encryption itself also differs: it preserves the size and format of the fields and avoids decryption for a set of SQL operations, something considered unavoidable in most forms of encryption, including those used in this document.
The experimental results (Chandrashekar et al. 2015) showed improvements in storage at the cost of performance. In terms of storage, FNR required approximately 50% less space than AES. Although this value always depends on the data of the chosen application, the size of FNR-encrypted data was always equal to the plain-text size, whereas AES-encrypted data was always much larger. For insertion operations, times with FNR were about seven times higher than with AES-128, since the FNR scheme uses several encryption steps internally. Thus, FNR encryption reduces the size of the database because data size is preserved, but it lowers insertion performance because several steps are needed to produce the final encrypted data.

Experimental Setup
For the experimental scenario, 10 GB of data were generated with TPC-H. The database was accessed from Python using the SQLAlchemy module. For insertions, a Python script reads each row of data from each table and converts it into insertion commands in bulk batches of 4 kB.
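The batching step can be sketched as follows. The sketch assumes the pipe-delimited `.tbl` files produced by the TPC-H `dbgen` tool and builds multi-row INSERT statements of at most 4 kB each; the function names are hypothetical and this is not the script used in the experiments. Values are inlined as quoted literals only to make the 4 kB size concrete; production code should prefer bound parameters.

```python
from typing import Iterable, Iterator, List

def sql_literal(value: str) -> str:
    """Quote a field as a SQL string literal, doubling embedded single quotes."""
    return "'" + value.replace("'", "''") + "'"

def parse_tbl_line(line: str) -> List[str]:
    """Split one dbgen row such as '1|AFRICA|comment|' into its fields."""
    return line.rstrip("\n").rstrip("|").split("|")

def batched_inserts(table: str, lines: Iterable[str], max_bytes: int = 4096) -> Iterator[str]:
    """Yield multi-row INSERT statements no longer than max_bytes characters.

    Length is counted in characters, which equals bytes for ASCII TPC-H data.
    A single row longer than max_bytes is still emitted on its own.
    """
    prefix = f"INSERT INTO {table} VALUES "
    tuples: List[str] = []
    size = len(prefix)
    for line in lines:
        tup = "(" + ", ".join(sql_literal(f) for f in parse_tbl_line(line)) + ")"
        extra = len(tup) + (2 if tuples else 0)  # account for the ", " separator
        if tuples and size + extra + 1 > max_bytes:  # +1 for the closing ';'
            yield prefix + ", ".join(tuples) + ";"
            tuples, size = [], len(prefix)
            extra = len(tup)
        tuples.append(tup)
        size += extra
    if tuples:
        yield prefix + ", ".join(tuples) + ";"
```

PostgreSQL coerces the quoted literals to the numeric and date column types on insert, so the same builder serves every TPC-H table.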
In the encryption scenarios, this command changes: the encryption function is applied to each field. Loading was tested in this way, with and without encryption, to measure the differences in performance and in the space occupied by the data.
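Wrapping each field in an encryption call can be sketched as a statement builder. It uses SQLAlchemy `text()`-style named placeholders (`:c0`, `:c1`, and so on, plus `:key`) and shows both pgcrypto variants discussed in this work: `pgp_sym_encrypt(data text, psw text)` and the raw `encrypt(data bytea, key bytea, 'aes')`. It assumes, hypothetically, that every column of the encrypted tables is declared as `bytea` and that the key is passed as a text parameter; the function name is illustrative.

```python
def encrypted_insert(table: str, n_columns: int, mode: str = "aes") -> str:
    """Build a parameterised INSERT whose values are all wrapped in a pgcrypto call.

    Placeholders follow SQLAlchemy text() style: :c0, :c1, ... for the values
    and :key for the key. Every target column is assumed to be bytea.
    """
    if mode == "aes":
        # Raw encryption: encrypt(data bytea, key bytea, 'aes') returns bytea.
        template = ("encrypt(convert_to(CAST(:c{i} AS text), 'UTF8'), "
                    "convert_to(:key, 'UTF8'), 'aes')")
    elif mode == "pgp":
        # PGP symmetric encryption: pgp_sym_encrypt(data text, psw text) returns bytea.
        template = "pgp_sym_encrypt(CAST(:c{i} AS text), :key)"
    else:
        raise ValueError(f"unknown mode: {mode!r}")
    values = ", ".join(template.format(i=i) for i in range(n_columns))
    return f"INSERT INTO {table} VALUES ({values})"

print(encrypted_insert("region", 3))
```

With SQLAlchemy, such a string would be passed to `text()` and executed with a parameter dictionary, for example `{"c0": 0, "c1": "AFRICA", "c2": "some comment", "key": "example-key"}`.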
For the queries, five TPC-H queries of different complexity were chosen (Q1, Q2, Q3, Q4, Q5). In each scenario, each query is executed five times, and the average is calculated after discarding the maximum and minimum times. The queries are also executed by a Python script where, in the case of
Journal of Information Assurance & Cyber security
In the experimental scenario, the pgcrypto module (Odongo & Bukenya 2019) was used, which provides cryptographic functions for PostgreSQL (Juba & Volkov 2019). This module provides several forms of encryption. Initially, the PGP encryption model with symmetric keys was tested. In this model, the key is a hash computed from the provided password with a String2Key (S2K) algorithm (Shakya & Karna 2019). Before encryption, the data undergoes manipulation that includes compression, conversion to UTF-8, and/or conversion of line endings. The data is then prefixed with a block of random bytes, a SHA-1 hash of the random prefix and the data is appended, and the result is placed in a data packet. This model was discarded as soon as data was loaded, because times grew exponentially and became impractical for large amounts of data. Therefore, raw encryption functions with the AES algorithm were used, which only apply the cipher to the data and have none of the advanced features of PGP encryption. This model uses the user's key directly as the cipher key and provides no data integrity checks. Although it does not operate on text, so the data must first be converted to bytes, this model proved more efficient for insertion.
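pgcrypto's PGP functions follow the OpenPGP message format (RFC 4880), in which the symmetric key is derived from the password by an S2K specifier. The sketch below implements the iterated and salted S2K variant from RFC 4880, section 3.7.1.3, using SHA-1; the salt, password, and iteration count in the usage lines are illustrative values, not the settings used in the experiments.

```python
import hashlib

def decode_count(coded: int) -> int:
    """Decode the one-octet S2K iteration count (RFC 4880, section 3.7.1.3)."""
    return (16 + (coded & 15)) << ((coded >> 4) + 6)

def s2k_iterated_salted(passphrase: bytes, salt: bytes, count: int,
                        key_len: int, hash_name: str = "sha1") -> bytes:
    """Derive key_len bytes by hashing salt+passphrase repeated up to `count` octets."""
    data = salt + passphrase
    total = max(count, len(data))  # salt+passphrase is always hashed at least once
    stream = (data * (total // len(data) + 1))[:total]
    key = b""
    preload = 0
    while len(key) < key_len:
        # Extra hash contexts (for keys longer than one digest) start with zero octets.
        h = hashlib.new(hash_name)
        h.update(b"\x00" * preload)
        h.update(stream)
        key += h.digest()
        preload += 1
    return key[:key_len]

salt = bytes.fromhex("0102030405060708")  # illustrative 8-byte salt
count = decode_count(96)                   # 65536 octets hashed
key = s2k_iterated_salted(b"secret password", salt, count, key_len=16)
print(count, len(key))  # 65536 16
```

The derivation is deliberately expensive to slow down password guessing. Assuming a fresh salt per encrypted value, as OpenPGP messages normally carry their own S2K salt, this cost would be paid again for every row, unlike raw `encrypt()`, which uses the supplied key directly.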

Experimental Results
This section presents the results as charts and analyzes them. It is divided into two subsections: insertions and queries. The first covers the PGP encryption scenario, the AES encryption scenario, and the AES scenario with keys. For queries, AES encryption is used, with and without keys.

Insertion Performance
The insertion results for scenarios with and without encryption are presented to evaluate the performance differences. In 4.1.1, the PGP encryption method is tested. In 4.1.2, raw AES encryption is compared with the scenario without encryption, and in 4.1.3 the same comparison is made with primary and foreign keys.

PGP Encryption
Chart 1 shows the insertion times of the PGP model with symmetric keys in comparison to the
As can be seen in figure 1, a sharp increase was already noticeable with only 13.5 MB of data, and it worsened dramatically by 1.12 GB. These times can be explained by the numerous steps, described above, needed to encrypt a single row of data. Thus, although this is strong encryption that allows integrity verification, it is impractical for large volumes of data.
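The per-row work described in the experimental setup can be sketched with the Python standard library. This is a simplified illustration, not the OpenPGP wire format: it performs the UTF-8 conversion, compression, random prefix, and SHA-1 steps but omits packet headers, line-ending conversion, and the final symmetric encryption, which the standard library does not provide. The function names and the sample row are made up.

```python
import hashlib
import os
import zlib

PREFIX_LEN = 16  # one AES block of random bytes (simplified)
SHA1_LEN = 20

def pgp_like_frame(row_text: str) -> bytes:
    """Pre-encryption steps for one row: UTF-8, compression, random prefix, SHA-1."""
    data = zlib.compress(row_text.encode("utf-8"))
    prefix = os.urandom(PREFIX_LEN)
    digest = hashlib.sha1(prefix + data).digest()
    return prefix + data + digest  # a real implementation would now encrypt this

def pgp_like_unframe(packet: bytes) -> str:
    """Reverse the framing and verify the SHA-1 integrity check."""
    prefix = packet[:PREFIX_LEN]
    data = packet[PREFIX_LEN:-SHA1_LEN]
    digest = packet[-SHA1_LEN:]
    if hashlib.sha1(prefix + data).digest() != digest:
        raise ValueError("integrity check failed")
    return zlib.decompress(data).decode("utf-8")

row = "1|Customer#000000001|example address|15|711.56|"  # made-up TPC-H-style row
packet = pgp_like_frame(row)
print(len(row), len(packet))  # for a short row the framed packet is longer
assert pgp_like_unframe(packet) == row
```

Every step runs once per row, so for millions of short TPC-H rows the fixed per-row overhead of compression, random generation, and hashing accumulates quickly.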

AES Encryption
Since PGP encryption proved impractical for this volume of data, raw encryption functions with AES were used. Chart 2 shows the performance of this encryption compared with the same scenario without encryption.
This encryption model (Figure 2) proved more efficient than the previous one: for 1.12 GB of data, it took around 11 minutes, whereas the previous one took 8 hours. Compared with the scenario without encryption, the times remained close up to 1.12 GB of data, after which a gap appeared that grew with the data volume. In total, loading took 1:18:34 h without encryption and 05:23:27 h with this encryption, roughly four times longer.
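The ratio between the two reported totals can be computed directly from their h:mm:ss values:

```python
def to_seconds(hms: str) -> int:
    """Convert an 'h:mm:ss' duration string to seconds."""
    hours, minutes, seconds = (int(part) for part in hms.split(":"))
    return hours * 3600 + minutes * 60 + seconds

plain = to_seconds("1:18:34")   # total load time without encryption
aes = to_seconds("05:23:27")    # total load time with raw AES encryption
print(plain, aes, round(aes / plain, 2))  # 4714 19407 4.12
```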
In conclusion, this encryption mode is less secure, as it has no integrity check. On the other hand, it proved more efficient than PGP, although, compared with the scenario without encryption, its performance degrades as the data volume increases.

AES encryption with keys
The previous scenario was repeated with primary and foreign keys, and the times are shown in chart 3.
Regarding the scenario without encryption with keys, the conclusions are the same as in the previous one, but the growth in time with data volume proved even more significant, reaching 25:13:21 h for 7.24 GB. Thus, the scenario without

Performance in Queries
This section analyzes the performance of data queries with and without encryption.
Chart 4 shows the average execution time of each query without keys.
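The averaging applied to each query (five executions, discarding the maximum and the minimum) can be sketched as follows; `run_query` stands for any hypothetical callable that executes one query, for example through an SQLAlchemy connection.

```python
import time
from statistics import mean
from typing import Callable, List, Sequence

def time_runs(run_query: Callable[[], object], repeats: int = 5) -> List[float]:
    """Execute a query callable `repeats` times and return each wall-clock duration in seconds."""
    durations = []
    for _ in range(repeats):
        start = time.perf_counter()
        run_query()
        durations.append(time.perf_counter() - start)
    return durations

def trimmed_average(durations: Sequence[float]) -> float:
    """Average after discarding the single maximum and the single minimum."""
    if len(durations) < 3:
        raise ValueError("need at least three runs to discard the max and min")
    ordered = sorted(durations)
    return mean(ordered[1:-1])

print(trimmed_average([12.0, 10.0, 11.0, 30.0, 9.0]))  # 11.0
```

Dropping the extremes keeps a single slow run, such as a cold cache on the first execution, from dominating the reported average.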
With encryption, all queries have longer execution times, more noticeably for some than for others. This is explained by the amount of data each query accesses, since the data must be decrypted to evaluate operations and to process joins, and by the amount of data in the final projection.
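On the read side, every encrypted column referenced by a query must be decrypted before it can be filtered, joined, or projected. The sketch below builds such expressions with pgcrypto's `decrypt(data bytea, key bytea, 'aes')` and PostgreSQL's `convert_from`, assuming a hypothetical schema in which every column is stored as `bytea` encrypted with raw `encrypt(..., 'aes')` and the key is passed as a `:key` text parameter. The query is a simplified fragment in the spirit of TPC-H Q1, not one of the queries actually run.

```python
def decrypted(column: str, sql_type: str = "text") -> str:
    """SQL expression that decrypts a bytea column written with pgcrypto's raw AES encrypt()."""
    expr = f"convert_from(decrypt({column}, convert_to(:key, 'UTF8'), 'aes'), 'UTF8')"
    # The decrypted value is text; cast it back when the query needs another type.
    return expr if sql_type == "text" else f"CAST({expr} AS {sql_type})"

shipdate = decrypted("l_shipdate", "date")
quantity = decrypted("l_quantity", "numeric")
sql = (
    f"SELECT sum({quantity}) AS sum_qty FROM lineitem "
    f"WHERE {shipdate} <= DATE '1998-12-01' - INTERVAL '90 days'"
)
print(sql)
```

Because the predicate applies to the decrypted expression rather than to the stored column, each row has to be read and decrypted before it can be filtered.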
The same query scenario was run with encryption and with primary and foreign keys; the average execution times are shown in chart 5. Compared with the same scenario without encryption, the times were higher, as expected. However, while in the scenarios without encryption execution times decreased when keys were used, this did not always happen with encryption: for three queries, times increased with the use of keys.

Storage Space Occupied by the Database
Table 1 shows the storage space occupied by each database. Comparing the databases with and without keys or indexes, encryption naturally increases the space used because, regardless of the data type, each value is converted into a byte string that is usually larger than the original. For the databases without keys, the increase was about 3.8 times; for those with keys, a similar difference was observed, approximately four times.
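A rough, per-value view of why encrypted values take more space: pgcrypto's raw `encrypt()` with type `'aes'` uses CBC mode with PKCS padding by default, so the ciphertext length is the plaintext length rounded up to the next multiple of 16 bytes, always adding at least one byte. The sketch applies this arithmetic to a few TPC-H-like values, assuming they are converted to text before encryption. It ignores PostgreSQL's per-value storage headers and is not a model of the measured totals in Table 1.

```python
def aes_padded_len(n_plain: int, block: int = 16) -> int:
    """Ciphertext bytes for CBC + PKCS padding: round up, always adding at least one byte."""
    return (n_plain // block + 1) * block

# (description, native size in bytes, value as text before encryption)
examples = [
    ("integer key", 4, "1"),
    ("date", 4, "1996-03-13"),
    ("44-char comment", 44, "x" * 44),
]
for name, native, text in examples:
    enc = aes_padded_len(len(text.encode("utf-8")))
    print(f"{name}: {native} B native -> {enc} B encrypted")
```

Small fixed-width types such as integers and dates grow the most in relative terms, since even a one-byte value becomes a full 16-byte block.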


Conclusions and Future Work
When the security level is the most important criterion, an algorithm with fewer security weaknesses may require longer encryption and decryption times, depending on the key size. This leads to significant performance reductions, additional resource consumption on the server, and more storage space for encrypted data. Key management is another important aspect: since data can only be decrypted with the corresponding key, if the key is lost, for example due to a failure, the administration loses the data and the entire database is compromised.
In the search for encryption that strikes a better balance between security and performance, the chosen solution was raw AES encryption. Although it is less secure, since it does not validate data integrity, it proved more efficient, though performance losses remain pronounced as the data volume grows.
We conclude that performance loss with data encryption is inevitable. Still, depending on the requirements of the scenario to be implemented, the best balance between the desired security and performance should be sought. In the case presented, this balance was found by switching from PGP to AES encryption.
For future work, this encryption mode should be analyzed with larger data volumes to further evaluate the performance degradation observed as data grows. It would also be interesting to compare the impact of encryption across different database systems.