This study compares relational and non-relational (NoSQL) standardized medical information systems. The computational complexity of the response times of querying such database management systems (DBMS) is computed using doubling-sized databases. These results help the discussion of the appropriateness of each database approach to different scenarios and problems.
This research shows a protocol to assess the computational complexity of querying relational and non-relational (NoSQL (not only Structured Query Language)) standardized electronic health record (EHR) medical information database systems (DBMS). It uses a set of three doubling-sized databases, i.e. databases storing 5000, 10,000 and 20,000 realistic standardized EHR extracts, in three different database management systems (DBMS): relational MySQL object-relational mapping (ORM), document-based NoSQL MongoDB, and native extensible markup language (XML) NoSQL eXist.
The average response times to six complexity-increasing queries were computed, and the results showed a linear behavior in the NoSQL cases. In the NoSQL field, MongoDB presents a much flatter linear slope than eXist.
NoSQL systems may also be more appropriate to maintain standardized medical information systems due to the special nature of the updating policies of medical information, which should not affect the consistency and efficiency of the data stored in NoSQL databases.
One limitation of this protocol is the lack of direct results of improved relational systems such as archetype relational mapping (ARM) with the same data. However, the interpolation of doubling-size database results to those presented in the literature and other published results suggests that NoSQL systems might be more appropriate in many specific scenarios and problems to be solved. For example, NoSQL may be appropriate for document-based tasks such as EHR extracts used in clinical practice, or edition and visualization, or situations where the aim is not only to query medical information, but also to restore the EHR in exactly its original form.
NoSQL (Not only SQL) DBMS have recently emerged as an alternative to traditional Relational DBMS (RDMBS). RDBMS have dominated the way data were stored in database systems for decades. Well-studied and understood relational algebra and calculus have guaranteed the efficiency and consistency of RDBMS1. NoSQL systems will not become substitutes for relational systems, but they could behave advantageously in certain scenarios and under several conditions.
Some of these particular scenarios and conditions would occur when designing the database persistence of Electronic Health Record (EHR) systems used to manage and store medical information. In order to be interoperable and sustainable in practice, several international standards such as ISO/EN 13606, openEHR, and HL72,3,4,5 have been used to standardize EHR extracts.
Several standards such as ISO/EN 13606 and openEHR have separated information and knowledge into two different levels of abstraction, represented by the Reference Model (RM) and special data structures called archetypes. This separation is often called the dual model and it permits EHR systems to be semantically interoperable and medical knowledge to evolve without reprogramming the whole EHR system and, consequently, to be maintainable and sustainable in practice6. However, the dual model implemented in standardized EHR systems requires that the organization of information follows a specific structure, and this has profound consequences in the way the database persistence level of the system is designed7.
Object Relational Mapping (ORM)8 is one methodology to implement an EHR system using the relational database paradigm. ORM exhaustively maps the structure of standardized EHR extracts XML (eXtensible Markup Language) files used by the system for a relational database. ORM constructs many relational tables exhaustively following the structure of the standardized EHR extracts XML files. These relational tables are related through many foreign keys and the resulting system may not be very efficient.
Several relational improvements to ORM have been proposed. openEHR's Node+Path9 reduces the number of relational tables by serializing subtrees of the whole extract XML file into BLOBs (binary large objects). However, this simplification causes complex retrieval logic, damaging complex queries. Archetype Relational Mapping (ARM)10 generates a database model driven by archetypes, building a new relational schema based on mappings between archetypes and relational tables. Consequently, some of the non-medical information of the EHR extract is lost.
Many document-based NoSQL databases store whole documents as entire BLOBs respecting an original XML or JSON (JavaScript Object Notation) format. This means that no relational tables are constructed. These NoSQL databases have no schema and they do not support either joins or (ACID) properties11, i.e.,atomicity, consistency, isolation, or durability. As a result, they may be very inefficient if an element of a document references elements of the same or other such documents utilizing an indirection link. This happens because, in order to maintain consistency, the entirety of the referenced documents have to be processed sequentially. However, a non-relational database may be still appropriate if the main task performed by the DBMS is a document-based task. This is because data may remain in a form more closely approximating its true representation using a document-based NoSQL database, though this is also due to the special persistence policies accomplished by EHR medical documents (see discussion section).
The purpose of these methods is to showcase several experiments to compare the implementation of the persistence layer of a standardized EHR system using three different DBMSs: one relational (MySQL) and two NoSQL (document-based MongoDB and native XML eXist). Their computational complexity has been computed and compared using three different increasing size databases and six different complexity-increasing queries. The three database servers have been installed and configured locally in the same computer where the queries have been executed. See the Table of Materials for technical details.
Concurrency experiments have also been conducted in order to compare the performance of relational MySQL and NoSQL MongoDB DBMSs. The described ORM improvements (Node+Path and ARM) have also been compared using relevant appropriate results from the literature10.
Database management systems are evolving continuously at an accelerating rate. No one would think about this exponential development when the only existing paradigm was the relational model. To take an example, see for instance12, where a model was proposed to implement response-time improved relational databases retaining the ACID properties.
This protocol shows that pure relational ORM systems do not seem practical for single-patient queries (Q1, Q3, and Q4) since response times are slower, probably due to a high number of relational tables performing many expensive join operations, and due to the storage system used by the specific kind of database. NoSQL databases store data in a document-based fashion, while relational systems use a table-based fashion that spreads each document throughout the whole database. NoSQL systems show a linear slope, with MongoDB performing considerably faster than eXist DBMS. In concurrency, MongoDB also behaves much better than relational MySQL ORM7.
This protocol presents a troubleshooting protocol for the results presented in7 regarding the ORM MySQL DBMS. The MySQL system has been updated to the latest version and the results have been slightly modified. In addition, a critical point in document-based NoSQL systems such as MongoDB is that they may preserve consistency when storing medical information7 because when an EHR extract is updated, it is not overwritten, but a whole new extract with the new data is generated and stored in the system, and the original extract is maintained. This is a strict requirement of medical information, because some medical practitioners may have made important medical decisions based on the original data.
The improved relational ARM system drastically diminishes the number of relational tables and improves relational performance. However, since it modifies the relational schema, medical information held by the extracts may be queried, but extracts cannot be recovered in their exact original forms.
For very big databases in secondary use (research), it is not clear which database system is more appropriate, since the all-patient queries (Q2 and Q5) behave better in ORM than in NoSQL systems, but these systems perform better than the simplified relational systems in 12. We consider Q6 a special query in between clinical practice and secondary use whose behavior cannot be determined by the results yielded by these experiments.
However, one limitation of the method is the inavailability of direct experiments comparing the improved relational ARM system with NoSQL MongoDB regarding single-patient, medical practice queries with exactly the same data used in the protocol. We maintained the results interpolating Table 3 and Table 5 regarding single-patient queries until the experiment including optimized ARM in the protocol was performed. We leave these experiments for future applications. One critical step within the protocol is the selection of free database, similar software versions from recent years, so that we may compare the exact state-of-the-art of the three technologies.
This is one of the first attempts to directly compare relational and NoSQL systems using actual, realistic, standardized medical information. However, the specific system to be used depends much on the actual scenario and problem to be solved8.
The authors have nothing to disclose.
The authors would like to thank Dr Dipak Kalra, leader of the EHRCom task force that defined the ISO/EN 13606 standard and his team from University College London for their kind permission to use the ISO/EN 13606 W3C XML Schema.
This work was supported by Instituto de Salud Carlos III [grant numbers PI15/00321, PI15/00003, PI1500831, PI15CIII/00010 and RD16CIII].
MySQL 5.7.20 | MySQL experiments | ||
Red Hat Enterprise Linux Server release 7.4 (Maipo), 2.60GHz, RAM 16GB | |||
MongoDB 2.6 | MongoDB experiments | ||
Windows 7, 2.66GHz, RAM 12GB | |||
eXist 3.0RC1 | eXist experiments | ||
Windows 7, 2.66GHz, RAM 12GB | |||
Studio 3T 5.5.1 | 3T Software Labs Gmbh | MongoDB GUI |