Bird's Eye View of Database Architecture

Blog Link

There is no hard stated architecture for database system and each system has customizations to suit the use case that they have been optimized for. However, almost all the database system follow a basic architecture in some form.

Let us dig into a very high level view of what goes into making a database system.

It all starts with client making a request (query) to the database server. The database server receiving the request can choose to either process the request locally or forward the request to other instances (or both) depending on the locality of the data requested. In either way the Communication Manager needs to keep an active connection between the itself and the client and the other instances.

Once the database system accepts a client request it needs to decode it to understand the request details. This step is achieved by the Query Processor layer. It is responsible for

  • Parsing the query to interpret it.

  • Validating it to ensure the semantics are correct and are in accordance with the system

  • Query optimization - A query (request) can be addressed in many different ways. For example, you can choose to scan through an entire column to filter out requested rows or make us of the index table, if present, and avoid full table scan. Query Optimizer uses metadata and statistics about the data to prepare the query plan.

The query plan is then handed over to the Execution Engine. The execution engine can then decide to execute the query locally by invoking the storage engine or pass it on other nodes in the cluster or both, if the query needs to span across nodes.

The Storage Engine is responsible for handling of the physical data. It comprises of algorithms and data structures to efficiently read and write data while ensuring the sanity and durability of the data. It is comprised of below components:

Access Methods - responsible for managing the database files, their physical layout on disk and data structures for optimizing their read and write.

Buffer Manager - database systems often manage their own page cache instead of relying on the virtual memory of the operating system in order to have more control and scope for optimization.

Transaction Manager and Lock Manager - one of the key responsibilities of the database system is to ensure that the integrity of the data is maintained as per the defined invariant. A transaction manager ensures that all the steps of the transactions are complete and committed before the transaction can be marked as complete. Any incomplete transaction should be rolled back. It relies on lock manager for concurrent operations to ensure that no dirty record is persisted.

Recovery Manager - In order to ensure that all the completed transactions are eventually persisted to physical storage the storage engines retains operation logs which help the system to recover in case of a crash

Resources

The above article captures the learning notes from

  • Database Internals - by Alex Petrov

  • DATABASE SYSTEMS The Complete Book - Hector Garcia-Molina Jeffrey D. Ullman Jennifer Widom

  • Architecture of a Database System - Joseph M. Hellerstein1, Michael Stonebraker2 and James Hamilton3