
18 Dec, 2024

To Relational Database, or not to Relational Database?

Nicholas Potesta, Lead Software Engineer


Teams can often build themselves a footgun by over-prescribing relational database management systems (RDBMS) in their solution designs, often leaving members of the team thinking: “how the heck do I fix this relational-database-shaped hole in my foot?!”

A common trend within industry sees teams discussing “what [existing] database should we use for this solution?” or jumping straight to “what should the table schema look like?” during solution design.

This thinking can often be an anti-pattern, as it tends to lead teams to make poor assumptions about their data storage tooling.

As with all aspects of software design, a certain amount of pragmatism should be practised. When it comes to designing how our systems store and access data, teams may want to instead discuss: “how does our solution need to query the data?”

Let’s dive a little deeper into what teams may consider when it comes to this discussion.

The structure of the data
Are there relationships and/or hierarchies between the data entities? If so, how many? Can these relationships be easily represented and maintained by a simpler/flat structure?

Is the data ephemeral?
What happens if we lose this data? If the data is ephemeral, can and should the data be easily rebuilt within an accepted time frame?

The read patterns
What response times are required when reading the data? How often does the solution need to read data? How will the solution access the data?

For example: If we’re designing an online hotel booking system, and we need to store a customer’s booking data; will we only ever need to look it up per Booking ID? Or will there be a need to sometimes access it via the Customer ID? Or potentially the date the booking was placed?

The write patterns
This is effectively the inverse of the previous point; how will this solution need to write data? Do the writes need to be atomic?

Does this data need to be fanned out to other downstream systems?
If the team is hoping to use some of the architecture and design to enable event-based processing, could we leverage some of the out-of-the-box streaming functionality of different data store technologies?

Some of these questions should help drive the team to think more about what the underlying data store should look like. If, for example, the read and write patterns only ever require a service to write/read millions of relatively flat JSON records against a single identifier, with sub-millisecond latency, an RDBMS might not necessarily be the best tool for the job.

Alternative Data Storage Strategies

When teams begin to think about how their solutions need to read and write data, they may also want to start to think about alternative read/write patterns and what they might look like without the use of a relational database.

In order to explore this further, let’s look at a simple usage example.

Let’s focus on a team that is implementing a solution to store and master customer order data for a fairly traditional ecommerce application that services customers across Australia and New Zealand.

The team has already built a service which receives requests from the checkout and cart services each time an order is placed, with all the appropriate data for the order (i.e. customer name and cart details).

What are some of the approaches we might be able to take to store this data for use later? Again, focusing on the design facet: “how does our solution need to query the data?”

S3 Storage

When posed with the question “how does our solution need to query the data?”, the team has a bit of a think and finds that the data will only ever need to be read per jurisdiction (i.e. Australian or New Zealand customers), per Order ID. This also drives the write strategy.

The team are not overly concerned about performance, but are keen to keep lookups and writes under ~500ms.

Given this, the team could think about writing these orders to S3 as they’re processed, as flat structured blobs (JSON, YAML, you name it!).

This could look something like:

Figure 1. High-level design for utilising S3 storage to store our hypothetical team’s order data
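
To make the figure concrete, here is a minimal sketch of what the write and read paths could look like in Python with boto3. The bucket name, key scheme and order payload are all hypothetical, invented for illustration.

```python
import json

import boto3

s3 = boto3.client("s3")

BUCKET = "ecommerce-orders"  # hypothetical bucket name


def order_key(jurisdiction: str, order_id: str) -> str:
    """Build the object key from the two values the team always queries by."""
    return f"{jurisdiction}/orders/{order_id}.json"


def write_order(jurisdiction: str, order_id: str, order: dict) -> None:
    """Persist the order as a flat JSON blob, keyed by jurisdiction and ID."""
    s3.put_object(
        Bucket=BUCKET,
        Key=order_key(jurisdiction, order_id),
        Body=json.dumps(order).encode("utf-8"),
        ContentType="application/json",
    )


def read_order(jurisdiction: str, order_id: str) -> dict:
    """Fetch the whole blob back; S3 cannot return individual fields."""
    response = s3.get_object(Bucket=BUCKET, Key=order_key(jurisdiction, order_id))
    return json.loads(response["Body"].read())


write_order("australia", "2323", {"customer_name": "Jane Citizen", "cart": ["SKU-1"]})
print(read_order("australia", "2323"))
```

Because the key embeds both the jurisdiction and the order ID, every read and write the team identified maps to a single GetObject or PutObject call.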

As above, a certain amount of pragmatism should be practised. So let’s dive into what makes utilising S3 storage in this case advantageous and less advantageous for the team, compared to the use of a relational database.

Advantages

  • Can easily query for orders when the jurisdiction (i.e. Australia) and order ID are known

  • Scalable reads and writes. S3 supports up to 3,500 write and 5,500 read requests per second per key prefix

  • Only paying for storage, requests and data transfer as per the S3 usage pricing. There are no costs associated with “running” an underlying instance. Operationally and financially this can often cost far less than running even the cheapest/smallest RDS instances

  • Buckets and their associated data can be backed up with the use of S3 Replication

  • No need to maintain a data schema

  • Ability to provide event streams on updates (or creates) from the bucket to downstream services (see the sketch after this list)
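
Picking up the last bullet above, a minimal sketch of wiring those event streams up: S3 bucket notifications can invoke a downstream consumer whenever an order object is created or overwritten. The bucket name and Lambda ARN are hypothetical, and the Lambda would separately need a resource policy allowing S3 to invoke it.

```python
import boto3

s3 = boto3.client("s3")

# Hypothetical ARN of a downstream consumer function.
LAMBDA_ARN = "arn:aws:lambda:ap-southeast-2:123456789012:function:order-fanout"

# Invoke the consumer for every new or updated order object in the bucket.
s3.put_bucket_notification_configuration(
    Bucket="ecommerce-orders",
    NotificationConfiguration={
        "LambdaFunctionConfigurations": [
            {
                "LambdaFunctionArn": LAMBDA_ARN,
                "Events": ["s3:ObjectCreated:*"],
            }
        ]
    },
)
```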

Disadvantages

  • Unable to easily query for orders via known data outside of the provided keys. For example, to query for all orders matching a target customer_id, or to handle a scenario where we don’t have the jurisdiction for the order, we would need to change how the keys are built and stored

  • Reliance on writes being complete snapshots of data. Difficult to provide “delta” writes in some scenarios

  • Reads require the entire JSON blob to be pulled out of S3. Unable to query for specific fields

  • Difficult to scan for multiple orders. Each get is effectively a single fetch operation from the bucket, which could result in lots of network I/O when multiple orders are needed in a single process.

DynamoDB (or Similar NoSQL) Storage

Let’s use the same problem statement as above, where the team only ever needs to read an order per jurisdiction (i.e. Australian or New Zealand customers), per order ID. This time around, however, the team are very concerned about response times, needing low single-digit millisecond lookups and writes where appropriate.

We could look at storing these orders in something like DynamoDB, and explore alternative strategies for storing this data: either breaking the attributes out across the table, or providing a top-level map of the data.

DynamoDB (and other similar NoSQL solutions) supports multiple data types: scalar types (i.e. a single value: string, number or binary) and document and set types (i.e. complex structs with multiple nested scalar values).

Being a key/value store, we can approach this in multiple ways depending on how the team wants to work against the read and write requirements.

Assuming an order with ID 2323 within the Australian jurisdiction:

Example One: Partition Key and a single Map typed field to store order data

Figure 2: Example DynamoDB table structure for optimised lookups of order data via the Order ID
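
As a rough sketch of Example One in Python with boto3: the whole order lives in a single Map-typed attribute alongside the key. The table name, key design and payload are assumptions for illustration, since the figure’s exact layout isn’t reproduced here.

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")  # hypothetical table name

# Assumed key design: one partition key combining the two values we always
# query by (jurisdiction and order ID); the figure's layout may differ.
pk = "AUSTRALIA#2323"

# Everything except the key lives in a single Map-typed attribute.
table.put_item(
    Item={
        "pk": pk,
        "order_data": {
            "customer_name": "Jane Citizen",  # hypothetical order payload
            "customer_id": "c-789",
            "cart": ["SKU-1", "SKU-2"],
        },
    }
)

# Lookup is a single-item read against the same key.
order = table.get_item(Key={"pk": pk})["Item"]["order_data"]
```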

Example Two: Partition Key and breakout queryable data aspects into individual fields

Let’s say the team explores the query requirements a bit further and discovers they may also need to query by the order ID, jurisdiction and the customer ID. We can augment our storage design to suit:

Figure 3: Example DynamoDB table structure for optimised lookups of order data via the Order ID, jurisdiction and/or customer ID
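
A sketch of Example Two under the same assumptions, with the queryable aspects broken out into their own attributes. Serving the customer-ID lookup would typically require a Global Secondary Index; the index name below is hypothetical.

```python
import boto3
from boto3.dynamodb.conditions import Key

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")  # hypothetical table name

# Queryable aspects broken out into their own top-level attributes.
table.put_item(
    Item={
        "pk": "AUSTRALIA#2323",
        "jurisdiction": "AUSTRALIA",
        "order_id": "2323",
        "customer_id": "c-789",
        "order_data": {"customer_name": "Jane Citizen", "cart": ["SKU-1"]},
    }
)

# Lookups by customer ID would be served by a Global Secondary Index on
# customer_id (assumed here to be named "customer_id-index").
orders = table.query(
    IndexName="customer_id-index",
    KeyConditionExpression=Key("customer_id").eq("c-789"),
)["Items"]
```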

Let’s take a look at what makes utilising either of these storage patterns in AWS DynamoDB (or similar NoSQL solutions) advantageous and disadvantageous for the team compared to the use of a relational database.

Advantages

  • Able to perform snapshot and delta writes, as DynamoDB supports updating specific mapped attributes (see the sketch after this list)

  • Single-digit millisecond lookups when querying for a single ID, or for up to 100 IDs via BatchGetItem

  • With on-demand or provisioned capacity pricing, the team only ever pays for the reads, writes and storage they use. There is no concept of paying for a “running database instance”

  • No need to maintain a data schema

  • Able to provide event streams to downstream services via DynamoDB Streams
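
Picking up the first two bullets above, a minimal sketch of a delta write and a batch read, continuing the hypothetical table design from the earlier sketches:

```python
import boto3

dynamodb = boto3.resource("dynamodb")
table = dynamodb.Table("orders")  # hypothetical table from earlier sketches

# Delta write: set a single nested attribute inside the order_data map
# without re-sending the rest of the order snapshot.
table.update_item(
    Key={"pk": "AUSTRALIA#2323"},
    UpdateExpression="SET order_data.order_status = :s",
    ExpressionAttributeValues={":s": "SHIPPED"},
)

# Batch read: fetch multiple orders (up to 100 keys) in one round trip.
response = dynamodb.batch_get_item(
    RequestItems={
        "orders": {
            "Keys": [{"pk": "AUSTRALIA#2323"}, {"pk": "AUSTRALIA#2324"}],
        }
    }
)
orders = response["Responses"]["orders"]
```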

Disadvantages

  • Unable to query for specific attributes of the order outside of the specified IDs within each example

  • Able to scan for data (i.e. “give me the first 10 orders”), but this is neither recommended nor performant on larger datasets

Final Thoughts

RDBMSs and their managed cloud offerings, such as AWS RDS, are not inherently bad products. Oftentimes, the pain teams experience from a relational-database-shaped hole in their foot is a symptom of the RDBMS being used for a workload it doesn’t necessarily suit, generally caused by poor assumptions made about the design of the data storage and how the data should be read and written.

Software teams practising healthy technical and delivery principles should encourage pragmatism across the board, including in how systems store and access data. In practice, this doesn’t mean teams should always move away from relational databases, but that they should focus on “how the solution needs to query the data.”

The examples explored above aren’t an exhaustive list and shouldn’t be treated as such, but they can hopefully help demonstrate how teams should begin thinking about the design of how their systems read and write data.
