2015 Dec 10

Leveraging the power of UUIDs to ensure interoperability with external applications or micro-services and how it relates to database design.

A unique challenge exists in today's applications. They are no longer data silos or single domain applications, and they need to interact with external services (microservices). Application developers and database administrators need to ensure that actions taken in their applications interact with these external services and update the correct records and - likewise - actions in these applications need the ability to update or record actions in the source application.

In a previous post I discussed history of SQL as a language and highlighted some pretty amazing features of the standard. In this post I'm going to cover a topic that is related called database design.

Understanding database design is key to understanding a key tool in addressing this challenge -- the universally unique identifier (UUID).

Types of Databases

There are many types of database systems in the world today, but the two most dominant are Relational Databases Management Systems (RDMS) or a Document-oriented Database (often referred to as NoSQL). The two are very different in their implementation. I'll discuss each briefly, but the core of this post focuses on relational datasets.

Document-oriented Databases (NoSQL)

The two predominant characteristics of NoSQL databases are that they are schema-less and weakly typed. What this means is that database designers (often referred to as database administrators, or DBAs) do not have to structure a database beforehand.

Document-oriented databases store data in documents. Documents are key-value stores where the key denotes the name of the element you're storing and the value is just that, the value. For example:

author:

  name: 

    given_name: Bob

    family_name: Smith

  address:

    street_address: 555 Example St.

    city: Ottawa

    state: Ontario

In this document example, all of the strings before the colon (user, contact, address, family_name, street_address etc.) are keys and the string after is the value. The value does not have to be a string, it can be any type of data.

To build a document in a NoSQL database you do not have to predefine any of these keys or value types beforehand, nor do you have to adhere to them. You can add or rename keys and change the types of data stored in the value field at will.

This flexibility is great from an application development standpoint, however, there is no concept of relational data in a NoSQL database. This means that design, as it exists for NoSQL, centers around how you want to retrieve data from the database (the query) rather than how you want to store data. It seems subtle, but the primary drawback of document-oriented databases is that providing different ways to retrieve or analyze data can be challenging if your "design" is optimized for a set of queries.

For example, if you wanted to find comments by any user for a given article, then you would store comments on an article document and the user data within the comment. Something like this:

article:

  title: Awesome article title

  text: Amazing article text

  author:

    given_name: Jane

    family_name: Doe

  comments:

    text: This article is neither awesome or amazing.

    user:

      given_name: Bob

      family_name: Smith

Now, if you wanted to find comments for any article by a given user it would be considerably more challenging as you would have to reach into each article, iterate over the comments and find users that match the specific user you're looking for.

Relational Databases

Relational databases are designed beforehand and strongly typed. They are a collection of database objects including tables, columns, indices and more.

For the purposes of this article, we're going to focus on tables and columns as it related to database design.

In a relational database you would create tables to represent objects you want to store and columns within these tables to store their values. Here's an example:

Users (users)

id given_name family_name
1 Bob Smith
2 Jane Doe

 

Articles (articles)

id user_id title text
1 2 Awesome article title Amazing article text

 

Comments (comments)

id article_id user_id text
1 1 1 This article is neither awesome or amazing.

 

This example details the comparable complexity of the relational database environment. These tables need to be setup first, the column types need to be specified (integer, varchar, text) for each type of data that column will store and the relationships between tables need to be predefined.

There is a pattern here. This is called database normalization. In relational database design, this is incredibly important as it will:

  • Free the database of modification anomalies
  • Minimize redesign when extending the database structure
  • Avoid bias towards any particular pattern of querying

The last point being the most important from our example above. With this database structure, finding comments by a user on an article as well as finding all comments by a user for any given article is extremely easy to do as both queries would be isolated to the comments table with some conditional WHERE statements.

Universally unique identifier (UUID)

In the relational example above, you'll notice some key columns that we didn't cover -- the id columns. These columns form the cornerstone of relational database design theory. They are the primary and foreign keys that determine the relationships between the respective tables.

Single domain application ERD

This diagram details the relationships. The user_id column in comments and articles refers to the users table. This means, for each entry in the comments or articles table, the user referenced by user_id is the author. The family_name and given_name is not written to the articles or comments tables, instead, it is only written to the users table. In this way, there is only a single table to update if a user changes their name, and all references to these columns will update immediately.

A key issue that the UUID addresses is that these id columns are usually auto-incrementing integers. Each new record added to the database will automatically increment these numbers to the latest available and then write the rest of the columnar data.

This design is fine for single domain applications. Single domain applications typically will not interact with other applications and their accompanying databases. In a sense, they're isolated.

In modern applications, this is no longer the case. Entities or objects represented by relationships across multiple tables in a single-domain application will likely need to relate to data stored in external applications and/or databases.

Enter the UUID. They look like this:

123e4567-e89b-12d3-a456-426655440000

The probability of collisions in generated UUIDs is pretty remote. As the article on wikipedia puts it, "after generating 1 billion UUIDs every second for the next 100 years, the probability of creating just one duplicate would be about 50%".

The goal of the UUID is to "enable distributed systems to uniquely identify information without significant central coordination". In simple terms, this means that auto-incremented table ids will not be be unique across different applications and databases. Instead of using auto-incrementing ids in tables, switching to UUIDs or storing them as a separate columnar value allows distributed applications to relate to one another.

This is called referential integrity. It allows that updates to a specify record or row in a database with a UUID and corresponding reference in an external application will update across applications as this value can be used to uniquely link them together.

Your database diagram now looks like this:

Multi-domain application ERD

In this way, the native applications can still rely on their own internal mechanisms for foreign key/primary key assignment (using ids) but have the ability to relate via associatied UUIDs.

Conclusion

Alongside the power of modern day databases and their accompanying standards is the power and prowess of database design. There are many competing database technologies, but dominant ones such as document-oriented databases seek to remove the design element from the software development lifecycle to provide some additional flexibility and speed at the beginning of an applications life.

However, document-oriented databases suffer from a critical problem as an application grows. The original intent of the documents can be pushed past their limit as applications stretch to look at their data in different ways, a problem long since solved by relational database theory.

In relational database design, individual tables and their columns represent the smallest amount of data possible and relate to eachother through auto-generated identifiers that are used to join tables together to pull out data in a myriad of ways to analyze and report on.

As applications grow to beyond their single deployment or domain, the need to interact with external applications pushes the relational design even further in ensuring that these foreign key relationships are as unique as possible.

The universally unique identifier is a value either stored as a primary key on a record or as an additional column that provides a near-impossible possibility of collisions between distinct applications.

With wicket.io, all objects represented in the database has an accompanying UUID. The key purpose is interoperability with external services and ensuring that data integrity both internal to the application and it's external services is paramount.