As part of Inria’s Discovery initiative, we got in touch with the folks at Cockroach Labs with an idea: make Keystone support CockroachDB. We gave it a try, and you can find a first result on our GitHub. The GitHub project consists of a Vagrantfile that spawns a VM with a CockroachDB database and then installs Keystone with Devstack, using CockroachDB as the backend.

CockroachDB claims to support the PostgreSQL protocol. It also provides SQLAlchemy support that mostly inherits from the PostgreSQL one. So making Keystone work with CockroachDB should be easy-peasy, right? Almost! The rest of this post starts by explaining why we chose Keystone for our POC. It then describes what we did to make Keystone work on top of CockroachDB.

Focus on Keystone first, then conquer all services

OpenStack is a huge project made of dozens of services. All services are developed independently but built on common principles. In particular, a service stores its entire state in a database through a library called oslo.db, which abstracts database bindings. Under the hood, oslo.db wraps SQLAlchemy. This is how OpenStack supports MariaDB and PostgreSQL.

Among all OpenStack services, we decided to focus on Keystone first. Keystone is one of the core services of OpenStack: it manages authentication for the rest of the platform. We focus on Keystone with the following idea: if we can make Keystone work with CockroachDB, then we can hopefully make all services work with CockroachDB thanks to the oslo.db abstraction.

We chose Keystone because its DB requests are quite simple: Keystone splits authentication management into 10 concerns (credentials, token, domain, …), whose code and DB requests are relatively easy to follow. Keystone has another advantage when you want to test OpenStack at scale. To scale, current deployments put instances of OpenStack in different regions around the globe. Across these regions, operators use Galera to synchronise Keystone’s database, so that authentication is the same wherever a client authenticates herself. However, Galera has major limitations under high latency, such as in Edge computing. Focusing on Keystone will therefore let us test whether the Keystone/CockroachDB combo overcomes these limitations.

sqlalchemy-migrate doesn’t work with CockroachDB

Keystone uses a database migration tool named sqlalchemy-migrate. This tool is called during the deployment of Keystone to migrate the database from its initial version to its current version. Unfortunately, the migration failed with the following (partial) stack trace:

DEBUG migrate.versioning.repository [-] Config: OrderedDict([('db_settings', OrderedDict([('__name__', 'db_settings'), ('repository_id', 'keystone'), ('version_table', 'migrate_version'), ('required_dbs', '[]'), ('use_timestamp_numbering', 'False')]))]) __init__ /usr/local/lib/python2.7/dist-packages/migrate/versioning/repository.py:83
INFO migrate.versioning.api [-] 66 -> 67...
CRITICAL keystone [-] KeyError: 'cockroachdb'
...
TRACE keystone     visitorcallable = get_engine_visitor(engine, visitor_name)
TRACE keystone   File "/usr/local/lib/python2.7/dist-packages/migrate/changeset/databases/visitor.py", line 47, in get_engine_visitor
TRACE keystone     return get_dialect_visitor(engine.dialect, name)
TRACE keystone   File "/usr/local/lib/python2.7/dist-packages/migrate/changeset/databases/visitor.py", line 62, in get_dialect_visitor
TRACE keystone     migrate_dialect_cls = DIALECTS[sa_dialect_name]
TRACE keystone KeyError: 'cockroachdb'

In fact, sqlalchemy-migrate contains dedicated code for each SQL backend, and there is no such code for CockroachDB. So we developed it. The code works around some CockroachDB limitations. For instance, CockroachDB does not support adding a primary key constraint to an existing table. We therefore implement this migration in three steps: first, create a new temporary table with the new primary keys; then, copy the rows of the original table into the temporary one; finally, drop the original table and rename the temporary one. You can find that specific snippet at lines 238-257 of our fork of sqlalchemy-migrate.
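To make the three steps concrete, here is a minimal sketch of the create/copy/drop/rename pattern. It is not the code from our fork: it uses SQLite from the Python standard library as a stand-in database so that it runs anywhere, and the `add_primary_key` helper and the `project` table are hypothetical names chosen for illustration.

```python
import sqlite3


def add_primary_key(conn, table, columns, pk_columns):
    """Rebuild `table` so that `pk_columns` become its primary key.

    `columns` maps column names to SQL types. Names are interpolated
    directly into the SQL text, so only pass trusted identifiers.
    """
    tmp = f"{table}_tmp"
    col_defs = ", ".join(f"{name} {sqltype}" for name, sqltype in columns.items())
    col_list = ", ".join(columns)
    pk_list = ", ".join(pk_columns)

    # 1. Create a temporary table that already carries the new primary key.
    conn.execute(f"CREATE TABLE {tmp} ({col_defs}, PRIMARY KEY ({pk_list}))")
    # 2. Copy every row of the original table into the temporary one.
    conn.execute(f"INSERT INTO {tmp} ({col_list}) SELECT {col_list} FROM {table}")
    # 3. Drop the original table and give its name to the temporary one.
    conn.execute(f"DROP TABLE {table}")
    conn.execute(f"ALTER TABLE {tmp} RENAME TO {table}")
    conn.commit()


# Hypothetical table created without any primary key.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE project (id TEXT, name TEXT)")
conn.executemany(
    "INSERT INTO project VALUES (?, ?)", [("p1", "admin"), ("p2", "demo")]
)
add_primary_key(conn, "project", {"id": "TEXT", "name": "TEXT"}, ["id"])
```

After the call, the `project` table holds the same rows but rejects a second row with an existing `id`. A real migration would also have to recreate indexes and foreign keys that pointed at the original table, which this sketch ignores.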

Honestly, the support is only partial and not well tested. Running tox -epy27 shows that the code passes 90% of the unit tests but still fails when cleaning up some indexes. However, our patch lets us deploy Keystone over CockroachDB with Devstack, which is good enough for our POC.

oslo.db contains backend-specific code for error handling

The oslo.db library contains one file with backend-specific code (oslo_db/sqlalchemy/exc_filters.py). Each backend reports the same class of error with its own exception and its own error message, and this file abstracts those differences away. Intercepted exceptions are wrapped into a common OS exception so that the rest of oslo.db does not have to deal with backend-specific errors. CockroachDB doesn’t produce the same errors as PostgreSQL, so we had to update this file. Note that our POC is not exhaustive, since we only added the error messages we saw during Tempest/Rally tests.
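The idea of such a filter can be sketched as a table that maps a backend name and a message pattern to a common exception. The sketch below is a simplified illustration, not the content of exc_filters.py: the class names mirror exceptions that exist in oslo.db but are redefined locally so the example is self-contained, and the message patterns and the `translate` helper are assumptions made for the example.

```python
import re


class DBError(Exception):
    """Backend-neutral base error (local stand-in for the sketch)."""


class DBDuplicateEntry(DBError):
    """A unique constraint was violated."""


# (backend name, message pattern, common exception class).
# The patterns are illustrative; real backends may word errors differently.
_FILTERS = [
    (
        "postgresql",
        re.compile(r'duplicate key value violates unique constraint "[^"]+"'),
        DBDuplicateEntry,
    ),
    (
        "cockroachdb",
        re.compile(r'duplicate key value .* violates unique constraint "[^"]+"'),
        DBDuplicateEntry,
    ),
]


def translate(backend, raw_error):
    """Wrap a backend-specific error into the matching common exception."""
    message = str(raw_error)
    for name, pattern, exc_cls in _FILTERS:
        if name == backend and pattern.search(message):
            return exc_cls(message)
    # Unknown messages fall back to the generic error.
    return DBError(message)


# Hypothetical CockroachDB-style message wrapped into the common class.
raw = Exception(
    "duplicate key value (id)=('p1') violates unique constraint \"primary\""
)
wrapped = translate("cockroachdb", raw)
```

With this layout, supporting a new backend mostly means adding one entry per error message, which is consistent with how small our change to oslo.db turned out to be.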

You can look at the differences between OpenStack/oslo.db/stable/pike and our fork on GitHub. We only added two lines!

CockroachDB manages isolation without locks

CockroachDB lets you write transactions that respect ACID properties. Regarding the “I” (Isolation), CockroachDB offers snapshot and serializable isolation but doesn’t rely on locks to achieve them. So every time concurrent writing transactions end in a conflict, you have to retry the transaction. Fortunately, oslo.db already offers a decorator that does this automatically. However, based on the tests we ran with Tempest/Rally, we found that some of Keystone’s SQL requests were missing the decorator and needed it.
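For readers unfamiliar with the pattern, here is a minimal sketch of what a retry decorator does. It is not oslo.db’s implementation: `retry_on_conflict` and `SerializationConflict` are hypothetical names, and the exception stands in for the retryable error a database reports when a transaction has to be restarted (SQLSTATE 40001).

```python
import functools
import time


class SerializationConflict(Exception):
    """Hypothetical stand-in for a retryable transaction conflict."""


def retry_on_conflict(max_retries=5, delay=0.0):
    """Re-run the decorated function when it hits a conflict.

    The wait between attempts doubles each time (exponential backoff).
    """
    def decorator(func):
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            for attempt in range(max_retries + 1):
                try:
                    return func(*args, **kwargs)
                except SerializationConflict:
                    if attempt == max_retries:
                        raise  # give up after the last allowed retry
                    time.sleep(delay * (2 ** attempt))
        return wrapper
    return decorator


# Simulated transaction that conflicts twice before succeeding.
calls = {"count": 0}


@retry_on_conflict(max_retries=3)
def update_user():
    calls["count"] += 1
    if calls["count"] < 3:
        raise SerializationConflict("restart transaction")
    return "committed"


result = update_user()
```

The decorated function must contain the whole transaction, because a retry re-runs everything from the start. This is why the decorator has to be placed on each Keystone function that opens a write transaction.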

You can look at the differences between OpenStack/keystone/stable/pike and our fork on GitHub.

What’s next?

You can log into the VM as the stack user and run the Tempest/Rally tests (see README). Note that the modifications we made are minimal, which is promising for the adoption of CockroachDB by other core services. Nonetheless:

  • We have to look at this through the performance prism: Keystone over CockroachDB vs. Keystone over Galera. This part may involve more modifications of oslo.db. One thing we have in mind is how transactions are retried, since CockroachDB’s documentation suggests an approach that doesn’t match the current oslo.db implementation.

  • We recently started similar work on Nova, the compute service of OpenStack. Nova produces SQL queries that are far more complex than Keystone’s. In particular, Nova produces correlated subqueries, i.e., subqueries that are run against each row returned by the main query. Unfortunately, correlated subqueries are not supported by CockroachDB. So stay tuned: we will see in a future post how we cope with these queries.
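To illustrate what a correlated subquery looks like, here is a small sketch using SQLite from the Python standard library. The `instance` table and both queries are made up for the example and are not taken from Nova. The second query shows one common way to express the same result without correlation, by joining against a grouped subquery; we do not claim this is the approach we will end up using for Nova.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE instance (id INTEGER PRIMARY KEY, host TEXT, vcpus INTEGER)"
)
conn.executemany(
    "INSERT INTO instance VALUES (?, ?, ?)",
    [(1, "h1", 2), (2, "h1", 8), (3, "h2", 4), (4, "h2", 1)],
)

# Correlated: the inner query refers to i.host, so it is evaluated
# once per row of the outer query.
correlated = """
SELECT i.id FROM instance AS i
WHERE i.vcpus = (
    SELECT MAX(j.vcpus) FROM instance AS j WHERE j.host = i.host
)
ORDER BY i.id
"""

# Uncorrelated rewrite: compute the per-host maximum once, then join.
joined = """
SELECT i.id FROM instance AS i
JOIN (
    SELECT host, MAX(vcpus) AS max_vcpus FROM instance GROUP BY host
) AS m ON m.host = i.host AND m.max_vcpus = i.vcpus
ORDER BY i.id
"""

correlated_rows = conn.execute(correlated).fetchall()
joined_rows = conn.execute(joined).fetchall()
```

Both queries return the largest instance on each host. Only the first one depends on the outer row inside the subquery, which is the feature CockroachDB was missing.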