
Authors

Alex Kalinin

Implementation

First of all, we need to map the relational (transactional) model onto CouchDB's document model. We do this as follows:

  • Every table becomes a CouchDB database
  • Every row becomes a JSON document
  • Every column/value becomes a key/value pair in the JSON document

Every document must have a unique id. In our case this is:

  • A concatenation of all primary key values as a string, if the table has a primary key.
  • A randomly generated, client-side UUID, if the table does not have a primary key.
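
As an illustration, here is a minimal sketch of that mapping and id scheme; the helper function and the DISTRICT example are hypothetical and only mirror the rules above, they are not the driver's actual code:

```python
import uuid

# Hypothetical helper mirroring the mapping rules above (not the driver's code).
def row_to_doc(columns, values, pkey_columns=None):
    doc = dict(zip(columns, values))                  # column/value -> key/value pair
    if pkey_columns:
        # Table has a primary key: concatenate all key values into a string id.
        doc['_id'] = ''.join(str(doc[c]) for c in pkey_columns)
    else:
        # No primary key: fall back to a client-generated random UUID.
        doc['_id'] = uuid.uuid4().hex
    return doc

# A DISTRICT row keyed by (D_W_ID, D_ID) = (1, 2) would get the id "12".
district_doc = row_to_doc(['D_ID', 'D_W_ID', 'D_NAME'], [2, 1, 'Foo'],
                          pkey_columns=['D_W_ID', 'D_ID'])
```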

Efficient processing obviously requires some indexes. In our case they are emulated via CouchDB views. Some views are used simply to query documents by some combination of keys (e.g., "retrieve all orders from the ORDERS table for a given (DISTRICT, WAREHOUSE) pair"). Others compute aggregates, such as the total cost of an order obtained by summing all of its order lines from the ORDER_LINE table.

The details can be found in the TPCC_SCM structure in the driver, which contains the view definitions (in the form of simple JavaScript functions) and all other details of this relational-to-JSON conversion.
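
To give a flavor of what such definitions look like, here is a hypothetical sketch in the spirit of TPCC_SCM; the names and layout are illustrative, not the driver's actual structure. The map/reduce functions are ordinary CouchDB views written as JavaScript strings:

```python
# Hypothetical sketch of view definitions for the ORDER_LINE database;
# the real TPCC_SCM structure in the driver may be organized differently.
ORDER_LINE_VIEWS = {
    # Index-like view: look up order lines by (warehouse, district, order) key.
    'by_order': {
        'map': """
            function(doc) {
                emit([doc.OL_W_ID, doc.OL_D_ID, doc.OL_O_ID], null);
            }
        """,
    },
    # Aggregate view: total cost of an order, summed over its order lines.
    'order_total': {
        'map': """
            function(doc) {
                emit([doc.OL_W_ID, doc.OL_D_ID, doc.OL_O_ID], doc.OL_AMOUNT);
            }
        """,
        'reduce': '_sum',
    },
}
```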

Partitioning

If several nodes are available, we treat them as shards by partitioning the databases across all of them. This is done in a round-robin manner using an integer key, which is specified for every database in the TPCC_SCM structure as the distr_key parameter. In its current incarnation the driver uses warehouse ids to place a document on the corresponding shard. For tables that do not specify a distr_key, all documents are replicated across all shards (e.g., the ITEMS table).

During query execution, documents are requested based on the specified warehouse, and the request simply goes to the corresponding shard. This is hard-coded in the query execution logic right now, so if distr_key specifies something other than the warehouse id, execution will probably fail (this was not tested).
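
A minimal sketch of this routing, assuming couchdb-python and made-up node URLs; the actual driver code may look different:

```python
import couchdb

# Illustrative node list; the distr_key value (here a warehouse id) picks a shard.
NODES = ['http://node1:5984/', 'http://node2:5984/']
SERVERS = [couchdb.Server(url) for url in NODES]

def shard_for(distr_key_value):
    # Round-robin placement: the integer key modulo the number of nodes.
    return SERVERS[distr_key_value % len(SERVERS)]

def get_district(w_id, d_id):
    db = shard_for(w_id)['district']          # DISTRICT database on that shard
    return db.get('%d%d' % (w_id, d_id))      # document id built from the primary key
```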

Short-cuts

This is actually a very faithful implementation of TPC-C with no short-cuts, except for one thing: isolation. It is impossible to isolate concurrent execution in CouchDB, since the granularity of a request is a single document. However, it should still be okay because of the nature of these transactions: they usually touch disjoint orders (e.g., DELIVERY), unique order ids (e.g., NEW_ORDER), and so on. Any contention between them is resolved optimistically: if an update conflict happens, we just catch the exception, re-fetch the document, and try the update again.
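
A minimal sketch of that retry pattern, assuming couchdb-python's ResourceConflict exception; the helper itself is illustrative, not the driver's code:

```python
from couchdb.http import ResourceConflict

# Illustrative retry loop: on an update conflict, re-fetch the document
# and apply the change again until the save succeeds.
def update_doc(db, doc_id, apply_changes):
    while True:
        doc = db[doc_id]            # fetch the current revision
        apply_changes(doc)          # mutate the document in place
        try:
            db.save(doc)            # raises ResourceConflict if _rev is stale
            return doc
        except ResourceConflict:
            continue                # someone else won the race; try again
```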

However, some transactions can clash in an unusual way. If, while placing a new order, we inserted the record into the NEW_ORDER table first, a DELIVERY transaction might fetch it before the whole order has been put into the system! I am pretty sure that such things should not happen in this implementation, since I reordered operations to avoid it. All logic stays the same, of course.

There is one logical isolation conflict that seems impossible to get rid of. When the STOCK_LEVEL transaction fetches information about the last 20 orders, it might not see some of the order lines. This is because it uses the D_NEXT_O_ID column, which is updated first in the NEW_ORDER transaction. We cannot do it the other way around: obtaining an order id in NEW_ORDER is crucial for further execution, since all order lines must carry a valid order id, which implies that order lines cannot be inserted before the D_NEXT_O_ID field is updated. So it seems impossible to resolve this conflict in a simple way, short of some tricky application-level magic.

On another note, durability is not supported at all. If execution is interrupted in the middle, the databases might end up in an inconsistent state. The only possible solution would be to maintain some kind of logical log in the application, which is not part of this driver.

Optimizations

The first optimization is to fetch and update documents in bulk whenever possible. For example, in the NEW_ORDER transaction all order lines can be inserted with one request. For queries, I tried hard to fetch as many documents as possible per request, rearranging operations where necessary. For example, in the DELIVERY transaction the total costs of all orders can be fetched from the ORDER_LINE table with one request by querying the corresponding view with all order ids as keys.
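
For illustration, here is a sketch of both bulk patterns using couchdb-python; the database name and the tpcc/order_total view are made up for the example:

```python
import couchdb

server = couchdb.Server('http://localhost:5984/')    # illustrative node
order_line_db = server['order_line']                  # ORDER_LINE database

def insert_order_lines(docs):
    # Bulk insert: all order lines of a NEW_ORDER go out in one _bulk_docs request.
    return order_line_db.update(docs)

def order_totals(w_id, d_id, order_ids):
    # Bulk fetch: per-order totals in a single request, via a (hypothetical)
    # reduce view keyed by [w_id, d_id, o_id].
    keys = [[w_id, d_id, o_id] for o_id in order_ids]
    rows = order_line_db.view('tpcc/order_total', keys=keys, group=True)
    return dict((tuple(row.key), row.value) for row in rows)
```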

The second optimization uses views extensively as an alternative to indexes. For every database that we query not only by document ids but also by some other key (e.g., querying ORDER_LINE by the (order number, district, warehouse) key), we create a view. This is the common approach to handling such queries in CouchDB. One view is provided by the system itself: _all_docs. It is used extensively to retrieve several documents in bulk. In some cases it is assumed that the result of a view is sorted by key; this is a fair assumption, guaranteed by the system and considered common practice.
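
For example, several documents can be pulled by id in a single request through _all_docs; a small sketch, with an illustrative database name:

```python
import couchdb

customer_db = couchdb.Server('http://localhost:5984/')['customer']   # illustrative

def fetch_docs(doc_ids):
    # _all_docs is keyed by document id; include_docs=True returns the full
    # document bodies alongside the index rows, all in one round trip.
    rows = customer_db.view('_all_docs', keys=doc_ids, include_docs=True)
    return [row.doc for row in rows]
```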

The only unusual thing about the views is that we query them for the first time during the loading phase, before the actual TPC-C workload starts. The reason is the way CouchDB maintains views: lazy updates. If we did not query the views during the loading phase, the first transaction would wait for a very long time on its first retrieval from a view. That is why the actual building of the views is "moved" to the loading phase. I do not consider lazy updates unfair during the actual execution, so nothing is done about them there.
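
A minimal sketch of this warm-up step; the design-document and view names are illustrative:

```python
import couchdb

def warm_up_views(db, view_names):
    # Touch every view once so that CouchDB builds its index during loading
    # rather than on the first transaction. Requesting zero rows is enough:
    # the index is brought up to date before the (empty) result is returned.
    for name in view_names:
        list(db.view(name, limit=0))

# e.g.: warm_up_views(couchdb.Server()['order_line'],
#                     ['tpcc/by_order', 'tpcc/order_total'])
```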

Driver Dependencies

The driver requires only couchdb-python to run. All other dependencies should be present in the standard Python distribution.

Note that couchdb-python cannot handle datetime objects and raises an exception when given one. The loader was therefore modified to change all datetime.now() calls to str(datetime.now()). This is fair and is probably done internally by other "sane" drivers as well. For the executor, the corresponding changes are made in the driver and do not involve the actual engine.
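
A one-line sketch of that change (the column name is illustrative):

```python
from datetime import datetime

# Timestamps are stored as plain strings, since couchdb-python cannot
# serialize datetime objects to JSON.
doc = {'O_ENTRY_D': str(datetime.now())}    # instead of datetime.now()
```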

Known Issues

The results seem unusual at best. The real killers are transactions that update and retrieve many documents. Neither CouchDB nor Erlang has a good profiler (to the best of my knowledge), but I tried to measure some transactions and parts of them. Here are some results for a local execution with two nodes and two clients (20 warehouses):

  • DELIVERY: ~1s
  • NEW_ORDER: ~1s
  • ORDER_STATUS: ~0.8s
  • PAYMENT: ~0.3s
  • STOCK_LEVEL: ~6s

It can be seen that the STOCK_LEVEL transaction blocks the client for approximately 6 seconds. This transaction probably cannot be optimized further: it already uses two views that were created specifically to optimize it (more precisely, to compute the join).

To sum up, these are possible reasons for the poor performance:

  • CouchDB does not do any caching/buffering of its own: every request goes to disk. Erlang may do something in its file-access library, of course, but that would be too generic to help much. The amount of disk I/O is quite significant during processing while the CPUs are under-saturated (both on the clients and on the working nodes). Increasing the number of clients helps saturate the working nodes, but without any actual performance benefit, probably because of disk contention.
  • There are a lot of HTTP connections. Every request opens a connection, fetches the data, and then closes it again, which introduces some overhead. I tried to fight this by issuing requests in bulk whenever possible.
  • Network traffic. Even if the only thing you need is to update a single key, you still have to retrieve (and send back) the whole document. We maintain a view for the STOCK_LEVEL transaction just to avoid this kind of overhead.
  • Optimistic concurrency control. If an update fails because of a conflict, you have to fetch the document again and retry the update. To be honest, I did not see this happen often, probably because the number of clients was not very large. Increasing the number of clients obviously increases contention.

Future Work

None