
Authors

Zikai Wang

James Chin

Implementation

We adopt the same table partitioning as the TPC-C specification, giving us nine tables: WAREHOUSE, DISTRICT, CUSTOMER, HISTORY, NEW_ORDER, ORDERS, ORDER_LINE, ITEM, and STOCK. Each table also carries the same columns as in TPC-C; however, the columns are grouped and named differently for optimization.
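
Since HBase lookups and scans are driven entirely by the row key, each table's TPC-C primary key has to be encoded into its row key. Below is a minimal sketch of one plausible encoding; the helper name, separator, and example values are illustrative, not the driver's actual code:

```python
# Hypothetical helper: flatten a TPC-C composite primary key into an HBase row key.
def make_row_key(*key_parts):
    return '.'.join(str(p) for p in key_parts)

# e.g. CUSTOMER is keyed by (C_W_ID, C_D_ID, C_ID):
customer_key = make_row_key(1, 5, 1042)   # -> '1.5.1042'
```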

We do not perform denormalization. Because the limited set of operators that HBase provides all work on the row key, denormalization brings little benefit. Although it is possible to hack row keys so that they contain information to query on, maintaining secondary indexes over such keys is complex and error-prone. Instead, our performance optimizations focus on exploiting HBase's flexible physical storage layout and its rich set of tuning parameters.

We put considerable effort into tuning. For the experiments that generated the TPC-C numbers, we used the following optimizations, which fall into three categories.

First, we optimize the data storage layout. Because HBase stores every value together with its row key and column name (column family plus column qualifier), we minimize the sizes of row keys and column names; doing so decreased overall data size by about 23% and improved read/write performance. We group columns into column families according to the access patterns of the TPC-C transactions, pick a suitable maximum region size, and limit each table to no more than three column families. Most importantly, because every cluster node has 8GB of main memory and the TPC-C data in HBase amounts to roughly 5GB per node, we use the in-memory column family option to cache as much data as possible in main memory.
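
A minimal sketch of this layout tuning in Jython against the HBase 0.90-era admin API; the table name, one-letter family name, and 1GB region size are illustrative assumptions, not the driver's actual values:

```python
from org.apache.hadoop.hbase import HBaseConfiguration, HTableDescriptor, HColumnDescriptor
from org.apache.hadoop.hbase.client import HBaseAdmin

conf = HBaseConfiguration.create()
admin = HBaseAdmin(conf)

desc = HTableDescriptor('STOCK')
desc.setMaxFileSize(1024 * 1024 * 1024)  # illustrative maximum region size (1GB)

family = HColumnDescriptor('s')          # one-letter family name shrinks every stored KeyValue
family.setInMemory(True)                 # keep this family's blocks cached in main memory
desc.addFamily(family)

admin.createTable(desc)
```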

Second, we optimize client performance. On the client side, we pre-fetch batches of tuples for SCAN: by default, a scanner performs one RPC to fetch a single tuple each time the client calls its next method, so we tune it to pre-fetch 100 tuples per RPC instead. We also turn off writing to the WAL and auto-flushing for each PUT, which speeds up both the loading process and the writes during the execution process.
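
A sketch of these client-side settings against the HBase 0.90-era client API; the table name, row key, and column values are illustrative:

```python
from org.apache.hadoop.hbase import HBaseConfiguration
from org.apache.hadoop.hbase.client import HTable, Put, Scan
from org.apache.hadoop.hbase.util import Bytes

conf = HBaseConfiguration.create()
table = HTable(conf, 'ORDER_LINE')

# SCAN: fetch 100 tuples per RPC instead of one per next() call.
scan = Scan()
scan.setCaching(100)
scanner = table.getScanner(scan)

# PUT: buffer writes client-side and skip the write-ahead log.
table.setAutoFlush(False)
put = Put(Bytes.toBytes('1.5.1042'))
put.add(Bytes.toBytes('d'), Bytes.toBytes('qty'), Bytes.toBytes('10'))
put.setWriteToWAL(False)
table.put(put)
table.flushCommits()  # push the buffered writes explicitly
```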

Third, we tune both Hadoop and HBase on the server side. For Hadoop, we use a 128MB block size instead of the default 64MB, set the buffer size used for sequence files to 128KB instead of the default 4KB, and increase the upper bound on the number of files that one HDFS datanode will serve at any one time. For HBase, we increase the heap size to 5GB to cache the TPC-C data. Because the execution process does not write much data, we decrease the upper and lower memstore limits to save HBase heap for caching TPC-C data, and we raise the number of threads kept open to answer incoming requests to user tables.
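
For reference, a sketch of the configuration entries these tunings correspond to in that era's Hadoop/HBase; the block size and buffer size come from the text, while the memstore limits, handler count, and datanode bound are illustrative assumptions:

```xml
<!-- hdfs-site.xml -->
<property><name>dfs.block.size</name><value>134217728</value></property>          <!-- 128MB -->
<property><name>dfs.datanode.max.xcievers</name><value>4096</value></property>    <!-- assumed bound -->

<!-- core-site.xml -->
<property><name>io.file.buffer.size</name><value>131072</value></property>        <!-- 128KB -->

<!-- hbase-site.xml (the 5GB heap itself is set via HBASE_HEAPSIZE in hbase-env.sh) -->
<property><name>hbase.regionserver.global.memstore.upperLimit</name><value>0.2</value></property>
<property><name>hbase.regionserver.global.memstore.lowerLimit</name><value>0.1</value></property>
<property><name>hbase.regionserver.handler.count</name><value>50</value></property>
```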

There were two optimizations that we seriously considered but did not adopt: LZO compression and Bloom filters. Because of license issues (Apache vs. GPL), LZO is not shipped with HBase, and configuring it ourselves is too complex. And because we aggressively use memory to cache data, Bloom filters would gain us little.

Driver Dependencies

Our implementation is written in Jython and therefore needs the main Hadoop and HBase jars. The classpath should be set to include these packages.

Once the classpath is set, our version of tpcc.py (which uses java.util.concurrent instead of the multiprocessing module, with the multiprocessing import in executor.py commented out) takes the place of the original: just replace 'python' in the original full command with 'jython' and that's it.
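
For example, assuming the standard py-tpcc invocation (the jar paths and config file name below are placeholders):

```
export CLASSPATH=$CLASSPATH:/path/to/hadoop-core.jar:/path/to/hbase.jar
jython ./tpcc.py --config=hbase.config hbase    # instead of: python ./tpcc.py --config=hbase.config hbase
```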

Known Issues

When we ran the experiments, free heap space on the nodes was less than we expected. Moreover, there were load-balancing issues: some nodes cached significantly more data than others and ran out of memory. As a result, we had to greatly decrease the amount of data cached in main memory. During the runs, the client node was not saturated (only 78% usage for each core), so we believe that in our experimental setting the system is disk-bound.

Future Work

The code style of our driver has a lot of room for improvement. Though the code is commented well, it could be neater; we were totally new to Python and took a 'quick and dirty' approach that focused on performance. We do have a version that looks much better from a software engineering perspective, but its performance is not as good as the current one due to the cost of many more function calls and of arguments passed by value.
