Data sets and Vagrant script to provision a virtual machine for Apache Calcite development

vlsi/calcite-test-dataset

Test data sets

This repository includes data sets and a Vagrant script that provisions a virtual machine with pre-installed databases.

The idea is to have a readily available development machine for testing Apache Calcite.

Requirements

  • Java
  • Maven 3.0.4
  • Vagrant
  • VirtualBox
  • ~1GiB of internet traffic for the initial VM provisioning
  • ~10GiB disk space (VirtualBox image with data consumes 3.2GiB)

Installation

Note: the databases listen on their default ports, so you might need to pick other ports if MongoDB, MySQL, or PostgreSQL is already running on your host machine. To update port forwarding, edit vm/Vagrantfile.

Alternatively, run shut.sh, which will attempt to shut down your native databases.
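Before provisioning, it can help to see which of the default ports are already taken on the host. The following is a minimal sketch, not part of this repository: it assumes bash (it relies on bash's /dev/tcp pseudo-device) and that a conflicting service would accept connections on 127.0.0.1. The port numbers are the defaults from the list of created databases; the names port_in_use and check_ports are made up for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical helper, not part of this repository.
# Default ports from this README: Geode, Cassandra, Druid, MongoDB, MySQL, PostgreSQL.
PORTS="10334 9042 8082 27017 3306 5432"

# Succeeds if something accepts TCP connections on 127.0.0.1:$1.
# The subshell keeps the opened file descriptor from leaking into the caller.
port_in_use() {
  (exec 3<>"/dev/tcp/127.0.0.1/$1") 2>/dev/null
}

# Prints one line per port, saying whether it is in use or free.
check_ports() {
  for port in $PORTS; do
    if port_in_use "$port"; then
      echo "$port in use"
    else
      echo "$port free"
    fi
  done
}

check_ports
```

Any port reported as in use is a candidate for either stopping the native service or changing the forwarding in vm/Vagrantfile.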

The installation is a single step:

mvn install # downloads the base image and installs all the databases

Note: it might take 10-30 minutes depending on your machine and internet connection.

List of created databases

  • Apache Geode (port 10334)
  • Apache Cassandra (port 9042)
  • Druid (port 8082)
  • H2 (h2/target folder)
  • HSQLDB (hsqldb/target folder)
  • MongoDB (port 27017)
  • MySQL (port 3306)
  • PostgreSQL (port 5432)

List of data sets

Using the VM

How to create a VM

A single mvn install sets up and starts the VM.

mvn install

Note: vm/target stores the apt-get cache (~340MiB), so you might want to avoid cleaning it.

How to drop the VM

Note: this destroys the VM's data (virtual hard drive), so make sure you've backed up all changes made in the VM.

cd vm && vagrant destroy

How to connect to the VM via SSH

cd vm && vagrant ssh

How to start up and shut down the VM

cd vm
vagrant up    # start the VM
vagrant halt  # shut down the VM
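When scripting around the VM, it is sometimes useful to start it only if it is not already running. The sketch below is an illustration rather than part of this repository: it assumes a single-machine Vagrantfile and parses the comma-separated output of vagrant status --machine-readable, where each line has the form timestamp,target,type,data. The function names vm_state and ensure_vm_up are hypothetical.

```shell
# Hypothetical helpers, not part of this repository. Run them from the vm directory.

# Prints the VM state (for example "running" or "poweroff") taken from
# Vagrant's machine-readable output: timestamp,target,type,data.
vm_state() {
  vagrant status --machine-readable | awk -F, '$3 == "state" { print $4; exit }'
}

# Starts the VM only when it is not already running.
ensure_vm_up() {
  if [ "$(vm_state)" = "running" ]; then
    echo "VM is already running"
  else
    vagrant up
  fi
}
```

After sourcing these definitions, cd vm && ensure_vm_up would leave a running VM untouched and boot a halted one.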

Accessing Apache Geode in the VM

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ gfsh
Monitor and Manage Apache Geode
gfsh>connect
Connecting to Locator at [host=localhost, port=10334] ..
Connecting to Manager at [host=192.168.68.8, port=1099] ..
Successfully connected to: [host=192.168.68.8, port=1099]

gfsh>list regions
List of regions
---------------
BookMaster
...
Zips

gfsh>describe region --name=/Zips
..........................................................
Name            : Zips
Data Policy     : partition
Hosting Members : server1

Non-Default Attributes Shared By Hosting Members  

 Type  |    Name     | Value
------ | ----------- | ---------
Region | size        | 29353
       | data-policy | PARTITION

gfsh>quit
Exiting... 

Accessing Apache Cassandra in the VM

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ cqlsh -k twissandra $(hostname -I | sed -e 's/192.168.68.8//')
Connected to CalciteCassandraCluster at 10.0.2.15:9042.
[cqlsh 5.0.1 | Cassandra 2.2.5 | CQL spec 3.3.1 | Native protocol v4]
Use HELP for help.
cqlsh:twissandra> describe columnfamilies

users  timeline  followers  tweets  userline  friends

cqlsh:twissandra> exit

Accessing Druid in the VM

Wikiticker data:

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ cat >query.json <<EOD
{
    "queryType" : "timeBoundary",
    "dataSource": "wikiticker"
}
EOD
vagrant@ubuntucalcite:~$ curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'content-type: application/json'  -d @query.json
[ {
  "timestamp" : "2015-09-12T00:46:58.771Z",
  "result" : {
    "maxTime" : "2015-09-12T23:59:59.200Z",
    "minTime" : "2015-09-12T00:46:58.771Z"
  }
} ]

Foodmart data:

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ cat >query.json <<EOD
{
    "queryType" : "timeBoundary",
    "dataSource": "foodmart"
}
EOD
vagrant@ubuntucalcite:~$ curl -X POST 'http://localhost:8082/druid/v2/?pretty' -H 'content-type: application/json'  -d @query.json
[ {
  "timestamp" : "1997-01-01T00:00:00.000Z",
  "result" : {
    "maxTime" : "1997-12-30T00:00:00.000Z",
    "minTime" : "1997-01-01T00:00:00.000Z"
  }
} ]
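The two Druid sessions differ only in the data source name, so the query can be generated instead of written to query.json by hand. This is a minimal sketch under that assumption; the function names time_boundary_query and time_boundary are made up for illustration, while the endpoint and headers are the ones used in the sessions above. curl reads the request body from standard input via -d @-.

```shell
# Hypothetical helpers, not part of this repository. Run them inside the VM.

# Prints a timeBoundary query for the data source named in $1.
time_boundary_query() {
  printf '{ "queryType": "timeBoundary", "dataSource": "%s" }\n' "$1"
}

# Posts the generated query to the Druid endpoint used in this README.
time_boundary() {
  time_boundary_query "$1" |
    curl -s -X POST 'http://localhost:8082/druid/v2/?pretty' \
      -H 'content-type: application/json' -d @-
}
```

With these defined, for ds in wikiticker foodmart; do time_boundary "$ds"; done would query both data sources in turn.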

Accessing MongoDB in the VM

Zips data:

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ mongo test
MongoDB shell version: 2.6.6
connecting to: test
> show collections
system.indexes
zips
> exit
bye

Foodmart data:

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ mongo foodmart
MongoDB shell version: 2.6.6
connecting to: foodmart
> show collections
account
agg_c_10_sales_fact_1997
agg_c_14_sales_fact_1997
agg_c_special_sales_fact_1997
agg_g_ms_pcat_sales_fact_1997
...
> exit
bye

Accessing MySQL in the VM

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ mysql --user=foodmart --password=foodmart --database=foodmart
...
Server version: 5.5.40-0ubuntu0.14.04.1 (Ubuntu)
...
mysql> show tables;
+-------------------------------+
| Tables_in_foodmart            |
+-------------------------------+
| account                       |
| agg_c_10_sales_fact_1997      |
| agg_c_14_sales_fact_1997      |
| agg_c_special_sales_fact_1997 |
| agg_g_ms_pcat_sales_fact_1997 |
...
mysql> quit;
Bye

Accessing PostgreSQL in the VM

$ cd vm && vagrant ssh
vagrant@ubuntucalcite:~$ PGPASSWORD=foodmart PGHOST=localhost psql -U foodmart -d foodmart
psql (9.3.5)
foodmart=> \d
 public | account                       | table | foodmart
 public | agg_c_10_sales_fact_1997      | table | foodmart
 public | agg_c_14_sales_fact_1997      | table | foodmart
 public | agg_c_special_sales_fact_1997 | table | foodmart
 public | agg_g_ms_pcat_sales_fact_1997 | table | foodmart
...
foodmart=> \q
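For scripted checks, both SQL clients can run a single statement without opening an interactive session. The sketch below is illustrative and not part of this repository: it reuses the foodmart credentials from the sessions above, and the function names mysql_count and postgres_count are hypothetical. The options --batch and --skip-column-names (mysql) and -A and -t (psql) only strip table decoration so that the bare number is printed.

```shell
# Hypothetical helpers, not part of this repository. Run them inside the VM.
# Credentials and database name are the ones used in the sessions above.

# Prints the number of rows of table $1 in the MySQL foodmart database.
mysql_count() {
  mysql --user=foodmart --password=foodmart --database=foodmart \
    --batch --skip-column-names --execute="SELECT COUNT(*) FROM $1"
}

# Prints the number of rows of table $1 in the PostgreSQL foodmart database.
postgres_count() {
  PGPASSWORD=foodmart PGHOST=localhost \
    psql -U foodmart -d foodmart -A -t -c "SELECT COUNT(*) FROM $1"
}
```

Calling mysql_count account and postgres_count account side by side is a quick way to confirm that the same table was loaded into both databases.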
