Phoenix in 15 minutes or less
What is this new Phoenix thing I've been hearing about?
Phoenix is an open source SQL skin for HBase. You use the standard JDBC APIs instead of the HBase client APIs to create tables, insert data, and query your HBase data.
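Here is a minimal sketch of that JDBC workflow. It assumes the Phoenix client jar is on the classpath with its JDBC driver registered, and that ZooKeeper is reachable at `localhost`, which is a placeholder. The `jdbc:phoenix:` URL prefix and the `UPSERT` statement are Phoenix's own. The class and method names are illustrative only. Phoenix connections normally start with auto-commit off, which is why the sketch calls `commit()` explicitly after the upsert.

```java
import java.sql.Connection;
import java.sql.DriverManager;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class PhoenixJdbcSketch {

    // Phoenix JDBC URLs have the form jdbc:phoenix:<zookeeper quorum>.
    public static String phoenixUrl(String zookeeperQuorum) {
        return "jdbc:phoenix:" + zookeeperQuorum;
    }

    public static void main(String[] args) throws SQLException {
        // Assumption: pass your ZooKeeper quorum as the first argument; "localhost" is a placeholder.
        String quorum = args.length > 0 ? args[0] : "localhost";
        try (Connection conn = DriverManager.getConnection(phoenixUrl(quorum))) {
            // Create a table with ordinary DDL through the standard JDBC Statement API.
            try (Statement stmt = conn.createStatement()) {
                stmt.executeUpdate("create table if not exists us_population ("
                        + "state char(2) not null, city varchar not null, population bigint "
                        + "constraint pk primary key (state, city))");
            }
            // Phoenix writes rows with UPSERT (insert or update) instead of INSERT.
            try (PreparedStatement ps = conn.prepareStatement(
                    "upsert into us_population values (?, ?, ?)")) {
                ps.setString(1, "NY");
                ps.setString(2, "New York");
                ps.setLong(3, 8143197L);
                ps.executeUpdate();
            }
            // Auto-commit is off by default, so commit to send the upserted row to HBase.
            conn.commit();
            // Query it back with plain SQL; Phoenix turns this into HBase scans.
            try (Statement stmt = conn.createStatement();
                 ResultSet rs = stmt.executeQuery(
                         "select city, population from us_population where state = 'NY'")) {
                while (rs.next()) {
                    System.out.println(rs.getString(1) + ": " + rs.getLong(2));
                }
            }
        }
    }
}
```

Everything in the sketch uses the standard `java.sql` interfaces. Only the URL and the SQL dialect are specific to Phoenix.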
Doesn't putting an extra layer between my application and HBase just slow things down?
Actually, no. Phoenix achieves performance as good as, and likely better than, hand-written HBase client code by:
- compiling your SQL queries to native HBase scans
- determining the optimal start and stop keys for your scans
- orchestrating the parallel execution of your scans
- bringing the computation to the data by:
  - pushing the predicates in your WHERE clause to a server-side filter
  - executing aggregate queries through server-side hooks (called co-processors)
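The "start and stop keys" point can be sketched for a table whose primary key is `(state, city)`. For a query such as `WHERE state = 'CA'`, only rows whose key begins with `CA` can match. The scan can therefore start at `CA` and stop just before `CB`, and never touch the other rows.

This is a simplified model, not Phoenix's code. It assumes the row key is the fixed-width `CHAR(2)` state followed directly by the trailing `VARCHAR` city. It ignores the rest of Phoenix's key encoding, such as the separators between variable-length key columns. `RowKeySketch` and its methods are hypothetical names.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class RowKeySketch {

    // Simplified row key for primary key (state, city): the fixed-width CHAR(2) state
    // followed directly by the city, which is the last key column.
    public static byte[] rowKey(String state, String city) {
        byte[] s = state.getBytes(StandardCharsets.UTF_8);
        byte[] c = city.getBytes(StandardCharsets.UTF_8);
        byte[] key = Arrays.copyOf(s, s.length + c.length);
        System.arraycopy(c, 0, key, s.length, c.length);
        return key;
    }

    // Exclusive stop key for a prefix scan: the smallest key greater than every key that
    // starts with the prefix. Increment the last byte that is not 0xFF and drop the rest;
    // null means no such key exists, i.e. scan to the end of the table.
    public static byte[] stopRowForPrefix(byte[] prefix) {
        for (int i = prefix.length - 1; i >= 0; i--) {
            if (prefix[i] != (byte) 0xFF) {
                byte[] stop = Arrays.copyOf(prefix, i + 1);
                stop[i]++;
                return stop;
            }
        }
        return null;
    }

    // HBase orders row keys as unsigned byte arrays; the scan covers [start, stop).
    public static boolean inScanRange(byte[] key, byte[] start, byte[] stop) {
        return Arrays.compareUnsigned(key, start) >= 0
                && (stop == null || Arrays.compareUnsigned(key, stop) < 0);
    }

    public static void main(String[] args) {
        // WHERE state = 'CA' becomes a scan from "CA" (inclusive) to "CB" (exclusive).
        byte[] start = "CA".getBytes(StandardCharsets.UTF_8);
        byte[] stop = stopRowForPrefix(start);
        System.out.println("start=" + new String(start, StandardCharsets.UTF_8)
                + " stop=" + new String(stop, StandardCharsets.UTF_8));
        System.out.println(inScanRange(rowKey("CA", "San Diego"), start, stop)); // true
        System.out.println(inScanRange(rowKey("TX", "Dallas"), start, stop));    // false
    }
}
```

The prefix-increment trick is a common way to bound an HBase prefix scan. Because the leading key column is constrained, the scan reads a contiguous slice of the table rather than all of it.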
Blah, blah, blah - I just want to get started!
Ok, great! Just follow our install instructions:
- download and expand our installation tar
- copy the Phoenix jar into the HBase lib directory of every region server
- restart the region servers
- add the Phoenix client jar to the classpath of your HBase client
- download and set up SQuirreL as your SQL client so you can issue ad hoc SQL against your HBase cluster
I don't want to download and setup anything else!
Ok, fair enough - you can create your own simple SQL scripts and execute them using our command-line tool instead. In the bin directory of your install location:
- Create a us_population.sql file:
    create table if not exists us_population (
        state char(2) not null,
        city varchar not null,
        population bigint
        constraint pk primary key (state, city));
- Create a us_population.csv file:
    NY,New York,8143197
    CA,Los Angeles,3844829
    IL,Chicago,2842518
    TX,Houston,2016582
    PA,Philadelphia,1463281
    AZ,Phoenix,1461575
    TX,San Antonio,1256509
    CA,San Diego,1255540
    TX,Dallas,1213825
    CA,San Jose,912332
- Create a us_population_queries.sql file:
    SELECT state as "State", count(city) as "City Count", sum(population) as "Sum Population"
    FROM us_population
    GROUP BY state;
- Run the following command from a terminal:
    ./psql.sh your_zookeeper_quorum us_population.sql us_population.csv us_population_queries.sql
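The `GROUP BY` query in `us_population_queries.sql` is the kind of aggregate that Phoenix evaluates through server-side co-processors. The sketch below is a hypothetical in-memory model of that division of labor, not Phoenix's co-processor API. Each region computes a partial `(count, sum)` per state over only its own rows, and the client merges those small partial results.

The record and method names are illustrative, and splitting the sample rows into two regions is an assumption. The sketch uses Java records, so it needs Java 16 or later.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

public class PartialAggregationSketch {

    // One stored row reduced to what the query needs: the GROUP BY key and the summed value.
    public record Row(String state, long population) {}

    // Per-group partial result: COUNT(city) and SUM(population) over some subset of rows.
    public record Partial(long count, long sum) {
        // Partials combine by addition, so they can be merged in any order.
        public Partial plus(Partial other) {
            return new Partial(count + other.count, sum + other.sum);
        }
    }

    // "Server side": runs once per region, over that region's rows only.
    public static Map<String, Partial> aggregateRegion(List<Row> regionRows) {
        Map<String, Partial> partials = new TreeMap<>();
        for (Row row : regionRows) {
            partials.merge(row.state(), new Partial(1, row.population()), Partial::plus);
        }
        return partials;
    }

    // "Client side": combines the small per-region maps into the final result.
    public static Map<String, Partial> mergeRegions(List<Map<String, Partial>> perRegion) {
        Map<String, Partial> result = new TreeMap<>();
        for (Map<String, Partial> partials : perRegion) {
            partials.forEach((state, partial) -> result.merge(state, partial, Partial::plus));
        }
        return result;
    }

    public static void main(String[] args) {
        // Hypothetical split of part of us_population.csv across two regions (keys sorted by state, city).
        List<Row> region1 = List.of(new Row("AZ", 1461575), new Row("CA", 3844829), new Row("CA", 1255540));
        List<Row> region2 = List.of(new Row("CA", 912332), new Row("IL", 2842518));
        Map<String, Partial> result =
                mergeRegions(List.of(aggregateRegion(region1), aggregateRegion(region2)));
        result.forEach((state, p) ->
                System.out.println(state + " City Count=" + p.count() + " Sum Population=" + p.sum()));
    }
}
```

Merging partials only adds counts and sums. As a result, the client receives one small entry per state from each region instead of every row. In this example `CA` spans both regions and still ends up with a City Count of 3 and a Sum Population of 6012701.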
Congratulations! You've just created your first Phoenix table, inserted data into it, and executed an aggregate query over it with a few lines of code in 15 minutes or less!
Why is it called Phoenix anyway? Did some other project crash and burn and this is the next generation?
I'm sorry, but we're out of time and space, so we'll have to answer that next time!