SparkSQL+

This is the repository for the demo paper SparkSQL+: Next-generation Query Planning over Spark.

Prerequisites

  • Java 1.8
  • Scala 2.12.10
  • Maven 3.8.6
  • Spark 3.0.1
  • HDFS 2.7.7 (optional)

Project Structure

SparkSQL+ is organized as a Maven multi-module project for project organization and dependency management. It consists of the following submodules:

  • sqlplus-core: contains the parser, planner, and code generator of SparkSQL+.
  • sqlplus-lib: contains the library functions used by the Scala programs generated by SparkSQL+.
  • sqlplus-web: contains the web-based interface.
  • sqlplus-cli: contains the command line interface.
  • sqlplus-example: contains the implementation of built-in example queries.

Setup

Clone

Run git clone git@github.com:hkustDB/SparkSQLPlus.git.

Configurations

SparkSQL+ modes

  • Local mode
    • Use Local mode if the Spark Standalone Cluster is deployed on the same machine.
    • HDFS is not needed in Local mode.
  • Remote mode
    • Use Remote mode if the Spark Standalone Cluster is deployed on different machines.
    • The jars will be uploaded to HDFS before execution.
    • The input data should be placed in HDFS before execution.

Local forwarding

  • Local port forwarding is necessary when the remote Spark and HDFS clusters are not directly accessible. In this case, set experiment.forwarding to true (see below).
  • Manually enable local port forwarding for the following ports (e.g., through ssh -L):
    • 6066
    • 7077
    • 8080
    • 8081
    • 9000
    • 50070
    • 50075
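A single ssh invocation can forward all of these ports at once. The sketch below builds the -L flags programmatically and prints the resulting command; the remote host and user names are placeholders, not values from this project, and the target of each forward assumes every service listens on localhost of the remote machine.

```shell
#!/bin/sh
# Build one ssh command that forwards every port SparkSQL+ needs.
# REMOTE_HOST and REMOTE_USER are placeholders -- substitute your own.
REMOTE_HOST="spark-master.example.com"
REMOTE_USER="sqlplus"

FLAGS=""
for port in 6066 7077 8080 8081 9000 50070 50075; do
    # Forward localhost:<port> to the same port on the remote machine.
    FLAGS="$FLAGS -L ${port}:localhost:${port}"
done

# -N: forward ports only, do not run a remote command.
echo "ssh -N$FLAGS ${REMOTE_USER}@${REMOTE_HOST}"
```

Running the printed command in a separate terminal keeps the tunnels open while SparkSQL+ is in use.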

SparkSQL+ configurations

  • Run cp src/main/resources/application.yml.template src/main/resources/application.yml.
  • Edit src/main/resources/application.yml.
server:
  port: 8848

logging:
  config: classpath:log4j2-spring.xml

sqlplus:
  home: /Users/sqlplus/Projects/SparkSQLPlus    # the absolute path of SparkSQLPlus

experiment:
  mode: local                                   # local or remote
  forwarding: false                             # whether using local port forwarding
  spark:
    master:
      host: localhost
      port: 7077
      submission:
        port: 6066
      web-ui:
        port: 8080
    driver:
      memory: 4g
      cores: 1
    executor:
      memory: 4g
      cores: 1
    default:
      parallelism: 1
  timeout: 300                                  # timeout setting, in seconds 
  hdfs:
    host: localhost
    port: 50070
    path: /Users/sqlplus                        # base path for the uploaded jars
    user: sqlplus
  data:
    path: /Users/sqlplus/data                   # base path for the input data
  result:
    type: web-ui

Spark configurations

  • Edit $SPARK_HOME/conf/spark-defaults.conf and add spark.master.rest.enabled true to the end.
  • Edit $SPARK_HOME/conf/log4j.properties and add the following configurations.
log4j.logger.SparkSQLPlusExperiment=INFO, SparkSQLPlus
log4j.appender.SparkSQLPlus=org.apache.log4j.ConsoleAppender
log4j.appender.SparkSQLPlus.target=System.out
log4j.appender.SparkSQLPlus.layout=org.apache.log4j.PatternLayout
log4j.appender.SparkSQLPlus.layout.ConversionPattern=%d{yy/MM/dd HH:mm:ss} %p %c{1}: %m%n
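The spark-defaults.conf edit above can be scripted so it is safe to re-run. The sketch below works on a temporary file for demonstration; on a real installation, point CONF at $SPARK_HOME/conf/spark-defaults.conf instead.

```shell
#!/bin/sh
# Idempotently enable Spark's REST submission gateway in spark-defaults.conf.
# CONF is a temporary file here for demonstration purposes only.
CONF="$(mktemp)"

enable_rest() {
    # Append the setting only if it is not already present.
    grep -q '^spark.master.rest.enabled' "$1" \
        || echo 'spark.master.rest.enabled true' >> "$1"
}

enable_rest "$CONF"   # adds the line
enable_rest "$CONF"   # second call is a no-op
cat "$CONF"
```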

Build

Run mvn clean package.

Download data

Run bash examples/data/download.sh to download a graph from SNAP. It is also possible to use other input data as long as the columns are separated by commas.
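Since any comma-separated file works as input, a quick sanity check is to confirm that every row has the same number of columns. The sketch below does this with awk on a small fabricated edge list (the data and file path are illustrative, not part of the project):

```shell
#!/bin/sh
# Create a small sample edge list in the expected comma-separated layout.
DATA="$(mktemp)"
cat > "$DATA" <<'EOF'
1,2
2,3
3,1
EOF

# Count the distinct per-row column counts; a well-formed file yields 1.
DISTINCT="$(awk -F',' '{ print NF }' "$DATA" | sort -u | wc -l | tr -d ' 	')"
echo "distinct column counts: $DISTINCT"
```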

Usage

The example queries are in examples/query.

Command line interface

sparksql-plus compiles the input SQL file into SparkSQL+ code.

syntax: sparksql-plus [OPTIONS] <query>
  options:
     -d,--ddl <path>           Set the path to the ddl file.
     -h,--help                 Show the help message.
     -n,--name <object name>   Set the object name for the output object.
     -o,--output <path>        Set the path to the output file.
     -p,--pkg <package name>   Set the package name for the output object.

The following command generates SparkSQL+ code for examples/query/q1.

./bin/sparksql-plus -d examples/query/q1/graph.ddl -o examples/query/q1/q1.scala examples/query/q1/query.sql
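Once the generated Scala file has been compiled into a jar, it can be run with spark-submit. The sketch below only assembles and prints the invocation; the object name, jar path, and master URL are placeholders for illustration and must be adjusted to match your build and cluster.

```shell
#!/bin/sh
# Assemble a spark-submit invocation for the generated query.
# MAIN_CLASS, JAR, and MASTER are placeholders -- adjust to your setup.
MAIN_CLASS="SparkSQLPlusExample"
JAR="sqlplus-example/target/sqlplus-example.jar"
MASTER="spark://localhost:7077"

CMD="spark-submit --class $MAIN_CLASS --master $MASTER $JAR"
echo "$CMD"
```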

Web-based interface

Start the application

Run java -jar sqlplus-web/target/sparksql-plus-web-jar-with-dependencies.jar.

Access the web

Visit http://localhost:8848/ in the browser.

Compile a query

  • Submit a query.
  • Select a candidate.
  • Persist the generated SparkSQL+ code.

Run experiments

Click the Experiment tab on the top and submit the experiments. The result will be fetched and displayed automatically.