Skip to content

Latest commit

 

History

History
227 lines (170 loc) · 10.1 KB

STYLE_GUIDE.md

File metadata and controls

227 lines (170 loc) · 10.1 KB

SQL Style Guide

This document describes the style that Prettier plugin SQL-CST uses for formatting SQL.

  • ✅ - decided and implemented
  • ✔️ - decided, but not yet implemented
  • ❓ - undecided
  • 🔧 - can be disabled by setting sqlCanonicalSyntax: false

Generic whitespace

This is based on the defaults used by Prettier:

  • ✅ 2 spaces for indentation
  • ✅ UNIX \n for newlines
  • ✅ Final newline at the end of a file
  • ✅ Preserve single empty line between statements.

Upper & lower case

Keywords include literal values like NULL, TRUE, FALSE, but don't include function names that SQL dialects might themselves label as keywords (like sum(), count()), notably the parameter-less builtin functions like current_date aren't considered to be keywords. At the same time CAST(x AS type) is considered to be language construct and is therefore formatted in uppercase.

Aliases

Semicolons

  • ✅ Finish all statements with a semicolon. Rule_L052

Undecided:

  • ❓ Should single statement without a trailing semicolon be supported?

Strings

  • ✔️ Use single quotes for strings if possible.
  • ✔️ Use different quote style when string contains single quotes or is multi-line.

This might need to be configurable, though the support of single-quotes for strings should IMHO be available in all dialects.

Identifiers

  • ✔️ Don't unnecessarily quote an identifier. Rule_L059
  • ✔️ Always quote identifiers that are keywords. Rule_L029

For quoting style there are various options here:

  • Default to double quotes (SQL standard) Not possible in some dialects like BigQuery.
  • ❓ Default to the preferred/default quoting style of a dialect, like backticks in MySQL and double quotes in SQLite.
  • ❓ Preserve existing quoting style.

This definitely needs to be configurable.

Operators

  • ✅ Surround binary operators with whitespace: 1 + 2
  • ✅ Surround unary keyword-operators with whitespace: NOT foo
  • ✅ Don't add whitespace to unary punctuation-operators: -price
  • ✅ 🔧 Use != instead of <> for "not equal to" comparisons. Rule_L061
  • ✔️ Add linebreak before binary operator: Rule_L007
WHERE
  first_name = 'John'
  AND age_bracket BETWEEN 12 AND 18
  AND income > 1000

similarly:

SELECT
  sqrt(normalized_power)
  + interval_duration
  + rest_duration
  + average_speed

Comma spacing

  • ✅ No whitespace before comma
  • ✅ Single space after comma for inline list. Rule_L008
  • ✅ Newline after comma for multiline list
  • ✅ Forbid trailing commas in select clause. Rule_L038

Parenthesis spacing

  • ✅ No space between function name and arguments. my_func(1, 2, 3) Rule_L017
  • ✅ No space between single-word type name and arguments. VARCHAR(100)
  • ✅ Space between multi-word type name and arguments. UNSIGNED NUMERIC (10, 5)
  • ✅ Space after USING in JOIN: JOIN foo USING (col1)
  • ✅ Space after OPTIONS in BigQuery: OPTIONS (foo = bar)
  • ✅ Space between table/view/CTE name and columns list:
CREATE TABLE foo (
  id INT
);

CREATE VIEW my_view (col1, col2) AS
  SELECT 1, 2;

WITH my_cte (c1, c2) AS (SELECT 1, 2)
SELECT * FROM my_cte;

Parenthesis removal

  • ✅ Discard unnecessary nested parenthesis. ((1 + 2)) * 3 --> (1 + 2) * 3
  • ✅ Discard parenthesis around function arguments. count((id)) --> count(id)
  • ✔️ Top-level statements should not be wrapped in brackets. Rule_L053
  • ✔️ No parenthesis after DISTINCT. Rule_L015

Comments

  • ✅ Ensure space at the start of a line-comment: --comment --> -- comment

Syntax variations

Often SQL allows for syntax variations. For example ALTER COLUMN col_name or just ALTER col_name. It would be nice to enforce one consistent way of expressing this. We should take into account:

  • Which variant is preferred by the SQL standard
  • Which variant is most common across various SQL dialects

Some candidates:

  • ✅ 🔧 INSERT -> INSERT INTO (standard syntax)
  • ✅ 🔧 REPLACE -> REPLACE INTO (non-standard. More widely supported)
  • ✅ 🔧 DELETE -> DELETE FROM (standard syntax)
  • ✅ 🔧 TRUNCATE -> TRUNCATE TABLE (standard syntax.)
  • ✅ 🔧 MERGE -> MERGE INTO (standard syntax.)
  • ✅ 🔧 CREATE TEMP TABLE -> CREATE TEMPORARY TABLE (standard syntax)
  • ADD -> ADD COLUMN (more commonly supported)
  • DROP -> DROP COLUMN (more commonly supported)
  • ALTER -> ALTER COLUMN (more commonly supported)
  • ✅ 🔧 RENAME new_table_name -> RENAME TO new_table_name (more commonly supported)

A related case is the use of non-standard function names where standard alternatives are available, like:

  • NOW() -> CURRENT_TIMESTAMP()
  • CURDATE() -> CURRENT_DATE()

Another case is with some syntax where some keywords are effectively redundant, as it specifies the default behavior. For example:

  • UNION DISTINCT is the same as UNION. Some dialects only support UNION and UNION ALL, leaving out the DISTINCT. SQL standard includes both ALL and DISTINCT.
  • SELECT ALL is the same as SELECT.
  • ORDER BY foo ASC is the same as ORDER BY foo.
  • PostgreSQL table_name * is the same as just table_name. The * indicates that inherited tables are to be included, which is the default.

In these cases the additional redundant syntax can be useful to clarify the intention of the code. So the pretty-printer probably shouldn't eliminate such redundant syntax.

Dialect-specific

PostgreSQL

  • ✅ 🔧 Convert deprecated := named argument syntax to standard =>.
  • LIMIT ALL -> (it's the same as specifying no limit)

MySQL

  • ✅ 🔧 Convert SELECT DISTINCTROW to SELECT DISTINCT.
  • ✅ 🔧 Convert RENAME AS tbl to RENAME TO tbl.
  • ✔️ Convert CREATE TABLE foo (id INT KEY) to CREATE TABLE foo (id INT PRIMARY KEY)
  • ✅ 🔧 Convert &&, || to AND, OR.

Rules to possibly adopt from SQLFluff

  • Rule_L022: Blank line after CTE.
  • Rule_L033: UNION [DISTINCT|ALL] is preferred over just UNION.
  • Rule_L034: Select wildcards then simple targets before calculations and aggregates.
  • Rule_L036: Select targets should be on a new line unless there is only one select target.
  • Rule_L037: Ambiguous ordering directions for columns in order by clause.
  • Rule_L047: Use consistent syntax to express "count number of rows", e.g. count(*) instead of count(1)
  • Rule_L049: Comparisons with NULL should use IS or IS NOT.
  • Rule_L051: Join clauses should be fully qualified, e.g. INNER JOIN instead of plain JOIN.
  • Rule_L058: Nested CASE statement in ELSE clause could be flattened.
  • Rule_L067: Enforce consistent type casting style, e.g. use CAST(10 AS TEXT) instead of 10::TEXT or CONVERT(10, TEXT)

Test with SQLFluff online.