Provided by: sqlfluff_1.4.5-2_all
NAME
sqlfluff - SQLFluff 1.4.5 Bored of not having a good SQL linter that works with whichever dialect you're working with? Fluff is an extensible and modular linter designed to help you write good SQL and catch errors and bad SQL before it hits your database. Notable changes: • 0.1.x involved a major re-write of the parser, completely changing the behaviour of the tool with respect to complex parsing. • 0.2.x added templating support and a big restructure of rules and changed how users might interact with SQLFluff on templated code. • 0.3.x drops support for python 2.7 and 3.4, and also reworks the handling of indentation linting in a potentially not backward compatible way. • 0.4.x dropped python 3.5, added the dbt templater, source mapping and also introduced the python API. • 0.5.x introduced some breaking changes to the API. • 0.6.x introduced parallel processing, which necessitated a big re-write of several innards. • 0.7.x extracted the dbt templater to a separate plugin and removed the exasol_fs dialect (now merged in with the main exasol). • 0.8.x an improvement to the performance of the parser, a rebuild of the Jinja Templater, and a progress bar for the CLI. • 0.9.x refinement of the Simple API, dbt 1.0.0 compatibility, and the official SQLFluff Docker image. • 0.10.x removed support for older dbt versions < 0.20 and stopped fix attempting to fix unparsable SQL. • 0.11.x rule L030 changed to use extended_capitalisation_policy. • 0.12.x dialect is now mandatory, the spark3 dialect was renamed to sparksql and datatype capitalisation was extracted from L010 to it's own rule L063. • 0.13.x new rule for quoted literals, option to remove hanging indents in rule L003, and introduction of ignore_words_regex. • 1.0.0 first stable release, no major changes to take advantage of a point of relative stability. Want to see where and how people are using SQLFluff in their projects? Head over to SQLFluff in the Wild for inspiration.
GETTING STARTED
To get started just install the package, make a sql file and then run SQLFluff and point it at the file. For more details or if you don't have python or pip already installed see Getting Started. $ pip install sqlfluff $ echo " SELECT a + b FROM tbl; " > test.sql $ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L050 | Files must not begin with newlines or whitespace. L: 1 | P: 3 | L003 | First line has unexpected indent L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 1 | P: 14 | L039 | Unnecessary whitespace found. L: 1 | P: 27 | L001 | Unnecessary trailing whitespace.
CONTENTS
Getting Started To get started with SQLFluff you'll need python and pip installed on your machine, if you're already set up, you can skip straight to Installing sqlfluff. Installing Python How to install python and pip depends on what operating system you're using. In any case, the python wiki provides up to date instructions for all platforms here. There's a chance that you'll be offered the choice between python versions. Support for python 2 was dropped in early 2020, so you should always opt for a version number starting with a 3. As for more specific options beyond that, SQLFluff aims to be compatible with all current python versions, and so it's best to pick the most recent. You can confirm that python is working as expected by heading to your terminal or console of choice and typing python --version which should give you a sensible read out and not an error. $ python --version Python 3.9.1 For most people, their installation of python will come with pip (the python package manager) preinstalled. To confirm this you can type pip --version similar to python above. $ pip --version pip 21.3.1 from ... If however, you do have python installed but not pip, then the best instructions for what to do next are on the python website. Installing SQLFluff Assuming that python and pip are already installed, then installing SQLFluff is straight forward. $ pip install sqlfluff You can confirm its installation by getting SQLFluff to show its version number. $ sqlfluff version 1.4.5 Basic Usage To get a feel for how to use SQLFluff it helps to have a small .sql file which has a simple structure and some known issues for testing. Create a file called test.sql in the same folder that you're currently in with the following content: SELECT a+b AS foo, c AS bar from my_table You can then run sqlfluff lint test.sql --dialect ansi to lint this file. $ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations | and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is | only one select target. L: 1 | P: 9 | L006 | Missing whitespace before + L: 1 | P: 9 | L006 | Missing whitespace after + L: 1 | P: 11 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. You'll see that SQLFluff has failed the linting check for this file. On each of the following lines you can see each of the problems it has found, with some information about the location and what kind of problem there is. One of the errors has been found on line 1, position * (as shown by :code:`L: 1 | P: 9`) and it's a problem with rule *L006 (for a full list of rules, see Rules Reference). From this (and the following error) we can see that the problem is that there is no space either side of the + symbol in a+b. Head into the file, and correct this issue so that the file now looks like this: SELECT a + b AS foo, c AS bar from my_table Rerun the same command as before, and you'll see that the original error (violation of L006) no longer shows up. $ sqlfluff lint test.sql --dialect ansi == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations | and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is | only one select target. L: 1 | P: 13 | L039 | Unnecessary whitespace found. L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. To fix the remaining issues, we're going to use one of the more advanced features of SQLFluff, which is the fix command. This allows more automated fixing of some errors, to save you time in sorting out your sql files. Not all rules can be fixed in this way and there may be some situations where a fix may not be able to be applied because of the context of the query, but in many simple cases it's a good place to start. For now, we only want to fix the following rules: L003, L009, L010 $ sqlfluff fix test.sql --rules L003,L009,L010 --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 2 | P: 1 | L003 | Expected 1 indentations, found 0 [compared to line 01] L: 2 | P: 10 | L010 | Keywords must be consistently upper case. ==== fixing violations ==== 2 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n] ...at this point you'll have to confirm that you want to make the changes by pressing y on your keyboard... Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm. If we now open up test.sql, we'll see the content is now different. SELECT a + b AS foo, c AS bar FROM my_table In particular: • The second line has been indented to reflect being inside the SELECT statement. • The FROM keyword has been capitalised to match the other keywords. • A final newline character has been added at the end of the file (which may not be obvious in the snippet above). We could also fix all of the fixable errors by not specifying --rules. $ sqlfluff fix test.sql --dialect ansi ==== finding violations ==== == [test.sql] FAIL L: 1 | P: 1 | L034 | Select wildcards then simple targets before calculations | and aggregates. L: 1 | P: 1 | L036 | Select targets should be on a new line unless there is | only one select target. L: 1 | P: 13 | L039 | Unnecessary whitespace found. ==== fixing violations ==== 3 fixable linting violations found Are you sure you wish to attempt to fix these? [Y/n] ... Attempting fixes... Persisting Changes... == [test.sql] PASS Done. Please check your files to confirm. If we now open up test.sql, we'll see the content has been updated again. SELECT c AS bar, a + b AS foo FROM my_table The SQL statement is now well formatted according to all the rules defined in SQLFluff. The --rules argument is optional, and could be useful when you or your organisation follows a slightly different convention than what we have defined. Custom Usage So far we've covered the stock settings of SQLFluff, but there are many different ways that people style their sql, and if you or your organisation have different conventions, then many of these behaviours can be configured. For example, given the example above, what if we actually think that indents should only be two spaces, and rather than uppercase keywords, they should all be lowercase? To achieve this we create a configuration file named .sqlfluff and place it in the same directory as the current file. In that file put the following content: [sqlfluff] dialect = ansi [sqlfluff:rules] tab_space_size = 2 [sqlfluff:rules:L010] capitalisation_policy = lower Then rerun the same command as before. $ sqlfluff fix test.sql --rules L003,L009,L010,L034,L036,L039 Then examine the file again, and you'll notice that the file has been fixed accordingly. select c as bar, a + b as foo from my_table For a full list of configuration options check out Default Configuration. To see how these options apply to specific rules check out the "Configuration" section within each rule's documentation in Rules Reference. Going further From here, there are several more things to explore. • To understand how SQLFluff is interpreting your file explore the parse command. You can learn more about that command and more by running sqlfluff --help or sqlfluff parse --help. • To start linting more than just one file at a time, experiment with passing SQLFluff directories rather than just single files. Try running sqlfluff lint . (to lint every sql file in the current folder) or sqlfluff lint path/to/my/sqlfiles. • To find out more about which rules are available, see Rules Reference. • To find out more about configuring SQLFluff and what other options are available, see Configuration. One last thing to note is that SQLFluff is a relatively new project and you may find bugs or strange things while using it. If you do find anything, the most useful thing you can do is to post the issue on GitHub where the maintainers of the project can work out what to do with it. The project is in active development and so updates and fixes may come out regularly. SQL in the Wild SQL has been around for a long time, as a language for communicating with databases, like a communication protocol. More recently with the rise of data as a business function, or a domain in its own right SQL has also become an invaluable tool for defining the structure of data and analysis - not just as a one off but as a form of infrastructure as code. As analytics transitions from a profession of people doing one-offs, and moves to building stable and reusable pieces of analytics, more and more principles from software engineering are moving in the analytics space. One of the best articulations of this is written in the viewpoint section of the docs for the open-source tool dbt. Two of the principles mentioned in that article are quality assurance and modularity. Quality assurance The primary aim of SQLFluff as a project is in service of that first aim of quality assurance. With larger and larger teams maintaining large bodies of SQL code, it becomes more and more important that the code is not just valid but also easily comprehensible by other users of the same codebase. One way to ensure readability is to enforce a consistent style, and the tools used to do this are called linters. Some famous linters which are well known in the software community are flake8 and jslint (the former is used to lint the SQLFluff project itself). SQLFluff aims to fill this space for SQL. Modularity SQL itself doesn't lend itself well to modularity, so to introduce some flexibility and reusability it is often templated. Typically this is done in the wild in one of the following ways: 1. Using the limited inbuilt templating abilities of a programming language directly. For example in python this would be using the format string syntax: "SELECT {foo} FROM {tbl}".format(foo="bar", tbl="mytable") Which would evaluate to: SELECT bar FROM mytable 2. Using a dedicated templating library such as jinja2. This allows a lot more flexibility and more powerful expressions and macros. See the Jinja Templating Configuration section for more detail on how this works. • Often there are tools like dbt or apache airflow which allow templated sql to be used directly, and they will implement a library like jinja2 under the hood themselves. All of these templating tools are great for modularity but they also mean that the SQL files themselves are no longer valid SQL code, because they now contain these configured placeholder values, intended to improve modularity. SQLFluff supports both of the templating methods outlined above, as well as dbt projects, to allow you to still lint these "dynamic" SQL files as part of your CI/CD pipeline (which is great 🙌), rather than waiting until you're in production (which is bad 🤦, and maybe too late). During the CI/CD pipeline (or any time that we need to handle templated code), SQLFluff needs additional info in order to interpret your templates as valid SQL code. You do so by providing dummy parameters in SQLFluff configuration files. When substituted into the template, these values should evaluate to valid SQL (so SQLFluff can check its style, formatting, and correctness), but the values don't need to match actual values used in production. This means that you can use much simpler dummy values than what you would really use. The recommendation is to use the simplest possible dummy value that still allows your code to evaluate to valid SQL so that the configuration values can be as streamlined as possible. Vision for SQLFluff SQLFluff has a few components: 1. A generic parser for SQL which aims to be able to unify SQL written in different dialects into a comparable format. The parser. 2. A mechanism for measuring written SQL against a set of rules, with the added ability to fix any violations found. The linter. 3. An opinionated set of guidelines for how SQL should be structured and formatted. The rules. The core vision [1] for SQLFluff is to be really good at being the linter. The reasoning for this is outlined in SQL in the Wild. Most of the codebase for SQLFluff is the parser, mostly because at the point of developing SQLFluff, there didn't appear to be a good option for a whitespace-aware parser that could be used instead. With regards to the rules, SQLFluff aims to be opinionated but it also accepts that many organisations and groups have pre-existing strong conventions around how to write SQL and so ultimately SQLFluff should be flexible enough to support whichever rule set a user wishes to. Notes [1] Credit to this article for highlighting the importance of a good vision. Rolling out SQLFluff with a new team Rolling out SQLFluff, like rolling out any other linter or style guide, is not just about the technical rollout, but also how you introduce the tool to the team and organisation around you. The effect of SQLFluff should be to change your behaviours, not just your SQL. With that in mind, it's worth reminding ourselves what we're trying to achieve with a tool like this. A set of potential success criteria might be: 1. Faster comprehension and collaboration by the team on a shared codebase. This includes more effective (and more enjoyable) code review on top of code which is easy to review and build upon. 2. Easier and faster onboarding for new team members. By adopting a style which is clean and consistent with other organisations we make it easier for new people to join the team. 3. Improved adoption of shared SQL from other sources. If the SQL found in open source projects is easy to read and looks familiar then you're more likely to use it. This means more reusable code across the industry. 4. Productive discussions around style. By defining your organisation's style guide in code, it means you can version control it, discuss changes and ultimately give a concrete output to discussions over style. You like leading commas? Make a PR to .sqlfluff and let's discuss with the team what the implications would be. Consider which of these success measures is most important and most desirable for your team. Write that down. The following steps are a guide, which you should adapt to your organisation, and in particular its level of data maturity. 1. Assess the situation This step is done by you, or a small group of people who already think that linting is a good idea. • Run sqlfluff lint on your project with the stock configuration to find out how things work out of the box. • Set up your Configuration so that things run and that you can get a readout of the errors which you would want the team to see and not the ones you don't. Great tools for this are to use Ignoring types of errors, --exclude-rules or --ignore in the CLI (see CLI Reference). • Identify which areas of your project are the worst and which are the tidiest. In particular, any areas which are particularly tidy already will be particularly useful in the next phase. 2. Make a plan There are three sensible rollout phases: 1. Pre CI/CD. 2. Soft CI/CD (warnings but no strict fails). 3. Hard CI/CD (violations mean deployments fail). In each of these phases you have three levers to play with: 1. Areas of the project in which to apply rules. 2. Depth of rules enforced (this might also include whether to ignore parsing errors or not). 3. Whether to just lint changes (Using SQLFluff on changes using diff-quality), or to lint all the existing code as well. Work out a sensible roadmap of how hard you want to go in each phase. Be clear who is responsible for changes at each phase. An example plan might look like this: 1. Pre CI/CD we get the tidiest area of a project to a stage that it fully passes the rules we eventually want to enforce. The core project team will do this. Liberal use of sqlfluff fix can be a lifesaver in this phase. 2. Soft CI/CD is applied to the whole project, team members are encouraged to write tidy SQL, but not required to. 3. Hard CI/CD is applied to the tidy areas of the project and also to any changes to the whole project. Anyone making changes is required to write SQL which passes check. 4. Hard CI/CD is applied to the whole project on not just changes, with only a few particularly problematic files explicitly ignored using Ignoring types of errors. 3. Build the need Bring your team together to introduce both linting as a concept and also SQLFluff as a tool. At this stage it's really important that the team understand *why* this is a good thing. Consider whether to discuss the whole plan from step 2, or whether to only talk about the first few steps. Aim to make this an empowering experience that everyone can get involved with rather than another piece of admin they need to do. At this stage, you might also want to consider other tools in the SQLFluff ecosystem such as the SQLFluff pre-commit hook and the SQLFluff VSCode plugin or SQLFluff online formatter. 4. Do, Review & Reassess Once the plan is in motion, make sure to start putting in place norms and rituals around how you change the rules. In particular: • How would someone suggest changing the style guide or enabling/disabling a rule? • How do we assess whether the changes are working for the team or whether some are creating unnecessary stress? It's normal for your usage of tools like SQLFluff to change and evolve over time. It's important to expect this change in advance, and welcome it when it happens. Always make sure you're driving toward the success measures you decided up front, rather than just resisting the change. 5. Spread the word 😁 Did it work? If so, spread the word. Tell a friend about SQLFluff. If you're lucky they might share your views on comma placement 🤷♀️. Let's talk about whitespace If there is one part of building a linter that is going to be controversial it's going to be whitespace (closely followed by cApiTaLiSaTiOn 😁). More specifically, whitespace divides into three key themes: 1. Spacing: The amount of whitespace between elements on the same line. 2. Line Breaks: The choice of where within the code it is inappropriate, appropriate or even compulsory to have a line break. 3. Indentation: Given a line break, how much whitespace should precede the first code element on that line. SQLFluff aims to be opinionated on this theme, but also configurable (see Configuring Layout). The tool will have a default viewpoint and will aim to have views on all of the important aspects of SQL layout, but if you (or your organisation) don't like those views then we aim to allow enough configuration that you can lint in line with your views, and still use SQLFluff. For more information on how to configure rules to your own viewpoint see Configuration. NOTE: This section of the docs handles the intent and reasoning behind how layout is handled by SQLFluff. For a deeper look at how this is achieved internally see Reflow Internals. Spacing Of the different elements of whitespace, spacing is likely the least controversial. By default, all elements are separated by a single space character. Except for very specific circumstances (see section on Aligned elements), any additional space between elements is usually unwanted and a distraction for the reader. There are however several common cases where no whitespace is more appropriate, which fall into two cases (for more details on where to configure these see Configuring layout and spacing). 1. No whitespace but a newline is allowed. This option is configured using the touch option in the spacing_* configuration settings. The most common example of this is the spacing around commas. For example SELECT a , b would be unusual and more normally be written SELECT a, b. Inserting a newline between the a and comma would not cause issues and may even be desired, for example: SELECT col_a , col_b -- Newline present before column , col_c -- When inline, comma should still touch element before. , GREATEST(col_d, col_e) as col_f FROM tbl_a 2. No whitespace and a newline is not allowed. This option is configured using the inline option in the spacing_* configuration settings. The most common example of this is spacing within the parts of qualified identifier e.g. my_schema.my_table. If a newline were present between the . and either my_schema or my_table, then the expression would not parse and so no newlines should be allowed. Aligned elements A special case of spacing is where elements are set to be aligned within some limits. This is not enabled by default, but can be be configured to achieve layouts like: SELECT a AS first_column, b AS second_column, (a + b) / 2 AS third_column FROM foo AS bar In this example, the alias expressions are all aligned with each other. To configure this, SQLFluff needs to know what elements to align and how far to search to find elements which should be aligned with each other. The configuration to achieve this layout is: [sqlfluff:layout:type:alias_expression] # We want non-default spacing _before_ the alias expressions. spacing_before = align # We want to align them within the next outer select clause. # This means for example that alias expressions within the FROM # or JOIN clause would _not_ be aligned with them. align_within = select_clause # The point at which to stop searching outward for siblings, which # in this example would likely be the boundary of a CTE. Stopping # when we hit brackets is usually a good rule of thumb for this # configuration. align_scope = bracketed Of these configuration values, the align_scope is potentially the least obvious. The following example illustrates the impact it has. -- With -- align_scope = bracketed -- align_within = select_clause WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar; -- With -- align_scope = bracketed -- align_within = statement WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar -- Now the FROM alias is also aligned. -- With -- align_scope = file -- align_within = select_clause WITH foo as ( SELECT a, b, c AS first_column -- Now the aliases here are aligned d + e AS second_column -- with the outer query. ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar -- With -- align_scope = file -- align_within = statement WITH foo as ( SELECT a, b, c AS first_column d + e AS second_column ) SELECT a AS first_column, (a + b) / 2 AS third_column FROM foo AS bar Line Breaks When controlling line breaks, we are trying to achieve a few different things: 1. Do we have enough line breaks that line length doesn't become excessive. Long lines are hard to read, especially given that readers may be on varying screen sizes or have multiple windows open. This is (of course) configurable, but the default is 80 characters (in line with the dbt Labs SQL style guide.) 2. Is the positioning of blank lines (i.e. lines with nothing other than whitespace on them) appropriate. There are some circumstances where a blank line is desired (e.g. between CTEs). There are others where they are not, in particular multiple blank lines, for example at the beginning of a file. 3. Where we do have line breaks, are they positioned appropriately and consistently with regards to other elements around them. This is most common when it comes to commas, and whether they should be leading (e.g. , my_column) or trailing (e.g. my_column,). In less common cases, it may also be desirable for some elements to have both a line break before and after (e.g. a set operator such as UNION). Indentation Lastly, given we have multiple lines of SQL, to what extent should we indent some lines to provide visual cues to the structure of that SQL. It's important to note that SQL is not whitespace sensitive in its interpretation and that means that any principles we apply here are entirely for the benefit of humans. Your database doesn't care. The indentation therefore should be treated as a hint to the reader of the structure of the code. This explains the common practice within most languages that nested elements (for example the contents of a set of brackets in a function call) should be indented one step from the outer elements. It's also convention that elements with the same level in a nested structure should have the same indentation, at least with regards to their local surroundings. As an example: SELECT nested_within_select AS first_column, some_function( nested_within_function, also_nested_within_function ) AS indented_the_same_as_opening_bracket FROM indented_the_same_as_select Comment Indents NOTE: The notes here about block comments are not implemented prior to 2.0.x. They should be coming in that release or soon after. Comments are dealt with differently, depending on whether they're block comments (/* like this */), which might optionally include newlines, or inline comments (-- like this) which are necessarily only on one line. • Block comments cannot share a line with any code elements (so in effect they must start on their own new line), they cannot be followed by any code elements on the same line (and so in effect must be followed by a newline, if we are to avoid trailing whitespace). None of the lines within the block comment may have an indent less than the first line of the block comment (although additional indentation within a comment is allowed), and that first line should be aligned with the first code element following the block comment. SELECT /* This is a block comment starting on a new line which contains a newline (continuing with at least the same indent. - potentially containing greater indents - having no other code following it in the same line - and aligned with the line of code following it */ this_column as what_we_align_the_column_to FROM my_table • Inline comments can be on the same line as other code, but are subject to the same line-length restrictions. If they don't fit on the same line (or if it just looks nicer) they can also be the only element on a line. In this latter case, they should be aligned with the first code element following the comment. SELECT -- This is fine this_column as what_we_align_to, another_column as something_short, -- Is ok case -- This is aligned correctly with below when indented then take_care else try_harder end as the_general_guidance -- Even here we align with the line below FROM my_table NOTE: When fixing issues with comment indentation, SQLFluff will attempt to keep comments in their original position but if line length concerns make this difficult, it will either abandon the fix, or move same line comments up and before the line they are currently on. This is in line with the assumption that comments on their own line refer to the elements of code which they come before, not after. Hanging Indents One approach to indenting nested elements is a layout called a hanging indent. In this layout, there is no line break before the first nested element, but subsequent elements are indented to match the line position of that first element. Two examples might be: -- A select statement with two hanging indents: SELECT no_line_break_before_me, indented_to_match_the_first, 1 + (a + b) AS another_more_complex_example FROM my_table; -- This TSQL example is also in essence a hanging indent: DECLARE @prv_qtr_1st_dt DATETIME, @last_qtr INT, @last_qtr_first_mn INT, @last_qtr_yr INT; In some circumstances this layout can be quite neat (the DECLARE statement is a good example of this), however once indents are nested or indentation styles are mixed it can rapidly become confusing (as partially shown in the first example). Additionally, unless the leading element of the first line is very short, hanging indents use much larger indents than a traditional simple indent where a line break is used before the first element. Hanging indents have been supported in SQLFluff up to the 1.x versions, however they will no longer by supported from 2.0.0 onwards. This is due to the ambiguity which they bring to fixing poorly formatted SQL. Take the following code: SELECT this_is, badly_formatted, code_and, not_obvious, what_was, intended FROM my_table Given the lack of line break between SELECT and this_is, it would appear that the user is intending a hanging indent, however it is also plausible that they did not and they just forgot to add a line break between them. This ambiguity is unhelpful, both for SQLFluff as a tool, but also for people who write SQL that there two ways of indenting their SQL. Given SQLFluff aims to provide consistency in SQL layout and remove some of the burden of needing to make choices like this - and that it would be very unusual to keep only hanging indents and disable traditional ones - the only route left to consistency is to not allow hanging indents. Starting in 2.0.0, any hanging indents detected will be converted to traditional indents. Templated Indents SQLFluff supports templated elements in code, such as those offered by jinja2 (or dbt which relies on it). For simple cases, templated elements are handled as you would expect by introducing additional indents into the layout. SELECT a, {% for n in ['b', 'c', 'd'] %} -- This section is indented relative to 'a' because -- it is inside a jinja for loop. {{ n }}, {% endfor %} e FROM my_table This functionality can be turned off if you wish using the template_blocks_indent option in your Configuration. It's important to note here, that SQLFluff lints the code after it has been rendered, and so only has access to code which is still present after that process. SELECT a, {% if False %} -- This section of the code cannot be linted because -- it is never rendered due to the `if False` condition. my + poorly + spaced - and/indented AS section_of_code {% endif %} e FROM my_table More complex templated cases are usually characterised by templated tags cutting across the parse tree. This more formally is where the opening and closing tags of a templated section exist at different levels in the parsed structure. Starting in version 2.x, these will be treated differently (Prior to version 2.x, situations like this were sometimes handled inconsistently or incorrectly). Indentation should act as a visual cue to the structure of the written SQL, and as such, the most important thing is that template tags belonging to the same block structure use the same indentation. In the example below, this is the opening and closing elements of the second if statement. If treated as a simple case, these tags would have different indents, because they are at different levels of the parse tree and so clearly there is a conflict to be resolved. The view SQLFluff takes on how to resolve this conflict is to pull all of the tags in this section down to the indent of the least indented (in the example below that would be the closing endif tag). This is similar to the treatment of C Preprocessor Directives, which are treated somewhat as being outside the structure of the rest of the file. In these cases, the content is also not further indented as in the simple case because it makes it harder to line up elements within the affected section and outside (in the example below the SELECT and FROM are a good illustration). SELECT a, {% if True %} -- This is a simple case. The opening and closing tag are -- both at the same level within the SELECT clause. simple_case AS example, {% endif %} b, {% if True %} -- This is a complex case. The opening tag is within the SELECT -- clause, but the closing tag is outside the statement -- entirely. complex_case AS example FROM table_option_one {% else %} complex_case_two AS example FROM table_option_two {% endif %} Configuring Layout Configuration for layout is spread across three places: 1. Indent behavior for particular dialect elements is controlled by the parser. This is because in the background SQLFluff inserts Indent and Dedent tokens into the parse tree where those things are expected. For more detail see Configuring indent locations. 2. Configuration for the spacing and line position of particular types of element (such as commas or operators) is set in the layout section of the config file. For more detail see Configuring layout and spacing. 3. Some elements of layout are still controlled by rules directly. These are usually very specific cases, see Rules Reference for more details. Configuring indent locations One of the key areas for this is the indentation of the JOIN expression, which we'll use as an example. Semantically, a JOIN expression is part of the FROM expression and therefore would be expected to be indented. However according to many of the most common SQL style guides (including the dbt Labs SQL style guide and the Mozilla SQL style guide) the JOIN keyword is expected to at the same indent as the FROM keyword. By default, SQLFluff sides with the current consensus, which is to not indent the JOIN keyword, however this is one element which is configurable. By setting values in the sqlfluff:indentation section of your config file you can control how this is parsed. For example, the default indentation would be as follows: SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2 By setting your config file to: [sqlfluff:indentation] indented_joins = True Then the expected indentation will be: SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2 There is a similar indented_using_on config (defaulted to True) which can be set to False to prevent the USING or ON clause from being indented, in which case the original SQL would become: SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2 There is also a similar indented_on_contents config (defaulted to True) which can be set to False to align any AND subsections of an ON block with each other. If set to False the original SQL would become: SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2 These can also be combined, so if indented_using_on config is set to False, and indented_on_contents is also set to False then the SQL would become: SELECT a, b FROM my_table JOIN another_table ON condition1 AND condition2 There is also a similar indented_ctes config (defaulted to False) which can be set to True to enforce CTEs to be indented within the WITH clause: WITH some_cte AS ( SELECT 1 FROM table1 ), some_other_cte AS ( SELECT 1 FROM table1 ) SELECT 1 FROM some_cte By default, SQLFluff aims to follow the most common approach to indentation. However, if you have other versions of indentation which are supported by published style guides, then please submit an issue on GitHub to have that variation supported by SQLFluff. Configuring layout and spacing The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. The syntax of this section is very expressive; however in normal use, only very small alterations should be necessary from the Default Configuration. The syntax of the section headings here select by type, which corresponds to the type defined in the dialect. For example the following section applies to elements of the type comma, i.e. ,. [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing Within these configurable sections there are a few key elements which are available: • Spacing Elements: spacing_before, spacing_after and spacing_within. For each of these options, there are a few possible settings: • The default spacing for all elements is single unless otherwise specified. In this state, elements will be spaced with a single space character unless there is a line break between them. • The value of touch allows line breaks, but if no line break is present, then no space should be present. A great example of this is the spacing before commas (as shown in the config above), where line breaks may be allowed, but if not they should touch the element before. • The value of inline is effectively the same as touch but in addition, no line breaks are allowed. This is best illustrated by the spacing found in a qualified identifier like my_schema.my_table. • Line Position: set using the line_position option. By default this is unset, which implies no particular line position requirements. The available options are: • trailing and leading, which are most common in the placement of commas. Both of these settings also allow the option of a comma on its own on a line, or in the middle of a line, but if there is a line break on either side then they make sure it's on the correct side. By default we assume trailing commas, but if you (or your organisation) have settled on leading commas then you should add the following section to your config: [sqlfluff:layout:type:comma] line_position = leading • alone, which means if there is a line break on either side, then there must be a line break on both sides (i.e. that it should be the only thing on that line. • All of the above options can be qualified with the :strict modifier - which prevents the inline case. For example: -- Setting line_position to just `alone` -- within [sqlfluff:layout:type:set_operator] -- would not allow: SELECT a UNION SELECT b; -- ...or... SELECT a UNION SELECT b; -- but *would* allow both of the following: SELECT a UNION SELECT b; SELECT a UNION SELECT b; -- However the default is set to `alone:strict` -- then the *only* acceptable configuration is: SELECT a UNION SELECT b; Rules Reference Rules in SQLFluff are implemented as crawlers. These are entities which work their way through the parsed structure of a query to evaluate a particular rule or set of rules. The intent is that the definition of each specific rule should be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away. Core Rules Certain rules belong to the core rule group. In order for a rule to be designated as core, it must meet the following criteria: • Stable • Applies to most dialects • Could detect a syntax issue • Isn’t too opinionated toward one style (e.g. the dbt style guide) Core rules can also make it easier to roll out SQLFluff to a team by only needing to follow a 'common sense' subset of rules initially, rather than spending time understanding and configuring all the rules, some of which your team may not necessarily agree with. We believe teams will eventually want to enforce more than just the core rules, and we encourage everyone to explore all the rules and customize a rule set that best suites their organization. See the Configuration section for more information on how to enable only core rules by default. Specific Rules Inline Ignoring Errors SQLFluff features inline error ignoring. For example, the following will ignore the lack of whitespace surrounding the * operator. a.a*a.b AS bad_1 -- noqa: L006 Multiple rules can be ignored by placing them in a comma-delimited list. a.a * a.b AS bad_2, -- noqa: L007, L006 It is also possible to ignore non-rule based errors, and instead opt to ignore templating (TMP) & parsing (PRS) errors. WHERE col1 = 2 AND dt >= DATE_ADD(CURRENT_DATE(), INTERVAL -2 DAY) -- noqa: PRS NOTE: It should be noted that ignoring TMP and PRS errors can lead to incorrect sqlfluff lint and sqfluff fix results as SQLFluff can misinterpret the SQL being analysed. Should the need arise, not specifying specific rules to ignore will ignore all rules on the given line. a.a*a.b AS bad_3 -- noqa Ignoring line ranges Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive. -- Ignore rule L012 from this line forward SELECT col_a a FROM foo -- noqa: disable=L012 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all Dialects Reference SQLFluff is designed to be flexible in supporting a variety of dialects. Not all potential dialects are supported so far, but several have been implemented by the community. Below are a list of the currently available dialects. Each inherits from another, up to the root ansi dialect. For a canonical list of supported dialects, run the sqlfluff dialects command, which will output a list of the current dialects available on your installation of SQLFluff. NOTE: For technical users looking to add new dialects or add new features to existing ones, the dependent nature of how dialects have been implemented is to try and reduce the amount of repetition in how different elements are defined. As an example, when we say that the Redshift dialect inherits from the PostgreSQL dialect this is not because there is an agreement between those projects which means that features in one must end up in the other, but that the design of the Redshift dialect was heavily inspired by the postgres dialect and therefore when defining the dialect within sqlfuff it makes sense to use PostgreSQL as a starting point rather than starting from scratch. Consider when adding new features to a dialect: • Should I be adding it just to this dialect, or adding it to a parent dialect? • If I'm creating a new dialect, which dialect would be best to inherit from? • Will the feature I'm adding break any downstream dependencies within dialects which inherit from this one? ANSI This is the base dialect which holds most of the definitions of common SQL commands and structures. If the dialect which you're actually using isn't specifically implemented by SQLFluff, using this dialect is a good place to start. This dialect doesn't intend to be brutal in adhering to (and only to) the ANSI SQL spec (mostly because ANSI charges for access to that spec). It aims to be a representation of vanilla SQL before any other project adds their spin to it, and so may contain a slightly wider set of functions than actually available in true ANSI SQL. Athena The dialect for Amazon Athena. BigQuery The dialect for Google BigQuery. ClickHouse The dialect for ClickHouse. Databricks The dialect Databricks is an alias for the SparkSQL. Since Databricks builds on top of Apache Spark, the Spark SQL dialect holds most of the definitions of common commands and structures. Specifics to Databricks, such as Delta Live Table syntax, are added to the Spark SQL dialect to simplify implementation and prevent code duplication for minor syntax updates. This follows SQLFluff's philosophy of not being strict in adhering to dialect specifications to permit slightly wider set of functions than actually available in a given dialect. Db2 The dialect for Db2. Exasol The dialect for Exasol. Hive The dialect for Hive. Materialize The dialect for Materialize. MySQL The dialect for MySQL. Oracle The dialect for Oracle SQL. Note: this does not include PL/SQL. PostgreSQL This is based around the PostgreSQL spec. Many other SQL instances are often based on PostreSQL syntax. If you're running an unsupported dialect, then this is often the dialect to use (until someone makes a specific dialect). Redshift The dialect for Amazon Redshift. Snowflake The dialect for Snowflake, which has much of its syntax inherited from PostgreSQL. SOQL The dialect for SOQL (Salesforce Object Query Language). SparkSQL The dialect for Apache Spark SQL. It inherits from ANSI and includes relevant syntax from Hive for commands that permit Hive Format. Spark SQL extensions provided by the Delta Lake project are also implemented in this dialect. This implementation focuses on the Ansi Compliant Mode introduced in Spark3, instead of being Hive Compliant. The introduction of ANSI Compliance provides better data quality and easier migration from traditional DBMS. Versions of Spark prior to 3.x will only support the Hive dialect. SQLite The dialect for SQLite. T-SQL The dialect for T-SQL (aka Transact-SQL). Teradata The dialect for Teradata. Production Usage SQLFluff is designed to be used both as a utility for developers but also to be part of CI/CD pipelines. Using SQLFluff on a whole sql codebase The exit code provided by SQLFluff when run as a command line utility is designed to assist usefulness in deployment pipelines. If no violations are found then the exit code will be 0. If violations are found then a non-zero code will be returned which can be interrogated to find out more. • At the moment all error states related to linting return 65. • An error as a result of a SQLFluff internal error will return 1. Using SQLFluff on changes using diff-quality For projects with large amounts of (potentially imperfect) SQL code, the full SQLFluff output could be very large, which can be distracting -- perhaps the CI build for a one-line SQL change shouldn't encourage the developer to fix lots of unrelated quality issues. To support this use case, SQLFluff integrates with a quality checking tool called diff-quality. By running SQLFluff using diff-quality (rather than running it directly), you can limit the the output to the new or modified SQL in the branch (aka pull request or PR) containing the proposed changes. Currently, diff-quality requires that you are using git for version control. NOTE: Installing SQLFluff automatically installs the diff_cover package that provides the diff-quality tool. Adding diff-quality to your builds In your CI build script: 1. Set the current working directory to the git repository containing the SQL code to be checked. 2. Run diff-quality, specifying SQLFluff as the underlying tool: $ diff-quality --violations sqlfluff The output will look something like: ------------- Diff Quality Quality Report: sqlfluff Diff: origin/master...HEAD, staged and unstaged changes ------------- sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql (0.0%): sql/audience_size_queries/constraints/_postcondition_check_gdpr_compliance.sql:5: Unquoted Identifiers must be consistently upper case. ------------- Total: 1 line Violations: 1 line % Quality: 0% ------------- These messages are basically the same as those provided directly by SQLFluff, although the format is a little different. Note that diff-quality only lists the line _numbers_, not the character position. If you need the character position, you will need to run SQLFluff directly. For more information on diff-quality, see the documentation. It covers topics such as: • Generating HTML reports • Controlling which branch to compare against (i.e. to determine new/changed lines). The default is origin/master. • Configuring diff-quality to return an error code if the quality is too low • Troubleshooting Using pre-commit pre-commit is a framework to manage git "hooks" triggered right before a commit is made. A git hook is a git feature to "fire off custom scripts" when specific actions occur. Using pre-commit with SQLFluff is a good way to provide automated linting to SQL developers. With pre-commit, you also get the benefit of only linting/fixing the files that changed. SQLFluff comes with two pre-commit hooks: • sqlfluff-lint: returns linting errors. • sqlfluff-fix: attempts to fix rule violations. WARNING: For safety reasons, sqlfluff-fix by default will not make any fixes in files that had templating or parse errors, even if those errors were ignored using noqa or --ignore`. Although it is not advised, you can tell SQLFluff to try and fix these files by overriding the fix_even_unparsable setting in .sqlfluff config file or using the sqlfluff fix --FIX-EVEN-UNPARSABLE command line option. Overriding this behavior may break your SQL. If you use this override, always be sure to review any fixes applied to files with templating or parse errors to verify they are okay. You should create a file named .pre-commit-config.yaml at the root of your git project, which should look like this: repos: - repo: https://github.com/sqlfluff/sqlfluff rev: 1.4.5 hooks: - id: sqlfluff-lint # For dbt projects, this installs the dbt "extras". # You will need to select the relevant dbt adapter for your dialect # (https://docs.getdbt.com/docs/available-adapters): # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt'] - id: sqlfluff-fix # Arbitrary arguments to show an example # args: [--rules, "L003,L014"] # additional_dependencies: ['<dbt-adapter>', 'sqlfluff-templater-dbt'] When trying to use the dbt templater, uncomment the additional_dependencies to install the extras. This is equivalent to running pip install <dbt-adapter> sqlfluff-templater-dbt. You can specify the version of dbt-adapter used in pre-commit, for example: additional_dependencies : ['dbt-bigquery==1.0.0', 'sqlfluff-templater-dbt'] See the list of available dbt-adapters. Note that you can pass the same arguments available through the CLI using args:. Using GitHub Actions to Annotate PRs There are two way to utilize SQLFluff to annotate Github PRs. 1. When sqlfluff lint is run with the --format github-annotation-native option, it produces output formatted as Github workflow commands which are converted into pull request annotations by Github. 2. When sqlfluff lint is run with the --format github-annotation option, it produces output compatible with this action. Which uses Github API to annotate the SQL in GitHub pull requests. For more information and examples on using SQLFluff in GitHub Actions, see the sqlfluff-github-actions repository. Configuration SQLFluff accepts configuration either through the command line or through configuration files. There is rough parity between the two approaches with the exception that templating configuration must be done via a file, because it otherwise gets slightly complicated. For details of what's available on the command line check out the CLI Reference. For file based configuration SQLFluff will look for the following files in order. Later files will (if found) will be used to overwrite any values read from earlier files. • setup.cfg • tox.ini • pep8.ini • .sqlfluff • pyproject.toml Within these files, the first four will be read like a cfg file, and SQLFluff will look for sections which start with sqlfluff, and where subsections are delimited by a semicolon. For example the jinjacontext section will be indicated in the section started with [sqlfluff:jinjacontext]. For example, a snippet from a .sqlfluff file (as well as any of the supported cfg file types): [sqlfluff] templater = jinja sql_file_exts = .sql,.sql.j2,.dml,.ddl [sqlfluff:indentation] indented_joins = false indented_using_on = true template_blocks_indent = false [sqlfluff:templater] unwrap_wrapped_queries = true [sqlfluff:templater:jinja] apply_dbt_builtins = true For the pyproject.toml file, all valid sections start with tool.sqlfluff and subsections are delimited by a dot. For example the jinjacontext section will be indicated in the section started with [tool.sqlfluff.jinjacontext]. For example, a snippet from a pyproject.toml file: [tool.sqlfluff.core] templater = jinja sql_file_exts = .sql,.sql.j2,.dml,.ddl [tool.sqlfluff.indentation] indented_joins = false indented_using_on = true template_blocks_indent = false [tool.sqlfluff.templater] unwrap_wrapped_queries = true [tool.sqlfluff.templater.jinja] apply_dbt_builtins = true Nesting SQLFluff uses nesting in its configuration files, with files closer overriding (or patching, if you will) values from other files. That means you'll end up with a final config which will be a patchwork of all the values from the config files loaded up to that path. The exception to this is the value for templater, which cannot be set in config files in subdirectories of the working directory. You don't need any config files to be present to make SQLFluff work. If you do want to override any values though SQLFluff will use files in the following locations in order, with values from later steps overriding those from earlier: 0. [...and this one doesn't really count] There's a default config as part of the SQLFluff package. You can find this below, in the Default Configuration section. 1. It will look in the user's os-specific app config directory. On OSX this is ~/Library/Preferences/sqlfluff, Unix is ~/.config/sqlfluff, Windows is <home>\AppData\Local\sqlfluff\sqlfluff, for any of the filenames above in the main Configuration section. If multiple are present, they will patch/override each other in the order above. 2. It will look for the same files in the user's home directory (~). 3. It will look for the same files in the current working directory. 4. [if parsing a file in a subdirectory of the current working directory] It will look for the same files in every subdirectory between the current working dir and the file directory. 5. It will look for the same files in the directory containing the file being linted. This whole structure leads to efficient configuration, in particular in projects which utilise a lot of complicated templating. In-File Configuration Directives In addition to configuration files mentioned above, SQLFluff also supports comment based configuration switching in files. This allows specific SQL file to modify a default configuration if they have specific needs. When used, these apply to the whole file, and are parsed from the file in an initial step before the rest of the file is properly parsed. This means they can be used for both rule configuration and also for parsing configuration. To use these, the syntax must start as an inline sql comment beginning with sqlfluff (i.e. -- sqlfluff). The line is then interpreted as a colon-seperated address of the configuation value you wish to set. A few common examples are shown below: -- Set Indented Joins -- sqlfluff:indentation:indented_joins:true -- Set a smaller indent for this file -- sqlfluff:indentation:tab_space_size:2 -- Set keywords to be capitalised -- sqlfluff:rules:L010:capitalisation_policy:upper SELECT * FROM a JOIN b USING(c) We recommend only using this configuration approach for configuration that applies to one file in isolation. For configuration changes for areas of a project or for whole projects we recommend Nesting of configuration files. Rule Configuration Rules can be configured with the .sqlfluff config files. Common rule configurations can be set in the [sqlfluff:rules] section. For example: [sqlfluff:rules] tab_space_size = 4 max_line_length = 80 indent_unit = space allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all Rule specific configurations are set in rule specific subsections. For example, enforce that keywords are upper case by configuring the rule L010: [sqlfluff:rules:L010] # Keywords capitalisation_policy = upper All possible options for rule sections are documented in Rules Reference. For an overview of the most common rule configurations that you may want to tweak, see Default Configuration (and use Rules Reference to find the available alternatives). Enabling and Disabling Rules To disable individual rules, set exclude_rules in the top level section of sqlfluff configuration. The value is a comma separated list of rule ids. For example, to disable the rules L022 and L027: [sqlfluff] exclude_rules = L022, L027 To enable individual rules, configure rules, respectively. For example, to enable L027: [sqlfluff] rules = L027 If both exclude_rules and rules have non-empty value, then the excluded rules are removed from the rules list. This allows for example enabling common rules on top level but excluding some on subdirectory level. Rules can also be enabled/disabled by their grouping. Right now, the only rule grouping is core. This will enable (or disable) a select group of rules that have been deemed 'core rules'. [sqlfluff] rules = core More information about 'core rules' can be found in the Rules Reference. Additionally, some rules have a special force_enable configuration option, which allows to enable the given rule even for dialects where it is disabled by default. The rules that support this can be found in the Rules Reference. The default values can be seen in Default Configuration. See also: Ignoring Errors & Files. Downgrading rules to warnings To keep displaying violations for specific rules, but not have those issues lead to a failed run, rules can be downgraded to warnings. Rules set as warnings won't cause a file to fail, but will still be shown in the CLI to warn users of their presence. The configuration of this behaves very like exclude_rules above: [sqlfluff] warnings = L019, L007 With this configuration, files with no other issues (other than those set to warn) will pass. If there are still other issues, then the file will still fail, but will show both warnings and failures. == [test.sql] PASS L: 2 | P: 9 | L006 | WARNING: Missing whitespace before + == [test2.sql] FAIL L: 2 | P: 8 | L014 | Unquoted identifiers must be consistently upper case. L: 2 | P: 11 | L006 | WARNING: Missing whitespace before + This is particularly useful as a transitional tool when considering the introduction on new rules on a project where you might want to make users aware of issues without blocking their workflow (yet). Layout & Spacing Configuration The [sqlfluff:layout] section of the config controls the treatment of spacing and line breaks across all rules. To understand more about this section, see the section of the docs dedicated to layout: Configuring Layout. Jinja Templating Configuration When thinking about Jinja templating there are two different kinds of things that a user might want to fill into a templated file, variables and functions/macros. Currently functions aren't implemented in any of the templaters. Variable Templating Variables are available in the jinja, python and placeholder templaters. By default the templating engine will expect variables for templating to be available in the config, and the templater will be look in the section corresponding to the context for that templater. By convention, the config for the jinja templater is found in the sqlfluff:templater:jinja:context section, the config for the *python templater is found in the sqlfluff:templater:python:context section, the one for the placeholder templater is found in the sqlfluff:templater:placeholder:context section For example, if passed the following .sql file: SELECT {{ num_things }} FROM {{ tbl_name }} WHERE id > 10 LIMIT 5 ...and the following configuration in .sqlfluff in the same directory: [sqlfluff:templater:jinja:context] num_things=456 tbl_name=my_table ...then before parsing, the sql will be transformed to: SELECT 456 FROM my_table WHERE id > 10 LIMIT 5 NOTE: If there are variables in the template which cannot be found in the current configuration context, then this will raise a SQLTemplatingError and this will appear as a violation without a line number, quoting the name of the variable that couldn't be found. Placeholder templating Libraries such as SQLAlchemy or Psycopg use different parameter placeholder styles to mark where a parameter has to be inserted in the query. For example a query in SQLAlchemy can look like this: SELECT * FROM table WHERE id = :myid At runtime :myid will be replace by a value provided by the application and escaped as needed, but this is not standard SQL and cannot be parsed as is. In order to parse these queries is then necessary to replace these placeholders with sample values, and this is done with the placeholder templater. Placeholder templating can be enabled in the config using: [sqlfluff] templater = placeholder A few common styles are supported: -- colon WHERE bla = :my_name -- colon_nospaces -- (use with caution as more prone to false positives) WHERE bla = table:my_name -- numeric_colon WHERE bla = :2 -- pyformat WHERE bla = %(my_name)s -- dollar WHERE bla = $my_name or WHERE bla = ${my_name} -- question_mark WHERE bla = ? -- numeric_dollar WHERE bla = $3 or WHERE bla = ${3} -- percent WHERE bla = %s -- ampersand WHERE bla = &s or WHERE bla = &{s} or USE DATABASE MARK_{ENV} These can be configured by setting param_style to the names above: [sqlfluff:templater:placeholder] param_style = colon my_name = 'john' then it is necessary to set sample values for each parameter, like my_name above. Notice that the value needs to be escaped as it will be replaced as a string during parsing. When parameters are positional, like question_mark, then their name is simply the order in which they appear, starting with 1. [sqlfluff:templater:placeholder] param_style = question_mark 1 = 'john' In case you need a parameter style different from the ones above, you can pass a custom regex. [sqlfluff:templater:placeholder] param_regex = __(?P<param_name>[\w_]+)__ my_name = 'john' N.B. quotes around param_regex in the config are interpreted literally by the templater. e.g. param_regex='__(?P<param_name>[w_]+)__' matches '__some_param__' not __some_param__ the named parameter param_name will be used as the key to replace, if missing, the parameter is assumed to be positional and numbers are used instead. Also consider making a pull request to the project to have your style added, it may be useful to other people and simplify your configuration. Complex Variable Templating Two more advanced features of variable templating are case sensitivity and native python types. Both are illustrated in the following example: [sqlfluff:templater:jinja:context] my_list = ['a', 'b', 'c'] MY_LIST = ("d", "e", "f") my_where_dict = {"field_1": 1, "field_2": 2} SELECT {% for elem in MY_LIST %} '{{elem}}' {% if not loop.last %}||{% endif %} {% endfor %} as concatenated_list FROM tbl WHERE {% for field, value in my_where_dict.items() %} {{field}} = {{value}} {% if not loop.last %}and{% endif %} {% endfor %} ...will render as... SELECT 'd' || 'e' || 'f' as concatenated_list FROM tbl WHERE field_1 = 1 and field_2 = 2 Note that the variable was replaced in a case sensitive way and that the settings in the config file were interpreted as native python types. Macro Templating Macros (which also look and feel like functions are available only in the jinja templater. Similar to Variable Templating, these are specified in config files, what's different in this case is how they are named. Similar to the context section above, macros are configured separately in the macros section of the config. Consider the following example. If passed the following .sql file: SELECT {{ my_macro(6) }} FROM some_table ...and the following configuration in .sqlfluff in the same directory (note the tight control of whitespace): [sqlfluff:templater:jinja:macros] a_macro_def = {% macro my_macro(n) %}{{ n }} + {{ n * 2 }}{% endmacro %} ...then before parsing, the sql will be transformed to: SELECT 6 + 12 FROM some_table Note that in the code block above, the variable name in the config is a_macro_def, and this isn't apparently otherwise used anywhere else. Broadly this is accurate, however within the configuration loader this will still be used to overwrite previous values in other config files. As such this introduces the idea of config blocks which could be selectively overwritten by other configuration files downstream as required. In addition to macros specified in the config file, macros can also be loaded from files or folders. This is specified in the config file: [sqlfluff:templater:jinja] load_macros_from_path = my_macros load_macros_from_path is a comma-separated list of .sql files or folders. Locations are relative to the config file. For example, if the config file above was found at /home/my_project/.sqlfluff, then SQLFluff will look for macros in the folder /home/my_project/my_macros/ (but not subfolders). Any macros defined in the config will always take precedence over a macro defined in the path. • .sql files: Macros in these files are available in every .sql file without requiring a Jinja include or import. • Folders: To use macros from the .sql files in folders, use Jinja include or import as explained below. Note: The load_macros_from_path setting also defines the search path for Jinja include or import. Unlike with macros (as noted above), subdirectories are supported. For example, if load_macros_from_path is set to my_macros, and there is a file my_macros/subdir/my_file.sql, you can do: {% include 'subdir/include_comment.sql' %} NOTE: Throughout the templating process whitespace will still be treated rigorously, and this includes newlines. In particular you may choose to provide dummy macros in your configuration different from the actual macros used in production. REMEMBER: The reason SQLFluff supports macros is to enable it to parse templated sql without it being a blocker. It shouldn't be a requirement that the templating is accurate - it only needs to work well enough that parsing and linting are helpful. Builtin Macro Blocks One of the main use cases which inspired SQLFluff as a project was dbt. It uses jinja templating extensively and leads to some users maintaining large repositories of sql files which could potentially benefit from some linting. NOTE: SQLFluff has now a tighter integration with dbt through the "dbt" templater. It is the recommended templater for dbt projects. If used, it eliminates the need for the overrides described in this section. To use the dbt templater, go to dbt Project Configuration. SQLFluff anticipates this use case and provides some built in macro blocks in the Default Configuration which assist in getting started with dbt projects. In particular it provides mock objects for: • ref: The mock version of this provided simply returns the model reference as the name of the table. In most cases this is sufficient. • config: A regularly used macro in dbt to set configuration values. For linting purposes, this makes no difference and so the provided macro simply returns nothing. NOTE: If there are other builtin macros which would make your life easier, consider submitting the idea (or even better a pull request) on github. Interaction with --ignore=templating Ignoring Jinja templating errors provides a way for users to use SQLFluff while reducing or avoiding the need to spend a lot of time adding variables to [sqlfluff:templater:jinja:context]. When --ignore=templating is enabled, the Jinja templater behaves a bit differently. This additional behavior is usually but not always helpful for making the file at least partially parsable and fixable. It definitely doesn’t guarantee that every file can be fixed, but it’s proven useful for some users. Here's how it works: • Within the expanded SQL, undefined variables are automatically replaced with the corresponding string value. • If you do: {% include query %}, and the variable query is not defined, it returns a “file” containing the string query. • If you do: {% include "query_file.sql" %}, and that file does not exist or you haven’t configured a setting for load_macros_from_path, it returns a “file” containing the text query_file. For example: select {{ my_variable }} from {% include "my_table.sql" %} is interpreted as: select my_variable from my_table The values provided by the Jinja templater act a bit (not exactly) like a mixture of several types: • str • int • list • Jinja's Undefined class Because the values behave like Undefined, it's possible to replace them using Jinja's default() filter. For example: select {{ my_variable | default("col_a") }} from my_table is interpreted as: select col_a from my_table Library Templating If using SQLFluff for dbt with jinja as your templater, you may have library function calls within your sql files that can not be templated via the normal macro templating mechanisms: SELECT foo, bar FROM baz {{ dbt_utils.group_by(2) }} To template these libraries, you can use the sqlfluff:jinja:library_path config option: [sqlfluff:templater:jinja] library_path = sqlfluff_libs This will pull in any python modules from that directory and allow sqlfluff to use them in templates. In the above example, you might define a file at sqlfluff_libs/dbt_utils.py as: def group_by(n): return "GROUP BY 1,2" If an __init__.py is detected, it will be loaded alongside any modules and submodules found within the library path. SELECT {{ custom_sum('foo', 'bar') }}, {{ foo.bar.another_sum('foo', 'bar') }} FROM baz sqlfluff_libs/__init__.py: def custom_sum(a: str, b: str) -> str: return a + b sqlfluff_libs/foo/__init__.py: # empty file sqlfluff_libs/foo/bar.py: def another_sum(a: str, b: str) -> str: return a + b dbt Project Configuration NOTE: From sqlfluff version 0.7.0 onwards, the dbt templater has been moved to a separate plugin and python package. Projects that were already using the dbt templater may initially fail after an upgrade to 0.7.0+. See the installation instructions below to install the dbt templater. dbt templating is still a relatively new feature added in 0.4.0 and is still in very active development! If you encounter an issue, please let us know in a GitHub issue or on the SQLFluff slack workspace. dbt is not the default templater for SQLFluff (it is jinja). dbt is a complex tool, so using the default jinja templater will be simpler. You should be aware when using the dbt templater that you will be exposed to some of the complexity of dbt. Users may wish to try both templaters and choose according to how they intend to use SQLFluff. A simple rule of thumb might be: • If you are using SQLFluff in a CI/CD context, where speed is not critical but accuracy in rendering sql is, then the dbt templater may be more appropriate. • If you are using SQLFluff in an IDE or on a git hook, where speed of response may be more important, then the jinja templater may be more appropriate. Pros: • Most (potentially all) macros will work Cons: • More complex, e.g. using it successfully may require deeper understanding of your models and/or macros (including third-party macros) • More configuration decisions to make • Best practices are not yet established or documented • If your dbt model files access a database at compile time, using SQLFluff with the dbt templater will also require access to a database. • Note that you can often point SQLFluff and the dbt templater at a test database (i.e. it doesn't have to be the production database). • Runs slower Installation & Configuration In order to get started using SQLFluff with a dbt project you will first need to install the relevant dbt adapter for your dialect and the sqlfluff-templater-dbt package using your package manager of choice (e.g. pip install dbt-postgres sqlfluff-templater-dbt) and then will need the following configuration: In .sqlfluff: [sqlfluff] templater = dbt In .sqlfluffignore: target/ # dbt <1.0.0 dbt_modules/ # dbt >=1.0.0 dbt_packages/ macros/ You can set the dbt project directory, profiles directory and profile with: [sqlfluff:templater:dbt] project_dir = <relative or absolute path to dbt_project directory> profiles_dir = <relative or absolute path to the directory that contains the profiles.yml file> profile = <dbt profile> target = <dbt target> NOTE: If the profiles_dir setting is omitted, SQLFluff will look for the profile in the default location, which varies by operating system. On Unix-like operating systems (e.g. Linux or macOS), the default profile directory is ~/.dbt/. On Windows, you can determine your default profile directory by running dbt debug --config-dir. To use builtin dbt Jinja functions SQLFluff provides a configuration option that enables usage within templates. [sqlfluff:templater:jinja] apply_dbt_builtins = True This will provide dbt macros like ref, var, is_incremental(). If the need arises builtin dbt macros can be customised via Jinja macros in .sqlfluff configuration file. [sqlfluff:templater:jinja:macros] # Macros provided as builtins for dbt projects dbt_ref = {% macro ref(model_ref) %}{{model_ref}}{% endmacro %} dbt_source = {% macro source(source_name, table) %}{{source_name}}_{{table}}{% endmacro %} dbt_config = {% macro config() %}{% for k in kwargs %}{% endfor %}{% endmacro %} dbt_var = {% macro var(variable, default='') %}item{% endmacro %} dbt_is_incremental = {% macro is_incremental() %}True{% endmacro %} If your project requires that you pass variables to dbt through command line, you can specify them in template:dbt:context section of .sqlfluff. See below configuration and its equivalent dbt command: [sqlfluff:templater:dbt:context] my_variable = 1 dbt run --vars '{"my_variable": 1}' Known Caveats • To use the dbt templater, you must set templater = dbt in the .sqlfluff config file in the directory where sqlfluff is run. The templater cannot be changed in .sqlfluff files in subdirectories. • In SQLFluff 0.4.0 using the dbt templater requires that all files within the root and child directories of the dbt project must be part of the project. If there are deployment scripts which refer to SQL files not part of the project for instance, this will result in an error. You can overcome this by adding any non-dbt project SQL files to .sqlfluffignore. CLI Arguments You already know you can pass arguments (--verbose, --exclude-rules, etc.) through the CLI commands (lint, fix, etc.): $ sqlfluff lint my_code.sql -v --exclude-rules L022,L027 You might have arguments that you pass through every time, e.g rules you always want to ignore. These can also be configured: [sqlfluff] verbose = 1 exclude_rules = L022,L027 Note that while the exclude_rules config looks similar to the above example, the verbose config has an integer value. This is because verbose is stackable meaning there are multiple levels of verbosity that are available for configuration. See CLI Reference for more details about the available CLI arguments. For more details about rule exclusion, see Enabling and Disabling Rules. Ignoring Errors & Files Ignoring individual lines Similar to flake8's ignore, individual lines can be ignored by adding -- noqa to the end of the line. Additionally, specific rules can be ignored by quoting their code or the category. -- Ignore all errors SeLeCt 1 from tBl ; -- noqa -- Ignore rule L014 & rule L030 SeLeCt 1 from tBl ; -- noqa: L014,L030 -- Ignore all parsing errors SeLeCt from tBl ; -- noqa: PRS Ignoring line ranges Similar to pylint's "pylint" directive", ranges of lines can be ignored by adding -- noqa:disable=<rule>[,...] | all to the line. Following this directive, specified rules (or all rules, if "all" was specified) will be ignored until a corresponding -- noqa:enable=<rule>[,...] | all directive. -- Ignore rule L012 from this line forward SELECT col_a a FROM foo -- noqa: disable=L012 -- Ignore all rules from this line forward SELECT col_a a FROM foo -- noqa: disable=all -- Enforce all rules from this line forward SELECT col_a a FROM foo -- noqa: enable=all Ignoring types of errors General categories of errors can be ignored using the --ignore command line option or the ignore setting in .sqlfluffignore. Types of errors that can be ignored include: • lexing • linting • parsing • templating .sqlfluffignore Similar to Git's .gitignore and Docker's .dockerignore, SQLFluff supports a .sqlfluffignore file to control which files are and aren't linted. Under the hood we use the python pathspec library which also has a brief tutorial in their documentation. An example of a potential .sqlfluffignore placed in the root of your project would be: # Comments start with a hash. # Ignore anything in the "temp" path /temp/ # Ignore anything called "testing.sql" testing.sql # Ignore any ".tsql" files *.tsql Ignore files can also be placed in subdirectories of a path which is being linted and the sub files will also be applied within that subdirectory. Default Configuration The default configuration is as follows, note the Builtin Macro Blocks in section [sqlfluff:templater:jinja:macros] as referred to above. [sqlfluff] # verbose is an integer (0-2) indicating the level of log output verbose = 0 # Turn off color formatting of output nocolor = False # Supported dialects https://docs.sqlfluff.com/en/stable/dialects.html # Or run 'sqlfluff dialects' dialect = None # One of [raw|jinja|python|placeholder] templater = jinja # Comma separated list of rules to check, default to all rules = all # Comma separated list of rules to exclude, or None exclude_rules = None # The depth to recursively parse to (0 for unlimited) recurse = 0 # Below controls SQLFluff output, see max_line_length for SQL output output_line_length = 80 # Number of passes to run before admitting defeat runaway_limit = 10 # Ignore errors by category (one or more of the following, separated by commas: lexing,linting,parsing,templating) ignore = None # Warn only for rule codes (one of more rule codes, seperated by commas: e.g. L001,L002) # Also works for templating and parsing errors by using TMP or PRS warnings = None # Ignore linting errors found within sections of code coming directly from # templated code (e.g. from within Jinja curly braces. Note that it does not # ignore errors from literal code found within template loops. ignore_templated_areas = True # can either be autodetect or a valid encoding e.g. utf-8, utf-8-sig encoding = autodetect # Ignore inline overrides (e.g. to test if still required) disable_noqa = False # Comma separated list of file extensions to lint # NB: This config will only apply in the root folder sql_file_exts = .sql,.sql.j2,.dml,.ddl # Allow fix to run on files, even if they contain parsing errors # Note altering this is NOT RECOMMENDED as can corrupt SQL fix_even_unparsable = False # Very large files can make the parser effectively hang. # The more efficient check is the _byte_ limit check which # is enabled by default. The previous _character_ limit check # is still present for backward compatibility. This will be # removed in a future version. # Set either to 0 to disable. large_file_skip_char_limit = 0 large_file_skip_byte_limit = 20000 # CPU processes to use while linting. # If positive, just implies number of processes. # If negative or zero, implies number_of_cpus - specified_number. # e.g. -1 means use all processors but one. 0 means all cpus. processes = 1 [sqlfluff:indentation] # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-indent-locations indented_joins = False indented_ctes = False indented_using_on = True indented_on_contents = True template_blocks_indent = True # Layout configuration # See https://docs.sqlfluff.com/en/stable/layout.html#configuring-layout-and-spacing [sqlfluff:layout:type:comma] spacing_before = touch line_position = trailing [sqlfluff:layout:type:binary_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:statement_terminator] spacing_before = touch line_position = trailing [sqlfluff:layout:type:end_of_file] spacing_before = touch [sqlfluff:layout:type:set_operator] line_position = alone:strict [sqlfluff:layout:type:start_bracket] spacing_after = touch [sqlfluff:layout:type:end_bracket] spacing_before = touch [sqlfluff:layout:type:start_square_bracket] spacing_after = touch [sqlfluff:layout:type:end_square_bracket] spacing_before = touch [sqlfluff:layout:type:casting_operator] spacing_before = touch spacing_after = touch [sqlfluff:layout:type:comparison_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:assignment_operator] spacing_within = touch line_position = leading [sqlfluff:layout:type:object_reference] spacing_within = inline [sqlfluff:layout:type:comment] spacing_before = any spacing_after = any # Template loop tokens shouldn't dictate spacing around them. [sqlfluff:layout:type:template_loop] spacing_before = any spacing_after = any [sqlfluff:templater] unwrap_wrapped_queries = True [sqlfluff:templater:jinja] apply_dbt_builtins = True # Some rules can be configured directly from the config common to other rules [sqlfluff:rules] tab_space_size = 4 # Max line length is set by default to be in line with the dbt style guide. # https://github.com/dbt-labs/corp/blob/main/dbt_style_guide.md max_line_length = 80 indent_unit = space allow_scalar = True single_table_references = consistent unquoted_identifiers_policy = all # Some rules have their own specific config [sqlfluff:rules:L003] hanging_indents = True [sqlfluff:rules:L010] # Keywords capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L011] # Aliasing preference for tables aliasing = explicit [sqlfluff:rules:L012] # Aliasing preference for columns aliasing = explicit [sqlfluff:rules:L014] # Unquoted identifiers extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L016] # Line length ignore_comment_lines = False ignore_comment_clauses = False [sqlfluff:rules:L027] # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L026] # References must be in FROM clause # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L028] # References must be consistently used # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L029] # Keywords should not be used as identifiers. unquoted_identifiers_policy = aliases quoted_identifiers_policy = none # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L030] # Function names extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L031] # Avoid table aliases in from clauses and join conditions. # Disabled for some dialects (e.g. bigquery) force_enable = False [sqlfluff:rules:L036] wildcard_policy = single [sqlfluff:rules:L038] # Trailing commas select_clause_trailing_comma = forbid [sqlfluff:rules:L040] # Null & Boolean Literals capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L042] # By default, allow subqueries in from clauses, but not join clauses forbid_subquery_in = join [sqlfluff:rules:L047] # Consistent syntax to count all rows prefer_count_1 = False prefer_count_0 = False [sqlfluff:rules:L051] # Fully qualify JOIN clause fully_qualify_join_types = inner [sqlfluff:rules:L052] # Semi-colon formatting approach multiline_newline = False require_final_semicolon = False [sqlfluff:rules:L054] # GROUP BY/ORDER BY column references group_by_and_order_by_style = consistent [sqlfluff:rules:L057] # Special characters in identifiers unquoted_identifiers_policy = all quoted_identifiers_policy = all allow_space_in_identifier = False additional_allowed_characters = None ignore_words = None ignore_words_regex = None [sqlfluff:rules:L059] # Policy on quoted and unquoted identifiers prefer_quoted_identifiers = False ignore_words = None ignore_words_regex = None force_enable = False [sqlfluff:rules:L062] # Comma separated list of blocked words that should not be used blocked_words = None blocked_regex = None [sqlfluff:rules:L063] # Data Types extended_capitalisation_policy = consistent # Comma separated list of words to ignore for this rule ignore_words = None ignore_words_regex = None [sqlfluff:rules:L064] # Consistent usage of preferred quotes for quoted literals preferred_quoted_literal_style = consistent # Disabled for dialects that do not support single and double quotes for quoted literals (e.g. Postgres) force_enable = False [sqlfluff:rules:L066] min_alias_length = None max_alias_length = None [sqlfluff:rules:L067] # SQL type casting preferred_type_casting_style = consistent CLI Reference API Reference SQLFluff exposes a public api for other python applications to use. A basic example of this usage is given here, with the documentation for each of the methods below. """This is an example of how to use the simple sqlfluff api.""" from typing import Any, Dict, Iterator, List, Union import sqlfluff # -------- LINTING ---------- my_bad_query = "SeLEct *, 1, blah as fOO from mySchema.myTable" # Lint the given string and return an array of violations in JSON representation. lint_result = sqlfluff.lint(my_bad_query, dialect="bigquery") # lint_result = # [ # { # "code": "L010", # "line_no": 1, # "line_pos": 1, # "description": "Keywords must be consistently upper case.", # } # ... # ] # -------- FIXING ---------- # Fix the given string and get a string back which has been fixed. fix_result_1 = sqlfluff.fix(my_bad_query, dialect="bigquery") # fix_result_1 = 'SELECT *, 1, blah AS foo FROM myschema.mytable\n' # We can also fix just specific rules. fix_result_2 = sqlfluff.fix(my_bad_query, rules=["L010"]) # fix_result_2 = 'SELECT *, 1, blah AS fOO FROM mySchema.myTable' # Or a subset of rules... fix_result_3 = sqlfluff.fix(my_bad_query, rules=["L010", "L014"]) # fix_result_3 = 'SELECT *, 1, blah AS fOO FROM myschema.mytable' # -------- PARSING ---------- # Parse the given string and return a JSON representation of the parsed tree. parse_result = sqlfluff.parse(my_bad_query) # parse_result = {'file': {'statement': {...}, 'newline': '\n'}} # This JSON structure can then be parsed as required. # An example usage is shown below: def get_json_segment( parse_result: Dict[str, Any], segment_type: str ) -> Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]: """Recursively search JSON parse result for specified segment type. Args: parse_result (Dict[str, Any]): JSON parse result from `sqlfluff.fix`. segment_type (str): The segment type to search for. Yields: Iterator[Union[str, Dict[str, Any], List[Dict[str, Any]]]]: Retrieves children of specified segment type as either a string for a raw segment or as JSON or an array of JSON for non-raw segments. """ for k, v in parse_result.items(): if k == segment_type: yield v elif isinstance(v, dict): yield from get_json_segment(v, segment_type) elif isinstance(v, list): for s in v: yield from get_json_segment(s, segment_type) # e.g. Retrieve array of JSON for table references. table_references = list(get_json_segment(parse_result, "table_reference")) print(table_references) # [[{'identifier': 'mySchema'}, {'dot': '.'}, {'identifier': 'myTable'}]] # Retrieve raw table name from last identifier in the table reference. for table_reference in table_references: table_name = list(get_json_segment(parse_result, "naked_identifier"))[-1] print(f"table_name: {table_name}") # table_name: myTable Simple API commands Advanced API usage The simple API presents only a fraction of the functionality present within the core SQLFluff library. For more advanced use cases, users can import the Linter() and FluffConfig() classes from sqlfluff.core. As of version 0.4.0 this is considered as experimental only as the internals may change without warning in any future release. If you come to rely on the internals of SQLFluff, please post an issue on GitHub to share what you're up to. This will help shape a more reliable, tidy and well documented public API for use. Internals It is recommended that the following is read in conjunction with exploring the codebase. dialect_ansi.py in particular is helpful to understand the recursive structure of segments and grammars. Some more detail is also given on our Wiki including a Contributing Dialect Changes guide. Architecture At a high level, the behaviour of SQLFluff is divided into a few key stages. Whether calling sqlfluff lint, sqlfluff fix or sqlfluff parse, the internal flow is largely the same. Stage 1, the templater This stage only applies to templated SQL, most commonly Jinja and dbt. Vanilla SQL is sent straight to stage 2, the lexer. In order to lint templated SQL, SQLFluff must first convert the 'raw' or pre-templated code into valid SQL, which can then be parsed. The templater returns both the raw and post-templated SQL so that any rule violations which occur in templated sections can be ignored and the rest mapped to their original line location for user feedback. SQLFluff supports two templating engines: Jinja and dbt. Under the hood dbt also uses Jinja, but in SQLFluff uses a separate mechanism which interfaces directly with the dbt python package. For more details on how to configure the templater see Jinja Templating Configuration. Stage 2, the lexer The lexer takes SQL and separates it into segments of whitespace and code. No meaning is imparted; that is the job of the parser. Stage 3, the parser The parser is arguably the most complicated element of SQLFluff, and is relied on by all the other elements of the tool to do most of the heavy lifting. 1. The lexed segments are parsed using the specified dialect's grammars. In SQLFluff, grammars describe the shape of SQL statements (or their components). The parser attempts to apply each potential grammar to the lexed segments until all the segments have been matched. 2. In SQLFluff, segments form a tree-like structure. The top-level segment is a FileSegment, which contains zero or more StatementSegments, and so on. Before the segments have been parsed and named according to their type, they are 'raw', meaning they have no classification other than their literal value. 3. The three key components to the parser are segments, match_grammars and parse_grammars. A segment can be a leaf in the parse tree, such as a NumericLiteralSegment, which is simply a number, or can contain many other segments, such as a SelectStatementSegment. Each segment can specify a parse_grammar, and a match_grammar. If both a match_grammar and parse_grammar are defined in a segment, match_grammar is used to quickly prune the tree for branches which do not match segments being parsed, and the parse_grammar is then used to refine the branch identified as correct. If only a match_grammar is defined, then it serves the purpose of both pruning and refining. 4. A segment's .parse() method uses the parse_grammar, on which .match() is called. The match method of this grammar will return a potentially refined structure of the segments within the segment in greater detail. In the example of a FileSegment, it first divides up the query into statements and then finishes. • Segments must implement a match_grammar. When .match() is called on a segment, this is the grammar which is used to decide whether there is a match. • Grammars combine segments or other grammars together in a pre-defined way. For example the OneOf grammar will match if any one of its child elements match. 1. Regardless of whether the parse_grammar was used, the next step is to recursively call the .parse() method of each of the child segments of the grammar. This operation is wrapped in a method called .expand(). In the FileSegment, the first step will have transformed a series of raw tokens into StatementSegment segments, and the expand step will let each of those segments refine the content within them. 2. During the recursion, the parser eventually reaches segments which have no children (raw segments containing a single token), and so the recursion naturally finishes. #. If no match is found for a segment, the contents will be wrapped in an UnparsableSegment which is picked up as a parsing error later. When working on the parser there are a couple of design principles to keep in mind. • Grammars are contained in dialects, the root dialect being the ansi dialect. The ansi dialect is used to host logic common to all dialects, and so does not necessarily adhere to the formal ansi specification. Other SQL dialects inherit from the ansi dialect, replacing or patching any segments they need to. One reason for the Ref grammar is that it allows name resolution of grammar elements at runtime and so a patched grammar with some elements overridden can still rely on lower-level elements which haven't been redeclared within the dialect • All grammars and segments attempt to match as much as they can and will return partial matches where possible. It is up to the calling grammar or segment to decide whether a partial or complete match is required based on the context it is matching in. Stage 4, the linter Given the complete parse tree, rule classes check for linting errors by traversing the tree, looking for segments and patterns of concern. If the rule discovers a violation, it returns a LintResult pointing to the segment which caused the violation. Some rules are able to fix the problems they find. If this is the case, the rule will return a list of fixes, which describe changes to be made to the tree. This can include edits, inserts, or deletions. Once the fixes have been applied, the updated tree is written to the original file. Reflow Internals Many rules supported by SQLFluff involve the spacing and layout of different elements, either to enforce a particular layout or just to add or remove code elements in a way sensitive to the existing layout configuration. The way this is achieved is through some centralised utilities in the sqlfluff.utils.reflow module. This module aims to achieve several things: * Less code duplication by implementing reflow logic in only one place. • Provide a streamlined interface for rules to easily utilise reflow logic. • Given this requirement, it's important that reflow utilities work within the existing framework for applying fixes to potentially templated code. We achieve this by returning LintFix objects which can then be returned by each rule wanting to use this logic. • Provide a consistent way of configuring layout requirements. For more details on configuration see Configuring Layout. To support this, the module provides a ReflowSequence class which allows access to all of the relevant operations which can be used to reformat sections of code, or even a whole file. Unless there is a very good reason, all rules should use this same approach to ensure consistent treatment of layout. Developing Rules Rules in SQLFluff are implemented as classes inheriting from BaseRule. SQLFluff crawls through the parse tree of a SQL file, calling the rule's _eval() function for each segment in the tree. For many rules, this allows the rule code to be really streamlined and only contain the logic for the rule itself, with all the other mechanics abstracted away. Traversal Options recurse_into Some rules are a poor fit for the simple traversal pattern described above. Typical reasons include: • The rule only looks at a small portion of the file (e.g. the beginning or end). • The rule needs to traverse the parse tree in a non-standard way. These rules can override BaseRule's recurse_into field, setting it to False. For these rules False, _eval() is only called once, with the root segment of the tree. This can be much more efficient, especially on large files. For example, see rules L050 and L009 , which only look at the beginning or end of the file, respectively. _works_on_unparsable By default, SQLFluff calls _eval() for all segments, even "unparsable" segments, i.e. segments that didn't match the parsing rules in the dialect. This causes issues for some rules. If so, setting _works_on_unparsable to False tells SQLFluff not to call _eval() for unparsable segments and their descendants. Performance-related Options These are other fields on BaseRule. Rules can override them. needs_raw_stack needs_raw_stack defaults to False. Some rules use RuleContext.raw_stack property to access earlier segments in the traversal. This can be useful, but it adds significant overhead to the linting process. For this reason, it is disabled by default. lint_phase There are two phases of rule running. 1. The main phase is appropriate for most rules. These rules are assumed to interact and potentially cause a cascade of fixes requiring multiple passes. These rules run the runaway_limit number of times (default 10). 2. The post phase is for post-processing rules, not expected to trigger any downstream rules, e.g. capitalization fixes. They are run in a post-processing loop at the end. This loop is identical to the main loop, but is only run 2 times at the end (once to fix, and once again to confirm no remaining issues). The two phases add complexity, but they also improve performance by allowing SQLFluff to run fewer rules during the main phase, which often runs several times. NOTE: post rules also run on the first pass of the main phase so that any issues they find will be presented in the list of issues output by sqlfluff fix and sqlfluff lint. Base Rules base_rules Module Functional API These newer modules provide a higher-level API for rules working with segments and slices. Rules that need to navigate or search the parse tree may benefit from using these. Eventually, the plan is for all rules to use these modules. As of December 30, 2021, 17+ rules use these modules. The modules listed below are submodules of sqlfluff.utils.functional. segments Module segment_predicates Module raw_file_slices Module raw_file_slice_predicates Module Developing Plugins SQLFluff is extensible through "plugins". We use the pluggy library to make linting Rules pluggable, which enable users to implement rules that are just too "organization specific" to be shared, or too platform specific to be included in the core library. Creating a plugin We have an example plugin in sqlfluff/plugins/sqlfluff-plugin-example which you can use as a template for rules, or the sqlfluff/plugins/sqlfluff-templater-dbt which you can use as a template for templater plugins. Few things to note about plugins: Currently, only Rules and Templaters can be added through plugins. Over time we expect more elements of SQLFluff will be extensible with plugins. Each plugin can implement multiple Rules or Templaters. We recommend that the name of a plugin should start with "sqlfluff-" to be clear on the purpose of your plugin. A plugin may need to include a default configuration if its rules are configurable: use plugin default configurations only for that reason! We advise against overwriting core configurations by using a default plugin configuration, as there is no mechanism in place to enforce precedence between the core library configs and plugin configs, and multiple plugins could clash. A plugin Rule class name should have the structure: "Rule_PluginName_L000". The 'L' can be any letter and is meant to categorize rules; you could use the letter 'S' to denote rules that enforce security checks for example. An important thing to note when running custom implemented rules: Run pip install -e ., inside the plugin folder so custom rules in linting are included. A plugin Rule code includes the PluginName, so a rule "Rule_L000" in core will have code "L000", while "Rule_PluginName_L000" will have code "PluginName_L000". Codes are used to display errors, they are also used as configuration keys. We make it easy for plugin developers to test their rules by exposing a testing library in sqlfluff.utils.testing. Giving feedback Would you like to have other parts of SQLFluff be "pluggable"? Tell us about it in a GitHub issue 😄. SQLFluff in the Wild Want to find other people who are using SQLFluff in production use cases? Want to brag about how you're using it? Just want to show solidarity with the project and provide a testimonial for it? Just add a section below by raising a PR on GitHub by editing this file ✏️. • SQLFluff in production dbt projects at tails.com. We use the SQLFluff cli as part of our CI pipeline in codeship to enforce certain styles in our SQL codebase (with over 650 models) and keep code quality high. Contact @alanmcruickshank. • Netlify's data team uses SQLFluff with dbt to keep code quality in more than 350 models (and growing). Previously, we had our SQL Guidelines defined in a site hosted with Netlify, and now we're enforcing these rules in our CI workflow thanks to SQLFluff. • Drizly's analytics team uses SQLFluff with dbt for over 700 models as part of our CI checks in GitHub. Before SQLFluff, we had SQL best practices outlined in a google doc and had to manually enforce through PR comments. We're now able to enforce much of our style guide automatically through SQLFluff. • Petal's data-eng team runs SQLFluff on our 100+ model dbt project. As a pre-commit hook and as a CI check, SQLFluff helps keep our SQL readable and consistent. • Surfline's Analytics Engineering team implemented SQLFluff as part of our continuous integration (CI) suite across our entire dbt project (700+ models). We implement the CI suite using GitHub Actions and Workflows. The benefits of using SQLFluff at Surfline are: • The SQL in our dbt models is consistent and easily readable. • Our style guide is maintained as code, not a README that is rarely updated. • Reduced burden on Analytics Engineers to remember every single style rule. • New Analytics Engineers can quickly see and learn what "good SQL" looks like at Surfline and start writing it from day 1. • The HTTP Archive uses SQLFluff to automatically check for quality and consistency of code submitted by the many contributors to this project. In particular our annual Web Almanac attracts hundreds of volunteers to help analyse our BigQuery dataset and being able automatically lint Pull Requests through GitHub Actions is a fantastic way to help us maintain our growing repository of over a thousand queries. • Brooklyn Data Co has a dbt_artifacts dbt package from which runs SQLFluff in CI to lint pull requests automatically. It uses the GitHub Actions workflow contributed by Greg Clunies, with annotations on pull requests to make it easy for contributors to see where their SQL has failed any rules. See an example pull request with SQLFluff annotations. • Markerr has tightly integrated SQLFluff into our CI/CD process for data model changes and process improvements. Since adopting SQLFluff across the organization, the clarity of our SQL code has risen dramatically, freeing up review time to focus on deeper data and process-specific questions. • Symend has a microservices platform supporting our SaaS product. We use SQLFLuff in the CI/CD process of several of our data-oriented microservices. Among other things, it validates our database migration scripts, deployed using schemachange and we have near-term plans to implement it for our dbt projects. • At CarePay we use SQLFLuff to lint and fix all our dbt models as well as several other SQL heavy projects. Locally we use SQLFluff with pre-commit and have also integrated it into our CI/CD pipelines. • Core Analytics Team from Typeform and videoask uses SQLFluff in the production dbt project for building our datawarehouse layer for both products: • We use it locally in our day to day work, helping us to write cleaner code. • We added SQLFluff to our CI processes, so during a PR we can check that any new or modified sql file has a consistent and easy-to-read format. SQLFluff Slack We have a fast-growing community on Slack, come and join us! SQLFluff on Twitter Follow us On Twitter @SQLFluff for announcements and other related posts. • Index • Module Index • Search Page
AUTHOR
Alan Cruickshank
COPYRIGHT
2023, Alan Cruickshank Feb 03, 2023 SQLFLUFF(1)