Added JavaScript syntax checking via Esprima and a Git pre-commit hook

I came across a brilliant project the other day – Esprima from Ariya Hidayat, the author of PhantomJS.

What is Esprima? Esprima is a JavaScript parser written in JavaScript. It forms the basis of several different tools – a minifier, a code coverage tool, a syntax validator – just to name a few.

I was immediately interested in the syntax validation tool. It’s not a linter: it just checks that the JavaScript you wrote is syntactically correct. Why would you want this if you already have JSHint and JSLint?

  1. It is extremely fast. Validating three.js (800 KB source) takes less than a second on a modern machine.
  2. It looks only for syntax errors; it does not care about coding style at all.
  3. It handles generated files as the result of minification or compilation (CoffeeScript, Dart, TypeScript, etc).
  4. It tries to be tolerant and not give up immediately on the first error, especially for strict mode violations.

Esprima is available as an npm package, so installing it only takes a second:

sudo npm install -g esprima

Using Esprima from the command line is simple:

esvalidate file.js

One thing to note about running from the command line: if validation succeeds, you won’t get much in the way of confirmation, which can be painful if you are processing a whole directory. In the default mode, you only get useful feedback on error.

However, if you don’t mind reading a little XML:

esvalidate lib/*.js --format=junit

This prints JUnit XML, which you can at least scan visually to see which files were validated.
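If XML is more than you need, a small wrapper loop can print one line per file instead. This is only a sketch: validate_each and the VALIDATOR variable are names I made up for illustration (VALIDATOR just lets you swap in another command and defaults to esvalidate), and it assumes esvalidate exits with a non-zero status when a file fails.

```shell
#!/bin/sh
# Validate each file passed as an argument and print one status line per file.
# VALIDATOR is a hypothetical override for illustration; it defaults to esvalidate.
validate_each() {
    rc=0
    for file in "$@"; do
        if "${VALIDATOR:-esvalidate}" "$file"; then
            echo "OK: $file"
        else
            echo "FAIL: $file"
            rc=1
        fi
    done
    # Non-zero if at least one file failed validation
    return "$rc"
}

# Example: validate_each lib/*.js
```

Any error text from the validator still goes to the terminal; the function only adds the per-file confirmation that the default mode leaves out.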

Where would I use this where I might not use JSHint? As a pre-commit hook to screen my check-ins. Instead of going through a check-in, building everything, then running JSHint just to hear that something is not up to spec, I can add a little script that does a quick sanity check of my JS before I commit anything to Git.

If you’ve never created a pre-commit hook before, it’s pretty easy. Two lines in bash will give you a pre-commit file:

touch .git/hooks/pre-commit
chmod +x .git/hooks/pre-commit

This is the Windows version of the code for the pre-commit hook:

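#!/bin/sh
# Collect the changed files whose names end in .js
files=$(git diff-index --name-only HEAD | grep '\.js$')
for file in $files; do
    # esvalidate exits with a non-zero status when it finds a syntax error
    if ! esvalidate "$file"; then
        echo "Syntax error: $file"
        exit 1
    fi
done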

To make this work on Linux, just remove the #!/bin/sh line.
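A slightly more defensive take on the same hook might look like the sketch below. It is my own variation, not something from the Esprima docs: it looks only at staged files, skips deleted ones (which would otherwise fail validation simply because they no longer exist), and reads names line by line so that paths with spaces survive. The function names and the VALIDATOR variable are hypothetical; VALIDATOR defaults to esvalidate.

```shell
#!/bin/sh
# Keep only the names ending in .js from a newline-separated list on stdin.
filter_js() {
    grep '\.js$' || true
}

# List staged files that were added, copied, or modified (deleted files are skipped).
staged_js_files() {
    git diff --cached --name-only --diff-filter=ACM | filter_js
}

# Validate every staged .js file; stop with status 1 at the first failure.
# VALIDATOR is a hypothetical override for illustration; it defaults to esvalidate.
check_staged() {
    staged_js_files | while IFS= read -r file; do
        "${VALIDATOR:-esvalidate}" "$file" || { echo "Syntax error: $file"; exit 1; }
    done
}
```

Ending .git/hooks/pre-commit with a line such as check_staged || exit 1 would then block the commit on the first file that fails. Note that this still validates the working-tree copy of each file, not the exact staged content.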

For more information about Esprima, check out this article by the author.

.NET SQL Parsing – Using the TSqlParser library

As a bit of a preface to this post: it is hard to find a free SQL Parser for .NET.

There is a company that charges $150 for a terrible library. There are a couple of incomplete implementations done for school projects or for narrowly focused tasks.

So if you want a no-strings-attached free parser for SQL, you’re out of luck.

However, since most people who want a .NET parser are writing code on a Windows machine, and use Visual Studio, there is (lightly documented) hope: the TSqlParser library that ships with Visual Studio.

This is a fully featured parsing library for SQL Server’s SQL syntax. I’m not sure how well it supports the SQL dialects of other databases, but I would imagine poorly.

On an x64 Windows machine, using Visual Studio 2010, the DLLs that contain the TSqlParser library are located at:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VSTSDB

The class TSql100Parser in Microsoft.Data.Schema.ScriptDom.Sql gets you the parser for SQL Server 2008.

To create an instance of the TSql100Parser class, you have to supply the constructor with one parameter:

public TSql100Parser(bool initialQuotedIdentifiers)

The docs for this are better than trying to figure out what initialQuotedIdentifiers means:

Specifies whether quoted identifier handling is on.

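As far as I can tell, this mirrors SQL Server’s SET QUOTED_IDENTIFIER option, which controls how double quotes are read: when it is on, double-quoted text is an identifier; when it is off, it is a string literal. For example:

--with quoted identifiers on, "bar" is the column bar;
--with them off, "bar" is the string literal 'bar':
select
    "bar"
from
    tblFoo
--square brackets always delimit an identifier:
select
    [bar]
from
    tblFoo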

Using the parser is relatively simple. Once you reference the correct DLLs in your project:

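using System.Collections.Generic;
using System.IO;
using Microsoft.Data.Schema.ScriptDom;
using Microsoft.Data.Schema.ScriptDom.Sql;

// sql is assumed to be a string holding the SQL text to parse
var parser = new TSql100Parser(true);
IList<ParseError> errors; // filled with any parse errors
var reader = new StringReader(sql);
var script = parser.Parse(reader, out errors) as TSqlScript;

foreach (TSqlBatch batch in script.Batches)
{
    foreach (TSqlStatement statement in batch.Statements)
    {
        //At this point, you have a collection of SQL Statements...
        //that can contain collections of SQL Statements...
    }
}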

My comment in the code above is to help you understand something about parsing SQL – almost every relationship is expressed as a tree, where something contains more of the same thing, and that thing may contain more of the same thing, or maybe not.

This means the easiest way to navigate the data is recursively. In other words, the Rules for using TSqlParser:

  1. LEARN TO LOVE THE RECURSION.
  2. Refer to the Rules for using TSqlParser

I’ll give you an example scenario to show you what you’re up against.

One common scenario is searching your code for SELECT statements. SELECT statements can be contained in:

  • Stored Procedures
  • If Statements
  • While Statements
  • BEGIN statements
  • Try/Catch Blocks

I’m sure I’m missing some cases.

So it’s not as simple as saying “give me all the statements that are select statements”. Instead, you have to write something like:

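// MyStatements stands for whatever child statements that statement type holds
function ProcessStatements(statements)
    foreach (statement in statements)
        if statement is a Stored Procedure
            ProcessStatements(statement.MyStatements)
        else if statement is an If Statement
            ProcessStatements(statement.MyStatements)
        else if statement is a While Statement
            ProcessStatements(statement.MyStatements)
        else if statement is a BEGIN block or a Try/Catch block
            ProcessStatements(statement.MyStatements)
        else if statement is a Select Statement
            ProcessSelect(statement)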

So once you get your select statement (or your collection of select statements), how do you process them?

Well, unfortunately, it’s not straightforward. The SelectStatement class has a field called QueryExpression, which tells you what kind of SELECT you’re dealing with.
As far as I can determine, there are three types of QueryExpression:

  • QuerySpecification: an actual SELECT statement
  • BinaryQueryExpression: a UNION or similar expression between two SELECT statements
  • QueryParenthesis: a SELECT surrounded by parentheses, in other words a sub-select

So again, if you only want SELECT statements, you have to weed through the three types of QueryExpressions until you get to the underlying SELECT statements.

So eventually you’ll get to a list of QuerySpecifications (which represent the SELECT statements from your original query).

Now here comes the good stuff: you can now weed through the SELECT fields programmatically and get out whatever information you want. Here are some of the fields on QuerySpecification:

FromClauses     Gets a list of FROM clauses.
GroupByClause   Gets or sets a GROUP BY clause.
HavingClause    Gets or sets a HAVING clause.
Into            Gets or sets the into table name.
SelectElements  Gets a list of the selected columns or set variables.
TopRowFilter    Gets or sets the usage of the top row filter.
UniqueRowFilter Gets or sets the unique row filter value.
WhereClause     Gets or sets a WHERE clause.

Just tons of SELECT goodness. However, be warned: most of these fields hold objects, or lists of objects, that come in several subclasses. So expect more recursive diving if you want to get something very specific out of this SELECT data.

To get further, you might have to dive into the docs.
Link to MSDN Namespace Docs