.Net SQL Parsing – Using the TSqlParser library

As a bit of a preface to this post: it is hard to find a free SQL Parser for .NET.

There is a company that has a terrible library that they charge $150 bucks for. There are a couple of incomplete implementations done for school projects or for narrowly focused tasks.

So if you want a no-strings attached free parser for SQL, you’re out of luck.

However, since most people who want a .NET parser are writing code on a Windows machine, and use Visual Studio, there is (lightly documented) hope: the TSqlParser library that ships with Visual Studio.

This is a fully featured parsing library for SQL Server SQL syntax. I’m not sure about the support of other DB’s SQL syntax, but I would imagine it’s poor.

On an x64 Windows machine, using Visual Studio 2010, the dll’s which contain the TSqlParser library are located at:

C:\Program Files (x86)\Microsoft Visual Studio 10.0\VSTSDB

The class TSql100Parser in Microsoft.Data.Schema.ScriptDom.Sql gets you the parser for Sql Server 2008.

To instantiate an instance of the TSql100Parser class, you have to supply the constructor with one parameter:

public TSql100Parser(bool initialQuotedIdentifiers )

The docs for this are better than trying to figure out what initialQuotedIdentifiers means:

Specifies whether quoted identifier handling is on.

I’m guessing this has to do with declaring aliases for columns like this:

select 
   bar as 'This is the alias for foo.bar'
from 
   tblFoo
--instead of like this:
select
    bar as [This is the alias for foo.bar]
from
    tblFoo

Using the parser is relatively simple. Once you reference the correct dll’s in your project:

var parser = new TSql100Parser(true); 
var script = parser.Parse(reader, out errors) as TSqlScript;

foreach (TSqlBatch batch in script.Batches)
{
    foreach (TSqlStatement statement in batch.Statements)
    {
        //At this point, you have a collection of SQL Statements... 
        //that can contain collections of SQL Statements...  
    }
}

My comment in the code above is to help you understand something about parsing SQL – almost every relationship is expressed as a tree, where something contains more of the same thing, and that thing may contain more of the same thing, or maybe not.

Which means the easiest way to navigate the data is recursively. In other words, the Rules for using TSqlParser:

  1. LEARN TO LOVE THE RECURSION.
  2. Refer to the Rules for using TSqlParser

I’ll give you an example scenario to show you what you’re up against.

One common scenario is searching your code for SELECT statements. Select statements can be contained in:

  • Stored Procedures
  • If Statements
  • While Statements
  • BEGIN statements
  • Try/Catch Blocks

I’m sure I’m missing some cases.

So it’s not as simple as saying “give me all the statements that are select statements”. Instead, you have to write something like:

function ProcessStatements(statements)

    foreach(statement in statements) 
        if statement is a Stored Procedure
           ProcessStatements(statement.MyStatements)
        if statement is an If Statement
           ProcessStatements(statement.MyStatements)
        if statement is a While Statement
           ProcessStatements(statement.MyStatements)
        if statement is a Select Statement
           ProcessSelect( statement)

So once you get your select statement (or your collection of select statements), how do you process them?

Well unfortunately it’s not straightforward. The SelectStatement class contains a field called QueryExpression – this field contains what kind of Select we’re dealing with.
As far as I can determine, there are three types of QueryExpressions:

  • QuerySpecification
  • This is an actual SELECT statement

  • BinaryQueryExpression
  • This is a UNION or similar expression between two SELECT statements

  • QueryParenthesis
  • This is a SELECT surrounded by parenthesis. In other words, a sub-select

So again, if you only want SELECT statements, you have to weed through the three types of QueryExpressions until you get to the underlying SELECT statements.

So eventually you’ll get to a list of QuerySpecifications (which represent the SELECT statements from your original query).

Now here comes the good stuff: you can now weed through the SELECT fields programmatically and get out whatever information you want. Here are some of the fields on QuerySpecification:

FromClauses     Gets a list of FROM clauses.
GroupByClause   Gets or sets a GROUP BY clause.
HavingClause    Gets or sets a HAVING clause.
Into            Gets or sets the into table name.
SelectElements  Gets a list of the selected columns or set variables.
TopRowFilter    Gets or sets the usage of the top row filter.
UniqueRowFilter Gets or sets the unique row filter value.
WhereClause     Gets or sets a WHERE clause.

Just tons of SELECT goodness. However, be warned: each of these fields contains lists with multiple subclasses. So more recursive diving if you want to get something very specific out of this select data.

To get farther you might have to dive into the docs.
Link to MSDN Namespace Docs

Advertisements

4 thoughts on “.Net SQL Parsing – Using the TSqlParser library

  1. I like how the code allows me to break up the SQL statements I enter from a string. It works exactly as I would expect it to be.

    Is there any chance when it loops through and gives me the statements, can I get the string of the statements?

    Example:

    Select * from tblTest Select * from tblTest2 where ID in (select ID from tblTest)

    This string correctly identifies two select statements, because one is nested. Now I would like to print out the two statements separated. Is this possible?

    • It is possible to get all the select statements, even the nested ones.

      The SelectStatement object will have a WhereClause object, which will have a collection of objects, one of which will be the 2nd select statement.

      So you just have to recursively loop throught all the constructs, and you’ll eventually find that 2nd Select statement.

      I’m not actively maintaining this blog – check out my current blog for more up to date content.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out / Change )

Twitter picture

You are commenting using your Twitter account. Log Out / Change )

Facebook photo

You are commenting using your Facebook account. Log Out / Change )

Google+ photo

You are commenting using your Google+ account. Log Out / Change )

Connecting to %s