As a bit of a preface to this post: it is hard to find a free SQL Parser for .NET.
There is a company that has a terrible library that they charge $150 bucks for. There are a couple of incomplete implementations done for school projects or for narrowly focused tasks.
So if you want a no-strings attached free parser for SQL, you’re out of luck.
However, since most people who want a .NET parser are writing code on a Windows machine, and use Visual Studio, there is (lightly documented) hope: the TSqlParser library that ships with Visual Studio.
This is a fully featured parsing library for SQL Server SQL syntax. I’m not sure about the support of other DB’s SQL syntax, but I would imagine it’s poor.
On an x64 Windows machine, using Visual Studio 2010, the dll’s which contain the TSqlParser library are located at:
C:\Program Files (x86)\Microsoft Visual Studio 10.0\VSTSDB
Microsoft.Data.Schema.ScriptDom.Sql gets you the parser for Sql Server 2008.
To instantiate an instance of the
TSql100Parser class, you have to supply the constructor with one parameter:
public TSql100Parser(bool initialQuotedIdentifiers )
The docs for this are better than trying to figure out what
Specifies whether quoted identifier handling is on.
I’m guessing this has to do with declaring aliases for columns like this:
bar as 'This is the alias for foo.bar'
--instead of like this:
bar as [This is the alias for foo.bar]
Using the parser is relatively simple. Once you reference the correct dll’s in your project:
var parser = new TSql100Parser(true);
var script = parser.Parse(reader, out errors) as TSqlScript;
foreach (TSqlBatch batch in script.Batches)
foreach (TSqlStatement statement in batch.Statements)
//At this point, you have a collection of SQL Statements...
//that can contain collections of SQL Statements...
My comment in the code above is to help you understand something about parsing SQL – almost every relationship is expressed as a tree, where something contains more of the same thing, and that thing may contain more of the same thing, or maybe not.
Which means the easiest way to navigate the data is recursively. In other words, the Rules for using TSqlParser:
- LEARN TO LOVE THE RECURSION.
- Refer to the Rules for using TSqlParser
I’ll give you an example scenario to show you what you’re up against.
One common scenario is searching your code for SELECT statements. Select statements can be contained in:
- Stored Procedures
- If Statements
- While Statements
- BEGIN statements
- Try/Catch Blocks
I’m sure I’m missing some cases.
So it’s not as simple as saying “give me all the statements that are select statements”. Instead, you have to write something like:
foreach(statement in statements)
if statement is a Stored Procedure
if statement is an If Statement
if statement is a While Statement
if statement is a Select Statement
So once you get your select statement (or your collection of select statements), how do you process them?
Well unfortunately it’s not straightforward. The
SelectStatement class contains a field called
QueryExpression – this field contains what kind of Select we’re dealing with.
As far as I can determine, there are three types of QueryExpressions:
This is an actual SELECT statement
This is a UNION or similar expression between two SELECT statements
This is a SELECT surrounded by parenthesis. In other words, a sub-select
So again, if you only want SELECT statements, you have to weed through the three types of QueryExpressions until you get to the underlying SELECT statements.
So eventually you’ll get to a list of QuerySpecifications (which represent the SELECT statements from your original query).
Now here comes the good stuff: you can now weed through the SELECT fields programmatically and get out whatever information you want. Here are some of the fields on
FromClauses Gets a list of FROM clauses.
GroupByClause Gets or sets a GROUP BY clause.
HavingClause Gets or sets a HAVING clause.
Into Gets or sets the into table name.
SelectElements Gets a list of the selected columns or set variables.
TopRowFilter Gets or sets the usage of the top row filter.
UniqueRowFilter Gets or sets the unique row filter value.
WhereClause Gets or sets a WHERE clause.
Just tons of SELECT goodness. However, be warned: each of these fields contains lists with multiple subclasses. So more recursive diving if you want to get something very specific out of this select data.
To get farther you might have to dive into the docs.
Link to MSDN Namespace Docs