If this is your first visit, be sure to check out the FAQ by clicking the link above. You may have to register before you can post: click the register link above to proceed. To start viewing messages, select the forum that you want to visit from the selection below.

 
Go Back  dBforums > Data Access, Manipulation & Batch Languages > ANSI SQL > Reverse Engineer Hand Written SQL ( Oracle )

Reply
 
LinkBack Thread Tools Search this Thread Display Modes
  #1 (permalink)  
Old 03-26-04, 02:04
arnzie arnzie is offline
Registered User
 
Join Date: Mar 2004
Posts: 30
Reverse Engineer Hand Written SQL ( Oracle )

Hi,

My environment encompasses over 2000 tables with over 300000 lines of hand-written Oracle SQL, and what I want to do is reverse engineer the SQL to tell me what tables are being accessed and how...

I have the full list of tables stored in a reference table ( extracted from USER_TABLES ), and I also have every line of SQL loaded into another reference table, of the example format :

FILENAME, LINE_NO, SQL
products.sql,1,select *
products.sql,2,from products
products.sql,3,where id = 2
customer.sql,1,select * from
customer.sql,2,customers a, products b
customer.sql,3,where a.id = b.id

so as you can see I will be evaluating this on a line by line basis... Now this hand-written SQL will obviously be valid Oracle SQL, but it can be all over the place, in that if i am looking at it on a line-by-line basis, the INSERT, UPDATE, FROM statement may not necessary be on the same line as the table name, etc...Basically the same query can be written in many different ways...

So I am taking the list of tables ( stored in a reference table ), and then also the table holding all of the SQL code, and then basically doing a join with a LIKE clause constraint, to see what SQL lines are using what tables, but as the SQL is all hand-written the rules can get messy...

So basically what Im asking is, does anyone know of any definitive rules for reverse engineering SQL, or any example scripts, etc ? My process basically works like this at the moment to get you started :

If i find an instance of a tablename on a SQL line, it must be prefixed with either nothing, space, tab, comma, and it must be suffixed with either nothing, space, tab, comma, semicolon, right bracket... This firstly stops finding any tables thats name are part of another table, eg :

PRODUCT compared to PRODUCT_LIST

Then I make sure that it is not within a comment, so if its on a line with
/*, */, --, etc, then I disregard it ( or if it was between two sets of start-comment and end-comment i also disregard ) - eg :

/*
SELECT * FROM products
*/

After that, I then search for the word INTO, UPDATE, TRUNCATE, DROP, CREATE INDEX, RENAME on that line, etc to try and determine what I am doing to that table, and if i cant find any of those, I assume its a SELECT... I also look at the previous line for those keywords, because the SQL could have been written like this :

TRUNCATE TABLE
products

Then I also do a similar thing to look for in-line UPDATES :

UPDATE(
SELECT end_dte FROM PRODUCTS a, CUSTOMER b
) SET a.end_dte = '1-Jan-2000'

Because the above line is really updating the PRODUCT table, whereas on a line-by-line basis it looks like a SELECT...

So these are my rules in place thus far... My end result would be to have a table that says these SQL scripts access these tables in this way, for example :

FILENAME, QUERY_TYPE, TABLE
products.sql, update, products
products.sql, select, customers
products.sql, truncate, products
customers.sql, select, customers

OK, Hopefully you understand what I mean... Basically if anyone has any pre-written code or ideas that would be great, but I am more interested in expanding on these rules to make sure I get it 100% right. As you can imagine, hard-written SQL, that is technically valid SQL can still be written in many, many different ways, so the rules neeed to be very tight... Or can anyone see any flaws in my rules also ?

By the way, Im only interested in what tables are accessed, trying to do the same thing on a column basis would be WAY too hard...

Thanks,
Adam
Reply With Quote
  #2 (permalink)  
Old 03-26-04, 09:52
Pat Phelan Pat Phelan is offline
Resident Curmudgeon
 
Join Date: Feb 2004
Location: In front of the computer
Posts: 12,605
I'd use YACC to construct a tokenizer if I were you. Otherwise you'll go nuts trying to figure out how to parse the SQL. Once you've reduced it to tokens, take all of the table tokens from the token stream and you'll be more than 90% of the way to your goal.

-PatP
Reply With Quote
Reply

Thread Tools Search this Thread
Search this Thread:

Advanced Search
Display Modes

Posting Rules
You may not post new threads
You may not post replies
You may not post attachments
You may not edit your posts

BB code is On
Smilies are On
[IMG] code is Off
HTML code is Off
Trackbacks are On
Pingbacks are On
Refbacks are On