Hi - does anyone have any tips for profiling and cleansing data with T-SQL? I've got a lot of columns of data to clean up and it seems the only way of cleaning it all is to handcraft T-SQL code to:
1) first profile my data, then,
2) Clean it.
For example, I have a column of datetimes with a lot of noise in it - nulls, future dates (there shouldn't be!) and dates that are far too old (i.e. they were incorrectly entered). I need to end up with some script that strips out (or possibly even cleans) the invalid rows. Trouble is, I need to do this for a lot of columns...
Does anyone know of a quicker way of doing this - is there something like this in SSIS or elsewhere that will make this job easier/quicker? A profiling tool of some sort?
Yes - I can get access to the information in information_schema.columns. And I did think about writing something which would interrogate this information and generate some SQL script for me - is this what you would suggest?
I guess I was just wondering if there was a quicker way of doing this (e.g. some part of SSMS that I've missed!) or even a tool I could download that could help.. If not, I'm happy to start coding something myself - but I didn't want to do this if somebody had already done it for me!