Results 1 to 13 of 13
  1. #1
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256

    Unanswered: 'corrupted' values in database

    I just started working on reports on an SQL 2000 database.
    Many field contain numerical values stored as varchar. To make matters worse these strings contain both '.' and ',' as decimal sign.
    So far this has really been a great pain :-(

    The database and its application are bought as is, I cannot modify fieldtypes or anything else.

    Can anyone give a good strategy to convert the strings back to numerical so I can calculate revenue's and such????

    P.S.

    In extreme cases string contain both ',' AND '.' , like:

    5.608,42
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

  2. #2
    Join Date
    Jul 2003
    Location
    The Dark Planet
    Posts
    1,401
    Use

    convert (float, replace(column_name,',',''))

    or convert (int,replace(replace(column_name,',',''),'.','') if you are sure the value is an integer

  3. #3
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256
    Note quite what I need,

    The string can contain either a . or a ,
    With the conversion you give all figures with a ',' come out 100 fold higher than they should be, while those with a '.' come out right.

    Some sort of validation seems to be needed to get all them right.

    Right now I am focussing on prices which means the highest value is just 900.00 (or 900.00) so I have no problem in this column with both a ',' and a '.'
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

  4. #4
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256

    this seems to work

    The following seems to work:

    SELECT
    CASE
    WHEN
    SUBSTRING(REVERSE(RTRIM(LTRIM(PRIJS))),3,1) = ',' THEN
    convert(float,replace(PRIJS,',',''))/100
    WHEN
    SUBSTRING(REVERSE(RTRIM(LTRIM(PRIJS))),3,1) = '.' THEN
    convert(dec(9,2),PRIJS)
    ELSE 0 END

    not getting very good feelings though on having to do these king of conversions :-(
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

  5. #5
    Join Date
    Aug 2003
    Location
    Delft, The Netherlands (EU)
    Posts
    447

    Re: this seems to work

    No, me either! You should consider to clean your data before reporting them. You are not allowed to change the field type, but you should be allowed to update your data. So, define your required format, and update everything, that does not match your format, first. Your format may be "#.###,##" or "####.##"; I would prefer the last, because that can be converted into a number directly. You may even consider to convert everything to cents, which makes it easy for you to detect new values, that are not yet converted.

    Another approach would be to identify your different formats, and make a union query, handling each of your formats in a separate branch.
    Make everything as simple as possible, but not simpler! - A. Einstein
    DB Problems? DB Explorer, BTrieve Re-engineering, DB Conversions & ETL? Conversion Tool

  6. #6
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256
    Can't do that.

    Some of the fields are filled by procedures from the AS400 system.
    THere is something fundamentaly wrong with the way the data is handled on the SQL database. (For which I just mailed an angry mail to all involved) ,

    but doctor maybe you can help me with this one:


    I have all the differnent flavors in the DB:

    1.203,56
    1,203.56
    456,67
    456.67

    You see replacing the comma is okay if it comes as first one in the string.
    If there is just a comma replace it by a '.'
    If a '.' precedes a ',' replace ','by '.' and get rid of the '.'

    Nice challenge ain't it???

    Next private message next week, gotta cycle to the north this afternoon
    :-)
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

  7. #7
    Join Date
    Jul 2003
    Location
    The Dark Planet
    Posts
    1,401
    No, me either! You should consider to clean your data before reporting them. You are not allowed to change the field type, but you should be allowed to update your data. So, define your required format, and update everything, that does not match your format, first. Your format may be "#.###,##" or "####.##"; I would prefer the last, because that can be converted into a number directly. You may even consider to convert everything to cents, which makes it easy for you to detect new values, that are not yet converted.
    I dont uderstand the format "#.###,##" . What does it exactly mean ???

  8. #8
    Join Date
    Aug 2003
    Location
    Delft, The Netherlands (EU)
    Posts
    447
    Originally posted by blom0344
    Can't do that.

    Some of the fields are filled by procedures from the AS400 system.
    THere is something fundamentaly wrong with the way the data is handled on the SQL database. (For which I just mailed an angry mail to all involved) ,
    I understand, that you don't have control of the import process, but who is preventing you from cleaning your data afterwards?


    Originally posted by blom0344
    I have all the differnent flavors in the DB:

    1.203,56
    1,203.56
    456,67
    456.67

    You see replacing the comma is okay if it comes as first one in the string.
    If there is just a comma replace it by a '.'
    If a '.' precedes a ',' replace ','by '.' and get rid of the '.'

    Nice challenge ain't it???
    As I said, determine first your possible flavors. What is imported when your price is 456.- or 456.50? Is is 456[,|.]00 or just 456? So, actually I'm asking whether your decimal point is always on the 3rd last position? Your algorithm depends on that. Another point is whether you can expect a maximum number of digits, or not. So, can you also have, for example 1,234,567.89?

    Assuming, your possible flavors are those 4 you gave me, a query like I propose would look like (your price field is called p)

    SELECT p FROM T WHERE len(p)=6 AND SubString(p, 4,1)="."
    UNION
    SELECT replace(p,',','.') FROM T WHERE len(p)=6 AND SubString(p, 4,1)=","
    UNION
    SELECT replace(p,',','') FROM T WHERE len(p)=8 AND SubString(p, 6,1)="."
    UNION
    SELECT replace(replace(p,'.',''), ',','.') FROM T WHERE len(p)=8 AND SubString(p, 6,1)=","

    Calling this union query U you can do your accumulation like

    SELECT sum(cast(p as decimal(10,2))) from (<U>)
    Make everything as simple as possible, but not simpler! - A. Einstein
    DB Problems? DB Explorer, BTrieve Re-engineering, DB Conversions & ETL? Conversion Tool

  9. #9
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256
    ---- replace(replace(p,'.',''), ',','.') -----

    Yep,

    I am working with the replace on replace in my solution as well.
    Does not seem to work like it should.
    I want to use the CASE instead of UNION solution , cause I am going to assign it to a BO object

    We'll continue next week...
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

  10. #10
    Join Date
    Jan 2003
    Location
    Massachusetts
    Posts
    5,799
    Provided Answers: 11
    If you know you have only two decimal places, how about something along the lines of eliminating all commas/periods, then dividing by 100?

    select convert(numeric (10, 2), replace(replace(value, ',', ''), '.', '')))/100.00

    A bit pressed for time here, so I have not tested that code snippet. Hope it helps.

  11. #11
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Elegant solution, MCrowley.

    ...but make sure the values don't just only have two decimal places, but that the ALWAYS have two decimal places.

    blindman

  12. #12
    Join Date
    Aug 2003
    Location
    SW Ohio
    Posts
    198

    Re: 'corrupted' values in database

    Originally posted by blom0344
    I just started working on reports on an SQL 2000 database.
    Many field contain numerical values stored as varchar. To make matters worse these strings contain both '.' and ',' as decimal sign.
    So far this has really been a great pain :-(

    The database and its application are bought as is, I cannot modify fieldtypes or anything else.

    Can anyone give a good strategy to convert the strings back to numerical so I can calculate revenue's and such????

    P.S.

    In extreme cases string contain both ',' AND '.' , like:

    5.608,42
    Dumb question ... but, any chance you are dealing with software that handles multi-currency? Do you have the CCSID translation turned on from the AS400 to the SQL server?
    Jim P.

    Supoorting Oracle, MSSQL7, Sybase 8, & Pervasive. Confusion Reigns

  13. #13
    Join Date
    Jan 2003
    Location
    Zutphen,Netherlands
    Posts
    256
    Most certainly not a dumb question, but in this case we are talking about E-commerce orders from two euro countries, so every order amount is always just in one currency.

    McCrowleys solution does indeed work for me, cause every order is calculated down to the euro-cent
    Ties Blom
    Senior Application Developer BI
    Getronics Healthcare
    DB2,ORACLE,Powercenter,BusObj,Access,
    SQL, SQL server

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •