Page 1 of 3 123 LastLast
Results 1 to 15 of 38
  1. #1
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364

    Unanswered: Data File Import prefference XML or CSV

    Hi Guys

    We're designing a system to import Retail Sales Till data from 800 Shops with fairly modern Tills which will hold Transactions in Client Access Databases.

    Basic Data is
    (Shop_ID,Till_ID,Receipt_No,Receipt_No,Receipt_Lin e_No,Sales_Date,Item_Category,...........)
    Record Width of above in one De-normalized SQL Table is 46 Bytes
    Average is 1.5 Transactions per Receipt
    Average File size per shop is 150k - SFTP server will receive 800 of these per day.
    As we will be owning the software on each of the tills we expect the data is going to be very clean in terms of validation.

    All this has to be loaded into a SQL2005 Box overnight

    The Question is what File format should we choose to receive at HQ

    Current proposal is .XML - My proposal is CSV

    Another Question is how best to import the data

    Current Proposal is to write a .NET application to perform import - My Proposal is to just use SQL to do it.

    What do you guys think & why

    Sorry I've not been able to find a previous topic on Subject but will happily read if you have any links

    Further info on request

    GW
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  2. #2
    Join Date
    Feb 2004
    Location
    One Flump in One Place
    Posts
    14,912
    If the .NET app is intended to insert from the file a row at a time, SQL. If it is intended to be an interface that then feeds a BCP\ Bulk Insert command to SQL Server then fine. If the import spec is XML do you need .NET to shred it? (I don't import XML so I'm really woolly on this I'm afraid). I would vote for csv but only for the reason that I am comfy with it. I would expect an overhead getting the XML into a relational (albeit denormalised) form.

  3. #3
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    Thanks Pootle

    If the .NET app is intended to insert from the file a row at a time, SQL. If it is intended to be an interface that then feeds a BCP\ Bulk Insert command to SQL Server then fine.
    Imports will happen unattended overnight with no User Interface involved, as far as I'm aware the idea was to read the XML into a Dataset probably ADO and then INSERT * into the database as a set operation. As yet there is no Pre-Validation (Row by Row or Set Based) of this data required (as we own the feed it should be clean)

    There is no import Spec as we are still designing/writing this.

    I've only dealt with importing XML once on to a 2000 Box and it was horrendous/prone to failures validating against templates n trying to prove it had (the intimated) referential integrity with the target system.

    I personnaly Find XML very verbose & troublesome if you totally own and only expect to be transporting data files between 2 known EndPoints.

    I mean why transfer this tagging info etc when you own both ends of the pipe.

    Granted you can transport stuff Relationally which can reduce file size. But this is often traded off with Tag info size.

    But in this case the Header is a Receipt and the detail is the receipt item with only 1.5 items per receipt, meaning not much would be saved transporting it relationally.

    Can someone please tell me why Businesses default to spec XML for EVERYTHING?

    Whats good about XML for this (end to end) type of work
    (please don't just post the waffle I hear from management who've been to seminars - just technical points of view)

    They'll be wanting us to transport in XML between 2 SQL Boxes on A LAN next

    GW
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  4. #4
    Join Date
    Sep 2005
    Posts
    161
    One reason XML is popular is because it's a platform independent way of exchanging data. But it has overhead in both parsing and file size. Your 46 bytes per row will probably balloon to more than 200 bytes with XML. Since you have a choice, I would go with CSV.

    The next issue is performance. When inserting data, the number of rows becomes very important. You will have 3260 lines per file, or 2,608,695 total lines per night. If you use a .NET application to perform the import, you are forced to insert line by line, ultimately with 2,608,695 insert statements into the database. Even if you could get 1000 inserts per second (unlikely on average server hardware), you are approaching an hour for the total operation. Skip the overhead using ADO .NET and use bcp to import a list of csv files. You can then manipulate this data using SQL if needed, but stick to set operations instead of cursors.

  5. #5
    Join Date
    Jun 2003
    Location
    Ohio
    Posts
    12,592
    Provided Answers: 1
    Recommendation: CSV.
    Reason: XML is a virus.

    Quote Originally Posted by GWilliy
    Can someone please tell me why Businesses default to spec XML for EVERYTHING?
    Because XML is a virus.

    Quote Originally Posted by cascred
    One reason XML is popular is because it's a platform independent way of exchanging data.
    ...and CSV isn't?
    If it's not practically useful, then it's practically useless.

    blindman
    www.chess.com: "sqlblindman"
    www.LobsterShot.blogspot.com

  6. #6
    Join Date
    Sep 2005
    Posts
    161
    Quote Originally Posted by blindman
    Recommendation: CSV.
    ...and CSV isn't?
    I thought someone might try to call me on this


    CSV is platform independent. But some [platform omitted] programmers like to be provided with a .wsdl that they can then feed into their [IDE omitted] which will automatically produce some nicely bloated classes that consume web services. You know the ones... Also, don't dare provide a .wsdl that [IDE omitted] chokes on, because it will be your problem, not their's.

  7. #7
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    I'm 100% with Blindman on this one, never thought of it as a Virus before though, lol.

    So is it true that a Dataset read from wherever into a .NET app cannot do a SET operation INSERT into SQL?

    Or is this Just specific to ADO
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  8. #8
    Join Date
    Jul 2003
    Location
    San Antonio, TX
    Posts
    3,662
    BULK INSERT, and if needed, - format file spec. XML (though technically not a virus ), can also be easily parsed using MDX extensions of TSQL, but the effort is a lot more intense. It will strengthen your job security, of course, because nobody else will bother to learn all your .data(...), .query(...) and .value(...) manipulations, but along with that you will have to ramp up pretty heavily on XML schema definitions to handle it properly.
    "The data in a record depends on the Key to the record, the Whole Key, and
    nothing but the Key, so help me Codd."

  9. #9
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    xml is easy peasy and this should get you off and running pretty quickly...

    http://www.w3schools.com/Schema/default.asp
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  10. #10
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    Thanks for the link Thrassy, but why do people assume I have a fear of XML?

    Apart from the BlindOne, most people seem to be asking me to learn more about XML.

    Because of this, the Answer to my Question remains unclear to me

    Given design control of two database endpoints, with no chance that this data will need to be available to external sources (It's remote EPOS Till Data for goodness sake) the data will be clean & flat.
    The volume of data will be large.

    Would you reccomend the solution be :-

    XML with .NET applications for Import/Export

    OR

    CSV & BCP

    Cmon Chaps, This should be Bread & Butter stuff.

    Or am I missing something?

    GW
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  11. #11
    Join Date
    Nov 2004
    Location
    on the wrong server
    Posts
    8,835
    Provided Answers: 6
    if speed is an issue, I might go with CSV and BULK INSERT.
    “If one brings so much courage to this world the world has to kill them or break them, so of course it kills them. The world breaks every one and afterward many are strong at the broken places. But those that will not break it kills. It kills the very good and the very gentle and the very brave impartially. If you are none of these you can be sure it will kill you too but there will be no special hurry.” Earnest Hemingway, A Farewell To Arms.

  12. #12
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    OK Thrassy

    Speed is not an issue - we have a four hour window to load these files.

    Why should I go with XML?

    thanks

    GW
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

  13. #13
    Join Date
    Jan 2007
    Location
    UK
    Posts
    11,434
    Provided Answers: 10
    As a general rule I would chose CSV over XML because [IMHO] CSV is just far, far, far, far, far simpler to manipulate and produce in a RDBMS environment.

    XML, whilst apparently being simple, is notoriously nasty when it comes to putting it back in to a normalised schema.

    Just my £0.01
    George
    Home | Blog

  14. #14
    Join Date
    Sep 2005
    Posts
    161
    If you are in control of both the creation of the file and the consumption of the file, I don't see any benefit to using xml. If you just want the xml experience, go for it. You could also make name tags for all of your employees to wear to in house meetings, but why would you want to?

  15. #15
    Join Date
    Jan 2003
    Location
    Nottinghamshire, UK
    Posts
    364
    Thanks Guys that makes sense to me.

    This is not the case here, but what if there was a supplier of the file & he said !!

    "How would you like your files XML or CSV, it makes no difference to us"

    What would you say ?
    "Everything should be made as simple as possible, but not simpler." - Albert Einstein
    "Everything should be made as complex as possible, so I look Cleverer." - Application Developer

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •