Results 1 to 12 of 12
  1. #1
    Join Date
    Sep 2009
    Posts
    6

    multiple vendor connections vs. centralized/ redundant db

    I have a project to create a dashboard that will connect to existing systems as well as create new features based on combining data from the existing systems. For example, the dashboard will be able to generate "orders" containing data merged from "members" (MS Access DB), "employees" (MySQL DB) and "products" (flat file), and there will also be new attributes particular to "orders."

    At first I thought it would be most efficient to have my application connect to each of the systems separately and perform cross-vendor joins between the different databases. But then I thought that creating a centralized/redundant db (built with scripts pushing and pulling data between the systems) might also be useful because it would empower some semi-technical staff to use products like OOBase, which can only make a single connection.

    Are there any other advantages to creating a centralized/redundant DB like the one I'm talking about? Or are multiple direct connections the best approach?

    Thanks in advance for any tips.

  2. #2
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    A key advantage of creating a Data Warehouse (the centralised, redundant database) is that it can be a conformed single version of the "truth". For instance you can perform any necessary conversion of units, data types and encoding schemes to make sure that all the data can be reconciled and summarised in useful ways.

    Also the warehouse approach will makes it easier to manage availability, performance and security issues.

    If you haven't built a data warehouse before then get some help. The failure rate for such projects is notoriously high and you'll need someone experienced to help you avoid the pitfalls.

  3. #3
    Join Date
    Sep 2009
    Posts
    6
    Thanks for your reply, dportas. I'm beginning to think a data warehouse is the way to go.

    However, I have never needed a consultant ("someone experienced") to help me build an information system. Can you recommend any books or tutorials? Is "The Data Warehouse Toolkit" still the standard?

    I would be writing the ETL in Python.

  4. #4
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Start with Bill Inmon's book "Building the Data Warehouse". Stay away from Kimball.

  5. #5
    Join Date
    Sep 2009
    Posts
    6
    Wow - that book is 17 years old! And "oldie but a goodie", huh?

  6. #6
    Join Date
    Mar 2009
    Location
    Sydney, Australia
    Posts
    258
    The miracle of flight is 100 years old. The size and complexity of aircraft has come a long way in that time, but there have been no advances in the theory of flight.

    If you do not understand the theory, you will build an albatross, or a grey goose. Marvellous ideas, from very clever people, but they did not fly. Success or aptitude in one field does not guarantee success in another field.
    Last edited by Derek Asirvadem; 09-23-09 at 11:49.
    Regards
    Derek Asirvadem (Formerly DerekA)
    Information Architect / Senior Sybase DBA
    Copyright 2009 Software Gems Pty Ltd

    I answer questions from the Original Poster only. If you have a genuine question, as the moderators have requested, start a new thread.

    http://www.softwaregems.com.au

  7. #7
    Join Date
    Sep 2009
    Posts
    6
    I've been reading about the differences between Inmon and Kimball, and all these complexities about data warehouses vs data marts, dimensions, star schemas, EAV tables and so forth, and I can't help feeling that it's more complex than it needs to be. Why can't I just look at my data dictionaries, come up with a nice, unified RD schema (integrating the new tables with the redundant/ updated ones), and schedule my ETL update scripts on a daily or event-driven basis?

    To give some perspective, we're a mid-sized retail business (160 employees, 600 members, $500K gross weekly sales).

  8. #8
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    Quote Originally Posted by yekibud
    Why can't I just look at my data dictionaries, come up with a nice, unified RD schema (integrating the new tables with the redundant/ updated ones), and schedule my ETL update scripts on a daily or event-driven basis?
    In a nutshell that is what you do. The devil is in the detail however. Dealing with problems such a late arriving data, non-conforming data, surrogate key assignment and representing changes over time are potentially complex problems in data warehouses.

    As I said before, I recommend you avoid the Kimball "dimensional" stuff. At best it is just jargon and at worst it can cripple your data model with inflexible dogmas.

  9. #9
    Join Date
    Sep 2009
    Posts
    6
    Thanks for the tips, dportas. At the very least, it sounds like Inman's ideas are something I should read up on, anyway - sort of general IT/DB knowledge. And hopefully it will help me out when I run into some practical issues.

  10. #10
    Join Date
    Jun 2007
    Location
    London
    Posts
    2,527
    Quote Originally Posted by yekibud
    I have never needed a consultant ("someone experienced") to help me build an information system. Can you recommend any books or tutorials? Is "The Data Warehouse Toolkit" still the standard?

    I would be writing the ETL in Python.

    ...

    To give some perspective, we're a mid-sized retail business (160 employees, 600 members, $500K gross weekly sales).
    Why not get someone in to advise you how to go about things - it's great to learn on a fun new project but getting things wrong will cost you long term. Once you have a few recommendations then you can consider how to go about building the system. It doesn't sound complex but why not do things right from the off. Would it help to have a more central system rather than lots of separate systems? What else can you do with your data when you get it all in one place?

  11. #11
    Join Date
    Sep 2009
    Posts
    6
    @dportas: I'm beginning to question whether what I really need is properly called a "data warehouse" - my original term "centralized / redundant db" is starting to sound better. My main issue is with Inmon's "non-volatility" requirement. Why would I want to duplicate an employee record every time an attribute changed? (e.g. change of address, status changed to 'inacitive', etc.) Is non-volatility an essential feature of a data warehouse?

    @mkie_bike_kite: Yeah, you're right. I wish I knew someone I trusted who had experience in these things.

  12. #12
    Join Date
    Dec 2007
    Location
    London, UK
    Posts
    741
    One of the desirable goals of a warehouse is to capture data to the greatest possible level of detail so you can support both current and future requirements. If you don't capture changes when they happen then that data is probably lost forever. You don't necessarily have to capture every change to every attribute. Part of the design process is deciding what history needs to be captured and to what level of detail. If you are reliant on taking daily "snapshots" of your data then you inevitably lose some of that day's changes anyway unless the history is always preserved at source as well.

    Quote Originally Posted by yekibud
    I wish I knew someone I trusted who had experience in these things.
    In the IT business, working alone on anything sucks. It means you have limited opportunities to learn and some of what you learn by experience will probably be defective knowledge. Speaking for myself, if I found I was working alone, particularly on something unfamiliar, for any period of time then that would be a good enough reason for me to look for employment elsewhere. That's just my 0.02 though.

Posting Permissions

  • You may not post new threads
  • You may not post replies
  • You may not post attachments
  • You may not edit your posts
  •