Unanswered: Best practice to handle poor data quality in relationships?
I'm attempting to normalize an enormous table with order data, but I'm running into some problems. The table currently contains many duplicates, of which also included the actual order information (yikes!), but I managed to normalize it almost all the way down. It appears that different accounts can be used on orders, and these order numbers are being recycled for some reason months down the line (don't ask my why they're reusing them for future orders because I have no idea either, they should be creating new order numbers). Of course, the Order number is the primary key in my table as it should be. I guess the same thing can occur with the sales rep. Anyway, I'm struggling to find the "best practice way" to deal with this situation. I'm almost tempted to create an intermediary "transaction table" or something like that between the main general order information (which at this point will basically be the Order Number and Customer ID only), then include a table with the account information and sales rep info, then have that link to the Order Detail with the products, quantity, order number and various dates for those order numbers. Order maybe it should be a seperate, related table, but not between the general order information and the order details? Can anyone tell me if I'm on the right track for this situation? It was a total curveball that the rep and account information could be different on these orders.
Order (Order #, Customer) -> Transaction Information (Order #, Account Type, Sales Person) -> Order Details (dates, products, quantities, etc)
Order (Order #, Customer)---> Transaction Information (Order #, Account Type, Sales Person)
|-> Order Details (dates, products, quantities, etc)