Unanswered: Large OLTP table, Index common field, Slow INSERT performance
I'm trying to design an efficient OLTP database which can handle a high INSERT rate (at least 100 per second), but still be searched relatively efficiently.
My primary OLTP table looks like:
CREATE TABLE Transactions ( Time BIGINT NOT NULL, CustId VARCHAR(20), TermId VARCHAR(12), Amt FLOAT, ... )
There are roughly 800,000 unique customer ids, 100,000 unique terminal ids.
So for every customer transaction an entry goes into this table (BIGINT is a high resolution timestamp). The clustered index is defined on Time, which is normally increasing. As you'd expect, the INSERT performance on the table as it stands is very good.
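For reference, the clustered index as described would look something like this (a sketch; the index name is mine):

```sql
-- Clustered index on the ever-increasing timestamp column.
-- Because Time is monotonically increasing, new rows always
-- append to the last page of the B-tree, so INSERTs are cheap.
CREATE CLUSTERED INDEX IX_Transactions_Time
    ON Transactions (Time);
```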
However, there are two common search cases:
- Most recent transactions by customer - SELECT TOP 100 * from Transactions where CustId = '010101037' order by Time desc
- Most recent transactions at terminal - SELECT TOP 100 * from Transactions where TermId = '475794' order by Time desc
In order to satisfy these requests, I basically need an index on both TermId and CustId. As soon as I do this and the table grows past 20 million rows, INSERT performance consistently decreases.
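Something like the following (index names are mine), which is where the trouble starts:

```sql
-- One nonclustered index per search column. Unlike the clustered
-- index on Time, new entries land wherever the CustId / TermId key
-- sorts, i.e. all over the B-tree rather than at the end.
CREATE NONCLUSTERED INDEX IX_Transactions_CustId
    ON Transactions (CustId);

CREATE NONCLUSTERED INDEX IX_Transactions_TermId
    ON Transactions (TermId);
```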
I assume this is because of the increasing fragmentation on the TermId and CustId indexes?
Still, I'm convinced this must be an incredibly common type of application (e.g. customer service at a bank looking up a customer's recent transactions). Am I missing some approach that will improve performance?
I hate to say this, because it borders on cheating.
Think about what the teacher has covered in class recently. I'm sure that they've recently explained indexing and index performance, and if they are teaching your class from a book I'd be willing to bet that there is a sidebar or even a section discussing index operations and performance.
There's more than one way to solve this problem, and the solution that will get you the most credit will depend on what the teacher is trying to get you to learn. You're going to have to guide us a bit before we can help you get inside the teacher's head to find the solution that will get you the most credit.
Unfortunately this isn't a university assignment, it's a real world problem I'm trying to solve.
As I understand it, my performance problems arise from the fact that my index key columns are ordered such that the leftmost column repeats. This means that SQL Server is constantly page splitting to insert the new index entries (with their high-resolution timestamps).
So, if the index is 'create index i_custid on Transactions ( CustId, Time )', I end up with a b-tree that has leaf nodes that constantly need to be split.
This is all well and good, but it's not at all obvious to me how to avoid it. If I change the index to ( Time, CustId ), INSERT performance is drastically better, but now the index can no longer be used for the most common query (i.e. SELECT TOP 100 * from Transactions where CustId = '9444' order by Time desc) without doing an index scan.
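To make the trade-off concrete, the two key orderings behave very differently (a sketch; index names are mine):

```sql
-- Keys ordered (CustId, Time): the query can seek directly to one
-- customer's entries and read the newest in key order, but inserts
-- land mid-tree (one spot per customer) and split pages as they fill.
CREATE INDEX i_custid ON Transactions (CustId, Time);

-- Keys ordered (Time, CustId): inserts are append-only and fast,
-- but a lookup by CustId alone cannot seek on the leading column,
-- so the query falls back to scanning the whole index.
CREATE INDEX i_time_custid ON Transactions (Time, CustId);
```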
I'm sorry if I'm missing something totally obvious, but most discussions of this seem to indicate that you just have to accept the performance problems, or apply the index only after loading all of the data. Building the index after the fact is not really appropriate for an online system.