Unanswered: Understanding Dirty Pages and Buffer Cache
I’m having problems understanding the relationship between checkpoints, buffer cache, dirty pages and transaction logs. Please tell me if the definitions below are correct. I’m most confused over the definition of dirty pages and buffer cache. I’ve read a few books that appear to have slightly different defnitions.
(Definition #1) Any modified pages, residing in the buffer cache, not yet flushed to disk
(Definition #2) Any uncommitted data residing in buffer cache
Which is correct? Isn’t a dirty page data that has been modified and committed? I thought uncommitted data was data that has been modified, but the transaction is not complete and the data has not been written to disk.
(Definition #1) Consists of Procedure Cache and Data Cache. The Data Cache portion contains data modification, both clean and dirty, that have not been written to disk
(Definition #2) Consists of Log and Data cache
Flushes uncommitted data “dirty pages” from the buffer cache and only occurs when the database is in SIMPLE mode.
Writes data changes to the transaction log. (I assume the data also is written to the Buffer Cache)
When data is inserted, deleted or updated within a BEGIN TRAN – COMMIT TRAN or within a batch string terminated by GO, the data is written to buffer cache. If using FULL recovery mode at what point is the data written to the Transaction Log? Is it first written to the buffer cache and then to the T-Log?
-- Deletes 100,000 records
From Table1 A, Table2 B
Where A.PolicyNumber = B.PolicyNumber
Since a GO was issued prior to the Checkpoint, isn’t the data considered Clean (“committed”) and therefor a Checkpoint will not reduce the size of the log by the 100,000 records that were deleted?
A dirty page is a page that has not yet been written to the disk. You can (and most often will) have many pages that are different in memory as opposed to the copy on disk. They are called dirty, because the application has "smudged" them with new data. Once they are written to disk and both the disk copy and the memory copy agree, then it is no longer dirty.
As for the Buffer cache, I thought that the procedure cache was separate from the buffer cache. At least in 6.5, it was the same pool, but different "sides". You had to define how much of your memory allocation was for the proc cache. SQL Server defaulted to 30%. Anyone who came into this business after SQL 7.0 doesn't remember a bit of that trivia ;-). Generally, I think of the buffer cahce as containing any data buffer, so that would be log as well as data. The log contains "data" of it's own. Usually page numbers, transaction id's, etc..
Checkpoints have to happen in full and simple modes. Otherwise, no one's database would ever come back from a reboot. Checkpoints are done asynchronously by one of the background processes (spids 1-15 or so). This is to keep the I/O of writing of data from interfering with your reading in other pages from disk. Checkpoints can be triggered by a number of things ranging from the manual issuing of the checkpoint command to a need for free buffers caused by a huge table scan. Oddly enough, a shutdown does not give a checkpoint. If you have long re-start times, you may want to consider issuing a checkpoint before a shutdown (assuming it is a planned shutdown).
A commit transaction will mark your transaction as "completed" in the trasnaction log. Remember: All data is written first to the transaction log, then to the data files. All data is updated in memory first, as it is faster, but the hope is that data makes it to disk eventually. If something does not make it into your transaction log, it has no business on your data file. On recovery (such as after reboot), SQL Server goes through the transaction log to make sure that the current copy of the data is consistent. If it is not consistent, then it is suspect.
I think the confusion you are having is in the fact that transactions and checkpoints are actually largely unrelated. The only relation they have is that a checkpoint only flushes committed data to the data files. The transaction log should be written to synchronously (in real time), but do not quote me, as that could be just my own misconception.
In your example, the pages (buffers) in memory could be dirty for a long time (more likley 2 minutes tops), if the system is very lightly used. If you have a system that has a lot of write activity, then the checkpoints are issued more often (depending on the recovery interval setting).
That's a bit of a rambling post, but I hope it helps. Feel free to ask for more clarifications of individual areas.