I am wondering about compressing data within tables. I've read that you can create tables with the "VALUE COMPRESSION" option and columns with the "COMPRESS SYSTEM DEFAULT" option, but I can't seem to get the results I expect. This is precisely what I read:
When VALUE COMPRESSION is used, NULLs and zero-length data that has been assigned to defined variable-length data types (VARCHAR, VARGRAPHICS, LONG VARCHAR, LONG VARGRAPHIC, BLOB, CLOB, and DBCLOB) will not be stored on disk. Only overhead values associated with these data types will take up disk space.
If VALUE COMPRESSION is used then the optional COMPRESS SYSTEM DEFAULT parameter can also be specified to further reduce disk space usage. Minimal disk space is used if the inserted or updated value is equal to the system default value for the data type of the column. The default value will not be stored on disk. Data types that support COMPRESS SYSTEM DEFAULT include all numerical type columns, fixed-length character, and fixed-length graphic string data types. This means that zeros and blanks can be compressed.
Sounds great. Especially the COMPRESS SYSTEM DEFAULT option (since all the data is numeric). The problem is that when I do a little experiment, it doesn't seem to work:
I create two tables - one set up for compression, and one not setup for compression:
>> db2 "create table db2inst1.compresstest(integer int compress system default with default 4) in TS_SIGNALS value compression"
>> db2 "create table db2inst1.nocompresstest(integer int) in TS_SIGNALS"
If i then run this loop (inserts 4 into each of the tables 10000 times) ...
for x = 1:10000
u.SQLInsert('insert into nocompresstest values(4)');
u.SQLInsert('insert into compresstest values(4)');
... I would expect that the compresstest table would be a lot smaller since the column is being loaded with the default value. However, when I look in the directory where the tablespace is located I see that both tables are the exact same size:
>> ls -lia /mnt/database/human_db/db2inst1/NODE0000/tablespaces/TS_SIGNALS
1610622487 -rw------- 1 db2inst1 db2inst1 10770380 Nov 30 15:20 SQL00006.DAT
1610622488 -rw------- 1 db2inst1 db2inst1 10770380 Nov 30 15:25 SQL00007.DAT
This indicates to me that either a) I haven't got compression set up properly, b) I don't understand how compression works, or c) compression doesn't work very well at all.
I have been stuck on this for days, so any help would be greatly appreciated.
either a) I haven't got compression set up properly, b) I don't understand how compression works, or c) compression doesn't work very well at all.
I choose option b. Let's do some calculations. INT is a 4-byte value. When you set the column with the default value compression most likely it stores a 1 byte long indicator for it instead of the actual value, so you're saving here 3 bytes in each row.
10000 rows give you 30000 bytes of savings. That's a bit less than eight 4 KB pages.
Since data files are allocated by extents, and the default extent size is 32 pages, your savings fall well within a single allocation unit. They're there, you just don't see them.
If you had a 10 million row table and a 128-byte long string as your column default value that'd make a bigger difference.