Avoiding file system caching was actually one of the main reasons to use raw devices. The caching done by the operating system just added another caching layer on top of the DBMS's own buffering, robbing memory from the DBMS and providing useless double buffering. With direct I/O, those times are over, and simpler administration is generally preferable to squeezing out another 1% of performance.
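For what it's worth, in DB2 for Linux, UNIX and Windows direct I/O is requested per table space with the NO FILE SYSTEM CACHING clause. A minimal sketch, assuming a DB2 version that supports the clause; the table space name, container path, and size below are made-up placeholders, not anything from your setup:

```sql
-- Hypothetical names: userspace_data and the container path are placeholders.
-- Create a DMS table space on a regular file, bypassing the OS file cache.
CREATE TABLESPACE userspace_data
  MANAGED BY DATABASE
  USING (FILE '/db2/data/userspace_data.dat' 25600)
  NO FILE SYSTEM CACHING;

-- Switch an existing table space to direct I/O.
ALTER TABLESPACE userspace_data NO FILE SYSTEM CACHING;
```

That way you keep ordinary files for the containers (easy to back up, resize, and move) without paying for double buffering.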
Typically, more severe performance problems exist in other places: missing indexes, misconfigured memory heaps, badly written SQL statements, applications that don't treat a DBMS as a DBMS but rather as a dumb file system, and so on. If you want to do real performance testing at the high end (e.g. TPC-x), you should get in touch with the IBM labs to get dedicated support. There are people there who work full-time on performance tuning and performance improvements in DB2.
Btw, you could have up to 32K "inlined" LOBs in DB2 V7 already if you wrapped them into structured types. That made accessing those LOBs rather messy, but it did and still does work. :-)