Only certain queries can take advantage of intrapartition parallelism. With your 4 cores, any large table that will be read with table scans should be in a tablespace with 4 containers. You should also use a large page size and prefetching for these tablespaces (put your smaller tables in a different tablespace). With 4 containers, your prefetch size should be 4 times your extent size.
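A rough sketch of what that could look like, assuming a DMS tablespace with file containers. The names (BP32K, TS_BIG), the container paths and the sizes are made up for illustration; only the 4 containers and PREFETCHSIZE = 4 x EXTENTSIZE come from the advice above.

```sql
-- A 32K tablespace needs a buffer pool with the same page size.
-- BP32K and its size are illustrative values, not recommendations.
CREATE BUFFERPOOL BP32K SIZE 10000 PAGESIZE 32K;

-- One container per core (4 here), ideally on separate disks.
-- Paths and container sizes (in pages) are hypothetical.
CREATE LARGE TABLESPACE TS_BIG
  PAGESIZE 32K
  MANAGED BY DATABASE
  USING (FILE '/db2/cont1/ts_big' 327680,
         FILE '/db2/cont2/ts_big' 327680,
         FILE '/db2/cont3/ts_big' 327680,
         FILE '/db2/cont4/ts_big' 327680)
  EXTENTSIZE 32          -- pages per extent
  PREFETCHSIZE 128       -- 4 containers x 32-page extents
  BUFFERPOOL BP32K;
```

With an extent size of 32 pages, the prefetch size works out to 4 x 32 = 128 pages, so one prefetch request can touch all 4 containers at once.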
Besides setting intra_parallel at the dbm level, there are other db and dbm parameters that need to be set. See this for details:
https://publib.boulder.ibm.com/infoc.../t0004886.html
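As a starting point only, the commands would look something like the following. MYDB is a placeholder database name, and picking dft_degree and max_querydegree as the "other" parameters is my assumption about what matters most; check the page linked above for the full list.

```sql
-- dbm (instance) level: turn intrapartition parallelism on.
-- The instance has to be restarted (db2stop / db2start) for this to take effect.
UPDATE DBM CFG USING INTRA_PARALLEL YES;

-- dbm level: upper limit on the degree of parallelism for any query.
UPDATE DBM CFG USING MAX_QUERYDEGREE ANY;

-- db level: default degree for SQL statements; ANY lets the optimizer choose.
-- MYDB is a hypothetical database name.
UPDATE DB CFG FOR MYDB USING DFT_DEGREE ANY;
```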