  1. #1
    Join Date
    Aug 2004
    Location
    France
    Posts
    754

    Unanswered: About preventing disk contention / Optimizing cache

    Hello everyone,

    I'm trying to plan what disks I'll need for pre-production and then production (how many, and for what use...).

    Here is what I plan to do (following some recommendations I already got on this forum):

    - 1 disk for UNDO
    - 1 disk for REDO LOGS
    - 1 disk for ARCHIVED REDO + SYSTEM + Oracle Install
    - 1 disk for TEMP
    - 1 disk for DATA that won't change much during a typical day (roughly 50,000 new rows/day, with Oracle Text indexes) but that will be accessed often
    - 1 disk for DATA that will change a LOT during a typical day: millions of inserts/deletes, and it will be accessed often too. The tables here will be index-organized, so they won't add index load to the INDEXES disk
    - 1 disk for INDEXES

    What do you think of that?

    Should the SYSTEM tablespace (and the Oracle install) be kept separate from the ARCHIVED REDO?

    Is having 1 disk for DML-intensive data really a good idea?
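
    To make the layout concrete, here is roughly how I picture it as tablespaces spread over the mount points (paths, names and sizes below are placeholders, not what I will actually use):

        -- sketch only: one mount point per physical disk
        CREATE UNDO TABLESPACE undotbs
            DATAFILE '/u01/oradata/PROD/undotbs01.dbf' SIZE 2000M;
        ALTER DATABASE ADD LOGFILE
            ('/u02/oradata/PROD/redo04a.log') SIZE 100M;        -- redo on its own disk
        -- archived redo goes to the SYSTEM/install disk via the init parameter:
        --   log_archive_dest_1 = 'LOCATION=/u03/oradata/PROD/arch'
        CREATE TEMPORARY TABLESPACE temp
            TEMPFILE '/u04/oradata/PROD/temp01.dbf' SIZE 2000M;
        CREATE TABLESPACE data_static
            DATAFILE '/u05/oradata/PROD/data_static01.dbf' SIZE 4000M;
        CREATE TABLESPACE data_hot
            DATAFILE '/u06/oradata/PROD/data_hot01.dbf' SIZE 4000M;
        CREATE TABLESPACE indx
            DATAFILE '/u07/oradata/PROD/indx01.dbf' SIZE 4000M;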

    Now, another issue. Concerning buffer caches and block sizes, here is what I plan to do:

    - INDEXES would have 32K block size
    - DML Intensive data (in Index Organized Tables) would have 16K block size
    - Data with 30,000 inserts a day (with CLOBs) would be 8K (or 16K, with the above at 8K, I don't know... these records are about 6,000 bytes max (several VARCHAR2 fields) plus a CLOB of about 5,000 bytes on average... How could I know whether 8K or 16K would be best?)
    - For other DATA, 4K block size.

    As you can see, I plan to separate data according to its nature and how fast it is modified, which would give each type of data a separate cache; otherwise I fear that frequently updated data would take over the whole cache and hurt the performance of queries on other data... Am I correct?
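
    If I understand the docs correctly, the separate caches would come from something like this (sizes are pure guesses to be benchmarked, and each non-standard block size needs its own cache parameter before a tablespace with that block size can be created):

        # init.ora / spfile -- sizes are placeholders to be benchmarked
        db_block_size     = 8192      # standard block size (SYSTEM, default cache)
        db_cache_size     = 512M      # default cache, used by 8K-block tablespaces
        db_4k_cache_size  = 128M
        db_16k_cache_size = 256M
        db_32k_cache_size = 256M      # if the platform allows 32K blocks

        -- e.g. the indexes tablespace from the sketch above would then become:
        CREATE TABLESPACE indx
            DATAFILE '/u07/oradata/PROD/indx01.dbf' SIZE 4000M
            BLOCKSIZE 32K;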

    Thanks & Regards,

    RBARAER

  2. #2
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Don't forget that whatever configuration you go for, make sure the disks are mirrored. On most prod systems I've worked on, if you can afford it, go for RAID 0+1, as it gives the best performance and redundancy, but at the cost of extra storage. The only thing not to stripe would be your redo logs, as they work better without striping (but they should still be mirrored).

    Alan

  3. #3
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Thanks Alan.

    BTW, concerning different block sizes and Don Burleson's article that you pointed me to (http://www.dbazine.com/burleson2.shtml), it seems that Tom Kyte doesn't think much of Burleson, and does not agree at all with his use of different block sizes "a priori". See this excellent thread on AskTom if you haven't already: http://asktom.oracle.com/pls/ask/f?p...:2913600659112 . What do you think of it?

    Tom Kyte seems to be a very intelligent person, and his philosophy of Oracle seems much more scientific than Burleson's "artistic" one. Furthermore, I really like the way he explains things on AskTom. I've already seen The_Duck recommend two of his books: "Expert One-on-One Oracle" and "Effective Oracle by Design". I'd like to know whether, apart from Tom Kyte, there are other authors (or "experts") people here would warmly recommend, or others they would recommend avoiding at all costs.

    Thanks & Regards,

    RBARAER

  4. #4
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Hi RBARAER

    I usually refer to a number of expert sites, including Tom Kyte, Burleson, Steve Adams, Jonathan Lewis etc. What I find is that you have to pick the advice which matches your application environment. For example, the Tom Kyte article you point to suggests you should hardly ever rebuild indexes, but due to the way our app works I have to rebuild all of my large indexes at least once to reduce their size by 40%.
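
    For reference, the kind of rebuild I mean is nothing fancy (the index name is made up, and REBUILD ONLINE needs Enterprise Edition):

        -- check how much dead space the index is carrying, then rebuild it
        ANALYZE INDEX my_big_ix VALIDATE STRUCTURE;
        SELECT height, lf_rows, del_lf_rows, pct_used FROM index_stats;

        ALTER INDEX my_big_ix REBUILD ONLINE;   -- or COALESCE for a lighter-weight tidy-up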

    The best way is to try out the hypotheses of these experts in your environment and see how they perform in your test cases. This requires that you have a good testing application (or test cases) which you can benchmark repeatedly. It does take time, but it helps a great deal later on.

    Alan

  5. #5
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    The best way is to try out the hypotheses of these experts in your environment and see how they perform in your test cases.
    I agree. Anyway, from what I have seen, some "experts" PROVE what they say, like Tom Kyte, and others just SAY things and claim to KNOW them, without any real proof, like Burleson. I really prefer Kyte's way.

    AskTom is a GREAT site with very useful information, but I can't figure out how to post questions there. Do you need a support contract with Oracle or something like that? Is there another way to ask Tom Kyte questions?

    BTW, could you please give me your advice on this question from my first post:
    As you can see, I plan to separate data according to its nature and how fast it is modified, which would give each type of data a separate cache; otherwise I fear that frequently updated data would take over the whole cache and hurt the performance of queries on other data... Am I correct?
    Thanks & Regards,

    RBARAER

  6. #6
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    I've never posted a question to Tom Kyte, but I doubt you need a support contract to do so. Far more likely he is swamped with questions, so he only takes a few of them, and only occasionally.

    As for caches, I would go along with the Burleson article, as it corresponds with what other people have mentioned. I believe you don't have an OLTP-type app, so if you can, go for a large standard block size, i.e. 16K or 32K, and assign tables/indexes according to the article.

    http://www.dbazine.com/burleson2.shtml

    Alan

  7. #7
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Thanks Alan.

    However, you still haven't answered my "Am I correct?" question (see previous post), and my app is of the OLTP type (very demanding OLTP), with one big table that will be updated very frequently (up to thousands of inserts per second, and mass deletes (millions of rows) every hour or so, hence my "very slow deletes" thread), as well as queried very frequently for the current day's data (and a little for data inserted a few days before). This big table will be an IOT (hence my thread on IOTs), because, as I already told you, it really suits my needs, even for deletes (about 28 minutes for 4.1 million rows on my Sparc II dev machine with only 3 disks, so I expect far quicker on a better config... but I'll have to benchmark while doing many inserts/queries). Apart from this one, I'll have another table which will contain big VARCHAR2s and a CLOB field (between about 1K and 8K in size), with Oracle Text indexes and about 30,000 inserts a day, plus about 60 other tables, most of them containing quasi-static data that is almost only queried.

    Now that you have a better idea of what my app looks like, if you have any suggestion about anything, don't hesitate.

    Concerning my "Am I correct" question, the point is:

    Is it good to have several caches, one for data that is very frequently updated/queried and one for data that is updated less often but frequently queried (for example), or is it better to have ONE CACHE AREA ONLY? (Easier to tune, maybe?)

    Thanks & Regards,

    RBARAER

  8. #8
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    Well, is it really OLTP? Will you have lots of connections each doing lots of small transactions (i.e. inserts/updates/deletes), or will you have one process bulk loading your tables and then lots of people just doing selects against the big table?

    If it really is OLTP then contention is the big issue, which is why people use small block sizes; otherwise a bigger block size would be better. As for the deletes, I would strongly recommend going the manual partitioning route (and using truncate where possible), as otherwise they will become a bottleneck. If you don't go this route and use one IOT table, then this table will get very big, because index deletes can leave holes in your B-tree which may not get reused.

    Whether it is OLTP or not, you should always use multiple buffer caches, as mentioned in the article. Usually you use the KEEP cache for small/medium-sized lookup tables, the big tables go in the RECYCLE cache (assuming the table size is considerably greater than your RAM), and the remaining tables/indexes go in the default cache. The IOT would probably be better off in the 32K block cache.
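
    Something along these lines (table/index names are just examples, and note the KEEP and RECYCLE pools only exist for the standard block size; a segment in a 32K-block tablespace simply uses the db_32k_cache_size cache):

        # init.ora / spfile -- placeholder sizes
        db_keep_cache_size    = 128M
        db_recycle_cache_size = 256M

        -- assign segments to the pools
        ALTER TABLE lookup_codes   STORAGE (BUFFER_POOL KEEP);
        ALTER TABLE big_history    STORAGE (BUFFER_POOL RECYCLE);
        ALTER INDEX big_history_pk STORAGE (BUFFER_POOL RECYCLE);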

    Finally, if this is an OLTP app you need a good testing application to mimic your users. On one OPS system which was very heavy OLTP (500+ transactions per second from 500-1000 connections, with heavy contention) we spent nearly as much time devising and tuning the testing app as the main app.

  9. #9
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    Thanks.

    Well, let me explain how my app works.

    I'll have several processes that will receive data and insert it into the big table. Some other processes will insert data into the "CLOBs table". That should amount to a few dozen insert processes.

    Then I'll have one or more "query servers" written in OCCI (as clients won't access the DB directly). Clients will send queries to this server in a proprietary format; the server analyzes them, executes the corresponding prepared statement, and sends back the result. In fact I have "query threads", each specialized in one type of query (and each with its own connection to Oracle). Each thread polls the clients' query buffers to see if there is a query for it; if so, it marks the query as "done" for the other threads of the same type, processes it, and so on. Clients will only be able to run a set of known queries through this mechanism. High parallelism is ensured by one receive thread with a "query" buffer and one send thread with a "results" buffer per client. Query threads take queries from the "query buffer" and put the results into the "results buffer" of the corresponding client, one query for a client, then the next client...

    The number of "query threads" of each type is a parameter of the app that will be benchmarked, and probably increased if more clients get connected (in fact the clients of my server are other servers, to which some hundreds of real clients are connected; these servers just forward query messages to my server). I've got 2 different query servers running on this principle, one with 12 query types, the other with 5 (originally each was connected to a different instance, as I had 2 distinct instances, as you may already know, but as you suggested, I merged them). So, say I have 5 threads of each type: that would mean 85 different connections to Oracle as querying clients, and certainly more when we have LOTS of clients (only selects). Queries will hit almost all tables in the DB, but I expect mostly this big one and the "CLOBs" one.

    Of course I'm using only bind variables. For inserts, I'm using batch inserts through C arrays bound to each variable in the queries, and I commit every second. For selects, I'm also using C arrays bound to each column in the result set, so as to fetch results 50 at a time.
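
    The OCCI code itself is too long to post, but for the record, the same array idea expressed in PL/SQL would look roughly like this (the table and columns are simplified stand-ins for mine):

        DECLARE
            TYPE t_id_tab  IS TABLE OF NUMBER        INDEX BY PLS_INTEGER;
            TYPE t_val_tab IS TABLE OF VARCHAR2(100) INDEX BY PLS_INTEGER;
            l_ids  t_id_tab;
            l_vals t_val_tab;
        BEGIN
            FOR i IN 1 .. 50 LOOP            -- demo data; the real rows come from the feed
                l_ids(i)  := i;
                l_vals(i) := 'row ' || i;
            END LOOP;
            FORALL i IN 1 .. l_ids.COUNT     -- one round trip for the whole batch
                INSERT INTO big_table (id, val) VALUES (l_ids(i), l_vals(i));
            COMMIT;
            -- the select side uses the same idea: BULK COLLECT ... LIMIT 50
        END;
        /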

    I see it more as an OLTP system than as a DW one. Don't you think?

    If you foresee any bottleneck in this architecture, or have any other advice, please let me know.

    Thanks & regards,

    RBARAER
    Last edited by RBARAER; 12-15-04 at 13:55.

  10. #10
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    I don't really see it as OLTP, as you don't have loads of connections doing lots of small transactions (and the associated contention etc.); it's more like a normal database, which is a cross between OLTP (lots of small selects) and DW (large batch load jobs).

    By what criteria are you inserting into and deleting from this big table? Do you, for example, have lots of separate datasets which you delete when you have finished with them? If so, how many datasets do you expect to have at any point in time, and how many rows per dataset?

    Finally, is the application code under your control, or has it been written by an external party?

    Alan

  11. #11
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    OK for "normal database", just that in the Oracle doc they always refer to OLTP or DW systems, not "normal" ones, so, as mine is definitely not a DW, I thought it was OLTP. "Intermediate" suits me.

    In fact, each inserting process receives data from different data sources, each active during different hours of the day, covering roughly 24 hours out of 24. Deletes will occur to purge data that is considered too old (it will be backed up and then deleted). For now, we are considering either hourly deletes (for all data sources sending data at the same time) or daily deletes (for all data sources). As for the "datasets" you refer to, you can consider either one day of data from one source as a dataset, or one day of data from all sources as a bigger dataset. At the beginning we should have about 10 data sources; it could be dozens in the future (a few hundred max). Each data source would represent between 200,000 and 3,000,000 rows inserted per day, say about 1,000,000 rows per day on average (but it's hard to know exactly for now).

    The application code is under my control: I've written it and I'm in charge of it, so don't hesitate to make any suggestions.

    Best Regards,

    RBARAER

  12. #12
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    OK, one last question: do your query servers select from just one source/one day of data, multiple sources/one day, or multiple sources/multiple days' worth of data?

    Alan

  13. #13
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    On my big table, selects are only for one source on one day.

    RBARAER

  14. #14
    Join Date
    Mar 2002
    Location
    Reading, UK
    Posts
    1,137
    OK, how about this:

    1) Populating your table
    Your bulk load process calls an Oracle function, get_table(day, source). This function will create a table (and indexes) called, say, bigtable_<day>_<source> IF it doesn't already exist, and return the table name to your process. Your process will then use dynamic SQL to insert into <tablename>... Do a periodic commit, but don't commit every second; try every 30 seconds, say. (A rough sketch of get_table and the purge job follows after step 3.)

    2) Query servers
    Your query server will also call get_table(day, source) according to the parameters passed to it by your users. It will then use dynamic SQL to generate the select statement against the table name returned by the function.

    3) Purging old data
    You could have either an Oracle job or a user-initiated job which purges old data by doing the following:
    - open a cursor for: select table_name from user_tables where table_name like 'BIGTABLE_<day>%'
    - execute immediate 'truncate table '||tablename for each record returned by the cursor
    - execute immediate 'drop table '||tablename for each record returned by the cursor
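
    A rough sketch of what I mean (the column list and IOT clause are placeholders for your real DDL, the owner needs CREATE TABLE granted directly rather than via a role, and you would want to handle the race where two loaders create the same table at once):

        CREATE OR REPLACE FUNCTION get_table (p_day IN VARCHAR2, p_source IN VARCHAR2)
            RETURN VARCHAR2
        IS
            l_name VARCHAR2(30) := 'BIGTABLE_' || p_day || '_' || p_source;
            l_cnt  NUMBER;
        BEGIN
            SELECT COUNT(*) INTO l_cnt FROM user_tables WHERE table_name = l_name;
            IF l_cnt = 0 THEN
                EXECUTE IMMEDIATE
                    'CREATE TABLE ' || l_name || ' (
                         id  NUMBER,
                         ts  DATE,
                         val VARCHAR2(100),
                         CONSTRAINT ' || l_name || '_pk PRIMARY KEY (id)
                     ) ORGANIZATION INDEX';
            END IF;
            RETURN l_name;
        END get_table;
        /

        -- purge job: truncate then drop every table for a given (placeholder) day
        BEGIN
            FOR r IN (SELECT table_name FROM user_tables
                      WHERE table_name LIKE 'BIGTABLE_20041201%')
            LOOP
                EXECUTE IMMEDIATE 'TRUNCATE TABLE ' || r.table_name;
                EXECUTE IMMEDIATE 'DROP TABLE '     || r.table_name;
            END LOOP;
        END;
        /

    Your loader and query server then just build their statements with dynamic SQL against whatever get_table returns, as described in steps 1 and 2.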

    Advantages are:
    - loading will be fast and won't affect other queries much, as it won't be inserting into a table which another process is reading from.
    - instant deletes, as it uses truncate. This also means you aren't going to stress the undo tablespace or your redo logs. Far fewer redo logs will be generated, so there is less chance of running out of disk space etc.
    - faster queries, as you will be querying a fraction of your total records (they will be split across n different tables)
    - indexes won't have to be rebuilt or coalesced, as they won't have holes in them as they would if you just had one table (or IOT). This also means your tables/indexes won't grow much in size.
    - If you move to Oracle partitioning in the future, the get_table function can just return the single table name, without needing any code change in your query/loading processes.

    Alan

  15. #15
    Join Date
    Aug 2004
    Location
    France
    Posts
    754
    OK, I see your point, but...

    Do a periodic commit, but don't commit every second; try every 30 seconds, say
    I cannot, because I have to be "almost" up-to-date, 1 second max.

    loading will be fast and won't affect other queries much, as it won't be inserting into a table which another process is reading from
    Unfortunately, users will mostly access the table for the current day, i.e. the one being inserted into.

    instant deletes, as it uses truncate.
    OK.

    faster queries, as you will be querying a fraction of your total records
    Are you sure about this? I thought that properly indexed data would be accessed just as quickly in a table with few records as in one with millions of records, wouldn't it?

    indexes won't have to be rebuilt or coalesced
    OK.

    If you move to Oracle partitioning in the future...
    Right, that's a good point.

    Your solution seems better to me than the one you proposed earlier using a view and an INSTEAD OF trigger, as neither insert nor select performance should suffer. However, in what way would it be better to create tables such as bigtable_<day>_<source> rather than simply bigtable_<day>, given that I will purge data by date? I mean, since I query one source at a time and have one insert process per source, I could, but what would be the advantage over a split by date only?

    Best Regards,

    RBARAER
