  1. #1

    Performance issues with PostgreSQL 9.2 on RHEL 6

    We're doing performance testing of an OLTP product built on Java (hosted on JBoss/Tomcat) and PostgreSQL. It is supposed to handle around 9,000 user requests an hour (each request takes several seconds because of the large data volume being submitted) on a RHEL VM with 32 GB RAM and 16 cores.

    It works fine while we send around 6,000-7,000 requests an hour, but at around 9,000 requests PostgreSQL starts throwing errors in its log. We get one or more of:

    * could not fork new process for connection: Resource temporarily unavailable
    * checkpoints are occurring too frequently
    * errors about being unable to write to the log
    * other similar errors

    Consequently, on the Java side we get an error stating that the application can't obtain more connections from the database. The checkpoint warning went away after we changed how frequently checkpoints occur, but the fork-related error remains.

    Another interesting point is that CPU utilization only reaches about 50%, and that is exactly when the errors start appearing.

    Any idea why RHEL is not allowing PostgreSQL to fork more processes for new connections when 50% of the CPU is still free? This is a dedicated DB server for this product; no other applications run on it.
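
    For reference, we understand that fork failures with "Resource temporarily unavailable" can mean the postgres OS user has hit its per-user process limit (RLIMIT_NPROC) rather than anything CPU-related, which would explain the idle CPU. This is how the limit can be checked, assuming PostgreSQL runs as the postgres user on a stock RHEL 6 install:

    # max processes the postgres user may create
    su - postgres -c 'ulimit -u'
    # processes that user is currently running
    ps -u postgres --no-headers | wc -l
    # stock RHEL 6 caps non-root users at 1024 processes in this file
    cat /etc/security/limits.d/90-nproc.conf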

    Here are the settings:

    max_connections = 1400
    shared_buffers = 7GB
    max_files_per_process = 1000
    wal_buffers = 16MB
    checkpoint_segments = 32
    checkpoint_timeout = 5min
    checkpoint_completion_target = 0.9
    effective_cache_size = 14GB
    autovacuum = on
    autovacuum_max_workers = 51
    autovacuum_naptime = 10min
    track_counts = on
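
    For context, a quick way to see how close the server gets to the connection ceiling while the test runs (standard views and settings in 9.2; adjust connection options as needed):

    # live backend count vs. the configured maximum
    psql -c "SELECT count(*) AS backends FROM pg_stat_activity;"
    psql -c "SHOW max_connections;"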

    Help on this would be really appreciated. We have been stuck on this for about a week now and have tried various permutations and combinations. Any voodoo artists out there who know the black magic that runs this are welcome to share their spells.

  2. #2
    How many WAL files are you generating an hour?
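
    One way to measure that on 9.2 (pg_current_xlog_location() and pg_xlog_location_diff() both exist there) is to sample the WAL write position an hour apart and divide by the 16 MB segment size; the location below is just an example value:

    psql -Atc "SELECT pg_current_xlog_location();"    # e.g. returns 3/7D000128
    sleep 3600
    # bytes of WAL written in the hour, converted to 16 MB segments
    psql -Atc "SELECT pg_xlog_location_diff(pg_current_xlog_location(), '3/7D000128') / (16 * 1024 * 1024.0);"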

    Have you looked at the output of the pg_stat_bgwriter view? It will give you an idea as to why checkpoints are being triggered. If the checkpoints are being driven by timing, you'll want to adjust settings to make the checkpoint writes more frequent (and thus smaller). With checkpoint_completion_target = 0.9 and checkpoint_timeout = 5min, a time-triggered checkpoint spreads its writes over 0.9 × 300 s = 270 s, leaving only 30 seconds before a collision with new WAL files could happen.
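
    Something like this will show whether checkpoints are timed or requested (requested generally means checkpoint_segments is filling up first; column names as in 9.2):

    psql -c "SELECT checkpoints_timed, checkpoints_req, buffers_checkpoint, buffers_backend FROM pg_stat_bgwriter;"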

    Even with 32 GB of main memory, the shared_buffers size seems a bit large. The minimum for 1400 connections would be about 2.8 GB. I've seen reports that large systems work well with shared buffers at 6-15% of available RAM; 15% of (32 − 14) GB is about 2.7 GB, so setting it at 3-4 GB may be helpful. That would also let you increase effective_cache_size to the 17-18 GB range, which could help as well.
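
    If it helps, those suggestions would look roughly like this in postgresql.conf (starting points to test, not prescriptions):

    shared_buffers = 4GB            # down from 7GB
    effective_cache_size = 18GB     # up from 14GB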

    Ref: Checkpoints and the background writer
    Lou

