
 
dBforums > Database Server Software > PostgreSQL

06-26-09, 12:17
kkam
Join Date: Jun 2009
Posts: 1
Recovering PostgreSQL from hardware faults

Hello,

Is there any way to check a PostgreSQL database for damage caused by hardware faults, and/or to recover it?
Or at least to get some kind of "alarm" showing exactly what is damaged?
In other words, how can I find the corrupted record(s)?
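
One approach sometimes used to localize this kind of damage is to read rows one at a time and note which reads fail. The sketch below shows only that bookkeeping; `find_unreadable_rows`, `ctid_range` and the `fetch_row` callback are hypothetical names. With a real database, `fetch_row` might run something like `SELECT * FROM some_table WHERE ctid = '(0,1)'` through a driver such as psycopg2 (an assumption), and it would need to reconnect after a read that crashes the backend. It only finds damage that makes a read fail; silently altered values will not show up.

```python
from typing import Callable, Iterable, List, Tuple


def ctid_range(pages: int, tuples_per_page: int) -> List[str]:
    """Enumerate candidate ctids '(page,offset)'; tuple offsets start at 1.

    pages and tuples_per_page are upper bounds you must supply (assumption);
    a ctid that holds no row simply returns nothing, it is not an error.
    """
    return [f"({p},{t})" for p in range(pages) for t in range(1, tuples_per_page + 1)]


def find_unreadable_rows(
    row_ids: Iterable[str],
    fetch_row: Callable[[str], object],
) -> List[Tuple[str, str]]:
    """Read each row on its own and collect (row_id, error) for failed reads.

    fetch_row is a hypothetical callback supplied by the caller; it should
    raise an exception when the row cannot be read.
    """
    failures = []
    for row_id in row_ids:
        try:
            fetch_row(row_id)
        except Exception as exc:  # any failed read marks the row as suspect
            failures.append((row_id, str(exc)))
    return failures


if __name__ == "__main__":
    # Fake fetcher standing in for the database: row (0,2) "is damaged".
    def fake_fetch(ctid: str) -> str:
        if ctid == "(0,2)":
            raise RuntimeError("simulated read error")
        return ctid

    print(find_unreadable_rows(ctid_range(1, 3), fake_fetch))
    # [('(0,2)', 'simulated read error')]
```

Reading rows singly is slow, but it isolates each failure to one ctid instead of aborting a whole `SELECT *`.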

We were called in to help the people running one of our servers.
I found in the log that the server had memory problems:
"NMI received on CPU"
"Memory #08 Uncorrectable ECC, DIMM" errors

I also found glibc error messages in pg_log:

ldsdata/pg_log

postgresql-Thu.log:*** glibc detected *** malloc(): memory corruption: 0x083c4068 ***

*** glibc detected *** malloc(): memory corruption: 0x083c4068 ***
<2009-06-11 01:40:35 PDT >LOG: server process (PID 8556) was terminated by signal 11
<2009-06-11 01:40:35 PDT >LOG: terminating any other active server processes
<2009-06-11 01:40:35 PDT 0 idle>WARNING: terminating connection because of crash of another server process


The server was replaced.
The RAID box connected to the server was not changed, so the PostgreSQL DB files
remained unchanged.
Next we restarted the whole system and ran sanity checks; PostgreSQL seemed to be OK.

About a dozen hours later we were called again.
I found a series of errors in pg_log:

postgresql-Fri.log:*** glibc detected *** double free or corruption (!prev): 0x08385af8 ***
postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x083cc2f8 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce8e8 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce900 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce918 ***
postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x08385b70 ***

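Since the question asks for some kind of "alarm", here is a minimal sketch (not an official PostgreSQL tool) that scans log lines for the two kinds of messages quoted above, glibc heap-corruption reports and backends terminated by a signal, and counts them by kind. The function name and regular expressions are assumptions derived only from the excerpts in this post; the exact wording may differ between glibc and PostgreSQL versions.

```python
import re
from collections import Counter
from typing import Iterable

# Patterns based on the pg_log excerpts above (assumed formats).
GLIBC_RE = re.compile(r"\*\*\* glibc detected \*\*\* (.+?): 0x[0-9a-f]+ \*\*\*")
SIGNAL_RE = re.compile(r"server process \(PID \d+\) was terminated by signal (\d+)")


def summarize_crashes(lines: Iterable[str]) -> Counter:
    """Count glibc heap-corruption reports and backend crashes by kind."""
    counts: Counter = Counter()
    for line in lines:
        m = GLIBC_RE.search(line)
        if m:
            counts["glibc: " + m.group(1)] += 1
            continue
        m = SIGNAL_RE.search(line)
        if m:
            counts["signal " + m.group(1)] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x083cc2f8 ***",
        "postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce8e8 ***",
        "<2009-06-11 01:40:35 PDT >LOG: server process (PID 8556) was terminated by signal 11",
    ]
    for kind, n in summarize_crashes(sample).items():
        print(kind, n)
```

Run periodically (for example from cron) over each file in pg_log, opened with `open(path, errors="replace")`, a non-empty result could trigger a notification. This detects symptoms only; it cannot tell which records are damaged.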

=============

I also found the following processes running; the last two, both backends, were each consuming over 95% CPU:

0 S postgres 15065 1 0 76 0 - 4089 schedu Jun12 ? 00:00:09 /usr/bin/postmaster -D /a
1 S postgres 15067 15065 0 75 0 - 1497 schedu Jun12 ? 00:00:00 postgres: logger process
1 S postgres 15075 15065 0 75 0 - 4122 schedu Jun12 ? 00:00:01 postgres: writer process
1 S postgres 15076 15065 0 76 0 - 1505 schedu Jun12 ? 00:00:00 postgres: archiver proces
1 S postgres 15077 15065 0 76 0 - 1756 schedu Jun12 ? 00:00:01 postgres: stats buffer pr
1 S postgres 15078 15077 0 75 0 - 1565 schedu Jun12 ? 00:00:00 postgres: stats collector
1 R postgres 8269 15065 98 85 0 - 4500 - Jun12 ? 1-02:21:06 postgres: postgres ldsd
1 R postgres 15745 15065 96 85 0 - 4500 - Jun13 ? 02:44:52 postgres: postgres ldsdb
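
The listing appears to be `ps -elf`-style output (columns F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD, where C is processor utilization). Assuming that layout, a small filter like the one below could flag processes stuck near 100% CPU. The function name `busy_processes` and the 90% threshold are illustrative choices, not part of any existing tool.

```python
from typing import List, Tuple


def busy_processes(ps_lines: List[str], threshold: int = 90) -> List[Tuple[int, int, str]]:
    """Return (pid, cpu, command) for rows whose C column is >= threshold.

    Assumes `ps -elf` columns:
    F S UID PID PPID C PRI NI ADDR SZ WCHAN STIME TTY TIME CMD
    """
    result = []
    for line in ps_lines:
        fields = line.split()
        # Skip the header row (C column is not numeric) and short/garbled rows.
        if len(fields) < 15 or not fields[5].isdigit():
            continue
        pid, cpu = int(fields[3]), int(fields[5])
        if cpu >= threshold:
            result.append((pid, cpu, " ".join(fields[14:])))
    return result


if __name__ == "__main__":
    rows = [
        "1 S postgres 15075 15065 0 75 0 - 4122 schedu Jun12 ? 00:00:01 postgres: writer process",
        "1 R postgres 8269 15065 98 85 0 - 4500 - Jun12 ? 1-02:21:06 postgres: postgres ldsd",
    ]
    print(busy_processes(rows))
```

Matching PIDs could then be looked up in `pg_stat_activity` to see which query each backend was running.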

Fortunately, we restored the PostgreSQL DB from a backup/shadow server.
But I wonder: is there any way to recover a damaged DB?

============
We use PostgreSQL 8.1.3:

ldsdb=# SELECT version();
version
-------------------------------------------------------------------------------------
PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

Thanks,
Krzysztof