Hello,

Is there any way to check and/or recover a PostgreSQL database after hardware faults?
Or at least to report somehow what exactly is damaged?
I mean, how can I find the corrupted record(s)?

We were called in to help with one of our servers.
I found in the log that the server had memory problems
("NMI received on CPU" and
"Memory #08 Uncorrectable ECC, DIMM" errors).

I also found error messages from glibc in pg_log:

ldsdata/pg_log

postgresql-Thu.log:*** glibc detected *** malloc(): memory corruption: 0x083c4068 ***

*** glibc detected *** malloc(): memory corruption: 0x083c4068 ***
<2009-06-11 01:40:35 PDT >LOG: server process (PID 8556) was terminated by signal 11
<2009-06-11 01:40:35 PDT >LOG: terminating any other active server processes
<2009-06-11 01:40:35 PDT 0 idle>WARNING: terminating connection because of crash of another server process


The server was replaced.
The RAID box connected to the server was not changed, so the PostgreSQL DB files
remained unchanged.
We then restarted the whole system and ran sanity checks; PostgreSQL seemed to be OK.

About a dozen hours later we were called again.
I found a series of errors in pg_log:

postgresql-Fri.log:*** glibc detected *** double free or corruption (!prev): 0x08385af8 ***
postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x083cc2f8 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce8e8 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce900 ***
postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce918 ***
postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x08385b70 ***
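
To see at a glance which kinds of glibc errors appear and how often, lines like the ones above could be summarised with a short script. This is only a rough sketch in Python: the function name and the regular expression are my own assumptions, and it only understands the "*** glibc detected ***" format shown in this message.

```python
import re
from collections import Counter

# Assumed pattern for lines such as:
#   postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce8e8 ***
# The error kind is everything between the marker and the trailing hex address.
GLIBC_RE = re.compile(
    r"\*\*\* glibc detected \*\*\* (?P<kind>.+?): 0x[0-9a-fA-F]+ \*\*\*"
)


def summarize_glibc_errors(lines):
    """Count glibc heap-corruption reports by error kind (hypothetical helper)."""
    counts = Counter()
    for line in lines:
        match = GLIBC_RE.search(line)
        if match:
            counts[match.group("kind")] += 1
    return counts


if __name__ == "__main__":
    # The six lines quoted above, copied verbatim from the grep output.
    sample = [
        "postgresql-Fri.log:*** glibc detected *** double free or corruption (!prev): 0x08385af8 ***",
        "postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x083cc2f8 ***",
        "postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce8e8 ***",
        "postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce900 ***",
        "postgresql-Sat.log:*** glibc detected *** free(): invalid pointer: 0x083ce918 ***",
        "postgresql-Sat.log:*** glibc detected *** double free or corruption (!prev): 0x08385b70 ***",
    ]
    for kind, count in summarize_glibc_errors(sample).items():
        print(count, kind)
```

For the six lines above this counts three "double free or corruption (!prev)" reports and three "free(): invalid pointer" reports. It only classifies the log messages; it does not tell which table or record is damaged.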


=============

I also found the following processes running, two of which were each consuming more than 95% CPU:

0 S postgres 15065 1 0 76 0 - 4089 schedu Jun12 ? 00:00:09 /usr/bin/postmaster -D /a
1 S postgres 15067 15065 0 75 0 - 1497 schedu Jun12 ? 00:00:00 postgres: logger process
1 S postgres 15075 15065 0 75 0 - 4122 schedu Jun12 ? 00:00:01 postgres: writer process
1 S postgres 15076 15065 0 76 0 - 1505 schedu Jun12 ? 00:00:00 postgres: archiver proces
1 S postgres 15077 15065 0 76 0 - 1756 schedu Jun12 ? 00:00:01 postgres: stats buffer pr
1 S postgres 15078 15077 0 75 0 - 1565 schedu Jun12 ? 00:00:00 postgres: stats collector
1 R postgres 8269 15065 98 85 0 - 4500 - Jun12 ? 1-02:21:06 postgres: postgres ldsd
1 R postgres 15745 15065 96 85 0 - 4500 - Jun13 ? 02:44:52 postgres: postgres ldsdb

Fortunately we recovered the PostgreSQL DB from the backup/shadow server.
But I wonder whether there is any way to recover a damaged DB?

============
We use PostgreSQL 8.1.3:

ldsdb=# SELECT version();
version
-------------------------------------------------------------------------------------
PostgreSQL 8.1.3 on i686-pc-linux-gnu, compiled by GCC gcc (GCC) 3.3.3 (SuSE Linux)
(1 row)

Thanks,
Krzysztof