Bug 479931 - occasional libdb: DB_ENV->log_flush and Database environment corrupt and PANIC: DB_RUNRECOVERY
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: 389
Classification: Retired
Component: Directory Server
Version: 1.1.3
Hardware: i386
OS: Linux
Priority: low
Severity: urgent
Target Milestone: ---
Assignee: Rich Megginson
QA Contact: Chandrasekar Kannan
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2009-01-14 06:30 UTC by nimitgarg
Modified: 2015-01-04 23:35 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
rhel5.1 32 bit and FDS 1.1.3
Clone Of:
Environment:
Last Closed: 2012-01-09 21:18:27 UTC
Embargoed:



Description nimitgarg 2009-01-14 06:30:17 UTC
Description of problem:

On my production machine, FDS is running in multimaster replication mode. Sometimes the service gets stuck, and I get the following errors in the logs:


[14/Jan/2009:10:55:27 +051800] - libdb: DB_ENV->log_flush: LSN of 5436/8995196 past current end-of-log of 5436/625098
[14/Jan/2009:10:55:27 +051800] - libdb: Database environment corrupt; the wrong log files may have been removed or incompatible database files imported from another environment
[14/Jan/2009:10:55:27 +051800] - libdb: PANIC: DB_RUNRECOVERY: Fatal error, run database recovery
[14/Jan/2009:10:55:27 +051800] - libdb: mihRoot/guid.db4: unable to flush page: 212875
[14/Jan/2009:10:55:27 +051800] - Serious Error---Failed to trickle, err=-30977 (DB_RUNRECOVERY: Fatal error, run database recovery)
[14/Jan/2009:10:55:27 +051800] - libdb: PANIC: fatal region error detected; run recovery
[14/Jan/2009:10:55:27 +051800] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30977 (DB_RUNRECOVERY: Fatal error, run database recovery)
[14/Jan/2009:10:55:27 +051800] - libdb: PANIC: fatal region error detected; run recovery
[14/Jan/2009:10:55:27 +051800] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.
[14/Jan/2009:10:55:27 +051800] - libdb: PANIC: fatal region error detected; run recovery
[14/Jan/2009:10:55:27 +051800] - FATAL ERROR at idl_new.c (1); server stopping as database recovery needed.
[14/Jan/2009:10:55:27 +051800] - libdb: PANIC: fatal region error detected; run recovery
[14/Jan/2009:10:55:27 +051800] - Serious Error---Failed in deadlock detect (aborted at 0x0), err=-30977 (DB_RUNRECOVERY: Fatal error, run database recovery)
        Fedora-Directory/1.1.3 B2008.269.157
        NMLUDB01.edc.mihi.com:8888 (/etc/dirsrv/slapd-NMLUDB01)

[14/Jan/2009:11:01:44 +051800] - Fedora-Directory/1.1.3 B2008.269.157 starting up
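
For anyone reading the first libdb message above: Berkeley DB writes a log sequence number (LSN) as file/offset, so the message says that a flush was requested up to an offset well beyond the end of log file 5436. The following is only a minimal sketch of pulling those two LSNs out of the line and comparing them. The names LSN_RE and parse_lsn_error are hypothetical helpers written for illustration; they are not part of Fedora DS or Berkeley DB.

```python
import re

# Matches the libdb log_flush message format seen in the error log above.
LSN_RE = re.compile(
    r"LSN of (\d+)/(\d+) past current end-of-log of (\d+)/(\d+)"
)


def parse_lsn_error(line):
    """Return ((file, offset), (file, offset)) for the requested LSN and
    the current end-of-log, or None if the line is not this message."""
    m = LSN_RE.search(line)
    if m is None:
        return None
    req_file, req_off, end_file, end_off = map(int, m.groups())
    return (req_file, req_off), (end_file, end_off)


line = ("[14/Jan/2009:10:55:27 +051800] - libdb: DB_ENV->log_flush: "
        "LSN of 5436/8995196 past current end-of-log of 5436/625098")

requested, end_of_log = parse_lsn_error(line)

# Tuples compare the file number first and then the offset,
# which mirrors how LSNs are ordered.
is_past_end = requested > end_of_log

gap = None
if requested[0] == end_of_log[0]:
    # Both LSNs are in the same log file, so the offsets are comparable.
    gap = requested[1] - end_of_log[1]
    print(f"log file {requested[0]}: requested offset is "
          f"{gap} bytes past end-of-log")
```

The sketch only restates what the message already reports; it does not say why the database pages reference an LSN that the log files on disk do not contain.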






Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1.N/A
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Rich Megginson 2009-01-14 15:06:40 UTC
Any odd errors in /var/log/messages or any other disk or hardware errors?
What is in the access log at or before [14/Jan/2009:10:55:27 +051800]?  What was the server doing at that time?  What is in the error log at or before that time?

How many records do you have in your database?  What is the approximate search and update rate?

Comment 2 Rich Megginson 2009-02-20 15:16:18 UTC
When the server starts up, does it successfully recover the database?

Comment 3 nimitgarg 2009-02-21 03:48:38 UTC
Hi Rich

Thanks for your response. We did not see any odd messages in /var/log/messages, and there were no hardware or disk errors on the server. At that time nothing new was happening on the server; it was handling its usual activity.

Our DB holds around 4 million records. The approximate search rate is 500 searches per minute, with at most 10 to 15 updates per minute.


We have received the same error a number of times, and nothing odd gets logged in /var/log/messages.

Please let me know if you need any further info.

Comment 4 Rich Megginson 2009-04-09 14:41:13 UTC
Please try the newly released Fedora DS 1.2.0 to see if it helps address this problem.

Comment 5 Rich Megginson 2009-08-25 18:57:24 UTC
Please try to reproduce with 1.2.0 or later.

Comment 6 Martin Kosek 2012-01-04 13:42:47 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/106

