Bug 122356 - slapd crashes upon update with idl_*_key: errors
Status: CLOSED WONTFIX
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: openldap
Version: 3.0
Hardware: i686 Linux
Priority: medium
Severity: high
Assigned To: Jan Safranek
QA Contact: Jay Turner
Depends On:
Blocks: 122950
Reported: 2004-05-03 13:30 EDT by Rich Graves
Modified: 2015-01-07 19:07 EST (History)

Doc Type: Bug Fix
Last Closed: 2007-10-19 15:26:50 EDT

Attachments: None

Description Rich Graves 2004-05-03 13:30:07 EDT
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.4.2)
Gecko/20040308

Description of problem:
We have a master and three slave OpenLDAP servers running
openldap-2.0.27-2.7.3 + local patches (below) on Red Hat Linux 7.3, and
one slave server running stock openldap-servers-2.0.27-11 on Red Hat
Enterprise Linux AS 3.

The RedHat 7.3 master and all slaves are fine, and have uptimes of
several months. The RHEL 3 slave accumulates errors and slapd crashes
after between 2 and 10 days of uptime.

Version-Release number of selected component (if applicable):
openldap-servers-2.0.27-11

How reproducible:
Sometimes (consistently dies after a varying number of update
operations, nothing special/common about the updates is apparent)

Steps to Reproduce:
1. On the master: service ldap stop; slapcat > backup.ldif; service
ldap start
2. On the RHEL 3 slave: slapadd -l top.ldif (top.ldif contains our ou
structure, so that out-of-order entries aren't refused); slapadd -c -l
backup.ldif (the -c argument continues past the "duplicate entry"
errors for our ou structure); service ldap start
3. Operate LDAP normally. A typical day for us is 50,000 queries to
the slave, 100 MOD operations, 5 ADD operations, 5 DEL operations, and
5 MODRDN operations.
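
For a sense of scale, the daily workload in step 3 can be turned into a rough count of write operations before a crash. This is only back-of-the-envelope arithmetic on the figures quoted in this report, assuming the daily mix is constant and using the 2 to 10 day uptime range from the description; the variable and function names are illustrative.

```python
# Typical daily operation counts on the slave, as quoted in step 3.
DAILY_OPS = {"search": 50000, "modify": 100, "add": 5, "delete": 5, "modrdn": 5}

# Everything except searches changes the database (and therefore its indexes).
writes_per_day = sum(n for op, n in DAILY_OPS.items() if op != "search")


def writes_before_crash(days_up):
    """Approximate number of update operations applied in `days_up` days."""
    return writes_per_day * days_up


# Uptime range reported in the description: between 2 and 10 days.
low = writes_before_crash(2)
high = writes_before_crash(10)
print(f"{writes_per_day} writes/day, roughly {low} to {high} writes before a crash")
```

On these assumptions the slave fails after a few hundred to roughly a thousand updates, which suggests the trigger is tied to write activity rather than to query volume.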

Actual Results:  After some varying period of successful operation, we
get errors like those below and slapd dies. This particular series
occurred during a batch update (serial, not parallel) of about 75 LDAP
entries at 4am, but we've also had it happen with single-entry updates
during the day.

Apr 29 04:11:50 rosa slapd[1673]: idl_insert_key: id 7161 already in next block
Apr 29 04:12:17 rosa slapd[1673]: idl_insert_key: id 4874 already in next block
Apr 29 04:12:17 rosa slapd[1673]: idl_insert_key: id 7157 already in next block
Apr 29 04:12:33 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:12:34 rosa slapd[1673]: idl_delete_key: idl_fetch of returned NULL
Apr 29 04:12:36 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:12:39 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:12:48 rosa slapd[1673]: idl_insert_key: id 7155 already in next block
Apr 29 04:12:49 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:12:51 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:13:09 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:13:27 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:13:39 rosa slapd[1673]: idl_delete_key: idl_fetch of returned NULL
Apr 29 04:13:45 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:13:46 rosa slapd[1673]: idl_insert_key: nonexistent continuation block
Apr 29 04:14:18 rosa slapd[1673]: idl_insert_key: id 7154 already in next block
Apr 29 04:14:30 rosa slapd[1673]: idl_delete_key: idl_fetch of returned NULL
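
To see which of these messages dominates, the log excerpt can be tallied by message type. The following is a minimal sketch, assuming the syslog prefix format shown in this report and that hard-wrapped continuation lines (as pasted here) should be merged back onto the entry they belong to. The function names `rejoin` and `tally` are illustrative and not part of OpenLDAP.

```python
import re
from collections import Counter

# A line that starts a new syslog entry, e.g. "Apr 29 04:11:50 rosa ...".
STAMP = re.compile(r"^\w{3}\s+\d+\s[\d:]{8}\s")

# Full entry as it appears in this report:
# "<stamp> <host> slapd[<pid>]: idl_<op>_key: <message>"
ENTRY = re.compile(
    r"^\w{3}\s+\d+\s[\d:]{8}\s\S+\sslapd\[\d+\]:\s"
    r"(?P<func>idl_\w+_key):\s(?P<msg>.*)$"
)


def rejoin(lines):
    """Merge hard-wrapped continuation lines onto the preceding entry."""
    merged = []
    for line in lines:
        line = line.strip()
        if not line:
            continue
        if STAMP.match(line) or not merged:
            merged.append(line)
        else:
            merged[-1] += " " + line
    return merged


def tally(lines):
    """Count idl_*_key messages, ignoring the specific entry ID."""
    counts = Counter()
    for entry in rejoin(lines):
        m = ENTRY.match(entry)
        if m is None:
            continue
        # Normalise "id 7161" to "id N" so identical messages group together.
        msg = re.sub(r"\bid \d+", "id N", m.group("msg"))
        counts[(m.group("func"), msg)] += 1
    return counts


SAMPLE = """\
Apr 29 04:11:50 rosa slapd[1673]: idl_insert_key: id 7161 already in
next block
Apr 29 04:12:17 rosa slapd[1673]: idl_insert_key: id 4874 already in
next block
Apr 29 04:12:33 rosa slapd[1673]: idl_insert_key: nonexistent
continuation block
Apr 29 04:12:34 rosa slapd[1673]: idl_delete_key: idl_fetch of
returned NULL
""".splitlines()

for (func, msg), n in tally(SAMPLE).most_common():
    print(n, func, msg)
```

Feeding the relevant part of /var/log/messages (or wherever slapd logs on a given system) to `tally` would show whether "already in next block" or "nonexistent continuation block" appears first and how quickly the errors accumulate.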


Expected Results:  Changes should replicate cleanly, and the server
shouldn't crash.

Additional info:

The problem occurred both with the originally installed kernel and after
rebooting with the errata kernel kernel-smp-2.4.21-9.0.3.EL (the kernel
errata mentions a threading issue that looked suspicious).

I notice that our working master and slave slapd's running RedHat 7.3
are using a *newer* version of Berkeley DB. We switched to Berkeley DB
after having reliability problems with gdbm very similar to the
reliability problems we are now having with Berkeley DB on RHEL3.

RHL7.3# file /var/lib/ldap/id2entry.dbb
/var/lib/ldap/id2entry.dbb: Berkeley DB (Btree, version 9, native byte-order)

RHEL3# file /var/lib/ldap/id2entry.dbb
/var/lib/ldap/id2entry.dbb: Berkeley DB (Btree, version 8, native byte-order)
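
The comparison above can be automated when checking several hosts. The sketch below parses the `file` output quoted in this report and extracts the access method and version number. Note, as a caveat rather than a confirmed diagnosis, that the number `file` prints is (as far as I understand) the Btree on-disk format version, so it does not by itself identify the Berkeley DB library release. The function name `parse_file_output` is illustrative.

```python
import re

# Shape of the `file` output quoted in this report:
# "<path>: Berkeley DB (<method>, version <n>, <byte order>)"
FILE_OUTPUT = re.compile(
    r"^(?P<path>\S+):\s+Berkeley DB \((?P<method>\w+), "
    r"version (?P<version>\d+), (?P<order>[^)]*)\)$"
)


def parse_file_output(text):
    """Return (path, access method, format version), or None if unrecognised."""
    # The pasted output may be hard-wrapped, so collapse all whitespace first.
    flat = " ".join(text.split())
    m = FILE_OUTPUT.match(flat)
    if m is None:
        return None
    return m.group("path"), m.group("method"), int(m.group("version"))


rhl73 = parse_file_output(
    "/var/lib/ldap/id2entry.dbb: Berkeley DB (Btree, version 9, native\n"
    "byte-order)"
)
rhel3 = parse_file_output(
    "/var/lib/ldap/id2entry.dbb: Berkeley DB (Btree, version 8, native\n"
    "byte-order)"
)

if rhl73 and rhel3 and rhl73[2] != rhel3[2]:
    print("format versions differ:", rhl73[2], "vs", rhel3[2])
```

A mismatch like this is a hint that the two hosts link slapd against different Berkeley DB builds, which is worth confirming from the installed packages before drawing conclusions.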

I noticed two entries in the changelog that look a little suspicious:

* Tue Jun 17 2003 Nalin Dahyabhai <nalin@redhat.com> 2.0.27-9

- don't use the system libtool

* Mon Feb 10 2003 Nalin Dahyabhai <nalin@redhat.com> 2.0.27-8

- back down to db 4.0.x, which 2.0.x can compile with in ldbm-over-db
setups
- tweak SuSE patch to fix a few copy-paste errors and a NULL dereference
Comment 1 Rich Graves 2004-05-03 23:55:27 EDT
The changelog entry "tweak SuSE patch to fix a few copy-paste errors
and a NULL dereference" is either inaccurate or since reverted -- I
see no difference between the patch/spec for 2.0.27-3 and 2.0.27-11.

Can you explain the reasoning for reverting from Berkeley DB 4.0.24 to
4.0.14? 4.1.25.NC seems more reliable for us.
Comment 2 Rich Graves 2004-05-04 17:07:02 EDT
OK, we failed under Berkeley DB 4.1.25.NC as well. I guess I'll open a
support case. I was just hoping another user would be more responsive
than Red Hat support, as is usually the case.

May  4 16:18:24 rosa slapd[29324]: idl_insert_key: id 9687 already in next block
May  4 16:34:03 rosa slapd[29324]: idl_insert_key: id 3395 already in next block
May  4 16:34:03 rosa slapd[29324]: idl_insert_key: id 3996 already in next block
May  4 16:34:03 rosa slapd[29324]: idl_insert_key: id 3996 already in next block
May  4 16:34:03 rosa slapd[29324]: idl_insert_key: id 3998 already in next block
May  4 16:34:03 rosa slapd[29324]: idl_insert_key: id 3998 already in next block
May  4 16:35:09 rosa slapd[29324]: idl_insert_key: id 3994 already in next block
May  4 16:35:09 rosa slapd[29324]: idl_insert_key: id 3994 already in next block
May  4 16:35:09 rosa slapd[29324]: idl_insert_key: id 3996 already in next block
May  4 16:35:09 rosa slapd[29324]: idl_insert_key: id 3996 already in next block
May  4 16:35:27 rosa slapd[29324]: idl_delete_key: idl_fetch of returned NULL
May  4 16:51:50 rosa slapd[29324]: idl_delete_key: idl_fetch of returned NULL
May  4 16:51:52 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:51:53 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:51:57 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:51:58 rosa slapd[29324]: idl_insert_key: id 7121 already in next block
May  4 16:51:59 rosa slapd[29324]: idl_delete_key: idl_fetch of returned NULL
May  4 16:52:04 rosa slapd[29324]: idl_delete_key: idl_fetch of returned NULL
May  4 16:52:07 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:07 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:08 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:10 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:16 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:18 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:18 rosa slapd[29324]: idl_insert_key: nonexistent continuation block
May  4 16:52:20 rosa slapd[29324]: idl_insert_key: id 7101 already in next block
May  4 16:52:22 rosa slapd[29324]: idl_insert_key: id 7095 already in next block
Comment 3 Rich Graves 2004-05-12 14:40:46 EDT
After discussions on the openldap-software list, I believe I will be
compiling and using my own RPMs for openldap 2.1.30. RedHat seems
unable or unwilling to provide a working package at this time. (I do
sympathize; it's a disruptive change that would make customers unhappy.)

The latest specfile for Fedora appears to be correct, but unless krb5
and cyrus-sasl have been hacked to make them thread-safe, the server
will crash at random.

Since we are not (currently) planning to use kerberos auth, I am
simply omitting both krb5 and cyrus-sasl from my build. My specfile is
http://people.brandeis.edu/~rcgraves/openldap.spec

No additional sources or patches are needed.
Comment 4 vek 2005-04-14 10:16:38 EDT
(In reply to comment #3)
> After discussions on the openldap-software list, I believe I will be
> compiling and using my own RPMs for openldap 2.1.30. RedHat seems
> unable or unwilling to provide a working package at this time. (I do
> sympathize; it's a disruptive change that would make customers unhappy.)
> 


Good move!  We have been doing that ourselves for several years.

A fix for this bug was posted as OpenLDAP ITS#2348 two years ago,
but it was too late to make it into the last release of the 2.0.x
series. Applying this fix to 2.0.27 has improved the reliability of
OpenLDAP quite considerably. Without it, the database would always
become corrupted once the number of entries went above a certain limit.


Villy

Comment 5 Julien A 2005-09-08 05:58:06 EDT
As of today (September 8, 2005), is there a patched version of 2.0.27
available as an RPM from Red Hat?

Or a newer version (2.2.x)?

Thanks.

Ju
Comment 6 RHEL Product and Program Management 2007-10-19 15:26:50 EDT
This bug is filed against RHEL 3, which is in maintenance phase.
During the maintenance phase, only security errata and select mission
critical bug fixes will be released for enterprise products. Since
this bug does not meet those criteria, it is now being closed.
 
For more information on the RHEL errata support policy, please visit:
http://www.redhat.com/security/updates/errata/
 
If you feel this bug is indeed mission critical, please contact your
support representative. You may be asked to provide detailed
information on how this bug is affecting you.
