Bug 901507

Summary: Directory server crashes during suffix removal
Product: Red Hat Enterprise Linux 7
Component: 389-ds-base
Version: 7.0
Status: CLOSED CURRENTRELEASE
Severity: unspecified
Priority: medium
Reporter: Ján Rusnačko <jrusnack>
Assignee: mreynolds
QA Contact: IDM QE LIST <seceng-idm-qe-list>
CC: jgalipea, mkubik, mreynolds, nhosoi, nkinder
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Fixed In Version: 389-ds-base-1.3.0.3
Doc Type: Bug Fix
Doc Text:
Cause: Deleting a suffix from the configuration.
Consequence: The server can potentially crash.
Fix: Corrected the internal ordering of internal callback functions.
Result: The server does not crash when deleting a suffix.
Last Closed: 2014-06-13 12:45:58 UTC
Type: Bug
Attachments (Description; Flags):
Error logfile; none
Requested gdb stacktrace; none
Requested gdb stacktrace with debuginfo; none
ldif of suffixes to add; none
ldif of suffix to delete; none

Description Ján Rusnačko 2013-01-18 10:51:22 UTC
Created attachment 682317
Error logfile

Description of problem:
Directory server crashes during suffix removal in managedEntry testsuite. 

Version-Release number of selected component (if applicable):
RHEL 6.4 i686
389-ds-base-1.2.11.15-10.el6.i686

How reproducible:
Consistently, only on i686

Steps to Reproduce:
1. Run the managedEntry testsuite; the crash occurs in mentry06.
  
Logfile shows:

[17/Jan/2013:11:31:45 -0500] - slapd started.  Listening on All
Interfaces port 22222 for LDAP requests
[17/Jan/2013:11:31:58 -0500] - ldbm: Bringing TestMEntry offline...
[17/Jan/2013:11:31:58 -0500] - ldbm: removing 'TestMEntry'.
[17/Jan/2013:11:31:58 -0500] - Destructor for instance TestMEntry called 
[17/Jan/2013:11:32:00 -0500] - 389-Directory/1.2.11.15 B2013.08.1850
starting up
[17/Jan/2013:11:32:00 -0500] - Detected Disorderly Shutdown last time
Directory Server was running, recovering database.
[17/Jan/2013:11:32:02 -0500] - Warning: Mapping tree node entry for
dc=testentry,dc=com point to an unknown backend : TestMEntry
[17/Jan/2013:11:32:02 -0500] - Warning: Mapping tree node entry for
dc=testentry,dc=com point to an unknown backend : TestMEntry
[17/Jan/2013:11:32:02 -0500] - Warning: Mapping tree node entry for
dc=testentry,dc=com point to an unknown backend : TestMEntry
[17/Jan/2013:11:32:02 -0500] - Warning: Mapping tree node entry for
dc=testentry,dc=com point to an unknown backend : TestMEntry
[17/Jan/2013:11:32:02 -0500] - Warning: Mapping tree node entry for
dc=testentry,dc=com point to an unknown backend : TestMEntry
[17/Jan/2013:11:32:02 -0500] - slapd started.  Listening on All
Interfaces port 22222 for LDAP requests 
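
The crash signature in an excerpt like this is a start-up banner followed by "Detected Disorderly Shutdown" with no "slapd stopped." message in between, so the last entry before the banner is the last thing the crashed process logged. A minimal Python sketch of that check follows. The function names are hypothetical, and the timestamp pattern is an assumption based only on the entries shown in this report. Continuation lines without a timestamp are joined to the preceding entry.

```python
import re

# Timestamp prefix as it appears in this report, e.g.
# "[17/Jan/2013:11:31:58 -0500] - " (assumed format, taken from the excerpt).
ENTRY_START = re.compile(r"^\[\d{2}/\w{3}/\d{4}:\d{2}:\d{2}:\d{2} [+-]\d{4}\] - ")


def join_wrapped(lines):
    """Join continuation lines (no timestamp) onto the preceding entry."""
    entries = []
    for line in lines:
        line = line.rstrip()
        if not line:
            continue
        if ENTRY_START.match(line) or not entries:
            entries.append(line)
        else:
            entries[-1] += " " + line
    return entries


def last_entry_before_crash(lines):
    """Return the last entry logged before a restart that reports a
    disorderly shutdown, or None if the log shows no such restart."""
    entries = join_wrapped(lines)
    for i, entry in enumerate(entries):
        if "Detected Disorderly Shutdown" not in entry:
            continue
        # Walk back past the start-up banner of the recovering process.
        j = i - 1
        while j >= 0 and "starting up" in entries[j]:
            j -= 1
        return entries[j] if j >= 0 else None
    return None
```

Applied to the entries above, this points at the "Destructor for instance TestMEntry called" entry, which matches the report that the crash happens during suffix removal.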

Additional info:
Core dump available at http://file.rdu.redhat.com/~jrusnack/core-ns-slapd-11-500-500-15345-1358502139

Comment 2 mreynolds 2013-01-21 16:28:20 UTC
I cannot read the core file.

Can you start the managedEntry test suite and attach gdb to the server process?

Then, when it crashes, run "where" followed by "thread apply all bt", and attach all of that output to the bug.

Thanks,
Mark
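
The gdb session requested above can also be run non-interactively. The following is a sketch only: it assumes gdb is installed and that the caller already knows the server's process ID; the helper names and the output path are hypothetical. It uses gdb's -p, -batch and -ex options to attach, let the process run until it stops on a signal, and then print the two backtraces.

```python
import subprocess


def gdb_argv(pid):
    """Build a gdb command line that attaches to `pid`, resumes it until it
    stops (for example on SIGSEGV), then prints the requested backtraces."""
    return [
        "gdb", "-p", str(pid), "-batch",
        "-ex", "continue",             # resume the attached process until it stops
        "-ex", "where",                # backtrace of the thread that stopped
        "-ex", "thread apply all bt",  # backtraces of every thread
    ]


def capture_backtrace(pid, out_path):
    """Run gdb and write everything it prints to out_path (hypothetical helper)."""
    with open(out_path, "w") as out:
        result = subprocess.run(gdb_argv(pid), stdout=out,
                                stderr=subprocess.STDOUT)
    return result.returncode
```

Start it before running the test suite, for example capture_backtrace(pid, "gdb-output.txt"), and attach the resulting file. Note that "continue" returns on any signal that stops the process, not only on a crash.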

Comment 4 Ján Rusnačko 2013-01-21 19:31:32 UTC
Created attachment 684594
Requested gdb stacktrace

Attaching requested gdb output.

Comment 5 mreynolds 2013-01-21 19:40:33 UTC
Thanks Jan, that was exactly what I wanted you to do.  However, we don't have the debuginfo rpm installed, so there is not too much to see in the output.

Can you install the 389-ds-base-debuginfo, and the 389-ds-base-devel packages?  Then can you please rerun the test, and gather the same output again?

Thanks!
Mark

Comment 6 Ján Rusnačko 2013-01-21 19:57:21 UTC
Created attachment 684604
Requested gdb stacktrace with debuginfo

Comment 7 mreynolds 2013-01-21 20:06:51 UTC
Thanks Jan,

Any chance I can access this system?  I need to look into the core file myself.  If you could give me the system info and tell me what I need to run to trigger the crash, that would be great.

Thanks again,
Mark

Comment 9 mreynolds 2013-01-22 21:27:48 UTC
Made some progress...

From TET..

If I only run this custom testcase:

AddSuffixes() --> from managed entry test suite
AddSuffix "dc=test,dc=com"
sleep 5
DelSuffix "dc=test,dc=com"

Everything is fine, no crash.


I change it to restart the server before deleting the last suffix:

AddSuffixes()
AddSuffix "dc=test,dc=com"
RestartSlapd
DelSuffix "dc=test,dc=com"

We crash while deleting the suffix.  So the way we build our internal list of callbacks at startup appears to be flawed, but it requires a certain number of suffixes.  Not really sure what's going on yet.

Still investigating...

Comment 10 mreynolds 2013-01-24 04:08:08 UTC
Ok, so I discovered that all it takes to crash the server is to "not" have a "userRoot" backend, and then try to delete a different backend.  This is extremely bizarre, and does not make sense.  Yet I can reproduce it over and over.

Also, this only crashes on 32-bit architectures.  RHEL 6.x/Fedora 17 (64-bit) does not crash.

I cannot explain why it reproduces with such a strange condition of not having a particular database name.  While I cannot explain that, I have found a flaw in the code that would explain the crash (reading freed memory).  So I will work on that next.
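
For readers unfamiliar with this class of flaw, here is one common way a callback list ends up reading freed memory: a callback removes its own node while the list is being walked, and the walker then reads the node's next pointer. The Python sketch below only simulates that pattern. It is not the 389-ds-base code and does not claim to reproduce the actual patch; every name in it is hypothetical, and "freeing" is modeled by clearing the node's fields.

```python
class Node:
    """One registered callback in a singly linked list."""

    def __init__(self, name, func):
        self.name = name
        self.func = func
        self.next = None
        self.freed = False


class CallbackList:
    def __init__(self):
        self.head = None

    def add(self, name, func):
        """Append a callback to the end of the list."""
        node = Node(name, func)
        if self.head is None:
            self.head = node
        else:
            cur = self.head
            while cur.next is not None:
                cur = cur.next
            cur.next = node
        return node

    def remove(self, node):
        """Unlink node and clear it (stand-in for free() in C)."""
        if self.head is node:
            self.head = node.next
        else:
            cur = self.head
            while cur is not None and cur.next is not node:
                cur = cur.next
            if cur is not None:
                cur.next = node.next
        node.next = None
        node.func = None
        node.freed = True

    def run_unsafe(self):
        """Reads cur.next after the callback ran, so it touches a freed node
        whenever a callback removed itself."""
        called = []
        cur = self.head
        while cur is not None:
            cur.func(self, cur)
            called.append(cur.name)
            cur = cur.next  # freed node: in C this is a use-after-free
        return called

    def run_safe(self):
        """Saves everything it needs from cur before invoking the callback."""
        called = []
        cur = self.head
        while cur is not None:
            nxt, name = cur.next, cur.name
            cur.func(self, cur)
            called.append(name)
            cur = nxt
        return called


def build():
    """Three callbacks; the middle one removes itself when invoked."""
    lst = CallbackList()
    lst.add("a", lambda l, n: None)
    lst.add("b", lambda l, n: l.remove(n))
    lst.add("c", lambda l, n: None)
    return lst
```

In this simulation the unsafe walk silently stops after "b" and never reaches "c", while the safe walk visits all three. In C the unsafe read returns whatever happens to be in the freed block, so whether it crashes can depend on allocator behavior and platform. The safe variant only covers a callback removing its own node; a callback that removes the following node would need a different scheme.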

Comment 12 Nathan Kinder 2013-01-24 18:45:13 UTC
Upstream ticket:
https://fedorahosted.org/389/ticket/562

Comment 13 mreynolds 2013-01-24 21:38:47 UTC
Created attachment 687027
ldif of suffixes to add

Comment 14 mreynolds 2013-01-24 21:39:26 UTC
Created attachment 687028
ldif of suffix to delete

Comment 15 mreynolds 2013-01-24 21:42:52 UTC
To reproduce the issue (with the attached LDIF files):

[1]  Do a silent install of 389 on a 32-bit machine

      In the info file, under [slapd], set "ds_bename = exmaple".  This must be set, and it must be something other than "userRoot"

[2]  ldapmodify ... ... -f /add.ldif

[3]  restart the server

[4]  ldapmodify ... ... -f /del.ldif

[5]  Crash

Comment 16 mreynolds 2013-01-24 21:45:04 UTC
This fix has been committed upstream:

6c855a8ce0de3c6b34594856762e68503da433fc

Comment 17 Rich Megginson 2013-10-01 23:25:31 UTC
Moving all ON_QA bugs to MODIFIED in order to add them to the errata (bugs in the ON_QA state can't be added to an errata).  When the errata is created, the bugs should be automatically moved back to ON_QA.

Comment 19 Milan Kubík 2014-02-05 14:27:11 UTC
Verified as per instructions in comment 15.

# ldapmodify -x -D "cn=directory manager" -w Secret123 -f add.ldif >/dev/null 
# echo $?
0
# systemctl restart dirsrv@dstet
# ldapmodify -x -D "cn=directory manager" -w Secret123 -f del.ldif >/dev/null 
# echo $?
0
# tail /var/log/dirsrv/slapd-dstet/errors
[05/Feb/2014:09:18:47 -0500] - slapd shutting down - waiting for 29 threads to terminate
[05/Feb/2014:09:18:47 -0500] - slapd shutting down - closing down internal subsystems and plugins
[05/Feb/2014:09:18:47 -0500] - Waiting for 4 database threads to stop
[05/Feb/2014:09:18:47 -0500] - All database threads now stopped
[05/Feb/2014:09:18:47 -0500] - slapd stopped.
[05/Feb/2014:09:18:48 -0500] - 389-Directory/1.3.1.6 B2014.035.046 starting up
[05/Feb/2014:09:18:48 -0500] - slapd started.  Listening on All Interfaces port 389 for LDAP requests
[05/Feb/2014:09:19:06 -0500] - ldbm: Bringing TestMEntry offline...
[05/Feb/2014:09:19:06 -0500] - ldbm: removing 'TestMEntry'.
[05/Feb/2014:09:19:06 -0500] - Destructor for instance TestMEntry called

389-ds-base version:
389-ds-base-1.3.1.6-18.el7.i686
389-ds-base-libs-1.3.1.6-18.el7.i686
389-ds-base-libs-1.3.1.6-18.el7.x86_64

Comment 20 Ludek Smid 2014-06-13 12:45:58 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.