Bug 813964
Summary: | IPA dirsvr seg-fault during system longevity test | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | baiesi | ||||
Component: | 389-ds-base | Assignee: | Rich Megginson <rmeggins> | ||||
Status: | CLOSED ERRATA | QA Contact: | IDM QE LIST <seceng-idm-qe-list> | ||||
Severity: | unspecified | Docs Contact: | |||||
Priority: | unspecified | ||||||
Version: | 6.3 | CC: | jgalipea, mkosek, rmeggins, shaines | ||||
Target Milestone: | rc | ||||||
Target Release: | --- | ||||||
Hardware: | Unspecified | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | 389-ds-base-1.2.10.2-7.el6 | Doc Type: | Bug Fix | ||||
Doc Text: |
Cause: Performing delete and search operations against the directory server under a high load.
Consequence: Directory server crashes.
Fix: Entries may be deleted out from under a search request. DB_MULTIPLE does not like it when entries are removed out from under it. The server should handle this case by not returning deleted entries and not crashing.
Result: Server does not crash when performing searches and deletions while under a high load.
|
Story Points: | --- | ||||
Clone Of: | Environment: | ||||||
Last Closed: | 2012-06-20 07:15:26 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
Description
baiesi
2012-04-18 21:00:39 UTC
Can you provide more detail on the UI not being available? What error are you getting? Does Apache have any logging on the failure?

(In reply to comment #2)
> Can you provide more detail on the UI not being available? What error are you
> getting? Does Apache have any logging on the failure?

If this is the problem that Bruce was having earlier, we need to get a core dump and a stack trace.

Yes, this is a 389-ds-base problem, but the hard part will be reproducing it with only 389-ds-base and not IPA.

Created attachment 578485 (threads 27, 4 and 1)
Update: STI run 2 reproduced the issue, this time with a core file. I was able to re-provision the system test environment and reproduce the segfault within 24 hours. This time I enabled debugging, which worked and generated a core file, /var/log/dirsrv/slapd-TESTRELM-COM/core.16339. The file is too big to attach to this defect.

/var/log/messages:

    Apr 18 16:21:39 sti-high-1 logger: 2012-04-18 16:21:38 /usr/bin/rhts-test-runner.sh 1210569 105720 hearbeat...
    Apr 18 16:22:23 sti-high-1 kernel: ns-slapd[16381]: segfault at 7f9acbbd30cb ip 00007f99f82529bd sp 00007f99cbbd3000 error 4 in libback-ldbm.so[7f99f8222000+8f000]
    Apr 18 16:22:23 sti-high-1 named[20108]: LDAP error: Can't contact LDAP server
    Apr 18 16:22:23 sti-high-1 named[20108]: connection to the LDAP server was lost
    Apr 18 16:22:23 sti-high-1 httpd: GSSAPI Error: Unspecified GSS failure. Minor code may provide more information (KDC returned error string: PROCESS_TGS)
    Apr 18 16:22:23 sti-high-1 named[20108]: Failed to init credentials (Generic error (see e-text))
    Apr 18 16:22:24 sti-high-1 named[20108]: LDAP error: Can't contact LDAP server

I am able to reproduce the crash.
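The kernel segfault line in the /var/log/messages excerpt can be decoded mechanically. Subtracting the mapping base shown in brackets from the instruction pointer gives the faulting offset inside libback-ldbm.so, which can then be resolved against that library's debug symbols (for example with `addr2line -f -e <path to libback-ldbm.so> <offset>`, with matching debuginfo installed). This assumes the bracketed mapping starts at the library's load base, which is typical for a shared library's text mapping. On x86, `error 4` means a user-mode read of a page that is not mapped, which fits a dereference of a bad pointer. The parser below is a minimal sketch; the regular expression is written for this one message format and `decode_segfault` is a hypothetical helper name.

```python
import re

# Matches the kernel's "segfault at ADDR ip IP sp SP error N in LIB[BASE+SIZE]" message.
SEGFAULT_RE = re.compile(
    r"segfault at (?P<addr>[0-9a-f]+) ip (?P<ip>[0-9a-f]+) sp (?P<sp>[0-9a-f]+) "
    r"error (?P<err>\d+) in (?P<lib>\S+?)\[(?P<base>[0-9a-f]+)\+(?P<size>[0-9a-f]+)\]"
)


def decode_segfault(line):
    """Return the decoded fields of a kernel segfault line, or None if it does not match."""
    m = SEGFAULT_RE.search(line)
    if m is None:
        return None
    ip = int(m["ip"], 16)
    base = int(m["base"], 16)
    size = int(m["size"], 16)
    err = int(m["err"])
    return {
        "library": m["lib"],
        "fault_addr": int(m["addr"], 16),
        # Offset of the faulting instruction from the start of the mapping.
        "ip_offset": ip - base,
        "ip_in_mapping": base <= ip < base + size,
        # x86 page-fault error code bits: 1 = protection violation (else page not present),
        # 2 = write access (else read), 4 = fault raised in user mode.
        "protection_fault": bool(err & 1),
        "write": bool(err & 2),
        "user_mode": bool(err & 4),
    }
```

For the ns-slapd line above, the offset is 0x7f99f82529bd - 0x7f99f8222000 = 0x309bd inside libback-ldbm.so.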
Steps:

1) Set up 2-master replication.
2) On master 1, continuously add 1000 users and then delete them. The users should have many objectclasses:

    oclist = ["top", "person", "organizationalperson", "inetorgperson", "inetuser", "posixaccount", "uidObject", "pkiUser", "pkiCA", "deltaCRL", "userSecurityInformation", "simpleSecurityObject", "shadowAccount", "posixGroup", "inetSubscriber", "inetAdmin", "accountPolicy", "mailRecipient", "nsMessagingServerUser", "mailGroup", "groupOfMailEnhancedUniqueNames", "netscapeMailServer", "eduPerson", "mozillaAbPersonAlpha", "authorizedServiceObject", "hostObject", "calEntry", "printerServiceAuxClass", "printerIPP"]

   The required attributes are sn, cn, uid, uidNumber, gidNumber, homeDirectory and userPassword. I also added a description and a 1024-byte userCertificate for good measure. This is different from the IPA schema, but I believe the large number of objectclasses has something to do with the crash.
3) At the same time, run searches like this:

    filt='(&(objectclass=top)(objectclass=person)(objectclass=organizationalperson)(objectclass=inetorgperson)(objectclass=inetuser)(objectclass=posixaccount)(objectclass=uidObject)(objectclass=pkiUser)(objectclass=pkiCA)(objectclass=deltaCRL)(objectclass=userSecurityInformation)(objectclass=simpleSecurityObject)(objectclass=shadowAccount)(objectclass=posixGroup)(objectclass=inetSubscriber)(objectclass=inetAdmin)(objectclass=accountPolicy)(objectclass=mailRecipient)(objectclass=nsMessagingServerUser)(objectclass=mailGroup)(objectclass=groupOfMailEnhancedUniqueNames)(objectclass=netscapeMailServer)(objectclass=eduPerson)(objectclass=mozillaAbPersonAlpha)(objectclass=authorizedServiceObject)(objectclass=hostObject)(objectclass=calEntry)(objectclass=printerServiceAuxClass)(objectclass=printerIPP))'
    # Launch 11 concurrent searches, wait for all of them, and repeat forever.
    while true; do
        ii=10
        while [ $ii -ge 0 ]; do
            ldapsearch -xLLL -h localhost -p 1389 -D "cn=directory manager" -w password -b dc=example,dc=com "$filt" dn > /dev/null &
            ii=$((ii - 1))
        done
        wait
    done
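Steps 2 and 3 can be sketched in Python as below. This is a minimal sketch, assuming the users are written as LDIF and applied with a tool such as ldapmodify. The helper names (`search_filter`, `user_ldif`, `delete_ldif`, `cycle_ldif`), the `uid=testuserN` naming, the `ou=People` container and all attribute values are hypothetical choices, not taken from the original test. It shows that the step 3 filter is simply an AND over every objectclass in `oclist`, and what the add/delete records for the test users look like.

```python
import base64
import os

# Objectclasses from step 2; every test user carries all of them.
OCLIST = [
    "top", "person", "organizationalperson", "inetorgperson", "inetuser",
    "posixaccount", "uidObject", "pkiUser", "pkiCA", "deltaCRL",
    "userSecurityInformation", "simpleSecurityObject", "shadowAccount",
    "posixGroup", "inetSubscriber", "inetAdmin", "accountPolicy",
    "mailRecipient", "nsMessagingServerUser", "mailGroup",
    "groupOfMailEnhancedUniqueNames", "netscapeMailServer", "eduPerson",
    "mozillaAbPersonAlpha", "authorizedServiceObject", "hostObject",
    "calEntry", "printerServiceAuxClass", "printerIPP",
]
SUFFIX = "dc=example,dc=com"


def search_filter(oclist):
    """Step 3's filter: one (objectclass=...) term per class, ANDed together."""
    return "(&" + "".join(f"(objectclass={oc})" for oc in oclist) + ")"


def user_dn(n):
    # Hypothetical DN scheme: uid=testuserN under an assumed ou=People container.
    return f"uid=testuser{n},ou=People,{SUFFIX}"


def user_ldif(n):
    """LDIF add record with all objectclasses, the required attributes,
    a description and a 1024-byte userCertificate (random bytes, not a real cert)."""
    cert = base64.b64encode(os.urandom(1024)).decode("ascii")
    lines = [f"dn: {user_dn(n)}", "changetype: add"]
    lines += [f"objectClass: {oc}" for oc in OCLIST]
    lines += [
        f"cn: Test User {n}",
        f"sn: User{n}",
        f"uid: testuser{n}",
        f"uidNumber: {10000 + n}",
        f"gidNumber: {10000 + n}",
        f"homeDirectory: /home/testuser{n}",
        "userPassword: password",
        f"description: longevity test user {n}",
        f"userCertificate;binary:: {cert}",
    ]
    return "\n".join(lines) + "\n"


def delete_ldif(n):
    """LDIF delete record for test user n."""
    return f"dn: {user_dn(n)}\nchangetype: delete\n"


def cycle_ldif(count=1000):
    """One add-then-delete cycle for `count` users as a single LDIF stream."""
    records = [user_ldif(n) for n in range(count)] + [delete_ldif(n) for n in range(count)]
    return "\n".join(records)  # each record ends in "\n", so this leaves a blank line between records
```

Writing `cycle_ldif()` to a file and applying it repeatedly with `ldapmodify -x -h localhost -p 1389 -D "cn=directory manager" -w password -f <file>` on master 1 (assuming `ou=People` already exists), while the step 3 search loop runs, approximates the load described.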
After a few minutes you will get a segfault in idl_new_fetch(). The problem is with DB_MULTIPLE_NEXT: the ptr variable holds the offset of the next data item (ID) from the beginning of the buffer, and a value of -1 means this buffer is done and a new buffer needs to be fetched. For some reason, the next-to-last offset is -5. Since this points before the beginning of the buffer, it points to random memory, and the attempt to dereference it causes the crash. I have no idea where the -5 comes from; still investigating.

Upstream ticket: https://fedorahosted.org/389/ticket/347

Ran the same tests against the IPA test environment. The defect did not recur during the test run. Closing as Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2012-0813.html
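The DB_MULTIPLE_NEXT analysis in this report can be illustrated with a pure-Python model of a Berkeley DB bulk buffer: item data is packed upward from the start of the buffer, while offset/length pairs are stored as 32-bit words growing downward from the end, terminated by an offset of -1. The helper names (`pack_bulk`, `walk_bulk`) are hypothetical and the model assumes little-endian words. The bounds check in `walk_bulk` is only one defensive way to refuse an offset such as -5 instead of dereferencing it. It is not the actual 389-ds-base fix, which the technical note describes as not returning entries deleted during the search.

```python
import struct

END_OF_BUFFER = -1  # offset value that marks the end of the pair array


def pack_bulk(items, size):
    """Hypothetical model of a DB_MULTIPLE bulk buffer holding `items` (bytes objects)."""
    buf = bytearray(size)
    pos = 0          # next free byte in the data area (grows upward)
    slot = size - 4  # next free 32-bit word in the pair array (grows downward)
    for item in items:
        # Need room for the data, this offset/length pair, and the end marker below it.
        if pos + len(item) > slot - 8:
            raise ValueError("buffer too small")
        buf[pos:pos + len(item)] = item
        struct.pack_into("<i", buf, slot, pos)            # offset from buffer start
        struct.pack_into("<I", buf, slot - 4, len(item))  # length of the item
        pos += len(item)
        slot -= 8
    struct.pack_into("<i", buf, slot, END_OF_BUFFER)
    return buf


def walk_bulk(buf):
    """Yield each item in order, refusing offsets that fall outside the data area."""
    slot = len(buf) - 4
    while slot >= 0:
        (offset,) = struct.unpack_from("<i", buf, slot)
        if offset == END_OF_BUFFER:
            return  # this buffer is done; a real caller would fetch the next one
        if slot < 4:
            break  # no room left for a length word
        (length,) = struct.unpack_from("<I", buf, slot - 4)
        # Valid data lies between the buffer start and this pair; an offset such as -5
        # points before the buffer, so stop instead of reading arbitrary memory.
        if offset < 0 or offset + length > slot - 4:
            raise ValueError(f"corrupt offset {offset} in pair at byte {slot}")
        yield bytes(buf[offset:offset + length])
        slot -= 8
    raise ValueError("no end-of-buffer marker found")
```

Packing three 4-byte IDs into a 64-byte buffer and then overwriting the second (next-to-last) offset with -5 mirrors the corruption described above: `walk_bulk` yields the first ID and then raises `ValueError` rather than reading outside the buffer.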