Bug 923506 - Crash at shutdown while stopping replica agreements
Summary: Crash at shutdown while stopping replica agreements
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: 389-ds-base
Version: 7.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: unspecified
Target Milestone: rc
Target Release: ---
Assignee: Rich Megginson
QA Contact: IDM QE LIST
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-03-20 01:28 UTC by Nathan Kinder
Modified: 2020-09-13 20:25 UTC
CC List: 5 users

Fixed In Version: 389-ds-base-1.3.1.2-1.el7
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2014-06-13 12:35:09 UTC
Target Upstream Version:
Embargoed:




Links
Github 389ds/389-ds-base issue 618 (last updated 2020-09-13 20:25:34 UTC)

Description Nathan Kinder 2013-03-20 01:28:12 UTC
This bug is created as a clone of upstream ticket:
https://fedorahosted.org/389/ticket/618


The crash occurs while running the DNA acceptance tests.
It was hit while testing a private build (master + my own modifications). My modifications are outside the replication area, and the chance that they trigger the crash is very low.

The crash is a timing issue: the slapd daemon had shut down the replication plugin, which in turn shut down the replica agreement, but the replica agreement thread had not yet detected the shutdown and was still using the agreement structure (so this crash '''is not systematic''').
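
For illustration only, here is a minimal model of that interleaving in plain C (pthreads + C11 atomics). The names (ra_thread, agmt_destroyed, terminate_flag) are hypothetical and this is not 389-ds-base code; the "agreement" is only marked as destroyed rather than actually freed, so the sketch stays well defined while still showing the window the RA thread can fall into:

{{{
/* shutdown_race_sketch.c -- build with: cc -pthread shutdown_race_sketch.c */
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <unistd.h>

static atomic_int terminate_flag;   /* what the RA thread polls            */
static atomic_int agmt_destroyed;   /* stands in for freeing the Repl_Agmt */

/* Rough model of prot_thread_main()/repl5_inc_run(): run update passes
 * until terminate is observed, touching the agreement at the end of each
 * pass (as agmt_set_last_update_status() does). */
static void *ra_thread(void *arg)
{
    (void)arg;
    while (!atomic_load(&terminate_flag)) {
        usleep(2000);                        /* one incremental update pass */
        if (atomic_load(&agmt_destroyed))
            puts("RA thread touched a destroyed agreement (crash in the real server)");
    }
    return NULL;
}

int main(void)
{
    pthread_t tid;
    pthread_create(&tid, NULL, ra_thread, NULL);
    usleep(5000);

    /* The problematic ordering: tear the agreement down while a pass may
     * still be in flight, then ask the thread to stop.  Whether the message
     * appears depends on where the thread happens to be at that moment,
     * hence "not systematic". */
    atomic_store(&agmt_destroyed, 1);
    atomic_store(&terminate_flag, 1);
    pthread_join(tid, NULL);
    return 0;
}
}}}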

The stack looks like this:

{{{
...
Thread 3 (Thread 0x7f8e16a4a800 (LWP 6141)):
#0  0x00000035d6cea9f3 in select () from /usr/lib64/libc.so.6
#1  0x00007f8e16b33947 in DS_Sleep (ticks=100) at ldap/servers/slapd/util.c:1108
#2  0x00007f8e11c21a57 in _cl5Close () at ldap/servers/plugins/replication/cl5_api.c:3207
#3  0x00007f8e11c1d4d5 in cl5Close () at ldap/servers/plugins/replication/cl5_api.c:585
#4  0x00007f8e11c2bb92 in changelog5_cleanup () at ldap/servers/plugins/replication/cl5_init.c:111
#5  0x00007f8e11c436f6 in multimaster_stop (pb=0x7fff88c951b0) at ldap/servers/plugins/replication/repl5_init.c:755
#6  0x00007f8e16aff6d1 in plugin_call_func (list=0x203ebc0, operation=210, pb=0x7fff88c951b0, call_one=1)
    at ldap/servers/slapd/plugin.c:1453
#7  0x00007f8e16aff5be in plugin_call_one (list=0x203ebc0, operation=210, pb=0x7fff88c951b0) at ldap/servers/slapd/plugin.c:1421
#8  0x00007f8e16aff4d1 in plugin_dependency_closeall () at ldap/servers/slapd/plugin.c:1365
#9  0x00007f8e16aff564 in plugin_closeall (close_backends=1, close_globals=1) at ldap/servers/slapd/plugin.c:1408
#10 0x000000000041988d in slapd_daemon (ports=0x7fff88c956f0) at ldap/servers/slapd/daemon.c:1358
#11 0x0000000000420dab in main (argc=7, argv=0x7fff88c95878) at ldap/servers/slapd/main.c:1266


Thread 1 (Thread 0x7f8e07fff700 (LWP 6150)):
#0  0x000000370ea116a7 in ?? () from /usr/lib64/libnspr4.so
#1  0x000000370ea115fb in ?? () from /usr/lib64/libnspr4.so
#2  0x000000370ea11f8d in ?? () from /usr/lib64/libnspr4.so
#3  0x000000370ea12972 in PR_vsnprintf () from /usr/lib64/libnspr4.so
#4  0x000000370ea12a42 in PR_snprintf () from /usr/lib64/libnspr4.so
#5  0x00007f8e11c37f71 in agmt_set_last_update_status (ra=0x5a333333323830, ldaprc=0, replrc=0, 
    message=0x7f8e11c84a4c "Incremental update succeeded") at ldap/servers/plugins/replication/repl5_agmt.c:2181
#6  0x00007f8e11c3fc26 in repl5_inc_run (prp=0x22ebf00) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1041
#7  0x00007f8e11c468d8 in prot_thread_main (arg=0x22cf120) at ldap/servers/plugins/replication/repl5_protocol.c:295
#8  0x000000370ea28cf3 in ?? () from /usr/lib64/libnspr4.so
#9  0x00000035d7007d14 in start_thread () from /usr/lib64/libpthread.so.0
#10 0x00000035d6cf168d in clone () from /usr/lib64/libc.so.6
...
(gdb) thread
[Current thread is 1 (Thread 0x7f8e07fff700 (LWP 6150))]
(gdb) frame
#6  0x00007f8e11c3fc26 in repl5_inc_run (prp=0x22ebf00) at ldap/servers/plugins/replication/repl5_inc_protocol.c:1041
1041	                          agmt_set_last_update_status(prp->agmt, 0, 0, "Incremental update succeeded");
(gdb) print prp->agmt
$5 = (Repl_Agmt *) 0x5a333333323830   <=== invalid pointer
(gdb) print *prp
$6 = {delete = 0x22bf7d0, run = 0, stop = 0, status = 0x31, notify_update = 0x22b9740, notify_agmt_changed = 0, 
  notify_window_opened = 0, notify_window_closed = 0, update_now = 0x30, lock = 0x21, cvar = 0x6954796669646f6d, 
  stopped = 1953719661, terminate = 7368033, eventbits = 36, conn = 0x21, last_acquire_response_code = 858861618, 
  agmt = 0x5a333333323830, replica_object = 0x0, private = 0x21, replica_acquired = 36431920, repl50consumer = 0, 
  repl71consumer = 1835093619, repl90consumer = 101, timeout = 0}
}}}
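
As an aside (not from the original report), the garbage values in the dumped structure decode to printable ASCII when read byte by byte, e.g. cvar = 0x6954796669646f6d reads "modifyTi" and agmt = 0x5a333333323830 reads "082333Z", which is consistent with the freed memory having been reused for attribute/value strings. A minimal sketch to decode such values:

{{{
/* decode_ptr.c -- print pointer-sized gdb values as little-endian ASCII bytes */
#include <ctype.h>
#include <stdint.h>
#include <stdio.h>

static void dump_as_ascii(uint64_t v)
{
    printf("0x%016llx -> \"", (unsigned long long)v);
    for (int i = 0; i < 8; i++) {                 /* least significant byte first */
        unsigned char c = (unsigned char)((v >> (8 * i)) & 0xff);
        putchar(isprint(c) ? c : '.');
    }
    puts("\"");
}

int main(void)
{
    dump_as_ascii(0x5a333333323830ULL);   /* prp->agmt -> "082333Z." */
    dump_as_ascii(0x6954796669646f6dULL); /* prp->cvar -> "modifyTi" */
    return 0;
}
}}}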



It looks like the solution is for the replica agreement protocol shutdown function to wait for the RA thread to complete.
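
A sketch of that approach, assuming a simplified protocol object with a lock, a condition variable and terminate/stopped flags (hypothetical names; not the actual 389-ds-base patch): the stop path sets terminate, blocks until the RA thread announces that it has left its loop, and only then frees the agreement.

{{{
/* prot_stop_sketch.c -- build with: cc -pthread prot_stop_sketch.c */
#include <pthread.h>
#include <stdbool.h>
#include <stdlib.h>
#include <unistd.h>

typedef struct repl_protocol {
    pthread_mutex_t lock;
    pthread_cond_t  cvar;
    bool            terminate;   /* set by the stop path                 */
    bool            stopped;     /* set by the RA thread when it is done */
    void           *agmt;        /* owned agreement, freed last          */
} repl_protocol;

/* Body of the RA/protocol thread. */
static void *prot_thread_main(void *arg)
{
    repl_protocol *prp = arg;

    pthread_mutex_lock(&prp->lock);
    while (!prp->terminate) {
        pthread_mutex_unlock(&prp->lock);
        usleep(1000);                 /* one update pass; prp->agmt is used here */
        pthread_mutex_lock(&prp->lock);
    }
    prp->stopped = true;              /* from here on, prp->agmt is never touched */
    pthread_cond_broadcast(&prp->cvar);
    pthread_mutex_unlock(&prp->lock);
    return NULL;
}

/* Shutdown path: block until the RA thread is really done, then free. */
static void prot_stop_and_free(repl_protocol *prp)
{
    pthread_mutex_lock(&prp->lock);
    prp->terminate = true;
    while (!prp->stopped)             /* the wait that is missing in the crash above */
        pthread_cond_wait(&prp->cvar, &prp->lock);
    pthread_mutex_unlock(&prp->lock);

    free(prp->agmt);                  /* safe now: no concurrent user left */
    prp->agmt = NULL;
}

int main(void)
{
    repl_protocol prp;
    pthread_t tid;

    pthread_mutex_init(&prp.lock, NULL);
    pthread_cond_init(&prp.cvar, NULL);
    prp.terminate = false;
    prp.stopped   = false;
    prp.agmt      = malloc(64);       /* stand-in for the Repl_Agmt */

    pthread_create(&tid, NULL, prot_thread_main, &prp);
    usleep(5000);                     /* let a few passes run */
    prot_stop_and_free(&prp);         /* returns only once the RA thread is done */
    pthread_join(tid, NULL);

    pthread_cond_destroy(&prp.cvar);
    pthread_mutex_destroy(&prp.lock);
    return 0;
}
}}}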

Note: it is not clear to me whether I need to attach the core + RPMs to this ticket.

Comment 1 Rich Megginson 2013-10-01 23:26:58 UTC
Moving all ON_QA bugs to MODIFIED in order to add them to the errata (bugs in the ON_QA state can't be added to an errata). When the errata is created, the bugs should be automatically moved back to ON_QA.

Comment 3 Amita Sharma 2013-11-07 07:14:36 UTC
Should the DNA acceptance tests be executed for verification, or are there any other specific verification steps?

Comment 4 Amita Sharma 2013-11-11 10:50:53 UTC
Acceptance test execution with 389-ds-base-1.3.1.6-7.el7.x86_64:
DNA startup: 100% (1/1)
DNA run:     100% (64/64)
DNA cleanup: 100% (1/1)

No crash occurred, hence marking the bug as VERIFIED.

Comment 5 thierry bordaz 2013-11-12 08:15:40 UTC
This bug is not systematically reproducible.
It relies on a race condition, and the platform matters.
I think it was hit by the DNA acceptance tests by "chance", but it could occur with any code that uses replication.

To increase the chance of reproducing it, one may increase the number of replica agreements and restart the server repeatedly.

On my box (4-core i7-3520M CPU @ 2.90GHz) it occurred quite frequently.

Comment 6 Ludek Smid 2014-06-13 12:35:09 UTC
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.

