Bug 1930188

Summary: crash in sync_repl when a MODRDN creates a cenotaph
Product: Red Hat Enterprise Linux 8
Reporter: thierry bordaz <tbordaz>
Component: 389-ds-base
Assignee: thierry bordaz <tbordaz>
Status: CLOSED ERRATA
QA Contact: RHDS QE <ds-qe-bugs>
Severity: high
Docs Contact:
Priority: unspecified
Version: 8.4
CC: ldap-maint, lmiksik, mreynolds, pgm-rhel-tools, sgouvern, sorlov
Target Milestone: rc
Flags: pm-rhel: mirror+
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard: sync-to-jira
Fixed In Version: 389-ds-1.4-8040020210311214642.866effaa
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1937855
Environment:
Last Closed: 2021-05-18 15:45:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1937855

Description thierry bordaz 2021-02-18 13:42:54 UTC
Description of problem:
A crash occurs systematically when sync_update_persist_op accesses the operation extension retrieved from the pblock. The crash backtrace is:
(gdb) where
#0  0x00007f8f4418672a in sync_update_persist_op
    (pb=pb@entry=0x7f8f1c7ffa40, e=0x7f8f0ebe2c40, eprev=eprev@entry=0x0, op_tag=op_tag@entry=104, label=label@entry=0x7f8f44188247 "sync_add_persist_post_op")
    at ldap/servers/plugins/sync/sync_persist.c:240
#1  0x00007f8f44186bfd in sync_add_persist_post_op (pb=0x7f8f1c7ffa40) at ldap/servers/plugins/sync/sync_persist.c:361
#2  0x00007f8f517e1509 in plugin_call_func () at /usr/lib64/dirsrv/libslapd.so.0
#3  0x00007f8f517e1754 in plugin_call_plugins () at /usr/lib64/dirsrv/libslapd.so.0
#4  0x00007f8f41840520 in ldbm_back_add (pb=0x7f8f1c7ffa40) at ldap/servers/slapd/back-ldbm/ldbm_add.c:1413
#5  0x00007f8f5177d368 in op_shared_add () at /usr/lib64/dirsrv/libslapd.so.0
#6  0x00007f8f5177d7fe in add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0
#7  0x00007f8f5177e5b5 in slapi_add_internal_pb () at /usr/lib64/dirsrv/libslapd.so.0
#8  0x00007f8f40f90ab4 in urp_fixup_add_cenotaph (opcsn=0x7f8f1daee6e0, sessionid=0x7f8f1d9fc8e0 "conn=354 op=9 csn=602e5dcb000000040000", pb=0x7f8f1d9fc8c0)
    at ldap/servers/plugins/replication/urp.c:913
#9  0x00007f8f40f90ab4 in urp_post_modrdn_operation (pb=pb@entry=0x7f8f1c7fe000) at ldap/servers/plugins/replication/urp.c:782
#10 0x00007f8f40f74cc0 in multimaster_bepostop_modrdn (pb=pb@entry=0x7f8f1c7fe000) at ldap/servers/plugins/replication/repl5_plugins.c:751
#11 0x00007f8f40f74e1c in multimaster_be_betxnpostop_modrdn (pb=pb@entry=0x7f8f1c7fe000) at ldap/servers/plugins/replication/repl5_plugins.c:842
#12 0x00007f8f40f74f08 in multimaster_mmr_postop (pb=0x7f8f1c7fe000, flags=562) at ldap/servers/plugins/replication/repl5_plugins.c:621
#13 0x00007f8f517e52fd in plugin_call_mmr_plugin_postop () at /usr/lib64/dirsrv/libslapd.so.0
#14 0x00007f8f4185f18f in ldbm_back_modrdn (pb=<optimized out>) at ldap/servers/slapd/back-ldbm/ldbm_modrdn.c:1256
#15 0x00007f8f517cf21a in op_shared_rename.constprop () at /usr/lib64/dirsrv/libslapd.so.0
#16 0x00007f8f517cfad4 in do_modrdn () at /usr/lib64/dirsrv/libslapd.so.0
#17 0x000055ea40bdf7ab in connection_dispatch_operation (pb=0x7f8f1c7fe000, op=0x7f8f3e404800, conn=0x7f8f3b8a2790) at ldap/servers/slapd/connection.c:630
#18 0x000055ea40bdf7ab in connection_threadmain () at ldap/servers/slapd/connection.c:1770
#19 0x00007f8f4f1145a8 in _pt_root () at /lib64/libnspr4.so
#20 0x00007f8f4eaaf14a in start_thread () at /lib64/libpthread.so.0
#21 0x00007f8f4e240db3 in clone () at /lib64/libc.so.6




Version-Release number of selected component (if applicable):
389-ds-base-1.4.3.16-11.module+el8.4.0+9969+312e177c.x86_64

How reproducible:
Systematic, see the Jenkins test below.



Actual results:
The server crashes.

Expected results:
The server should not crash.

Comment 1 thierry bordaz 2021-02-18 13:49:16 UTC
The problem is that there is no operation extension (ident is NULL), so the PR_ASSERT(ident) check fails. Note that I am surprised it leads to a SIGSEGV.

(gdb) list
235	    PR_ASSERT(ident);
236	    /* First mark the operation as completed/failed
237	     * the param to be used once the operation will be pushed
238	     * on the listeners queue
239	     */
240	    for (curr_op = prim_op; curr_op; curr_op = curr_op->next) {
241	        if (curr_op->idx_pl == ident->idx_pl) {
242	            /* The operation extension (ident) refers this operation (currop in the pending list)
243	             * This is called during sync_repl postop. At this moment
244	             * the operation in the pending list (identified by idx_pl in the operation extension)
(gdb) print prim_op
$1 = (OPERATION_PL_CTX_T *) 0x7f8f1c8af980
(gdb) print curr_op
$2 = (OPERATION_PL_CTX_T *) 0x7f8f1c8af980
(gdb) print curr_op->next
$3 = (struct OPERATION_PL_CTX *) 0x7f8f1c8afe30
(gdb) print ident
$4 = (op_ext_ident_t *) 0x0


'ident' is retrieved from:
233	    ident = sync_persist_get_operation_extension(pb);

At first glance, this internal operation (adding a cenotaph) did not add an operation extension. I do not know how this is possible :(

Comment 3 thierry bordaz 2021-02-18 17:35:15 UTC
I think the problem is that the ADD of a cenotaph is an internal add flagged 'OP_FLAG_NOOP'.
As a consequence, POST operation plugins are called (e.g. sync_update_persist_op) but PRE/POST BETXN operation plugins are not (e.g. sync_update_persist_betxn_pre_op). Because the sync_repl operation extension is initialized in the PRE BETXN plugin, it is not yet initialized when the POST operation plugins are called.

This needs to be confirmed by rerunning the test. A possible fix is to check for the NOOP flag when the operation extension is not set.

Comment 4 thierry bordaz 2021-02-23 08:00:41 UTC
Fix pushed upstream => POST

Comment 16 sgouvern 2021-03-15 08:51:09 UTC
Marking as verified: tested as per comment 10 and comment 11

Comment 17 sgouvern 2021-03-15 09:45:09 UTC
# PYTHONPATH=src/lib389/ py.test -s -v dirsrvtests/tests/suites/syncrepl_plugin/basic_test.py::test_sync_repl_cenotaph
re-exec with libfaketime dependencies
================================================================== test session starts ==================================================================
platform linux -- Python 3.6.8, pytest-6.2.2, py-1.10.0, pluggy-0.13.1 -- /usr/bin/python3.6
cachedir: .pytest_cache
metadata: {'Python': '3.6.8', 'Platform': 'Linux-4.18.0-293.el8.x86_64-x86_64-with-redhat-8.5-Ootpa', 'Packages': {'pytest': '6.2.2', 'py': '1.10.0', 'pluggy': '0.13.1'}, 'Plugins': {'metadata': '1.11.0', 'html': '3.1.1', 'libfaketime': '0.1.2'}}
389-ds-base: 1.4.3.16-13.module+el8.4.0+10307+74bbfb4e
nss: 3.53.1-17.el8_3
nspr: 4.25.0-2.el8_2
openldap: 2.4.46-16.el8
cyrus-sasl: 2.1.27-5.el8
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests, configfile: pytest.ini
plugins: metadata-1.11.0, html-3.1.1, libfaketime-0.1.2
collected 1 item
PASSED
================================================ 1 passed in 159.82s (0:02:39) ==================================================

marking as VERIFIED

Comment 18 mreynolds 2021-04-21 15:16:10 UTC
*** Bug 1937855 has been marked as a duplicate of this bug. ***

Comment 20 errata-xmlrpc 2021-05-18 15:45:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2021:1835