Bug 1518320

Summary: ns-slapd: crash in entrycache_add_int
Product: Red Hat Enterprise Linux 7 Reporter: German Parente <gparente>
Component: 389-ds-baseAssignee: mreynolds
Status: CLOSED ERRATA QA Contact: RHDS QE <ds-qe-bugs>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.4CC: amitkuma, bsmejkal, cpelland, fjaspe, gparente, jwooten, lkrispen, mreynolds, msauton, nkinder, pasik, rmeggins, tbordaz, tmihinto, vashirov
Target Milestone: rcKeywords: ZStream
Target Release: 7.7   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: 389-ds-base-1.3.9.1-2.el7 Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
: 1643875 (view as bug list) Environment:
Last Closed: 2019-08-06 12:58:23 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1550132, 1643875    
Attachments:
Description Flags
logs
none
thread apply all full bt
none
Amendment to debug patch
none
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29bz1518320
none
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29
none
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29
none
the two diffs I have used to deliver the test RPM, backported to 7.4
none
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6
none
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6 (v2)
none
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6 (v3) none

Description German Parente 2017-11-28 15:51:48 UTC
Description of problem:

crash in ns-slapd when attempting to add a dn to hash.

What is strange is that the value is attempted to be added during DEL operation (shouldn't the value be deleted from the hash ?)

My findings in private comments.

#0  __strcmp_sse42 () at ../sysdeps/x86_64/multiarch/strcmp-sse42.S:164
#1  0x00007fcb5cc9b1ec in entry_same_dn (e=<optimized out>, k=0x561251f093a0) at ldap/servers/slapd/back-ldbm/cache.c:142
#2  0x00007fcb5cc9b3ee in add_hash (ht=0x561235944000, key=key@entry=0x561251f093a0, keylen=<optimized out>, entry=entry@entry=0x56124e195440, alt=alt@entry=0x7fcb236c73d0)
    at ldap/servers/slapd/back-ldbm/cache.c:191
#3  0x00007fcb5cc9c8aa in entrycache_add_int (cache=cache@entry=0x5612358ee3c8, e=0x56124e195440, state=state@entry=2, alt=alt@entry=0x0) at ldap/servers/slapd/back-ldbm/cache.c:1297
#4  0x00007fcb5cc9cb3d in cache_add_tentative (cache=cache@entry=0x5612358ee3c8, e=<optimized out>, alt=alt@entry=0x0) at ldap/servers/slapd/back-ldbm/cache.c:1497
#5  0x00007fcb5ccd201d in ldbm_back_delete (pb=0x7fcb236c7a50) at ldap/servers/slapd/back-ldbm/ldbm_delete.c:717
#6  0x00007fcb6875efd0 in op_shared_delete (pb=pb@entry=0x7fcb236c7a50) at ldap/servers/slapd/delete.c:331
#7  0x00007fcb6875f352 in do_delete (pb=pb@entry=0x7fcb236c7a50) at ldap/servers/slapd/delete.c:97
#8  0x0000561233bd8642 in connection_dispatch_operation (pb=0x7fcb236c7a50, op=0x561237af44e0, conn=0x561237241e38) at ldap/servers/slapd/connection.c:623
#9  connection_threadmain () at ldap/servers/slapd/connection.c:1772
#10 0x00007fcb66b119bb in _pt_root (arg=0x5612371e1440) at ../../../nspr/pr/src/pthreads/ptthread.c:216
#11 0x00007fcb664b1e25 in start_thread (arg=0x7fcb236c8700) at pthread_create.c:308
#12 0x00007fcb65d9334d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113



Version-Release number of selected component (if applicable): 


How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 5 wibrown@redhat.com 2017-12-01 08:03:14 UTC
Upstream ticket:
https://pagure.io/389-ds-base/issue/49477

Comment 76 joel 2018-08-15 23:41:30 UTC
Hey,

I'm handling the fts on case 01964873.

Is this patch for the cu or german? 

I know the cu already has patched their system with the other debug patches. Is there a process to remove the previous ones or will it require a new install to be setup?

thanks.

Comment 77 mreynolds 2018-08-16 00:32:18 UTC
(In reply to joel from comment #76)
> Hey,
> 
> I'm handling the fts on case 01964873.
> 
> Is this patch for the cu or german? 
> 
> I know the cu already has patched their system with the other debug patches.
> Is there a process to remove the previous ones or will it require a new
> install to be setup?
> 

It's for the customer, but it should applied on the default version of 1.3.6 that the customer was using before they applied any debug patches.  So a fresh hotfix needs to be made with this patch, and only this patch.

Comment 82 German Parente 2018-08-20 15:23:32 UTC
Created attachment 1477288 [details]
logs

Comment 86 German Parente 2018-08-20 16:08:18 UTC
Created attachment 1477295 [details]
thread apply all full bt

Comment 87 mreynolds 2018-08-20 16:30:07 UTC
Created attachment 1477309 [details]
Amendment to debug patch

Here is a small change to the current debug patch.  This should be applied on top of the last debug patch

Comment 105 Ludwig 2018-08-28 12:41:41 UTC
can you search the entry (or its tombstone) with nscpentrywsi, that should also show what was done to this entry

Comment 119 Ludwig 2018-09-13 12:50:20 UTC
but in the coredump we have:
[12/Sep/2018:13:45:16.509020958 -0300] conn=2412189 op=199100 SRCH base="cn=27263152560,ou=Nuevos,dc=abierto,dc=anses,dc=gov,dc=ar" scope=0 filter="(objectClass=*)" attrs="objectClass"
[12/Sep/2018:13:45:16.509134066 -0300] conn=2412189 op=199100 RESULT err=0 tag=101 nentries=1 etime=0

and the DEL caused the crash, so where is it gone ? was it deleted successfully after the restart ?

could you check the access logs after restart ?

Comment 138 thierry bordaz 2018-10-25 10:45:51 UTC
upstream ticket pushed (https://pagure.io/389-ds-base/issue/49967) => POST

Comment 139 German Parente 2018-10-25 15:32:16 UTC
*** Bug 1633111 has been marked as a duplicate of this bug. ***

Comment 153 mreynolds 2018-12-05 16:34:56 UTC
Created attachment 1511788 [details]
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29bz1518320

Comment 161 mreynolds 2018-12-10 14:40:37 UTC
Created attachment 1513098 [details]
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29

This patch contains the modrdn entry cache fix (ticket 49967), and debug logging.  It will apply on top of 389-ds-base-1.3.6.1-29

Comment 164 mreynolds 2018-12-10 14:46:03 UTC
Created attachment 1513099 [details]
New debug patch to be applied on top of 389-ds-base-1.3.6.1-29

Revised patch: contains fix for modrdn entry cache, and debug logging

Comment 168 German Parente 2018-12-26 15:59:01 UTC
*** Bug 1633111 has been marked as a duplicate of this bug. ***

Comment 190 bsmejkal 2019-04-26 09:02:39 UTC
=============================================================================================== test session starts ===============================================================================================
platform linux -- Python 3.6.3, pytest-4.4.1, py-1.8.0, pluggy-0.9.0 -- /opt/rh/rh-python36/root/usr/bin/python3
cachedir: .pytest_cache
metadata: {'Python': '3.6.3', 'Platform': 'Linux-3.10.0-1034.el7.x86_64-x86_64-with-redhat-7.7-Maipo', 'Packages': {'pytest': '4.4.1', 'py': '1.8.0', 'pluggy': '0.9.0'}, 'Plugins': {'metadata': '1.8.0', 'html': '1.20.0'}}
389-ds-base: 1.3.9.1-4.el7
nss: 3.43.0-6.el7
nspr: 4.21.0-1.el7
openldap: 2.4.44-21.el7_6
cyrus-sasl: 2.1.26-23.el7
FIPS: disabled
rootdir: /mnt/tests/rhds/tests/upstream/ds/dirsrvtests/tests/suites/memberof_plugin
plugins: metadata-1.8.0, html-1.20.0
collected 6 items / 5 deselected / 1 selected                                                                                                                                                                     

regression_test.py::test_entrycache_on_modrdn_failure PASSED                                                                                                                                                [100%]

============================================================================== 1 passed, 5 deselected in 24.58 seconds ============================================================================================

Comment 196 German Parente 2019-06-14 13:42:59 UTC
Created attachment 1580675 [details]
the two diffs I have used to deliver the test RPM, backported to 7.4

Comment 197 mreynolds 2019-06-20 20:04:37 UTC
Created attachment 1582849 [details]
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6

Comment 200 mreynolds 2019-06-21 15:55:16 UTC
Created attachment 1583290 [details]
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6 (v2)

Revised patch.  This patch can be built on top of 1.3.6.1-29 and delivered to customer

Comment 203 mreynolds 2019-07-09 14:45:25 UTC
Any news from customer on the status of the latest patch?  Have they seen this text in the errors log since deploying the new rpms:  "avoided crash"

Comment 204 Facundo Jaspe 2019-07-10 19:47:30 UTC
The customer has not experienced any new crashes/downtime since the patch was installed.

Will enquire as per the error logs you comment. Just to verify, are these supposed to be found in /var/log/dirsrv/$instance_name/errors ?

Comment 205 mreynolds 2019-07-10 19:58:20 UTC
(In reply to Facundo Jaspe from comment #204)
> The customer has not experienced any new crashes/downtime since the patch
> was installed.
> 

Great!  Just curious, how long ago was the patch actually installed?

> Will enquire as per the error logs you comment. Just to verify, are these
> supposed to be found in /var/log/dirsrv/$instance_name/errors ?

Yes.

Thanks,
Mark

Comment 206 Facundo Jaspe 2019-07-11 15:17:31 UTC
Thank you Mark, you made it happen :-)

The patch was installed 9 days ago. Considering it was crashing several times a day in recent times, it's a good sign. The customer management wants to wait up to 15 days to confirm everything is okay (originally, these crashed happened only every 2 or 3 months, but that'd be too much waiting).

Comment 207 Facundo Jaspe 2019-07-11 15:50:55 UTC
Customer confirms there hasn't been any new crashes.

However, asking about the "avoided crash" messages:

[root@ansescvss01 ~]# grep "avoided crash" /var/log/dirsrv/slapd-ansescvss01/errors
[08/Jul/2019:17:51:03.246142528 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=c952911d-9c5a11e6-99f9afaa-6e3226cf,cn=27045860294,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27045860294,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[09/Jul/2019:22:50:02.203984180 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=25beef02-c5d711e7-983b8eea-47ccf4be,cn=27181937705,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27181937705,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:07:42:46.614715017 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=85d1b188-fc6811e7-a36e8eea-47ccf4be,cn=23127210624,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=23127210624,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:10:07:45.804045134 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=e43e2119-159811e9-aacc8eea-47ccf4be,cn=27491861415,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27491861415,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:10:13:58.503394544 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=182a918a-2e3811e9-9bd78eea-47ccf4be,cn=27378554123,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27378554123,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:11:11:26.110331522 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=129e0706-d60f11e8-89f88eea-47ccf4be,cn=27219167704,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27219167704,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:11:21:56.508787808 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=d85a3185-338411e9-9bd78eea-47ccf4be,cn=27426509402,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27426509402,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:13:20:57.168263463 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=5a8e0f85-74f111e8-b6308eea-47ccf4be,cn=27944217393,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27944217393,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:13:34:38.114030528 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=a85ea1a3-2bc011e9-9bd78eea-47ccf4be,cn=27403824203,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27403824203,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:16:50:51.683559337 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=537fd20a-e3cf11e6-99f9afaa-6e3226cf,cn=20323835897,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=20323835897,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:23:00:23.540723331 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=fe95cf03-d55411e8-89f88eea-47ccf4be,cn=27367604269,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27367604269,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[10/Jul/2019:23:33:53.959808850 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=f025e92c-54de11e9-84dd8eea-47ccf4be,cn=27144082503,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=27144082503,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[11/Jul/2019:08:01:37.641922112 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=97ee96a0-a21811e7-a349afaa-6e3226cf,cn=20180411357,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=20180411357,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[11/Jul/2019:11:08:39.610091660 -0300] - ALERT - ldbm_back_delete - DEBUG [retry: 1] - tombstone NOT in cache (nsuniqueid=ab97c7b8-599c11e6-9ad19ecf-c04433d2,cn=23086267659,ou=activos,dc=abierto,dc=anses,dc=gov,dc=ar) orig dn (cn=23086267659,ou=Activos,dc=abierto,dc=anses,dc=gov,dc=ar) state (3), but it should be in the cache (avoided crash, but cache was corrupted)
[root@ansescvss01 ~]#

Comment 209 mreynolds 2019-07-11 17:21:47 UTC
Created attachment 1589592 [details]
Cumulative patch of all entry cache fixes for 389-ds-base-1.3.6 (v3)

This patch addresses an issue with the debug message that was incorrectly logged.  The actual fix is the same as the last patch.

Comment 215 errata-xmlrpc 2019-08-06 12:58:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2152