Bug 1806978

Summary: ns-slapd crashes during db2ldif
Product: Red Hat Enterprise Linux 8
Component: 389-ds-base
Version: 8.2
Hardware: Unspecified
OS: Unspecified
Reporter: sgouvern
Assignee: Simon Pichugin <spichugi>
QA Contact: RHDS QE <ds-qe-bugs>
CC: mreynolds, pasik, spichugi, tbordaz, vashirov
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Target Milestone: rc
Target Release: 8.0
Flags: pm-rhel: mirror+
Fixed In Version: 389-ds-base-1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad
Type: Bug
Last Closed: 2020-11-04 03:07:44 UTC
Attachments: core file, core file, errors log, access log

Comment 1 thierry bordaz 2020-02-25 12:33:57 UTC
At first look, it is possible that the dbenv was already cleared at the time of the crash (at the call PR_Lock(pEnv->bdb_thread_count_lock);).
Do you still have the host so that we can debug the core?
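For illustration, here is a minimal sketch of why a cleared dbenv would fault exactly at that lock call. This is not the actual 389-ds-base source: the real code uses NSPR's PR_Lock, modeled here with a pthread mutex, and bdb_db_env / wait_for_db_threads are hypothetical names.

#include <pthread.h>
#include <stddef.h>

/* Hypothetical, simplified stand-in for the bdb environment struct. */
typedef struct bdb_db_env {
    pthread_mutex_t bdb_thread_count_lock;
} bdb_db_env;

static void wait_for_db_threads(bdb_db_env *pEnv)
{
    /* If the environment was already cleared, pEnv is NULL and the
     * member access below computes an address near 0, so the lock
     * call faults -- consistent with a crash at
     * PR_Lock(pEnv->bdb_thread_count_lock). */
    if (pEnv == NULL) {
        return; /* defensive check that would avoid the crash */
    }
    pthread_mutex_lock(&pEnv->bdb_thread_count_lock);
    /* ... wait for the database thread count to drop to zero ... */
    pthread_mutex_unlock(&pEnv->bdb_thread_count_lock);
}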

Comment 3 thierry bordaz 2020-02-25 13:36:33 UTC
At the time of the crash, the export had completed. An error regarding the output file was logged, but I do not know whether it belongs to a previous test or occurred during the crash:
[25/Feb/2020:05:57:31.180245098 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[25/Feb/2020:05:57:31.212522865 -0500] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /root/ldif/db.ldif: 13 (Permission denied) while running as user "dirsrv"
[25/Feb/2020:08:22:17.543964527 -0500] - INFO - task_export_thread - Beginning export of 'userRoot'
[25/Feb/2020:08:22:17.592423086 -0500] - INFO - export_one_entry - export userroot: Processed 1000 entries (3%).
...
[25/Feb/2020:08:22:18.870044159 -0500] - INFO - export_one_entry - export userroot: Processed 25000 entries (97%).
[25/Feb/2020:08:22:18.887957607 -0500] - INFO - bdb_db2ldif - export userroot: Processed 25542 entries (100%).
[25/Feb/2020:08:22:18.893128699 -0500] - INFO - task_export_thread - Export finished.

The crash occurs because dbenv was cleared (dblayer_env = 0x0), even though dblayer_private itself was initialized:
(gdb) print *li->li_dblayer_private
$3 = {dblayer_file_mode = 0, dblayer_bad_stuff_happened = 0, dblayer_idl_divisor = 0, dblayer_env = 0x0, 
  dblayer_start_fn = 0x7f6c1a94d620 <bdb_start>, dblayer_close_fn = 0x7f6c1a94ab80 <bdb_close>, 
  dblayer_instance_start_fn = 0x7f6c1a9497b0 <bdb_instance_start>, dblayer_backup_fn = 0x7f6c1a94c720 <bdb_backup>, 
  dblayer_verify_fn = 0x7f6c1a945670 <bdb_verify>, dblayer_db_size_fn = 0x7f6c1a94fe60 <bdb_db_size>, 
  dblayer_ldif2db_fn = 0x7f6c1a955320 <bdb_ldif2db>, dblayer_db2ldif_fn = 0x7f6c1a9557c0 <bdb_db2ldif>, 
  dblayer_db2index_fn = 0x7f6c1a956e40 <bdb_db2index>, dblayer_cleanup_fn = 0x7f6c1a94fee0 <bdb_cleanup>, 
  dblayer_upgradedn_fn = 0x7f6c1a95a550 <bdb_upgradednformat>, dblayer_upgradedb_fn = 0x7f6c1a959b50 <bdb_upgradedb>, 
  dblayer_restore_fn = 0x7f6c1a94e9a0 <bdb_restore>, dblayer_txn_begin_fn = 0x7f6c1a94b5b0 <bdb_txn_begin>, 
  dblayer_txn_commit_fn = 0x7f6c1a94b790 <bdb_txn_commit>, dblayer_txn_abort_fn = 0x7f6c1a94ba30 <bdb_txn_abort>, 
  dblayer_get_info_fn = 0x7f6c1a94fc50 <bdb_get_info>, dblayer_set_info_fn = 0x7f6c1a94fdc0 <bdb_set_info>, 
  dblayer_back_ctrl_fn = 0x7f6c1a94fde0 <bdb_back_ctrl>, dblayer_get_db_fn = 0x7f6c1a94ad80 <bdb_get_db>, 
  dblayer_delete_db_fn = 0x7f6c1a94bcc0 <bdb_delete_db>, dblayer_rm_db_file_fn = 0x7f6c1a94b1e0 <bdb_rm_db_file>, 
  dblayer_import_fn = 0x7f6c1a95da80 <bdb_import_main>, dblayer_load_dse_fn = 0x7f6c1a944190 <bdb_config_load_dse_info>, 
  dblayer_config_get_fn = 0x7f6c1a942f30 <bdb_public_config_get>, dblayer_config_set_fn = 0x7f6c1a943f30 <bdb_public_config_set>, 
  instance_config_set_fn = 0x7f6c1a944c50 <bdb_instance_config_set>, 
  instance_add_config_fn = 0x7f6c1a944cc0 <bdb_instance_add_instance_entry_callback>, 
  instance_postadd_config_fn = 0x7f6c1a944cb0 <bdb_instance_postadd_instance_entry_callback>, 
  instance_del_config_fn = 0x7f6c1a944ec0 <bdb_instance_delete_instance_entry_callback>, 
  instance_postdel_config_fn = 0x7f6c1a944cd0 <bdb_instance_post_delete_instance_entry_callback>, 
  instance_cleanup_fn = 0x7f6c1a944ff0 <bdb_instance_cleanup>, instance_create_fn = 0x7f6c1a945140 <bdb_instance_create>, 
  instance_search_callback_fn = 0x7f6c1a944ed0 <bdb_instance_search_callback>, 
  dblayer_auto_tune_fn = 0x7f6c1a9501d0 <bdb_start_autotune>}

The only reasons I can see for a cleared dbenv:
  - it was never initialized (but then how could the export have completed?)
  - it was cleared by bdb_post_close (but that should be called after bdb_pre_close; see the sketch below)
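If the second hypothesis holds, one standard hardening is an explicit lifecycle state so that out-of-order or repeated close calls become no-ops. This is a hedged sketch with hypothetical names (db_env_wrapper, env_pre_close, env_post_close are not the actual 389-ds-base symbols):

#include <stdlib.h>

/* Hypothetical environment wrapper with an explicit lifecycle state. */
typedef enum { ENV_UNINIT, ENV_OPEN, ENV_CLOSED } env_state_t;

typedef struct {
    env_state_t state;
    void *bdb_env;          /* stand-in for the real DB_ENV handle */
} db_env_wrapper;

static void env_pre_close(db_env_wrapper *w)
{
    if (w == NULL || w->state != ENV_OPEN)
        return;             /* never initialized or already closed */
    /* ... stop the database threads here ... */
}

static void env_post_close(db_env_wrapper *w)
{
    if (w == NULL || w->state != ENV_OPEN)
        return;
    free(w->bdb_env);
    w->bdb_env = NULL;
    w->state = ENV_CLOSED;  /* later pre/post-close calls are no-ops */
}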

Comment 4 Ludwig 2020-02-25 14:38:49 UTC
I tried to reproduce but failed - and have a few questions:

- Is it correct that db2ldif is run while the ldclt load is still running? That would mean two processes accessing the database; it should be OK, but there could be some race.

- In the last comment Thierry provided a snippet of the error log containing these lines:
[25/Feb/2020:08:22:17.543964527 -0500] - INFO - task_export_thread - Beginning export of 'userRoot'
....
[25/Feb/2020:08:22:18.893128699 -0500] - INFO - task_export_thread - Export finished.

But these lines correspond to an online export; was this run additionally, or is it from another test?

Comment 5 sgouvern 2020-02-25 15:03:51 UTC
I could reproduce on another machine under the same conditions.
Here there are no task_export_thread traces in the errors log.
On the first machine, I ran an export task after the core dump occurred; that is probably the reason for the export-related traces in the errors log.

Comment 6 sgouvern 2020-03-04 10:24:57 UTC
Created attachment 1667454 [details]
core file

Comment 7 mreynolds 2020-05-12 15:19:01 UTC
We need more details on the reproducer...

After step 1, I'm assuming the LDIF is imported.  Is the server stopped before running db2ldif in step 2 (db2ldif works with the server on or off)?  If there are any other steps, please provide them.

Thanks,
Mark

Comment 8 sgouvern 2020-05-19 14:48:49 UTC
Created attachment 1689906 [details]
core file

Comment 9 sgouvern 2020-05-19 14:49:46 UTC
Created attachment 1689907 [details]
errors log

Comment 10 sgouvern 2020-05-19 14:50:31 UTC
Created attachment 1689908 [details]
access log

Comment 11 sgouvern 2020-05-19 15:07:36 UTC
I reproduced with build 389-ds-base-1.4.3.8-2.module+el8.3.0+6591+ebfc9766.x86_64, following the same steps as above:

On a fresh install, I ran:
[root@ci-vm-10-0-139-116 upstream]# ldclt -v -h localhost -p 1389 -D "cn=directory manager" -w secret12 -b "ou=deptXXX,dc=example,dc=com" -e randombase,randombaselow=0,randombasehigh=99 -f cn=UserXXXXX -e add,person,incr,noloop, -r0 -R99999

     -> entries were created in the db; many errors were reported by ldclt and in the access log:
ldclt[13281]: Global error 32 (No such object) occurs   107 times
ldclt[13281]: Global error 68 (Already exists) occurs  1009 times

After a while, ldclt exited with max errors reached:
ldclt[13281]: Ending at Tue May 19 09:38:30 2020
ldclt[13281]: Exit status 3 - Max errors reached.


Then I ran (twice, with the same result):
[root@ci-vm-10-0-139-116 ds]# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif

Exported ldif file: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
[19/May/2020:09:39:07.657236663 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s): 
[19/May/2020:09:39:07.661500701 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
/usr/sbin/db2ldif: line 170: 13355 Segmentation fault      (core dumped) /usr/sbin/ns-slapd db2ldif -D /etc/dirsrv/slapd-inst1 -s "dc=example,dc=com" -a /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif

An error was reported in the errors log:
[19/May/2020:09:39:07.677045480 -0400] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif: 2 (No such file or directory) while running as user "dirsrv"

Note: I didn't stop the server before running db2ldif.
Stopping the server before running db2ldif produces the same behavior:

[root@ci-vm-10-0-139-116 ds]# stop-dirsrv
Stopping instance "inst1"
[root@ci-vm-10-0-139-116 ds]# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif
Exported ldif file: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
[19/May/2020:11:04:29.187747930 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s): 
[19/May/2020:11:04:29.191911215 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
/usr/sbin/db2ldif: line 170: 13699 Segmentation fault      (core dumped) /usr/sbin/ns-slapd db2ldif -D /etc/dirsrv/slapd-inst1 -s "dc=example,dc=com" -a /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif



I attached the access log, errors log, and core file.

Comment 12 Ludwig 2020-05-19 15:43:19 UTC
I think the non-existing ldif path is the clue; I just produced a crash:

AddressSanitizer:DEADLYSIGNAL
=================================================================
==1979==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000018 (pc 0x7fe4942f5a71 bp 0x7ffc08a6c780 sp 0x7ffc08a6c740 T0)
==1979==The signal is caused by a READ memory access.
==1979==Hint: address points to the zero page.
    #0 0x7fe4942f5a70 in bdb_pre_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2017
    #1 0x7fe4942f6301 in bdb_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2133
    #2 0x7fe4941ff241 in dblayer_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/dblayer.c:530
    #3 0x7fe494320085 in bdb_db2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_ldif2db.c:1313
    #4 0x7fe4942b8b13 in ldbm_back_ldbm2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/ldif2ldbm.c:198
    #5 0x45378c in slapd_exemode_db2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/main.c:2258
    #6 0x447e6d in main /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/main.c:822
    #7 0x7fe499b2f412 in __libc_start_main ../csu/libc-start.c:308
    #8 0x411c4d in _start (/home/lkrispen/TEST/caa/install/sbin/ns-slapd+0x411c4d)

AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2017 in bdb_pre_close
==1979==ABORTING
[lkrispen@lucy1 scripts]$ tail /home/lkrispen/TEST/caa/install/var/log/dirsrv/slapd-master1/errors
[19/May/2020:12:58:48.511851884 +0200] - INFO - bdb_pre_close - Waiting for 4 database threads to stop
[19/May/2020:12:58:49.446587650 +0200] - INFO - bdb_pre_close - All database threads now stopped
[19/May/2020:12:58:49.475458713 +0200] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
[19/May/2020:12:58:49.484282545 +0200] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 0 work q stack objects - freed 0 op stack objects
[19/May/2020:12:58:49.490583528 +0200] - INFO - main - slapd stopped.
[19/May/2020:17:38:50.456519014 +0200] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[19/May/2020:17:38:50.489945964 +0200] - INFO - bdb_db2ldif - export userRoot: Processed 14 entries (100%).
[19/May/2020:17:38:50.499236655 +0200] - INFO - bdb_pre_close - All database threads now stopped
[19/May/2020:17:40:01.062547331 +0200] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[19/May/2020:17:40:01.094003965 +0200] - ERR - bdb_db2ldif - db2ldif: userRoot: can't open /home/lkrispen/scripts/ldif/ex.ldif: 2 (No such file or directory) while running as user "lkrispen"

So after detecting that the ldif file cannot be accessed, the error-handling code is probably insufficient.
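As a minimal sketch of the failure pattern (simplified and hypothetical — export_entries, env_close, and struct env are illustration names, not the actual bdb_db2ldif code or the eventual fix): when the output file cannot be opened, the error path still drives the normal close sequence, so the close must be guarded against an environment that is NULL or already torn down.

#include <stdio.h>
#include <errno.h>
#include <string.h>

struct env { int dummy; };                      /* opaque stand-in for the db environment */
static void env_close(struct env *e) { (void)e; /* stub close for the sketch */ }

static int export_entries(const char *path, struct env *e)
{
    FILE *fp = fopen(path, "w");
    if (fp == NULL) {
        /* Matches the logged error: "can't open ...: 2 (No such file or directory)" */
        fprintf(stderr, "can't open %s: %d (%s)\n", path, errno, strerror(errno));
        /* Bug pattern: the error path runs the full close chain
         * (dblayer_close -> bdb_close -> bdb_pre_close) even when the
         * environment is NULL or only partially set up; guard it instead. */
        if (e != NULL) {
            env_close(e);
        }
        return -1;
    }
    /* ... iterate over the entries and write them to fp ... */
    fclose(fp);
    if (e != NULL) {
        env_close(e);
    }
    return 0;
}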

Comment 13 mreynolds 2020-06-25 14:02:34 UTC
Upstream ticket:

https://pagure.io/389-ds-base/issue/51188

Comment 16 sgouvern 2020-07-16 09:26:57 UTC
With build 389-ds-base-1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad.x86_64

I manually performed the same steps as in comment 11, passing db2ldif an inaccessible ldif path as the output file:
# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif
Exported ldif file: /mnt/tests/rhds/tests/upstream/ldif/db.ldif
[16/Jul/2020:05:20:15.133407167 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s): 
[16/Jul/2020:05:20:15.139503543 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ldif/db.ldif


The crash doesn't occur any more.

The error is not visible in the output of the command, which may be misleading, even though an error is logged in the errors log:

[16/Jul/2020:05:03:35.427788797 -0400] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /mnt/tests/rhds/tests/upstream/ldif/db.ldif: 2 (No such file or directory) while running as user "dirsrv"

Opening another bug for the missing error message in the CLI output.
Marking this bug as verified.

Comment 19 errata-xmlrpc 2020-11-04 03:07:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4695