Bug 1806978
Summary: ns-slapd crashes during db2ldif

| Field | Value | Field | Value |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | sgouvern |
| Component: | 389-ds-base | Assignee: | Simon Pichugin <spichugi> |
| Status: | CLOSED ERRATA | QA Contact: | RHDS QE <ds-qe-bugs> |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 8.2 | CC: | mreynolds, pasik, spichugi, tbordaz, vashirov |
| Target Milestone: | rc | Flags: | pm-rhel: mirror+ |
| Target Release: | 8.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | 389-ds-base-1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-11-04 03:07:44 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | Attachments: | |
Comment 1
thierry bordaz
2020-02-25 12:33:57 UTC
At the time of the crash, the export was completed. An error regarding the output file was logged, but I do not know if it is part of previous tests or occurred during the crash:

```
[25/Feb/2020:05:57:31.180245098 -0500] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[25/Feb/2020:05:57:31.212522865 -0500] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /root/ldif/db.ldif: 13 (Permission denied) while running as user "dirsrv"
[25/Feb/2020:08:22:17.543964527 -0500] - INFO - task_export_thread - Beginning export of 'userRoot'
[25/Feb/2020:08:22:17.592423086 -0500] - INFO - export_one_entry - export userroot: Processed 1000 entries (3%).
...
[25/Feb/2020:08:22:18.870044159 -0500] - INFO - export_one_entry - export userroot: Processed 25000 entries (97%).
[25/Feb/2020:08:22:18.887957607 -0500] - INFO - bdb_db2ldif - export userroot: Processed 25542 entries (100%).
[25/Feb/2020:08:22:18.893128699 -0500] - INFO - task_export_thread - Export finished.
```

The crash occurs because dbenv was cleared, even though dblayer_private itself was initialized.
```
(gdb) print *li->li_dblayer_private
$3 = {dblayer_file_mode = 0, dblayer_bad_stuff_happened = 0, dblayer_idl_divisor = 0, dblayer_env = 0x0,
  dblayer_start_fn = 0x7f6c1a94d620 <bdb_start>, dblayer_close_fn = 0x7f6c1a94ab80 <bdb_close>,
  dblayer_instance_start_fn = 0x7f6c1a9497b0 <bdb_instance_start>, dblayer_backup_fn = 0x7f6c1a94c720 <bdb_backup>,
  dblayer_verify_fn = 0x7f6c1a945670 <bdb_verify>, dblayer_db_size_fn = 0x7f6c1a94fe60 <bdb_db_size>,
  dblayer_ldif2db_fn = 0x7f6c1a955320 <bdb_ldif2db>, dblayer_db2ldif_fn = 0x7f6c1a9557c0 <bdb_db2ldif>,
  dblayer_db2index_fn = 0x7f6c1a956e40 <bdb_db2index>, dblayer_cleanup_fn = 0x7f6c1a94fee0 <bdb_cleanup>,
  dblayer_upgradedn_fn = 0x7f6c1a95a550 <bdb_upgradednformat>, dblayer_upgradedb_fn = 0x7f6c1a959b50 <bdb_upgradedb>,
  dblayer_restore_fn = 0x7f6c1a94e9a0 <bdb_restore>, dblayer_txn_begin_fn = 0x7f6c1a94b5b0 <bdb_txn_begin>,
  dblayer_txn_commit_fn = 0x7f6c1a94b790 <bdb_txn_commit>, dblayer_txn_abort_fn = 0x7f6c1a94ba30 <bdb_txn_abort>,
  dblayer_get_info_fn = 0x7f6c1a94fc50 <bdb_get_info>, dblayer_set_info_fn = 0x7f6c1a94fdc0 <bdb_set_info>,
  dblayer_back_ctrl_fn = 0x7f6c1a94fde0 <bdb_back_ctrl>, dblayer_get_db_fn = 0x7f6c1a94ad80 <bdb_get_db>,
  dblayer_delete_db_fn = 0x7f6c1a94bcc0 <bdb_delete_db>, dblayer_rm_db_file_fn = 0x7f6c1a94b1e0 <bdb_rm_db_file>,
  dblayer_import_fn = 0x7f6c1a95da80 <bdb_import_main>, dblayer_load_dse_fn = 0x7f6c1a944190 <bdb_config_load_dse_info>,
  dblayer_config_get_fn = 0x7f6c1a942f30 <bdb_public_config_get>, dblayer_config_set_fn = 0x7f6c1a943f30 <bdb_public_config_set>,
  instance_config_set_fn = 0x7f6c1a944c50 <bdb_instance_config_set>,
  instance_add_config_fn = 0x7f6c1a944cc0 <bdb_instance_add_instance_entry_callback>,
  instance_postadd_config_fn = 0x7f6c1a944cb0 <bdb_instance_postadd_instance_entry_callback>,
  instance_del_config_fn = 0x7f6c1a944ec0 <bdb_instance_delete_instance_entry_callback>,
  instance_postdel_config_fn = 0x7f6c1a944cd0 <bdb_instance_post_delete_instance_entry_callback>,
  instance_cleanup_fn = 0x7f6c1a944ff0 <bdb_instance_cleanup>, instance_create_fn = 0x7f6c1a945140 <bdb_instance_create>,
  instance_search_callback_fn = 0x7f6c1a944ed0 <bdb_instance_search_callback>,
  dblayer_auto_tune_fn = 0x7f6c1a9501d0 <bdb_start_autotune>}
```

The only reasons I can see for a cleared dbenv:

- it was not initialized (but then how could the export have completed?)
- it was cleared by bdb_post_close (but that should be called after bdb_pre_close)

I tried to reproduce but failed, and have a few questions:

- Is it correct that db2ldif is run while the ldclt load is still running? This would mean that two processes access the database; that should be OK, but there could be some race.
- In the last comment Thierry provided a snippet of the error log containing these lines:

```
[25/Feb/2020:08:22:17.543964527 -0500] - INFO - task_export_thread - Beginning export of 'userRoot'
....
[25/Feb/2020:08:22:18.893128699 -0500] - INFO - task_export_thread - Export finished.
```

But these correspond to an online export. Was this run additionally, or is it from another test?

I could reproduce on another machine under the same conditions. Here there are no task_export_thread traces in the errors log. On the first machine, I ran an export task after the core dump occurred; that is probably the reason for the export-related traces in the errors log.

Created attachment 1667454 [details]
core file
We need more details on the reproducer... After step 1, I'm assuming the LDIF is imported. Is the server stopped before running db2ldif in step 2 (db2ldif works with the server on or off)? If there are any other steps, please provide them. Thanks, Mark

Created attachment 1689906 [details]
core file
Created attachment 1689907 [details]
errors log
Created attachment 1689908 [details]
access log
I reproduced with build 389-ds-base-1.4.3.8-2.module+el8.3.0+6591+ebfc9766.x86_64 using the same steps as above. On a fresh install, I ran:

```
[root@ci-vm-10-0-139-116 upstream]# ldclt -v -h localhost -p 1389 -D "cn=directory manager" -w secret12 -b "ou=deptXXX,dc=example,dc=com" -e randombase,randombaselow=0,randombasehigh=99 -f cn=UserXXXXX -e add,person,incr,noloop, -r0 -R99999
```

Entries were created in the db; lots of errors were reported by ldclt and in the access log:

```
ldclt[13281]: Global error 32 (No such object) occurs 107 times
ldclt[13281]: Global error 68 (Already exists) occurs 1009 times
```

After a while, ldclt exited with max errors reached:

```
ldclt[13281]: Ending at Tue May 19 09:38:30 2020
ldclt[13281]: Exit status 3 - Max errors reached.
```

Then I ran (twice, with the same result):

```
[root@ci-vm-10-0-139-116 ds]# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif
Exported ldif file: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
[19/May/2020:09:39:07.657236663 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s):
[19/May/2020:09:39:07.661500701 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
/usr/sbin/db2ldif: line 170: 13355 Segmentation fault (core dumped) /usr/sbin/ns-slapd db2ldif -D /etc/dirsrv/slapd-inst1 -s "dc=example,dc=com" -a /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
```

An error was reported in the errors log:

```
[19/May/2020:09:39:07.677045480 -0400] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif: 2 (No such file or directory) while running as user "dirsrv"
```

Note: I did not stop the server before running db2ldif. Stopping the server before running db2ldif produces the same behavior:

```
[root@ci-vm-10-0-139-116 ds]# stop-dirsrv
Stopping instance "inst1"
[root@ci-vm-10-0-139-116 ds]# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif
Exported ldif file: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
[19/May/2020:11:04:29.187747930 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s):
[19/May/2020:11:04:29.191911215 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
/usr/sbin/db2ldif: line 170: 13699 Segmentation fault (core dumped) /usr/sbin/ns-slapd db2ldif -D /etc/dirsrv/slapd-inst1 -s "dc=example,dc=com" -a /mnt/tests/rhds/tests/upstream/ds/ldif/db.ldif
```

I attached the access log, errors log, and core file.

I think the non-existing ldif path is the clue; I just produced a crash:

```
AddressSanitizer:DEADLYSIGNAL
=================================================================
==1979==ERROR: AddressSanitizer: SEGV on unknown address 0x000000000018 (pc 0x7fe4942f5a71 bp 0x7ffc08a6c780 sp 0x7ffc08a6c740 T0)
==1979==The signal is caused by a READ memory access.
==1979==Hint: address points to the zero page.
    #0 0x7fe4942f5a70 in bdb_pre_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2017
    #1 0x7fe4942f6301 in bdb_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2133
    #2 0x7fe4941ff241 in dblayer_close /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/dblayer.c:530
    #3 0x7fe494320085 in bdb_db2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_ldif2db.c:1313
    #4 0x7fe4942b8b13 in ldbm_back_ldbm2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/ldif2ldbm.c:198
    #5 0x45378c in slapd_exemode_db2ldif /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/main.c:2258
    #6 0x447e6d in main /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/main.c:822
    #7 0x7fe499b2f412 in __libc_start_main ../csu/libc-start.c:308
    #8 0x411c4d in _start (/home/lkrispen/TEST/caa/install/sbin/ns-slapd+0x411c4d)
AddressSanitizer can not provide additional info.
SUMMARY: AddressSanitizer: SEGV /home/lkrispen/TEST/caa/ws/389-ds-base/ldap/servers/slapd/back-ldbm/db-bdb/bdb_layer.c:2017 in bdb_pre_close
==1979==ABORTING
```

```
[lkrispen@lucy1 scripts]$ tail /home/lkrispen/TEST/caa/install/var/log/dirsrv/slapd-master1/errors
[19/May/2020:12:58:48.511851884 +0200] - INFO - bdb_pre_close - Waiting for 4 database threads to stop
[19/May/2020:12:58:49.446587650 +0200] - INFO - bdb_pre_close - All database threads now stopped
[19/May/2020:12:58:49.475458713 +0200] - INFO - ldbm_back_instance_set_destructor - Set of instances destroyed
[19/May/2020:12:58:49.484282545 +0200] - INFO - connection_post_shutdown_cleanup - slapd shutting down - freed 0 work q stack objects - freed 0 op stack objects
[19/May/2020:12:58:49.490583528 +0200] - INFO - main - slapd stopped.
[19/May/2020:17:38:50.456519014 +0200] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[19/May/2020:17:38:50.489945964 +0200] - INFO - bdb_db2ldif - export userRoot: Processed 14 entries (100%).
[19/May/2020:17:38:50.499236655 +0200] - INFO - bdb_pre_close - All database threads now stopped
[19/May/2020:17:40:01.062547331 +0200] - INFO - ldbm_instance_config_cachememsize_set - force a minimal value 512000
[19/May/2020:17:40:01.094003965 +0200] - ERR - bdb_db2ldif - db2ldif: userRoot: can't open /home/lkrispen/scripts/ldif/ex.ldif: 2 (No such file or directory) while running as user "lkrispen"
```

So after detecting that the ldif file cannot be accessed, the handling of the error code is probably insufficient.

Upstream ticket: https://pagure.io/389-ds-base/issue/51188

With build 389-ds-base-1.4.3.8-4.module+el8.3.0+7193+dfd1e8ad.x86_64 I manually performed the same steps as in comment 11, passing db2ldif an inaccessible ldif path as the output file:

```
# db2ldif -Z inst1 -s "dc=example,dc=com" -a ldif/db.ldif
Exported ldif file: /mnt/tests/rhds/tests/upstream/ldif/db.ldif
[16/Jul/2020:05:20:15.133407167 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - Backend Instance(s):
[16/Jul/2020:05:20:15.139503543 -0400] - INFO - slapd_exemode_db2ldif - db2ldif - userroot
ldiffile: /mnt/tests/rhds/tests/upstream/ldif/db.ldif
```

The crash doesn't occur any more. The error is not visible in the output of the command, which may be misleading, even though an error is logged in the errors log:

```
[16/Jul/2020:05:03:35.427788797 -0400] - ERR - bdb_db2ldif - db2ldif: userroot: can't open /mnt/tests/rhds/tests/upstream/ldif/db.ldif: 2 (No such file or directory) while running as user "dirsrv"
```

Opening another bug for the missing error message in the output of the CLI. Marking this bug as verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (389-ds:1.4 bug fix and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:4695