Bug 997742 - systemd crashes during automated removal of 389-ds-base instances
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: systemd
Hardware: All Linux
Priority: unspecified
Severity: urgent
Target Milestone: rc
Target Release: ---
Assigned To: systemd-maint
Keywords: Regression, TestBlocker
Duplicates: 1006323
Depends On:
Reported: 2013-08-16 02:30 EDT by Sankar Ramalingam
Modified: 2014-06-13 07:20 EDT
Fixed In Version: systemd-206-7.el7
Doc Type: Bug Fix
Last Closed: 2014-06-13 07:20:30 EDT
Type: Bug

Attachments
stack trace (full) (5.22 KB, text/plain)
2013-08-22 12:37 EDT, Nathan Kinder

Description Sankar Ramalingam 2013-08-16 02:30:16 EDT
Description of problem: Creating a directory server instance fails on RHEL7. Running setup-ds.pl reports the following error:
Error: command '/bin/systemctl --system daemon-reload' failed - output [Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
] error []Error: Could not create directory server instance 'testinst11'.
Exiting . . .
Log file is '/tmp/setupzitRJD.log'

Version-Release number of selected component (if applicable):

How reproducible: Not consistently. 

Steps to Reproduce:
1. Install 389-ds-base- on RHEL7
2. Create multiple directory server instances and configure replication, or run the fourwaymmr test suite from TET.
3. After the fourway mmr setup completes, remove all the masters by running remove-ds.pl.
4. Then try to create a new instance to check whether it works.

Actual results: Creating DS instance fails.

Error: command '/bin/systemctl --system daemon-reload' failed - output [Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
] error []Error: Could not create directory server instance 'testinst11'.
Exiting . . .
Log file is '/tmp/setupzitRJD.log'

Expected results: Instance should be successfully created.

Additional info:

The remove-ds.pl command takes a long time to complete and, in the end, fails to remove the DS instance with the same error.

A few lines from the fourwaymmr cleanup tests:

RemoveInstance /usr/lib64/dirsrv/slapd-M4 30106
The following errors occurred during removal:
Error: command '/bin/systemctl --system daemon-reload' failed - output [Failed to get D-Bus connection: Failed to connect to socket /run/systemd/private: Connection refused
] error []Error: could not remove directory server M4
TestCase [fourwaymmr_cleanup] result-> [PASS]

Also, slapd leaves "defunct" (zombie) processes on the system:

ps -eaf |grep -i slapd
sramling  2100     1  0 Aug15 ?        00:00:06 [ns-slapd] <defunct>
sramling 14511     1  0 Aug15 ?        00:00:00 [ns-slapd] <defunct>
sramling 14663     1  0 Aug15 ?        00:00:00 [ns-slapd] <defunct>
sramling 14827     1  0 Aug15 ?        00:00:01 [ns-slapd] <defunct>
sramling 15003     1  0 Aug15 ?        00:00:01 [ns-slapd] <defunct>
sramling 15119     1  0 Aug15 ?        00:00:01 [ns-slapd] <defunct>
sramling 15247     1  0 Aug15 ?        00:00:01 [ns-slapd] <defunct>
sramling 15339     1  0 Aug15 ?        00:00:02 [ns-slapd] <defunct>
sramling 15443     1  0 Aug15 ?        00:00:02 [ns-slapd] <defunct>
root     21619 21598  0 02:21 pts/2    00:00:00 grep --color=auto -i slapd
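
The grep above also matches the grep process itself and does not show the process state. A stricter filter could look like the sketch below. It assumes procps `ps` with the `pid,ppid,stat,comm` output columns, and `list_defunct` is a hypothetical helper name that is not part of 389-ds-base or systemd.

```shell
#!/bin/sh
# Filter `ps -eo pid,ppid,stat,comm` output down to zombie entries.
# Reads ps output on stdin, skips the header line, and prints
# "PID PPID COMMAND" for every line whose STAT column starts with "Z".
list_defunct() {
    awk 'NR > 1 && $3 ~ /^Z/ { print $1, $2, $4 }'
}

# Typical use on a live system:
#   ps -eo pid,ppid,stat,comm | list_defunct
```

In the listing above every defunct ns-slapd has PPID 1, so these entries are waiting to be reaped by PID 1.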
Comment 2 Nathan Kinder 2013-08-19 11:16:29 EDT
Does this happen outside of TET automation?  Are you able to successfully create and remove instances manually by running setup-ds.pl/remove-ds.pl?
Comment 3 Nathan Kinder 2013-08-19 11:27:53 EDT
Please check the following as well:

- Are there any AVC messages logged?
- Do other systemctl commands, unrelated to DS, work?
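
Both checks can be scripted. The sketch below assumes the audit package is installed so that `ausearch` is available. `probe` is a hypothetical helper, and the systemctl calls are only examples of commands unrelated to DS.

```shell
#!/bin/sh
# Run a command quietly and report whether it succeeded.
probe() {
    if "$@" >/dev/null 2>&1; then
        echo "ok: $*"
    else
        echo "FAILED: $*"
    fi
}

# Example calls on the affected host (run as root):
#   probe systemctl --system list-units
#   probe systemctl --system daemon-reload
#   ausearch -m avc -ts today    # list AVC records logged today
```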
Comment 4 Nathan Kinder 2013-08-22 12:36:36 EDT
The problem is that systemd is crashing.  I suspect that our test automation is cleaning something up out from under systemd, and it doesn't handle it well.

When the system first gets into this broken state, the following is logged to /var/log/messages:

Aug 22 01:05:11 dell-pe2950-01 systemd: Assertion 'path' failed at src/shared/cgroup-util.c:866, function cg_is_empty_recursive(). Aborting.
Aug 22 01:05:11 dell-pe2950-01 systemd: Caught <ABRT>, dumped core as pid 24154.
Aug 22 01:05:11 dell-pe2950-01 systemd: Freezing execution. 
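
To find when PID 1 entered this state on other hosts, the log can be searched for the same three messages. This is a minimal sketch that assumes syslog-format lines like the ones above. `systemd_crash_lines` is a hypothetical helper name.

```shell
#!/bin/sh
# Print log lines that carry the systemd crash signature shown above:
# a failed assertion, a caught fatal signal, or "Freezing execution".
# Reads log text on stdin.
systemd_crash_lines() {
    grep -E 'systemd(\[1\])?: (Assertion .* failed|Caught <[A-Z]+>|Freezing execution)'
}

# Typical use:
#   systemd_crash_lines < /var/log/messages
```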

I generated the following stack trace from the core file (a full stack trace will be attached to this bug shortly):

Core was generated by `/usr/lib/systemd/systemd --switched-root --system --deserialize 22'.
Program terminated with signal 6, Aborted.
#0  0x00007f2454e5bffb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
37	  return INLINE_SYSCALL (tgkill, 3, pid, THREAD_GETMEM (THREAD_SELF, tid),
Missing separate debuginfos, use: debuginfo-install libattr-2.4.46-10.el7.x86_64 pcre-8.32-7.el7.x86_64 zlib-1.2.7-10.el7.x86_64
(gdb) bt
#0  0x00007f2454e5bffb in raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/pt-raise.c:37
#1  0x00007f24569a395e in crash (sig=6) at src/core/main.c:144
#2  <signal handler called>
#3  0x00007f2454ac1999 in __GI_raise (sig=sig@entry=6) at ../nptl/sysdeps/unix/sysv/linux/raise.c:56
#4  0x00007f2454ac30a8 in __GI_abort () at abort.c:90
#5  0x00007f24569fc563 in log_assert (text=<optimized out>, file=0x7f2456a55503 "src/shared/cgroup-util.c", line=866, 
    func=0x7f2456a556c0 <__PRETTY_FUNCTION__.7913> "cg_is_empty_recursive", 
    format=format@entry=0x7f2456a56ab0 "Assertion '%s' failed at %s:%u, function %s(). Aborting.") at src/shared/log.c:699
#6  0x00007f24569fce10 in log_assert_failed (text=<optimized out>, file=<optimized out>, line=<optimized out>, func=<optimized out>)
    at src/shared/log.c:704
#7  0x00007f24569f33f3 in cg_is_empty_recursive (controller=controller@entry=0x7f2456a4dda7 "name=systemd", path=0x0, 
    ignore_self=ignore_self@entry=true) at src/shared/cgroup-util.c:866
#8  0x00007f24569e4890 in manager_notify_cgroup_empty (m=m@entry=0x7f245828cba0, cgroup=<optimized out>) at src/core/cgroup.c:736
#9  0x00007f24569d538d in private_bus_message_filter (connection=0x7f245828d820, message=0x7f245858ce50, data=0x7f245828cba0)
    at src/core/dbus.c:491
#10 0x00007f24554969e6 in dbus_connection_dispatch (connection=connection@entry=0x7f245828d820) at dbus-connection.c:4631
#11 0x00007f24569d5dda in bus_dispatch (m=m@entry=0x7f245828cba0) at src/core/dbus.c:525
#12 0x00007f24569a969f in manager_loop (m=0x7f245828cba0) at src/core/manager.c:1816
#13 0x00007f24569a0fb6 in main (argc=5, argv=0x7ffff5ada0c8) at src/core/main.c:1705
Comment 5 Nathan Kinder 2013-08-22 12:37:20 EDT
Created attachment 789271 [details]
stack trace (full)
Comment 6 Harald Hoyer 2013-08-23 12:19:53 EDT
same as https://bugzilla.redhat.com/show_bug.cgi?id=995197#c15
Comment 7 Harald Hoyer 2013-08-23 12:55:44 EDT
next try:
Comment 8 Harald Hoyer 2013-08-28 10:34:21 EDT
Found the culprit in a 12h debug session: hashmap keys were free()'d, which corrupted the hashmap.

Comment 9 Michal Schmidt 2013-09-10 08:45:31 EDT
*** Bug 1006323 has been marked as a duplicate of this bug. ***
Comment 10 Sankar Ramalingam 2013-09-16 07:21:42 EDT
With systemd-207-1.el7, the issue is no longer reproducible. Hence, marking the bug as Verified.
Comment 11 Ludek Smid 2014-06-13 07:20:30 EDT
This request was resolved in Red Hat Enterprise Linux 7.0.

Contact your manager or support representative in case you have further questions about the request.
