Bug 1369460

Summary: systemd times out and D-Bus reports tons of errors during normal production on a node that had been working for months
Product: Red Hat Enterprise Linux 7
Reporter: Mikael <tlwc2>
Component: dbus
Assignee: David King <dking>
Status: CLOSED WONTFIX
QA Contact: Desktop QE <desktop-qa-list>
Severity: high
Docs Contact:
Priority: unspecified
Version: 7.2
CC: lef, mclasen
Target Milestone: rc
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2019-09-23 14:38:37 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Mikael 2016-08-23 13:10:13 UTC
Description of problem: After running in production for several months without issue, everything on this node suddenly went down. We suspect a bug in D-Bus or systemd.
After a complete reboot, everything came back up.

Version-Release number of selected component (if applicable):
CentOS Linux release 7.2.1511 (Core)
Kernel: 3.10.0-327.10.1.el7.x86_64
Systemd: 219-19.el7_2.4
Dbus: 1.6.12-13.el7

How reproducible:
Happened once, but with critical results. A complete reboot cleared the error.

Actual results:
After running postgres, mmx and other services for a couple of months, the service manager seems to "time out", leaving "all" services in a strange state. Services can no longer run normally, and there are tons of errors from systemd1. Our cluster did not manage to fail over to the other node.

Expected results:
Services keep running normally, and the cluster fails over to the other node when this one has a problem.


Additional info:

Specific services used:
postgres
corosync
pacemaker
mmx 


Resulting systemlog:
Aug 18 15:15:45 ha2-btfn systemd-logind: Failed to start session scope session-c12249078.scope: Connection timed out (null)
Aug 18 15:15:45 ha2-btfn dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:15:45 ha2-btfn dbus-daemon: dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:15:45 ha2-btfn su: (to postgres) root on none
Aug 18 15:16:10 ha2-btfn dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:16:10 ha2-btfn systemd-logind: Failed to start session scope session-c12249079.scope: Activation of org.freedesktop.systemd1 timed out org.freedesktop.DBus.Error.TimedOut
Aug 18 15:16:10 ha2-btfn dbus-daemon: dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:16:10 ha2-btfn su: (to postgres) root on none
Aug 18 15:16:20 ha2-btfn lrmd[19344]: warning: pgsql_monitor_3000 process (PID 16014) timed out
Aug 18 15:16:20 ha2-btfn lrmd[19344]: warning: pgsql_monitor_3000:16014 - timed out after 60000ms
Aug 18 15:16:20 ha2-btfn lrmd[19344]: warning: pgsql_monitor_3000 process (PID 16014) timed out
Aug 18 15:16:20 ha2-btfn lrmd[19344]: warning: pgsql_monitor_3000:16014 - timed out after 60000ms
Aug 18 15:16:20 ha2-btfn crmd[19347]:   error: Operation pgsql_monitor_3000: Timed Out (node=ha2-btfn, call=1279, timeout=60000ms)
Aug 18 15:16:46 ha2-btfn crmd[19347]:  notice: Operation pgsql_notify_0: ok (node=ha2-btfn, call=1280, rc=0, cib-update=0, confirmed=true)
Aug 18 15:17:37 ha2-btfn pgsql(pgsql)[17175]: INFO: Stopping PostgreSQL on demote.
Aug 18 15:17:37 ha2-btfn pgsql(pgsql)[17175]: INFO: Stopping PostgreSQL on demote.
Aug 18 15:17:46 ha2-btfn lrmd[19344]: warning: pgsql_demote_0 process (PID 17175) timed out
Aug 18 15:17:46 ha2-btfn lrmd[19344]: warning: pgsql_demote_0:17175 - timed out after 60000ms
Aug 18 15:17:46 ha2-btfn crmd[19347]:   error: Operation pgsql_demote_0: Timed Out (node=ha2-btfn, call=1282, timeout=60000ms)
Aug 18 15:18:11 ha2-btfn crmd[19347]:  notice: Operation pgsql_notify_0: ok (node=ha2-btfn, call=1283, rc=0, cib-update=0, confirmed=true)
Aug 18 15:18:37 ha2-btfn crmd[19347]:  notice: Operation pgsql_notify_0: ok (node=ha2-btfn, call=1286, rc=0, cib-update=0, confirmed=true)
Aug 18 15:19:37 ha2-btfn lrmd[19344]: warning: pgsql_stop_0 process (PID 18445) timed out
Aug 18 15:19:37 ha2-btfn lrmd[19344]: warning: pgsql_stop_0:18445 - timed out after 60000ms
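
To get a quick overview of which daemons report timeouts in a log like the one above, something along these lines may help. This is only a rough sketch: the names SAMPLE_LOG, LINE_RE and count_timeouts are made up for illustration, the sample contains just a few lines copied from the log above, and the regular expression assumes the classic "Mon DD HH:MM:SS host tag[pid]: message" syslog layout.

```python
import re
from collections import Counter

# A handful of lines copied from the system log above (not the full log).
SAMPLE_LOG = """\
Aug 18 15:15:45 ha2-btfn dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:16:10 ha2-btfn dbus[754]: [system] Failed to activate service 'org.freedesktop.systemd1': timed out
Aug 18 15:16:20 ha2-btfn lrmd[19344]: warning: pgsql_monitor_3000 process (PID 16014) timed out
Aug 18 15:17:46 ha2-btfn lrmd[19344]: warning: pgsql_demote_0 process (PID 17175) timed out
Aug 18 15:18:11 ha2-btfn crmd[19347]:  notice: Operation pgsql_notify_0: ok (node=ha2-btfn, call=1283, rc=0, cib-update=0, confirmed=true)
Aug 18 15:19:37 ha2-btfn lrmd[19344]: warning: pgsql_stop_0 process (PID 18445) timed out
"""

# Assumed syslog layout: "Mon DD HH:MM:SS host tag[pid]: message" ("[pid]" is optional).
LINE_RE = re.compile(
    r"^(?P<ts>\w{3}\s+\d+\s[\d:]{8})\s(?P<host>\S+)\s"
    r"(?P<tag>[^\s:\[]+)(?:\[\d+\])?:\s*(?P<msg>.*)$"
)


def count_timeouts(log_text):
    """Return a Counter mapping syslog tag -> number of 'timed out' messages."""
    counts = Counter()
    for line in log_text.splitlines():
        match = LINE_RE.match(line)
        # Case-insensitive so that crmd's "Timed Out" spelling is counted as well.
        if match and "timed out" in match.group("msg").lower():
            counts[match.group("tag")] += 1
    return counts


timeouts = count_timeouts(SAMPLE_LOG)
for tag, n in timeouts.most_common():
    print(f"{tag}: {n}")
```

Running the same function over the complete /var/log/messages content (read from a file instead of SAMPLE_LOG) would show whether the dbus activation timeouts started before the pacemaker resource timeouts.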


# systemctl status dbus.service
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2016-08-18 15:40:18 CEST; 3 days ago
 Main PID: 778 (dbus-daemon)
   CGroup: /system.slice/dbus.service
           ├─734 /usr/bin/python -Es /usr/sbin/setroubleshootd -f
           └─778 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the dir...18282
Aug 22 12:15:24 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /...tore.
                                      *****  Plugin catchall (100. confidence) suggests   **************************
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /sys/kernel/config
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the dir...2198b
Aug 22 12:15:24 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /...nfig.
                                      *****  Plugin catchall (100. confidence) suggests   **************************
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from search access on the dire...02b24
Aug 22 12:15:24 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from search access on the directory fs.
                                      *****  Plugin catchall (100. confidence) suggests   **************************
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /dev/hugepages
Aug 22 12:15:24 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the dir...7a55b
Aug 22 12:15:24 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /...ages.
                                      *****  Plugin catchall (100. confidence) suggests   **************************
Hint: Some lines were ellipsized, use -l to show in full.

Comment 2 Mikael 2016-08-24 07:11:35 UTC
systemctl -l status dbus.service
● dbus.service - D-Bus System Message Bus
   Loaded: loaded (/usr/lib/systemd/system/dbus.service; static; vendor preset: disabled)
   Active: active (running) since Thu 2016-08-18 15:40:18 CEST; 5 days ago
 Main PID: 778 (dbus-daemon)
   CGroup: /system.slice/dbus.service
           ├─734 /usr/bin/python -Es /usr/sbin/setroubleshootd -f
           └─778 /bin/dbus-daemon --system --address=systemd: --nofork --nopidfile --systemd-activation

Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/cgroup/systemd.

                                      *****  Plugin catchall (100. confidence) suggests   **************************

                                      If you believe that df should be allowed getattr access on the systemd directory by default.
                                      Then you should report this as a bug.
                                      You can generate a local policy module to allow this access.
                                      Do
                                      allow this access for now by executing:
                                      # grep df /var/log/audit/audit.log | audit2allow -M mypol
                                      # semodule -i mypol.pp

Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /sys/fs/pstore
Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/pstore. For complete SELinux messages. run sealert -l fe7dc8de-c163-47cb-93ef-393f4e518282
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/pstore.

                                      *****  Plugin catchall (100. confidence) suggests   **************************

                                      If you believe that df should be allowed getattr access on the pstore directory by default.
                                      Then you should report this as a bug.
                                      You can generate a local policy module to allow this access.
                                      Do
                                      allow this access for now by executing:
                                      # grep df /var/log/audit/audit.log | audit2allow -M mypol
                                      # semodule -i mypol.pp

Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /sys/kernel/config
Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/kernel/config. For complete SELinux messages. run sealert -l 9eaa1e0a-46e9-4114-86f2-17de3b42198b
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/kernel/config.

                                      *****  Plugin catchall (100. confidence) suggests   **************************

                                      If you believe that df should be allowed getattr access on the config directory by default.
                                      Then you should report this as a bug.
                                      You can generate a local policy module to allow this access.
                                      Do
                                      allow this access for now by executing:
                                      # grep df /var/log/audit/audit.log | audit2allow -M mypol
                                      # semodule -i mypol.pp

Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /dev/hugepages
Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /dev/hugepages. For complete SELinux messages. run sealert -l c472af1c-a65a-4e25-9b45-c56787b7a55b
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /dev/hugepages.

                                      *****  Plugin catchall (100. confidence) suggests   **************************

                                      If you believe that df should be allowed getattr access on the hugepages directory by default.
                                      Then you should report this as a bug.
                                      You can generate a local policy module to allow this access.
                                      Do
                                      allow this access for now by executing:
                                      # grep df /var/log/audit/audit.log | audit2allow -M mypol
                                      # semodule -i mypol.pp
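
Since the status output is dominated by repeated SELinux messages, a small script can reduce it to the distinct denials. The following is just a sketch under assumptions: SAMPLE_STATUS, DENIAL_RE and summarize_denials are illustrative names, the sample holds a few lines copied from the output above, and the pattern only matches the "SELinux is preventing ... from ... access on the ..." wording seen here.

```python
import re
from collections import Counter

# A few lines copied from the "systemctl -l status dbus.service" output above.
SAMPLE_STATUS = """\
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/cgroup/systemd.
Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/pstore. For complete SELinux messages. run sealert -l fe7dc8de-c163-47cb-93ef-393f4e518282
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/fs/pstore.
Aug 24 09:09:59 ha2-btfn setroubleshoot[734]: failed to retrieve rpm info for /sys/kernel/config
Aug 24 09:09:59 ha2-btfn python[734]: SELinux is preventing /usr/bin/df from getattr access on the directory /sys/kernel/config.
"""

# Captures the denied program, the permission and the target path.
# The path ends at the first "." that is followed by whitespace or end of line.
DENIAL_RE = re.compile(
    r"SELinux is preventing (?P<prog>\S+) from (?P<perm>\w+) access "
    r"on the \w+ (?P<target>/\S*?)\.(?:\s|$)"
)


def summarize_denials(text):
    """Return a Counter keyed by (program, permission, target path)."""
    denials = Counter()
    for line in text.splitlines():
        match = DENIAL_RE.search(line)
        if match:
            key = (match.group("prog"), match.group("perm"), match.group("target"))
            denials[key] += 1
    return denials


denials = summarize_denials(SAMPLE_STATUS)
for (prog, perm, target), n in sorted(denials.items()):
    print(f"{prog} {perm} {target}: {n}")
```

Lines with ellipsized paths (as in the output without -l) would yield truncated targets, so this is meant for the full -l output only.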

Comment 3 Benjamin Lefoul 2016-09-05 09:17:10 UTC
This happened with dbus-1.6.12-13.el7.x86_64 and could be related to:

https://bugzilla.redhat.com//show_bug.cgi?id=1325870

Comment 4 Matthias Clasen 2019-09-11 19:24:42 UTC
Is this still an issue?