Bug 1565735
Summary: | dbus-broker Fails to Start After Upgrade to F28 (beta) | ||
---|---|---|---|
Product: | [Fedora] Fedora | Reporter: | Garry T. Williams <gtwilliams> |
Component: | dbus-broker | Assignee: | Tom Gundersen <tgunders> |
Status: | CLOSED ERRATA | QA Contact: | Fedora Extras Quality Assurance <extras-qa> |
Severity: | unspecified | Docs Contact: | |
Priority: | unspecified | ||
Version: | 28 | CC: | dh.herrmann, j.gjorgji, tgunders, yaneti |
Target Milestone: | --- | ||
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | dbus-broker-13-1.fc26 dbus-broker-13-1.fc28 dbus-broker-13-1.fc27 | Doc Type: | If docs needed, set a value |
Last Closed: | 2018-05-03 15:22:09 UTC | Type: | Bug |
Description
Garry T. Williams
2018-04-10 15:58:46 UTC
I can also see this happening although it somehow gets started later on.

service logs on boot:

Apr 14 09:51:55 farscape systemd[1]: Starting D-Bus System Message Bus...
Apr 14 09:53:25 farscape systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 14 09:53:26 farscape systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 14 09:53:26 farscape systemd[1]: Failed to start D-Bus System Message Bus.
Apr 14 09:53:26 farscape systemd[1]: Starting D-Bus System Message Bus...
Apr 14 09:54:56 farscape systemd[1]: Started D-Bus System Message Bus.

grepping for dbus broker:

Apr 14 09:53:25 farscape systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 14 09:53:26 farscape systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 14 09:53:26 farscape audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 14 09:54:56 farscape audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 14 09:54:57 farscape dbus-broker-launch[1149]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored
Apr 14 09:54:57 farscape dbus-broker-launch[1149]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored
Apr 14 09:55:26 farscape dbus-broker-launch[1456]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored
Apr 14 09:55:26 farscape dbus-broker-launch[1456]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored

dbus-broker-11-1.fc28.1 has been submitted as an update to Fedora 28.
https://bodhi.fedoraproject.org/updates/FEDORA-2018-618e944f7f

Thanks for the report! v11 was released when f28 was frozen, so updating from f27 to f28 would downgrade the broker. I now pushed v11 to f28 too, which I hope would fix the problem. Please let us know if that is not the case.

No joy:

Apr 17 08:12:02 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 17 08:12:02 vfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 17 08:12:02 vfr systemd[1]: Failed to start D-Bus System Message Bus.

$ rpm -q dbus-broker
dbus-broker-11-1.fc28.1.x86_64
$

And lots of service start problems:

Apr 17 08:12:02 vfr rngd[859]: Failed to init entropy source 0: Hardware RNG Device
Apr 17 08:12:02 vfr rngd[859]: Failed to init entropy source 1: TPM RNG Device
Apr 17 08:13:32 vfr systemd-logind[866]: Failed to start user slice user-1000.slice, ignoring: Connection timed out (org.freedesktop.DBus.Error.Timeout)
Apr 17 08:13:32 vfr systemd[1]: avahi-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd-logind[866]: Failed to start user service, ignoring: Connection timed out
Apr 17 08:13:32 vfr systemd[1]: abrtd.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: systemd-logind.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: accounts-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: rtkit-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: firewalld.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: accounts-daemon.service: Failed with result 'timeout'.
Apr 17 08:13:32 vfr systemd[1]: Failed to start Accounts Service.
Apr 17 08:13:32 vfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=accounts-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 17 08:13:32 vfr systemd[1]: firewalld.service: Failed with result 'timeout'.
Apr 17 08:13:32 vfr systemd[1]: Failed to start firewalld - dynamic firewall daemon.

I disabled dbus-broker again.

dbus-broker-11-1.fc28.1 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-618e944f7f

dbus-broker-11-1.fc28.1 does not fix the problem for me. (My system had dbus-broker-11-1.fc28 installed, in case it matters.)

Can I provide any other problem source identification data?

Hey!

I am still trying to trace this down, and I have found some suspicious changes in F28. My current guess is that you trigger a deadlock during start of the session or system bus. Both dbus-daemon and dbus-broker need to resolve user-names in the XML configurations before they start up, for historic reasons. This requires calling out into nss-modules via getpwnam(3) in glibc. Now, the nss-modules might do any kind of remote calls to serve the request. If this happens to be D-Bus, they will try to connect to D-Bus and, voila, deadlock. This issue is well-known, and there have been precautions to avoid this.

Long story short, I suspect this is due to nss changes in F28. So I'd be very glad if you could answer me the following questions:

A) Do you use any network-information-services? Something like NIS, or LDAP, or SSSD, or NFS?

B) Can you provide me the content of /etc/nsswitch.conf, mostly interested in the 'passwd:' line. This config specifies the nss modules loaded by glibc.
C) Did you somehow adapt your system to the F28 glibc changes that removed NIS support and instead now requires libnss_nis? Is a package called libnss_nis installed on your system?

D) Can you list all installed nss modules? Something like `find /usr -iname "*nss_*.so*"`

E) Do you use custom SSSD modules? Have you configured SSSD in any way?

We are working on a workaround in dbus-broker, which simply initializes the nss-caches via /etc/passwd, and loudly warns if it has to resort to nss to resolve usernames of the XML configuration. This should at least allow us to verify that the issue is deadlocks through nss. I will try to push this on Monday.

So I am unable to reproduce this, sadly. So if you have any special setup that might be related to sharing usernames across networks (as I said, like NIS, or LDAP, or SSSD), let me know and I will try to reproduce locally.

Thanks!

Hi!

In addition to David's questions, could you also try removing

DefaultDependencies=false
Before=basic.target shutdown.target
Conflicts=shutdown.target

from your dbus-broker.service files, and see if that changes anything?

Cheers,

Tom

(In reply to David Herrmann from comment #7)
> A) Do you use any network-information-services? Something like NIS, or LDAP,
> or SSSD, or NFS?

No.

> B) Can you provide me the content of /etc/nsswitch.conf, mostly interested
> in the 'passwd:' line. This config specifies the nss modules loaded by glibc.

The only uncommented lines are:

bootparams: nisplus [NOTFOUND=return] files
ethers: files
netmasks: files
networks: files
protocols: files
rpc: files
services: files sss
netgroup: files sss
publickey: nisplus
automount: files sss
aliases: files nisplus

> C) Did you somehow adapt your system to the F28 glibc changes that removed
> NIS support and instead now requires libnss_nis?

Huh?

> Is a package called
> libnss_nis installed on your system?

No.
$ rpm -qa|grep nss
nss-softokn-freebl-3.36.0-1.0.fc28.x86_64
openssh-7.7p1-3.fc28.x86_64
jansson-2.11-1.fc28.x86_64
libsss_nss_idmap-1.16.1-2.fc28.x86_64
openssh-server-7.7p1-3.fc28.x86_64
nss-mdns-0.14.1-1.fc28.x86_64
nss-util-3.36.0-1.0.fc28.x86_64
xmlsec1-openssl-1.2.25-3.fc28.x86_64
rubygem-openssl-2.1.0-92.fc28.x86_64
nss-softokn-3.36.0-1.0.fc28.x86_64
nss-sysinit-3.36.0-1.0.fc28.x86_64
openssl-libs-1.1.0h-3.fc28.x86_64
compat-openssl10-1.0.2o-1.fc28.x86_64
nss-pem-1.0.3-9.fc28.x86_64
nss-3.36.0-1.0.fc28.x86_64
openssh-clients-7.7p1-3.fc28.x86_64
kf5-kdnssd-5.44.0-1.fc28.x86_64
openssl-1.1.0h-3.fc28.x86_64
kdnssd-17.12.3-1.fc28.x86_64
nss-tools-3.36.0-1.0.fc28.x86_64
$

> D) Can you list all installed nss modules? Something like `find /usr -iname
> "*nss_*.so*"`

$ sudo find /usr -iname "*nss_*.so*"
[sudo] password for garry:
/usr/share/man/man8/libnss_myhostname.so.2.8.gz
/usr/share/man/man8/libnss_resolve.so.2.8.gz
/usr/share/man/man8/libnss_systemd.so.2.8.gz
/usr/lib64/libnss_dns-2.27.so
/usr/lib64/libnss_mdns6.so.2
/usr/lib64/libsss_nss_idmap.so.0
/usr/lib64/libnss_myhostname.so.2
/usr/lib64/libnss_mdns4_minimal.so.2
/usr/lib64/libnss_systemd.so.2
/usr/lib64/libnss_mdns6_minimal.so.2
/usr/lib64/libnss_files.so.2
/usr/lib64/libnss_mdns.so.2
/usr/lib64/libsss_nss_idmap.so.0.4.0
/usr/lib64/libnss_compat-2.27.so
/usr/lib64/libnss_compat.so.2
/usr/lib64/libnss_resolve.so.2
/usr/lib64/libnss_dns.so.2
/usr/lib64/libnss_mdns4.so.2
/usr/lib64/libnss_sss.so.2
/usr/lib64/libnss_files-2.27.so
/usr/lib64/libnss_mdns_minimal.so.2
$

> E) Do you use custom SSSD modules?

No.

> Have you configured SSSD in any way?

No.

(In reply to Tom Gundersen from comment #8)
> In addition to David's questions, could you also try removing
>
> DefaultDependencies=false
> Before=basic.target shutdown.target
> Conflicts=shutdown.target
>
> from your dbus-broker.service files, and see if that changes anything?
I am away from that machine right now, but I tried making the change you suggested and enabling dbus-broker.service again and rebooting.

After about two minutes, I cannot ssh back again. This looks like the same problem.

After waiting another couple of minutes, I am able to log back in. I now quote the journal with the same issue:

Apr 21 10:18:13 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: abrtd.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: rtkit-daemon.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: accounts-daemon.service: Failed with result 'timeout'.
Apr 21 10:18:13 vfr systemd[1]: Failed to start Accounts Service.
...
Apr 21 10:18:14 vfr avahi-daemon[840]: dbus_bus_request_name(): Connection is closed
Apr 21 10:18:14 vfr avahi-daemon[840]: WARNING: Failed to contact D-Bus daemon.
Apr 21 10:18:14 vfr avahi-daemon[840]: avahi-daemon 0.7 exiting.
Apr 21 10:18:14 vfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 21 10:18:14 vfr systemd[1]: Failed to start D-Bus System Message Bus.
Apr 21 10:18:14 vfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

After disabling the dbus-broker, I rebooted and after one minute I am logged back in again. (No delay this time.)

Please let me know if I can provide any other diagnostic data or information.

(In reply to Garry T. Williams from comment #9)
> (In reply to David Herrmann from comment #7)
> > B) Can you provide me the content of /etc/nsswitch.conf, mostly interested
> > in the 'passwd:' line. This config specifies the nss modules loaded by glibc.
>
> The only uncommented lines are:
>
> bootparams: nisplus [NOTFOUND=return] files
> ethers: files
> netmasks: files
> networks: files
> protocols: files
> rpc: files
> services: files sss
> netgroup: files sss
> publickey: nisplus
> automount: files sss
> aliases: files nisplus

So your nsswitch.conf lacks a 'passwd', 'group', 'shadow', and 'hosts' entry. Depending on which glibc version you are running, a different default will take effect.

Tom prepared the dbus-broker changes to be more verbose about nss queries, which will help us debug this better. You could try adding this to nsswitch.conf:

passwd: files
group: files

This makes sure user-names and groups are resolved by looking at /etc/{passwd,group}.

We will try to push the dbus-broker updates to F28 as soon as possible. Thanks a lot for the feedback!

Oh, I am so sorry. I misled you on this. My nsswitch file does have passwd, group, etc.

passwd: files sss systemd
shadow: files sss
group: files sss systemd
hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname

I hope I didn't cause any trouble.

I also noticed a .rpmnew version that I suppose I should be installing. (I'm pretty bad at keeping up to date with these changes.) The new file has these differences:

passwd: sss files
shadow: files sss
group: sss files
hosts: files dns
netgroup: nisplus sss
automount: files nisplus

Well, that did it! I moved the .rpmnew file to nsswitch.conf, enabled dbus-broker, and rebooted. No problems now.

Please consider this report resolved and thank you for your support.

Thanks for the feedback, seems the mystery is solved then. This patch should fix it (even if you use your old nsswitch.conf): https://github.com/bus1/dbus-broker/pull/118/commits/c9232baf2e495365d57acec11e79a2eec10985d6.

dbus-broker-13-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-51a0beed51

dbus-broker-13-1.fc28 has been submitted as an update to Fedora 28.
https://bodhi.fedoraproject.org/updates/FEDORA-2018-b30419e8b1

dbus-broker-13-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-730a5a85c4

dbus-broker-13-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-730a5a85c4

dbus-broker-13-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-51a0beed51

dbus-broker-13-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b30419e8b1

This is probably moot, since not many people are actually running dbus-broker, but...

I enabled dbus-broker on three different machines before F28. Each one ran into the can't-start problem immediately after upgrading to F28. The fix on each one was to move the .rpmnew version of nsswitch.conf to the actual file and reboot.

I know that I didn't touch the original nsswitch.conf file on at least one machine, so rpm seems to do the wrong thing by *not* updating the nsswitch.conf file. Or... Maybe dbus-broker does the wrong thing by relying on a certain nsswitch.conf file.

The last machine had the latest dbus-broker, but the "down-level" nsswitch.conf, and it fails at boot-time.
If dbus-broker is to be enabled at some future date, I think this needs to be fixed first to avoid the failures.

Yes, that is completely true. Good thing is, this NSS-lockup is well-known and also affects dbus-daemon. However, a lot of applications already work around this with unfortunate hacks (like setting special environment variables for dbus-daemon, then querying them from NSS modules). So to me the most important part is to know what exactly the issue is. When dbus-broker becomes the default, we must definitely protect against such nsswitch-misconfigurations.

I thought we tracked down the issue and fixed it, even with misconfigured nsswitch.conf. If things are still not working with newest dbus-broker, your log should include a line like `Looking up UID for user '%s' over NSS...` directly before a possible deadlock, and `NSS returned UID %u for user '%s'` when resolved. Hence, we can at least detect such lockups now, and know where it stalled.

So if you still see lockups with dbus-broker-13, can you show us your nsswitch.conf and maybe /etc/passwd (uid+usernames)?

Well, on the last machine I upgraded, I see this after reboot after upgrade:

May 02 18:54:30 tfr dbus-broker-launch[763]: Looking up UID for user 'gdm' over NSS...
May 02 18:54:31 tfr systemd[1]: Received SIGRTMIN+20 from PID 347 (plymouthd).
May 02 18:54:33 tfr systemd[1]: Received SIGRTMIN+20 from PID 347 (plymouthd).
May 02 18:54:33 tfr kernel: random: crng init done
May 02 18:54:35 tfr audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 02 18:56:00 tfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
May 02 18:56:01 tfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
May 02 18:56:01 tfr systemd[1]: Failed to start D-Bus System Message Bus.
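
The two log messages quoted above make a stalled lookup easy to spot mechanically: a "Looking up UID" line with no matching "NSS returned UID" line. The following is only a minimal sketch in Python, assuming the message wording is exactly as quoted in this thread (other dbus-broker versions may word it differently); the function name `stalled_lookups` is made up for illustration.

```python
import re

# Message formats as quoted in this thread; the exact wording in other
# dbus-broker versions is an assumption.
LOOKUP_RE = re.compile(r"Looking up UID for user '([^']*)' over NSS\.\.\.")
RESOLVED_RE = re.compile(r"NSS returned UID (\d+) for user '([^']*)'")


def stalled_lookups(journal_lines):
    """Return user names whose NSS lookup started but never reported a UID."""
    pending = []
    for line in journal_lines:
        started = LOOKUP_RE.search(line)
        if started:
            pending.append(started.group(1))
            continue
        resolved = RESOLVED_RE.search(line)
        if resolved and resolved.group(2) in pending:
            pending.remove(resolved.group(2))
    return pending


if __name__ == "__main__":
    sample = [
        "May 02 18:54:30 tfr dbus-broker-launch[763]: Looking up UID for user 'gdm' over NSS...",
        "May 02 18:56:00 tfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.",
    ]
    print(stalled_lookups(sample))  # prints ['gdm']
```

The lines could come from `journalctl -b` output read line by line; any name left in the result is a candidate for the lookup that hung.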
Just to be sure, I double checked the dnf logs to confirm I was running 13-1.fc28:

2018-05-02T22:51:45Z INFO Upgraded: dbus-broker-13-1.fc28.x86_64

My logs are US EDT time, so 18:54 EDT (-0400) is 22:54Z. Yes, I upgraded to 13-1 immediately before the reboot, which failed.

Now the bad news: I moved the rpmnew file to my nsswitch.conf without saving a copy. :-(

But I have some good news. I reset my nsswitch.conf file to these:

passwd: files sss systemd
shadow: files sss
group: files sss systemd
hosts: files mdns4_minimal [NOTFOUND=return] dns myhostname

which were the values I had in the first machine that had the problem (this bug report). But I cannot be sure the latest machine had the same values when it failed immediately after upgrading to F28.

I see dbus-broker fail to start now with this new nsswitch.conf:

May 03 09:58:51 tfr dbus-broker-launch[770]: Looking up UID for user 'gdm' over NSS...
May 03 09:58:52 tfr systemd[1]: Received SIGRTMIN+20 from PID 358 (plymouthd).
May 03 09:58:54 tfr kernel: random: crng init done
May 03 09:58:54 tfr systemd[1]: Received SIGRTMIN+20 from PID 358 (plymouthd).
May 03 09:58:56 tfr audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
May 03 10:00:21 tfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
May 03 10:00:21 tfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
May 03 10:00:21 tfr systemd[1]: Failed to start D-Bus System Message Bus.
May 03 10:00:21 tfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

/etc/passwd:

garry@tfr$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
oprofile:x:16:16:Special user account to be used by OProfile:/var/lib/oprofile:/sbin/nologin
avahi-autoipd:x:170:170:Avahi IPv4LL Stack:/var/lib/avahi-autoipd:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:999:User for polkitd:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
colord:x:998:997:User for colord:/var/lib/colord:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd daemon:/dev/null:/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
openvpn:x:996:994:OpenVPN:/etc/openvpn:/sbin/nologin
unbound:x:995:993:Unbound DNS resolver:/etc/unbound:/sbin/nologin
pulse:x:994:992:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
nm-openconnect:x:993:990:NetworkManager user for OpenConnect:/:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
garry:x:1000:1000:Garry T. Williams:/home/garry:/bin/zsh
systemd-timesync:x:992:985:systemd Time Synchronization:/:/sbin/nologin
systemd-network:x:991:984:systemd Network Management:/:/sbin/nologin
systemd-resolve:x:990:983:systemd Resolver:/:/sbin/nologin
systemd-bus-proxy:x:989:982:systemd Bus Proxy:/:/sbin/nologin
geoclue:x:988:981:User for geoclue:/var/lib/geoclue:/sbin/nologin
setroubleshoot:x:987:978::/var/lib/setroubleshoot:/sbin/nologin
nm-openvpn:x:986:977:Default user for running openvpn spawned by NetworkManager:/:/sbin/nologin
systemd-coredump:x:976:976:systemd Core Dumper:/:/sbin/nologin
sddm:x:975:975:Simple Desktop Display Manager:/var/lib/sddm:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
sphinx:x:974:974:Sphinx Search:/usr/lib/tmpfiles.d/lib/sphinx:/bin/bash
dnsmasq:x:972:972:Dnsmasq DHCP and DNS server:/var/lib/dnsmasq:/sbin/nologin
garry@tfr$

Perhaps a clue is that I do not have a gdm user defined on this system. I think that is because I do not run Gnome -- KDE instead. The same is true for all the systems I see this failure on. (I just logged into all of them to check.)

I hope this helps.

dbus-broker-13-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

dbus-broker-13-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

(In reply to Garry T. Williams from comment #22)
> Perhaps a clue is that I do not have a gdm user defined on this system. I
> think that is because I do not run Gnome -- KDE instead. The same is true
> for all the systems I see this failure on. (I just logged into all of them
> to check.)
>
> I hope this helps.

Yes it does. A lot.

Turns out, dbus-daemon has the exact same problem as dbus-broker.
The difference is, with dbus-daemon all usernames in your XML configurations are valid; with dbus-broker the 'gdm' username turns out to be invalid, since you don't have gdm installed, but the 'gdm' user is special-cased for a specific policy (let's ignore details for now).

Generally, this would not be an issue. Unknown usernames are simply ignored. However, with F28 a new NSS module was added with systemd (libnss_systemd.so). This module uses D-Bus to resolve dynamic usernames introduced with a recent systemd change. Since this deadlocks if done from within dbus-daemon or dbus-broker, systemd sets a special environment variable (SYSTEMD_NSS_BYPASS_BUS), which then makes nss-systemd skip the username resolution via the bus and use a filesystem shortcut instead.

However, it turns out that nss-systemd cannot read this environment variable if SELinux is enabled. nss-systemd uses secure_getenv(3) under the hood, and this returns NULL if AT_SECURE is set for a process. If SELinux is enabled, AT_SECURE is set whenever execve(2) performs an SELinux context transition.

Long story short: If you add an invalid username to the system-bus XML configuration, dbus-daemon suffers from the same lockup as you describe here. Hence, the only reason all of this works is that nobody so far triggered this with SELinux enabled.

Simple workaround: Disable SELinux (or put it into permissive mode).
Other workaround: Make sure the 'gdm' user exists (as simple as installing gdm).
More workarounds: Use dbus-daemon for now.

I notified the systemd developers of this problem, and they will no doubt come up with a fix. Until then, I think we will remove the 'gdm' special case from dbus-broker, to make sure we behave the same as dbus-daemon.

Again, thanks a lot for the report and insisting on a proper fix! It took a while to trace this down, but I am quite certain we found the issue now.
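
The interaction between the bypass variable and AT_SECURE described above can be written down as a small model. This is a sketch in Python and not the actual nss-systemd code: `secure_getenv_model` only imitates the documented behaviour of secure_getenv(3), and the assumption that any non-empty value of SYSTEMD_NSS_BYPASS_BUS requests the bypass is mine, not taken from this thread.

```python
def secure_getenv_model(name, environ, at_secure):
    """Model of secure_getenv(3): like getenv(3), but yields None when the
    process runs in secure-execution mode (AT_SECURE set)."""
    if at_secure:
        return None
    return environ.get(name)


def nss_systemd_would_use_bus(environ, at_secure):
    """True if the lookup would go over D-Bus, i.e. the bypass was not seen.

    Assumption: any non-empty value of SYSTEMD_NSS_BYPASS_BUS requests the
    bypass; the real module's parsing may differ.
    """
    bypass = secure_getenv_model("SYSTEMD_NSS_BYPASS_BUS", environ, at_secure)
    return not bypass


if __name__ == "__main__":
    env = {"SYSTEMD_NSS_BYPASS_BUS": "1"}
    # No SELinux context transition on execve(2): the variable is visible.
    print(nss_systemd_would_use_bus(env, at_secure=False))  # False
    # A context transition sets AT_SECURE: the variable is hidden, so the
    # lookup goes to the bus that is still starting up, hence the lockup.
    print(nss_systemd_would_use_bus(env, at_secure=True))  # True
```

The second case is the one reported here: the variable is set by systemd, but the NSS module never sees it.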
If anyone wants to reproduce this, it is as simple as adding this snippet somewhere in /usr/share/dbus-1/system.conf:

<policy user="NonExistantUser">
  <allow send_destination="foo.bar" />
</policy>

Thanks!
David

dbus-broker-13-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.
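
To check a machine for the condition that triggers the lockup discussed in this report, one can list the user names referenced by `<policy user="...">` elements in a bus configuration and compare them with the local passwd file. The following Python sketch makes several assumptions: the function names are invented for illustration, the sample configuration and passwd text are hypothetical, numeric `user` values are treated as UIDs and skipped, and only local passwd entries are considered, so users that come from another NSS source (such as sss) would be reported as unknown.

```python
import xml.etree.ElementTree as ET


def passwd_names(passwd_text):
    """Collect user names (first field) from passwd(5)-formatted text."""
    names = set()
    for line in passwd_text.splitlines():
        line = line.strip()
        if line and not line.startswith("#"):
            names.add(line.split(":", 1)[0])
    return names


def unknown_policy_users(busconfig_xml, passwd_text):
    """Return names used in <policy user="..."> that passwd does not list."""
    known = passwd_names(passwd_text)
    root = ET.fromstring(busconfig_xml.strip())
    missing = []
    for policy in root.iter("policy"):
        user = policy.get("user")
        # Numeric values are treated as UIDs and skipped (assumption).
        if user and not user.isdigit() and user not in known:
            missing.append(user)
    return missing


if __name__ == "__main__":
    # Hypothetical sample data, modelled on the snippet in this report.
    config = """
    <busconfig>
      <policy user="root">
        <allow own="foo.bar" />
      </policy>
      <policy user="NonExistantUser">
        <allow send_destination="foo.bar" />
      </policy>
    </busconfig>
    """
    passwd = (
        "root:x:0:0:root:/root:/bin/bash\n"
        "dbus:x:81:81:System message bus:/:/sbin/nologin\n"
    )
    print(unknown_policy_users(config, passwd))  # prints ['NonExistantUser']
```

The passwd text is parsed directly instead of calling `pwd.getpwnam`, because that call goes through NSS and could run into the very lookup path this report is about.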