Bug 1565735

Summary: dbus-broker Fails to Start After Upgrade to F28 (beta)
Product: [Fedora] Fedora Reporter: Garry T. Williams <gtwilliams>
Component: dbus-brokerAssignee: Tom Gundersen <tgunders>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified Docs Contact:
Priority: unspecified    
Version: 28CC: dh.herrmann, j.gjorgji, tgunders, yaneti
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: dbus-broker-13-1.fc26 dbus-broker-13-1.fc28 dbus-broker-13-1.fc27 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-05-03 15:22:09 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Garry T. Williams 2018-04-10 15:58:46 UTC
Description of problem: I enabled dbus-broker to test in F27.  After upgrading to F28, the service fails to start (timeout).


Version-Release number of selected component (if applicable): dbus-broker-11-1.fc28.x86_64


How reproducible: Always


Steps to Reproduce:
1. Enable dbus-broker in F27
2. Upgrade to F28
3.

Actual results: dbus-broker fails to start


Expected results: dbus-broker starts


Additional info:

Apr 10 10:56:49 vfr systemd[1]: Reached target Paths.
Apr 10 10:56:49 vfr systemd[1]: Started Daily Cleanup of Temporary Directories.
Apr 10 10:56:49 vfr systemd[1]: Listening on D-Bus System Message Bus Socket.
Apr 10 10:56:49 vfr systemd[1]: Starting D-Bus System Message Bus...
Apr 10 10:56:49 vfr systemd[1]: Listening on Avahi mDNS/DNS-SD Stack Activation Socket.
Apr 10 10:56:49 vfr systemd[1]: Started Discard unused blocks once a week.
Apr 10 10:56:49 vfr systemd[1]: Listening on CUPS Scheduler.
Apr 10 10:56:49 vfr systemd[1]: Reached target Sockets.
Apr 10 10:56:49 vfr systemd[1]: Started daily update of the root trust anchor for DNSSEC.
Apr 10 10:56:49 vfr systemd[1]: Reached target Timers.
Apr 10 10:56:50 vfr systemd[1]: Received SIGRTMIN+20 from PID 400 (plymouthd).
Apr 10 10:58:19 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 10 10:58:19 vfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 10 10:58:19 vfr systemd[1]: Failed to start D-Bus System Message Bus.
Apr 10 10:58:19 vfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 10 10:58:19 vfr systemd[1]: Reached target Basic System.

Comment 1 j.gjorgji 2018-04-14 08:04:53 UTC
I can also see this happening although it somehow gets started later on.

service logs on boot:
Apr 14 09:51:55 farscape systemd[1]: Starting D-Bus System Message Bus...
Apr 14 09:53:25 farscape systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 14 09:53:26 farscape systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 14 09:53:26 farscape systemd[1]: Failed to start D-Bus System Message Bus.
Apr 14 09:53:26 farscape systemd[1]: Starting D-Bus System Message Bus...
Apr 14 09:54:56 farscape systemd[1]: Started D-Bus System Message Bus.

grepping for dbus broker:
Apr 14 09:53:25 farscape systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 14 09:53:26 farscape systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 14 09:53:26 farscape audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 14 09:54:56 farscape audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
Apr 14 09:54:57 farscape dbus-broker-launch[1149]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored
Apr 14 09:54:57 farscape dbus-broker-launch[1149]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored
Apr 14 09:55:26 farscape dbus-broker-launch[1456]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +31: Eavesdropping is deprecated and ignored
Apr 14 09:55:26 farscape dbus-broker-launch[1456]: Policy to allow eavesdropping in /usr/share/dbus-1/session.conf +33: Eavesdropping is deprecated and ignored

Comment 2 Fedora Update System 2018-04-17 09:47:10 UTC
dbus-broker-11-1.fc28.1 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-618e944f7f

Comment 3 Tom Gundersen 2018-04-17 09:49:28 UTC
Thanks for the report! v11 was released when f28 was frozen, so updating from f27 to f28 would downgrade the broker. I now pushed v11 to f28 too, which I hope would fix the problem. Please let us know if that is not the case.

Comment 4 Garry T. Williams 2018-04-17 12:25:00 UTC
No joy:

Apr 17 08:12:02 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 17 08:12:02 vfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 17 08:12:02 vfr systemd[1]: Failed to start D-Bus System Message Bus.

$ rpm -q dbus-broker
dbus-broker-11-1.fc28.1.x86_64
$

And lots of service start problems:

Apr 17 08:12:02 vfr rngd[859]: Failed to init entropy source 0: Hardware RNG Device
Apr 17 08:12:02 vfr rngd[859]: Failed to init entropy source 1: TPM RNG Device
Apr 17 08:13:32 vfr systemd-logind[866]: Failed to start user slice user-1000.slice, ignoring: Connection timed out (org.freedesktop.DBus.Error.Timeout)
Apr 17 08:13:32 vfr systemd[1]: avahi-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd-logind[866]: Failed to start user service, ignoring: Connection timed out
Apr 17 08:13:32 vfr systemd[1]: abrtd.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: systemd-logind.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: accounts-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: rtkit-daemon.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: firewalld.service: Start operation timed out. Terminating.
Apr 17 08:13:32 vfr systemd[1]: accounts-daemon.service: Failed with result 'timeout'.
Apr 17 08:13:32 vfr systemd[1]: Failed to start Accounts Service.
Apr 17 08:13:32 vfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=accounts-daemon comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'
Apr 17 08:13:32 vfr systemd[1]: firewalld.service: Failed with result 'timeout'.
Apr 17 08:13:32 vfr systemd[1]: Failed to start firewalld - dynamic firewall daemon.

I disabled dbus-broker again.

Comment 5 Fedora Update System 2018-04-17 18:09:11 UTC
dbus-broker-11-1.fc28.1 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-618e944f7f

Comment 6 Garry T. Williams 2018-04-18 15:25:36 UTC
dbus-broker-11-1.fc28.1 does not fix the problem for me.

(My system had dbus-broker-11-1.fc28 installed, in case it matters.)

Can I provide any other problem source identification data?

Comment 7 David Herrmann 2018-04-20 15:24:08 UTC
Hey!

I am still trying to trace this down, and I have found some suspicious changes in F28. My current guess is that you trigger a deadlock during start of the session or system bus. Both dbus-daemon and dbus-broker need to resolve user-names in the XML configurations before they start up, for historic reasons. This requires calling out into nss-modules via getpwnam(3) in glibc. Now, the nss-modules might do any kind of remote calls to serve the request. If this happens to be D-Bus, they will try to connect to D-Bus and, voila, deadlock.

This issue is well-known, and there have been precautions to avoid this. Long story short, I suspect this is due to nss changes in F28. So I'd be very glad if you could answer me the following questions:

A) Do you use any network-information-services? Something like NIS, or LDAP, or SSSD, or NFS?

B) Can you provide me the content of /etc/nsswitch.conf, mostly interested in the 'passwd:' line. This config specifies the nss modules loaded by glibc.

C) Did you somehow adapt your system to the F28 glibc changes that removed NIS support and instead now requires libnss_nis? Is a package called libnss_nis installed on your system?

D) Can you list all installed nss modules? Something like `find /usr -iname "*nss_*.so*"`

E) Do you use custom SSSD modules? Have you configured SSSD in any way?

We are working on a workaround in dbus-broker, which simply initializes the nss-caches via /etc/passwd, and loudly warns if it has to resort to nss to resolve usernames of the XML configuration. This should at least allow us to verify that the issue is deadlocks through nss. I will try to push this on Monday.

So I am unable to reproduce this, sadly. So if you have any special setup that might be related to sharing usernames across networks (as I said, like NIS, or LDAP, or SSSD), let me know and I will try to reproduce locally.

Thanks!

Comment 8 Tom Gundersen 2018-04-20 15:59:18 UTC
Hi!

In addition to David's questions, could you also try removing

DefaultDependencies=false
Before=basic.target shutdown.target
Conflicts=shutdown.target

from your dbus-broker.service files, and see if that changes anything?

Cheers,

Tom

Comment 9 Garry T. Williams 2018-04-21 14:12:39 UTC
(In reply to David Herrmann from comment #7)
> A) Do you use any network-information-services? Something like NIS, or LDAP,
> or SSSD, or NFS?

No.

> B) Can you provide me the content of /etc/nsswitch.conf, mostly interested
> in the 'passwd:' line. This config specifies the nss modules loaded by glibc.

The only uncommented lines are:

bootparams: nisplus [NOTFOUND=return] files
ethers:     files
netmasks:   files
networks:   files
protocols:  files
rpc:        files
services:   files sss
netgroup:   files sss
publickey:  nisplus
automount:  files sss
aliases:    files nisplus

> C) Did you somehow adapt your system to the F28 glibc changes that removed
> NIS support and instead now requires libnss_nis?

Huh?

> Is a package called
> libnss_nis installed on your system?

No.

$ rpm -qa|grep nss   
nss-softokn-freebl-3.36.0-1.0.fc28.x86_64
openssh-7.7p1-3.fc28.x86_64
jansson-2.11-1.fc28.x86_64
libsss_nss_idmap-1.16.1-2.fc28.x86_64
openssh-server-7.7p1-3.fc28.x86_64
nss-mdns-0.14.1-1.fc28.x86_64
nss-util-3.36.0-1.0.fc28.x86_64
xmlsec1-openssl-1.2.25-3.fc28.x86_64
rubygem-openssl-2.1.0-92.fc28.x86_64
nss-softokn-3.36.0-1.0.fc28.x86_64
nss-sysinit-3.36.0-1.0.fc28.x86_64
openssl-libs-1.1.0h-3.fc28.x86_64
compat-openssl10-1.0.2o-1.fc28.x86_64
nss-pem-1.0.3-9.fc28.x86_64
nss-3.36.0-1.0.fc28.x86_64
openssh-clients-7.7p1-3.fc28.x86_64
kf5-kdnssd-5.44.0-1.fc28.x86_64
openssl-1.1.0h-3.fc28.x86_64
kdnssd-17.12.3-1.fc28.x86_64
nss-tools-3.36.0-1.0.fc28.x86_64
$

> D) Can you list all installed nss modules? Something like `find /usr -iname
> "*nss_*.so*"`

$ sudo find /usr -iname "*nss_*.so*"
[sudo] password for garry: 
/usr/share/man/man8/libnss_myhostname.so.2.8.gz
/usr/share/man/man8/libnss_resolve.so.2.8.gz
/usr/share/man/man8/libnss_systemd.so.2.8.gz
/usr/lib64/libnss_dns-2.27.so
/usr/lib64/libnss_mdns6.so.2
/usr/lib64/libsss_nss_idmap.so.0
/usr/lib64/libnss_myhostname.so.2
/usr/lib64/libnss_mdns4_minimal.so.2
/usr/lib64/libnss_systemd.so.2
/usr/lib64/libnss_mdns6_minimal.so.2
/usr/lib64/libnss_files.so.2
/usr/lib64/libnss_mdns.so.2
/usr/lib64/libsss_nss_idmap.so.0.4.0
/usr/lib64/libnss_compat-2.27.so
/usr/lib64/libnss_compat.so.2
/usr/lib64/libnss_resolve.so.2
/usr/lib64/libnss_dns.so.2
/usr/lib64/libnss_mdns4.so.2
/usr/lib64/libnss_sss.so.2
/usr/lib64/libnss_files-2.27.so
/usr/lib64/libnss_mdns_minimal.so.2
$

> E) Do you use custom SSSD modules?

No.

> Have you configured SSSD in any way?

No.

Comment 10 Garry T. Williams 2018-04-21 14:26:23 UTC
(In reply to Tom Gundersen from comment #8)
> In addition to David's questions, could you also try removing
> 
> DefaultDependencies=false
> Before=basic.target shutdown.target
> Conflicts=shutdown.target
> 
> from your dbus-broker.service files, and see if that changes anything?

I am away from that machine right now, but I tried making the change you suggested and enabling dbus-broker.service again and rebooting.

After about two minutes, I cannot ssh back again.  This looks like the same problem.

After waiting another couple of minutes, I am able to log back in.

I now quote the journal with the same issue:

Apr 21 10:18:13 vfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: abrtd.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: rtkit-daemon.service: Start operation timed out. Terminating.
Apr 21 10:18:13 vfr systemd[1]: accounts-daemon.service: Failed with result 'timeout'.
Apr 21 10:18:13 vfr systemd[1]: Failed to start Accounts Service.
...
Apr 21 10:18:14 vfr avahi-daemon[840]: dbus_bus_request_name(): Connection is closed
Apr 21 10:18:14 vfr avahi-daemon[840]: WARNING: Failed to contact D-Bus daemon.
Apr 21 10:18:14 vfr avahi-daemon[840]: avahi-daemon 0.7 exiting.
Apr 21 10:18:14 vfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
Apr 21 10:18:14 vfr systemd[1]: Failed to start D-Bus System Message Bus.
Apr 21 10:18:14 vfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

After disabling the dbus-broker, I rebooted and after one minute I am logged back in again.  (No delay this time.)

Please let me know if I can provide any other diagnostic data or information.

Comment 11 David Herrmann 2018-04-23 09:57:24 UTC
(In reply to Garry T. Williams from comment #9)
> (In reply to David Herrmann from comment #7)
> > B) Can you provide me the content of /etc/nsswitch.conf, mostly interested
> > in the 'passwd:' line. This config specifies the nss modules loaded by glibc.
> 
> The only uncommented lines are:
> 
> bootparams: nisplus [NOTFOUND=return] files
> ethers:     files
> netmasks:   files
> networks:   files
> protocols:  files
> rpc:        files
> services:   files sss
> netgroup:   files sss
> publickey:  nisplus
> automount:  files sss
> aliases:    files nisplus

So your nsswitch.conf lacks a 'passwd', 'group', 'shadow', and 'hosts' entry. Depending on which glibc version you are running, a different default will take effect.

Tom prepared the dbus-broker changes to be more verbose about nss queries, which will help us debug this better. You could try adding this to nsswitch.conf:

passwd: files
group: files

This makes sure user-names and groups are resolved by looking at /etc/{passwd,groups}.

We will try to push the dbus-broker updates to F28 as soon as possible.

Thanks a lot for the feedback!

Comment 12 Garry T. Williams 2018-04-23 13:28:39 UTC
Oh, I am so sorry.  I misled you on this.  My nsswitch file does have passwd, group, etc.

    passwd:      files sss systemd
    shadow:     files sss
    group:       files sss systemd
    hosts:      files mdns4_minimal [NOTFOUND=return] dns myhostname

I hope I didn't cause any trouble.

I also noticed a .rpmnew version that I suppose I should be installing.  (I'm pretty bad at keeping up to date with these changes.)  The new file has these differences:

    passwd:     sss files
    shadow:     files sss
    group:      sss files
    hosts:      files dns

    netgroup:   nisplus sss
    automount:  files nisplus

Well, that did it!

I moved the .rpmnew file to nsswitch.conf, enabled dbus-broker, and rebooted.  No problems now.

Please consider this report resolved and thank you for your support.

Comment 13 Tom Gundersen 2018-04-23 14:30:40 UTC
Thanks for the feedback, seems the mystery is solved then. This patch should fix it (even if you use your old nsswitch.conf): https://github.com/bus1/dbus-broker/pull/118/commits/c9232baf2e495365d57acec11e79a2eec10985d6.

Comment 14 Fedora Update System 2018-04-23 22:56:08 UTC
dbus-broker-13-1.fc27 has been submitted as an update to Fedora 27. https://bodhi.fedoraproject.org/updates/FEDORA-2018-51a0beed51

Comment 15 Fedora Update System 2018-04-23 22:56:18 UTC
dbus-broker-13-1.fc28 has been submitted as an update to Fedora 28. https://bodhi.fedoraproject.org/updates/FEDORA-2018-b30419e8b1

Comment 16 Fedora Update System 2018-04-23 22:56:25 UTC
dbus-broker-13-1.fc26 has been submitted as an update to Fedora 26. https://bodhi.fedoraproject.org/updates/FEDORA-2018-730a5a85c4

Comment 17 Fedora Update System 2018-04-24 05:04:22 UTC
dbus-broker-13-1.fc26 has been pushed to the Fedora 26 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-730a5a85c4

Comment 18 Fedora Update System 2018-04-24 05:38:58 UTC
dbus-broker-13-1.fc27 has been pushed to the Fedora 27 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-51a0beed51

Comment 19 Fedora Update System 2018-04-24 19:14:08 UTC
dbus-broker-13-1.fc28 has been pushed to the Fedora 28 testing repository. If problems still persist, please make note of it in this bug report.
See https://fedoraproject.org/wiki/QA:Updates_Testing for
instructions on how to install test updates.
You can provide feedback for this update here: https://bodhi.fedoraproject.org/updates/FEDORA-2018-b30419e8b1

Comment 20 Garry T. Williams 2018-05-02 23:30:28 UTC
This probably moot, since not many people actually are running
dbus-broker, but...

I enabled dbus-broker on three different machines before F28.  Each
one ran into the can't start problem immediately after upgrading to
F28.  The fix on each one was to move the .rpmnew version of
nsswitch.conf to the actual file and reboot.

I know that I didn't touch the original nsswitch.conf file on at least
one machine, so rpm seems to do the wrong thing by *not* updating the
nsswitch.conf file.  Or...

Maybe dbus-broker does the wrong thing by relying on a certain
nsswitch.conf file.  The last machine had the latest dbus-broker, but
the "down-level" nsswitch.conf and fails at boot-time.

If dbus-broker is to be enabled at some future date, I think this
needs to be fixed first to avoid the failures.

Comment 21 David Herrmann 2018-05-03 08:01:59 UTC
Yes, that is completely true. Good thing is, this NSS-lockup is well-known and also affects dbus-daemon. However, a lot of applications already work around this with unfortunate hacks (like setting special environment variables for dbus-daemon, then queriyng them from NSS modules).

So to me the most important part is to know what exactly the issue is. When dbus-broker becomes the default, we must definitely protect against such nsswitch-misconfigurations.

I thought we tracked down the issue and fixed it, even with misconfigured nsswitch.conf. If things are still not working with newest dbus-broker, your log should include a line like `Looking up UID for user '%s' over NSS...` directly before a possible deadlock, and `NSS returned UID %u for user '%s'` when resolved. Hence, we can at least detect such lockups now, and know where it stalled.

So if you still see lockups with dbus-broker-13, can you show us your nsswitch.conf and maybe /etc/passwd (uid+usernames)?

Comment 22 Garry T. Williams 2018-05-03 14:19:18 UTC
Well, on the last machine I upgraded, I see this after reboot after upgrade:

    May 02 18:54:30 tfr dbus-broker-launch[763]: Looking up UID for user 'gdm' over NSS...
    May 02 18:54:31 tfr systemd[1]: Received SIGRTMIN+20 from PID 347 (plymouthd).
    May 02 18:54:33 tfr systemd[1]: Received SIGRTMIN+20 from PID 347 (plymouthd).
    May 02 18:54:33 tfr kernel: random: crng init done
    May 02 18:54:35 tfr audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
    May 02 18:56:00 tfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
    May 02 18:56:01 tfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
    May 02 18:56:01 tfr systemd[1]: Failed to start D-Bus System Message Bus.

Just to be sure, I double checked the dnf logs to confirm I was running 13-1.fc28:

    2018-05-02T22:51:45Z INFO Upgraded: dbus-broker-13-1.fc28.x86_64

My logs are US EDT time, so 18:54 EDT (-0400) is 22:54Z.

Yes, I upgraded to 13-1 immediately before the reboot, which failed.

Now the bad news:  I moved the rpmnew file to my nsswitch.conf without saving a copy.  :-(

But I have some good news.  I reset my nsswitch.conf file to these:

    passwd:      files sss systemd
    shadow:     files sss
    group:       files sss systemd
    hosts:      files mdns4_minimal [NOTFOUND=return] dns myhostname

which were the values I had in the first machine that had the problem (this bug report).  But I cannot be sure the latest machine had the same values when it failed immediately after upgrading to F28.

I see dbus-broker fail to start now with this new nsswitch.conf:

    May 03 09:58:51 tfr dbus-broker-launch[770]: Looking up UID for user 'gdm' over NSS...
    May 03 09:58:52 tfr systemd[1]: Received SIGRTMIN+20 from PID 358 (plymouthd).
    May 03 09:58:54 tfr kernel: random: crng init done
    May 03 09:58:54 tfr systemd[1]: Received SIGRTMIN+20 from PID 358 (plymouthd).
    May 03 09:58:56 tfr audit[1]: SERVICE_STOP pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=systemd-rfkill comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=success'
    May 03 10:00:21 tfr systemd[1]: dbus-broker.service: Start operation timed out. Terminating.
    May 03 10:00:21 tfr systemd[1]: dbus-broker.service: Failed with result 'timeout'.
    May 03 10:00:21 tfr systemd[1]: Failed to start D-Bus System Message Bus.
    May 03 10:00:21 tfr audit[1]: SERVICE_START pid=1 uid=0 auid=4294967295 ses=4294967295 subj=system_u:system_r:init_t:s0 msg='unit=dbus-broker comm="systemd" exe="/usr/lib/systemd/systemd" hostname=? addr=? terminal=? res=failed'

/etc/passwd:

garry@tfr$ cat /etc/passwd
root:x:0:0:root:/root:/bin/bash
bin:x:1:1:bin:/bin:/sbin/nologin
daemon:x:2:2:daemon:/sbin:/sbin/nologin
adm:x:3:4:adm:/var/adm:/sbin/nologin
lp:x:4:7:lp:/var/spool/lpd:/sbin/nologin
sync:x:5:0:sync:/sbin:/bin/sync
shutdown:x:6:0:shutdown:/sbin:/sbin/shutdown
halt:x:7:0:halt:/sbin:/sbin/halt
mail:x:8:12:mail:/var/spool/mail:/sbin/nologin
operator:x:11:0:operator:/root:/sbin/nologin
games:x:12:100:games:/usr/games:/sbin/nologin
ftp:x:14:50:FTP User:/var/ftp:/sbin/nologin
nobody:x:99:99:Nobody:/:/sbin/nologin
oprofile:x:16:16:Special user account to be used by
OProfile:/var/lib/oprofile:/sbin/nologin
avahi-autoipd:x:170:170:Avahi IPv4LL Stack:/var/lib/avahi-autoipd:/sbin/nologin
dbus:x:81:81:System message bus:/:/sbin/nologin
polkitd:x:999:999:User for polkitd:/:/sbin/nologin
usbmuxd:x:113:113:usbmuxd user:/:/sbin/nologin
abrt:x:173:173::/etc/abrt:/sbin/nologin
rpc:x:32:32:Rpcbind Daemon:/var/lib/rpcbind:/sbin/nologin
colord:x:998:997:User for colord:/var/lib/colord:/sbin/nologin
rtkit:x:172:172:RealtimeKit:/proc:/sbin/nologin
chrony:x:997:995::/var/lib/chrony:/sbin/nologin
tss:x:59:59:Account used by the trousers package to sandbox the tcsd
daemon:/dev/null:/sbin/nologin
mysql:x:27:27:MySQL Server:/var/lib/mysql:/bin/bash
openvpn:x:996:994:OpenVPN:/etc/openvpn:/sbin/nologin
unbound:x:995:993:Unbound DNS resolver:/etc/unbound:/sbin/nologin
pulse:x:994:992:PulseAudio System Daemon:/var/run/pulse:/sbin/nologin
nm-openconnect:x:993:990:NetworkManager user for OpenConnect:/:/sbin/nologin
avahi:x:70:70:Avahi mDNS/DNS-SD Stack:/var/run/avahi-daemon:/sbin/nologin
rpcuser:x:29:29:RPC Service User:/var/lib/nfs:/sbin/nologin
nfsnobody:x:65534:65534:Anonymous NFS User:/var/lib/nfs:/sbin/nologin
sshd:x:74:74:Privilege-separated SSH:/var/empty/sshd:/sbin/nologin
tcpdump:x:72:72::/:/sbin/nologin
garry:x:1000:1000:Garry T. Williams:/home/garry:/bin/zsh
systemd-timesync:x:992:985:systemd Time Synchronization:/:/sbin/nologin
systemd-network:x:991:984:systemd Network Management:/:/sbin/nologin
systemd-resolve:x:990:983:systemd Resolver:/:/sbin/nologin
systemd-bus-proxy:x:989:982:systemd Bus Proxy:/:/sbin/nologin
geoclue:x:988:981:User for geoclue:/var/lib/geoclue:/sbin/nologin
setroubleshoot:x:987:978::/var/lib/setroubleshoot:/sbin/nologin
nm-openvpn:x:986:977:Default user for running openvpn spawned by
NetworkManager:/:/sbin/nologin
systemd-coredump:x:976:976:systemd Core Dumper:/:/sbin/nologin
sddm:x:975:975:Simple Desktop Display Manager:/var/lib/sddm:/sbin/nologin
apache:x:48:48:Apache:/usr/share/httpd:/sbin/nologin
sphinx:x:974:974:Sphinx Search:/usr/lib/tmpfiles.d/lib/sphinx:/bin/bash
dnsmasq:x:972:972:Dnsmasq DHCP and DNS server:/var/lib/dnsmasq:/sbin/nologin
garry@tfr$

Perhaps a clue is that I do not have a gdm user defined on this system.  I think that is because I do not run Gnome -- KDE instead.  The same is true for all the systems I see this failure on.  (I just logged into all of them to check.)

I hope this helps.

Comment 23 Fedora Update System 2018-05-03 15:22:09 UTC
dbus-broker-13-1.fc26 has been pushed to the Fedora 26 stable repository. If problems still persist, please make note of it in this bug report.

Comment 24 Fedora Update System 2018-05-03 15:25:55 UTC
dbus-broker-13-1.fc28 has been pushed to the Fedora 28 stable repository. If problems still persist, please make note of it in this bug report.

Comment 25 David Herrmann 2018-05-03 15:55:58 UTC
(In reply to Garry T. Williams from comment #22)
> Perhaps a clue is that I do not have a gdm user defined on this system.  I
> think that is because I do not run Gnome -- KDE instead.  The same is true
> for all the systems I see this failure on.  (I just logged into all of them
> to check.)
> 
> I hope this helps.

Yes it does. A lot.

Turns out, dbus-daemon has the exact same problem as dbus-broker. The difference is, with dbus-daemon all usernames in your XML configurations are valid, with dbus-broker the 'gdm' username turns out to be invalid, since you don't have gdm installed, but the 'gdm' user is special cased for a specific policy (lets ignore details for now).

Generally, this would not be an issue. Unknown usernames are simply ignored. However, with F28 a new NSS module was added with systemd (libnss_systemd.so). This module uses D-Bus to resolve dynamic-usernames introduced with a recent systemd change. Since this deadlocks if done from within dbus-daemon or dbus-broker, systemd sets a special environment variable (SYSTEMD_NSS_BYPASS_BUS) which then makes nss-systemd skip the username-resolution via the bus, and instead uses a filesystem shortcut.

However, it turns our that nss-systemd cannot read this environment variable if SELinux is enabled. nss-systemd uses secure_getenv(3p) under the hood, and this returns NULL if AT_SECURE is set for a process. If SELinux is enabled, AT_SECURE is set whenever execve(2) performs a selinux-context transition.

Long story short: If you add an invalid username to the system-bus XML configuration, dbus-daemon suffers from the same lockup as you describe here. Hence, the only reason all of this works, is because nobody so far triggered this with SELinux enabled.

Simple workaround: Disable selinux (or put into permissive mode).

Other workaround: Make sure the 'gdm' user exists (as simple as installing gdm)

More workarounds: Use dbus-daemon for now.

I notified the systemd developers of this problem, and they will no doubt come up with a fix. Until then, I think we will remove the 'gdm' special case from dbus-broker, to make sure we behave the same as dbus-daemon.

Again, thanks a lot for the report and insisting on a proper fix! It took a while to trace this down, but I am quite certain we found the issue now.

If anyone wants to reproduce this, it is as simple as adding this snippet somewhere in /usr/share/dbus-1/system.conf:

<policy user="NonExistantUser">
  <allow send_destination="foo.bar" />
</policy>

Thanks!
David

Comment 26 Fedora Update System 2018-05-04 08:19:11 UTC
dbus-broker-13-1.fc27 has been pushed to the Fedora 27 stable repository. If problems still persist, please make note of it in this bug report.