Description of problem:
Mike describes a situation where installing with the rhel7next (7.5) repos leaves dnsmasq not running properly during the install. The NetworkManager script does not ensure that dnsmasq is running, which results in a system-wide DNS failure.

Version-Release number of the following components:
master branch

How reproducible:

Steps to Reproduce:
1. Provision RHEL 7.5 hosts
2. Install
3. Fail?
Have you seen this reoccur?
Verified that RHEL 7.4 can be upgraded to 3.9 + rhel7next; no issues.
I am able to reproduce. New install, v3.7 via tip of master branch openshift-ansible.

Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: dnsmasq.service: main process exited, code=exited, status=5/NOTINSTALLED
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: Unit dnsmasq.service entered failed state.
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: dnsmasq.service failed.
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: Started DNS caching server..
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: Starting DNS caching server....
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal dnsmasq[17501]: dnsmasq: DBus error: Connection ":1.96" is not allowed to own the service "uk.org.thekelleys.dnsmasq" due to security policies in the configuration file
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal dnsmasq[17501]: DBus error: Connection ":1.96" is not allowed to own the service "uk.org.thekelleys.dnsmasq" due to security policies in the configuration file
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: dnsmasq.service: main process exited, code=exited, status=5/NOTINSTALLED
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: Unit dnsmasq.service entered failed state.
Mar 02 12:23:26 ip-172-18-0-22.ec2.internal systemd[1]: dnsmasq.service failed.

dnsmasq-2.76-5.el7.x86_64
dbus package: dbus-1.6.12-17.el7.x86_64

The issue is either that the dbus default config has changed, or that dnsmasq now requires different permissions than it did before.
I've looked over the source changes in dnsmasq and don't see anything that would cause this (unless a subtle bug was introduced somewhere). Rebooting the host also seems to solve it. I'm thinking the AMI I'm using is no good.
Tried a different (older) AMI; hitting a problem using the next repos:

[root@ip-172-18-4-112 ~]# date && tail /var/log/yum.log
Fri Mar 2 14:26:03 EST 2018
Mar 02 14:14:40 Installed: 2:container-selinux-2.42-1.gitad8f0f7.el7.noarch
Mar 02 14:14:41 Installed: 2:docker-common-1.12.6-71.git3e8e77d.el7.x86_64
Mar 02 14:14:41 Installed: 2:docker-client-1.12.6-71.git3e8e77d.el7.x86_64
Mar 02 14:14:43 Installed: 2:docker-1.12.6-71.git3e8e77d.el7.x86_64
Mar 02 14:14:43 Erased: python-rhsm-1.19.10-1.el7_4.x86_64
Mar 02 14:14:43 Erased: python-rhsm-certificates-1.19.10-1.el7_4.x86_64
Mar 02 14:19:40 Installed: atomic-openshift-excluder-3.7.36-1.git.0.9d08155.el7.noarch
Mar 02 14:22:46 Updated: 1:dbus-libs-1.10.24-7.el7.x86_64
Mar 02 14:22:46 Updated: 1:dbus-1.10.24-7.el7.x86_64
Mar 02 14:22:46 Installed: dnsmasq-2.76-5.el7.x86_64
Restarting dbus, followed by restarting dnsmasq, works. I saw this before restarting dbus:

Mar 02 14:22:46 ip-172-18-4-112.ec2.internal dbus-daemon[735]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 02 14:22:46 ip-172-18-4-112.ec2.internal dbus[735]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses

It appears the new configuration cannot be reloaded into the running daemon; dbus must be restarted.
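A minimal sketch of the restart-dbus-then-dnsmasq workaround, assuming a systemd host where the reload error lands in the current boot's journal (as in the logs in this bug). The function name dbus_reload_failed and the --apply flag are hypothetical, not part of any shipped tooling. Restarting the system bus drops every existing bus connection, so treat this as a stopgap; rebooting the host also clears the condition.

```shell
#!/bin/sh
# Hypothetical helper: detect the "running dbus-daemon cannot reload the
# updated config" condition and, only when asked, apply the workaround.
# Assumption: the error text matches the journal lines quoted in this bug.

reload_error='Unable to reload configuration: Configuration file needs one or more <listen> elements'

# Succeeds (exit 0) if the log text on stdin contains the reload error.
dbus_reload_failed() {
    grep -qF "$reload_error"
}

# Restart only when invoked with "--apply" (needs root and systemd).
if [ "${1:-}" = "--apply" ]; then
    if journalctl -b --no-pager | dbus_reload_failed; then
        # dbus first, so the fresh bus daemon has read the current policy
        # files before dnsmasq tries to own its bus name again.
        systemctl restart dbus.service
        systemctl restart dnsmasq.service
    fi
fi
```

Without the flag the script only defines the check, so it can be sourced or dry-run safely; with `--apply` it performs the two restarts described above.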
I think the problem here is that the old dbus daemon is still running and tries to reload the new config file. This almost certainly came out of the rebase in http://bugzilla.redhat.com/1480264. Maybe we could ship the old config file for 7.5 and then do a swap for 7.6? It won't help anyone upgrading directly from 7.3 to 7.5, but that seems likely to be far fewer people.
Yeah, this is easy to reproduce from a RHEL (classic) cloud image, going from 7.4 to 7.5:

$ yum -y install NetworkManager dnsmasq
$ systemctl start dnsmasq
<enable 7.5 repo>
$ yum -y update
...
$ journalctl -b -r|grep 'Unable to reload config' | wc -l
125
OK, so my initial thought of using the old config file is actually not going to work because of this:

   <!-- This is a setuid helper that is used to launch system services -->
-  <servicehelper>/lib64/dbus-1/dbus-daemon-launch-helper</servicehelper>
+  <servicehelper>//usr/libexec/dbus-1/dbus-daemon-launch-helper</servicehelper>

This "dbus doesn't do live updates" issue is really biting us here. Of course, I personally don't think most people really *want* to do 7.4 -> 7.5 as a "live" update either (rpm-ostree fixes this type of issue comprehensively!). You really want to pick up the new kernel at least, so you need a reboot anyway.

In terms of quick fixes... we could try to back out the rebase? That'd be a pretty invasive change at this point, and I'm worried other userspace bits have started to require it.

Another approach would be to add a "compat" symlink for the launch helper from /usr/lib64/dbus-1/dbus-daemon-launch-helper -> /usr/libexec/dbus-1/dbus-daemon-launch-helper. So basically the old config file, but written in a way that works for both new and old. But I don't know offhand whether the old daemon can even talk to the new launch helper.

If that doesn't pan out... we could just ship with broken dbus activation until the bus (or the OS) is restarted. systemd activation is used in some places in RHEL 7, so at least some things would work.
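A sketch of the compat-symlink idea, assuming the 7.5 helper lives at /usr/libexec/dbus-1/dbus-daemon-launch-helper as in the new config. The function name make_compat_link and its optional root-prefix argument are hypothetical, added so the link can be tried in a scratch directory before touching a real host. On RHEL 7, /lib64 is a symlink to /usr/lib64, so one link covers both spellings of the old path seen in this bug. This only fixes the path; it does not answer whether the 7.4 daemon can actually talk to the 7.5 helper.

```shell
#!/bin/sh
# Hypothetical sketch: make the RHEL 7.4 servicehelper path resolve to the
# relocated 7.5 helper. Pass a scratch directory as $1 to try it out;
# with no argument it operates on the real root filesystem (needs root).

helper_new=/usr/libexec/dbus-1/dbus-daemon-launch-helper

make_compat_link() {
    root="${1:-}"                      # empty prefix means "/"
    old_dir="$root/usr/lib64/dbus-1"
    mkdir -p "$old_dir" || return 1
    # Absolute target, so the link resolves the same way wherever it lives.
    # -f replaces an existing file or link at the old path.
    ln -sf "$helper_new" "$old_dir/dbus-daemon-launch-helper"
}
```

Calling it twice is harmless: the second call simply replaces the link with an identical one.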
Created attachment 1404440
WIP: Use RHEL 7.4 conf file for 7.
Created attachment 1404545
WIP: Use 7.4 compatible config file

OK this new patch fixes the reproducer scenario (update 7.4 -> 7.5, install dnsmasq with dbus option turned on).
For reference, here's a diff from this "frankenstein" config file to the current upstream:

--- /home/walters/src/distgit/rhel/dbus/system-conf-74-to-75.conf 2018-03-05 15:04:04.120236344 -0500
+++ /usr/share/dbus-1/system.conf 1969-12-31 19:00:00.000000000 -0500
@@ -1,7 +1,3 @@
-<!-- NOTE: This configuration file merges changes from both RHEL 7.4 (dbus-1.6)
-     and 7.5 (dbus-1.10) in order to support live updates from 7.4 to 7.5.
-     For more information, see https://bugzilla.redhat.com/show_bug.cgi?id=1550582
--->
 <!-- This configuration file controls the systemwide message bus.
      Add a system-local.conf and edit that rather than changing this
      file directly. -->
@@ -28,10 +24,10 @@
   <standard_system_servicedirs/>
   <!-- This is a setuid helper that is used to launch system services -->
-  <servicehelper>/usr/lib64/dbus-1/dbus-daemon-launch-helper</servicehelper>
+  <servicehelper>//usr/libexec/dbus-1/dbus-daemon-launch-helper</servicehelper>
   <!-- Write a pid file -->
-  <pidfile>/run/messagebus.pid</pidfile>
+  <pidfile>/var/run/dbus/pid</pidfile>
   <!-- Enable logging to syslog -->
   <syslog/>
@@ -103,8 +99,36 @@
                    send_interface="org.freedesktop.DBus.Debug.Stats"/>
   </policy>
+  <!-- Include legacy configuration location -->
+  <include ignore_missing="yes">/etc/dbus-1/system.conf</include>
+  <!-- The defaults for these limits are hard-coded in dbus-daemon.
+       Some clarifications:
+       Times are in milliseconds (ms); 1000ms = 1 second
+       133169152 bytes = 127 MiB
+       33554432 bytes = 32 MiB
+       150000ms = 2.5 minutes -->
+  <!-- <limit name="max_incoming_bytes">133169152</limit> -->
+  <!-- <limit name="max_incoming_unix_fds">64</limit> -->
+  <!-- <limit name="max_outgoing_bytes">133169152</limit> -->
+  <!-- <limit name="max_outgoing_unix_fds">64</limit> -->
+  <!-- <limit name="max_message_size">33554432</limit> -->
+  <!-- <limit name="max_message_unix_fds">16</limit> -->
+  <!-- <limit name="service_start_timeout">25000</limit> -->
+  <!-- <limit name="auth_timeout">5000</limit> -->
+  <!-- <limit name="pending_fd_timeout">150000</limit> -->
+  <!-- <limit name="max_completed_connections">2048</limit> -->
+  <!-- <limit name="max_incomplete_connections">64</limit> -->
+  <!-- <limit name="max_connections_per_user">256</limit> -->
+  <!-- <limit name="max_pending_service_starts">512</limit> -->
+  <!-- <limit name="max_names_per_connection">512</limit> -->
+  <!-- <limit name="max_match_rules_per_connection">512</limit> -->
+  <!-- <limit name="max_replies_per_connection">128</limit> -->
   <!-- Config files are placed here that among other things, punch holes
        in the above policy for specific services. -->
   <includedir>system.d</includedir>
+  <includedir>/etc/dbus-1/system.d</includedir>
   <!-- This is included last so local configuration can override what's
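To check which side of this diff a host's config is on, a small diagnostic can print the <servicehelper> path and whether it exists. This is a hypothetical sketch (servicehelper_path and check_servicehelper are made-up names) that assumes the element sits on a single line, as in the stock file; it does not address the <listen> reload failure.

```shell
#!/bin/sh
# Hypothetical diagnostic: report the launch helper a dbus config file names.
# Assumes <servicehelper>...</servicehelper> is on a single line.

# Print the path inside <servicehelper>, if any.
servicehelper_path() {
    sed -n 's:.*<servicehelper>\(.*\)</servicehelper>.*:\1:p' "$1"
}

# Return 0 if the named helper exists, 1 if it is missing, 2 if none is named.
check_servicehelper() {
    helper=$(servicehelper_path "$1")
    if [ -z "$helper" ]; then
        echo "no <servicehelper> in $1"
        return 2
    fi
    # -e rather than -x: the helper is usually restricted to the dbus group,
    # so -x can be false for other users even when the file is present.
    if [ -e "$helper" ]; then
        echo "ok: $helper"
    else
        echo "missing: $helper"
        return 1
    fi
}
```

For example, `check_servicehelper /usr/share/dbus-1/system.conf` on an updated host reports the 7.5 helper path and whether it is present.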
I don't see anything here that's supportable and should be fixed in this bugzilla. You updated a few packages (precisely services/daemons) from 7.5 on top of 7.4 and you expect them to work flawlessly? This was never supported in RHEL, and you can solve the dnsmasq failure by executing systemctl restart dbus.service.

Is restarting dbus crippling anything else? Could you do that in your use case/scenario?

The blocker criterion is, in my opinion, not met if we have a reproducer. Especially in the RC phase.

Any ideas?
(In reply to Vladimir Benes from comment #29)
> Blocker criterium is in my oppinion not met if we have a reproducer.

We have a workaround (not a reproducer).
Following the steps in comment 11 and using the RHEL-7.5 repo from http://download.eng.brq.redhat.com/nightly/latest-RHEL-7/compose/Client/x86_64/os/ I got 77 repeated lines like this in journalctl:

Mar 07 12:28:52 qe-dell-ovs5-vm-45.dqe.lab.eng.bos.redhat.com dbus[642]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Installed packages in the end:
dnsmasq-2.76-5.el7.x86_64
dbus-glib-0.100-7.el7.x86_64
dbus-python-1.1.1-9.el7.x86_64
dbus-1.10.24-7.el7.x86_64
dbus-libs-1.10.24-7.el7.x86_64
(In reply to Vladimir Benes from comment #29)
> I don't see anything that's supportable and should be fixed in this
> bugzilla. You updated few packages (preciselly services/daemons) from 7.5 on
> top of 7.4 and you do expect them working flawlessly?

So, I just tried installing dnsmasq on 7.4 and then running yum update on the system. dnsmasq continues to work. dbus still complains that it can't reload the config file; I see this in journalctl:

Mar 07 13:17:12 ip-172-18-9-209.ec2.internal dbus[730]: [system] Reloaded configuration
Mar 07 13:17:12 ip-172-18-9-209.ec2.internal dbus-daemon[730]: dbus[730]: [system] Reloaded configuration
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: dbus[730]: [system] Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses
Mar 07 13:17:15 ip-172-18-9-209.ec2.internal dbus-daemon[730]: Unable to reload configuration: Configuration file needs one or more <listen> elements giving addresses

> This was never supported in RHEL and you can solve dnsmasq failure by
> executing systemctl restart dbus.service.
>
> Is restarting dbus crippling anything else? Could you do that in your use
> case/scenario?
> Blocker criterium is in my oppinion not met if we have a reproducer.
> Especially in RC phase.
>
> Any idea?

It appears this is only an issue when installing new services from 7.5 on a 7.4 host without restarting dbus. This is probably not a likely scenario for most users, so I think this is probably OK.
I also encountered this issue on RHEL 7.4 (kernel-3.10.0-693.11.1.el7.x86_64).

# rpm -qa|grep dbus
dbus-libs-1.10.24-7.el7.x86_64
dbus-python-1.1.1-9.el7.x86_64
python-slip-dbus-0.4.0-2.el7.noarch
dbus-1.10.24-7.el7.x86_64
dbus-glib-0.100-7.el7.x86_64
# rpm -qa|grep dnsmasq
dnsmasq-2.76-5.el7.x86_64
[root@host-172-16-120-13 ~]# journalctl -u dnsmasq
-- Logs begin at Tue 2018-04-10 06:41:17 EDT, end at Tue 2018-04-10 23:23:42 EDT. --
Apr 10 06:45:57 host-172-16-120-13 systemd[1]: Started DNS caching server..
Apr 10 06:45:57 host-172-16-120-13 systemd[1]: Starting DNS caching server....
Apr 10 06:45:57 host-172-16-120-13 dnsmasq[15671]: dnsmasq: DBus error: Connection ":1.30" is not allowed to own the service "uk.org.thekelleys.dnsmasq" due to security policies in the
Apr 10 06:45:57 host-172-16-120-13 dnsmasq[15671]: DBus error: Connection ":1.30" is not allowed to own the service "uk.org.thekelleys.dnsmasq" due to security policies in the configura
Apr 10 06:45:57 host-172-16-120-13 systemd[1]: dnsmasq.service: main process exited, code=exited, status=5/NOTINSTALLED
Apr 10 06:45:57 host-172-16-120-13 systemd[1]: Unit dnsmasq.service entered failed state.
Apr 10 06:45:57 host-172-16-120-13 systemd[1]: dnsmasq.service failed.
Apr 10 06:45:58 host-172-16-120-13 systemd[1]: Started DNS caching server..
Apr 10 06:45:58 host-172-16-120-13 systemd[1]: Starting DNS caching server....
Apr 10 06:45:58 host-172-16-120-13 dnsmasq[15871]: dnsmasq: DBus error: Connection ":1.36" is not allowed to own the service "uk.org.thekelleys.dnsmasq" due to security policies in the
Apr 10 06:45:58 host-172-16-120-13 systemd[1]: dnsmasq.service: main process exited, code=exited, status=5/NOTINSTALLED
Apr 10 06:45:58 host-172-16-120-13 systemd[1]: Unit dnsmasq.service entered failed state.
Apr 10 06:45:58 host-172-16-120-13 systemd[1]: dnsmasq.service failed.

After restarting the dbus service (systemctl restart dbus), dnsmasq could be started successfully.
This issue did not happen before. dbus-1.10.24-7.el7 is installed as a dependency of dnsmasq during the OpenShift installation.
*** This bug has been marked as a duplicate of bug 1568856 ***