Bug 2013010
| Summary: | NetworkManager not updating DNS in /etc/resolv.conf when using DHCP | ||||||
|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Robert McSwain <rmcswain> | ||||
| Component: | cloud-init | Assignee: | Virtualization Maintenance <virt-maint> | ||||
| Status: | CLOSED MIGRATED | QA Contact: | xiachen | ||||
| Severity: | high | Docs Contact: | |||||
| Priority: | unspecified | ||||||
| Version: | 7.9 | CC: | anisinha, bdas, bgalvani, eesposit, eterrell, huzhao, jgreguske, lrintel, rkhan, sfaye, shaselde, sukulkar, thaller, till, usurse, xiachen, xiliang, yacao | ||||
| Target Milestone: | rc | Keywords: | MigratedToJIRA | ||||
| Target Release: | --- | Flags: | pm-rhel: mirror+ | ||||
| Hardware: | x86_64 | ||||||
| OS: | Linux | ||||||
| Whiteboard: | |||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
| Doc Text: | Story Points: | --- | |||||
| Clone Of: | Environment: | ||||||
| Last Closed: | 2023-09-22 15:43:32 UTC | Type: | Bug | ||||
| Regression: | --- | Mount Type: | --- | ||||
| Documentation: | --- | CRM: | |||||
| Verified Versions: | Category: | --- | |||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||
| Embargoed: | |||||||
| Attachments: |
|
||||||
|
Description
Robert McSwain
2021-10-11 19:59:58 UTC
While trying to reproduce this in AWS, I have a few open questions.

I created an image with the attached cloud.cfg file, and I cannot access the instances because eth0 failed to come up. Is the customer's system accessible with this configuration applied?

Nov 15 08:12:41 ip-10-116-2-42 network: Bringing up loopback interface: [ OK ]
Nov 15 08:12:41 ip-10-116-2-42 kernel: IPv6: ADDRCONF(NETDEV_CHANGE): eth0: link becomes ready
Nov 15 08:12:41 ip-10-116-2-42 NetworkManager[592]: <info> [1636963961.4000] agent-manager: req[0x56034a49aca0, :1.18/nmcli-connect/0]: agent registered
Nov 15 08:12:41 ip-10-116-2-42 NetworkManager[592]: <info> [1636963961.4008] audit: op="connection-activate" uuid="5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03" name="System eth0" result="fail" reason="No suitable device found for this connection (device eth0 not available because profile is not compatible with device (permanent MAC address doesn't match))."
Nov 15 08:12:41 ip-10-116-2-42 network: Bringing up interface eth0: Error: Connection activation failed: No suitable device found for this connection (device eth0 not available because profile is not compatible with device (permanent MAC address doesn't match)).
Nov 15 08:12:41 ip-10-116-2-42 network: [FAILED]
Nov 15 08:12:41 ip-10-116-2-42 systemd: network.service: control process exited, code=exited status=1
Nov 15 08:12:41 ip-10-116-2-42 systemd: Failed to start LSB: Bring up/down networking.
Nov 15 08:12:41 ip-10-116-2-42 systemd: Unit network.service entered failed state.
Nov 15 08:12:41 ip-10-116-2-42 systemd: network.service failed.

What is the expected content of '/etc/resolv.conf'? I checked the 2 sos reports (cloud-init-19.4-7.el7_9.4.x86_64), one from VMware and one from AWS. They have similar content generated by NM.
# cat etc/resolv.conf
# Generated by NetworkManager
search int.cdphp.com
nameserver 10.201.32.50
nameserver 10.201.16.54
nameserver 10.100.4.112
nameserver 10.100.4.114
options timeout:2 attempts:1

After checking bz1748015, I guess it is not the same issue.

I first replaced the attached cloud.cfg in a running instance; resolv.conf was restored without any problem after reboot.

[root@ip-10-116-2-42 ec2-user]# cat /etc/resolv.conf
# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.116.0.2
[root@ip-10-116-2-42 ec2-user]# truncate -s0 /etc/resolv.conf
[root@ip-10-116-2-42 ec2-user]# cloud-init clean
[root@ip-10-116-2-42 ec2-user]# reboot
Connection to ec2-34-209-65-192.us-west-2.compute.amazonaws.com closed by remote host.
Connection to ec2-34-209-65-192.us-west-2.compute.amazonaws.com closed.
[root@cloud-aws-2 fedora]# ssh ec2-user.compute.amazonaws.com
Last login: Mon Nov 15 09:02:58 2021 from 66.187.232.127
[root@ip-10-116-2-42 ec2-user]# cat /etc/resolv.conf
# Generated by NetworkManager
search us-west-2.compute.internal
nameserver 10.116.0.2

Comment 0 indicated that downgrading cloud-init made it work. If that is correct, then cloud-init seems to be involved.

It's not clear to me which log to look at. Looking at "after.tar.gz" and the related sosreport from https://access.redhat.com/support/cases/#/case/03044228/discussion?commentId=a0a6R00000U2yPzQAJ , we see that NetworkManager does not actually manage eth0:

<info> [1668607054.6799] ifcfg-rh: Ignoring connection /etc/sysconfig/network-scripts/ifcfg-eth0 (5fb06bd0-0bb0-7ffb-45f1-d6edd65f3e03,"System eth0") due to NM_CONTROLLED=no. Unmanaged: interface-name:eth0.

We see that resolv.conf is empty, and there is a comment that it was written by NetworkManager.

If there are no devices/profiles active in NetworkManager, NetworkManager has no DNS configuration. That is probably why it wrote an empty resolv.conf. It's questionable whether that makes sense.
I seem to remember we made a change so that NetworkManager doesn't write an empty /etc/resolv.conf if NetworkManager has no configuration. Beniamino, do you recall that?

In any case, if the user clearly does not want NetworkManager to handle /etc/resolv.conf (or anything at all), they can tell NetworkManager not to touch /etc/resolv.conf (via `[main].dns=none` or `[main].rc-manager=unmanaged` -- see `man NetworkManager.conf`). Or, if NetworkManager doesn't do anything useful on that system anyway and they have their own tool (network-scripts?), then it's better to just `systemctl disable NetworkManager`, or even just `yum remove NetworkManager`.

But the issue is rather unclear to me and I don't see relevant logs.

(In reply to Thomas Haller from comment #20)
> If there are no devices/profiles active in NetworkManager, NetworkManager
> has no DNS configuration. That is probably why it wrote an empty
> resolv.conf. It's questionable whether that makes sense. I seem to remember,
> we did a change that NetworkManager doesn't write an empty /etc/resolv.conf,
> if NetworkManager has no configuration.

Right, that is what should happen because of the way in which we compute the hash of the current DNS configuration. Also, see bug 1344303.

As Thomas said, to understand whether the issue is related to NM (which doesn't seem to be the case, since a cloud-init downgrade fixes the problem), we need NM logs at trace level, with both cloud-init versions.

Hi, as said in the previous comments, we need NetworkManager logs at trace level to understand how cloud-init is configuring NM. Are those logs available anywhere?

To change the log level, set level=TRACE in the [logging] section of /etc/NetworkManager/NetworkManager.conf, then reboot, reproduce the issue and attach the output of `journalctl -b`. If possible, repeat this procedure with both cloud-init versions. Thank you.

The difference between the two logs is in the behavior of cloud-init between 18.5-6 and 19.4-7.
The latter performs a "systemctl reload NetworkManager" after starting up, while the former does not:

cloud-init: Cloud-init v. 18.5 running 'modules:final'

vs

cloud-init: Cloud-init v. 19.4 running 'modules:final'
NetworkManager: <info> [1688650541.9370] audit: op="reload" arg="0" pid=1707 uid=0 result="success"

In RHEL 7, the NM reload is implemented in the systemd service file as calling the Reload() D-Bus method with flag 0:

ExecReload=/usr/bin/dbus-send --print-reply --system --type=method_call --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager.Reload uint32:0

where the flags are defined as:

No flags (0x00) means to reload everything that is supported which is identical to sending a SIGHUP.
(0x01) means to reload the NetworkManager.conf configuration from disk. Note that this does not include connections, which can be reloaded via Setting's ReloadConnections.
(0x02) means to update DNS configuration, which usually involves writing /etc/resolv.conf anew.
(0x04) means to restart the DNS plugin.

So, calling reload with flag 0 also requests an explicit update of resolv.conf. Since there are no DNS servers configured in NM for eth0, resolv.conf is overwritten with an empty list.

I think the solution here is that cloud-init should either:

1) set "dns=none" in NetworkManager configuration if NetworkManager is not supposed to touch resolv.conf

or

2) if the purpose of the reload at the end is to make NM aware of the new configuration files in /etc/NetworkManager/conf.d, then cloud-init should send a reload with flag 1, so that resolv.conf is not forcibly rewritten

I believe NetworkManager is behaving as documented and expected here; the cause of the bug seems to be a change in cloud-init between 18.5-6 and 19.4-7. Therefore, I'm reassigning the bz; please reassign it back if you find anything needs to be changed in NM. Thank you.
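The flag semantics listed in the comment above can be sketched as a small bitmask model in Python (the language cloud-init is written in). This is only an illustration of the reasoning, not NetworkManager or cloud-init code: the names ReloadFlag, EVERYTHING, effective_actions and requests_resolv_conf_update are hypothetical, and the model deliberately ignores settings such as dns=none, which would stop NM from writing resolv.conf regardless of the flag.

```python
from enum import IntFlag


class ReloadFlag(IntFlag):
    """Illustrative names for the Reload() flag bits quoted in the thread."""
    CONF = 0x01      # reload NetworkManager.conf from disk
    DNS_RC = 0x02    # update DNS configuration (usually rewrites resolv.conf)
    DNS_FULL = 0x04  # restart the DNS plugin


# Per the quoted description, passing no flags (0) means "reload everything".
EVERYTHING = ReloadFlag.CONF | ReloadFlag.DNS_RC | ReloadFlag.DNS_FULL


def effective_actions(flags: int) -> ReloadFlag:
    """Expand the special value 0 into the full set of reload actions."""
    requested = ReloadFlag(flags)
    if requested == 0:
        return EVERYTHING
    return requested


def requests_resolv_conf_update(flags: int) -> bool:
    """True if this Reload() call asks NM to update its DNS configuration."""
    return bool(effective_actions(flags) & ReloadFlag.DNS_RC)
```

Under this model, requests_resolv_conf_update(0) is true while requests_resolv_conf_update(1) is false, which is why option 2 above proposes a reload with flag 1 instead of the flag 0 that "systemctl reload NetworkManager" sends.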
(In reply to Beniamino Galvani from comment #27)
> The difference between the two logs is in the behavior of cloud-init between
> 18.5-6 and 19.4-7. The latter performs a "systemctl reload NetworkManager"
> after starting up, while the former does not:
>
> cloud-init: Cloud-init v. 18.5 running 'modules:final'
>
> vs
>
> cloud-init: Cloud-init v. 19.4 running 'modules:final'
> NetworkManager: <info> [1688650541.9370] audit: op="reload" arg="0"
> pid=1707 uid=0 result="success"
>
> In RHEL 7, the NM reload is implemented in the systemd service file as
> calling the Reload() D-Bus method with flag 0:
>
> ExecReload=/usr/bin/dbus-send --print-reply --system --type=method_call
> --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager
> org.freedesktop.NetworkManager.Reload uint32:0
>
> where the flags are defined as:
>
> No flags (0x00) means to reload everything that is supported which is
> identical to sending a SIGHUP.
> (0x01) means to reload the NetworkManager.conf configuration from disk.
> Note that this does not include connections, which can be reloaded via
> Setting's ReloadConnections.
> (0x02) means to update DNS configuration, which usually involves writing
> /etc/resolv.conf anew.
> (0x04) means to restart the DNS plugin.
>
> So, calling reload with flag 0 also requests an explicit update of
> resolv.conf. Since there are no DNS servers configured in NM for eth0,
> resolv.conf is overwritten with an empty list.
>
> I think the solution here is that cloud-init should either:
>
> 1) set "dns=none" in NetworkManager configuration if NetworkManager is not
> supposed to touch resolv.conf

@bgalvani please see this upstream patch in cloud-init:

commit 67bab5bb804e2346673430868935f6bbcdb88f13
Author: Ryan McCabe <rmccabe>
Date: Thu Jun 8 13:24:23 2017 -0400

    net: Allow for NetworkManager configuration

    In cases where the config json specifies nameserver entries,
    if there are interfaces configured to use dhcp, NetworkManager,
    if enabled, will clobber the /etc/resolv.conf that cloud-init
    has produced, which can break dns. If there are no interfaces
    configured to use dhcp, NetworkManager could clobber
    /etc/resolv.conf with an empty file.

    This patch adds a mechanism for dropping additional configuration
    into /etc/NetworkManager/conf.d/ and disables management of
    /etc/resolv.conf by NetworkManager when nameserver information is
    provided in the config.

    LP: #1693251

    Signed-off-by: Ryan McCabe <rmccabe>

It seems cloud-init does set dns = none in /etc/NetworkManager/conf.d/99-cloud-init.conf. When NM is restarted, I am not sure why NM would be ignoring this:
- maybe the config is not present yet when NM parses the config file.
- maybe it's ignoring this config and giving the flag 0 a preference?

In either case, in RHEL 9, I see this is how the NM service is started:

[Service]
Type=dbus
BusName=org.freedesktop.NetworkManager
ExecReload=/usr/bin/busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 0
#ExecReload=/bin/kill -HUP $MAINPID
ExecStart=/usr/sbin/NetworkManager --no-daemon
Restart=on-failure
# NM doesn't want systemd to kill its children for it
KillMode=process

How do you suggest passing a flag 1 to it during reload?

(In reply to Ani Sinha from comment #30)
> (In reply to Beniamino Galvani from comment #27)
> > The difference between the two logs is in the behavior of cloud-init between
> > 18.5-6 and 19.4-7.
> > The latter performs a "systemctl reload NetworkManager"
> > after starting up, while the former does not:
> >
> > cloud-init: Cloud-init v. 18.5 running 'modules:final'
> >
> > vs
> >
> > cloud-init: Cloud-init v. 19.4 running 'modules:final'
> > NetworkManager: <info> [1688650541.9370] audit: op="reload" arg="0"
> > pid=1707 uid=0 result="success"
> >
> > In RHEL 7, the NM reload is implemented in the systemd service file as
> > calling the Reload() D-Bus method with flag 0:
> >
> > ExecReload=/usr/bin/dbus-send --print-reply --system --type=method_call
> > --dest=org.freedesktop.NetworkManager /org/freedesktop/NetworkManager
> > org.freedesktop.NetworkManager.Reload uint32:0
> >
> > where the flags are defined as:
> >
> > No flags (0x00) means to reload everything that is supported which is
> > identical to sending a SIGHUP.
> > (0x01) means to reload the NetworkManager.conf configuration from disk.
> > Note that this does not include connections, which can be reloaded via
> > Setting's ReloadConnections.
> > (0x02) means to update DNS configuration, which usually involves writing
> > /etc/resolv.conf anew.
> > (0x04) means to restart the DNS plugin.
> >
> > So, calling reload with flag 0 also requests an explicit update of
> > resolv.conf. Since there are no DNS servers configured in NM for eth0,
> > resolv.conf is overwritten with an empty list.
> >
> > I think the solution here is that cloud-init should either:
> >
> > 1) set "dns=none" in NetworkManager configuration if NetworkManager is not
> > supposed to touch resolv.conf
>
> @bgalvani please see this upstream patch in cloud-init:
>
> commit 67bab5bb804e2346673430868935f6bbcdb88f13
> Author: Ryan McCabe <rmccabe>
> Date: Thu Jun 8 13:24:23 2017 -0400
>
>     net: Allow for NetworkManager configuration
>
>     In cases where the config json specifies nameserver entries,
>     if there are interfaces configured to use dhcp, NetworkManager,
>     if enabled, will clobber the /etc/resolv.conf that cloud-init
>     has produced, which can break dns. If there are no interfaces
>     configured to use dhcp, NetworkManager could clobber
>     /etc/resolv.conf with an empty file.
>
>     This patch adds a mechanism for dropping additional configuration
>     into /etc/NetworkManager/conf.d/ and disables management of
>     /etc/resolv.conf by NetworkManager when nameserver information is
>     provided in the config.
>
>     LP: #1693251
>
>     Signed-off-by: Ryan McCabe <rmccabe>
>
> Seems cloud-init does set dns = none in
> /etc/NetworkManager/conf.d/99-cloud-init.conf
> I am not sure when NM is restarted, for what reason NM is ignoring this?
> - maybe the config is not present yet when NM parses the config file.
> - maybe it's ignoring this config and giving the flag 0 a preference?
Looking at the log file, we see:

Jul 06 09:35:41 ip-10-208-16-216 NetworkManager[740]: <debug> [1688650541.9379] config: Reading config file '/usr/lib/NetworkManager/conf.d/00-server.conf'
Jul 06 09:35:41 ip-10-208-16-216 NetworkManager[740]: <debug> [1688650541.9381] config: Reading config file '/usr/lib/NetworkManager/conf.d/10-slaves-order.conf'
Jul 06 09:35:41 ip-10-208-16-216 NetworkManager[740]: <debug> [1688650541.9381] config: Reading config file '/etc/NetworkManager/NetworkManager.conf'
Jul 06 09:35:41 ip-10-208-16-216 NetworkManager[740]: <debug> [1688650541.9385] config: intern config file "/var/lib/NetworkManager/NetworkManager-intern.conf"

It seems NM does not find /etc/NetworkManager/conf.d/99-cloud-init.conf, either because it's not present or because NM does not look in that location.

(In reply to Ani Sinha from comment #30)
> Seems cloud-init does set dns = none in /etc/NetworkManager/conf.d/99-cloud-init.conf
> I am not sure when NM is restarted, for what reason NM is ignoring this?
> - maybe the config is not present yet when NM parses the config file.
> - maybe it's ignoring this config and giving the flag 0 a preference?

How do you see that NM is ignoring it? From what I see, it should be honored.

If you do `NetworkManager --print-config`, it will print the configuration resulting after merging all .conf snippets. What does it show?

> How do you suggest passing a flag 1 to it during reload?

busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1

(In reply to Beniamino Galvani from comment #32)
> (In reply to Ani Sinha from comment #30)
> > Seems cloud-init does set dns = none in /etc/NetworkManager/conf.d/99-cloud-init.conf
> > I am not sure when NM is restarted, for what reason NM is ignoring this?
> > - maybe the config is not present yet when NM parses the config file.
> > - maybe it's ignoring this config and giving the flag 0 a preference?
>
> How do you see that NM is ignoring it?
> From what I see, it should be honored.

In the customer configuration they have:

network: {config: disabled}

This means cloud-init is not configuring the network, and hence that config file for NM is not being written. In fact, none of the config files that cloud-init generates should be written. So yes, there is no evidence yet that NM is ignoring any config files.

>> How do you suggest passing a flag 1 to it during reload?

>busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1

So instead of calling systemctl, use this directly?

(In reply to Ani Sinha from comment #33)
> >> How do you suggest passing a flag 1 to it during reload?
>
> >busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1
>
> So instead of calling systemctl, use this directly?

Yes

I am going to try this fix first:
diff --git a/systemd/cloud-final.service.tmpl b/systemd/cloud-final.service.tmpl
index 85f423ac..9f46cae2 100644
--- a/systemd/cloud-final.service.tmpl
+++ b/systemd/cloud-final.service.tmpl
@@ -24,6 +24,7 @@ KillMode=process
 ExecStartPost=/bin/sh -c 'u=NetworkManager.service; \
  out=$(systemctl show --property=SubState $u) || exit; \
  [ "$out" = "SubState=running" ] || exit 0; \
+ [ -f /etc/NetworkManager/conf.d/99-cloud-init.conf ] || exit 0; \
  systemctl reload-or-try-restart $u'
 {% else %}
 TasksMax=infinity
I generated a scratch build here: https://brewweb.engineering.redhat.com/brew/taskinfo?taskID=55310103
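The decision that the patched ExecStartPost line makes can be restated as a small Python sketch, purely to make the guard explicit. It is not part of cloud-init: should_reload_nm, CLOUD_INIT_DROPIN and the injectable is_file parameter are hypothetical names, and the case where "systemctl show" itself fails (a bare "exit" in the shell snippet) is not modeled.

```python
import os
from typing import Callable

# Drop-in path checked by the proposed "[ -f ... ]" guard in the diff.
CLOUD_INIT_DROPIN = "/etc/NetworkManager/conf.d/99-cloud-init.conf"


def should_reload_nm(
    sub_state_line: str,
    dropin_path: str = CLOUD_INIT_DROPIN,
    is_file: Callable[[str], bool] = os.path.isfile,
) -> bool:
    """Mirror the guard chain before 'systemctl reload-or-try-restart'.

    sub_state_line is the output of
    'systemctl show --property=SubState NetworkManager.service'.
    """
    # Existing check: do nothing unless NetworkManager is running.
    if sub_state_line.strip() != "SubState=running":
        return False
    # Added check: only reload when cloud-init actually wrote its NM drop-in.
    return is_file(dropin_path)
```

With "network: {config: disabled}" cloud-init writes no drop-in, so the sketch returns False and the reload with flag 0 that emptied resolv.conf would no longer be triggered.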
(In reply to Beniamino Galvani from comment #34)
> (In reply to Ani Sinha from comment #33)
> > >> How do you suggest passing a flag 1 to it during reload?
> >
> > >busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1
> >
> > So instead of calling systemctl, use this directly?
>
> Yes

@bgalvani even if we do this here, since cloud-init is not writing any config file to ignore dns, upon reboot and NM service start on reboot, will it still clear the resolv.conf file?

(In reply to Ani Sinha from comment #36)
> (In reply to Beniamino Galvani from comment #34)
> > (In reply to Ani Sinha from comment #33)
> > > >> How do you suggest passing a flag 1 to it during reload?
> > >
> > > >busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1
> > >
> > > So instead of calling systemctl, use this directly?
> >
> > Yes
>
> @bgalvani even if we do this here, since cloud-init is not
> writing any config file to ignore dns, upon reboot and NM service start on
> reboot, will it still clear the resolv.conf file?

On reboot, if there is no dns=none set in NetworkManager.conf, NetworkManager will:

- update resolv.conf if any active connection provides a DNS server

- not touch resolv.conf if there are no active connections managed by NM, or if they don't provide any DNS

(In reply to Beniamino Galvani from comment #37)
> (In reply to Ani Sinha from comment #36)
> > (In reply to Beniamino Galvani from comment #34)
> > > (In reply to Ani Sinha from comment #33)
> > > > >> How do you suggest passing a flag 1 to it during reload?
> > > >
> > > > >busctl call org.freedesktop.NetworkManager /org/freedesktop/NetworkManager org.freedesktop.NetworkManager Reload u 1
> > > >
> > > > So instead of calling systemctl, use this directly?
> > >
> > > Yes
> >
> > @bgalvani even if we do this here, since cloud-init is not
> > writing any config file to ignore dns, upon reboot and NM service start on
> > reboot, will it still clear the resolv.conf file?
>
> On reboot, if there is no dns=none set in NetworkManager.conf,
> NetworkManager will:
>
> - update resolv.conf if any active connection provides a DNS server
>
> - not touch resolv.conf if there are no active connections managed by NM,
> or if they don't provide any DNS

Right, so per comment #c27, since the eth0 connection provides no active DNS, resolv.conf should in theory remain untouched. Let's see what testing yields.

Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to the Jira issue's "Watchers" field to continue receiving updates, and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |