Bug 1344411
| Summary: | NetworkManager removes ifcfg-ovirtmgmt after reboot although it was set to NM_CONTROLLED=no | ||||||||||
|---|---|---|---|---|---|---|---|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Michael Burman <mburman> | ||||||||
| Component: | NetworkManager | Assignee: | Thomas Haller <thaller> | ||||||||
| Status: | CLOSED INSUFFICIENT_DATA | QA Contact: | Desktop QE <desktop-qa-list> | ||||||||
| Severity: | low | Docs Contact: | |||||||||
| Priority: | urgent | ||||||||||
| Version: | 7.2 | CC: | atragler, baptiste.agasse, bgalvani, danken, deepak.jagtap, fdeutsch, huzhao, lrintel, mburman, mgregg, rkhan, sukulkar, thaller, ycui | ||||||||
| Target Milestone: | rc | Flags: | thaller: needinfo- | ||||||||
| Target Release: | 7.3 | ||||||||||
| Hardware: | x86_64 | ||||||||||
| OS: | Other | ||||||||||
| Whiteboard: | |||||||||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |||||||||
| Doc Text: | Story Points: | --- | |||||||||
| Clone Of: | |||||||||||
| : | 1345919 (view as bug list) | Environment: | |||||||||
| Last Closed: | 2016-11-03 15:40:57 UTC | Type: | Bug | ||||||||
| Regression: | --- | Mount Type: | --- | ||||||||
| Documentation: | --- | CRM: | |||||||||
| Verified Versions: | Category: | --- | |||||||||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||||||
| Cloudforms Team: | --- | Target Upstream Version: | |||||||||
| Embargoed: | |||||||||||
| Bug Depends On: | 1345919, 1347958 | ||||||||||
| Bug Blocks: | 1304509, 1326798, 1330144 | ||||||||||
| Attachments: |
|
||||||||||
|
Description
Michael Burman
2016-06-09 15:33:53 UTC
Could you please enable TRACE logging, reproduce the problem, and attach the entire logfile?
Edit /etc/NetworkManager/NetworkManager.conf and add

[logging]
level=TRACE

Thank you.

(In reply to Michael Burman from comment #0)
> Steps to Reproduce:
> 1. Install rhevh-ng on latest 4.0 rhevm engine (ovirtmgmt created over a NIC)
> 2. Attach additional network to host on second NIC (not must)
> 3. Reboot server
>
> Actual results:
> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
>
> ovirtmgmt was removed from host by NetworkManager.
>
> Expected results:
> NetworkManager shouldn't remove the management network from the host when the
> NM_CONTROLLED=no set

Since monitor-connection-files is enabled in NetworkManager.conf, NetworkManager recognizes that the file gets removed externally at 16:15:46, so it starts to manage the device. The file is re-added later at 16:15:51 and NM recognizes this too.

Do you really need to set monitor-connection-files=yes? Note that this is disabled by default because it can cause race conditions (e.g. many text editors delete the file and re-create it when saving, and so NM would see the deletion event and start to manage the device for a short period). A better solution is to disable monitor-connection-files and explicitly call "nmcli connection reload" when you want NM to pick up changes to connection files.

Can you attach debug logs as explained in comment 1?

Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive when the connection file is getting removed, because we also set no-auto-default=* ?
One note tho, Dan already saw this problem coming up in the commit which introduced monitor-connection-files:
vdsm:57617fe62ac797d02b9a19b216b674d9f4f2c7c3
Please note that this approach is potentially raceful. Under
high loads, it is theoretically possible that NM learns of
a changed file too late. To be sure, we should probably call
to 'nmcli connection load' synchronously, just before running
'ifup' on a given device.
I am currently trying to reproduce this on RHEL to provide the tracing information.
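The synchronous approach suggested in the vdsm commit message could look roughly like the sketch below. The helper name `load_then_ifup` and its optional directory argument are hypothetical and are not part of vdsm or initscripts; only `nmcli connection load` and `ifup` come from the discussion.

```shell
# Hypothetical helper (an assumption for illustration): make NetworkManager
# re-read one ifcfg file synchronously, then bring the device up.
load_then_ifup() {
    local dev=$1
    local cfg_dir=${2:-/etc/sysconfig/network-scripts}

    # Ask NM to (re)load this file now instead of relying on
    # monitor-connection-files to notice the change in time.
    nmcli connection load "$cfg_dir/ifcfg-$dev" || return 1

    # Only after NM has been handed the current file contents, run ifup.
    ifup "$dev"
}
```

A call such as `load_then_ifup ovirtmgmt` would then replace a bare `ifup ovirtmgmt`. Whether this fully closes the race depends on how NM processes the load request, which this sketch does not verify.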
(In reply to Fabian Deutsch from comment #5)
> Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive
> when the connection file is getting removed, because we also set
> no-auto-default=* ?

no-auto-default=* only tells NM not to create a default DHCP connection for the device in absence of other on-disk connections. But when the file with NM_CONTROLLED=no gets removed, there is nothing preventing NM from managing the device, and so it will try to activate existing connections. If none exist, the device will stay in 'disconnected' state without any address.

(In reply to Fabian Deutsch from comment #5)
> Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive
> when the connection file is getting removed, because we also set
> no-auto-default=* ?
>
> One note tho, Dan already saw this problem coming up in the commit which
> introduced monitor-connection-files:
>
> vdsm:57617fe62ac797d02b9a19b216b674d9f4f2c7c3
>
> Please note that this approach is potentially raceful. Under
> high loads, it is theoretically possible that NM learns of
> a changed file too late. To be sure, we should probably call
> to 'nmcli connection load' synchronously, just before running
> 'ifup' on a given device.

If you are using ifup to activate an ifcfg-rh file, initscripts will ask NetworkManager whether the file is managed by NetworkManager (as indicated by NM_CONTROLLED). Following that, ifup will either call `nmcli connection up` or proceed to activate the interface.

When NM is contacted by ifup, it will automatically reload the file to make sure that its information is up-to-date. Thus, monitor-connection-files should not be necessary in this case.

To give some context: In our use-case vdsm is writing ifcfg files and we use the legacy network scripts for bringing up the networking (by adding NM_CONTROLLED=no to each ifcfg).
In general we do not want NM to manage any network device.
However, we do need NM for monitoring and enumerating these connections, because on RHEV-H we are using Cockpit for administration, and Cockpit relies on NM for displaying network information.

According to a small offline IRC discussion, unmanaged-devices=* can be used to prevent NM from touching devices even if the ifcfg goes away.

(In reply to Fabian Deutsch from comment #8)
> To give some context: In our use-case vdsm is writing ifcfg files and we use
> the legacy network scripts for bringing up the networking (by adding
> NM_CONTROLLED=no to each ifcfg).
> In general we do not want NM to manage any network device.
>
> However, we do need NM for monitoring and enumerating these connections,
> because on RHEV-H we are using Cockpit for administration, and Cockpit
> relies on NM for displaying network information.
>
> According to a small offline IRC discussion, unmanaged-devices=* can be used
> to prevent NM from touching devices even if the ifcfg goes away.

That sounds right. monitor-connection-files is still not advised and not necessary.

(In reply to Thomas Haller from comment #9)
> > However, we do need NM for monitoring and enumerating these connections,
> > because on RHEV-H we are using Cockpit for administration, and Cockpit
> > relies on NM for displaying network information.
> >
> > According to a small offline IRC discussion, unmanaged-devices=* can be used
> > to prevent NM from touching devices even if the ifcfg goes away.

You will only be able to display basic information in Cockpit (current throughput?), but not control the device if it is unmanaged by NM (but I guess it's OK since it was the same with NM_CONTROLLED=no).

Yes, it's okay that we cannot manage the devices; it's just important that Cockpit can still access NM for information retrieval.

Michael, can you please
1. Start cleanly
2. Add unmanaged-devices=* to /etc/NetworkManager/conf.d/90-vdsm-monitor-connection-files.conf
3. Restart NM (maybe enable TRACE as requested in comment 1)
4. Try to reproduce the bug

(In reply to Thomas Haller from comment #7)
>
> if you are using ifup to activate an ifcfg-rh file, initscripts will ask
> NetworkManager whether the file is managed by NetworkManager (as indicated
> by NM_CONTROLLED). Following that, ifup will either call `nmcli connection
> up` or proceed to activate the interface.
>
> When NM is contacted by ifup, it will automatically reload the file to make
> sure that its information is up-to-date. Thus, monitor-connection-files
> should not be necessary in this case.

But if NM already manages a device, ifup never tells NM that NM_CONTROLLED=no and that NM should stop managing it.

What is the proper way to tell NM "stop managing this device"?

Why did NM end up deleting a line from an ifcfg file that had NM_CONTROLLED=no?

(In reply to Dan Kenigsberg from comment #12)
> (In reply to Thomas Haller from comment #7)
> >
> > if you are using ifup to activate an ifcfg-rh file, initscripts will ask
> > NetworkManager whether the file is managed by NetworkManager (as indicated
> > by NM_CONTROLLED). Following that, ifup will either call `nmcli connection
> > up` or proceed to activate the interface.
> >
> > When NM is contacted by ifup, it will automatically reload the file to make
> > sure that its information is up-to-date. Thus, monitor-connection-files
> > should not be necessary in this case.
>
> But if NM already manages a device, ifup never tells NM that
> NM_CONTROLLED=no and that NM should stop managing it.

ifup first calls `nmcli connection load`. If the file then contains NM_CONTROLLED=no, the device should become unmanaged right away.

> What is the proper way to tell NM "stop managing this device"?

There are several. NM_CONTROLLED=no is a proper way.

> Why did NM end up deleting a line from an ifcfg file that had
> NM_CONTROLLED=no?

That should not happen.
There is no logfile attached to this bug that would show NetworkManager modifying the ifcfg file. Please provide a full logfile with TRACE level enabled.

I can't reproduce this report unfortunately, and I don't understand why (I reproduced it easily 2 times last week).

From journalctl you can see NetworkManager modifying the ifcfg file -->

Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

Sorry that I can't provide more logs at this point.

Created attachment 1167330 [details]
vdsm logs
But I do have the vdsm logs from the original report.
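To line up the "ifcfg-rh: remove" events with the vdsm logs, it can help to reduce the journal to just the timestamp, file path and UUID of each such line. The sketch below assumes the default short journal format shown in the quoted log line (a 15-character timestamp at the start of the line); the function name `list_ifcfg_removals` is hypothetical. The message itself only records that NM stopped tracking the file, so this does not show who removed it.

```shell
# Hypothetical helper: read journal lines on stdin and print
# "<timestamp> <ifcfg path> <uuid>" for every "ifcfg-rh: remove" message.
list_ifcfg_removals() {
    sed -n 's/^\(.\{15\}\) .*ifcfg-rh: remove \([^ ]*\) (\([0-9a-f-]*\),.*$/\1 \2 \3/p'
}
```

For example, `journalctl -u NetworkManager | list_ifcfg_removals` would print one line per event, which can then be compared against the timestamps in the vdsm logs.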
(In reply to Thomas Haller from comment #13)
> > But if NM already manages a device, ifup never tells NM that
> > NM_CONTROLLED=no and that NM should stop managing it.
>
> ifup first calls `nmcli connection load`. If the file then contains
> NM_CONTROLLED=no, the device should become unmanaged right away.

From what I see in network-functions, if NM_CONTROLLED=no, ifup does nothing at all, and the device stays managed by NM.

    if ! is_false $NM_CONTROLLED && is_nm_running; then
        nmcli con load "/etc/sysconfig/network-scripts/$CONFIG"
        UUID=$(get_uuid_by_config $CONFIG)
        [ -n "$UUID" ] && _use_nm=true
    fi

> > > What is the proper way to tell NM "stop managing this device"?
> >
> > There are several. NM_CONTROLLED=no is a proper way.

Would you be kind enough to suggest a proper way?

(In reply to Michael Burman from comment #14)
> I can't reproduce this report unfortunately, don't understand why.(i
> reproduced it easily 2 times last week).
>
> From journalctl you can see that NetworkManager modifying the ifcfg file -->
>
> Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]:
> <info> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

AFAIS, this merely says that NetworkManager noticed that the file disappeared, which happens with monitor-connection-files=yes. I don't think it means that NetworkManager was actively removing any files (I admit that this is not clear from the wording).

(In reply to Michael Burman from comment #15)
> Created attachment 1167330 [details]
> vdsm logs
>
> But i have vdsm logs though from the original report

Thank you, but here I don't see what is wrong.

(In reply to Dan Kenigsberg from comment #16)
> (In reply to Thomas Haller from comment #13)
> > > But if NM already manages a device, ifup never tells NM that
> > > NM_CONTROLLED=no and that NM should stop managing it.
> >
> > ifup first calls `nmcli connection load`.
> > If the file then contains NM_CONTROLLED=no, the device should become
> > unmanaged right away.
>
> from what I see in network-functions, if NM_CONTROLLED=no, ifup does nothing
> at all, and the device stays managed by NM.
>
> if ! is_false $NM_CONTROLLED && is_nm_running; then
>     nmcli con load "/etc/sysconfig/network-scripts/$CONFIG"
>     UUID=$(get_uuid_by_config $CONFIG)
>     [ -n "$UUID" ] && _use_nm=true
> fi

You are right. This seems to be a bug in initscripts. You cannot really work around this with monitor-connection-files, because then you have a race where initscripts may set up the interface, but NetworkManager only notices afterwards that the device should be unmanaged.

> > > What is the proper way to tell NM "stop managing this device"?
> >
> > There are several. NM_CONTROLLED=no is a proper way.
>
> Would you be kind to suggest a proper way?

Another way is via NetworkManager's configuration. Create a file /etc/NetworkManager/conf.d/vdsm-unmanage-all.conf with

[keyfile]
unmanaged-devices=*

(In reply to Thomas Haller from comment #18)
>
> [keyfile]
> unmanaged-devices=*

Isn't this a bit coarse? I'd like NM to stop managing a specific device. Do I need to maintain the list of unmanaged-devices, and restart NM whenever I change it?

(In reply to Dan Kenigsberg from comment #19)
> (In reply to Thomas Haller from comment #18)
> >
> > [keyfile]
> > unmanaged-devices=*
>
> isn't this a bit coarse? I'd like NM to stop managing a specific device, do
> I need to maintain the list of unmanaged-devices, and restart NM whenever I
> change it?

I opened bug 1345919, which fixes the issue so that NM_CONTROLLED=no will work as expected (with monitor-connection-files=no). So, that might be your best option.
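The behaviour of the quoted network-functions check can be illustrated with a small stand-alone sketch. The `is_false` below is a simplified stand-in (an assumption; the real initscripts helper may accept a different set of values), `ifup_branch_for` is a hypothetical name, and the `is_nm_running` half of the condition is left out so the sketch runs without NetworkManager.

```shell
# Simplified stand-in for the initscripts is_false helper (assumption).
is_false() {
    case "$1" in
        [nN][oO]|[nN]|[fF][aA][lL][sS][eE]|[fF]|0) return 0 ;;
    esac
    return 1
}

# Hypothetical helper: report which branch the quoted check would take
# for a given ifcfg file, without calling nmcli.
ifup_branch_for() {
    (
        NM_CONTROLLED=
        # ifcfg files are shell fragments, so they can be sourced directly.
        . "$1" || exit 2
        if ! is_false "$NM_CONTROLLED"; then
            echo "nmcli con load"   # NM is asked to re-read the file
        else
            echo "skip"             # NM is never told about the file
        fi
    )
}
```

With NM_CONTROLLED=no the function prints `skip`, which matches the observation above: the `nmcli con load` call sits inside the branch that is not taken, so NM is not asked to re-read the file.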
For completeness: in NetworkManager.conf, you can also select interfaces explicitly:

[keyfile]
unmanaged-devices=eth10

also with globbing:

unmanaged-devices=interface-name:eth*

also multiple entries per line:

unmanaged-devices=interface-name:eth*,wlan0

or multiple files in conf.d, extending the list with "+=":

unmanaged-devices+=interface-name:more

All described in `man NetworkManager.conf`.

But if the list is dynamic, this doesn't work well because, after changing such a file, you need to reload the configuration via `killall -SIGHUP NetworkManager`. Then you have a (small) race and you don't know when NM is finished reloading.

Alternatively, you can also drop udev files to /etc/udev/rules.d and set NM_UNMANAGED, followed by `udevadm control --reload-rules`.

/usr/lib/udev/rules.d/85-nm-unmanaged.rules

This works only for devices that are created after you added the rule.

(In reply to Beniamino Galvani from comment #4)
> > ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> > (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
> >
> > ovirtmgmt was removed from host by NetworkManager.
> >
> > Expected results:
> > NetworkManager shouldn't remove the management network from the host when the
> > NM_CONTROLLED=no set
>
> Since monitor-connection-files is enabled in NetworkManager.conf,
> NetworkManager recognizes that the file gets removed externally at
> 16:15:46, so it starts to manage the device. The file is re-added
> later at 16:15:51 and NM recognizes this too.

I believe that ifcfg-ovirtmgmt was removed by NM and not externally (though we don't have the TRACE proof of that). Could it be possible that NM also updated ifcfg-enp4s0 and dropped the BRIDGE= line from it (despite it always having had the NM_CONTROLLED=no line)?
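The per-interface `unmanaged-devices` variant described above could be generated with something like the following sketch. The function name `write_unmanaged_conf` and the drop-in file name in the usage note are hypothetical; the `[keyfile]` section, the `interface-name:` prefix and the comma-separated list come from the comment above.

```shell
# Hypothetical helper: write a conf.d drop-in that marks the given
# interface names (or globs) as unmanaged. Directory and file name are
# parameters so the sketch can be tried outside /etc.
write_unmanaged_conf() {
    local conf_dir=$1
    local name=$2
    shift 2

    local specs=
    local spec
    for spec in "$@"; do
        # Join entries with commas: interface-name:a,interface-name:b
        specs="${specs:+$specs,}interface-name:$spec"
    done

    printf '[keyfile]\nunmanaged-devices=%s\n' "$specs" > "$conf_dir/$name"
}
```

Usage might look like `write_unmanaged_conf /etc/NetworkManager/conf.d vdsm-unmanaged.conf 'eth*' wlan0` followed by `killall -SIGHUP NetworkManager`. As noted above, the reload is asynchronous, so this approach still leaves a small window when the list changes.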
(In reply to Dan Kenigsberg from comment #21)
> (In reply to Beniamino Galvani from comment #4)
> > > ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> > > (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
> > >
> > > ovirtmgmt was removed from host by NetworkManager.
> > >
> > > Expected results:
> > > NetworkManager shouldn't remove the management network from the host when the
> > > NM_CONTROLLED=no set
> >
> > Since monitor-connection-files is enabled in NetworkManager.conf,
> > NetworkManager recognizes that the file gets removed externally at
> > 16:15:46, so it starts to manage the device. The file is re-added
> > later at 16:15:51 and NM recognizes this too.
>
> I believe that ifcfg-ovirtmgmt was removed by NM and not externally (though
> we don't have the TRACE proof of that). Could it be possible that NM also
> updated ifcfg-enp4s0 and dropped the BRIDGE= line from it (despite it always
> had NM_CONTROLLED=no line)?

NetworkManager should not do that, and I don't see how that could happen. We'll need a logfile showing the misbehavior. Thanks.

I do not believe NetworkManager is causing this problem. I am getting this same problem. My interface files disappear right after this line in my logs:

Jun 23 11:52:08 chamber-vmhead-01 vdsmd_init_common.sh: vdsm: Running restore_nets

I tracked it down to /usr/share/vdsm/vdsm-restore-net-config
I commented out lines 56 and 57:

setupNetworks(removeNetworks, removeBonds, connectivityCheck=False,

This seems like another bug that I do not have time to log ATM. I am running out of patience with ovirt and vdsm and am considering migrating to something else. I'll give ovirt 4.0 a shot, but if that doesn't work, I'll switch to something like ProxMox.

Michael, vdsm has its fair share of network restoration bugs. But this specific bug speaks about modifications of ifcfg files which vdsm never does.
I would appreciate it if you could share your supervdsm.log with users (CC me) so we can debug your issue constructively.

Thank you for the attention, Dan. I ended up buying a set of RHEV licences for these systems and installed RHEL. Unfortunately, I no longer have the supervdsm.log from that problem install.

Lowering severity, since we have not reproduced the bug since we dropped monitor-connection-files.

It seems the issue cannot be reproduced. Closing after offline discussion with Dan. (Please reopen if you think there is something to do.)

Noticing the same issue on "Red Hat Virtualization Host 4.1 (el7.4)".

Here is a log snippet from /var/log/messages:

Nov 15 13:10:27 kvm114 NetworkManager[1908]: <info> [1510780227.9405] ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

On host reboot ifcfg-ovirtmgmt gets removed by NetworkManager; I have seen 2-3 instances of this on my cluster.

(In reply to deepak from comment #28)
> Noticing same issue on "Red Hat Virtualization Host 4.1 (el7.4)"

There wasn't enough information on this bug to understand why it happened. If you think you see this issue, it would be helpful to provide new information.

> Her is log snippet from /var/log/messages:
>
> Nov 15 13:10:27 kvm114 NetworkManager[1908]: <info> [1510780227.9405]
> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

This message does not indicate that NM actively deletes the file. It is logged when NM was tracking the ifcfg-rh file previously, and after reload from disk, the file was gone (causing NM to forget about it). Whether NM actively deleted the file is not indicated (or contra-indicated) by this message alone.

> On host reboot ifcfg-ovirtmgmt gets removed by Network manager, seen 2-3
> instances on this on my cluster.
Please enable level=TRACE logging (see https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf ), and attach a logfile. Thanks.

Created attachment 1354442 [details]
messages file after TRACE is enabled for NetworkManager
/var/log/messages on the host after ifcfg-ovirtmgmt gone missing
Created attachment 1354443 [details]
output of 'journalctl -u NetworkManager' after enabling TRACE level
Please find attached output of 'journalctl -u NetworkManager' after enabling TRACE level of NetworkManager.
Thanks,
Deepak
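For anyone else asked to collect the same data, enabling TRACE logging can be scripted along these lines. This is only a sketch: it writes a drop-in instead of editing NetworkManager.conf directly, and the function name `enable_nm_trace` and the file name `95-trace-logging.conf` are hypothetical. The `[logging]` section and `level=TRACE` are as requested in comment 1.

```shell
# Hypothetical helper: drop a TRACE logging snippet into a NetworkManager
# conf.d directory (defaults to the system one).
enable_nm_trace() {
    local conf_dir=${1:-/etc/NetworkManager/conf.d}
    printf '[logging]\nlevel=TRACE\n' > "$conf_dir/95-trace-logging.conf"
}
```

After calling `enable_nm_trace`, NetworkManager has to be restarted and the problem reproduced; the log can then be captured with `journalctl -u NetworkManager` as in the attachment above.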
Just for reference, here is the timestamp when it went missing: Nov 17 16:32:38 kvm113 network: grep: ifcfg-ovirtmgmt: No such file or directory |