Bug 1344411 - NetworkManager removes ifcfg-ovirtmgmt after reboot although it was set to NM_CONTROLLED=no
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: NetworkManager
Version: 7.2
Hardware: x86_64
OS: Other
Priority: urgent
Severity: low
Target Milestone: rc
Target Release: 7.3
Assignee: Thomas Haller
QA Contact: Desktop QE
URL:
Whiteboard:
Depends On: 1345919 1347958
Blocks: 1304509 vdsm_config_NetworkMgr_to_be_passive 1330144
Reported: 2016-06-09 15:33 UTC by Michael Burman
Modified: 2017-11-18 01:07 UTC
CC List: 14 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1345919 (view as bug list)
Environment:
Last Closed: 2016-11-03 15:40:57 UTC
Target Upstream Version:
Embargoed:
thaller: needinfo-


Attachments
vdsm logs (79.95 KB, application/x-gzip)
2016-06-13 09:08 UTC, Michael Burman
messages file after TRACE is enabled for NetworkManager (13.87 MB, text/plain)
2017-11-18 00:56 UTC, deepak
output of 'journalctl -u NetworkManager' after enabling TRACE level (466.14 KB, text/x-vhdl)
2017-11-18 01:00 UTC, deepak

Description Michael Burman 2016-06-09 15:33:53 UTC
Description of problem:
NetworkManager removes ifcfg-ovirtmgmt after reboot although it was set to NM_CONTROLLED=no 

NetworkManager removes the management network from the host after reboot and leaves the host without an IP address; vdsmd is not running, and restore-nets failed.

journalctl --> 

Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com kernel: ovirtmgmt: port 1(enp4s0) entered disabled state
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  (ovirtmgmt): link disconnected
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com kernel: device enp4s0 left promiscuous mode
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com kernel: ovirtmgmt: port 1(enp4s0) entered disabled state
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  (ovirtmgmt): bridge port enp4s0 was detached
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  (enp4s0): released from master ovirtmgmt
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <warn>  (enp4s0): failed to disable userspace IPv6LL address handling
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  (enp4s0): new Ethernet device (carrier: OFF, driver: 'bnx2', ifindex: 2)
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  ifcfg-rh: update /etc/sysconfig/network-scripts/ifcfg-enp4s0 (b325fd44-30b3-c744-3fc9-e154b78e8c82,"System enp4s0")
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com systemd[1]: Started /usr/sbin/ifup enp4s0.
-- Subject: Unit 0221356d-09a8-4f41-b55e-a14f4718ec59.scope has finished start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit 0221356d-09a8-4f41-b55e-a14f4718ec59.scope has finished starting up.
-- 
-- The start-up result is done.
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com systemd[1]: Starting /usr/sbin/ifup enp4s0.
-- Subject: Unit 0221356d-09a8-4f41-b55e-a14f4718ec59.scope has begun start-up
-- Defined-By: systemd
-- Support: http://lists.freedesktop.org/mailman/listinfo/systemd-devel
-- 
-- Unit 0221356d-09a8-4f41-b55e-a14f4718ec59.scope has begun starting up.
Jun 09 16:15:45 orchid-vds2.qa.lab.tlv.redhat.com kernel: bnx2 0000:04:00.0: irq 28 for MSI/MSI-X
Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com kernel: bnx2 0000:04:00.0 enp4s0: using MSI
Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com kernel: IPv6: ADDRCONF(NETDEV_UP): enp4s0: link is not ready
Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")


Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com nm-dispatcher[3685]: Dispatching action 'up' for ovirtmgmt
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com systemd[1]: Unit iscsi.service cannot be reloaded because it is inactive.
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3684]: DHCPREQUEST on ovirtmgmt to 255.255.255.255 port 67 (xid=0x1b3f959d)
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  ifcfg-rh: new connection /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <warn>  ifcfg-rh: Ignoring connection /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  (ovirtmgmt): device state change: activated -> unmanaged (reason 'unmanaged') [100 10 3]
Jun 09 16:15:51 orchid-vds2.qa.lab.tlv.redhat.com dhclient[3684]: receive_packet failed on ovirtmgmt: Network is down


vdsm.log -->

restore-net::ERROR::2016-06-09 16:18:09,958::__init__::54::root::(__exit__) Failed rollback transaction last known good network.
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/vdsm/network/netswitch.py", line 130, in _setup_legacy
    bondings, _netinfo)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 471, in add_missing_networks
    _netinfo=_netinfo, **attrs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 180, in wrapped
    return func(network, configurator, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/network/legacy_switch.py", line 250, in _add_network
    net_ent_to_configure.configure(**options)
  File "/usr/lib/python2.7/site-packages/vdsm/network/models.py", line 186, in configure
    self.configurator.configureBridge(self, **opts)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 111, in configureBridge
    _ifup(bridge)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 846, in _ifup
    _exec_ifup(iface, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 805, in _exec_ifup
    _exec_ifup_by_name(iface.name, cgroup)
  File "/usr/lib/python2.7/site-packages/vdsm/network/configurators/ifcfg.py", line 791, in _exec_ifup_by_name
    raise ConfigNetworkError(ERR_FAILED_IFUP, out[-1] if out else '')
ConfigNetworkError: (29, 'Determining IPv6 information for ovirtmgmt... failed.')


- :::::::::::::
/etc/NetworkManager/conf.d/00-server.conf
::::::::::::::
# This configuration file, when placed into into
# /etc/NetworkManager/conf.d changes NetworkManager's behavior to
# what's expected on "traditional UNIX server" type deployments.
#
# See "man NetworkManager.conf" for more information about these
# and other keys.
#
# Do not edit this file; it will be overwritten on upgrades. If you
# want to override the values here, or set additional values, you can
# do so by adding another file (eg, "99-local.conf") to this directory
# and setting keys there.

[main]
# Do not do automatic (DHCP/SLAAC) configuration on ethernet devices
# with no other matching connections.
no-auto-default=*

# Ignore the carrier (cable plugged in) state when attempting to
# activate static-IP connections.
ignore-carrier=*
::::::::::::::
/etc/NetworkManager/conf.d/10-ibft-plugin.conf
::::::::::::::
# This file enables the standalone 'iBFT' settings plugin to read
# iBFT information with iscsiadm and create connections from that
# data.
#
# Do not edit this file; it will be overwritten on upgrades. If you
# want to override the values here, or set additional values, you can
# do so by adding another file (eg, "99-local.conf") to this directory
# and setting keys there.

[main]
plugins+=ibft
::::::::::::::
/etc/NetworkManager/conf.d/90-vdsm-monitor-connection-files.conf
::::::::::::::
# This file is necessary to let VDSM properly consume connections owned by
# NetworkManager (to make it unmanage them), primarily on ifcfg systems.

[main]
monitor-connection-files=true

- cat /etc/NetworkManager/NetworkManager.conf 
# Configuration file for NetworkManager.
#
# See "man 5 NetworkManager.conf" for details.
#
# The directory /etc/NetworkManager/conf.d/ can contain additional configuration
# snippets. Those snippets override the settings from this main file.
#
# The files within conf.d/ directory are read in asciibetical order.
#
# If two files define the same key, the one that is read afterwards will overwrite
# the previous one.

[main]
plugins=ifcfg-rh

[logging]
#level=DEBUG
#domains=ALL

- ifcfg-ovirtmgmt before reboot : 
cat /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt 
# Generated by VDSM version 4.18.1-11.gita92976e.el7ev
DEVICE=ovirtmgmt
TYPE=Bridge
DELAY=0
STP=off
ONBOOT=yes
BOOTPROTO=dhcp
MTU=1500
DEFROUTE=yes
NM_CONTROLLED=no
IPV6INIT=no

cat /etc/sysconfig/network-scripts/ifcfg-enp4s0 
# Generated by VDSM version 4.18.1-11.gita92976e.el7ev
DEVICE=enp4s0
BRIDGE=ovirtmgmt
ONBOOT=yes
MTU=1500
NM_CONTROLLED=no
IPV6INIT=no



Version-Release number of selected component (if applicable):
Red Hat Enterprise Linux release 7.2 Technology Preview
rhevh7-ng-4.0-0.20160607.0+1
libvirt-daemon-1.2.17-13.el7_2.4.x86_64
vdsm-4.18.1-11.gita92976e.el7ev.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install rhevh-ng on latest 4.0 rhevm engine (ovirtmgmt created over a NIC)
2. Attach an additional network to the host on a second NIC (optional)
3. Reboot server

Actual results:
ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

ovirtmgmt was removed from host by NetworkManager.

Expected results:
NetworkManager shouldn't remove the management network from the host when
NM_CONTROLLED=no is set

- This was discovered on rhevh-ng, but we believe that this will happen on RHEL 7 as well (4 + 3.z)

Comment 1 Thomas Haller 2016-06-09 15:46:29 UTC
Could you please enable TRACE logging, reproduce the problem, and attach the entire logfile?

Edit /etc/NetworkManager/NetworkManager.conf and add

[logging]
level=TRACE



Thank you.
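
For anyone who prefers to generate the snippet rather than edit the file by hand, here is a minimal sketch that renders the [logging] section requested above. The function name is hypothetical, and where the output goes (NetworkManager.conf itself or a drop-in under /etc/NetworkManager/conf.d/) is left to the caller.

```python
import configparser
import io


def render_trace_logging_conf():
    """Return the text of a [logging] section that sets level=TRACE.

    Hypothetical helper for illustration; it only builds the text and
    does not touch any file under /etc/NetworkManager.
    """
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key spelling exactly as written
    parser["logging"] = {"level": "TRACE"}
    buf = io.StringIO()
    # Write "level=TRACE" rather than "level = TRACE" to match the comment above.
    parser.write(buf, space_around_delimiters=False)
    return buf.getvalue()


if __name__ == "__main__":
    print(render_trace_logging_conf(), end="")
```

NetworkManager has to be restarted after the change before the new log level applies to a fresh reproduction attempt.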

Comment 4 Beniamino Galvani 2016-06-10 11:49:52 UTC
(In reply to Michael Burman from comment #0)

> Steps to Reproduce:
> 1. Install rhevh-ng on latest 4.0 rhevm engine (ovirtmgmt created over a NIC)
> 2. Attach an additional network to the host on a second NIC (optional)
> 3. Reboot server
>
> Actual results:
> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
>
> ovirtmgmt was removed from host by NetworkManager.
>
> Expected results:
> NetworkManager shouldn't remove the management network from the host when
> NM_CONTROLLED=no is set

Since monitor-connection-files is enabled in NetworkManager.conf,
NetworkManager recognizes that the file gets removed externally at
16:15:46, so it starts to manage the device. The file is re-added
later at 16:15:51 and NM recognizes this too.

Do you really need to set monitor-connection-files=yes? Note that this
is disabled by default because it can cause race conditions (e.g. many
text editors delete the file and re-create them when saving, and so NM
would see the deletion event and start to manage the device for a
short period).

A better solution is to disable monitor-connection-files and
explicitly call "nmcli connection reload" when you want NM to pick up
changes to connection files.

Can you attach debug logs as explained in comment 1?
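
The reload-on-demand approach suggested above could look roughly like the following sketch. It is not vdsm code: the helper names, the temporary-file prefix, and the injectable `run` parameter are assumptions made for illustration. The only external command used is `nmcli connection reload`, as named in this comment. Writing through a temporary file and renaming it avoids the delete-and-recreate window described above.

```python
import os
import subprocess
import tempfile


def write_ifcfg_atomically(path, lines):
    """Write an ifcfg file via a temporary file plus rename.

    The target path is replaced in one step, so it never disappears
    from the directory while its content is being updated.
    """
    directory = os.path.dirname(path) or "."
    # Hypothetical prefix; it starts with "." so it is not named ifcfg-*.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".ifcfg-tmp-")
    try:
        with os.fdopen(fd, "w") as tmp:
            tmp.write("\n".join(lines) + "\n")
        os.replace(tmp_path, path)  # atomic within one POSIX filesystem
    except BaseException:
        os.unlink(tmp_path)
        raise


def reload_nm_connections(run=subprocess.check_call):
    """Ask NetworkManager to re-read its connection files from disk.

    `run` is injectable so the sketch can be exercised without nmcli.
    """
    run(["nmcli", "connection", "reload"])
```

A caller would write all changed ifcfg files first and then call `reload_nm_connections()` once, so NM sees the final state rather than intermediate ones.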

Comment 5 Fabian Deutsch 2016-06-10 12:43:01 UTC
Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive when the connection file is removed, because we also set no-auto-default=*?


One note, though: Dan already saw this problem coming in the commit which introduced monitor-connection-files:

vdsm:57617fe62ac797d02b9a19b216b674d9f4f2c7c3

    Please note that this approach is potentially raceful. Under
    high loads, it is theoretically possible that NM learns of
    a changed file too late. To be sure, we should probably call
    to 'nmcli connection load' synchronously, just before running
    'ifup' on a given device.

I am currently trying to reproduce this on RHEL to provide the trace information.

Comment 6 Beniamino Galvani 2016-06-10 13:04:03 UTC
(In reply to Fabian Deutsch from comment #5)
> Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive
> when the connection file is removed, because we also set
> no-auto-default=*?

no-auto-default=* only tells NM not to create a default DHCP
connection for the device in absence of other on-disk connections.

But when the file with NM_CONTROLLED=no gets removed, there is nothing
preventing NM from managing the device, and so it will try to activate
existing connections. If none exist, the device will stay in
'disconnected' state without any address.

Comment 7 Thomas Haller 2016-06-10 13:07:43 UTC
(In reply to Fabian Deutsch from comment #5)
> Beniamino, even with monitor-connection-files=yes, shouldn't NM stay passive
> when the connection file is removed, because we also set
> no-auto-default=*?
> 
> 
> One note, though: Dan already saw this problem coming in the commit which
> introduced monitor-connection-files:
> 
> vdsm:57617fe62ac797d02b9a19b216b674d9f4f2c7c3
> 
>     Please note that this approach is potentially raceful. Under
>     high loads, it is theoretically possible that NM learns of
>     a changed file too late. To be sure, we should probably call
>     to 'nmcli connection load' synchronously, just before running
>     'ifup' on a given device.
> 

If you are using ifup to activate an ifcfg-rh file, initscripts will ask NetworkManager whether the file is managed by NetworkManager (as indicated by NM_CONTROLLED). Following that, ifup will either call `nmcli connection up` or proceed to activate the interface itself.

When NM is contacted by ifup, it will automatically reload the file to make sure that its information is up to date. Thus, monitor-connection-files should not be necessary in this case.

Comment 8 Fabian Deutsch 2016-06-10 13:40:35 UTC
To give some context: In our use-case vdsm is writing ifcfg files and we use the legacy network scripts for bringing up the networking (by adding NM_CONTROLLED=no to each ifcfg).
In general we do not want NM to manage any network device.

However, we do need NM for monitoring and enumerating these connections, because on RHEV-H we are using Cockpit for administration, and Cockpit relies on NM for displaying network information.

According to a small offline IRC discussion, unmanaged-devices=* can be used to prevent NM from touching devices even if the ifcfg goes away.

Comment 9 Thomas Haller 2016-06-10 13:51:06 UTC
(In reply to Fabian Deutsch from comment #8)
> To give some context: In our use-case vdsm is writing ifcfg files and we use
> the legacy network scripts for bringing up the networking (by adding
> NM_CONTROLLED=no to each ifcfg).
> In general we do not want NM to manage any network device.
> 
> However, we do need NM for monitoring and enumerating these connections,
> because on RHEV-H we are using Cockpit for administration, and Cockpit
> relies on NM for displaying network information.
> 
> According to a small offline IRC discussion, unmanaged-devices=* can be used
> to prevent NM from touching devices even if the ifcfg goes away.

That sounds right.

monitor-connection-files is still not advised and not necessary.

Comment 10 Beniamino Galvani 2016-06-10 14:04:50 UTC
(In reply to Thomas Haller from comment #9)

> > However, we do need NM for monitoring and enumerating these connections,
> > because on RHEV-H we are using Cockpit for administration, and Cockpit
> > relies on NM for displaying network information.
> > 
> > According to a small offline IRC discussion, unmanaged-devices=* can be used
> > to prevent NM from touching devices even if the ifcfg goes away.

You will only be able to display basic information in cockpit (current throughput?), but not control the device if it is unmanaged by NM (but I guess it's ok since it was the same with NM_CONTROLLED=no).

Comment 11 Fabian Deutsch 2016-06-10 14:42:49 UTC
Yes, it's okay that we can not manage the devices, it's just important that Cockpit can still access NM for information retrieval.

Michael, can you please
1. start cleanly
2. to /etc/NetworkManager/conf.d/90-vdsm-monitor-connection-files.conf add
   unmanaged-devices=*
3. Restart NM (maybe enable TRACE as requested in comment 1)
4. Try to reproduce the bug

Comment 12 Dan Kenigsberg 2016-06-13 08:13:22 UTC
(In reply to Thomas Haller from comment #7)
> 
> If you are using ifup to activate an ifcfg-rh file, initscripts will ask
> NetworkManager whether the file is managed by NetworkManager (as indicated
> by NM_CONTROLLED). Following that, ifup will either call `nmcli connection
> up` or proceed to activate the interface.
> 
> When NM is contacted by ifup, it will automatically reload the file to make
> sure that its information is up-to-date. Thus, monitor-connection-files
> should not be necessary in this case.

But if NM already manages a device, ifup never tells NM that NM_CONTROLLED=no and that NM should stop managing it.

What is the proper way to tell NM "stop managing this device"?

Why did NM end up deleting a line from an ifcfg file that had NM_CONTROLLED=no?

Comment 13 Thomas Haller 2016-06-13 08:21:22 UTC
(In reply to Dan Kenigsberg from comment #12)
> (In reply to Thomas Haller from comment #7)
> > 
> > If you are using ifup to activate an ifcfg-rh file, initscripts will ask
> > NetworkManager whether the file is managed by NetworkManager (as indicated
> > by NM_CONTROLLED). Following that, ifup will either call `nmcli connection
> > up` or proceed to activate the interface.
> > 
> > When NM is contacted by ifup, it will automatically reload the file to make
> > sure that its information is up-to-date. Thus, monitor-connection-files
> > should not be necessary in this case.
> 
> But if NM already manages a device, ifup never tells NM that
> NM_CONTROLLED=no and that NM should stop managing it.

ifup first calls `nmcli connection load`. If the file then contains NM_CONTROLLED=no, the device should become unmanaged right away.

> What is the proper way to tell NM "stop managing this device"?

There are several. NM_CONTROLLED=no is a proper way.


> Why did NM end up deleting a line from an ifcfg file that had
> NM_CONTROLLED=no?

That should not happen. There is no logfile attached to this bug that would show NetworkManager modifying the ifcfg file. Please provide a full logfile with TRACE level enabled.

Comment 14 Michael Burman 2016-06-13 08:44:49 UTC
Unfortunately, I can't reproduce this report, and I don't understand why (I reproduced it easily twice last week).

From journalctl you can see NetworkManager modifying the ifcfg file -->

Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]: <info>  ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

Sorry that I can't provide more logs at this point.

Comment 15 Michael Burman 2016-06-13 09:08:09 UTC
Created attachment 1167330 [details]
vdsm logs

I do have the vdsm logs from the original report, though.

Comment 16 Dan Kenigsberg 2016-06-13 10:15:00 UTC
(In reply to Thomas Haller from comment #13)

> > But if NM already manages a device, ifup never tells NM that
> > NM_CONTROLLED=no and that NM should stop managing it.
> 
> ifup first calls `nmcli connection load`. If the file then contains
> NM_CONTROLLED=no, the device should become unmanaged right away.

From what I see in network-functions, if NM_CONTROLLED=no, ifup does nothing at all, and the device stays managed by NM.

    if ! is_false $NM_CONTROLLED && is_nm_running; then
        nmcli con load "/etc/sysconfig/network-scripts/$CONFIG"
        UUID=$(get_uuid_by_config $CONFIG)
        [ -n "$UUID" ] && _use_nm=true
    fi

> 
> > What is the proper way to tell NM "stop managing this device"?
> 
> There are several. NM_CONTROLLED=no is a proper way.

Would you be so kind as to suggest a proper way?
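
To make the point about the network-functions guard concrete, here is a small Python model of that condition. It is only a sketch: the set of strings treated as false is an assumption about what the initscripts `is_false` helper accepts, and `ifup_loads_into_nm` is a hypothetical name.

```python
# Strings assumed to count as "false"; the real is_false helper in
# initscripts may accept a slightly different set.
ASSUMED_FALSE_VALUES = {"f", "n", "no", "off", "false", "0"}


def is_false(value):
    """Approximate the shell is_false helper (case-insensitive match)."""
    return value.strip().lower() in ASSUMED_FALSE_VALUES


def ifup_loads_into_nm(nm_controlled, nm_running):
    """Model `if ! is_false $NM_CONTROLLED && is_nm_running; then ...`.

    Returns True when ifup would run `nmcli con load` for the file.
    """
    return (not is_false(nm_controlled)) and nm_running


if __name__ == "__main__":
    for value in ("no", "yes", ""):
        print(repr(value), ifup_loads_into_nm(value, nm_running=True))
```

With NM_CONTROLLED=no the function returns False, which is the case described above: the file is never loaded into NM by ifup, so NM is not told to unmanage the device.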

Comment 17 Thomas Haller 2016-06-13 12:14:03 UTC
(In reply to Michael Burman from comment #14)
> Unfortunately, I can't reproduce this report, and I don't understand why (I
> reproduced it easily twice last week).
> 
> From journalctl you can see NetworkManager modifying the ifcfg file -->
> 
> Jun 09 16:15:46 orchid-vds2.qa.lab.tlv.redhat.com NetworkManager[853]:
> <info>  ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

AFAIS, this merely says that NetworkManager noticed that the file disappeared. Which happens with monitor-connection-files=yes.
I don't think it means that NetworkManager was actively removing any files (I admit, that is not clear from this wording).



(In reply to Michael Burman from comment #15)
> Created attachment 1167330 [details]
> vdsm logs
> 
> I do have the vdsm logs from the original report, though.

Thank you, but here I don't see what is wrong.

Comment 18 Thomas Haller 2016-06-13 12:18:18 UTC
(In reply to Dan Kenigsberg from comment #16)
> (In reply to Thomas Haller from comment #13)
> 
> > > But if NM already manages a device, ifup never tells NM that
> > > NM_CONTROLLED=no and that NM should stop managing it.
> > 
> > ifup first calls `nmcli connection load`. If the file then contains
> > NM_CONTROLLED=no, the device should become unmanaged right away.
> 
> From what I see in network-functions, if NM_CONTROLLED=no, ifup does nothing
> at all, and the device stays managed by NM.
> 
>     if ! is_false $NM_CONTROLLED && is_nm_running; then
>         nmcli con load "/etc/sysconfig/network-scripts/$CONFIG"
>         UUID=$(get_uuid_by_config $CONFIG)
>         [ -n "$UUID" ] && _use_nm=true
>     fi

You are right. This seems to be a bug in initscripts.

You cannot really work around this with monitor-connection-files, because then you have a race where initscripts may set up the interface, but NetworkManager only notices afterwards that the device should be unmanaged.


> > > What is the proper way to tell NM "stop managing this device"?
> > 
> > There are several. NM_CONTROLLED=no is a proper way.
> 
> > Would you be so kind as to suggest a proper way?

Another way is via NetworkManager's configuration. Create a file
/etc/NetworkManager/conf.d/vdsm-unmanage-all.conf
with

[keyfile]
unmanaged-devices=*

Comment 19 Dan Kenigsberg 2016-06-13 14:39:52 UTC
(In reply to Thomas Haller from comment #18)
> 
> [keyfile]
> unmanaged-devices=*

Isn't this a bit coarse? I'd like NM to stop managing a specific device. Do I need to maintain the list of unmanaged-devices, and restart NM whenever I change it?

Comment 20 Thomas Haller 2016-06-13 15:27:24 UTC
(In reply to Dan Kenigsberg from comment #19)
> (In reply to Thomas Haller from comment #18)
> > 
> > [keyfile]
> > unmanaged-devices=*
> 
> Isn't this a bit coarse? I'd like NM to stop managing a specific device. Do
> I need to maintain the list of unmanaged-devices, and restart NM whenever I
> change it?

I opened bug 1345919 to fix the issue, so that NM_CONTROLLED=no will work as expected (with monitor-connection-files=no). That might be your best option.

For completeness, in NetworkManager.conf you can also select interfaces explicitly:
  [keyfile]
  unmanaged-devices=eth10
also with globbing:
  unmanaged-devices=interface-name:eth*
also multiple entries per line:
  unmanaged-devices=interface-name:eth*,wlan0
or multiple files in conf.d and extend the list with "+="
  unmanaged-devices+=interface-name:more
All described in `man NetworkManager.conf`.

But if the list is dynamic, this doesn't work well because, after changing such a file, you need to reload the configuration via `killall -SIGHUP NetworkManager`. Then you have a (small) race and you don't know when NM is finished reloading.
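
As a rough illustration of how the unmanaged-devices values listed above select interfaces, here is a deliberately simplified Python model. It is not NetworkManager's matcher: it only handles `*`, bare interface names, and `interface-name:` patterns separated by commas, and all function names are hypothetical. `man NetworkManager.conf` remains the reference for the real syntax.

```python
from fnmatch import fnmatchcase

PREFIX = "interface-name:"


def parse_unmanaged_devices(value):
    """Split a comma-separated unmanaged-devices value into specs."""
    return [spec.strip() for spec in value.split(",") if spec.strip()]


def spec_matches(spec, ifname):
    """Check one spec against an interface name (simplified model)."""
    if spec == "*":
        return True
    if spec.startswith(PREFIX):
        # Glob patterns are only modelled for the prefixed form.
        return fnmatchcase(ifname, spec[len(PREFIX):])
    return spec == ifname  # bare name: exact, case-sensitive match


def is_unmanaged(value, ifname):
    """Return True if any spec in the value selects the interface."""
    return any(spec_matches(s, ifname) for s in parse_unmanaged_devices(value))


if __name__ == "__main__":
    value = "interface-name:eth*,wlan0"
    for name in ("eth10", "wlan0", "ovirtmgmt"):
        print(name, is_unmanaged(value, name))
```

A check like this can help when reviewing a generated list before signalling NM to reload, but it does not remove the reload race described above.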



Alternatively, you can also drop udev rule files into /etc/udev/rules.d that set NM_UNMANAGED, followed by `udevadm control --reload-rules`. See /usr/lib/udev/rules.d/85-nm-unmanaged.rules for an example.
This works only for devices that are created after you added the rule.

Comment 21 Dan Kenigsberg 2016-06-16 09:23:43 UTC
(In reply to Beniamino Galvani from comment #4)

> > ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> > (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
> >
> > ovirtmgmt was removed from host by NetworkManager.
> >
> > Expected results:
> > NetworkManager shouldn't remove the management network from the host when
> > NM_CONTROLLED=no is set
> 
> Since monitor-connection-files is enabled in NetworkManager.conf,
> NetworkManager recognizes that the file gets removed externally at
> 16:15:46, so it starts to manage the device. The file is re-added
> later at 16:15:51 and NM recognizes this too.

I believe that ifcfg-ovirtmgmt was removed by NM and not externally (though we don't have the TRACE proof of that). Could it be possible that NM also updated ifcfg-enp4s0 and dropped the BRIDGE= line from it (even though it always had the NM_CONTROLLED=no line)?

Comment 22 Thomas Haller 2016-06-16 17:22:50 UTC
(In reply to Dan Kenigsberg from comment #21)
> (In reply to Beniamino Galvani from comment #4)
> 
> > > ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> > > (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")
> > >
> > > ovirtmgmt was removed from host by NetworkManager.
> > >
> > > Expected results:
> > > NetworkManager shouldn't remove the management network from the host when
> > > NM_CONTROLLED=no is set
> > 
> > Since monitor-connection-files is enabled in NetworkManager.conf,
> > NetworkManager recognizes that the file gets removed externally at
> > 16:15:46, so it starts to manage the device. The file is re-added
> > later at 16:15:51 and NM recognizes this too.
> 
> I believe that ifcfg-ovirtmgmt was removed by NM and not externally (though
> we don't have the TRACE proof of that). Could it be possible that NM also
> updated ifcfg-enp4s0 and dropped the BRIDGE= line from it (even though it
> always had the NM_CONTROLLED=no line)?

NetworkManager should not do that, and I don't see how that could happen.
We'll need a logfile showing the misbehavior. Thanks.

Comment 23 Michael Gregg 2016-06-23 19:21:34 UTC
I do not believe NetworkManager is causing this problem.

I am getting this same problem. My interface files disappear right after this line in my logs:

Jun 23 11:52:08 chamber-vmhead-01 vdsmd_init_common.sh: vdsm: Running restore_nets

I tracked it down to /usr/share/vdsm/vdsm-restore-net-config

I commented out lines 56 and 57:

setupNetworks(removeNetworks, removeBonds, connectivityCheck=False,

This seems like another bug, which I do not have time to log at the moment. I am running out of patience with oVirt and vdsm, and I am considering migrating to something else.

I'll give oVirt 4.0 a shot, but if that doesn't work, I'll switch to something like Proxmox.

Comment 24 Dan Kenigsberg 2016-06-26 07:16:28 UTC
Michael, vdsm has its fair share of network restoration bugs. But this specific bug is about modifications of ifcfg files that vdsm never makes.

I would appreciate it if you could share your supervdsm.log with users (CC me) so we can debug your issue constructively.

Comment 25 Michael Gregg 2016-08-01 16:27:03 UTC
Thank you for the attention Dan. 

I ended up buying a set of RHEV licences for these systems and installed RHEL. 

Unfortunately, I no longer have the supervdsm.log from that problem install.

Comment 26 Dan Kenigsberg 2016-10-06 12:56:18 UTC
Lowering severity, since we have not reproduced the bug since we dropped monitor-connection-files.

Comment 27 Thomas Haller 2016-11-03 15:40:57 UTC
It seems the issue cannot be reproduced.

Closing after offline discussion with Dan.

(please reopen if you think there is something to do).

Comment 28 deepak 2017-11-16 22:43:51 UTC
I am noticing the same issue on "Red Hat Virtualization Host 4.1 (el7.4)".

Here is a log snippet from /var/log/messages:

Nov 15 13:10:27 kvm114 NetworkManager[1908]: <info>  [1510780227.9405] ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

On host reboot, ifcfg-ovirtmgmt gets removed by NetworkManager; I have seen 2-3 instances of this on my cluster.

Comment 29 Thomas Haller 2017-11-17 09:18:25 UTC
(In reply to deepak from comment #28)
> I am noticing the same issue on "Red Hat Virtualization Host 4.1 (el7.4)".

There wasn't enough information on this bug to understand why it happened. If you think you see this issue, it would be helpful to provide new information.

> Here is a log snippet from /var/log/messages:
> 
> Nov 15 13:10:27 kvm114 NetworkManager[1908]: <info>  [1510780227.9405]
> ifcfg-rh: remove /etc/sysconfig/network-scripts/ifcfg-ovirtmgmt
> (9a0b07c0-2983-fe97-ec7f-ad2b51c3a3f0,"System ovirtmgmt")

This message does not indicate that NM actively deletes the file. It is logged when NM was tracking the ifcfg-rh file previously, and after reload from disk, the file was gone (causing NM to forget about it).
Whether NM actively deleted the file is not indicated (or contra-indicated) by this message alone.
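
The bookkeeping described above can be pictured with the following sketch. It is an illustration only, not NetworkManager's implementation, and the function name is hypothetical: a reload compares the set of files that were tracked earlier with what is on disk now, and every tracked file that is missing is reported as removed, regardless of who deleted it.

```python
import os


def diff_tracked_connections(tracked, directory):
    """Compare previously tracked ifcfg paths with the directory content.

    Returns (removed, added): paths that were tracked but are gone from
    disk, and ifcfg-* paths on disk that were not tracked before. Nothing
    is deleted here; the function only observes the current state.
    """
    on_disk = {
        os.path.join(directory, name)
        for name in os.listdir(directory)
        if name.startswith("ifcfg-")
    }
    tracked = set(tracked)
    removed = sorted(tracked - on_disk)
    added = sorted(on_disk - tracked)
    return removed, added
```

In this model, a "remove" entry for ifcfg-ovirtmgmt only means the path was in `tracked` and is absent from disk at reload time.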

> On host reboot, ifcfg-ovirtmgmt gets removed by NetworkManager; I have seen
> 2-3 instances of this on my cluster.

Please enable level=TRACE logging (see https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf ), and attach a logfile. Thanks.

Comment 30 deepak 2017-11-18 00:56:43 UTC
Created attachment 1354442 [details]
messages file after TRACE is enabled for NetworkManager

/var/log/messages on the host after ifcfg-ovirtmgmt went missing

Comment 31 deepak 2017-11-18 01:00:04 UTC
Created attachment 1354443 [details]
output of 'journalctl -u NetworkManager' after enabling TRACE level

Please find attached output of 'journalctl -u NetworkManager' after enabling TRACE level of NetworkManager.

Thanks,
Deepak

Comment 32 deepak 2017-11-18 01:07:53 UTC
Just for reference, here is the timestamp when it went missing:
Nov 17 16:32:38 kvm113 network: grep: ifcfg-ovirtmgmt: No such file or directory

