Bug 1400649

Summary: VLAN ifcfg created with nmcli/NetworkManager isn't valid for network.service
Product: Red Hat Enterprise Linux 7 Reporter: Dax Kelson <dkelson>
Component: NetworkManagerAssignee: Beniamino Galvani <bgalvani>
Status: CLOSED WONTFIX QA Contact: Desktop QE <desktop-qa-list>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.3CC: atragler, bgalvani, fgiudici, lrintel, mleitner, rkhan, sukulkar, thaller, vbenes
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2017-09-26 09:01:13 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1470965    
Attachments:
Description Flags
[patch] rh1400649-ifcfg-vlan-compat.patch none

Description Dax Kelson 2016-12-01 17:21:15 UTC
Description of problem:

TL;DR:

NetworkManager should add the DEVICE=<ifname>.<vlanid> to ifcfg it creates so the ifcfg files work even if NetworkManager is then disabled.

Long version:

Given two NICS, eth0, and eth1, using nmcli/NetworkManager to create an active/passive bond0 interface, and then creating a VLAN on that interface results in an invalid ifcfg file that the network.service can use when NetworKManager is then disabled.

# create bond0
nmcli con add type bond con-name bond0 ifname bond0 mode active-backup ipv4.method disabled ipv6.method ignore 802-3-ethernet.mtu 9000

# add bond slaves
nmcli con add type bond-slave ifname eth0 master bond0 802-3-ethernet.mtu 9000
nmcli con add type bond-slave ifname eth1 master bond0 802-3-ethernet.mtu 9000

# create bond0.12 interface for VLAN 12
nmcli con add type vlan con-name bond0.12 dev bond0 id 12 ipv4.method disabled ipv6.method ignore 802-3-ethernet.mtu 9000

# cat /etc/sysconfig/network-scripts/ifcfg-bond0.12
VLAN=yes
TYPE=Vlan
DEVICE=bond0.12
PHYSDEV=bond0
VLAN_ID=12
REORDER_HDR=yes
GVRP=no
MVRP=no
MTU=9000
IPV6INIT=no
NAME=bond0.12
UUID=0f3235fe-8102-4f06-8288-b1fab5a9e69c
ONBOOT=yes

It's missing 'DEVICE=bond0.12', so if you disable NetworkManager with NM_CONTROLLED=no, then you get errors trying to start or stop the interface:

# ifdown bond0.12
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
ERROR    : [ipv6_test_device_status] Missing parameter 'device' (arg 1)
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'

# ifup bond0.12
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
ERROR    : [/etc/sysconfig/network-scripts/ifup-eth] Device  does not seem to be present, delaying initialization.

NetworkManager is OK without the DEVICE line, no errors, but if you disable NetworkManger then the above errors appear.

Version-Release number of selected component (if applicable):

RHEL7.3 (and Fedora 25)

How reproducible:

every time

Steps to Reproduce:

It's simple to reproduce, you can do this on any machine:

nmcli con add type bond con-name bond0 ifname bond0 mode active-backup ipv4.method disabled ipv6.method ignore 802-3-ethernet.mtu 9000

nmcli con add type vlan con-name bond0.12 dev bond0 id 12 ipv4.method disabled ipv6.method ignore 802-3-ethernet.mtu 9000

# notice no errors
ifup bond0.12
ifdown bond0.12

# now disable NetworkManager
echo 'NM_CONTROLLED=no' > /etc/sysconfig/network-scripts/ifcfg-bond0
echo 'NM_CONTROLLED=no' > /etc/sysconfig/network-scripts/ifcfg-bond0.12
nmcli c reload

# try to bring it up
ifup bond0.12
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
Could not load file '/etc/sysconfig/network-scripts/ifcfg-bond0.12'
ERROR    : [/etc/sysconfig/network-scripts/ifup-eth] Device  does not seem to be present, delaying initialization.

# add the DEVICE variable and try again
echo 'DEVICE=bond0.12' >> ifcfg-bond0.12
ifup bond0.12 

(works, no error now)


Actual results:


Expected results:

The ifcfg files created by NetworkManager should work with or without NetworkManager.

Additional info:

Comment 1 Thomas Haller 2016-12-01 17:30:17 UTC
in general, it is not supported to let NetworkManager write ifcfg files, then disable NetworkManager and expect legacy initscripts to handle them.
-- note that legacy initscripts handle NM ifcfg files by calling nmcli to let NetworkManager deal with those files. If you disable NetworkManager, that is not guaranteed to work. Nor is it possible to work for every case because both initscripts and NM support feature that the other doesn't.

Of course, it should aim to create files that work as well as possible.
See also https://bugzilla.redhat.com/show_bug.cgi?id=1369008#c4

Comment 3 Beniamino Galvani 2017-09-06 08:57:54 UTC
initscripts need the DEVICE= variable set for vlans in order to
determine the interface name. For NetworkManager this is a bit
different, as it can build the interface name from the physical device
and the vlan id. So, in NM you can create connections with:

 * nmcli con add type vlan dev bond0 id 12

  This sets PHYSDEV=bond0, VLAN_ID=12 in the ifcfg file. The VLAN
  interface name is determined automatically by NM. The generated
  ifcfg file is not compatible with initscripts.

 * nmcli con add type vlan dev bond0 id 12 ifname bond0.12

  This sets PHYSDEV=bond0, VLAN_ID=12, DEVICE=bond0.12 in the ifcfg
  file, which is compatible with initscripts.

We could try to add a workaround that always sets DEVICE when the
interface name is not explictly set, but that would break existing
scripts; for example:

 * nmcli con add type vlan dev bond0 id 12 con-name my
 * nmcli con mod my vlan.id 13

would result in a connection with DEVICE=bond0.12 and VLAN_id=13,
unless we add an other ugly workaround to "adjust" things (and
possibly breaking something else).

Note that there are other differences in how vlan connections are
stored, as initscripts use the 'VID' variable for the vlan id while NM
uses 'VLAN_ID'. Unfortunately when initscripts added support for an
arbitrary vlan id in 2014, they picked a variable name different from
NM's:

 https://github.com/fedora-sysv/initscripts/commit/368d19d31a057bf4d82ba428629694618edce133


So, I think this is not easily fixable on NM side and I suggest that
if you really need a connection file compatible with initscripts
(which is in general something we try to achieve, but without
guarantees), you explicitly create the vlan with an interface name:

 * nmcli con add type vlan dev bond0 id 12 ifname bond0.12

Therefore I propose to close this as WONTFIX, any other opinions?

Comment 4 Thomas Haller 2017-09-06 09:18:26 UTC
(In reply to Beniamino Galvani from comment #3)

Nice investigation.

> We could try to add a workaround that always sets DEVICE when the
> interface name is not explictly set, but that would break existing
> scripts; for example:

If I understand you right, you propose to always write DEVICE. But without taking additional measures, that means that a VLAN connection always has connection.interface-name set. ifcfg-rh plugin is supposed to be able to store every valid connection. It would then no longer be able to store connections without connection.interface-name.

A additional measure/workaround could be to also write another flag like

DEVICE=bond0.12
DEVICE_NM_IGNORED=yes

That would also omit any problems with subsequent modify, because every time we write the connection anew, we would re-write DEVICE+DEVICE_NM_IGNORED.


The VID issue could be solved, by just writing both VID and VLAN_ID.NM could also fallback to VID if VLAN_ID is not present.


The ugliness of all this is of course frightening, but I think we should do this.

Comment 5 Beniamino Galvani 2017-09-08 13:24:33 UTC
(In reply to Thomas Haller from comment #4)
> A additional measure/workaround could be to also write another flag like
>
> DEVICE=bond0.12
> DEVICE_NM_IGNORED=yes
>
> That would also omit any problems with subsequent modify, because every time
> we write the connection anew, we would re-write DEVICE+DEVICE_NM_IGNORED.

> The VID issue could be solved, by just writing both VID and VLAN_ID.NM could
> also fallback to VID if VLAN_ID is not present.

I've pushed branch bg/vlan-device-ifcfg-rh1400649 which addresses the
problem using your suggestions.

What concerns me is that it adds complexity for a corner case that can
be easily worked around in a different way. The complexity is not in
the code, but especially in the ifcfg file, that now has duplicated
variables (VLAN_ID, VID) and variables whose meaning depend on another
variable (DEVICE, NM_IGNORE_DEVICE).

I'd like to hear what others think about it.

Comment 6 Thomas Haller 2017-09-08 14:46:34 UTC
(In reply to Beniamino Galvani from comment #5)
> What concerns me is that it adds complexity for a corner case that can
> be easily worked around in a different way.

That is true, I would still do it, but no strong opinion.
(at least parts of the branch are worthy in any cas).


> ifcfg-rh: rename write_bonding_setting() to write_bond_setting()

"trivial"?


> ifcfg-rh: always write DEVICE for VLAN connections

+         tmp = nm_utils_new_vlan_name (nm_setting_vlan_get_parent (s_vlan),
+                                       nm_setting_vlan_get_id (s_vlan));

nm_setting_vlan_get_parent() can return NULL, which would trigger an assertion.


> ifcfg-rh: add VID together with VLAN_ID
    
 DEVICE=super-vlan
-VLAN_ID=44
+VID=44
 PHYSDEV=eth9

this looks wrong. Does the test pass?



Otherwise, I am in favor of this patch.

Comment 7 Beniamino Galvani 2017-09-08 15:14:07 UTC
(In reply to Thomas Haller from comment #6)
> > ifcfg-rh: rename write_bonding_setting() to write_bond_setting()
> 
> "trivial"?

Ok.

> > ifcfg-rh: always write DEVICE for VLAN connections
> 
> +         tmp = nm_utils_new_vlan_name (nm_setting_vlan_get_parent (s_vlan),
> +                                       nm_setting_vlan_get_id (s_vlan));
> 
> nm_setting_vlan_get_parent() can return NULL, which would trigger an
> assertion.

Right, fixed. If the parent is NULL, we can't determine the VLAN interface name here and thus DEVICE is not added to the file.


> > ifcfg-rh: add VID together with VLAN_ID
>     
>  DEVICE=super-vlan
> -VLAN_ID=44
> +VID=44
>  PHYSDEV=eth9
> 
> this looks wrong. Does the test pass?

Yes. This is to test that the reader can parse VID instead of VLAN_ID.

Comment 8 Thomas Haller 2017-09-18 13:00:51 UTC
bg/vlan-device-ifcfg-rh1400649 lgtm

Comment 9 Beniamino Galvani 2017-09-26 09:01:13 UTC
Since Lubomir expressed some concerns about the branch and I tend to
agree that it adds hacks for an unsupported scenario, I'm closing this
as WONTFIX.

A simple way to achieve compatibility with initscripts in this case is
to explicitly specify the interface name with nmcli:

 nmcli con add type vlan dev bond0 id 12 ifname bond0.12

Comment 10 Thomas Haller 2017-09-26 09:19:06 UTC
Created attachment 1330948 [details]
[patch] rh1400649-ifcfg-vlan-compat.patch

Attaching the rejected patches to the bug.

the apply on upstream commit 7070d17cedd09d07f12ce977dd1e16cecf8d4b45