Bug 1907960

Summary: Cannot create VLAN interface with name `vlan0`
Product: Red Hat Enterprise Linux 8 Reporter: Gris Ge <fge>
Component: NetworkManagerAssignee: Thomas Haller <thaller>
Status: CLOSED ERRATA QA Contact: Filip Pokryvka <fpokryvk>
Severity: medium Docs Contact:
Priority: medium    
Version: 8.4CC: acardace, atragler, bgalvani, fpokryvk, lrintel, rkhan, sukulkar, thaller, till, vbenes
Target Milestone: rcKeywords: Triaged
Target Release: 8.4   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-11-09 19:28:55 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
System logs with NM trace enabled none

Description Gris Ge 2020-12-15 15:14:48 UTC
Description of problem:

Cannot create `vlan0` holding vlan ID 10. Both the kernel and nmcli shows the VLAN ID is 0.

Version-Release number of selected component (if applicable):
NetworkManager-1.29.6-27413.copr.ec7d8ddb29.el8.x86_64

How reproducible:
100%

Steps to Reproduce:

sudo nmcli connection add type dummy \
    con-name dummy0 ifname dummy0 \
    ipv4.method disabled ipv6.method disabled

sudo nmcli connection add type vlan \
    con-name vlan0 ifname vlan0 vlan.parent dummy0 \
    vlan.id 10 ipv4.method disabled ipv6.method disabled

sudo ip -D link show vlan0
sudo nmcli -f 'vlan.id' c show vlan0


Actual results:

Both nmcli and ip shows the VLAN ID is 0.

Expected results:

Both nmcli and ip shows the VLAN ID is 10.

Additional info:

Comment 1 Gris Ge 2020-12-15 15:16:46 UTC
Created attachment 1739364 [details]
System logs with NM trace enabled

Comment 2 Gris Ge 2020-12-15 15:18:39 UTC
Please be aware kernel has no limitation on `vlan0` name:


sudo ip link add link eth1 name vlan0 type vlan id 101

Comment 3 Beniamino Galvani 2021-03-25 09:21:43 UTC
This happens because the ifcfg-rh connection reader tries to be backwards compatible with initscripts:

https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/1.30.2/src/core/settings/plugins/ifcfg-rh/nms-ifcfg-rh-reader.c#L6077

At least for connections written by NM, this compatibility adjustment should not be done.

Comment 4 Thomas Haller 2021-03-25 10:29:24 UTC
(In reply to Beniamino Galvani from comment #3)
> This happens because the ifcfg-rh connection reader tries to be backwards
> compatible with initscripts:
> 
> https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/blob/1.30.2/
> src/core/settings/plugins/ifcfg-rh/nms-ifcfg-rh-reader.c#L6077
> 
> At least for connections written by NM, this compatibility adjustment should
> not be done.

this needs careful considerations.

Possibly we need to introduce a new key "NM_VLAN_ID" that does take precedence over the id from the interface name.

Comment 5 Till Maas 2021-03-25 10:52:57 UTC
Can we just document this as a known limitation and recommend to use the keyfile plugin for such connections if necessary?

Comment 6 Thomas Haller 2021-03-25 11:12:24 UTC
(In reply to Till Maas from comment #5)
> Can we just document this as a known limitation and recommend to use the
> keyfile plugin for such connections if necessary?

I don't thinks so, because if you are in a situation where this hits you, it's not at all clear where to find that documentation.

Also, what's nmstate (in this example) gonna do? Fail? Report an error to RHV/OSP? And then RHV/OSP logs an error? Log a message to use keyfile plugin? That's all high effort, hard to understand (and describe) and cumbersome to workaround.


In general, when NetworkManager persists a profile to disk, then it must be able to do so without loss of information.
There are a few exceptions, for example regarding certificate blobs. But in general, this is really important.

Luckily, the fix is simple...

Comment 8 Beniamino Galvani 2021-03-25 13:12:15 UTC
Can we instead drop the compatibily code altogether? I mean upstream. So that we don't need to complicate it further with additional ifcfg-rh variables.

For RHEL 8 we need to carry a simple downstream patch to re-introduce the compat workaround. We have done the same in the past to preserve compatibility when we changed behavior upstream (as for the mac=preserve/permanent, or for slaves-order=name/ifindex).

RHEL 9 will be aligned to upstream and we need to document the change.

Comment 9 Thomas Haller 2021-03-25 13:30:18 UTC
(In reply to Beniamino Galvani from comment #8)
> Can we instead drop the compatibily code altogether? I mean upstream. So
> that we don't need to complicate it further with additional ifcfg-rh
> variables.
> 
> For RHEL 8 we need to carry a simple downstream patch to re-introduce the
> compat workaround. We have done the same in the past to preserve
> compatibility when we changed behavior upstream (as for the
> mac=preserve/permanent, or for slaves-order=name/ifindex).
> 
> RHEL 9 will be aligned to upstream and we need to document the change.

That's possible.

It still leaves the bug on rhel-8. How would you address it there?
Do you mean to only merge the patch to rhel-8?


I disagree with the term "complicated". Sure, ifcfg is a mess, it's inconsistent etc.
But you can `git grep '\<VLAN_ID'` and get a clear(?) picture of how the
reader/writer interpret the format. Also the documentation of the property is not really(?)
complicated. I mean, if you write an ifcfg file by hand and you read
`man nm-settings-ifcfg-rh` and have hope to understand it (if not, let's improve
the wording there).
I'd reserve words like "complicated" for NMDevice and NMManager. Because thos is orders
of magnitude more complicated. This patch, is something I did in 20 minutes and I can understand
it. I am very sure you can review and understand it also in 20 minutes. Patches like
8df3ef5063cf0f8ee904f8fd39e9e64cc60836eb I do not understand despite trying.
Is this fix ugly? Yes! So maybe something else should be done. On the upside, it's
backward compatible, which leads to ugliness but has important benefits.

Comment 10 Thomas Haller 2021-03-25 13:32:05 UTC
> Patches like 8df3ef5063cf0f8ee904f8fd39e9e64cc60836eb I do not understand despite trying.

I meant https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/190fd9aa9f3fbf5705c2b80b9fc64c89d22b7593.

Comment 11 Beniamino Galvani 2021-03-26 08:34:12 UTC
(In reply to Thomas Haller from comment #9)
> It still leaves the bug on rhel-8. How would you address it there?
> Do you mean to only merge the patch to rhel-8?

What I mean is that the direction that we should pursue upstream is to remove such compatibility hacks. If a ifcfg file contains:

PHYSDEV=vlan0
VLAN_ID=80

it doesn't make any sense to ignore the VLAN id, since it's set explicitly for a reason.

I think in the long term not having the workaround is a better solution; however I do realize that having your patch as RHEL8-only is painful.

Comment 12 Thomas Haller 2021-03-26 08:57:45 UTC
I understood that part.


You want to have a nice fix upstream (which changes behavior).

My question was, what exactly do you suggest for rhel-8?
The patch that I opened? Or WONTFIX?

Also, there is a desire that upgrade for rhel-8 to rhel-9 works
without problems. How do we address the change in behavior when switching
from one version of NetworkManager to another? With a ReleaseNote entry?


Having a downstream-only patch in rhel-8 by itself is not a problem.
The problem is that there is a behavior associated with this patch, and
whenever you upgrade to a different behavior (e.g. from rhel-8 to rhel-9),
it's painful.


Originally this behavior was added to preserve backward compatibility.
Back then apparently backward compatibility with initscripts was important.
That might have changed. We don't really care about backward compat with initscripts.
But we care that our on-disk format for connection profiles is stable. And changing
how we interpret the flags in an ifcfg file is a breaking change.


Long term we want to get rid of ifcfg-rh altogether. That's why I don't mind
some ugly solution mid-term, in particular because it's backward compatible.



These are just some arguments. I don't have a strong opinion, except that I would
tend to care less about the ugliness and more about changing behavior.




Btw, the patch addes VLAN_ID_USE=yes, instead of VLAN_ID2=. The reason for that
is not only backward compatiblity, but "forward compatibility". Not sure that is the
right word but I mean that you can write a profile with a future version that writes
VLAN_ID_USE=yes, and downgrade to an old version, and chances are that it's interpreted
the same way. That's achieved by:

   VLAN_ID=10
   VLAN_ID_USE=yes

That could have also been achieved also by writing

   VLAN_ID=10
   VLAN_ID2=10

but then we write redundant information, which seems more ugly.

If we don't care about this kind of forward compatibility, we could
simply write:

   VLAN_ID2=10

and treat the old profiles as before. That is less ugly than !794, more backward
compatible than changing behavior, and not "forward compatible".


So, there are also alternatives to !794 that break compatibility less.

Comment 13 Thomas Haller 2021-03-26 09:00:33 UTC
> So, there are also alternatives to !794 that break compatibility less

I meant:

there are also alternatives to !794 that break compatibility more/differently,
but less then what you might have in mind.

Comment 14 Thomas Haller 2021-03-26 09:02:24 UTC
> PHYSDEV=vlan0
> VLAN_ID=80
>
> it doesn't make any sense to ignore the VLAN id, since it's set explicitly for a reason.

OK, good point.

I'll update the MR (next week)

Comment 15 Beniamino Galvani 2021-03-26 09:27:41 UTC
(In reply to Thomas Haller from comment #12)
> My question was, what exactly do you suggest for rhel-8?
> The patch that I opened? Or WONTFIX?

I would use your patch in RHEL8.
 
> Also, there is a desire that upgrade for rhel-8 to rhel-9 works
> without problems. How do we address the change in behavior when switching
> from one version of NetworkManager to another? With a ReleaseNote entry?

Yes, I think a release note is enough.
If necessary, we can add a Leapp module to convert files during the upgrade from 8 to 9.

> These are just some arguments. I don't have a strong opinion, except that I would
> tend to care less about the ugliness and more about changing behavior.

This is a good point. Ugliness doesn't cause more burden for us, while changing behavior does (as we need to take care about downstream patches, release notes, migration).

So I'm reconsidering my suggestion, and I'm fine with having your patch both upstream and in RHEL.

BTW, you current branch is how I would have done it. My observation on VLAN_ID= was only about the current state (e.g. git master) vs the ideal one.

Comment 16 Thomas Haller 2021-03-29 20:11:45 UTC
fixed upstream by https://gitlab.freedesktop.org/NetworkManager/NetworkManager/-/commit/5d6532f2d7c203c2d70c0e328806dcb0b0ea53bc

This is now considered simply a bug(fix), and not something where "backward compatibilty" is affected (see commit message).

Comment 21 errata-xmlrpc 2021-11-09 19:28:55 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: NetworkManager security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:4361