Bug 1553595

Summary: Vlan over bond device cannot be shown after installation finished
Product: Red Hat Enterprise Linux 7 Reporter: jiachen zhang <jiaczhan>
Component: NetworkManager Assignee: Beniamino Galvani <bgalvani>
Status: CLOSED ERRATA QA Contact: Desktop QE <desktop-qa-list>
Severity: urgent Docs Contact:
Priority: high    
Version: 7.5 CC: atragler, bgalvani, cshao, dbragalo, dfediuck, fgiudici, huzhao, kasmith, lrintel, mtessun, nanda_kishore_chinna, ptalbert, qiyuan, rbarry, rkhan, rvykydal, sbonazzo, sbueno, sukulkar, thaller, toneata, vbenes, weiwang, yaniwang, ycui, ylavi, yturgema
Target Milestone: pre-dev-freeze Keywords: ZStream
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: NetworkManager-1.10.2-14.el7 Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1570521 Environment:
Last Closed: 2018-10-30 11:11:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Bug Depends On:    
Bug Blocks: 1507957, 1526256, 1447254, 1570521    
Attachments:
Description | Flags
/var/log /tmp ifcfg | none
log containing NM messages from installation | none
The updated attachment of "/var/log" | none
NetworkManager.log | none
[PATCH] manager: retry activating devices when the parent becomes managed | none

Description jiachen zhang 2018-03-09 07:19:35 UTC
Created attachment 1406158 [details]
/var/log /tmp ifcfg

Description of problem:

Install RHVH-4.2-20180305.0-RHVH-x86_64-dvd1.iso via the Anaconda GUI and configure a vlan device over a bond. After the installation finishes, the vlan device is not shown.


Version-Release number of selected component (if applicable):

RHVH-4.2-20180305.0-RHVH-x86_64-dvd1.iso

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH-4.2-20180305.0-RHVH-x86_64-dvd1.iso via Anaconda GUI
2. Configure a vlan device over a bond
3. After installation finished, check ip with `ip addr`

Actual results:
1. There is no vlan device shown in the results of `ip addr`
2. There is ifcfg-VLAN-connection-1 under /etc/sysconfig/network-scripts/, but DEVICE parameter is missing.

Expected results:
1. The vlan over bond device is shown and gets an IP address.

Additional info:
1. Restarting NetworkManager brings the vlan device up.
2. After rebooting the host again, the vlan device is still not shown.
3. If only a vlan device is configured (not over a bond), the vlan device is shown with the right IP after the installation finishes.
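
The symptom from item 2 of the actual results (a VLAN profile without DEVICE) can be checked with a small script. This is only a sketch: it assumes the usual KEY=value ifcfg syntax and that VLAN profiles carry VLAN=yes; the helper names are made up for illustration and are not part of any tool mentioned in this bug.

```python
from pathlib import Path


def parse_ifcfg(text):
    """Parse KEY=value lines of an ifcfg file into a dict (comments and blanks skipped)."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def vlan_profiles_without_device(directory="/etc/sysconfig/network-scripts"):
    """Return the names of ifcfg-* files that look like VLAN profiles but lack DEVICE."""
    missing = []
    for path in sorted(Path(directory).glob("ifcfg-*")):
        settings = parse_ifcfg(path.read_text(errors="replace"))
        # Assumption: a VLAN profile is marked with VLAN=yes.
        if settings.get("VLAN", "").lower() == "yes" and "DEVICE" not in settings:
            missing.append(path.name)
    return missing
```

Calling `vlan_profiles_without_device()` on the installed host should list the affected profile files, if the assumptions above hold.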

Comment 1 Ryan Barry 2018-03-09 12:38:44 UTC
Samantha, any changes in 7.5 you know of which would have caused this?

Jiachen: is ONBOOT set to YES?

Comment 3 jiachen zhang 2018-03-12 02:02:59 UTC
ONBOOT is set to YES.

Comment 4 Radek Vykydal 2018-03-12 08:01:07 UTC
(In reply to Ryan Barry from comment #1)
> Samantha, any changes in 7.5 you know of which would have caused this?

There should not be any, so if it is a regression I'd look at NetworkManager (1.8 in RHEL 7.4 -> 1.10 in RHEL 7.5).

(In reply to jiachen zhang from comment #0)

> Actual results:
> 1. There is no vlan device shown in the results of `ip addr`
> 2. There is ifcfg-VLAN-connection-1 under /etc/sysconfig/network-scripts/,
> but DEVICE parameter is missing.

Do you think this is the cause of the problem?
As of now, if you want to set DEVICE in the ifcfg file, you have to specify "VLAN" -> "VLAN Interface Name" in the NetworkManager connection editor (used by the Anaconda GUI).
I've checked, and this does not seem to have changed between 7.4 (NM 1.8) and 7.5 (NM 1.10).

I am reassigning to NetworkManager for comments.

Comment 5 Radek Vykydal 2018-03-12 08:04:10 UTC
Created attachment 1407102 [details]
log containing NM messages from installation

Comment 6 Thomas Haller 2018-03-12 13:36:32 UTC
> 3.After installation finished, check ip with `ip addr`

"After installation" here means on first boot into the newly installed system?


> 2. Reboot the host again, vlan device still cannot be shown.

Can you please enable the level=TRACE debug level (see https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf)?
Then reboot, and provide the full log.
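
For reference, the requested setting is a `level=TRACE` key in the `[logging]` section of the NetworkManager configuration. The sketch below only renders that fragment as text; the function name is made up, and where the fragment is written is left to the reader.

```python
import configparser
import io


def render_trace_logging_conf(level="TRACE"):
    """Return the text of a config fragment that sets the NetworkManager log level."""
    parser = configparser.ConfigParser()
    parser.optionxform = str  # keep key case exactly as written
    parser["logging"] = {"level": level}
    buffer = io.StringIO()
    # Write "key=value" without spaces around the delimiter.
    parser.write(buffer, space_around_delimiters=False)
    return buffer.getvalue()


print(render_trace_logging_conf(), end="")
```

The printed fragment would go into /etc/NetworkManager/NetworkManager.conf before the reboot.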

Comment 7 jiachen zhang 2018-03-13 03:06:46 UTC
Created attachment 1407413 [details]
The updated attachment of "/var/log"

Comment 8 jiachen zhang 2018-03-13 03:08:21 UTC
(In reply to Thomas Haller from comment #6)
> > 3.After installation finished, check ip with `ip addr`
> 
> "After installation" here means on first boot into the newly installed
> system?

Yes.

> > 2. Reboot the host again, vlan device still cannot be shown.
> 
> Can you please enable level=TRACE debug level (see
> https://cgit.freedesktop.org/NetworkManager/NetworkManager/tree/contrib/fedora/rpm/NetworkManager.conf).
> Then reboot, and provide the full log.

Please see the latest attachment.

Comment 9 Thomas Haller 2018-03-13 09:20:19 UTC
(In reply to jiachen zhang from comment #7)
> Created attachment 1407413 [details]
> The updated attachment of "/var/log"

Thank you, but the attached logfile contains no debug logging. Please configure level=TRACE in /etc/NetworkManager/NetworkManager.conf before reboot. See comment 6.

Comment 10 Sandro Bonazzola 2018-03-13 10:50:04 UTC
Ryan, can we have a workaround in RHV-H while platform fixes this?

Comment 11 cshao 2018-03-13 10:58:37 UTC
According to the bug's description:
1. Restart NetworkManager can bring the vlan device up.
2. Reboot the host again, vlan device still cannot be shown.

Comment 12 Qin Yuan 2018-03-13 13:04:47 UTC
Created attachment 1407568 [details]
NetworkManager.log

Comment 13 Qin Yuan 2018-03-13 13:11:13 UTC
(In reply to Thomas Haller from comment #9)
> (In reply to jiachen zhang from comment #7)
> > Created attachment 1407413 [details]
> > The updated attachment of "/var/log"
> 
> Thank you, but the attached logfile contains no debug logging. Please
> configure level=TRACE in /etc/NetworkManager/NetworkManager.conf before
> reboot. See comment 6.

I generated NetworkManager.log using `journalctl -u NetworkManager.service > NetworkManager.log`, and it contains trace output. I am not sure whether this is what you need; please check the attachment.

Comment 14 Ryan Barry 2018-03-13 14:44:23 UTC
We do not have a workaround for this, and I'm not sure if one is possible without trying to completely reimplement NM's logic in imgbased.

Comment 15 Thomas Haller 2018-03-13 14:59:14 UTC
(In reply to Qin Yuan from comment #13)
> I generated NetworkManager.log using `journalctl -u NetworkManager.service >
> NetworkManager.log`, and there are trace info in it. I wonder if this is
> what you need, please check the attachment.

This is fine. Thank you!!

Comment 16 Beniamino Galvani 2018-03-15 15:59:17 UTC
I think the regression is caused by this commit:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=ed640f857a1a1eae45d92cce35ea8dcfd8aba08d

During startup we look for a parent for the vlan connection, and the bond is skipped because it is still unmanaged (by platform); therefore the activation fails with:

 manager: (VLAN connection 1) can't get a name of a virtual device: failed to determine interface name: error determine name for vlan
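
The failure mode can be modelled in a few lines. This is a toy Python model, not NetworkManager's C code: the `Device` class, both functions, the interface name `bond0` and the `parent.id` naming scheme are assumptions made for illustration.

```python
from dataclasses import dataclass


@dataclass
class Device:
    name: str
    managed: bool


def find_parent(devices, parent_name):
    """Return the parent device, skipping devices that are still unmanaged."""
    for dev in devices:
        if dev.name == parent_name and dev.managed:
            return dev
    return None


def vlan_iface_name(devices, parent_name, vlan_id):
    """Derive a VLAN interface name from its parent; fail if no usable parent exists."""
    parent = find_parent(devices, parent_name)
    if parent is None:
        # Mirrors the "can't get a name of a virtual device" situation above.
        raise LookupError(f"no managed parent {parent_name!r} for vlan {vlan_id}")
    # Assumed naming scheme: <parent>.<id>
    return f"{parent.name}.{vlan_id}"
```

With `Device("bond0", managed=False)` in the list, `vlan_iface_name(devices, "bond0", 50)` raises; once the bond is marked managed, the same call returns a name. Nothing retries the activation in between, which matches the behaviour reported here.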

Comment 19 jiachen zhang 2018-03-19 09:06:11 UTC
The same issue occurs in redhat-virtualization-host-4.1-20180307.0.

Comment 20 Beniamino Galvani 2018-03-20 12:49:40 UTC
Created attachment 1410448 [details]
[PATCH] manager: retry activating devices when the parent becomes managed
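
The idea in the patch title (retry the activation once the parent becomes managed) can be sketched as follows. This is a simplified Python model and not the attached patch; the `Manager` class, its methods and the `bond0`/50 values in the usage note are hypothetical.

```python
class Manager:
    """Toy activation manager: retries pending VLAN profiles when a parent becomes managed."""

    def __init__(self):
        self.managed = set()  # names of devices that are currently managed
        self.pending = []     # (parent, vlan_id) profiles waiting for their parent
        self.active = []      # names of activated VLAN interfaces

    def activate_vlan(self, parent, vlan_id):
        """Activate the VLAN if its parent is managed, otherwise queue it for a retry."""
        if parent not in self.managed:
            # Parent not usable yet: remember the profile instead of giving up.
            self.pending.append((parent, vlan_id))
            return False
        self.active.append(f"{parent}.{vlan_id}")
        return True

    def set_managed(self, name):
        """Mark a device as managed and retry every profile that was waiting."""
        self.managed.add(name)
        waiting, self.pending = self.pending, []
        for parent, vlan_id in waiting:
            # Profiles whose parent is still unmanaged are queued again.
            self.activate_vlan(parent, vlan_id)
```

In this model, `activate_vlan("bond0", 50)` returns False while the bond is unmanaged, and a later `set_managed("bond0")` brings the VLAN up without restarting anything.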

Comment 21 Thomas Haller 2018-03-20 13:17:07 UTC
(In reply to Beniamino Galvani from comment #20)
> Created attachment 1410448 [details]
> [PATCH] manager: retry activating devices when the parent becomes managed

lgtm. Does it pass CI?

Comment 22 Beniamino Galvani 2018-03-20 13:56:13 UTC
(In reply to Thomas Haller from comment #21)
> (In reply to Beniamino Galvani from comment #20)
> > Created attachment 1410448 [details]
> > [PATCH] manager: retry activating devices when the parent becomes managed
> 
> lgtm. Does it pass CI?

Yes, it does.

Comment 24 Beniamino Galvani 2018-03-22 12:48:36 UTC
CI test for this scenario:

https://github.com/NetworkManager/NetworkManager-ci/pull/162

Comment 25 Beniamino Galvani 2018-03-22 12:54:32 UTC
Patch applied to master:

https://cgit.freedesktop.org/NetworkManager/NetworkManager/commit/?id=6493bd443f6c1d089919f0bb63c735bc2a76fc75

and nm-1-10.

Comment 26 Ryan Barry 2018-03-27 10:20:32 UTC
Just to verify -

This is picked to nm-1.10. Are we planning to ship this with 7.5 (which currently uses 1.10)?

Comment 27 Beniamino Galvani 2018-03-29 14:30:49 UTC
(In reply to Ryan Barry from comment #26)
> Just to verify -
> 
> This is picked to nm-1.10. Are we planning to ship this with 7.5 (which
> currently uses 1.10)?

It's too late for 7.5 GA, but we can ship the fix in the first z-stream batch.

Comment 28 Yaniv Lavi 2018-04-01 14:01:03 UTC
This is critical to RHV 4.2 GA. Adding blocker flag.

Comment 29 Ying Cui 2018-04-03 12:31:31 UTC
According to comment 19, this is critical to RHVH 4.1.10 async el7.5 release.

Comment 35 jiachen zhang 2018-05-08 08:42:58 UTC
Tested this bug with redhat-virtualization-host-4.2-20180507.0.
I configured bond + vlan and rebooted successfully; the vlan over bond device is shown.

Comment 36 Karl Hastings 2018-05-11 18:09:37 UTC
*** Bug 1576506 has been marked as a duplicate of this bug. ***

Comment 39 errata-xmlrpc 2018-10-30 11:11:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3207