Bug 1289028 - The vlan network is not up after upgrade from RHEVH-7.2/RHEVH-7.1 publicly released version to RHEV-H 7.2 for 3.6 beta2
Keywords:
Status: CLOSED EOL
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: ovirt-node
Version: 3.6.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Fabian Deutsch
QA Contact: Huijuan Zhao
URL:
Whiteboard:
Depends On:
Blocks: RHEV3.6Upgrade 1324513 1352452 1354596
Reported: 2015-12-07 09:29 UTC by Huijuan Zhao
Modified: 2022-04-16 09:02 UTC (History)

Fixed In Version: ovirt-node-3.6.1-11.0
Doc Type: Bug Fix
Doc Text:
Clone Of:
: 1324513 (view as bug list)
Environment:
Last Closed: 2016-06-09 12:16:20 UTC
oVirt Team: Node
Target Upstream Version:
Embargoed:


Attachments
screenshot vlan upgrade fail (65.41 KB, image/png)
2015-12-07 09:29 UTC, Huijuan Zhao
vlan log (6.66 MB, application/x-gzip)
2015-12-07 09:32 UTC, Huijuan Zhao
the contents of /etc/default/ovirt both before and after the upgrade (612 bytes, application/x-gzip)
2015-12-16 08:55 UTC, Huijuan Zhao
screenshot of ifconfig after upgrade (86.43 KB, image/png)
2015-12-16 08:56 UTC, Huijuan Zhao
vlan.tar.gz (707.47 KB, application/x-gzip)
2015-12-24 05:32 UTC, Huijuan Zhao
network.tar.gz (78.94 KB, application/x-gzip)
2016-01-04 02:39 UTC, Huijuan Zhao


Links
System ID Private Priority Status Summary Last Updated
Red Hat Issue Tracker RHV-45715 0 None None None 2022-04-16 09:02:51 UTC
oVirt gerrit 51231 0 master ABANDONED Bringing back the VLANS to the tui. 2016-01-17 09:01:58 UTC
oVirt gerrit 52450 0 master MERGED Unset BOOTIF if it's already set (TUI upgrades) 2016-04-04 09:21:58 UTC
oVirt gerrit 55701 0 ovirt-3.6 MERGED Unset BOOTIF if it's already set (TUI upgrades) 2016-04-05 15:55:33 UTC

Description Huijuan Zhao 2015-12-07 09:29:44 UTC
Created attachment 1103087 [details]
screenshot vlan upgrade fail

Description of problem:
The vlan network is not up after upgrade from RHEVH-7.2/RHEVH-7.1 publicly released version to RHEV-H 7.2 for 3.6 beta2

Version-Release number of selected component (if applicable):
RHEV-H 7.2-20151201.2.el7ev
ovirt-node-3.6.0-0.23.20151201git5eed7af.el7ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. TUI install RHEV-H 7.2-20151129.1.el7ev
2. Log in to RHEV-H 7.2-20151129.1.el7ev and set up a VLAN network via DHCP; the VLAN interface obtains a DHCP IP address successfully
3. Upgrade from RHEV-H 7.2-20151129.1.el7ev to RHEV-H 7.2-20151201.2.el7ev via TUI
4. Log in to RHEV-H 7.2-20151201.2.el7ev and check the VLAN network

Actual results:
After step 4, the VLAN network is not up: the NIC (configured with a VLAN via DHCP) is shown as unconfigured. On the NIC configuration page, the NIC is shown as Disabled.

Expected results:
After step 4, the VLAN network should be up and should obtain a DHCP IP address successfully.

Additional info:
This issue also occurs on RHEV-H 7.2-20151112.1.el7ev.

Comment 1 Huijuan Zhao 2015-12-07 09:32:06 UTC
Created attachment 1103088 [details]
vlan log

Comment 3 Fabian Deutsch 2015-12-07 11:10:07 UTC
Ido, can you tell anything from the logs?

Comment 4 Ido Barkan 2015-12-07 13:19:21 UTC
From supervdsm.log, it looks like no networks were persisted. This means that nothing is restored after boot:

restore-net::DEBUG::2015-12-04 10:00:38,367::libvirtconnection::160::root::(get) trying to connect libvirt
restore-net::INFO::2015-12-04 10:00:38,398::vdsm-restore-net-config::385::root::(restore) starting network restoration.
restore-net::DEBUG::2015-12-04 10:00:38,399::vdsm-restore-net-config::183::root::(_remove_networks_in_running_config) Not cleaning running configuration since it is empty.
restore-net::INFO::2015-12-04 10:00:38,402::netconfpersistence::179::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-04 10:00:38,402::netconfpersistence::187::root::(_clearDisk) No existent config to clear.
restore-net::INFO::2015-12-04 10:00:38,402::netconfpersistence::129::root::(save) Saved new config RunningConfig({}, {}) to /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-04 10:00:38,402::vdsm-restore-net-config::329::root::(_wait_for_for_all_devices_up) All devices are up.
restore-net::INFO::2015-12-04 10:00:38,409::vdsm-restore-net-config::396::root::(restore) restoration completed successfully.
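
A minimal sketch of how one might check a supervdsm.log excerpt for this symptom, assuming the log line format shown above. The function name `restored_config_is_empty` is hypothetical and is not part of vdsm.

```python
def restored_config_is_empty(log_text):
    """Inspect the last 'Saved new config' line of a supervdsm.log excerpt.

    Returns True if it reports RunningConfig({}, {}) (no networks, no bonds),
    False if it reports anything else, and None if no such line is present.
    The line format is an assumption based on the excerpt in this comment.
    """
    saved = [line for line in log_text.splitlines()
             if "Saved new config RunningConfig(" in line]
    if not saved:
        return None
    return "RunningConfig({}, {})" in saved[-1]
```

A result of True would match the reading above: the restore step ran, but there was nothing persisted for it to restore.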

Comment 5 Fabian Deutsch 2015-12-07 17:52:02 UTC

*** This bug has been marked as a duplicate of bug 1289026 ***

Comment 6 Fabian Deutsch 2015-12-11 20:53:09 UTC
From the logs I see that a dhcp address is obtained:
Dec  4 09:59:25 localhost dhclient[2194]: DHCPDISCOVER on p3p1.20 to 255.255.255.255 port 67 interval 15 (xid=0x7f635803)
Dec  4 09:59:40 localhost dhclient[2194]: DHCPDISCOVER on p3p1.20 to 255.255.255.255 port 67 interval 11 (xid=0x7f635803)
Dec  4 09:59:51 localhost dhclient[2194]: DHCPDISCOVER on p3p1.20 to 255.255.255.255 port 67 interval 8 (xid=0x7f635803)
Dec  4 09:59:52 localhost dhclient[2194]: DHCPREQUEST on p3p1.20 to 255.255.255.255 port 67 (xid=0x7f635803)
Dec  4 09:59:52 localhost dhclient[2194]: DHCPOFFER from 192.168.20.2
Dec  4 09:59:52 localhost dhclient[2194]: DHCPACK from 192.168.20.2 (xid=0x7f635803)
Dec  4 09:59:54 localhost NET[2279]: /usr/sbin/dhclient-script : updated /etc/resolv.conf
Dec  4 09:59:54 localhost dhclient[2194]: bound to 192.168.20.129 -- renewal in 8463 seconds.
Dec  4 09:59:54 localhost network: Determining IP information for p3p1.20... done.
Dec  4 09:59:54 localhost NET[2330]: /etc/sysconfig/network-scripts/ifup-post : updated /etc/resolv.conf
Dec  4 09:59:55 localhost network: [  OK  ] 

I suppose the problem is thus just a visual one in the TUI, and thus it's not a dupe of bug 1289026.
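
As an illustration, the lease above can be pulled out of such a syslog excerpt by pairing dhclient messages through the PID in brackets. This is only a sketch that assumes the message format shown in these lines; `bound_leases` is a hypothetical helper.

```python
import re

# "dhclient[<pid>]: <message>" as it appears in the syslog lines above.
DHCLIENT_RE = re.compile(r"dhclient\[(\d+)\]: (.*)$")

def bound_leases(syslog_text):
    """Map interface name -> bound IP address for each dhclient process."""
    iface_by_pid = {}
    leases = {}
    for line in syslog_text.splitlines():
        match = DHCLIENT_RE.search(line)
        if not match:
            continue
        pid, message = match.groups()
        # DISCOVER/REQUEST lines name the interface the client runs on.
        on_iface = re.match(r"DHCP(?:DISCOVER|REQUEST) on (\S+)", message)
        if on_iface:
            iface_by_pid[pid] = on_iface.group(1)
        # The "bound to" line carries the address but not the interface.
        bound = re.match(r"bound to (\S+)", message)
        if bound and pid in iface_by_pid:
            leases[iface_by_pid[pid]] = bound.group(1)
    return leases
```

Applied to the lines quoted in this comment, the result would associate p3p1.20 with 192.168.20.129, which supports the view that the VLAN itself came up.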

Comment 7 Fabian Deutsch 2015-12-15 13:46:34 UTC
The question is whether networking is actually configured correctly and, if so, why the TUI does not detect it.

Comment 8 Fabian Deutsch 2015-12-15 13:47:31 UTC
This might be related to bug 1280241

Comment 9 Ryan Barry 2015-12-15 17:49:03 UTC
I'm not able to reproduce this.

1. TUI install RHEV-H 7.2-20151129.1.el7ev
2. Log into RHEV-H 7.2-20151129.1.el7ev, setup VLAN network via DHCP, DHCP works
3. Upgrade from RHEV-H 7.2-20151129.1.el7ev to RHEV-H 7.2-20151201.2.el7ev via TUI
4. Login RHEV-H 7.2-20151201.2.el7ev, check the status page and network page
5. Both say "Configured", networking works.

Were there any other steps taken?

....
2015-12-04 09:53:24,211       INFO Saving network stuff
2015-12-04 09:53:24,245       INFO Effective changes {'nics': 'p3p1'}
.. upgrade ..
2015-12-04 10:01:09,522       INFO Saving network stuff
2015-12-04 10:01:09,570       INFO Effective changes {'nics': 'em1'}
2015-12-04 10:01:10,601      ERROR An error appeared in the UI: UnknownNicError("Unknown network interface: 'em1'",)
....
2015-12-04 10:02:22,238       INFO Saving network stuff
2015-12-04 10:02:22,288       INFO Effective changes {'nics': 'p3p1'}

What happened in the middle here? 

Can you please provide a test system?
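
To make the NIC switch in the log above easier to spot, here is a small sketch that lists the 'nics' value of every "Effective changes" line in order. It assumes the line format shown in this comment; `nic_changes` is a hypothetical helper, not part of ovirt-node.

```python
import ast
import re

# Captures the dict literal at the end of an "Effective changes {...}" line.
CHANGES_RE = re.compile(r"Effective changes (\{.*\})\s*$")

def nic_changes(log_text):
    """Return the 'nics' values from 'Effective changes' lines, in log order."""
    nics = []
    for line in log_text.splitlines():
        match = CHANGES_RE.search(line)
        if not match:
            continue
        # literal_eval only parses Python literals, so it is safe on log text.
        changes = ast.literal_eval(match.group(1))
        if "nics" in changes:
            nics.append(changes["nics"])
    return nics
```

For the excerpt above this yields p3p1, then em1, then p3p1 again, which isolates the post-upgrade switch to em1 as the step to explain.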

Comment 10 Ryan Barry 2015-12-15 17:56:43 UTC
Also, the contents of /etc/default/ovirt both before and after the upgrade would be helpful

Comment 11 Huijuan Zhao 2015-12-16 08:51:14 UTC
Ryan, the test steps in comment 9 are correct, but you may need to reproduce this issue on a machine with at least two NICs.

I checked the contents of /etc/default/ovirt both before and after the upgrade:
OVIRT_BOOTIF differs before and after the upgrade, which may be the cause.

1. Before the upgrade:
# cat /etc/default/ovirt 
OVIRT_BOOTIF="p3p1"
……

[root@localhost admin]# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 2254  bytes 365657 (357.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2254  bytes 365657 (357.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p3p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::21b:21ff:fe27:470b  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:27:47:0b  txqueuelen 1000  (Ethernet)
        RX packets 866  bytes 70296 (68.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 307  bytes 55233 (53.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p3p1.20: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.20.129  netmask 255.255.255.0  broadcast 192.168.20.255
        inet6 2001:db8:1:0:21b:21ff:fe27:470b  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::21b:21ff:fe27:470b  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:27:47:0b  txqueuelen 0  (Ethernet)
        RX packets 269  bytes 30710 (29.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 290  bytes 43481 (42.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

2. After the upgrade:
# cat /etc/default/ovirt 
OVIRT_BOOTIF="em1"
……

[root@localhost admin]# ifconfig
lo: flags=73<UP,LOOPBACK,RUNNING>  mtu 65536
        inet 127.0.0.1  netmask 255.0.0.0
        inet6 ::1  prefixlen 128  scopeid 0x10<host>
        loop  txqueuelen 0  (Local Loopback)
        RX packets 2254  bytes 365657 (357.0 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 2254  bytes 365657 (357.0 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p3p1: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet6 fe80::21b:21ff:fe27:470b  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:27:47:0b  txqueuelen 1000  (Ethernet)
        RX packets 866  bytes 70296 (68.6 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 307  bytes 55233 (53.9 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

p3p1.20: flags=4163<UP,BROADCAST,RUNNING,MULTICAST>  mtu 1500
        inet 192.168.20.129  netmask 255.255.255.0  broadcast 192.168.20.255
        inet6 2001:db8:1:0:21b:21ff:fe27:470b  prefixlen 64  scopeid 0x0<global>
        inet6 fe80::21b:21ff:fe27:470b  prefixlen 64  scopeid 0x20<link>
        ether 00:1b:21:27:47:0b  txqueuelen 0  (Ethernet)
        RX packets 269  bytes 30710 (29.9 KiB)
        RX errors 0  dropped 0  overruns 0  frame 0
        TX packets 290  bytes 43481 (42.4 KiB)
        TX errors 0  dropped 0 overruns 0  carrier 0  collisions 0

Additional info:
please refer to attachment for detailed information:
1. the contents of /etc/default/ovirt both before and after the upgrade (vlan.tar.gz)
2. screenshot of ifconfig after upgrade
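
The before/after comparison above can be sketched as a small diff of the two files. This assumes /etc/default/ovirt consists of simple KEY="value" lines, as in the excerpts; `parse_defaults` and `changed_keys` are hypothetical helper names.

```python
def parse_defaults(text):
    """Parse simple KEY="value" lines; skip blanks, comments and other lines."""
    values = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"')
    return values

def changed_keys(before_text, after_text):
    """Return {key: (before, after)} for every key whose value differs."""
    old, new = parse_defaults(before_text), parse_defaults(after_text)
    return {
        key: (old.get(key), new.get(key))
        for key in sorted(set(old) | set(new))
        if old.get(key) != new.get(key)
    }
```

With the two excerpts in this comment as input, the only reported difference would be OVIRT_BOOTIF going from p3p1 to em1; keys hidden behind the elided lines are of course not covered.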

Comment 12 Huijuan Zhao 2015-12-16 08:55:07 UTC
Created attachment 1106332 [details]
the contents of /etc/default/ovirt both before and after the upgrade

Comment 13 Huijuan Zhao 2015-12-16 08:56:48 UTC
Created attachment 1106333 [details]
screenshot of ifconfig after upgrade

Comment 14 Ryan Barry 2015-12-17 04:50:21 UTC
I'll add another NIC.

Did you upgrade via TUI or PXE (or TUI over PXE)?

I would guess that you're correct, and the TUI is wrong because OVIRT_BOOTIF changed. My question now is why it changed. This could happen over PXE, but I'll reproduce via TUI over CDROM if that was the boot method.

Comment 15 Huijuan Zhao 2015-12-17 06:16:01 UTC
Ryan, I upgraded via TUI over PXE

Comment 16 Ryan Barry 2015-12-17 18:00:14 UTC
(In reply to Huijuan Zhao from comment #12)
> Created attachment 1106332 [details]
> the contents of /etc/default/ovirt both before and after the upgrade

There are two problems --

First is the blank MANAGED_IFNAMES, which is a symptom of bz#1280241, and it will make the networking appear to be unconfigured. Not registering this system to RHEV-M will allow you to see...

Second is that PXE upgrading from a different interface than the one which was configured from the TUI will set OVIRT_BOOTIF to a different value, and the TUI will show the wrong interface as configured after upgrades. 

I'd like to track this bug here. But I can't reproduce it, and it looks like bz#1053425, which was fixed two years ago.

I'd also like to lower the severity, because it's cosmetic only, and it would be unexpected for users to set new configuration values in the TUI after upgrades (or even see the TUI, since common upgrade flows are over RHEV-M or PXE). I suspect that pxebooting from a non-management interface is also rare.

I tried the following:

1. TUI install RHEV-H 7.2-20151129.1.el7ev
2. Log into RHEV-H 7.2-20151129.1.el7ev, setup VLAN network via DHCP on ens3, DHCP works
3. Upgrade from RHEV-H 7.2-20151129.1.el7ev to RHEV-H 7.2-20151201.2.el7ev via PXE on ens8
4. Login RHEV-H 7.2-20151201.2.el7ev, check the status page and network page
5. Both say "Configured", networking works, OVIRT_BOOTIF is still ens3 after the upgrade.

Can you please provide a test environment?

Comment 18 Ryan Barry 2015-12-18 03:47:12 UTC
That's perfect. Are both images available over cobbler? VLAN 20?

Comment 22 Huijuan Zhao 2015-12-24 05:32:57 UTC
Created attachment 1109136 [details]
vlan.tar.gz

Comment 23 Huijuan Zhao 2015-12-24 05:58:31 UTC
(In reply to Huijuan Zhao from comment #0)
> Created attachment 1103087 [details]
> screenshot vlan upgrade fail
> 
> Description of problem:
> The vlan network is not up after upgrade from RHEVH-7.2/RHEVH-7.1 publicly
> released version to RHEV-H 7.2 for 3.6 beta2
> 
> Version-Release number of selected component (if applicable):
> RHEV-H 7.2-20151201.2.el7ev
> ovirt-node-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
> 
> How reproducible:
> 100%
> 
> Steps to Reproduce:
> 1. TUI install RHEV-H 7.2-20151129.1.el7ev
> 2. Login RHEV-H 7.2-20151129.1.el7ev, setup vlan network via dhcp, can
> obtain dhcp vlan ip successful
> 3. Upgrade from RHEV-H 7.2-20151129.1.el7ev to RHEV-H 7.2-20151201.2.el7ev
> via TUI
> 4. Login RHEV-H 7.2-20151201.2.el7ev, check the vlan network
> 
> Actual results:
> After step4, the vlan network is not up, it shows NIC(configured vlan via
> dhcp) unconfigured. Enter NIC configure page, it shows NIC Disabled.
> 
> Expected results:
> After step4, the vlan network should be up and obtain dhcp vlan ip successful
> 
> Additional info:
> Also encounter this issue on RHEV-H 7.2-20151112.1.el7ev

Additional info for the reproduce steps:
In the above "Steps to Reproduce":
2. setup vlan network  (NIC is: p3p1)
3. Upgrade via PXE + TUI (default NIC in cmdline is: em1)

Comment 26 Anatoly Litovsky 2016-01-03 11:04:11 UTC
Please take a look at the system you provided.
I made a change there to fix it.
Is this the functionality you want?

Comment 27 Huijuan Zhao 2016-01-04 02:39:29 UTC
Created attachment 1111278 [details]
network.tar.gz

Comment 28 Huijuan Zhao 2016-01-04 02:51:31 UTC
(In reply to Anatoly Litovsky from comment #26)
> Please take a look at the system you provided.
> I made a change there to fix it.
> Is this the functionality you want?

No.

The current results:
On the Status page, it shows "Networking:  Connected     p3p1.20".
On the Network page, NIC p3p1 shows Unconfigured, but p3p1.20 shows Configured (opening it shows no actual configuration).

Expected results:
On the Status page, it should show "Networking:  Connected     p3p1".
On the Network page, NIC p3p1 should show Configured (Boot protocol DHCP, VLAN ID: 20), and there should be no p3p1.20.
Please refer to the attachment "network.tar.gz" for details; it contains two screenshots, of the Status page and the Network page.

Comment 29 Huijuan Zhao 2016-01-04 06:56:31 UTC
(In reply to Huijuan Zhao from comment #25)
> Hi, Tolik and Ying, I reproduced this bug on latest build RHEV-H
> 7.2-20151129.0.el7ev, the ENV:
> 192.168.20.129
> admin/redhat
> 

Update: I reproduced this bug on latest build RHEV-H 7.2-20151229.0.el7ev

Comment 30 Fabian Deutsch 2016-01-06 15:00:09 UTC
Considering comment 11, Huijuan, is the bug fixed if you
1. perform the upgrade,
2. change the BOOTIF value back to p3p1,
3. and log in to the TUI again?

It could be, as Ryan says, that the BOOTIF is simply changed because the upgrade is performed using PXE.

Also: Does this bug also appear if you perform the upgrade using USB?

Comment 31 Huijuan Zhao 2016-01-07 10:01:48 UTC
(In reply to Fabian Deutsch from comment #30)
> Considering comment 11, Huijuan, is the bug fixed if you 
> 1. after upgrade
> 2. change the BOOTIF value to p3p1 again
> 3. and re-login into the tui?
> 
> It could be as ryan says, that the BOOTIF is just changed, because the
> upgrade is performed using PXE.
> 

Fabian, the bug no longer appears after following the above steps.

> Also: Does this bug also appear if you perform the upgrade using USB?

The bug does not occur when I perform the upgrade using USB.

Comment 32 Fabian Deutsch 2016-01-07 10:56:16 UTC
Thanks Huijuan.

This supports the assumption that the problem is that the BOOTIF is being updated during the PXE upgrade flow.

The solution is then to prevent this.

Comment 37 Fabian Deutsch 2016-01-15 17:43:02 UTC
Okay, I could reproduce it:

1. Install inside a VM (with two nics, i.e. ens3 + ens11) using CDROM
2. Configure ens3 with static IP and a vlan
3. Boot from CDROM media again, append to the commandline: BOOTIF=ens11
4. Perform the TUI upgrade
5. After installation: Reboot, boot from disk and login

Findings:
After 2. The network appears as configured in the TUI, and the vlan is correctly configured on the system, BOOTIF==ens3
After 3. The TUI upgrade will be started
After 5. The TUI shows the network as unconfigured, BOOTIF==ens11


The root cause of this bug has two conditions that need to be met:
1. Boot from PXE
2. Perform upgrade through TUI

In that flow, the BOOTIF will be overwritten during the TUI upgrade.

Huijuan,
1. please provide all kernel arguments you use for the TUI PXE upgrade.
2. Can you reproduce the issue according to the steps above?

Possible solutions:
1. Do automatic upgrade by appending "upgrade=1"
2. Fix the TUI flow to unset BOOTIF in the upgrade flow.
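
A rough sketch of the idea behind the second option: during an upgrade, a BOOTIF value that is already persisted wins over whatever the kernel command line supplies. This is an illustration under stated assumptions, not the actual ovirt-node change; the function name, its parameters, and the use of a plain dict for the persisted config are all hypothetical.

```python
def bootif_after_upgrade(persisted, cmdline_bootif, is_upgrade):
    """Decide which BOOTIF value to keep.

    persisted:      dict of values already stored on the host
                    (the key name OVIRT_BOOTIF is taken from comment 11)
    cmdline_bootif: BOOTIF value seen on the kernel command line, or None
    is_upgrade:     True when running the TUI upgrade flow
    """
    existing = persisted.get("OVIRT_BOOTIF")
    if is_upgrade and existing:
        # Upgrade: ignore the boot-time value so the configured NIC survives.
        return existing
    # Fresh install (or nothing persisted): fall back to the boot-time value.
    return cmdline_bootif or existing
```

In the reproduction above, the persisted value is ens3 and the command line carries ens11, so this rule would keep ens3 after the upgrade.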

Comment 38 Huijuan Zhao 2016-01-18 08:27:29 UTC
(In reply to Fabian Deutsch from comment #37)
> Okay, I could reproduce it:
> 
> 1. Install inside a VM (with two nics, i.e. ens3 + ens11) using CDROM
> 2. Configure ens3 with static IP and a vlan
> 3. Boot from CDROM media again, append to the commandline: BOOTIF=ens11
> 4. Perform the TUI upgrade
> 5. After installation: Reboot, boot from disk and login
> 
> Findings:
> After 2. The network appears as configured in the TUI, and the vlan is
> correctly configured on the system, BOOTIF==ens3
> After 3. The TUI upgrade will be started
> After 5. The TUI shows the network as unconfigured, BOOTIF==ens11
> 
> 
> The root cause of this bug has two conditions that need to be met:
> 1. Boot from PXE
> 2. Perform upgrade through TUI
> 
> In that flow, the BOOTIF will be overwritten during the TUI upgrade.
> 
> Huijuan,
> 1. please provide all kernel arguments you use for the TUI PXE upgrade.
> 2. Can you reproduce the issue according to the steps above?
> 
1. All kernel arguments for the TUI PXE upgrade:
/images/rhevh-vdsm7-7.2-20151229.0_36/vmlinuz0 initrd=/images/rhevh-vdsm7-7.2-20151229.0_36/initrd0.img ksdevice=bootif rootflags=loop rootflags=ro rd.dm=0 rd_NO_MULTIPATH rd.md=0 crashkernel=256M rootfstype=auto lang= max_loop=256 rhgb quiet elevator=deadline rd.live.check rd.luks=0 install ro root=live:/rhev-hypervisor7-7.2-20151229.0.iso rd.live.image BOOTIF=01-d4-be-d9-95-61-ca

2. I can reproduce the issue according to the steps above.
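
For reference, the BOOTIF value in these arguments is not an interface name but the form PXELINUX appends: an "01" hardware-type prefix (Ethernet) followed by the MAC address of the boot NIC, separated by dashes. The sketch below splits such a command line and converts that value to a colon-separated MAC; `parse_cmdline` and `bootif_to_mac` are hypothetical helpers.

```python
def parse_cmdline(cmdline):
    """Split a kernel command line into {key: value}; bare flags map to True.

    Repeated keys (such as rootflags above) keep only their last value.
    """
    args = {}
    for token in cmdline.split():
        key, sep, value = token.partition("=")
        args[key] = value if sep else True
    return args

def bootif_to_mac(bootif):
    """Convert '01-aa-bb-cc-dd-ee-ff' to 'aa:bb:cc:dd:ee:ff'.

    Returns None for anything else, e.g. an interface name like 'ens11'.
    """
    parts = bootif.lower().split("-")
    if len(parts) == 7 and parts[0] == "01":
        return ":".join(parts[1:])
    return None
```

The resulting MAC could then be compared with /sys/class/net/<iface>/address on the host to see which NIC the PXE boot used, here presumably em1 rather than the configured p3p1.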

Comment 45 Mike McCune 2016-03-28 22:35:49 UTC
This bug was accidentally moved from POST to MODIFIED via an error in automation, please see mmccune with any questions

Comment 48 Fabian Deutsch 2016-06-09 12:16:20 UTC
This issue will not be fixed in RHEV 4.0 by any patch that may eventually be attached here. Instead, this bug is being fixed by the new functionality in Cockpit.

Comment 49 Huijuan Zhao 2016-07-05 11:17:03 UTC
Encountered this bug on rhev-hypervisor6-6.8-20160630.2.iso, and added it to Bug 1352452 - [Tracker] Track RHEV-H 6.8 bugs.

