Bug 725392 - network topology changed after upgrade
Summary: network topology changed after upgrade
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: ovirt-node
Version: 6.2
Hardware: Unspecified
OS: Unspecified
Priority: urgent
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Alan Pevec
QA Contact: Virtualization Bugs
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-07-25 11:14 UTC by Mohua Li
Modified: 2016-04-26 15:14 UTC

Fixed In Version: ovirt-node-2.0.2-0.1.git5dce5f9.el6
Doc Type: Bug Fix
Doc Text:
Upgrade from RHEV Hypervisor 6.2-0.12 (RHEV 3.0 Beta 1) is not supported. You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version.
Clone Of:
Environment:
Last Closed: 2011-12-06 19:20:45 UTC
Target Upstream Version:


Links
System ID: Red Hat Product Errata RHBA-2011:1783
Private: 0
Priority: normal
Status: SHIPPED_LIVE
Summary: rhev-hypervisor6 bug fix and enhancement update
Last Updated: 2011-12-06 15:10:54 UTC

Description Mohua Li 2011-07-25 11:14:37 UTC
Description of problem:

rhev-hypervisor 6.2-0.6 is registered to a rhevm 2.2, or to any IP that is not a valid rhevm. Before the upgrade:


[root@amd-1352-8-5 ~]# ifconfig
eth0      Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1897 errors:0 dropped:0 overruns:0 frame:0
          TX packets:298 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:376021 (367.2 KiB)  TX bytes:23234 (22.6 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:220 errors:0 dropped:0 overruns:0 frame:0
          TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:31786 (31.0 KiB)  TX bytes:31786 (31.0 KiB)

rhevm     Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3
          inet addr:10.66.72.105  Bcast:10.66.73.255  Mask:255.255.254.0
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1901 errors:0 dropped:0 overruns:0 frame:0
          TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:345036 (336.9 KiB)  TX bytes:21692 (21.1 KiB)

[root@amd-1352-8-5 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
rhevm           8000.0022192d4ba3       no              eth0
[root@amd-1352-8-5 ~]# service vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop                                         [  OK  ]
vdsm: not running                                          [FAILED]
vdsm: Missing certificates, vdsm not registered            [FAILED]
vdsm: failed to reconfigure libvirt                        [FAILED]



After the upgrade:

[root@amd-1352-8-5 ~]# ifconfig
breth0    Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3  
          inet addr:10.66.72.105  Bcast:10.66.73.255  Mask:255.255.254.0
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1325 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:2202663 (2.1 MiB)  TX bytes:91218 (89.0 KiB)

eth0      Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3  
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:11540 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000 
          RX bytes:2424434 (2.3 MiB)  TX bytes:102896 (100.4 KiB)
          Interrupt:17 

lo        Link encap:Local Loopback  
          inet addr:127.0.0.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:134 errors:0 dropped:0 overruns:0 frame:0
          TX packets:134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:23664 (23.1 KiB)  TX bytes:23664 (23.1 KiB)

rhevm     Link encap:Ethernet  HWaddr E2:6A:F9:D4:9B:3E  
          inet6 addr: fe80::e06a:f9ff:fed4:9b3e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0 
          RX bytes:0 (0.0 b)  TX bytes:2700 (2.6 KiB)

[root@amd-1352-8-5 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
breth0          8000.0022192d4ba3       no              eth0
rhevm           8000.000000000000       no
[root@amd-1352-8-5 ~]# ls /etc/sysconfig/network-scripts/
ifcfg-breth0  ifcfg-lo     ifdown-bnep  ifdown-ipv6  ifdown-ppp     ifdown-tunnel  ifup-bnep  ifup-ipv6  ifup-plusb  ifup-routes  ifup-wireless     network-functions
ifcfg-eth0    ifcfg-rhevm  ifdown-eth   ifdown-isdn  ifdown-routes  ifup           ifup-eth   ifup-isdn  ifup-post   ifup-sit     init.ipv6-global  network-functions-ipv6
ifcfg-eth1    ifdown       ifdown-ippp  ifdown-post  ifdown-sit     ifup-aliases   ifup-ippp  ifup-plip  ifup-ppp    ifup-tunnel  net.hotplu
[root@amd-1352-8-5 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BRIDGE=breth0
ONBOOT=yes
HWADDR=00:22:19:2d:4b:a3


Version-Release number of selected component (if applicable):
rhev-hypervisor 6.2-0.6

How reproducible:
Always, when registered to an invalid rhevm

Steps to Reproduce:
1. Register to an invalid rhevm
2. Upgrade with "linux upgrade"
  
Actual results:
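rhev-hypervisor does not handle this case well: after the upgrade the IP address sits on a new breth0 bridge, and the rhevm bridge is left with no interfaces.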

Expected results:
The upgrade should handle this case and keep the original network topology.

Additional info:
This does not happen on a rhev-hypervisor whose status is displayed as "up" on the rhevm side.

Comment 2 Mike Burns 2011-07-27 18:53:12 UTC
Dan,

(In reply to comment #0)
> Description of problem:
> 
> rhev-hypervisor 6.2-0.6 is registered to a rhevm 2.2, or to any IP that is
> not a valid rhevm. Before the upgrade:
> 
> 
> [root@amd-1352-8-5 ~]# ifconfig
> eth0      Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3
>           inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1897 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:298 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:376021 (367.2 KiB)  TX bytes:23234 (22.6 KiB)
>           Interrupt:17
> 
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1  Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING  MTU:16436  Metric:1
>           RX packets:220 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:31786 (31.0 KiB)  TX bytes:31786 (31.0 KiB)
> 
> rhevm     Link encap:Ethernet  HWaddr 00:22:19:2D:4B:A3
>           inet addr:10.66.72.105  Bcast:10.66.73.255  Mask:255.255.254.0
>           inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
>           RX packets:1901 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:345036 (336.9 KiB)  TX bytes:21692 (21.1 KiB)

Just from this, it sounds like they registered to an invalid IP address for rhevm.  In this case, shouldn't vdsm-reg roll back the change of breth0 to rhevm?

Comment 3 Mike Burns 2011-07-27 20:39:17 UTC
Ok, just to put my notes down for what I've determined so far.

1.  upgrade runs primarily during the ovirt-firstboot service
2.  vdsm-reg runs prior to ovirt-firstboot during startup
3.  ifcfg-rhevm is *not* persisted at any point during normal operation or registration process
4.  during boot, ifcfg-breth0 is started
5.  vdsm-reg renames ifcfg-breth0 to ifcfg-rhevm during its startup
6.  during upgrade, we run ovirt_store_firstboot_config which persists (among other things) /etc/sysconfig/network-scripts/ifcfg-*
7.  vdsm-reg aborts rename if ifcfg-rhevm already exists

so during upgrade,

breth0 comes up
vdsm-reg renames to rhevm
upgrade runs
after upgrade, we persist ifcfg-rhevm
next boot, vdsm-reg doesn't rename because rhevm device already exists

This results in both breth0 and rhevm bridges existing.
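
The sequence above can be replayed as a toy model in a scratch directory. This is only a sketch: `boot`, `rename_bridge` and `persist_all` are hypothetical stand-ins, not the real vdsm-reg or ovirt-node code, and "live" and "persisted" merely imitate /etc/sysconfig/network-scripts and the persistent config store.

```shell
#!/bin/sh
# Toy model of the boot/upgrade sequence. Nothing here touches the real
# system; all function names are hypothetical.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
live="$work/live"
persisted="$work/persisted"
mkdir -p "$live" "$persisted"

# Boot: the live directory starts as a copy of whatever is persisted.
boot() {
    rm -rf "$live"
    mkdir -p "$live"
    cp "$persisted"/ifcfg-* "$live"/
}

# Stand-in for the vdsm-reg rename: skipped if ifcfg-rhevm already exists.
rename_bridge() {
    if [ -e "$live/ifcfg-rhevm" ]; then
        echo "rename skipped: ifcfg-rhevm already exists"
    else
        mv "$live/ifcfg-breth0" "$live/ifcfg-rhevm"
        echo "renamed ifcfg-breth0 -> ifcfg-rhevm"
    fi
}

# Stand-in for the upgrade-time persist of all ifcfg-* files.
persist_all() {
    cp "$live"/ifcfg-* "$persisted"/
}

# Initial install: only breth0 and eth0 are persisted.
: > "$persisted/ifcfg-breth0"
: > "$persisted/ifcfg-eth0"

boot; rename_bridge      # normal boot: rename happens, nothing is persisted
persist_all              # upgrade: ifcfg-rhevm gets persisted as well
boot; rename_bridge      # next boot: both files exist, rename is skipped

ls "$live"
```

The final listing contains both ifcfg-breth0 and ifcfg-rhevm, which matches the two bridges seen after the upgrade.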

Comment 4 Mike Burns 2011-07-27 20:46:28 UTC
possible fixes:

1.  don't run ovirt_store_firstboot_config
   Means that new files that need to be persisted would be missed

2.  manually unpersist ifcfg-rhevm
      Don't like this because it means that we're handling something that's outside ovirt-node in ovirt-node when we've tried to keep vdsm/rhevm configuration stuff in vdsm

3.  enhance vdsm-reg to check for persisted file, and handle it by removing persisted ifcfg-rhevm, stopping the rhevm bridge, then proceeding with renaming
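
A rough sketch of what option 3 could look like, again in a scratch directory. `rename_bridge_checked` is a hypothetical name, the real vdsm-reg logic is not shown in this report, and stopping the empty rhevm bridge is only marked by a comment.

```shell
#!/bin/sh
# Sketch of option 3 on a scratch copy; nothing here touches the real system.
set -eu

work=$(mktemp -d)
trap 'rm -rf "$work"' EXIT
live="$work/live"
persisted="$work/persisted"
mkdir -p "$live" "$persisted"

# State after the bad upgrade: both bridge files persisted and present at boot.
for f in ifcfg-breth0 ifcfg-eth0 ifcfg-rhevm; do
    : > "$persisted/$f"
    : > "$live/$f"
done

rename_bridge_checked() {
    if [ -e "$persisted/ifcfg-rhevm" ] && [ -e "$live/ifcfg-breth0" ]; then
        # A stale persisted ifcfg-rhevm is in the way: drop it first.
        # On a real node the empty rhevm bridge would also be stopped here.
        rm -f "$persisted/ifcfg-rhevm" "$live/ifcfg-rhevm"
    fi
    if [ -e "$live/ifcfg-breth0" ]; then
        mv "$live/ifcfg-breth0" "$live/ifcfg-rhevm"
    fi
}

rename_bridge_checked
ls "$live"
```

After the call only ifcfg-eth0 and ifcfg-rhevm remain in the live directory, and ifcfg-rhevm is no longer in the persisted store.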

Comment 5 Mike Burns 2011-07-27 20:49:12 UTC
4.  persist ifcfg-rhevm at registration time and unpersist ifcfg-breth0

Comment 6 Mike Burns 2011-07-27 21:10:40 UTC
Workaround for the bug

# . /usr/libexec/ovirt-functions
# ovirt_safe_delete_config /etc/sysconfig/network-scripts/ifcfg-rhevm

Comment 9 Ayal Baron 2011-08-02 22:43:04 UTC
(In reply to comment #6)
> Workaround for the bug
> 
> # . /usr/libexec/ovirt-functions
> # ovirt_safe_delete_config /etc/sysconfig/network-scripts/ifcfg-rhevm

I don't understand why you're persisting something that wasn't required to be persisted to begin with (during upgrade).

Comment 10 Mike Burns 2011-08-02 23:16:57 UTC
We maintain a list of files that ovirt-node needs to have persisted.  On upgrade, we persist the list of files again in case new files were added to the list.  The list includes /etc/sysconfig/network-scripts/ifcfg-* so that all network scripts get persisted.  

In the upgrade case, vdsm-reg has already run and done the rename to ifcfg-rhevm.  This is what causes the file to get persisted.  

We're not explicitly persisting it, it just gets caught in our upgrade logic.

Is there a reason that you don't want to persist it when you configure it?

Comment 11 Alan Pevec 2011-08-03 08:09:14 UTC
(In reply to comment #9)
> I don't understand why you're persisting something that wasn't required to be
> persisted to begin with (during upgrade).

ifcfg-rhevm should be persisted, as are all ifcfg-* files, otherwise you get a different network configuration when the "network" initscript starts.
I don't see the point in vdsm-reg renaming the original breth0 on each boot; this is just confusing. It should rename it completely, once, at registration time.

Comment 12 Dan Kenigsberg 2011-08-03 08:39:51 UTC
Why could there be unpersisted ifcfg-* files? vdsm has them after the net config has changed but has not yet been declared "safe". We expect them to revert to the persisted files if we have to fence the node due to misconfiguration. I think that this should apply to whoever keeps an unpersisted file: they probably wanted that. Upgrade should not decide for them that the files are good for posterity.

Comment 13 Alan Pevec 2011-08-03 09:17:41 UTC
Doing an upgrade in the middle of changing the network configuration is not a use-case I'd support.
Shouldn't the user confirm all network configuration changes before making a change such as an upgrade?

Also, I still don't see a case where ifcfg-rhevm would be left unpersisted.

Comment 14 Dan Kenigsberg 2011-08-03 11:43:05 UTC
The specific bug at hand could be solved by persisting ifcfg-rhevm after creating it. I am not sure about the scope of work, but yes.

Still, I do not get why upgrade should interfere with persistency. It is not something I'd expect the platform to do of its own volition, and I do not understand where it can be helpful.

Comment 15 Alan Pevec 2011-08-03 12:00:02 UTC
(In reply to comment #14)
> Still, I do not get why upgrade should interfere with persistency. It is not
> something I'd expect the platform to do of its own volition, and I do not
> understand where it can be helpful.

ok, after some digging in git history, I see ovirt_store_firstboot_config in o-c-boot was added to make sure all files are really persisted; previously it was done in ovirt-firstboot only.

http://git.fedorahosted.org/git/?p=ovirt/node.git;a=commitdiff;h=28992b37c819903d1c3a6ab43f437a4a834b0237

Perry, that's your commit, do you still remember what it was about? :)

Comment 16 Perry Myers 2011-08-03 12:16:33 UTC
I looked at the commit, but sorry, I don't have a good recollection... 2.5 years ago is a long time :)

Comment 17 Yotam Oron 2011-08-04 13:14:09 UTC
The solution to be implemented: 
Only persist interface files that correspond to real physical interfaces.
VDSM is renaming ifcfg-breth0 to ifcfg-rhevm, so it will not be persisted.
Assigning back to ovirt-node.

Comment 18 Alan Pevec 2011-08-08 21:46:31 UTC
(In reply to comment #17)
> Only persist interface files that correspond to real physical interfaces.

Small correction: we also need to persist ifcfg-breth* and ifcfg-eth*.* vlan interfaces.
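
The rule from comment 17 plus this correction could be sketched as a filter like the one below. This is an illustration, not the code that shipped: `should_persist` and `SYSFS_NET` are hypothetical names, and the test for a physical interface (a "device" entry under /sys/class/net/<dev>) is an assumption of the sketch.

```shell
#!/bin/sh
# Hypothetical filter: decide whether an ifcfg file should be persisted.
# SYSFS_NET is overridable only so the sketch can be tried off-node.
SYSFS_NET=${SYSFS_NET:-/sys/class/net}

should_persist() {
    name=${1##*/}            # strip any leading directory
    dev=${name#ifcfg-}       # strip the "ifcfg-" prefix
    case "$dev" in
        breth*) return 0 ;;  # ovirt-node bridges
        eth*.*) return 0 ;;  # VLAN interfaces on ethN
    esac
    # Assumption: a physical NIC exposes a "device" entry in sysfs.
    [ -e "$SYSFS_NET/$dev/device" ]
}

for f in ifcfg-eth0 ifcfg-breth0 ifcfg-eth0.100 ifcfg-rhevm; do
    if should_persist "$f"; then
        echo "persist $f"
    else
        echo "skip    $f"
    fi
done
```

With this filter ifcfg-rhevm is skipped, since rhevm is a bridge that vdsm-reg creates by renaming, while ifcfg-breth0 and VLAN files are still kept. The result for ifcfg-eth0 depends on whether the machine running the sketch actually has an eth0.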

Comment 24 Ying Cui 2011-09-22 10:53:25 UTC
I have to re-assign this bug. It reproduces when upgrading from rhevh 6.2-0.12.1 to the 6.2-0.17.2 build.

It does NOT reproduce when upgrading from rhevh 12.1 to the rhevh 6.2-0.14 build.

Comment 25 Alan Pevec 2011-09-22 11:07:54 UTC
(In reply to comment #24)
> I have to re-assign this bug. It reproduces when upgrading from rhevh
> 6.2-0.12.1 to the 6.2-0.17.2 build.
> 
> It does NOT reproduce when upgrading from rhevh 12.1 to the rhevh 6.2-0.14 build.

Any more details?
There were no changes since 0.14 in ovirt-node for that area.

Comment 31 Alan Pevec 2011-10-07 15:53:29 UTC
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
Upgrade from RHEV-H 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.

Comment 32 Alan Pevec 2011-10-07 15:54:07 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1,2 @@
-Upgrade from RHEV-H 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
+Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
 You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.

Comment 33 Alan Pevec 2011-10-07 15:54:38 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1,2 @@
 Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
-You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.
+You must reinstall the hypervisor using RHEV Hypervisor 6.2-0.17.2 installation media.

Comment 34 Alan Pevec 2011-10-07 18:38:38 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1,2 @@
 Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
-You must reinstall the hypervisor using RHEV Hypervisor 6.2-0.17.2 installation media.
+You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version.

Comment 35 Alan Pevec 2011-10-07 18:47:20 UTC
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,2 +1,2 @@
-Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
+Upgrade from RHEV Hypervisor 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
 You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version.

Comment 36 Ying Cui 2011-10-14 08:43:44 UTC
Per comment #35, changing the bug status to Verified.

Comment 37 errata-xmlrpc 2011-12-06 19:20:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1783.html

