| Summary: | network topology changed after upgrade | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Mohua Li <moli> |
| Component: | ovirt-node | Assignee: | Alan Pevec <apevec> |
| Status: | CLOSED ERRATA | QA Contact: | Virtualization Bugs <virt-bugs> |
| Severity: | medium | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 6.2 | CC: | abaron, acathrow, apevec, bazulay, cpelland, cshao, danken, gouyang, iheim, leiwang, mburns, ovirt-maint, ycui, ykaul |
| Target Milestone: | rc | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | ovirt-node-2.0.2-0.1.git5dce5f9.el6 | Doc Type: | Bug Fix |
| Doc Text: | Upgrade from RHEV Hypervisor 6.2-0.12 (RHEV 3.0 Beta 1) is not supported. You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version. | Story Points: | --- |
| Clone Of: | Environment: | ||
| Last Closed: | 2011-12-06 19:20:45 UTC | Type: | --- |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
Dan,

(In reply to comment #0)
> Description of problem:
>
> rhev-hypervisor 6.2-0.6 registered to a rhevm 2.2 or any ip(invalid rhevm),
> before upgrade,
>
>
> [root@amd-1352-8-5 ~]# ifconfig
> eth0      Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
>           inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:1897 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:298 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:1000
>           RX bytes:376021 (367.2 KiB) TX bytes:23234 (22.6 KiB)
>           Interrupt:17
>
> lo        Link encap:Local Loopback
>           inet addr:127.0.0.1 Mask:255.0.0.0
>           inet6 addr: ::1/128 Scope:Host
>           UP LOOPBACK RUNNING MTU:16436 Metric:1
>           RX packets:220 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:31786 (31.0 KiB) TX bytes:31786 (31.0 KiB)
>
> rhevm     Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
>           inet addr:10.66.72.105 Bcast:10.66.73.255 Mask:255.255.254.0
>           inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
>           UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
>           RX packets:1901 errors:0 dropped:0 overruns:0 frame:0
>           TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
>           collisions:0 txqueuelen:0
>           RX bytes:345036 (336.9 KiB) TX bytes:21692 (21.1 KiB)

Just from this, it sounds like they registered to an invalid IP address for rhevm. In this case, shouldn't vdsm-reg roll back the change of breth0 to rhevm?

Ok, just to put my notes down for what I've determined so far.

1. upgrade runs primarily during the ovirt-firstboot service
2. vdsm-reg runs prior to ovirt-firstboot during startup
3. ifcfg-rhevm is *not* persisted at any point during normal operation or registration process
4. during boot, ifcfg-breth0 is started
5. vdsm-reg renames ifcfg-breth0 to ifcfg-rhevm during its startup
6. during upgrade, we run ovirt_store_firstboot_config which persists (among other things) /etc/sysconfig/network-scripts/ifcfg-*
7. vdsm-reg aborts rename if ifcfg-rhevm already exists

so during upgrade,
   breth0 comes up
   vdsm-reg renames to rhevm
   upgrade runs
after upgrade, we persist ifcfg-rhevm
next boot, vdsm-reg doesn't rename because rhevm device already exists

This results in both breth0 and rhevm bridges existing.

possible fixes:
1. don't run ovirt_store_firstboot_config
Means that new files that need to be persisted would be missed
2. manually unpersist ifcfg-rhevm
Don't like this because it means that we're handling something that's outside ovirt-node in ovirt-node when we've tried to keep vdsm/rhevm configuration stuff in vdsm
3. enhance vdsm-reg to check for persisted file, and handle it by removing persisted ifcfg-rhevm, stopping the rhevm bridge, then proceeding with renaming
4. persist ifcfg-rhevm at registration time and unpersist ifcfg-breth0

Workaround for the bug

# . /usr/libexec/ovirt-functions
# ovirt_safe_delete_config /etc/sysconfig/network-scripts/ifcfg-rhevm

(In reply to comment #6)
> Workaround for the bug
>
> # . /usr/libexec/ovirt-functions
> # ovirt_safe_delete_config /etc/sysconfig/network-scripts/ifcfg-rhevm

I don't understand why you're persisting something that wasn't required to be persisted to begin with (during upgrade).

We maintain a list of files that ovirt-node needs to have persisted. On upgrade, we persist the list of files again in case new files were added to the list. The list includes /etc/sysconfig/network-scripts/ifcfg-* so that all network scripts get persisted. In the upgrade case, vdsm-reg has already run and done the rename to ifcfg-rhevm. This is what causes the file to get persisted. We're not explicitly persisting it, it just gets caught in our upgrade logic. Is there a reason that you don't want to persist it when you configure it?

(In reply to comment #9)
> I don't understand why you're persisting something that wasn't required to be
> persisted to begin with (during upgrade).

ifcfg-rhevm should be persisted, as are all ifcfg-* files, otherwise you get a different network configuration when the "network" initscript starts. I don't see a point in vdsm-reg renaming the original breth0 on each boot; this is just confusing. It should rename it completely once, at registration time.

Why could there be unpersisted ifcfg-* files?

vdsm has them after net config has changed and has not been declared "safe". We expect them to revert to the persisted files if we have to fence the node due to misconfiguration. I think that this should apply to whoever keeps an unpersisted file - he probably wanted that. Upgrade should not decide for him that the files are good for posterity.

Doing an upgrade in the middle of changing network configuration is not a use case I'd support. Shouldn't the user confirm all network configuration changes before making a change such as an upgrade? Also, I still don't see a case where ifcfg-rhevm would be left unpersisted. The specific bug at hand could be solved by persisting ifcfg-rhevm after creating it.

I am not sure about the scope of work, but yes. Still, I do not get why upgrade should interfere with persistency. It is not something I'd expect the platform to do on its own volition, and I did not understand where it can be helpful.

(In reply to comment #14)
> Still, I do not get why upgrade should interfere with persistency. It is not
> something I'd expect the platform to do on its own volition, and I did not
> understand where it can be helpful.

ok, after some digging in git history, I see ovirt_store_firstboot_config in o-c-boot was added to make sure all files are really persisted; previously it was done in ovirt-firstboot only.
http://git.fedorahosted.org/git/?p=ovirt/node.git;a=commitdiff;h=28992b37c819903d1c3a6ab43f437a4a834b0237

Perry, that's your commit, do you still remember what it was about? :)

I looked at the commit, but sorry, I don't have a good recollection... 2.5 years ago is a long time :)

The solution to be implemented: only persist interface files that correspond to real physical interfaces. VDSM is renaming ifcfg-breth0 to ifcfg-rhevm, so it will not be persisted. Assigning back to ovirt-node.

(In reply to comment #17)
> Only persist interface files that correspond to real physical interfaces.

Small correction: we also need to persist ifcfg-breth* and ifcfg-eth*.* vlan interfaces.

I have to re-assign this bug. It reproduces when upgrading from rhevh 6.2-0.12.1 to the 6.2-0.17.2 build.

It does NOT reproduce when upgrading from rhevh 12.1 to the rhevh 6.2-0.14 build.

(In reply to comment #24)
> I have to re-assign this bug. It reproduces when upgrading from rhevh 6.2-0.12.1
> to the 6.2-0.17.2 build.
>
> It does NOT reproduce when upgrading from rhevh 12.1 to the rhevh 6.2-0.14 build.

Any more details?
There were no changes since 0.14 in ovirt-node for that area.
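For illustration only, the persistence sequence analysed in the notes above and the agreed fix (persist only ifcfg files for physical interfaces, plus ifcfg-breth* and ifcfg-eth*.* vlan interfaces) can be sketched as a toy model in Python. This is not ovirt-node or vdsm-reg code: `should_persist`, `boot`, `upgrade_boot` and the `physical_nics` argument are hypothetical names, and the sketch assumes that a breth* bridge is named "br" plus the NIC name and that ifcfg-lo needs no persisting.

```python
PREFIX = "ifcfg-"


def should_persist(name, physical_nics):
    """Hypothetical filter: keep ifcfg files for physical NICs, their
    br<nic> bridges (e.g. breth0) and <nic>.<vlan> VLAN interfaces."""
    if not name.startswith(PREFIX):
        return False
    dev = name[len(PREFIX):]
    if dev in physical_nics:
        return True
    if dev.startswith("br") and dev[2:] in physical_nics:
        return True
    base, dot, vlan = dev.partition(".")
    return bool(dot) and base in physical_nics and vlan.isdigit()


def boot(persisted):
    """Toy boot: restore persisted files, start bridges, run vdsm-reg.
    Returns (runtime ifcfg files, names of bridges that are up)."""
    runtime = set(persisted)
    bridges = {n[len(PREFIX):] for n in runtime
               if n in ("ifcfg-breth0", "ifcfg-rhevm")}
    # vdsm-reg renames breth0 to rhevm unless ifcfg-rhevm already exists.
    if "ifcfg-breth0" in runtime and "ifcfg-rhevm" not in runtime:
        runtime.remove("ifcfg-breth0")
        runtime.add("ifcfg-rhevm")
        bridges.discard("breth0")
        bridges.add("rhevm")
    return runtime, bridges


def upgrade_boot(persisted, keep=lambda name: True):
    """Boot during which the upgrade runs: after vdsm-reg has renamed the
    bridge, every runtime ifcfg file accepted by `keep` is persisted."""
    runtime, _ = boot(persisted)
    return set(persisted) | {n for n in runtime if keep(n)}


installed = {"ifcfg-eth0", "ifcfg-breth0"}

# Behaviour described in this report: every ifcfg-* file is persisted.
buggy = upgrade_boot(installed)
print(sorted(boot(buggy)[1]))   # ['breth0', 'rhevm']

# Agreed fix: ifcfg-rhevm is filtered out, so the next boot renames as usual.
fixed = upgrade_boot(installed, keep=lambda n: should_persist(n, {"eth0"}))
print(sorted(boot(fixed)[1]))   # ['rhevm']
```

In this model the first boot after a persist-everything upgrade brings up both bridges, matching the reported topology, while the filtered upgrade leaves only rhevm.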
Technical note added. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
New Contents:
Upgrade from RHEV-H 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,2 +1,2 @@
-Upgrade from RHEV-H 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
+Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,2 +1,2 @@
Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
-You must reinstall the hypervisor using RHEV-H 6.2-0.17.2 installation media.
+You must reinstall the hypervisor using RHEV Hypervisor 6.2-0.17.2 installation media.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,2 +1,2 @@
Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
-You must reinstall the hypervisor using RHEV Hypervisor 6.2-0.17.2 installation media.
+You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field
accordingly. All revisions will be proofread by the Engineering Content Services team.
Diffed Contents:
@@ -1,2 +1,2 @@
-Upgrade from RHEV Hypervisor 6.2-0.12.1 (RHEV 3.0 Beta 1) is not supported.
+Upgrade from RHEV Hypervisor 6.2-0.12 (RHEV 3.0 Beta 1) is not supported.
You must reinstall the hypervisor using installation media for RHEV Hypervisor 6.2-0.17.2 or higher version.
As per comment #35, changing the bug status to Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2011-1783.html |
Description of problem:

rhev-hypervisor 6.2-0.6 registered to a rhevm 2.2 or any ip (invalid rhevm),

before upgrade,

[root@amd-1352-8-5 ~]# ifconfig
eth0      Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:1897 errors:0 dropped:0 overruns:0 frame:0
          TX packets:298 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:376021 (367.2 KiB) TX bytes:23234 (22.6 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:220 errors:0 dropped:0 overruns:0 frame:0
          TX packets:220 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:31786 (31.0 KiB) TX bytes:31786 (31.0 KiB)

rhevm     Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
          inet addr:10.66.72.105 Bcast:10.66.73.255 Mask:255.255.254.0
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:1901 errors:0 dropped:0 overruns:0 frame:0
          TX packets:304 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:345036 (336.9 KiB) TX bytes:21692 (21.1 KiB)

[root@amd-1352-8-5 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
rhevm           8000.0022192d4ba3       no              eth0

[root@amd-1352-8-5 ~]# service vdsmd restart
Shutting down vdsm daemon:
vdsm watchdog stop [ OK ]
vdsm: not running [FAILED]
vdsm: Missing certificates, vdsm not registered [FAILED]
vdsm: failed to reconfigure libvirt [FAILED]

after upgrade,

[root@amd-1352-8-5 ~]# ifconfig
breth0    Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
          inet addr:10.66.72.105 Bcast:10.66.73.255 Mask:255.255.254.0
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:11398 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1325 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:2202663 (2.1 MiB) TX bytes:91218 (89.0 KiB)

eth0      Link encap:Ethernet HWaddr 00:22:19:2D:4B:A3
          inet6 addr: fe80::222:19ff:fe2d:4ba3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:11540 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1326 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:2424434 (2.3 MiB) TX bytes:102896 (100.4 KiB)
          Interrupt:17

lo        Link encap:Local Loopback
          inet addr:127.0.0.1 Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING MTU:16436 Metric:1
          RX packets:134 errors:0 dropped:0 overruns:0 frame:0
          TX packets:134 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:23664 (23.1 KiB) TX bytes:23664 (23.1 KiB)

rhevm     Link encap:Ethernet HWaddr E2:6A:F9:D4:9B:3E
          inet6 addr: fe80::e06a:f9ff:fed4:9b3e/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:14 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b) TX bytes:2700 (2.6 KiB)

[root@amd-1352-8-5 ~]# brctl show
bridge name     bridge id               STP enabled     interfaces
breth0          8000.0022192d4ba3       no              eth0
rhevm           8000.000000000000       no

[root@amd-1352-8-5 ~]# ls /etc/sysconfig/network-scripts/
ifcfg-breth0 ifcfg-lo ifdown-bnep ifdown-ipv6 ifdown-ppp ifdown-tunnel ifup-bnep ifup-ipv6 ifup-plusb ifup-routes ifup-wireless network-functions
ifcfg-eth0 ifcfg-rhevm ifdown-eth ifdown-isdn ifdown-routes ifup ifup-eth ifup-isdn ifup-post ifup-sit init.ipv6-global network-functions-ipv6
ifcfg-eth1 ifdown ifdown-ippp ifdown-post ifdown-sit ifup-aliases ifup-ippp ifup-plip ifup-ppp ifup-tunnel net.hotplu

[root@amd-1352-8-5 ~]# cat /etc/sysconfig/network-scripts/ifcfg-eth0
DEVICE=eth0
BRIDGE=breth0
ONBOOT=yes
HWADDR=00:22:19:2d:4b:a3

Version-Release number of selected component (if applicable):
rhev-hypervisor 6.2-0.6

How reproducible:
always with an invalid rhevm

Steps to Reproduce:
1. register to an invalid rhevm
2. upgrade with "linux upgrade"
3.

Actual results:
rhev-hypervisor didn't handle this exception well

Expected results:
need to handle this better and keep the original network topology.

Additional info:
did not happen on a rhev-hypervisor whose status is displayed as "up" on the rhevm side
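
A quick way to spot the symptom in the after-upgrade `brctl show` output above is to look for a bridge with no attached interface. The following is a minimal Python sketch, assuming the column layout shown in this report; `bridges_without_ports` and `AFTER_UPGRADE` are hypothetical names used only for this example.

```python
def bridges_without_ports(brctl_output):
    """Return the names of bridges that list no attached interface."""
    empty = []
    for line in brctl_output.splitlines()[1:]:   # skip the header row
        fields = line.split()
        # A bridge row has name, bridge id and STP flag, followed by its
        # first interface if it has one. Rows with a single field are
        # continuation lines listing further interfaces.
        if len(fields) == 3:
            empty.append(fields[0])
    return empty


# Sample taken from the after-upgrade output in this report.
AFTER_UPGRADE = """\
bridge name     bridge id               STP enabled     interfaces
breth0          8000.0022192d4ba3       no              eth0
rhevm           8000.000000000000       no
"""

print(bridges_without_ports(AFTER_UPGRADE))  # ['rhevm']
```

An empty result corresponds to the before-upgrade state, where rhevm is the only bridge and eth0 is attached to it.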