Bug 1289026
Summary: | The bonding/vlan network is disabled after upgrade via TUI prior to Engine registration | ||||||||
---|---|---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Huijuan Zhao <huzhao> | ||||||
Component: | ovirt-node | Assignee: | Fabian Deutsch <fdeutsch> | ||||||
Status: | CLOSED NOTABUG | QA Contact: | Huijuan Zhao <huzhao> | ||||||
Severity: | high | Docs Contact: | |||||||
Priority: | high | ||||||||
Version: | 3.6.0 | CC: | cshao, cwu, danken, ecohen, fdeutsch, gklein, huiwa, huzhao, ibarkan, leiwang, lsurette, lyi, mburman, mgoldboi, yaniwang, ycui | ||||||
Target Milestone: | ovirt-3.6.2 | Keywords: | Regression | ||||||
Target Release: | --- | ||||||||
Hardware: | Unspecified | ||||||||
OS: | Unspecified | ||||||||
Whiteboard: | node | ||||||||
Fixed In Version: | Doc Type: | Bug Fix | |||||||
Doc Text: | Story Points: | --- | |||||||
Clone Of: | Environment: | ||||||||
Last Closed: | 2015-12-10 10:56:08 UTC | Type: | Bug | ||||||
Regression: | --- | Mount Type: | --- | ||||||
Documentation: | --- | CRM: | |||||||
Verified Versions: | Category: | --- | |||||||
oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |||||||
Cloudforms Team: | --- | Target Upstream Version: | |||||||
Embargoed: | |||||||||
Bug Depends On: | |||||||||
Bug Blocks: | 1285700 | ||||||||
Attachments:
Created attachment 1103084 [details]
bond log
Ido, can you tell anything from the logs? From the supervdsm.log I see that nothing was persisted:

restore-net::DEBUG::2015-12-04 07:08:55,475::libvirtconnection::160::root::(get) trying to connect libvirt
restore-net::INFO::2015-12-04 07:08:55,520::vdsm-restore-net-config::385::root::(restore) starting network restoration.
restore-net::DEBUG::2015-12-04 07:08:55,520::vdsm-restore-net-config::183::root::(_remove_networks_in_running_config) Not cleaning running configuration since it is empty.
restore-net::INFO::2015-12-04 07:08:55,523::netconfpersistence::179::root::(_clearDisk) Clearing /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-04 07:08:55,523::netconfpersistence::187::root::(_clearDisk) No existent config to clear.
restore-net::INFO::2015-12-04 07:08:55,524::netconfpersistence::129::root::(save) Saved new config RunningConfig({}, {}) to /var/run/vdsm/netconf/nets/ and /var/run/vdsm/netconf/bonds/
restore-net::DEBUG::2015-12-04 07:08:55,524::vdsm-restore-net-config::329::root::(_wait_for_for_all_devices_up) All devices are up.
restore-net::INFO::2015-12-04 07:08:55,529::netconfpersistence::71::root::(setBonding) Adding bond0({'nics': ['em1', 'p4p2'], 'options': 'miimon=100'})
restore-net::INFO::2015-12-04 07:08:55,530::vdsm-restore-net-config::396::root::(restore) restoration completed successfully.

*** Bug 1289028 has been marked as a duplicate of this bug. ***

Which persistence failed? The node-specific file persistence, or the unified persistence?
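The restore-net log above shows VDSM finding (and saving) an empty RunningConfig under /var/run/vdsm/netconf. As a rough way to confirm that the unified persistence store is really empty on such a node, here is a minimal sketch; the nets/ and bonds/ subdirectories match the paths in the log, while the function name and the parameterized root (instead of hardcoding /var/run/vdsm/netconf) are illustrative assumptions:

```shell
#!/bin/sh
# Minimal sketch: count entries under a VDSM netconf store.
# nets/ and bonds/ match the paths in the restore-net log above;
# the parameterized root is an illustrative assumption so the
# logic can be exercised against any directory.
count_netconf_entries() {
    root=$1
    count=0
    for d in "$root/nets" "$root/bonds"; do
        [ -d "$d" ] || continue
        for f in "$d"/*; do
            if [ -e "$f" ]; then
                count=$((count + 1))
            fi
        done
    done
    echo "$count"
}

# On a node one would run: count_netconf_entries /var/run/vdsm/netconf
# A result of 0 is consistent with the RunningConfig({}, {}) in the log.
```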
It is noted that this is a regression between RHEV-H 7.2-20151112.1.el7ev (works) and RHEV-H 7.2-20151201.2.el7ev (does not work). The diff between the two is:

--- RHEV-H 7.2-20151112.1.el7ev
+++ RHEV-H 7.2-20151201.2.el7ev
-glibc-2.17-105.el7.x86_64
-glibc-common-2.17-105.el7.x86_64
+glibc-2.17-106.el7_2.1.x86_64
+glibc-common-2.17-106.el7_2.1.x86_64
-gmp-6.0.0-11.el7.x86_64
+gmp-6.0.0-12.el7_1.x86_64
-ioprocess-0.14.0-4.el7ev.x86_64
+ioprocess-0.15.0-5.el7ev.x86_64
-ipxe-roms-qemu-20130517-7.gitc4bce43.el7.noarch
+ipxe-roms-qemu-20130517-7.1fm.gitc4bce43.el7sat.noarch
-librados2-0.94.1-19.el7cp.x86_64
-librbd1-0.94.1-19.el7cp.x86_64
+librados2-0.94.3-3.el7cp.x86_64
+librbd1-0.94.3-3.el7cp.x86_64
-libreport-filesystem-2.1.11-30.el7.x86_64
+libreport-filesystem-2.1.11-31.el7.x86_64
-libvirt-1.2.17-13.el7.x86_64
+libvirt-1.2.17-13.el7_2.2.x86_64
-libvirt-client-1.2.17-13.el7.x86_64
-libvirt-daemon-1.2.17-13.el7.x86_64
-libvirt-daemon-config-network-1.2.17-13.el7.x86_64
-libvirt-daemon-config-nwfilter-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-interface-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-lxc-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-network-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-nodedev-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-nwfilter-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-qemu-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-secret-1.2.17-13.el7.x86_64
-libvirt-daemon-driver-storage-1.2.17-13.el7.x86_64
-libvirt-daemon-kvm-1.2.17-13.el7.x86_64
-libvirt-lock-sanlock-1.2.17-13.el7.x86_64
+libvirt-client-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-config-network-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-config-nwfilter-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-interface-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-lxc-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-network-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-nodedev-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-nwfilter-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-qemu-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-secret-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-driver-storage-1.2.17-13.el7_2.2.x86_64
+libvirt-daemon-kvm-1.2.17-13.el7_2.2.x86_64
+libvirt-lock-sanlock-1.2.17-13.el7_2.2.x86_64
+lttng-ust-2.4.1-1.el7cp.x86_64
+OpenIPMI-2.0.19-11.el7.x86_64
+OpenIPMI-libs-2.0.19-11.el7.x86_64
-ovirt-host-deploy-1.4.1-0.0.master.el7ev.noarch
+ovirt-host-deploy-1.4.1-1.el7ev.noarch
-ovirt-hosted-engine-ha-1.3.2.1-1.el7ev.noarch
-ovirt-hosted-engine-setup-1.3.0-1.el7ev.noarch
-ovirt-node-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-branding-rhev-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-lib-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-lib-config-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-lib-legacy-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-cim-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-cim-logic-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-hosted-engine-0.3.0-3.el7ev.noarch
-ovirt-node-plugin-rhn-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-snmp-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-snmp-logic-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
-ovirt-node-plugin-vdsm-0.6.1-3.el7ev.noarch
-ovirt-node-selinux-3.6.0-0.20.20151103git3d3779a.el7ev.noarch
+ovirt-hosted-engine-ha-1.3.3.1-1.el7ev.noarch
+ovirt-hosted-engine-setup-1.3.1.1-1.el7ev.noarch
+ovirt-node-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-branding-rhev-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-lib-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-lib-config-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-lib-legacy-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-cim-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-cim-logic-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-hosted-engine-0.3.0-4.el7ev.noarch
+ovirt-node-plugin-rhn-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-snmp-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-snmp-logic-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
+ovirt-node-plugin-vdsm-0.6.1-4.el7ev.noarch
+ovirt-node-selinux-3.6.0-0.23.20151201git5eed7af.el7ev.noarch
-python-ioprocess-0.14.0-4.el7ev.noarch
+python-ioprocess-0.15.0-5.el7ev.noarch
-python-rhsm-1.15.4-5.el7.x86_64
+python-rhsm-1.13.2-1.el7.x86_64
-rdma-7.2_4.1_rc6-1.el7.noarch
+rdma-7.2_4.1_rc6-2.el7.noarch
-screen-4.1.0-0.21.20120314git3c2946.el7.x86_64
+screen-4.1.0-0.22.20120314git3c2946.el7.x86_64
-subscription-manager-1.15.9-15.el7.x86_64
+subscription-manager-1.10.14-10.el7.x86_64
+userspace-rcu-0.7.9-2.el7rhs.x86_64
-vdsm-4.17.10.1-0.el7ev.noarch
-vdsm-cli-4.17.10.1-0.el7ev.noarch
-vdsm-hook-ethtool-options-4.17.10.1-0.el7ev.noarch
-vdsm-infra-4.17.10.1-0.el7ev.noarch
-vdsm-jsonrpc-4.17.10.1-0.el7ev.noarch
-vdsm-python-4.17.10.1-0.el7ev.noarch
-vdsm-xmlrpc-4.17.10.1-0.el7ev.noarch
-vdsm-yajsonrpc-4.17.10.1-0.el7ev.noarch
+vdsm-4.17.12-0.el7ev.noarch
+vdsm-cli-4.17.12-0.el7ev.noarch
+vdsm-hook-ethtool-options-4.17.12-0.el7ev.noarch
+vdsm-infra-4.17.12-0.el7ev.noarch
+vdsm-jsonrpc-4.17.12-0.el7ev.noarch
+vdsm-python-4.17.12-0.el7ev.noarch
+vdsm-xmlrpc-4.17.12-0.el7ev.noarch
+vdsm-yajsonrpc-4.17.12-0.el7ev.noarch

ifcfg: remove files properly on the node

Since change-id I02ae28c345 we are always persisting ifcfg files on the node. This means that we should unpersist them on removal.
Change-Id: I2ab83b3fad7679f8f3f459b682860a95e08d6b1e
Bug-Url: https://bugzilla.redhat.com/1283628
Signed-off-by: Dan Kenigsberg <danken>
Reviewed-on: https://gerrit.ovirt.org/48841
Reviewed-by: Ido Barkan <ibarkan>
Reviewed-by: Fabian Deutsch <fabiand>
Tested-by: Sagi Shnaidman <sshnaidm>
Reviewed-by: Sagi Shnaidman <sshnaidm>
(cherry picked from commit 1ae349016221c52e1a80971aac2e5080ad33fd11)
Reviewed-on: https://gerrit.ovirt.org/49373
Continuous-Integration: Jenkins CI

This was merged during that time, which could have an effect here. Oddly, I see a DHCPOFFER at 07:02:46 that is somehow ignored at 07:03:58.

Dec 4 07:06:53 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 7 (xid=0x1cf82057)
Dec 4 07:06:53 localhost dhclient[2245]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x1cf82057)
Dec 4 07:06:53 localhost dhclient[2245]: DHCPOFFER from 10.66.73.254
Dec 4 07:07:01 localhost dhclient[2245]: DHCPREQUEST on bond0 to 255.255.255.255 port 67 (xid=0x1cf82057)
Dec 4 07:07:11 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 3 (xid=0x52479cbf)
Dec 4 07:07:14 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 4 (xid=0x52479cbf)
Dec 4 07:07:18 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 10 (xid=0x52479cbf)
Dec 4 07:07:28 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 15 (xid=0x52479cbf)
Dec 4 07:07:43 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 18 (xid=0x52479cbf)
Dec 4 07:08:01 localhost systemd: Created slice user-0.slice.
Dec 4 07:08:01 localhost systemd: Starting user-0.slice.
Dec 4 07:08:01 localhost systemd: Started Session 1 of user root.
Dec 4 07:08:01 localhost systemd: Starting Session 1 of user root.
Dec 4 07:08:01 localhost dhclient[2245]: DHCPDISCOVER on bond0 to 255.255.255.255 port 67 interval 11 (xid=0x52479cbf)
Dec 4 07:08:02 localhost systemd: Removed slice user-0.slice.
Dec 4 07:08:02 localhost systemd: Stopping user-0.slice.
Dec 4 07:08:12 localhost dhclient[2245]: No DHCPOFFERS received.
Dec 4 07:08:12 localhost network: Determining IP information for bond0... failed.
Dec 4 07:08:12 localhost network: [FAILED]

Could it be that your DHCP server is unfamiliar with p4p2's MAC address? Can you repeat the test with bond mode=4 (in case this is your switch's config)?

Hi, RHEV-H 7.2-20151129.1.el7ev has vdsm-4.16.30-1.el7ev.x86_64, right? That means you had 'rhevm' under /var/lib/vdsm/persistence/netconf/nets/ and not 'ovirtmgmt'. And didn't your RHEV-H 7.2-20151112.1.el7ev have vdsm 4.17.10 (3.6)? If so, that explains why it worked. I think it's all related to BZ 1271273.

Hi Michael, RHEV-H is not registered to rhevm before the upgrade, so there is no "rhevm" or "ovirtmgmt" bridge; maybe it is not related to "rhevm" or "ovirtmgmt". Additionally:
1. For the bonding network, there is no such issue on RHEV-H 7.2-20151112.1.el7ev. But for the vlan network, the issue is also encountered on RHEV-H 7.2-20151112.1.el7ev.
2. There is no such issue during upgrade via the command line.
Huijuan

Hi, even if the host wasn't registered to RHEV-M, the 'rhevm' bridge is created over the NIC. You can verify/see that after step 2^^ with:
tree /var/lib/vdsm/persistence/netconf/nets/
├── rhevm
and with the brctl show command.
Danken, isn't this the same issue as BZ 1271273? The management network and the associated NICs (a bond in this case) weren't persisted. When Huijuan ran an upgrade from:
- RHEV-H 7.2-20151129.1.el7ev (vdsm 3.5.6) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta), rhevm>ovirtmgmt, it failed.
- But when upgrading from RHEV-H 7.2-20151112.1.el7ev (vdsm 3.6.1 beta 1) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta), ovirtmgmt>ovirtmgmt, it succeeded.
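Regarding the bond-mode question above: the active mode and the enslaved NICs can be read from the kernel's bonding status file. A minimal parsing sketch, assuming the standard /proc/net/bonding/<bond> field labels; taking the file path as a parameter (so the parsing can be exercised against a saved copy of the file) is an illustrative choice:

```shell
#!/bin/sh
# Minimal sketch: print the bonding mode and slave interfaces from a
# kernel bonding status file. The "Bonding Mode" and "Slave Interface"
# labels are the bonding driver's standard output; the parameterized
# path is an assumption for illustration.
bond_summary() {
    awk -F': ' '
        /^Bonding Mode/    { print "mode=" $2 }
        /^Slave Interface/ { print "slave=" $2 }
    ' "$1"
}

# On a node one would run: bond_summary /proc/net/bonding/bond0
```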
Looks like a 'rhevm'/'ovirtmgmt' management network issue (even if the host wasn't registered to RHEV-M).

Are you sure that "rhevm" exists right after step 2? AFAIK RHEV-H 3.5 creates the "rhevm" network only after the details of Engine are supplied to the TUI, and not right after step 2 of comment 0. Comment 0 did not supply an Engine IP, so bug 1271273 is unrelated.

Dan, Huijuan, you're right, guys. Sorry, my mistake; 'rhevm' exists only after the details of the engine are supplied to the TUI.

Michael, for comment 12, I ran the upgrade again from:
- RHEV-H 7.2-20151129.1.el7ev (vdsm 3.5.6) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta): failed.
- RHEV-H 7.2-20151112.1.el7ev (vdsm 3.6.1 beta 1) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta): succeeded.
But in the previous run it failed, so this is not 100% reproducible.

Huijuan, can you try to reproduce this bug on a machine with a dual- or quad-NIC card? Please create the bond over two ports of such a dual or quad card. I'd like to see whether this problem is related to the currently involved NICs.

Fabian, Dan, can someone please explain the use case for such an upgrade scenario, without involving the RHEV-M engine? Why would someone run such an upgrade in the first place? Thanks ))

A valid point. There were cases where this was happening, but in RHEV, indeed, this should not happen too often. Still, I suspect that this problem will also be encountered if RHEV-H was connected to RHEV-M, because I don't see any RHEV-H-specific problem here. Huijuan, can you please check whether this bug also appears if the host is registered to RHEV-M, i.e. adding a step between steps 2 and 3 of comment 0:
2.a Register to RHEV-M
In addition it would be good to have the question in comment 16 answered.

You will be blocked by BZ 1271273 if you do this with RHEV-H 7.2-20151129.1.el7ev (vdsm 3.5.6) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta). RHEV-H 7.2-20151112.1.el7ev (vdsm 3.6.1 beta 1) >> RHEV-H 7.2-20151201.2.el7ev (vdsm 3.6.1.1 beta) should succeed.
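The check discussed earlier (whether a 'rhevm' or 'ovirtmgmt' network is present under /var/lib/vdsm/persistence/netconf/nets/) can be scripted as a quick test. A minimal sketch; the function name and the parameterized directory are illustrative assumptions, and on a real node the directory would be the persistence path mentioned in the thread:

```shell
#!/bin/sh
# Minimal sketch: report which management network name, if any, is
# persisted. RHEV-H 3.5 uses 'rhevm' and 3.6 uses 'ovirtmgmt', per the
# discussion above; the parameterized directory is an assumption so the
# logic can be checked against any path.
mgmt_net_name() {
    for name in rhevm ovirtmgmt; do
        if [ -e "$1/$name" ]; then
            echo "$name"
            return 0
        fi
    done
    echo "none"
}

# On a node: mgmt_net_name /var/lib/vdsm/persistence/netconf/nets
# followed by `brctl show` to confirm the bridge actually exists.
```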
Fabian, ycui, for comment 0 and comment 15, in both cases I created the bond over two NICs. Do you mean a bond over two ports on the same NIC card, or just any two ports (two NIC cards or one card, both OK)?

Reducing urgency, since upgrade prior to registration is less important.

Fabian, ycui, for comment 16, I tested a bond over two ports on the same card once; no such issue. Thanks, Huijuan

Closing this bug according to comment 22. For this bug, two things are still not clear:
1. Why did the regression happen? See the bug description and comment 11.
2. Why is a bond over two NIC cards disabled after upgrading? It should be a valid scenario, yet a bond over a single dual-NIC card works well after upgrading.
Created attachment 1103082 [details]
screenshot bond upgrade fail

Description of problem:
The bonding network is disabled after upgrading from the publicly released RHEV-H 7.2/7.1 version to RHEV-H 7.2 for 3.6 beta 2.

Version-Release number of selected component (if applicable):
RHEV-H 7.2-20151201.2.el7ev
ovirt-node-3.6.0-0.23.20151201git5eed7af.el7ev.noarch

How reproducible:
100%

Whiteboard: regression

Steps to Reproduce:
1. TUI install RHEV-H 7.2-20151129.1.el7ev
2. Log in to RHEV-H 7.2-20151129.1.el7ev and set up a bond network over two NICs via DHCP; a DHCP IP is obtained successfully
3. Upgrade from RHEV-H 7.2-20151129.1.el7ev to RHEV-H 7.2-20151201.2.el7ev via TUI
4. Log in to RHEV-H 7.2-20151201.2.el7ev and check the bond network

Actual results:
After step 4, the bond network is disabled.

Expected results:
After step 4, the bond network should be up and obtain a DHCP IP successfully.

Additional info:
No such issue on RHEV-H 7.2-20151112.1.el7ev, so this is a regression bug.