Bug 1463218
Summary: | [downstream clone - 4.1.7] Adding rhvh-4.1-20170417.0 to engine failed with bond(active+backup) configured by cockpit | |
---|---|---|---
Product: | Red Hat Enterprise Virtualization Manager | Reporter: | rhev-integ
Component: | vdsm | Assignee: | Edward Haas <edwardh>
Status: | CLOSED ERRATA | QA Contact: | dguo
Severity: | urgent | Docs Contact: |
Priority: | high | |
Version: | 4.1.0 | CC: | bazulay, bugs, cshao, danken, dfediuck, dguo, edwardh, eedri, fgiudici, huzhao, jiawu, leiwang, lsurette, mburman, pbrilla, qiyuan, rbarry, rhev-integ, sbonazzo, srevivo, weiwang, yaniwang, ycui, ykaul, ylavi, yzhao
Target Milestone: | ovirt-4.1.7 | Keywords: | Regression, TestBlocker, ZStream
Target Release: | --- | |
Hardware: | Unspecified | |
OS: | Unspecified | |
Whiteboard: | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value
Doc Text: | | Story Points: | ---
Clone Of: | 1443347 | Environment: |
Last Closed: | 2017-11-07 17:29:21 UTC | Type: | ---
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Network | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1443347, 1472965 | |
Bug Blocks: | 1491561 | |
Attachments: | | |
Description
rhev-integ
2017-06-20 11:49:31 UTC
Created attachment 1272497 [details]
engine log
(Originally by Daijie Guo)
Is this the same version of rhvm? Can you grab the engine log and the generated ifcfg files from the previous RHVH build and this one? The problem is either engine or platform cockpit, but this information is needed for root cause analysis.
(Originally by Ryan Barry)

Created attachment 1272500 [details]
network-scripts
(Originally by Daijie Guo)
Created attachment 1272501 [details]
vdsm logs and deploy log
(Originally by Daijie Guo)
NOTE for Regression & TestBlocker: There is no such issue on the previous version (redhat-virtualization-host-4.1-20170413.), and this bug will block the "add RHVH to engine with bond configured" test scenario.
(Originally by Chen Shao)

Same engine version? This is critical.
Absolutely nothing changed in RHVH which would affect this (in general, but especially from 0413 to 0417). If the interface comes up properly in Cockpit, I'd also expect engine to work.
(Originally by Ryan Barry)

(In reply to Ryan Barry from comment #6)
Ryan,
> Same engine version? This is critical.
Yes, same version: rhvm-4.1.1.8-0.1.el7
>
> Absolutely nothing changed in RHVH which would affect this (in general, but
> especially from 0413 to 0417). If the interface comes up properly in
> Cockpit, I'd also expect engine to work.
I should correct the RHVH version in comment 5 to rhvh-4.1-20170403, not 0413. Because of the respin in 0413, I did not test this bond scenario there, so the issue was only found in 0417.
It should be noted that there is a big change between 0403 and 0417, which is the cockpit version.
On rhvh-4.1-20170403:
cockpit-shell-126-1.el7.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.16.el7ev.noarch
On rhvh-4.1-20170417:
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-system-135-4.el7.noarch
There is also a bug fix for a network issue in Cockpit 132, which might have an effect:
https://bugzilla.redhat.com/show_bug.cgi?id=1395108
https://bugzilla.redhat.com/show_bug.cgi?id=1420708
(Originally by Daijie Guo)

Created attachment 1272520 [details]
network-scripts in previous build 0403
(Originally by Daijie Guo)
(In reply to dguo from comment #7)
> https://bugzilla.redhat.com/show_bug.cgi?id=1395108
> https://bugzilla.redhat.com/show_bug.cgi?id=1420708
Since this actually works until an attempt to register to engine is made, I expect that Cockpit is actually working here, and the problem is some confusion in the ifcfg scripts, but I'm looking.
(Originally by Ryan Barry)

It appears that host-deploy is not adding the vlan to ovirtmgmt. This makes comparison difficult, though, since the previous ifcfg scripts do not contain a VLAN config.
Can you please attach new ifcfgs with a matching config? If "network-scripts.after_add" is without a vlan (ifcfg-bond0 has no vlan config here), then the attachment is more confusing, since before_add has a vlan...
(Originally by Ryan Barry)

engine.log and vdsm.log both have messages about SSL handshake errors rather than 'no route to host', so networking is probably up.
Can you please provide the following:
Configure a system with a bond OR bond+vlan, but keep the configuration the same:
ifcfg files 0403 before and after add
ifcfg files 0417 before and after add
host-deploy, vdsm, and engine logs from the failed addition
(Originally by Ryan Barry)

Created attachment 1272833 [details]
vdsm.log, hosted-engine.log, ifcfg files
(Originally by Yihui Zhao)
Deployed the HE with a bond (bond+vlan); the bond's IP changed during deployment.
Uploaded the vdsm.log, hosted-engine.log, ifcfg files (before setting up bond0), ifcfg files (after setting up bond0), and ifcfg files (after the HE deployment failed).
Attachment: https://bugzilla.redhat.com/attachment.cgi?id=1272833
(Originally by Yihui Zhao)

(In reply to Yihui Zhao from comment #13)
> Deploy the HE with bond(bond+vlan) during the bond's ip changed.
>
> Upload the vdsm.log , hosted-engine.log, ifcfg files(before setup bond0),
> ifcfg files(setup bond0), ifcfg files(deploy HE failed).
>
> Attachment : https://bugzilla.redhat.com/attachment.cgi?id=1272833
So, the bug will also block HE testing (HE with bond or bond+vlan).
(Originally by Yihui Zhao)

Created attachment 1272840 [details]
All files of 0417
(Originally by Daijie Guo)
Created attachment 1272841 [details]
All files of 0403
(Originally by Daijie Guo)
(In reply to Ryan Barry from comment #11)
> engine.log and vdsm.log both have messages about SSL handshake errors rather
> than 'no route to host', so networking is probably up.
>
> Can you please provide the following:
>
> Configure a system with a bond OR bond+vlan, but keep the configuration the
> same:
>
> ifcfg files 0403 before and after add
> ifcfg files 0417 before and after add
>
> host-deploy, vdsm, and engine logs from the failed addition
Ryan,
I attached all the required files, sorted into 0403 and 0417.
(Originally by Daijie Guo)

From all the tests done on 0417, we observed the following:
1. Create bond0 over em1 + em2 (em1 was set as the master slave). bond0 got em2's MAC, and its IP was 10.73.131.184.
2. Add the host over bond0. During the installation, bond0's MAC was changed to em1's, and its IP became 10.73.131.65.
3. After the addition failed, bond0's IP disappeared.
But for the tests done on 0403:
1. bond0 got em1 (master)'s MAC, and its IP was 10.73.131.65.
2. Add the host over bond0. The MAC was not changed, and the IP was always 10.73.131.65.
(Originally by Daijie Guo)

Reassigning to vdsm for tracking. The cause of this seems to be a known problem with NM/cockpit changing IPs if the active MAC changes. There are workarounds for this.
(Originally by Ryan Barry)

The proposed patch (https://gerrit.ovirt.org/77933) should be suitable for RHVH, as VDSM has already been installed on it with the NM configuration file.
Note that the NM configuration that enables adding slaves to a bond in the order of the slave names (same as the initscripts order) will be available in RHEL 7.4, with NM version 1.8.
(Originally by edwardh)

INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:
[Tag 'v4.19.21' doesn't contain patch 'https://gerrit.ovirt.org/78362']
gitweb: https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.21
For more info please contact: rhv-devops

(In reply to rhev-integ from comment #24)
> INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following
> reason:
>
> [Tag 'v4.19.21' doesn't contain patch 'https://gerrit.ovirt.org/78362']
> gitweb:
> https://gerrit.ovirt.org/gitweb?p=vdsm.git;a=shortlog;h=refs/tags/v4.19.21
>
> For more info please contact: rhv-devops
But it does contain it. (commit 168ebb7)

Failed ON_QA on the latest rhvh build.
Test version:
Red Hat Virtualization Manager Version: 4.1.4.1-0.1.el7
redhat-virtualization-host-4.1-20170714.1
vdsm-4.19.22-1.el7ev.x86_64
imgbased-0.9.33-0.1.el7ev.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.21.el7ev.noarch
cockpit-ws-141-2.el7.x86_64
Test steps:
1. Install a rhvh4.1
2. Configure bond0 (active+backup) via cockpit on rhvh4.1
3. Add this host to engine4.1
Actual results:
1. After step #3, adding failed. During the installation, the IP address changed, and after the addition failed, the IP address disappeared.
Expected results:
1. After step #3, the host can be added successfully.
Additional info:
1. Please see the logs in the new attachment.

Created attachment 1300219 [details]
All the logs including engine.log vdsm.log host-deploy.log ifcfg-files
The bond0 interface as described in its ifcfg file, before VDSM takes over, has a MAC address statically set: MACADDR=08:9E:01:63:2C:B3
VDSM does not support such a configuration; it expects NM to automatically select the MAC address per the name order.
Please advise who is setting this... Is it cockpit? If so, a BZ should be opened against it.

(In reply to Edward Haas from comment #30)
> The bond0 interface as described in its ifcfg file, before VDSM takes over,
> has an mac address statically set: MACADDR=08:9E:01:63:2C:B3
> VDSM does not support such a configuration, it expects NM to automatically
> select the mac address per the name order.
>
> Please advice who is setting this... Is it cockpit? If so, a BZ should be
> opened against it.
There is a new "MAC input box" when adding a bond in cockpit, which you can see in the attached picture.
I tried to create the bond in two different ways:
1. Specify the MAC address with the existing em2 MAC; the bond will get em2's IP.
2. Do not specify the MAC address and leave the "MAC input box" blank.
For #1, adding the host to engine with this bond failed, as you pointed out.
For #2, it also failed; I will attach the new logs.

Created attachment 1300324 [details]
creating bond on cockpit
Created attachment 1300325 [details]
New logs where bond mac is not specified
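The unsupported case raised in comment 30 (a `MACADDR=` key in the bond's ifcfg file) can be checked for before a host is added. The following is only a minimal sketch, not part of VDSM or cockpit: the helper names are hypothetical, and the sample file is made up for illustration except for the `MACADDR` value, which is the one quoted in the thread.

```python
import shlex


def parse_ifcfg(text):
    """Parse KEY=VALUE lines of an ifcfg file into a dict.

    Blank lines, comments and lines without '=' are skipped.
    """
    config = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # shlex.split removes optional shell-style quoting around the value
        parts = shlex.split(value)
        config[key.strip()] = parts[0] if parts else ""
    return config


def has_static_mac(text):
    """Return True if the ifcfg content pins the MAC address via MACADDR."""
    return bool(parse_ifcfg(text).get("MACADDR"))


# Hypothetical ifcfg-bond0 content; only the MACADDR value comes from the thread.
SAMPLE_IFCFG = """\
DEVICE=bond0
TYPE=Bond
BONDING_OPTS="mode=active-backup miimon=100"
MACADDR=08:9E:01:63:2C:B3
BOOTPROTO=dhcp
ONBOOT=yes
"""

if __name__ == "__main__":
    print("static MAC set:", has_static_mac(SAMPLE_IFCFG))
```

On a real host the text would presumably be read from the bond's file under /etc/sysconfig/network-scripts/; that path is an assumption here, since the thread only refers to "ifcfg files".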
(In reply to dguo from comment #31)
>
> I try to create the bond with two different way:
> 1. Specify the mac address with the existing em2 mac, the bond will get the
> em2's ip
> 2. Do not specify the mac address and leave the "MAC input box" blank.
>
> For #1, Failed to add the host to engine with this bond, as you pointed.
> And for #2, it also failed, I will attach the new logs.
Thanks for the input.
I see the IP has changed, but I cannot see what MAC address it changed to.
Could you please post the "ip link" output before the 120sec timeout is reached?

(In reply to Edward Haas from comment #34)
> (In reply to dguo from comment #31)
> >
> > I try to create the bond with two different way:
> > 1. Specify the mac address with the existing em2 mac, the bond will get the
> > em2's ip
> > 2. Do not specify the mac address and leave the "MAC input box" blank.
> >
> > For #1, Failed to add the host to engine with this bond, as you pointed.
> > And for #2, it also failed, I will attach the new logs.
>
> Thanks for the input.
> I see the IP has changed, but I cannot see to what mac address it changed to.
> Could you please post the "ip link" output before the 120sec timeout is
> reached?
Below is the output you requested:
1: lo: <LOOPBACK,UP,LOWER_UP> mtu 65536 qdisc noqueue state UNKNOWN qlen 1
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00
    inet 127.0.0.1/8 scope host lo
       valid_lft forever preferred_lft forever
    inet6 ::1/128 scope host
       valid_lft forever preferred_lft forever
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/ether 00:c0:dd:20:13:e8 brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
4: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
5: p3p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7a brd ff:ff:ff:ff:ff:ff
6: p3p2: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:3d:7b brd ff:ff:ff:ff:ff:ff
7: p2p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc mq state UP qlen 1000
    link/ether 00:1b:21:a6:64:6c brd ff:ff:ff:ff:ff:ff
8: p2p2: <NO-CARRIER,BROADCAST,MULTICAST,UP> mtu 1500 qdisc mq state DOWN qlen 1000
    link/ether 00:1b:21:a6:64:6d brd ff:ff:ff:ff:ff:ff
24: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovirtmgmt state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
25: ;vdsmdummy;: <BROADCAST,MULTICAST> mtu 1500 qdisc noop state DOWN qlen 1000
    link/ether c2:e5:c8:c3:c0:4a brd ff:ff:ff:ff:ff:ff
26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
    inet 10.73.75.189/22 brd 10.73.75.255 scope global dynamic ovirtmgmt
       valid_lft 43190sec preferred_lft 43190sec
    inet6 2620:52:0:4948:dc27:beff:fea6:a62d/64 scope global mngtmpaddr dynamic
       valid_lft 2591990sec preferred_lft 604790sec
    inet6 fe80::dc27:beff:fea6:a62d/64 scope link
       valid_lft forever preferred_lft forever

Deploying HostedEngine also failed with bond+vlan (active+backup) configured by cockpit, because the IP of a NIC like "bond0.20" changed.
Test version:
rhvh-4.1-0.20170714.0+1
vdsm-4.19.22-1.el7ev.x86_64
cockpit-ovirt-dashboard-0.10.7-0.0.21.el7ev.noarch
ovirt-hosted-engine-setup-2.1.3.4-1.el7ev.noarch
imgbased-0.9.33-0.1.el7ev.noarch
rhvm-appliance-4.1.20170709.3-1.el7.noarch
How to reproduce:
1. Install RHVH4.1
2. Configure network (bond+vlan) by cockpit
3. Deploy HostedEngine
Actual results:
1. After step 3, deploying failed. During the deployment, the IP address changed.
Expected results:
1. After step 3, the host can deploy HostedEngine successfully.
Additional info:
/var/log/* in the attachment.

Created attachment 1300958 [details]
/var/log/*
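To make the `ip` output posted above easier to read, the interfaces can be grouped by MAC address. This is a rough sketch with a hypothetical helper name (`group_by_mac`); the sample is a shortened copy of the posted output, and the regular expression is only assumed to cover output shaped like that sample.

```python
import re

# Matches "<index>: <name>: <FLAGS> ... link/<type> <mac>" in `ip` output.
LINK_RE = re.compile(
    r"\d+: (\S+): <[^>]*>.*?link/\w+ ((?:[0-9a-f]{2}:){5}[0-9a-f]{2})",
    re.DOTALL,
)


def group_by_mac(ip_output):
    """Map each MAC address to the interface names carrying it, in order."""
    groups = {}
    for name, mac in LINK_RE.findall(ip_output):
        groups.setdefault(mac, []).append(name)
    return groups


# Shortened excerpt of the output posted in the thread.
SAMPLE_IP_OUTPUT = """\
2: p1p1: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc pfifo_fast state UP qlen 256
    link/ether 00:c0:dd:20:13:e8 brd ff:ff:ff:ff:ff:ff
3: em1: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
4: em2: <BROADCAST,MULTICAST,SLAVE,UP,LOWER_UP> mtu 1500 qdisc mq master bond0 state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
24: bond0: <BROADCAST,MULTICAST,MASTER,UP,LOWER_UP> mtu 1500 qdisc noqueue master ovirtmgmt state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
26: ovirtmgmt: <BROADCAST,MULTICAST,UP,LOWER_UP> mtu 1500 qdisc noqueue state UP qlen 1000
    link/ether de:27:be:a6:a6:2d brd ff:ff:ff:ff:ff:ff
"""

if __name__ == "__main__":
    for mac, names in group_by_mac(SAMPLE_IP_OUTPUT).items():
        print(mac, ", ".join(names))
```

Both slaves report the bond's address here, so this grouping shows which interfaces share a MAC but cannot reveal what each NIC's original address was.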
It looks like the MAC assigned to the bond (de:27:be:a6:a6:2d) is not one of the NICs' original MACs.
@mburman has also seen the same scenario in the TLV lab this morning.
We currently suspect NM of overwriting the bond MAC, although it does not manage it.
Waiting for some insights from the NM team on this.

Created attachment 1301090 [details]
NM log in debug1
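The observation that the bond's MAC is not one of the NICs' original MACs can be checked against the permanent addresses that the kernel bonding driver reports in /proc/net/bonding/<bond>. Below is a minimal sketch assuming that file's usual "Slave Interface:" and "Permanent HW addr:" lines. The function names are hypothetical, and the permanent addresses in the sample are invented for illustration; only the bond MAC de:27:be:a6:a6:2d comes from the thread.

```python
def permanent_macs(proc_text):
    """Return {slave name: permanent MAC} parsed from /proc/net/bonding text."""
    macs = {}
    current = None
    for line in proc_text.splitlines():
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Slave Interface":
            current = value
        elif key == "Permanent HW addr" and current is not None:
            macs[current] = value.lower()
    return macs


def bond_mac_is_foreign(bond_mac, proc_text):
    """True if the bond's MAC matches none of its slaves' permanent MACs."""
    return bond_mac.lower() not in permanent_macs(proc_text).values()


# Illustrative content only: the permanent addresses below are made up.
SAMPLE_PROC = """\
Bonding Mode: fault-tolerance (active-backup)
Currently Active Slave: em1
MII Status: up

Slave Interface: em1
MII Status: up
Permanent HW addr: 08:9e:01:63:2c:b2

Slave Interface: em2
MII Status: up
Permanent HW addr: 08:9e:01:63:2c:b3
"""

if __name__ == "__main__":
    print(bond_mac_is_foreign("de:27:be:a6:a6:2d", SAMPLE_PROC))
```

A "foreign" result would be consistent with the bond carrying an address that did not come from either slave, which is what the comment above suspects.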
Isolated the issue that is probably the root cause of this (thanks to Michael for sharing the setup and machine).
If a NetworkManager bond connection which is already up is brought up again, the MAC address will be reset to the "fake" one the bond interface had before enslaving any interface.
Tracked this with trace level logs here:
https://bugzilla.redhat.com/show_bug.cgi?id=1472965

changing summary to reflect re-target of the bug

This is tracked for 7.4.z in bug 1490741 which is already VERIFIED.
Do we have anything specific to do here, besides wait for it to be released?

In order to test this report properly we still need:
1) An rhv-h + cockpit build with NetworkManager-1.8.0-10.el7_4.x86_64 included.
2) In order to consume a bond with the MACADDR= key, we need to wait for BZ 1422430; if a cloned MAC address is specified via cockpit, vdsm currently will fail to consume it.

(In reply to Yaniv Kaul from comment #44)
> This is tracked for 7.4.z in bug 1490741 which is already VERIFIED.
> Do we have anything specific to do here, besides wait for it to be released?
The release of the NM fix is expected on the 17th of Oct, which should include NetworkManager-1.8.0-10.el7_4. The first RHV-H image that includes it should resolve this BZ.

Tested two scenarios on the latest rhvh-4.1-0.20171024.0, based on comment 31.
Test version:
[root@localhost ~]# rpm -q redhat-release-virtualization-host
redhat-release-virtualization-host-4.1-7.0.el7.x86_64
[root@localhost ~]# imgbase w
You are on rhvh-4.1-0.20171024.0+1
[root@localhost ~]# rpm -q NetworkManager
NetworkManager-1.8.0-11.el7_4.x86_64
[root@localhost ~]# rpm -q vdsm
vdsm-4.19.35-1.el7ev.x86_64
Test scenarios:
1. Do not specify a MAC address while configuring the bond in cockpit.
Adding rhvh to rhvm over this bond succeeded.
2. Specify the bond MAC address with the existing em2's MAC.
Failed to add to rhvm, and the bond's IP still disappears after the failure. It seems that we still need to wait for BZ 1422430 from comment 45.
Please do not fail this bug due to scenario 2, it should track only scenario 1.
We know well that scenario 2 fails. We have bug 1422430 tracking it, and we don't need a second bug for that. The most important bit is that *finally*, we can use cockpit to define a dhcp address over a bond, and add the host successfully to ovirt-engine. In my opinion, this merits a VERIFIED status, please reconsider.

(In reply to Dan Kenigsberg from comment #49)
> Please do not fail this bug due to scenario 2, it should track only scenario
> 1.
>
> We know well that scenario 2 fails. We have bug 1422430 tracking it, and we
> don't need a second bug for that. The most important bit is that *finally*,
> we can use cockpit to define a dhcp address over a bond, and add the host
> successfully to ovirt-engine. In my opinion, this merits a VERIFIED status,
> please reconsider.
Thanks for the clarification, I agree with that. From the customer's perspective, we also need to make clear to them that scenario #2 is not currently supported.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.
For information on the advisory, and where to find the updated files, follow the link below.
If the solution does not work for you, open a new bug report.
https://access.redhat.com/errata/RHEA-2017:3139