Description of problem:
Adding rhvh-4.1-20170417.0 to engine failed with a bond (active+backup) configured by cockpit.

Version-Release number of selected component (if applicable):
Red Hat Virtualization Manager Version: 4.1.1.8-0.1.el7
redhat-virtualization-host-4.1-20170417.0.x86_64
imgbased-0.9.23-0.1.el7ev.noarch
vdsm-4.19.10.1-1.el7ev.x86_64
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-system-135-4.el7.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install rhvh4.1
2. Configure bond0 (active+backup) via cockpit on rhvh4.1
3. Add this host to engine4.1

Actual results:
1. After step #3, adding the host failed. During the installation process the IP address changed, and after the add failed the IP address disappeared.

Expected results:
1. After step #3, the host is added successfully.

Additional info:
1. Regression: no such issue in the previous build
2. Tested with (vlan over bond) configured by cockpit; adding also failed
3. A bond configured manually through ifcfg files can be added successfully
Created attachment 1272497 [details] engine log
Is this the same version of rhvm? Can you grab the engine log and the generated ifcfg files from the previous RHVH build and this one? The problem is either engine or platform cockpit, but this information is needed for root cause analysis
Created attachment 1272500 [details] network-scripts
Created attachment 1272501 [details] vdsm logs and deploy log
NOTE for Regression & Testblocker: No such issue on the previous version (redhat-virtualization-host-4.1-20170413.), and this bug will block the test scenario of adding RHVH to engine with a bond configured.
Same engine version? This is critical. Absolutely nothing changed in RHVH which would affect this (in general, but especially from 0413 to 0417). If the interface comes up properly in Cockpit, I'd also expect engine to work.
(In reply to Ryan Barry from comment #6)
Ryan,
> Same engine version? This is critical.
Yes, same version rhvm-4.1.1.8-0.1.el7
>
> Absolutely nothing changed in RHVH which would affect this (in general, but
> especially from 0413 to 0417). If the interface comes up properly in
> Cockpit, I'd also expect engine to work.
I should correct the rhvh version in comment 5 to rhvh-4.1-20170403, not 0413. Because 0413 was a respin, I did not test this bond scenario there, so the issue was first found in 0417.

Note that there is a big change between 0403 and 0417: the cockpit version.

On rhvh-4.1-20170403:
cockpit-shell-126-1.el7.noarch
cockpit-ovirt-dashboard-0.10.7-0.0.16.el7ev.noarch

On rhvh-4.1-20170417:
cockpit-ovirt-dashboard-0.10.7-0.0.17.el7ev.noarch
cockpit-system-135-4.el7.noarch

There is also a bug fix for a network issue in Cockpit 132, which might be related:
https://bugzilla.redhat.com/show_bug.cgi?id=1395108
https://bugzilla.redhat.com/show_bug.cgi?id=1420708
Created attachment 1272520 [details] network-scripts in previous build 0403
(In reply to dguo from comment #7)
> https://bugzilla.redhat.com/show_bug.cgi?id=1395108
> https://bugzilla.redhat.com/show_bug.cgi?id=1420708
Since this actually works until an attempt to register to engine is made, I expect that Cockpit is actually working here and that the problem is some confusion in the ifcfg scripts, but I'm still looking.
It appears that host-deploy is not adding the vlan to ovirtmgmt. This makes comparison difficult, though, since the previous ifcfg scripts do not contain a VLAN config. Can you please attach new ifcfgs with a matching config? If "network-scripts.after_add" is without a vlan (ifcfg-bond0 has no vlan config here), then the attachment is more confusing, since before_add has a vlan...
engine.log and vdsm.log both have messages about SSL handshake errors rather than 'no route to host', so networking is probably up.

Can you please provide the following:

Configure a system with a bond OR bond+vlan, but keep the configuration the same:

ifcfg files 0403 before and after add
ifcfg files 0417 before and after add

host-deploy, vdsm, and engine logs from the failed addition
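The before/after comparison of ifcfg files requested above could be done by hand, but a small script makes the differences easier to spot. The following is only a minimal sketch, assuming the ifcfg files use plain `KEY=value` lines; the function names (`parse_ifcfg`, `diff_ifcfg`, `diff_ifcfg_files`) are hypothetical helpers and are not part of vdsm, host-deploy, or cockpit.

```python
from pathlib import Path


def parse_ifcfg(text):
    """Parse ifcfg-style KEY=value lines into a dict, skipping comments and blanks."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes so ONBOOT="yes" and ONBOOT=yes compare equal.
        settings[key.strip()] = value.strip().strip('"').strip("'")
    return settings


def diff_ifcfg(before, after):
    """Return {key: (before_value, after_value)} for keys that differ.

    None marks a key that is missing on that side.
    """
    changes = {}
    for key in sorted(set(before) | set(after)):
        if before.get(key) != after.get(key):
            changes[key] = (before.get(key), after.get(key))
    return changes


def diff_ifcfg_files(before_path, after_path):
    """Read two ifcfg files from disk (paths supplied by the caller) and diff them."""
    return diff_ifcfg(
        parse_ifcfg(Path(before_path).read_text()),
        parse_ifcfg(Path(after_path).read_text()),
    )
```

Running `diff_ifcfg_files` on the same interface file saved before and after the add (for example, two copies of ifcfg-bond0 kept in separate directories of your choosing) lists only the keys that host-deploy changed, added, or removed.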
Created attachment 1272833 [details] vdsm.log, hosted-engine.log, ifcfg files
Deployed HE over a bond (bond+vlan); during the deployment the bond's IP changed.

Uploaded the vdsm.log, hosted-engine.log, ifcfg files (before setup bond0), ifcfg files (setup bond0), and ifcfg files (deploy HE failed).

Attachment: https://bugzilla.redhat.com/attachment.cgi?id=1272833
(In reply to Yihui Zhao from comment #13)
> Deploy the HE with bond(bond+vlan) during the bond's ip changed.
>
> Upload the vdsm.log , hosted-engine.log, ifcfg files(before setup bond0),
> ifcfg files(setup bond0), ifcfg files(deploy HE failed).
>
> Attachment : https://bugzilla.redhat.com/attachment.cgi?id=1272833
So, the bug will also block HE testing (HE with bond or bond+vlan).
Created attachment 1272840 [details] All files of 04017
Created attachment 1272841 [details] All files of 0403
(In reply to Ryan Barry from comment #11)
> engine.log and vdsm.log both have messages about SSL handshake errors rather
> than 'no route to host', so networking is probably up.
>
> Can you please provide the following:
>
> Configure a system with a bond OR bond+vlan, but keep the configuration the
> same:
>
> ifcfg files 0403 before and after add
> ifcfg files 0417 before and after add
>
> host-deploy, vdsm, and engine logs from the failed addition
Ryan,
I have attached all the required files, sorted into 0403 and 0417.
From all the tests done on 0417, we observed the following:
1. Create bond0 over em1 + em2 (em1 was set as the master slave). bond0 got em2's MAC, and its IP was 10.73.131.184.
2. Add the host over bond0. During the installation, bond0's MAC changed to em1's, and its IP became 10.73.131.65.
3. After the add failed, bond0's IP disappeared.

But for the tests done on 0403:
1. bond0 got em1 (master)'s MAC, and its IP was 10.73.131.65.
2. Add the host over bond0. The MAC did not change, and the IP was always 10.73.131.65.
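To check which slave's MAC a bond is currently carrying, as in the observations above, one option is to compare the bond's own address with the permanent addresses of its slaves. The sketch below is illustrative only: it assumes the usual text layout of the Linux bonding driver's status file (/proc/net/bonding/bond0) and the bond address from /sys/class/net/bond0/address, and the helper names (`parse_bonding_status`, `mac_owner`) are hypothetical.

```python
def parse_bonding_status(text):
    """Extract the active slave and each slave's permanent MAC from bonding status text.

    Returns (active_slave_or_None, {slave_name: permanent_mac}).
    """
    active = None
    slaves = {}
    current = None
    for line in text.splitlines():
        # Split on the first colon only, so MAC addresses stay intact in the value.
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key, value = key.strip(), value.strip()
        if key == "Currently Active Slave":
            active = value
        elif key == "Slave Interface":
            current = value
        elif key == "Permanent HW addr" and current is not None:
            slaves[current] = value.lower()
    return active, slaves


def mac_owner(bond_mac, slaves):
    """Return the slave whose permanent MAC equals the bond's MAC, or None."""
    bond_mac = bond_mac.strip().lower()
    for name, mac in slaves.items():
        if mac == bond_mac:
            return name
    return None
```

On a host, the two inputs would be the contents of /proc/net/bonding/bond0 and /sys/class/net/bond0/address. Running the check before and after adding the host shows whether the bond switched from one slave's MAC to the other's, which is the condition under which the DHCP-assigned IP changed in these tests.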
Reassigning to vdsm for tracking. The cause of this seems to be a known problem with NM/cockpit changing IPs if the active MAC changes. There are workarounds for this.
The proposed patch (https://gerrit.ovirt.org/77933) should be suitable for RHVH, as VDSM is already installed on it together with the NM configuration file. Note that the NM configuration that enables adding slaves to a bond in the order of the slave names (the same order as initscripts) will be available in RHEL 7.4, with NM version 1.8.
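For checking whether a host has name-based slave ordering enabled, a small helper can read the setting from NetworkManager configuration text. This is a sketch under assumptions: it assumes the relevant option is the `slaves-order` key in the `[main]` section of NetworkManager.conf (values `name` or `index`), and it does not reproduce the file that VDSM ships, whose contents are not shown in this report. The function name `slaves_order` is hypothetical.

```python
import configparser


def slaves_order(conf_text, default="unset"):
    """Return the [main] slaves-order value from NetworkManager config text.

    Assumption: the option lives in [main] under the key 'slaves-order'.
    Returns `default` when the section or the key is absent.
    """
    parser = configparser.ConfigParser(strict=False, interpolation=None)
    parser.read_string(conf_text)
    return parser.get("main", "slaves-order", fallback=default)
```

A result of `name` would indicate that slaves are attached in interface-name order; `unset` means the given text does not set the option, so NetworkManager's built-in default for that version applies. Note that NetworkManager also reads drop-in files under conf.d directories, so a single file may not tell the whole story.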
Moving to MODIFIED status because no 4.2 build is available to verify this bug.
Verified on build rhvh-4.2-0.20171102.0+1 over a bond without a MAC address specified in cockpit.

Test version:
vdsm-4.20.6-1.el7ev.x86_64
rhvh-4.2-0.20171102.0+1
rhvm: 4.2.0-0.4.master.el7
NetworkManager-1.8.0-11.el7_4.x86_64

Test steps:
1. Install rhvh via pxe
2. Log in to cockpit and open the Network page
3. Set up a dhcp bond (active+backup mode) over two nics; do not specify a MAC address
4. Add rhvh to engine over the bond

Actual result:
1. After step #4, rhvh was added to engine successfully, and its status is up

Additional info:
1. If a MAC address is specified in cockpit while configuring the bond, adding the host can fail; this is tracked in bug 1422430.
Edward, can you please check that there are no regressions here?

See https://bugzilla.redhat.com/show_bug.cgi?id=1556666#c14

If a MAC is specified, everything goes haywire on reboot: the IP changes, NM reports that ifcfg files are removed and NM is reloaded, etc. This only happens on the first reboot, and everything works (with the changed IP) every time after that.

See https://bugzilla.redhat.com/show_bug.cgi?id=1556666#c8 for relevant entries
We have covered several bond related issues in 4.2, some have been triggered by RHEL 7.5. This BZ has been verified on 4.2 while the new BZ is on 4.1. If I'm not mistaken, the target version for this BZ is 4.2 anyway. I'm guessing that some new point has been touched. I would suggest checking if this problem exists on 4.2. If this is a 4.1 only problem, then we can assume it was resolved in 4.2 and can negotiate backporting some changes. Here is one that smells related: https://gerrit.ovirt.org/#/c/83399 But there are probably some others as well.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:1489