Bug 1323782
Summary: vdsm-restore-network leaves inconsistent RunningConfig after a failed restoration

Product: Red Hat Enterprise Virtualization Manager
Component: vdsm
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: unspecified
Hardware: x86_64
OS: Linux
Reporter: Prasanth <prasanth.krishnamoorthy>
Assignee: Edward Haas <edwardh>
QA Contact: Michael Burman <mburman>
CC: abdel.sadek, bazulay, cshao, ctatman, danken, edwardh, fdeutsch, gklein, huzhao, leiwang, lsurette, mburman, myakove, prasanth.krishnamoorthy, srevivo, xdl-apbu-iop-bz, ycui, ykaul, ylavi
Target Milestone: ovirt-4.0.0-alpha
Target Release: 4.0.0
Keywords: ZStream
Doc Type: Bug Fix
Cloned to: 1332027 (view as bug list)
Bug Blocks: 1332027
Last Closed: 2016-08-23 20:15:32 UTC
Type: Bug
oVirt Team: Network
Description
Prasanth
2016-04-04 16:59:10 UTC
Would you please attach /var/log/vdsm/supervdsm.log?

Created attachment 1146940 [details]
supervdsm log attached
Investigation of the supervdsm log showed at least two issues.

- The first is related to IPv6 being used and reported at boot. The persistent and kernel configurations are compared and wrongly seen as different, due to the IPv6 address present in the kernel. The difference causes these networks to be removed and re-added. This issue has been resolved in vdsm-4.17.25 by ignoring IPv6 addresses in the comparison. As a workaround, IPv6 can be disabled on the host if that is possible. This issue is not severe, as it is limited to the initial host boot and causes just a reconfiguration of existing networks.

- The second issue is caused by a failure to restore or reconfigure a network after the host boots. In our case, the attempt to reconfigure the ovirtmgmt network fails after ifup times out (no link?). This failure causes the running config to contain non-existent networks, and retries to restore the networks will always fail because the restore is unable to remove them. A fix for this issue has been submitted for review.

Is there a tentative date for when the fix will be released? It is completely blocking our testing and release for support with our products. If you can provide us a tentative date on the fix coming with oVirt 4.0, we can move on with other things and come back to it at the right time.

Prasanth,

The next beta build is going to have a fix for the second issue, but I wonder how this blocks your further testing. Does it reproduce often on other hosts? Can you not disable IPv6 on your host?

You may also clear the content of /var/lib/vdsm/persistence and /etc/libvirt/qemu/networks/vdsm-* and reboot to start afresh.

(In reply to Prasanth from comment #6)
> Its completely blocking our testing and release for support with our
> products.If you can provide us a tentative date on the fix coming with
> ovirt4.0, we can move on with other things and come back to it at the right
> time.

Can you try to work with the nightly builds? It should be fixed there.
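To illustrate the first issue, here is a minimal sketch (not actual VDSM code; the function and key names are hypothetical) of why a live IPv6 address reported by the kernel makes an otherwise-identical network look "changed" at boot, and how dropping IPv6 addresses before comparing avoids the spurious diff:

```python
# Illustrative sketch, assuming network configs are plain dicts and the
# kernel-reported config may carry an extra "ipv6addrs" entry. Not VDSM code.

def differs(persisted, running, ignore_ipv6=True):
    """Return True if the two network configs are considered different."""
    def normalize(cfg):
        cfg = dict(cfg)  # copy so callers' dicts are untouched
        if ignore_ipv6:
            cfg.pop("ipv6addrs", None)  # drop the kernel-only IPv6 info
        return cfg
    return normalize(persisted) != normalize(running)

persisted = {"nic": "em1", "bootproto": "dhcp"}
running = {"nic": "em1", "bootproto": "dhcp",
           "ipv6addrs": ["fe80::ba:ca/64"]}  # link-local address from kernel

# Naive comparison sees a difference, so the network would be torn down
# and re-added at boot; ignoring IPv6 addresses, the configs match.
assert differs(persisted, running, ignore_ipv6=False)
assert not differs(persisted, running, ignore_ipv6=True)
```

This mirrors the behavior described above: the fix in vdsm-4.17.25 amounts to normalizing away IPv6 addresses before the persisted/kernel comparison, so the initial-boot reconfiguration no longer triggers.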
(In reply to Dan Kenigsberg from comment #7)
> Prasanth, Next beta build is going to have a fix for the secondary issue,
> but I wonder how does this block your further testing? Does it reproduce
> often on other hosts? Can you not disable ipv6 on your host?
>
> You may also clear the content of /var/lib/vdsm/persistence and
> /etc/libvirt/qemu/networks/vdsm-* and reboot to start afresh.

Dan,

It is very persistent on all three hypervisors (7.2, 7.1, and 7.0) I am running. Also, IPv6 is not getting disabled by using the GUI option, and I had problems with sysctl as well, but once set to static it has been going fine so far.

Three independent reproductions sounds very odd (and worrying). What happens when you remove /var/lib/vdsm/persistence/* and /etc/libvirt/qemu/networks/vdsm-* and reboot?

You can also add ipv6.disable=1 to your kernel command line.

(In reply to Dan Kenigsberg from comment #10)
> Three independent reproductions sounds very odd (and worrying). What happens
> when you remove /var/lib/vdsm/persistence/*
> /etc/libvirt/qemu/networks/vdsm-* and reboot?
>
> You can also add ipv6.disable=1 to your kernel command line.

Dan,

The hypervisors are up and optimal after removing the files you mentioned and disabling IPv6. As I said before, it was very consistent before that. I am also building a completely new config in a different lab; let me see if I hit this issue again without using the workaround provided. Will keep you posted.

Verified on 4.0.0.2-0.1.el7ev with vdsm-4.18.1-11.gita92976e.el7ev.x86_64 and rhevh7-ng-4.0-0.20160607.0+1.

I ran an update to RHEV-M 3.6.6.2-0.1.el6 and installed a new RHEV-H 7.2 20160516.0.el7ev. I disabled IPv6 at the kernel command line on boot, and the hypervisor still fails to add. I retried after removing /var/lib/vdsm/persistence/* and /etc/libvirt/qemu/networks/vdsm-* and rebooting with IPv6 disabled; that did not work either.

Created attachment 1167972 [details]
Supervdsm.log_new
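The second issue described earlier, where a failed setup leaves the running config referencing networks that do not actually exist on the host, can be sketched as follows. This is an illustrative sketch only (the `restore` function and dict shapes are hypothetical, not VDSM's implementation): pruning entries that have no kernel counterpart is what lets a retry proceed instead of failing forever on an unremovable network.

```python
# Illustrative sketch, assuming the running config is a dict of network
# name -> attributes, and kernel_networks is the set of networks that
# actually exist on the host. Not VDSM's actual restore code.

def restore(running_config, kernel_networks):
    """Drop running-config entries with no kernel counterpart; return result."""
    stale = set(running_config) - set(kernel_networks)
    for net in stale:
        running_config.pop(net)  # nothing to tear down; just forget the entry
    return running_config

running = {"ovirtmgmt": {"nic": "em1"}, "storage": {"nic": "em2"}}
kernel = {"storage"}  # ovirtmgmt setup failed, so only storage exists

restore(running, kernel)
assert running == {"storage": {"nic": "em2"}}
```

Without this pruning, every restore attempt tries to remove "ovirtmgmt", fails because it is not there, and aborts, which is also why manually clearing /var/lib/vdsm/persistence gets the host unstuck.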
This now looks like a different problem; please open a new bug if the below is not helpful.

The setup of the ovirtmgmt network fails because there was no response from the DHCP server for 120 seconds, so VDSM reverts the changes and restores the original configuration/ifcfg file.

The initial em1 iface had no DHCP set on it (I would expect BOOTPROTO=dhcp), so I am not sure what was there before:

DEVICE="em1"
HWADDR="b8:ca:3a:68:3b:a0"
IPV6INIT="no"
IPV6_AUTOCONF="no"
NM_CONTROLLED="no"
ONBOOT="yes"
PEERNTP="yes"

Perhaps you should try a static IP and check if it works for you, and then go back and debug DHCP (if the static one worked).

(In reply to Edward Haas from comment #15)
> This now looks like a different problem, please open a new bug if the below
> is not helpful.
>
> The setup of the ovirtmgmt network fails because there was no response from
> the dhcp server for 120sec, then VDSM reverts the changes and returns the
> original configuration/ifcfg file.
>
> The initial em1 iface had no dhcp set on it (I would expect BOOTPROTO=dhcp),
> so I am not sure what was there before.
> DEVICE="em1"
> HWADDR="b8:ca:3a:68:3b:a0"
> IPV6INIT="no"
> IPV6_AUTOCONF="no"
> NM_CONTROLLED="no"
> ONBOOT="yes"
> PEERNTP="yes"
>
> Perhaps you should try a static IP and check if it works for you and then
> return back and debug dhcp (the the static one worked).

It seems to work with static configuration. We have plenty of other setups working without any issues on DHCP. Not sure what is going on here. Will try debugging and keep you posted.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHEA-2016-1671.html