Bug 955429
Summary: displayNetwork must have an IP address on host
Product: Red Hat Enterprise Virtualization Manager
Reporter: Kevein Liu <yaliu>
Component: ovirt-engine
Assignee: Moti Asayag <masayag>
Status: CLOSED ERRATA
QA Contact: GenadiC <gcheresh>
Severity: low
Docs Contact:
Priority: low
Version: 3.1.3
CC: acathrow, cwei, danken, dyuan, gcheresh, iheim, jkt, kcleveng, lbopf, lpeer, lyarwood, masayag, mhuth, mkalinin, mprivozn, myakove, mzhan, Rhev-m-bugs, s.kieske, sputhenp, tdosek, ydu, yeylon
Target Milestone: ---
Keywords: Triaged
Target Release: 3.4.0
Hardware: x86_64
OS: Linux
Whiteboard: network
Fixed In Version: av3
Doc Type: Bug Fix
Doc Text:
Previously, virtual machines failed to start due to "libvirtError: internal error ifname "vnet20" not in key". This happened because the display network to which the virtual machine was assigned did not have an IP address configured on the host. Now, the engine blocks "setupNetwork" of a display network with no address, and the scheduler will attempt to start virtual machines only on a host on which the display network is configured with an IP address.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2014-06-09 14:58:53 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Network
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1019461, 1078909, 1142926
Attachments:
Description
Kevein Liu
2013-04-23 03:28:05 UTC
Hi Kevein,

I'm trying to reproduce this bug. As there are no specific reproduction steps, I just tried to start a VM which has more than 20 NICs in RHEVM, but can't reproduce it. :(

I'm using:

# rpm -q libvirt
libvirt-0.10.2-18.el6_4.4.x86_64
# rpm -q vdsm
vdsm-4.10.2-1.9.el6ev.x86_64

I just attached the VM XML file; could you help to check it? BTW, could you please provide the domain XML which encounters this bug? Thanks!

Unfortunately, I cannot access the logs. Kevein, can you please attach logs to the BZ? Hopefully, I will get more insight from the logs.

Created attachment 739374 [details]
Problematic OVF file
I've managed to find and dig out the libvirt logs. Here's a short snippet showing what is causing the trouble:

2013-04-22 00:25:15.071+0000: 43401: error : virNetDevGetIPv4Address:834 : Unable to get IPv4 address for interface vlan111: Cannot assign requested address
2013-04-22 00:25:15.071+0000: 43401: debug : virFileClose:72 : Closed fd 159
2013-04-22 00:25:15.071+0000: 43401: error : qemuBuildCommandLine:6130 : XML error: listen network 'vdsm-vlan111' had no usable address
2013-04-22 00:25:15.071+0000: 43401: error : virNWFilterDHCPSnoopEnd:2131 : internal error ifname "vnet20" not in key map

So there are two problems:
1) we are overwriting the previously reported error
2) why doesn't "vdsm-vlan111" have any usable address?

For the first problem I've just posted a patch:
https://www.redhat.com/archives/libvir-list/2013-April/msg01738.html

For the second problem, unfortunately, there is no XML of the network in the logs, so I don't know why it doesn't have any usable address. Kevein, Dan and others - do you have any bright idea in case 'virsh net-dumpxml vdsm-vlan111' doesn't work (even if it does - are we guaranteed it is the very same network?).

I think the best solution is to gather logs immediately when the error occurs again. And by logs I mean not only libvirt/vdsm logs, but the routing table, iptables and ebtables listings as well.

(In reply to comment #16)
>
> On[e] thing is vlan111 doesn't have any IPv4 address
> assigned, the other is if it should have one. But I think, once we find
> setupNetwork we will know the XML immediately, isn't that right Dan?

Correct. My guess is that the vlan111 network was configured on host with no IP address. The problem is twofold:

1. Engine should block setupNetwork of display network with no address.
2. Engine should avoid starting VMs whose displayNetwork has no IP address on hosts that somehow lost their address (i.e. bad dhcp server)

>
> BTW any reason for these comments to be private?

Privacy is viral :-(

Hi,

This issue happened again. Could anyone provide a checklist so that I can gather that information for further investigation? Thank you!

(In reply to comment #21)
> This issue happened again, could anyone provide a check list that I can get
> those information for further investigation?

What is their cluster displayNetwork? Still this vlan111 network? What is the IP configuration for this network on the cluster hosts (static/dhcp/none)? And in particular, on the host that fails to start the VM? The admin has to ensure that the displayNetwork has an IP address on each and every host.

Mark et al.,

I am still not fully convinced where the real bug is. I know we've moved from libvirt to ovirt-engine, but I'd like to be 100% sure. Which means we need logs from the network setup process. Do you think it is possible to gather logs of the setupNetwork command from vdsm.log? I still can't find it anywhere. I know we have thousands of logs here, but none of them contains that kind of info. I just want to make sure somebody really did start a network without an IP address. The other possibility is that the network was started with an IP address assigned, but something has taken it away. Either libvirt itself, or ...

Hi Michal,
I did a grep of all the vdsm.log* files on the hypervisor and setupNetwork wasn't matched in any of them (and I'm sure the relevant logs hadn't been rotated away).
From the rhev-prio-list "New critical issues from China Zhuji" email thread ...
<thread>
> Then customer found the "rhevm" is not the Display network, and made
> it as Display network. So the issue was repaired.
So if I understand correctly the original conclusion of engineering is
correct and what should be fixed is to avoid (and ATM I don't say how)
running Virtual Machines on a host whose Display-Network does
not have an IP + inform the user of this problematic status.
</thread>
We are not sure how the display network was changed in the RHEVM WebUI. The customer is sure they didn't change it away from rhevm, but when they changed it back to the rhevm network, the problem was resolved (i.e. VMs could be started again). So it seems the problem was on the RHEVM side, in that it was passing the wrong displayNetwork to the hypervisor and thus preventing VMs from starting.
I hope that information is helpful. If not, please let me know.
-- Mark
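For anyone repeating the log search described in this thread, here is a minimal sketch, not a vdsm or libvirt tool. It does two things: lists lines that mention setupNetwork, and, for libvirt log lines in the format quoted earlier in this bug, reports the first error logged by each thread, so that a later error (such as the "not in key map" message) does not hide the original cause. The function names and the regular expression are illustrative assumptions based only on the log snippet above.

```python
import re
from typing import Dict, Iterable, List

# Assumed line layout, taken from the libvirt snippet quoted in this bug:
# "<date> <time>: <thread id>: <level> : <function>:<line> : <message>"
LIBVIRT_LINE = re.compile(
    r"^(?P<ts>\S+ \S+): (?P<tid>\d+): (?P<level>\w+) : "
    r"(?P<func>\w+):(?P<line>\d+) : (?P<msg>.*)$"
)


def first_error_per_thread(lines: Iterable[str]) -> Dict[str, str]:
    """Map each thread id to the first error it logged (hypothetical helper)."""
    first: Dict[str, str] = {}
    for raw in lines:
        match = LIBVIRT_LINE.match(raw.strip())
        if match and match.group("level") == "error":
            # setdefault keeps the earliest error and ignores later ones.
            first.setdefault(
                match.group("tid"),
                f"{match.group('func')}: {match.group('msg')}",
            )
    return first


def lines_mentioning(lines: Iterable[str], needle: str = "setupNetwork") -> List[str]:
    """Return every line containing `needle`, similar to a plain grep."""
    return [line.rstrip("\n") for line in lines if needle in line]
```

Both helpers accept any iterable of strings, so an open log file can be passed directly. An empty result from lines_mentioning only shows that the string is absent from the files searched, not that the network was never configured.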
(In reply to Dan Kenigsberg from comment #17)
> (In reply to comment #16)
> >
> > On[e] thing is vlan111 doesn't have any IPv4 address
> > assigned, the other is if it should have one. But I think, once we find
> > setupNetwork we will know the XML immediately, isn't that right Dan?
>
> Correct. My guess is that the vlan111 network was configured on host with no
> IP address. The problem is twofold:
>
> 1. Engine should block setupNetwork of display network with no address.
>

This can be achieved by requiring a Static or DHCP boot protocol for the display network. I'd suggest also requiring it in the attach/update network API of the engine, which seems to be commonly used by customers.

> 2. Engine should avoid starting VMs whose displayNetwork has no IP address
> on hosts that somehow lost their address (i.e. bad dhcp server)
>

This is trickier, since the notorious race between the response from the DHCP server and the getVdsCaps call made after the network command completes may occur, and we might block the operation when we shouldn't. We can, however, consider a specific refreshCapabilities call to verify the actual address of the display network if no address is reported. However, running multiple VMs at once would then cost the host more resources.

When Bug 999947 is implemented, it will be simpler to validate that the display network is properly configured with an IP address.

> >
> > BTW any reason for these comments to be private?
>
> Privacy is viral :-(

Lowering priority given comment#31. Following the best practice of having an IP address on the display network should work.

With the suggested fix, a host which has no boot protocol configured for its display network will not be selected by the scheduler to run VMs.
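The two engine-side checks discussed in this bug can be sketched roughly as follows. This is only an illustration of the idea in Python, not the ovirt-engine implementation: the HostNetwork and Host types, the function names, and the boot protocol strings are all hypothetical.

```python
from dataclasses import dataclass
from typing import List, Optional

# Boot protocols that are expected to give the network an address (assumed values).
ADDRESSED_PROTOCOLS = {"static", "dhcp"}


@dataclass
class HostNetwork:
    name: str
    boot_protocol: str              # "none", "static" or "dhcp" (illustrative)
    address: Optional[str] = None   # address last reported by the host, if any


@dataclass
class Host:
    name: str
    networks: List[HostNetwork]


def validate_display_network(net: HostNetwork, display_network: str) -> Optional[str]:
    """Check 1: return an error message if the setup should be blocked, else None."""
    if net.name == display_network and net.boot_protocol.lower() not in ADDRESSED_PROTOCOLS:
        return f"display network '{net.name}' requires a static or DHCP boot protocol"
    return None


def hosts_with_display_address(hosts: List[Host], display_network: str) -> List[Host]:
    """Check 2: keep only hosts that currently report an address on the display network."""
    return [
        host
        for host in hosts
        if any(n.name == display_network and n.address for n in host.networks)
    ]
```

The second check depends on the address the host last reported, so it is exposed to the DHCP timing race mentioned above: a host whose address has not yet been reported would be filtered out.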
Verified in AV3 that the VM starts when the display network has a configured IP address, and that running it fails on CanDoAction when the display network doesn't have an IP address.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHSA-2014-0506.html