Bug 655855
Summary: | [RHEL6] network stops functioning on Hyper-V hosted VMs with irqbalance daemon started | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 6 | Reporter: | asilva <asilva> |
Component: | kernel | Assignee: | Anton Arapov <anton> |
Status: | CLOSED WONTFIX | QA Contact: | Red Hat Kernel QE team <kernel-qe> |
Severity: | medium | Priority: | medium |
Version: | 6.0 | CC: | Jan.van.Eldik, jmunilla, jwest, kn, mm, nhorman, nobody, rdassen, redhat-bugzilla, robert.scheck, Stuart.Kirk |
Target Milestone: | rc | Keywords: | Reopened |
Hardware: | x86_64 | OS: | Linux |
Doc Type: | Bug Fix | Last Closed: | 2011-01-19 17:49:28 UTC |
Bug Blocks: | 662543 | ||
Description
asilva
2010-11-22 14:44:15 UTC
Neil, any guess here?

This request was evaluated by Red Hat Product Management for inclusion in the current release of Red Hat Enterprise Linux. Because the affected component is not scheduled to be updated in the current release, Red Hat is unfortunately unable to address this request at this time. This request has been proposed for the next release of Red Hat Enterprise Linux. If you would like it considered as an exception in the current release, please ask your support representative.

Anton, sorry it's taken me so long to see this. With regard to your question, I'm not entirely sure what's going on here. What I can say definitively is that it's not (strictly speaking) an irqbalance problem. The irqbalance daemon moves processor affinity around by writing to /proc/irq/<irqn>/smp_affinity. So if such affinity movements are the root of this problem, then even with irqbalance turned off you could (or should) be able to trigger the issue with a manual echo of an affinity mask to that same file. As such, it would seem to me to be the kernel's job to prevent such malicious behavior. Comparing hyper-v and xen, it would seem to me that the hyper-v kernel code needs to remap the irqs that get requested in the guest to a new irq_chip structure, so that it has control over the set_affinity method when user space changes affinity (see bind_evtchn_to_irq for an example). Alternatively, hyper-v could add some arch-specific code to flag all irq chips as not supporting affinity movement, so that userspace can't make any changes. Or, the customer could use kvm, which already has this hashed out in qemu.

Closing the bug as WONTFIX, which here stands for "unable to fix". This is an arch-specific issue that must be addressed in the hyper-v kernel code.

Still the same issue with 6.2. Will this be fixed?

This issue is discussed in <https://access.redhat.com/kb/docs/DOC-49132>, "Network stops functioning on RHEL6 guest under Hyper-V when the irqbalance service is started". The Hyper-V hypervisor is a product of Microsoft. Fixing its limitations is outside Red Hat's scope. If you would like to see this limitation addressed, please contact your Microsoft support representative.

I thought that virtualization works this way:
1) Guest OSs are "unaware" of being virtualized.
2) The hypervisor is called only when needed, to facilitate simultaneous operation of OSs and protect access to SHARED system resources.
The link above seems useless; how can you point to hyper-v as the root cause here? I reproduced the same issue with Red Hat EL 6.1 / 6.2 with only 1 vcpu.

From my point of view, /etc/rc.d/init.d/irqbalance should have a section like this at the beginning:

    if [ -x /usr/sbin/virt-what ]; then
        if [ "$(/usr/sbin/virt-what)" = "hyperv" ]; then
            exit 1
        fi
    fi

I have opened case 00598829 in the Red Hat customer portal to request this workaround, or something similar or better.

*** Bug 788607 has been marked as a duplicate of this bug. ***

Robert, you have the right idea, but look at the description of the bug: RHEL is the guest OS here. When we alter irq affinity, we do so without any knowledge of the fact that we are running under a Hyper-V hypervisor. It is the responsibility of the hypervisor to trap such affinity changes and prevent them from occurring if they would stop delivery of needed interrupts to the guest. The Hyper-V hypervisor is a Microsoft product, hence this is their problem to fix.
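
For anyone who wants to check the point made earlier in the thread, that a manual write to /proc/irq/<irqn>/smp_affinity should trigger the issue even with irqbalance stopped, the test could be sketched roughly as below. This is a minimal sketch, not a verified reproducer: the function names and the IRQ_ROOT override are hypothetical and are not part of irqbalance or the kernel. The only real interface it relies on is the smp_affinity file itself.

```shell
#!/bin/sh
# Sketch: change an IRQ's CPU affinity by hand, i.e. the same kind of write
# irqbalance performs, to see whether affinity changes alone cause the stall.
#
# IRQ_ROOT is a hypothetical override (an assumption for illustration) so the
# functions can be tried against a scratch directory; it defaults to /proc/irq.
IRQ_ROOT="${IRQ_ROOT:-/proc/irq}"

# set_irq_affinity IRQ MASK
# Writes the hex CPU bitmask MASK (1 = CPU0, 2 = CPU1, 3 = CPU0 and CPU1)
# to the IRQ's smp_affinity file. Requires root on a real system.
set_irq_affinity() {
    irq="$1"
    mask="$2"
    file="$IRQ_ROOT/$irq/smp_affinity"
    if [ ! -w "$file" ]; then
        echo "cannot write $file" >&2
        return 1
    fi
    echo "$mask" > "$file"
}

# get_irq_affinity IRQ
# Prints the current affinity mask of the IRQ.
get_irq_affinity() {
    cat "$IRQ_ROOT/$1/smp_affinity"
}
```

On an affected guest, one would stop irqbalance (`service irqbalance stop`), look up the network device's IRQ number in /proc/interrupts, and then, as root, call `set_irq_affinity <irqn> 2` and watch whether traffic stops. If it does, that would support the view that the affinity change itself, rather than irqbalance, is what the guest and hypervisor fail to handle.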