Bug 975474 - add ConditionVirtualization to service file
Summary: add ConditionVirtualization to service file
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: irqbalance
Version: 19
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Petr Holasek
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2013-06-18 14:31 UTC by Bill Nottingham
Modified: 2016-10-04 04:09 UTC
CC List: 16 users

Fixed In Version: irqbalance-1.0.5-4.fc19
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-06-28 17:39:13 UTC
Type: Bug
Embargoed:



Description Bill Nottingham 2013-06-18 14:31:31 UTC
Description of problem:

From a mail from Neil Horman
...
- Should it be disabled on virt?
    Maybe.  Depends on the hypervisor and how it maps virtual cpus to
physical cpus, and how its virtual apic routes interrupts.  On Xen,
definitely disable it; on KVM I think you should also typically
disable it, but I'm not 100% sure.
...

Hence, something like:

ConditionVirtualization=False

could be useful in the service file.
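
(For illustration only, a minimal sketch of how such a condition could be applied locally as a drop-in rather than by patching the packaged unit; the drop-in path and file name below are just examples:)

    # /etc/systemd/system/irqbalance.service.d/novirt.conf  (example path)
    [Unit]
    # Skip starting irqbalance whenever systemd detects a virtualized environment
    ConditionVirtualization=false

After adding a drop-in like this, "systemctl daemon-reload" makes systemd pick it up.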

Version-Release number of selected component (if applicable):

F19 current

Comment 1 Fedora Update System 2013-06-19 15:15:03 UTC
irqbalance-1.0.5-3.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/irqbalance-1.0.5-3.fc19

Comment 2 Alex Williamson 2013-06-19 20:46:01 UTC
Is there actually a bug being fixed here or is this just code churn?  I don't see why irqbalance would be disabled on KVM, or any hypervisor for that matter.  Don't we still want to spread the interrupt load across the vCPUs?  Interrupt affinity in the guest does work in KVM.
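
(Illustrative only: one quick way to see that guest-side IRQ affinity is functional under KVM; the IRQ number and CPU mask below are just examples:)

    # inside the guest: one interrupt-count column per vCPU
    cat /proc/interrupts

    # pin an example IRQ (27) to vCPU 1 and watch its counts move there
    echo 2 > /proc/irq/27/smp_affinity
    grep ' 27:' /proc/interrupts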

Comment 3 Bill Nottingham 2013-06-19 20:51:11 UTC
Given the relationship of vCPUs to pCPUs in different hypervisors, it seems unclear that redistributing the interrupts actually accomplishes anything if the vCPU is going to be randomly scheduled on a variety of pCPUs. 

See Neil's comment. I'll buy that it might need refinement based on which hypervisor/how the guests are set up.

Comment 4 Alex Williamson 2013-06-19 21:10:15 UTC
Sure, making a decision about the optimal CPU to move the interrupt to is a problem, but irqbalance supports plenty of systems where it doesn't know the optimal affinity of an interrupt.  Isn't there an obvious problem with deciding to turn off irqbalance that we potentially now have all interrupts on a single CPU?  That's clearly worse than non-optimal affinity per IRQ.

Comment 5 Michael S. Tsirkin 2013-06-20 05:48:41 UTC
Disabling irq balancing on kvm would be a serious bug.
I'm not sure about other hypervisors, but it would likely be a bug too.

Generally, hypervisors make an effort to behave just like a physical machine;
the hypervisor leaf is there for very specific corner cases where we ship
paravirtualization, and handling those should always be done entirely in the
kernel. If userspace software starts detecting hypervisors and adding
special-case code, it will just start an arms race in which hypervisors
hide from your code.

If a hypervisor wants to disable rebalancing it already has a
way to do this in the kernel, and likely already does.

Has this change already been made? Please don't make it without consulting
the maintainers of the KVM and Xen hypervisors.
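
(For context, a sketch of what the proposed condition keys off: ConditionVirtualization= relies on systemd's own virtualization detection, which can also be queried from userspace; the outputs shown are just examples:)

    # prints a technology identifier such as "kvm" or "xen", or "none" on bare metal
    systemd-detect-virt

    # exit status is 0 when virtualization is detected
    systemd-detect-virt --quiet && echo "running virtualized"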

Comment 6 Gleb Natapov 2013-06-20 08:21:58 UTC
(In reply to Bill Nottingham from comment #3)
> Given the relationship of vCPUs to pCPUs in different hypervisors, it seems
> unclear that redistributing the interrupts actually accomplishes anything if
> the vCPU is going to be randomly scheduled on a variety of pCPUs. 
irqbalance, when running in a KVM guest, balances virtual interrupts between virtual CPUs. The relationship between vCPUs and pCPUs never comes into play here, and HW interrupts are also irrelevant.

Please always talk to virtualization people before making changes that affect virtualization.

Comment 7 Laszlo Ersek 2013-06-20 08:41:19 UTC
CC'ing Drew for Xen.

(In reply to Bill Nottingham from comment #3)
> Given the relationship of vCPUs to pCPUs in different hypervisors, it seems
> unclear that redistributing the interrupts actually accomplishes anything if
> the vCPU is going to be randomly scheduled on a variety of pCPUs. 

It does accomplish something: by keeping the irq load homogeneous across VCPUs, the hypervisor's notion of VCPUs carrying equal scheduling weight is not violated. The load you split in the guest the hypervisor can merge, but what you join in the guest the hypervisor can't split. That can be useful sometimes, but probably shouldn't be the default (same as "taskset" is likely the exception for guest processes).

Of course this depends on the "homogeneity" that irqbalance strives for, and the stats that it uses as input. If irqbalance ensures something more sophisticated than even distribution, then *that* could actually interfere with the hypervisor's default notion of VCPUs being equal -- if the stats irqbalance collects as input make no sense on a hypervisor (they don't actually mean what irqbalance takes them for), then irqbalance's decisions are bound to be bogus. (See also: forwarding TCP-over-IP over TCP-over-IP -- the lower level TCP violates the upper level TCP's assumptions about the middle IP and the timers go haywire.)

Just my two cents.

Comment 8 Petr Holasek 2013-06-20 09:30:15 UTC
Per the discussion above, I've withdrawn the irqbalance-1.0.5-3.fc19 update from Bodhi for now.

Comment 9 Andrew Jones 2013-06-20 10:27:35 UTC
Xen uses event channels for PV and PVHVM guests. Each channel is associated with a single vcpu (not necessarily the same vcpu for every channel). If a guest isn't hypervisor-aware (i.e. not paravirtualized in any way), then Xen instead injects interrupts into the next vcpu, round-robin selected from the list of eligible vcpus that the guest has identified by programming its lapic (a vlapic). I don't see any reason to disallow irqbalance for Xen guests, but it may not help much either. We can/should leave it to the users to decide.

Comment 10 Andrew Jones 2013-06-20 10:48:37 UTC
Gleb caught an error in my comment: the round-robin selection only applies to lowest-priority delivery, which is rarely used. So irqbalance is likely to benefit the guest.
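
(A minimal sketch of the "leave it to the users" option from comment 9, i.e. opting out per guest rather than changing the packaged unit:)

    # inside a guest whose admin decides irqbalance isn't wanted
    systemctl stop irqbalance.service
    systemctl disable irqbalance.service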

Comment 11 Paolo Bonzini 2013-06-24 16:24:31 UTC
I second Laszlo's comment 7. Taking away the hypervisor's visibility into the guest's desired policy can only do harm.

On KVM, the hypervisor (actually the Linux scheduler) at least tries not to migrate vCPUs gratuitously from one physical CPU to another, and in some workloads they might well be pinned by management.

For example, virtio-scsi (and virtio-net I think) multiqueue completely depends on irqbalance working correctly inside guests.
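
(To make the multiqueue point concrete, a sketch of what a guest admin would see; the interface name is an example and the exact vector names depend on the device:)

    # per-queue virtio interrupt vectors, as distributed by irqbalance in the guest
    grep virtio /proc/interrupts

    # virtio-net channel (queue) counts on an example interface
    ethtool -l eth0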

Comment 12 Paolo Bonzini 2013-06-28 17:39:13 UTC
The patch has been reverted upstream too, and in the meantime Rawhide has been upgraded to 1.0.6.

Comment 13 Fedora Update System 2013-07-26 12:36:53 UTC
irqbalance-1.0.5-4.fc19 has been submitted as an update for Fedora 19.
https://admin.fedoraproject.org/updates/irqbalance-1.0.5-4.fc19

Comment 14 Fedora Update System 2013-08-02 03:35:05 UTC
irqbalance-1.0.5-4.fc19 has been pushed to the Fedora 19 stable repository.  If problems still persist, please make note of it in this bug report.
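
(One way a user could verify the fix after updating; the unit path shown is the standard Fedora location and is given only for illustration:)

    # confirm the updated package, then check the shipped unit for the condition
    rpm -q irqbalance
    grep -i ConditionVirtualization /usr/lib/systemd/system/irqbalance.service
    # no output from grep is expected once the condition has been dropped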

