Bug 1029343

Summary: irqbalance is broken in guest
Product: Red Hat Enterprise Linux 6
Reporter: Chao Yang <chayang>
Component: qemu-kvm
Assignee: Radim Krčmář <rkrcmar>
Status: CLOSED DUPLICATE
QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium
Priority: medium
Version: 6.6
CC: acathrow, alex.williamson, bsarathy, drjones, juzhang, michen, mkenneth, qzhang, rkrcmar, virt-maint
Target Milestone: rc
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-11-28 18:23:55 UTC
Type: Bug

Description Chao Yang 2013-11-12 08:32:13 UTC
Description of problem:
Booted a guest with a virtio-net-pci NIC, started irqbalance in debug mode in the guest, and stressed the virtio NIC with netperf. The corresponding interrupt was consistently delivered to the same vCPU (CPU 2 in the irqbalance output below) instead of being distributed across vCPUs.

Version-Release number of selected component (if applicable):
2.6.32-430.el6.x86_64
qemu-kvm-0.12.1.2-2.415.el6.x86_64

How reproducible:
100%

Additional info:
QEMU command line:
/usr/libexec/qemu-kvm -M rhel6.5.0 -cpu host -enable-kvm -m 4096 \
    -realtime mlock=off -smp 4,sockets=2,cores=2,threads=1 -nodefaults \
    -monitor stdio -boot menu=on -rtc base=utc,clock=host,driftfix=slew \
    -device piix3-usb-uhci,id=usb,bus=pci.0,addr=0x1.0x2 \
    -device virtio-serial-pci,id=virtio-serial0,bus=pci.0 \
    -drive file=/home/rhel6.5.qcow2,if=none,id=drive-virtio-disk0,format=qcow2,cache=none,werror=stop,rerror=stop,aio=native \
    -device virtio-blk-pci,scsi=off,bus=pci.0,drive=drive-virtio-disk0,id=virtio-disk0,bootindex=1 \
    -netdev tap,id=hostnet0,vhost=on \
    -device virtio-net-pci,netdev=hostnet0,id=net0,mac=46:1a:4a:42:48:25,bus=pci.0 \
    -k en-us -vga cirrus \
    -device intel-hda,id=sound0,bus=pci.0 \
    -device hda-duplex,id=sound0-codec0,bus=sound0.0,cad=0 \
    -device virtio-balloon-pci,id=balloon0,bus=pci.0 \
    -vnc :1


-- In guest:

# lspci | grep Eth
00:05.0 Ethernet controller: Red Hat, Inc Virtio network device
# dmesg | grep 00:05.0
pci 0000:00:05.0: reg 10: [io  0xc0c0-0xc0df]
pci 0000:00:05.0: reg 14: [mem 0xf2022000-0xf2022fff]
pci 0000:00:05.0: reg 30: [mem 0xf2030000-0xf203ffff pref]
virtio-pci 0000:00:05.0: PCI INT A -> Link[LNKA] -> GSI 10 (level, high) -> IRQ 10
virtio-pci 0000:00:05.0: irq 28 for MSI/MSI-X
virtio-pci 0000:00:05.0: irq 29 for MSI/MSI-X
virtio-pci 0000:00:05.0: irq 30 for MSI/MSI-X
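
To check where each virtio-net MSI/MSI-X vector (IRQs 28-30 above) is allowed to run, and which vCPU is actually servicing it, one can read the guest's /proc/irq/<N>/smp_affinity files and the per-CPU counters in /proc/interrupts. A minimal bash sketch; the function name show_irqs and the PROC override variable are hypothetical conveniences, not part of any tool:

```shell
#!/usr/bin/env bash
# Print the affinity mask and the per-CPU delivery counters for each given IRQ.
# PROC is a hypothetical override (handy for testing); it defaults to /proc.
show_irqs() {
    local proc=${PROC:-/proc}
    local irq
    # The first line of /proc/interrupts labels the per-CPU columns.
    head -n 1 "$proc/interrupts"
    for irq in "$@"; do
        printf 'IRQ %s smp_affinity=%s\n' "$irq" "$(cat "$proc/irq/$irq/smp_affinity")"
        # Each IRQ line starts with the IRQ number followed by a colon.
        grep -E "^ *$irq:" "$proc/interrupts"
    done
}

# In the guest from this report the virtio-net vectors are IRQs 28-30:
#   show_irqs 28 29 30
```

Running it twice while netperf is active shows which column's counter keeps growing, independently of what irqbalance reports.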


-- Excerpts from the irqbalance output in the guest (three snapshots):
----------------------------------------------------------------------------
Package 0:  numa_node is 0 cpu mask is 00000003 (load 0)
        Cache domain 0:  numa_node is 0 cpu mask is 00000001  (load 0) 
                CPU number 0  numa_node is 0 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/1) 
          Interrupt 25 node_num is -1 (storage/1) 
        Cache domain 1:  numa_node is 0 cpu mask is 00000002  (load 0) 
                CPU number 1  numa_node is 0 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/1) 
          Interrupt 11 node_num is -1 (legacy/1) 
  Interrupt 26 node_num is -1 (other/1) 
Package 0:  numa_node is 0 cpu mask is 0000000c (load 13000)
        Cache domain 2:  numa_node is 0 cpu mask is 00000004  (load 26000) 
                CPU number 2  numa_node is 0 (load 26000)
                  Interrupt 29 node_num is -1 (ethernet/1) 
          Interrupt 9 node_num is -1 (legacy/1) 
        Cache domain 3:  numa_node is 0 cpu mask is 00000008  (load 0) 
                CPU number 3  numa_node is 0 (load 0)
          Interrupt 24 node_num is -1 (storage/1) 
  Interrupt 10 node_num is -1 (other/1) 
  Interrupt 27 node_num is -1 (other/1) 



-----------------------------------------------------------------------------
Package 0:  numa_node is 0 cpu mask is 00000003 (load 0)
        Cache domain 0:  numa_node is 0 cpu mask is 00000001  (load 0) 
                CPU number 0  numa_node is 0 (load 0)
                  Interrupt 30 node_num is -1 (ethernet/1) 
          Interrupt 25 node_num is -1 (storage/1) 
        Cache domain 1:  numa_node is 0 cpu mask is 00000002  (load 0) 
                CPU number 1  numa_node is 0 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/1) 
          Interrupt 11 node_num is -1 (legacy/1) 
  Interrupt 26 node_num is -1 (other/1) 
Package 0:  numa_node is 0 cpu mask is 0000000c (load 16500)
        Cache domain 2:  numa_node is 0 cpu mask is 00000004  (load 33000) 
                CPU number 2  numa_node is 0 (load 33000)
                  Interrupt 29 node_num is -1 (ethernet/1) 
          Interrupt 9 node_num is -1 (legacy/1) 
        Cache domain 3:  numa_node is 0 cpu mask is 00000008  (load 0) 
                CPU number 3  numa_node is 0 (load 0)
          Interrupt 24 node_num is -1 (storage/1) 
  Interrupt 10 node_num is -1 (other/1) 
  Interrupt 27 node_num is -1 (other/1) 



-----------------------------------------------------------------------------
Package 0:  numa_node is 0 cpu mask is 00000003 (load 500)
        Cache domain 0:  numa_node is 0 cpu mask is 00000001  (load 1000) 
                CPU number 0  numa_node is 0 (load 1000)
                  Interrupt 30 node_num is -1 (ethernet/750) 
          Interrupt 25 node_num is -1 (storage/994) 
        Cache domain 1:  numa_node is 0 cpu mask is 00000002  (load 0) 
                CPU number 1  numa_node is 0 (load 0)
                  Interrupt 28 node_num is -1 (ethernet/1) 
          Interrupt 11 node_num is -1 (legacy/1) 
  Interrupt 26 node_num is -1 (other/1) 
Package 0:  numa_node is 0 cpu mask is 0000000c (load 19000)
        Cache domain 2:  numa_node is 0 cpu mask is 00000004  (load 38000) 
                CPU number 2  numa_node is 0 (load 38000)
                  Interrupt 29 node_num is -1 (ethernet/1) 
          Interrupt 9 node_num is -1 (legacy/1) 
        Cache domain 3:  numa_node is 0 cpu mask is 00000008  (load 0) 
                CPU number 3  numa_node is 0 (load 0)
          Interrupt 24 node_num is -1 (storage/1) 
  Interrupt 10 node_num is -1 (other/1) 
  Interrupt 27 node_num is -1 (other/1)
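
The cpu masks in these snapshots are hexadecimal CPU bitmaps: 00000003 covers CPUs 0 and 1, 0000000c covers CPUs 2 and 3, and each cache domain holds a single CPU. Read that way, interrupt 29 stays on CPU 2 in all three snapshots while CPU 2's load rises from 26000 to 38000. A small bash sketch for decoding such masks (the same format appears in /proc/irq/<N>/smp_affinity); the function name mask_to_cpus is hypothetical:

```shell
#!/usr/bin/env bash
# Decode a hexadecimal CPU mask ("cpu mask is 0000000c" in irqbalance output,
# or the contents of /proc/irq/<N>/smp_affinity) into a list of CPU numbers.
# Assumes the mask fits in 63 bits (CPUs 0-62), since bash arithmetic is signed.
mask_to_cpus() {
    local hex=${1//,/}          # drop commas between 32-bit groups, if any
    local mask=$((16#$hex))
    local cpu=0
    local cpus=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            cpus+=("$cpu")
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${cpus[*]}"
}

for m in 00000003 0000000c 00000004; do
    printf '%s -> CPUs %s\n' "$m" "$(mask_to_cpus "$m")"
done
# prints:
# 00000003 -> CPUs 0 1
# 0000000c -> CPUs 2 3
# 00000004 -> CPUs 2
```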

Comment 2 Chao Yang 2013-11-12 09:40:40 UTC
Also reproducible with the RHEL 6.4.z kernel and qemu-kvm below, again with a RHEL 6.5 guest:

2.6.32-358.18.1.el6.x86_64
qemu-kvm-rhev-0.12.1.2-2.355.el6_4.9.x86_64

Comment 3 Radim Krčmář 2013-11-12 14:12:19 UTC
A single interrupt can't be balanced across multiple CPUs with '-cpu host', because the guest kernel currently selects x2apic in physical destination mode, which delivers each interrupt to exactly one CPU. (Booting the guest with the nox2apic kernel parameter also works around this.)
With only one busy interrupt source, irqbalance can't do much: moving it to another CPU doesn't make anything better.

But there is a deficiency in irqbalance that the reporter may have meant:
even with priority balancing, irqbalance still sets smp_affinity to a single CPU, although it could achieve better balance by considering wider masks.
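
The distinction comment 3 relies on can be modelled in a few lines. This is a simplified sketch, not kernel code: it assumes that in x2apic physical destination mode the interrupt goes to one CPU, modelled here as the lowest-numbered CPU in the smp_affinity mask, while in xAPIC flat logical mode (for example a guest booted with nox2apic and at most 8 vCPUs) every CPU in the mask is an eligible target. The function name eligible_cpus is hypothetical.

```shell
#!/usr/bin/env bash
# Simplified model of which CPUs may receive an interrupt for a given
# smp_affinity mask. Assumptions (not the kernel's actual code):
#   physical: x2apic physical destination mode targets a single CPU,
#             modelled as the lowest-numbered CPU set in the mask.
#   logical:  flat logical mode lets every CPU in the mask receive it.
eligible_cpus() {
    local mode=$1
    local mask=$((16#${2//,/}))
    local cpu=0
    local cpus=()
    while (( mask > 0 )); do
        if (( mask & 1 )); then
            cpus+=("$cpu")
            # In physical mode only one CPU of the mask is ever targeted.
            [[ $mode == physical ]] && break
        fi
        mask=$((mask >> 1))
        cpu=$((cpu + 1))
    done
    echo "${cpus[*]}"
}

# Affinity 0000000c covers CPUs 2 and 3:
eligible_cpus physical 0000000c   # prints: 2
eligible_cpus logical  0000000c   # prints: 2 3
```

Under this model, widening the mask changes nothing in physical mode, which is why irqbalance can only move the single busy interrupt, not spread it.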

Comment 4 Radim Krčmář 2013-11-28 18:23:55 UTC
This is expected behavior, for the reasons given in comment #3; see the parent bug for details.

*** This bug has been marked as a duplicate of bug 960383 ***