Bug 1026178

Summary: irqbalance service does not work properly with 82599EB PF/VF
Product: Red Hat Enterprise Linux 7 Reporter: Xu Han <xuhan>
Component: qemu-kvm Assignee: Bandan Das <bdas>
Status: CLOSED CURRENTRELEASE QA Contact: Virtualization Bugs <virt-bugs>
Severity: medium Docs Contact:
Priority: medium    
Version: 7.0 CC: acathrow, alex.williamson, chayang, hhuang, juzhang, michen, virt-maint, xfu, xuhan
Target Milestone: rc   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Last Closed: 2014-01-21 22:38:34 UTC Type: Bug

Description Xu Han 2013-11-04 06:26:38 UTC
Description of problem:
The irqbalance service does not work properly with an 82599EB PF/VF.

Version-Release number of selected component (if applicable):
qemu-kvm-rhev-1.5.3-10.el7.x86_64
kernel-3.10.0-40.el7.x86_64

How reproducible:
always

Steps to Reproduce:
1. Boot the guest with an 82599EB PF/VF assigned.
# /usr/libexec/qemu-kvm -nodefaults -M pc -m 2G -cpu Nehalem \
    -smp 4,cores=2,threads=2,sockets=1 -boot menu=on -monitor stdio \
    -vga qxl -spice disable-ticketing,port=5931 \
    -drive file=/home/vfio-RHEL7.0-64.qcow2_v3,id=guest-img,if=none,cache=none,aio=native \
    -device virtio-blk-pci,scsi=off,drive=guest-img,id=os-disk,bootindex=1 \
    -device virtio-balloon-pci,id=balloon \
    -global PIIX4_PM.disable_s3=0 -global PIIX4_PM.disable_s4=0 \
    -qmp tcp:0:5555,server,nowait -serial unix:/tmp/guest-sock,server,nowait \
    -device vfio-pci,host=05:10.0,id=vf0

2. Start irqbalance.
# service irqbalance start

3. Check interrupts and smp_affinity in the guest.
# cat /proc/interrupts | grep ens5; \
  cat /proc/irq/42/smp_affinity; \
  sleep 120; \
  cat /proc/interrupts | grep ens5; \
  cat /proc/irq/42/smp_affinity


Actual results:
step 2:
# service irqbalance status
Redirecting to /bin/systemctl status  irqbalance.service
irqbalance.service - irqbalance daemon
   Loaded: loaded (/usr/lib/systemd/system/irqbalance.service; enabled)
   Active: active (running) since Sun 2013-11-03 22:49:52 MST; 21min ago
 Main PID: 603 (irqbalance)
   CGroup: /system.slice/irqbalance.service
           └─603 /usr/sbin/irqbalance --foreground

Nov 03 22:49:52 localhost.localdomain systemd[1]: Started irqbalance daemon.
Nov 03 23:00:15 localhost.localdomain systemd[1]: Started irqbalance daemon.
Nov 03 23:01:49 localhost.localdomain systemd[1]: Started irqbalance daemon.

step 3:
 42:         13         13         34     193847   PCI-MSI-edge      ens5-TxRx-0
 43:          5          7          7          5   PCI-MSI-edge      ens5
8 <-- smp_affinity
 42:         13         13         34     975747   PCI-MSI-edge      ens5-TxRx-0
 43:          5          7          7          5   PCI-MSI-edge      ens5
8 <-- smp_affinity

# cat /proc/irq/42/affinity_hint 
0


Expected results:
The irqbalance service works properly.

Additional info:
# lspci -vvv -s 00:05.0 | grep -i MSI
	Capabilities: [70] MSI-X: Enable+ Count=3 Masked-
	Capabilities: [a0] Express (v0) Endpoint, MSI 00

Comment 3 Alex Williamson 2013-11-04 14:22:38 UTC
(In reply to xuhan from comment #0)
> step 3:
>  42:         13         13         34     193847   PCI-MSI-edge      ens5-TxRx-0
>  43:          5          7          7          5   PCI-MSI-edge      ens5
> 8 <-- smp_affinity

smp_affinity is a bitmap, so 8 means CPU3 is targeted for the interrupt.
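
For reference, a minimal Python sketch of how such a mask can be decoded. The helper name `cpus_from_affinity_mask` is hypothetical; the sketch assumes the hexadecimal format the kernel prints in /proc/irq/<N>/smp_affinity, including the comma-separated 32-bit groups seen on machines with more than 32 CPUs.

```python
def cpus_from_affinity_mask(mask):
    """Return the CPU numbers whose bits are set in a hex affinity mask."""
    # Wide masks are printed as comma-separated 32-bit groups, most
    # significant first, so dropping the commas yields one hex number.
    value = int(mask.strip().replace(",", ""), 16)
    cpus = []
    cpu = 0
    while value:
        if value & 1:          # bit N set means CPU N may receive the IRQ
            cpus.append(cpu)
        value >>= 1
        cpu += 1
    return cpus


print(cpus_from_affinity_mask("8"))  # mask from comment 0 -> [3]
print(cpus_from_affinity_mask("f"))  # all four guest CPUs -> [0, 1, 2, 3]
```

Bit 3 is the only bit set in 0x8, which is why the mask reported in comment 0 selects CPU3 alone.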

>  42:         13         13         34     975747   PCI-MSI-edge      ens5-TxRx-0
>  43:          5          7          7          5   PCI-MSI-edge      ens5
> 8 <-- smp_affinity

Tada, only CPU3's interrupt count increased.
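
The comparison can be made explicit by subtracting the per-CPU counts of the two samples. The following is only a sketch: `parse_irq_line` and `per_cpu_delta` are hypothetical helper names, and it assumes the /proc/interrupts row layout shown in comment 0 (IRQ number, colon, one counter per CPU, then descriptive text).

```python
def parse_irq_line(line, ncpus):
    """Split one /proc/interrupts row into (irq, per-CPU counts)."""
    irq, rest = line.split(":", 1)
    # The first ncpus fields after the colon are the per-CPU counters.
    counts = [int(field) for field in rest.split()[:ncpus]]
    return irq.strip(), counts


def per_cpu_delta(before, after, ncpus):
    """Interrupts delivered to each CPU between two samples of one IRQ."""
    irq_before, old = parse_irq_line(before, ncpus)
    irq_after, new = parse_irq_line(after, ncpus)
    if irq_before != irq_after:
        raise ValueError("samples are for different IRQs")
    return [n - o for o, n in zip(old, new)]


# The two samples of IRQ 42 from comment 0 (the guest has 4 CPUs).
first = " 42:         13         13         34     193847   PCI-MSI-edge      ens5-TxRx-0"
second = " 42:         13         13         34     975747   PCI-MSI-edge      ens5-TxRx-0"
print(per_cpu_delta(first, second, ncpus=4))  # [0, 0, 0, 781900]
```

Only the last column (CPU3) changes, which matches the smp_affinity value of 8.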

> # cat /proc/irq/42/affinity_hint 
> 0

Seems like you're making an assumption that affinity_hint should be showing something else.  What do you think it should be showing?  What does it show on bare metal?

> Expected results:
> irqbalance service could work properly.

I don't see how it's not working, please double check the results and clarify exactly where it's not working.

Comment 4 Xu Han 2013-11-08 08:37:28 UTC
I currently have no environment available for further testing. I will update the results once the test has finished.

Comment 5 Alex Williamson 2013-11-08 14:09:15 UTC
Re-adding needinfo

Comment 6 Xu Han 2013-11-19 08:25:59 UTC
Tested again with qemu-kvm-1.5.3-19.el7.x86_64.

This time I observed the IRQ migrating between CPUs.
# while true; do cat /proc/interrupts | grep ens; sleep 1; done
...
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     116767          9        113          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     117587          9        113          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     118461          9        113          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     119410          9        113          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9        188          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9        668          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       1574          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       2421          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       3170          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       4173          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       4884          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       5530          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
 45:     120145          9       6202          4   PCI-MSI-edge      ens5-TxRx-0
 46:          6          5          6          7   PCI-MSI-edge      ens5
...
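
The point at which the interrupt moves can also be picked out programmatically. This is a rough sketch, not part of the original test: `busiest_cpu_per_interval` is a hypothetical helper that takes consecutive per-CPU counter samples for one IRQ and reports which CPU received the most new interrupts in each interval.

```python
def busiest_cpu_per_interval(samples):
    """For consecutive per-CPU count samples, return the CPU that received
    the most new interrupts in each interval (None if nothing changed)."""
    result = []
    for old, new in zip(samples, samples[1:]):
        deltas = [n - o for o, n in zip(old, new)]
        peak = max(deltas)
        result.append(deltas.index(peak) if peak > 0 else None)
    return result


# A few consecutive IRQ 45 samples taken from the output above.
samples = [
    (118461, 9, 113, 4),
    (119410, 9, 113, 4),
    (120145, 9, 188, 4),
    (120145, 9, 668, 4),
    (120145, 9, 1574, 4),
]
print(busiest_cpu_per_interval(samples))  # [0, 0, 2, 2]
```

In these samples the interrupt is serviced by CPU0 at first and then exclusively by CPU2, which is the migration described here.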

The affinity_hint value in comment 0 was provided only as additional information. I am not actually sure what value it should have, but I had seen a non-zero value for another device before, so I wondered whether it was related to this issue.

Anyway, as this test result shows, the irqbalance service works properly.

Thanks,
xuhan

Comment 7 Alex Williamson 2014-01-21 22:38:34 UTC
Comment 6 confirms irqbalance works as expected with this device. Not sure why this bz is still open; closing.