Bug 781578

Summary: irqbalance logs the same message every few seconds
Product: [Fedora] Fedora Reporter: Andreas Girgensohn <andreasg123>
Component: irqbalanceAssignee: Neil Horman <nhorman>
Status: CLOSED ERRATA QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: low Docs Contact:
Priority: unspecified    
Version: 16CC: anton, wd
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: irqbalance-1.0.3-2.fc16 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-01-25 22:29:16 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
/var/log/messages
none
/proc/cpuinfo
none
/proc/interrupts
none
/var/log/messages for system-2
none
Output of "irqbalance --debug" for system-1
none
Output of "irqbalance --debug" for system-2 none

Description Andreas Girgensohn 2012-01-13 19:12:54 UTC
Created attachment 555129 [details]
/var/log/messages

Description of problem:

This is the same problem as described in closed bug 755570.  However, this system runs on bare metal and not in a VM (the reason why the other bug was closed).  I should note that this system was upgraded to Fedora 16 with yum.

The following message appears every 10-60 seconds.  No other irqbalance messages appear in the log.

/usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing

Version-Release number of selected component (if applicable):

irqbalance-1.0.3-1.fc16.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Use yum to upgrade a system to Fedora 16.
2. Run the system.
  
Actual results:

The message appears very frequently.

Expected results:

The message should only appear occasionally.


Additional info:

I tried "yum reinstall irqbalance" and restarted the service without any change.

There is a kernel problem in the log that may or may not be related to the irqbalance problem.

Jan 12 19:28:02 talkminer kernel: [    6.612545] ioatdma: Intel(R) QuickData Technology Driver 4.00
Jan 12 19:28:02 talkminer kernel: [    6.612629] ioatdma 0000:00:0f.0: PCI INT A -> GSI 33 (level, low) -> IRQ 33
Jan 12 19:28:02 talkminer kernel: [    6.612633] ------------[ cut here ]------------
Jan 12 19:28:02 talkminer kernel: [    6.612640] WARNING: at drivers/iommu/intel-iommu.c:3216 quirk_ioat_snb_local_iommu+0xab/0xc0()
Jan 12 19:28:02 talkminer kernel: [    6.612642] Hardware name: X7DWU
Jan 12 19:28:02 talkminer kernel: [    6.612643] BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device
Jan 12 19:28:02 talkminer kernel: [    6.612645] Modules linked in: ioatdma(+) intel_rng(-) serio_raw dca i5k_amb nfsd lockd nfs_acl auth_rpcgss sunrpc uinput radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
Jan 12 19:28:02 talkminer kernel: [    6.612657] Pid: 497, comm: modprobe Not tainted 3.1.7-1.fc16.x86_64 #1
Jan 12 19:28:02 talkminer kernel: [    6.612659] Call Trace:
Jan 12 19:28:02 talkminer kernel: [    6.612666]  [<ffffffff8106b73f>] warn_slowpath_common+0x7f/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612668]  [<ffffffff8106b7df>] warn_slowpath_fmt_taint+0x3f/0x50
Jan 12 19:28:02 talkminer kernel: [    6.612671]  [<ffffffff814a2c8b>] quirk_ioat_snb_local_iommu+0xab/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612676]  [<ffffffff812d732b>] pci_fixup_device+0x8b/0x120
Jan 12 19:28:02 talkminer kernel: [    6.612681]  [<ffffffff812d013c>] do_pci_enable_device+0x5c/0x70
Jan 12 19:28:02 talkminer kernel: [    6.612683]  [<ffffffff812d01c7>] __pci_enable_device_flags+0x77/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612686]  [<ffffffff812d0223>] pci_enable_device+0x13/0x20
Jan 12 19:28:02 talkminer kernel: [    6.612689]  [<ffffffff812d0298>] pcim_enable_device+0x68/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612695]  [<ffffffffa0178648>] ioat_pci_probe+0x13/0x175 [ioatdma]
Jan 12 19:28:02 talkminer kernel: [    6.612698]  [<ffffffff812d14dc>] local_pci_probe+0x5c/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612701]  [<ffffffff812d2d79>] pci_device_probe+0x109/0x130
Jan 12 19:28:02 talkminer kernel: [    6.612706]  [<ffffffff8138baec>] driver_probe_device+0x9c/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612708]  [<ffffffff8138bdab>] __driver_attach+0xab/0xb0
Jan 12 19:28:02 talkminer kernel: [    6.612710]  [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612713]  [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612717]  [<ffffffff8138a96c>] bus_for_each_dev+0x5c/0x90
Jan 12 19:28:02 talkminer kernel: [    6.612719]  [<ffffffff8138b6fe>] driver_attach+0x1e/0x20
Jan 12 19:28:02 talkminer kernel: [    6.612722]  [<ffffffff8138b300>] bus_add_driver+0x1b0/0x2a0
Jan 12 19:28:02 talkminer kernel: [    6.612724]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612727]  [<ffffffff8138c306>] driver_register+0x76/0x140
Jan 12 19:28:02 talkminer kernel: [    6.612730]  [<ffffffff81160bbc>] ? kmem_cache_create+0x2ac/0x2f0
Jan 12 19:28:02 talkminer kernel: [    6.612733]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612735]  [<ffffffff812d2a56>] __pci_register_driver+0x56/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612737]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612742]  [<ffffffffa0076068>] ioat_init_module+0x68/0x80 [ioatdma]
Jan 12 19:28:02 talkminer kernel: [    6.612747]  [<ffffffff81002042>] do_one_initcall+0x42/0x180
Jan 12 19:28:02 talkminer kernel: [    6.612752]  [<ffffffff810aad61>] sys_init_module+0x91/0x200
Jan 12 19:28:02 talkminer kernel: [    6.612756]  [<ffffffff815dcd82>] system_call_fastpath+0x16/0x1b
Jan 12 19:28:02 talkminer kernel: [    6.612758] ---[ end trace 400dd7ab2db00eef ]---
Jan 12 19:28:02 talkminer kernel: [    6.612760] Disabling lock debugging due to kernel taint
Jan 12 19:28:02 talkminer kernel: [    6.612809] ioatdma 0000:00:0f.0: channel enumeration error
Jan 12 19:28:02 talkminer kernel: [    6.612813] ioatdma 0000:00:0f.0: Intel(R) I/OAT DMA Engine init failed
Jan 12 19:28:02 talkminer kernel: [    6.612823] ioatdma 0000:00:0f.0: PCI INT A disabled

Comment 1 Andreas Girgensohn 2012-01-13 19:13:52 UTC
Created attachment 555130 [details]
/proc/cpuinfo

Comment 2 Andreas Girgensohn 2012-01-13 19:14:25 UTC
Created attachment 555131 [details]
/proc/interrupts

Comment 3 Andreas Girgensohn 2012-01-13 19:27:34 UTC
I found the same problem on another system with dual Xeon CPUs (without the kernel problem listed above) but not on two others.

Intel(R) Xeon(R) E5649 @ 2.53GHz

None of the upgraded single-CPU or VM systems have that problem.  All were upgraded with yum.

Please let me know if you need more information about the second system.

Comment 5 Neil Horman 2012-01-16 12:14:01 UTC
sure, np

Comment 6 Neil Horman 2012-01-16 13:10:12 UTC
could you please upload a copy of /var/log/messages showing the irqbalance messages?  I need to check for the presence of additional irqbalance messages.  Also, if you could run irqbalance in debug mode (run it from the command line with --debug, rather than as a daemon service), and record the output, so I can see how its parsed the cpu topology, that would be very helpful.  Thanks!

Comment 7 Andreas Girgensohn 2012-01-16 19:10:05 UTC
Created attachment 555572 [details]
/var/log/messages for system-2

Comment 8 Andreas Girgensohn 2012-01-16 19:11:01 UTC
Created attachment 555573 [details]
Output of "irqbalance --debug" for system-1

Comment 9 Andreas Girgensohn 2012-01-16 19:11:38 UTC
Created attachment 555574 [details]
Output of "irqbalance --debug" for system-2

Comment 10 Andreas Girgensohn 2012-01-16 19:16:32 UTC
I had already uploaded /var/log/messages for the first system (the first attachment above).  I now uploaded it for the second system as well.  There are no other irqbalance messages for days.

I also uploaded the output for "irqbalance --debug" for both systems.

The first system has two 4-core CPUs.  The second system has two 6-core CPUs with hyperthreading.

Comment 11 Neil Horman 2012-01-16 19:50:13 UTC
Thank you,  that helps. I've managed to find a system that I can reproduce this on here, I'll look into this further

Comment 12 Neil Horman 2012-01-16 20:35:38 UTC
I apologize, I just noticed the bone headed thing that I did.  I actually fixed the problem in bz755570 and comitted it upstream, but because the other system was a virt guest that had so many other oddities, we decided to close it as a CANTFIX, and so I never comitted the actual fix to f16.  Its in there now, and I'll be pushing an update in just a bit.

Comment 13 Fedora Update System 2012-01-16 20:58:59 UTC
irqbalance-1.0.3-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/irqbalance-1.0.3-2.fc16

Comment 14 Anton Arapov 2012-01-17 17:27:27 UTC
*** Bug 782463 has been marked as a duplicate of this bug. ***

Comment 15 Fedora Update System 2012-01-17 20:26:39 UTC
Package irqbalance-1.0.3-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing irqbalance-1.0.3-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0622/irqbalance-1.0.3-2.fc16
then log in and leave karma (feedback).

Comment 16 Fedora Update System 2012-01-25 22:29:16 UTC
irqbalance-1.0.3-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.