Bug 781578 - irqbalance logs the same message every few seconds
Summary: irqbalance logs the same message every few seconds
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Fedora
Classification: Fedora
Component: irqbalance
Version: 16
Hardware: x86_64
OS: Linux
unspecified
low
Target Milestone: ---
Assignee: Neil Horman
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
: 782463 (view as bug list)
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2012-01-13 19:12 UTC by Andreas Girgensohn
Modified: 2012-01-25 22:29 UTC (History)
2 users (show)

Fixed In Version: irqbalance-1.0.3-2.fc16
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-01-25 22:29:16 UTC
Type: ---
Embargoed:


Attachments (Terms of Use)
/var/log/messages (248.25 KB, application/octet-stream)
2012-01-13 19:12 UTC, Andreas Girgensohn
no flags Details
/proc/cpuinfo (6.05 KB, text/plain)
2012-01-13 19:13 UTC, Andreas Girgensohn
no flags Details
/proc/interrupts (5.16 KB, text/plain)
2012-01-13 19:14 UTC, Andreas Girgensohn
no flags Details
/var/log/messages for system-2 (1.65 MB, text/plain)
2012-01-16 19:10 UTC, Andreas Girgensohn
no flags Details
Output of "irqbalance --debug" for system-1 (288.00 KB, text/plain)
2012-01-16 19:11 UTC, Andreas Girgensohn
no flags Details
Output of "irqbalance --debug" for system-2 (608.00 KB, text/plain)
2012-01-16 19:11 UTC, Andreas Girgensohn
no flags Details

Description Andreas Girgensohn 2012-01-13 19:12:54 UTC
Created attachment 555129 [details]
/var/log/messages

Description of problem:

This is the same problem as described in closed bug 755570.  However, this system runs on bare metal and not in a VM (the reason why the other bug was closed).  I should note that this system was upgraded to Fedora 16 with yum.

The following message appears every 10-60 seconds.  No other irqbalance messages appear in the log.

/usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing

Version-Release number of selected component (if applicable):

irqbalance-1.0.3-1.fc16.x86_64

How reproducible:

Always

Steps to Reproduce:
1. Use yum to upgrade a system to Fedora 16.
2. Run the system.
  
Actual results:

The message appears very frequently.

Expected results:

The message should only appear occasionally.


Additional info:

I tried "yum reinstall irqbalance" and restarted the service without any change.

There is a kernel problem in the log that may or may not be related to the irqbalance problem.

Jan 12 19:28:02 talkminer kernel: [    6.612545] ioatdma: Intel(R) QuickData Technology Driver 4.00
Jan 12 19:28:02 talkminer kernel: [    6.612629] ioatdma 0000:00:0f.0: PCI INT A -> GSI 33 (level, low) -> IRQ 33
Jan 12 19:28:02 talkminer kernel: [    6.612633] ------------[ cut here ]------------
Jan 12 19:28:02 talkminer kernel: [    6.612640] WARNING: at drivers/iommu/intel-iommu.c:3216 quirk_ioat_snb_local_iommu+0xab/0xc0()
Jan 12 19:28:02 talkminer kernel: [    6.612642] Hardware name: X7DWU
Jan 12 19:28:02 talkminer kernel: [    6.612643] BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device
Jan 12 19:28:02 talkminer kernel: [    6.612645] Modules linked in: ioatdma(+) intel_rng(-) serio_raw dca i5k_amb nfsd lockd nfs_acl auth_rpcgss sunrpc uinput radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core
Jan 12 19:28:02 talkminer kernel: [    6.612657] Pid: 497, comm: modprobe Not tainted 3.1.7-1.fc16.x86_64 #1
Jan 12 19:28:02 talkminer kernel: [    6.612659] Call Trace:
Jan 12 19:28:02 talkminer kernel: [    6.612666]  [<ffffffff8106b73f>] warn_slowpath_common+0x7f/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612668]  [<ffffffff8106b7df>] warn_slowpath_fmt_taint+0x3f/0x50
Jan 12 19:28:02 talkminer kernel: [    6.612671]  [<ffffffff814a2c8b>] quirk_ioat_snb_local_iommu+0xab/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612676]  [<ffffffff812d732b>] pci_fixup_device+0x8b/0x120
Jan 12 19:28:02 talkminer kernel: [    6.612681]  [<ffffffff812d013c>] do_pci_enable_device+0x5c/0x70
Jan 12 19:28:02 talkminer kernel: [    6.612683]  [<ffffffff812d01c7>] __pci_enable_device_flags+0x77/0xc0
Jan 12 19:28:02 talkminer kernel: [    6.612686]  [<ffffffff812d0223>] pci_enable_device+0x13/0x20
Jan 12 19:28:02 talkminer kernel: [    6.612689]  [<ffffffff812d0298>] pcim_enable_device+0x68/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612695]  [<ffffffffa0178648>] ioat_pci_probe+0x13/0x175 [ioatdma]
Jan 12 19:28:02 talkminer kernel: [    6.612698]  [<ffffffff812d14dc>] local_pci_probe+0x5c/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612701]  [<ffffffff812d2d79>] pci_device_probe+0x109/0x130
Jan 12 19:28:02 talkminer kernel: [    6.612706]  [<ffffffff8138baec>] driver_probe_device+0x9c/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612708]  [<ffffffff8138bdab>] __driver_attach+0xab/0xb0
Jan 12 19:28:02 talkminer kernel: [    6.612710]  [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612713]  [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0
Jan 12 19:28:02 talkminer kernel: [    6.612717]  [<ffffffff8138a96c>] bus_for_each_dev+0x5c/0x90
Jan 12 19:28:02 talkminer kernel: [    6.612719]  [<ffffffff8138b6fe>] driver_attach+0x1e/0x20
Jan 12 19:28:02 talkminer kernel: [    6.612722]  [<ffffffff8138b300>] bus_add_driver+0x1b0/0x2a0
Jan 12 19:28:02 talkminer kernel: [    6.612724]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612727]  [<ffffffff8138c306>] driver_register+0x76/0x140
Jan 12 19:28:02 talkminer kernel: [    6.612730]  [<ffffffff81160bbc>] ? kmem_cache_create+0x2ac/0x2f0
Jan 12 19:28:02 talkminer kernel: [    6.612733]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612735]  [<ffffffff812d2a56>] __pci_register_driver+0x56/0xd0
Jan 12 19:28:02 talkminer kernel: [    6.612737]  [<ffffffffa0076000>] ? 0xffffffffa0075fff
Jan 12 19:28:02 talkminer kernel: [    6.612742]  [<ffffffffa0076068>] ioat_init_module+0x68/0x80 [ioatdma]
Jan 12 19:28:02 talkminer kernel: [    6.612747]  [<ffffffff81002042>] do_one_initcall+0x42/0x180
Jan 12 19:28:02 talkminer kernel: [    6.612752]  [<ffffffff810aad61>] sys_init_module+0x91/0x200
Jan 12 19:28:02 talkminer kernel: [    6.612756]  [<ffffffff815dcd82>] system_call_fastpath+0x16/0x1b
Jan 12 19:28:02 talkminer kernel: [    6.612758] ---[ end trace 400dd7ab2db00eef ]---
Jan 12 19:28:02 talkminer kernel: [    6.612760] Disabling lock debugging due to kernel taint
Jan 12 19:28:02 talkminer kernel: [    6.612809] ioatdma 0000:00:0f.0: channel enumeration error
Jan 12 19:28:02 talkminer kernel: [    6.612813] ioatdma 0000:00:0f.0: Intel(R) I/OAT DMA Engine init failed
Jan 12 19:28:02 talkminer kernel: [    6.612823] ioatdma 0000:00:0f.0: PCI INT A disabled

Comment 1 Andreas Girgensohn 2012-01-13 19:13:52 UTC
Created attachment 555130 [details]
/proc/cpuinfo

Comment 2 Andreas Girgensohn 2012-01-13 19:14:25 UTC
Created attachment 555131 [details]
/proc/interrupts

Comment 3 Andreas Girgensohn 2012-01-13 19:27:34 UTC
I found the same problem on another system with dual Xeon CPUs (without the kernel problem listed above) but not on two others.

Intel(R) Xeon(R) E5649 @ 2.53GHz

None of the upgraded single-CPU or VM systems have that problem.  All were upgraded with yum.

Please let me know if you need more information about the second system.

Comment 5 Neil Horman 2012-01-16 12:14:01 UTC
sure, np

Comment 6 Neil Horman 2012-01-16 13:10:12 UTC
could you please upload a copy of /var/log/messages showing the irqbalance messages?  I need to check for the presence of additional irqbalance messages.  Also, if you could run irqbalance in debug mode (run it from the command line with --debug, rather than as a daemon service), and record the output, so I can see how its parsed the cpu topology, that would be very helpful.  Thanks!

Comment 7 Andreas Girgensohn 2012-01-16 19:10:05 UTC
Created attachment 555572 [details]
/var/log/messages for system-2

Comment 8 Andreas Girgensohn 2012-01-16 19:11:01 UTC
Created attachment 555573 [details]
Output of "irqbalance --debug" for system-1

Comment 9 Andreas Girgensohn 2012-01-16 19:11:38 UTC
Created attachment 555574 [details]
Output of "irqbalance --debug" for system-2

Comment 10 Andreas Girgensohn 2012-01-16 19:16:32 UTC
I had already uploaded /var/log/messages for the first system (the first attachment above).  I now uploaded it for the second system as well.  There are no other irqbalance messages for days.

I also uploaded the output for "irqbalance --debug" for both systems.

The first system has two 4-core CPUs.  The second system has two 6-core CPUs with hyperthreading.

Comment 11 Neil Horman 2012-01-16 19:50:13 UTC
Thank you,  that helps. I've managed to find a system that I can reproduce this on here, I'll look into this further

Comment 12 Neil Horman 2012-01-16 20:35:38 UTC
I apologize, I just noticed the bone headed thing that I did.  I actually fixed the problem in bz755570 and comitted it upstream, but because the other system was a virt guest that had so many other oddities, we decided to close it as a CANTFIX, and so I never comitted the actual fix to f16.  Its in there now, and I'll be pushing an update in just a bit.

Comment 13 Fedora Update System 2012-01-16 20:58:59 UTC
irqbalance-1.0.3-2.fc16 has been submitted as an update for Fedora 16.
https://admin.fedoraproject.org/updates/irqbalance-1.0.3-2.fc16

Comment 14 Anton Arapov 2012-01-17 17:27:27 UTC
*** Bug 782463 has been marked as a duplicate of this bug. ***

Comment 15 Fedora Update System 2012-01-17 20:26:39 UTC
Package irqbalance-1.0.3-2.fc16:
* should fix your issue,
* was pushed to the Fedora 16 testing repository,
* should be available at your local mirror within two days.
Update it with:
# su -c 'yum update --enablerepo=updates-testing irqbalance-1.0.3-2.fc16'
as soon as you are able to.
Please go to the following url:
https://admin.fedoraproject.org/updates/FEDORA-2012-0622/irqbalance-1.0.3-2.fc16
then log in and leave karma (feedback).

Comment 16 Fedora Update System 2012-01-25 22:29:16 UTC
irqbalance-1.0.3-2.fc16 has been pushed to the Fedora 16 stable repository.  If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.