Created attachment 555129 [details] /var/log/messages Description of problem: This is the same problem as described in closed bug 755570. However, this system runs on bare metal and not in a VM (the reason why the other bug was closed). I should note that this system was upgraded to Fedora 16 with yum. The following message appears every 10-60 seconds. No other irqbalance messages appear in the log. /usr/sbin/irqbalance: Load average increasing, re-enabling all cpus for irq balancing Version-Release number of selected component (if applicable): irqbalance-1.0.3-1.fc16.x86_64 How reproducible: Always Steps to Reproduce: 1. Use yum to upgrade a system to Fedora 16. 2. Run the system. Actual results: The message appears very frequently. Expected results: The message should only appear occasionally. Additional info: I tried "yum reinstall irqbalance" and restarted the service without any change. There is a kernel problem in the log that may or may not be related to the irqbalance problem. Jan 12 19:28:02 talkminer kernel: [ 6.612545] ioatdma: Intel(R) QuickData Technology Driver 4.00 Jan 12 19:28:02 talkminer kernel: [ 6.612629] ioatdma 0000:00:0f.0: PCI INT A -> GSI 33 (level, low) -> IRQ 33 Jan 12 19:28:02 talkminer kernel: [ 6.612633] ------------[ cut here ]------------ Jan 12 19:28:02 talkminer kernel: [ 6.612640] WARNING: at drivers/iommu/intel-iommu.c:3216 quirk_ioat_snb_local_iommu+0xab/0xc0() Jan 12 19:28:02 talkminer kernel: [ 6.612642] Hardware name: X7DWU Jan 12 19:28:02 talkminer kernel: [ 6.612643] BIOS assigned incorrect VT-d unit for Intel(R) QuickData Technology device Jan 12 19:28:02 talkminer kernel: [ 6.612645] Modules linked in: ioatdma(+) intel_rng(-) serio_raw dca i5k_amb nfsd lockd nfs_acl auth_rpcgss sunrpc uinput radeon ttm drm_kms_helper drm i2c_algo_bit i2c_core Jan 12 19:28:02 talkminer kernel: [ 6.612657] Pid: 497, comm: modprobe Not tainted 3.1.7-1.fc16.x86_64 #1 Jan 12 19:28:02 talkminer kernel: [ 6.612659] Call Trace: Jan 12 19:28:02 talkminer kernel: [ 6.612666] [<ffffffff8106b73f>] warn_slowpath_common+0x7f/0xc0 Jan 12 19:28:02 talkminer kernel: [ 6.612668] [<ffffffff8106b7df>] warn_slowpath_fmt_taint+0x3f/0x50 Jan 12 19:28:02 talkminer kernel: [ 6.612671] [<ffffffff814a2c8b>] quirk_ioat_snb_local_iommu+0xab/0xc0 Jan 12 19:28:02 talkminer kernel: [ 6.612676] [<ffffffff812d732b>] pci_fixup_device+0x8b/0x120 Jan 12 19:28:02 talkminer kernel: [ 6.612681] [<ffffffff812d013c>] do_pci_enable_device+0x5c/0x70 Jan 12 19:28:02 talkminer kernel: [ 6.612683] [<ffffffff812d01c7>] __pci_enable_device_flags+0x77/0xc0 Jan 12 19:28:02 talkminer kernel: [ 6.612686] [<ffffffff812d0223>] pci_enable_device+0x13/0x20 Jan 12 19:28:02 talkminer kernel: [ 6.612689] [<ffffffff812d0298>] pcim_enable_device+0x68/0xd0 Jan 12 19:28:02 talkminer kernel: [ 6.612695] [<ffffffffa0178648>] ioat_pci_probe+0x13/0x175 [ioatdma] Jan 12 19:28:02 talkminer kernel: [ 6.612698] [<ffffffff812d14dc>] local_pci_probe+0x5c/0xd0 Jan 12 19:28:02 talkminer kernel: [ 6.612701] [<ffffffff812d2d79>] pci_device_probe+0x109/0x130 Jan 12 19:28:02 talkminer kernel: [ 6.612706] [<ffffffff8138baec>] driver_probe_device+0x9c/0x2b0 Jan 12 19:28:02 talkminer kernel: [ 6.612708] [<ffffffff8138bdab>] __driver_attach+0xab/0xb0 Jan 12 19:28:02 talkminer kernel: [ 6.612710] [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0 Jan 12 19:28:02 talkminer kernel: [ 6.612713] [<ffffffff8138bd00>] ? driver_probe_device+0x2b0/0x2b0 Jan 12 19:28:02 talkminer kernel: [ 6.612717] [<ffffffff8138a96c>] bus_for_each_dev+0x5c/0x90 Jan 12 19:28:02 talkminer kernel: [ 6.612719] [<ffffffff8138b6fe>] driver_attach+0x1e/0x20 Jan 12 19:28:02 talkminer kernel: [ 6.612722] [<ffffffff8138b300>] bus_add_driver+0x1b0/0x2a0 Jan 12 19:28:02 talkminer kernel: [ 6.612724] [<ffffffffa0076000>] ? 0xffffffffa0075fff Jan 12 19:28:02 talkminer kernel: [ 6.612727] [<ffffffff8138c306>] driver_register+0x76/0x140 Jan 12 19:28:02 talkminer kernel: [ 6.612730] [<ffffffff81160bbc>] ? kmem_cache_create+0x2ac/0x2f0 Jan 12 19:28:02 talkminer kernel: [ 6.612733] [<ffffffffa0076000>] ? 0xffffffffa0075fff Jan 12 19:28:02 talkminer kernel: [ 6.612735] [<ffffffff812d2a56>] __pci_register_driver+0x56/0xd0 Jan 12 19:28:02 talkminer kernel: [ 6.612737] [<ffffffffa0076000>] ? 0xffffffffa0075fff Jan 12 19:28:02 talkminer kernel: [ 6.612742] [<ffffffffa0076068>] ioat_init_module+0x68/0x80 [ioatdma] Jan 12 19:28:02 talkminer kernel: [ 6.612747] [<ffffffff81002042>] do_one_initcall+0x42/0x180 Jan 12 19:28:02 talkminer kernel: [ 6.612752] [<ffffffff810aad61>] sys_init_module+0x91/0x200 Jan 12 19:28:02 talkminer kernel: [ 6.612756] [<ffffffff815dcd82>] system_call_fastpath+0x16/0x1b Jan 12 19:28:02 talkminer kernel: [ 6.612758] ---[ end trace 400dd7ab2db00eef ]--- Jan 12 19:28:02 talkminer kernel: [ 6.612760] Disabling lock debugging due to kernel taint Jan 12 19:28:02 talkminer kernel: [ 6.612809] ioatdma 0000:00:0f.0: channel enumeration error Jan 12 19:28:02 talkminer kernel: [ 6.612813] ioatdma 0000:00:0f.0: Intel(R) I/OAT DMA Engine init failed Jan 12 19:28:02 talkminer kernel: [ 6.612823] ioatdma 0000:00:0f.0: PCI INT A disabled
Created attachment 555130 [details] /proc/cpuinfo
Created attachment 555131 [details] /proc/interrupts
I found the same problem on another system with dual Xeon CPUs (without the kernel problem listed above) but not on two others. Intel(R) Xeon(R) E5649 @ 2.53GHz None of the upgraded single-CPU or VM systems have that problem. All were upgraded with yum. Please let me know if you need more information about the second system.
sure, np
could you please upload a copy of /var/log/messages showing the irqbalance messages? I need to check for the presence of additional irqbalance messages. Also, if you could run irqbalance in debug mode (run it from the command line with --debug, rather than as a daemon service), and record the output, so I can see how its parsed the cpu topology, that would be very helpful. Thanks!
Created attachment 555572 [details] /var/log/messages for system-2
Created attachment 555573 [details] Output of "irqbalance --debug" for system-1
Created attachment 555574 [details] Output of "irqbalance --debug" for system-2
I had already uploaded /var/log/messages for the first system (the first attachment above). I now uploaded it for the second system as well. There are no other irqbalance messages for days. I also uploaded the output for "irqbalance --debug" for both systems. The first system has two 4-core CPUs. The second system has two 6-core CPUs with hyperthreading.
Thank you, that helps. I've managed to find a system that I can reproduce this on here, I'll look into this further
I apologize, I just noticed the bone headed thing that I did. I actually fixed the problem in bz755570 and comitted it upstream, but because the other system was a virt guest that had so many other oddities, we decided to close it as a CANTFIX, and so I never comitted the actual fix to f16. Its in there now, and I'll be pushing an update in just a bit.
irqbalance-1.0.3-2.fc16 has been submitted as an update for Fedora 16. https://admin.fedoraproject.org/updates/irqbalance-1.0.3-2.fc16
*** Bug 782463 has been marked as a duplicate of this bug. ***
Package irqbalance-1.0.3-2.fc16: * should fix your issue, * was pushed to the Fedora 16 testing repository, * should be available at your local mirror within two days. Update it with: # su -c 'yum update --enablerepo=updates-testing irqbalance-1.0.3-2.fc16' as soon as you are able to. Please go to the following url: https://admin.fedoraproject.org/updates/FEDORA-2012-0622/irqbalance-1.0.3-2.fc16 then log in and leave karma (feedback).
irqbalance-1.0.3-2.fc16 has been pushed to the Fedora 16 stable repository. If problems still persist, please make note of it in this bug report.