Description of problem:
When running on hp-xw9400-01.rhts.redhat.com, the system spews messages:

eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.

Version-Release number of selected component (if applicable):
kernel-debug
2.6.18-81.el5
2.6.18-53.1.10.el5

How reproducible:
Always

Steps to Reproduce:
1. Install RHEL5.U1 on hp-xw9400-01.rhts.redhat.com
2. Install the kernel-debug variant, then reboot

Actual results:
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
eth0: too many iterations (6) in nv_nic_irq.
NETDEV WATCHDOG: eth0: transmit timed out
eth0: Got tx_timeout. irq: 00000036
eth0: Ring at 36b4e000
eth0: Dumping tx registers
0: 00002036 000000ff 00000003 007f03ca 00000000 00000000 00000000 00000000
20: 00000000 00000000 00000000 00000000 00000001 00000100 00000000 00000000
40: 0420e20e 0000a855 00002e20 00000000 00000000 00000000 00000000 00000000
60: 00000000 00000000 00000000 0000ffff 0000ffff 0000ffff 0000ffff 00000000
80: 003b0f3c 40000001 00000000 007f0088 0000061c 00000001 00000000 00007fa3
a0: 0014050f 00000016 26fe1800 000040bd 00000001 00000000 00000000 00000000
c0: 10000002 00000001 00000001 00000001 00000001 00000001 00000001 00000001
e0: 00000001 00000001 00000001 00000001 00000001 00000001 00000001 00000001
100: 36b4e800 36b4e000 007f00ff 00008000 00010032 00000000 0000001b 36b4f110
120: 36b4e530 2c254dc0 a000ffef 00000000 00000000 36b4f11c 36b4e530 0fe08000
140: 00304120 80c02600 00000000 00000000 00000000 00000000 00000000 00000000
160: 00000000 00000000 00000000 00000000 01ff0080 0000c000 00000000 00000000
180: 00000006 00000008 00947969 00008103 0000000a 00003800 00000080 0000b983
1a0: 0000000e 00000008 0094796d 00008103 0000000a 00003800 000000b0 0000b9a3
1c0: 0000000e 00000008 0094796d 00008103 0000000a 00003800 000000b0 0000b9a3
1e0: 0000000e 00000008 0094796d 00008103 0000000a 00003800 000000b0 0000b9a3
200: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
220: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
240: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
260: 00000000 00000000 fe027001 00000100 00000011 000000a3 fe027011 000001a3
280: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
2e0: 00000000 00000000 00000000 00000000 00000000 00000001 00000001 00000001
300: 80212000 00000000 00000000 00000000 00000000 00002000 00000000 00000000
320: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
340: 00000000 00000000 00000000 00000000 00000000 00000020 01442646 00000000
360: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
380: 00000000 00000000 00000000 00000000 00000000 00000000 00000002 00000000
3a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
3e0: 06255300 00701365 00000000 00000000 00000032 00000000 00000000 00000000
400: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
420: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
440: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
460: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
480: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
4e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
500: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
520: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
540: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
560: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
580: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5a0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5c0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
5e0: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
600: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
eth0: Dumping tx ring
000: 00000000 36d5492e 20000040 // 00000000 36d57456 20000040 // 00000000 1fd5a662 20000040 // 00000000 36d7434a 20000040
004: 00000000 36d5e96e 20000040 // 00000000 0a09e4d6 20000040 // 00000000 1df4a762 20000040 // 00000000 36d72d06 20000040
008: 00000000 36d57662 20000040 // 00000000 36d54516 20000040 // 00000000 36d59d06 20000040 // 00000000 36d524d6 20000040
00c: 00000000 36d4a03e 20000040 // 00000000 36d7496e 20000040 // 00000000 129bb30a 20000040 // 00000000 0a09e6e2 20000040
010: 00000000 36d7a722 20000040 // 00000000 36d7ab3a 20000040 // 00000000 1fd5a456 20000040 // 00000000 11fba34a 20000040
014: 00000000 36d5807e 20000040 // 00000000 0a09e8ee 20000040 // 00000000 129bb92e 20000040 // 00000000 0a09eafa 20000040
018: 00000000 36d58aba 20000040 // 00000000 129bbd46 20000040 // 00000000 129bb516 20000040 // 00000000 129bb722 20000040
01c: 00000000 129bb0fe 20000040 // 00000000 37d66762 20000040 // 00000000 36d5786e 20000040 // 00000000 36d778ae 20000040
020: 00000000 36d55d86 20000040 // 00000000 36d5b722 20000040 // 00000000 1df4ab7a 20000040 // 00000000 1df4a34a 20000040
024: 00000000 129bbb3a 20000040 // 00000000 1df4a556 20000040 // 00000000 36d594d6 20000040 // 00000000 1df4ad86 20000040
028: 00000000 36d5e13e 20000040 // 00000000 36d5eb7a 20000040 // 00000000 36d4e96e 20000040 // 00000000 36d4eb7a 20000040
02c: 00000000 36d7b96e 20000040 // 00000000 0a09e2ca 20000040 // 00000000 323aacc6 20000040 // 00000000 323aa6a2 20000040
030: 00000000 36d516a2 20000040 // 00000000 0cb3524a 20000040 // 00000000 0cb35662 20000040 // 00000000 36d55556 20000040
034: 00000000 36d720be 20000040 // 00000000 36d728ee 20000040 // 00000000 323aa496 20000040 // 00000000 36d73516 20000040
038: 00000000 36d7392e 20000040 // 00000000 36d5534a 20000040 // 00000000 36d5513e 20000040 // 00000000 323aa28a 20000040
03c: 00000000 12137c86 20000040 // 00000000 2113dd06 20000040 // 00000000 2113dafa 20000040 // 00000000 0cb3586e 20000040
040: 00000000 2113d6e2 20000040 // 00000000 2113d4d6 20000040 // 00000000 2113d8ee 20000040 // 00000000 36d5024a 20000040
044: 00000000 2113d0be 20000040 // 00000000 36d50c86 20000040 // 00000000 323aa07e 20000040 // 00000000 1213703e 20000040
048: 00000000 2113d2ca 20000040 // 00000000 29fc3b3a 20000040 // 00000000 29fc392e 20000040 // 00000000 29fc3d46 20000040
04c: 00000000 29fc3516 20000040 // 00000000 29fc330a 20000040 // 00000000 1fd5aa7a 20000040 // 00000000 1fd5ac86 20000040
050: 00000000 29fc3722 20000040 // 00000000 1fd5a86e 20000040 // 00000000 10a11d86 20000040 // 00000000 10a11b7a 20000040
054: 00000000 136c3c86 20000040 // 00000000 29fc30fe 20000040 // 00000000 1fd5a03e 20000040 // 00000000 10a11762 20000040
058: 00000000 136c386e 20000040 // 00000000 10a11556 20000040 // 00000000 10a1196e 20000040 // 00000000 10a1113e 20000040
05c: 00000000 136c3a7a 20000040 // 00000000 136c3456 20000040 // 00000000 136c324a 20000040 // 00000000 10a1134a 20000040
060: 00000000 2a987cc6 20000040 // 00000000 136c3662 20000040 // 00000000 13a50d06 20000040 // 00000000 136c303e 20000040
064: 00000000 13a508ee 20000040 // 00000000 13a506e2 20000040 // 00000000 13a50afa 20000040 // 00000000 13a502ca 20000040
068: 00000000 13a500be 20000040 // 00000000 1c3a4d46 20000040 // 00000000 1c3a4b3a 20000040 // 00000000 13a504d6 20000040
06c: 00000000 1c3a492e 20000040 // 00000000 2a98728a 20000040 // 00000000 2a987496 20000040 // 00000000 2a98707e 20000040
070: 00000000 2a9878ae 20000040 // 00000000 2a9876a2 20000040 // 00000000 1c3a430a 20000040 // 00000000 1c3a40fe 20000040
074: 00000000 1c3a4722 20000040 // 00000000 2a987aba 20000040 // 00000000 34e9ec86 20000040 // 00000000 34e9ea7a 20000040
078: 00000000 34e9e86e 20000040 // 00000000 34e9e662 20000040 // 00000000 34e9e03e 20000040 // 00000000 0a265aba 20000040
07c: 00000000 0a2658ae 20000040 // 00000000 0a2656a2 20000040 // 00000000 0a265cc6 20000040 // 00000000 0a26528a 20000040
080: 00000000 0a26507e 20000040 // 00000000 0a265496 20000040 // 00000000 1c3a4516 20000040 // 00000000 34e9e24a 20000040
084: 00000000 34e9e456 20000040 // 00000000 271f5afa 20000040 // 00000000 271f58ee 20000040 // 00000000 05b4fb3a 20000040
088: 00000000 05b4f92e 20000040 // 00000000 05b4f722 20000040 // 00000000 05b4fd46 20000040 // 00000000 05b4f0fe 20000040
08c: 00000000 05b4f30a 20000040 // 00000000 271f5d06 20000040 // 00000000 271f52ca 20000040 // 00000000 271f54d6 20000040
090: 00000000 2c254d86 20000040 // 00000000 0c128000 2000048e // 00000000 36d5e7fe 20000046 // 00000000 36d60346 00000000
094: 00000000 0c12810c 20000bdc // 00000000 36d5979e 00000000 // 00000000 0c128c5c 00000000 // 00000000 32f87000 200011ca
098: 00000000 37c8f9ea 00000000 // 00000000 32f87d54 00000000 // 00000000 334c1000 2000048e // 00000000 36d52386 00000000
09c: 00000000 334c119c 20000bdc // 00000000 36d58346 00000000 // 00000000 334c1cec 00000000 // 00000000 0783e000 200011ca
0a0: 00000000 36d7481e 00000000 // 00000000 0783ede4 00000000 // 00000000 33ca8000 2000048e // 00000000 36d59bb6 00000000
0a4: 00000000 33ca822c 20000bdc // 00000000 36d57b36 00000000 // 00000000 33ca8d7c 00000000 // 00000000 1790a000 200011ca
0a8: 00000000 36d51346 00000000 // 00000000 1790ae74 00000000 // 00000000 19c43000 2000048e // 00000000 36d4ab36 00000000
0ac: 00000000 19c432bc 20000bdc // 00000000 37d9b13a 00000000 // 00000000 19c43e0c 00000000 // 00000000 16eb3000 200011ca
0b0: 00000000 11fbaa2a 00000000 // 00000000 16eb3f04 00000000 // 00000000 12a6d000 2000048e // 00000000 36d4e406 00000000
0b4: 00000000 0c12810c 200005ee // 00000000 37c8fbf6 00000000 // 00000000 0c12810c 200005ee // 00000000 0cb35456 2000004f
0b8: 00000000 36d4ad42 00000000 // 00000000 0c12810c 200005ee // 00000000 36d50b36 00000000 // 00000000 0c12810c 200005ee
0bc: 00000000 36d520be 2000004f // 00000000 36d76512 00000000 // 00000000 0c12810c 200005ee // 00000000 12137456 20000040
0c0: 00000000 36d77cc6 2000004f // 00000000 36d77aba 20000040 // 00000000 36d57306 00000000 // 00000000 0c12810c 200005ee
0c4: 00000000 11fbad86 20000040 // 00000000 3738a516 20000040 // 00000000 36d50456 20000040 // 00000000 36d5bb3a 20000040
0c8: 00000000 36d7bb7a 20000066 // 00000000 36d5003e 2000005e // 00000000 36d588ae 20000040 // 00000000 37d66d86 20000040
0cc: 00000000 36d5fc86 20000040 // 00000000 36d4a662 20000066 // 00000000 37c5c07e 20000040 // 00000000 37d66556 20000040
0d0: 00000000 36d7707e 20000040 // 00000000 36d5e34a 20000040 // 00000000 36d5b92e 20000040 // 00000000 36d7bd86 20000040
0d4: 00000000 36d7b556 20000040 // 00000000 11fba13e 20000040 // 00000000 36d7686e 20000040 // 00000000 3738a30a 20000040
0d8: 00000000 36d55762 20000040 // 00000000 36d7b13e 20000040 // 00000000 0cb35a7a 20000040 // 00000000 36d60cc6 20000040
0dc: 00000000 021d5d46 20000040 // 00000000 021d5516 20000040 // 00000000 11fba762 20000040 // 00000000 0cb3503e 20000040
0e0: 00000000 11fba612 00000000 // 00000000 12a6d34c 200005ee // 00000000 36d4a86e 20000040 // 00000000 36d54722 20000040
0e4: 00000000 36d5430a 20000040 // 00000000 12137662 20000040 // 00000000 37d7c4d6 20000040 // 00000000 323aaaba 20000040
0e8: 00000000 36d7a92e 20000040 // 00000000 36d608ae 20000040 // 00000000 36d74d86 20000040 // 00000000 0a09e0be 20000040
0ec: 00000000 36d592ca 20000040 // 00000000 12137a7a 20000040 // 00000000 36d540fe 20000040 // 00000000 1213786e 20000040
0f0: 00000000 36d5596e 20000040 // 00000000 323aa8ae 20000040 // 00000000 1213724a 20000040 // 00000000 36d5086e 20000040
0f4: 00000000 0cb35c86 20000040 // 00000000 1df4a13e 20000040 // 00000000 36d7624a 20000040 // 00000000 36d5f456 20000040
0f8: 00000000 36d73d46 20000040 // 00000000 36d7330a 20000040 // 00000000 36d5bd46 20000040 // 00000000 11fbab7a 20000040
0fc: 00000000 1df4a96e 20000040 // 00000000 0a09ed06 20000040 // 00000000 0213cafa 20000040 // 00000000 3738ab3a 20000040

Additional info:
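For anyone triaging a longer console log for the same symptoms, here is a minimal Python sketch that counts the two messages quoted in this report per interface. The function name `summarize` is made up for illustration, and the patterns are taken only from the message text shown above, so other kernels may word these messages differently.

```python
import re
from collections import Counter

# Patterns based only on the messages quoted in this report.
ITER_RE = re.compile(r"(\w+): too many iterations \((\d+)\) in nv_nic_irq")
WATCHDOG_RE = re.compile(r"NETDEV WATCHDOG: (\w+): transmit timed out")


def summarize(lines):
    """Return (irq_warnings, tx_timeouts) as per-interface Counters."""
    irq_warnings = Counter()
    tx_timeouts = Counter()
    for line in lines:
        m = ITER_RE.search(line)
        if m:
            irq_warnings[m.group(1)] += 1
            continue
        m = WATCHDOG_RE.search(line)
        if m:
            tx_timeouts[m.group(1)] += 1
    return irq_warnings, tx_timeouts


sample = [
    "eth0: too many iterations (6) in nv_nic_irq.",
    "eth0: too many iterations (6) in nv_nic_irq.",
    "NETDEV WATCHDOG: eth0: transmit timed out",
]
warnings, timeouts = summarize(sample)
print(dict(warnings), dict(timeouts))
```

The patterns are unanchored so that lines carrying a syslog timestamp prefix are matched as well.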
Created attachment 295441 [details] Full log
Reverse Engineered nForce ethernet driver
RHEL driver based on upstream driver version 0.60
Also includes additional upstream commits:
3ba4d093fe8a26f5f2da94411bf8732fa6e9da86 forcedeth: fix tx timeout
fcc5f2665c81e087fb95143325ed769a41128d50 forcedeth: fix nic poll
6fedae1f6e66ab5f169bf58064e23e015fc1307d forcedeth: fix checksum feature in mcp65
caf96469e8ab57170cc8ca9c59809132d38e529e forcedeth: disable msix
e0379a14fc80cb98978fa86989dab77b522a8106 forcedeth: fixed missing call in napi poll
a7475906bc496456ded9e4b062f94067fb93057a forcedeth: msi bugfix
Can I have access to the machine?

Of the patches included in this forcedeth update, this is the one that was designed to fix this problem upstream:

a7475906bc496456ded9e4b062f94067fb93057a forcedeth: msi bugfix

What's interesting is that on rhel5 it doesn't have the desired effect, which is that interrupts are correctly disabled when we expect them to be. I've been looking at another interesting forcedeth problem that seems to be related to this, so I'd like to see if this can be tested with pci=nomsi on the kernel command line. I'm guessing MSI is enabled right now.
As I suspected, the patch that was added in 2.6.18-50.1.3 for the forcedeth msi bugfix seems to be giving us problems here.

NFS connectathon test results:

2.6.18-53.1.2 -- pass
2.6.18-53.1.3 -- fail
2.6.18-53.1.3, with pci=nomsi on kernel cmd line -- pass

I'm hoping I can do some work on the forcedeth driver to resolve this since I'm a bit worried that trying to pull all the MSI fixes from the latest upstream will be too much.
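When comparing runs like the ones above, it helps to confirm which configuration a machine actually booted with. A minimal Python sketch, assuming the kernel command line is available as a string (on a running Linux system it can be read from /proc/cmdline); the helper name `msi_disabled` and the sample command line are made up for illustration:

```python
def msi_disabled(cmdline: str) -> bool:
    """Return True if the kernel command line carries the pci=nomsi option.

    The pci= parameter takes a comma-separated option list, so nomsi may
    appear alongside other options, e.g. pci=noaer,nomsi.
    """
    for token in cmdline.split():
        if token.startswith("pci="):
            if "nomsi" in token[len("pci="):].split(","):
                return True
    return False


# Hypothetical command line, for illustration only.
print(msi_disabled("ro root=/dev/VolGroup00/LogVol00 pci=nomsi"))  # True
```

Recording this result next to each pass/fail entry makes it harder to mix up an MSI run with a pci=nomsi run.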
I would strongly encourage us to NOT revert this patch from rhel5. I would rather see us apply a patch on top to resolve this issue. Without this patch there will be problems, since we are not really enabling and disabling the correct interrupts. I would rather correct the issue than paper over a new problem by removing the needed patch.
I've started to notice what I feel are problems with enable_irq and disable_irq calls in the forcedeth driver. I recently patched the ethtool_set_settings function because I determined that writing to the BMCR register while interrupts were disabled resulted in no interrupts ever coming back out of the hardware. My guess is that upstream changes to interrupt handling, such as whether pending interrupts are dropped (or saved so they can be posted later), may be somewhat related, but this is just a hunch based on what I've observed.
After a small patch to the MSI subsystem I can now run the NFS connectathon tests on the same system used in the original test (hp-xw9400-01.rhts.boston.redhat.com) and it appears that no tests failed. There isn't any great output indicating that, but I see nothing but 'PASS' messages on the screen and none of the original messages:

eth0: too many iterations (6) in nv_nic_irq.

There were a few messages in the test output like this:

Mar 7 09:23:29 hp-xw9400-01 kernel: nfs: server sol9-nfs not responding, still trying
Mar 7 09:23:29 hp-xw9400-01 kernel: nfs: server sol9-nfs OK
Mar 7 09:23:34 hp-xw9400-01 kernel: nfs: server sol9-nfs not responding, still trying
Mar 7 09:23:34 hp-xw9400-01 kernel: nfs: server sol9-nfs not responding, still trying
Mar 7 09:23:34 hp-xw9400-01 kernel: nfs: server sol9-nfs OK
Mar 7 09:23:34 hp-xw9400-01 kernel: nfs: server sol9-nfs OK
Mar 7 09:23:39 hp-xw9400-01 kernel: nfs: server sol9-nfs not responding, still trying

but I don't know if that was caused by the test or not. I also looked at /mnt/tests/kernel/filesystems/nfs/connectathon/cthon04/result.txt and it appears to be zero length -- hopefully that's good.
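To make the nfs messages quoted above easier to judge across a whole test run, a rough Python sketch like the following can tally stalls and recoveries per server and note whether the last message seen was a recovery. The function name `nfs_server_state` and the dictionary layout are made up for illustration, and the pattern is based only on the two message forms shown in this comment.

```python
import re

# Matches the two nfs client message forms quoted in this comment.
NFS_RE = re.compile(r"nfs: server (\S+) (not responding, still trying|OK)$")


def nfs_server_state(lines):
    """Tally 'not responding' and 'OK' messages per NFS server."""
    state = {}
    for line in lines:
        m = NFS_RE.search(line.strip())
        if not m:
            continue
        server, status = m.groups()
        entry = state.setdefault(
            server, {"stalls": 0, "recoveries": 0, "last_ok": True}
        )
        if status == "OK":
            entry["recoveries"] += 1
            entry["last_ok"] = True
        else:
            entry["stalls"] += 1
            entry["last_ok"] = False
    return state


log = [
    "Mar 7 09:23:29 hp-xw9400-01 kernel: nfs: server sol9-nfs not responding, still trying",
    "Mar 7 09:23:29 hp-xw9400-01 kernel: nfs: server sol9-nfs OK",
]
print(nfs_server_state(log))
```

A server whose final entry has `last_ok` set to False at the end of the run would be worth a closer look; a few stall and recovery pairs under load may be harmless.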
Oh yeah, test kernels available here: http://people.redhat.com/agospoda/
This patch (or something similar) is what I would like to consider for rhel5.2 (if possible). http://people.redhat.com/agospoda/rhel5/irq-msi-upstream-fixes.patch The problem I currently see is that this is only a few (5-6) of the patches needed to make this work whereas there are close to a dozen patches in the original upstream set. I can look over the changes, but am not an expert on this, so I will probably need to get someone to at least keep me in check (whether they are an expert or not).
This request was evaluated by Red Hat Product Management for inclusion in a Red Hat Enterprise Linux maintenance release. Product Management has requested further review of this request by Red Hat Engineering, for potential inclusion in a Red Hat Enterprise Linux Update release for currently deployed products. This request is not yet committed for inclusion in an Update release.
Created attachment 312109 [details] RHEL5.2 Forcedeth Failure
We have experienced a similar NIC crash with RHEL 5.2 running on the Nvidia chipset:

00:08.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)
00:09.0 Bridge: nVidia Corporation MCP55 Ethernet (rev a3)

The full log details have been attached to this bug report. Has any progress been made on further testing and integrating the suggested fixes in RHEL5.2? Is it known whether RHEL5.0 was prone to the same bug?
RHEL5 should not be problematic, but later kernels will have problems. What is unfortunate is that a small set of users saw problems appear in 5.2 because of a patch that fixed problems all users would have had on 5.1. The root of the 5.2 issues is some MSI problems in 2.6.18 that were fixed in 2.6.19 and later. Those patches will soon be added to my test kernels, starting with kernel version 2.6.18-94.el5.gtest.50, which will appear here: http://people.redhat.com/agospoda/#rhel5 later today.
*** This bug has been marked as a duplicate of bug 428696 ***