Bug 516076 - netconsole on e1000 cause "Badness in local_bh_enable at kernel/softirq.c:141"
Summary: netconsole on e1000 cause "Badness in local_bh_enable at kernel/softirq.c:141"
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Neil Horman
QA Contact: Chao Ye
Depends On:
Blocks: 583726 589296 637729 646760
Reported: 2009-08-06 16:03 UTC by John Lau
Modified: 2018-11-14 20:33 UTC (History)
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-02-16 15:44:38 UTC
Target Upstream Version:

Attachments
Something print on the console when testing (696.96 KB, image/jpeg)
2009-08-11 11:11 UTC, John Lau
expect-core-2.6.9-89.0.3.tar.gz (10.63 KB, application/x-gzip)
2009-08-12 13:48 UTC, John Lau
patch to force bh's off in netconsole (572 bytes, patch)
2009-08-27 15:41 UTC, Neil Horman
netpoll_send_skb patch (612 bytes, patch)
2009-09-03 16:54 UTC, Neil Horman
new patch (1.32 KB, patch)
2009-09-03 18:10 UTC, Neil Horman
new test patch for netpoll (2.76 KB, patch)
2009-09-08 18:41 UTC, Neil Horman
new approach to preventing deadlock. (2.76 KB, patch)
2009-09-16 18:44 UTC, Neil Horman
new approach to preventing deadlock. (2.43 KB, patch)
2009-09-16 19:52 UTC, Neil Horman
Patch that worked correctly in testing (3.00 KB, patch)
2009-10-06 13:26 UTC, Marc Milgram
cleaned up patch with debug code removed (1.39 KB, patch)
2009-10-07 15:11 UTC, Neil Horman

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description John Lau 2009-08-06 16:03:01 UTC
Description of problem:

If a huge volume of messages is passed through netconsole on e1000 and some network load is added, messages like this are printed:

Badness in local_bh_enable at kernel/softirq.c:141

Call Trace:<ffffffff8013d96d>{local_bh_enable+90} <ffffffff802c4d9d>{netpoll_poll_dev+284}
      <ffffffff802c4c5f>{netpoll_send_skb+397} <ffffffffa024c169>{:netconsole:write_msg+361}
      <ffffffff80139014>{__call_console_drivers+68} <ffffffff80139281>{release_console_sem+276}
      <ffffffff8013950c>{vprintk+498} <ffffffff801395b6>{printk+141}
      <ffffffff80188f22>{get_write_access+62} <ffffffff8017d264>{file_move+26}
      <ffffffff801b1eeb>{proc_file_write+75} <ffffffff8017c4ce>{vfs_write+207}
      <ffffffff8017c5b6>{sys_write+69} <ffffffff801102f6>{system_call+126}

Version-Release number of selected component (if applicable):
RHEL4.8 (kernel 2.6.9-89)

How reproducible:

Steps to Reproduce:
1. Run something that triggers a lot of messages through netconsole
2. Run something to add network load

Actual results:
Some "Badness in local_bh_enable..." messages will be printed in the log.

Expected results:
There should be no warning messages.

Additional info:
A similar issue happened in bz313751 comment #45, but it was said to be fixed in RHEL4.8. Is this a regression?

Comment 10 Neil Horman 2009-08-10 14:29:41 UTC
Do you have a sysrq-t, or can you set up an NMI watchdog to capture a system core (or at least get a backtrace) after the system has hung, please?  Thanks.

Comment 11 John Lau 2009-08-10 14:53:14 UTC
I have enabled sysrq, and nmi_watchdog should be enabled by default (it is an x86_64 system). But when the system hung, I could not trigger SysRq, and after waiting a few minutes no NMI panic happened.

nhorman suggested the "nmi_watchdog=panic,1" and "nmi_watchdog=panic,2" options. I will try those and report back the result.

Comment 12 Neil Horman 2009-08-10 15:16:40 UTC
Please just clear the needinfo flag when the requested info is provided.

Comment 14 John Lau 2009-08-11 11:07:00 UTC
On my testing system (intellpc1 in the GSS China lab), I can reproduce the problem easily. I set up the "nmi_watchdog=panic,1" kernel option and confirmed that the NMI count is increasing in /proc/interrupts. But I still cannot get a core dump via either SysRq-C or nmi_watchdog (I waited for 5 minutes). I was afraid netdump could not be used at that moment, so I set up diskdump as well, but I still could not get a core dump out.

In one experiment I managed to capture the SysRq-t output accidentally; I have uploaded it as Sysrq-t.jpg. But I could not get the SysRq-t output again in later experiments.

Comment 15 John Lau 2009-08-11 11:11:27 UTC
Created attachment 357015 [details]
Something print on the console when testing

Comment 16 Neil Horman 2009-08-11 11:56:59 UTC
About the only thing I can tell you from that image is that you could possibly work around the problem by disabling audit (since audit appears to be involved in the stack shown in that jpg). Beyond that, it shows me nothing we don't already know. You need to keep trying: set up a serial console to capture the whole sysrq-t output, or the console log when the NMI triggers.

Comment 17 John Lau 2009-08-11 13:55:44 UTC
I tried to disable audit by adding the kernel parameter "audit=0" and re-ran the test, but it still hung within 10 minutes. I will try to set up what you suggested tomorrow.

Comment 21 John Lau 2009-08-12 13:48:24 UTC
Created attachment 357173 [details]

Comment 22 Neil Horman 2009-08-12 13:55:34 UTC
I don't see jobid 20090812091247 anywhere here:
Are you using a different CAS server?

Comment 23 Neil Horman 2009-08-12 13:59:10 UTC
Further, I'm not sure what your crash above has to do with this issue, since it's set up to use diskdump rather than netdump.

Comment 24 John Lau 2009-08-12 14:12:18 UTC
<johnlau> tambot: cas_status 20090812091247
<tambot> Job 20090812091247 reports a status of: Downloading / extracting

I set up netconsole to trigger the issue. But since I could not get a dump via netdump before, I was afraid the bug in netconsole/netpoll would interfere with collecting the dump over netdump, so I set up diskdump to collect the dump. Netconsole is still enabled, though.

Comment 25 John Lau 2009-08-12 14:19:37 UTC
The vmcore_2.6.9-89.8.ELsmp.gz is downloaded to megatron:/cores/20090812091247/work
But it seems that CAS cannot handle test kernels.

On the other hand, the core dump from the 2.6.9-89.0.3 kernel is uploading, ETA 10 minutes. I will push it to CAS once the upload finishes. In the meantime you can have a look at the files in expect-core-2.6.9-89.0.3.tar.gz.

Comment 26 Neil Horman 2009-08-12 14:25:00 UTC
Yeah, but the problem/hang didn't occur with diskdump enabled!  That's really what I'm after. You don't have to capture a core for this; a backtrace is sufficient, but you need to use a setup that reproduces the same problem.

Comment 27 Neil Horman 2009-08-12 14:27:31 UTC
And no, CAS cannot handle test kernels, and neither can I unless you have the debuginfo packages to go with them. I've already looked at the expect-core files; that's how I determined that you used diskdump in place of netdump!  That's really worthless to me.

Bottom line: you need to get me a backtrace of the hang that occurs with netdump after the problem occurs, using NMI in the ways I described. If you can't do that, there's simply not much I can do for you.

Comment 28 Neil Horman 2009-08-12 14:29:31 UTC
If you absolutely can't get a backtrace, then as an alternative you can build a test kernel with netpoll debugging enabled. You'll need to set up a serial console on the system in question to log the results, but that should tell you what the system is doing right up until the moment it hard locks on you.

Comment 38 Neil Horman 2009-08-27 15:41:41 UTC
Created attachment 358899 [details]
patch to force bh's off in netconsole

would you please give this patch a try and let me know the results.  Thanks!

Comment 39 John Lau 2009-08-31 07:00:09 UTC
It seems that the bh-related error message no longer appears in the log, but the system still hangs after a few minutes of testing.

Comment 40 Neil Horman 2009-08-31 12:11:19 UTC
Then we're back to needing the NMI watchdog to trigger on this system and dump out a message. Tell me, when the system hangs, is it still responsive to pings?

Comment 41 John Lau 2009-08-31 16:27:32 UTC
No, the system doesn't respond to pings.

I confirmed that when I set "nmi_watchdog=panic,1", the NMI count in /proc/interrupts keeps increasing. But when the system hung, I waited several minutes and no NMI panic occurred.

Do you have any idea?

Comment 42 Neil Horman 2009-08-31 16:53:10 UTC
what is /proc/sys/kernel/panic_on_unrecovered_nmi set to?

Comment 43 John Lau 2009-09-01 14:52:51 UTC
/proc/sys/kernel/panic_on_unrecovered_nmi is set to 0 by default.

I changed it to 1 and redid the test; the system still doesn't produce an NMI panic.

Comment 44 Neil Horman 2009-09-01 17:18:33 UTC
Well, that's no help at all. I have absolutely no idea why this isn't working for you. The best thing I can imagine doing at this point is instrumenting the kernel to be a bit more forceful in its use of panic. If the NMI watchdog isn't triggering, then I imagine all the CPUs in the system must be stuck running some softirq, in which state they are still responsive to timer interrupts (which prevents the NMI watchdog from triggering a panic) but still unable to make forward progress. I would say we need to instrument the kernel in the following manner:

1) Add a global flag to netpoll_send_skb. Set it at the top of the function and clear it at the bottom.

2) In the NMI interrupt handler, keep a counter initialized to zero. Check the flag from (1) in the NMI handler: if it's set, increment the counter; if it's clear, reset the counter to zero.

3) Check the counter in the NMI handler; should its value ever reach some number (say 200 for starters), panic the box.

The idea here is to force a panic after we have spent 'too long' trying to send netconsole messages. In the normal course of operation the flag in (1) should trip back and forth often, never allowing the counter in (2) to increment many times without being reset to zero. The assumption is that the hang occurs during the sending of a netconsole frame, so when the hang occurs the flag will be set, causing the counter to increment on every NMI interrupt we handle. Since the CPUs are hung, the flag will not get cleared, the counter will roll past its trigger value, and the box will panic.

Do you feel comfortable implementing this?
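The scheme above can be modeled in a few lines of plain C. This is a user-space sketch of the logic, not the actual instrumentation patch; the names and the threshold of 200 are illustrative:

```c
#include <assert.h>

/* (1) set at the top of netpoll_send_skb, cleared at the bottom */
static volatile int netpoll_in_send;
/* (2) counter kept by the NMI handler */
static int nmi_stuck_count;
/* (3) trip value before panicking, per the suggestion above */
#define NMI_PANIC_THRESHOLD 200

/* Called once per (simulated) NMI tick; returns 1 when the box should
 * panic because the flag stayed set for THRESHOLD consecutive NMIs. */
static int nmi_check_netpoll_hang(void)
{
    if (netpoll_in_send) {
        if (++nmi_stuck_count >= NMI_PANIC_THRESHOLD)
            return 1; /* stuck in netpoll_send_skb "too long": panic */
    } else {
        nmi_stuck_count = 0; /* forward progress was made: reset */
    }
    return 0;
}
```

In normal operation the flag toggles between NMIs, so the counter keeps resetting; once the CPUs hang mid-send, the flag stays set and the 200th consecutive NMI trips the panic.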

Comment 52 Neil Horman 2009-09-03 16:54:16 UTC
Created attachment 359702 [details]
netpoll_send_skb patch

I think I found the (or at least a) problem that would result in the stack trace provided. Under the right conditions it's possible that netpoll_send_skb might loop forever, and with bottom halves disabled I can see how we might spin forever, blocking the processor from running user-space apps or other softirqs while still servicing hard interrupts. This patch should fix it. Please try it out and let me know if it fixes the issue. Thanks!

Comment 53 Neil Horman 2009-09-03 18:10:36 UTC
Created attachment 359712 [details]
new patch

New patch: mmilgram pointed out my counter misuse. I'd intended to break out of a pessimal udelay loop early, and mistakenly tried to overload a counter to do it. Thanks Mike!

Comment 56 Neil Horman 2009-09-08 18:41:44 UTC
Created attachment 360117 [details]
new test patch for netpoll

OK, let's try this. Since I can't see any reason why we would loop forever in this code, let's make doubly sure we don't delay too long in it. This patch does several things:

1) Removes all the udelays from the path. Since the stack traces thus far have been in udelay, let's not get caught in it. Those delays are only there to give holders of resources more opportunity to release them and aren't really needed anyway (strictly speaking).

2) Limits the number of tries that we wait for the xmit lock or the netdev queue to restart to something much smaller than MAX_RETRIES. That should bust us out of that loop more quickly if we don't have the resources we need.

3) Limits the number of skbs that we queue. This should prevent us from spending too much time in the skb_dequeue loop.

4) Resets the tries counter to zero at the bottom of the loop. If we underflowed so far that we hit tries = 0x7fffffff, we could loop for a very long time.
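Points 2) through 4) amount to a bounded retry loop. A minimal user-space sketch of that shape, with illustrative names and limits rather than the patch's actual values:

```c
#include <assert.h>

#define TRY_LIMIT 50   /* illustrative: much smaller than MAX_RETRIES */
#define QUEUE_LIMIT 32 /* illustrative cap on queued skbs, point 3) */

/* Drain up to QUEUE_LIMIT queued frames; for each one, spin on the
 * trylock at most TRY_LIMIT times, then give up rather than loop
 * forever. Returns the number of frames "transmitted". */
static int drain_queue(int nframes, int (*trylock)(void))
{
    int sent = 0, tries, i;

    if (nframes > QUEUE_LIMIT)
        nframes = QUEUE_LIMIT; /* point 3): bound the dequeue loop */

    for (i = 0; i < nframes; i++) {
        tries = 0; /* point 4): reset per frame so the count can't wrap */
        while (!trylock()) {
            if (++tries >= TRY_LIMIT)
                return sent; /* point 2): bail out, drop the rest */
        }
        sent++; /* frame transmitted */
    }
    return sent;
}
```

The design choice here is the same trade-off Neil describes: a console path must never spin indefinitely, so dropping frames is preferable to wedging the CPU.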

Comment 59 Neil Horman 2009-09-09 11:04:45 UTC
Did you remember to apply the earlier patch from this bz?
If not, you should have.

Comment 60 Marc Milgram 2009-09-09 12:56:40 UTC
Neil, I verified that the patch you mentioned was included in the kernel being tested.

Comment 61 Neil Horman 2009-09-09 13:28:32 UTC
The last patch shouldn't have had any effect on how bottom halves were enabled in this case.  Do me a favor and remove it (the local_bh_enable patch) to see if the behavior stops.

Comment 65 Neil Horman 2009-09-10 12:53:11 UTC
If you issue a sysrq-t, what is the result?

Comment 66 Neil Horman 2009-09-10 12:53:43 UTC
also, if you do your normal test and hang the system, does our nmi code trigger?

Comment 67 John Lau 2009-09-10 13:28:35 UTC
After it hung, I pressed sysrq-t but got no response.

If I do the normal test (i.e. without enabling the serial console), I don't get any special messages in netconsole. But I am not sure whether there was anything on the console, because I did the test remotely. I may try it again tomorrow.

Comment 68 Neil Horman 2009-09-10 14:12:51 UTC
So, are you saying that with this most recent patch set we don't get an NMI-triggered panic either (like we did in comments 58, 55, and 50)?

Comment 69 John Lau 2009-09-10 14:50:03 UTC
With the most recent patch set, I am not sure whether it was an NMI-triggered panic or not (because no kernel-panic-related messages were printed out, unlike in comments 55 and 50), but I think a panic happened, because netdump was triggered (although no core was collected successfully).

Comment 70 Neil Horman 2009-09-10 17:11:11 UTC
Well, you need to hook up the serial console then, because we need that backtrace to tell what's going on. Otherwise this is a futile exercise. I've mentioned this before...

Comment 71 John Lau 2009-09-11 17:01:53 UTC
Comment #64 is the only message I get when I hook up the serial console; I pressed sysrq-t and got no response.

Comment 72 Neil Horman 2009-09-11 17:12:21 UTC
Do me a favor: try not starting netconsole at all, then repeat your test, please. I'd like to know if you still see the hang when you only have a serial console enabled.

Comment 73 John Lau 2009-09-16 02:39:48 UTC
I connected the system to a serial console and turned off netconsole. I ran the test for a whole day and the system did not hang.

Comment 74 Neil Horman 2009-09-16 18:44:20 UTC
Created attachment 361352 [details]
new approach to preventing deadlock.

OK, here's a new approach. I'm starting to wonder whether we have a path in which we can get interrupted in netpoll_send_skb by an irq or bh, resulting in an eternal spin on a lock. If it happens in a bh, we would never crash on NMI, since we would still be responsive to irqs. This patch should correct that, if that's the case.
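The suspected hazard can be modeled in a few lines of user-space C. All names here are illustrative; in real kernel code the fix would be an irq-disabling lock variant such as spin_lock_irqsave around the transmit path:

```c
#include <assert.h>

/* Model of the suspected deadlock: netpoll_send_skb holds the tx lock,
 * an irq/bh fires on the same CPU and spins on that same lock forever.
 * Disabling irqs across the critical section defers the interrupt
 * until the lock is released. */
static int tx_lock_held, irqs_off, deadlocked;

static void simulated_irq(void) /* the irq path also wants the tx lock */
{
    if (irqs_off)
        return;         /* deferred until irqs are re-enabled: safe */
    if (tx_lock_held)
        deadlocked = 1; /* same-CPU spin on a lock we hold: hang */
}

static void send_skb(int irq_safe)
{
    if (irq_safe)
        irqs_off = 1;   /* spin_lock_irqsave() analogue */
    tx_lock_held = 1;
    simulated_irq();    /* interrupt arrives mid-transmit */
    tx_lock_held = 0;
    if (irq_safe)
        irqs_off = 0;   /* spin_unlock_irqrestore() analogue */
}
```

This also matches the NMI symptom described above: a CPU spinning on a lock with irqs enabled still services timer interrupts, so the watchdog never fires.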

Comment 75 Marc Milgram 2009-09-16 19:00:57 UTC

The latest patch that you uploaded is identical to the previous patch that you uploaded.

Comment 76 Neil Horman 2009-09-16 19:52:30 UTC
Created attachment 361362 [details]
new approach to preventing deadlock.

Sorry, wrong patch previously. Here's the right one.

Comment 81 Neil Horman 2009-09-17 13:30:35 UTC
This sounds like a successful run to me. The audit code spits out so much information that I expect what you were seeing was the result of frames getting backed up in the netconsole path. I assume you were running with a serial console enabled here; try running without the serial console and see if performance improves somewhat. Also, after you run the test for a few hours, try discontinuing the test. If you do, does the system slowly become more responsive? If so, we can mitigate this by placing a maximum depth on the queue length in the netpoll code.
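The mitigation mentioned at the end, a maximum queue depth, would look roughly like this. This is a user-space sketch with an illustrative cap, not the actual netpoll code:

```c
#include <assert.h>

#define MAX_QUEUE_DEPTH 64 /* illustrative cap */

static int queue_len; /* frames backed up waiting to transmit */

/* Accept a frame into the backlog only if there is room; once the
 * console path falls behind, drop new frames instead of letting the
 * backlog (and the time spent draining it) grow without bound. */
static int netpoll_queue_frame(void)
{
    if (queue_len >= MAX_QUEUE_DEPTH)
        return -1; /* drop: bounded latency beats an unbounded backlog */
    queue_len++;
    return 0;
}

static void netpoll_drain_one(void)
{
    if (queue_len > 0)
        queue_len--; /* one frame transmitted */
}
```

With a cap in place, a flood of audit messages costs some dropped console output rather than minutes of unresponsiveness while the queue drains.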

Comment 83 Neil Horman 2009-09-24 12:48:26 UTC
I'll ask again: if you stop the test, does responsiveness return to the console? I understand that you may not be able to access the box to discontinue the test, but try coding the test to stop on its own after a while.

Comment 85 Neil Horman 2009-09-25 11:04:03 UTC
Can you post the output of what you get from that restart attempt?

Comment 88 Neil Horman 2009-10-01 19:32:19 UTC

That's the upstream commit. John, can you apply it on top of my patch and see if it helps? It will certainly clear up the messages. If it allows the restart to complete, then I think we have a winner in the aggregate of the changes.

Comment 89 Marc Milgram 2009-10-01 20:29:09 UTC
That patch doesn't apply to Red Hat Enterprise Linux 4. I can apply it manually, but it is still basically a no-op. I could change the WARN_ONs to WARN_ON_ONCEs.

Comment 94 Neil Horman 2009-10-06 13:05:11 UTC
Marc, can you attach the massaged patch that you used, please, so I don't need to duplicate that work?  Thanks!

Comment 95 Marc Milgram 2009-10-06 13:26:27 UTC
Created attachment 363832 [details]
Patch that worked correctly in testing

Comment 96 Neil Horman 2009-10-07 15:11:44 UTC
Created attachment 363986 [details]
cleaned up patch with debug code removed

Comment 98 RHEL Program Management 2009-11-24 22:50:25 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.

Comment 99 Vivek Goyal 2009-12-23 20:40:23 UTC
Committed in 89.18.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 100 Vivek Goyal 2010-01-13 20:34:47 UTC
The patch committed in 89.18 caused a regression: upon hitting alt-sysrq-t, we can get into an infinite loop. Hence another patch is required. Putting this bug back to the POST state. Will commit another patch to fix the regression.

Comment 101 Vivek Goyal 2010-01-19 21:06:17 UTC
Committed in 89.19.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 111 errata-xmlrpc 2011-02-16 15:44:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

