Bug 516076 - netconsole on e1000 cause "Badness in local_bh_enable at kernel/softirq.c:141"
Summary: netconsole on e1000 cause "Badness in local_bh_enable at kernel/softirq.c:141"
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.8
Hardware: All
OS: Linux
Target Milestone: rc
Assignee: Neil Horman
QA Contact: Chao Ye
Depends On:
Blocks: 583726 589296 637729 646760
Reported: 2009-08-06 16:03 UTC by John Lau
Modified: 2018-11-14 20:33 UTC (History)
12 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Last Closed: 2011-02-16 15:44:38 UTC
Target Upstream Version:

Attachments
Something print on the console when testing (696.96 KB, image/jpeg)
2009-08-11 11:11 UTC, John Lau
expect-core-2.6.9-89.0.3.tar.gz (10.63 KB, application/x-gzip)
2009-08-12 13:48 UTC, John Lau
patch to force bh's off in netconsole (572 bytes, patch)
2009-08-27 15:41 UTC, Neil Horman
netpoll_send_skb patch (612 bytes, patch)
2009-09-03 16:54 UTC, Neil Horman
new patch (1.32 KB, patch)
2009-09-03 18:10 UTC, Neil Horman
new test patch for netpoll (2.76 KB, patch)
2009-09-08 18:41 UTC, Neil Horman
new approach to preventing deadlock. (2.76 KB, patch)
2009-09-16 18:44 UTC, Neil Horman
new approach to preventing deadlock. (2.43 KB, patch)
2009-09-16 19:52 UTC, Neil Horman
Patch that worked correctly in testing (3.00 KB, patch)
2009-10-06 13:26 UTC, Marc Milgram
cleaned up patch with debug code removed (1.39 KB, patch)
2009-10-07 15:11 UTC, Neil Horman

System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2011:0263 0 normal SHIPPED_LIVE Important: Red Hat Enterprise Linux 4.9 kernel security and bug fix update 2011-02-16 15:14:55 UTC

Description John Lau 2009-08-06 16:03:01 UTC
Description of problem:

If a huge volume of messages is passed through netconsole on e1000 and some network load is added, messages like this are printed:

Badness in local_bh_enable at kernel/softirq.c:141

Call Trace:<ffffffff8013d96d>{local_bh_enable+90} <ffffffff802c4d9d>{netpoll_poll_dev+284}
      <ffffffff802c4c5f>{netpoll_send_skb+397} <ffffffffa024c169>{:netconsole:write_msg+361}
      <ffffffff80139014>{__call_console_drivers+68} <ffffffff80139281>{release_console_sem+276}
      <ffffffff8013950c>{vprintk+498} <ffffffff801395b6>{printk+141}
      <ffffffff80188f22>{get_write_access+62} <ffffffff8017d264>{file_move+26}
      <ffffffff801b1eeb>{proc_file_write+75} <ffffffff8017c4ce>{vfs_write+207}
      <ffffffff8017c5b6>{sys_write+69} <ffffffff801102f6>{system_call+126}

Version-Release number of selected component (if applicable):
RHEL4.8 (kernel 2.6.9-89)

How reproducible:

Steps to Reproduce:
1. Run something that triggers a lot of messages through netconsole
2. Run something to add network load

Actual results:
Some "Badness in local_bh_enable..." messages will be printed in the log.

Expected results:
There should be no warning messages.

Additional info:
A similar issue happened in bz313751 comment #45, but it was said to be fixed in RHEL4.8. Is this a regression?

Comment 10 Neil Horman 2009-08-10 14:29:41 UTC
Do you have a sysrq-t, or can you set up an NMI watchdog to capture a system core (or at least get a backtrace) after the system has hung, please?  Thanks.

Comment 11 John Lau 2009-08-10 14:53:14 UTC
I have enabled sysrq, and nmi_watchdog should be enabled by default (it is an x86_64 system). But when the system hung, I could not trigger SysRq, and after waiting a few minutes no NMI panic happened.

nhorman suggested the "nmi_watchdog=panic,1" and "nmi_watchdog=panic,2" options. I will try those and report back the result.

Comment 12 Neil Horman 2009-08-10 15:16:40 UTC
Please just clear the needinfo flag when the requested info is provided.

Comment 14 John Lau 2009-08-11 11:07:00 UTC
On my testing system (intellpc1 in the GSS China lab), I can reproduce the problem easily. I set up the "nmi_watchdog=panic,1" kernel option and confirmed that the NMI count is increasing in /proc/interrupts. But I still cannot get a core dump via either SysRq-C or nmi_watchdog (I waited for 5 minutes). I was afraid netdump could not be used at that moment, so I set up diskdump as well, but I still could not get a core dump out.

In one experiment I managed to capture the SysRq-t output accidentally; I have uploaded it as Sysrq-t.jpg. But I could not get the SysRq-t output again in later experiments.

Comment 15 John Lau 2009-08-11 11:11:27 UTC
Created attachment 357015 [details]
Something print on the console when testing

Comment 16 Neil Horman 2009-08-11 11:56:59 UTC
About the only thing I can tell you from that image is that you could possibly work around the problem by disabling audit (since audit appears to be involved in the stack shown in that jpg). Beyond that, it shows me nothing we don't already know. You need to keep trying: set up a serial console to capture the whole sysrq-t output, or the console log when the NMI triggers.

Comment 17 John Lau 2009-08-11 13:55:44 UTC
I tried to disable audit by adding the kernel parameter "audit=0" and re-ran the test, but it still hung within 10 minutes. I will try to set up what you suggested tomorrow.

Comment 21 John Lau 2009-08-12 13:48:24 UTC
Created attachment 357173 [details]

Comment 22 Neil Horman 2009-08-12 13:55:34 UTC
I don't see jobid 20090812091247 anywhere here:
Are you using a different CAS server?

Comment 23 Neil Horman 2009-08-12 13:59:10 UTC
Further, I'm not sure what your crash above has to do with this issue, since it's set up to use diskdump rather than netdump.

Comment 24 John Lau 2009-08-12 14:12:18 UTC
<johnlau> tambot: cas_status 20090812091247
<tambot> Job 20090812091247 reports a status of: Downloading / extracting

I set up netconsole to trigger the issue. But since I could not get a dump via netdump before, I was afraid the bug in netconsole/netpoll would interfere with collecting the dump over netdump, so I set up diskdump to collect the dump. Netconsole is still enabled, though.

Comment 25 John Lau 2009-08-12 14:19:37 UTC
The vmcore_2.6.9-89.8.ELsmp.gz is downloaded to megatron:/cores/20090812091247/work
But it seems that CAS cannot handle test kernels.

On the other hand, the core dump from the 2.6.9-89.0.3 kernel is uploading, ETA 10 minutes. I will push it to CAS once the upload finishes. In the meantime you can have a look at the files in expect-core-2.6.9-89.0.3.tar.gz.

Comment 26 Neil Horman 2009-08-12 14:25:00 UTC
Yeah, but the problem/hang didn't occur with diskdump enabled!  That's really what I'm after. You don't have to capture a core for this; a backtrace is sufficient, but you need to use a setup that reproduces the same problem.

Comment 27 Neil Horman 2009-08-12 14:27:31 UTC
And no, CAS cannot handle test kernels, and neither can I unless you have the debuginfo packages to go with them. I've already looked at the expect-core files; that's how I determined that you used diskdump in place of netdump!  That's really worthless to me.

Bottom line: you need to get me a backtrace of the hang that occurs with netdump after the problem occurs, using NMI in the ways I described. If you can't do that, there's simply not much I can do for you.

Comment 28 Neil Horman 2009-08-12 14:29:31 UTC
If you absolutely can't get a backtrace, then as an alternative you can build a test kernel with netpoll debugging enabled. You'll need to set up a serial console on the system in question to log the results, but that should tell you what the system is doing right up until the moment it hard locks on you.

Comment 38 Neil Horman 2009-08-27 15:41:41 UTC
Created attachment 358899 [details]
patch to force bh's off in netconsole

would you please give this patch a try and let me know the results.  Thanks!

Comment 39 John Lau 2009-08-31 07:00:09 UTC
It seems that the bh-related error message no longer appears in the log, but the system still hangs after a few minutes of testing.

Comment 40 Neil Horman 2009-08-31 12:11:19 UTC
Then we're back to needing the NMI watchdog to trigger on this system and dump out a message. Tell me, when the system hangs, is it still responsive to pings?

Comment 41 John Lau 2009-08-31 16:27:32 UTC
No, the system doesn't respond to pings.

I confirmed that when I set "nmi_watchdog=panic,1", the NMI count in /proc/interrupts keeps increasing. But when the system hung, I waited several minutes and no NMI panic occurred.

Do you have any idea?

Comment 42 Neil Horman 2009-08-31 16:53:10 UTC
what is /proc/sys/kernel/panic_on_unrecovered_nmi set to?

Comment 43 John Lau 2009-09-01 14:52:51 UTC
/proc/sys/kernel/panic_on_unrecovered_nmi is set to 0 by default.

I changed it to 1 and redid the test; the system still doesn't produce an NMI panic.

Comment 44 Neil Horman 2009-09-01 17:18:33 UTC
Well, that's no help at all. I have absolutely no idea why this isn't working for you. The best thing I can imagine doing at this point is instrumenting the kernel to be a bit more forceful in its use of panic. If the NMI watchdog isn't triggering, then I imagine all the CPUs in the system must be stuck running some softirq, in which state they are still responsive to timer interrupts (which prevents the NMI watchdog from triggering a panic) but still unable to make forward progress. I would say we need to instrument the kernel in the following manner:

1) Add a global flag to netpoll_send_skb. Set it at the top of the function and clear it at the bottom.

2) In the NMI interrupt handler, keep a counter initialized to zero. Check the flag from (1) in the NMI handler: if it's set, increment the counter; if it's clear, reset the counter to zero.

3) Check the counter in the NMI handler; should its value ever reach some number (say 200 for starters), panic the box.

The idea here is to force a panic after we have spent 'too long' trying to send netconsole messages. In the normal course of operation the flag in (1) should trip back and forth often, never allowing the counter in (2) to increment many times without being reset to zero. The assumption is that the hang occurs during the sending of a netconsole frame, so when the hang occurs the flag will be set, causing the counter to increment on every NMI interrupt we handle. Since the CPUs are hung, the flag will not get cleared, the counter will roll past its trigger value, and the box will panic.

Do you feel comfortable implementing this?
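The scheme above can be modeled in a few lines of plain C. This is a user-space sketch of the logic, not the actual instrumentation patch; the names and the threshold of 200 are illustrative:

```c
#include <assert.h>

/* (1) set at the top of netpoll_send_skb, cleared at the bottom */
static volatile int netpoll_in_send;
/* (2) counter kept by the NMI handler */
static int nmi_stuck_count;
/* (3) trip value before panicking, per the suggestion above */
#define NMI_PANIC_THRESHOLD 200

/* Called once per (simulated) NMI tick; returns 1 when the box should
 * panic because the flag stayed set for THRESHOLD consecutive NMIs. */
static int nmi_check_netpoll_hang(void)
{
    if (netpoll_in_send) {
        if (++nmi_stuck_count >= NMI_PANIC_THRESHOLD)
            return 1; /* stuck in netpoll_send_skb "too long": panic */
    } else {
        nmi_stuck_count = 0; /* forward progress was made: reset */
    }
    return 0;
}
```

In normal operation the flag toggles between NMIs, so the counter keeps resetting; once the CPUs hang mid-send, the flag stays set and the 200th consecutive NMI trips the panic.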

Comment 52 Neil Horman 2009-09-03 16:54:16 UTC
Created attachment 359702 [details]
netpoll_send_skb patch

I think I found the (or at least a) problem that would result in the stack trace provided. Under the right conditions it's possible that netpoll_send_skb might loop forever, and with bottom halves disabled I can see how we might spin forever, blocking the processor from running user-space apps or other softirqs while still servicing hard interrupts. This patch should fix it. Please try it out and let me know if it fixes the issue. Thanks!

Comment 53 Neil Horman 2009-09-03 18:10:36 UTC
Created attachment 359712 [details]
new patch

New patch: mmilgram pointed out my counter misuse. I'd intended to break out of a pessimal udelay loop early, and mistakenly tried to overload a counter to do it. Thanks Mike!

Comment 56 Neil Horman 2009-09-08 18:41:44 UTC
Created attachment 360117 [details]
new test patch for netpoll

OK, let's try this. Since I can't see any reason why we would loop forever in this code, let's make doubly sure we don't delay too long in it. This patch does several things:

1) Removes all the udelays from the path. Since the stack traces thus far have been in udelay, let's not get caught in it. Those delays are only there to give holders of resources more opportunity to release them and aren't really needed anyway (strictly speaking).

2) Limits the number of tries that we wait for the xmit lock or the netdev queue to restart to something much smaller than MAX_RETRIES. That should bust us out of that loop more quickly if we don't have the resources we need.

3) Limits the number of skbs that we queue. This should prevent us from spending too much time in the skb_dequeue loop.

4) Resets the tries counter to zero at the bottom of the loop. If we underflowed so far that we hit tries = 0x7fffffff, we could loop for a very long time.
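Points 2) through 4) amount to a bounded retry loop. A minimal user-space sketch of that shape, with illustrative names and limits rather than the patch's actual values:

```c
#include <assert.h>

#define TRY_LIMIT 50   /* illustrative: much smaller than MAX_RETRIES */
#define QUEUE_LIMIT 32 /* illustrative cap on queued skbs, point 3) */

/* Drain up to QUEUE_LIMIT queued frames; for each one, spin on the
 * trylock at most TRY_LIMIT times, then give up rather than loop
 * forever. Returns the number of frames "transmitted". */
static int drain_queue(int nframes, int (*trylock)(void))
{
    int sent = 0, tries, i;

    if (nframes > QUEUE_LIMIT)
        nframes = QUEUE_LIMIT; /* point 3): bound the dequeue loop */

    for (i = 0; i < nframes; i++) {
        tries = 0; /* point 4): reset per frame so the count can't wrap */
        while (!trylock()) {
            if (++tries >= TRY_LIMIT)
                return sent; /* point 2): bail out, drop the rest */
        }
        sent++; /* frame transmitted */
    }
    return sent;
}
```

The design choice here is the same trade-off Neil describes: a console path must never spin indefinitely, so dropping frames is preferable to wedging the CPU.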

Comment 59 Neil Horman 2009-09-09 11:04:45 UTC
Did you remember to apply the earlier patch from this bz?
If not, you should have.

Comment 60 Marc Milgram 2009-09-09 12:56:40 UTC
Neil, I verified that the patch you mentioned was included in the kernel being tested.

Comment 61 Neil Horman 2009-09-09 13:28:32 UTC
The last patch shouldn't have had any effect on how bottom halves were enabled in this case.  Do me a favor and remove it (the local_bh_enable patch) to see if the behavior stops.

Comment 65 Neil Horman 2009-09-10 12:53:11 UTC
If you issue a sysrq-t, what is the result?

Comment 66 Neil Horman 2009-09-10 12:53:43 UTC
also, if you do your normal test and hang the system, does our nmi code trigger?

Comment 67 John Lau 2009-09-10 13:28:35 UTC
After it hung, I pressed sysrq-t but got no response.

If I do the normal test (i.e. without enabling the serial console), I don't get any special messages in netconsole. But I am not sure whether there was anything on the console, because I did the test remotely. I may try it again tomorrow.

Comment 68 Neil Horman 2009-09-10 14:12:51 UTC
So, are you saying that with this most recent patch set we don't get an NMI-triggered panic either (like we did in comments 58, 55, and 50)?

Comment 69 John Lau 2009-09-10 14:50:03 UTC
With the most recent patch set, I am not sure whether it was an NMI-triggered panic or not (because no kernel-panic-related messages were printed out, unlike in comments 55 and 50), but I think a panic happened, because netdump was triggered (although no core was collected successfully).

Comment 70 Neil Horman 2009-09-10 17:11:11 UTC
Well, you need to hook up the serial console then, because we need that backtrace to tell what's going on. Otherwise this is a futile exercise. I've mentioned this before...

Comment 71 John Lau 2009-09-11 17:01:53 UTC
Comment #64 is the only message I get when I hook up the serial console; I pressed sysrq-t and got no response.

Comment 72 Neil Horman 2009-09-11 17:12:21 UTC
Do me a favor: try not starting netconsole at all, then repeat your test, please. I'd like to know if you still see the hang when you only have a serial console enabled.

Comment 73 John Lau 2009-09-16 02:39:48 UTC
I connected the system to a serial console and turned off netconsole. I ran the test for a whole day and the system did not hang.

Comment 74 Neil Horman 2009-09-16 18:44:20 UTC
Created attachment 361352 [details]
new approach to preventing deadlock.

OK, here's a new approach. I'm starting to wonder whether we have a path in which we can get interrupted in netpoll_send_skb by an irq or bh, resulting in an eternal spin on a lock. If it happens in a bh, we would never crash on NMI, since we would still be responsive to irqs. This patch should correct that, if that's the case.
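The suspected hazard can be modeled in a few lines of user-space C. All names here are illustrative; in real kernel code the fix would be an irq-disabling lock variant such as spin_lock_irqsave around the transmit path:

```c
#include <assert.h>

/* Model of the suspected deadlock: netpoll_send_skb holds the tx lock,
 * an irq/bh fires on the same CPU and spins on that same lock forever.
 * Disabling irqs across the critical section defers the interrupt
 * until the lock is released. */
static int tx_lock_held, irqs_off, deadlocked;

static void simulated_irq(void) /* the irq path also wants the tx lock */
{
    if (irqs_off)
        return;         /* deferred until irqs are re-enabled: safe */
    if (tx_lock_held)
        deadlocked = 1; /* same-CPU spin on a lock we hold: hang */
}

static void send_skb(int irq_safe)
{
    if (irq_safe)
        irqs_off = 1;   /* spin_lock_irqsave() analogue */
    tx_lock_held = 1;
    simulated_irq();    /* interrupt arrives mid-transmit */
    tx_lock_held = 0;
    if (irq_safe)
        irqs_off = 0;   /* spin_unlock_irqrestore() analogue */
}
```

This also matches the NMI symptom described above: a CPU spinning on a lock with irqs enabled still services timer interrupts, so the watchdog never fires.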

Comment 75 Marc Milgram 2009-09-16 19:00:57 UTC

The latest patch that you uploaded is identical to the previous patch that you uploaded.

Comment 76 Neil Horman 2009-09-16 19:52:30 UTC
Created attachment 361362 [details]
new approach to preventing deadlock.

Sorry, wrong patch previously. Here's the right one.

Comment 81 Neil Horman 2009-09-17 13:30:35 UTC
This sounds like a successful run to me. The audit code spits out so much information that I expect what you were seeing was the result of frames getting backed up in the netconsole path. I assume you were running with a serial console enabled here; try running without the serial console and see if performance improves somewhat. Also, after you run the test for a few hours, try discontinuing the test. If you do, does the system slowly become more responsive? If so, we can mitigate this by placing a maximum depth on the queue length in the netpoll code.
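The mitigation mentioned at the end, a maximum queue depth, would look roughly like this. This is a user-space sketch with an illustrative cap, not the actual netpoll code:

```c
#include <assert.h>

#define MAX_QUEUE_DEPTH 64 /* illustrative cap */

static int queue_len; /* frames backed up waiting to transmit */

/* Accept a frame into the backlog only if there is room; once the
 * console path falls behind, drop new frames instead of letting the
 * backlog (and the time spent draining it) grow without bound. */
static int netpoll_queue_frame(void)
{
    if (queue_len >= MAX_QUEUE_DEPTH)
        return -1; /* drop: bounded latency beats an unbounded backlog */
    queue_len++;
    return 0;
}

static void netpoll_drain_one(void)
{
    if (queue_len > 0)
        queue_len--; /* one frame transmitted */
}
```

With a cap in place, a flood of audit messages costs some dropped console output rather than minutes of unresponsiveness while the queue drains.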

Comment 83 Neil Horman 2009-09-24 12:48:26 UTC
I'll ask again: if you stop the test, does responsiveness return to the console? I understand that you may not be able to access the box to discontinue the test, but try coding the test to stop on its own after a while.

Comment 85 Neil Horman 2009-09-25 11:04:03 UTC
Can you post the output of what you get from that restart attempt?

Comment 88 Neil Horman 2009-10-01 19:32:19 UTC

That's the upstream commit. John, can you apply it on top of my patch and see if it helps? It will certainly clear up the messages. If it allows the restart to complete, then I think we have a winner in the aggregate of the changes.

Comment 89 Marc Milgram 2009-10-01 20:29:09 UTC
That patch doesn't apply to Red Hat Enterprise Linux 4. I can apply it manually, but it is still basically a no-op. I could change the WARN_ONs to WARN_ON_ONCEs.

Comment 94 Neil Horman 2009-10-06 13:05:11 UTC
Marc, can you attach the massaged patch that you used, please, so I don't need to duplicate that work?  Thanks!

Comment 95 Marc Milgram 2009-10-06 13:26:27 UTC
Created attachment 363832 [details]
Patch that worked correctly in testing

Comment 96 Neil Horman 2009-10-07 15:11:44 UTC
Created attachment 363986 [details]
cleaned up patch with debug code removed

Comment 98 RHEL Program Management 2009-11-24 22:50:25 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update release.

Comment 99 Vivek Goyal 2009-12-23 20:40:23 UTC
Committed in 89.18.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 100 Vivek Goyal 2010-01-13 20:34:47 UTC
The patch committed in 89.18 caused a regression: upon hitting alt-sysrq-t, we can get into an infinite loop. Hence another patch is required. Putting this bug back to the POST state. Will commit another patch to fix the regression.

Comment 101 Vivek Goyal 2010-01-19 21:06:17 UTC
Committed in 89.19.EL . RPMS are available at http://people.redhat.com/vgoyal/rhel4/

Comment 111 errata-xmlrpc 2011-02-16 15:44:38 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

