Bug 173995
Summary: | tg3 Badness in local_bh_enable at kernel/softirq.c:140 | |
---|---|---|---
Product: | [Fedora] Fedora | Reporter: | Quinton Hoole <quinton>
Component: | kernel | Assignee: | John W. Linville <linville>
Status: | CLOSED CANTFIX | QA Contact: | Brian Brock <bbrock>
Severity: | medium | Priority: | medium
Version: | 4 | CC: | davej, williama_lovaton, wtogami
Hardware: | i686 | OS: | Linux
Doc Type: | Bug Fix | Last Closed: | 2006-03-09 19:47:11 UTC
Description
Quinton Hoole
2005-11-23 14:58:31 UTC
Wow, that's a lot of comments, some from a while ago...

The fedora-netdev kernels are available here:

http://people.redhat.com/linville/kernels/fedora-netdev/

Please try those and post the results here. Also, please post a concise description of the actual problem you are currently experiencing...thanks!

Hi John, I'm not the original reporter, but I am the one who hijacked this bug. After detailed observation, this bug doesn't seem to crash the system; it just leaves a backtrace in the syslog, and the system keeps running fine. The backtrace shows up every 2 to 4 days, more or less.

When does it happen? Well, I talked with the person who does the basic administration (maintenance, backups, monitoring, etc.) of that machine, and we discovered that those crashes were happening while the backup tool was copying the (huge) backup file of the Oracle datafiles via FTP. The network copy ran at 5 AM, which was exactly the time of the crashes; when we moved the copy earlier, to 3 AM, the backtraces started to appear at 3 AM. So it seems to be triggered by transferring huge files continuously.

Now, before trying anything else, I'll bring that server up to date with the latest packages available for FC3. I'm still running 2.6.10-1.770_FC3smp, which is pretty old now. I hope I can do that this week or next. The server is a production system (very critical) and there is not much room for experimental work.

Created attachment 123233
New backtrace with latest FC3 kernel
Sorry, but I just yum updated my Oracle server to the newest packages and I
still get these traces. In fact, they seem to happen more frequently now.
This is the 2.6.12-1.1381_FC3smp kernel.
The system doesn't crash at that moment, but last Saturday the server hard
locked completely, and after reboot there was no sign of it in /var/log/messages. How
could I trace this problem besides looking at syslog?
Besides, what option should I set to get the second NIC to connect at full duplex? After boot, the first NIC connects at full duplex but the second NIC connects only at half duplex. At boot time I get the following:

dnccor50 kernel: tg3: eth0: Link is up at 100 Mbps, full duplex.
dnccor50 kernel: tg3: eth0: Flow control is off for TX and off for RX.
dnccor50 kernel: tg3: eth1: Link is up at 100 Mbps, half duplex.
dnccor50 kernel: tg3: eth1: Flow control is off for TX and off for RX.

We need both of them running at full duplex because our web server and the database server communicate through eth1. After boot, and with all the processes running, I issue the following command as root:

mii-tool -F 100baseTx-FD

Is there a problem with doing this? And why is flow control off? Is that a good thing?

This is a mass-update to all currently open kernel bugs.

A new kernel update has been released (Version: 2.6.15-1.1830_FC4) based upon a new upstream kernel release. Please retest against this new kernel, as a large number of patches go into each upstream release, possibly including changes that may address this problem.

This bug has been placed in NEEDINFO_REPORTER state. Due to the large volume of inactive bugs in bugzilla, if this bug is still in this state in two weeks' time, it will be closed.

Should this bug still be relevant after this period, the reporter can reopen the bug at any time. Any other users on the Cc: list of this bug can request that the bug be reopened by adding a comment to the bug.

If this bug is a problem preventing you from installing the release this version is filed against, please see bug 169613.

Thank you.

Closed due to lack of response...please reopen when the bug is verified against current kernels...thanks!

Good call... I'm still seeing these backtraces every few days, but they don't seem to be harmful.
Sorry, I can't test a new kernel; this is a production server and I'll stay with official FC3 updates (uptime of 36 days right now under heavy load). I'll see if I can reproduce this when FC5 comes out.

-----

This is the BT, by the way:

Mar 10 05:03:02 dnccor50 kernel: Badness in local_bh_enable at kernel/softirq.c:140 (Not tainted)
Mar 10 05:03:02 dnccor50 kernel: [<c0126539>] local_bh_enable+0x64/0x82
Mar 10 05:03:02 dnccor50 kernel: [<c02a1c06>] skb_copy_bits+0x144/0x262
Mar 10 05:03:02 dnccor50 kernel: [<c02a121c>] skb_copy+0x7d/0xb5
Mar 10 05:03:02 dnccor50 kernel: [<f896ede2>] tigon3_4gb_hwbug_workaround+0x17/0x12f [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<f896f35c>] tg3_start_xmit+0x407/0x5f6 [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<c02b43ce>] qdisc_restart+0x5f/0x208
Mar 10 05:03:02 dnccor50 kernel: [<c02a624c>] dev_queue_xmit+0x1fe/0x2a8
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c02c2319>] ip_finish_output+0xd5/0x24e
Mar 10 05:03:02 dnccor50 kernel: [<c02c2ada>] ip_queue_xmit+0x256/0x51a
Mar 10 05:03:02 dnccor50 kernel: [<c011d47e>] scheduler_tick+0x30f/0x3e9
Mar 10 05:03:02 dnccor50 kernel: [<c011b5aa>] recalc_task_prio+0x8a/0x150
Mar 10 05:03:02 dnccor50 kernel: [<c011b6fa>] activate_task+0x8a/0x99
Mar 10 05:03:02 dnccor50 kernel: [<c011bc5d>] try_to_wake_up+0x292/0x2e4
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c02d25a9>] tcp_transmit_skb+0x37a/0x70c
Mar 10 05:03:02 dnccor50 kernel: [<c02d3258>] tcp_write_xmit+0x10a/0x2d6
Mar 10 05:03:02 dnccor50 kernel: [<c02d05ec>] __tcp_data_snd_check+0x42/0xbc
Mar 10 05:03:02 dnccor50 kernel: [<c02d0e5f>] tcp_rcv_established+0x4eb/0x7b2
Mar 10 05:03:02 dnccor50 kernel: [<c02d9365>] tcp_v4_do_rcv+0xf4/0x109
Mar 10 05:03:02 dnccor50 kernel: [<c02d99f5>] tcp_v4_rcv+0x67b/0x778
Mar 10 05:03:02 dnccor50 kernel: [<c02bf146>] ip_local_deliver+0x94/0x273
Mar 10 05:03:02 dnccor50 kernel: [<c02bf857>] ip_rcv+0x350/0x551
Mar 10 05:03:02 dnccor50 kernel: [<c02a68e5>] netif_receive_skb+0x228/0x279
Mar 10 05:03:02 dnccor50 kernel: [<c02a0b33>] alloc_skb+0x35/0xc9
Mar 10 05:03:02 dnccor50 kernel: [<f896e565>] tg3_rx+0x272/0x3da [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<f896e74d>] tg3_poll+0x80/0x16e [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<c02a6ad5>] net_rx_action+0x82/0x175
Mar 10 05:03:02 dnccor50 kernel: [<c0126469>] __do_softirq+0x69/0xd5
Mar 10 05:03:02 dnccor50 kernel: [<c0106688>] do_softirq+0x45/0x4c
Mar 10 05:03:02 dnccor50 kernel: =======================
Mar 10 05:03:02 dnccor50 kernel: [<c0106577>] do_IRQ+0x57/0x89
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c0104a2e>] common_interrupt+0x1a/0x20
Mar 10 05:03:02 dnccor50 kernel: [<c02099df>] acpi_processor_idle+0x105/0x29e
Mar 10 05:03:02 dnccor50 kernel: [<c01020d3>] cpu_idle+0x5d/0x6c
Mar 10 05:03:02 dnccor50 kernel: [<c03c587a>] start_kernel+0x188/0x1c7
Mar 10 05:03:02 dnccor50 kernel: [<c03c5313>] unknown_bootoption+0x0/0x1b0
Mar 10 05:03:02 dnccor50 kernel: Badness in local_bh_enable at kernel/softirq.c:140 (Not tainted)
Mar 10 05:03:02 dnccor50 kernel: [<c0126539>] local_bh_enable+0x64/0x82
Mar 10 05:03:02 dnccor50 kernel: [<c02a1c06>] skb_copy_bits+0x144/0x262
Mar 10 05:03:02 dnccor50 kernel: [<c02a121c>] skb_copy+0x7d/0xb5
Mar 10 05:03:02 dnccor50 kernel: [<f896ede2>] tigon3_4gb_hwbug_workaround+0x17/0x12f [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<f896f35c>] tg3_start_xmit+0x407/0x5f6 [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<c02b43ce>] qdisc_restart+0x5f/0x208
Mar 10 05:03:02 dnccor50 kernel: [<c02a624c>] dev_queue_xmit+0x1fe/0x2a8
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c02c2319>] ip_finish_output+0xd5/0x24e
Mar 10 05:03:02 dnccor50 kernel: [<c02c2ada>] ip_queue_xmit+0x256/0x51a
Mar 10 05:03:02 dnccor50 kernel: [<c011d47e>] scheduler_tick+0x30f/0x3e9
Mar 10 05:03:02 dnccor50 kernel: [<c011b5aa>] recalc_task_prio+0x8a/0x150
Mar 10 05:03:02 dnccor50 kernel: [<c011b6fa>] activate_task+0x8a/0x99
Mar 10 05:03:02 dnccor50 kernel: [<c011bc5d>] try_to_wake_up+0x292/0x2e4
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c02d25a9>] tcp_transmit_skb+0x37a/0x70c
Mar 10 05:03:02 dnccor50 kernel: [<c02d3258>] tcp_write_xmit+0x10a/0x2d6
Mar 10 05:03:02 dnccor50 kernel: [<c02d05ec>] __tcp_data_snd_check+0x42/0xbc
Mar 10 05:03:02 dnccor50 kernel: [<c02d0e5f>] tcp_rcv_established+0x4eb/0x7b2
Mar 10 05:03:02 dnccor50 kernel: [<c02d9365>] tcp_v4_do_rcv+0xf4/0x109
Mar 10 05:03:02 dnccor50 kernel: [<c02d99f5>] tcp_v4_rcv+0x67b/0x778
Mar 10 05:03:02 dnccor50 kernel: [<c02bf146>] ip_local_deliver+0x94/0x273
Mar 10 05:03:02 dnccor50 kernel: [<c02bf857>] ip_rcv+0x350/0x551
Mar 10 05:03:02 dnccor50 kernel: [<c02a68e5>] netif_receive_skb+0x228/0x279
Mar 10 05:03:02 dnccor50 kernel: [<c02a0b33>] alloc_skb+0x35/0xc9
Mar 10 05:03:02 dnccor50 kernel: [<f896e565>] tg3_rx+0x272/0x3da [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<f896e74d>] tg3_poll+0x80/0x16e [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<c02a6ad5>] net_rx_action+0x82/0x175
Mar 10 05:03:02 dnccor50 kernel: [<c0126469>] __do_softirq+0x69/0xd5
Mar 10 05:03:02 dnccor50 kernel: [<c0106688>] do_softirq+0x45/0x4c
Mar 10 05:03:02 dnccor50 kernel: =======================
Mar 10 05:03:02 dnccor50 kernel: [<c0106577>] do_IRQ+0x57/0x89
Mar 10 05:03:02 dnccor50 kernel: [<c0116953>] smp_apic_timer_interrupt+0xbd/0xc6
Mar 10 05:03:02 dnccor50 kernel: [<c0104a2e>] common_interrupt+0x1a/0x20
Mar 10 05:03:02 dnccor50 kernel: [<c02099df>] acpi_processor_idle+0x105/0x29e
Mar 10 05:03:02 dnccor50 kernel: [<c01020d3>] cpu_idle+0x5d/0x6c
Mar 10 05:03:02 dnccor50 kernel: [<c03c587a>] start_kernel+0x188/0x1c7
Mar 10 05:03:02 dnccor50 kernel: [<c03c5313>] unknown_bootoption+0x0/0x1b0
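
For anyone triaging syslog output like the backtrace above, the following is a minimal Python sketch that groups the lines into one record per "Badness in" header and picks out the frames that belong to a given module such as tg3. The names `WARN_RE`, `FRAME_RE`, `parse_traces` and `module_frames` are illustrative only and not part of any kernel tooling, and the regular expressions assume the frame format shown in this report (`[<addr>] func+0xoff/0xsize [module]`); other kernel versions may print frames differently.

```python
import re

# Header line of a warning, e.g.
#   Badness in local_bh_enable at kernel/softirq.c:140 (Not tainted)
WARN_RE = re.compile(r"Badness in (?P<func>\w+) at (?P<file>\S+):(?P<line>\d+)")

# One stack frame in the format seen in this report (an assumption), e.g.
#   [<f896f35c>] tg3_start_xmit+0x407/0x5f6 [tg3]
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\]\s+"
    r"(?P<func>[\w.]+)\+0x(?P<off>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:\s+\[(?P<module>\w+)\])?"
)


def parse_traces(lines):
    """Return one dict per 'Badness in' header, with its frames in order."""
    traces = []
    for text in lines:
        warn = WARN_RE.search(text)
        if warn:
            traces.append({
                "func": warn.group("func"),
                "file": warn.group("file"),
                "line": int(warn.group("line")),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(text)
        if frame and traces:
            # module is None for frames in the core kernel image
            traces[-1]["frames"].append((frame.group("func"), frame.group("module")))
    return traces


def module_frames(trace, module):
    """List the function names in a trace that belong to the given module."""
    return [func for func, mod in trace["frames"] if mod == module]


# A few lines copied from the backtrace in this report.
SAMPLE = """\
Mar 10 05:03:02 dnccor50 kernel: Badness in local_bh_enable at kernel/softirq.c:140 (Not tainted)
Mar 10 05:03:02 dnccor50 kernel: [<c0126539>] local_bh_enable+0x64/0x82
Mar 10 05:03:02 dnccor50 kernel: [<c02a1c06>] skb_copy_bits+0x144/0x262
Mar 10 05:03:02 dnccor50 kernel: [<c02a121c>] skb_copy+0x7d/0xb5
Mar 10 05:03:02 dnccor50 kernel: [<f896ede2>] tigon3_4gb_hwbug_workaround+0x17/0x12f [tg3]
Mar 10 05:03:02 dnccor50 kernel: [<f896f35c>] tg3_start_xmit+0x407/0x5f6 [tg3]
Mar 10 05:03:02 dnccor50 kernel: =======================
""".splitlines()

if __name__ == "__main__":
    for trace in parse_traces(SAMPLE):
        print(trace["func"], trace["file"], trace["line"], module_frames(trace, "tg3"))
```

Feeding a whole /var/log/messages file through `parse_traces` would give one record per warning, which makes it easier to check whether every occurrence passes through the same driver functions.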