Bug 145959

Summary:	kernel BUG at net/core/skbuff.c:228!
Product:	[Fedora] Fedora	Reporter:	Xavier Mertens <xavier>
Component:	kernel	Assignee:	John W. Linville <linville>
Status:	CLOSED WONTFIX	QA Contact:
Severity:	medium	Docs Contact:
Priority:	medium
Version:	rawhide	CC:	davej, wtogami, xavier
Target Milestone:	---
Target Release:	---
Hardware:	i686
OS:	Linux
Whiteboard:
Fixed In Version:		Doc Type:	Bug Fix
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2005-05-13 15:25:35 UTC	Type:	---

Description Xavier Mertens 2005-01-24 13:29:25 UTC

Description of problem:
The server is a Dell PowerEdge running a brand-new FC3 install with all the
latest RPMs. The box runs Apache as a reverse proxy.
Suddenly, no more traffic was going out via eth0. No messages in the logs.
I logged in via a serial console and issued "ifconfig eth0 down". I lost
even the serial console and was forced to power-cycle the box.


Version-Release number of selected component (if applicable):
2.6.10-1.741_FC3smp


How reproducible: No idea.

Comment 1 Xavier Mertens 2005-01-24 13:30:21 UTC

[root@rproxy2 ~]# ifconfig eth0 down
Warning: kfree_skb passed an skb still on a list (from c0267d93).
------------[ cut here ]------------
kernel BUG at net/core/skbuff.c:228!
invalid operand: 0000 [#1]
SMP
Modules linked in: ipt_state ip_conntrack ipt_LOG iptable_filter
ip_tables md5 ipv6 i2c_dev i2c_core dm_mod video button battery ac
uhci_hcd ehci_hcd e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm
sd_mod scsi_mod
CPU:    1
EIP:    0060:[<c0267cb8>]    Not tainted VLI
EFLAGS: 00010206   (2.6.10-1.741_FC3smp)
EIP is at __kfree_skb+0x19/0xf7
eax: 00000045   ebx: f79c33b8   ecx: c66cceb8   edx: c02ef542
esi: f79c3240   edi: f88ad1f4   ebp: 00000019   esp: c66cceb4
ds: 007b   es: 007b   ss: 0068
Process ifconfig (pid: 20127, threadinfo=c66cc000 task=c3fc9a40)
Stack: c02ef542 c0267d93 dd8a8c80 f79c33b8 f88e5627 00000100 000001f4
f79c3240
       f79c3000 00001003 00000000 f88e4121 f88e4a1a f79c3240 00001042
f88e4a25
       f79c3000 c026c38a f79c3000 c026d444 f7205780 ffffff9d f72057ac
f7446080
Call Trace:
 [<c0267d93>] __kfree_skb+0xf4/0xf7
 [<f88e5627>] e1000_clean_rx_ring+0x48/0xf2 [e1000]
 [<f88e4121>] e1000_down+0x82/0xc1 [e1000]
 [<f88e4a1a>] e1000_close+0x0/0x1d [e1000]
 [<f88e4a25>] e1000_close+0xb/0x1d [e1000]
 [<c026c38a>] dev_close+0x57/0x77
 [<c026d444>] dev_change_flags+0x48/0xee
 [<c02a2345>] devinet_ioctl+0x26e/0x4de
 [<c02a3e3d>] inet_ioctl+0x79/0xa5
 [<c026522b>] sock_ioctl+0x22a/0x238
 [<c0160adb>] sys_ioctl+0x1d5/0x1f2
 [<c0103c97>] syscall_call+0x7/0xb
Code: e8 93 ff ff ff 89 da 5b a1 58 2e 43 c0 e9 5b 75 ed ff 53 52 89
04 24 83 78 08 00 74 18 ff 74 24 fc 68 42 f5 2e c0 e8 a5 62 eb ff <0f>
0b e4 00 0c f5 2e c0 59 5b 8b 04 24 8b 58 30 85 db 74 2c 8b
 Segmentation fault
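
Note on reading the trace: each Call Trace entry above has the form "[<address>] symbol+0xoffset/0xsize", followed by "[module]" when the function lives in a loadable module. As a minimal sketch (the helper name trace_symbols and the "vmlinux" label for core-kernel entries are illustrative assumptions, not part of any kernel tool), the entries can be reduced to module/symbol pairs to see which driver the failing path runs through:

```sh
#!/bin/sh
# Hypothetical helper: print "module symbol" for each Call Trace entry on stdin.
# Entries with no trailing [module] are labeled "vmlinux" (an arbitrary label
# chosen here for core-kernel functions).
trace_symbols() {
    sed -n \
        -e 's|^ *\[<[0-9a-f]*>] \([^+ ]*\)+0x[0-9a-f]*/0x[0-9a-f]* \[\(.*\)]$|\2 \1|p' \
        -e 's|^ *\[<[0-9a-f]*>] \([^+ ]*\)+0x[0-9a-f]*/0x[0-9a-f]*$|vmlinux \1|p'
}

# Two entries copied from the trace in this comment.
trace_symbols <<'EOF'
 [<f88e5627>] e1000_clean_rx_ring+0x48/0xf2 [e1000]
 [<c026c38a>] dev_close+0x57/0x77
EOF
```

Fed the full trace, this makes it easy to see that the BUG is reached from dev_close through the e1000 close path.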

Comment 2 Xavier Mertens 2005-01-25 10:17:08 UTC

Hi,
Less than 24 hours later, same problem: new kernel panic!
Here is the dump:

Unable to handle kernel NULL pointer dereference at virtual address
0000000c
 printing eip:
f89710e0
*pde = 375d6001
Oops: 0000 [#1]
SMP
Modules linked in: ipt_state ip_conntrack ipt_LOG iptable_filter
ip_tables md5 ipv6 i2c_dev i2c_core dm_mod video button battery ac
uhci_hcd ehci_hcd e1000 floppy sg ext3 jbd megaraid_mbox megaraid_mm
sd_mod scsi_mod
CPU:    1
EIP:    0060:[<f89710e0>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-1.741_FC3smp)
EIP is at ipt_do_table+0xc4/0x2fc [ip_tables]
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: f8b9c04c   edi: e0585020   ebp: 00000070   esp: c03ace50
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03ac000 task=f7f48530)
Stack: f8b9bc08 f8b9b180 f8976080 f7547000 ffffffff 00000000 00000000
f7547000
       00000001 c03aced8 00000000 c03aced8 c03acedc c0433b88 00000001
f893b017
       00000000 f893bac0 00000000 f893bb00 c027476f 00000000 c0282153
c03aced8
Call Trace:
 [<f893b017>] ipt_hook+0x17/0x1c [iptable_filter]
 [<c027476f>] nf_iterate+0x40/0x81
 [<c0282153>] ip_local_deliver_finish+0x0/0x188
 [<c0274a6d>] nf_hook_slow+0x47/0xb4
 [<c0282153>] ip_local_deliver_finish+0x0/0x188
 [<c028214c>] ip_local_deliver+0x1d7/0x1de
 [<c0282153>] ip_local_deliver_finish+0x0/0x188
 [<c0282891>] ip_rcv_finish+0x1b7/0x202
 [<c0274aa9>] nf_hook_slow+0x83/0xb4
 [<c0282695>] ip_rcv+0x3ba/0x3ff
 [<c02826da>] ip_rcv_finish+0x0/0x202
 [<c026cda7>] netif_receive_skb+0x1de/0x20c
 [<f88e75d7>] e1000_clean_rx_irq+0x2fe/0x36b [e1000]
 [<f88e6fba>] e1000_clean+0x3d/0xaf [e1000]
 [<c026cf33>] net_rx_action+0x61/0xd8
 [<c0121f60>] __do_softirq+0x4c/0xb1
 [<c0105d9f>] do_softirq+0x41/0x48
 =======================
 [<c0105cd0>] do_IRQ+0x74/0x7e
 [<c010467e>] common_interrupt+0x1a/0x20
 [<c01020e8>] mwait_idle+0x33/0x42
 [<c01020a0>] cpu_idle+0x26/0x3b
Code: 54 24 20 8b 7c 24 04 03 7c 91 20 03 74 91 0c 89 3c 24 8b 44 24
24 8b 10 8b 46 54 09 82 84 00 00 00 8b 54 24 14 8b 0e 0f b6 5e 53 <8b>
42 0c 8b 56 08 f6 c3 08 74 0c 21 d0 39 c8 0f 84 e7 01 00 00
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Comment 3 John W. Linville 2005-02-25 19:18:42 UTC

Let's try an updated e1000 driver...available here:

   http://people.redhat.com/linville/kernels/fc3/

Please give that a try and see if it works any better for you...let me
know...thanks!

Comment 4 John W. Linville 2005-03-09 13:26:45 UTC

Xavier, any word on the results of testing the kernels w/ the updated
e1000 driver?

Comment 5 Xavier Mertens 2005-03-09 13:50:06 UTC

Hi John,

I was quite busy until today... :-/
I'll test your patched kernel today or tomorrow.
I'll keep you informed.

Regards,
Xavier

Comment 6 Xavier Mertens 2005-03-09 20:06:50 UTC

Hi John,

Seems to be OK. The server has been up for more than 5 hours without problems.
Let's wait 24h. Thanks for providing a patch!

Comment 7 Xavier Mertens 2005-03-10 13:38:56 UTC

Hi John,
Server crashed once again :(

Unable to handle kernel NULL pointer dereference at virtual address
0000000c
 printing eip:
f89710e0
*pde = 171cc001
Oops: 0000 [#1]
SMP
Modules linked in: ipt_state ip_conntrack ipt_LOG iptable_filter
ip_tables md5
CPU:    1
EIP:    0060:[<f89710e0>]    Not tainted VLI
EFLAGS: 00010246   (2.6.10-1.769.2.3_FC3.jwltest.1smp)
EIP is at ipt_do_table+0xc4/0x2fc [ip_tables]
eax: 00000000   ebx: 00000000   ecx: 00000000   edx: 00000000
esi: f8bc018c   edi: d883c020   ebp: 00000070   esp: c03abe50
ds: 007b   es: 007b   ss: 0068
Process swapper (pid: 0, threadinfo=c03ab000 task=f7f48540)
Stack: f8bbfd48 f8bbf200 f8976080 f79ff000 ffffffff 00000000 00000000
f79ff000
       00000001 c03abed8 00000000 c03abed8 c03abedc c0433ce8 00000001
f893b017
       00000000 f893bac0 00000000 f893bb00 c027517b 00000000 c0282bfb
c03abed8
Call Trace:
 [<f893b017>] ipt_hook+0x17/0x1c [iptable_filter]
 [<c027517b>] nf_iterate+0x40/0x81
 [<c0282bfb>] ip_local_deliver_finish+0x0/0x188
 [<c0275479>] nf_hook_slow+0x47/0xb4
 [<c0282bfb>] ip_local_deliver_finish+0x0/0x188
 [<c0282bf4>] ip_local_deliver+0x1d7/0x1de
 [<c0282bfb>] ip_local_deliver_finish+0x0/0x188
 [<c0283339>] ip_rcv_finish+0x1b7/0x202
 [<c02754b5>] nf_hook_slow+0x83/0xb4
 [<c028313d>] ip_rcv+0x3ba/0x3ff
 [<c0283182>] ip_rcv_finish+0x0/0x202
 [<c026d7b3>] netif_receive_skb+0x1de/0x20c
 [<f88e7125>] e1000_clean_rx_irq+0x34a/0x3b9 [e1000]
 [<f88e6b7d>] e1000_clean+0x40/0xd4 [e1000]
 [<c026d93f>] net_rx_action+0x61/0xd8
 [<c012211c>] __do_softirq+0x4c/0xb1
 [<c0105db7>] do_softirq+0x41/0x48
 =======================
 [<c0105ce8>] do_IRQ+0x74/0x7e
 [<c010467e>] common_interrupt+0x1a/0x20
 [<c01020e8>] mwait_idle+0x33/0x42
 [<c01020a0>] cpu_idle+0x26/0x3b
Code: 54 24 20 8b 7c 24 04 03 7c 91 20 03 74 91 0c 89 3c 24 8b 44 24
24 8b 10
 <0>Kernel panic - not syncing: Fatal exception in interrupt

Comment 8 John W. Linville 2005-03-10 15:37:54 UTC

Hmmm...the crash is happening in the netfilter code...it isn't clear
to me that this is actually an e1000 problem...

Would you mind attaching your iptables configuration?  If you don't
want to do so publicly, you can send it directly to me via e-mail.
If you are too paranoid even for that, we can probably work out
something else... :-)

Comment 9 Xavier Mertens 2005-03-10 18:24:02 UTC

Here we go...

IPT="/sbin/iptables"
MPB="/sbin/modprobe"
LSM="/sbin/lsmod"

# Get our IP config
LAN_IF=eth0
IP=`/sbin/ifconfig $LAN_IF | grep inet | cut -d : -f 2 | cut -d \  -f 1`
MASK=`/sbin/ifconfig $LAN_IF | grep Mas | cut -d : -f 4`
NET=$IP/$MASK
echo "Firewall applied on: $LAN_IF/$NET"

# Flush and zero the chains.
$IPT -F
$IPT -X
$IPT -Z

# Delete `nat' and `mangle' chains.
if ( $LSM | /bin/grep iptable_mangle > /dev/null ); then
$IPT -t mangle -F
fi
if ( $LSM | /bin/grep iptable_nat > /dev/null ); then
$IPT -t nat -F
fi

# Create a new log and drop (LD) convenience chain.
$IPT -N LD
$IPT -A LD -j LOG
$IPT -A LD -j DROP

STOP=LD

TOSOPT=8

# Allow all traffic on the loopback interface
$IPT -t filter -A INPUT -i lo -s 127.0.0.0/8 -d 127.0.0.0/8 -j ACCEPT
$IPT -t filter -A OUTPUT -o lo -s 127.0.0.0/8 -d 127.0.0.0/8 -j ACCEPT

# Turn on source address verification in kernel
if [ -e /proc/sys/net/ipv4/conf/all/rp_filter ]; then
  for f in /proc/sys/net/ipv4/conf/*/rp_filter
  do
   echo 2 > $f
  done
fi

# Turn on syn cookies protection in kernel
if [ -e /proc/sys/net/ipv4/tcp_syncookies ]; then
  echo 1 > /proc/sys/net/ipv4/tcp_syncookies
fi

# ICMP Dead Error Messages protection
if [ -e /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses ]; then
  echo 1 > /proc/sys/net/ipv4/icmp_ignore_bogus_error_responses
fi

# ICMP Broadcasting protection
if [ -e /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts ]; then
  echo 1 > /proc/sys/net/ipv4/icmp_echo_ignore_broadcasts
fi

# Turn off dynamic TCP/IP address hacking
if [ -e /proc/sys/net/ipv4/ip_dynaddr ]; then
  echo 0 > /proc/sys/net/ipv4/ip_dynaddr
fi

# Doubling current limit for ip_conntrack
if [ -e /proc/sys/net/ipv4/ip_conntrack_max ]; then
  echo 16384 > /proc/sys/net/ipv4/ip_conntrack_max
fi

# Accept ESTABLISHED sessions
$IPT  -A INPUT -m state --state ESTABLISHED,RELATED -j ACCEPT

# Allow SSH
$IPT -t filter -A INPUT -p tcp -s 0/0 -d $NET --dport 22 -j ACCEPT

# Allow remote backup start from depot001
$IPT -t filter -A INPUT -p tcp -s 10.50.10.215/32 -d $NET --dport 2988 -j ACCEPT
# Monitoring
# tcp/1040 -> Nagios
# icmp
$IPT -t filter -A INPUT -p tcp -s 10.50.10.5/32 -d $NET --dport 1040 -j ACCEPT
$IPT -t filter -A INPUT -p icmp -s 10.50.10.5/32 -d $NET -j ACCEPT

# ntp.belga.be
$IPT -t filter -A INPUT -p tcp -s 10.50.10.5/32 -d $NET --dport 123 -j ACCEPT
$IPT -t filter -A INPUT -p udp -s 10.50.10.5/32 -d $NET --dport 123 -j ACCEPT

# Allow HTTP (Reverse proxy)
$IPT -t filter -A INPUT -p tcp -s 0/0 -d $NET --dport 80 -j ACCEPT
$IPT -t filter -A INPUT -p tcp -s 10.50.10.247 --sport 80 -d $NET -j ACCEPT
$IPT -t filter -A INPUT -p tcp -s 0/0 -d $NET --dport 443 -j ACCEPT

$IPT -A OUTPUT -j ACCEPT

# ---------------------------
# Do not log annoying traffic
# ---------------------------

# SMB
$IPT -t filter -A INPUT -p udp -s 0/0 -d $NET --dport 137:139 -j DROP
# IGMP
$IPT -t filter -A INPUT -p 2 -s 0/0 -d 0/0 -j DROP
# BOOTP
$IPT -t filter -A INPUT -p udp -s 0/0 -d 0/0 --dport 67:68 -j DROP
# UDP/694 (ftp heartbeat)
$IPT -t filter -A INPUT -p udp -s 192.168.2.0/24 -d 0/0 --dport 694 -j DROP

# Deny everything not let through earlier
$IPT -A INPUT -j $STOP
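
Note on the address lookup at the top of the script: the grep/cut pipeline depends on the old net-tools ifconfig layout ("inet addr:A.B.C.D  Bcast:...  Mask:M.M.M.M"). A minimal sketch of the same extraction, matching only the IPv4 line, is shown here; the helper name parse_inet_line and the sample addresses are made up for illustration and are not taken from the reporter's server:

```sh
#!/bin/sh
# Hypothetical helper: read net-tools style `ifconfig` output on stdin and
# print "ADDRESS/NETMASK" for the first IPv4 ("inet addr:") line.
# Assumes the old "inet addr:...  Bcast:...  Mask:..." output format.
parse_inet_line() {
    sed -n 's|.*inet addr:\([0-9.]*\).*Mask:\([0-9.]*\).*|\1/\2|p' | head -n 1
}

# Sample output with made-up documentation addresses; the inet6 line is
# present to show that it is ignored.
sample='          inet addr:192.0.2.10  Bcast:192.0.2.255  Mask:255.255.255.0
          inet6 addr: fe80::1/64 Scope:Link'

NET=$(printf '%s\n' "$sample" | parse_inet_line)
echo "Firewall applied on: eth0/$NET"
```

On a live system the sample would be replaced by the output of "/sbin/ifconfig $LAN_IF"; iptables accepts the resulting address/netmask form for -s and -d.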

Comment 10 John W. Linville 2005-03-19 04:26:29 UTC

Xavier,

I've been doing some work w/ the e1000 driver for another issue.  In fact, some
of the symptoms with that issue look very much like what you have reported here.

Along the way, I've updated to a later version and added a couple of fixes.  I'd
like for you to re-conduct your tests with the newer version.

I have pre-built test kernels in the same place as in comment 3.  Please attempt
to recreate the issue with those kernels and post the results.  Thanks!

Comment 11 John W. Linville 2005-05-02 14:35:47 UTC

The e1000 drivers in the kernels @ comment 3 have been updated yet again.  
Please give them a try and report back the results ASAP.  Thanks!

Comment 12 John W. Linville 2005-05-13 15:25:35 UTC

Closing due to lack of response.  Please reopen with the results of running 
with the latest kernels from the location in comment 3 if the problem 
persists.  Thanks!