Bug 127312 - cerberus / accept01 crashes 2.6.7-1.471smp
Status: CLOSED CURRENTRELEASE
Product: Fedora
Classification: Fedora
Component: kernel
Version: rawhide
Hardware: All Linux
Priority: medium, Severity: high
Assigned To: David Miller
QA Contact: Brian Brock
Reported: 2004-07-06 11:17 EDT by Rik van Riel
Modified: 2007-11-30 17:10 EST
Doc Type: Bug Fix
Last Closed: 2005-01-01 23:56:01 EST

Attachments
Fix for ipv4 UDP accept() oops. (398 bytes, patch)
2004-07-07 01:05 EDT, David Miller

Description Rik van Riel 2004-07-06 11:17:38 EDT
Description of problem:

Unable to handle kernel NULL pointer dereference a0 printing eip:
00000000
*pde = 00003001
Oops: 0000 [#1]
SMP
Modules linked in: nfsd exportfs nfs lockd ipv6 parport_pc lp parport
autofs4 rdCPU:    0
EIP:    0060:[<00000000>]    Not tainted
EFLAGS: 00010246   (2.6.7-1.471smp)
EIP is at 0x0
eax: 057320ec   ebx: 02346dc0   ecx: 0e33cee4   edx: 00000002
esi: 093a59a8   edi: 093a5680   ebp: 0e33c000   esp: 0e33cee0
ds: 007b   es: 007b   ss: 0068
Process accept01 (pid: 4585, threadinfo=0e33c000 task=0e0806d0)
Stack: 022b7b75 ffffffea 02348ae0 093a59a8 02275804 022756f9 00000000
00000000
       00000000 00000000 00000000 030193a0 030193a0 00000c9d 00000000
00000000
       00000000 00000000 0e33cf60 03012e40 03012e40 0000000c 0000000c
0297231c
Call Trace:
 [<022b7b75>] inet_accept+0x1c/0xc2
 [<02275804>] sys_accept+0xa8/0x13f
 [<022756f9>] sys_bind+0x70/0x7e
 [<021590ac>] rw_vm+0x2f0/0x342
 [<02276182>] sys_socketcall+0x8d/0x179
 [<0215b2fc>] sys_close+0xa0/0xd3

Code:  Bad EIP value.


Version-Release number of selected component (if applicable):

2.6.7-1.471smp

The previous kernel didn't crash like this.  This is a regression from
last week's kernel (2.6.7-1.459).
Comment 1 Rik van Riel 2004-07-06 12:32:55 EDT
Reproducible: it happened again on the next LTP run (in the same
Cerberus run).  The system is still up and running, so here's a
proper (wrapped instead of cut at 80 cols) version of the oops:

 <1>Unable to handle kernel NULL pointer dereference at virtual
address 00000000
 printing eip:
00000000
*pde = 00003001
Oops: 0000 [#3]
SMP
Modules linked in: nfsd exportfs nfs lockd ipv6 parport_pc lp parport
autofs4 rfcomm l2cap bluetooth sunrpc tlan floppy sg microcode dm_mod
usb_storage ohci_hcd ehci_hcd ext3 jbd sym53c8xx scsi_transport_spi
cpqarray sd_mod scsi_mod
CPU:    1
EIP:    0060:[<00000000>]    Not tainted
EFLAGS: 00010246   (2.6.7-1.471smp)
EIP is at 0x0
eax: 0374f2bc   ebx: 02346dc0   ecx: 08d7eee4   edx: 00000002
esi: 0daa71c4   edi: 08d4a814   ebp: 08d7e000   esp: 08d7eee0
ds: 007b   es: 007b   ss: 0068
Process accept01 (pid: 15145, threadinfo=08d7e000 task=07120bf0)
Stack: 022b7b75 ffffffea 02348ae0 0daa71c4 02275804 022756f9 00000000
00000000
       00000000 00000000 00000000 030791e0 030791e0 00003c8f 00000000
00000000
       00000000 00000000 08d7ef60 0318f7e0 0318f7e0 0000000c 0000000c
0e7bfb0c
Call Trace:
 [<022b7b75>] inet_accept+0x1c/0xc2
 [<02275804>] sys_accept+0xa8/0x13f
 [<022756f9>] sys_bind+0x70/0x7e
 [<021590ac>] rw_vm+0x2f0/0x342
 [<02276182>] sys_socketcall+0x8d/0x179
 [<0215b2fc>] sys_close+0xa0/0xd3
Code:  Bad EIP value.
Comment 2 Bob Gustafson 2004-07-06 15:09:36 EDT
Basically the same thing. I am currently running the single-processor
(UP) version of 471.

/var/log/messages fragment showing the attempt to reboot into 471smp and
then the beginning of the subsequent reboot into UP 471:

Jul  5 23:35:58 hoho2 shutdown: shutting down for system reboot
Jul  5 23:35:58 hoho2 init: Switching to runlevel: 6
Jul  5 23:35:58 hoho2 su(pam_unix)[4686]: session closed for user root
Jul  5 23:35:58 hoho2 login(pam_unix)[2850]: session closed for user user1
Jul  5 23:35:59 hoho2 init: no more processes left in this runlevel
Jul  5 23:35:59 hoho2 kernel: Unable to handle kernel NULL pointer
dereference at virtual address 00000000
Jul  5 23:35:59 hoho2 kernel:  printing eip:
Jul  5 23:35:59 hoho2 kernel: 021c9811
Jul  5 23:35:59 hoho2 kernel: *pde = 00000000
Jul  5 23:35:59 hoho2 kernel: Oops: 0000 [#1]
Jul  5 23:35:59 hoho2 kernel: Modules linked in: radeon snd_mixer_oss
snd_intel8x0 snd_ac97_codec snd_pcm snd_timer snd_page_alloc gameport
snd_mpu401_uart snd_rawmidi snd_seq_device snd soundcore usbserial
parport_pc lp parport autofs4 nfs lockd sunrpc ipv6 e1000 ipt_REJECT
ipt_state ip_conntrack iptable_filter ip_tables floppy sg microcode
dm_mod uhci_hcd ehci_hcd button battery asus_acpi ac ext3 jbd aic79xx
sd_mod scsi_mod
Jul  5 23:35:59 hoho2 kernel: CPU:    0
Jul  5 23:35:59 hoho2 kernel: EIP:    0060:[<021c9811>]    Not tainted
Jul  5 23:35:59 hoho2 kernel: EFLAGS: 00010216   (2.6.7-1.469)
Jul  5 23:35:59 hoho2 kernel: EIP is at vt_ioctl+0x1d/0x17bc
Jul  5 23:35:59 hoho2 kernel: eax: 00000000   ebx: 00005401   ecx:
00005401   edx: 274f5a80
Jul  5 23:35:59 hoho2 kernel: esi: fee826d4   edi: 41f53800   ebp:
00005401   esp: 27bffe88
Jul  5 23:35:59 hoho2 kernel: ds: 007b   es: 007b   ss: 0068
Jul  5 23:35:59 hoho2 kernel: Process rc (pid: 4787,
threadinfo=27bff000 task=4125f3e0)
Jul  5 23:35:59 hoho2 kernel: Stack: 415ad9ac 037ee040 0212bad8
00000001 00000000 415ad914 3a2190c8 27bffee4
Jul  5 23:35:59 hoho2 kernel:        404f2318 037ee040 0eb4d004
00000000 40fcc000 02136100 4144a560 fffffdee
Jul  5 23:35:59 hoho2 kernel:        27bfff68 27bfff04 00000000
415ad9ac 006eecc0 404f2318 40967a80 00000001
Jul  5 23:35:59 hoho2 kernel: Call Trace:
Jul  5 23:35:59 hoho2 kernel:  [<0212bad8>] filemap_nopage+0x13b/0x25f
Jul  5 23:35:59 hoho2 kernel:  [<02136100>] do_no_page+0x215/0x249
Jul  5 23:35:59 hoho2 kernel:  [<02136252>] handle_mm_fault+0x70/0xe8
Jul  5 23:35:59 hoho2 kernel:  [<021134e6>] do_page_fault+0x169/0x492
Jul  5 23:35:59 hoho2 kernel:  [<02147532>] cdev_put+0xc/0x2c
Jul  5 23:35:59 hoho2 kernel:  [<0214063e>] dentry_open+0x15b/0x178
Jul  5 23:35:59 hoho2 kernel:  [<021c5ed7>] tty_ioctl+0x30f/0x344
Jul  5 23:35:59 hoho2 kernel:  [<0214e472>] sys_ioctl+0x1fd/0x233
Jul  5 23:35:59 hoho2 kernel: Code: 8b 28 8b 0c ad 20 30 38 02 89 e8
89 4c 24 2c e8 e1 52 00 00
Jul  5 23:35:59 hoho2 kernel:  <3>vt: argh, driver_data is NULL !
Jul  5 23:35:59 hoho2 kernel: vt: argh, driver_data is NULL !
Jul  5 23:36:00 hoho2 last message repeated 95 times
Jul  5 23:36:36 hoho2 shutdown: shutting down for system halt
Jul  5 23:36:36 hoho2 init: Switching to runlevel: 0
Jul  5 23:36:36 hoho2 kernel: vt: argh, driver_data is NULL !
Jul  5 23:43:55 hoho2 syslogd 1.4.1: restart.
Comment 3 Bob Gustafson 2004-07-06 15:11:38 EDT
See also bug 126947
Comment 4 Bob Gustafson 2004-07-06 15:23:14 EDT
Above Oops was probably from 469smp, not 471smp. I got 471 after the
dates shown above.

Still cannot boot 471smp, but don't see the Oops in the messages file.
Comment 5 David Miller 2004-07-06 16:43:05 EDT
Something appears to be severely problematic in the 471smp
kernels.  Perhaps some non-networking change causes
socket memory to be corrupted or similar.

I really don't think this is a networking bug, and I expect
that once the cause of the 471smp boot problems is solved this
accept01 crash will go away too.

Therefore putting into needinfo state.  When the 471smp
boot problems are resolved, whether the accept01 crash
still occurs should be stated in this bug, and bug state
advanced as appropriate.

Comment 6 David Miller 2004-07-07 00:55:32 EDT
I can now reproduce this locally; moving back to ASSIGNED.
Comment 7 David Miller 2004-07-07 01:06:00 EDT
Created attachment 101675
Fix for ipv4 UDP accept() oops.

During a recent cleanup, the ipv4 datagram accept() method
pointer was erroneously set to inet_accept instead of
sock_no_accept.  This reverts that mistake to fix the bug.
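
The user-visible behaviour this fix restores can be checked from user
space. The following is a minimal sketch, not the LTP accept01 test
itself: it assumes a Linux host, where sock_no_accept reports
EOPNOTSUPP, and the helper name accept_on_udp_errno is hypothetical.
It calls accept() on a bound UDP socket and returns the errno. On an
affected kernel such as 2.6.7-1.471smp the same call would presumably
oops instead of returning an error.

```python
import errno
import socket


def accept_on_udp_errno():
    """Call accept() on a bound UDP socket and return the resulting errno.

    Returns None if accept() unexpectedly succeeds.
    (Hypothetical helper for illustration; not part of LTP.)
    """
    sock = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    try:
        sock.bind(("127.0.0.1", 0))  # let the kernel pick a free local port
        try:
            conn, _ = sock.accept()  # datagram sockets cannot accept
        except OSError as exc:
            return exc.errno
        conn.close()
        return None
    finally:
        sock.close()


if __name__ == "__main__":
    err = accept_on_udp_errno()
    # Print the symbolic errno name when one is known.
    print(errno.errorcode.get(err, err))
```

With the accept pointer set to sock_no_accept, the call fails cleanly
with an error code and the process keeps running, which is what a
test of accept() on a datagram socket would rely on.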
Comment 8 Dave Jones 2004-11-27 17:33:01 EST
mass update for old bugs:

Is this still a problem in the 2.6.9-based kernel update?
