Bug 132153

Summary: 643 netconsole oops
Product: [Fedora] Fedora
Reporter: Warren Togami <wtogami>
Component: kernel
Assignee: Jeff Moyer <jmoyer>
Status: CLOSED RAWHIDE
QA Contact: Brian Brock <bbrock>
Severity: medium
Priority: medium
Version: rawhide
CC: davej, jmoyer, jturner, wtogami
Hardware: All
OS: Linux
Doc Type: Bug Fix
Last Closed: 2004-11-30 22:13:57 UTC
Attachments:
Description                                              Flags
Fix for dereferencing a null ifa_list in netpoll_setup.  none
Fix for netconsole module init code                      none

Description Warren Togami 2004-09-09 10:40:18 UTC
Description of problem:
Unable to handle kernel NULL pointer dereference at virtual address 00000010
 printing eip:
022af966
*pde = 00000000
Oops: 0000 [#1]
Modules linked in: netconsole md5 ipv6 parport_pc lp parport autofs4
ds sunrpc microcode dm_mod button battery asus_acpi ac yenta_socket
pcmcia_core uhci_hcd ehci_hcd radeonfb i2c_algo_bit i2c_core
snd_intel8x0m snd_intel8x0 snd_ac97_codec snd_pcm_oss snd_mixer_oss
snd_pcm snd_timer snd_page_alloc gameport snd_mpu401_uart snd_rawmidi
snd_seq_device snd soundcore e1000 aes_i586 airo xfs
CPU:    0
EIP:    0060:[<022af966>]    Not tainted VLI
EFLAGS: 00010206   (2.6.8-1.551)
EIP is at netpoll_setup+0x200/0x389
eax: 00000000   ebx: 428fe7ec   ecx: 428fe7ec   edx: 428fe7ec
esi: 428fe7c0   edi: 409eb000   ebp: fffd5105   esp: 3b993f84
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 3933, threadinfo=3b993000 task=3b0959b0)
Stack: 428fe7c4 00000000 ffffffff 00000000 428fec21 02346440 428fe0da 02346480
       428fe900 02346440 0213aa37 3b993fc4 08e85970 00000000 3b993000 ffff4200
       08e8b8b8 0000123c 08e85970 08e85970 00000000 feea6b68 00000080 0000007b
Call Trace:
 [<428fe0da>] init_netconsole+0x53/0xb6 [netconsole]
 [<0213aa37>] sys_init_module+0x1fd/0x2e5
Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1
 [<0211c62b>] __might_sleep+0x7d/0x8a
 [<0215bcfe>] rw_vm+0x216/0x482
 [<022af93b>] netpoll_setup+0x1d5/0x389
 [<022af93b>] netpoll_setup+0x1d5/0x389
 [<0215c450>] get_user_size+0x30/0x57
 [<022af93b>] netpoll_setup+0x1d5/0x389
 [<021067ef>] show_registers+0x109/0x15e
 [<021069f3>] die+0x14a/0x248
 [<0211911a>] do_page_fault+0x0/0x50d
 [<0211911a>] do_page_fault+0x0/0x50d
 [<021194cb>] do_page_fault+0x3b1/0x50d
 [<022af966>] netpoll_setup+0x200/0x389
 [<0211ac67>] activate_task+0x53/0x5f
 [<0211ccdc>] autoremove_wake_function+0xd/0x2d
 [<0211b6f3>] __wake_up_common+0x36/0x51
 [<0211b79b>] __wake_up+0x8d/0xf2
 [<0211911a>] do_page_fault+0x0/0x50d
 [<428f007b>] snd_pcm_oss_playback_ready+0x2b/0x57 [snd_pcm_oss]
 [<022af966>] netpoll_setup+0x200/0x389
 [<428fe0da>] init_netconsole+0x53/0xb6 [netconsole]
 [<0213aa37>] sys_init_module+0x1fd/0x2e5
 Bad EIP value.

Version-Release number of selected component (if applicable):
kernel-2.6.8-1.551

Comment 1 Jeff Moyer 2004-10-12 12:51:11 UTC
Dave already has a patch set which addresses this.

Comment 2 Dave Anderson 2004-10-13 19:53:13 UTC
A patch that addresses this problem was part of a
netdump/netconsole/diskdump patch set that was committed
to the kernel/devel tree yesterday (10/12):

http://post-office.corp.redhat.com/archives/cvs-commits-list/2004-October/msg01066.html



Comment 3 Bill Nottingham 2004-10-15 15:45:57 UTC
Please test with current kernels.

Comment 4 Warren Togami 2004-10-15 22:52:17 UTC
Still happens in 2.6.8-1.610

Comment 5 Dave Anderson 2004-10-18 13:16:05 UTC
What version of the netdump user package is running on the
client machine?  

Comment 6 Warren Togami 2004-10-18 13:19:07 UTC
This is not netdump but netconsole. It is similar in purpose, but it
works on all archs (netdump is x86-only). It uses plain UDP with no
authentication.

Comment 7 Jeff Moyer 2004-10-18 13:55:41 UTC
How are you loading the module?  What parameters are passed?  I cannot
reproduce your problem on the 610 kernel.  What hardware are you
using? UP or SMP kernel?

-Jeff

Comment 8 Warren Togami 2004-10-18 14:28:38 UTC
modprobe netconsole netconsole=@/eth1,6667.16.102/

The oops happens:

1) When you attempt to use netconsole on a device that does not
support polling. dmesg will tell you if it failed.
2) During module unload after that failure.

Comment 9 Jeff Moyer 2004-10-18 15:01:55 UTC
Back to my question, what hardware are you using?  Namely, which
ethernet card?  Do we have access to the system displaying the problem?

Comment 10 Warren Togami 2004-10-18 15:07:59 UTC
Oops, I have seen this problem on airo.ko and 3c59x.ko.  I have access
to this hardware, one being my laptop.


Comment 11 Bill Nottingham 2004-10-20 04:54:51 UTC
Does this still happen with .639?

Comment 12 Warren Togami 2004-10-20 04:57:15 UTC
It did in .637 earlier today. Do you really need .639 tested?

Comment 13 Bill Nottingham 2004-10-20 05:03:59 UTC
Nah, that should be close enough.

Comment 14 Jeff Moyer 2004-10-27 18:57:30 UTC
Created attachment 105861 [details]
Fix for dereferencing a null ifa_list in netpoll_setup.

Warren, can you try the attached patch, please?

Comment 15 Warren Togami 2004-10-28 04:21:47 UTC
Tried 643 + your patch on i686.

netconsole: local port 6665
netconsole: interface eth0
netconsole: remote port 6667
netconsole: remote IP 172.31.16.102
netconsole: remote ethernet address ff:ff:ff:ff:ff:ff
netconsole: eth0 doesn't support polling, aborting.
netconsole: failed to configure syslog service
netconsole: network logging started
Unable to handle kernel NULL pointer dereference at virtual address 00000184
 printing eip:
022b710c
*pde = 00000000
Oops: 0002 [#1]
Modules linked in: netconsole radeon md5 ipv6 parport_pc lp parport
autofs4 i2c_dev i2c_core ds sunrpc microcode dm_mod button battery ac
yenta_socket pcmcia_core uhci_hcd ehci_hcd snd_intel8x0m snd_intel8x0
snd_ac97_codec snd_pcm_oss snd_mixer_oss snd_pcm snd_timer
snd_page_alloc gameport snd_mpu401_uart snd_rawmidi snd_seq_device snd
soundcore e1000 aes_i586 airo xfs
CPU:    0
EIP:    0060:[<022b710c>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-1.643.builder3)
EIP is at netpoll_cleanup+0x143/0x152
eax: 00000000   ebx: 42b14cc0   ecx: 0234f600   edx: 42b15180
esi: 023523e0   edi: 00000000   ebp: 36c26000   esp: 36c26f60
ds: 007b   es: 007b   ss: 0068
Process modprobe (pid: 3808, threadinfo=36c26000 task=3659e030)
Stack: 42b15200 023523e0 42b14278 02139c39 00000000 6374656e 6f736e6f 0000656c
       00000000 39512a00 f6fff000 f7000000 021565c3 39512a00 3bc6cba0 021569a2
       3bc6c660 39512a00 36c26fc0 00000004 0211937e 36c26fc4 00000000 0892d0e8
Call Trace:
 [<42b14278>] cleanup_netconsole+0x1d/0x31 [netconsole]
 [<02139c39>] sys_delete_module+0x132/0x179
 [<021565c3>] unmap_vma_list+0xe/0x17
 [<021569a2>] do_munmap+0x20e/0x218
 [<0211937e>] do_page_fault+0x0/0x511
Code: <3>Debug: sleeping function called from invalid context at include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1
 [<0211c8b9>] __might_sleep+0x7d/0x88
 [<0215e282>] rw_vm+0x216/0x482
 [<022b70e1>] netpoll_cleanup+0x118/0x152
 [<022b70e1>] netpoll_cleanup+0x118/0x152
 [<0215e9d4>] get_user_size+0x30/0x57
 [<022b70e1>] netpoll_cleanup+0x118/0x152
 [<0210682b>] show_registers+0x109/0x15e
 [<02106a2f>] die+0x14a/0x241
 [<0211937e>] do_page_fault+0x0/0x511
 [<0211937e>] do_page_fault+0x0/0x511
 [<02119733>] do_page_fault+0x3b5/0x511
 [<022b710c>] netpoll_cleanup+0x143/0x152
 [<02153a08>] do_no_page+0x3b5/0x434
 [<02153c7c>] handle_mm_fault+0xe4/0x21e
 [<02151de6>] follow_page_pfn+0xec/0xfd
 [<0211937e>] do_page_fault+0x0/0x511
 [<022b710c>] netpoll_cleanup+0x143/0x152
 [<42b14278>] cleanup_netconsole+0x1d/0x31 [netconsole]
 [<02139c39>] sys_delete_module+0x132/0x179
 [<021565c3>] unmap_vma_list+0xe/0x17
 [<021569a2>] do_munmap+0x20e/0x218
 [<0211937e>] do_page_fault+0x0/0x511
 Bad EIP value.


Comment 16 Jeff Moyer 2004-10-28 15:23:52 UTC
Created attachment 105897 [details]
Fix for netconsole module init code

Warren, I've tested this patch and it works for me.  Please give it a try and
let me know the results.

Comment 17 Jeff Moyer 2004-10-28 15:24:24 UTC
I should note that this patch should be applied in addition to the previous one.

Comment 18 Warren Togami 2004-10-28 17:28:51 UTC
You mean the patches in Comment #14 and Comment #16 should both be applied?


Comment 19 Jeff Moyer 2004-10-28 17:30:51 UTC
Yes, that is correct.

Thanks!

Comment 20 Warren Togami 2004-10-29 07:14:27 UTC
The combination of those two patches prevents the module from loading
in error cases like this, so it avoids the oops during unloading. Is
this intended?

I won't be able to test the normal non-error case until the weekend.

Comment 21 Jeff Moyer 2004-10-29 13:18:51 UTC
Yes, basically, here is what is happening.  You don't specify a local IP
address, and one cannot be obtained automatically.  In this case, how are we
supposed to send IP datagrams?  We can't.  So, if you want your setup to work,
you need to specify a source address.  So, you want to change your modprobe line:
  modprobe netconsole netconsole=@/eth1,6667.16.102/
to include the source IP address after the @ and before the /eth1.  If you don't
know your IP address, then this sounds like quite an unusual configuration.

If you need more help with this, please contact me via email.  I'd be happy to help.

Thanks.