Bug 179545 - lockd kernel panic in nlmclnt_mark_reclaim
Status: CLOSED DUPLICATE of bug 176848
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: i386 Linux
Priority: medium
Severity: high
Assigned To: Peter Staubach
David Lawrence
http://www.ii.uib.no/~hallstei/kernel...
Keywords: Reopened
Reported: 2006-02-01 05:13 EST by Hallstein
Modified: 2007-11-30 17:07 EST
Doc Type: Bug Fix
Last Closed: 2007-01-16 10:46:52 EST


Description Hallstein 2006-02-01 05:13:16 EST
From Bugzilla Helper:
User-Agent: Mozilla/4.0 (compatible; MSIE 6.0; Windows NT 5.1; SV1; .NET CLR 1.1.4322)

Description of problem:
Most recent kernel where this bug did not occur: (unknown)
Distribution: CentOS release 4.2
Hardware Environment: Dell PowerEdge 2650
Software Environment: NFS/SMB Server (samba-3.0.10-1.4E.2)

Problem Description:

We've encountered a kernel panic in lockd (nlmclnt_mark_reclaim) several times
in recent months. This seems to be related to heavy usage on the server.

The server exports NFS-imported (and local) filesystems over Samba.

A picture of the panic is here: 
http://www.ii.uib.no/~hallstei/kernelpanic_screen.jpg

After this panic, the computer stops responding, and we are forced to reboot 
the computer.

Version-Release number of selected component (if applicable):
2.6.9-22.0.1.ELsmp, samba-3.0.10-1.4E.2

How reproducible:
Didn't try

Steps to Reproduce:
Seems related to high load, but otherwise difficult to reproduce.

Actual Results:  Rare kernel panics

Additional info:

Hi. I know this is CentOS, and you probably will not fix this "at once" since we are not a Red Hat customer. However, I think this might be related to bug #176848 (https://bugzilla.redhat.com/bugzilla/show_bug.cgi?id=176848), so I hoped this report would help improve the stability of your existing Red Hat servers.
Comment 1 Jay Fenlason 2006-02-01 09:31:09 EST
This belongs in the CentOS bug tracking database, not ours. 
Comment 2 Nate Straz 2006-09-28 18:25:43 EDT
I hit this panic while running NFS relocation tests w/ GFS and rgmanager.

I was able to reproduce the bug a few times.

Unable to handle kernel NULL pointer dereference at virtual address 0000000c
 printing eip:
222761d8
*pde = 00004001
Oops: 0000 [#1]
SMP
Modules linked in: nfs lockd nfs_acl md5 ipv6 parport_pc lp parport autofs4
i2c_dev i2c_core sunrpc button battery ac uhci_hcd ehci_hcd e1000 floppy
dm_snapshot dm_zero dm_mirror ext3 jbd dm_mod
CPU:    0
EIP:    0060:[<222761d8>]    Not tainted VLI
EFLAGS: 00010246   (2.6.9-42.ELhugemem)
EIP is at nlmclnt_mark_reclaim+0x35/0x4d [lockd]
eax: 00000000   ebx: 03688380   ecx: 0366572c   edx: 03665730
esi: f5503a81   edi: 216c6400   ebp: 00000004   esp: 1cf6ff58
ds: 007b   es: 007b   ss: 0068
Process lockd (pid: 3270, threadinfo=1cf6f000 task=1f9130b0)
Stack: 03688380 2227627c 03688380 216c6800 2227dcc9 b5030002 d2590f0a 00000000
       00000000 216c6800 2227ce3c 22284920 07f81014 22277c57 216c6864 00000004
       22284920 216c7204 216c6800 22162603 000186b5 07f81008 07f81014 216c6864
Call Trace:
 [<2227627c>] nlmclnt_recovery+0x8c/0x118 [lockd]
 [<2227dcc9>] nlm4svc_proc_sm_notify+0xd2/0x100 [lockd]
 [<2227ce3c>] nlm4svc_decode_reboot+0x0/0x74 [lockd]
 [<22277c57>] nlmsvc_dispatch+0x7c/0x122 [lockd]
 [<22162603>] svc_process+0x432/0x6e3 [sunrpc]
 [<22277e80>] lockd+0x183/0x270 [lockd]
 [<22277cfd>] lockd+0x0/0x270 [lockd]
 [<021041f5>] kernel_thread_helper+0x5/0xb
Code: <3>Debug: sleeping function called from invalid context at
include/linux/rwsem.h:43
in_atomic():0[expected: 0], irqs_disabled():1
 [<02120209>] __might_sleep+0x7d/0x88
 [<0215537c>] rw_vm+0xe4/0x29c
 [<222761ad>] nlmclnt_mark_reclaim+0xa/0x4d [lockd]
 [<222761ad>] nlmclnt_mark_reclaim+0xa/0x4d [lockd]
 [<021557f3>] get_user_size+0x30/0x57
 [<222761ad>] nlmclnt_mark_reclaim+0xa/0x4d [lockd]
 [<021061bb>] show_registers+0x115/0x16c
 [<02106352>] die+0xdb/0x16b
 [<02122a14>] vprintk+0x136/0x14a
 [<0211b236>] do_page_fault+0x421/0x5f7
 [<222761d8>] nlmclnt_mark_reclaim+0x35/0x4d [lockd]
 [<022ca5d4>] schedule+0x838/0x8d6
 [<022ca65f>] schedule+0x8c3/0x8d6
 [<022cacbd>] __cond_resched+0x14/0x39
 [<2227a147>] nlm_traverse_files+0x20/0x137 [lockd]
 [<0211ae15>] do_page_fault+0x0/0x5f7
 [<222761d8>] nlmclnt_mark_reclaim+0x35/0x4d [lockd]
 [<2227627c>] nlmclnt_recovery+0x8c/0x118 [lockd]
 [<2227dcc9>] nlm4svc_proc_sm_notify+0xd2/0x100 [lockd]
 [<2227ce3c>] nlm4svc_decode_reboot+0x0/0x74 [lockd]
 [<22277c57>] nlmsvc_dispatch+0x7c/0x122 [lockd]
 [<22162603>] svc_process+0x432/0x6e3 [sunrpc]
 [<22277e80>] lockd+0x183/0x270 [lockd]
 [<22277cfd>] lockd+0x0/0x270 [lockd]
 [<021041f5>] kernel_thread_helper+0x5/0xb
 Bad EIP value.
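
For anyone comparing this trace with the one in bug 176848, the frames can be pulled out mechanically. The following is only a rough sketch, assuming the 2.6-era frame format seen above ("[<address>] symbol+0xoffset/0xsize [module]"). The names FRAME_RE, parse_frames and SAMPLE are made up for illustration and are not part of any kernel tooling; SAMPLE copies four frames from the call trace above.

```python
import re

# Matches frames such as " [<2227627c>] nlmclnt_recovery+0x8c/0x118 [lockd]".
# The trailing "[module]" is optional: core-kernel symbols carry none.
FRAME_RE = re.compile(
    r"\[<(?P<addr>[0-9a-f]+)>\][ \t]+"
    r"(?P<symbol>[A-Za-z_]\w*)\+0x(?P<offset>[0-9a-f]+)/0x(?P<size>[0-9a-f]+)"
    r"(?:[ \t]+\[(?P<module>[\w-]+)\])?"
)


def parse_frames(text):
    """Return one dict per stack frame found in an oops excerpt."""
    frames = []
    for m in FRAME_RE.finditer(text):
        frames.append({
            "addr": int(m.group("addr"), 16),
            "symbol": m.group("symbol"),
            "offset": int(m.group("offset"), 16),
            "size": int(m.group("size"), 16),
            "module": m.group("module"),  # None for core-kernel symbols
        })
    return frames


# Four frames copied from the call trace in this comment.
SAMPLE = """\
 [<2227627c>] nlmclnt_recovery+0x8c/0x118 [lockd]
 [<2227dcc9>] nlm4svc_proc_sm_notify+0xd2/0x100 [lockd]
 [<22162603>] svc_process+0x432/0x6e3 [sunrpc]
 [<021041f5>] kernel_thread_helper+0x5/0xb
"""

if __name__ == "__main__":
    for frame in parse_frames(SAMPLE):
        owner = frame["module"] or "kernel"
        print(f"{owner:8} {frame['symbol']}+{frame['offset']:#x}")
```

Comparing the (symbol, offset) pairs rather than the raw addresses is the point here, since module load addresses can differ between boots and machines.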
Comment 3 Nate Straz 2006-10-02 14:39:22 EDT
I've hit this at least twice now while testing RHEL4 U4 errata, without Samba in
the picture.  I'm going to reassign this to nfs-utils to try to get some more
exposure, although it should probably be something more like nfs-kernel.
Comment 5 Jeff Layton 2007-01-16 10:46:52 EST
I believe this is likely the same problem as bz 176848.


*** This bug has been marked as a duplicate of 176848 ***
