Bug 175628 - kernel panic related to nfsd - general protection fault
Summary: kernel panic related to nfsd - general protection fault
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: kernel
Version: 4.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Steve Dickson
QA Contact: Brian Brock
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2005-12-13 15:40 UTC by Rigoberto Corujo
Modified: 2012-06-20 15:59 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-06-20 15:59:24 UTC
Target Upstream Version:
Embargoed:



Description Rigoberto Corujo 2005-12-13 15:40:14 UTC
Description of problem:

Kernel: 2.6.9-11.4hp.XCsmp

This occurred on a 48-node cluster.

Dec 06 13:56:32 mtrr: type mismatch for f6000000,800000 old: uncachable new: write-combining
Dec 10 09:30:01 general protection fault: 0000 [1] SMP 
Dec 10 09:35:27 CPU 0 
Dec 10 09:35:27 Modules linked in: md5(U) ipv6(U) llite(U) mdc(U) lov(U) osc(U) ptlrpc(U) obdclass(U) lvfs(U) kvibnal(U) ksocknal(U) portals(U) libcfs(U) ip_vs_rr(U) ip_vs(U) parport_pc(U) lp(U) parport(U) supermon_sensors(U) supermon_proc(U) autofs4(U) i2c_dev(U) i2c_core(U) nfsd(U) exportfs(U) lockd(U) sunrpc(U) ds(U) yenta_socket(U) pcmcia_core(U) ipoib_ud(U) ats(U) devugsi(U) devucm(U) ibat(U) cm(U) gsim(U) ad_tavor(U) vverbs(U) mlog(U) repository(U) hadump(U) mod_ib_mgt(U) mod_vapi(U) mod_vipkl(U) mod_thh(U) mod_hh(U) mod_vapi_common(U) mod_mpga(U) mosal(U) ipt_REJECT(U) ipt_state(U) ip_conntrack(U) iptable_filter(U) ip_tables(U) dm_mod(U) button(U) battery(U) ac(U) ohci_hcd(U) hw_random(U) e1000(U) tg3(U) floppy(U) ext3(U) jbd(U) sata_nv(U) mptscsih(U) mptbase(U) ata_piix(U) libata(U) cciss(U) sd_mod(U) scsi_mod(U)
Dec 10 09:35:27 Pid: 3049, comm: nfsd Tainted: PF     2.6.9-11.4hp.XCsmp
Dec 10 09:35:27 RIP: 0010:[<ffffffff801cfb59>] <ffffffff801cfb59>{strcmp+0}
Dec 10 09:35:27 RSP: 0000:00000100ddffdde0  EFLAGS: 00010282
Dec 10 09:35:27 RAX: 00000000ffffff93 RBX: dead4ead00000001 RCX: 0000000000000000
Dec 10 09:35:27 RDX: 0000000000000001 RSI: 00000100ddffde50 RDI: dead4ead00000029
Dec 10 09:35:27 RBP: 00000100ddffde28 R08: 0000006e66736404 R09: 572666a0d6636404
Dec 10 09:35:27 R10: 00000100ddfc4028 R11: ffffffffa03556eb R12: 0000000000000000
Dec 10 09:35:27 R13: 000001006bf7c580 R14: 0000000000000000 R15: ffffffffa0371690
Dec 10 09:35:27 FS:  0000002a9589fb00(0000) GS:ffffffff804b8200(0000) knlGS:00000000f61b8bb0
Dec 10 09:35:27 CS:  0010 DS: 0000 ES: 0000 CR0: 000000008005003b
Dec 10 09:35:27 CR2: 00000000005b4960 CR3: 0000000000101000 CR4: 00000000000006e0
Dec 10 09:35:27 Process nfsd (pid: 3049, threadinfo 00000100ddffc000, task 000001007feaa030)
Dec 10 09:35:27 Stack: ffffffffa035518f 0000000000000007 000001007fd2c400 000001007fd2c400
Dec 10 09:35:27        0000000000000001 0000000000000003 00000000000186a3 ffffffffa03bc900
Dec 10 09:35:27        ffffffffa0355738 000001007fd2c468
Dec 10 09:35:27 Call Trace:<ffffffffa035518f>{:sunrpc:ip_map_lookup+276} <ffffffffa0355738>{:sunrpc:svcauth_unix_set_client+77}
Dec 10 09:35:27        <ffffffff801437b3>{groups_alloc+64} <ffffffffa0355acd>{:sunrpc:svcauth_unix_accept+409}
Dec 10 09:35:27        <ffffffffa0354ad1>{:sunrpc:svc_set_client+58} <ffffffffa03520c3>{:sunrpc:svc_process+775}
Dec 10 09:35:27        <ffffffff80131a32>{default_wake_function+0} <ffffffffa038d245>{:nfsd:nfsd+0}
Dec 10 09:35:27        <ffffffffa038d47d>{:nfsd:nfsd+568} <ffffffff80110d4b>{child_rip+8}
Dec 10 09:35:27        <ffffffffa038d245>{:nfsd:nfsd+0} <ffffffffa038d245>{:nfsd:nfsd+0}
Dec 10 09:35:27        <ffffffff80110d43>{child_rip+0}
Dec 10 09:35:27 
Dec 10 09:35:27 Code: 0f b6 17 89 d0 2a 06 48 ff c6 84 c0 75 07 48 ff c7 84 d2 75
Dec 10 09:35:27 Reconfiguring memory bank information....
Dec 10 09:35:27 This may take a while....
Dec 10 09:35:27 unexpected IRQ trap at vector d8
Dec 10 09:35:27 CPU #0 is dumping; frozen CPUs: #1 
Dec 10 09:35:27 Dumping to block device (104,6) on CPU 0 ...
Dec 10 09:35:27 ............../
.|
Dec 10 09:35:27  84170 dump pages saved of 4096 each in pass 0
Dec 10 09:35:27 
Dec 10 09:35:27  842544 dump pages skipped of 4096 each in pass 1
Dec 10 09:35:27 
Dec 10 09:35:27  21399 dump pages skipped of 4096 each in pass 2
Dec 10 09:35:27 
Dec 10 09:35:27  0 dump pages skipped of 4096 each in pass 3
Dec 10 09:35:27 

We see a call to "svcauth_unix_set_client", which then calls "ip_map_lookup",
which then calls the built-in "strcmp".  I had to run "svcauth_unix.c" through
the C preprocessor, because otherwise one could spend years trying to find
"ip_map_lookup" in the code and never find it.  It is created by the
"DefineSimpleCacheLookup" macro in
"obj/x86_64/kernel-2.6.9/linux-2.6.9/include/linux/sunrpc/cache.h".

Anyway, in "ip_map_lookup", just before the call to "ip_map_match", which is
an "inline" routine (that may be why it doesn't show up in the stack trace),
isn't the "read_lock" supposed to be acquired before the "head = ..." line?

Say that we execute the "head = ..." line and then some other kernel code
trashes the address that "head" was pointing to before the "read_lock" is
acquired.  Since "tmp" is reached through "head" and we're doing a "strcmp" on
"tmp", wouldn't the possibility exist that "strcmp" would try to access a bad
address pointed to by "tmp"?

----------------------------------

FILE: obj/x86_64/kernel-2.6.9/linux-2.6.9/net/sunrpc/svcauth_unix.c

svcauth_unix_set_client(struct svc_rqst *rqstp)
{
        struct ip_map key, *ipm;

        rqstp->rq_client = NULL;

        if (rqstp->rq_proc == 0)
                return SVC_OK;

        strcpy(key.m_class, rqstp->rq_server->sv_program->pg_class);
        key.m_addr = rqstp->rq_addr.sin_addr;

        ipm = ip_map_lookup(&key, 0);
        ...
}

FILE: obj/x86_64/kernel-2.6.9/linux-2.6.9/net/sunrpc/svcauth_unix.i 
(Preprocessed file which has the "DefineSimpleCacheLookup" macro in "cache.h" 
expanded)

static struct ip_map *ip_map_lookup (struct ip_map *item, int set)
{
  struct ip_map *tmp, *new=((void *)0);

  struct cache_head **hp, **head; ;

  head = &(& ip_map_cache)->hash_table[ip_map_hash(item)];

  retry:

        if (set||new) _write_lock(&(& ip_map_cache)->hash_lock);
        else _read_lock(&(& ip_map_cache)->hash_lock);

        for(hp=head; *hp != ((void *)0); hp = &tmp->h.next)
        {
                tmp = ({ const typeof( ((struct ip_map *)0)->h ) *__mptr = (*hp);
                (struct ip_map *)( (char *)__mptr - ((size_t) &((struct ip_map *)0)->h) );});

                if (ip_map_match(item, tmp))
                ...
}

FILE: obj/x86_64/kernel-2.6.9/linux-2.6.9/net/sunrpc/svcauth_unix.i 
(Preprocessed File)

static inline int ip_map_match(struct ip_map *item, struct ip_map *tmp)
{
 return __builtin_strcmp(tmp->m_class, item->m_class) == 0
  && tmp->m_addr.s_addr == item->m_addr.s_addr;
}
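
On the locking question above, a user-space model may help. It is only a sketch under stated assumptions, not the kernel code: the "model_*" names, the table size, and the lock stubs are hypothetical, and the structures are trimmed to the fields the chain walk needs. It illustrates that the "head = ..." line only computes the address of a bucket slot and reads no chain pointer; every dereference ("*hp", "tmp->m_class") happens after the read lock is taken.

```c
#include <stddef.h>
#include <string.h>

#define MODEL_HASHMAX 256            /* hypothetical table size */

struct cache_head { struct cache_head *next; };   /* trimmed model */
struct ip_map {
    struct cache_head h;
    char m_class[8];
    unsigned int m_addr;             /* stands in for struct in_addr */
};

static struct cache_head *model_table[MODEL_HASHMAX];
static int model_readers;            /* stub standing in for hash_lock */

static void model_read_lock(void)   { model_readers++; }
static void model_read_unlock(void) { model_readers--; }

/* Same pointer arithmetic as the expanded container_of in the listing. */
#define model_entry(ptr) \
    ((struct ip_map *)((char *)(ptr) - offsetof(struct ip_map, h)))

/* Push an entry onto the front of its bucket's chain. */
static void model_insert(struct ip_map *item, unsigned int hash)
{
    struct cache_head **head = &model_table[hash % MODEL_HASHMAX];

    item->h.next = *head;
    *head = &item->h;
}

static struct ip_map *model_lookup(const struct ip_map *item, unsigned int hash)
{
    struct cache_head **hp;
    struct ip_map *found = NULL;
    /* Address computation only: nothing in the chain is read here. */
    struct cache_head **head = &model_table[hash % MODEL_HASHMAX];

    model_read_lock();
    for (hp = head; *hp != NULL; hp = &(*hp)->next) {
        struct ip_map *tmp = model_entry(*hp);

        /* First point at which a bad "next" pointer would be dereferenced. */
        if (strcmp(tmp->m_class, item->m_class) == 0 &&
            tmp->m_addr == item->m_addr) {
            found = tmp;
            break;
        }
    }
    model_read_unlock();
    return found;
}
```

If this model is faithful to the real routine, a faulting "strcmp" would point at a corrupted or stale chain entry rather than at the position of the "head" assignment relative to the lock.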

---------------------------------------------------

We had someone analyze the crash dump and this is what they reported:

After spending about 3 hours looking at the source code, the log, and the 
crash dump, here's what we've come up with:

1) The crash is occurring in the code that looks up entries in the
   "ip_map" table that NFS uses to keep track of its clients.

2) All the "cache head" entries in the table are zero, except
   one. (ineffective hash function)

3) Apparently all the clients map to that same non-zero cache head.  

4) After finding the correct cache head, the code walks down the list
   of "ip map" entries, trying to match the current entry with the IP of
   the new request.

5) The cluster has 48 members.

6) In the 35th entry, the "next" pointer is pointing to a structure
   that is clearly NOT a valid "ip map" entry.  This structure happens,
   apparently by accident, to have a "next" field that is a
   "non-canonical" address (it is a "magic" value component of a
   spinlock).  So, when the loop tries to walk to this invalid "next"
   field, it trips an invalid address fault - this triggers the crash.

   So, either a valid "ip map" entry has been clobbered, or (more likely)
   the previous "next" value has been clobbered (or is stale).  In any
   case, it seems unlikely that we can make further progress without
   understanding what the NFS server on the cluster is
   supporting, and being able to monitor its data structures.

7) It seems reasonable to assume that each of the cluster
   members is an NFS client, although this needs to be checked.  If so,
   there ought to be 47 entries in the chain.  So, perhaps a complete
   chain has been clobbered somewhere in the middle?
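
The "non-canonical" address in item 6 can be checked directly against the register dump. This is a minimal sketch, assuming 48-bit virtual addressing (bits 63..47 must all be equal for an address to be canonical); the helper names are hypothetical. On x86_64, dereferencing a non-canonical address raises a general protection fault rather than a page fault, which matches the oops header.

```c
#include <stdint.h>

/* Register values copied from the oops above. */
#define OOPS_RBX 0xdead4ead00000001ULL
#define OOPS_RDI 0xdead4ead00000029ULL   /* first argument to strcmp */
#define OOPS_RSI 0x00000100ddffde50ULL   /* second argument to strcmp */

/* Canonical under 48-bit addressing: bits 63..47 are all 0 or all 1. */
static int is_canonical48(uint64_t addr)
{
    uint64_t top = addr >> 47;           /* the 17 most significant bits */

    return top == 0 || top == 0x1ffff;
}

/* Upper 32 bits of a value, to compare against the 0xdead4ead pattern. */
static uint32_t upper32(uint64_t v)
{
    return (uint32_t)(v >> 32);
}
```

With these helpers, is_canonical48(OOPS_RDI) is 0 while is_canonical48(OOPS_RSI) is 1, and upper32(OOPS_RBX) is 0xdead4ead. OOPS_RDI - OOPS_RBX is 0x28, which would be consistent with "strcmp" being handed a field at a fixed offset inside the bogus entry; that last point is an inference from the registers, not something the dump analysis states.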



Version-Release number of selected component (if applicable):


How reproducible:

Have not tried to reproduce.

Steps to Reproduce:
1.
2.
3.
  
Actual results:


Expected results:


Additional info:

Comment 1 Jason Baron 2005-12-16 14:59:30 UTC
Can you please describe the 2.6.9-11.4hp.XCsmp kernel? That is not a Red Hat
supported kernel.

Comment 2 Rigoberto Corujo 2005-12-16 15:12:06 UTC
It is a Red Hat kernel with additional patches for Quadrics, Infiniband, etc.  
We rebuild the kernel with the additional patches, hence the "4hp" in the name 
to indicate that it's been updated 4 times.

After looking through some of the code related to the crash, we think that this 
fix (see below) that went into the 2.6.12 kernel may be relevant.

Rigoberto

------------------------------------------------

http://kernel.org/git/?p=linux/kernel/git/torvalds/old-2.6-bkcvs.git;a=commit;h=851dfd8298dbd31358ae1df9ef3c6ab1453141c7

[PATCH] nfsd: discard CACHE_HASHED flag, keeping information in refcount 
instead.

author    neilb <neilb>  Sat, 5 Mar 2005 17:15:09 +0000
committer neilb <neilb>  Sat, 5 Mar 2005 17:15:09 +0000
commit    851dfd8298dbd31358ae1df9ef3c6ab1453141c7
tree      a5c009bd58b82019d5a57cb3b530f87cf77da001
parent    4881bd0daf953852e016a4dc91a4e2c9cebe1542

This patch should fix a problem that has been experienced on at-least one
busy NFS server, but it has not had lots of testing yet.  If -mm could provide
that .....

The rpc auth cache currently differentiates between a reference due to
being in a hash chain (signalled by CACHE_HASHED flag) and any other
reference (counted in refcnt).

This is an artificial difference due to an historical accident, and it
makes cache_put unsafe.

This patch removes the distinction so now existance in a hash chain is
counted just like any other reference.  Thus a race window in cache_put is
closed.

Signed-off-by: Neil Brown <neilb.edu.au>
Signed-off-by: Andrew Morton <akpm>
Signed-off-by: Linus Torvalds <torvalds>
BKrev: 4229e91dksfEwyIcWvN9kqVoJyUxQg

include/linux/sunrpc/cache.h
net/sunrpc/cache.c
net/sunrpc/svcauth.c
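
To illustrate the idea in the commit message, here is a toy model. It is not the kernel code and not the patch itself, and all "toy_*" names are hypothetical. It assumes the simplest possible scheme: membership in the hash chain holds one reference like any other holder, so an entry can only be released after it has been unlinked and every other reference has been dropped. It is single-threaded; real code would need atomic counters and the hash lock.

```c
#include <stddef.h>

struct toy_entry {
    struct toy_entry *next;
    int refcnt;      /* counts the chain's reference too */
    int freed;       /* set instead of calling free(), so callers can inspect it */
};

static struct toy_entry *toy_chain;

static void toy_get(struct toy_entry *e)
{
    e->refcnt++;
}

/* Drop one reference; the entry is released only when none remain. */
static void toy_put(struct toy_entry *e)
{
    if (--e->refcnt == 0)
        e->freed = 1;
}

/* Linking into the chain takes a reference on behalf of the chain. */
static void toy_hash(struct toy_entry *e)
{
    toy_get(e);
    e->next = toy_chain;
    toy_chain = e;
}

/* Unlinking drops the chain's reference through the same toy_put path. */
static void toy_unhash(struct toy_entry *e)
{
    struct toy_entry **pp;

    for (pp = &toy_chain; *pp != NULL; pp = &(*pp)->next) {
        if (*pp == e) {
            *pp = e->next;
            e->next = NULL;
            toy_put(e);
            return;
        }
    }
}
```

In this model an entry that is still linked can never reach a count of zero through an ordinary toy_put, so a chain walker cannot meet a released entry, which is the kind of stale "next" pointer the dump analysis above suspects.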


Comment 3 Jason Baron 2005-12-19 19:27:33 UTC
Thanks for the pointer. Also, we're planning InfiniBand support for U3.

Comment 4 Steve Dickson 2006-07-18 16:19:54 UTC
The patch in Comment #2 has already been committed, so it is not clear (at least 
to me) what needs to happen...

Comment 6 Steve Dickson 2006-07-18 21:06:50 UTC
Please try the latest RHEL3 U3 kernel to see if it resolves this issue.

Comment 8 Jiri Pallich 2012-06-20 15:59:24 UTC
Thank you for submitting this issue for consideration in Red Hat Enterprise Linux. The release for which you requested us to review is now End of Life. 
Please see https://access.redhat.com/support/policy/updates/errata/

If you would like Red Hat to re-consider your feature request for an active release, please re-open the request via appropriate support channels and provide additional supporting details about the importance of this issue.

