Bug 468939

Summary: Oops in __link_path_walk 2.6.18.92.N (and possibly later).
Product: Red Hat Enterprise Linux 5 Reporter: Wade Mealing <wmealing>
Component: kernel Assignee: Ian Kent <ikent>
Status: CLOSED CURRENTRELEASE QA Contact: Red Hat Kernel QE team <kernel-qe>
Severity: medium Docs Contact:
Priority: medium    
Version: 5.2 CC: akarlsso, bmr, dzickus, ikent, james.leddy, jmoyer, jnansi, tao
Target Milestone: rc   
Target Release: ---   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2009-07-02 16:27:14 UTC Type: ---

Description Wade Mealing 2008-10-29 00:32:41 UTC
Description of problem:

I don't have a lot on this right now, but I'm creating the bug so that others will be able to find it. I've got two oopses from two different machines, one x86_64 and one i686.

The kernel oopses in __link_path_walk while attempting to walk a full path. The specific dentry it oopses on in both of these cases is a mount point for a network filesystem.

At the time of panic, the system doesn't seem to have anything mounted at this point.

It seems as though the dentry referenced by the nameidata has a NULL d_inode, so the path walk ends up dereferencing a NULL inode pointer.

# dentry -l nameidata.dentry ffff8102f8843df8
struct dentry {

 ....

 d_inode = 0x0,

 ...
   name = 0xe54e86cc "fs-mount-point-here"
 ...
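
To make the failure mode concrete, here is a minimal userspace sketch of a dentry whose d_inode is NULL (a negative dentry) and a walk step that refuses to continue from it. The structures are heavily simplified stand-ins for the kernel ones, and walk_step_checked is a hypothetical helper invented for illustration. It is not the actual fix from the autofs patch series; it only models what the walk trips over when d_inode is 0x0.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

/* Simplified userspace stand-ins; the real kernel structures
 * carry many more fields than are shown here. */
struct inode {
    int i_mode;                 /* placeholder field */
};

struct dentry {
    struct inode *d_inode;      /* NULL for a negative dentry */
    const char *name;
};

struct nameidata {
    struct dentry *dentry;      /* current position in the walk */
};

/* Hypothetical helper: returns 0 if the walk may continue from
 * nd->dentry, or -ENOENT if that dentry has no inode attached. */
int walk_step_checked(const struct nameidata *nd)
{
    assert(nd != NULL && nd->dentry != NULL);

    const struct inode *inode = nd->dentry->d_inode;
    if (inode == NULL)
        return -ENOENT;         /* dereferencing inode here would fault */

    return 0;
}
```

With a dentry like the one in the dump above (d_inode = 0x0), this helper returns -ENOENT instead of faulting; code that reads through d_inode without such a check would take a NULL pointer dereference.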

Example oops:

PID: 28520  TASK: ffff810a5bd7d100  CPU: 2   COMMAND: "sh"
 #0 [ffff81044be1da00] crash_kexec at ffffffff800aac4a
 #1 [ffff81044be1dac0] __die at ffffffff800650ff
 #2 [ffff81044be1db00] do_page_fault at ffffffff80066af1
 #3 [ffff81044be1dbf0] error_exit at ffffffff8005dde9
    [exception RIP: __link_path_walk+84]
    RIP: ffffffff800095c6  RSP: ffff81044be1dca8  RFLAGS: 00010206
    RAX: ffff8102f8843df8  RBX: ffff8104b4e2d000  RCX: 0000000000000000
    RDX: 0000000000000088  RSI: ffff81044be1de48  RDI: ffff8104b4e2d000
    RBP: ffff81044be1de48   R8: ffff810a5bd7d100   R9: ffff810a5bd7d100
    R10: ffffffff80097652  R11: ffff81044be1df48  R12: 0000000000000000
    R13: ffff811022ed8b80  R14: ffff8104b4e2d000  R15: ffff8104b4e2d000
    ORIG_RAX: ffffffffffffffff  CS: 0010  SS: 0018
 #4 [ffff81044be1dd10] link_path_walk at ffffffff8000e782
 #5 [ffff81044be1ddd0] do_path_lookup at ffffffff8000c965
 #6 [ffff81044be1de10] __user_walk_fd at ffffffff80023700
 #7 [ffff81044be1de40] vfs_stat_fd at ffffffff80028617
 #8 [ffff81044be1def0] sys_newstat at ffffffff80023432
 #9 [ffff81044be1df80] tracesys at ffffffff8005d28d (via system_call)

Version-Release number of selected component (if applicable):

kernel 2.6.18-92.1.10

How reproducible:

Unknown

Additional info:

Vmcores are available.

Comment 1 Bryn M. Reeves 2008-11-03 14:42:06 UTC
Were either of the two systems using autofs to manage the affected mount point?

Comment 2 Wade Mealing 2008-11-03 23:05:39 UTC
Ah sorry Bryn,

Was waiting for customer to finish test before continuing, but yes.  It looks very similar to another issue that was in EL4 autofs.  I'm waiting for customer to respond to test kernel with autofs patches, then will update this bz.

Comment 3 Ian Kent 2008-11-21 01:32:31 UTC
(In reply to comment #2)

Which kernel revision?

Comment 4 Ian Kent 2008-11-21 02:43:22 UTC
(In reply to comment #3)
> Which kernel revision?

The kernel provided was revision 107, and the RHEL-5.3 autofs patch
series went into revision 106, so we'll have to wait and see how this
goes. I expect it will take a while since, if this is an instance of
the type of autofs issue that the series corrects, it happens only
very occasionally.

Ian

Comment 7 Jatin Nansi 2008-12-01 08:37:47 UTC
Ian,
I spoke to Wade who told me that you have some patches that fix this bug. Are there any test packages available incorporating the patches? I would like the customers for the attached ITs (both mine) to test them out.

Comment 8 Ian Kent 2008-12-01 11:32:52 UTC
All the patches are in the current pre-release RHEL-5.3 kernel.
This is the best candidate for customer testing unless they have
a need to retain their current kernel, in which case I can make
a scratch build for the kernel the customer needs to use.

Ian

Comment 11 Ian Kent 2008-12-02 03:16:41 UTC
Looking at the IT I've noticed another potential problem.

The "exportfs" output doesn't tell us which export is the
root of the export tree. I presume /home1/shares has the
fsid=1 option and is the root of the export tree. But we
also can't tell if the subordinate mounts in the tree have
the "nohide" option. If that is the case then we know that
the NFS kernel client mounting can cause autofs revision
0.rc2.88 to become confused.

The issue has been addressed in a later revision of autofs
so please be aware of it.

Ian

Comment 12 RHEL Program Management 2009-02-16 15:27:43 UTC
Updating PM score.

Comment 14 Ian Kent 2009-07-02 16:27:14 UTC
This bug is addressed in the RHEL-5.3 release kernel, as mentioned
in comments #10 and #11.
Closing as CURRENTRELEASE.