Bug 158618 - kernel panic when mounting filesystem
Summary: kernel panic when mounting filesystem
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat Enterprise Linux 3
Classification: Red Hat
Component: autofs
Version: 3.0
Hardware: i386
OS: Linux
Priority: medium
Severity: high
Target Milestone: ---
Assignee: Jeff Moyer
QA Contact: Brock Organ
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2005-05-24 05:34 UTC by David L. Crow
Modified: 2007-11-30 22:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2005-05-25 17:11:25 UTC
Target Upstream Version:
Embargoed:


Attachments
Full kernel dump via netdump and syslog (7.54 KB, text/plain)
2005-05-24 05:36 UTC, David L. Crow

Description David L. Crow 2005-05-24 05:34:22 UTC
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7.8) Gecko/20050511 Firefox/1.0.4

Description of problem:
up2date upgraded my system to RHEL3 Update 5 over the weekend.  I installed
the new kernel (2.4.21-32.ELsmp) and rebooted to activate it.  After that, the
kernel panicked whenever the automounter mounted a filesystem from the /home
map.

/etc/auto.master contains:

   /misc   /etc/auto.misc
   /net    /etc/auto.net
   /home   /etc/auto.home

/etc/auto.home is an executable file that contains

  #!/bin/sh
  key="$1"
  /usr/bin/ypmatch -k "$key" auto.home | sed "s,\&$,$key,"

(This is to work around the lack of support for the '&' token.)

I downgraded to autofs-4.1.3-47 and the problem went away.

I enabled netdump to capture the kernel panic via remote syslog.  The first
few lines look like this (I'll attach the entire message):

  Unable to handle kernel NULL pointer dereference at virtual address 00000040
   printing eip:
  c011ff4d
  *pde = 2810f001
  *pte = 420d0067
  Oops: 0000
  autofs4 netconsole nfsd lockd sunrpc usbserial lp parport tg3 floppy sg microcode loop keybdev mousedev hid input usb-ohci usbcore ext3 jbd lvm-mod aacraid sd
  CPU:    2
  EIP:    0060:[<c011ff4d>]    Not tainted
  EFLAGS: 00010086

  EIP is at do_page_fault [kernel] 0x2d (2.4.21-32.ELsmp/i686)
  eax: 00000000   ebx: e80aa000   ecx: 00000040   edx: e80aa1a8
  esi: e80aa000   edi: c011ff20   ebp: c6f1abc0   esp: e80aa0dc
  ds: 0068   es: 0068   ss: 0068
  Process automount (pid: 3947, stackpage=e80a9000)
  Stack: 00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000040
         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000
         00000000 00000000 00000000 00000000 00000000 00000000 00000000 00000000

This is the message with page space disabled (swapoff -a).  When page space
was enabled, the primary message was

  Unable to handle kernel paging request at virtual address 92ad6380

I can get the entire dump if required.

Version-Release number of selected component (if applicable):
autofs-4.1.3-130

How reproducible:
Always

Steps to Reproduce:
1. ls /home/foo (where foo is an entry in the auto.home map)


Actual Results:  kernel panic (see description above)

Expected Results:  a directory listing of /home/foo

Additional info:

Comment 1 David L. Crow 2005-05-24 05:36:34 UTC
Created attachment 114763
Full kernel dump via netdump and syslog

Comment 2 Jeff Moyer 2005-05-24 14:48:34 UTC
First, autofs does in fact support the & token.  So you could have an entry like:

*    server:/export/&
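
As a rough illustration of the substitution, the sketch below mimics in plain shell what happens to the `&` in a wildcard entry. The helper name expand_entry is hypothetical and not part of autofs; it also assumes the key contains no characters that are special to sed (such as `,`, `&` or `\`).

```shell
#!/bin/sh
# Illustrative only: mimics how the '&' in a wildcard map entry is
# replaced by the lookup key.  expand_entry is a hypothetical helper,
# not an autofs command.
expand_entry() {
    key="$1"      # directory name being looked up, e.g. "foo"
    entry="$2"    # right-hand side of the map entry
    # Replace every '&' in the entry with the key.
    printf '%s\n' "$entry" | sed "s,&,$key,g"
}

# Looking up "foo" against the entry above:
expand_entry foo 'server:/export/&'    # prints server:/export/foo
```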

Can you please attach the vmcore file created by netdump?

Thanks.

Comment 3 David L. Crow 2005-05-25 16:45:49 UTC
Thanks for the clarification on the & token.

I tried to create a vmcore file with netdump, but was not able to.

On the failing machine (the netdump client), I see the following in syslog output:

  May 25 00:40:45 host1 kernel: netlog: network logging started up successfully!
  May 25 00:40:45 host1 netdump: initializing netdump succeeded

My Ethernet device is a tg3, which isn't listed as supported in the white paper
at <http://www.redhat.com/support/wpapers/redhat/netdump/>.  However, that white
paper says that netdump will complain if it finds an unsupported adapter, and it
did not.

On the netdump server machine, the dump information is shown, followed by:

  May 24 22:04:37 host1 CPU#0 is executing netdump.
  May 24 22:04:37 host1 CPU#2 is frozen.
  May 24 22:04:37 host1 < netdump activated - performing handshake with the server. >
  May 24 22:05:25 host2 netdump[2688]: Got too many timeouts in handshaking, ignoring client 0x....a00e
  May 24 22:05:28 host2 netdump[2688]: Got too many timeouts waiting for SHOW_STATUS for client 0x....a00e, rebooting it

Any suggestions as to what might be the problem would be appreciated.

Comment 4 Jeff Moyer 2005-05-25 16:49:36 UTC
I've managed to reproduce the problem and get a netdump.

The issue you are running into is a stack overflow, and as such, netdump cannot
be relied on to run afterwards.

I am currently debugging the problem, and will keep this bug updated with status.

Thanks.

Comment 5 Jeff Moyer 2005-05-25 16:57:34 UTC
Could you please attach the output from your script when passed a valid home
directory?  It actually looks like you are triggering a recursive bind mount,
which will definitely cause problems!

Comment 6 David L. Crow 2005-05-25 17:04:39 UTC
userid@host1:/home/userid> sh /etc/auto.home userid
userid server.central:/export/home11/userid

As an FYI, I changed the home configuration in auto.master to

  yp:auto.home

and still saw the problem.


Comment 7 Jeff Moyer 2005-05-25 17:11:25 UTC
> userid server.central:/export/home11/userid

That line is wrong.  Compare it with the output from the auto.net script, e.g.:

# sh /etc/auto.net somehost
-fstype=nfs,hard,intr,nodev,nosuid \
        /vol/vol1 somehost:/vol/vol1

Notice that the key is not repeated in the output!  That is because the daemon
already knows what the key is; it just wants the rest of the entry.
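
A minimal sketch of a program map that follows this rule is shown below. It assumes the NIS map is still auto.home and that the lookup is done with ypmatch without -k, so that only the value is printed. The helper name format_entry and the sample value are hypothetical.

```shell
#!/bin/sh
# format_entry: hypothetical helper for a program map such as
# /etc/auto.home.  It reads a raw map value on stdin and prints it with
# a trailing '&' replaced by the key.  It never prints the key itself.
format_entry() {
    key="$1"
    sed "s,&\$,$key,"
}

# In the real script the value would come from a lookup that does not
# echo the key, for example (assumption: NIS map named auto.home):
#   /usr/bin/ypmatch "$1" auto.home | format_entry "$1"

# Stand-alone demonstration with a made-up map value:
printf '%s\n' 'server.central:/export/home11/&' | format_entry userid
# prints server.central:/export/home11/userid
```

The only difference from the original script is that the key never reaches the output, so the daemon sees just the location.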

I'm guessing that this worked for you before by accident.  =)  So, for you it is
bad that we "fixed" the broken behaviour.

Anyway, what ends up happening now is that autofs thinks that userid is the
host from which to mount.  When it determines that it isn't a host, it falls
back to using it as a local directory from which to bind mount.  So you end up
with the equivalent of:

mount --bind /home/userid /home/userid

Which is really bad.
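
One way to catch this kind of mistake before it reaches the daemon is to sanity-check the script's output by hand. The sketch below is a rough heuristic, not something autofs provides: check_entry is a hypothetical helper that expects a single-line entry (no backslash continuation, no glob characters) and flags any word that is not an option, an absolute path, or a host:path location.

```shell
#!/bin/sh
# check_entry: hypothetical sanity check for the output of a program map.
# Prints a warning and returns 1 if any word in the entry is neither an
# option (-...), an absolute path (/...), nor a host:path location.
check_entry() {
    # $1 is left unquoted on purpose so the shell splits it into words.
    for word in $1; do
        case "$word" in
            -*|/*|*:*) ;;    # option, path, or host:path: looks fine
            *) echo "suspicious word: $word"; return 1 ;;
        esac
    done
    return 0
}

# The output from comment 6 is flagged because the key is repeated:
check_entry 'userid server.central:/export/home11/userid' || echo "entry rejected"
```

Run against the output in comment 6, this prints "suspicious word: userid" followed by "entry rejected". An entry in the auto.net style (options, mount point, host:path) passes.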

As mentioned above, you can simply use the wildcard matching features of
automount to achieve your goal.  I am closing this as NOTABUG, since the kernel
hang can only be triggered by a broken configuration (and hence, only by root).

Thanks.

