Bug 438623

Summary:

autofs (2.6.24/SELinux) hangs on create_udp_client/tcp_client.

Product:

[Fedora] Fedora

Reporter:

Gilboa Davara <gilboad>

Component:

autofs

Assignee:

Ian Kent <ikent>

Status:

CLOSED WONTFIX

QA Contact:

Fedora Extras Quality Assurance <extras-qa>

Severity:

low

Docs Contact:

Priority:

low

Version:

CC:

ikent, jdogalt

Target Milestone:

---

Keywords:

Reopened

Target Release:

---

Hardware:

All

OS:

Linux

Whiteboard:

Fixed In Version:

Doc Type:

Bug Fix

Doc Text:

Story Points:

---

Clone Of:

Environment:

Last Closed:

2009-01-09 07:44:19 UTC

Type:

---

Regression:

---

Mount Type:

---

Documentation:

---

CRM:

Verified Versions:

Category:

---

oVirt Team:

---

RHEL 7.3 requirements from Atomic Host:

Cloudforms Team:

---

Target Upstream Version:

Embargoed:

Attachments:

Description	Flags
NFS wireshark recording	none
autofs debug log.	none
Process list (SysRq-t)	none
Patch to fix incorrect pthreads condition handling for mount requests.	none

Description Gilboa Davara 2008-03-23 07:59:25 UTC

Hello all,

Since the latest update push (the included a new SELinux/policy update) autofs
stopped working.
Mount attempts hang (instead of timing-out and/or aborting on error) and the
following entries are added to the log:

$ cat /var/log/messages | grep automount
Mar 23 09:27:41 gilboa-home-dev automount[3734]: create_udp_client:99: hostname
lookup failed: No such process
Mar 23 09:27:41 gilboa-home-dev automount[3734]: create_tcp_client:308: hostname
lookup failed: No such process
Mar 23 09:27:41 gilboa-home-dev automount[3734]: lookup_mount: exports lookup
failed for .directory
Mar 23 09:30:17 gilboa-home-dev automount[3717]: lookup_read_master:
lookup(nisplus): couldn't locate nis+ table auto.master
Mar 23 09:34:31 gilboa-home-dev automount[3717]: create_udp_client:99: hostname
lookup failed: No such process
Mar 23 09:34:31 gilboa-home-dev automount[3717]: create_tcp_client:308: hostname
lookup failed: No such process
Mar 23 09:34:31 gilboa-home-dev automount[3717]: lookup_mount: exports lookup
failed for .directory
Mar 23 09:34:31 gilboa-home-dev automount[3717]: lookup_mount: lookup(file): key
".directory" not found in map
Mar 23 09:34:31 gilboa-home-dev automount[3717]: lookup_mount: lookup(file): key
".directory" not found in map
Mar 23 09:34:49 gilboa-home-dev automount[3717]: create_udp_client:99: hostname
lookup failed: No such process
Mar 23 09:34:49 gilboa-home-dev automount[3717]: create_tcp_client:308: hostname
lookup failed: No such process
Mar 23 09:34:49 gilboa-home-dev automount[3717]: lookup_mount: exports lookup
failed for .directory
Mar 23 09:34:49 gilboa-home-dev automount[3717]: lookup_mount: lookup(file): key
".directory" not found in map
Mar 23 09:34:49 gilboa-home-dev automount[3717]: lookup_mount: lookup(file): key
".directory" not found in map

As far as I could see, there were no SELinux related messages. (beyond xauth -
but this another issue altogether)

Versions:
$ rpm -qa | egrep -e 'autofs|selinux|kernel' | sort
autofs-5.0.2-27.x86_64
kernel-2.6.24.3-12.fc8.x86_64
kernel-2.6.24.3-34.fc8.x86_64
kernel-devel-2.6.24.3-12.fc8.x86_64
kernel-devel-2.6.24.3-34.fc8.x86_64
kernel-doc-2.6.24.3-34.fc8.noarch
kernel-headers-2.6.24.3-34.fc8.x86_64
libselinux-2.0.43-1.fc8.i386
libselinux-2.0.43-1.fc8.x86_64
libselinux-devel-2.0.43-1.fc8.i386
libselinux-devel-2.0.43-1.fc8.x86_64
libselinux-python-2.0.43-1.fc8.x86_64
selinux-policy-3.0.8-93.fc8.noarch
selinux-policy-devel-3.0.8-93.fc8.noarch
selinux-policy-targeted-3.0.8-93.fc8.noarch

Comment 1 Gilboa Davara 2008-03-23 08:01:18 UTC

$ uname -a
Linux gilboa-home-dev 2.6.24.3-34.fc8 #1 SMP Wed Mar 12 16:51:49 EDT 2008 x86_64
x86_64 x86_64 GNU/Linux

Comment 2 Jeff Moyer 2008-03-25 14:41:45 UTC

Mar 23 09:27:41 gilboa-home-dev automount[3734]: create_udp_client:99: hostname
lookup failed: No such process

"No such process" corresponds with errno 3.  However, that's not what is
returned, here.  This is the real error:

#define NO_RECOVERY     3       /* Non recoverable errors, FORMERR, REFUSED,
                                   NOTIMP.  */

So, it could be that the connection was refused.  If you believe this to be
SELinux related, could you run in permissive mode and try again?  (just run
setenforce 0, then your test, then switch it back on using setenforce 1)

Comment 3 Gilboa Davara 2008-03-25 16:27:54 UTC

Two things.
First, if I disable autofs and mount the NFS exports by hand, everything seems
to work just fine.
Second, if the NFS client is indeed getting ECONNREFUSED, shouldn't it:
  A. Gracefully abort. (instead of handing)
  B. Display a meaningful error message.
  C. Timeout after a while?

Other then that, I'll disable SELinux and try again.
If it doesn't work I'll use wireshark to snoop on the NFS traffic between the
two; if indeed the server is acting up, I should see it in the traffic log.

P.S. The server (gilboa-home-srv) is the hosts file and as such, the upstream
(ISP) DNS server cannot resolve the name. Isn't FORMERR a DNS related error?

- Gilboa

Comment 4 Jeff Moyer 2008-03-25 17:23:35 UTC

(In reply to comment #3)
> Two things.
> First, if I disable autofs and mount the NFS exports by hand, everything seems
> to work just fine.

OK, good to know.

> Second, if the NFS client is indeed getting ECONNREFUSED, shouldn't it:
>   A. Gracefully abort. (instead of handing)
>   B. Display a meaningful error message.
>   C. Timeout after a while?

We're not talking about the NFS client, here.  The error printed is from the
gethostbyname_r function call.  I'm not sure why automount is hanging.  If you
could provide a debug log, that would help in the diagnosis.  See
http://people.redhat.com/jmoyer for information on collecting debug information.

> Other then that, I'll disable SELinux and try again.
> If it doesn't work I'll use wireshark to snoop on the NFS traffic between the
> two; if indeed the server is acting up, I should see it in the traffic log.

Not NFS traffic, DNS traffic.

> P.S. The server (gilboa-home-srv) is the hosts file and as such, the upstream
> (ISP) DNS server cannot resolve the name. Isn't FORMERR a DNS related error?

Exactly.  I only surmised that the error was connrefused because of the SELinux
reference.

Comment 5 Gilboa Davara 2008-03-29 21:44:22 UTC

I had a number of other SELinux-related errors that forced me to .autorelabel
the file system. (Couldn't really restart the machine without it)
Currently (post relabel), I cannot seem to reproduce this problem.
I'd suggest you close this bug; I'll reopen it if the problem persists.

- Gilboa

Comment 6 Gilboa Davara 2008-04-05 08:34:05 UTC

Huge come-back.

Apr  5 11:25:29 gilboa-home-dev automount[8216]: create_udp_client:99: hostname
lookup failed: No su
ch process
Apr  5 11:25:29 gilboa-home-dev automount[8216]: create_tcp_client:308: hostname
lookup failed: No s
uch process
Apr  5 11:25:29 gilboa-home-dev automount[8216]: lookup_mount: exports lookup
failed for .directory
Apr  5 11:25:29 gilboa-home-dev automount[8216]: lookup_mount: lookup(file): key
".directory" not fo
und in map
Apr  5 11:25:29 gilboa-home-dev automount[8216]: lookup_mount: lookup(file): key
".directory" not fo
und in map

Trying again with selinux disabled.

- Gilboa

Comment 7 Gilboa Davara 2008-04-05 16:15:31 UTC

Cannot seem to reproduce it with selinux disabled.
However, I can't seem to find anything in the selinux log trail about autofs.
This -is- weird.

- Gilboa

Comment 8 Gilboa Davara 2008-04-06 07:09:34 UTC

No go reproduced with selinux disabled - took me around 20 hours to get there.
(instead of 20m...)
I should stress again that if I mount the directories manually - everything
works just fine.

- Gilboa

Comment 9 Gilboa Davara 2008-04-06 07:54:17 UTC

s/No go reproduced with selinux disabled/No go[.] Reproduced with selinux
disabled/g (Read: managed to reproduce it w/ selinux disabled)
Such a small punctuation mark... :)

Comment 10 Jeff Moyer 2008-04-07 14:05:45 UTC

(In reply to comment #9)
> s/No go reproduced with selinux disabled/No go[.] Reproduced with selinux
> disabled/g (Read: managed to reproduce it w/ selinux disabled)
> Such a small punctuation mark... :)

OK, reenable SELinux and collect a debug log for us.  It's not yet clear to me
where things are hanging, so we'll need to look at that log.

Thanks!

Comment 11 Ian Kent 2008-04-07 14:13:37 UTC

Can you also post your master map and any maps used by
it please.

Comment 12 Gilboa Davara 2008-04-07 14:34:00 UTC

My autofs is static.

$ cat auto.master
#
# Sample auto.master file
# This is an automounter map and it has the following format
# key [ -mount-options-separated-by-comma ] location
# For details of the format look at autofs(5).
#
/misc   /etc/auto.misc
/net    -hosts
/autofs /etc/auto.autofs --timeout=60
#
# Include central master map if it can be found using
# nsswitch sources.
#
# Note that if there are entries for /net or /misc (as
# above) in the included master map any keys that are the
# same will not be seen as the first read key seen takes
# precedence.
#
+auto.master

$ cat auto.autofs
Backup          -fstype=nfs     gilboa-home-srv:/mnt/Backup
Install         -fstype=nfs     gilboa-home-srv:/mnt/Install
Multimedia      -fstype=nfs     gilboa-home-srv:/mnt/Multimedia
vmware          -fstype=nfs     gilboa-home-srv:/usr/drives/vmware
Games           -fstype=nfs     gilboa-home-srv:/usr/games
#Work           -fstype=nfs     gilboa-home-dev:/mnt/Work

$ cat /etc/hosts | egrep -e 'gilboa-home-dev|gilboa-home-srv'
192.168.1.2 gilboa-home-dev.localdomain gilboa-home-dev
2000::2002 gilboa-home-dev-6.localdomain gilboa-home-dev-6
192.168.1.3 gilboa-home-srv.localdomain gilboa-home-srv
2000::2003 gilboa-home-srv-6.localdomain gilboa-home-srv-6

- Gilboa

Comment 13 Gilboa Davara 2008-04-07 14:36:45 UTC

gilboa-home-srv is a CentOS5/x86_64 machine and being used as a NFS server.
gilboa-home-dev (problematic machine) is a F8/x86_64 machine and being uses a
development workstation.
Both have static IPv4 addresses.

- Gilboa

Comment 14 Gilboa Davara 2008-04-10 07:10:56 UTC

Created attachment 301947 [details]
NFS wireshark recording

The only thing weird in the wireshark recording is malformed NFS procedure
calls.
(Which in-turn might be a problem with wireshark analysis and not a real
malformed packet)

Comment 15 Jeff Moyer 2008-04-10 18:23:33 UTC

Still waiting for an autofs debug log.  See http://people.redhat.com/jmoyer.

Comment 16 Gilboa Davara 2008-04-13 06:08:22 UTC

Created attachment 302253 [details]
autofs debug log.

Log attached

Comment 17 Jeff Moyer 2008-04-16 20:47:12 UTC

Strange, the debug log skips some of the tokens (3, 5, and 6).  I wonder what
that's all about.

handle_packet_missing_indirect: token 4, name .directory, request pid 5878
attempting to mount entry /autofs/.directory
lookup_mount: lookup(file): looking up .directory
lookup_mount: lookup(file): key ".directory" not found in map
send_fail: token = 4

handle_packet: type = 3
handle_packet_missing_indirect: token 7, name .directory, request pid 5878
attempting to mount entry /autofs/.directory
lookup_mount: lookup(file): looking up .directory
lookup_mount: lookup(file): key ".directory" not found in map
send_fail: token = 7

Is pid 5878 the one that was hanging?  How did you determine that automount were
hanging initially?  Did you use a shell and try to cd or ls or stat, or were you
just observing that applications were hung?

I think the most informative bit of information you could provide at this point
would be a gdb back trace of all threads in the automounter.  Would you mind
installing the autofs debuginfo package and running gdb on the automounter? 
We'll need you to attach to the pid of automount, and get the output from
"thread apply all bt" or "thr a a bt" for short.  sysrq-t output would also be
helpful.

Basically, according to the logs it looks like the automounter is continuing to
service requests.  Which mounts are hung?  All of them?

Comment 18 Jeff Moyer 2008-04-16 20:56:45 UTC

You can grab the debuginfo package from:
  http://koji.fedoraproject.org/koji/buildinfo?buildID=36158

assuming you're still using 5.0.2-27.

thanks!

Comment 19 Gilboa Davara 2008-04-20 16:34:19 UTC

Autofs does seem to service requests, while previous requests stay zombie.
AFAIK, all mounts, at one time or another, hang - there doesn't seem to be logic
behind it.

Beyond that, I've downloaded the debugingo (debuginfo-install autofs.x86_64) -
and I'm now running autofs under gdb. Hopefully I'll find something useful.

- Gilboa

Comment 20 Gilboa Davara 2008-04-20 17:06:06 UTC

Managed to reproduce it. both ls and konq hang while trying to access
-different- mounts.
GDB is active - but Ctrl-C inactive, and sending SIGTERM/SIGSEGV from a remote
terminal doesn't seem help. :(
I can only assume that automount is sleeping somewhere in the kernel.
Should I enable magic SysReq and get a kernel callstack dump over serial?

$ gdb /usr/sbin/automount
GNU gdb Red Hat Linux (6.6-45.fc8rh)
Copyright (C) 2006 Free Software Foundation, Inc.
GDB is free software, covered by the GNU General Public License, and you are
welcome to change it and/or distribute copies of it under certain conditions.
Type "show copying" to see the conditions.
There is absolutely no warranty for GDB.  Type "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu"...
Using host libthread_db library "/lib64/libthread_db.so.1".
(gdb) r -d -f
Starting program: /usr/sbin/automount -d -f
[Thread debugging using libthread_db enabled]
Starting automounter version 5.0.2-27, master map auto.master
using kernel protocol version 5.00
[New Thread 46912510630656 (LWP 11390)]
[New Thread 1073809744 (LWP 11393)]
[New Thread 1073879376 (LWP 11394)]
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
[Detaching after fork from child process 11395. (Try `set detach-on-fork off'.)]
[Detaching after fork from child process 11396.]
lookup_read_master: lookup(file): read entry /misc
lookup_read_master: lookup(file): read entry /net
lookup_read_master: lookup(file): read entry /autofs
lookup_read_master: lookup(file): read entry +auto.master
lookup_nss_read_master: reading master files auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup_nss_read_master: reading master nisplus auto.master
parse_init: parse(sun): init gathered global options: (null)
lookup_read_master: lookup(nisplus): couldn't locate nis+ table auto.master
lookup(file): failed to read included master map auto.master
master_do_mount: mounting /misc
[New Thread 1074932048 (LWP 11397)]
automount_path_to_fifo: fifo name /var/run/autofs.fifo-misc
lookup_nss_read_map: reading map file /etc/auto.misc
parse_init: parse(sun): init gathered global options: (null)
[Detaching after fork from child process 11398.]
[Detaching after fork from child process 11399.]
mounted indirect mount on /misc with timeout 300, freq 75 seconds
master_do_mount: mounting /net
[New Thread 1075984720 (LWP 11400)]
automount_path_to_fifo: fifo name /var/run/autofs.fifo-net
lookup_nss_read_map: reading map hosts (null)
parse_init: parse(sun): init gathered global options: (null)
mounted indirect mount on /net with timeout 300, freq 75 seconds
master_do_mount: mounting /autofs
[New Thread 1077037392 (LWP 11401)]
automount_path_to_fifo: fifo name /var/run/autofs.fifo-autofs
lookup_nss_read_map: reading map file /etc/auto.autofs
parse_init: parse(sun): init gathered global options: (null)
[Detaching after fork from child process 11402.]
>> umount: /autofs/Multimedia: not mounted
[Detaching after fork from child process 11405.]
>> umount: /autofs/Install: not mounted
[Detaching after fork from child process 11408.]
>> umount: /autofs/Backup: not mounted
mounted indirect mount on /autofs with timeout 60, freq 15 seconds
handle_packet: type = 3
handle_packet_missing_indirect: token 5, name Multimedia, request pid 11458
[New Thread 1078090064 (LWP 11459)]
attempting to mount entry /autofs/Multimedia
lookup_mount: lookup(file): looking up Multimedia
lookup_mount: lookup(file): Multimedia -> -fstype=nfs  
gilboa-home-srv:/mnt/Multimedia
parse_mount: parse(sun): expanded entry: -fstype=nfs   
gilboa-home-srv:/mnt/Multimedia
parse_mount: parse(sun): gathered options: fstype=nfs
parse_mount: parse(sun): dequote("gilboa-home-srv:/mnt/Multimedia") ->
gilboa-home-srv:/mnt/Multimedia
parse_mount: parse(sun): core of entry: options=fstype=nfs,
loc=gilboa-home-srv:/mnt/Multimedia
sun_mount: parse(sun): mounting root /autofs, mountpoint Multimedia, what
gilboa-home-srv:/mnt/Multimedia, fstype nfs, options
mount_mount: mount(nfs): root=/autofs name=Multimedia
what=gilboa-home-srv:/mnt/Multimedia, fstype=nfs, options=
mount_mount: mount(nfs): nfs options="", nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /autofs/Multimedia
mount_mount: mount(nfs): calling mount -t nfs gilboa-home-srv:/mnt/Multimedia
/autofs/Multimedia
[Detaching after fork from child process 11460.]
mount(nfs): mounted gilboa-home-srv:/mnt/Multimedia on /autofs/Multimedia
send_ready: token = 5
mounted /autofs/Multimedia
handle_packet: type = 3
handle_packet_missing_indirect: token 6, name Install, request pid 11458
[Thread 1078090064 (LWP 11459) exited]
[New Thread 1079142736 (LWP 11462)]
attempting to mount entry /autofs/Install
lookup_mount: lookup(file): looking up Install
lookup_mount: lookup(file): Install -> -fstype=nfs      gilboa-home-srv:/mnt/Install
parse_mount: parse(sun): expanded entry: -fstype=nfs    gilboa-home-srv:/mnt/Install
parse_mount: parse(sun): gathered options: fstype=nfs
parse_mount: parse(sun): dequote("gilboa-home-srv:/mnt/Install") ->
gilboa-home-srv:/mnt/Install
parse_mount: parse(sun): core of entry: options=fstype=nfs,
loc=gilboa-home-srv:/mnt/Install
sun_mount: parse(sun): mounting root /autofs, mountpoint Install, what
gilboa-home-srv:/mnt/Install, fstype nfs, options
mount_mount: mount(nfs): root=/autofs name=Install
what=gilboa-home-srv:/mnt/Install, fstype=nfs, options=
mount_mount: mount(nfs): nfs options="", nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /autofs/Install
mount_mount: mount(nfs): calling mount -t nfs gilboa-home-srv:/mnt/Install
/autofs/Install
[Detaching after fork from child process 11463.]
mount(nfs): mounted gilboa-home-srv:/mnt/Install on /autofs/Install
send_ready: token = 6
mounted /autofs/Install
[Thread 1079142736 (LWP 11462) exited]
handle_packet: type = 3
handle_packet_missing_indirect: token 7, name Backup, request pid 11468
[New Thread 1079142736 (LWP 11469)]
attempting to mount entry /autofs/Backup
lookup_mount: lookup(file): looking up Backup
lookup_mount: lookup(file): Backup -> -fstype=nfs       gilboa-home-srv:/mnt/Backup
parse_mount: parse(sun): expanded entry: -fstype=nfs    gilboa-home-srv:/mnt/Backup
parse_mount: parse(sun): gathered options: fstype=nfs
parse_mount: parse(sun): dequote("gilboa-home-srv:/mnt/Backup") ->
gilboa-home-srv:/mnt/Backup
parse_mount: parse(sun): core of entry: options=fstype=nfs,
loc=gilboa-home-srv:/mnt/Backup
sun_mount: parse(sun): mounting root /autofs, mountpoint Backup, what
gilboa-home-srv:/mnt/Backup, fstype nfs, options
mount_mount: mount(nfs): root=/autofs name=Backup
what=gilboa-home-srv:/mnt/Backup, fstype=nfs, options=
mount_mount: mount(nfs): nfs options="", nosymlink=0, ro=0
mount_mount: mount(nfs): calling mkdir_path /autofs/Backup
mount_mount: mount(nfs): calling mount -t nfs gilboa-home-srv:/mnt/Backup
/autofs/Backup
[Detaching after fork from child process 11470.]
mount(nfs): mounted gilboa-home-srv:/mnt/Backup on /autofs/Backup
send_ready: token = 7
mounted /autofs/Backup
[Thread 1079142736 (LWP 11469) exited]
handle_packet: type = 3
handle_packet_missing_indirect: token 8, name .directory, request pid 11477
[New Thread 1079142736 (LWP 11480)]
attempting to mount entry /net/.directory
lookup_mount: lookup(hosts): fetchng export list for .directory
create_udp_client:99: hostname lookup failed: No such process
create_tcp_client:308: hostname lookup failed: No such process
lookup_mount: exports lookup failed for .directory
send_fail: token = 8
failed to mount /net/.directory
[Thread 1079142736 (LWP 11480) exited]
handle_packet: type = 3
handle_packet_missing_indirect: token 9, name .directory, request pid 11477
[New Thread 1078090064 (LWP 11481)]
attempting to mount entry /autofs/.directory
lookup_mount: lookup(file): looking up .directory
lookup_mount: lookup(file): key ".directory" not found in map
send_fail: token = 9
failed to mount /autofs/.directory
[Thread 1078090064 (LWP 11481) exited]
handle_packet: type = 3
handle_packet_missing_indirect: token 10, name .directory, request pid 11477
[New Thread 1078090064 (LWP 11482)]
attempting to mount entry /misc/.directory
lookup_mount: lookup(file): looking up .directory
lookup_mount: lookup(file): key ".directory" not found in map
send_fail: token = 10
failed to mount /misc/.directory
[Thread 1078090064 (LWP 11482) exited]
st_expire: state 1 path /autofs
[New Thread 1078090064 (LWP 11494)]
expire_proc: exp_proc = 1078090064 path /autofs
expire_proc_indirect: expire /autofs/Multimedia
expire_proc_indirect: expire /autofs/Install
expire_proc_indirect: expire /autofs/Backup
3 remaining in /autofs
expire_cleanup: got thid 1078090064 path /autofs stat 4
expire_cleanup: sigchld: exp 1078090064 finished, switching from 2 to 1
st_ready: st_ready(): state = 2 path /autofs
[Thread 1078090064 (LWP 11494) exited]
st_expire: state 1 path /autofs
[New Thread 1078090064 (LWP 11502)]
expire_proc: exp_proc = 1078090064 path /autofs
expire_proc_indirect: expire /autofs/Multimedia
expire_proc_indirect: expire /autofs/Install
expire_proc_indirect: expire /autofs/Backup
3 remaining in /autofs
expire_cleanup: got thid 1078090064 path /autofs stat 4
expire_cleanup: sigchld: exp 1078090064 finished, switching from 2 to 1
st_ready: st_ready(): state = 2 path /autofs
[Thread 1078090064 (LWP 11502) exited]
st_expire: state 1 path /autofs
[New Thread 1078090064 (LWP 11527)]
expire_proc: exp_proc = 1078090064 path /autofs
expire_proc_indirect: expire /autofs/Multimedia
expire_proc_indirect: expire /autofs/Install
expire_proc_indirect: expire /autofs/Backup
3 remaining in /autofs
expire_cleanup: got thid 1078090064 path /autofs stat 4
expire_cleanup: sigchld: exp 1078090064 finished, switching from 2 to 1
st_ready: st_ready(): state = 2 path /autofs
[Thread 1078090064 (LWP 11527) exited]
st_expire: state 1 path /autofs
[New Thread 1078090064 (LWP 11580)]
expire_proc: exp_proc = 1078090064 path /autofs
expire_proc_indirect: expire /autofs/Multimedia
handle_packet: type = 4
handle_packet_expire_indirect: token 11, name Backup

Comment 21 Ian Kent 2008-04-21 09:51:39 UTC

(In reply to comment #20)
> Managed to reproduce it. both ls and konq hang while trying to access
> -different- mounts.

What is this konq thing.

If KDE insists on creating a .directory in every directory it
finds then it's not going to very happy when it can't create
these within automount managed directories.

Ian

Comment 22 Ian Kent 2008-04-21 10:04:20 UTC

(In reply to comment #20)
> Managed to reproduce it. both ls and konq hang while trying to access
> -different- mounts.
> GDB is active - but Ctrl-C inactive, and sending SIGTERM/SIGSEGV from a remote
> terminal doesn't seem help. :(
> I can only assume that automount is sleeping somewhere in the kernel.
> Should I enable magic SysReq and get a kernel callstack dump over serial?

btw, this isn't the backtrace that we're looking for.

What we'd like to get is, when automount is running (and stuck),
use (,if you can):
gdb -p <automount pid> /usr/sbin/automount
gdb> thr a a bt

and post the output.

I suspect we will also need the sysrq-t log output as well.

Ian

Comment 23 Gilboa Davara 2008-04-22 06:36:12 UTC

Created attachment 303269 [details]
Process list (SysRq-t)

As suspected, automount is sleeping inside the kernel.
... And by sleeping, I mean "he is pining for the fjords" [1].

- Gilboa
[1] http://www.youtube.com/watch?v=e6Lq771TVm4

Comment 24 Ian Kent 2008-04-22 11:40:31 UTC

(In reply to comment #23)
> Created an attachment (id=303269) [edit]
> Process list (SysRq-t)
> 
> As suspected, automount is sleeping inside the kernel.
> ... And by sleeping, I mean "he is pining for the fjords" [1].
> 
> - Gilboa
> [1] http://www.youtube.com/watch?v=e6Lq771TVm4
> 

And the gdb backtrace will tell us if if there are threads
sleeping that shouldn't be.

Ian

Comment 25 Jeff Moyer 2008-04-22 13:42:01 UTC

(In reply to comment #24)
> And the gdb backtrace will tell us if if there are threads
> sleeping that shouldn't be.
> 
> Ian

I think Ian meant to put this back into needinfo.

Comment 26 Gilboa Davara 2008-04-23 05:21:03 UTC

I just had automount crash on me.
However, SELinux prevented the coredump file from being created...

Beyond that, 
Autmount seems to be dead (callstack attached) - but I'm not sure it's the
original dead-in-the-water-state (automount responses to Ctrl-C - which it
previously didn't).

(gdb) thread apply all bt

Thread 4 (Thread 1073809744 (LWP 15201)):
#0  0x00002aaaaaf068f9 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002aaaaaac719f in alarm_handler (arg=<value optimized out>) at alarm.c:184
#2  0x00002aaaaaf02407 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaaab5feb0d in clone () from /lib64/libc.so.6
Current language:  auto; currently asm

Thread 3 (Thread 1073879376 (LWP 15202)):
#0  0x00002aaaaaf06b7d in pthread_cond_timedwait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002aaaaaac163e in st_queue_handler (arg=<value optimized out>) at
state.c:928
#2  0x00002aaaaaf02407 in start_thread () from /lib64/libpthread.so.0
#3  0x00002aaaab5feb0d in clone () from /lib64/libc.so.6

Thread 2 (Thread 1077037392 (LWP 15209)):
#0  0x00002aaaaaf068f9 in pthread_cond_wait@@GLIBC_2.3.2 () from
/lib64/libpthread.so.0
#1  0x00002aaaaaab8777 in handle_packet_missing_indirect (ap=0x2aaaaad1a980,
pkt=0x0)
    at indirect.c:919
#2  0x00002aaaaaab442b in handle_packet (ap=0x2aaaaad1a980) at automount.c:1086
#3  0x00002aaaaaab60ba in handle_mounts (arg=0x2aaaaad1a980) at automount.c:1542
#4  0x00002aaaaaf02407 in start_thread () from /lib64/libpthread.so.0
#5  0x00002aaaab5feb0d in clone () from /lib64/libc.so.6

Thread 1 (Thread 46912510630656 (LWP 15200)):
#0  0x00002aaaaaf0a268 in do_sigwait () from /lib64/libpthread.so.0
#1  0x00002aaaaaf0a30d in sigwait () from /lib64/libpthread.so.0
#2  0x00002aaaaaab34cd in statemachine (arg=<value optimized out>) at
automount.c:1366
#3  0x00002aaaaaab547e in main (argc=0, argv=<value optimized out>) at
automount.c:2088
Current language:  auto; currently c
#0  0x00002aaaaaf0a268 in do_sigwait () from /lib64/libpthread.so.0

- Gilboa

Comment 27 Gilboa Davara 2008-04-23 05:22:01 UTC

P.S. konq equal konqueror - KDE's browser / file manager.

- Gilboa

Comment 28 Gilboa Davara 2008-04-23 05:37:14 UTC

Sigh.
Tried restarting automount - and it crashed (again)
No core file.

Can I run automount under debugger (with -f -d -v) - and prevent it from
stealing SIGINT signals without modifying its code?

- Gilboa

Comment 29 Gilboa Davara 2008-04-23 08:07:53 UTC

(sorry for the spam, just woke up; mind partially-coherent at the moment:
sigwait beats gdb no way around it [beyond attach])
I'll run automount under gdb (without a way to suspend it) - and try to
reproduce the crash.

- Gilboa

Comment 30 Ian Kent 2008-04-23 08:47:31 UTC

(In reply to comment #26)
> I just had automount crash on me.
> However, SELinux prevented the coredump file from being created...
> 
> Beyond that, 
> Autmount seems to be dead (callstack attached) - but I'm not sure it's the
> original dead-in-the-water-state (automount responses to Ctrl-C - which it
> previously didn't).
> 
> (gdb) thread apply all bt

This looks like one I know about.
I've started seeing this recently, after nearly 18 months
of this code working fine. Odd.

I'll post the patch.
Ian

Comment 31 Ian Kent 2008-04-23 08:50:29 UTC

Created attachment 303457 [details]
Patch to fix incorrect pthreads condition handling for mount requests.

Comment 32 Gilboa Davara 2008-04-23 12:10:10 UTC

Built against F8 or against -devel?

Comment 33 Ian Kent 2008-04-23 12:45:23 UTC

(In reply to comment #32)
> Built against F8 or against -devel?

The patch is already in devel and applies cleanly to the
F8 revision.

I've applied this and requested revision 29 be pushed to
testing. I'm not sure when it will be available so you
may wish to apply the patch in the meantime.

Ian

Comment 34 Gilboa Davara 2008-05-03 01:13:12 UTC

Patch applied. autofs seems to be working just fine.
Any plans to push a new autofs release?

- Gilboa

Comment 35 Ian Kent 2008-05-05 07:52:53 UTC

(In reply to comment #34)
> Patch applied. autofs seems to be working just fine.
> Any plans to push a new autofs release?

Yep, I had requested it be pushed to test, I think it
did get there.

I've now requested the update be marked stable so we
should see it soonish.

That still leaves us with the question of the poorly
handled error condition you were seeing. This area of
the code (the name lookup bits) will be updated some
time in the reasonably near future (for other reasons)
so I will see if I can improve on it then.

Ian

Comment 36 Gilboa Davara 2008-05-08 08:28:52 UTC

Thanks.
- Gilboa

Comment 37 Jane Dogalt 2008-05-19 22:01:28 UTC

I'm hitting this on F9.  It's actually slowing down my attempt to apply another
bugfix to lftp so that I can update my updates mirrors, so that I can update my
f9 system, and find out of this bug is already fixed in an f9 update.

Comment 38 Jane Dogalt 2008-05-19 22:43:54 UTC

fyi, I'm already running autofs-5.0.3-13.x86_64 which seems to already have the
patch in comment#31.  Please advise how I can help debug this.

Comment 39 Ian Kent 2008-05-22 08:07:11 UTC

(In reply to comment #38)
> fyi, I'm already running autofs-5.0.3-13.x86_64 which seems to already have the
> patch in comment#31.  Please advise how I can help debug this.
> 

We need to establish if this is the same issue that the patch
here is supposed to fix or something different.

Could you post a gdb trace as describe in comment #22.
That will require the appropriate debuginfo package be installed
as well.

Ian

Comment 40 Bug Zapper 2008-11-26 10:15:04 UTC

This message is a reminder that Fedora 8 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 8.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '8'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 8's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 8 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 41 Bug Zapper 2009-01-09 07:44:19 UTC

Fedora 8 changed to end-of-life (EOL) status on 2009-01-07. Fedora 8 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.