Bug 667770 - rpc.gssd locks up and hangs nfs mount when idle for long time (ticket expires?)
Summary: rpc.gssd locks up and hangs nfs mount when idle for long time (ticket expires?)
Keywords:
Status: CLOSED CANTFIX
Alias: None
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 14
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Target Milestone: ---
Assignee: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2011-01-06 18:13 UTC by Orion Poplawski
Modified: 2016-03-14 22:29 UTC
CC List: 6 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2012-03-13 19:47:13 UTC
Type: ---
Embargoed:


Description Orion Poplawski 2011-01-06 18:13:52 UTC
Description of problem:

I'm just starting to test NFSv4 with krb5. I have the following mount:

saga:/ /mnt nfs4 rw,sec=krb5,addr=192.168.0.12,clientaddr=192.168.0.39 0 0

If I leave it idle for a while, access to the mount hangs. There are no messages from rpc.gssd in /var/log/messages (I'm running it with -vvv). I do get:

Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d3
Jan  6 11:11:07 orca rpc.idmapd[15362]: Opened /var/lib/nfs/rpc_pipefs//nfs/clnt1d3/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d4
Jan  6 11:11:07 orca rpc.idmapd[15362]: Stale client: 1d4
Jan  6 11:11:07 orca rpc.idmapd[15362]: #011-> closed /var/lib/nfs/rpc_pipefs//nfs/clnt1d4/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: Stale client: 1d3
Jan  6 11:11:07 orca rpc.idmapd[15362]: #011-> closed /var/lib/nfs/rpc_pipefs//nfs/clnt1d3/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d5
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d6
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d7

If I restart rpc.gssd, everything comes back.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.3-2.fc14.i686

How reproducible:
Very.

Comment 1 Orion Poplawski 2011-01-10 22:40:39 UTC
Backtrace of the hung process:

#0  0x00ae8416 in __kernel_vsyscall ()
#1  0x004be5d1 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0x0044985c in _L_lock_12621 () from /lib/libc.so.6
#3  0x00447797 in malloc () from /lib/libc.so.6
#4  0x0043a398 in open_memstream () from /lib/libc.so.6
#5  0x004a9ae5 in __vsyslog_chk () from /lib/libc.so.6
#6  0x0017d15f in vsyslog (kind=512, 
    fmt=0x1806cc "dir_notify_handler: sig %d si %p data %p\n", args=0xbfc03938 "%")
    at /usr/include/bits/syslog.h:48
#7  xlog_backend (kind=512, fmt=0x1806cc "dir_notify_handler: sig %d si %p data %p\n", 
    args=0xbfc03938 "%") at xlog.c:150
#8  0x001777d4 in printerr (priority=2, 
    format=0x1806cc "dir_notify_handler: sig %d si %p data %p\n") at err_util.c:64
#9  0x00177c9e in dir_notify_handler (sig=37, si=0xbfc0396c, data=0xbfc039ec)
    at gssd_main_loop.c:66
#10 <signal handler called>
#11 0x00444984 in _int_malloc () from /lib/libc.so.6
#12 0x004477a0 in malloc () from /lib/libc.so.6
#13 0x0046db77 in __alloc_dir () from /lib/libc.so.6
#14 0x0046dc5a in opendir () from /lib/libc.so.6
#15 0x0046e7ef in scandir64@@GLIBC_2.2 () from /lib/libc.so.6
#16 0x00179285 in process_pipedir () at gssd_proc.c:565
#17 update_client_list () at gssd_proc.c:594
#18 0x00177f40 in gssd_run () at gssd_main_loop.c:216
#19 0x00177bf9 in main (argc=2, argv=0xbfc04134) at gssd.c:187

Comment 2 Orion Poplawski 2011-01-10 22:56:30 UTC
Looks like malloc() is being called from a signal handler that interrupted another malloc() call, which is verboten: the interrupted malloc() still holds the allocator lock, so the malloc() inside the handler waits on it forever. I'm not sure what the best way around this is, but it looks like dir_notify_handler cannot call printerr. I suppose this only occurs when -vv or greater is given.
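The usual way out is to make the handler do nothing except record that the signal arrived, and to move everything that may allocate (logging, the scandir() rescan) into the main loop. Below is a minimal C sketch of that pattern under stated assumptions: the names dir_notify_handler and dir_changed mirror the backtrace, but the bodies, process_dir_notification() and install_dir_notify_handler() are hypothetical illustrations, not the actual nfs-utils code.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written only by the signal handler, read and cleared only by the main loop.
 * volatile sig_atomic_t is the object type C guarantees a handler may store to. */
static volatile sig_atomic_t dir_changed = 0;

/* Async-signal-safe handler: no malloc(), no stdio, no syslog().
 * It only records that the notification arrived. */
static void dir_notify_handler(int sig, siginfo_t *si, void *data)
{
    (void)sig;
    (void)si;
    (void)data;
    dir_changed = 1;
}

/* Hypothetical helper: install the three-argument (SA_SIGINFO) handler. */
static int install_dir_notify_handler(int signo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = dir_notify_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(signo, &act, NULL);
}

/* Hypothetical main-loop step, run outside any handler. Logging and allocating
 * are safe here because the handler no longer calls malloc(), so it can never
 * re-enter an allocation this code has in progress.
 * Returns 1 if a notification was consumed, 0 otherwise. */
static int process_dir_notification(void)
{
    if (!dir_changed)
        return 0;
    dir_changed = 0; /* clear before rescanning so a later signal is not lost */
    fprintf(stderr, "dir_notify_handler: directory changed, rescanning\n");
    /* ... rescan the rpc_pipefs directory here, e.g. with scandir() ... */
    return 1;
}
```

A main loop would call process_dir_notification() after each wakeup. Because the flag is cleared before the rescan starts, a signal that arrives during the rescan simply triggers one more pass.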

Comment 3 Matt Kinni 2011-05-29 03:42:58 UTC
What's even more hilarious is that when NFS hangs, your entire GNOME session freezes.

Comment 4 Orion Poplawski 2012-03-13 20:01:04 UTC
I thought the solution was to drop the printerr call:

http://article.gmane.org/gmane.linux.nfs/45443
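Dropping printerr() means the handler logs nothing at all. If some trace from inside the handler were still wanted, one option (an assumption on my part, not what the linked patch does) is write(2), which POSIX lists as async-signal-safe and which never calls malloc(). A minimal sketch follows; trace_fd and install_dir_notify_handler() are hypothetical names. The message is a fixed string because printf-style formatting is not guaranteed to be async-signal-safe.

```c
#define _POSIX_C_SOURCE 200809L
#include <errno.h>
#include <signal.h>
#include <string.h>
#include <unistd.h>

/* Hypothetical: descriptor the handler traces to (stderr by default). */
static int trace_fd = STDERR_FILENO;
static volatile sig_atomic_t dir_changed = 0;

/* Handler that still leaves a trace, using only async-signal-safe calls. */
static void dir_notify_handler(int sig, siginfo_t *si, void *data)
{
    static const char msg[] = "dir_notify_handler: signal received\n";
    int saved_errno = errno; /* write() may set errno; restore it for the interrupted code */
    ssize_t written;

    (void)sig;
    (void)si;
    (void)data;
    /* write(2) goes straight to the kernel; unlike syslog() it does not allocate. */
    written = write(trace_fd, msg, sizeof(msg) - 1);
    (void)written; /* a failed or short trace write is deliberately ignored */
    dir_changed = 1;
    errno = saved_errno;
}

/* Hypothetical helper: install the SA_SIGINFO handler for signo. */
static int install_dir_notify_handler(int signo)
{
    struct sigaction act;

    memset(&act, 0, sizeof(act));
    act.sa_sigaction = dir_notify_handler;
    act.sa_flags = SA_SIGINFO;
    sigemptyset(&act.sa_mask);
    return sigaction(signo, &act, NULL);
}
```

Saving and restoring errno matters because the handler can interrupt code that is about to inspect errno after a failed call.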

Comment 5 bcodding 2012-06-05 15:05:15 UTC
Why was this closed as CANTFIX? We just ran into this one on RHEL 6.

Comment 6 Ender 2016-03-14 22:29:10 UTC
For reference, this was fixed in nfs-utils 1.2.3-63 for RHEL 6:

* Mon May 18 2015 Steve Dickson <steved> 1.2.3-63
- Removed printerr from gssd (bz 949100)

