Bug 667770 - rpc.gssd locks up and hangs nfs mount when idle for long time (ticket expires?)
Status: CLOSED CANTFIX
Product: Fedora
Classification: Fedora
Component: nfs-utils
Version: 14
Hardware: Unspecified
OS: Unspecified
Priority: low
Severity: medium
Assigned To: Steve Dickson
QA Contact: Fedora Extras Quality Assurance
Reported: 2011-01-06 13:13 EST by Orion Poplawski
Modified: 2016-03-14 18:29 EDT
Doc Type: Bug Fix
Last Closed: 2012-03-13 15:47:13 EDT
Description Orion Poplawski 2011-01-06 13:13:52 EST
Description of problem:

I'm just starting to test NFSv4 with krb5.  I have the following mount:

saga:/ /mnt nfs4 rw,sec=krb5,addr=192.168.0.12,clientaddr=192.168.0.39 0 0

If I leave this idle for a while, access to the mount hangs.  There are no messages from rpc.gssd in /var/log/messages (I'm running with -vvv), but I do get the following from rpc.idmapd:

Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d3
Jan  6 11:11:07 orca rpc.idmapd[15362]: Opened /var/lib/nfs/rpc_pipefs//nfs/clnt1d3/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d4
Jan  6 11:11:07 orca rpc.idmapd[15362]: Stale client: 1d4
Jan  6 11:11:07 orca rpc.idmapd[15362]: #011-> closed /var/lib/nfs/rpc_pipefs//nfs/clnt1d4/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: Stale client: 1d3
Jan  6 11:11:07 orca rpc.idmapd[15362]: #011-> closed /var/lib/nfs/rpc_pipefs//nfs/clnt1d3/idmap
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d5
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d6
Jan  6 11:11:07 orca rpc.idmapd[15362]: New client: 1d7

If I restart rpc.gssd, everything comes back.

Version-Release number of selected component (if applicable):
nfs-utils-1.2.3-2.fc14.i686

How reproducible:
Very.
Comment 1 Orion Poplawski 2011-01-10 17:40:39 EST
Backtrace of the hung process:

#0  0x00ae8416 in __kernel_vsyscall ()
#1  0x004be5d1 in __lll_lock_wait_private () from /lib/libc.so.6
#2  0x0044985c in _L_lock_12621 () from /lib/libc.so.6
#3  0x00447797 in malloc () from /lib/libc.so.6
#4  0x0043a398 in open_memstream () from /lib/libc.so.6
#5  0x004a9ae5 in __vsyslog_chk () from /lib/libc.so.6
#6  0x0017d15f in vsyslog (kind=512, 
    fmt=0x1806cc "dir_notify_handler: sig %d si %p data %p\n", args=0xbfc03938 "%")
    at /usr/include/bits/syslog.h:48
#7  xlog_backend (kind=512, fmt=0x1806cc "dir_notify_handler: sig %d si %p data %p\n", 
    args=0xbfc03938 "%") at xlog.c:150
#8  0x001777d4 in printerr (priority=2, 
    format=0x1806cc "dir_notify_handler: sig %d si %p data %p\n") at err_util.c:64
#9  0x00177c9e in dir_notify_handler (sig=37, si=0xbfc0396c, data=0xbfc039ec)
    at gssd_main_loop.c:66
#10 <signal handler called>
#11 0x00444984 in _int_malloc () from /lib/libc.so.6
#12 0x004477a0 in malloc () from /lib/libc.so.6
#13 0x0046db77 in __alloc_dir () from /lib/libc.so.6
#14 0x0046dc5a in opendir () from /lib/libc.so.6
#15 0x0046e7ef in scandir64@@GLIBC_2.2 () from /lib/libc.so.6
#16 0x00179285 in process_pipedir () at gssd_proc.c:565
#17 update_client_list () at gssd_proc.c:594
#18 0x00177f40 in gssd_run () at gssd_main_loop.c:216
#19 0x00177bf9 in main (argc=2, argv=0xbfc04134) at gssd.c:187
Comment 2 Orion Poplawski 2011-01-10 17:56:30 EST
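It looks like malloc is being called from a signal handler that interrupted another malloc call, which is not allowed: malloc is not async-signal-safe.  In the backtrace, the signal arrives while frame #11 is inside _int_malloc (presumably holding the allocator lock), and the printerr call in the handler reaches malloc again through vsyslog and open_memstream (frames #3 to #8), where it waits forever on that same lock.  I'm not sure what the best way around this is, but dir_notify_handler cannot safely call printerr.  I suppose this only occurs when -vv or greater is given.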
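
For illustration, a minimal sketch of the usual async-signal-safe pattern: the handler only stores to volatile sig_atomic_t flags, and any logging happens later from normal context.  This is not the actual nfs-utils code or patch; install_dir_notify_handler() and consume_dir_changed() are hypothetical helper names, and fprintf stands in for printerr.

```c
#define _POSIX_C_SOURCE 200809L
#include <signal.h>
#include <stdio.h>
#include <string.h>

/* Written by the handler, read and cleared by the main loop.
 * volatile sig_atomic_t is the type a handler may portably store to. */
static volatile sig_atomic_t dir_changed = 0;
static volatile sig_atomic_t last_signal = 0;

/* Async-signal-safe: no malloc, no stdio, no syslog; only flag stores. */
static void dir_notify_handler(int sig, siginfo_t *si, void *data)
{
    (void)si;
    (void)data;
    last_signal = sig;
    dir_changed = 1;
}

/* Hypothetical helper: install the handler for signo.
 * Returns 0 on success, -1 on error (as sigaction does). */
static int install_dir_notify_handler(int signo)
{
    struct sigaction sa;

    memset(&sa, 0, sizeof(sa));
    sa.sa_sigaction = dir_notify_handler;
    sa.sa_flags = SA_SIGINFO;
    sigemptyset(&sa.sa_mask);
    return sigaction(signo, &sa, NULL);
}

/* Hypothetical helper, called from the main loop in normal context.
 * Returns 1 if a notification arrived since the last call, else 0. */
static int consume_dir_changed(void)
{
    if (!dir_changed)
        return 0;
    dir_changed = 0;
    /* Logging is safe here: we are not inside a signal handler. */
    fprintf(stderr, "directory notify: signal %d\n", (int)last_signal);
    return 1;
}
```

Under these assumptions, the main loop would call consume_dir_changed() after its wait returns (including when the wait is interrupted by the signal) and rescan the pipefs directory if it returns 1.  The verbose message is kept, but it is emitted outside the handler, so it can no longer re-enter malloc.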
Comment 3 Matt Kinni 2011-05-28 23:42:58 EDT
What's even more hilarious is that when nfs hangs, your entire gnome session freezes.
Comment 4 Orion Poplawski 2012-03-13 16:01:04 EDT
I thought the solution was to drop the printerr call:

http://article.gmane.org/gmane.linux.nfs/45443
Comment 5 bcodding 2012-06-05 11:05:15 EDT
Why was this closed as CANTFIX?  We just ran into this one in RHEL6.
Comment 6 Ender 2016-03-14 18:29:10 EDT
For reference, this was fixed in 1.2.3-63 for RHEL 6:

* Mon May 18 2015 Steve Dickson <steved@redhat.com> 1.2.3-63
- Removed printerr from gssd (bz 949100)
