Bug 612107 - NFS blocks after "Callback slot table overflowed" message
Status: CLOSED CURRENTRELEASE
Product: Red Hat Enterprise Linux 6
Classification: Red Hat
Component: kernel
Version: 6.0
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: rc
Assigned To: Steve Dickson
QA Contact: Red Hat Kernel QE team
Depends On: 607695
 
Reported: 2010-07-07 06:33 EDT by Sachin Prabhu
Modified: 2018-10-27 09:24 EDT
CC List: 14 users

Doc Type: Bug Fix
Clone Of: 607695
Last Closed: 2010-07-22 09:22:21 EDT


Attachments
Proposed upstream patch (3.27 KB, patch)
2010-07-20 16:31 EDT, Steve Dickson

Description Sachin Prabhu 2010-07-07 06:33:18 EDT
Cloning issue for RHEL 6 beta.

There are significant differences between the two reported cases. The first caused the NFS share to hang indefinitely, while in the second we see delays whenever the error appears.

+++ This bug was initially created as a clone of Bug #607695 +++

My $HOME is an NFS mount. I noticed that most of the time (~90%), copying a large file from the NFS mount to a local mount point such as /tmp completely locks up the desktop and, apparently, all NFS mounts.

I noticed that this always happens after the following message appears in /var/log/messages:
kernel: Callback slot table overflowed

Version-Release number of selected component:
$ uname -a
Linux sestows351 2.6.33.5-124.fc13.i686.PAE #1 SMP Fri Jun 11 09:42:24 UTC 2010 i686 i686 i386 GNU/Linux

The NFS server appears to be using version 3:
$ cat /proc/fs/nfsfs/servers 
NV SERVER   PORT USE HOSTNAME
v3 ac150302  801   3 nfs_server
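The SERVER and PORT columns above are not printed in the usual dotted/decimal form. Assuming they are a hex-encoded IPv4 address and a hex port (consistent with 0x801 being 2049, the standard NFS port), a minimal Python sketch can decode them; the helper name is hypothetical:

```python
import socket
import struct


def decode_nfsfs_servers(text):
    """Decode rows of /proc/fs/nfsfs/servers, assuming hex IPv4 address and port."""
    entries = []
    lines = text.strip().splitlines()
    for line in lines[1:]:  # skip the "NV SERVER PORT USE HOSTNAME" header
        fields = line.split()
        if len(fields) != 5:
            continue  # ignore rows that do not match the expected layout
        version, hex_addr, hex_port, use, hostname = fields
        # Treat the 8 hex digits as a 32-bit value in network byte order.
        addr = socket.inet_ntoa(struct.pack("!I", int(hex_addr, 16)))
        entries.append({
            "version": version,
            "address": addr,
            "port": int(hex_port, 16),
            "use": int(use),
            "hostname": hostname,
        })
    return entries


sample = """NV SERVER   PORT USE HOSTNAME
v3 ac150302  801   3 nfs_server
"""
for s in decode_nfsfs_servers(sample):
    print(s["version"], s["address"], s["port"], s["use"], s["hostname"])
# → v3 172.21.3.2 2049 3 nfs_server
```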

fstab line for /home:
nfs_server:/home       /home                  nfs     rw,async,rsize=8192,wsize=8192,timeo=14,retrans=6 0 0
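For readers decoding this entry, a small Python sketch (hypothetical helper) splits it into its six fstab fields and the comma-separated mount options. The unit conversion follows the nfs(5) man page, which documents timeo in tenths of a second:

```python
def parse_fstab_line(line):
    """Split an fstab entry into its fields and a dict of mount options."""
    device, mountpoint, fstype, options, dump, passno = line.split()
    opts = {}
    for item in options.split(","):
        key, sep, value = item.partition("=")
        # Flags such as "rw" or "async" carry no "=value" part.
        opts[key] = value if sep else True
    return {"device": device, "mountpoint": mountpoint, "fstype": fstype,
            "options": opts, "dump": int(dump), "passno": int(passno)}


entry = parse_fstab_line(
    "nfs_server:/home /home nfs rw,async,rsize=8192,wsize=8192,timeo=14,retrans=6 0 0")
opts = entry["options"]
# timeo=14 tenths of a second is a 1.4 s timeout before a retransmission.
print(int(opts["timeo"]) / 10, opts["retrans"], opts["rsize"])  # → 1.4 6 8192
```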


I can reproduce this issue fairly easily: I just need to copy one big file from the NFS mount to a locally mounted directory. Sometimes it also happens when an application such as Evolution starts.

The only solution I found so far is to reboot the computer. I once left it overnight to see if it recovered, but the next morning the NFS mounts were still not responding.

--- Additional comment from kevin.constantine@disney.com on 2010-07-06 21:30:24 EDT ---

I'm seeing similar behavior on both FC13 and the RHEL6 beta. My clients hang for up to 15 seconds and then send a TCP reset packet to the server. Far more frequent are 3-second pauses, after which the TCP conversation seems to be restarted.

When I run a simple lmdd read of a 150MB file from an NFSv3 server, throughput ranges from 114MB/s when there are no Callback errors down to 2-6MB/s when there are many.

I found that if I set /proc/sys/sunrpc/tcp_slot_table_entries to 2 on the client, the "Callback slot table overflowed" errors disappear and throughput is consistently 114MB/s. With it set to 3, a few errors appear in the logs, but performance is not affected. With it set to 4, performance drops significantly (30MB/s) and Callback errors appear consistently in the logs.
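The workaround described above is usually applied with `sysctl -w sunrpc.tcp_slot_table_entries=2` or by writing the value into the /proc file as root. As a minimal Python sketch (helper names are hypothetical), the same read/write can be wrapped in two functions; the path is a parameter so the logic can be tried against an ordinary file, and the kernel may reject values outside its own limits when the real file is written:

```python
import tempfile
from pathlib import Path

SLOT_TABLE_PATH = Path("/proc/sys/sunrpc/tcp_slot_table_entries")


def get_slot_table_entries(path=SLOT_TABLE_PATH):
    """Read the current SUNRPC TCP slot table size."""
    return int(Path(path).read_text().strip())


def set_slot_table_entries(value, path=SLOT_TABLE_PATH):
    """Write a new slot table size (writing the real /proc file needs root)."""
    if value < 1:
        raise ValueError("slot table size must be positive")
    Path(path).write_text(f"{value}\n")


# Demonstration against a temporary stand-in file, not the real /proc entry.
with tempfile.TemporaryDirectory() as tmp:
    fake = Path(tmp) / "tcp_slot_table_entries"
    fake.write_text("16\n")  # arbitrary starting value for the demo
    set_slot_table_entries(2, fake)
    print(get_slot_table_entries(fake))  # → 2
```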

What's interesting to me is that I see this behavior when reading from one particular vendor's NAS device but not from another's.
Comment 7 Kevin Constantine 2010-07-07 17:33:32 EDT
Should CONFIG_NFS_V4_1=y be set in RHEL6?  I ask because the kernel code that prints out "Callback slot table overflowed" is wrapped in "if defined(CONFIG_NFS_V4_1)", and I don't remember hearing that V4.1 was going to be supported in RHEL6.
Comment 8 Kevin Constantine 2010-07-07 22:13:30 EDT
I've done some more testing on Fedora 13, where I'm seeing pauses, not hangs (so, to me, the symptoms are the same). I recompiled 2.6.33.5-124 with "# CONFIG_NFS_V4_1 is not set" instead of "CONFIG_NFS_V4_1=y", and I cannot reproduce the pauses.
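To check whether a given kernel was built with CONFIG_NFS_V4_1, one can inspect its build configuration, which on Fedora and RHEL systems is typically installed as /boot/config-<version> (a plain `grep CONFIG_NFS_V4_1` on that file also works). A minimal Python sketch, with a hypothetical helper name, that distinguishes the two forms quoted above:

```python
import re


def config_option_state(config_text, option):
    """Return the option's value ('y', 'm', ...) or None if unset or absent."""
    set_re = re.compile(rf"^{re.escape(option)}=(.*)$")
    unset_line = f"# {option} is not set"
    for line in config_text.splitlines():
        line = line.strip()
        if line == unset_line:
            return None  # explicitly disabled, as in the rebuilt kernel
        m = set_re.match(line)
        if m:
            return m.group(1)
    return None  # option not mentioned at all


stock = "CONFIG_NFS_V4_1=y\n"
rebuilt = "# CONFIG_NFS_V4_1 is not set\n"
print(config_option_state(stock, "CONFIG_NFS_V4_1"))    # → y
print(config_option_state(rebuilt, "CONFIG_NFS_V4_1"))  # → None
```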
Comment 10 Trond Myklebust 2010-07-20 13:02:36 EDT
Please see commit b76ce56192bcf618013fb9aecd83488cffd645cc (SUNRPC: Fix a re-entrancy bug in xs_tcp_read_calldir()).

There is also a bugzilla documenting it at:
  https://bugzilla.kernel.org/show_bug.cgi?id=16213
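The commit title names a re-entrancy bug in xs_tcp_read_calldir(), the SUNRPC routine that reads the call/reply direction word of an incoming RPC record. As a simplified Python model (not the kernel code, and based only on the commit title and general ONC RPC framing), a reader that may be invoked several times before the whole 4-byte word has arrived must keep the bytes it already consumed, not just a count of them. The class names are hypothetical; CALL and REPLY are the msg_type values from RFC 5531:

```python
import struct

CALL, REPLY = 0, 1  # ONC RPC msg_type values


class CallDirReader:
    """Resumable reader: keeps partial bytes of the direction word between calls."""

    def __init__(self):
        self.buf = b""

    def feed(self, chunk):
        need = 4 - len(self.buf)
        self.buf += chunk[:need]
        if len(self.buf) < 4:
            return None, chunk[need:]  # partial word: wait for more data
        return struct.unpack("!I", self.buf)[0], chunk[need:]


class LossyCallDirReader:
    """Buggy pattern: remembers progress but not the bytes already read."""

    def __init__(self):
        self.offset = 0

    def feed(self, chunk):
        local = bytearray(4)        # fresh buffer on every call
        need = 4 - self.offset
        piece = chunk[:need]
        local[:len(piece)] = piece  # written at the start, not at self.offset
        self.offset += len(piece)
        if self.offset < 4:
            return None, chunk[need:]
        return struct.unpack("!I", bytes(local))[0], chunk[need:]


reply = struct.pack("!I", REPLY)
for reader in (CallDirReader(), LossyCallDirReader()):
    reader.feed(reply[:3])          # word arrives split as 3 + 1 bytes
    print(type(reader).__name__, reader.feed(reply[3:])[0])
```

With the word split across two reads, the resumable reader decodes REPLY (1), while the lossy one decodes an unrelated value; in a real transport a misread direction word would send the record down the wrong receive path.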
Comment 11 Steve Dickson 2010-07-20 16:31:57 EDT
Created attachment 433258
Proposed upstream patch
Comment 13 Steve Dickson 2010-07-20 16:44:25 EDT
It turns out the proposed patch is already in the RHEL6 git tree; please have them try a beta2 kernel...
Comment 16 Kevin Constantine 2010-07-20 17:28:25 EDT
I'm still seeing this with what I believe is a beta2 kernel:

[kconstan@beaver build]$ uname -a
Linux beaver 2.6.32-37.el6.x86_64 #1 SMP Sun Jun 20 19:29:35 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux


[kconstan@beaver build]$ sudo mount 172.30.224.151:/ifs/data/adai0102 /mnt; lmdd if=/mnt/kctest/150mb bs=1m; sudo umount /mnt
157.2864 MB in 4.5694 secs, 34.4213 MB/sec
[kconstan@beaver build]$ sudo mount 172.30.224.151:/ifs/data/adai0102 /mnt; lmdd if=/mnt/kctest/150mb bs=1m; sudo umount /mnt
157.2864 MB in 1.3720 secs, 114.6435 MB/sec
[kconstan@beaver build]$ sudo mount 172.30.224.151:/ifs/data/adai0102 /mnt; lmdd if=/mnt/kctest/150mb bs=1m; sudo umount /mnt
157.2864 MB in 19.4463 secs, 8.0883 MB/sec
[kconstan@beaver build]$ uname -a
Linux beaver 2.6.32-37.el6.x86_64 #1 SMP Sun Jun 20 19:29:35 EDT 2010 x86_64 x86_64 x86_64 GNU/Linux
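The three runs above can be cross-checked by recomputing throughput from the reported size and elapsed time; the results agree with lmdd's own figures to within the rounding of the printed times. The 157.2864 MB size is consistent with a 150 MiB file counted in decimal megabytes, an inference from the arithmetic rather than from lmdd documentation. A minimal Python sketch:

```python
import re

LMDD_RE = re.compile(r"([\d.]+) MB in ([\d.]+) secs, ([\d.]+) MB/sec")

runs = """157.2864 MB in 4.5694 secs, 34.4213 MB/sec
157.2864 MB in 1.3720 secs, 114.6435 MB/sec
157.2864 MB in 19.4463 secs, 8.0883 MB/sec"""

for line in runs.splitlines():
    mb, secs, reported = map(float, LMDD_RE.search(line).groups())
    print(f"{secs:8.4f} s  {mb / secs:9.4f} MB/s (lmdd reported {reported})")

# 150 MiB expressed in decimal megabytes.
print(150 * 2**20 / 10**6)  # → 157.2864
```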


from the messages log:
Jul 20 14:26:33 beaver sudo: kconstan : TTY=pts/67 ; PWD=/share/el6/build ; USER=root ; COMMAND=/bin/mount 172.30.224.151:/ifs/data/adai0102 /mnt
Jul 20 14:26:34 beaver kernel: Callback slot table overflowed
Jul 20 14:26:34 beaver kernel: Callback slot table overflowed
Jul 20 14:26:35 beaver kernel: Callback slot table overflowed
Jul 20 14:26:37 beaver automount[1804]: 2 remaining in /apps
Jul 20 14:26:38 beaver sudo: kconstan : TTY=pts/67 ; PWD=/share/el6/build ; USER=root ; COMMAND=/bin/umount /mnt
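To quantify how often the overflow message appears around a test run, a small Python sketch can count it per one-second syslog timestamp. It assumes the classic "Mon DD HH:MM:SS host" prefix shown above, and the function name is hypothetical; on the affected host one would pass it the contents of /var/log/messages, which usually requires root to read:

```python
import re
from collections import Counter

OVERFLOW = "Callback slot table overflowed"
STAMP_RE = re.compile(r"^(\w{3}\s+\d+\s+\d\d:\d\d:\d\d)\s")


def overflow_counts(log_text):
    """Count overflow messages per syslog timestamp (one-second resolution)."""
    counts = Counter()
    for line in log_text.splitlines():
        if OVERFLOW in line:
            m = STAMP_RE.match(line)
            counts[m.group(1) if m else "?"] += 1
    return counts


sample = """Jul 20 14:26:34 beaver kernel: Callback slot table overflowed
Jul 20 14:26:34 beaver kernel: Callback slot table overflowed
Jul 20 14:26:35 beaver kernel: Callback slot table overflowed
Jul 20 14:26:37 beaver automount[1804]: 2 remaining in /apps"""
print(dict(overflow_counts(sample)))
# → {'Jul 20 14:26:34': 2, 'Jul 20 14:26:35': 1}
```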
