Bug 763793 - (GLUSTER-2061) NFS server crashes in readdir_fstat_cbk due to extra fd unref
NFS server crashes in readdir_fstat_cbk due to extra fd unref
Status: CLOSED CURRENTRELEASE
Product: GlusterFS
Classification: Community
Component: nfs (Show other bugs)
3.1.0
All Linux
low Severity high
: ---
: ---
Assigned To: Shehjar Tikoo
:
: GLUSTER-2079 (view as bug list)
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-11-07 23:21 EST by Shehjar Tikoo
Modified: 2015-12-01 11:45 EST (History)
6 users (show)

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed:
Type: ---
Regression: RTP
Mount Type: nfs
Documentation: DNR
CRM:
Verified Versions: 3.1.0
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
TRACE mode nfs.log files showing crash output (52 bytes, text/plain)
2010-11-10 08:37 EST, Dan Bretherton
no flags Details

  None (edit)
Description Shehjar Tikoo 2010-11-07 23:21:28 EST
From email on gluster-users:

I upgraded to GlusterFS 3.1 a couple of weeks ago and overall I am very
impressed; I think it is a big step forward.  Unfortunately there is one
"feature" that is causing me a big problem - the NFS process crashes
every few hours  when under load.  I have pasted the relevant error
messages from nfs.log at the end of this message.  The rest of the log
file is swamped with these messages incidentally.

[2010-11-06 23:07:04.977055] E [rpcsvc.c:1249:nfs_rpcsvc_program_actor]
nfsrpc: RPC program not available

There are no apparent problems while these errors are being produced so
this issue probably isn't relevant to the crashes.

To give an indication of what I mean by "under load", we have a small
HPC cluster that is used for running ocean models.  A typical model run
involves 20 processors, all needing to read simultaneously from the same
input data files at regular intervals during the run.  There are roughly
20 files, each ~1GB in size.  At the same time this is going on several
people, typically, are processing output from previous runs from this
and other (much bigger) clusters, chugging through hundreds of GB and
tens of thousands of files every few hours.  I don't think the
Gluster-NFS crashes are purely load dependant because they seem to occur
at different load levels, which is what leads me to suspect something
subtle related to the cluster's 20-processor model runs.  I would prefer
to use the GlusterFS client on the cluster's compute nodes, but
unfortunately the pre-FUSE Linux kernel has been customised in a way
that has thwarted all my attempts to build a FUSE module that the kernel
will accept (see
http://gluster.org/pipermail/gluster-users/2010-April/004538.html)

The servers that are exporting NFS are all running CentOS 5.5 with
GlusterFS installed from RPMs, and the GlusterFS volumes are distributed
(not repicated).  Two of the servers with GlusterFS bricks are actually
running SuSE Enterprise 10; I don't know if this is relevant.  I used
previous GlusterFS versions with SLES10 without any problems, but as
RPMs are not provided for SuSE I presume it is not an officially
supported distro.  For that reason I am only using the CentOS machines
as NFS servers for the GlusterFS volumes.

I would be very grateful for any suggested solutions or workarounds that
might help to prevent these NFS crashes.

-Dan.
nfs.log extract
------------------
[2010-11-06 23:07:10.380744] E [fd.c:506:fd_unref_unbind]
(-->/usr/lib64/glusterfs/3.1.0/xlator/debug/io-stats.so(io_stats_fstat_cbk+0x8e)
[0x2aaaab30813e]
(-->/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs_fop_fstat_cbk+0x41)
[0x2aaaab9a6da1]
(-->/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x22d)
[0x2aaaab9b0bdd]))) : Assertion failed: fd->refcount
pending frames:

patchset: v3.1.0
signal received: 11
time of crash: 2010-11-06 23:07:10
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0
/lib64/libc.so.6[0x35746302d0]
/lib64/libpthread.so.0(pthread_spin_lock+0x2)[0x357520b722]
/usr/lib64/libglusterfs.so.0(fd_unref_unbind+0x3d)[0x38f223511d]
/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x22d)[0x2aaaab9b0bdd]
/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs_fop_fstat_cbk+0x41)[0x2aaaab9a6da1]
/usr/lib64/glusterfs/3.1.0/xlator/debug/io-stats.so(io_stats_fstat_cbk+0x8e)[0x2aaaab30813e]
/usr/lib64/libglusterfs.so.0(default_fstat_cbk+0x79)[0x38f2222a69]
/usr/lib64/glusterfs/3.1.0/xlator/performance/read-ahead.so(ra_attr_cbk+0x79)[0x2aaaaaeec459]
/usr/lib64/glusterfs/3.1.0/xlator/performance/write-behind.so(wb_fstat_cbk+0x9f)[0x2aaaaace402f]
/usr/lib64/glusterfs/3.1.0/xlator/cluster/distribute.so(dht_attr_cbk+0xf4)[0x2aaaab521d24]
/usr/lib64/glusterfs/3.1.0/xlator/protocol/client.so(client3_1_fstat_cbk+0x287)[0x2aaaaaacd2b7]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x38f1a0f2e2]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x38f1a0f4dd]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x38f1a0a77c]
/usr/lib64/glusterfs/3.1.0/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaac3eb435f]
/usr/lib64/glusterfs/3.1.0/rpc-transport/socket.so(socket_event_handler+0x168)[0x2aaac3eb44e8]
/usr/lib64/libglusterfs.so.0[0x38f2236ee7]
/usr/sbin/glusterfs(main+0x37d)[0x4046ad]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x357461d994]
/usr/sbin/glusterfs[0x402dc9]
Comment 1 Dan Bretherton 2010-11-09 05:10:52 EST
ello Gluster developers,
Thanks for looking at this NFS problem.  I have been running with the NFS
server processes in TRACE mode for a day or so but they are stubbornly refusing
to crash.  There are a couple of possible reasons that I can think of (based on
not very much evidence and limited understanding of how GlusterFS works):

1) It is one of those programs that doesn't go wrong when you are debugging it
(which happened to me quite often when I used to do C programming)

2) The fault occurs only for newly created volumes where there is pre-existing
data on the bricks (as when upgrading from 3.0.5 to 3.1).  The reason I think
that might be the case is some odd behaviour I noticed after I first upgraded.
I think I encountered this bug:
http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2057, before another
user posted their similar findings on the Gluster users list.  I didn't think
it was a bug because I've seen similar behaviour in the past after creating
distributed volumes with pre-existing data.  To solve the problem of not
finding any subdirectories I saved the output of "find . -print" run on each of
the bricks, then ran a script to "ls" each line in each of these output files
on a GlusterFS client.  After that I could see all the files in all the
subdirectories (after a non-exhaustive search), but still experienced long
delays when browsing some directories for the first time.  It was after that
when the NFS process started crashing, when people came to work on the Monday
morning and started processing the data and running jobs on the HPC cluster.  I
thought perhaps that GlusterFS was still settling down in some sense,
optimising its hashing algorithm perhaps.  I am guessing here, but thought it
might be worth mentioning my observations.

I should add that the NFS load is distributed much better now than it was at
first.  Initially there was only one NFS server machine, but there are now
several machines acting as NFS servers for the different GluseterFS volumes. 
Having said that, one of the NFS processes did crash almost as soon as the
first model run started on the cluster, and that is what led me to believe that
the problem was not load dependent.   All the cluster's 64 processors are in
use now (not just 20 as before) but NFS is still refusing to crash.  I am not
sure how best to proceed.  I will probably wait another day or so to see if
anything happens while still running in TRACE mode.  I will be away from the
office Thursday-Sunday so won't be able to do anything else until next week.

Regards,
Dan Bretherton
Comment 2 Dan Bretherton 2010-11-10 08:37:37 EST
Created attachment 374
Comment 3 Dan Bretherton 2010-11-10 08:40:38 EST
Comment on attachment 374


There were 5 crashes in a row, and I made a separate copy of the nfs.log file each time to avoid my logrotate cron job cleaning them up.  Therefore some of the files may contain the same information. The first log file in the sequence (nfs_trace.log.CRASH) goes back 5 minutes and is 1.3GB uncompressed.
Comment 4 Shehjar Tikoo 2010-11-10 20:56:28 EST
Thanks! I'll try to look at this soon.
Comment 5 Raghavendra Bhat 2010-11-10 21:00:12 EST
*** Bug 2079 has been marked as a duplicate of this bug. ***
Comment 6 Shehjar Tikoo 2010-11-15 01:26:13 EST
Dan, please break the tarball into separate log files. I may not have to download the full 1.3g file to investigate the bug. Smaller log files in this tarball may be just as useful. Thanks.
Comment 7 Dan Bretherton 2010-11-15 02:30:17 EST
(In reply to comment #6)
> Dan, please break the tarball into separate log files. I may not have to
> download the full 1.3g file to investigate the bug. Smaller log files in this
> tarball may be just as useful. Thanks.

It is only 97 MB compressed.  I can still split it up if you like.  The big one is called nfs_trace.log.CRASH, so you can save some time by not extracting it.  The small ones are nfs_trace.log.CRASH2, ..., nfs_trace.log.CRASH5.

-Dan.
Comment 8 Anand Avati 2010-11-15 04:03:53 EST
PATCH: http://patches.gluster.com/patch/5699 in master (nfs: opendir/closedir for every readdir)
Comment 9 Udo Waechter 2010-11-17 07:42:41 EST
Hi, 
just to give our experience.
We have also experienced crashes of NFS with gluster 3.1 (Distributed volume, 3 Servers, Lucid Lynx 64bit with kernel 2.6.35-22-server)

Our crashes happened when a lot of clients connected (~100) to the nfs-volume. It was not so much the bandwith or requests.

Maybe the nfs-server runs out of file-descriptors?

thanks -- udo.
Comment 10 Shehjar Tikoo 2010-11-21 22:25:27 EST
(In reply to comment #9)
> Hi, 
> just to give our experience.
> We have also experienced crashes of NFS with gluster 3.1 (Distributed volume, 3
> Servers, Lucid Lynx 64bit with kernel 2.6.35-22-server)
> 
> Our crashes happened when a lot of clients connected (~100) to the nfs-volume.
> It was not so much the bandwith or requests.
> 
> Maybe the nfs-server runs out of file-descriptors?
> 
> thanks -- udo.

Nah. It was just a oversight on my part. The directory reading code didnt behave well with hundreds of clients accessing the same directory. The patch above will fix it.
Comment 11 Vijaykumar 2011-09-07 04:08:47 EDT
As discussed with developer , its a very corner case which is difficult to hit.
so moving it to verified state.

Note You need to log in before you can comment on or make changes to this bug.