Red Hat Bugzilla – Full Text Bug Listing
Summary: NFS server crashes in readdir_fstat_cbk due to extra fd unref
Product: [Community] GlusterFS
Component: nfs
Version: 3.1.0
Status: CLOSED CURRENTRELEASE
Reporter: Shehjar Tikoo <shehjart>
Assignee: Shehjar Tikoo <shehjart>
CC: aavati, admin, d.a.bretherton, gluster-bugs, rabhat, vijaykumar
Doc Type: Bug Fix
Story Points: ---
oVirt Team: ---
Description Shehjar Tikoo 2010-11-07 23:21:28 EST
From email on gluster-users:

I upgraded to GlusterFS 3.1 a couple of weeks ago and overall I am very impressed; I think it is a big step forward. Unfortunately there is one "feature" that is causing me a big problem - the NFS process crashes every few hours when under load. I have pasted the relevant error messages from nfs.log at the end of this message. The rest of the log file is swamped with these messages, incidentally:

[2010-11-06 23:07:04.977055] E [rpcsvc.c:1249:nfs_rpcsvc_program_actor] nfsrpc: RPC program not available

There are no apparent problems while these errors are being produced, so this issue probably isn't relevant to the crashes.

To give an indication of what I mean by "under load": we have a small HPC cluster that is used for running ocean models. A typical model run involves 20 processors, all needing to read simultaneously from the same input data files at regular intervals during the run. There are roughly 20 files, each ~1GB in size. At the same time, several people are typically processing output from previous runs from this and other (much bigger) clusters, chugging through hundreds of GB and tens of thousands of files every few hours. I don't think the Gluster-NFS crashes are purely load dependent, because they seem to occur at different load levels, which is what leads me to suspect something subtle related to the cluster's 20-processor model runs.

I would prefer to use the GlusterFS client on the cluster's compute nodes, but unfortunately the pre-FUSE Linux kernel has been customised in a way that has thwarted all my attempts to build a FUSE module that the kernel will accept (see http://gluster.org/pipermail/gluster-users/2010-April/004538.html). The servers that are exporting NFS are all running CentOS 5.5 with GlusterFS installed from RPMs, and the GlusterFS volumes are distributed (not replicated). Two of the servers with GlusterFS bricks are actually running SuSE Enterprise 10; I don't know if this is relevant.
I used previous GlusterFS versions with SLES10 without any problems, but as RPMs are not provided for SuSE I presume it is not an officially supported distro. For that reason I am only using the CentOS machines as NFS servers for the GlusterFS volumes. I would be very grateful for any suggested solutions or workarounds that might help to prevent these NFS crashes. -Dan.

nfs.log extract
------------------
[2010-11-06 23:07:10.380744] E [fd.c:506:fd_unref_unbind] (-->/usr/lib64/glusterfs/3.1.0/xlator/debug/io-stats.so(io_stats_fstat_cbk+0x8e) [0x2aaaab30813e] (-->/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs_fop_fstat_cbk+0x41) [0x2aaaab9a6da1] (-->/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x22d) [0x2aaaab9b0bdd]))) : Assertion failed: fd->refcount
pending frames:
patchset: v3.1.0
signal received: 11
time of crash: 2010-11-06 23:07:10
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.0
/lib64/libc.so.6[0x35746302d0]
/lib64/libpthread.so.0(pthread_spin_lock+0x2)[0x357520b722]
/usr/lib64/libglusterfs.so.0(fd_unref_unbind+0x3d)[0x38f223511d]
/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs3svc_readdir_fstat_cbk+0x22d)[0x2aaaab9b0bdd]
/usr/lib64/glusterfs/3.1.0/xlator/nfs/server.so(nfs_fop_fstat_cbk+0x41)[0x2aaaab9a6da1]
/usr/lib64/glusterfs/3.1.0/xlator/debug/io-stats.so(io_stats_fstat_cbk+0x8e)[0x2aaaab30813e]
/usr/lib64/libglusterfs.so.0(default_fstat_cbk+0x79)[0x38f2222a69]
/usr/lib64/glusterfs/3.1.0/xlator/performance/read-ahead.so(ra_attr_cbk+0x79)[0x2aaaaaeec459]
/usr/lib64/glusterfs/3.1.0/xlator/performance/write-behind.so(wb_fstat_cbk+0x9f)[0x2aaaaace402f]
/usr/lib64/glusterfs/3.1.0/xlator/cluster/distribute.so(dht_attr_cbk+0xf4)[0x2aaaab521d24]
/usr/lib64/glusterfs/3.1.0/xlator/protocol/client.so(client3_1_fstat_cbk+0x287)[0x2aaaaaacd2b7]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa2)[0x38f1a0f2e2]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x8d)[0x38f1a0f4dd]
/usr/lib64/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x38f1a0a77c]
/usr/lib64/glusterfs/3.1.0/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaac3eb435f]
/usr/lib64/glusterfs/3.1.0/rpc-transport/socket.so(socket_event_handler+0x168)[0x2aaac3eb44e8]
/usr/lib64/libglusterfs.so.0[0x38f2236ee7]
/usr/sbin/glusterfs(main+0x37d)[0x4046ad]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x357461d994]
/usr/sbin/glusterfs[0x402dc9]
Comment 1 Dan Bretherton 2010-11-09 05:10:52 EST
Hello Gluster developers,

Thanks for looking at this NFS problem. I have been running with the NFS server processes in TRACE mode for a day or so but they are stubbornly refusing to crash. There are a couple of possible reasons that I can think of (based on not very much evidence and a limited understanding of how GlusterFS works):

1) It is one of those programs that doesn't go wrong when you are debugging it (which happened to me quite often when I used to do C programming).

2) The fault occurs only for newly created volumes where there is pre-existing data on the bricks (as when upgrading from 3.0.5 to 3.1). The reason I think that might be the case is some odd behaviour I noticed after I first upgraded. I think I encountered this bug: http://bugs.gluster.com/cgi-bin/bugzilla3/show_bug.cgi?id=2057, before another user posted their similar findings on the Gluster users list. I didn't think it was a bug because I've seen similar behaviour in the past after creating distributed volumes with pre-existing data.

To solve the problem of not finding any subdirectories, I saved the output of "find . -print" run on each of the bricks, then ran a script to "ls" each line in each of these output files on a GlusterFS client. After that I could see all the files in all the subdirectories (after a non-exhaustive search), but still experienced long delays when browsing some directories for the first time. It was after that when the NFS process started crashing, when people came to work on the Monday morning and started processing the data and running jobs on the HPC cluster. I thought perhaps that GlusterFS was still settling down in some sense, optimising its hashing algorithm perhaps. I am guessing here, but thought it might be worth mentioning my observations.

I should add that the NFS load is distributed much better now than it was at first. Initially there was only one NFS server machine, but there are now several machines acting as NFS servers for the different GlusterFS volumes. Having said that, one of the NFS processes did crash almost as soon as the first model run started on the cluster, and that is what led me to believe that the problem was not load dependent. All the cluster's 64 processors are in use now (not just 20 as before) but NFS is still refusing to crash.

I am not sure how best to proceed. I will probably wait another day or so to see if anything happens while still running in TRACE mode. I will be away from the office Thursday-Sunday so won't be able to do anything else until next week.

Regards, Dan Bretherton
Comment 2 Dan Bretherton 2010-11-10 08:37:37 EST
Created attachment 374
Comment 3 Dan Bretherton 2010-11-10 08:40:38 EST
Comment on attachment 374

There were 5 crashes in a row, and I made a separate copy of the nfs.log file each time to avoid my logrotate cron job cleaning them up. Therefore some of the files may contain the same information. The first log file in the sequence (nfs_trace.log.CRASH) goes back 5 minutes and is 1.3GB uncompressed.
Comment 4 Shehjar Tikoo 2010-11-10 20:56:28 EST
Thanks! I'll try to look at this soon.
Comment 5 Raghavendra Bhat 2010-11-10 21:00:12 EST
*** Bug 2079 has been marked as a duplicate of this bug. ***
Comment 6 Shehjar Tikoo 2010-11-15 01:26:13 EST
Dan, please break the tarball into separate log files. Then I may not have to download the full 1.3GB file to investigate the bug. Smaller log files in this tarball may be just as useful. Thanks.
Comment 7 Dan Bretherton 2010-11-15 02:30:17 EST
(In reply to comment #6)
> Dan, please break the tarball into separate log files. I may not have to
> download the full 1.3g file to investigate the bug. Smaller log files in this
> tarball may be just as useful. Thanks.

It is only 97 MB compressed. I can still split it up if you like. The big one is called nfs_trace.log.CRASH, so you can save some time by not extracting it. The small ones are nfs_trace.log.CRASH2, ..., nfs_trace.log.CRASH5. -Dan.
Comment 8 Anand Avati 2010-11-15 04:03:53 EST
PATCH: http://patches.gluster.com/patch/5699 in master (nfs: opendir/closedir for every readdir)
Comment 9 Udo Waechter 2010-11-17 07:42:41 EST
Hi, just to give our experience. We have also experienced crashes of NFS with gluster 3.1 (distributed volume, 3 servers, Lucid Lynx 64-bit with kernel 2.6.35-22-server). Our crashes happened when a lot of clients (~100) connected to the nfs-volume. It was not so much the bandwidth or the number of requests. Maybe the nfs-server runs out of file descriptors? Thanks -- udo.
Comment 10 Shehjar Tikoo 2010-11-21 22:25:27 EST
(In reply to comment #9)
> Hi,
> just to give our experience.
> We have also experienced crashes of NFS with gluster 3.1 (Distributed volume, 3
> Servers, Lucid Lynx 64bit with kernel 2.6.35-22-server)
>
> Our crashes happened when a lot of clients connected (~100) to the nfs-volume.
> It was not so much the bandwith or requests.
>
> Maybe the nfs-server runs out of file-descriptors?
>
> thanks -- udo.

Nah. It was just an oversight on my part. The directory reading code didn't behave well with hundreds of clients accessing the same directory. The patch above will fix it.
Comment 11 Vijaykumar 2011-09-07 04:08:47 EDT
As discussed with the developer, this is a very narrow corner case that is difficult to hit, so moving it to the verified state.