Bug 859162 - FUSE client crashes
Product: GlusterFS
Classification: Community
Component: core
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: unspecified
Assigned To: Raghavendra Bhat
Reported: 2012-09-20 13:08 EDT by Louis Zuckerman
Modified: 2012-11-30 13:05 EST (History)
Doc Type: Bug Fix
Last Closed: 2012-11-30 13:05:20 EST
Type: Bug

Attachments: None
Description Louis Zuckerman 2012-09-20 13:08:30 EDT
Description of problem:

One of my clients has been crashing every night for the past few days.

Version-Release number of selected component (if applicable):

GlusterFS 3.1.7 on Ubuntu Oneiric 11.10 (servers & client)...

Server & client kernels: 3.0.0-26-virtual (latest currently)

This is the only client for this volume, and the client machine doesn't mount any other glusterfs volumes.

The servers also host several other volumes, mounted by several other client machines running the same GlusterFS and OS/kernel versions; none of those have any problems.

How reproducible:

I don't know how to reproduce it, but it has happened a few times this week already.

Here is the log file from the last crash, which begins with mounting & shows no activity until the crash many hours later...

[2012-09-19 15:10:59.251188] I [client-handshake.c:1016:select_server_supported_programs] 0-builder-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2012-09-19 15:10:59.251361] I [client-handshake.c:1016:select_server_supported_programs] 0-builder-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2012-09-19 15:10:59.270508] I [client-handshake.c:852:client_setvolume_cbk] 0-builder-client-0: Connected to, attached to remote volume '/bricks/builder0'.
[2012-09-19 15:10:59.270587] I [afr-common.c:2646:afr_notify] 0-builder-replicate-0: Subvolume 'builder-client-0' came back up; going online.
[2012-09-19 15:10:59.286563] I [client-handshake.c:852:client_setvolume_cbk] 0-builder-client-1: Connected to, attached to remote volume '/bricks/builder0'.
[2012-09-19 15:10:59.295463] I [fuse-bridge.c:3312:fuse_graph_setup] 0-fuse: switched graph to 0
[2012-09-19 15:10:59.295686] I [fuse-bridge.c:2900:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2012-09-19 15:10:59.320368] I [afr-common.c:893:afr_fresh_lookup_cbk] 0-builder-replicate-0: added root inode
[2012-09-20 05:03:01.834802] W [fuse-bridge.c:1751:fuse_readv_cbk] 0-glusterfs-fuse: 6750101: READ => -1 (No such file or directory)
[2012-09-20 05:03:01.834911] E [mem-pool.c:469:mem_put] 0-mem-pool: invalid argument
[2012-09-20 05:03:01.836458] W [fuse-bridge.c:1751:fuse_readv_cbk] 0-glusterfs-fuse: 6750104: READ => -1 (No such file or directory)
[2012-09-20 05:03:01.836496] E [mem-pool.c:469:mem_put] 0-mem-pool: invalid argument
pending frames:
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(READ)
frame : type(1) op(LOOKUP)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-09-20 05:03:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.7
Comment 1 Louis Zuckerman 2012-09-20 13:11:17 EDT
Here is the volume info for this volume...

Volume Name: builder
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Brick1: server1:/bricks/builder0
Brick2: server2:/bricks/builder0
Options Reconfigured:
diagnostics.client-log-level: INFO

And here is a sample line from the brick logs on both servers; they produce many lines like this right before the client crashes...

[2012-09-20 05:03:04.976830] I [server-helpers.c:459:do_fd_cleanup] 0-builder-server: fd cleanup on /nexus/sonatype-work/nexus/timeline/index/_3pi.prx
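
A possible next step (not something done in this report) would be to raise the client log level on this volume, using the diagnostics.client-log-level option already shown in the volume info above, so that the next crash leaves a more detailed client log:

```shell
# Raise client-side logging for the "builder" volume (run on a server).
gluster volume set builder diagnostics.client-log-level DEBUG

# Once a crash has been captured, restore the original level:
gluster volume set builder diagnostics.client-log-level INFO
```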
Comment 2 Amar Tumballi 2012-09-21 04:14:43 EDT
3.1.7? Any chance of upgrading to at least the 3.2.x series? Meanwhile, I will try to figure out the issue in the release-3.1 branch.
Comment 3 Louis Zuckerman 2012-09-21 13:01:21 EDT
The client made it through the night without crashing yesterday.  And yes, I will work on upgrading to 3.2.

I was hoping someone would see that stacktrace or log and recognize an obvious problem, because I do not think this will be easy to reproduce.  I have lots of clients, many with uptimes of months, and have almost never seen any crash.
Comment 4 Louis Zuckerman 2012-09-21 17:19:17 EDT
I was able to reproduce the crash again this afternoon since my last comment.  This volume stores SVN repos and a Nexus maven repository, among other things.  When I tried doing a build which checked out from SVN and Nexus, the mount crashed.  This explains the nightly crashes: they happened when Jenkins ran the nightly builds.

Joe Julian suggested (in IRC) that I try stopping & starting the volume.  I did that, and also rebooted the client machine, and now everything seems to be working fine -- I am able to do the Jenkins builds without the client crashing.

This whole problem seemed to be caused by a rolling reboot of the servers for this volume.  I have done this many times in the past with this volume and other volumes and never run into this kind of trouble.  In any case, it seems to be resolved now since I stopped and restarted the volume.
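
The workaround described above amounts to the following steps (the volume and server names come from this report; the mount point /mnt/builder is an assumed example):

```shell
# On the client: unmount the FUSE mount first.
umount /mnt/builder          # /mnt/builder is an assumed mount point

# On a server: stop and start the volume (gluster asks for confirmation).
gluster volume stop builder
gluster volume start builder

# On the client: remount the volume.
mount -t glusterfs server1:/builder /mnt/builder
```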
Comment 5 Amar Tumballi 2012-11-29 06:09:21 EST
Moving the priority down since a workaround exists, and also because the version is 3.1.x, which is not *actively* looked into. Louis, does that sound OK to you?
Comment 6 Louis Zuckerman 2012-11-29 10:08:53 EST
Yes that is fine with me.  I have not seen this bug happen again since I reported it.  My volumes have been very stable.

Comment 7 Amar Tumballi 2012-11-30 13:05:20 EST
Closing as WORKSFORME against the latest release, then. Please upgrade to 3.3.x (or at least 3.2.x); don't remain on the 3.1.x releases.
