Description of problem:
One of my clients has been crashing every night for the past few days. This is the only client for this volume, and the client machine doesn't mount any other GlusterFS volumes. The servers do host several other volumes, mounted by several other client machines all running the same GlusterFS and OS/kernel versions, and none of those have any problems.

Version-Release number of selected component (if applicable):
GlusterFS 3.1.7 on Ubuntu Oneiric 11.10 (servers and client)
Server and client kernels: 3.0.0-26-virtual (latest currently)

How reproducible:
I don't know how to reproduce it, but it has happened a few times this week already.

Here is the log file from the last crash. It begins with the mount and shows no activity until the crash many hours later:

[2012-09-19 15:10:59.251188] I [client-handshake.c:1016:select_server_supported_programs] 0-builder-client-1: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2012-09-19 15:10:59.251361] I [client-handshake.c:1016:select_server_supported_programs] 0-builder-client-0: Using Program GlusterFS-3.1.0, Num (1298437), Version (310)
[2012-09-19 15:10:59.270508] I [client-handshake.c:852:client_setvolume_cbk] 0-builder-client-0: Connected to 10.44.185.22:24020, attached to remote volume '/bricks/builder0'.
[2012-09-19 15:10:59.270587] I [afr-common.c:2646:afr_notify] 0-builder-replicate-0: Subvolume 'builder-client-0' came back up; going online.
[2012-09-19 15:10:59.286563] I [client-handshake.c:852:client_setvolume_cbk] 0-builder-client-1: Connected to 10.4.126.119:24036, attached to remote volume '/bricks/builder0'.
[2012-09-19 15:10:59.295463] I [fuse-bridge.c:3312:fuse_graph_setup] 0-fuse: switched graph to 0
[2012-09-19 15:10:59.295686] I [fuse-bridge.c:2900:fuse_init] 0-glusterfs-fuse: FUSE inited with protocol versions: glusterfs 7.13 kernel 7.16
[2012-09-19 15:10:59.320368] I [afr-common.c:893:afr_fresh_lookup_cbk] 0-builder-replicate-0: added root inode
[2012-09-20 05:03:01.834802] W [fuse-bridge.c:1751:fuse_readv_cbk] 0-glusterfs-fuse: 6750101: READ => -1 (No such file or directory)
[2012-09-20 05:03:01.834911] E [mem-pool.c:469:mem_put] 0-mem-pool: invalid argument
[2012-09-20 05:03:01.836458] W [fuse-bridge.c:1751:fuse_readv_cbk] 0-glusterfs-fuse: 6750104: READ => -1 (No such file or directory)
[2012-09-20 05:03:01.836496] E [mem-pool.c:469:mem_put] 0-mem-pool: invalid argument
pending frames:
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(CREATE)
frame : type(1) op(READ)
frame : type(1) op(LOOKUP)
frame : type(1) op(FLUSH)
frame : type(1) op(FLUSH)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2012-09-20 05:03:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.1.7
/lib/x86_64-linux-gnu/libc.so.6(+0x36420)[0x7f54074dd420]
/lib/x86_64-linux-gnu/libpthread.so.0(pthread_spin_lock+0x0)[0x7f5407854d60]
/usr/lib/libglusterfs.so.0(fd_unref+0x3b)[0x7f5407ec74ab]
/usr/lib/glusterfs/3.1.7/xlator/protocol/client.so(client_local_wipe+0x1f)[0x7f54045d824f]
/usr/lib/glusterfs/3.1.7/xlator/protocol/client.so(client3_1_open_cbk+0x19b)[0x7f54045dd08b]
/usr/lib/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7f5407c8a065]
/usr/lib/libgfrpc.so.0(rpc_clnt_notify+0x7d)[0x7f5407c8a42d]
/usr/lib/libgfrpc.so.0(rpc_transport_notify+0x27)[0x7f5407c86867]
/usr/lib/glusterfs/3.1.7/rpc-transport/socket.so(socket_event_poll_in+0x34)[0x7f5405635984]
/usr/lib/glusterfs/3.1.7/rpc-transport/socket.so(socket_event_handler+0xb3)[0x7f5405635c23]
/usr/lib/libglusterfs.so.0(+0x37f81)[0x7f5407ec8f81]
/usr/sbin/glusterfs(main+0x23a)[0x40313a]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0xed)[0x7f54074c830d]
/usr/sbin/glusterfs[0x4031d5]
Here is the volume info for this volume:

Volume Name: builder
Type: Replicate
Status: Started
Number of Bricks: 2
Transport-type: tcp
Bricks:
Brick1: server1:/bricks/builder0
Brick2: server2:/bricks/builder0
Options Reconfigured:
diagnostics.client-log-level: INFO

And here is a sample line from both of the brick logs on the servers; they produce a batch of lines like this right before the client crashes:

[2012-09-20 05:03:04.976830] I [server-helpers.c:459:do_fd_cleanup] 0-builder-server: fd cleanup on /nexus/sonatype-work/nexus/timeline/index/_3pi.prx
3.1.7? Any chance of upgrading to at least the 3.2.x series? In the meantime, I will try to figure out the issue in the release-3.1 branch.
The client made it through the night without crashing yesterday. And yes, I will work on upgrading to 3.2. I was hoping someone would see that stack trace or log and recognize an obvious problem, because I don't think this will be easy to reproduce. I have lots of clients, many with uptimes of months, and have almost never seen a crash.
I was able to reproduce the crash again this afternoon, since my last comment. This volume stores SVN repos and a Nexus Maven repository, among other things. When I tried doing a build that checked out from SVN and Nexus, the mount crashed. This explains the nightly crashes: they happened when Jenkins ran the nightly builds.

Joe Julian suggested (in IRC) that I try stopping and starting the volume. I did that, and also rebooted the client machine, and now everything seems to be working fine -- I am able to run the Jenkins builds without the client crashing.

This whole problem seems to have been caused by a rolling reboot of the servers for this volume. I have done such reboots many times in the past, on this volume and on other volumes, and never ran into this kind of trouble. In any case, it appears to be resolved now that I have stopped and restarted the volume.
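For anyone hitting the same symptom, the workaround above amounts to bouncing the volume on a server and remounting on the client. A rough sketch, assuming the volume name from this report; the mount point /mnt/builder is a hypothetical example, not from the original report:

```shell
# On one of the servers: stop and restart the volume.
gluster volume stop builder
gluster volume start builder

# On the client: remount the FUSE mount (path is an assumed example).
umount /mnt/builder
mount -t glusterfs server1:/builder /mnt/builder
```

Note that `gluster volume stop` will interrupt all clients of the volume, so this is a disruptive workaround, not a fix.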
Moving the priority down, as a workaround exists, and also because the version is 3.1.x, which is not *actively* looked into. Louis, does that sound OK to you?
Yes, that is fine with me. I have not seen this bug happen again since I reported it; my volumes have been very stable. Thanks!
WORKSFORME with the latest release, then. Please upgrade to 3.3.x (or at least 3.2.x); don't remain on the 3.1.x releases.