Created attachment 96
Patch fixing the described problems.
Initial observation showed that this->private was NULL:

(gdb) p *table
cannot access memory 0x0

It seems we need a NULL check. From the logs, it looks like fini was triggered because glusterfs received a SIGTERM while it was waiting in "waitpid" inside "fuse_mnt_add_mount".
Can you please place the core file in /share/bugzilla/<bugid>?
I can't. This was reproduced on Storage Platform, and the machine I am using is a laptop with no internet connectivity. I tried to collect as many logs as I could.
Observed a segfault while testing on "Storage Platform". The gdb backtrace follows.

(gdb)
#0  0x00000038bc4087a0 in pthread_mutex_destroy () from /lib64/libpthread.so.0
#1  0x00007f7483fae5d1 in fini (this=0x1ba6d60) at io-cache.c:1351
#2  0x0000000000402d1f in cleanup_and_exit (signum=<value optimized out>) at glusterfsd.c:950
#3  <signal handler called>
#4  0x00000038bc40ea2b in waitpid () from /lib64/libpthread.so.0
#5  0x00007f7483da6c11 in fuse_mnt_add_mount (fsname=0x1ba01e0 "/etc/glusterfs/test.vol", mnt=0x1ba81e0 "/nfs/test", type=0x7f7483da8d24 "fuse.glusterfs", opts=0x7f7483da7df8 "allow_other,default_permissions,max_read=131072", progname=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:153
#6  0x00007f7483da733d in fuse_mount_sys (mnt_param=<value optimized out>, fsname=<value optimized out>, mountpoint=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:553
#7  gf_fuse_mount (mnt_param=<value optimized out>, fsname=<value optimized out>, mountpoint=<value optimized out>) at ../../../../contrib/fuse-lib/mount.c:582
#8  0x00007f7483d99fab in init (this_xl=0x1ba1520) at fuse-bridge.c:3391
#9  0x00000038bbc1408b in xlator_init (xl=0x1ba1520) at xlator.c:940
#10 0x00000038bbc14121 in xlator_init_rec (xl=<value optimized out>) at xlator.c:833
#11 xlator_tree_init (xl=<value optimized out>) at xlator.c:871
#12 0x00000000004033cc in _xlator_graph_init (xl=<value optimized out>) at glusterfsd.c:581
#13 glusterfs_graph_init (xl=<value optimized out>) at glusterfsd.c:631
#14 0x0000000000404038 in main (argc=<value optimized out>, argv=<value optimized out>) at glusterfsd.c:1344
(gdb) quit
Everyone, what's going on with this bug? Is it still viable? I don't have much information other than the backtrace and the client log file.
Which translator's this->private was NULL? There is too little information here to fix the bug. It does not indicate whether io-cache is the culprit.
There are two things in this bug. One is that the waitpid() call comes from fuse_mnt_add_mount, which is waiting for the mount to complete; I am not sure why it waits that long here, and I am not sure how to reproduce it. The other, if I remember correctly, is that "this->private" belongs to the fini that is called for io-cache on SIGTERM (via cleanup_and_exit).
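To illustrate the kind of guard being discussed, here is a minimal sketch of a fini that tolerates being called before init has populated the private data. It is not the actual io-cache code or the submitted patch: demo_xlator_t, demo_table_t, demo_init and demo_fini are hypothetical stand-ins, and the only assumption taken from the backtrace is that fini destroys a mutex reached through this->private.

```c
#include <pthread.h>
#include <stdlib.h>

/* Hypothetical stand-in for the translator's private table. */
typedef struct {
    pthread_mutex_t table_lock;
} demo_table_t;

/* Hypothetical stand-in for the translator object. */
typedef struct {
    void *private;
} demo_xlator_t;

/* Allocate the private table and initialise its lock.
 * Returns 0 on success, -1 on allocation failure. */
int demo_init(demo_xlator_t *this)
{
    demo_table_t *table = calloc(1, sizeof(*table));
    if (table == NULL)
        return -1;

    pthread_mutex_init(&table->table_lock, NULL);
    this->private = table;
    return 0;
}

/* Tear down the private table.
 * Returns 0 if cleanup ran, -1 if there was nothing to clean up
 * (for example when a signal arrives before init has completed). */
int demo_fini(demo_xlator_t *this)
{
    if (this == NULL || this->private == NULL)
        return -1;

    demo_table_t *table = this->private;
    pthread_mutex_destroy(&table->table_lock);
    free(table);

    /* Clear the pointer so a second fini call is also harmless. */
    this->private = NULL;
    return 0;
}
```

With this shape, a cleanup path that runs while the graph is still initialising simply skips the translators whose private data is not yet set, instead of dereferencing NULL inside pthread_mutex_destroy.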
PATCH: http://patches.gluster.com/patch/2618 in master (Added null checks in "fini")
PATCH: http://patches.gluster.com/patch/2619 in release-2.0 (Add null pointer checks in "fini")
Verified with 2.0.10rc1