Description of problem:
The glusterfs client process on my system (HC setup) crashed, and the crash is in the write-behind translator.

bt from core dump:
========================
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5b/9ed38e31ce6bd04ecca183ad6d6ee05a4535d0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.36.82 --volfile-server=10.70.36.83 -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
2355                    gf_proc_dump_write ("offset", "%"PRId64,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
#1  0x00007fa23068510e in wb_inode_dump (this=<optimized out>, inode=0x7fa2281dc590) at write-behind.c:2424
#2  0x00007fa238f0b550 in inode_dump (inode=0x7fa2281dc590, prefix=0x7fa2354ba420 "xlator.mount.fuse.itable.active.2") at inode.c:2367
#3  0x00007fa238f0b78f in inode_table_dump (itable=0x7fa21400fa90, prefix=prefix@entry=0x7fa235ce5994 "xlator.mount.fuse.itable") at inode.c:2408
#4  0x00007fa235ccaa2b in fuse_itable_dump (this=<optimized out>) at fuse-bridge.c:5103
#5  0x00007fa238f25722 in gf_proc_dump_xlator_info (top=<optimized out>) at statedump.c:504
#6  0x00007fa238f25cb9 in gf_proc_dump_info (signum=signum@entry=10, ctx=0x7fa23ad83010) at statedump.c:830
#7  0x00007fa2393eec3e in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2069
#8  0x00007fa237d5adc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fa23769f73d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-11.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The client process crashes. Because of this crash, all the app VMs residing on that node go into a paused state and the host becomes non-operational.

Expected results:
The client process should not crash.

Additional info:
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
(gdb) p req->stub
$7 = (call_stub_t *) 0x0
2355                    gf_proc_dump_write ("offset", "%"PRId64,
2356                                        req->stub->args.offset);
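For context: the crash is a NULL-pointer dereference in the statedump path. __wb_dump_requests() prints req->stub->args.offset, but req->stub is NULL here (as the gdb output above shows). A minimal sketch of the kind of defensive check that avoids this, assuming the dump loop shape implied by the backtrace and not the verbatim upstream patch:

    /* Sketch only -- not the actual write-behind.c code or the real fix.
     * The idea is simply to skip the stub-derived fields when req->stub
     * is NULL instead of dereferencing it unconditionally during a
     * statedump of the write-behind request queue. */
    if (req->stub != NULL) {
            gf_proc_dump_write ("offset", "%"PRId64,
                                req->stub->args.offset);
    }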
Raghavendra G took a look at the setup and says a statedump triggered this crash. It is not clear who triggered the statedump, though, as it was not a user-driven one.
Proposing this as a blocker since it is a crash. I am not sure how to reproduce the issue; I only found that the client process had been crashing during my HC testing. So far the issue has been hit 1/1.
Upstream patch: http://review.gluster.org/#/c/16440/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/96726
Tested on the RHVH 4.1 Beta platform with the glusterfs RPMs updated to glusterfs-3.8.4-18.el7rhgs, covering the following cases (example commands below):
1. Volume statedump operation initiated on the volume
2. Continuous volume statedump operations triggered
3. Concurrent volume statedump operations triggered

No crashes were observed, and after all that testing the glusterfs FUSE mount remains accessible.
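For reference, a rough sketch of how such statedumps can be triggered; the volume name "testvol" and the loop count are placeholders, not the exact commands used during verification:

    # Server side: dump the state of the volume's bricks
    gluster volume statedump testvol

    # Approximate the "continuous" case by repeating the dump
    for i in $(seq 1 50); do gluster volume statedump testvol; sleep 1; done

    # Client side: SIGUSR1 to the glusterfs FUSE client also writes a statedump
    # (this is the gf_proc_dump_info path visible in the backtrace above)
    kill -USR1 $(pidof glusterfs)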
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html