Bug 1414247 - client process crashed due to write behind translator
Summary: client process crashed due to write behind translator
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: write-behind
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Target Release: RHGS 3.2.0
Assignee: Raghavendra G
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On: 1415115 1418624
Blocks: 1351528 1418623
 
Reported: 2017-01-18 07:13 UTC by RamaKasturi
Modified: 2017-03-23 06:03 UTC
CC List: 4 users

Fixed In Version: glusterfs-3.8.4-14
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1415115 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:03:36 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description RamaKasturi 2017-01-18 07:13:16 UTC
Description of problem:
I see that the client process on my system (HC setup) crashed, and the crash is due to the write-behind translator.

bt from core dump:
========================
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for 
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5b/9ed38e31ce6bd04ecca183ad6d6ee05a4535d0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.36.82 --volfile-server=10.70.36.83 -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8, prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode")
    at write-behind.c:2355
2355	                        gf_proc_dump_write ("offset", "%"PRId64,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8, prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode")
    at write-behind.c:2355
#1  0x00007fa23068510e in wb_inode_dump (this=<optimized out>, inode=0x7fa2281dc590) at write-behind.c:2424
#2  0x00007fa238f0b550 in inode_dump (inode=0x7fa2281dc590, prefix=0x7fa2354ba420 "xlator.mount.fuse.itable.active.2") at inode.c:2367
#3  0x00007fa238f0b78f in inode_table_dump (itable=0x7fa21400fa90, prefix=prefix@entry=0x7fa235ce5994 "xlator.mount.fuse.itable") at inode.c:2408
#4  0x00007fa235ccaa2b in fuse_itable_dump (this=<optimized out>) at fuse-bridge.c:5103
#5  0x00007fa238f25722 in gf_proc_dump_xlator_info (top=<optimized out>) at statedump.c:504
#6  0x00007fa238f25cb9 in gf_proc_dump_info (signum=signum@entry=10, ctx=0x7fa23ad83010) at statedump.c:830
#7  0x00007fa2393eec3e in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2069
#8  0x00007fa237d5adc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fa23769f73d in clone () from /lib64/libc.so.6
(gdb) 


Version-Release number of selected component (if applicable):
glusterfs-3.8.4-11.el7rhgs.x86_64

How reproducible:


Steps to Reproduce:
1.
2.
3.

Actual results:
The client process crashes. Due to this crash, all the app VMs residing on that node go to a paused state and the host becomes non-operational.

Expected results:
The client process should not crash.

Additional info:

#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8, prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode")
    at write-behind.c:2355
(gdb) p	req->stub
$7 = (call_stub_t *) 0x0


2355                            gf_proc_dump_write ("offset", "%"PRId64,
2356                                                req->stub->args.offset);
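
For illustration only: the gdb output above shows req->stub is NULL while the statedump loop dereferences req->stub->args.offset, which is the NULL-pointer dereference that kills the client. The standalone sketch below uses hypothetical mock types (not gluster's real wb_request_t/call_stub_t) to show the kind of guard the dump path needs; it is not the actual upstream patch referenced later in comment 4.

/* Hedged, standalone sketch -- mock types, not the gluster source or patch.
 * It mimics the statedump loop in __wb_dump_requests() to show why the dump
 * path must check req->stub before touching req->stub->args.offset. */
#include <inttypes.h>
#include <stdio.h>

/* Minimal stand-ins for gluster's call_stub_t / wb_request_t (hypothetical). */
typedef struct { int64_t offset; } mock_args_t;
typedef struct { mock_args_t args; } mock_stub_t;
typedef struct { mock_stub_t *stub; } mock_wb_request_t;

static void
dump_request (mock_wb_request_t *req)
{
        /* The crash: req->stub was NULL, so req->stub->args.offset faulted.
         * Guarding the dereference keeps a statedump from crashing the
         * client process. */
        if (req->stub != NULL)
                printf ("offset=%" PRId64 "\n", req->stub->args.offset);
        else
                printf ("offset=(stub not available)\n");
}

int
main (void)
{
        mock_stub_t stub = { .args = { .offset = 4096 } };
        mock_wb_request_t with_stub = { .stub = &stub };
        mock_wb_request_t without_stub = { .stub = NULL };

        dump_request (&with_stub);     /* prints the offset */
        dump_request (&without_stub);  /* would have segfaulted unguarded */
        return 0;
}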

Comment 2 RamaKasturi 2017-01-18 07:15:16 UTC
Raghavendra G took a look at the setup and he says that a statedump triggered this crash. But it is not clear who triggered the statedump, as it was not a user-driven one.
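
For context, the backtrace frame gf_proc_dump_info (signum=10) means the dump was driven by SIGUSR1 (signal 10 on Linux/x86_64), so a statedump can be taken without any CLI involvement. The following standalone sketch (mock names, not glusterfsd source) illustrates that pattern: a dedicated sigwait() thread dispatching SIGUSR1 to a dump routine.

/* Hedged, standalone sketch -- not glusterfsd code, just the signal-driven
 * statedump pattern visible in the backtrace (glusterfs_sigwaiter ->
 * gf_proc_dump_info with signum=10, i.e. SIGUSR1). */
#include <pthread.h>
#include <signal.h>
#include <stdio.h>
#include <sys/types.h>
#include <unistd.h>

static void
mock_proc_dump_info (int signum)
{
        /* In glusterfsd this is where the statedump walks every xlator,
         * which is how the write-behind dump code gets reached. */
        printf ("statedump requested via signal %d\n", signum);
}

static void *
mock_sigwaiter (void *arg)
{
        sigset_t *set = arg;
        int sig = 0;

        /* glusterfsd loops forever; this sketch exits after one statedump. */
        while (sigwait (set, &sig) == 0) {
                if (sig == SIGUSR1) {
                        mock_proc_dump_info (sig);
                        break;
                }
        }
        return NULL;
}

int
main (void)
{
        sigset_t set;
        pthread_t tid;

        sigemptyset (&set);
        sigaddset (&set, SIGUSR1);
        /* Block SIGUSR1 everywhere so only the sigwaiter thread consumes it. */
        pthread_sigmask (SIG_BLOCK, &set, NULL);
        pthread_create (&tid, NULL, mock_sigwaiter, &set);

        kill (getpid (), SIGUSR1);   /* simulate "kill -USR1 <pid>" */
        pthread_join (tid, NULL);
        return 0;
}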

Comment 3 RamaKasturi 2017-01-18 09:07:58 UTC
Proposing this as a blocker since it is a crash.

But I am not sure how to reproduce this issue, as I only found that the client process had been crashing during my HC testing. The issue is reproducible 1/1.

Comment 4 Atin Mukherjee 2017-01-20 11:30:08 UTC
upstream patch : http://review.gluster.org/#/c/16440/

Comment 5 Atin Mukherjee 2017-02-02 12:48:37 UTC
downstream patch : https://code.engineering.redhat.com/gerrit/#/c/96726

Comment 7 SATHEESARAN 2017-03-09 03:17:39 UTC
Tested with the RHVH 4.1 Beta platform, after updating the glusterfs rpms to glusterfs-3.8.4-18.el7rhgs, with the following cases:

1. volume statedump operation initiated on the volume
2. continuous volume statedump operation triggered
3. concurrent volume statedump operation triggered

No crashes were observed, and after all of that testing the glusterfs FUSE mount is still accessible.

Comment 9 errata-xmlrpc 2017-03-23 06:03:36 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

