Description of problem:
The glusterfs client process on my system (HC setup) crashed, and the crash is in the write-behind translator.

bt from core dump:
========================
warning: core file may not match specified executable file.
Reading symbols from /usr/sbin/glusterfsd...Reading symbols from /usr/lib/debug/usr/sbin/glusterfsd.debug...done.
done.
Missing separate debuginfo for
Try: yum --enablerepo='*debug*' install /usr/lib/debug/.build-id/5b/9ed38e31ce6bd04ecca183ad6d6ee05a4535d0
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib64/libthread_db.so.1".
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.36.82 --volfile-server=10.70.36.83 -'.
Program terminated with signal 11, Segmentation fault.
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
2355                    gf_proc_dump_write ("offset", "%"PRId64,
Missing separate debuginfos, use: debuginfo-install glibc-2.17-157.el7_3.1.x86_64 keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-27.el7_3.x86_64 libcom_err-1.42.9-9.el7.x86_64 libgcc-4.8.5-11.el7.x86_64 libselinux-2.5-6.el7.x86_64 libuuid-2.23.2-33.el7.x86_64 openssl-libs-1.0.1e-60.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64 zlib-1.2.7-17.el7.x86_64
(gdb) bt
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
#1  0x00007fa23068510e in wb_inode_dump (this=<optimized out>, inode=0x7fa2281dc590) at write-behind.c:2424
#2  0x00007fa238f0b550 in inode_dump (inode=0x7fa2281dc590, prefix=0x7fa2354ba420 "xlator.mount.fuse.itable.active.2") at inode.c:2367
#3  0x00007fa238f0b78f in inode_table_dump (itable=0x7fa21400fa90, prefix=prefix@entry=0x7fa235ce5994 "xlator.mount.fuse.itable") at inode.c:2408
#4  0x00007fa235ccaa2b in fuse_itable_dump (this=<optimized out>) at fuse-bridge.c:5103
#5  0x00007fa238f25722 in gf_proc_dump_xlator_info (top=<optimized out>) at statedump.c:504
#6  0x00007fa238f25cb9 in gf_proc_dump_info (signum=signum@entry=10, ctx=0x7fa23ad83010) at statedump.c:830
#7  0x00007fa2393eec3e in glusterfs_sigwaiter (arg=<optimized out>) at glusterfsd.c:2069
#8  0x00007fa237d5adc5 in start_thread () from /lib64/libpthread.so.0
#9  0x00007fa23769f73d in clone () from /lib64/libc.so.6
(gdb)

Version-Release number of selected component (if applicable):
glusterfs-3.8.4-11.el7rhgs.x86_64

How reproducible:

Steps to Reproduce:
1.
2.
3.

Actual results:
The client process crashes. Because of this crash, all the app VMs residing on that node go into a paused state and the host becomes non-operational.

Expected results:
The client process should not crash.

Additional info:
#0  0x00007fa230684e5b in __wb_dump_requests (head=head@entry=0x7fa2100256b8,
    prefix=prefix@entry=0x7fa2354b9370 "xlator.performance.write-behind.wb_inode") at write-behind.c:2355
(gdb) p req->stub
$7 = (call_stub_t *) 0x0
2355                    gf_proc_dump_write ("offset", "%"PRId64,
2356                                        req->stub->args.offset);
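For context: the crash is a NULL-pointer dereference in the statedump path. __wb_dump_requests() prints req->stub->args.offset, but req->stub is NULL here (as the gdb output above shows). A minimal sketch of the kind of defensive check that avoids this, assuming the dump loop shape implied by the backtrace and not the verbatim upstream patch:

    /* Sketch only -- not the actual write-behind.c code or the real fix.
     * The idea is simply to skip the stub-derived fields when req->stub
     * is NULL instead of dereferencing it unconditionally during a
     * statedump of the write-behind request queue. */
    if (req->stub != NULL) {
            gf_proc_dump_write ("offset", "%"PRId64,
                                req->stub->args.offset);
    }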
Raghavendra G took a look at the setup and says a statedump triggered this crash. It is not clear who triggered the statedump, though, as it was not a user-driven one.
Proposing this as a blocker since it is a crash. I am not sure how to reproduce the issue; I only found that the client process had been crashing during my HC testing. So far the issue has been hit 1/1.
Upstream patch: http://review.gluster.org/#/c/16440/
Downstream patch: https://code.engineering.redhat.com/gerrit/#/c/96726
Tested on the RHVH 4.1 Beta platform with the glusterfs RPMs updated to glusterfs-3.8.4-18.el7rhgs, covering the following cases (example commands below):
1. Volume statedump operation initiated on the volume
2. Continuous volume statedump operations triggered
3. Concurrent volume statedump operations triggered

No crashes were observed, and after all that testing the glusterfs FUSE mount remains accessible.
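For reference, a rough sketch of how such statedumps can be triggered; the volume name "testvol" and the loop count are placeholders, not the exact commands used during verification:

    # Server side: dump the state of the volume's bricks
    gluster volume statedump testvol

    # Approximate the "continuous" case by repeating the dump
    for i in $(seq 1 50); do gluster volume statedump testvol; sleep 1; done

    # Client side: SIGUSR1 to the glusterfs FUSE client also writes a statedump
    # (this is the gf_proc_dump_info path visible in the backtrace above)
    kill -USR1 $(pidof glusterfs)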
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html