Bug 1193995

Summary: [RHEV-RHS] Fuse mount process crashed, while using gluster volume as storage domain in RHEV
Product: [Red Hat Storage] Red Hat Gluster Storage Reporter: SATHEESARAN <sasundar>
Component: glusterfs Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED ERRATA QA Contact: SATHEESARAN <sasundar>
Severity: high Docs Contact:
Priority: high    
Version: rhgs-3.0 CC: annair, nbalacha, nlevinki, nsathyan, rcyriac, vagarwal, vbellur
Target Milestone: --- Keywords: Regression, ZStream
Target Release: RHGS 3.0.4   
Hardware: All   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.6.0.48-1 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
RHEV-RHS Integration
Last Closed: 2015-03-26 06:36:08 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1194525, 1197118    
Bug Blocks: 1104459, 1182947    
Attachments:
Description Flags
fuse mount log file none

Description SATHEESARAN 2015-02-18 17:43:22 UTC
Description of problem:
-----------------------
The RHEV setup uses the gluster volume to store virtual machine images.
The gluster volume is fuse mounted on 2 RHEL 6.6 hypervisors, and Application VMs are created on it.

After a few hours, the mount process crashed on one of the hypervisors, and the VMs running on that hypervisor were paused.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.6.0.45-1.el6rhs

How reproducible:
-----------------
never tried to reproduce

Steps to Reproduce:
-------------------
1. Create 2x2 distribute-replicate volume

2. Optimize the volume for virt-store
(i.e.) gluster volume set <vol-name> group virt
       gluster volume set <vol-name> storage.owner-uid 36
       gluster volume set <vol-name> storage.owner-gid 36

3. Set up epoll configuration
(i.e) gluster volume set <vol-name> client.event-threads 2 
      gluster volume set <vol-name> server.event-threads 2

4. Start the volume. Use this volume as the Data Domain (storage backend for the image store) in RHEV

5. Use 2 RHEL 6.6 machines as hypervisors

6. Create 4 App VMs installed with RHEL 6.6. In my setup, there were 2 App VMs running on each hypervisor

7. Continuously create and delete files from the App VMs.
This is done to simulate I/O load on the VMs

8. Check the status of the VMs after some time
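
The create/delete load in step 7 could be generated with something like the following. This is only a minimal sketch: the report does not say which tool or file sizes were used, so the function name `churn_files`, the file naming, and all parameter values are assumptions made for illustration.

```python
import os
import tempfile


def churn_files(directory, rounds=3, files_per_round=5, size_bytes=4096):
    """Create a batch of files, then delete them, repeated `rounds` times.

    Returns the total number of files created (all of them are deleted again).
    """
    payload = os.urandom(size_bytes)
    created = 0
    for r in range(rounds):
        paths = []
        for i in range(files_per_round):
            path = os.path.join(directory, f"load_{r}_{i}.dat")
            with open(path, "wb") as fh:
                fh.write(payload)
                fh.flush()
                os.fsync(fh.fileno())  # push the write down to the disk image
            paths.append(path)
            created += 1
        for path in paths:
            os.remove(path)
    return created


if __name__ == "__main__":
    # A scratch directory stands in for a directory inside the App VM.
    with tempfile.TemporaryDirectory() as scratch:
        print(churn_files(scratch))
```

Inside an App VM, the loop would be pointed at a directory on the VM's disk and run with a much larger `rounds` value to keep the load going for hours.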

Actual results:
---------------
The fuse mount process on one of the hypervisors crashed.

Expected results:
-----------------
Everything should work fine, and there should be no problems with either the App VMs or the storage domain

Comment 2 SATHEESARAN 2015-02-18 17:56:40 UTC
No operations were performed on the volume. The volume is just fuse mounted and used for storing VM images.

I created 4 App VMs, left them running for approximately 10 hours, and then found this crash.

I noticed a continuous flow of error messages in the fuse mount logs, as follows:

<error_from_fuse_mount_logs>

[2015-02-18 14:45:49.728257] E [dht-helper.c:1345:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_readdirp_cbk+0x30c) [0x7f1dbfdd9b6c] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f1dbfdb1a0e] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f1dbfdb3ca4]))) 0-Imstore1-dht: invalid argument: inode
[2015-02-18 14:45:49.728293] E [dht-helper.c:1364:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_readdirp_cbk+0x30c) [0x7f1dbfdd9b6c] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f1dbfdb1a0e] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f1dbfdb3cc2]))) 0-Imstore1-dht: invalid argument: inode

</error_from_fuse_mount_logs>

The above error messages repeated from the time this volume was first used as the image store until the crash.
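
To get a rough idea of how often each of these errors repeats, the fuse mount log can be tallied by the function name in the message header. The sketch below assumes every message starts with the `[timestamp] LEVEL [file:line:function]` header seen in the excerpt above; the helper name `count_errors` is hypothetical, and lines that do not start with such a header are simply ignored.

```python
import re
from collections import Counter

# Header of a log message, e.g.
# [2015-02-18 14:45:49.728257] E [dht-helper.c:1345:dht_inode_ctx_get] ...
HEADER = re.compile(
    r"^\[(?P<ts>[^\]]+)\] (?P<level>[A-Z]) "
    r"\[(?P<file>[^:\]]+):(?P<line>\d+):(?P<func>[^\]]+)\]"
)


def count_errors(lines):
    """Count level-E messages per reporting function."""
    counts = Counter()
    for line in lines:
        match = HEADER.match(line)
        if match and match.group("level") == "E":
            counts[match.group("func")] += 1
    return counts


if __name__ == "__main__":
    sample = [
        "[2015-02-18 14:45:49.728257] E [dht-helper.c:1345:dht_inode_ctx_get] (-->...",
        "[2015-02-18 14:45:49.728293] E [dht-helper.c:1364:dht_inode_ctx_set] (-->...",
    ]
    for func, n in count_errors(sample).items():
        print(func, n)
```

For a real log, pass an open file object instead of `sample`, since iterating over a file yields its lines.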

Comment 3 SATHEESARAN 2015-02-18 17:59:53 UTC
Crash information as seen in the fuse mount logs:
--------------------------------------------------

pending frames:
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(WRITE)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 
2015-02-18 15:56:19
configuration details:
argp 1
backtrace 1
dlfcn 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.6.0.45
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f1dc9a947b6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f1dc9aaf3cf]
/lib64/libc.so.6[0x36d84329a0]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9594)[0x7f1dc550e594]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0xad1d)[0x7f1dc550fd1d]
/usr/lib64/libglusterfs.so.0(+0x77d1c)[0x7f1dc9aebd1c]
/lib64/libpthread.so.0[0x36d88079d1]
/lib64/libc.so.6(clone+0x6d)[0x36d84e8b6d]
---------
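
Several frames in the backtrace above carry only a module offset and no symbol name (the two `socket.so` frames and one `libglusterfs.so.0` frame). As a starting point for resolving them, the frames can be split into object, symbol, offset, and address. The following is a sketch under the assumption that each frame follows the `object(symbol+offset)[address]` layout shown in the trace; `parse_frame` and `unresolved_frames` are hypothetical helper names.

```python
import re

# Matches frames such as:
#   /usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f1dc9aaf3cf]
#   /usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9594)[0x7f1dc550e594]
#   /lib64/libc.so.6[0x36d84329a0]
FRAME = re.compile(
    r"^(?P<obj>[^(\[]+)"
    r"(?:\((?P<sym>[^+)]*)(?:\+(?P<off>0x[0-9a-fA-F]+))?\))?"
    r"\[(?P<addr>0x[0-9a-fA-F]+)\]$"
)


def parse_frame(line):
    """Return a dict for one backtrace frame, or None if the line is not a frame."""
    match = FRAME.match(line.strip())
    if not match:
        return None
    frame = match.groupdict()
    frame["sym"] = frame["sym"] or None  # an empty symbol means "not resolved"
    return frame


def unresolved_frames(lines):
    """List (object, offset) pairs for frames that have an offset but no symbol."""
    result = []
    for line in lines:
        frame = parse_frame(line)
        if frame and frame["sym"] is None and frame["off"]:
            result.append((frame["obj"], frame["off"]))
    return result


if __name__ == "__main__":
    trace = [
        "/lib64/libc.so.6[0x36d84329a0]",
        "/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9594)[0x7f1dc550e594]",
    ]
    for obj, off in unresolved_frames(trace):
        print(obj, off)
```

With the matching debuginfo packages installed, each object and offset pair could then be passed to a tool such as `addr2line -e <object> <offset>` to look up the source location.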

Comment 4 SATHEESARAN 2015-02-18 18:08:26 UTC
Created attachment 993260 [details]
fuse mount log file

Attaching the fuse mount log file

Comment 8 SATHEESARAN 2015-03-17 10:31:23 UTC
Tested with glusterfs-3.6.0.50-1.el6rhs, following the steps mentioned in comment 0.

I am not seeing any fuse mount crash.
Marking this bug as verified.

Comment 10 errata-xmlrpc 2015-03-26 06:36:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2015-0682.html