Description of problem:
-----------------------
A RHEV setup uses a gluster volume to store virtual machine images. The gluster volume is fuse mounted on 2 RHEL 6.6 hypervisors, and application VMs are created on it. After a few hours, the mount process crashed on one of the hypervisors, and the VMs running on that hypervisor were paused.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
glusterfs-3.6.0.45-1.el6rhs

How reproducible:
-----------------
Reproduction has not been attempted.

Steps to Reproduce:
-------------------
1. Create a 2x2 distribute-replicate volume.
2. Optimize the volume for virt-store, i.e.:
   gluster volume set <vol-name> group virt
   gluster volume set <vol-name> storage.owner-uid 36
   gluster volume set <vol-name> storage.owner-gid 36
3. Set up the epoll configuration, i.e.:
   gluster volume set <vol-name> client.event-threads 2
   gluster volume set <vol-name> server.event-threads 2
4. Start the volume. Use this volume as the Data Domain (storage backend for the image store) in RHEV.
5. Use 2 RHEL 6.6 machines as hypervisors.
6. Create 4 App VMs installed with RHEL 6.6. In my setup, 2 App VMs were running on each hypervisor.
7. Continuously create and delete files from the App VMs. This is done to simulate I/O load on the VMs.
8. Check the status of the VMs after some time.

Actual results:
---------------
The fuse mount process on one of the hypervisors crashed.

Expected results:
-----------------
Everything should keep working, with no problems for either the App VMs or the storage domain.
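For anyone trying to reproduce this, steps 1 to 4 above can be sketched as a small dry-run script. This is only a sketch under assumptions: the host names and brick paths are hypothetical (the report does not list them), and the volume name Imstore1 is inferred from the "0-Imstore1-dht" prefix in the mount log. By default the script only prints the commands.

```shell
#!/bin/sh
# Dry-run sketch of steps 1-4. Host names and brick paths below are
# hypothetical placeholders; the report does not give the real ones.
GLUSTER_VOL="${GLUSTER_VOL:-Imstore1}"   # assumed from the log prefix "0-Imstore1-dht"
GLUSTER_RUN="${GLUSTER_RUN-echo}"        # prints commands; set GLUSTER_RUN="" to execute them

setup_volume() {
    # Step 1: 2x2 distribute-replicate = 4 bricks with replica 2
    $GLUSTER_RUN gluster volume create "$GLUSTER_VOL" replica 2 \
        host1:/bricks/b1 host2:/bricks/b1 host3:/bricks/b2 host4:/bricks/b2
    # Step 2: virt-store tuning and ownership for RHEV (uid/gid 36)
    $GLUSTER_RUN gluster volume set "$GLUSTER_VOL" group virt
    $GLUSTER_RUN gluster volume set "$GLUSTER_VOL" storage.owner-uid 36
    $GLUSTER_RUN gluster volume set "$GLUSTER_VOL" storage.owner-gid 36
    # Step 3: epoll (multi-threaded event handling) configuration
    $GLUSTER_RUN gluster volume set "$GLUSTER_VOL" client.event-threads 2
    $GLUSTER_RUN gluster volume set "$GLUSTER_VOL" server.event-threads 2
    # Step 4: start the volume
    $GLUSTER_RUN gluster volume start "$GLUSTER_VOL"
}

setup_volume
```

Running it as-is lists the seven gluster commands in order, so they can be reviewed before being executed on a storage node.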
No operations were performed on the volume. The volume is just fuse mounted and used for storing VM images. I created 4 App VMs, left them running for approximately 10 hours, and then found this crash.

I noticed a continuous flow of error messages in the fuse mount logs, as follows:

<error_from_fuse_mount_logs>
[2015-02-18 14:45:49.728257] E [dht-helper.c:1345:dht_inode_ctx_get] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_readdirp_cbk+0x30c) [0x7f1dbfdd9b6c] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f1dbfdb1a0e] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x34) [0x7f1dbfdb3ca4]))) 0-Imstore1-dht: invalid argument: inode
[2015-02-18 14:45:49.728293] E [dht-helper.c:1364:dht_inode_ctx_set] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_readdirp_cbk+0x30c) [0x7f1dbfdd9b6c] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_layout_preset+0x5e) [0x7f1dbfdb1a0e] (-->/usr/lib64/glusterfs/3.6.0.45/xlator/cluster/distribute.so(dht_inode_ctx_layout_set+0x52) [0x7f1dbfdb3cc2]))) 0-Imstore1-dht: invalid argument: inode
</error_from_fuse_mount_logs>

These error messages repeated from the time the volume was first used as the image store until the mount process crashed.
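To put a number on how often these messages repeat, something like the following could be run against the mount log. It is a rough sketch: the function names are made up for illustration, and the log path in the usage comment is a hypothetical placeholder, since the actual log file name depends on the mount point.

```shell
#!/bin/sh
# Sketch: quantify the repeated "invalid argument: inode" DHT errors.
# Function names are illustrative only, not part of any gluster tooling.

# Print the total number of matching error lines in the log file $1.
count_dht_inode_errors() {
    grep -c 'dht: invalid argument: inode' "$1"
}

# Print a per-source breakdown (file:line:function) of those errors.
dht_inode_errors_by_source() {
    grep 'dht: invalid argument: inode' "$1" \
        | grep -o 'dht-helper\.c:[0-9]*:[a-z_]*' \
        | sort | uniq -c
}

# Usage (the path is a placeholder):
#   count_dht_inode_errors /var/log/glusterfs/<mount-point>.log
#   dht_inode_errors_by_source /var/log/glusterfs/<mount-point>.log
```

The breakdown separates the dht_inode_ctx_get and dht_inode_ctx_set variants, which both originate from dht_readdirp_cbk in the messages quoted above.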
Crash information as seen in the fuse mount logs: -------------------------------------------------- pending frames: frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(1) op(WRITE) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) frame : type(0) op(0) patchset: git://git.gluster.com/glusterfs.git signal received: 11 time of crash: 2015-02-18 15:56:19 configuration details: argp 1 backtrace 1 dlfcn 1 libpthread 1 llistxattr 1 setfsid 1 spinlock 1 epoll.h 1 xattr.h 1 st_atim.tv_nsec 1 package-string: glusterfs 3.6.0.45 
/usr/lib64/libglusterfs.so.0(_gf_msg_backtrace_nomem+0xb6)[0x7f1dc9a947b6]
/usr/lib64/libglusterfs.so.0(gf_print_trace+0x33f)[0x7f1dc9aaf3cf]
/lib64/libc.so.6[0x36d84329a0]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0x9594)[0x7f1dc550e594]
/usr/lib64/glusterfs/3.6.0.45/rpc-transport/socket.so(+0xad1d)[0x7f1dc550fd1d]
/usr/lib64/libglusterfs.so.0(+0x77d1c)[0x7f1dc9aebd1c]
/lib64/libpthread.so.0[0x36d88079d1]
/lib64/libc.so.6(clone+0x6d)[0x36d84e8b6d]
---------
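Several frames in the backtrace above (the two in socket.so and one in libglusterfs.so.0) carry only a library-relative offset and no symbol name. One way to resolve them, assuming the matching debuginfo packages for this exact build are installed, is addr2line from binutils. The helper below is a sketch with a made-up function name; it only turns such frames into addr2line commands and does not run them.

```shell
#!/bin/sh
# Sketch: convert unsymbolized backtrace frames of the form
#   /path/to/lib.so(+0xOFFSET)[0xADDRESS]
# into addr2line invocations. The function name is illustrative only.
# Frames that already have a symbol, or have no offset at all, are skipped.
frames_to_addr2line() {
    sed -n 's|^\(/[^(]*\)(+\(0x[0-9a-f]*\)).*|addr2line -f -e \1 \2|p'
}

# Usage: paste the backtrace lines on stdin, for example
#   frames_to_addr2line < backtrace.txt
# and run the printed commands on a host with the same glusterfs build
# and its debuginfo installed (without debuginfo, addr2line prints "??").
```

For the trace in this report, this yields three commands, the first being for offset 0x9594 in socket.so, which is the frame directly below the signal handler.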
Created attachment 993260
fuse mount log file

Attaching the fuse mount log file.
Tested with glusterfs-3.6.0.50-1.el6rhs using the steps mentioned in comment 0. I did not see any fuse mount crash. Marking this bug as verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHBA-2015-0682.html