+++ This bug was initially created as a clone of Bug #1405299 +++

Description of problem:
-----------------------
Fuse mount crashed while VM installation was in progress on a VM image file residing on the replica 3 volume and one of the bricks was killed.

Version-Release number of selected component (if applicable):
-------------------------------------------------------------

How reproducible:
-----------------
1/1

Steps to Reproduce:
-------------------
1. Create a replica 3 volume with compound-fops and granular-entry-heal enabled
2. Optimize the volume for VM store, with shard-block-size set to 4MB
3. Fuse mount the volume on the RHEL 7.3 client/hypervisor
4. Create a sparse VM image file on the fuse-mounted volume
5. Start OS installation on the VM with RHEL 7.3 Server
6. While the VM installation is in progress, kill one of the bricks of the volume

Actual results:
---------------
Fuse mount crashed/core dumped

Expected results:
-----------------
No process should crash

--- Additional comment from SATHEESARAN on 2016-12-16 02:16:03 EST ---

Backtrace:
----------
Core was generated by `/usr/sbin/glusterfs --volfile-server=10.70.37.138 --volfile-id=/rep3vol /mnt/re'.
Program terminated with signal 11, Segmentation fault.
#0  afr_pre_op_writev_cbk (frame=0x7f24e25d2974, cookie=0x1, this=0x7f24d000a7b0, op_ret=<optimized out>, op_errno=<optimized out>, data=<optimized out>, xdata=0x0) at afr-transaction.c:1255
1255            write_args_cbk = &args_cbk->rsp_list[1];
Missing separate debuginfos, use: debuginfo-install keyutils-libs-1.5.8-3.el7.x86_64 krb5-libs-1.14.1-26.el7.x86_64 libcom_err-1.42.9-9.el7.x86_64 libselinux-2.5-6.el7.x86_64 pcre-8.32-15.el7_2.1.x86_64
(gdb) bt
#0  afr_pre_op_writev_cbk (frame=0x7f24e25d2974, cookie=0x1, this=0x7f24d000a7b0, op_ret=<optimized out>, op_errno=<optimized out>, data=<optimized out>, xdata=0x0) at afr-transaction.c:1255
#1  0x00007f24d6e91dd7 in client3_3_compound_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f24e25ceea8) at client-rpc-fops.c:3214
#2  0x00007f24e48ad785 in saved_frames_unwind (saved_frames=saved_frames@entry=0x7f24c4001620) at rpc-clnt.c:369
#3  0x00007f24e48ad86e in saved_frames_destroy (frames=frames@entry=0x7f24c4001620) at rpc-clnt.c:386
#4  0x00007f24e48aefd4 in rpc_clnt_connection_cleanup (conn=conn@entry=0x7f24d007cf18) at rpc-clnt.c:556
#5  0x00007f24e48af864 in rpc_clnt_handle_disconnect (conn=0x7f24d007cf18, clnt=0x7f24d007cec0) at rpc-clnt.c:881
#6  rpc_clnt_notify (trans=<optimized out>, mydata=0x7f24d007cf18, event=RPC_TRANSPORT_DISCONNECT, data=0x7f24d008cc10) at rpc-clnt.c:937
#7  0x00007f24e48ab883 in rpc_transport_notify (this=this@entry=0x7f24d008cc10, event=event@entry=RPC_TRANSPORT_DISCONNECT, data=data@entry=0x7f24d008cc10) at rpc-transport.c:537
#8  0x00007f24d9173302 in socket_event_poll_err (this=0x7f24d008cc10) at socket.c:1179
#9  socket_event_handler (fd=<optimized out>, idx=4, data=0x7f24d008cc10, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2404
#10 0x00007f24e4b3f4f0 in event_dispatch_epoll_handler (event=0x7f24cfffee80, event_pool=0x7f24e5b41f00) at event-epoll.c:571
#11 event_dispatch_epoll_worker (data=0x7f24d003f420) at event-epoll.c:674
#12 0x00007f24e3946dc5 in start_thread (arg=0x7f24cffff700) at pthread_create.c:308
#13 0x00007f24e328b73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
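The backtrace reads bottom-up as a disconnect path: killing the brick tears down the client-side RPC connection, saved_frames_unwind() invokes the pending client3_3_compound_cbk() with op_ret = -1 and no decoded reply, and afr_pre_op_writev_cbk() then indexes into the compound reply list before checking op_ret. A minimal stand-alone C sketch of that failure pattern follows; the types and function names are simplified stand-ins, not the actual GlusterFS code.

#include <errno.h>
#include <stdio.h>

/* Hypothetical, simplified stand-ins for the compound-fop reply types. */
struct rsp {
        int op_ret;
        int op_errno;
};

struct compound_rsp_list {
        struct rsp *rsp_list;   /* one entry per fop in the compound request */
        int         count;
};

/* Buggy pattern: the callback receives op_ret/op_errno but dereferences
 * the reply payload without checking them first.  When the frame is
 * unwound on a disconnect there is no reply, so 'data' is not a
 * populated reply list and rsp_list[1] is a NULL dereference --
 * the same shape of failure as frame #0 in the core dump above. */
static int pre_op_writev_cbk_buggy (int op_ret, int op_errno, void *data)
{
        struct compound_rsp_list *args_cbk = data;
        struct rsp *write_rsp;

        printf ("cbk invoked: op_ret=%d op_errno=%d\n", op_ret, op_errno);

        write_rsp = &args_cbk->rsp_list[1];     /* crashes when data == NULL */
        printf ("writev reply: op_ret=%d op_errno=%d\n",
                write_rsp->op_ret, write_rsp->op_errno);
        return 0;
}

int main (void)
{
        /* Normal path: a populated two-entry reply (pre-op changelog + writev). */
        struct rsp replies[2] = { { 0, 0 }, { 4096, 0 } };
        struct compound_rsp_list ok = { replies, 2 };
        pre_op_writev_cbk_buggy (0, 0, &ok);

        /* Disconnect path: the frame is unwound with op_ret == -1 and no
         * reply payload -- this call segfaults, like the fuse mount did. */
        pre_op_writev_cbk_buggy (-1, ENOTCONN, NULL);
        return 0;
}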
--- Additional comment from SATHEESARAN on 2016-12-16 02:21 EST ---

--- Additional comment from SATHEESARAN on 2016-12-16 02:22:15 EST ---

Volume information:

# gluster volume info rep3vol

Volume Name: rep3vol
Type: Replicate
Volume ID: 28e00021-7773-48f5-a31f-c9f8f2db0a2d
Status: Started
Snapshot Count: 0
Number of Bricks: 1 x 3 = 3
Transport-type: tcp
Bricks:
Brick1: server1:/gluster/brick1/b1
Brick2: server2:/gluster/brick1/b1
Brick3: server3:/gluster/brick1/b1
Options Reconfigured:
cluster.use-compound-fops: on
user.cifs: off
network.ping-timeout: 30
performance.strict-o-direct: on
cluster.shd-wait-qlength: 10000
cluster.shd-max-threads: 8
cluster.locking-scheme: granular
performance.low-prio-threads: 32
features.shard-block-size: 4MB
storage.owner-gid: 107
storage.owner-uid: 107
cluster.granular-entry-heal: enable
cluster.data-self-heal-algorithm: full
features.shard: on
cluster.server-quorum-type: server
cluster.quorum-type: auto
network.remote-dio: off
cluster.eager-lock: enable
performance.stat-prefetch: off
performance.io-cache: off
performance.read-ahead: off
performance.quick-read: off
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: on

--- Additional comment from SATHEESARAN on 2016-12-16 02:23:37 EST ---

Krutika has RCA'ed the issue and found that patch [1] was missed in the backport, which caused this issue.

[1] - http://review.gluster.org/#/c/15482/9

@Krutika, requesting you to provide the detailed RCA.
REVIEW: http://review.gluster.org/16161 (protocol/client: fix op_errno handling, was unused variable) posted (#1) for review on release-3.9 by Krutika Dhananjay (kdhananj)
COMMIT: http://review.gluster.org/16161 committed in release-3.9 by Kaleb KEITHLEY (kkeithle)
------
commit 45431070d742ac9398b41efd23c1ea500a639669
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Tue Sep 13 05:57:32 2016 -0400

    protocol/client: fix op_errno handling, was unused variable

    Backport of: http://review.gluster.org/15482

    See comment in patch set one. Match the general logic flow of the
    other fop-cbks and eliminate the unused variable and its associated
    warning.

    Also see comment in patch set seven, re: correct handling of
    client_process_response(); and the associated BZ
    https://bugzilla.redhat.com/show_bug.cgi?id=1376328

    http://review.gluster.org/14085 fixes a "pragma leak" where the
    generated rpc/xdr headers have a pair of pragmas that disable these
    warnings. With the warnings disabled, many unused variables have
    crept into the code base. And 14085 won't pass its own smoke test
    until all these warnings are fixed.

    Change-Id: I9958a70b56023258921960410f9b641505fd4387
    BUG: 1405308
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/16161
    Tested-by: Krutika Dhananjay <kdhananj>
    Smoke: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
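For reference, the guarded flow the commit message describes ("match the general logic flow of the other fop-cbks") amounts to: on a disconnect or a failed reply decode, unwind with op_ret = -1 and a propagated op_errno, and never hand an unpopulated reply list to the parent translator. The C sketch below illustrates that pattern with hypothetical names; it is not the actual client3_3_compound_cbk change (see http://review.gluster.org/15482 for the real patch).

#include <errno.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-ins; not the real GlusterFS types or callbacks. */
struct compound_rsp_list;       /* decoded compound reply */

typedef int (*compound_cbk_t) (int op_ret, int op_errno,
                               struct compound_rsp_list *args_cbk);

/* Guarded flow: if the request failed at the RPC layer (e.g. the brick
 * disconnected) or the reply could not be decoded, unwind with
 * op_ret = -1 and a meaningful op_errno instead of passing an
 * unpopulated reply list up to the parent translator. */
static int compound_cbk_guarded (int rpc_status, const void *reply_buf,
                                 struct compound_rsp_list *decoded,
                                 compound_cbk_t parent_cbk)
{
        int op_ret   = -1;
        int op_errno = ENOTCONN;        /* default for a torn-down connection */

        if (rpc_status < 0 || reply_buf == NULL)
                goto unwind;            /* frame unwound on disconnect: no reply */

        if (decoded == NULL) {          /* XDR decode of the reply failed */
                op_errno = EINVAL;
                goto unwind;
        }

        op_ret   = 0;
        op_errno = 0;

unwind:
        /* Hand the reply to the parent only on success; the parent must
         * still check op_ret before touching args_cbk. */
        return parent_cbk (op_ret, op_errno, (op_ret == 0) ? decoded : NULL);
}

static int parent_cbk_demo (int op_ret, int op_errno,
                            struct compound_rsp_list *args_cbk)
{
        if (op_ret < 0) {
                fprintf (stderr, "compound fop failed: %s\n", strerror (op_errno));
                return 0;               /* no reply to look at */
        }
        (void) args_cbk;                /* use the reply list only on success */
        return 0;
}

int main (void)
{
        /* Simulate the disconnect case from the backtrace: RPC error, no reply. */
        compound_cbk_guarded (-1, NULL, NULL, parent_cbk_demo);
        return 0;
}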
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.9.1, please open a new bug report.

glusterfs-3.9.1 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://lists.gluster.org/pipermail/gluster-users/2017-January/029725.html
[2] https://www.gluster.org/pipermail/gluster-users/