Bug 1313293 - [HC] glusterfs mount crashed
Product: GlusterFS
Classification: Community
Component: sharding
Assigned To: Krutika Dhananjay
Keywords: Triaged
Depends On: 1313290
Blocks: 1313315
Reported: 2016-03-01 05:50 EST by Krutika Dhananjay
Modified: 2016-06-16 09:59 EDT
Fixed In Version: glusterfs-3.8rc2
Doc Type: Bug Fix
Clone Of: 1313290
: 1313315 (view as bug list)
Last Closed: 2016-06-16 09:59:06 EDT
Type: Bug

Description Krutika Dhananjay 2016-03-01 05:50:42 EST
+++ This bug was initially created as a clone of Bug #1313290 +++

Description of problem:
In an HC setup with three nodes, when the first host loses its network connectivity and then comes back up, the glusterfs mount for the engine domain is no longer present and the mount process has crashed.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Install HC 
2. Once the engine VM is started and running, add all the other hosts to the engine.
3. Make sure that none of the interfaces has an IP address on the system where the engine is currently running.
4. Log in to that machine, which now has no IP address attached, and run dhclient.
5. Once the system is up, log in to it and try to connect to the VM.

Actual results:
The user cannot connect to the VM, the engine volume is no longer mounted, and there is a core dump on the system.

Expected results:
The user should be able to connect to the VM and should not see any crashes.

--- Additional comment from RamaKasturi on 2016-03-01 05:39:16 EST ---

Backtrace from the system:
(gdb) bt
#0  shard_fsync_cbk (frame=frame@entry=0x7fe701c2facc, cookie=0x7fe701c10ba0, this=0x7fe6f800d1c0, op_ret=op_ret@entry=-1, op_errno=op_errno@entry=107, prebuf=prebuf@entry=0x0, postbuf=postbuf@entry=0x0,
    xdata=xdata@entry=0x0) at shard.c:3884
#1  0x00007fe6fc4e915f in dht_fsync_cbk (frame=0x7fe701c10ba0, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=107, prebuf=0x0, postbuf=0x0, xdata=0x0) at dht-inode-read.c:861
#2  0x00007fe6fc74dbe1 in afr_fsync (frame=0x7fe701c33540, this=<optimized out>, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at afr-common.c:2969
#3  0x00007fe6fc4ebb19 in dht_fsync (frame=0x7fe701c10ba0, this=<optimized out>, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at dht-inode-read.c:930
#4  0x00007fe6fc2814d5 in shard_fsync (frame=0x7fe701c2facc, this=0x7fe6f800d1c0, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at shard.c:3894
#5  0x00007fe6fc070935 in wb_fsync_helper (frame=0x7fe701c3d074, this=0x7fe6f800e630, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at write-behind.c:1760
#6  0x00007fe70412b17d in call_resume (stub=0x7fe7016daf94) at call-stub.c:2576
#7  0x00007fe6fc073f29 in wb_do_winds (wb_inode=wb_inode@entry=0x7fe6e8041d20, tasks=tasks@entry=0x7fe6f5dcf990) at write-behind.c:1460
#8  0x00007fe6fc074037 in wb_process_queue (wb_inode=wb_inode@entry=0x7fe6e8041d20) at write-behind.c:1495
#9  0x00007fe6fc074c28 in wb_fsync (frame=0x7fe701c3d074, this=0x7fe6f800e630, fd=0x7fe6f80b9dcc, datasync=1, xdata=0x0) at write-behind.c:1785
#10 0x00007fe7040ff4cd in default_fsync (frame=0x7fe701c3d074, this=0x7fe6f800fa00, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1818
#11 0x00007fe70410b8d5 in default_fsync_resume (frame=0x7fe701c30aec, this=0x7fe6f8010dd0, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1377
#12 0x00007fe70412b17d in call_resume (stub=0x7fe701722a6c) at call-stub.c:2576
#13 0x00007fe6f7bf5648 in open_and_resume (this=this@entry=0x7fe6f8010dd0, fd=fd@entry=0x7fe6f80b9dcc, stub=0x7fe701722a6c) at open-behind.c:242
#14 0x00007fe6f7bf5a62 in ob_fsync (frame=0x7fe701c30aec, this=0x7fe6f8010dd0, fd=0x7fe6f80b9dcc, flag=<optimized out>, xdata=<optimized out>) at open-behind.c:499
#15 0x00007fe6f79dad20 in io_stats_fsync (frame=0x7fe701c3b238, this=0x7fe6f8012180, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at io-stats.c:2207
#16 0x00007fe7040ff4cd in default_fsync (frame=0x7fe701c3b238, this=0x7fe6f8013660, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at defaults.c:1818
#17 0x00007fe6f77c538b in meta_fsync (frame=0x7fe701c3b238, this=0x7fe6f8013660, fd=0x7fe6f80b9dcc, flags=1, xdata=0x0) at meta.c:176
#18 0x00007fe7012b1697 in fuse_fsync_resume (state=0x7fe6e8046590) at fuse-bridge.c:2489
#19 0x00007fe7012a8ec5 in fuse_resolve_done (state=<optimized out>) at fuse-resolve.c:665
#20 fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:692
#21 0x00007fe7012a8c08 in fuse_resolve (state=0x7fe6e8046590) at fuse-resolve.c:656
#22 0x00007fe7012a8f0e in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:688
#23 0x00007fe7012a8373 in fuse_resolve_continue (state=state@entry=0x7fe6e8046590) at fuse-resolve.c:708
#24 0x00007fe7012a8ba8 in fuse_resolve_fd (state=0x7fe6e8046590) at fuse-resolve.c:568
#25 fuse_resolve (state=0x7fe6e8046590) at fuse-resolve.c:645
#26 0x00007fe7012a8eee in fuse_resolve_all (state=<optimized out>) at fuse-resolve.c:681
#27 0x00007fe7012a8f30 in fuse_resolve_and_resume (state=0x7fe6e8046590, fn=0x7fe7012b14a0 <fuse_fsync_resume>) at fuse-resolve.c:720
#28 0x00007fe7012bbcde in fuse_thread_proc (data=0x7fe705bceac0) at fuse-bridge.c:4944
#29 0x00007fe702f63dc5 in start_thread (arg=0x7fe6f5dd0700) at pthread_create.c:308
#30 0x00007fe7028aa28d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
Comment 1 Vijay Bellur 2016-03-01 06:32:08 EST
REVIEW: http://review.gluster.org/13562 (features/shard: Fix NULL-dereference when fsync fails) posted (#1) for review on master by Krutika Dhananjay (kdhananj@redhat.com)
Comment 2 Vijay Bellur 2016-03-02 00:13:13 EST
REVIEW: http://review.gluster.org/13562 (features/shard: Fix NULL-dereference when fsync fails) posted (#2) for review on master by Krutika Dhananjay (kdhananj@redhat.com)
Comment 3 Vijay Bellur 2016-03-02 11:54:53 EST
COMMIT: http://review.gluster.org/13562 committed in master by Pranith Kumar Karampuri (pkarampu@redhat.com) 
commit bc07cb8cc45e79c54e4d411b2e2dd5b2f68bae17
Author: Krutika Dhananjay <kdhananj@redhat.com>
Date:   Tue Mar 1 16:23:22 2016 +0530

    features/shard: Fix NULL-dereference when fsync fails
    Change-Id: I4e51961c158c3b5c78791846ca7f0f6cf7fb5c4a
    BUG: 1313293
    Signed-off-by: Krutika Dhananjay <kdhananj@redhat.com>
    Reviewed-on: http://review.gluster.org/13562
    Smoke: Gluster Build System <jenkins@build.gluster.com>
    CentOS-regression: Gluster Build System <jenkins@build.gluster.com>
    NetBSD-regression: NetBSD Build System <jenkins@build.gluster.org>
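
For illustration only, the failure mode in frame #0 and the kind of guard the commit title implies can be sketched in C. This is not the actual patch from review 13562: `sketch_iatt`, `sketch_result`, and `sketch_fsync_cbk` are hypothetical stand-ins, not GlusterFS types or functions. The sketch assumes that a failed fsync arrives with op_ret=-1, op_errno=107 (ENOTCONN on Linux), and NULL prebuf/postbuf, as the backtrace shows, and that the callback has to check op_ret before reading either buffer.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical stand-in for the stat buffer passed as prebuf/postbuf. */
struct sketch_iatt {
    unsigned long long size;
};

/* Hypothetical record of what the callback reports upward. */
struct sketch_result {
    int op_ret;
    int op_errno;
    unsigned long long pre_size;
    unsigned long long post_size;
};

/*
 * Sketch of an fsync callback that checks for failure first.
 * Assumption: on failure the lower layer passes NULL buffers,
 * so they may only be read on the success path.
 */
void sketch_fsync_cbk(struct sketch_result *out, int op_ret, int op_errno,
                      const struct sketch_iatt *prebuf,
                      const struct sketch_iatt *postbuf)
{
    out->op_ret = op_ret;
    out->op_errno = op_errno;
    out->pre_size = 0;
    out->post_size = 0;

    if (op_ret < 0) {
        /* Failure path: propagate the error and leave the buffers alone. */
        return;
    }

    if (prebuf == NULL || postbuf == NULL) {
        /* Defensive: success without buffers is reported as an error. */
        out->op_ret = -1;
        out->op_errno = EINVAL;
        return;
    }

    /* Success path: the buffers are valid and safe to read. */
    out->pre_size = prebuf->size;
    out->post_size = postbuf->size;
}
```

Calling `sketch_fsync_cbk(&r, -1, ENOTCONN, NULL, NULL)` mirrors the arguments in frame #0 and returns with the error recorded instead of dereferencing a NULL pointer.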
Comment 4 Niels de Vos 2016-06-16 09:59:06 EDT
This bug is being closed because a release has been made available that should address the reported issue. If the problem is still not fixed with glusterfs-3.8.0, please open a new bug report.

glusterfs-3.8.0 has been announced on the Gluster mailing lists [1]; packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and the update infrastructure for your distribution.

[1] http://blog.gluster.org/2016/06/glusterfs-3-8-released/
[2] http://thread.gmane.org/gmane.comp.file-systems.gluster.user
