Bug 1136835 - crash on fsync
Summary: crash on fsync
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: fuse
Version: 3.5.3
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: urgent
Target Milestone: ---
Assignee: Pranith Kumar K
QA Contact:
URL:
Whiteboard:
Depends On: 1126048
Blocks: 1123289 glusterfs-3.5.3 1136827
 
Reported: 2014-09-03 12:03 UTC by Pranith Kumar K
Modified: 2014-11-21 16:14 UTC (History)
6 users

Fixed In Version: glusterfs-3.5.3
Doc Type: Bug Fix
Doc Text:
Clone Of: 1126048
Environment:
Last Closed: 2014-11-21 16:02:06 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Pranith Kumar K 2014-09-03 12:03:49 UTC
+++ This bug was initially created as a clone of Bug #1126048 +++

+++ This bug was initially created as a clone of Bug #1123289 +++

[2014-07-25 02:54:44.925942] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd 
(ptr:0x7f1a9c6f0dcc inode-gfid:66143cb4-7050-4c4a-a638-4f4c549c1a16) did not complete, failing fop with 
EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-25 02:54:44
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.3
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0)[0x7f1a98f5e4a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/cluster/distribute.so(dht_fsync+0x166)[0x7f1a8fdee2a6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/write-behind.so(wb_fsync+0x2e5)[0x7f1a94019965]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsync_resume+0x13f)[0x7f1a9994d21f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x1f7)[0x7f1a99965457]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/open-behind.so(open_and_resume+0xf0)[0x7f1a8fbb41a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/open-behind.so(ob_fsync+0x40)[0x7f1a8fbb5130]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/debug/io-stats.so(io_stats_fsync+0x16a)[0x7f1a8f99d9fa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_fsync_resume+0x230)[0x7f1a971837b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x8196)[0x7f1a97177196]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x7ed6)[0x7f1a97176ed6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x81de)[0x7f1a971771de]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f1a97176ab1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x7e5e)[0x7f1a97176e5e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x81be)[0x7f1a971771be]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f1a971776c8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x201f4)[0x7f1a9718f1f4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f1a992efe9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f1a9901c3fd]

--- Additional comment from Joe Julian on 2014-07-25 17:11:03 EDT ---

Second client to fail in 12 hours:
[2014-07-25 07:00:29.287680] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd 
(ptr:0x7f17cb846444 inode-gfid:87544fde-9bad-46d8-b610-1a8c93b85113) did not complete, failing fop with 
EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-25 07:00:29
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.4
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0)[0x7f17c763f4a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/distribute.so(dht_fsync+0x166)[0x7f17c26ee296]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/write-behind.so(wb_fsync+0x2e5)[0x7f17c24b3965]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsync_resume+0x13f)[0x7f17c802e21f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x1f7)[0x7f17c8046457]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/open-behind.so(open_and_resume+0xf0)[0x7f17c22a71a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/open-behind.so(ob_fsync+0x40)[0x7f17c22a8130]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_fsync+0x16a)[0x7f17c20909fa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_fsync_resume+0x230)[0x7f17c58647b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x8196)[0x7f17c5858196]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x7ed6)[0x7f17c5857ed6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x81de)[0x7f17c58581de]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f17c5857ab1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x7e5e)[0x7f17c5857e5e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x81be)[0x7f17c58581be]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f17c58586c8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x201f4)[0x7f17c58701f4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f17c79d0e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f17c76fd3fd]

--- Additional comment from Joe Julian on 2014-07-25 18:10:10 EDT ---

I replaced a brick on Monday. Could this be related to that?

--- Additional comment from Raghavendra G on 2014-07-26 06:58:02 EDT ---

Joe,

Replacing a brick results in a graph switch to a new graph containing the new brick; that is when migration of fds gets triggered. So replacing the brick uncovered this bug.

regards,
Raghavendra.
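The mechanism Raghavendra describes can be sketched in miniature: on a graph switch, every open fd from the old graph must be re-opened on the new graph, and an fd whose migration fails must be marked so that later fops on it are rejected. This is a hedged illustration only; the struct and function names are hypothetical, not actual GlusterFS code.

```c
#include <errno.h>
#include <stddef.h>

/* Hypothetical miniature of an fd being carried across a graph switch. */
struct mig_fd {
    int migrated;   /* 1 once re-opened on the new graph          */
    int bad;        /* 1 if migration failed; later fops -> EBADF */
};

/* Sketch of per-fd migration: if the nameless lookup of the file's
 * gfid fails on the new graph, the fd cannot be re-opened there and
 * must be flagged bad instead of being used. */
static int sketch_migrate_fd(struct mig_fd *fd, int lookup_ok)
{
    if (!lookup_ok) {
        fd->bad = 1;       /* subsequent fops on this fd must fail */
        return -EBADF;
    }
    fd->migrated = 1;      /* fd is now valid on the new graph */
    return 0;
}
```

The crash in this bug happens when that "bad" flag is not honored on the resume path, as the later comments show.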

--- Additional comment from Joe Julian on 2014-07-26 13:52:32 EDT ---

Volume Name: gv-nova
Type: Distributed-Replicate
Volume ID: 978c6843-1d6b-48dc-a498-862c57637916
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: storage02-stor:/gluster/brick02/nova
Brick2: storage06-stor:/gluster/brick02/nova
Brick3: storage03-stor:/gluster/brick03/nova
Brick4: storage07-stor:/gluster/brick03/nova
Brick5: storage03-stor:/gluster/brick04/nova
Brick6: storage08-stor:/gluster/brick03/nova
Brick7: storage04-stor:/gluster/brick03/nova
Brick8: storage07-stor:/gluster/brick04/nova
Brick9: storage04-stor:/gluster/brick04/nova
Brick10: storage08-stor:/gluster/brick04/nova
Brick11: storage02-stor:/gluster/brick03/nova
Brick12: storage07-stor:/gluster/brick02/nova
Brick13: storage02-stor:/gluster/brick04/nova
Brick14: storage08-stor:/gluster/brick02/nova
Options Reconfigured:
cluster.heal-timeout: 500
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
cluster.background-self-heal-count: 32
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 500
storage.owner-gid: 500
performance.io-thread-count: 24
cluster.self-heal-daemon: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

--- Additional comment from Joe Julian on 2014-07-26 13:54:28 EDT ---

The graph switch occurs at 2014-07-24 05:23:43 in the attached log file.

--- Additional comment from Joe Julian on 2014-07-29 16:29:09 EDT ---

This series of messages:

[2014-07-29 06:39:13.410535] W [socket.c:514:__socket_rwv] 3-gv-nova-client-0: readv failed (No data available)
[2014-07-29 06:39:13.410560] I [client.c:2098:client_rpc_notify] 3-gv-nova-client-0: disconnected

is the client querying for the removed brick's port number:
116	4.036814	10.30.6.28	10.30.6.3	Gluster Portmap	154	V1 PORTBYBRICK Call (Reply In 117)
+ Gluster Portmap 
  [Program Version: 1]
  [Gluster Portmap: PORTBYBRICK (1)]
  + Brick: /gluster/st01brick01/nova

117	4.036993	10.30.6.3	10.30.6.28	Gluster Portmap	98	V1 PORTBYBRICK Reply (Call In 116)
+ Gluster Portmap
  [Program Version: 1]
  [Gluster Portmap: PORTBYBRICK (1)]
  Return value: -1
  Errno: 0 (Success)
  Status: 0
  Port: 0

Since that brick no longer exists, due to using the "remove-brick ... commit force" command, that querying continues until the crash happens.
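The PORTBYBRICK reply shown above (Return value: -1, Port: 0) is glusterd saying it has no port mapping for that brick; because the brick was removed, no mapping will ever appear, yet the client keeps re-asking. A hedged sketch of interpreting such a reply (field names hypothetical, not the real decoded RPC structure):

```c
#include <stdbool.h>

/* Hypothetical decoded PORTBYBRICK reply, mirroring the capture above. */
struct pmap_reply {
    int ret;    /* -1: glusterd has no mapping for the brick */
    int port;   /*  0 when the mapping is unknown            */
};

/* A mapping is usable only when the call succeeded and a non-zero
 * port came back; for a removed brick this will never be true. */
static bool sketch_brick_port_known(const struct pmap_reply *r)
{
    return r->ret == 0 && r->port > 0;
}
```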

--- Additional comment from Joe Julian on 2014-07-30 04:26:36 EDT ---

replace-brick, not remove-brick

--- Additional comment from Pranith Kumar K on 2014-07-30 08:58:26 EDT ---

Sorry, I was on leave until yesterday, so I couldn't debug this issue.
I am able to re-create the crash and have found its root cause. All I need to find now is why the nameless lookup fails on some of the files, which leads to the fd-migration failures on the new graph.

For the crash to happen, fd migration needs to fail and an fsync needs to come in on the fd for which migration failed. There are checks in the fuse code to prevent sending fops on fds that failed migration, but that error-checking code has a bug introduced by http://review.gluster.org/6262. Since that patch fixes an issue with the 'link' fop, I need to revert it and fix the 'link' fop bug separately. That fixes the crash, but we still don't know why the fd migration failed; we need to find the root cause for that as well.

Joe,
    Please check if there are any other mounts where you have fd-migration errors; they may also crash when an fsync is issued on those fds.

    Could you please tell me which files are represented by the following gfids, and on which bricks these files reside? Brick logs would be helpful as well.

pk1@pranithk-laptop - ~/Downloads/gluster-users/JoeJulian 
18:21:12 :) ⚡ grep Invalid client.log 
[2014-07-24 05:23:44.931310] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (b7e794e9-67c0-4d47-9ee5-2b41d1de792f) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:44.980643] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (9abcf733-05b4-45e1-9845-ff8f4b97fbc4) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:45.088577] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (87544fde-9bad-46d8-b610-1a8c93b85113) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:45.122628] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (37c5ffcb-3937-4bd5-a719-244526c278f5) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)

Here is the backtrace when the crash happens after fsync on fd which failed migration.

Program received signal SIGSEGV, Segmentation fault.
0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
817	        STACK_WIND (frame, dht_fsync_cbk, subvol, subvol->fops->fsync,
(gdb) bt
#0  0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
#1  0x00007f100803ba8d in wb_fsync (frame=0x7f100b2ef3a8, this=0x13fdfc0, fd=0x13cd5c8, datasync=0, xdata=0x0) at write-behind.c:1523
#2  0x00007f10038ee95c in ra_fsync (frame=0x7f100b2ef704, this=0x13fea20, fd=0x13cd5c8, datasync=0, xdata=0x0) at read-ahead.c:628
#3  0x00007f100c4e5aca in default_fsync (frame=0x7f100b2ef704, this=0x13ff480, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:991
#4  0x00007f100c4e5aca in default_fsync (frame=0x7f100b2ef704, this=0x13ffee0, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:991
#5  0x00007f100c4dfd63 in default_fsync_resume (frame=0x7f100b2efbb8, this=0x1400a60, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:561
#6  0x00007f100c4fe7bc in call_resume_wind (stub=0x7f100ad78da4) at call-stub.c:2389
#7  0x00007f100c5064d5 in call_resume (stub=0x7f100ad78da4) at call-stub.c:2841
#8  0x00007f10032c2c1a in open_and_resume (this=0x1400a60, fd=0x13cd5c8, stub=0x7f100ad78da4) at open-behind.c:236
#9  0x00007f10032c3e9f in ob_fsync (frame=0x7f100b2efbb8, this=0x1400a60, fd=0x13cd5c8, flag=0, xdata=0x0) at open-behind.c:487
#10 0x00007f10030b644c in mdc_fsync (frame=0x7f100b2efb0c, this=0x1403710, fd=0x13cd5c8, datasync=0, xdata=0x0) at md-cache.c:1635
#11 0x00007f1002ea1a3c in io_stats_fsync (frame=0x7f100b2ef5ac, this=0x1404170, fd=0x13cd5c8, flags=0, xdata=0x0) at io-stats.c:2168
#12 0x00007f100a8eb1b3 in fuse_fsync_resume (state=0x7f0ffc016f80) at fuse-bridge.c:2384
#13 0x00007f100a8da00b in fuse_resolve_done (state=0x7f0ffc016f80) at fuse-resolve.c:663
#14 0x00007f100a8da0e1 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:692
#15 0x00007f100a8d9fd4 in fuse_resolve (state=0x7f0ffc016f80) at fuse-resolve.c:649
#16 0x00007f100a8da0b8 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:688
#17 0x00007f100a8da13f in fuse_resolve_continue (state=0x7f0ffc016f80) at fuse-resolve.c:708
#18 0x00007f100a8d9d8d in fuse_resolve_fd (state=0x7f0ffc016f80) at fuse-resolve.c:561
#19 0x00007f100a8d9f82 in fuse_resolve (state=0x7f0ffc016f80) at fuse-resolve.c:638
#20 0x00007f100a8da063 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:681
#21 0x00007f100a8da17d in fuse_resolve_and_resume (state=0x7f0ffc016f80, fn=0x7f100a8eaba4 <fuse_fsync_resume>) at fuse-resolve.c:721
#22 0x00007f100a8eb2fa in fuse_fsync (this=0x1338860, finh=0x7f0ffc016160, msg=0x7f0ffc016188) at fuse-bridge.c:2403
#23 0x00007f100a8f8af9 in fuse_thread_proc (data=0x1338860) at fuse-bridge.c:4798
#24 0x00000038e4807f33 in start_thread () from /lib64/libpthread.so.0
#25 0x00000038e40f4ded in clone () from /lib64/libc.so.6
(gdb) fr 0
#0  0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
817	        STACK_WIND (frame, dht_fsync_cbk, subvol, subvol->fops->fsync,


(gdb) p subvol <<----- This is the reason for the crash.
$1 = (xlator_t *) 0x0
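The crash pattern visible in frame 0 can be sketched as follows: dht looks up the cached subvolume for the fd, gets NULL because the inode on the new graph has no layout (the fd migration failed), and then winds the fop through subvol->fops without a guard. This is a hedged miniature only, with hypothetical names, not the actual dht_fsync code.

```c
#include <stddef.h>
#include <errno.h>

/* Hypothetical miniatures of the structures involved; the real
 * xlator_t and fd_t are far larger. */
struct fops   { int (*fsync)(void); };
struct xlator { struct fops *fops; };

/* In the crash, the subvolume lookup returned NULL on the new graph,
 * and the unchecked subvol->fops dereference is the SIGSEGV. The
 * guard below is what the unpatched path was effectively missing. */
static int sketch_dht_fsync(struct xlator *subvol)
{
    if (subvol == NULL)
        return -EBADF;            /* fail the fop instead of crashing */
    return subvol->fops->fsync(); /* normal path: wind to the subvol  */
}

static int ok_fsync(void) { return 0; }
```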

--- Additional comment from Joe Julian on 2014-07-31 01:29:28 EDT ---

--- Additional comment from Anand Avati on 2014-08-01 13:05:42 EDT ---

REVIEW: http://review.gluster.org/8402 (mount/fuse: Fail FUSE_FOP on resolution errors after resume) posted (#1) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-08-04 07:14:23 EDT ---

REVIEW: http://review.gluster.org/8402 (mount/fuse: Handle fd resolution failures) posted (#2) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Niels de Vos on 2014-08-14 05:20:03 EDT ---

You can't have a mainline bug block the glusterfs-3.5.3 tracker. If this change needs to go into 3.5.x and 3.6.x you will need to create clones for each version and have them block their own tracker (glusterfs-3.5.3 or glusterfs-3.6.0).

--- Additional comment from Anand Avati on 2014-08-23 23:54:47 EDT ---

REVIEW: http://review.gluster.org/8402 (mount/fuse: Handle fd resolution failures) posted (#3) for review on master by Pranith Kumar Karampuri (pkarampu)

--- Additional comment from Anand Avati on 2014-08-25 12:00:59 EDT ---

COMMIT: http://review.gluster.org/8402 committed in master by Vijay Bellur (vbellur) 
------
commit b87f92c0eee630690e2b515a7f79c28af6496a5e
Author: Pranith Kumar K <pkarampu>
Date:   Fri Aug 1 18:30:32 2014 +0530

    mount/fuse: Handle fd resolution failures
    
    Problem:
    Even when the fd resolution failed, the fop is continuing on the
    new graph which may not have valid inode. This lead to NULL layout
    subvols in dht which lead to crash in fsync after graph migration.
    
    Fix:
    - Remove resolution error handling in FUSE_FOP as it was only added
      to handle fd migration failures.
    - check in fuse_resolve_done for fd resolution failures and fail the
      fop right away.
    - loc resolution failures are already handled in the corresponding
      fops.
    - Return errno from state->resolve.op_errno in resume functions.
    - Send error to fuse on frame allocation failures.
    - Removed unused variable state->resolved
    - Removed unused macro FUSE_FOP_COOKIE
    
    Change-Id: I479d6e1ff2ca626ad8c8fcb6f293022149474992
    BUG: 1126048
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8402
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>
    Reviewed-by: Vijay Bellur <vbellur>
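The shape of the fix described in this commit can be sketched as: instead of letting a fop resume on the new graph after a failed fd resolution, check the resolver's result in fuse_resolve_done and answer FUSE with the resolver's errno right away. A hedged miniature with hypothetical names, not the actual patch:

```c
#include <errno.h>

/* Minimal stand-in for the resolver result carried in fuse_state_t. */
struct resolve_state {
    int op_ret;    /* -1 when fd resolution (migration) failed */
    int op_errno;  /* e.g. EBADF from the fd resolver          */
};

/* Before the fix, resolution errors could be swallowed and the fop
 * resumed on the new graph anyway; the fix fails it at this point. */
static int sketch_fuse_resolve_done(struct resolve_state *resolve,
                                    int (*resume_fop)(void))
{
    if (resolve->op_ret < 0)
        return -resolve->op_errno; /* send the error straight to FUSE */
    return resume_fop();           /* safe: fd resolved on new graph  */
}

static int resume_ok(void) { return 0; }
```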

Comment 1 Anand Avati 2014-09-03 12:06:42 UTC
REVIEW: http://review.gluster.org/8595 (mount/fuse: Handle fd resolution failures) posted (#1) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

Comment 2 Anand Avati 2014-09-30 09:05:44 UTC
REVIEW: http://review.gluster.org/8595 (mount/fuse: Handle fd resolution failures) posted (#2) for review on release-3.5 by Pranith Kumar Karampuri (pkarampu)

Comment 3 Anand Avati 2014-10-01 10:44:32 UTC
COMMIT: http://review.gluster.org/8595 committed in release-3.5 by Niels de Vos (ndevos) 
------
commit db7d578da03a5d8bbc2169a45baea5e0e3ddc1e3
Author: Pranith Kumar K <pkarampu>
Date:   Fri Aug 1 18:30:32 2014 +0530

    mount/fuse: Handle fd resolution failures
    
            Backport of http://review.gluster.org/8402
    
    Problem:
    Even when the fd resolution failed, the fop is continuing on the
    new graph which may not have valid inode. This lead to NULL layout
    subvols in dht which lead to crash in fsync after graph migration.
    
    Fix:
    - Remove resolution error handling in FUSE_FOP as it was only added
      to handle fd migration failures.
    - check in fuse_resolve_done for fd resolution failures and fail the
      fop right away.
    - loc resolution failures are already handled in the corresponding
      fops.
    - Return errno from state->resolve.op_errno in resume functions.
    - Send error to fuse on frame allocation failures.
    - Removed unused variable state->resolved
    - Removed unused macro FUSE_FOP_COOKIE
    
    BUG: 1136835
    Change-Id: I5074f7a9b177c54051ef37a4f73de7f8d1fcc5b7
    Signed-off-by: Pranith Kumar K <pkarampu>
    Reviewed-on: http://review.gluster.org/8595
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Niels de Vos <ndevos>

Comment 4 Niels de Vos 2014-10-05 12:59:44 UTC
The first (and last?) Beta for GlusterFS 3.5.3 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.3beta1 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update (possibly an "updates-testing" repository) infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-October/018990.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

Comment 5 Niels de Vos 2014-11-05 09:24:26 UTC
The second Beta for GlusterFS 3.5.3 has been released [1]. Please verify if the release solves this bug report for you. In case the glusterfs-3.5.3beta2 release does not have a resolution for this issue, leave a comment in this bug and move the status to ASSIGNED. If this release fixes the problem for you, leave a note and change the status to VERIFIED.

Packages for several distributions have been made available on [2] to make testing easier.

[1] http://supercolony.gluster.org/pipermail/gluster-users/2014-November/019359.html
[2] http://download.gluster.org/pub/gluster/glusterfs/qa-releases/3.5.3beta2/

Comment 6 Joe Julian 2014-11-05 15:55:06 UTC
We're on 3.4. I cannot test 3.5 in our environment.

Comment 7 Niels de Vos 2014-11-05 16:11:07 UTC
(In reply to Joe Julian from comment #6)
> We're on 3.4. I cannot test 3.5 in our environment.

Bug 1123289 is for getting the fix into 3.4, so you'll get your chance at testing ;-)

Comment 8 Niels de Vos 2014-11-21 16:02:06 UTC
This bug is getting closed because a release has been made available that should address the reported issue. In case the problem is still not fixed with glusterfs-3.5.3, please reopen this bug report.

glusterfs-3.5.3 has been announced on the Gluster mailinglists [1], packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailinglist [2] and the update infrastructure for your distribution.

[1] http://supercolony.gluster.org/pipermail/announce/2014-November/000042.html
[2] http://supercolony.gluster.org/pipermail/gluster-users/

