Bug 1123289

Summary: crash on fsync
Product: [Community] GlusterFS
Component: core
Version: 3.4.4
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: urgent
Priority: unspecified
Reporter: Joe Julian <joe>
Assignee: Pranith Kumar K <pkarampu>
CC: bugs, gluster-bugs, kkeithle, pkarampu, rgowdapp
Doc Type: Bug Fix
Type: Bug
Last Closed: 2015-02-12 20:09:33 UTC
Bug Depends On: 1126048, 1136827, 1136835
Bug Blocks: 1125245

Attachments:
- Client log from the second crash
- brick log with the do_fd_cleanup on at least one of the gfids referenced

Description Joe Julian 2014-07-25 08:56:38 UTC
[2014-07-25 02:54:44.925942] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd 
(ptr:0x7f1a9c6f0dcc inode-gfid:66143cb4-7050-4c4a-a638-4f4c549c1a16) did not complete, failing fop with 
EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(1) op(LOOKUP)
frame : type(1) op(LOOKUP)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-25 02:54:44
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.3
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0)[0x7f1a98f5e4a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/cluster/distribute.so(dht_fsync+0x166)[0x7f1a8fdee2a6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/write-behind.so(wb_fsync+0x2e5)[0x7f1a94019965]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsync_resume+0x13f)[0x7f1a9994d21f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x1f7)[0x7f1a99965457]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/open-behind.so(open_and_resume+0xf0)[0x7f1a8fbb41a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/performance/open-behind.so(ob_fsync+0x40)[0x7f1a8fbb5130]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/debug/io-stats.so(io_stats_fsync+0x16a)[0x7f1a8f99d9fa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_fsync_resume+0x230)[0x7f1a971837b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x8196)[0x7f1a97177196]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x7ed6)[0x7f1a97176ed6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x81de)[0x7f1a971771de]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f1a97176ab1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x7e5e)[0x7f1a97176e5e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x81be)[0x7f1a971771be]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f1a971776c8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.3/xlator/mount/fuse.so(+0x201f4)[0x7f1a9718f1f4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f1a992efe9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f1a9901c3fd]

Comment 1 Joe Julian 2014-07-25 21:11:03 UTC
Second client to fail in 12 hours:
[2014-07-25 07:00:29.287680] W [fuse-resolve.c:546:fuse_resolve_fd] 0-fuse-resolve: migration of basefd 
(ptr:0x7f17cb846444 inode-gfid:87544fde-9bad-46d8-b610-1a8c93b85113) did not complete, failing fop with 
EBADF (old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
pending frames:
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(1) op(FSYNC)
frame : type(0) op(0)
frame : type(1) op(OPEN)
frame : type(1) op(OPEN)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)
frame : type(0) op(0)

patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2014-07-25 07:00:29
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.4
/lib/x86_64-linux-gnu/libc.so.6(+0x364a0)[0x7f17c763f4a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/cluster/distribute.so(dht_fsync+0x166)[0x7f17c26ee296]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/write-behind.so(wb_fsync+0x2e5)[0x7f17c24b3965]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(default_fsync_resume+0x13f)[0x7f17c802e21f]
/usr/lib/x86_64-linux-gnu/libglusterfs.so.0(call_resume+0x1f7)[0x7f17c8046457]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/open-behind.so(open_and_resume+0xf0)[0x7f17c22a71a0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/performance/open-behind.so(ob_fsync+0x40)[0x7f17c22a8130]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/debug/io-stats.so(io_stats_fsync+0x16a)[0x7f17c20909fa]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_fsync_resume+0x230)[0x7f17c58647b0]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x8196)[0x7f17c5858196]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x7ed6)[0x7f17c5857ed6]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x81de)[0x7f17c58581de]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_resolve_continue+0x41)[0x7f17c5857ab1]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x7e5e)[0x7f17c5857e5e]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x81be)[0x7f17c58581be]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(fuse_resolve_and_resume+0x28)[0x7f17c58586c8]
/usr/lib/x86_64-linux-gnu/glusterfs/3.4.4/xlator/mount/fuse.so(+0x201f4)[0x7f17c58701f4]
/lib/x86_64-linux-gnu/libpthread.so.0(+0x7e9a)[0x7f17c79d0e9a]
/lib/x86_64-linux-gnu/libc.so.6(clone+0x6d)[0x7f17c76fd3fd]

Comment 2 Joe Julian 2014-07-25 22:10:10 UTC
I replaced a brick on Monday. Could this be related to that?

Comment 3 Raghavendra G 2014-07-26 10:58:02 UTC
Joe,

Replacing a brick results in a graph switch to a new graph containing the new brick; that is when migration of fds gets triggered. So replacing the brick uncovered this bug.

regards,
Raghavendra.

Comment 4 Joe Julian 2014-07-26 17:52:32 UTC
Volume Name: gv-nova
Type: Distributed-Replicate
Volume ID: 978c6843-1d6b-48dc-a498-862c57637916
Status: Started
Number of Bricks: 7 x 2 = 14
Transport-type: tcp
Bricks:
Brick1: storage02-stor:/gluster/brick02/nova
Brick2: storage06-stor:/gluster/brick02/nova
Brick3: storage03-stor:/gluster/brick03/nova
Brick4: storage07-stor:/gluster/brick03/nova
Brick5: storage03-stor:/gluster/brick04/nova
Brick6: storage08-stor:/gluster/brick03/nova
Brick7: storage04-stor:/gluster/brick03/nova
Brick8: storage07-stor:/gluster/brick04/nova
Brick9: storage04-stor:/gluster/brick04/nova
Brick10: storage08-stor:/gluster/brick04/nova
Brick11: storage02-stor:/gluster/brick03/nova
Brick12: storage07-stor:/gluster/brick02/nova
Brick13: storage02-stor:/gluster/brick04/nova
Brick14: storage08-stor:/gluster/brick02/nova
Options Reconfigured:
cluster.heal-timeout: 500
cluster.self-heal-window-size: 2
cluster.self-heal-readdir-size: 2KB
cluster.background-self-heal-count: 32
performance.quick-read: off
performance.read-ahead: off
performance.io-cache: off
performance.stat-prefetch: off
cluster.eager-lock: enable
network.remote-dio: enable
storage.owner-uid: 500
storage.owner-gid: 500
performance.io-thread-count: 24
cluster.self-heal-daemon: on
diagnostics.latency-measurement: on
diagnostics.count-fop-hits: on

Comment 5 Joe Julian 2014-07-26 17:54:28 UTC
Created attachment 921196 [details]
Client log from the second crash

graph switch occurs at 2014-07-24 05:23:43 in the attached log file

Comment 6 Joe Julian 2014-07-29 20:29:09 UTC
The series of:

[2014-07-29 06:39:13.410535] W [socket.c:514:__socket_rwv] 3-gv-nova-client-0: readv failed (No data available)
[2014-07-29 06:39:13.410560] I [client.c:2098:client_rpc_notify] 3-gv-nova-client-0: disconnected

is the client querying for the removed brick's port number:
116	4.036814	10.30.6.28	10.30.6.3	Gluster Portmap	154	V1 PORTBYBRICK Call (Reply In 117)
+ Gluster Portmap 
  [Program Version: 1]
  [Gluster Portmap: PORTBYBRICK (1)]
  + Brick: /gluster/st01brick01/nova

117	4.036993	10.30.6.3	10.30.6.28	Gluster Portmap	98	V1 PORTBYBRICK Reply (Call In 116)
+ Gluster Portmap
  [Program Version: 1]
  [Gluster Portmap: PORTBYBRICK (1)]
  Return value: -1
  Errno: 0 (Success)
  Status: 0
  Port: 0

Since that brick no longer exists due to using the deprecated command, "remove-brick ... commit force", that continues until the crash happens.

Comment 7 Joe Julian 2014-07-30 08:26:36 UTC
replace-brick, not remove-brick

Comment 8 Pranith Kumar K 2014-07-30 12:58:26 UTC
Sorry, I was on leave until yesterday, so I couldn't debug this issue.
I am able to re-create the crash and have found its root cause. All I need to find now is why the name-less lookup fails on some of the files, which leads to fd-migration failures on the new graph.

For the crash to happen, fd migration needs to fail and an fsync needs to arrive on the fd whose migration failed. There are checks in the fuse code to prevent sending fops on fds that failed migration, but that error-checking code has a bug introduced by http://review.gluster.org/6262. Since that patch fixes an issue with the 'link' fop, I need to revert it and fix the 'link' fop bug separately. This fixes the crash, but we still don't know why the fd migration failed; we need to find the root cause for that as well.

Joe,
    Please check whether any other mounts have fd-migration errors. They may also crash when fsync is issued on those fds.

    Could you please tell me which files are represented by the following gfids, and on which bricks those files reside? Brick logs would be helpful as well.

pk1@pranithk-laptop - ~/Downloads/gluster-users/JoeJulian 
18:21:12 :) ⚡ grep Invalid client.log 
[2014-07-24 05:23:44.931310] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (b7e794e9-67c0-4d47-9ee5-2b41d1de792f) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:44.980643] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (9abcf733-05b4-45e1-9845-ff8f4b97fbc4) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:45.088577] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (87544fde-9bad-46d8-b610-1a8c93b85113) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)
[2014-07-24 05:23:45.122628] W [fuse-bridge.c:3932:fuse_migrate_fd_open] 0-glusterfs-fuse: name-less lookup of gfid (37c5ffcb-3937-4bd5-a719-244526c278f5) failed (Invalid argument)(old-subvolume:gv-nova-3 new-subvolume:gv-nova-4)

Here is the backtrace when the crash happens after fsync on fd which failed migration.

Program received signal SIGSEGV, Segmentation fault.
0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
817	        STACK_WIND (frame, dht_fsync_cbk, subvol, subvol->fops->fsync,
(gdb) bt
#0  0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
#1  0x00007f100803ba8d in wb_fsync (frame=0x7f100b2ef3a8, this=0x13fdfc0, fd=0x13cd5c8, datasync=0, xdata=0x0) at write-behind.c:1523
#2  0x00007f10038ee95c in ra_fsync (frame=0x7f100b2ef704, this=0x13fea20, fd=0x13cd5c8, datasync=0, xdata=0x0) at read-ahead.c:628
#3  0x00007f100c4e5aca in default_fsync (frame=0x7f100b2ef704, this=0x13ff480, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:991
#4  0x00007f100c4e5aca in default_fsync (frame=0x7f100b2ef704, this=0x13ffee0, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:991
#5  0x00007f100c4dfd63 in default_fsync_resume (frame=0x7f100b2efbb8, this=0x1400a60, fd=0x13cd5c8, flags=0, xdata=0x0) at defaults.c:561
#6  0x00007f100c4fe7bc in call_resume_wind (stub=0x7f100ad78da4) at call-stub.c:2389
#7  0x00007f100c5064d5 in call_resume (stub=0x7f100ad78da4) at call-stub.c:2841
#8  0x00007f10032c2c1a in open_and_resume (this=0x1400a60, fd=0x13cd5c8, stub=0x7f100ad78da4) at open-behind.c:236
#9  0x00007f10032c3e9f in ob_fsync (frame=0x7f100b2efbb8, this=0x1400a60, fd=0x13cd5c8, flag=0, xdata=0x0) at open-behind.c:487
#10 0x00007f10030b644c in mdc_fsync (frame=0x7f100b2efb0c, this=0x1403710, fd=0x13cd5c8, datasync=0, xdata=0x0) at md-cache.c:1635
#11 0x00007f1002ea1a3c in io_stats_fsync (frame=0x7f100b2ef5ac, this=0x1404170, fd=0x13cd5c8, flags=0, xdata=0x0) at io-stats.c:2168
#12 0x00007f100a8eb1b3 in fuse_fsync_resume (state=0x7f0ffc016f80) at fuse-bridge.c:2384
#13 0x00007f100a8da00b in fuse_resolve_done (state=0x7f0ffc016f80) at fuse-resolve.c:663
#14 0x00007f100a8da0e1 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:692
#15 0x00007f100a8d9fd4 in fuse_resolve (state=0x7f0ffc016f80) at fuse-resolve.c:649
#16 0x00007f100a8da0b8 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:688
#17 0x00007f100a8da13f in fuse_resolve_continue (state=0x7f0ffc016f80) at fuse-resolve.c:708
#18 0x00007f100a8d9d8d in fuse_resolve_fd (state=0x7f0ffc016f80) at fuse-resolve.c:561
#19 0x00007f100a8d9f82 in fuse_resolve (state=0x7f0ffc016f80) at fuse-resolve.c:638
#20 0x00007f100a8da063 in fuse_resolve_all (state=0x7f0ffc016f80) at fuse-resolve.c:681
#21 0x00007f100a8da17d in fuse_resolve_and_resume (state=0x7f0ffc016f80, fn=0x7f100a8eaba4 <fuse_fsync_resume>) at fuse-resolve.c:721
#22 0x00007f100a8eb2fa in fuse_fsync (this=0x1338860, finh=0x7f0ffc016160, msg=0x7f0ffc016188) at fuse-bridge.c:2403
#23 0x00007f100a8f8af9 in fuse_thread_proc (data=0x1338860) at fuse-bridge.c:4798
#24 0x00000038e4807f33 in start_thread () from /lib64/libpthread.so.0
#25 0x00000038e40f4ded in clone () from /lib64/libc.so.6
(gdb) fr 0
#0  0x00007f1003b494a9 in dht_fsync (frame=0x7f100b2ef1a4, this=0x13f8620, fd=0x13cd5c8, datasync=0, xdata=0x0) at dht-inode-read.c:817
817	        STACK_WIND (frame, dht_fsync_cbk, subvol, subvol->fops->fsync,


(gdb) p subvol <<----- This is the reason for the crash.
$1 = (xlator_t *) 0x0

Comment 9 Joe Julian 2014-07-31 05:29:28 UTC
Created attachment 922813 [details]
brick log with the do_fd_cleanup on at least one of the gfids referenced

Comment 10 Anand Avati 2014-08-28 16:35:10 UTC
REVIEW: http://review.gluster.org/8562 (mount/fuse: Handle fd resolution failures) posted (#1) for review on release-3.4 by Kaleb KEITHLEY (kkeithle)

Comment 11 Anand Avati 2014-09-02 11:30:52 UTC
COMMIT: http://review.gluster.org/8562 committed in release-3.4 by Kaleb KEITHLEY (kkeithle) 
------
commit 1d4ef0b891899e3a6dbc8c2087e73cee6f5a7fbe
Author: Kaleb S. KEITHLEY <kkeithle>
Date:   Thu Aug 28 12:33:01 2014 -0400

    mount/fuse: Handle fd resolution failures
    
    Backport from http://review.gluster.org/#/c/8402, BZ 1126048
    
    Problem:
    Even when the fd resolution failed, the fop is continuing on the
    new graph which may not have valid inode. This lead to NULL layout
    subvols in dht which lead to crash in fsync after graph migration.
    
    Fix:
    - Remove resolution error handling in FUSE_FOP as it was only added
      to handle fd migration failures.
    - check in fuse_resolve_done for fd resolution failures and fail the
      fop right away.
    - loc resolution failures are already handled in the corresponding
      fops.
    - Return errno from state->resolve.op_errno in resume functions.
    - Send error to fuse on frame allocation failures.
    - Removed unused variable state->resolved
    - Removed unused macro FUSE_FOP_COOKIE
    
    Change-Id: I3f3129b9b4fc34ff1fe07218e6924a027346a95e
    BUG: 1123289
    Signed-off-by: Kaleb S. KEITHLEY <kkeithle>
    Reviewed-on: http://review.gluster.org/8562
    Tested-by: Gluster Build System <jenkins.com>
    Reviewed-by: Raghavendra G <rgowdapp>