Bug 786766

Summary: glusterfs fuse client crashed due to 'fd' being null in 'afr_fd_ctx_get'
Product: [Community] GlusterFS Reporter: M S Vishwanath Bhat <vbhat>
Component: access-control Assignee: Pranith Kumar K <pkarampu>
Status: CLOSED CURRENTRELEASE QA Contact:
Severity: high Docs Contact:
Priority: urgent    
Version: pre-release CC: gluster-bugs, mzywusko
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Linux   
Whiteboard:
Fixed In Version: glusterfs-3.4.0 Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2013-07-24 17:27:36 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 817967    
Attachments:
Description Flags
500 entries from client log none

Description M S Vishwanath Bhat 2012-02-02 10:57:07 UTC
Description of problem:
Created a 3-way replicate volume and ran some operations on it, such as untarring the linux kernel, building glusterfs, and running the posix compliance test. Meanwhile, I was repeatedly taking down and bringing up the same replicate subvolume. The client process then crashed with a segfault. fd was null in 'afr_fd_ctx_get'.

Version-Release number of selected component (if applicable):
c3aa99d907591f72b6302287b9b8899514fb52f1

How reproducible:
1/1

Steps to Reproduce:
1. Create and start a 3-way replicate volume.
2. Mount via fuse and untar the linux kernel and glusterfs.
3. From a different terminal, run the posix compliance test and 'make' of the glusterfs source.
4. Bring 2 of the replicate subvolumes down and, after some time, bring them back online. Keep doing this.
5. After the posix compliance test and 'make' finish, run fileop and dbench.
  
Actual results:
make, fileop and dbench all failed because the fuse client crashed with the following backtrace:

(gdb) bt
#0  0x000000334d20c100 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fa92bb6fc23 in afr_fd_ctx_get (fd=0x0, this=0x10cfff0) at afr-transaction.c:74
#2  0x00007fa92bb6e934 in afr_openfd_fix_open_cbk (frame=0x7fa932f00f58, cookie=0x2, this=0x10cfff0, op_ret=-1, op_errno=107, fd=0x0) at afr-open.c:335
#3  0x00007fa92bde7382 in client3_1_open (frame=0x7fa933f8fcb0, this=0x10cf110, data=0x7fa928cb2eb0) at client3_1-fops.c:3542
#4  0x00007fa92bdcfcbc in client_open (frame=0x7fa933f8fcb0, this=0x10cf110, loc=0x7fa924726e48, flags=32770, fd=0x7fa928cb62fc, wbflags=0) at client.c:743
#5  0x00007fa92bb6f45a in afr_fix_open (frame=0x7fa933f14560, this=0x10cfff0, fd_ctx=0x18bef00, need_open_count=1, need_open=0x7fa9255a2080) at afr-open.c:435
#6  0x00007fa92bb63efd in afr_open_fd_fix (frame=0x7fa933f14560, this=0x10cfff0, pause_fop=_gf_false) at afr-inode-write.c:431
#7  0x00007fa92bb61de5 in afr_readv (frame=0x7fa933f14560, this=0x10cfff0, fd=0x7fa928cb62fc, size=131072, offset=0) at afr-inode-read.c:1147
#8  0x00007fa92b93919e in wb_readv_helper (frame=0x7fa933f4ea9c, this=0x10d12f0, fd=0x7fa928cb62fc, size=131072, offset=0) at write-behind.c:2241
#9  0x00007fa935304712 in call_resume_wind (stub=0x7fa932db0d34) at call-stub.c:2257
#10 0x00007fa93530be65 in call_resume (stub=0x7fa932db0d34) at call-stub.c:3932
#11 0x00007fa92b9375cf in wb_resume_other_requests (frame=0x7fa933f4ea9c, file=0x1365f60, other_requests=0x7fa928cb33c0) at write-behind.c:1832
#12 0x00007fa92b937833 in wb_do_ops (frame=0x7fa933f4ea9c, file=0x1365f60, winds=0x7fa928cb33e0, unwinds=0x7fa928cb33d0, other_requests=0x7fa928cb33c0) at write-behind.c:1870
#13 0x00007fa92b938033 in wb_process_queue (frame=0x7fa933f4ea9c, file=0x1365f60) at write-behind.c:2053
#14 0x00007fa92b9394cc in wb_readv (frame=0x7fa933f4ea9c, this=0x10d12f0, fd=0x7fa928cb62fc, size=131072, offset=0) at write-behind.c:2302
#15 0x00007fa92b72940a in ra_page_fault (file=0x118a060, frame=0x7fa933f8f29c, offset=0) at page.c:278
#16 0x00007fa92b723c95 in dispatch_requests (frame=0x7fa933f8f29c, file=0x118a060) at read-ahead.c:435
#17 0x00007fa92b72458f in ra_readv (frame=0x7fa933f8f29c, this=0x10d2570, fd=0x7fa928cb62fc, size=131072, offset=0) at read-ahead.c:543
#18 0x00007fa92b5196f1 in ioc_page_fault (ioc_inode=0x6128e20, frame=0x7fa933f49140, fd=0x7fa928cb62fc, offset=0) at page.c:631
#19 0x00007fa92b512cb8 in ioc_dispatch_requests (frame=0x7fa933f49140, ioc_inode=0x6128e20, fd=0x7fa928cb62fc, offset=0, size=131072) at io-cache.c:1041
#20 0x00007fa92b513be7 in ioc_readv (frame=0x7fa933f49140, this=0x10d3840, fd=0x7fa928cb62fc, size=131072, offset=0) at io-cache.c:1204
#21 0x00007fa92b2f904f in qr_readv (frame=0x7fa933f93ddc, this=0x10d49a0, fd=0x7fa928cb62fc, size=131072, offset=0) at quick-read.c:1320
#22 0x00007fa92b0dfc9c in sp_readv (frame=0x7fa933f1639c, this=0x10d5c60, fd=0x7fa928cb62fc, size=131072, offset=0) at stat-prefetch.c:2817
#23 0x00007fa92aebffd3 in io_stats_readv (frame=0x7fa933f49cac, this=0x10d6f20, fd=0x7fa928cb62fc, size=131072, offset=0) at io-stats.c:2064
#24 0x00007fa932b81ad0 in fuse_readv_resume (state=0x7fa92408ae10) at fuse-bridge.c:2036
#25 0x00007fa932b75c81 in fuse_resolve_and_resume (state=0x7fa92408ae10, fn=0x7fa932b81695 <fuse_readv_resume>) at fuse-resolve.c:578
#26 0x00007fa932b81c7c in fuse_readv (this=0x10c2680, finh=0x7fa9244778f0, msg=0x7fa924477918) at fuse-bridge.c:2065
#27 0x00007fa932b8b042 in fuse_thread_proc (data=0x10c2680) at fuse-bridge.c:3707
#28 0x000000334d2077e1 in start_thread () from /lib64/libpthread.so.0
#29 0x000000334cae577d in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x00007fa92bb6e934 in afr_openfd_fix_open_cbk (frame=0x7fa932f00f58, cookie=0x2, this=0x10cfff0, op_ret=-1, op_errno=107, fd=0x0) at afr-open.c:335
335             fd_ctx = afr_fd_ctx_get (fd, this);
(gdb) f 1
#1  0x00007fa92bb6fc23 in afr_fd_ctx_get (fd=0x0, this=0x10cfff0) at afr-transaction.c:74
74              LOCK(&fd->lock);
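
Frames 1 and 2 above show the failure path: the open callback ran with op_ret=-1, op_errno=107 (ENOTCONN on Linux) and fd=0x0, then handed that fd to afr_fd_ctx_get, which takes fd->lock before checking anything. A minimal sketch of a lookup that tolerates a NULL fd is below. The types and the name sketch_fd_ctx_get are hypothetical stand-ins, not GlusterFS code, and the integer "lock" only marks where LOCK/UNLOCK would sit.

```c
#include <stddef.h>

/* Hypothetical stand-in for the per-fd AFR context. */
typedef struct sketch_fd_ctx {
    int opened_on[3];        /* one flag per replica child */
} sketch_fd_ctx_t;

/* Hypothetical stand-in for fd_t. */
typedef struct sketch_fd {
    int lock;                /* placeholder for the spinlock in the real fd_t */
    sketch_fd_ctx_t *ctx;    /* context attached to this fd */
} sketch_fd_t;

/* Return the context attached to fd, or NULL when fd itself is NULL.
 * The crashing code locked first, which dereferences the NULL fd. */
sketch_fd_ctx_t *sketch_fd_ctx_get(sketch_fd_t *fd)
{
    sketch_fd_ctx_t *ctx = NULL;

    if (fd == NULL)
        return NULL;

    fd->lock = 1;            /* where LOCK(&fd->lock) would be */
    ctx = fd->ctx;
    fd->lock = 0;            /* where UNLOCK(&fd->lock) would be */

    return ctx;
}
```

With this shape, a caller that receives NULL can skip the fd bookkeeping for the failed open instead of crashing.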


Expected results:
The fuse client should not crash.

Additional info:

I have attached the client log file and archived other log files and core.

Comment 1 M S Vishwanath Bhat 2012-02-02 11:01:31 UTC
Created attachment 559033 [details]
500 entries from client log

Comment 2 M S Vishwanath Bhat 2012-02-02 11:03:01 UTC
The actual client log file is too big to attach (42MB). Attaching a file that has only the last 500 entries from the client log file.

Comment 3 Anand Avati 2012-02-22 12:14:39 UTC
CHANGE: http://review.gluster.com/2792 (cluster/afr: Don't trust the fd returned in open_cbk) merged in master by Vijay Bellur (vijay)
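
The patch itself is not quoted in this report, so the following is only a sketch of the idea in the change title. It assumes that the fd being reopened is saved in the frame's local state before the open is wound, and that the cookie carries the child index (cookie=0x2 in the backtrace). All names here are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SKETCH_CHILDREN 3

/* Hypothetical stand-in for an fd with per-child open state. */
typedef struct reopen_fd {
    int opened_on[SKETCH_CHILDREN];  /* 1 if the fd is open on that child */
} reopen_fd_t;

/* Hypothetical stand-in for the frame's local state. */
typedef struct reopen_local {
    reopen_fd_t *fd;                 /* saved before the open was wound */
} reopen_local_t;

/* Open callback. cb_fd may be NULL when the open failed (for example on
 * ENOTCONN), so it is ignored and the fd saved in local is used instead. */
int sketch_open_cbk(reopen_local_t *local, int child,
                    int op_ret, int op_errno, reopen_fd_t *cb_fd)
{
    (void)cb_fd;     /* deliberately not trusted */
    (void)op_errno;  /* a real callback would log this */

    assert(child >= 0 && child < SKETCH_CHILDREN);

    if (local == NULL || local->fd == NULL)
        return -1;

    /* Record whether the reopen succeeded on this child. */
    local->fd->opened_on[child] = (op_ret >= 0) ? 1 : 0;
    return 0;
}
```

Called with the values from frame 2 (child 2, op_ret=-1, op_errno=107, a NULL callback fd), this sketch records the failure on the saved fd and returns without touching the NULL pointer.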

Comment 4 M S Vishwanath Bhat 2012-05-13 19:10:00 UTC
Not reproducible consistently, but did not see the crash with glusterfs-3.3.0qa41.