Bug 786766 - glusterfs fuse client crashed due to 'fd' being null in 'afr_fd_ctx_get'
Product: GlusterFS
Classification: Community
Component: access-control
Hardware: Unspecified Linux
Priority: urgent
Severity: high
Assigned To: Pranith Kumar K
Depends On:
Blocks: 817967
Reported: 2012-02-02 05:57 EST by M S Vishwanath Bhat
Modified: 2016-05-31 21:55 EDT

See Also:
Fixed In Version: glusterfs-3.4.0
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Last Closed: 2013-07-24 13:27:36 EDT

Attachments
500 entries from client log (88.65 KB, text/x-log)
2012-02-02 06:01 EST, M S Vishwanath Bhat
Description M S Vishwanath Bhat 2012-02-02 05:57:07 EST
Description of problem:
Created a 3-way replicate volume and ran some operations on it, such as untarring the Linux kernel, building glusterfs, and running the POSIX compliance test. Meanwhile, I was taking down and bringing up the same replicate subvolume. The client process then crashed with a segfault: fd was null in 'afr_fd_ctx_get'.

Version-Release number of selected component (if applicable):

How reproducible:

Steps to Reproduce:
1. Create and start a 3-way replicate volume.
2. Mount via fuse and untar the Linux kernel and glusterfs.
3. From a different terminal, run the POSIX compliance test and 'make' of the glusterfs source.
4. Bring 2 of the replicate subvolumes down and, after some time, bring them back online. Keep doing this.
5. After the POSIX compliance test and 'make', run fileop and dbench.
Actual results:
make, fileop and dbench all failed because the fuse client crashed with the following backtrace:

(gdb) bt
#0  0x000000334d20c100 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007fa92bb6fc23 in afr_fd_ctx_get (fd=0x0, this=0x10cfff0) at afr-transaction.c:74
#2  0x00007fa92bb6e934 in afr_openfd_fix_open_cbk (frame=0x7fa932f00f58, cookie=0x2, this=0x10cfff0, op_ret=-1, op_errno=107, fd=0x0) at afr-open.c:335
#3  0x00007fa92bde7382 in client3_1_open (frame=0x7fa933f8fcb0, this=0x10cf110, data=0x7fa928cb2eb0) at client3_1-fops.c:3542
#4  0x00007fa92bdcfcbc in client_open (frame=0x7fa933f8fcb0, this=0x10cf110, loc=0x7fa924726e48, flags=32770, fd=0x7fa928cb62fc, wbflags=0) at client.c:743
#5  0x00007fa92bb6f45a in afr_fix_open (frame=0x7fa933f14560, this=0x10cfff0, fd_ctx=0x18bef00, need_open_count=1, need_open=0x7fa9255a2080) at afr-open.c:435
#6  0x00007fa92bb63efd in afr_open_fd_fix (frame=0x7fa933f14560, this=0x10cfff0, pause_fop=_gf_false) at afr-inode-write.c:431
#7  0x00007fa92bb61de5 in afr_readv (frame=0x7fa933f14560, this=0x10cfff0, fd=0x7fa928cb62fc, size=131072, offset=0) at afr-inode-read.c:1147
#8  0x00007fa92b93919e in wb_readv_helper (frame=0x7fa933f4ea9c, this=0x10d12f0, fd=0x7fa928cb62fc, size=131072, offset=0) at write-behind.c:2241
#9  0x00007fa935304712 in call_resume_wind (stub=0x7fa932db0d34) at call-stub.c:2257
#10 0x00007fa93530be65 in call_resume (stub=0x7fa932db0d34) at call-stub.c:3932
#11 0x00007fa92b9375cf in wb_resume_other_requests (frame=0x7fa933f4ea9c, file=0x1365f60, other_requests=0x7fa928cb33c0) at write-behind.c:1832
#12 0x00007fa92b937833 in wb_do_ops (frame=0x7fa933f4ea9c, file=0x1365f60, winds=0x7fa928cb33e0, unwinds=0x7fa928cb33d0, other_requests=0x7fa928cb33c0) at write-behind.c:1870
#13 0x00007fa92b938033 in wb_process_queue (frame=0x7fa933f4ea9c, file=0x1365f60) at write-behind.c:2053
#14 0x00007fa92b9394cc in wb_readv (frame=0x7fa933f4ea9c, this=0x10d12f0, fd=0x7fa928cb62fc, size=131072, offset=0) at write-behind.c:2302
#15 0x00007fa92b72940a in ra_page_fault (file=0x118a060, frame=0x7fa933f8f29c, offset=0) at page.c:278
#16 0x00007fa92b723c95 in dispatch_requests (frame=0x7fa933f8f29c, file=0x118a060) at read-ahead.c:435
#17 0x00007fa92b72458f in ra_readv (frame=0x7fa933f8f29c, this=0x10d2570, fd=0x7fa928cb62fc, size=131072, offset=0) at read-ahead.c:543
#18 0x00007fa92b5196f1 in ioc_page_fault (ioc_inode=0x6128e20, frame=0x7fa933f49140, fd=0x7fa928cb62fc, offset=0) at page.c:631
#19 0x00007fa92b512cb8 in ioc_dispatch_requests (frame=0x7fa933f49140, ioc_inode=0x6128e20, fd=0x7fa928cb62fc, offset=0, size=131072) at io-cache.c:1041
#20 0x00007fa92b513be7 in ioc_readv (frame=0x7fa933f49140, this=0x10d3840, fd=0x7fa928cb62fc, size=131072, offset=0) at io-cache.c:1204
#21 0x00007fa92b2f904f in qr_readv (frame=0x7fa933f93ddc, this=0x10d49a0, fd=0x7fa928cb62fc, size=131072, offset=0) at quick-read.c:1320
#22 0x00007fa92b0dfc9c in sp_readv (frame=0x7fa933f1639c, this=0x10d5c60, fd=0x7fa928cb62fc, size=131072, offset=0) at stat-prefetch.c:2817
#23 0x00007fa92aebffd3 in io_stats_readv (frame=0x7fa933f49cac, this=0x10d6f20, fd=0x7fa928cb62fc, size=131072, offset=0) at io-stats.c:2064
#24 0x00007fa932b81ad0 in fuse_readv_resume (state=0x7fa92408ae10) at fuse-bridge.c:2036
#25 0x00007fa932b75c81 in fuse_resolve_and_resume (state=0x7fa92408ae10, fn=0x7fa932b81695 <fuse_readv_resume>) at fuse-resolve.c:578
#26 0x00007fa932b81c7c in fuse_readv (this=0x10c2680, finh=0x7fa9244778f0, msg=0x7fa924477918) at fuse-bridge.c:2065
#27 0x00007fa932b8b042 in fuse_thread_proc (data=0x10c2680) at fuse-bridge.c:3707
#28 0x000000334d2077e1 in start_thread () from /lib64/libpthread.so.0
#29 0x000000334cae577d in clone () from /lib64/libc.so.6
(gdb) f 2
#2  0x00007fa92bb6e934 in afr_openfd_fix_open_cbk (frame=0x7fa932f00f58, cookie=0x2, this=0x10cfff0, op_ret=-1, op_errno=107, fd=0x0) at afr-open.c:335
335             fd_ctx = afr_fd_ctx_get (fd, this);
(gdb) f 1
#1  0x00007fa92bb6fc23 in afr_fd_ctx_get (fd=0x0, this=0x10cfff0) at afr-transaction.c:74
74              LOCK(&fd->lock);

Expected results:
The fuse client should not crash.

Additional info:

I have attached the client log file and archived other log files and core.
Comment 1 M S Vishwanath Bhat 2012-02-02 06:01:31 EST
Created attachment 559033 [details]
500 entries from client log
Comment 2 M S Vishwanath Bhat 2012-02-02 06:03:01 EST
The actual client log file is too big to attach (42 MB). Attaching a file that has only the last 500 entries from the client log file.
Comment 3 Anand Avati 2012-02-22 07:14:39 EST
CHANGE: http://review.gluster.com/2792 (cluster/afr: Don't trust the fd returned in open_cbk) merged in master by Vijay Bellur (vijay@gluster.com)
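Editorial illustration (not the merged patch): in the backtrace, frame #2 receives op_ret=-1, op_errno=107 (ENOTCONN on Linux) and fd=0x0, then passes that fd to afr_fd_ctx_get, which locks through it at frame #1. The sketch below shows one plausible shape of "don't trust the fd returned in open_cbk": the callback uses the fd saved in frame-local state, and the context getter refuses a NULL fd. All names prefixed sketch_ are hypothetical stand-ins, not GlusterFS APIs, and a pthread mutex stands in for the LOCK macro.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for the real fd_t; only what the sketch needs. */
typedef struct {
    pthread_mutex_t lock;
    int             opened;   /* placeholder for the per-fd AFR context */
} sketch_fd_t;

/* Hypothetical stand-in for frame-local state: the fd the caller saved
 * before winding the open to the child subvolume. */
typedef struct {
    sketch_fd_t *fd;
} sketch_local_t;

/* Return the per-fd context, or NULL when fd is NULL, instead of
 * dereferencing fd->lock unconditionally as in frame #1. */
static int *sketch_fd_ctx_get(sketch_fd_t *fd)
{
    int *ctx = NULL;

    if (fd == NULL)
        return NULL;

    pthread_mutex_lock(&fd->lock);
    ctx = &fd->opened;
    pthread_mutex_unlock(&fd->lock);
    return ctx;
}

/* Open callback. On failure the cb_fd argument may be NULL, so it is
 * ignored and the fd saved in local is used instead (an assumption about
 * the fix, based only on the change title). Returns 0 on success or a
 * negative errno value. */
static int sketch_open_cbk(sketch_local_t *local, int op_ret, int op_errno,
                           sketch_fd_t *cb_fd)
{
    int *ctx = NULL;

    (void)cb_fd;            /* deliberately not trusted */
    assert(local != NULL);

    ctx = sketch_fd_ctx_get(local->fd);
    if (ctx == NULL)
        return -EINVAL;

    if (op_ret < 0)
        return -op_errno;   /* open failed on this child; nothing to mark */

    *ctx = 1;               /* record that the fd is now open */
    return 0;
}
```

With a saved fd in local, calling sketch_open_cbk(&local, -1, ENOTCONN, NULL) mirrors the arguments seen in frame #2 and returns -ENOTCONN instead of dereferencing a null pointer.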
Comment 4 M S Vishwanath Bhat 2012-05-13 15:10:00 EDT
Not consistently reproducible, but I did not see the crash with glusterfs-3.3.0qa41.
