Bug 1423065
| Summary: | Deadlock in gf_timer calls and a possible core (illegal memory access) in AFR on dropping FUSE cache | | |
|---|---|---|---|
| Product: | [Community] GlusterFS | Reporter: | Shyamsundar <srangana> |
| Component: | replicate | Assignee: | Ravishankar N <ravishankar> |
| Status: | CLOSED DUPLICATE | QA Contact: | |
| Severity: | unspecified | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 3.10 | CC: | bugs, jdarcy, ravishankar, srangana |
| Target Milestone: | --- | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2017-02-20 02:57:51 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1416031 | | |
**Description** Shyamsundar 2017-02-17 00:56:21 UTC
I just hit this again on the same test case, with the same exact stack etc. Requesting some attention to this, as it blocks performance regression testing for the release (and the failure also seems to be consistent).

This is disturbing, as it looks somewhat similar to bug 1421721. Seems like something might have gone awry in the timer code recently.

Nothing has changed in the timer code recently. However, check out https://bugzilla.redhat.com/show_bug.cgi?id=1421721#c4 (just added). That's not *directly* applicable here, since it involves glusterd code and we're not in glusterd, but it seems pretty likely that something similar is corrupting the timer list.

----------------------------
Notes to self:

    (gdb) p *ctx
    $4 = {read_subvol = 4379182848, spb_choice = -1, timer = 0x2b1a134558a5f533,
          need_refresh = (_gf_true | unknown: 1487271126)}

- ctx->timer is not NULL. But ctx->timer is initialized only when split-brain-resolution-related setfattr commands are executed from the mount, which was not done in this test run.
- ctx->read_subvol is supposed to be a bitmap array of readable subvols (https://github.com/gluster/glusterfs/blob/v3.10.0rc0/xlators/cluster/afr/src/afr-common.c#L162), but converting the value 4379182848 to binary to get the data/metadata/event_gen bits of the bitmap gives gibberish results.
- ctx->need_refresh not being just _gf_true (i.e. 0x01) but a large non-zero value (1487271126) is again gibberish.
----------------------------

Shyam, it looks like the inode context values are corrupt here, and the problem is likely to be BZ 1423385, where the inode context of a different xlator is returned when we get it with the key of a given xlator. (I don't see any bug in AFR itself from looking at the code.)

Can you wait until https://review.gluster.org/#/c/16655/ gets merged to see if it fixes the problem?
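The "gibberish" observation about read_subvol can be made concrete by splitting the packed word. Below is a minimal Python sketch of that decode, assuming the packing used by the read_subvol helpers in afr-common.c around v3.10 (metadata map in the low 16 bits, data map in the next 16 bits, event generation in the top 32 bits) — verify the exact layout against the linked source before relying on it.

```python
# Hedged sketch: split AFR's packed read_subvol word into its three
# presumed fields. The bit layout is an assumption taken from the
# v3.10-era afr-common.c helpers; check the tree you are debugging.

def decode_read_subvol(val):
    metadatamap = val & 0xffff          # presumed: metadata-readable map
    datamap = (val >> 16) & 0xffff      # presumed: data-readable map
    event = val >> 32                   # presumed: event generation
    return datamap, metadatamap, event

# The corrupt value from the gdb session above:
datamap, metadatamap, event = decode_read_subvol(4379182848)
print(hex(datamap), hex(metadatamap), event)

# For a 2- or 3-way replica, only the low child_count bits of each
# map should ever be set; high bits being set here is consistent with
# the "gibberish" noted in the comment above.
```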
(In reply to Ravishankar N from comment #4)

> Shyam, it looks like the inode context values are corrupt here and the problem is likely to be BZ 1423385 where the inode context of a different xlator is returned when we get it with a key of a given xlator. (I don't see any bug in AFR itself from looking at the code.)
>
> Can you wait until https://review.gluster.org/#/c/16655/ gets merged to see if it fixes the problem?

I intend to do that. I had a separate IRC conversation with Poornima, and I might give this fix a shot in my setup to see if the problem goes away. I can also possibly narrow the test case a bit in the process.

Retested with the possible fix posted here: https://review.gluster.org/#/c/16655/

The problem is not reproducible. Since this hung/crashed for me every time prior to the fix, and the last 3 runs have not, I am marking this as a duplicate of bug #1423385.

*** This bug has been marked as a duplicate of bug 1423385 ***

(In reply to Shyamsundar from comment #6)

> Retested with the possible fix posted here: https://review.gluster.org/#/c/16655/
>
> The problem is not reproducible. Since this hung/crashed for me every time prior to the fix, and the last 3 runs have not, I am marking this as a duplicate of bug #1423385.
>
> *** This bug has been marked as a duplicate of bug 1423385 ***

Many thanks for testing, Shyam!
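The failure mode suspected in BZ 1423385 — a context lookup keyed by one xlator returning another xlator's slot — can be illustrated with a toy model. This is a hypothetical Python sketch, not GlusterFS code: the class, the `ctx_get_buggy` method, and the `"md-cache"` slot are illustrative assumptions standing in for the real `inode_ctx_get()`/`inode_ctx_set()` machinery.

```python
# Hypothetical model of a per-xlator inode-context table. Each xlator
# stores its private context under its own key (itself), mirroring
# how inode_ctx_set()/inode_ctx_get() are keyed by xlator pointer.

class Inode:
    def __init__(self):
        self._ctx = {}  # key: xlator, value: that xlator's context

    def ctx_set(self, xlator, value):
        self._ctx[xlator] = value

    def ctx_get_correct(self, xlator):
        # Correct lookup: match on the requesting xlator's key.
        return self._ctx.get(xlator)

    def ctx_get_buggy(self, xlator):
        # Buggy lookup: returns the first populated slot regardless of
        # key, handing one xlator another xlator's context blob.
        return next(iter(self._ctx.values()), None)

inode = Inode()
inode.ctx_set("md-cache", {"generation": 1487271126})
inode.ctx_set("afr", {"read_subvol": 0b0101, "timer": None,
                      "need_refresh": False})

# AFR asks for its own context:
print(inode.ctx_get_correct("afr"))  # AFR's sane context
print(inode.ctx_get_buggy("afr"))    # md-cache's slot: garbage to AFR
```

Reinterpreting a foreign slot through AFR's struct layout would produce exactly the symptoms in comment #4: a non-NULL `timer` that was never set, and a `need_refresh` holding a large non-boolean value.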