Bug 1371912 - `gluster system:: uuid get` hangs
Summary: `gluster system:: uuid get` hangs
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: GlusterFS
Classification: Community
Component: cli
Version: 3.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: unspecified
Target Milestone: ---
Assignee: Ravishankar N
QA Contact:
URL:
Whiteboard:
Depends On: 1371775
Blocks:
 
Reported: 2016-08-31 12:21 UTC by Ravishankar N
Modified: 2016-09-16 18:28 UTC (History)
CC List: 1 user

Fixed In Version: glusterfs-3.8.4
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1371775
Environment:
Last Closed: 2016-09-16 18:28:44 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description Ravishankar N 2016-08-31 12:21:32 UTC
+++ This bug was initially created as a clone of Bug #1371775 +++

Description of problem:

    Problem: `gluster system:: uuid get` hangs due to a double unref of the
    dict.

    Fix:
    Remove the unnecessary unref in cli_cmd_uuid_get_cbk(). In that
    function, if the call to proc->fn() is successful, the dict is
    automatically unref'd in its callback as part of cli_local_wipe(). If
    the call to proc->fn() fails, CLI_STACK_DESTROY() takes care of the
    unref.
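
    A minimal sketch of that ownership rule in code (a simplified fragment
    mirroring the description above, not the verbatim source of
    cli_cmd_uuid_get_cbk(); proc->fn, cli_local_wipe() and
    CLI_STACK_DESTROY() are the real CLI helpers, but the surrounding shape
    is illustrative only):

        /* Once the dict is handed to proc->fn(), one of the two paths
         * below always performs the final dict_unref(). */
        ret = proc->fn (frame, THIS, dict);
        if (ret) {
                /* Failure path: CLI_STACK_DESTROY() unrefs the dict
                 * through cli_local_wipe(). */
                CLI_STACK_DESTROY (frame);
        }
        /* Success path: the RPC reply callback unrefs the dict when it
         * calls cli_local_wipe().  The extra dict_unref (dict) that used
         * to sit here was therefore always one unref too many. */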

--- Additional comment from Worker Ant on 2016-08-31 01:26:44 EDT ---

REVIEW: http://review.gluster.org/15368 (cli: Fix double unref of dict) posted (#1) for review on master by Ravishankar N (ravishankar)

--- Additional comment from Worker Ant on 2016-08-31 08:11:01 EDT ---

COMMIT: http://review.gluster.org/15368 committed in master by Atin Mukherjee (amukherj) 
------
commit 2d3292fd29884b16cac058f937f91cfda197eca6
Author: Ravishankar N <ravishankar>
Date:   Wed Aug 31 10:47:45 2016 +0530

    cli: Fix double unref of dict
    
    Problem: `gluster system:: uuid get` hangs due to a double unref of the
    dict.
    
    Fix:
    Remove the unnecessary unref in cli_cmd_uuid_get_cbk(). In that
    function, if the call to proc->fn() is successful, the dict is
    automatically unref'd in its callback as part of cli_local_wipe(). If
    the call to proc->fn() fails, CLI_STACK_DESTROY() takes care of the
    unref.
    
    Change-Id: Ib656d200f14a27415b36794a0bdadfe36b0d5306
    BUG: 1371775
    Signed-off-by: Ravishankar N <ravishankar>
    Reviewed-on: http://review.gluster.org/15368
    NetBSD-regression: NetBSD Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    Smoke: Gluster Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

--- Additional comment from Ravishankar N on 2016-08-31 08:18:16 EDT ---

Just posting the backtrace of the cli (showing the corrupted dict) for posterity, since the hang apparently doesn't happen on all machines:

(gdb) thread apply all bt

Thread 4 (Thread 0x7faaedc8a700 (LWP 3926)):
#0  0x00007faaf7250c7d in nanosleep () from /lib64/libpthread.so.0
#1  0x00007faaf8e0caca in gf_timer_proc (data=0xee30b0) at timer.c:164
#2  0x00007faaf7248555 in start_thread () from /lib64/libpthread.so.0
#3  0x00007faaf6b99ded in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7faaed489700 (LWP 3932)):
#0  0x00007faaf725029d in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007faaf724a89d in pthread_mutex_lock () from /lib64/libpthread.so.0
#2  0x00007faaf8df4621 in dict_unref (this=0x7faae0012cec) at dict.c:603
#3  0x000000000044341c in cli_cmd_uuid_get_cbk (state=0x7ffc95a0d470, word=0xee6540, words=0x7faae0011b10, wordcount=3) at cli-cmd-system.c:323
#4  0x000000000040cf04 in cli_cmd_process (state=0x7ffc95a0d470, argc=3, argv=0x7faae0011b10) at cli-cmd.c:135
#5  0x000000000040d098 in cli_cmd_process_line (state=0x7ffc95a0d470, text=0x7faae0012da0 "system:: uuid get") at cli-cmd.c:197
#6  0x000000000040d78f in cli_rl_process_line (line=0x7faae0012da0 "system:: uuid get") at cli-rl.c:98
#7  0x000000000040de2f in cli_rl_input (_data=0x7ffc95a0d470) at cli-rl.c:364
#8  0x00007faaf7248555 in start_thread () from /lib64/libpthread.so.0
#9  0x00007faaf6b99ded in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7faaecc88700 (LWP 3933)):
#0  0x00007faaf6b9a3e3 in epoll_wait () from /lib64/libc.so.6
#1  0x00007faaf8e668a1 in event_dispatch_epoll_worker (data=0xee72d0) at event-epoll.c:664
#2  0x00007faaf7248555 in start_thread () from /lib64/libpthread.so.0
#3  0x00007faaf6b99ded in clone () from /lib64/libc.so.6

Thread 1 (Thread 0x7faaf92dd780 (LWP 3925)):
#0  0x00007faaf724959d in pthread_join () from /lib64/libpthread.so.0
#1  0x00007faaf8e66aed in event_dispatch_epoll (event_pool=0xea70d0) at event-epoll.c:758
#2  0x00007faaf8e2c239 in event_dispatch (event_pool=0xea70d0) at event.c:124
#3  0x000000000040bf1c in main (argc=1, argv=0x7ffc95a0d638) at cli.c:752
(gdb) thread  3
[Switching to thread 3 (Thread 0x7faaed489700 (LWP 3932))]
#0  0x00007faaf725029d in __lll_lock_wait () from /lib64/libpthread.so.0
(gdb) frame 2
#2  0x00007faaf8df4621 in dict_unref (this=0x7faae0012cec) at dict.c:603
603             LOCK (&this->lock);



(gdb) p this
$1 = (dict_t *) 0x7faae0012cec
(gdb) p *this
$2 = {is_static = 0 '\000', hash_size = 19714048, count = 8366816, refcount = -1379869184, members = 0xe7143000007faade, members_list = 0x300000000000,
  extra_free = 0xac00007ffc00 <error: Cannot access memory at address 0xac00007ffc00>, extra_stdfree = 0x12cd00000000000 <error: Cannot access memory at address 0x12cd00000000000>, lock = {spinlock = 2, mutex = {__data = {__lock = 2,
        __count = 2915098112, __owner = 8366814, __nusers = 3876859904, __kind = 0, __spins = 12288, __elision = 0, __list = {__prev = 0xac00007ffc00, __next = 0x12cd00000000000}},
      __size = "\002\000\000\000\000\336\300\255ު\177\000\000\060\024\347\000\000\000\000\000\060\000\000\000\374\177\000\000\254\000\000\000\000\000\000\000\320,\001", __align = -5926493018038206462}},
  members_internal = 0xadc0de00007faae0, free_pair = {hash_next = 0xe7143000007faade, prev = 0x300000000000, next = 0xac00007ffc00, value = 0x12cd00000000000,
    key = 0xffffff00007faae0 <error: Cannot access memory at address 0xffffff00007faae0>}, free_pair_in_use = (_gf_true | unknown: 4294967294)}
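
The hang (rather than a clean crash) follows from dict_unref() taking the dict's lock before it touches the refcount. A self-contained sketch of the failure mode (a hypothetical minimal refcounted object, assuming only that unref locks, decrements and frees at zero; this is not the actual dict_t code):

    #include <pthread.h>
    #include <stdlib.h>

    typedef struct {
            pthread_mutex_t lock;
            int             refcount;
    } obj_t;

    static void
    obj_unref (obj_t *obj)
    {
            int ref;

            /* On an already-freed obj this locks whatever garbage now
             * occupies &obj->lock and may never return -- compare
             * thread 3 above, stuck in __lll_lock_wait() under
             * dict_unref(). */
            pthread_mutex_lock (&obj->lock);
            ref = --obj->refcount;
            pthread_mutex_unlock (&obj->lock);

            if (ref == 0) {
                    pthread_mutex_destroy (&obj->lock);
                    free (obj);   /* the heap chunk gets reused, hence
                                   * the garbage refcount and unreadable
                                   * pointers in the dump above */
            }
    }

Calling obj_unref() twice on an object created with refcount 1 frees it on the first call and dereferences freed memory on the second -- the same pattern the removed dict_unref() in cli_cmd_uuid_get_cbk() produced.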

Comment 1 Worker Ant 2016-08-31 12:22:46 UTC
REVIEW: http://review.gluster.org/15376 (cli: Fix double unref of dict) posted (#1) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 2 Worker Ant 2016-09-01 06:11:32 UTC
REVIEW: http://review.gluster.org/15376 (cli: Fix double unref of dict) posted (#2) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 3 Worker Ant 2016-09-01 07:10:50 UTC
REVIEW: http://review.gluster.org/15376 (cli: Fix double unref of dict) posted (#3) for review on release-3.8 by Ravishankar N (ravishankar)

Comment 4 Worker Ant 2016-09-01 07:12:39 UTC
COMMIT: http://review.gluster.org/15376 committed in release-3.8 by Atin Mukherjee (amukherj) 
------
commit f2ce05561c29c48640b72d0e813dd93b0282bd5e
Author: Ravishankar N <ravishankar>
Date:   Wed Aug 31 10:47:45 2016 +0530

    cli: Fix double unref of dict
    
    Backport of http://review.gluster.org/#/c/15368/
    
    Problem: `gluster system:: uuid get` hangs due to a double unref of the
    dict.
    
    Fix:
    Remove the unnecessary unref in cli_cmd_uuid_get_cbk(). In that
    function, if the call to proc->fn() is successful, the dict is
    automatically unref'd in its callback as part of cli_local_wipe(). If
    the call to proc->fn() fails, CLI_STACK_DESTROY() takes care of the
    unref.
    
    Change-Id: Ib656d200f14a27415b36794a0bdadfe36b0d5306
    BUG: 1371912
    Signed-off-by: Ravishankar N <ravishankar>
    (cherry picked from commit 2d3292fd29884b16cac058f937f91cfda197eca6)
    Reviewed-on: http://review.gluster.org/15376
    Smoke: Gluster Build System <jenkins.org>
    CentOS-regression: Gluster Build System <jenkins.org>
    NetBSD-regression: NetBSD Build System <jenkins.org>
    Reviewed-by: Atin Mukherjee <amukherj>

Comment 5 Niels de Vos 2016-09-12 05:39:14 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 6 Niels de Vos 2016-09-16 18:28:44 UTC
This bug is being closed because a release that should address the reported issue is now available. If the problem is still not fixed with glusterfs-3.8.4, please open a new bug report.

glusterfs-3.8.4 has been announced on the Gluster mailing lists [1], and packages for several distributions should become available in the near future. Keep an eye on the Gluster Users mailing list [2] and on the update infrastructure for your distribution.

[1] https://www.gluster.org/pipermail/announce/2016-September/000060.html
[2] https://www.gluster.org/pipermail/gluster-users/

