Bug 765536 - (GLUSTER-3804) [glusterfs-3.2.5qa6]: glusterd crashed due to abort
Status: CLOSED WORKSFORME
Product: GlusterFS
Classification: Community
Component: glusterd
Version: pre-release
Hardware: x86_64 Linux
Priority: medium
Severity: high
Assigned To: Raghavendra Bhat
Reported: 2011-11-10 05:17 EST by Raghavendra Bhat
Modified: 2012-07-11 06:22 EDT
CC: 2 users

Doc Type: Bug Fix
Last Closed: 2012-07-11 06:22:44 EDT
Attachments: None
Description Raghavendra Bhat 2011-11-10 05:17:25 EST
glusterd crashed due to an abort. gsync, quota, and profile had been enabled, and a replace-brick operation had been started to reproduce bug 765522. On trying to disable quota, glusterd crashed with the backtrace below.


Core was generated by `/opt/glusterfs/3.2.5qa6/sbin/glusterd'.
Program terminated with signal 6, Aborted.
#0  0x0000003d6ba32905 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.25.el6_1.3.x86_64 libgcc-4.4.5-6.el6.x86_64 libxml2-2.7.6-1.el6.x86_64 zlib-1.2.3-25.el6.x86_64
(gdb) bt
#0  0x0000003d6ba32905 in raise () from /lib64/libc.so.6
#1  0x0000003d6ba340e5 in abort () from /lib64/libc.so.6
#2  0x0000003d6ba6f827 in __libc_message () from /lib64/libc.so.6
#3  0x0000003d6ba75146 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007fd37fe38734 in __gf_free (free_ptr=0xb3ded0) at mem-pool.c:259
#5  0x00007fd37fe0ae4b in data_destroy (data=0xb3def0) at dict.c:140
#6  0x00007fd37fe0b3fd in dict_del (this=0xb47bc0, key=0x7fd37dfb1e0a "features.limit-usage") at dict.c:353
#7  0x00007fd37df7b693 in glusterd_quota_disable (volinfo=0xb4f340, op_errstr=0x7fff1646ad28) at glusterd-op-sm.c:5270
#8  0x00007fd37df7c103 in glusterd_op_quota (dict=0xb4fa80, op_errstr=0x7fff1646ad28) at glusterd-op-sm.c:5472
#9  0x00007fd37df84a6e in glusterd_op_commit_perform (op=<value optimized out>, dict=0xb4fa80, op_errstr=0x7fff1646ad28, 
    rsp_dict=0xffffffffffffffff) at glusterd-op-sm.c:7655
#10 0x00007fd37df85edb in glusterd_op_ac_send_commit_op (event=<value optimized out>, ctx=<value optimized out>) at glusterd-op-sm.c:6821
#11 0x00007fd37df72a8f in glusterd_op_sm () at glusterd-op-sm.c:8458
#12 0x00007fd37df9afe2 in glusterd3_1_stage_op_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, 
    myframe=0x7fd37ec70314) at glusterd-rpc-ops.c:1099
#13 0x00007fd37fbf2502 in rpc_clnt_handle_reply (clnt=0xe2a580, pollin=<value optimized out>) at rpc-clnt.c:741
#14 0x00007fd37fbf26fd in rpc_clnt_notify (trans=<value optimized out>, mydata=0xe2a5b0, event=<value optimized out>, 
    data=0xffffffffffffffff) at rpc-clnt.c:854
#15 0x00007fd37fbed317 in rpc_transport_notify (this=0x1088, event=RPC_TRANSPORT_MSG_SENT, data=0xffffffffffffffff) at rpc-transport.c:919
#16 0x00007fd37dc915ef in socket_event_poll_in (this=0xe2f9c0) at socket.c:1647
#17 0x00007fd37dc91798 in socket_event_handler (fd=<value optimized out>, idx=10, data=0xe2f9c0, poll_in=1, poll_out=0, poll_err=0)
    at socket.c:1762
#18 0x00007fd37fe37781 in event_dispatch_epoll_handler (event_pool=0xb3c360) at event.c:794
#19 event_dispatch_epoll (event_pool=0xb3c360) at event.c:856
#20 0x000000000040566e in main (argc=1, argv=0x7fff1646be88) at glusterfsd.c:1509
(gdb) f 4
#4  0x00007fd37fe38734 in __gf_free (free_ptr=0xb3ded0) at mem-pool.c:259
259                     FREE (free_ptr);
(gdb) p *(data_t *)free_ptr
$1 = {is_static = 0 '\000', is_const = 0 '\000', is_stdalloc = 0 '\000', len = 0, vec = 0x424732323a646e75, data = 0x0, refcount = 49, 
  lock = 0}
(gdb) f 6
#6  0x00007fd37fe0b3fd in dict_del (this=0xb47bc0, key=0x7fd37dfb1e0a "features.limit-usage") at dict.c:353
353                             data_unref (pair->value);
(gdb) p *this
$2 = {is_static = 0 '\000', hash_size = 1, count = 9, refcount = 1, members = 0xb490d0, members_list = 0xb52090, extra_free = 0x0, 
  extra_stdfree = 0x0, lock = 0}
(gdb)  p *this->members_list 
$3 = {hash_next = 0xe229b0, prev = 0x0, next = 0xe229b0, value = 0xb54ab0, key = 0xb4e510 "enable-pump"}
(gdb) p *this->members_list->next
$4 = {hash_next = 0xe279c0, prev = 0xb52090, next = 0xe279c0, value = 0xb48f30, key = 0xb48e00 "geo-replication.indexing"}
(gdb) p *this->members_list->next->next
$5 = {hash_next = 0xb4c420, prev = 0xe229b0, next = 0xb4c420, value = 0xb48b00, key = 0xe27b30 "performance.write-behind"}
(gdb) p *this->members_list->next->next->next
$6 = {hash_next = 0xb4c380, prev = 0xe279c0, next = 0xb4c380, value = 0xe2c600, key = 0xb4c450 "performance.stat-prefetch"}
(gdb) p *this->members_list->next->next->next->next
$7 = {hash_next = 0xb4c2d0, prev = 0xb4c420, next = 0xb4c2d0, value = 0xb4c350, key = 0xb4c3b0 "performance.io-cache"}
(gdb) p *this->members_list->next->next->next->next->next
$8 = {hash_next = 0xb4a950, prev = 0xb4c380, next = 0xb4a950, value = 0xb4c2a0, key = 0xb4c300 "diagnostics.count-fop-hits"}
(gdb) p *this->members_list->next->next->next->next->next->next
$9 = {hash_next = 0xb3e0c0, prev = 0xb4c2d0, next = 0xb3df20, value = 0xb4a920, key = 0xb4a980 "diagnostics.latency-measurement"}
(gdb) p *this->members_list->next->next->next->next->next->next->next
$10 = {hash_next = 0xb3e0c0, prev = 0xb4a950, next = 0xb3e0c0, value = 0xb3def0, key = 0xb3df50 "features.limit-usage"}
(gdb) p *this->members_list->next->next->next->next->next->next->next->value
$11 = {is_static = 0 '\000', is_const = 0 '\000', is_stdalloc = 0 '\000', len = 17, vec = 0x0, 
  data = 0xb3ded0 "\220", <incomplete sequence \340\263>, refcount = 0, lock = 1}
(gdb) p *this->members_list->next->next->next->next->next->next->next->value->data
$12 = -112 '\220'
(gdb)
Comment 1 Amar Tumballi 2011-11-10 19:02:43 EST
Even though the code below is fine for now (considering the single-threaded behavior of glusterd), it is very race-prone.

code in 'glusterd_quota_disable()'
------
        ret = glusterd_volinfo_get (volinfo, VKEY_FEATURES_LIMIT_USAGE,
                                    &quota_limits);
        if (ret) {
                gf_log ("", GF_LOG_WARNING, "failed to get the quota limits");
        } else {
                GF_FREE (quota_limits);
        }

        dict_del (volinfo->dict, VKEY_FEATURES_LIMIT_USAGE);
------

Here, quota_limits points to the value of the entry that dict_del() later deletes, but it is GF_FREE'd while the entry is still inside the dict. dict_del() then unrefs the data and frees the same memory again, which can lead to a double free like the abort above. Not a good practice.
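
For illustration, a minimal sketch of a safer ordering, assuming glusterd_volinfo_get() hands back a pointer into volinfo->dict rather than a private copy (which is what the backtrace above suggests); this is a sketch, not the actual fix:

------
        ret = glusterd_volinfo_get (volinfo, VKEY_FEATURES_LIMIT_USAGE,
                                    &quota_limits);
        if (ret)
                gf_log ("", GF_LOG_WARNING, "failed to get the quota limits");

        /* quota_limits still belongs to volinfo->dict, so it is not
         * GF_FREE'd here; dict_del()/data_unref() will release that
         * memory exactly once.  If the value were needed afterwards,
         * it would have to be copied (e.g. with gf_strdup()) before
         * the dict_del() call. */
        dict_del (volinfo->dict, VKEY_FEATURES_LIMIT_USAGE);
------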
Comment 2 Amar Tumballi 2012-02-21 23:08:44 EST
Johny, please check for similar behavior in 3.3.0qa23.
Comment 3 Amar Tumballi 2012-06-07 06:55:47 EDT
If it doesn't happen any more on the master/release-3.3 branch, mark it as fixed UPSTREAM.
Comment 4 Amar Tumballi 2012-07-11 06:22:44 EDT
Not happening anymore on both the release-3.3 branch and upstream.
