Description of problem:
Had a 6x2 distribute-replicate volume across RHS nodes [1, 2, 3, 4]. Set quota limits on the volume root and on its directories. While I/O was going on, brought down two nodes. After some time, used 'gluster volume start <vol> force' to bring the bricks back and then started self-heal; the I/O was stopped before this step.

[root@nfs1 ~]# gluster volume quota quota-dist-rep list
                  Path                   Hard-limit  Soft-limit      Used   Available
--------------------------------------------------------------------------------
/                                          30GB         90%        7.2GB      22.8GB
/dir2                                       1GB         90%     1023.9MB      64.0KB
/dir3                                       1GB         90%     1022.9MB       1.1MB
/dir4                                       1GB         90%     1023.9MB      64.0KB
/dir5                                       1GB         90%     1022.9MB       1.1MB
/dir6                                       1GB         90%     1022.9MB       1.1MB
/dir7                                       1GB         90%        1.0GB      0Bytes
/dir8                                       1GB         90%      104.0MB     920.0MB
/dir9                                       1GB         90%       0Bytes       1.0GB
/dir10                                      1GB         90%       0Bytes       1.0GB
/dir1                                       2GB         90%     1023.9MB       1.0GB
/bar                                       10MB         90%          N/A         N/A
/foo                                       10MB         90%       95.4MB      0Bytes

Version-Release number of selected component (if applicable):
[root@nfs1 ~]# rpm -qa | grep glusterfs
glusterfs-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta4-1.el6rhs.x86_64
glusterfs-server-3.4.0.12rhs.beta4-1.el6rhs.x86_64

How reproducible:
Happened this time (seen once so far).

Actual results:
Found a core on both node2 and node3.

Status of volume: quota-dist-rep
Gluster process                                         Port    Online  Pid
------------------------------------------------------------------------------
Brick 10.70.37.180:/rhs/bricks/quota-d1r1               49172   Y       23303
Brick 10.70.37.139:/rhs/bricks/quota-d2r2               49172   Y       17100
Brick 10.70.37.180:/rhs/bricks/quota-d3r1               49173   Y       23314
Brick 10.70.37.139:/rhs/bricks/quota-d4r2               49173   Y       17111
Brick 10.70.37.180:/rhs/bricks/quota-d5r1               49174   Y       23325
Brick 10.70.37.139:/rhs/bricks/quota-d6r2               49174   Y       17122
NFS Server on localhost                                 2049    Y       25673
Self-heal Daemon on localhost                           N/A     Y       25680
NFS Server on 10.70.37.139                              2049    Y       18714
Self-heal Daemon on 10.70.37.139                        N/A     Y       18721

           Task                                      ID        Status
           ----                                      --        ------
      Rebalance    9e281276-6e32-43d6-8028-d06c80dc3b18             3

Backtrace from the core:

(gdb) bt
#0  0x000000396ba328a5 in raise () from /lib64/libc.so.6
#1  0x000000396ba34085 in abort () from /lib64/libc.so.6
#2  0x000000396ba707b7 in __libc_message () from /lib64/libc.so.6
#3  0x000000396ba760e6 in malloc_printerr () from /lib64/libc.so.6
#4  0x000000348f415715 in data_destroy (data=0x7f041c24f200) at dict.c:147
#5  0x000000348f416309 in _dict_set (this=<value optimized out>, key=0x7f041a21b8ff "features.limit-usage", value=0x7f041c25088c, replace=_gf_true) at dict.c:262
#6  0x000000348f41654a in dict_set (this=0x7f041c431144, key=0x7f041a21b8ff "features.limit-usage", value=0x7f041c25088c) at dict.c:334
#7  0x00007f041a1f9ff7 in glusterd_quota_limit_usage (volinfo=0x19ad930, dict=0x7f041c4327b0, op_errstr=0x1c4f538) at glusterd-quota.c:717
#8  0x00007f041a1faf78 in glusterd_op_quota (dict=0x7f041c4327b0, op_errstr=0x1c4f538, rsp_dict=0x7f041c432468) at glusterd-quota.c:1019
#9  0x00007f041a1c6046 in glusterd_op_commit_perform (op=GD_OP_QUOTA, dict=0x7f041c4327b0, op_errstr=0x1c4f538, rsp_dict=0x7f041c432468) at glusterd-op-sm.c:3899
#10 0x00007f041a1c7843 in glusterd_op_ac_commit_op (event=<value optimized out>, ctx=0x7f0410000c70) at glusterd-op-sm.c:3645
#11 0x00007f041a1c3281 in glusterd_op_sm () at glusterd-op-sm.c:5309
#12 0x00007f041a1b137d in __glusterd_handle_commit_op (req=0x7f041a12602c) at glusterd-handler.c:750
#13 0x00007f041a1ae53f in glusterd_big_locked_handler (req=0x7f041a12602c, actor_fn=0x7f041a1b1280 <__glusterd_handle_commit_op>) at glusterd-handler.c:75
#14 0x000000348f447292 in synctask_wrap (old_task=<value optimized out>) at syncop.c:131
#15 0x000000396ba43b70 in ?? () from /lib64/libc.so.6
#16 0x0000000000000000 in ?? ()
(gdb)

Expected results:
The glusterd process should not crash.

Additional info:
Script used for creating data:

#!/bin/bash
set -x

create-data() {
    for i in `seq 1 10`
    do
        while [ 1 ]
        do
            cmd=`dd if=/dev/urandom of=dir$i/$(date +%s) bs=1024 count=1024 2>&1`
            echo $cmd
            if [ "$(echo $cmd | awk '/Disk quota exceeded/')" ]
            then
                echo "quota limit reached"
                break
            fi
        done
    done
    return 1
}

create-data
Looking at the backtrace, it seems to me that the cause of this crash is the same as the cause of the crash in https://bugzilla.redhat.com/show_bug.cgi?id=983544.

CAUSE:
In the earlier code (in glusterd_quota_limit_usage()), the pointer @quota_limits pointed to the same memory as the 'value' stored against the key 'features.limit-usage' in volinfo->dict. At some point we GF_FREE quota_limits, which frees the 'value' in volinfo->dict as well and leaves it a dangling pointer. Some time later, the same function calls dict_set_str() on the key 'features.limit-usage'; before installing the new value, it tries to GF_FREE the object that 'value' still points to, and the process crashes. In short, this bug is a crash due to a double free.

The fix for bug 983544 is available in glusterfs-3.4.0.12rhs.beta5. Could you please check whether this bug is still reproducible on the latest version, i.e. glusterfs-3.4.0.12rhs.beta5?
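For illustration, here is a minimal, self-contained C sketch of the same lifetime bug. This is not the actual glusterd code; the struct and helper names (entry, entry_set) are hypothetical. A local pointer aliases the string owned by a dictionary entry, the buffer is freed through the alias, and a later "set" on the same key frees the already-freed string a second time -- the double free that glibc aborts on in frames #0-#4 of the backtrace above.

#include <stdlib.h>
#include <string.h>

/* Hypothetical one-entry "dict": the entry owns its value string. */
struct entry {
        char *value;
};

/* Replacing a value frees the old one first, the way _dict_set()
 * destroys the previous data before storing the new one. */
static void entry_set(struct entry *e, const char *new_value)
{
        free(e->value);                 /* second free happens here */
        e->value = strdup(new_value);
}

int main(void)
{
        struct entry limits = { .value = strdup("/dir1:1GB") };

        /* Local alias to the memory owned by the entry, the way
         * @quota_limits aliased the 'features.limit-usage' value
         * in volinfo->dict. */
        char *quota_limits = limits.value;

        /* The limit string is used and then freed through the alias;
         * limits.value is now a dangling pointer. */
        free(quota_limits);

        /* Later the same key is updated: entry_set() frees the stale
         * limits.value again -> double free, glibc aborts. */
        entry_set(&limits, "/dir1:2GB");

        free(limits.value);
        return 0;
}

Whatever shape the actual patch for bug 983544 takes, the general remedy for this pattern is either to duplicate the value into the local pointer so the dict keeps sole ownership of its copy, or to never free the buffer through the alias while the dict entry is still live.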
As per the root cause analysis in comment #4, the bug was fixed as part of the build glusterfs-3.4.0.12rhs.beta5. This holds for the new design as well. Hence, moving the bug to ON_QA.