Bug 764281 (GLUSTER-2549)

Summary: Quota[glusterfs-3.2.1qa3]: enable/disable crashes the glusterd on other node
Product: [Community] GlusterFS
Component: quota
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: high
Priority: high
Hardware: x86_64
OS: Linux
Reporter: Saurabh <saurabh>
Assignee: Pranith Kumar K <pkarampu>
CC: gluster-bugs, pkarampu
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Attachments: test cases I used for unit-testing (no flags)

Description Saurabh 2011-03-17 13:09:46 UTC
I have two servers with a distribute volume across them.
Enabling quota works, but disabling it kills the glusterd process on the other node, and the quota disable command itself fails.
If I then manually kill all processes and restart glusterd on both nodes, disable works.

But enabling quota again after that kills glusterd on the other node once more.

I have tried the same steps from both node A and node B.

Here is the backtrace from one of the cores:


(gdb) bt
#0  0x00002ab67feba28c in _dict_lookup (this=0x1aed50b0, key=0x2aaaaab0b58d "errstr") at dict.c:220
#1  0x00002ab67febb4ad in _dict_set (this=0x1aed50b0, key=<value optimized out>, value=0x1aed3810) at dict.c:251
#2  dict_set (this=0x1aed50b0, key=<value optimized out>, value=0x1aed3810) at dict.c:315
#3  0x00002aaaaaae37eb in glusterd_op_quota (dict=0x1aed38c0, op_errstr=0x7fff9183f4f0) at glusterd-op-sm.c:4645
#4  0x00002aaaaaaed9b3 in glusterd_op_stage_validate (op=<value optimized out>, dict=0x1aed38c0, op_errstr=0x7fff9183f4f0, rsp_dict=0x2aaaaab0b591)
    at glusterd-op-sm.c:6675
#5  0x00002aaaaaaeea0c in glusterd_op_ac_stage_op (event=<value optimized out>, ctx=0x1aed51c0) at glusterd-op-sm.c:6504
#6  0x00002aaaaaadb39f in glusterd_op_sm () at glusterd-op-sm.c:7557
#7  0x00002aaaaaac73d7 in glusterd_handle_stage_op (req=<value optimized out>) at glusterd-handler.c:565
#8  0x00002ab680119a7e in rpcsvc_handle_rpc_call (svc=0x1aec7020, trans=<value optimized out>, msg=0x1aed7640) at rpcsvc.c:1003
#9  0x00002ab680119c7c in rpcsvc_notify (trans=0x1aed1520, mydata=0x2aaaaab0b591, event=<value optimized out>, data=0x1aed7640) at rpcsvc.c:1099
#10 0x00002ab68011ab9c in rpc_transport_notify (this=0x2aaaaab0b591, event=RPC_TRANSPORT_DISCONNECT, data=0x2aaaaab0b591) at rpc-transport.c:1029
#11 0x00002aaaaadd743f in socket_event_poll_in (this=0x1aed1520) at socket.c:1639
#12 0x00002aaaaadd75c8 in socket_event_handler (fd=<value optimized out>, idx=1, data=0x1aed1520, poll_in=1, poll_out=0, poll_err=0) at socket.c:1753
#13 0x00002ab67fee1717 in event_dispatch_epoll_handler (event_pool=0x1aec5360) at event.c:812
#14 event_dispatch_epoll (event_pool=0x1aec5360) at event.c:876
#15 0x0000000000404fdb in main (argc=1, argv=0x7fff9183fdd8) at glusterfsd.c:1458
(gdb) frame 3
#3  0x00002aaaaaae37eb in glusterd_op_quota (dict=0x1aed38c0, op_errstr=0x7fff9183f4f0) at glusterd-op-sm.c:4645
4645	                ret = dict_set_str (ctx, "errstr", *op_errstr);
(gdb) info threads
  4 Thread 4770  0x0000003a8ae0e838 in do_sigwait () from /lib64/libpthread.so.0
  3 Thread 4818  0x0000003a8a69a1a1 in nanosleep () from /lib64/libc.so.6
  2 Thread 4862  0x0000003a8a699daf in waitpid () from /lib64/libc.so.6
* 1 Thread 4769  0x00002ab67feba28c in _dict_lookup (this=0x1aed50b0, key=0x2aaaaab0b58d "errstr") at dict.c:220
(gdb) p *this
No symbol "this" in current context.
(gdb) down
#2  dict_set (this=0x1aed50b0, key=<value optimized out>, value=0x1aed3810) at dict.c:315
315		ret = _dict_set (this, key, value);
(gdb) p *this
$1 = {is_static = 0 '\000', hash_size = 0, count = -2141910640, refcount = 10934, members = 0x3a8a41c4e8, members_list = 0x0, extra_free = 0x1b18fe80 "", 
  extra_stdfree = 0x2ab680550990 "", lock = -1975401240}
(gdb) p *this->members
$2 = (data_pair_t *) 0x0
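
For context, the dict header printed above is clearly trashed (hash_size = 0, negative count, garbage members pointer), and that is enough to explain the two different signals in the cores: a hash lookup takes the key hash modulo hash_size and then dereferences the members bucket array, so hash_size == 0 gives an integer division by zero (signal 8, first core) and a junk members pointer gives a segfault (signal 11, second core). The sketch below is purely illustrative, assuming simplified stand-in types; mini_dict, toy_hash and looks_corrupted are hypothetical names, not glusterfs code or API.

/* Illustrative only: simplified stand-ins for the dict fields visible in the
 * gdb dump above; this is NOT the libglusterfs dict implementation. */
#include <stdint.h>
#include <stdio.h>

struct pair { struct pair *hash_next; const char *key; };

struct mini_dict {
        int32_t       hash_size;   /* 0 in the first core             */
        int32_t       count;       /* -2141910640 in the first core   */
        struct pair **members;     /* garbage pointer in both cores   */
};

/* Roughly what a hash-table lookup does with such a header:
 *   bucket = hash(key) % d->hash_size;      <- SIGFPE (signal 8) if hash_size == 0
 *   for (p = d->members[bucket]; p; ...)    <- SIGSEGV (signal 11) if members is junk
 * which matches the first and second backtraces respectively. */
static unsigned toy_hash (const char *key)
{
        unsigned h = 5381;
        while (*key)
                h = h * 33 + (unsigned char) *key++;
        return h;
}

/* Hypothetical sanity check that would flag the dict shown in "p *this"
 * before any lookup dereferences it. */
static int looks_corrupted (const struct mini_dict *d)
{
        return d == NULL || d->hash_size <= 0 || d->count < 0 ||
               d->members == NULL;
}

int main (void)
{
        /* Field values copied from the gdb dump of the first core. */
        struct mini_dict bad = {
                .hash_size = 0,
                .count     = -2141910640,
                .members   = (struct pair **) (uintptr_t) 0x3a8a41c4e8,
        };

        if (looks_corrupted (&bad)) {
                printf ("dict header is corrupted; looking up \"errstr\" "
                        "(hash %u) would divide by hash_size=0\n",
                        toy_hash ("errstr"));
                return 1;
        }
        return 0;
}

In other words, the crash site in _dict_lookup is only a symptom; the dict passed down to dict_set_str() in glusterd_op_quota() is already freed or overwritten by the time it is used.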


################################# volume log file messages ####################
[2011-03-16 19:45:10.255600] I [glusterd-handler.c:488:glusterd_req_ctx_create] glusterd: Received op from uuid: eb79e865-3435-4fe0-8389-66f819026df0
pending frames:

patchset: v3.2.1qa3
signal received: 8
time of crash: 2011-03-16 19:45:18
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.2.1qa3
/lib64/libc.so.6[0x3a8a6302d0]
/opt/glusterfs/3.2.1/inst//lib/libglusterfs.so.0[0x2ab67feba28c]
/opt/glusterfs/3.2.1/inst//lib/libglusterfs.so.0(dict_set+0x8d)[0x2ab67febb4ad]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/xlator/mgmt/glusterd.so(glusterd_op_quota+0xeb)[0x2aaaaaae37eb]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/xlator/mgmt/glusterd.so(glusterd_op_stage_validate+0x723)[0x2aaaaaaed9b3]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/xlator/mgmt/glusterd.so[0x2aaaaaaeea0c]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/xlator/mgmt/glusterd.so(glusterd_op_sm+0x15f)[0x2aaaaaadb39f]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/xlator/mgmt/glusterd.so(glusterd_handle_stage_op+0xb7)[0x2aaaaaac73d7]
/opt/glusterfs/3.2.1/inst//lib/libgfrpc.so.0(rpcsvc_handle_rpc_call+0x28e)[0x2ab680119a7e]
/opt/glusterfs/3.2.1/inst//lib/libgfrpc.so.0(rpcsvc_notify+0x16c)[0x2ab680119c7c]
/opt/glusterfs/3.2.1/inst//lib/libgfrpc.so.0(rpc_transport_notify+0x2c)[0x2ab68011ab9c]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/rpc-transport/socket.so(socket_event_poll_in+0x3f)[0x2aaaaadd743f]
/opt/glusterfs/3.2.1/inst//lib/glusterfs/3.2.1qa3/rpc-transport/socket.so(socket_event_handler+0x168)[0x2aaaaadd75c8]
/opt/glusterfs/3.2.1/inst//lib/libglusterfs.so.0[0x2ab67fee1717]
/opt/glusterfs/3.2.1/inst/sbin/glusterd(main+0x38b)[0x404fdb]
/lib64/libc.so.6(__libc_start_main+0xf4)[0x3a8a61d994]
/opt/glusterfs/3.2.1/inst/sbin/glusterd[0x403619]
---------

Comment 1 Saurabh 2011-03-18 04:06:34 UTC
I just tried quota again: enabling it worked, but setting the space limit (limit-usage) killed glusterd on the other node.


gluster> volume quota dist1 enable
quota translator is enabled
gluster> volume quotaa dist1 limit-usage /dist1 2GB
unrecognized word: quotaa (position 1)
gluster> volume quota dist1 limit-usage /dist1 2GB
Quota command failed
gluster> peer status
Number of Peers: 1

Hostname: 10.1.12.135
Uuid: eb79e865-3435-4fe0-8389-66f819026df0
State: Peer in Cluster (Disconnected)


####################### from other node #############################

root      8245     1  0 16:22 ?        00:00:01 /opt/glusterfs/3.2.1/inst//sbin/glusterfsd --xlator-option dist1-server.listen-port=24017 -s localhost --volfile-id dist1.10.1.12.135.mnt-dist1 -p /etc/glusterd/vols/dist1/run/10.1.12.135-mnt-dist1.pid -S /tmp/73c5bcf416ce90433cab2ded1614ede3.socket --brick-name /mnt/dist1 --brick-port 24017 -l /opt/glusterfs/3.2.1/inst//var/log/glusterfs/bricks/mnt-dist1.log
root      8285     1  0 16:33 ?        00:00:01 /opt/glusterfs/3.2.1/inst//sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /opt/glusterfs/3.2.1/inst//var/log/glusterfs/nfs.log
root      8373  8143  0 16:59 pts/0    00:00:00 grep glu


################## bt of the core ########################


Core was generated by `/opt/glusterfs/3.2.1/inst//sbin/glusterd'.
Program terminated with signal 11, Segmentation fault.
#0  0x00002ad82e02c299 in _dict_lookup (this=0x1efc7810, key=0x2aaaaab0b58d "errstr") at dict.c:220
220		for (pair = this->members[hashval]; pair != NULL; pair = pair->hash_next) {
(gdb) bt
#0  0x00002ad82e02c299 in _dict_lookup (this=0x1efc7810, key=0x2aaaaab0b58d "errstr") at dict.c:220
#1  0x00002ad82e02d4ad in _dict_set (this=0x1efc7810, key=<value optimized out>, value=0x1efc99f0) at dict.c:251
#2  dict_set (this=0x1efc7810, key=<value optimized out>, value=0x1efc99f0) at dict.c:315
#3  0x00002aaaaaae37eb in glusterd_op_quota (dict=0x1efbdcd0, op_errstr=0x7fff27bcfd20) at glusterd-op-sm.c:4645
#4  0x00002aaaaaaed9b3 in glusterd_op_stage_validate (op=<value optimized out>, dict=0x1efbdcd0, op_errstr=0x7fff27bcfd20, rsp_dict=0x2aaaaab0b591)
    at glusterd-op-sm.c:6675
#5  0x00002aaaaaaeea0c in glusterd_op_ac_stage_op (event=<value optimized out>, ctx=0x1efc7ad0) at glusterd-op-sm.c:6504
#6  0x00002aaaaaadb39f in glusterd_op_sm () at glusterd-op-sm.c:7557
#7  0x00002aaaaaac73d7 in glusterd_handle_stage_op (req=<value optimized out>) at glusterd-handler.c:565
#8  0x00002ad82e28ba7e in rpcsvc_handle_rpc_call (svc=0x1efbe020, trans=<value optimized out>, msg=0x1efc7130) at rpcsvc.c:1003
#9  0x00002ad82e28bc7c in rpcsvc_notify (trans=0x1efc8730, mydata=0x2aaaaab0b591, event=<value optimized out>, data=0x1efc7130) at rpcsvc.c:1099
#10 0x00002ad82e28cb9c in rpc_transport_notify (this=0x2aaaaab0b591, event=RPC_TRANSPORT_DISCONNECT, data=0x2aaaaab0b591) at rpc-transport.c:1029
#11 0x00002aaaaadd743f in socket_event_poll_in (this=0x1efc8730) at socket.c:1639
#12 0x00002aaaaadd75c8 in socket_event_handler (fd=<value optimized out>, idx=3, data=0x1efc8730, poll_in=1, poll_out=0, poll_err=0) at socket.c:1753
#13 0x00002ad82e053717 in event_dispatch_epoll_handler (event_pool=0x1efbc360) at event.c:812
#14 event_dispatch_epoll (event_pool=0x1efbc360) at event.c:876
#15 0x0000000000404fdb in main (argc=1, argv=0x7fff27bd0608) at glusterfsd.c:1458

Comment 2 Vijay Bellur 2011-03-25 08:38:17 UTC
PATCH: http://patches.gluster.com/patch/6588 in master (mgmt/glusterd: Fix import friend volumes)

Comment 3 Pranith Kumar K 2011-03-28 05:56:29 UTC
Created attachment 466


Attached is the test script I used for unit-testing. You will have to start glusterd on both machines under valgrind for it to work. I think the script can be improved and added as a regression test.

Comment 4 Saurabh 2011-04-17 06:51:28 UTC
Test done over a FUSE mount on a distribute-replicate volume:

[root@centos-qa-client-3 gluster-test]# rm -rf *
[root@centos-qa-client-3 gluster-test]# dd if=/dev/zero of=f.1 bs=1KB count=512
512+0 records in
512+0 records out
512000 bytes (512 kB) copied, 0.210623 seconds, 2.4 MB/s
[root@centos-qa-client-3 gluster-test]# dd if=/dev/zero of=f.2 bs=1KB count=512
512+0 records in
512+0 records out
512000 bytes (512 kB) copied, 0.170953 seconds, 3.0 MB/s
[root@centos-qa-client-3 gluster-test]# dd if=/dev/zero of=f.3 bs=1KB count=200
dd: closing output file `f.3': Disk quota exceeded
[root@centos-qa-client-3 gluster-test]# dd if=/dev/zero of=f.4 bs=1KB count=1
1+0 records in
1+0 records out
1000 bytes (1.0 kB) copied, 0.000647 seconds, 1.5 MB/s
[root@centos-qa-client-3 gluster-test]# ls -l
total 1044
-rw-r--r-- 1 root root 512000 Apr 17 02:48 f.1
-rw-r--r-- 1 root root 512000 Apr 17 02:48 f.2
-rw-r--r-- 1 root root  14000 Apr 17 02:48 f.3
-rw-r--r-- 1 root root   1000 Apr 17 02:48 f.4
[root@centos-qa-client-3 gluster-test]# 


########################################################################


[root@centos-qa-client-2 sbin]# ./gluster volume quota dr2 list 
	path		  limit_set	     size
----------------------------------------------------------------------------------
/                       1048576                    0
[root@centos-qa-client-2 sbin]# ./gluster volume quota dr2 list 
	path		  limit_set	     size
----------------------------------------------------------------------------------
/                       1048576              1024000
[root@centos-qa-client-2 sbin]# ./gluster volume quota dr2 list 
	path		  limit_set	     size
----------------------------------------------------------------------------------
/                       1048576              1039000
[root@centos-qa-client-2 sbin]# 


Hence, moving this bug to the verified state.

Comment 5 Saurabh 2011-04-17 06:56:19 UTC
The previous update went to the wrong bug: the one that should have been moved to verified was bug 2741. Since tabs were open for both bugs, this one got moved to the wrong state by mistake.

Comment 6 Saurabh 2011-04-18 15:32:50 UTC
Ran valgrind on the bricks and no leaks were found. Also ran posix tests and an untar of a Linux kernel tarball from the mount point.