Description of problem:
---------------------------
While running rebalance on a Distribute-Replicate volume, the rebalance process crashed.

Version-Release number of selected component (if applicable):
------------------------------------------------------------------
3.4.0.5rhs-1.el6rhs.x86_64

How reproducible:
--------------------

Steps to Reproduce:
-------------------
- One of the rebalance processes had failed earlier due to: Request received from non-privileged port. Failing request

[2013-05-10 07:34:23.057388] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request
[2013-05-10 07:34:23.073047] E [rpcsvc.c:519:rpcsvc_handle_rpc_call] 0-glusterd: Request received from non-privileged port. Failing request

- Hence enabled requests from insecure ports in /etc/glusterfs/glusterd.vol:

option rpc-auth-allow-insecure on

- Restarted glusterd

1. Created a 2x2 distributed volume and started it
2. Mounted the volume and created some files
3. Add brick and started rebalance
4. Check Rebalance status

gluster v rebalance distribute-replicate status
Node          Rebalanced-files   size     scanned   failures   status   run time in secs
------        ---------------    -----    ------    ---------  ------   ----------------
localhost     0                  0Bytes   0         0          failed   0.00
localhost     0                  0Bytes   0         0          failed   0.00
localhost     0                  0Bytes   0         0          failed   0.00
10.70.34.86   0                  0Bytes   0         0          failed   0.00

-------------------------------------------------------------------------
patchset: git://git.gluster.com/glusterfs.git
signal received: 11
time of crash: 2013-05-10 07:44:01
configuration details:
argp 1
backtrace 1
dlfcn 1
fdatasync 1
libpthread 1
llistxattr 1
setfsid 1
spinlock 1
epoll.h 1
xattr.h 1
st_atim.tv_nsec 1
package-string: glusterfs 3.4.0.5rhs
/lib64/libc.so.6[0x38ce432920]
/lib64/libc.so.6[0x38ce4b2483]
/lib64/libc.so.6(fnmatch+0x73)[0x38ce4b68e3]
/usr/lib64/libglusterfs.so.0(+0x52f71)[0x7fddf0a8bf71]
/usr/lib64/libglusterfs.so.0(+0x560d3)[0x7fddf0a8f0d3]
/usr/lib64/libglusterfs.so.0(dict_foreach+0x45)[0x7fddf0a4c565]
/usr/lib64/libglusterfs.so.0(xlator_options_validate_list+0x2f)[0x7fddf0a8ed5f]
/usr/lib64/libglusterfs.so.0(xlator_options_validate+0x39)[0x7fddf0a8edc9]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_validate_options+0x2f)[0x7fddf0a7bf6f]
/usr/lib64/libglusterfs.so.0(glusterfs_graph_activate+0x1e)[0x7fddf0a7bffe]
/usr/sbin/glusterfs(glusterfs_process_volfp+0xeb)[0x404ffb]
/usr/sbin/glusterfs(mgmt_getspec_cbk+0x2eb)[0x40bbdb]
/usr/lib64/libgfrpc.so.0(rpc_clnt_handle_reply+0xa5)[0x7fddf08323d5]
/usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x127)[0x7fddf0832fb7]

(gdb) bt
#0  0x00000038ce4b2483 in internal_fnmatch () from /lib64/libc.so.6
#1  0x00000038ce4b68e3 in fnmatch@@GLIBC_2.2.5 () from /lib64/libc.so.6
#2  0x00007fddf0a8bf71 in xlator_volume_option_get_list (vol_list=<value optimized out>, key=0x1953d00 "data-self-heal") at options.c:786
#3  0x00007fddf0a8f0d3 in xl_opt_validate (dict=0x7fddef253a04, key=0x1953d00 "data-self-heal", value=0x7fddef071abc, data=0x7fff2242b0a0) at options.c:832
#4  0x00007fddf0a4c565 in dict_foreach (dict=0x7fddef253a04, fn=0x7fddf0a8f090 <xl_opt_validate>, data=0x7fff2242b0a0) at dict.c:1109
#5  0x00007fddf0a8ed5f in xlator_options_validate_list (xl=<value optimized out>, options=<value optimized out>, vol_opt=<value optimized out>, op_errstr=0x7fff2242b118) at options.c:871
#6  0x00007fddf0a8edc9 in xlator_options_validate (xl=0x1965700, options=0x7fddef253a04, op_errstr=0x7fff2242b118) at options.c:899
#7  0x00007fddf0a7bf6f in glusterfs_graph_validate_options (graph=<value optimized out>) at graph.c:267
#8  0x00007fddf0a7bffe in glusterfs_graph_activate (graph=0x1953ae0, ctx=0x1917010) at graph.c:470
#9  0x0000000000404ffb in glusterfs_process_volfp (ctx=0x1917010, fp=0x19538a0) at glusterfsd.c:1802
#10 0x000000000040bbdb in mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7fddef68c7a4) at glusterfsd-mgmt.c:1583
#11 0x00007fddf08323d5 in rpc_clnt_handle_reply (clnt=0x1949c80, pollin=0x1952b10) at rpc-clnt.c:771
#12 0x00007fddf0832fb7 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x1949cb0, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:890
#13 0x00007fddf082e8e8 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:497
#14 0x00007fdded0653a4 in socket_event_poll_in (this=0x194e830) at socket.c:2119
#15 0x00007fdded0654fd in socket_event_handler (fd=<value optimized out>, idx=<value optimized out>, data=0x194e830, poll_in=1, poll_out=0, poll_err=0) at socket.c:2231
#16 0x00007fddf0a939b7 in event_dispatch_epoll_handler (event_pool=0x1932eb0) at event-epoll.c:384
#17 event_dispatch_epoll (event_pool=0x1932eb0) at event-epoll.c:445
#18 0x0000000000406776 in main (argc=31, argv=0x7fff2242c8f8) at glusterfsd.c:1943

Actual results:

Expected results:

Additional info:

gluster v info distribute-replicate

Volume Name: distribute-replicate
Type: Distributed-Replicate
Volume ID: 8cdd9b2b-7e92-4311-8f44-1615e21cf010
Status: Started
Number of Bricks: 3 x 2 = 6
Transport-type: tcp
Bricks:
Brick1: 10.70.34.85:/rhs/brick1/h1
Brick2: 10.70.34.86:/rhs/brick1/h2
Brick3: 10.70.34.105:/rhs/brick1/h3
Brick4: 10.70.34.85:/rhs/brick1/h4
Brick5: 10.70.34.105:/rhs/brick1/h5
Brick6: 10.70.34.85:/rhs/brick1/h6
sos reports at : http://rhsqe-repo.lab.eng.blr.redhat.com/bugs_necessary_info/961682/
Looks like a corruption in volume option list.

In xlator_volume_option_get_list:

cmp_key = opt[index].key[i];

but opt[4].key[1] is corrupted.

(gdb) p stub->vol_opt->given_opt[4]
$19 = {key = {0x7f29eb5d329b "system.posix_acl_access",
    0x100000000 <Address 0x100000000 out of bounds>,
    0x7f29eb5d32b3 "system.posix_acl_default",
    0x100000000 <Address 0x100000000 out of bounds>},
  type = 3948753612, min = 2.1219957909652723e-314, max = 6.907927992680449e-310,
  value = {
    0x100000000 <Address 0x100000000 out of bounds>,
    0x7f29eb5d32f1 "gfid-req",
    0x100000000 <Address 0x100000000 out of bounds>,
    0x0, 0x0, 0x0, 0x0,
    0xf5b89c3d00000067 <Address 0xf5b89c3d00000067 out of bounds>,
    0x7472747368732e00 <Address 0x7472747368732e00 out of bounds>,
    0x65746f6e2e006261 <Address 0x65746f6e2e006261 out of bounds>,
    0x6975622e756e672e <Address 0x6975622e756e672e out of bounds>,
    0x672e0064692d646c <Address 0x672e0064692d646c out of bounds>,
    0x687361682e756e <Address 0x687361682e756e out of bounds>,
    0x6d79736e79642e <Address 0x6d79736e79642e out of bounds>,
    0x7274736e79642e <Address 0x7274736e79642e out of bounds>,
    0x7265762e756e672e <Address 0x7265762e756e672e out of bounds>,
    0x6e672e006e6f6973 <Address 0x6e672e006e6f6973 out of bounds>,
    0x6f69737265762e75 <Address 0x6f69737265762e75 out of bounds>,
    0x6c65722e00725f6e <Address 0x6c65722e00725f6e out of bounds>,
    0x722e006e79642e61 <Address 0x722e006e79642e61 out of bounds>,
    0x746c702e616c65 <Address 0x746c702e616c65 out of bounds>,
    0x742e0074696e692e <Address 0x742e0074696e692e out of bounds>,
    0x6e69662e00747865 <Address 0x6e69662e00747865 out of bounds>,
    0x7461646f722e0069 <Address 0x7461646f722e0069 out of bounds>,
    0x72665f68652e0061 <Address 0x72665f68652e0061 out of bounds>,
    0x7264685f656d61 <Address 0x7264685f656d61 out of bounds>,
    0x6d6172665f68652e <Address 0x6d6172665f68652e out of bounds>,
    0x73726f74632e0065 <Address 0x73726f74632e0065 out of bounds>,
    0x73726f74642e00 <Address 0x73726f74642e00 out of bounds>,
    0x61642e0072636a2e <Address 0x61642e0072636a2e out of bounds>,
    0x722e6c65722e6174 <Address 0x722e6c65722e6174 out of bounds>,
    0x6d616e79642e006f <Address 0x6d616e79642e006f out of bounds>,
    0x746f672e006369 <Address 0x746f672e006369 out of bounds>,
    0x746c702e746f672e <Address 0x746c702e746f672e out of bounds>,
    0x2e00617461642e00 <Address 0x2e00617461642e00 out of bounds>,
    0x756e672e00737362 <Address 0x756e672e00737362 out of bounds>,
    0x696c67756265645f <Address 0x696c67756265645f out of bounds>,
    0x6b6e <Address 0x6b6e out of bounds>,
    0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0,
    0x70000000b <Address 0x70000000b out of bounds>,
    0x2 <Address 0x2 out of bounds>,
    0x190 <Address 0x190 out of bounds>,
    0x190 <Address 0x190 out of bounds>,
    0x24 <Address 0x24 out of bounds>,
    0x0,
    0x4 <Address 0x4 out of bounds>,
    0x0,
    0x6ffffff60000001e <Address 0x6ffffff60000001e out of bounds>,
    0x2 <Address 0x2 out of bounds>,
    0x1b8 <Address 0x1b8 out of bounds>,
    0x1b8 <Address 0x1b8 out of bounds>,
    0x2a0 <Address 0x2a0 out of bounds>,
    0x3 <Address 0x3 out of bounds>,
    0x8 <Address 0x8 out of bounds>,
    0x0,
    0xb00000028 <Address 0xb00000028 out of bounds>,
    0x2 <Address 0x2 out of bounds>},
  default_value = 0x458 <Address 0x458 out of bounds>,
  description = 0x458 <Address 0x458 out of bounds>,
  validate = 2928}

(gdb) info reg
rax            0x0                 0
rbx            0x7fff4bcad330      140734464971568
rcx            0x0                 0
rdx            0x0                 0
rsi            0x7fff4bcac970      140734464969072
rdi            0x100000000         4294967296
rbp            0x7f29f38bfa04      0x7f29f38bfa04
rsp            0x7fff4bcad2a0      0x7fff4bcad2a0
r8             0x2                 2
r9             0x0                 0
r10            0x6ea81e            7251998
r11            0x7f29f50fad30      139818181831984
r12            0x6ea810            7251984
r13            0x7f29f36ddabc      139818154449596
r14            0x6fc230            7324208
r15            0x7f29f38bfa04      139818156423684
rip            0x7f29f50fb0d3      0x7f29f50fb0d3 <xl_opt_validate+67>
eflags         0x246               [ PF ZF IF ]
cs             0x33                51
ss             0x2b                43
ds             0x0                 0
es             0x0                 0
fs             0x0                 0
gs             0x0                 0

Other options:

(gdb) p stub->vol_opt->given_opt[1]
$22 = {key = {0x7f29eb5d3082 "cache-posix-acl", 0x0, 0x0, 0x0}, type = GF_OPTION_TYPE_BOOL, min = 0, max = 0,
  value = {0x0 <repeats 64 times>}, default_value = 0x7f29eb5d328e "false", description = 0x0,
  validate = GF_OPT_VALIDATE_BOTH}

(gdb) p stub->vol_opt->given_opt[2]
$23 = {key = {0x7f29eb5d3059 "md-cache-timeout", 0x0, 0x0, 0x0}, type = GF_OPTION_TYPE_INT, min = 0, max = 60,
  value = {0x0 <repeats 64 times>}, default_value = 0x7f29eb5d3294 "1",
  description = 0x7f29eb5d3648 "Time period after which cache has to be refreshed",
  validate = GF_OPT_VALIDATE_BOTH}

(gdb) p stub->vol_opt->given_opt[3]
$24 = {key = {0x7f29eb5d30a4 "force-readdirp", 0x0, 0x0, 0x0}, type = GF_OPTION_TYPE_BOOL, min = 0, max = 0,
  value = {0x0 <repeats 64 times>}, default_value = 0x7f29eb5d3296 "true",
  description = 0x7f29eb5d3680 "Convert all readdir requests to readdirplus to collect stat info on each entry.",
  validate = GF_OPT_VALIDATE_BOTH}
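Several of the out-of-bounds words printed in given_opt[4].value look like packed ASCII rather than pointers. A minimal sketch for checking that by hand is below; it assumes a little-endian x86_64 dump (as here), and word_to_ascii / words_to_ascii are hypothetical helper names, not part of glusterfs or gdb.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Write the 8 bytes of `word` into out[0..7], least significant byte
 * first (the in-memory order on x86_64), then NUL-terminate at out[8].
 * Bytes outside printable ASCII are shown as '.'. */
static void word_to_ascii(uint64_t word, char *out)
{
    for (int i = 0; i < 8; i++) {
        unsigned char c = (unsigned char)((word >> (8 * i)) & 0xff);
        out[i] = (c >= 0x20 && c < 0x7f) ? (char)c : '.';
    }
    out[8] = '\0';
}

/* Decode `n` consecutive words into one string.
 * `out` must have room for 8 * n + 1 bytes. */
static void words_to_ascii(const uint64_t *words, size_t n, char *out)
{
    assert(words != NULL && out != NULL);
    out[0] = '\0';                      /* keeps the n == 0 case terminated */
    for (size_t i = 0; i < n; i++)
        word_to_ascii(words[i], out + 8 * i);
}
```

Feeding it the four consecutive words starting at 0x7472747368732e00 from the dump yields text containing "shstrtab" and "note.gnu.build-id", with '.' standing in for the NUL separators. That suggests the slot holds string data from elsewhere in memory rather than option values; the decode alone does not say how it got there.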
(gdb) p stub->vol_opt->given_opt[0]
$27 = {key = {0x7f29eb5d306a "cache-selinux", 0x0, 0x0, 0x0}, type = GF_OPTION_TYPE_BOOL, min = 0, max = 0,
  value = {0x0 <repeats 64 times>}, default_value = 0x7f29eb5d328e "false", description = 0x0,
  validate = GF_OPT_VALIDATE_BOTH}
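The lookup named in the analysis (cmp_key = opt[index].key[i], then fnmatch) can be sketched as a walk over a table that ends with an all-NULL entry. This is a simplified illustration, not the code in options.c: struct opt_entry, MAX_KEYS, opt_lookup and example_opts are hypothetical names, and the FNM_NOESCAPE flag is an assumption. The sample entries reuse the keys and defaults shown for given_opt[0] to given_opt[3] above.

```c
#include <assert.h>
#include <fnmatch.h>
#include <stddef.h>

#define MAX_KEYS 4   /* assumed: the gdb dumps show four key slots per option */

/* Simplified stand-in for one entry of an xlator option table. */
struct opt_entry {
    const char *key[MAX_KEYS];    /* alternative names or patterns */
    const char *default_value;
};

/* Walk a table terminated by an entry whose key[0] is NULL and return
 * the first entry with a key pattern matching `key`, or NULL if none. */
static const struct opt_entry *
opt_lookup(const struct opt_entry *table, const char *key)
{
    assert(table != NULL && key != NULL);
    for (size_t index = 0; table[index].key[0] != NULL; index++) {
        for (size_t i = 0; i < MAX_KEYS; i++) {
            const char *cmp_key = table[index].key[i];
            if (cmp_key == NULL)
                break;            /* no more aliases for this option */
            /* A non-NULL but invalid cmp_key is dereferenced here. */
            if (fnmatch(cmp_key, key, FNM_NOESCAPE) == 0)
                return &table[index];
        }
    }
    return NULL;
}

/* Entries modelled on given_opt[0..3]; the last entry is the terminator. */
static const struct opt_entry example_opts[] = {
    { .key = { "cache-selinux" },    .default_value = "false" },
    { .key = { "cache-posix-acl" },  .default_value = "false" },
    { .key = { "md-cache-timeout" }, .default_value = "1" },
    { .key = { "force-readdirp" },   .default_value = "true" },
    { .key = { NULL } },
};
```

With this table, opt_lookup(example_opts, "md-cache-timeout") returns the third entry, and a key that matches nothing, such as "data-self-heal", returns NULL only because the walk stops at the all-NULL entry. If that entry were missing or overwritten, the loop would keep reading whatever memory follows the array and hand it to fnmatch, which is one way a slot like given_opt[4] could be reached; the dumps here do not by themselves confirm that this is what happened.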
Version:
=======
3.4.0.7rhs-1.el6rhs.x86_64

Could not reproduce the issue. Marking it as Verified.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. http://rhn.redhat.com/errata/RHBA-2013-1262.html