Created attachment 997003 [details]
corefile of the crashed machine

Description of problem:
======================
glusterd crashed after a peer probe.

Steps that I followed (sketched as commands at the end of this comment):
1. Removed all the existing gluster packages using yum.
2. Installed with "yum localinstall" the packages downloaded from:
   http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/epel-6-x86_64/glusterfs-3.7dev-0.577.gitf18a3f3.autobuild/
3. Tried to create a disperse 1x(8+4) volume; it failed with an error message that the host is not connected.
4. Peer probed the partner node and checked peer status.
5. Started glusterd and checked peer status again.
6. Created the volume with the force option; this succeeded.

Version-Release number of selected component (if applicable):
=============================================================
[root@vertigo /]# gluster --version
glusterfs 3.7dev built on Mar 1 2015 01:03:38
Repository revision: git://git.gluster.com/glusterfs.git
Copyright (c) 2006-2011 Gluster Inc. <http://www.gluster.com>
GlusterFS comes with ABSOLUTELY NO WARRANTY.
You may redistribute copies of GlusterFS under the terms of the GNU General Public License.
[root@vertigo /]#

How reproducible:
==================
Tried only once.

Steps to Reproduce:
===================
As mentioned in the description.

Actual results:
===============
glusterd crashed.

Expected results:
=================
No crash should be seen.

Additional info:
================
Attaching the core file and sosreports of both nodes.
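For reference, a rough command sequence for steps 3-6 above; the volume name, host names (node1/node2) and brick paths are made up for illustration, since the exact bricks used are not recorded in this report:

    gluster volume create testvol disperse 12 redundancy 4 \
            node1:/bricks/b1 node1:/bricks/b2 ... node2:/bricks/b12    # step 3: failed, host not connected
    gluster peer probe node2                                           # step 4
    gluster peer status
    service glusterd start                                             # step 5
    gluster peer status
    gluster volume create testvol disperse 12 redundancy 4 \
            node1:/bricks/b1 ... node2:/bricks/b12 force               # step 6: succeeded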
Created attachment 997004 [details] sosreport of Node1
Created attachment 997005 [details] sosreport of Node2
The RPM link above is from 8 Feb 2015, but the glusterd version shows a build date of 01 Mar 2015:
============================================
[root@vertigo /]# gluster --version
glusterfs 3.7dev built on Mar 1 2015 01:03:38

The RPMs for 01-Mar-2015 are at:
http://download.gluster.org/pub/gluster/glusterfs/nightly/glusterfs/epel-6-x86_64/glusterfs-3.7dev-0.627.git32dd227.autobuild/

Backtrace from the core dump:

Loaded symbols for /lib64/libnss_dns-2.12.so
Core was generated by `/usr/sbin/glusterd --pid-file=/var/run/glusterd.pid'.
Program terminated with signal 7, Bus error.
#0  __gf_free (free_ptr=0x7fb4c8000d90) at mem-pool.c:261
261             if (!xl->mem_acct.rec) {
(gdb) bt
#0  __gf_free (free_ptr=0x7fb4c8000d90) at mem-pool.c:261
#1  0x00007fb4e31553a5 in data_destroy (data=0x7fb4e180fbe0) at dict.c:148
#2  0x00007fb4e315561e in dict_get_str (this=<value optimized out>, key=<value optimized out>, str=0x7fb4c4203178) at dict.c:2097
#3  0x00007fb4d8e7465a in glusterd_xfer_cli_probe_resp (req=0x19fe35c, op_ret=-1, op_errno=0, op_errstr=0x0, hostname=0x7fb4c8000d50 "", port=24007, dict=0x7fb4e19f1538) at glusterd-handler.c:3455
#4  0x00007fb4d8e75442 in __glusterd_handle_cli_probe (req=0x19fe35c) at glusterd-handler.c:1056
#5  0x00007fb4d8e6064f in glusterd_big_locked_handler (req=0x19fe35c, actor_fn=0x7fb4d8e75090 <__glusterd_handle_cli_probe>) at glusterd-handler.c:82
#6  0x00007fb4e3199502 in synctask_wrap (old_task=<value optimized out>) at syncop.c:375
#7  0x0000003a38c438f0 in ?? () from /lib64/libc-2.12.so
#8  0x0000000000000000 in ?? ()
(gdb) list
256             memcpy (&xl, ptr, sizeof(xlator_t *));
257
258             //gf_free expects xl to be available
259             GF_ASSERT (xl != NULL);
260
261             if (!xl->mem_acct.rec) {
262                     ptr = (char *)free_ptr - GF_MEM_HEADER_SIZE;
263                     goto free;
264             }
265

I am not able to reproduce this bug with the steps mentioned above. Could you explain when it happened, or try to reproduce it? Based on the core and the sosreports, glusterd was killed by a SIGBUS while accessing the dictionary. I am not able to figure out the root cause, but it appears to be a memory alignment issue and not related to peer probe. As discussed with Bhaskar, it happened once while he was working on an NFS-related bug (https://bugzilla.redhat.com/show_bug.cgi?id=1196546). Need more info to analyse this bug.
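For context on frame #0: __gf_free expects every pointer handed to it to have come from GlusterFS's accounting allocators, which stash a hidden header (including the owning xlator_t pointer) just before the user pointer. Below is a minimal sketch of that idea; GF_MEM_HEADER_SIZE, the structs, and the header layout here are simplified stand-ins, not the real mem-pool.c definitions. It illustrates why a missing or corrupted header makes the xl->mem_acct.rec load at mem-pool.c:261 fault:

    #include <stdlib.h>
    #include <string.h>

    #define GF_MEM_HEADER_SIZE 40            /* illustrative, not the real value */

    struct mem_acct { void *rec; };          /* stand-in for the real structs    */
    typedef struct { struct mem_acct mem_acct; } xlator_t;

    void
    sketch_gf_free (void *free_ptr)
    {
            xlator_t *xl  = NULL;
            char     *ptr = (char *)free_ptr - GF_MEM_HEADER_SIZE;

            /* Equivalent of mem-pool.c:256: pull the owning xlator pointer
             * back out of the hidden header (here assumed at its start). */
            memcpy (&xl, ptr, sizeof (xlator_t *));

            /* Equivalent of mem-pool.c:261: if free_ptr did not come from
             * the accounting allocator, or the header was overwritten, xl
             * is garbage -- this dereference is the load that faulted.   */
            if (!xl->mem_acct.rec)
                    free (ptr);              /* the "goto free" path; the real
                                                code also updates counters  */
    }

Under that model, a fault at this line is consistent with xl being read from a header that was never written or was trashed, which matches the alignment/corruption theory above rather than anything specific to the peer-probe path.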
Bhaskarakiran,

From the analysis it seems the executables and the RPMs referred to are not the same. We wouldn't be able to analyse this in the current state. If this doesn't get reproduced, could you please close this bug as an incorrect setup?

Thanks,
Atin
Atin,

I had pointed to the wrong nightly builds by mistake; the correct ones are mentioned by Anand. I will try to reproduce it; if I can't, the bug can be closed as not reproducible.

Thanks,
Bhaskarakiran.
Closing this bug as it's not reproducible. Kindly re-open if you hit the same problem again.