Bug 860606

Summary: glusterfsd crashed
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Rahul Hinduja <rhinduja>
Component: glusterd
Assignee: Raghavendra Bhat <rabhat>
Status: CLOSED WORKSFORME
QA Contact: Sachidananda Urs <surs>
Severity: high
Docs Contact:
Priority: high
Version: 2.0
CC: amarts, rhs-bugs, vbellur
Target Milestone: ---
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: glusterfs-3.3.0.5rhs-36, glusterfs-3.4.0qa4
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2012-12-04 09:55:21 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments:
'thread apply all bt full' output

Description Rahul Hinduja 2012-09-26 09:42:30 UTC
Description of problem:

Found a core in the "/var/log/core" directory of one of the machines in a 2*2 distributed-replicate volume.

There were a lot of self-heal operations performed; not sure which operation caused this crash.
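
(For reference, a 2*2 distributed-replicate volume corresponds to four bricks arranged as two replica pairs. A minimal sketch of such a setup; the volume name, host names and brick paths below are hypothetical, purely to illustrate the layout:)

# Hypothetical 2x2 distributed-replicate volume: two replica pairs across four bricks
gluster volume create testvol replica 2 \
    server1:/rhs/brick1 server2:/rhs/brick1 \
    server3:/rhs/brick1 server4:/rhs/brick1
gluster volume start testvol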


Version-Release number of selected component (if applicable):

[root@hicks entries]# gluster --version 
glusterfs 3.3.0rhs built on Sep 10 2012 00:49:11

(glusterfs-rdma-3.3.0rhs-28.el6rhs.x86_64)


Backtrace:
==========

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x00007f8b1ddc38a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64 libgcc-4.4.6-4.el6.x86_64 openssl-1.0.0-25.el6_3.1.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x00007f8b1ddc38a5 in raise () from /lib64/libc.so.6
#1  0x00007f8b1ddc5085 in abort () from /lib64/libc.so.6
#2  0x00007f8b1de00fe7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f8b1de06916 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f8b1de0a394 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f8b1de0b141 in malloc () from /lib64/libc.so.6
#6  0x00007f8b1de92738 in __vasprintf_chk () from /lib64/libc.so.6
#7  0x0000003032e19520 in vasprintf (domain=0x40ea58 "glusterfsd-mgmt", 
    file=0x40e9c6 "glusterfsd-mgmt.c", function=0x40f5f0 "is_graph_topology_equal", line=1437, 
    level=GF_LOG_DEBUG, fmt=0x40ee20 "graphs are not equal") at /usr/include/bits/stdio2.h:199
#8  _gf_log (domain=0x40ea58 "glusterfsd-mgmt", file=0x40e9c6 "glusterfsd-mgmt.c", 
    function=0x40f5f0 "is_graph_topology_equal", line=1437, level=GF_LOG_DEBUG, 
    fmt=0x40ee20 "graphs are not equal") at logging.c:565
#9  0x000000000040cb3d in is_graph_topology_equal (req=<value optimized out>, 
    iov=<value optimized out>, count=<value optimized out>, myframe=0x7f8b1cddb904)
    at glusterfsd-mgmt.c:1436
#10 glusterfs_volfile_reconfigure (req=<value optimized out>, iov=<value optimized out>, 
    count=<value optimized out>, myframe=0x7f8b1cddb904) at glusterfsd-mgmt.c:1487
#11 mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, 
    count=<value optimized out>, myframe=0x7f8b1cddb904) at glusterfsd-mgmt.c:1589
#12 0x000000303360f0c5 in rpc_clnt_handle_reply (clnt=0x2732a10, pollin=0x313e580) at rpc-clnt.c:788
#13 0x000000303360f8c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x2732a40, 
    event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#14 0x000000303360b018 in rpc_transport_notify (this=<value optimized out>, 
    event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#15 0x00007f8b1aa5f954 in socket_event_poll_in (this=0x2737640) at socket.c:1677
#16 0x00007f8b1aa5fa37 in socket_event_handler (fd=<value optimized out>, idx=5, data=0x2737640, 
    poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#17 0x0000003032e3ed84 in event_dispatch_epoll_handler (event_pool=0x2720630) at event.c:785
#18 event_dispatch_epoll (event_pool=0x2720630) at event.c:847
#19 0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fff018d6de8)
    at glusterfsd.c:1689
(gdb) q

Comment 2 Amar Tumballi 2012-10-01 03:20:59 UTC
Looking at the backtrace, I didn't notice anything serious. Was the memory usage very high? Do you still have the core? I'd like the 'thread apply all bt full' command output.
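
(For reference, the requested output can be collected from the core roughly as follows; the core file name is a placeholder, and the debuginfo packages are the ones gdb reported as missing above:)

# Install the missing debuginfo packages reported by gdb, then re-open the core
debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64 libgcc-4.4.6-4.el6.x86_64
gdb /usr/sbin/glusterfs /var/log/core/<core-file>
(gdb) set pagination off
(gdb) thread apply all bt full
(gdb) quit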

Comment 3 Rahul Hinduja 2012-10-01 05:55:19 UTC
I am not sure about the memory usage at the time of the core. I am attaching the output of the 'thread apply all bt full' command.

Let me know in case you want the complete core file.

Comment 4 Rahul Hinduja 2012-10-01 05:56:08 UTC
Created attachment 619621 [details]
'thread apply all bt full' output

Comment 5 Raghavendra Bhat 2012-10-17 09:06:56 UTC
Were any volume set operations running? Was there any NFS mount of the volume? The crashed process seems to be the NFS server.
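
(The backtrace shows the crash inside glusterfs_volfile_reconfigure, which runs when the process is handed a regenerated volfile; a volume set operation is one way to trigger that path. A hypothetical example, with a placeholder volume name and an arbitrary option:)

# Any volume set operation regenerates the volfiles and makes running
# glusterfs processes (including the NFS server) re-fetch and compare graphs
gluster volume set <volname> performance.cache-size 256MB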

Comment 6 Raghavendra Bhat 2012-10-17 09:30:09 UTC
Also, can you attach the logs of the crashed glusterfs process?

Comment 7 Rahul Hinduja 2012-10-19 07:29:17 UTC
The VM has been re-provisioned, hence I cannot provide any further logs.

Comment 8 Raghavendra Bhat 2012-12-04 09:55:21 UTC
With the master branch this has not been seen anymore; similar types of tests have been running in the longevity test-bed for more than 2 weeks and this issue has not been seen. Marking it as WORKSFORME (with Fixed In Version glusterfs-3.3.0.5rhs-36); please feel free to reopen if it is seen again.