Description of problem:

Found a core in the "/var/log/core" directory of one of the machines in a 2x2 distributed-replicate volume. A lot of self-heal operations had been performed, and it is not clear which operation caused this crash.

Version-Release number of selected component (if applicable):

[root@hicks entries]# gluster --version
glusterfs 3.3.0rhs built on Sep 10 2012 00:49:11
(glusterfs-rdma-3.3.0rhs-28.el6rhs.x86_64)

Backtrace:
==========

Core was generated by `/usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/'.
Program terminated with signal 6, Aborted.
#0  0x00007f8b1ddc38a5 in raise () from /lib64/libc.so.6
Missing separate debuginfos, use: debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64 libgcc-4.4.6-4.el6.x86_64 openssl-1.0.0-25.el6_3.1.x86_64 zlib-1.2.3-27.el6.x86_64
(gdb) bt
#0  0x00007f8b1ddc38a5 in raise () from /lib64/libc.so.6
#1  0x00007f8b1ddc5085 in abort () from /lib64/libc.so.6
#2  0x00007f8b1de00fe7 in __libc_message () from /lib64/libc.so.6
#3  0x00007f8b1de06916 in malloc_printerr () from /lib64/libc.so.6
#4  0x00007f8b1de0a394 in _int_malloc () from /lib64/libc.so.6
#5  0x00007f8b1de0b141 in malloc () from /lib64/libc.so.6
#6  0x00007f8b1de92738 in __vasprintf_chk () from /lib64/libc.so.6
#7  0x0000003032e19520 in vasprintf (domain=0x40ea58 "glusterfsd-mgmt", file=0x40e9c6 "glusterfsd-mgmt.c", function=0x40f5f0 "is_graph_topology_equal", line=1437, level=GF_LOG_DEBUG, fmt=0x40ee20 "graphs are not equal") at /usr/include/bits/stdio2.h:199
#8  _gf_log (domain=0x40ea58 "glusterfsd-mgmt", file=0x40e9c6 "glusterfsd-mgmt.c", function=0x40f5f0 "is_graph_topology_equal", line=1437, level=GF_LOG_DEBUG, fmt=0x40ee20 "graphs are not equal") at logging.c:565
#9  0x000000000040cb3d in is_graph_topology_equal (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f8b1cddb904) at glusterfsd-mgmt.c:1436
#10 glusterfs_volfile_reconfigure (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f8b1cddb904) at glusterfsd-mgmt.c:1487
#11 mgmt_getspec_cbk (req=<value optimized out>, iov=<value optimized out>, count=<value optimized out>, myframe=0x7f8b1cddb904) at glusterfsd-mgmt.c:1589
#12 0x000000303360f0c5 in rpc_clnt_handle_reply (clnt=0x2732a10, pollin=0x313e580) at rpc-clnt.c:788
#13 0x000000303360f8c0 in rpc_clnt_notify (trans=<value optimized out>, mydata=0x2732a40, event=<value optimized out>, data=<value optimized out>) at rpc-clnt.c:907
#14 0x000000303360b018 in rpc_transport_notify (this=<value optimized out>, event=<value optimized out>, data=<value optimized out>) at rpc-transport.c:489
#15 0x00007f8b1aa5f954 in socket_event_poll_in (this=0x2737640) at socket.c:1677
#16 0x00007f8b1aa5fa37 in socket_event_handler (fd=<value optimized out>, idx=5, data=0x2737640, poll_in=1, poll_out=0, poll_err=<value optimized out>) at socket.c:1792
#17 0x0000003032e3ed84 in event_dispatch_epoll_handler (event_pool=0x2720630) at event.c:785
#18 event_dispatch_epoll (event_pool=0x2720630) at event.c:847
#19 0x00000000004073ca in main (argc=<value optimized out>, argv=0x7fff018d6de8) at glusterfsd.c:1689
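(Note on reading this backtrace: the abort happens inside glibc's malloc while the logging call in _gf_log() is allocating, which usually means the heap was corrupted earlier by some other code path; the logging frame is merely the first allocation to trip glibc's consistency checks. A minimal, hypothetical C sketch of that failure mode, not glusterfs code:)

/* Deliberately corrupt heap metadata, then allocate again.
 * glibc may then abort via malloc_printerr(), much like the
 * backtrace above where vasprintf()'s malloc() hits the check. */
#include <stdlib.h>
#include <string.h>

int main(void)
{
        char *buf = malloc(16);

        /* Overrun the 16-byte allocation, trampling the metadata
         * of the neighbouring heap chunk (undefined behaviour). */
        memset(buf, 'A', 32);

        /* A later, unrelated allocation, like the one inside
         * _gf_log() in frames #7/#8, can trip the corruption
         * check and kill the process with SIGABRT. */
        char *other = malloc(64);

        free(other);
        free(buf);
        return 0;
}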
Looking at the backtrace, I didn't notice anything serious... Was the memory usage very high at the time? Do you still have the core? I'd like the output of 'thread apply all bt full'...
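(For reference, one way to capture that output from the core; the core-file name under /var/log/core and the log file name "tabf.out" are placeholders, and the debuginfo packages are the ones gdb itself suggested above:)

debuginfo-install glibc-2.12-1.80.el6_3.5.x86_64 libgcc-4.4.6-4.el6.x86_64 openssl-1.0.0-25.el6_3.1.x86_64 zlib-1.2.3-27.el6.x86_64

gdb /usr/sbin/glusterfs /var/log/core/<core-file>
(gdb) set logging file tabf.out
(gdb) set logging on
(gdb) thread apply all bt full
(gdb) set logging off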
I am not sure about the memory usage at the time of the core. I am attaching the output of the 'thread apply all bt full' command. Let me know if you want the complete core file.
Created attachment 619621: 'thread apply all bt full' output
Were any volume set operations running? Was the volume NFS-mounted anywhere? The crashed process appears to be the NFS server.
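(Context for the question: frames #9-#11 of the backtrace, mgmt_getspec_cbk -> glusterfs_volfile_reconfigure -> is_graph_topology_equal, are the path the NFS server takes when glusterd hands it a regenerated volfile, and volfiles are regenerated by volume set operations. A hypothetical example of a command that would drive that path, with <volname> as a placeholder:)

gluster volume set <volname> performance.cache-size 256MB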
Also, can you attach the logs of the crashed glusterfs process?
The VM has been re-provisioned, hence I cannot provide any further logs.
This is no longer seen with the master branch; similar tests have been running in the longevity test-bed for more than 2 weeks without hitting the issue. Marking it as WORKSFORME (with Fixed-in-version 3.3.0.5rhs-36); please feel free to reopen if it is seen again.