Bug 795131

Summary: glusterd is killed when volume statedump operation is performed
Product: [Community] GlusterFS Reporter: Shwetha Panduranga <shwetha.h.panduranga>
Component: glusterd Assignee: Kaushal <kaushal>
Status: CLOSED WORKSFORME QA Contact:
Severity: medium Docs Contact:
Priority: high    
Version: mainline CC: gluster-bugs, kaushal
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-03-28 03:13:31 EDT Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---

Description Shwetha Panduranga 2012-02-19 10:06:52 EST
Description of problem:
The glusterd process is killed when a volume statedump is performed. This happens when the statedump file is large (e.g. 123K).

Version-Release number of selected component (if applicable):
mainline

How reproducible:
Occasionally

Steps to Reproduce:
1.gluster volume statedump <volumename> all 

Actual results:
Dumps the state information of the volume into files and kills the glusterd process.

Expected results:
Should dump the state information of the volume into a file and should not kill glusterd.

Additional info:
[root@APP-SERVER1 ~]# ps -ef | grep gluster
root      1131 32390  0 23:02 pts/0    00:00:00 grep gluster
root     25773     1  0 Feb15 ?        00:12:28 /usr/local/sbin/glusterfsd -s localhost --volfile-id datastore.192.168.2.35.export1 -p /etc/glusterd/vols/datastore/run/192.168.2.35-export1.pid -S /tmp/2333e1dc05141f93d1455de5da4d7188.socket --brick-name /export1 -l /usr/local/var/log/glusterfs/bricks/export1.log --brick-port 24009 --xlator-option datastore-server.listen-port=24009
root     29803     1  0 Feb16 ?        00:00:01 glusterd
root     29823     1  0 Feb16 ?        00:00:00 /usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /usr/local/var/log/glusterfs/nfs.log
root     29830     1  0 Feb16 ?        00:00:00 /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /etc/glusterd/glustershd/run/glustershd.pid -l /usr/local/var/log/glusterfs/glustershd.log -S /tmp/ed4a200f26afd8f124ea87fd774eee7a.socket


[root@APP-SERVER1 ~]# gluster volume statedump datastore all
Connection failed. Please check if gluster daemon is operational.


[root@APP-SERVER1 ~]# ps -ef | grep gluster
root      1140 32390  0 23:02 pts/0    00:00:00 grep gluster
root     25773     1  0 Feb15 ?        00:12:30 /usr/local/sbin/glusterfsd -s localhost --volfile-id datastore.192.168.2.35.export1 -p /etc/glusterd/vols/datastore/run/192.168.2.35-export1.pid -S /tmp/2333e1dc05141f93d1455de5da4d7188.socket --brick-name /export1 -l /usr/local/var/log/glusterfs/bricks/export1.log --brick-port 24009 --xlator-option datastore-server.listen-port=24009
root     29823     1  0 Feb16 ?        00:00:00 /usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol -p /etc/glusterd/nfs/run/nfs.pid -l /usr/local/var/log/glusterfs/nfs.log
root     29830     1  0 Feb16 ?        00:00:00 /usr/local/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /etc/glusterd/glustershd/run/glustershd.pid -l /usr/local/var/log/glusterfs/glustershd.log -S /tmp/ed4a200f26afd8f124ea87fd774eee7a.socket
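The two `ps -ef` listings above show that `glusterd` (pid 29803) is running before the statedump and gone afterwards, while the brick, NFS, and self-heal daemons survive. A minimal sketch that makes this comparison explicit by diffing the daemon names in the two listings (the embedded sample lines are abbreviated from the output above; this is an illustration, not part of the reporter's procedure):

```python
def gluster_procs(ps_output):
    """Return the set of gluster daemon names found in `ps -ef` output."""
    names = set()
    for line in ps_output.splitlines():
        fields = line.split()
        if len(fields) < 8:
            continue
        # In `ps -ef` output, field 8 (index 7) is the start of the command.
        base = fields[7].rsplit("/", 1)[-1]
        if base.startswith("gluster"):
            names.add(base)
    return names

# Abbreviated from the listings in this report.
before = """\
root 25773 1 0 Feb15 ? 00:12:28 /usr/local/sbin/glusterfsd -s localhost
root 29803 1 0 Feb16 ? 00:00:01 glusterd
root 29823 1 0 Feb16 ? 00:00:00 /usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol
root 29830 1 0 Feb16 ? 00:00:00 /usr/local/sbin/glusterfs -s localhost
"""
after = """\
root 25773 1 0 Feb15 ? 00:12:30 /usr/local/sbin/glusterfsd -s localhost
root 29823 1 0 Feb16 ? 00:00:00 /usr/local/sbin/glusterfs -f /etc/glusterd/nfs/nfs-server.vol
root 29830 1 0 Feb16 ? 00:00:00 /usr/local/sbin/glusterfs -s localhost
"""
print(gluster_procs(before) - gluster_procs(after))  # daemons lost across the statedump
```

This also explains the "Connection failed. Please check if gluster daemon is operational." message: the CLI talks to glusterd on port 24007, so once glusterd dies, every subsequent gluster command fails the same way.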

Glusterd Logs:-
---------------
[2012-02-17 23:02:23.296831] I [glusterd-volume-ops.c:539:glusterd_handle_cli_statedump_volume] 0-glusterd: Recieved statedump request for volume datastore with options all
[2012-02-17 23:02:23.296918] I [glusterd-utils.c:262:glusterd_lock] 0-glusterd: Cluster lock held by bd6f3667-7c8f-4b06-90d7-eedf12139065
[2012-02-17 23:02:23.296956] I [glusterd-handler.c:448:glusterd_op_txn_begin] 0-management: Acquired local lock

Brick Log:-
---------
[2012-02-17 23:06:21.120155] D [socket.c:1796:socket_event_handler] 0-transport: disconnecting now
[2012-02-17 23:06:24.125217] D [common-utils.c:161:gf_resolve_ip6] 0-resolver: returning ip-::1 (port-24007) for hostname: localhost and port: 24007
[2012-02-17 23:06:24.129801] D [common-utils.c:181:gf_resolve_ip6] 0-resolver: next DNS query will return: ip-127.0.0.1 port-24007
[2012-02-17 23:06:24.129981] D [socket.c:289:__socket_disconnect] 0-glusterfs: shutdown() returned -1. Transport endpoint is not connected
[2012-02-17 23:06:24.130080] D [socket.c:193:__socket_rwv] 0-glusterfs: EOF from peer 127.0.0.1:24007
[2012-02-17 23:06:24.130132] D [socket.c:1510:__socket_proto_state_machine] 0-glusterfs: reading from socket failed. Error (Transport endpoint is not connected), peer (127.0.0.1:24007)
Comment 1 Kaushal 2012-02-20 00:38:07 EST
Can you get the full glusterd DEBUG log when this happens again? The log given is not very informative.
Comment 2 Kaushal 2012-03-01 02:17:12 EST
If "volume statedump" causes glusterd to crash for anyone else, please reply with the glusterd logs.
Comment 3 Kaushal 2012-03-28 02:45:30 EDT
One last call for information on this.
Comment 4 Shwetha Panduranga 2012-03-28 02:53:57 EDT
Unable to recreate this issue.
Comment 5 Kaushal 2012-03-28 03:13:31 EDT
Closing as the issue cannot be reproduced.