Bug 988286

Summary: restarting glusterd doesn't start the brick, nfs and self-heal daemon process
Product: [Red Hat Storage] Red Hat Gluster Storage
Component: glusterd
Version: 2.1
Status: CLOSED WORKSFORME
Severity: high
Priority: high
Reporter: spandura
Assignee: Bug Updates Notification Mailing List <rhs-bugs>
QA Contact: Sudhir D <sdharane>
CC: pkarampu, rhs-bugs, vagarwal, vbellur
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Type: Bug
Last Closed: 2014-01-17 11:50:46 UTC

Attachments: SOS Reports

Description spandura 2013-07-25 09:07:06 UTC
Description of problem:
========================
In a 1 x 2 replicate volume (storage_node1 and storage_node2), all the brick, NFS, self-heal daemon, and glusterd processes were killed on both nodes (killall glusterfs glusterfsd glusterd), and glusterd was then started on one of the nodes (service glusterd start). glusterd fails to start the brick, NFS, and self-heal daemon processes on the node where it was restarted.

Version-Release number of selected component (if applicable):
============================================================
root@rhs-client11 [Jul-25-2013-14:30:55] >rpm -qa | grep glusterfs-server
glusterfs-server-3.4.0.12rhs.beta6-1.el6rhs.x86_64

root@rhs-client11 [Jul-25-2013-14:30:58] >gluster --version
glusterfs 3.4.0.12rhs.beta6 built on Jul 23 2013 16:20:03

How reproducible:
=================
Often

Steps to Reproduce:
=====================
1. Create a 1 x 2 replicate volume. Start the volume.

2. Create an NFS mount and a FUSE mount. Create files/dirs from both mounts.

3. Run killall glusterfs glusterfsd glusterd on both storage nodes.

4. On one of the storage nodes, start glusterd (service glusterd start). A consolidated command sketch follows these steps.
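The sketch below restates the steps as commands, using the hostnames and brick paths from the volume info in the Additional info section; the client mount points (/mnt/fuse, /mnt/nfs) are hypothetical:

# Step 1: on one storage node, create and start the 1 x 2 replicate volume
gluster volume create vol_rep replica 2 rhs-client11:/rhs/brick1/b0 rhs-client12:/rhs/brick1/b1
gluster volume start vol_rep

# Step 2: on a client, mount via FUSE and NFS (v3), then create files/dirs
mount -t glusterfs rhs-client11:/vol_rep /mnt/fuse
mount -t nfs -o vers=3 rhs-client11:/vol_rep /mnt/nfs

# Step 3: on both storage nodes, kill all gluster processes
killall glusterfs glusterfsd glusterd

# Step 4: on one storage node only, restart glusterd
service glusterd start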

Actual results:
================
1. The brick, NFS, and self-heal daemon processes are not started.

2. Even after restarting glusterd multiple times, it does not start any of the processes:

root@rhs-client11 [Jul-25-2013-13:55:18] >killall glusterfs glusterfsd glusterd
root@rhs-client11 [Jul-25-2013-13:55:31] >
root@rhs-client11 [Jul-25-2013-13:55:36] >service glusterd start
Starting glusterd:                                         [  OK  ]
root@rhs-client11 [Jul-25-2013-13:55:44] >
root@rhs-client11 [Jul-25-2013-13:55:44] >
root@rhs-client11 [Jul-25-2013-13:55:44] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/b0			N/A	N	N/A
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A
 
There are no active volume tasks
root@rhs-client11 [Jul-25-2013-13:55:47] >
root@rhs-client11 [Jul-25-2013-13:55:48] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/b0			N/A	N	N/A
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A
 
There are no active volume tasks
root@rhs-client11 [Jul-25-2013-13:55:49] >
root@rhs-client11 [Jul-25-2013-13:55:53] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/b0			N/A	N	N/A
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A
 
There are no active volume tasks
root@rhs-client11 [Jul-25-2013-13:55:54] >
root@rhs-client11 [Jul-25-2013-13:55:59] >ps -ef | grep gluster
root     23631 23619  0 11:49 pts/2    00:00:00 tail -f /var/log/glusterfs/glustershd.log
root     24906     1  0 13:55 ?        00:00:00 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root     25058 22781  0 13:56 pts/0    00:00:00 grep gluster

root@rhs-client11 [Jul-25-2013-14:01:14] >service glusterd status
glusterd (pid  24906) is running...
root@rhs-client11 [Jul-25-2013-14:01:22] >
root@rhs-client11 [Jul-25-2013-14:01:22] >service glusterd restart
Starting glusterd:                                         [  OK  ]
root@rhs-client11 [Jul-25-2013-14:01:28] >
root@rhs-client11 [Jul-25-2013-14:01:29] >service glusterd status
glusterd (pid  25231) is running...
root@rhs-client11 [Jul-25-2013-14:01:31] >gluster v status
Status of volume: vol_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick rhs-client11:/rhs/brick1/b0			N/A	N	N/A
NFS Server on localhost					N/A	N	N/A
Self-heal Daemon on localhost				N/A	N	N/A

Expected results:
================
glusterd should restart the brick, NFS, and self-heal daemon processes.
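As an aside (not part of the original report): when glusterd is up but the brick processes are down, the gluster CLI's force-start can respawn them manually; this is offered as a hedged recovery sketch, not a confirmed workaround for this bug:

# Hypothetical manual recovery: force-starting a volume that is already
# in the Started state respawns any brick processes that are not running
gluster volume start vol_rep force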

Additional info:
=================
root@rhs-client11 [Jul-25-2013-14:35:00] >gluster v info
 
Volume Name: vol_rep
Type: Replicate
Volume ID: f7928cb5-76bf-4a9f-93b2-a4ce3073519b
Status: Started
Number of Bricks: 1 x 2 = 2
Transport-type: tcp
Bricks:
Brick1: rhs-client11:/rhs/brick1/b0
Brick2: rhs-client12:/rhs/brick1/b1
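Since glusterd decides at startup whether to spawn bricks from the volume state persisted in its store, inspecting that state on the affected node may help; a minimal sketch, assuming the default store and log locations for this release:

# glusterd's persisted view of the volume (default store location);
# status=1 records the volume as Started
cat /var/lib/glusterd/vols/vol_rep/info

# glusterd's own log, where brick/NFS/self-heal spawn failures would appear
less /var/log/glusterfs/etc-glusterfs-glusterd.vol.log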

Comment 2 spandura 2013-07-25 09:32:11 UTC
Created attachment 778159 [details]
SOS Reports

Comment 3 Pranith Kumar K 2014-01-03 13:22:36 UTC
Tried to re-create. It works fine.

root@pranithk-vm3 - ~/RPMS 
18:22:56 :) ⚡ killall glusterfs glusterd glusterfsd

root@pranithk-vm3 - ~/RPMS 
18:23:03 :) ⚡ ps aux | grep gluster
root     16432  0.0  0.0 103244   832 pts/0    S+   18:23   0:00 grep gluster

root@pranithk-vm3 - ~/RPMS 
18:23:06 :) ⚡ gluster volume start r2
Connection failed. Please check if gluster daemon is operational.

root@pranithk-vm3 - ~/RPMS 
18:23:12 :( ⚡ service glusterd start
Starting glusterd:                                         [  OK  ]

root@pranithk-vm3 - ~/RPMS 
18:23:24 :) ⚡ ps aux | grep gluster
root     16479  3.1  0.8 360520 16456 ?        Ssl  18:23   0:00 /usr/sbin/glusterd --pid-file=/var/run/glusterd.pid
root     16778  0.6  0.9 642520 19700 ?        Ssl  18:23   0:00 /usr/sbin/glusterfsd -s 10.70.43.148 --volfile-id r2.10.70.43.148.brick-2 -p /var/lib/glusterd/vols/r2/run/10.70.43.148-brick-2.pid -S /var/run/d3d3
root     16790  2.6  3.0 389888 62400 ?        Ssl  18:23   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/nfs -p /var/lib/glusterd/nfs/run/nfs.pid -l /var/log/glusterfs/nfs.log -S /var/run/55e1ee0bebt
root     16795  1.3  0.9 329196 20504 ?        Ssl  18:23   0:00 /usr/sbin/glusterfs -s localhost --volfile-id gluster/glustershd -p /var/lib/glusterd/glustershd/run/glustershd.pid -l /var/log/glusterfs/glustersh1
root     16814  0.0  0.0 103248   840 pts/0    S+   18:23   0:00 grep gluster

root@pranithk-vm3 - ~/RPMS 
18:23:27 :) ⚡ rpm -qa | grep gluster
glusterfs-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-server-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-debuginfo-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-libs-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-geo-replication-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-api-devel-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-devel-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.53rhs-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.53rhs-1.el6rhs.x86_64

Comment 4 Vivek Agarwal 2014-01-17 11:50:46 UTC
Based on comment 3, closing this bug.