Bug 980468 - gluster self-heal daemon process not started after restarting glusterd on a storage node
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: glusterfs
Version: 2.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: urgent
Target Milestone: ---
Target Release: ---
Assignee: Krutika Dhananjay
QA Contact: spandura
URL:
Whiteboard:
Depends On:
Blocks: 980097
 
Reported: 2013-07-02 13:16 UTC by spandura
Modified: 2014-06-25 00:51 UTC
CC List: 7 users

Fixed In Version: glusterfs-3.4.0.12rhs.beta2
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2013-09-23 22:35:39 UTC
Embargoed:


Attachments
SOS Reports and Other Useful Information (3.34 MB, application/x-gzip)
2013-07-02 13:25 UTC, spandura

Description spandura 2013-07-02 13:16:23 UTC
Description of problem:
=========================
In a 6 x 2 distributed-replicate volume, starting glusterd on a node on which all gluster processes were offline brings all the brick processes online, but the self-heal daemon process is not started on that node.
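
For reference, a quick way to confirm the symptom on the affected node (standard ps and gluster invocations; the volume name is taken from this report and the grep pattern assumes the status output format shown later in this report):

# Is the self-heal daemon (glustershd) process present at all?
ps -ef | grep glustershd | grep -v grep

# What does glusterd report for the daemon?
gluster v status vol_dis_rep | grep -i "Self-heal Daemon"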

Version-Release number of selected component (if applicable):
==============================================================
root@hicks [Jul-02-2013-18:24:15] >gluster --version
glusterfs 3.4.0.12rhs.beta1 built on Jun 28 2013 06:41:38

root@hicks [Jul-02-2013-18:31:57] >rpm -qa | grep glusterfs
glusterfs-geo-replication-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-fuse-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-3.4.0.12rhs.beta1-1.el6rhs.x86_64
glusterfs-rdma-3.4.0.12rhs.beta1-1.el6rhs.x86_64
org.apache.hadoop.fs.glusterfs-glusterfs-0.20.2_0.2-1.noarch
glusterfs-server-3.4.0.12rhs.beta1-1.el6rhs.x86_64

How reproducible:
================

Steps to Reproduce:
=====================
1. Create a 6 x 2 distributed-replicate volume with 4 storage nodes and 3 bricks on each storage node. node1's bricks are replicated on node2, and node3's bricks are replicated on node4.

2. Start the volume. Create FUSE and NFS mounts.

3. On node2 and node3, run: killall glusterfs ; killall glusterfsd ; killall glusterd

4. Start creating files from the mount points. (Around 5k files were created.)

5. While file creation is in progress on the mount points, execute "service glusterd start" on node2. (node3 remains offline.)

6. Execute "gluster v status". (A consolidated sketch of steps 3-6 follows this list.)
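
A consolidated sketch of steps 3-6, assuming hicks is node2 and luigi is node3 (as the output below suggests) and an illustrative mount path:

# On node2 (hicks) and node3 (luigi): take all gluster processes offline.
killall glusterfs ; killall glusterfsd ; killall glusterd

# On a client, from the FUSE/NFS mount (mount path and file count are illustrative):
for i in $(seq 1 5000); do dd if=/dev/zero of=/mnt/vol_dis_rep/file.$i bs=1k count=1 ; done &

# On node2 only, while the file creation is still running (node3 stays offline):
service glusterd start

# Check which daemons glusterd brought back online:
gluster v status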

Actual results:
==================
root@hicks [Jul-02-2013-18:02:46] >service glusterd start
Starting glusterd:                                         [  OK  ]

root@hicks [Jul-02-2013-18:02:54] >service glusterd status
glusterd (pid  5536) is running...

root@hicks [Jul-02-2013-18:02:59] >
root@hicks [Jul-02-2013-18:03:00] >gluster v status
Status of volume: vol_dis_rep
Gluster process						Port	Online	Pid
------------------------------------------------------------------------------
Brick king:/rhs/brick1/brick0				49152	Y	4412
Brick hicks:/rhs/brick1/brick1				49152	Y	5657
Brick king:/rhs/brick1/brick2				49153	Y	4423
Brick hicks:/rhs/brick1/brick3				49153	Y	5670
Brick king:/rhs/brick1/brick4				49154	Y	4434
Brick hicks:/rhs/brick1/brick5				49154	Y	5666
Brick lizzie:/rhs/brick1/brick7				49152	Y	4369
Brick lizzie:/rhs/brick1/brick9				49153	Y	4380
Brick lizzie:/rhs/brick1/brick11			49154	Y	4391
NFS Server on localhost					2049	Y	5678
Self-heal Daemon on localhost				N/A	N	N/A
NFS Server on d738e7cb-9bff-4988-807e-10fc5f9f4a64	2049	Y	4446
Self-heal Daemon on d738e7cb-9bff-4988-807e-10fc5f9f4a64	N/A	Y	4452
NFS Server on ddda6b3a-d570-47fd-a07c-465c328610b4	2049	Y	4403
Self-heal Daemon on ddda6b3a-d570-47fd-a07c-465c328610b4	N/A	Y	4410
 
There are no active volume tasks


Expected results:
=================
The self-heal daemon process on node2 should be started when glusterd is started.
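
A minimal post-fix check that could be run on node2 (the volume name is from this report; the wait interval is an arbitrary assumption):

service glusterd restart
sleep 10    # give glusterd time to respawn its auxiliary daemons; the interval is a guess

# The self-heal daemon should now be running and reported online:
pgrep -f glustershd || echo "FAIL: glustershd not running"
gluster v status vol_dis_rep | grep -i "Self-heal Daemon on localhost"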

Additional info:
===================

root@king [Jul-02-2013-14:53:57] >gluster peer status
Number of Peers: 3

Hostname: hicks
Uuid: e741f4ff-0b1e-4445-8434-e2913030d849
State: Peer in Cluster (Connected)

Hostname: luigi
Uuid: 72f6a0e2-6f03-4e7c-91e0-9ac5dae8a729
State: Peer in Cluster (Connected)

Hostname: lizzie
Uuid: ddda6b3a-d570-47fd-a07c-465c328610b4
State: Peer in Cluster (Connected)


root@king [Jul-02-2013-16:55:45] >gluster v info
 
Volume Name: vol_dis_rep
Type: Distributed-Replicate
Volume ID: 414a0903-6b64-4cd1-9dbf-8eb0e20b6e3b
Status: Created
Number of Bricks: 6 x 2 = 12
Transport-type: tcp
Bricks:
Brick1: king:/rhs/brick1/brick0
Brick2: hicks:/rhs/brick1/brick1
Brick3: king:/rhs/brick1/brick2
Brick4: hicks:/rhs/brick1/brick3
Brick5: king:/rhs/brick1/brick4
Brick6: hicks:/rhs/brick1/brick5
Brick7: luigi:/rhs/brick1/brick6
Brick8: lizzie:/rhs/brick1/brick7
Brick9: luigi:/rhs/brick1/brick8
Brick10: lizzie:/rhs/brick1/brick9
Brick11: luigi:/rhs/brick1/brick10
Brick12: lizzie:/rhs/brick1/brick11
root@king [Jul-02-2013-16:55:47] >

Comment 1 spandura 2013-07-02 13:25:25 UTC
Created attachment 767724 [details]
SOS Reports and Other Useful Information

Comment 4 raghav 2013-07-03 07:31:02 UTC
*** Bug 980097 has been marked as a duplicate of this bug. ***

Comment 5 Sachidananda Urs 2013-07-04 09:36:37 UTC
When are we getting a patch for this? All the proactive self-heal cases are blocked by this bug.

Comment 7 Scott Haines 2013-09-23 22:35:39 UTC
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. 

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-1262.html

