Bug 1236554 - [geo-rep]: Once the bricks are killed, the worker dies; after a few retries the worker comes back and the session becomes Active without the brick online
Summary: [geo-rep]: Once the bricks are killed, the worker dies; after a few retries the worker...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 1236546 1239044 1247882
Blocks:
 
Reported: 2015-06-29 12:20 UTC by Rahul Hinduja
Modified: 2018-04-16 15:55 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-16 15:55:53 UTC
Embargoed:



Description Rahul Hinduja 2015-06-29 12:20:30 UTC
Description of problem:
=======================

The geo-rep status shows one of the bricks as ACTIVE even though its corresponding brick process is not running. The brick process was killed using kill -9 and the session went Faulty, which is expected, but after a few retries the worker comes back and the session becomes Active again without the brick being online.

This is seen after the issue mentioned in bug id: 1236546

No brick process is running on node georep3 for volume master:
==================================================================
[root@georep3 ~]# ps -eaf | grep glusterfsd | grep master
[root@georep3 ~]#

But the worker is running as:
=============================
[root@georep3 ~]# ps -eaf | grep gsyncd | grep feedback
root     27264 16706  0 19:40 ?        00:00:23 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/b1 --path=/rhs/brick2/b2  -c /var/lib/glusterd/geo-replication/master_10.70.46.101_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=932e669a-e61a-426b-8caf-d698a7ddb6f2 10.70.46.101::slave -N -p  --slave-id 868d5550-8bb6-4360-bfd5-40d2bd9b9adf --feedback-fd 13 --local-path /rhs/brick1/b1 --local-id .%2Frhs%2Fbrick1%2Fb1 --rpc-fd 10,9,7,11 --subvol-num 2 --resource-remote ssh://root.46.101:gluster://localhost:slave
[root@georep3 ~]#

Due to this, the geo-rep status shows the brick as Active and participating in syncing:
================================================================================

[root@georep3 ~]# gluster volume geo status | grep georep3
georep3        master        /rhs/brick1/b1    root          ssh://10.70.46.101::slave    10.70.46.101    Active     Changelog Crawl    2015-06-29 18:11:28          
georep3        master        /rhs/brick2/b2    root          ssh://10.70.46.101::slave    N/A             Faulty     N/A                N/A                          
[root@georep3 ~]#
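
The inconsistency can be cross-checked as below. This is an illustrative sketch, not output from the original report; it assumes the volume name master and the slave session 10.70.46.101::slave shown above:

# Brick PIDs that glusterd reports for this node (Online column shows N after the kill):
gluster volume status master | grep georep3

# Brick processes actually running on this node:
ps -eaf | grep '[g]lusterfsd' | grep master

# Workers that geo-rep still reports as Active for this node:
gluster volume geo-replication master 10.70.46.101::slave status | grep georep3 | grep -i active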


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-5.el6rhs.x86_64



How reproducible:
=================
Tried once; will update after re-running the steps from bz 1236546.


Steps to Reproduce:
===================
As mentioned in bz: 1236546
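
For the brick-kill part described in this bug, a minimal sketch of the scenario follows (assuming a master volume named master, the slave session 10.70.46.101::slave, and the brick path /rhs/brick1/b1 from the output above; the full preceding steps are in bz 1236546):

# Kill one master brick process on a node with SIGKILL:
BRICK_PID=$(ps -eaf | grep '[g]lusterfsd' | grep '/rhs/brick1/b1' | awk '{print $2}')
kill -9 "$BRICK_PID"

# The corresponding worker should go Faulty; with this bug, after a few retries
# it returns to Active even though the brick is still down:
watch -n 10 'gluster volume geo-replication master 10.70.46.101::slave status'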

Comment 5 Kotresh HR 2015-12-02 06:21:49 UTC
Rahul,

I believe the setup on which this bug was hit was invalid: NTP was not configured (BZ 1236546). Could you re-test and close this bug if it can't be reproduced?
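
An illustrative pre-check before re-testing (not part of the original comment; it assumes ntpd on RHEL 6, matching the el6rhs build noted above) is to confirm the clocks are in sync on every master and slave node:

service ntpd status
ntpq -p     # the selected time source should be marked with '*'
date -u     # compare across nodes; offsets should be negligible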

