Bug 1236554 - [geo-rep]: Once the bricks are killed, the worker dies; after a few retries the worker comes back and the session becomes Active without the brick online
Summary: [geo-rep]: Once the bricks are killed, the worker dies; after a few retries the worker...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: geo-replication
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: high
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Bug Updates Notification Mailing List
QA Contact: storage-qa-internal@redhat.com
URL:
Whiteboard:
Depends On: 1236546 1239044 1247882
Blocks:
 
Reported: 2015-06-29 12:20 UTC by Rahul Hinduja
Modified: 2018-04-16 15:55 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-04-16 15:55:53 UTC
Embargoed:



Description Rahul Hinduja 2015-06-29 12:20:30 UTC
Description of problem:
=======================

The geo-rep status shows one of the bricks as ACTIVE even though its corresponding brick process is not running. The brick process was killed using kill -9 and the session went Faulty, which is expected, but after a few retries the worker comes back and the session becomes Active again without the brick being online.

This is seen after the issue mentioned in bug id: 1236546

No brick process is running on node georep3 for volume master:
==================================================================
[root@georep3 ~]# ps -eaf | grep glusterfsd | grep master
[root@georep3 ~]#

But the worker is running as:
=============================
[root@georep3 ~]# ps -eaf | grep gsyncd | grep feedback
root     27264 16706  0 19:40 ?        00:00:23 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/b1 --path=/rhs/brick2/b2  -c /var/lib/glusterd/geo-replication/master_10.70.46.101_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=932e669a-e61a-426b-8caf-d698a7ddb6f2 10.70.46.101::slave -N -p  --slave-id 868d5550-8bb6-4360-bfd5-40d2bd9b9adf --feedback-fd 13 --local-path /rhs/brick1/b1 --local-id .%2Frhs%2Fbrick1%2Fb1 --rpc-fd 10,9,7,11 --subvol-num 2 --resource-remote ssh://root.46.101:gluster://localhost:slave
[root@georep3 ~]#

Due to this, the geo-rep status shows the brick as Active and participating in syncing:
================================================================================

[root@georep3 ~]# gluster volume geo status | grep georep3
georep3        master        /rhs/brick1/b1    root          ssh://10.70.46.101::slave    10.70.46.101    Active     Changelog Crawl    2015-06-29 18:11:28          
georep3        master        /rhs/brick2/b2    root          ssh://10.70.46.101::slave    N/A             Faulty     N/A                N/A                          
[root@georep3 ~]#
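
The inconsistency can be cross-checked as below. This is an illustrative sketch, not output from the original report; it assumes the volume name master and the slave session 10.70.46.101::slave shown above:

# Brick PIDs that glusterd reports for this node (Online column shows N after the kill):
gluster volume status master | grep georep3

# Brick processes actually running on this node:
ps -eaf | grep '[g]lusterfsd' | grep master

# Workers that geo-rep still reports as Active for this node:
gluster volume geo-replication master 10.70.46.101::slave status | grep georep3 | grep -i active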


Version-Release number of selected component (if applicable):
=============================================================

glusterfs-3.7.1-5.el6rhs.x86_64



How reproducible:
=================
Tried once; will update after re-running the steps from bz 1236546.


Steps to Reproduce:
===================
As mentioned in bz: 1236546
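
For the brick-kill part described in this bug, a minimal sketch of the scenario follows (assuming a master volume named master, the slave session 10.70.46.101::slave, and the brick path /rhs/brick1/b1 from the output above; the full preceding steps are in bz 1236546):

# Kill one master brick process on a node with SIGKILL:
BRICK_PID=$(ps -eaf | grep '[g]lusterfsd' | grep '/rhs/brick1/b1' | awk '{print $2}')
kill -9 "$BRICK_PID"

# The corresponding worker should go Faulty; with this bug, after a few retries
# it returns to Active even though the brick is still down:
watch -n 10 'gluster volume geo-replication master 10.70.46.101::slave status'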

Comment 5 Kotresh HR 2015-12-02 06:21:49 UTC
Rahul,

I believe the setup on which this bug was hit was invalid: NTP was not configured (BZ 1236546). Could you re-test and close this bug if it can't be reproduced?
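
An illustrative pre-check before re-testing (not part of the original comment; it assumes ntpd on RHEL 6, matching the el6rhs build noted above) is to confirm the clocks are in sync on every master and slave node:

service ntpd status
ntpq -p     # the selected time source should be marked with '*'
date -u     # compare across nodes; offsets should be negligible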

