Description of problem:
=======================
The geo-rep status shows one of the bricks as ACTIVE even though its corresponding brick process is not running. The brick process was killed using kill -9 and the session goes to Faulty, which is expected, but the worker retries and comes back online as Active. This was seen after the issue mentioned in bug id: 1236546.

No brick process running from the node georep3 for the volume master:
==================================================================
[root@georep3 ~]# ps -eaf | grep glusterfsd | grep master
[root@georep3 ~]#

But the worker is still running:
=============================
[root@georep3 ~]# ps -eaf | grep gsyncd | grep feedback
root 27264 16706 0 19:40 ? 00:00:23 python /usr/libexec/glusterfs/python/syncdaemon/gsyncd.py --path=/rhs/brick1/b1 --path=/rhs/brick2/b2 -c /var/lib/glusterd/geo-replication/master_10.70.46.101_slave/gsyncd.conf --iprefix=/var :master --glusterd-uuid=932e669a-e61a-426b-8caf-d698a7ddb6f2 10.70.46.101::slave -N -p --slave-id 868d5550-8bb6-4360-bfd5-40d2bd9b9adf --feedback-fd 13 --local-path /rhs/brick1/b1 --local-id .%2Frhs%2Fbrick1%2Fb1 --rpc-fd 10,9,7,11 --subvol-num 2 --resource-remote ssh://root.46.101:gluster://localhost:slave
[root@georep3 ~]#

Due to this, the geo-rep status shows the brick as Active and participating in syncing:
================================================================================
[root@georep3 ~]# gluster volume geo status | grep georep3
georep3    master    /rhs/brick1/b1    root    ssh://10.70.46.101::slave    10.70.46.101    Active    Changelog Crawl    2015-06-29 18:11:28
georep3    master    /rhs/brick2/b2    root    ssh://10.70.46.101::slave    N/A             Faulty    N/A                N/A
[root@georep3 ~]#

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-3.7.1-5.el6rhs.x86_64

How reproducible:
=================
Tried once; will update after retrying bz: 1236546.

Steps to Reproduce:
===================
As mentioned in bz: 1236546.
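For reference, a minimal verification sketch of the scenario described above (not part of the original report): it assumes the brick path /rhs/brick1/b1 and the master -> 10.70.46.101::slave session names taken from the status output, and simply kills the brick process and then compares the surviving gsyncd worker against the geo-rep status.

# Find and kill the glusterfsd process serving the brick (simulates a brick crash)
BRICK=/rhs/brick1/b1
PID=$(ps -eaf | grep '[g]lusterfsd' | grep -- "$BRICK" | awk '{print $2}')
kill -9 "$PID"

# The brick process is gone ...
ps -eaf | grep '[g]lusterfsd' | grep -- "$BRICK"

# ... but the gsyncd worker for the same brick may still be running
ps -eaf | grep '[g]syncd' | grep feedback

# Expected: the session entry for this brick should not stay (or return to) Active
gluster volume geo-replication master 10.70.46.101::slave status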
Rahul, I believe the setup in which this bug was hit is an invalid setup where NTP was not configured (BZ 1236546). Could you re-test and close this bug if it can't be reproduced?
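A hedged sketch (not part of the original comment) of how time synchronization could be checked on each master and slave node before the re-test; it assumes ntpd and the standard NTP client tools are available on these RHEL 6 based nodes, and the hostnames/IPs are the ones from the report.

# Confirm ntpd is running and the clock is synchronized to a peer
service ntpd status
ntpstat
ntpq -p

# Compare wall-clock time across nodes, e.g. from one node:
for h in georep3 10.70.46.101; do ssh root@$h date +%s; done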