Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 823304

Summary: [1d939fe7adef651b90bb5c4cd5843768417f0138]: geo-replication status goes to faulty state due to corrupted timestamp
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: geo-replication
Assignee: Venky Shankar <vshankar>
Status: CLOSED NOTABUG
QA Contact:
Severity: unspecified
Docs Contact:
Priority: low
Version: mainline
CC: amarts, gluster-bugs, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: {add,remove}-brick causes gsyncd to enter a faulty state temporarily. Consequence: Although the gsyncd state is reported as 'faulty' by the CLI (temporarily), there are no issues with data syncing; only a gsyncd worker restart occurs. Workaround (if any): None needed; after the worker restart, the gsyncd status remains 'faulty' for 60 seconds and turns 'OK' after that. Result: Gsyncd status turns 'OK' after 60 seconds and there are no problems with the syncing of data.
Story Points: ---
Clone Of:
Cloned to: 849302 (view as bug list)
Environment:
Last Closed: 2013-03-06 11:42:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 849302    
Attachments:
log file of the geo-replication process (flags: none)

Description Raghavendra Bhat 2012-05-20 18:33:16 UTC
Description of problem:

Create a replicate volume (2 replicas) and enable gsync and quota on it. Mount the volume on a client and start untarring, then compiling, the glusterfs tarball. While the untar and compilation were running, added one more brick to the volume (increasing the replica count to 3). The gsync session became faulty, so it was stopped and started again. Then removed one of the bricks (bringing the replica count back down to 2) and disabled quota. The session became faulty again.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Enable gsync and quota on a replicate volume.
2. Mount the volume via the FUSE client and start an untar/compilation of the glusterfs source.
3. While the above test is running, add one more brick to the volume, increasing the replica count from 2 to 3.
4. Remove a brick from the volume, decreasing the replica count from 3 to 2.
5. Disable quota.
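The steps above can be sketched as gluster CLI commands. This is a hedged outline, not a verified script: the host, brick, and volume names (host1-3, /bricks/b1-3, testvol) and the slave URL are placeholders, it must run on a live cluster, and the exact geo-replication syntax varies between GlusterFS releases.

```shell
# Reproduction sketch (placeholder hosts, bricks, and slave URL).
# 1. Replica-2 volume with quota and geo-replication enabled
gluster volume create testvol replica 2 host1:/bricks/b1 host2:/bricks/b2
gluster volume start testvol
gluster volume quota testvol enable
gluster volume geo-replication testvol slavehost::slavevol start

# 2. Mount via FUSE and generate load (untar + build of the glusterfs source)
mount -t glusterfs host1:/testvol /mnt/testvol
tar -C /mnt/testvol -xzf glusterfs.tar.gz

# 3. Grow to replica 3 while the load is running
gluster volume add-brick testvol replica 3 host3:/bricks/b3

# 4. Shrink back to replica 2
gluster volume remove-brick testvol replica 2 host3:/bricks/b3 force

# 5. Disable quota, then check the geo-replication status
gluster volume quota testvol disable
gluster volume geo-replication testvol slavehost::slavevol status
```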
  
Actual results:

The geo-replication session became faulty.

Expected results:
The geo-replication session should not become faulty.

Additional info:

Comment 1 Raghavendra Bhat 2012-05-20 18:34:04 UTC
Created attachment 585670 [details]
log file of the geo-replication process

Comment 2 Venky Shankar 2013-03-06 09:10:40 UTC
Well, the geo-replication status becomes faulty because gsyncd gets an ENOTCONN/ECONNABORTED when an add-brick or a remove-brick is performed.

You need not stop/start the session again; the monitor thread does that on encountering a state change from OK to faulty. The issue is what is shown in the geo-rep CLI status: it remains 'faulty' after the self-restart because there is a window of up to 60 seconds before the status file is updated, after which the status becomes 'OK' again. There is no problem with the syncing of files.
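The lag described here comes from the CLI reading a periodically rewritten on-disk status file, so it can report 'faulty' for up to 60 seconds after syncing has already recovered. A minimal, hypothetical model of that reporting logic (this is not gsyncd source code; only the 60-second window comes from the comment, all names are illustrative):

```python
# Hypothetical model of the status-file lag described above.
# Not GlusterFS code; the 60-second window is from the comment,
# function and parameter names are illustrative.

STATUS_REFRESH_WINDOW = 60  # seconds between periodic status-file writes

def cli_status(last_status_write, worker_restart, now):
    """Status the geo-rep CLI would show at time `now` (all args in seconds).

    The CLI reads an on-disk status file. If the worker restarted after
    the file was last written, the file still says 'faulty' until the
    next periodic write, up to STATUS_REFRESH_WINDOW seconds later.
    """
    stale = last_status_write < worker_restart
    within_window = (now - worker_restart) < STATUS_REFRESH_WINDOW
    if stale and within_window:
        return "faulty"  # stale file written before the restart
    return "OK"

# 20 s after a restart the old file is still on disk: CLI shows faulty.
print(cli_status(last_status_write=0, worker_restart=10, now=30))   # faulty
# After the window passes the file has been rewritten: CLI shows OK.
print(cli_status(last_status_write=70, worker_restart=10, now=80))  # OK
```

The design point is that the CLI is eventually consistent with the worker's real state, which is why no data is at risk while the stale 'faulty' status is displayed.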

Amar, given the above explanation, does anything need to be fixed? There is actually no issue with the sync, only a temporary state change from gsyncd's point of view, as it holds a reference to the lazily unmounted volume.

Comment 3 Amar Tumballi 2013-03-06 10:38:12 UTC
Venky, do you think we should document this? Right now, I don't think this needs any code fixes. Go ahead and close it as NOTABUG (with the doc-text field updated).

Comment 4 Venky Shankar 2013-03-06 11:42:16 UTC
Yes, it's good to document this, as it could seem quite alarming to a user.