Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 823304

Summary: [1d939fe7adef651b90bb5c4cd5843768417f0138]: geo-replication status goes to faulty state due to corrupted timestamp
Product: [Community] GlusterFS
Reporter: Raghavendra Bhat <rabhat>
Component: geo-replication
Assignee: Venky Shankar <vshankar>
Status: CLOSED NOTABUG
QA Contact:
Severity: unspecified
Docs Contact:
Priority: low
Version: mainline
CC: amarts, gluster-bugs, vbellur
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: Known Issue
Doc Text:
Cause: {add,remove}-brick causes gsyncd to enter a faulty state temporarily. Consequence: Although the gsyncd state is reported as 'faulty' by the CLI (temporarily), there are no issues with data syncing; only a gsyncd worker restart occurs. Workaround (if any): None needed; after the worker restart, the gsyncd status remains 'faulty' for 60 seconds and turns 'OK' after that. Result: Gsyncd status turns 'OK' after 60 seconds and there are no problems with the syncing of data.
Story Points: ---
Clone Of:
Cloned to: 849302 (view as bug list)
Environment:
Last Closed: 2013-03-06 11:42:16 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 849302    
Attachments:
log file of the geo-replication process (flags: none)

Description Raghavendra Bhat 2012-05-20 18:33:16 UTC
Description of problem:

Create a replicate volume (2 replicas) and enable gsync and quota on it. Mount the volume on a client and start untarring, then compiling, the glusterfs tarball. While the untar and compilation were running, added one more brick to the volume (increasing the replica count to 3). The gsync session became faulty, so it was stopped and started again. Then removed one of the bricks (bringing the replica count back down to 2) and disabled quota. The session became faulty again.

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Enable gsync and quota on a replicate volume.
2. Mount the volume via the FUSE client and start an untar/compilation of the glusterfs source.
3. While the above test is running, add one more brick to the volume, increasing the replica count from 2 to 3.
4. Remove a brick from the volume, decreasing the replica count from 3 to 2.
5. Disable quota.
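The steps above can be sketched as gluster CLI commands. This is a hedged outline, not a verified script: the host, brick, and volume names (host1-3, /bricks/b1-3, testvol) and the slave URL are placeholders, it must run on a live cluster, and the exact geo-replication syntax varies between GlusterFS releases.

```shell
# Reproduction sketch (placeholder hosts, bricks, and slave URL).
# 1. Replica-2 volume with quota and geo-replication enabled
gluster volume create testvol replica 2 host1:/bricks/b1 host2:/bricks/b2
gluster volume start testvol
gluster volume quota testvol enable
gluster volume geo-replication testvol slavehost::slavevol start

# 2. Mount via FUSE and generate load (untar + build of the glusterfs source)
mount -t glusterfs host1:/testvol /mnt/testvol
tar -C /mnt/testvol -xzf glusterfs.tar.gz

# 3. Grow to replica 3 while the load is running
gluster volume add-brick testvol replica 3 host3:/bricks/b3

# 4. Shrink back to replica 2
gluster volume remove-brick testvol replica 2 host3:/bricks/b3 force

# 5. Disable quota, then check the geo-replication status
gluster volume quota testvol disable
gluster volume geo-replication testvol slavehost::slavevol status
```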
  
Actual results:

The geo-replication session became faulty.

Expected results:
The geo-replication session should not become faulty.

Additional info:

Comment 1 Raghavendra Bhat 2012-05-20 18:34:04 UTC
Created attachment 585670 [details]
log file of the geo-replication process

Comment 2 Venky Shankar 2013-03-06 09:10:40 UTC
Well, the geo-replication status becomes faulty because gsyncd gets an ENOTCONN/ECONNABORTED when an add-brick or a remove-brick is performed.

You need not stop/start the session again; the monitor thread does that on encountering a state change from OK to faulty. The issue is what is shown in the geo-rep CLI status: it remains 'faulty' after the self-restart because there is a window of up to 60 seconds before the status file is updated, after which the status becomes 'OK' again. There is no problem with the syncing of files.
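The lag described here comes from the CLI reading a periodically rewritten on-disk status file, so it can report 'faulty' for up to 60 seconds after syncing has already recovered. A minimal, hypothetical model of that reporting logic (this is not gsyncd source code; only the 60-second window comes from the comment, all names are illustrative):

```python
# Hypothetical model of the status-file lag described above.
# Not GlusterFS code; the 60-second window is from the comment,
# function and parameter names are illustrative.

STATUS_REFRESH_WINDOW = 60  # seconds between periodic status-file writes

def cli_status(last_status_write, worker_restart, now):
    """Status the geo-rep CLI would show at time `now` (all args in seconds).

    The CLI reads an on-disk status file. If the worker restarted after
    the file was last written, the file still says 'faulty' until the
    next periodic write, up to STATUS_REFRESH_WINDOW seconds later.
    """
    stale = last_status_write < worker_restart
    within_window = (now - worker_restart) < STATUS_REFRESH_WINDOW
    if stale and within_window:
        return "faulty"  # stale file written before the restart
    return "OK"

# 20 s after a restart the old file is still on disk: CLI shows faulty.
print(cli_status(last_status_write=0, worker_restart=10, now=30))   # faulty
# After the window passes the file has been rewritten: CLI shows OK.
print(cli_status(last_status_write=70, worker_restart=10, now=80))  # OK
```

The design point is that the CLI is eventually consistent with the worker's real state, which is why no data is at risk while the stale 'faulty' status is displayed.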

Amar, given the above explanation, does anything need to be fixed? There is actually no issue with the sync, only a temporary state change from gsyncd's point of view, as it holds a reference to the lazily unmounted volume.

Comment 3 Amar Tumballi 2013-03-06 10:38:12 UTC
Venky, do you think we should document this? Right now, I don't think this needs any code fixes. Go ahead and close it as NOTABUG (with the doc-text field updated).

Comment 4 Venky Shankar 2013-03-06 11:42:16 UTC
Yes, it's good to document this, as it could seem quite alarming to a user.