1059255 – dist-geo-rep : checkpoint doesn't reach because checkpoint became stale.

Summary: dist-geo-rep : checkpoint doesn't reach because checkpoint became stale.

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	2.1
Hardware:	x86_64
OS:	Linux
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	RHGS 3.1.0
Assignee:	Aravinda VK
QA Contact:	Rahul Hinduja
Docs Contact:
URL:
Whiteboard:	checkpoint
Depends On:	1064309
Blocks:	1202842 1223636

Reported:	2014-01-29 13:54 UTC by Vijaykumar Koppad
Modified:	2015-07-29 04:33 UTC
CC List:	7 users
Fixed In Version:	glusterfs-3.7.0-2.el6rhs
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-07-29 04:33:42 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHSA-2015:1495	0	normal	SHIPPED_LIVE	Important: Red Hat Gluster Storage 3.1 update	2015-07-29 08:26:26 UTC

Description Vijaykumar Koppad 2014-01-29 13:54:50 UTC

Description of problem: The checkpoint is not reached because the checkpoint became stale.

Logs from geo-rep:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-29 19:04:02.516289] I [master(/bricks/master_brick1):438:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-01-29 19:04:06.903163] I [master(/bricks/master_brick5):587:checkpt_service] _GMaster: checkpoint now:1391002418.973412 completed
[2014-01-29 19:04:07.96450] W [master(/bricks/master_brick1):580:checkpt_service] _GMaster: completion time 2014-01-29 19:04:02.236168 for checkpoint now:1391002418.973412 became stale
[2014-01-29 19:04:39.211195] I [monitor(monitor):81:set_state] Monitor: new state: Stable

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
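
To spot this condition in a larger log file, the checkpoint lines can be filtered programmatically. The following is a minimal sketch, assuming the log lines keep the format quoted above; the names `LOG_LINE`, `CheckpointEvent`, and `parse_checkpoint_event` are hypothetical helpers for illustration and are not part of gsyncd.

```python
import re
from typing import NamedTuple, Optional

# Assumed layout, taken from the lines quoted in this report:
# [timestamp] LEVEL [module(brick):lineno:function] Class: message
LOG_LINE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>[A-Z])\s+"
    r"\[(?P<module>\w+)\((?P<brick>[^)]*)\):(?P<lineno>\d+):(?P<func>\w+)\]\s+"
    r"(?P<cls>\w+):\s+(?P<msg>.*)$"
)


class CheckpointEvent(NamedTuple):
    timestamp: str
    brick: str
    state: str  # "completed" or "stale"
    message: str


def parse_checkpoint_event(line: str) -> Optional[CheckpointEvent]:
    """Return a CheckpointEvent for checkpt_service lines, else None."""
    m = LOG_LINE.match(line.strip())
    if m is None or m.group("func") != "checkpt_service":
        return None
    msg = m.group("msg")
    if msg.endswith("became stale"):
        state = "stale"
    elif msg.endswith("completed"):
        state = "completed"
    else:
        return None
    return CheckpointEvent(m.group("ts"), m.group("brick"), state, msg)


# The four log lines from this report, used as sample input.
SAMPLE = """\
[2014-01-29 19:04:02.516289] I [master(/bricks/master_brick1):438:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-01-29 19:04:06.903163] I [master(/bricks/master_brick5):587:checkpt_service] _GMaster: checkpoint now:1391002418.973412 completed
[2014-01-29 19:04:07.96450] W [master(/bricks/master_brick1):580:checkpt_service] _GMaster: completion time 2014-01-29 19:04:02.236168 for checkpoint now:1391002418.973412 became stale
[2014-01-29 19:04:39.211195] I [monitor(monitor):81:set_state] Monitor: new state: Stable
"""

events = [e for e in map(parse_checkpoint_event, SAMPLE.splitlines()) if e]
for event in events:
    print(event.brick, event.state)
```

On the sample above, this reports a completed checkpoint on /bricks/master_brick5 and a stale one on /bricks/master_brick1.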



Version-Release number of selected component (if applicable): glusterfs-3.4.0.58rhs-1


How reproducible: Intermittent; it doesn't happen every time.


Steps to Reproduce:
1. Create and start a geo-rep session between master and slave.
2. Create data on the master and set the checkpoint.
3. Check the log file for checkpoint logs.
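
The steps above can be sketched as a sequence of gluster CLI invocations. This is only an illustration, assuming the standard `gluster volume geo-replication` syntax; the volume and host names (`mastervol`, `slavehost`, `slavevol`) are placeholders, and `georep_cmd` and `repro_commands` are hypothetical helpers. The exact commands used by the reporter are not given in this report. The script only builds and prints the commands; it does not run them.

```python
import shlex
from typing import List


def georep_cmd(master_vol: str, slave_host: str, slave_vol: str, *action: str) -> List[str]:
    """Build the argv for one geo-replication CLI call (not executed here)."""
    return [
        "gluster", "volume", "geo-replication",
        master_vol, f"{slave_host}::{slave_vol}", *action,
    ]


def repro_commands(master_vol: str, slave_host: str, slave_vol: str) -> List[List[str]]:
    """Commands roughly matching steps 1-3; data creation on the master is omitted."""
    args = (master_vol, slave_host, slave_vol)
    return [
        georep_cmd(*args, "create", "push-pem"),           # step 1: create the session
        georep_cmd(*args, "start"),                        # step 1: start the session
        georep_cmd(*args, "config", "checkpoint", "now"),  # step 2: set the checkpoint
        georep_cmd(*args, "status", "detail"),             # step 3: check checkpoint status
    ]


# Placeholder names; substitute the real master volume, slave host, and slave volume.
for argv in repro_commands("mastervol", "slavehost", "slavevol"):
    print(shlex.join(argv))
```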

Actual results: The checkpoint is not reached; the log says the checkpoint became stale.
 

Expected results: The checkpoint should complete properly once the checkpoint has been set successfully.


Additional info:

Comment 5 Rahul Hinduja 2015-07-07 11:32:44 UTC

Verified with build: glusterfs-3.7.1-7.el6rhs.x86_64

Tried the below 3 cases:

a. Set the checkpoint and killed the active brick before the checkpoint could be reached.
b. Set the checkpoint and brought down the active brick's node before the checkpoint could be reached.
c. Set the checkpoint and let it be reached, so that the checkpoint shows as completed "YES".

In all the above scenarios, the checkpoint eventually completed and status detail shows "YES". Didn't observe the checkpoint becoming stale. Moving the bug to the verified state.

Comment 8 errata-xmlrpc 2015-07-29 04:33:42 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html
