Bug 1059255

Summary: dist-geo-rep : checkpoint doesn't reach because checkpoint became stale.
Product: [Red Hat Storage] Red Hat Gluster Storage
Reporter: Vijaykumar Koppad <vkoppad>
Component: geo-replication
Assignee: Aravinda VK <avishwan>
Status: CLOSED ERRATA
QA Contact: Rahul Hinduja <rhinduja>
Severity: medium
Docs Contact:
Priority: medium
Version: 2.1
CC: aavati, annair, avishwan, csaba, david.macdonald, nlevinki, rhinduja
Target Milestone: ---
Target Release: RHGS 3.1.0
Hardware: x86_64
OS: Linux
Whiteboard: checkpoint
Fixed In Version: glusterfs-3.7.0-2.el6rhs
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-07-29 04:33:42 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1064309
Bug Blocks: 1202842, 1223636

Description Vijaykumar Koppad 2014-01-29 13:54:50 UTC
Description of problem: The checkpoint is never reported as reached because its completion time is marked stale.

Logs from geo-rep:

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>
[2014-01-29 19:04:02.516289] I [master(/bricks/master_brick1):438:crawlwrap] _GMaster: crawl interval: 3 seconds
[2014-01-29 19:04:06.903163] I [master(/bricks/master_brick5):587:checkpt_service] _GMaster: checkpoint now:1391002418.973412 completed
[2014-01-29 19:04:07.96450] W [master(/bricks/master_brick1):580:checkpt_service] _GMaster: completion time 2014-01-29 19:04:02.236168 for checkpoint now:1391002418.973412 became stale
[2014-01-29 19:04:39.211195] I [monitor(monitor):81:set_state] Monitor: new state: Stable

>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>>



Version-Release number of selected component (if applicable): glusterfs-3.4.0.58rhs-1


How reproducible: Intermittent; does not happen every time.


Steps to Reproduce:
1. Create and start a geo-rep session between master and slave.
2. Create data on the master and set the checkpoint.
3. Check the geo-rep log file for checkpoint messages.
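
For step 3, a small script can help spot the stale case without reading the whole log. The following is only a rough sketch, assuming the checkpoint messages have the same shape as the two `checkpt_service` lines quoted above; the function name `classify_checkpoint_lines` and the regular expression are illustrative and not part of glusterfs.

```python
import re

# Pattern modeled on the two checkpt_service log lines quoted in this
# report; other glusterfs versions may format these lines differently.
CHECKPOINT_RE = re.compile(
    r"\[(?P<ts>[^\]]+)\] (?P<level>[IWE]) "
    r"\[master\((?P<brick>[^)]+)\):\d+:checkpt_service\] _GMaster: "
    r"(?P<msg>.*)"
)


def classify_checkpoint_lines(lines):
    """Return (brick, checkpoint_target, state) for each checkpoint log line.

    state is "completed", "stale", or "other". Lines that do not come
    from checkpt_service are skipped.
    """
    results = []
    for line in lines:
        match = CHECKPOINT_RE.match(line.strip())
        if not match:
            continue
        msg = match.group("msg")
        if msg.endswith("became stale"):
            state = "stale"
        elif msg.endswith("completed"):
            state = "completed"
        else:
            state = "other"
        # The checkpoint target is the token after the word "checkpoint".
        target = re.search(r"checkpoint (\S+)", msg)
        results.append(
            (match.group("brick"), target.group(1) if target else None, state)
        )
    return results


if __name__ == "__main__":
    sample = [
        "[2014-01-29 19:04:06.903163] I [master(/bricks/master_brick5):587:checkpt_service] "
        "_GMaster: checkpoint now:1391002418.973412 completed",
        "[2014-01-29 19:04:07.96450] W [master(/bricks/master_brick1):580:checkpt_service] "
        "_GMaster: completion time 2014-01-29 19:04:02.236168 for checkpoint "
        "now:1391002418.973412 became stale",
    ]
    for brick, target, state in classify_checkpoint_lines(sample):
        print(brick, target, state)
```

Any brick reported with state "stale" for the checkpoint that was just set matches the symptom described in this bug.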

Actual results: The checkpoint is not reached; the log reports that the checkpoint completion time became stale.
 

Expected results: The checkpoint should complete properly once it has been set successfully.


Additional info:

Comment 5 Rahul Hinduja 2015-07-07 11:32:44 UTC
Verified with build: glusterfs-3.7.1-7.el6rhs.x86_64

Tried the 3 cases below:

a. Set the checkpoint, and kill the active brick before the checkpoint could be reached.
b. Set the checkpoint, and bring down the active brick node before the checkpoint could be reached.
c. Set the checkpoint, and let it be reached so that checkpoint completed shows "YES".

In all of the above scenarios, the checkpoint eventually completed and status detail shows "YES". Did not observe the checkpoint becoming stale. Moving the bug to verified state.

Comment 8 errata-xmlrpc 2015-07-29 04:33:42 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2015-1495.html