1328397 – [geo-rep]: schedule_georep.py doesn't touch the mount in every iteration

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat Gluster Storage
Classification:	Red Hat Storage
Component:	geo-replication
Sub Component:
Version:	rhgs-3.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	urgent
Target Milestone:	---
Target Release:	RHGS 3.1.3
Assignee:	Aravinda VK
QA Contact:	Rahul Hinduja
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:	Gluster-HC-1 1311817 1328399 1330450

Reported:	2016-04-19 10:16 UTC by Rahul Hinduja
Modified:	2016-06-23 05:18 UTC (History)
CC List:	6 users
Fixed In Version:	glusterfs-3.7.9-3
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Clones:	1328399
Environment:
Last Closed:	2016-06-23 05:18:15 UTC
Embargoed:
Dependent Products:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Red Hat Product Errata	RHBA-2016:1240	0	normal	SHIPPED_LIVE	Red Hat Gluster Storage 3.1 Update 3	2016-06-23 08:51:28 UTC

Description Rahul Hinduja 2016-04-19 10:16:35 UTC

Description of problem:
=======================

Ran the script while no I/O was in progress. The checkpoint was never reached for a few of the active workers, so the script never completed. The cause is that the script does not touch the mount point in every iteration.

The modified script provided by the developer works:

[root@dhcp37-182 ~]# diff /usr/share/glusterfs/scripts/schedule_georep.py /tmp/schedule_georep.py
134d133
<              "--xlator-option=\"*dht.lookup-unhashed=off\"",
138d136
<              "--client-pid=-1",
142d139
< 
148c145
<     #cleanup(hostname, volname, mnt)
---
>     cleanup(hostname, volname, mnt)
416,422d412
<             if not summary["checkpoints_ok"]:
<                 # If Checkpoint is not complete after a iteration means brick
<                 # was down and came online now. SETATTR on mount is not
<                 # recorded, So again issue touch on mount root So that
<                 # Stime will increase and Checkpoint will complete.
<                 touch_mount_root(args.mastervol)
< 
432a423,428
>         else:
>             # If Checkpoint is not complete after a iteration means brick
>             # was down and came online now. SETATTR on mount is not
>             # recorded, So again issue touch on mount root So that
>             # Stime will increase and Checkpoint will complete.
>             touch_mount_root(args.mastervol)
[root@dhcp37-182 ~]# 
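
For illustration, the behaviour the bug title asks for (touch the mount in every iteration until the checkpoint completes) can be sketched roughly as below. This is a minimal sketch, not the code from the upstream patch: wait_for_checkpoint, get_summary, and the touch callable passed in are hypothetical names introduced here; only the "checkpoints_ok" key and the idea of touching the mount root come from the diff above.

```python
import time


def wait_for_checkpoint(get_summary, touch_mount_root, interval=0, max_turns=None):
    """Poll geo-rep status until all checkpoints are complete.

    Both callables are hypothetical stand-ins (assumptions, not the
    script's real API):
      get_summary()      -> dict with a boolean "checkpoints_ok" key
      touch_mount_root() -> touches the root of the master mount
    Returns the number of turns (status checks) that were needed.
    """
    turns = 0
    while max_turns is None or turns < max_turns:
        turns += 1
        summary = get_summary()
        if summary["checkpoints_ok"]:
            return turns
        # Checkpoint not complete yet: touch the mount root again in
        # this iteration, so a brick that was down and came back online
        # still records a change and its stime can advance.
        touch_mount_root()
        time.sleep(interval)
    raise RuntimeError("checkpoint not complete after %d turns" % turns)
```

With stub callables, a status that turns OK on the third check returns 3 turns and triggers two touches, one for each incomplete iteration.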

Version-Release number of selected component (if applicable):
==============================================================

glusterfs-3.7.9-1.el7rhgs.x86_64

How reproducible:
=================

1/1

Steps to Reproduce:
===================
1. Create data on master volume (6x2)
2. Create geo-rep session
3. Run the script

Comment 2 Aravinda VK 2016-04-19 10:50:19 UTC

Upstream patch sent.
http://review.gluster.org/14029

As a workaround, touch the master mount once the script sets the checkpoint.
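
A rough sketch of that workaround in Python, assuming the master volume is already mounted somewhere; the helper name touch_path and the example path /mnt/master are hypothetical and not taken from the script. It only updates the timestamps on the mount root, which is what a plain "touch" on the directory does.

```python
import os


def touch_path(path):
    """Set the access and modification times of path to the current time.

    Passing None as the times argument tells os.utime to use "now".
    On a mounted master volume this is expected to reach the bricks as
    a metadata change (assumption based on the SETATTR comment in the
    diff above).
    """
    os.utime(path, None)


# Hypothetical usage, run after the script has set the checkpoint:
#   touch_path("/mnt/master")
```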

Comment 4 Aravinda VK 2016-04-26 10:02:33 UTC

Downstream Patch: https://code.engineering.redhat.com/gerrit/#/c/73033/

Comment 6 Rahul Hinduja 2016-05-02 14:35:02 UTC

Verified with the build: 
glusterfs-3.7.9-3.el7rhgs.x86_64
glusterfs-geo-replication-3.7.9-3.el7rhgs.x86_64

Ran the script when no I/O was in progress. The script successfully stopped geo-replication, started it, set the checkpoint, and stopped it again before exiting.

[root@dhcp37-182 scripts]# python /usr/share/glusterfs/scripts/schedule_georep.py Tom 10.70.37.122 Jerry
[    OK] Stopped Geo-replication
[    OK] Set Checkpoint
[    OK] Started Geo-replication and watching Status for Checkpoint completion
[    OK] All Checkpoints NOT COMPLETE, All status OK (Turns   1)
[    OK] All Checkpoints COMPLETE, All status OK (Turns   2)
[    OK] Stopping Geo-replication session now
[root@dhcp37-182 scripts]# 

Moving the bug to the verified state.

Comment 9 errata-xmlrpc 2016-06-23 05:18:15 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2016:1240
