Bug 1599440 - Scheduling a geo-replication session with cron does not start every time
Summary: Scheduling a geo-replication session with cron does not start every time
Keywords:
Status: CLOSED INSUFFICIENT_DATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rhhi
Version: rhhi-1.1
Hardware: x86_64
OS: Linux
Priority: medium
Severity: low
Target Milestone: ---
: ---
Assignee: Sahina Bose
QA Contact: SATHEESARAN
URL:
Whiteboard:
Depends On:
Blocks: 1724792
Reported: 2018-07-09 20:04 UTC by Adam Scerra
Modified: 2021-11-16 06:48 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-12-18 07:37:07 UTC
Embargoed:
khiremat: needinfo-


Attachments (Terms of Use)
Geo-replication session that doesnt start (184.68 KB, image/png)
2018-07-09 20:04 UTC, Adam Scerra

Description Adam Scerra 2018-07-09 20:04:29 UTC
Created attachment 1457584 [details]
Geo-replication session that doesnt start

Description of problem:
I have been using Ansible to automate the scheduling of geo-replication sessions using cron and 'python /usr/share/glusterfs/scripts/schedule_georep.py'.
Here is an example of the playbook that sets this cron job:
http://pastebin.test.redhat.com/615005

This method was taken from the following doc:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.3/html/administration_guide/chap-managing_geo-replication-schedule_cron_job

Using this method, I have found that the geo-replication session does not always start at the time the cron job is scheduled for. I have seen this happen many times: the cron job is scheduled for, say, 2:30, I start tracking the geo-rep session, 2:30 passes, and the job never kicks off.

When I create the geo-replication session, I set the job for 2 minutes in the future. As far as I can tell, the setup is always finished by then and the session is just waiting to start. Most of the time it does start, but about 20% of the time the geo-replication session never initiates.

I have also extended the delay so that the cron job kicks off 5 minutes after the geo-rep session has been created, and I still see the same behaviour: the geo-replication session remains in the Created state long after the cron job was supposed to start.

Version-Release number of selected component (if applicable):
gluster 3.8.4
rhhi 1.1

How reproducible:
About 20% of the time

Steps to Reproduce:
1. Configure geo-rep session
2. Find a time 5 minutes into the future and create a cron job to start a geo-replication session at that time
3. Watch 'gluster v geo-rep status' to check whether the geo-rep session starts at the time the job is set for.
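
Step 2 can be sketched in Python as below. This is only an illustration, not the playbook from the pastebin link: the helper name cron_entry and the volume and host names are hypothetical, and the positional arguments (master volume, slave host, slave volume) are an assumption based on the documentation linked above. The line follows the user-crontab format shown by 'crontab -l', so it has no user field.

```python
from datetime import datetime, timedelta

# Script path as given in this report.
SCRIPT = "/usr/share/glusterfs/scripts/schedule_georep.py"


def cron_entry(now, minutes_ahead, master_vol, slave_host, slave_vol):
    """Return a user-crontab line that fires daily at now + minutes_ahead.

    The argument order passed to the script is an assumption taken from the
    linked documentation, not something verified in this report.
    """
    # cron has one-minute resolution, so drop the seconds before adding the offset.
    run_at = now.replace(second=0, microsecond=0) + timedelta(minutes=minutes_ahead)
    command = f"python {SCRIPT} {master_vol} {slave_host} {slave_vol}"
    return f"{run_at.minute} {run_at.hour} * * * {command}"


if __name__ == "__main__":
    # Hypothetical volume and host names, for illustration only.
    print(cron_entry(datetime.now(), 5, "mastervol", "slave.example.com", "slavevol"))
```

Because the seconds are truncated, the entry fires between 4 and 5 minutes after the moment it is computed, not exactly 5 minutes later.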

Actual results:
About 20% of the time the geo-replication session never starts.

Expected results:
The geo-replication session should start at the time scheduled 100% of the time.

Additional info:
I have included a screenshot of this issue.
Left side of screen:
My Ansible playbook, which polls the 'gluster v geo-rep status' command. As you can see here, it fails if the scheduled time of the cron job has passed and the status has not changed from Created.

Upper Right screen:
This is the 'watch gluster v geo-rep status' command. As you can see, the geo-rep status is still Created.

Lower Right Screen:
This is the 'watch crontab -l' command. You can see that the job was scheduled to start at 3:58, but it never did.

You can see at the top of the screen that it is 3:59 and the session has not kicked off; it missed its window of opportunity.

This job never kicks off late; it simply waits until the next time it is scheduled.
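
The polling check described for the left side of the screenshot could look roughly like the sketch below. It is not the actual playbook logic: the function names, the timeout and interval values, and the simple test for a "Created" token in the status table are all assumptions made for illustration. The command is the long form of the 'gluster v geo-rep status' command used above.

```python
import subprocess
import time


def still_created(status_output):
    """Return True if any row of the status table still shows 'Created'."""
    # Assumption: the state appears as a standalone whitespace-separated token.
    return any("Created" in line.split() for line in status_output.splitlines())


def fetch_status():
    """Run the status command for all sessions and return its stdout."""
    result = subprocess.run(
        ["gluster", "volume", "geo-replication", "status"],
        capture_output=True, text=True, check=True,
    )
    return result.stdout


def wait_for_start(fetch=fetch_status, timeout_s=120, interval_s=5, sleep=time.sleep):
    """Poll until no row is in Created; return False once timeout_s has elapsed."""
    waited = 0
    while True:
        if not still_created(fetch()):
            return True
        if waited >= timeout_s:
            return False
        sleep(interval_s)
        waited += interval_s
```

Note that this sketch treats empty output (no sessions listed at all) as "started", so a real check would also need to confirm that the expected session is present.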

Comment 4 Sahina Bose 2018-09-05 09:14:03 UTC
Marking priority as medium, as the suggested way to schedule geo-rep is via the UI.

Kotresh, can you take a look at the failure to kick off geo-rep?

Comment 6 Sahina Bose 2018-12-18 07:37:07 UTC
Closing as no data available.
Please re-open if you have the requested data to debug this.

