Description of problem:
=======================
Carried out an in-service upgrade of a system with geo-replication from 3.1.3 to the 3.2 bits. The steps, in general, are as follows:

1. Stop geo-rep
2. Upgrade the Master and Slave
3. Start geo-rep

Once geo-rep is started and the crawl is in changelog mode, no new data is synced. While analyzing this, we found that the changelog rollover does not happen every 15 sec:

-rw-r--r--. 1 root root   184 Sep 30 14:58 CHANGELOG.1475227751
-rw-r--r--. 1 root root    51 Sep 30 15:01 CHANGELOG.1475248009
-rw-r--r--. 1 root root 36683 Sep 30 15:36 CHANGELOG
[root@dhcp37-58 changelogs]#

Restarting geo-rep and the volume resolved the issue.

Version-Release number of selected component (if applicable):
=============================================================
glusterfs-geo-replication-3.8.4-2.el7rhgs.x86_64
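For reference, the stop/start steps above correspond to geo-rep CLI commands along these lines (the volume and host names are placeholders, not the ones from this setup):

# 1. Stop geo-rep before the upgrade
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> stop

# 2. Upgrade the Master and Slave nodes to the 3.2 bits

# 3. Start geo-rep after the upgrade
gluster volume geo-replication <mastervol> <slavehost>::<slavevol> start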
Hi Rahul,

The logs captured are not helping in root-causing the issue. There are no apparent changelog errors before or after the upgrade. Hence, putting a needinfo on you to check whether it is consistently reproducible. Please give it one more try and let me know.

Regards,
Kotresh
QE has hit this issue again. We analysed it and the root cause is as follows.

Problem:
1. Just after the upgrade, geo-rep does not sync any data.
2. After the upgrade, geo-rep missed syncing a few files to the slave.

Cause:
RHEL 7 uses chronyd to synchronize time. When the node was rebooted, it came up with a wrong (future) time and was adjusted back to the correct time within an interval of around 30 secs:

Mar  9 22:48:21 dhcp37-47 dracut: *** Resolving executable dependencies ***
Mar  9 17:11:21 dhcp37-47 chronyd[987]: Selected source 96.126.105.86
Mar  9 17:11:21 dhcp37-47 chronyd[987]: System clock wrong by -20220.700769 seconds, adjustment started
Mar  9 17:11:21 dhcp37-47 systemd: Time has been changed
Mar  9 17:11:21 dhcp37-47 chronyd[987]: System clock was stepped by -20220.700769 seconds

The changelog xlator uses the 'time' syscall to get the current epoch time, adds 15 secs to form an abstime, and waits on 'pthread_cond_timedwait'. Every 15 secs this wait expires and a new changelog 'CHANGELOG.<TSTAMP>' is generated, where TSTAMP is the current time obtained via 'gettimeofday'.

Problem 1 occurs because:
When the brick came up, the changelog xlator got the future time when it called the 'time' syscall, so it is waiting on an abstime of (future time + 15 sec) in 'pthread_cond_timedwait'. Even though chronyd corrects the system time, 'pthread_cond_timedwait' only wakes up at 'future time + 15 sec' (in this case, around 5 hrs ahead). So the changelog is not rolled over and geo-rep syncs nothing; it will sync only after ~5 hrs, when it finally gets the changelog.

Problem 2 occurs because:
When the brick came up, the changelog xlator saw the future time for around 30 secs, so two changelogs were rolled over with a future timestamp. By the time the 'time' syscall was called again, chronyd had corrected the time, and the next changelog was rolled over with the correct time. In this case the changelog did not wait until the future time to roll over, but the changelogs were generated in the following order:

CHANGELOG.<t1>
CHANGELOG.<t5>
CHANGELOG.<t10>
CHANGELOG.<t100>
CHANGELOG.<t105>
CHANGELOG.<t15>
CHANGELOG.<t20>

The htime index file therefore does not list the changelogs in ascending order, and hence the history crawl misses a few changelogs. In the above case, if the range <t1> to <t20> is asked for, it gives t1, t5 and t10, and the rest are missed. This scenario occurred on one of the nodes, and hence geo-rep missed syncing a few files. The htime entries and the geo-rep log below confirm it.

/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489059602
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489059617   <<<< Before reboot --- correct time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489079883   <<<< After reboot --- future time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489079898   <<<< Future time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063441   <<<< Correct time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063456
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063471
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063486

[2017-03-09 12:46:42.291949] I [master(/rhs/brick1/b2):461:mgmt_lock] _GMaster: Didn't get lock : /rhs/brick1/b2 : Becoming PASSIVE
[2017-03-09 12:46:42.307940] I [master(/rhs/brick2/b5):468:mgmt_lock] _GMaster: Got lock : /rhs/brick2/b5 : Becoming ACTIVE
[2017-03-09 12:46:42.341409] I [master(/rhs/brick2/b5):1167:crawl] _GMaster: starting history crawl... turns: 1, stime: (1489052939, 0), etime: 1489063602
[2017-03-09 12:46:43.348171] I [master(/rhs/brick2/b5):1196:crawl] _GMaster: slave's time: (1489052939, 0)
[2017-03-09 12:47:42.480091] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:48:42.684154] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:49:42.859257] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:50:43.73237] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:51:43.398479] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:52:43.572623] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:53:43.825984] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:53:46.447371] I [master(/rhs/brick2/b5):1210:crawl] _GMaster: finished history crawl syncing, endtime: 1489079883, stime: (1489059616, 0)
[2017-03-09 12:53:46.455058] I [master(/rhs/brick2/b5):512:crawlwrap] _GMaster: primary master with volume id bdb045f2-56e8-4de8-9378-ebbec77043b7 ...

SOLUTION:
chronyd should be started before glusterd starts the bricks, and we should make sure the time has already been corrected by then. Geo-rep is just the victim here; the same problem could hit any component that uses pthread_cond_timedwait.
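To make the failure mode concrete, the rollover wait in the changelog xlator boils down to the pattern below. This is a simplified, self-contained sketch and not the actual gluster code: the names are illustrative, but the 15-second interval, the 'time'-based abstime, and the fact that pthread_cond_timedwait measures that abstime against CLOCK_REALTIME by default are the pieces that matter.

#include <pthread.h>
#include <stdio.h>
#include <time.h>

#define ROLLOVER_SECS 15

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

/* Simplified rollover wait.  The abstime is derived from the wall clock,
 * and pthread_cond_timedwait() measures it against CLOCK_REALTIME by
 * default.  If the node boots with a future time, abstime lands hours in
 * the future, and a later backward step by chronyd does not shorten the
 * wait -- the thread sleeps until the wall clock catches up (problem 1). */
static void changelog_rollover_wait(void)
{
        struct timespec abstime;

        abstime.tv_sec  = time(NULL) + ROLLOVER_SECS;   /* wall-clock based */
        abstime.tv_nsec = 0;

        pthread_mutex_lock(&lock);
        pthread_cond_timedwait(&cond, &lock, &abstime);
        pthread_mutex_unlock(&lock);
}

int main(void)
{
        int i;

        for (i = 0; i < 4; i++) {
                changelog_rollover_wait();
                /* The real xlator names the new changelog with the current
                 * wall-clock time, so a temporarily wrong clock also yields
                 * out-of-order CHANGELOG.<TSTAMP> names (problem 2). */
                printf("CHANGELOG.%ld rolled over\n", (long) time(NULL));
        }
        return 0;
}

Compiled with 'gcc -pthread', this rolls over every 15 seconds under a stable clock; step the clock forward before the time() call and the next wake-up is delayed by exactly that step.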
Patch : https://review.gluster.org/16889
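For reference, the start-up ordering described in the SOLUTION above can be expressed on a systemd host (RHEL 7) with a drop-in along these lines. This is only a sketch of the idea, not necessarily what the patch above does; the drop-in path and the reliance on time-sync.target are assumptions, and depending on the chrony version an additional unit such as chrony-wait.service may need to be enabled for time-sync.target to actually block until the clock is stepped.

# /etc/systemd/system/glusterd.service.d/10-wait-for-time-sync.conf  (hypothetical drop-in)
[Unit]
# Do not start glusterd (and hence the bricks and the changelog xlator)
# until the system clock has been synchronized.
Wants=time-sync.target
After=time-sync.target chronyd.service

After placing such a drop-in, 'systemctl daemon-reload' picks it up and the ordering takes effect on the next boot.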
(In reply to Kotresh HR from comment #6)
> Hi Rahul,
>
> The logs captured are not helping in root-causing the issue. There are no
> apparent changelog errors before or after the upgrade. Hence, putting a
> needinfo on you to check whether it is consistently reproducible. Please
> give it one more try and let me know.
>
> Regards,
> Kotresh

Clearing the needinfo, as the requested information is provided in comment 12 and comment 13.