QE has hit this issue again. We analysed the issue and the following is the root cause.

Problem:
1. Just after upgrade, geo-rep is not syncing data.
2. After upgrade, geo-rep missed syncing a few files to the slave.

Cause:

RHEL7 uses chronyd to synchronize time. But when the node was rebooted, it came up with a wrong (future) time and was adjusted back to the correct time within an interval of around 30 secs.

Mar 9 22:48:21 dhcp37-47 dracut: *** Resolving executable dependencies ***
Mar 9 17:11:21 dhcp37-47 chronyd[987]: Selected source 96.126.105.86
Mar 9 17:11:21 dhcp37-47 chronyd[987]: System clock wrong by -20220.700769 seconds, adjustment started
Mar 9 17:11:21 dhcp37-47 systemd: Time has been changed
Mar 9 17:11:21 dhcp37-47 chronyd[987]: System clock was stepped by -20220.700769 seconds

The changelog xlator uses the time() syscall to get the current epoch time, adds 15 secs to form an abstime, and waits on 'pthread_cond_timedwait'. So every 15 secs this wait expires and a new changelog is rolled over as 'CHANGELOG.<TSTAMP>', where TSTAMP is the current time obtained using 'gettimeofday'.
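For context, the waiting pattern described above boils down to something like the sketch below. This is a minimal illustrative sketch, not the actual changelog xlator code; the point is that pthread_cond_timedwait() takes an absolute deadline which, by default, is measured against CLOCK_REALTIME, so a deadline computed from a wrong wall clock is honored literally even after the clock is stepped back.

    #include <pthread.h>
    #include <time.h>

    static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
    static pthread_cond_t  cond = PTHREAD_COND_INITIALIZER;

    /* Illustrative rollover wait: sleep ~15 secs between changelog
       rollovers using an absolute CLOCK_REALTIME deadline. */
    static void
    rollover_wait (void)
    {
            struct timespec abstime = {0, };

            /* time() reads the wall clock; if the node booted ~5 hrs
               in the future, so is this deadline. */
            abstime.tv_sec = time (NULL) + 15;
            abstime.tv_nsec = 0;

            pthread_mutex_lock (&lock);
            /* The wait expires only when CLOCK_REALTIME reaches
               abstime.  When chronyd steps the clock back ~5 hrs,
               that adds ~5 hrs of extra waiting: exactly Problem 1. */
            pthread_cond_timedwait (&cond, &lock, &abstime);
            pthread_mutex_unlock (&lock);
    }

A common code-level remedy for this class of problem is to create the condition variable with pthread_condattr_setclock(&attr, CLOCK_MONOTONIC) and compute the deadline from clock_gettime(CLOCK_MONOTONIC, ...); the fix pursued below instead takes the service-ordering route.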
Problem 1 occurs because:
When the brick came up, the changelog xlator got the future time when it called the time() syscall, so it waits on an abstime of (future time + 15 sec) in pthread_cond_timedwait(). Even though chronyd corrects the system time, pthread_cond_timedwait() wakes up only at (future time + 15 sec), in this case around 5 hrs ahead. So the changelog is not rolled over, and geo-rep is not syncing anything; it will sync only after 5 hrs, when it gets the changelog.

Problem 2 occurs because:
When the brick came up, the changelog xlator got the future time for around 30 secs, so two changelogs got rolled over with future timestamps. By the time the time() syscall was called again, chronyd had corrected the time, and the next changelog got rolled over with the correct time. In this case the changelog did not wait until the future time to roll over, but the changelogs were generated in the following order:
CHANGELOG.<t1>
CHANGELOG.<t5>
CHANGELOG.<t10>
CHANGELOG.<t100>
CHANGELOG.<t105>
CHANGELOG.<t15>
CHANGELOG.<t20>
The htime index file therefore does not list the changelogs in ascending order, and the history crawl misses a few of them. In the above case, if <t1> to <t20> is asked for, it returns t1, t5 and t10, and the rest are missed.
The above scenario happened on one of the nodes, and hence geo-rep missed syncing a few files. The htime file and geo-rep log below confirm it.

/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489059602
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489059617   <<<< Before reboot --- correct time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489079883   <<<< After reboot --- future time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489079898   <<<< Future time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063441   <<<< Correct time
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063456
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063471
/rhs/brick2/b5/.glusterfs/changelogs/CHANGELOG.1489063486

[2017-03-09 12:46:42.291949] I [master(/rhs/brick1/b2):461:mgmt_lock] _GMaster: Didn't get lock : /rhs/brick1/b2 : Becoming PASSIVE
[2017-03-09 12:46:42.307940] I [master(/rhs/brick2/b5):468:mgmt_lock] _GMaster: Got lock : /rhs/brick2/b5 : Becoming ACTIVE
[2017-03-09 12:46:42.341409] I [master(/rhs/brick2/b5):1167:crawl] _GMaster: starting history crawl... turns: 1, stime: (1489052939, 0), etime: 1489063602
[2017-03-09 12:46:43.348171] I [master(/rhs/brick2/b5):1196:crawl] _GMaster: slave's time: (1489052939, 0)
[2017-03-09 12:47:42.480091] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:48:42.684154] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:49:42.859257] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:50:43.73237] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:51:43.398479] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:52:43.572623] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:53:43.825984] I [master(/rhs/brick1/b2):534:crawlwrap] _GMaster: 0 crawls, 0 turns
[2017-03-09 12:53:46.447371] I [master(/rhs/brick2/b5):1210:crawl] _GMaster: finished history crawl syncing, endtime: 1489079883, stime: (1489059616, 0)
[2017-03-09 12:53:46.455058] I [master(/rhs/brick2/b5):512:crawlwrap] _GMaster: primary master with volume id bdb045f2-56e8-4de8-9378-ebbec77043b7 ...

SOLUTION:
chronyd should be started before glusterd starts the bricks, and we should somehow make sure it has already fixed the time. Geo-rep is just the victim here; this could cause issues for any component that uses pthread_cond_timedwait.
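Since the proposed fix is at the service-ordering level, here is a rough sketch of what such an ordering dependency looks like in a systemd unit. This is illustrative only and is not copied from the posted patch. Note that After= only sequences unit startup; it does not by itself guarantee that chronyd has already stepped the clock, which is why the comment above says we should make sure it has fixed the time.

    # /usr/lib/systemd/system/glusterd.service (illustrative excerpt)
    [Unit]
    Description=GlusterFS, a clustered file-system server
    # Start glusterd only after the time-sync daemon has been started,
    # so bricks are less likely to see a wildly wrong wall clock.
    After=network.target chronyd.service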
REVIEW: https://review.gluster.org/16889 (glusterd.service: Ensure chronyd is started before GlusterD) posted (#1) for review on master by Atin Mukherjee (amukherj)
REVIEW: https://review.gluster.org/16889 (glusterd.service: Ensure chronyd is started before GlusterD) posted (#2) for review on master by Atin Mukherjee (amukherj)
It turned out to be a setup issue: the hypervisors used in the testing were not time-synced, which is a must for gluster storage nodes.