1247912 – Time drifts while live migration in guests used as NTP servers

Summary: Time drifts while live migration in guests used as NTP servers

Keywords:
Status:	CLOSED DUPLICATE of bug 1156194
Alias:	None
Product:	Red Hat Enterprise Virtualization Manager
Classification:	Red Hat
Component:	rhev-hypervisor
Sub Component:
Version:	3.5.1
Hardware:	x86_64
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	ovirt-3.6.0-rc3
Target Release:	---
Assignee:	Fabian Deutsch
QA Contact:	wanghui
Docs Contact:
URL:
Whiteboard:	node
Depends On:
Blocks:

Reported:	2015-07-29 08:36 UTC by pagupta
Modified:	2019-08-15 05:00 UTC
CC List:	13 users
Fixed In Version:
Doc Type:	Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed:	2015-10-20 09:58:30 UTC
oVirt Team:	Node
Target Upstream Version:
Embargoed:

Attachments
NTP VMs Migration log (30.03 KB, text/plain) 2015-08-25 03:03 UTC, Andrew Fung
Ext NTP VM - ntpq screen capture (7.00 KB, text/plain) 2015-08-25 03:04 UTC, Andrew Fung
Ext NTP VM - ntpq screen capture (6.70 KB, text/plain) 2015-08-25 03:04 UTC, Andrew Fung
Int NTP VM #1 - ntpq screen capture (8.59 KB, text/plain) 2015-08-25 03:05 UTC, Andrew Fung
Int NTP VM #2 - ntpq screen capture (8.51 KB, text/plain) 2015-08-25 03:05 UTC, Andrew Fung

Description pagupta 2015-07-29 08:36:31 UTC

Description of problem:

The guest's time drifts a second ahead when the guest is live migrated between two RHEL 6.6 hosts.

Version-Release number of selected component (if applicable):
rhevm-3.5.1.1-0.1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
Red Hat Enterprise Virtualization Hypervisor release 6.6                           

How reproducible:

Whenever the customer live migrates a guest to another host that is synced to the same time source, the guest's NTP offset changes.

Steps to Reproduce:
1. Migrate VM "VM1" from RHEV-H1 to RHEV-H2.
2. VM1 logs "ntpd: no servers reachable".

Actual results:

NTP Source offset changes to >22969.1

Expected results:

The NTP source offset should not change to such a high value.

Additional info:

The customer reports that the NTP offset increases with migration time.

Comment 2 Fabian Deutsch 2015-07-29 09:41:49 UTC

Can you also reproduce this when doing live migration with RHEL hosts?

Comment 6 Pavol Brilla 2015-08-12 14:32:06 UTC

I was not able to reproduce this issue.


15:09 "cmndntp01" logged "ntpd: no servers reachable"
From this message it looks more like the destination host has a network problem reaching the NTP server. Can you reproduce the issue while the destination host is pinging the NTP server during the migration? (The issue should not appear in that scenario.)
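
One way to watch for the "no servers reachable" condition from inside the guest is to look at the reach column of the ntpq peers billboard. The following is only a minimal sketch, assuming the usual ten-column `ntpq -pn` layout (remote, refid, st, t, when, poll, reach, delay, offset, jitter). The function names parse_ntpq_peers and all_unreachable are hypothetical and do not come from this bug.

```python
# Sketch only: assumes the standard ten-column `ntpq -pn` peers billboard.
TALLY_CHARS = "*+-#xo."


def parse_ntpq_peers(text):
    """Return one dict per peer line found in `ntpq -pn` output."""
    peers = []
    for line in text.splitlines():
        fields = line.split()
        # Skip the header row, the ===== separator, and malformed lines.
        if len(fields) != 10 or fields[0] == "remote":
            continue
        remote = fields[0]
        tally = ""
        # With -n the remote is a numeric address, so a leading symbol
        # is taken to be the tally code (an assumption of this sketch).
        if remote[0] in TALLY_CHARS:
            tally, remote = remote[0], remote[1:]
        peers.append({
            "tally": tally,
            "remote": remote,
            "reach": int(fields[6], 8),   # octal shift register of recent polls
            "offset": float(fields[8]),   # as printed by ntpq (milliseconds)
        })
    return peers


def all_unreachable(peers):
    """True when peers exist but none answered any recent poll."""
    return bool(peers) and all(p["reach"] == 0 for p in peers)
```

Running this against captures taken before, during, and after a migration would show whether reach drops to 0 on the destination host, which would point at connectivity rather than at the clock itself.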

Comment 8 Andrew Fung 2015-08-24 10:28:10 UTC

Current Architecture
                       [External NTP Source]
                                  |
                        |-------------------|
                   |--------|           |--------|   
                   RHEVH -Ext NTP     Ext NTP - RHEVH   
                 cmnnhv01 cmndntp01  cmndntp02  cmnnhv02
                     |      |           |        |
                     |      -------------        | 
                     |            |              |
                     |      -------------        |
                     |      |           |        |
                     |---Int NTP     Int NTP-----|
                         cmnndns01   cmnndns02
                            |           |
                            -------------
                                  |
                             NTP Clients  

1. The RHEVH hosts run both the External NTP Server VMs and the Internal NTP Server VMs.
2. The RHEVH hosts and the Ext NTP Servers each sync their time with the External NTP Source independently.
3. The Int NTP Servers use the Ext NTP Servers as their time source.
4. All internal NTP clients point at the Internal NTP Servers; DMZ NTP clients point at the External NTP Servers.
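
The hierarchy above can be written down as a small upstream map, which makes it easy to see which machines inherit an offset when one VM is migrated. This is an illustrative sketch only: the host names come from the diagram, while "external" is a placeholder for the External NTP Source, and the function names are hypothetical.

```python
# Upstream time sources per machine, transcribed from the diagram above.
# "external" stands in for the External NTP Source (a placeholder name).
UPSTREAM = {
    "cmnnhv01": ["external"],
    "cmnnhv02": ["external"],
    "cmndntp01": ["external"],
    "cmndntp02": ["external"],
    "cmnndns01": ["cmndntp01", "cmndntp02"],
    "cmnndns02": ["cmndntp01", "cmndntp02"],
}


def hops_to_external(node, upstream=UPSTREAM):
    """Shortest number of sync hops from node to the external source."""
    if node == "external":
        return 0
    return 1 + min(hops_to_external(src, upstream) for src in upstream[node])


def downstream_of(node, upstream=UPSTREAM):
    """Machines that sync, directly or indirectly, from node."""
    affected = set()
    frontier = [node]
    while frontier:
        current = frontier.pop()
        for child, sources in upstream.items():
            if current in sources and child not in affected:
                affected.add(child)
                frontier.append(child)
    return sorted(affected)
```

For example, downstream_of("cmndntp01") lists both internal NTP servers, so a time step on one external NTP VM can reach every internal client.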

Comment 9 Andrew Fung 2015-08-25 03:03:59 UTC

Created attachment 1066688 [details]
NTP VMs Migration log

Comment 10 Andrew Fung 2015-08-25 03:04:38 UTC

Created attachment 1066689 [details]
Ext NTP VM - ntpq screen capture

Comment 11 Andrew Fung 2015-08-25 03:04:55 UTC

Created attachment 1066690 [details]
Ext NTP VM - ntpq screen capture

Comment 12 Andrew Fung 2015-08-25 03:05:23 UTC

Created attachment 1066691 [details]
Int NTP VM #1 - ntpq screen capture

Comment 13 Andrew Fung 2015-08-25 03:05:46 UTC

Created attachment 1066702 [details]
Int NTP VM #2- ntpq screen capture

Comment 14 Andrew Fung 2015-08-25 03:43:24 UTC

Below are the VM migration times (from cmnnhv02 to cmnnhv01) and the offsets logged by the NTP VMs, with timestamps.

0943-0944 : Migration cmndntp01
0950-0951 : Migration cmndntp02
0954-0955 : Migration cmnndns01
0954 : cmndntp01 logged offset (422.899) with stdtime.gov.hk
1030 : cmnndns02 logged offset (-423.27) with cmndntp01
1029-1030 : Migration cmnndns02
1029 : cmnndns02 logged offset (-369.9) with cmndntp02 & (-423.22) with cmndntp01
1029 : cmnndns01 logged offset (120.316) with cmndntp02 & (67.054) with cmndntp01
1029 : cmndntp01 logged offset (423.456) with clock.cuhk.edu & (403.818) with stdtime.gov.hk
1029 : cmndntp02 logged offset (370.0) with clock.cuhk.edu
1058 : cmnndns02 logged offset (121.393) with cmndntp02 & (59.971) with cmndntp01
1059 : cmndntp02 logged offset (360.384) with clock.cuhk.edu & (369.932) with stdtime.gov.hk
1059 : cmnndns01 logged offset (117.404) with cmndntp02 & (96.738) with cmndntp01
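
The timeline above is regular enough to parse mechanically. The sketch below extracts (time, host, peer, offset) tuples from the "logged offset" lines and flags any offset above a limit. It assumes the offsets are in milliseconds, as ntpq prints them, and uses the 500 ms default maximum migration downtime mentioned in comment 16 as the default limit. The function names are hypothetical.

```python
import re

# "HHMM : host logged offset (x) with peer & (y) with peer"
LINE_RE = re.compile(r"^(\d{4})\s*:\s*(\S+) logged offset (.*)$")
PAIR_RE = re.compile(r"\((-?\d+(?:\.\d+)?)\)\s+with\s+(\S+)")


def parse_offsets(lines):
    """Return (hhmm, host, peer, offset) tuples from timeline lines."""
    records = []
    for line in lines:
        match = LINE_RE.match(line.strip())
        if not match:
            continue  # migration windows and other lines are ignored
        hhmm, host, rest = match.groups()
        for value, peer in PAIR_RE.findall(rest):
            records.append((hhmm, host, peer, float(value)))
    return records


def exceeding(records, limit_ms=500.0):
    """Records whose absolute offset is larger than limit_ms."""
    return [r for r in records if abs(r[3]) > limit_ms]
```

Feeding the lines of this comment to parse_offsets and then to exceeding gives a quick check of whether the logged offsets stay within the configured downtime window.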

Comment 15 Fabian Deutsch 2015-09-03 14:09:11 UTC

Michal, have you maybe seen something like this before?

Comment 16 Michal Skrivanek 2015-09-03 15:17:34 UTC

Not really.
There are some things which may help...

- In general it is not a good idea to use a VM as an NTP server. VMs always struggle with exact, microsecond-precision timing, even regardless of migration.

- Migration downtime during the handover from source to destination will always create a gap, which is at most equal to the configured maximum allowed downtime (500 ms by default).

- NTP is well designed for a resilient deployment on real hardware, with a small footprint. I would consider running it on existing real infrastructure.

- The actual behavior seems to indicate a problem with network connectivity on the destination host. It looks like some on-demand connection to the outside world is set up when the VM starts running there; maybe some NIC-related hooks are used, or a dial-up connection or something similar. That further complicates things, because after the migration the NTP server is unable to re-sync for some time.

It sounds to me like the solution is an architectural/deployment change.

Comment 17 Fabian Deutsch 2015-09-08 09:55:58 UTC

Andrew, does comment 16 help you?

Comment 18 Moran Goldboim 2015-10-20 09:58:30 UTC


*** This bug has been marked as a duplicate of bug 1156194 ***
