Bug 1247912 - Time drifts while live migration in guests used as NTP servers [NEEDINFO]
Status: CLOSED DUPLICATE of bug 1156194
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: rhev-hypervisor
Version: 3.5.1
Hardware: x86_64 Linux
Priority: high  Severity: high
Target Milestone: ovirt-3.6.0-rc3
Target Release: ---
Assigned To: Fabian Deutsch
QA Contact: wanghui
Whiteboard: node
Depends On:
Blocks:
 
Reported: 2015-07-29 04:36 EDT by pagupta
Modified: 2016-07-03 20:39 EDT
CC List: 14 users

See Also:
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-20 05:58:30 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
fdeutsch: needinfo? (andrew.fung)
fdeutsch: needinfo? (pagupta)


Attachments
NTP VMs Migration log (30.03 KB, text/plain)
2015-08-24 23:03 EDT, Andrew Fung
Ext NTP VM - ntpq screen capture (7.00 KB, text/plain)
2015-08-24 23:04 EDT, Andrew Fung
Ext NTP VM - ntpq screen capture (6.70 KB, text/plain)
2015-08-24 23:04 EDT, Andrew Fung
Int NTP VM #1 - ntpq screen capture (8.59 KB, text/plain)
2015-08-24 23:05 EDT, Andrew Fung
Int NTP VM #2 - ntpq screen capture (8.51 KB, text/plain)
2015-08-24 23:05 EDT, Andrew Fung

Description pagupta 2015-07-29 04:36:31 EDT
Description of problem:

Guest time drifts a second ahead when the guest is live migrated between two RHEL 6.6 hosts.

Version-Release number of selected component (if applicable):
rhevm-3.5.1.1-0.1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
Red Hat Enterprise Virtualization Hypervisor release 6.6                           

How reproducible:

When the customer live migrates a guest to another host synced to the same NTP source, the NTP offset changes.

Steps to Reproduce:
1. Trigger a migration of VM "VM1" from RHEV-H1 to RHEV-H2.
2. VM1 logs "ntpd: no servers reachable".

Actual results:

NTP Source offset changes to >22969.1

Expected results:

The NTP source offset should not change to such a high value.

Additional info:

The customer reports that the NTP offset increases with migration time.
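
To make the offset change easier to capture before and after a migration, the peer table printed by `ntpq -p` can be parsed and logged. The following is only a minimal sketch: the function names are hypothetical, and it assumes the standard ten-column peers billboard (remote, refid, st, t, when, poll, reach, delay, offset, jitter) with the tally character still present in the first column of each row.

```python
from typing import List, NamedTuple


class Peer(NamedTuple):
    remote: str       # peer name or address as printed by ntpq
    reach: int        # reachability register (printed in octal by ntpq)
    offset_ms: float  # offset column, which ntpq reports in milliseconds


def parse_ntpq_peers(text: str) -> List[Peer]:
    """Parse the text of an `ntpq -p` peers billboard (hypothetical helper).

    Assumes each data row keeps its one-character tally code (space, '*',
    '+', '-', ...) in column 0, followed by ten whitespace-separated fields.
    """
    peers = []
    for line in text.splitlines():
        fields = line[1:].split()  # drop the tally column, then split
        if len(fields) != 10 or fields[0] == "remote":
            continue  # skips the header, the '====' rule and blank lines
        try:
            reach = int(fields[6], 8)
            offset = float(fields[8])
        except ValueError:
            continue  # not a data row
        peers.append(Peer(fields[0], reach, offset))
    return peers


def unreachable(peers: List[Peer]) -> List[Peer]:
    """Return peers whose reach register is 0 (no recent successful polls)."""
    return [p for p in peers if p.reach == 0]
```

Sampling this on the guest shortly before and after the migration would show whether the offset jump coincides with peers becoming unreachable, which is what the "ntpd: no servers reachable" message suggests.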
Comment 2 Fabian Deutsch 2015-07-29 05:41:49 EDT
Can you also reproduce this when doing live migration with RHEL hosts?
Comment 6 Pavol Brilla 2015-08-12 10:32:06 EDT
I was not able to reproduce this issue.


15:09 "cmndntp01" logged "ntpd: no servers reachable" 
From this message it looks more like the destination host has a network issue connecting to the NTP server. Are you able to reproduce the migration while the destination host is pinging the NTP server during the migration? (The issue should not appear in this scenario.)
Comment 8 Andrew Fung 2015-08-24 06:28:10 EDT
Current Architecture
                       [External NTP Source]
                                  |
                        |-------------------|
                   |--------|           |--------|   
                   RHEVH -Ext NTP     Ext NTP - RHEVH   
                 cmnnhv01 cmndntp01  cmndntp02  cmnnhv02
                     |      |           |        |
                     |      -------------        | 
                     |            |              |
                     |      -------------        |
                     |      |           |        |
                     |---Int NTP     Int NTP-----|
                         cmnndns01   cmnndns02
                            |           |
                            -------------
                                  |
                             NTP Clients  

1. The RHEVH hosts run the External NTP Server VMs and the Internal NTP Server VMs.
2. The RHEVH hosts and the Ext NTP Servers sync time with the External NTP Source independently.
3. The Int NTP Servers source their time from the Ext NTP Servers.
4. All internal NTP clients look at the Internal NTP Servers; DMZ NTP clients look at the External NTP Servers.
Comment 9 Andrew Fung 2015-08-24 23:03:59 EDT
Created attachment 1066688
NTP VMs Migration log
Comment 10 Andrew Fung 2015-08-24 23:04:38 EDT
Created attachment 1066689
Ext NTP VM - ntpq screen capture
Comment 11 Andrew Fung 2015-08-24 23:04:55 EDT
Created attachment 1066690
Ext NTP VM - ntpq screen capture
Comment 12 Andrew Fung 2015-08-24 23:05:23 EDT
Created attachment 1066691
Int NTP VM #1 - ntpq screen capture
Comment 13 Andrew Fung 2015-08-24 23:05:46 EDT
Created attachment 1066702
Int NTP VM #2 - ntpq screen capture
Comment 14 Andrew Fung 2015-08-24 23:43:24 EDT
Below are the VM migration times (from cmnnhv02 to cmnnhv01) and the times at which the NTP VMs logged offsets.

0943-0944 : Migration cmndntp01
0950-0951 : Migration cmndntp02
0954-0955 : Migration cmnndns01
0954 : cmndntp01 logged offset (422.899) with stdtime.gov.hk
1030 : cmnndns02 logged offset (-423.27) with cmndntp01
1029-1030 : Migration cmnndns02
1029 : cmnndns02 logged offset (-369.9) with cmndntp02 & (-423.22) with cmndntp01
1029 : cmnndns01 logged offset (120.316) with cmndntp02 & (67.054) with cmndntp01
1029 : cmndntp01 logged offset (423.456) with clock.cuhk.edu & (403.818) with stdtime.gov.hk
1029 : cmndntp02 logged offset (370.0) with clock.cuhk.edu
1058 : cmnndns02 logged offset (121.393) with cmndntp02 & (59.971) with cmndntp01
1059 : cmndntp02 logged offset (360.384) with clock.cuhk.edu & (369.932) with stdtime.gov.hk
1059 : cmnndns01 logged offset (117.404) with cmndntp02 & (96.738) with cmndntp01
Comment 15 Fabian Deutsch 2015-09-03 10:09:11 EDT
Michal, have you maybe seen something like this before?
Comment 16 Michal Skrivanek 2015-09-03 11:17:34 EDT
Not really.
There are some things which may help...

- It's generally not a good idea to use a VM as an NTP server. VMs always struggle with exact microsecond-precision timing, even regardless of migration.

- Migration downtime during the handover from src to dst will always create a gap, which is at most equal to the configured maximum allowed downtime (500 ms by default).

- NTP is well designed for a resilient deployment on real hardware, with a small footprint. I would consider running it on existing real infrastructure.

- The actual behavior seems to indicate a problem with network connectivity on the destination host (it looks like some on-demand connection to the outside world is set up when the VM starts running there. Maybe some NIC-related hooks are used, or a dial-up or something?). That further complicates things, as after the migration the NTP server is unable to re-sync for some time.

It sounds to me that the solution is an architectural/deployment change.
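
As a rough way to relate the downtime gap to the offsets logged in comment 14, the values can be compared against the 500 ms default. This is only a sketch under assumptions: the logged offsets are taken to be in milliseconds (the unit ntpq uses for its offset column), the sample list copies a few entries from comment 14, and the helper name is hypothetical.

```python
# Default maximum migration downtime quoted in this comment, in milliseconds.
DEFAULT_MAX_DOWNTIME_MS = 500.0

# (time, guest, peer, offset) entries copied from comment 14.
# ASSUMPTION: the offsets are in milliseconds, as in ntpq output.
OBSERVED = [
    ("0954", "cmndntp01", "stdtime.gov.hk", 422.899),
    ("1029", "cmndntp02", "clock.cuhk.edu", 370.0),
    ("1029", "cmnndns02", "cmndntp01", -423.22),
]


def within_downtime(offset_ms: float,
                    max_downtime_ms: float = DEFAULT_MAX_DOWNTIME_MS) -> bool:
    """True if the offset magnitude could be covered by a single downtime gap."""
    return abs(offset_ms) <= max_downtime_ms


for when, guest, peer, offset in OBSERVED:
    verdict = "within" if within_downtime(offset) else "exceeds"
    print(f"{when} {guest} vs {peer}: {offset} ms {verdict} the downtime limit")
```

Under that unit assumption, the offsets in comment 14 stay below the default downtime, while the >22969.1 value from the original description would not, so the two observations may have different causes.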
Comment 17 Fabian Deutsch 2015-09-08 05:55:58 EDT
Andrew, does comment 16 help you?
Comment 18 Moran Goldboim 2015-10-20 05:58:30 EDT

*** This bug has been marked as a duplicate of bug 1156194 ***
