Bug 1247912

Summary: Time drifts during live migration in guests used as NTP servers
Product: Red Hat Enterprise Virtualization Manager
Reporter: pagupta
Component: rhev-hypervisor
Assignee: Fabian Deutsch <fdeutsch>
Status: CLOSED DUPLICATE
QA Contact: wanghui <huiwa>
Severity: high
Docs Contact:
Priority: high
Version: 3.5.1
CC: andrew.fung, ecohen, juzhang, leiwang, lsurette, mgoldboi, michal.skrivanek, pagupta, pbrilla, pstehlik, scui, ycui, yeylon
Target Milestone: ovirt-3.6.0-rc3
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard: node
Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2015-10-20 09:58:30 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Attachments (Description | Flags):
NTP VMs Migration log | none
Ext NTP VM - ntpq screen capture | none
Ext NTP VM - ntpq screen capture | none
Int NTP VM #1 - ntpq screen capture | none
Int NTP VM #2 - ntpq screen capture | none

Description pagupta 2015-07-29 08:36:31 UTC
Description of problem:

Guest time drifts a second ahead when live migrated between two RHEL 6.6 hosts.

Version-Release number of selected component (if applicable):
rhevm-3.5.1.1-0.1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
Red Hat Enterprise Virtualization Hypervisor release 6.6                           

How reproducible:

When the customer live-migrates guests to another host synced to the same source, the NTP offset changes.

Steps to Reproduce:
1. Trigger a migration of VM "VM1" from RHEV-H1 to RHEV-H2
2. VM1 logs "ntpd: no servers reachable"
Actual results:

NTP Source offset changes to >22969.1

Expected results:

The NTP source offset should not change to such a high value.

Additional info:

The customer says the NTP offset increases with migration time.

Comment 2 Fabian Deutsch 2015-07-29 09:41:49 UTC
Can you also reproduce this when doing live migration with RHEL hosts?

Comment 6 Pavol Brilla 2015-08-12 14:32:06 UTC
I was not able to reproduce this issue.


15:09 "cmndntp01" logged "ntpd: no servers reachable" 
From this message, it looks more like the destination host has a network issue connecting to the NTP server. Are you able to reproduce the problem with the destination host pinging the NTP server during migration? (The issue should not appear in this scenario.)
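
One way to watch NTP reachability from the destination host while a migration runs is a small SNTP probe. The following is only a sketch, not something taken from this bug: the function names (build_request, parse_transmit_time, probe, watch) are made up for illustration, the offset estimate assumes a symmetric network delay, and it is no substitute for ntpq or ntpd's own logs.

```python
import socket
import struct
import time

# Seconds between the NTP epoch (1900-01-01) and the Unix epoch (1970-01-01).
NTP_EPOCH_DELTA = 2208988800


def build_request():
    """Build a 48-byte SNTP client request (LI=0, VN=3, Mode=3)."""
    return b"\x1b" + 47 * b"\0"


def parse_transmit_time(packet):
    """Return the server's transmit timestamp as Unix seconds."""
    if len(packet) < 48:
        raise ValueError("short NTP packet")
    seconds, fraction = struct.unpack("!II", packet[40:48])
    return seconds - NTP_EPOCH_DELTA + fraction / 2**32


def probe(server, timeout=2.0):
    """Return a rough clock offset in seconds, or None if unreachable."""
    with socket.socket(socket.AF_INET, socket.SOCK_DGRAM) as sock:
        sock.settimeout(timeout)
        try:
            sent = time.time()
            sock.sendto(build_request(), (server, 123))
            packet, _ = sock.recvfrom(512)
            received = time.time()
        except OSError:  # covers timeouts and name-resolution failures
            return None
    try:
        server_time = parse_transmit_time(packet)
    except ValueError:
        return None
    # Assumption: the reply was sent roughly midway through the round trip.
    return server_time - (sent + received) / 2


def watch(server, interval=1.0, count=60):
    """Print one reachability/offset line per probe."""
    for _ in range(count):
        offset = probe(server)
        stamp = time.strftime("%H:%M:%S")
        status = "unreachable" if offset is None else f"offset {offset:+.3f}s"
        print(stamp, status)
        time.sleep(interval)
```

Running something like watch("stdtime.gov.hk") on the destination host during the migration window would show whether the upstream source stays reachable there. The server name should be whichever source the host is actually configured to use.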

Comment 8 Andrew Fung 2015-08-24 10:28:10 UTC
Current Architecture
                       [External NTP Source]
                                  |
                        |-------------------|
                   |--------|           |--------|   
                   RHEVH -Ext NTP     Ext NTP - RHEVH   
                 cmnnhv01 cmndntp01  cmndntp02  cmnnhv02
                     |      |           |        |
                     |      -------------        | 
                     |            |              |
                     |      -------------        |
                     |      |           |        |
                     |---Int NTP     Int NTP-----|
                         cmnndns01   cmnndns02
                            |           |
                            -------------
                                  |
                             NTP Clients  

1. RHEVH hosts the External NTP Server VMs and the Internal NTP Server VMs.
2. RHEVH and the Ext NTP Servers each sync time with the External NTP Source independently.
3. The Int NTP Servers use the Ext NTP Servers as their source.
4. All internal NTP clients point at the Internal NTP Servers; DMZ NTP clients point at the External NTP Servers.

Comment 9 Andrew Fung 2015-08-25 03:03:59 UTC
Created attachment 1066688 [details]
NTP VMs Migration log

Comment 10 Andrew Fung 2015-08-25 03:04:38 UTC
Created attachment 1066689 [details]
Ext NTP VM - ntpq screen capture

Comment 11 Andrew Fung 2015-08-25 03:04:55 UTC
Created attachment 1066690 [details]
Ext NTP VM - ntpq screen capture

Comment 12 Andrew Fung 2015-08-25 03:05:23 UTC
Created attachment 1066691 [details]
Int NTP VM #1 - ntpq screen capture

Comment 13 Andrew Fung 2015-08-25 03:05:46 UTC
Created attachment 1066702 [details]
Int NTP VM #2- ntpq screen capture

Comment 14 Andrew Fung 2015-08-25 03:43:24 UTC
Below are the VM migration times (from cmnnhv02 to cmnnhv01) and the times at which the NTP VMs logged offsets.

0943-0944 : Migration cmndntp01
0950-0951 : Migration cmndntp02
0954-0955 : Migration cmnndns01
0954 : cmndntp01 logged offset (422.899) with stdtime.gov.hk
1030 : cmnndns02 logged offset (-423.27) with cmndntp01
1029-1030 : Migration cmnndns02
1029 : cmnndns02 logged offset (-369.9) with cmndntp02 & (-423.22) with cmndntp01
1029 : cmnndns01 logged offset (120.316) with cmndntp02 & (67.054) with cmndntp01
1029 : cmndntp01 logged offset (423.456) with clock.cuhk.edu & (403.818) with stdtime.gov.hk
1029 : cmndntp02 logged offset (370.0) with clock.cuhk.edu
1058 : cmnndns02 logged offset (121.393) with cmndntp02 & (59.971) with cmndntp01
1059 : cmndntp02 logged offset (360.384) with clock.cuhk.edu & (369.932) with stdtime.gov.hk
1059 : cmnndns01 logged offset (117.404) with cmndntp02 & (96.738) with cmndntp01
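
To compare the entries above more easily, they can be tabulated with a short script. This is a hypothetical sketch: the helper names (parse_offsets, worst_per_vm) are illustrative, the offset lines are copied from the timeline above, and offsets are kept in whatever unit the VMs logged them in.

```python
import re

# Offset entries copied from the timeline above (migration lines omitted).
TIMELINE = """\
0954 : cmndntp01 logged offset (422.899) with stdtime.gov.hk
1030 : cmnndns02 logged offset (-423.27) with cmndntp01
1029 : cmnndns02 logged offset (-369.9) with cmndntp02 & (-423.22) with cmndntp01
1029 : cmnndns01 logged offset (120.316) with cmndntp02 & (67.054) with cmndntp01
1029 : cmndntp01 logged offset (423.456) with clock.cuhk.edu & (403.818) with stdtime.gov.hk
1029 : cmndntp02 logged offset (370.0) with clock.cuhk.edu
1058 : cmnndns02 logged offset (121.393) with cmndntp02 & (59.971) with cmndntp01
1059 : cmndntp02 logged offset (360.384) with clock.cuhk.edu & (369.932) with stdtime.gov.hk
1059 : cmnndns01 logged offset (117.404) with cmndntp02 & (96.738) with cmndntp01
"""

ENTRY = re.compile(r"^(\d{4}) : (\S+) logged offset (.+)$")
PAIR = re.compile(r"\((-?\d+(?:\.\d+)?)\) with (\S+)")


def parse_offsets(text):
    """Return (time, vm, peer, offset) tuples for every logged offset."""
    rows = []
    for line in text.splitlines():
        match = ENTRY.match(line.strip())
        if not match:
            continue
        stamp, vm, rest = match.groups()
        for offset, peer in PAIR.findall(rest):
            rows.append((stamp, vm, peer, float(offset)))
    return rows


def worst_per_vm(rows):
    """Return the largest-magnitude offset logged by each VM."""
    worst = {}
    for _, vm, _, offset in rows:
        if vm not in worst or abs(offset) > abs(worst[vm]):
            worst[vm] = offset
    return worst


rows = parse_offsets(TIMELINE)
for vm, offset in sorted(worst_per_vm(rows).items()):
    print(vm, offset)
```

Keeping the peer name in each row also makes it possible to group by upstream source instead of by VM.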

Comment 15 Fabian Deutsch 2015-09-03 14:09:11 UTC
Michal, have you maybe seen something like this before?

Comment 16 Michal Skrivanek 2015-09-03 15:17:34 UTC
Not really.
There are some things which may help...

- It's not a good idea in general to use a VM as an NTP server. VMs always struggle with exact microsecond-precision timing, even regardless of migration.

- Migration downtime during the handover from src to dst will always create a gap, which is at most equal to the configured maximum allowed downtime (500 ms by default).

- NTP is well designed for a resilient deployment on real hardware, with a small footprint. I would consider running it on existing real infrastructure.

- The actual behavior seems to indicate a problem with network connectivity on the destination host (it looks like some on-demand connection to the outside world when the VM starts running there; maybe some NIC-related hooks are used, or a dial-up or something?). That further complicates things, as after the migration the NTP server is unable to re-sync for some time.

It sounds to me like the solution is an architectural/deployment change.
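
As a rough illustration of the downtime point, the offsets reported in this bug can be compared against the 500 ms default. This is only a sketch under a stated assumption: the logged offsets are taken to be in milliseconds, as in ntpq's offset column, which the bug itself does not confirm. The function name is hypothetical.

```python
# Default maximum migration downtime quoted in the comment above.
MAX_DOWNTIME_MS = 500.0


def within_downtime_gap(offset_ms, max_downtime_ms=MAX_DOWNTIME_MS):
    """True if a downtime gap alone could account for this offset."""
    return abs(offset_ms) <= max_downtime_ms


# Values taken from this bug; the millisecond unit is an assumption.
observed = {
    "cmndntp01 at 1029 (comment 14)": 423.456,
    "offset in the description": 22969.1,
}

for label, offset in observed.items():
    verdict = "within" if within_downtime_gap(offset) else "beyond"
    print(f"{label}: {offset} ms is {verdict} the downtime gap")
```

Under that assumption, the offsets in comment 14 fit inside a single downtime gap, while the value in the description does not, which would be consistent with the NTP server failing to re-sync for some time after the migration.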

Comment 17 Fabian Deutsch 2015-09-08 09:55:58 UTC
Andrew, does comment 16 help you?

Comment 18 Moran Goldboim 2015-10-20 09:58:30 UTC

*** This bug has been marked as a duplicate of bug 1156194 ***