Description of problem:
Guest time drifts a second ahead when live migrated between two RHEL 6.6 hosts.

Version-Release number of selected component (if applicable):
rhevm-3.5.1.1-0.1.el6ev.noarch
qemu-kvm-rhev-0.12.1.2-2.446.el6.x86_64
Red Hat Enterprise Virtualization Hypervisor release 6.6

How reproducible:
When the customer live migrates guests to another host synced to the same source, the NTP offset changes.

Steps to Reproduce:
1. Trigger migration of VM "VM1" from RHEV-H1 to RHEV-H2
2. VM1 logs "ntpd: no servers reachable"

Actual results:
NTP source offset changes to >22969.1

Expected results:
NTP source offset should not change to such a high value.

Additional info:
Customer says the NTP offset increases with migration time.
Can you also reproduce this when doing live migration with RHEL hosts?
I was not able to reproduce this issue.

At 15:09, "cmndntp01" logged "ntpd: no servers reachable". From this message, it looks more like the destination host has a network issue connecting to the NTP server. Are you able to reproduce the problem while the destination host is pinging the NTP server during the migration? (The issue should not appear in this scenario.)
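A complementary check to the ping test suggested above is to watch the `reach` column of `ntpq -p` on the migrated guest: it is an octal shift register of the last eight polls, so a value of 0 means no recent poll of that peer succeeded. The following is only a minimal sketch of such a check. The helper names and the sample billboard (including the `ntp-a.example` and `ntp-b.example` peers and all their values) are hypothetical and are not taken from the attachments on this bug.

```python
from typing import List, NamedTuple


class Peer(NamedTuple):
    tally: str        # first character of the line (e.g. '*', '+', ' ')
    remote: str       # peer name as printed by ntpq
    reach: int        # reachability register, parsed from octal
    offset_ms: float  # offset column, which ntpq -p prints in milliseconds


def parse_ntpq_peers(text: str) -> List[Peer]:
    """Parse the peer lines of `ntpq -p` style output."""
    peers = []
    for line in text.splitlines():
        # Skip blank lines, the column header and the '====' separator.
        if not line.strip() or line.lstrip().startswith("remote") or line.startswith("="):
            continue
        tally, fields = line[0], line[1:].split()
        if len(fields) != 10:
            continue  # not a peer line in the standard 10-column layout
        peers.append(Peer(tally, fields[0], int(fields[6], 8), float(fields[8])))
    return peers


def unreachable(peers: List[Peer]) -> List[str]:
    """Return the peers whose reach register is 0 (no recent poll succeeded)."""
    return [p.remote for p in peers if p.reach == 0]


# Hypothetical sample output, for illustration only.
SAMPLE = """\
     remote           refid      st t when poll reach   delay   offset  jitter
==============================================================================
*ntp-a.example   .GPS.            1 u   33   64  377    2.100    0.512   0.210
 ntp-b.example   .INIT.          16 u    -   64    0    0.000    0.000   0.000
"""

if __name__ == "__main__":
    for name in unreachable(parse_ntpq_peers(SAMPLE)):
        print("no recent successful poll:", name)
```

Running this periodically against real `ntpq -p` output on the guest before, during and after a migration would show whether reachability drops only once the VM is running on the destination host.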
Current Architecture

                   [External NTP Source]
                             |
     |--------------|--------|--------|--------------|
     |              |                 |              |
   RHEVH ------ Ext NTP           Ext NTP ------ RHEVH
  cmnnhv01     cmndntp01         cmndntp02      cmnnhv02
     |              |                 |              |
     |              -------------------              |
     |                  |         |                  |
     |------------- Int NTP     Int NTP -------------|
                   cmnndns01   cmnndns02
                        |         |
                        -----------
                             |
                        NTP Clients

1. The RHEVH hosts run the External NTP Server VMs and the Internal NTP Server VMs.
2. The RHEVH hosts and the Ext NTP Servers sync time with the External NTP Source independently.
3. The Int NTP Servers source their time from the Ext NTP Servers.
4. All internal NTP clients look at the Internal NTP Servers; DMZ NTP clients look at the External NTP Servers.
Created attachment 1066688 [details] NTP VMs Migration log
Created attachment 1066689 [details] Ext NTP VM - ntpq screen capture
Created attachment 1066690 [details] Ext NTP VM - ntpq screen capture
Created attachment 1066691 [details] Int NTP VM #1 - ntpq screen capture
Created attachment 1066702 [details] Int NTP VM #2- ntpq screen capture
Below is the VM Migration time (from cmnnhv02 to cmnnhv01), and the NTP VMs logged offset timestamp.

0943-0944 : Migration cmndntp01
0950-0951 : Migration cmndntp02
0954-0955 : Migration cmnndns01
0954 : cmndntp01 logged offset (422.899) with stdtime.gov.hk
1030 : cmnndns02 logged offset (-423.27) with cmndntp01
1029-1030 : Migration cmnndns02
1029 : cmnndns02 logged offset (-369.9) with cmndntp02 & (-423.22) with cmndntp01
1029 : cmnndns01 logged offset (120.316) with cmndntp02 & (67.054) with cmndntp01
1029 : cmndntp01 logged offset (423.456) with clock.cuhk.edu & (403.818) with stdtime.gov.hk
1029 : cmndntp02 logged offset (370.0) with clock.cuhk.edu
1058 : cmnndns02 logged offset (121.393) with cmndntp02 & (59.971) with cmndntp01
1059 : cmndntp02 logged offset (360.384) with clock.cuhk.edu & (369.932) with stdtime.gov.hk
1059 : cmnndns01 logged offset (117.404) with cmndntp02 & (96.738) with cmndntp01
Michal, have you maybe seen something like this before?
Not really.

There are some things which may help...
- It's not a good idea in general to use a VM as an NTP server. VMs always struggle with exact microsecond-precision timing, even regardless of migration.
- Migration downtime during the handover from src to dst will always create a gap, which is at most equal to the maximum allowed downtime value as configured (500ms default).
- NTP is well designed for a resilient deployment on real hw, with a small footprint. I would consider running it on existing real infrastructure.
- The actual behavior seems to indicate a problem with network connectivity on the destination host (it looks like some on-demand connection to the outside world when the VM starts running there. Maybe some NIC-related hooks are used, or a dial-up or something?). That further complicates things, as after the migration the NTP server is unable to re-sync for some time.

It sounds to me that the solution is an architectural/deployment change.
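To put the downtime bound from the comment above next to the numbers in this report, here is a rough sketch that compares reported offsets against the 500 ms default. It assumes the offsets are in milliseconds, which is the unit `ntpq -p` prints; the report itself does not state a unit. The function names are made up for illustration, and only a subset of the logged values is included.

```python
# Default maximum migration downtime quoted in the comment above.
MAX_DOWNTIME_MS = 500.0

# Offsets copied from this report. ASSUMPTION: they are in milliseconds
# (the unit ntpq -p uses); the report does not say so explicitly.
LOGGED_OFFSETS = {
    "cmndntp01 vs stdtime.gov.hk (0954)": 422.899,
    "cmnndns02 vs cmndntp01 (1030)": -423.27,
    "cmndntp01 vs clock.cuhk.edu (1029)": 423.456,
    "cmndntp02 vs clock.cuhk.edu (1029)": 370.0,
    "value from the original description": 22969.1,
}


def exceeds_downtime(offset_ms, limit_ms=MAX_DOWNTIME_MS):
    """True if the offset magnitude is larger than the downtime bound."""
    return abs(offset_ms) > limit_ms


def over_limit(offsets, limit_ms=MAX_DOWNTIME_MS):
    """Return the labels of all offsets that exceed the downtime bound."""
    return [label for label, value in offsets.items()
            if exceeds_downtime(value, limit_ms)]


if __name__ == "__main__":
    for label in over_limit(LOGGED_OFFSETS):
        print("larger than the handover gap can explain:", label)
```

Under that unit assumption, the offsets logged around the migrations stay below the configured downtime, while the value in the original description does not, which would be consistent with the suggestion that something besides the handover gap (for example the loss of connectivity on the destination host) is involved.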
Andrew, does comment 16 help you?
*** This bug has been marked as a duplicate of bug 1156194 ***