Bug 1476650

Summary: RHV-H Upgrade Breaks System Clock Sync
Product: Red Hat Enterprise Virtualization Manager
Reporter: Germano Veit Michel <gveitmic>
Component: ovirt-node-ng
Assignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA
QA Contact: Huijuan Zhao <huzhao>
Severity: urgent
Docs Contact:
Priority: unspecified
Version: 4.1.4
CC: apinnick, cshao, dfediuck, dguo, gveitmic, huzhao, jiawu, mgoldboi, mkalinin, qiyuan, rbarry, sbonazzo, trichard, weiwang, yaniwang, ycui, yzhao
Target Milestone: ovirt-4.1.5
Keywords: ZStream
Target Release: ---
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version: redhat-release-virtualization-host-4.1-5.0
Doc Type: Bug Fix
Doc Text:
Later versions of RHV 4.1 dropped ntpd as a dependency from VDSM in favor of chrony. Earlier versions, however, had included ntpd and users had come to rely on it for time configuration. When the dependency was dropped, RHVH no longer included ntpd. Now, RHVH images once more include ntpd. As a result, ntpd configurations from earlier versions of RHV will continue to work.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-08-22 17:44:45 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1481952
Bug Blocks:

Description Germano Veit Michel 2017-07-31 03:17:43 UTC
Description of problem:

Bugzilla #1428419 (switch VDSM dependency from ntp to chrony) removed the ntp dependency from vdsm and added chrony in RHV 4.1.2. This means NTP is not pulled into RHV-H images, only chrony.

What about previous installations (4.1.1 and lower, down to 3.6 NGN) which had NTP configured? Our documentation says it's important to configure clock synchronization, but then suddenly the NTP package is gone from the image. This breaks every installation that configured NTP in the past, potentially leaving lots of hosts with time sync disabled.

After upgrading to 4.1.2+, the user's ntpd configuration is persisted but the package/service is not. Because of this, time synchronization may break and the clock can drift out of sync. If the user doesn't pay attention to the engine warnings, bad things can happen.

NTP should still be included in the images for backward compatibility for the users that have already configured it.

Steps to Reproduce:
1. Install 4.0.x or 4.1.1 (including 3.6 NGN?)

rhvh-4.0-0.20170307.0:
# rpm -qa | egrep 'ntp-|chrony-'
chrony-2.1.1-4.el7_3.x86_64
ntp-4.2.6p5-25.el7_3.1.x86_64

2. Set up NTP

# grep special /etc/ntp.conf
##### my special config here
# systemctl enable ntpd
# systemctl start ntpd

3. Upgrade to latest (4.1-20170706.0):
# rpm -qa | egrep 'ntp-|chrony-'
chrony-2.1.1-4.el7_3.x86_64
# grep special /etc/ntp.conf
##### my special config here
# systemctl status ntpd
Unit ntpd.service could not be found.
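
The condition reproduced above (an /etc/ntp.conf left behind while the ntp package is gone) could be detected with a small check like the one below. This is only a sketch: the function names `has_ntp_package` and `check_ntp_orphaned` are hypothetical helpers, not part of RHV-H or imgbased, and the package-name pattern assumes the `ntp-<version>` naming seen in the `rpm -qa` output in this report.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not shipped with RHV-H.

# Reads `rpm -qa` style output on stdin and succeeds if an ntp package
# (for example ntp-4.2.6p5-25.el7_3.1.x86_64) is listed.
has_ntp_package() {
    grep -E '^ntp-[0-9]' >/dev/null
}

# Prints a one-line verdict.
#   $1    path to the ntpd configuration file (normally /etc/ntp.conf)
#   stdin output of `rpm -qa`
check_ntp_orphaned() {
    conf=$1
    if has_ntp_package; then
        echo "ok: ntp package installed"
    elif [ -f "$conf" ]; then
        echo "broken: $conf exists but ntp package is missing"
    else
        echo "ok: ntp not configured"
    fi
}
```

On a host it would be run as `rpm -qa | check_ntp_orphaned /etc/ntp.conf`. The check only looks at the package list and the config file; it does not tell whether ntpd was actually enabled before the upgrade.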

Comment 4 Huijuan Zhao 2017-08-01 08:18:44 UTC
QE can reproduce this issue.

Test version:
From: redhat-virtualization-host-4.0-20170307.1
To:   redhat-virtualization-host-4.1-20170728.0
      imgbased-0.9.36-0.1.el7ev.noarch

# imgbase layout
rhvh-4.0-0.20170307.0
 +- rhvh-4.0-0.20170307.0+1
rhvh-4.1-0.20170728.0
 +- rhvh-4.1-0.20170728.0+1


Test steps:
1. Install redhat-virtualization-host-4.0-20170307.1
2. Log in to rhvh, check ntp, and set up ntp:
   # rpm -qa | egrep 'ntp-|chrony-'
   chrony-2.1.1-4.el7_3.x86_64
   ntp-4.2.6p5-25.el7_3.1.x86_64

   # ls /etc/chrony.conf 
   /etc/chrony.conf 
   
   # systemctl enable ntpd
   # systemctl start ntpd

   # systemctl status ntpd
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Tue 2017-08-01 07:43:48 GMT; 14s ago
  Process: 20554 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 20555 (ntpd)
   CGroup: /system.slice/ntpd.service
           └─20555 /usr/sbin/ntpd -u ntp:ntp -g

Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listen and drop on 1 v6wildcard :: UDP 123
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listen normally on 2 lo 127.0.0.1 UDP 123
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listen normally on 3 em1 10.66.10.16 UDP 123
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listen normally on 4 lo ::1 UDP 123
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listen normally on 5 em1 fe80::baca:3aff:fea9:9170 UDP 123
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: Listening on routing socket on fd #22 for interface updates
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: 0.0.0.0 c016 06 restart
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Aug 01 07:43:48 dhcp-10-16.nay.redhat.com ntpd[20555]: 0.0.0.0 c011 01 freq_not_set
Aug 01 07:43:55 dhcp-10-16.nay.redhat.com ntpd[20555]: 0.0.0.0 c614 04 freq_mode

3. Set up a local repo and upgrade to rhvh-4.1-0.20170728.0:
   # yum update
4. Reboot and log in to the new layer rhvh-4.1-0.20170728.0, check ntp:
   # rpm -qa | egrep 'ntp-|chrony-'
   # ls /etc/chrony.conf
   # systemctl status ntpd


Actual results:
1. After step 4, /etc/ntp.conf is still present, but the ntp rpm has disappeared and ntpd.service cannot be found.

# rpm -qa | egrep 'ntp-|chrony-'
chrony-3.1-2.el7.x86_64

# ls /etc/ntp.conf
/etc/ntp.conf

# systemctl status ntpd
Unit ntpd.service could not be found.


Expected results:
1. After step 4, the ntp rpm should be persisted and ntpd should be active.

Comment 10 Huijuan Zhao 2017-08-16 09:43:37 UTC
Bug 1481952 affects upgrades from rhvh-4.x (el7.3) to rhvh-4.1-0.20170815.0, so I will test that scenario once that bug is resolved.


For now, I tested this issue by upgrading from rhvh-4.1-0.20170808.0 (el7.4) to rhvh-4.1-0.20170815.0. Detailed info is below.


Test version:
From: rhvh-4.1-0.20170808.0
To:   rhvh-4.1-0.20170815.0
      imgbased-0.9.43-0.1.el7ev.noarch


Test steps:
1. Install rhvh-4.1-0.20170808.0
2. Log in to rhvh and check that there is no ntp rpm:
   # rpm -qa | egrep 'ntp-|chrony-'
   chrony-3.1-2.el7.x86_64
3. Set up a local repo and upgrade rhvh to rhvh-4.1-0.20170815.0:
   # yum update
4. Reboot and log in to the new layer rhvh-4.1-0.20170815.0, check ntp:
   # rpm -qa | egrep 'ntp-|chrony-'
   # ls /etc/ntp.conf
   # systemctl status ntpd
   # systemctl start ntpd
   # systemctl status ntpd

Test results:
After step 4, the ntp rpm is present, but starting ntpd fails.

# rpm -qa | egrep 'ntp-|chrony-'
ntp-4.2.6p5-25.el7_3.2.x86_64
chrony-3.1-2.el7.x86_64

# ls /etc/ntp.conf 
/etc/ntp.conf

# systemctl status ntpd
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
   Active: inactive (dead)

# systemctl start ntpd
# 
# systemctl status ntpd
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; disabled; vendor preset: disabled)
   Active: failed (Result: exit-code) since Wed 2017-08-16 07:55:04 GMT; 5s ago
  Process: 28964 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 28965 (code=exited, status=255)

Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: Listen normally on 4 lo ::1 UDP 123
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: Listen normally on 5 em1 fe80::baca:3aff:fea9:9170 UDP 123
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: Listening on routing socket on fd #22 for interface updates
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: 0.0.0.0 c016 06 restart
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: 0.0.0.0 c011 01 freq_not_set
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com ntpd[28965]: Cannot find user `ntp'
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com systemd[1]: ntpd.service: main process exited, code=exited, status=255/n/a
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com systemd[1]: Unit ntpd.service entered failed state.
Aug 16 07:55:04 dhcp-10-16.nay.redhat.com systemd[1]: ntpd.service failed.



The old layer had no ntp rpm. After upgrading to the new layer, the ntp rpm is present, but ntpd fails to start. I am not sure whether this is the expected result.

Comment 11 Ryan Barry 2017-08-16 21:19:58 UTC
Since this is primarily backwards compatibility for upgrading from 4.0, that's ok.

Testing 4.1->4.1 did expose a bug in the addition of new users/groups (they were not added if no drift occurred), which is fixed in the new version of imgbased.
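
The failure in comment 10 comes down to the `ntp` user being absent when ntpd runs with `-u ntp:ntp`. A minimal sketch of a check for that condition follows. The function names `user_in_passwd` and `ntpd_user_check` are hypothetical, and reading a passwd-format file directly is an assumption made to keep the example self-contained; on a real host `getent passwd ntp` is the more general query because it goes through NSS.

```shell
#!/bin/sh
# Hypothetical helpers for illustration; not part of imgbased.

# Succeeds if user $1 has an entry in the passwd-format file $2
# (defaults to /etc/passwd when $2 is empty or unset).
user_in_passwd() {
    awk -F: -v u="$1" '$1 == u { found = 1 } END { exit !found }' \
        "${2:-/etc/passwd}"
}

# Reports whether the account ntpd drops privileges to is available.
#   $1  optional passwd-format file to inspect
ntpd_user_check() {
    if user_in_passwd ntp "$1"; then
        echo "ntp user present"
    else
        echo "ntp user missing: ntpd -u ntp:ntp will fail"
    fi
}
```

Calling `ntpd_user_check` with no argument inspects /etc/passwd on the running layer.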

Comment 12 Huijuan Zhao 2017-08-17 07:27:26 UTC
Tested with imgbased-0.9.47-0.1.el7ev.noarch (upgrade from rhvh-4.1-0.20170808.0 to rhvh-4.1-0.20170816.2): the issue in comment 10 goes away. After step 4, the ntp rpm is present and ntpd starts successfully.


Below is another scenario: upgrading from 4.0 to 4.1.

Test version:
From: rhvh-4.0-0.20170307.0
To:   rhvh-4.1-0.20170816.2
      imgbased-0.9.47-0.1.el7ev.noarch

# imgbase layout
rhvh-4.0-0.20170307.0
 +- rhvh-4.0-0.20170307.0+1
rhvh-4.1-0.20170817.0
 +- rhvh-4.1-0.20170817.0+1


Test steps:
Same as comment 4


Test results:
After step 4, the ntp rpm is persisted. ntpd is inactive (chronyd is active), but after starting ntpd (chronyd then becomes inactive), ntpd becomes active successfully.

# rpm -qa | egrep 'ntp-|chrony-'
chrony-3.1-2.el7.x86_64
ntp-4.2.6p5-25.el7_3.2.x86_64

# ls /etc/ntp.conf 
/etc/ntp.conf

# systemctl start ntpd

# systemctl status ntpd
● ntpd.service - Network Time Service
   Loaded: loaded (/usr/lib/systemd/system/ntpd.service; enabled; vendor preset: disabled)
   Active: active (running) since Thu 2017-08-17 06:07:26 GMT; 7s ago
  Process: 29315 ExecStart=/usr/sbin/ntpd -u ntp:ntp $OPTIONS (code=exited, status=0/SUCCESS)
 Main PID: 29316 (ntpd)
   CGroup: /system.slice/ntpd.service
           └─29316 /usr/sbin/ntpd -u ntp:ntp -g

Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: ntp_io: estimated max descriptors: 1024, initial socket boundary: 16
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listen and drop on 0 v4wildcard 0.0.0.0 UDP 123
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listen and drop on 1 v6wildcard :: UDP 123
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listen normally on 2 lo 127.0.0.1 UDP 123
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listen normally on 3 ovirtmgmt 10.66.10.16 UDP 123
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listen normally on 4 lo ::1 UDP 123
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: Listening on routing socket on fd #21 for interface updates
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: 0.0.0.0 c016 06 restart
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: 0.0.0.0 c012 02 freq_set kernel 0.000 PPM
Aug 17 06:07:26 dhcp-10-16.nay.redhat.com ntpd[29316]: 0.0.0.0 c011 01 freq_not_set


So this bug is fixed in imgbased-0.9.47-0.1.el7ev.noarch, and I am changing the status to VERIFIED.
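
The ntpd versus chronyd observation in the results above can be summarized with a small helper. This is a sketch under assumptions: `time_sync_summary` is a hypothetical name, and it expects the state strings printed by `systemctl is-active` (such as `active`, `inactive`, or `failed`) as its two arguments.

```shell
#!/bin/sh
# Hypothetical helper for illustration.
#   $1  state of ntpd, as printed by `systemctl is-active ntpd`
#   $2  state of chronyd, as printed by `systemctl is-active chronyd`
time_sync_summary() {
    ntpd_state=$1
    chronyd_state=$2
    if [ "$ntpd_state" = active ] && [ "$chronyd_state" = active ]; then
        echo "conflict: ntpd and chronyd both active"
    elif [ "$ntpd_state" = active ]; then
        echo "synced by ntpd"
    elif [ "$chronyd_state" = active ]; then
        echo "synced by chronyd"
    else
        echo "no time sync daemon active"
    fi
}
```

On a host it could be called as `time_sync_summary "$(systemctl is-active ntpd)" "$(systemctl is-active chronyd)"`. Taking the states as arguments keeps the logic separate from systemd.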

Comment 14 errata-xmlrpc 2017-08-22 17:44:45 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2529