Bug 1454536

Summary: HostedEngine setup fails if RHV-H timezone < UTC set during installation
Product: Red Hat Enterprise Virtualization Manager Reporter: Germano Veit Michel <gveitmic>
Component: imgbasedAssignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA QA Contact: Yihui Zhao <yzhao>
Severity: urgent Docs Contact:
Priority: high    
Version: 4.1.1CC: cshao, dfediuck, huzhao, lsurette, melewis, mgoldboi, qiyuan, rbarry, sbonazzo, srevivo, stirabos, trichard, weiwang, yaniwang, ycui, ykaul, yzhao
Target Milestone: ovirt-4.2.0Keywords: ZStream
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: imgbased-0.9.30-0.1.el7ev Doc Type: Known Issue
Doc Text:
Red Hat Virtualization Host generates VDSM certificates at the time of the first boot. This means that if the system clock was not set correctly at install time, chronyd or ntpd may resynchronize the clock after the VDSM certificate was generated, leading to a certificate that is not valid yet if the timezone is behind UTC. Now, imgbased-configure-vdsm starts after chronyd or ntpd and waits two seconds for the clock to synchronize; however, this does guarantee that the clock will be synchronized in time, so the best workaround is to set the system clock appropriately during installation.
Story Points: ---
Clone Of:
: 1455441 (view as bug list) Environment:
Last Closed: 2018-05-15 17:57:40 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: Integration RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1455441    
Attachments:
Description Flags
all log info none

Description Germano Veit Michel 2017-05-23 01:10:02 UTC
Description of problem:

Fresh HostedEngine deployment fails because vdsmcert.pem NotBefore date not valid if host deployed with Timezone < UTC.

If I select UTC-4  during Anaconda Install, certificate is not valid for the next 4 hours.
            UTC-3  during Anaconda Install, certificate is not valid for the next 3 hours.
            UTC    during Anaconda Install, certificate is valid.
            UTC+10 during Anaconda Install, certificate is valid since 10 hours before install.

Certificate being is generated during RHV-H Install, not during ovirt-hosted-engine-setup.

Version-Release number of selected component (if applicable):
RHVH-4.1-20170425.3

How reproducible:
100%

Steps to Reproduce:
1. Boot Installation ISO
2. Select New York Timezone
3. Finish Install
4. Try to Deploy Hosted-Engine

OR (short way)

1. Boot Installation ISO
2. Select New York Timezone
3. Finish Install
4. vdsm-tool configure --module libvirt --force

Actual results:

HE Deployment:

[ INFO  ] Stage: Environment setup
[ ERROR ] Failed to execute stage 'Environment setup': Failed to reconfigure libvirt for VDSM
[ INFO  ] Stage: Clean up

libvirt configuration fails, HostedEngine not deployed.

VDSM-Tool:

# vdsm-tool configure --module libvirt --force

Checking configuration status...

libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts

Running configure...
Reconfiguration of libvirt is done.
Error:  ServiceOperationError: _systemctlStart failed
Job for libvirtd.service failed because the control process exited with error code. See "systemctl status libvirtd.service" and "journalctl -xe" for d
etails.
 
May 22 20:18:13 localhost journal: The server certificate /etc/pki/vdsm/certs/vdsmcert.pem is not yet active
May 22 20:18:13 localhost systemd: libvirtd.service: main process exited, code=exited, status=6/NOTCONFIGURED
May 22 20:18:13 localhost systemd: Failed to start Virtualization daemon.
May 22 20:18:13 localhost systemd: Job vdsmd.service/start failed with result 'dependency'.

Host was installed 3-4 minutes ago.

# date
Mon May 22 20:08:04 EDT 2017

# openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem
/etc/pki/vdsm/certs/vdsmcert.pem: CN = VDSM Certificate Authority
error 9 at 1 depth lookup:certificate is not yet valid

# openssl x509 -noout -dates -in /etc/pki/vdsm/certs/vdsmcert.pem 
notBefore=May 23 04:03:59 2017 GMT
notAfter=May 23 04:03:59 2018 GMT

But 20:08 EDT is 00:08 GMT, not 04:08 GMT. So certificate is 4 hours ahead.

Here the UTC-3 option during install.
-------------------------------------

# openssl x509 -noout -dates -in /etc/pki/vdsm/certs/vdsmcert.pem 
notBefore=May 23 03:49:20 2017 GMT
notAfter=May 23 03:49:20 2018 GMT

# date
Mon May 22 21:50:16 -03 2017

Again, now the certificate is 3 hours ahead.

Here the UTC+10 option during install.
--------------------------------------

# openssl x509 -noout -dates -in /etc/pki/vdsm/certs/vdsmcert.pem 
notBefore=May 22 15:05:08 2017 GMT
notAfter=May 22 15:05:08 2018 GMT

# date
Tue May 23 11:06:11 AEST 2017

11:06 AEST is 01:06 GMT on the same day. Host was installed 3 minutes ago, but certificate is valid since 15:05 GMT, which is 10 hours before host install.

Expected results:
Certificate generated during installation should be valid regardless of selected Timezone.

Comment 1 Germano Veit Michel 2017-05-23 01:16:45 UTC
Same thing with /etc/pki/vdsm/certs/cacert.pem

Comment 2 Ryan Barry 2017-05-23 01:40:15 UTC
The certificate is not generated at install time, but rather on the first boot by vdsm itself. Reassigning...

Comment 3 Ryan Barry 2017-05-23 01:41:30 UTC
Can the customer also ensure that this isn't ntp/chrony catching the clock up after install?

That is -- does the hardware clock change on the first boot (new in box server)?

Comment 4 Germano Veit Michel 2017-05-23 01:44:50 UTC
(In reply to Ryan Barry from comment #2)
> The certificate is not generated at install time, but rather on the first
> boot by vdsm itself. Reassigning...

Hi Ryan,

Right, thanks. I was actually expecting it to be generated during ovirt-hosted-engine-deploy using src/plugins/gr-he-setup/pki/vdsmpki.py.

> Can the customer also ensure that this isn't ntp/chrony catching the clock
> up after install?
> 
> That is -- does the hardware clock change on the first boot (new in box
> server)?

I can reproduce it as many times as I want, without any NTP setting.

Comment 5 Germano Veit Michel 2017-05-23 01:52:37 UTC
Ryan,

Are you saying this is generated during that vdsm_init sh script? So it would be through a vdsm-tool configure call?

Any ideas on what debug to enable to get more info on this?

Comment 7 Ryan Barry 2017-05-23 02:49:29 UTC
(In reply to Germano Veit Michel from comment #5)
> Ryan,
> 
> Are you saying this is generated during that vdsm_init sh script? So it
> would be through a vdsm-tool configure call?
> 
> Any ideas on what debug to enable to get more info on this?

Well, kind of. It's generated as part of the patches attached to https://bugzilla.redhat.com/show_bug.cgi?id=1429288 with 'vdsm-tool configure --force'

You could disable this during RHVH install during %post, then start it at will, with debugging added to the systemd service. Ex:

%post
systemctl disable imgbase-config-vdsm
imgbase init --layout
%end

But I'd expect this is also reproducible on RHEL.

Comment 8 Sandro Bonazzola 2017-05-23 09:34:34 UTC
Simone can you have a look?

Comment 9 Simone Tiraboschi 2017-05-24 18:30:53 UTC
Reproduced just delpoying ovirt-node-ng-installer-ovirt-4.1-2017052309.iso
without starting hosted-engine-setup.

Just after node installation, vdsmcert.pem is already there but invalid due to not before.

We need to understand what generates it, if it's already there hosted-engine-setup will simply keep that one.

[root@node412 ~]# date -u
mer 24 mag 2017, 18.26.47, UTC
[root@node412 ~]# openssl x509 -noout -dates -in /etc/pki/vdsm/certs/vdsmcert.pem
notBefore=May 25 00:10:05 2017 GMT
notAfter=May 25 00:10:05 2018 GMT

Comment 10 Simone Tiraboschi 2017-05-24 18:41:48 UTC
This looks interesting,
on my reproduction vdsmcert.pem has been generated at 17:10:05


[root@node412 ~]# ls --full-time /etc/pki/vdsm/certs/vdsmcert.pem 
-rw-------. 1 vdsm kvm 1224 2017-05-24 17:10:05.973100716 -0700 /etc/pki/vdsm/certs/vdsmcert.pem

but at 17:13:16 chronyd moved back the time to 10:14:27; I assume that my cert is not valid due to that reason.

mag 24 17:13:16 node412.localdomain NetworkManager[1063]: <info>  [1495671196.9871] policy: set 'eth0' (eth0) as default for IPv6 routing and DNS
mag 24 10:13:21 node412.localdomain systemd[1]: Time has been changed
mag 24 17:13:21 node412.localdomain chronyd[997]: Selected source 212.45.144.206
mag 24 17:13:21 node412.localdomain chronyd[997]: System clock wrong by -25199.566962 seconds, adjustment started
mag 24 10:13:21 node412.localdomain chronyd[997]: System clock was stepped by -25199.566962 seconds
mag 24 10:14:27 node412.localdomain chronyd[997]: Selected source 85.199.214.99

Comment 11 Simone Tiraboschi 2017-05-24 18:55:03 UTC
On my opinion this is node specific and the issue is that, at least on my system, imgbase-config-vdsm started configuring vdsm (and creating its certs) at 17:09:59 while only at 17:13:21 chronyd moved back the time to 10:14:27 and so the certs create at 17:10:05 was simply invalid because they was created in a relative future.

Maybe we should make imgbase-config-vdsm waiting chronyd at systemd level

[root@node412 ~]# journalctl -u imgbase-config-vdsm
-- Logs begin at mer 2017-05-24 10:09:51 PDT, end at mer 2017-05-24 11:48:01 PDT. --
mag 24 17:09:59 node412.localdomain systemd[1]: Starting Reconfigure vdsm...
mag 24 17:10:08 node412.localdomain usermod[1204]: add 'sanlock' to group 'kvm'
mag 24 17:10:08 node412.localdomain usermod[1204]: add 'sanlock' to group 'qemu'
mag 24 17:10:08 node412.localdomain usermod[1204]: add 'sanlock' to shadow group 'kvm'
mag 24 17:10:08 node412.localdomain usermod[1204]: add 'sanlock' to shadow group 'qemu'
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Checking configuration status...
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: multipath requires configuration
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: lvm requires configuration
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: libvirt is not configured for vdsm yet
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: FAILED: conflicting vdsm and libvirt-qemu tls configuration.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: vdsm.conf with ssl=True requires the following changes:
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: libvirtd.conf: listen_tcp=0, auth_tcp="sasl", listen_tls=1
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: qemu.conf: spice_tls=1.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Running configure...
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of certificates is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of sebool is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of multipath is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Backing up /etc/lvm/lvmlocal.conf to /etc/lvm/lvmlocal.conf.201705241710
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Installing /usr/share/vdsm/lvmlocal.conf at /etc/lvm/lvmlocal.conf
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of lvm is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of passwd is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of sanlock is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Reconfiguration of libvirt is done.
mag 24 17:10:08 node412.localdomain vdsm-tool[951]: Done configuring modules to VDSM.
mag 24 17:10:08 node412.localdomain systemd[1]: Started Reconfigure vdsm.

Comment 12 Simone Tiraboschi 2017-05-24 19:07:53 UTC
Checking the sos-reports from the original reporter, I think it happened exactly in the same way:


May 22 18:30:59 ******** dracut: *** Including module: network ***
May 22 14:30:59 ******** chronyd[1726]: Selected source *******
May 22 14:30:59 ******** systemd: Time has been changed
May 22 14:30:59 ******** chronyd[1726]: System clock wrong by -14399.772337 seconds, adjustment started
May 22 14:30:59 ******** chronyd[1726]: System clock was stepped by -14399.772337 seconds

Comment 14 cshao 2017-05-25 03:34:54 UTC
QE can reproduce this issue.

Test version:
rhvh-4.1-0.20170421.0+1
RHVH-4.1-20170425.3-RHVH-x86_64-dvd1.iso
imgbased-0.9.23-0.1.el7ev.noarch
libvirt-2.0.0-10.el7_3.5.x86_64
vdsm-4.19.10.1-1.el7ev.x86_64
ovirt-hosted-engine-ha-2.1.0.5-2.el7ev.noarch
ovirt-hosted-engine-setup-2.1.0.5-1.el7ev.noarch
rhvm-appliance-20170511.0-1.x86_64.rhevm.ova

Test steps:
1. Boot Installation ISO
2. Select New York Timezone
3. Finish Install
4. vdsm-tool configure --module libvirt --force
5. Try to Deploy Hosted-Engine via cockpit

Test result:
Hosted Engine deployment failed.


===================================================
# imgbase w
[INFO] You are on rhvh-4.1-0.20170421.0+1

# date 
Wed May 24 22:58:47 EDT 2017

#  ls --full-time /etc/pki/vdsm/certs/vdsmcert.pem 
-rw-------. 1 vdsm kvm 1224 2017-05-25 02:55:49.477178065 -0400 /etc/pki/vdsm/certs/vdsmcert.pem
[root@dhcp-9-70 ~]# date 
Wed May 24 23:02:06 EDT 2017

# date
Wed May 24 23:03:26 EDT 2017
[root@dhcp-9-70 ~]# date -u
Thu May 25 03:03:29 UTC 2017

# vdsm-tool configure --module libvirt --force

Checking configuration status...

libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts

Running configure...
Reconfiguration of libvirt is done.
Error:  ServiceOperationError: _systemctlStart failed
Job for libvirtd.service failed because the control process exited with error code. See "systemctl status libvirtd.service" and "journalctl -xe" for details.



May 24 23:04:14 dhcp-9-70.nay.redhat.com libvirtd[21246]: The server certificate /etc/pki/vdsm/certs/vdsmcert.pem is not yet active
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: libvirtd.service: main process exited, code=exited, status=6/NOTCONFIGURED
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Failed to start Virtualization daemon.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager network restoration.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Dependency failed for Virtual Desktop Server Manager.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Dependency failed for MOM instance configured for VDSM purposes.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Job mom-vdsm.service/start failed with result 'dependency'.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Job vdsmd.service/start failed with result 'dependency'.
May 24 23:04:14 dhcp-9-70.nay.redhat.com systemd[1]: Job vdsm-network.service/start failed with result 'dependency'.


Host was installed 20 minutes ago.
# date 
Wed May 24 23:18:54 EDT 2017


# openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem
/etc/pki/vdsm/certs/vdsmcert.pem: CN = VDSM Certificate Authority
error 9 at 1 depth lookup:certificate is not yet valid

# openssl x509 -noout -dates -in /etc/pki/vdsm/certs/vdsmcert.pem 
notBefore=May 25 06:55:49 2017 GMT
notAfter=May 25 06:55:49 2018 GMT

Comment 15 Simone Tiraboschi 2017-05-25 06:39:59 UTC
(In reply to shaochen from comment #14)
> QE can reproduce this issue.

shaochen, could you please check if also in your reproduction, chronyd changed the system clock after cert creation by imgbase-config-vdsm?

You should look for 'systemd: Time has been changed' in systemd journal or on /var/log/messages.

Comment 16 cshao 2017-05-25 06:57:29 UTC
(In reply to Simone Tiraboschi from comment #15)
> (In reply to shaochen from comment #14)
> > QE can reproduce this issue.
> 
> shaochen, could you please check if also in your reproduction, chronyd
> changed the system clock after cert creation by imgbase-config-vdsm?
Yes. 

> You should look for 'systemd: Time has been changed' in systemd journal or
> on /var/log/messages.


May 24 22:55:57 dhcp-9-70 chronyd[892]: System clock wrong by -14400.420517 seconds, adjustment started
May 24 22:55:57 dhcp-9-70 systemd: Time has been changed
May 24 22:55:57 dhcp-9-70 chronyd[892]: System clock was stepped by -14400.420517 seconds

Attach all logs for you reference.

Comment 17 cshao 2017-05-25 06:59:13 UTC
Created attachment 1282136 [details]
all log info

Comment 20 Yihui Zhao 2017-12-08 07:38:54 UTC
1. Cannot reproduce this issue.


Test version:
RHVH-4.1-20171127.0-RHVH-x86_64-dvd1.iso
vdsm-4.19.40-1.el7ev.x86_64
imgbased-0.9.51-0.1.el7ev.noarch


Test steps:
1. Set the hwclock is UTC+
2. Install RHVH via ISO
3. Do not enable nic and network time
4. Continue to finish other installation steps, then reboot
5. Check whether the RTC is default to UTC using "timedatectl"
6. check vdsm certificate using:
#openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -text -noout |grep Not
#openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem
7. Check libvirtd.service status
8. Enable network and sync time
9. Repeat step6,7
10. Try to deploy HostedEngine via cockpit

Test Results:
1. After step 5:
[root@dhcp-8-151 certs]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -text -noout |grep Not
            Not Before: Nov 30 12:13:40 2017 GMT
            Not After : Dec  1 12:13:40 2018 GMT
[root@dhcp-8-151 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text -noout |grep Not
            Not Before: Nov 30 12:13:40 2017 GMT
            Not After : Dec  1 12:13:40 2018 GMT

[root@dhcp-8-151 certs]# openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem
/etc/pki/vdsm/certs/vdsmcert.pem: OK


2. After step 9:
[root@dhcp-8-151 certs]# openssl x509 -in /etc/pki/vdsm/certs/vdsmcert.pem -text -noout |grep Not
            Not Before: Nov 30 12:13:40 2017 GMT
            Not After : Dec  1 12:13:40 2018 GMT
[root@dhcp-8-151 certs]# openssl x509 -in /etc/pki/vdsm/certs/cacert.pem -text -noout |grep Not
            Not Before: Nov 30 12:13:40 2017 GMT
            Not After : Dec  1 12:13:40 2018 GMT

[root@dhcp-8-151 certs]# openssl verify -CAfile /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/certs/vdsmcert.pem
/etc/pki/vdsm/certs/vdsmcert.pem: OK


3. After step10,
Deploy HE successfully.


Due to the result of the step5,9,10, so this issue cannot reproduce, and change the bug's status to verified.

Comment 22 Yihui Zhao 2018-04-24 03:38:22 UTC
According to Comment 20, moving to verified.

Comment 25 errata-xmlrpc 2018-05-15 17:57:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:1524

Comment 26 Franta Kust 2019-05-16 13:03:01 UTC
BZ<2>Jira Resync