Bug 1634742 - HE cleanup code is not cleaning libvirt.qemu.conf correctly and HE can't be redeployed
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: ovirt-hosted-engine-setup
Classification: oVirt
Component: Tools
Version: 2.2.24
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: medium with 1 vote
Target Milestone: ovirt-4.4.1
Target Release: 2.4.5
Assignee: Asaf Rachmani
QA Contact: Nikolai Sednev
URL:
Whiteboard:
Duplicates: 1817880 1832517
Depends On:
Blocks:
Reported: 2018-10-01 13:55 UTC by Denis Chaplygin
Modified: 2020-09-03 09:59 UTC (History)

Fixed In Version: ovirt-hosted-engine-setup-2.4.5
Doc Type: Bug Fix
Doc Text:
Previously, if you decided to redeploy RHV Manager as a hosted engine, running the `ovirt-hosted-engine-cleanup` command did not clean up the `/etc/libvirt/qemu.conf` file correctly. Then, the hosted engine redeployment failed to restart the libvirtd service because `libvirtd-tls.socket` remained active. The current release fixes this issue. You can run the cleanup tool and redeploy the Manager as a hosted engine.
Clone Of:
Environment:
Last Closed: 2020-07-08 08:27:33 UTC
oVirt Team: Integration
Embargoed:
nsednev: needinfo-
sbonazzo: ovirt-4.4?
sbonazzo: planning_ack?
sbonazzo: devel_ack+
sbonazzo: testing_ack?


Attachments
sosreport from puma18 (6.82 MB, application/x-xz)
2020-06-22 12:13 UTC, Nikolai Sednev


Links
System ID Private Priority Status Summary Last Updated
oVirt gerrit 95689 0 'None' MERGED ansible: Remove duplication of vnc_tls option 2021-01-13 23:09:37 UTC
oVirt gerrit 109131 0 master MERGED cleanup: Stop libvirtd-tls.socket service in HE cleanup 2021-01-13 23:09:37 UTC

Description Denis Chaplygin 2018-10-01 13:55:26 UTC
Description of problem: If you decide to redeploy the engine and run ovirt-hosted-engine-cleanup, the /etc/libvirt/qemu.conf file is not cleaned correctly, and the following lines are left behind:

vnc_tls=1
vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"

These leftover lines cause future redeployments to fail.


How reproducible: Always


Steps to Reproduce:
1. Deploy HE
2. Undeploy it using ovirt-hosted-engine-cleanup
3. Try to deploy again

Actual results: Deployment will fail with:

Failed to start VNC server: Unable to access credentials /etc/pki/vdsm/libvirt-vnc/ca-cert.pem: No such file or directory


Expected results:

Deployment succeeds

Additional info:

The issue comes from here: https://gerrit.ovirt.org/#/c/93777/4/packaging/playbooks/roles/ovirt-host-deploy-vnc-certificates/tasks/main.yml

We need those changes to have a prefix and a suffix, so that the cleanup code can find them and remove them from the configuration file. This could be achieved by using https://docs.ansible.com/ansible/2.6/modules/blockinfile_module.html, for example.

Both ansible role and cleanup should be fixed.

Comment 1 Sandro Bonazzola 2019-01-21 08:28:50 UTC
re-targeting to 4.3.1 since this BZ has not been proposed as blocker for 4.3.0.
If you think this bug should block 4.3.0 please re-target and set blocker flag.

Comment 2 Sandro Bonazzola 2019-02-18 07:55:00 UTC
Moving to 4.3.2 since this has not been identified as a blocker for 4.3.1.

Comment 3 vaneamihailov@mail.ru 2019-08-17 12:15:52 UTC
Hi 

I just hit the same issue. To solve it, I did the following:

vdsm-tool configure --force

cp /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/libvirt-spice/ca-cert.pem
cp /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/vdsm/libvirt-spice/server-key.pem
cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-spice/server-cert.pem

cp /etc/pki/vdsm/certs/cacert.pem /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
cp /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/vdsm/libvirt-vnc/server-key.pem
cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/vdsm/libvirt-vnc/server-cert.pem

cp /etc/pki/vdsm/certs/vdsmcert.pem /etc/pki/libvirt/clientcert.pem
cp /etc/pki/vdsm/keys/vdsmkey.pem /etc/pki/libvirt/private/clientkey.pem
cp /etc/pki/vdsm/certs/cacert.pem /etc/pki/CA/cacert.pem


chown root:kvm /etc/pki/vdsm/libvirt-spice/ca-cert.pem /etc/pki/vdsm/libvirt-vnc/ca-cert.pem /etc/pki/libvirt/clientcert.pem
chown vdsm:kvm /etc/pki/vdsm/libvirt-spice/server-key.pem /etc/pki/vdsm/libvirt-spice/server-cert.pem /etc/pki/vdsm/libvirt-vnc/server-key.pem /etc/pki/vdsm/libvirt-vnc/server-cert.pem /etc/pki/libvirt/private/clientkey.pem /etc/pki/CA/cacert.pem
chmod 644 /etc/pki/vdsm/libvirt-spice/ca-cert.pem /etc/pki/vdsm/libvirt-vnc/ca-cert.pem /etc/pki/libvirt/clientcert.pem
chmod 440  /etc/pki/vdsm/libvirt-spice/server-key.pem /etc/pki/vdsm/libvirt-spice/server-cert.pem /etc/pki/vdsm/libvirt-vnc/server-key.pem /etc/pki/vdsm/libvirt-vnc/server-cert.pem /etc/pki/libvirt/private/clientkey.pem /etc/pki/CA/cacert.pem

systemctl restart vdsmd libvirtd

After that, I could install the Engine again.

Comment 4 the.benparry 2020-01-03 16:25:32 UTC
Still seeing the same error in 4.3.5.  Fixed it by removing the lines identified at the top of the bug from file /etc/libvirt/qemu.conf
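
The manual fix described in this comment can be sketched as a small filter. This is a minimal sketch, not part of any oVirt tool: the function name `strip_unmarked_vnc_tls` is hypothetical, and it assumes the two leftover lines appear exactly as quoted in the bug description. It prints the cleaned content to stdout so the result can be reviewed before the real file is replaced.

```shell
#!/bin/sh
# Hypothetical helper: print FILE without the two unmarked lines
# quoted in the bug description. Matching is exact, so commented
# defaults such as "#vnc_tls = 1" are left untouched.
strip_unmarked_vnc_tls() {
    sed -e '/^vnc_tls=1$/d' \
        -e '\|^vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"$|d' "$1"
}

# Example (review the output before replacing the original):
#   strip_unmarked_vnc_tls /etc/libvirt/qemu.conf > /tmp/qemu.conf.cleaned
```

Writing to a separate file first keeps the original qemu.conf intact if the pattern matches more or less than expected.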

Comment 5 Sandro Bonazzola 2020-01-08 13:29:48 UTC
The offending section should be marked with a comment, thanks to the fix pushed a long time ago in https://gerrit.ovirt.org/#/c/95689/9/packaging/playbooks/roles/ovirt-host-deploy-vnc-certificates/tasks/main.yml

The section should be removed during a re-deploy with https://github.com/oVirt/ovirt-ansible-hosted-engine-setup/blob/master/tasks/initial_clean.yml#L10

The bug should be reproducible only if the offending line was added to the configuration file before the first fix was introduced.
Can you confirm that your qemu.conf file wasn't showing the marker "## {mark} configuration section by vdsm-{{ host_deploy_vdsm_version }}" ?
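
To illustrate why the marker matters: once the section is wrapped in begin and end marker lines, a cleanup step can delete it as a range. The following is a minimal sketch, not the actual cleanup code. It assumes the marker quoted above is rendered with the Ansible blockinfile defaults, where `{mark}` becomes `BEGIN` and `END`; the function name `strip_marked_block` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print FILE with the marker-delimited block removed.
# Assumes marker lines of the form
#   ## BEGIN configuration section by vdsm-<version>
#   ## END configuration section by vdsm-<version>
strip_marked_block() {
    sed '/^## BEGIN configuration section by vdsm-/,/^## END configuration section by vdsm-/d' "$1"
}

# Example:
#   strip_marked_block /etc/libvirt/qemu.conf > /tmp/qemu.conf.cleaned
```

Lines added before the marker fix was introduced carry no such delimiters, so a range deletion like this would not find them.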

Comment 6 Sandro Bonazzola 2020-01-08 15:51:48 UTC
This has been reproduced on oVirt 4.4.
The offending line was marked with the mentioned comment, even though it was written by the current version.
After the redeployment, the section was removed.
Three days passed between the first deployment and the redeployment.

Comment 7 Sandro Bonazzola 2020-05-14 08:42:49 UTC
*** Bug 1832517 has been marked as a duplicate of this bug. ***

Comment 8 Asaf Rachmani 2020-05-18 10:03:32 UTC
Tried to reproduce it.
/etc/libvirt/qemu.conf file contains the following commented lines after cleanup:
#vnc_tls = 1
#vnc_tls_x509_cert_dir = "/etc/pki/libvirt-vnc"


The HE redeploy fails on:
TASK [ovirt.hosted_engine_setup : Start libvirt]******************************
:
    "msg": "Unable to start service libvirtd: Job for libvirtd.service failed because the control process exited with error code.\nSee \"systemctl status libvirtd.service\" and \"journalctl -xe\" for details.\n"


From 'journalctl -u libvirtd':

May 18 08:47:13 44rc2 systemd[1]: Stopped Virtualization daemon.
May 18 08:49:24 44rc2 systemd[1]: Starting Virtualization daemon...
May 18 08:49:25 44rc2 libvirtd[2409]: libvirt version: 5.6.0, package: 10.el8 (CBS <cbs>, 2020-02-27-01:09:46, )
May 18 08:49:25 44rc2 libvirtd[2409]: hostname: 44rc2
May 18 08:49:25 44rc2 libvirtd[2409]: Cannot read CA certificate '/etc/pki/CA/cacert.pem': No such file or directory
May 18 08:49:25 44rc2 systemd[1]: libvirtd.service: Main process exited, code=exited, status=6/NOTCONFIGURED
May 18 08:49:25 44rc2 systemd[1]: libvirtd.service: Failed with result 'exit-code'.
May 18 08:49:25 44rc2 systemd[1]: Failed to start Virtualization daemon.
May 18 08:49:25 44rc2 systemd[1]: libvirtd.service: Service RestartSec=100ms expired, scheduling restart.
May 18 08:49:25 44rc2 systemd[1]: libvirtd.service: Scheduled restart job, restart counter is at 1.
May 18 08:49:25 44rc2 systemd[1]: Stopped Virtualization daemon.
May 18 08:49:25 44rc2 systemd[1]: Starting Virtualization daemon...
May 18 08:49:25 44rc2 libvirtd[2430]: libvirt version: 5.6.0, package: 10.el8 (CBS <cbs>, 2020-02-27-01:09:46, )
May 18 08:49:25 44rc2 libvirtd[2430]: hostname: 44rc2
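
The merged change listed under Links is titled "cleanup: Stop libvirtd-tls.socket service in HE cleanup". A step of that kind can be sketched as follows. This is a minimal sketch and not the actual patch: the function name `stop_tls_socket_if_active` and the `SYSTEMCTL` override variable are hypothetical, while `systemctl is-active --quiet` and `systemctl stop` are standard systemd commands.

```shell
#!/bin/sh
# SYSTEMCTL can be overridden (for example with a stub) for dry runs.
SYSTEMCTL="${SYSTEMCTL:-systemctl}"

# Hypothetical helper: stop libvirtd-tls.socket only if it is active,
# so a later libvirtd start does not run with TLS listening enabled
# while the CA certificate is missing.
stop_tls_socket_if_active() {
    if "$SYSTEMCTL" is-active --quiet libvirtd-tls.socket; then
        "$SYSTEMCTL" stop libvirtd-tls.socket
    fi
}

# Example (as root, after ovirt-hosted-engine-cleanup):
#   stop_tls_socket_if_active
```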

Comment 9 Evgeny Slutsky 2020-06-01 08:02:40 UTC
*** Bug 1817880 has been marked as a duplicate of this bug. ***

Comment 10 Nikolai Sednev 2020-06-21 08:41:40 UTC
QA still has ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch.rpm from 2020-06-04.

Comment 11 Nikolai Sednev 2020-06-22 12:05:04 UTC
cat /etc/libvirt/qemu.conf |grep vnc_tls=1
puma18 ~]# cat /etc/libvirt/qemu.conf |grep vnc_tls_x509_cert_dir
#vnc_tls_x509_cert_dir = "/etc/pki/libvirt-vnc"
# ca-cert.pem certificate signed by the CA in the vnc_tls_x509_cert_dir

The lines are either erased or commented out after running "ovirt-hosted-engine-cleanup".
Redeployment has failed with:
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": false, "msg": "Unable to start service libvirtd: Job for libvirtd.service failed because the control process exited with error code.\nSee \"systemctl status libvirtd.service\" and \"journalctl -xe\" for details.\n"}
[ ERROR ] Failed to execute stage 'Closing up': Failed executing ansible-playbook
[ ERROR ] Hosted Engine deployment failed: please check the logs for the issue, fix accordingly or re-deploy from scratch.
          Log file is located at /var/log/ovirt-hosted-engine-setup/ovirt-hosted-engine-setup-20200622145346-nmwunv.log


Tested on:
ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.3-1.el8ev.noarch
rhvm-appliance-4.4-20200604.0.el8ev.x86_64
Red Hat Enterprise Linux release 8.2 (Ootpa)
Linux 4.18.0-193.10.1.el8_2.x86_64 #1 SMP Fri Jun 19 15:31:45 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux

Comment 12 Nikolai Sednev 2020-06-22 12:10:37 UTC
After unsuccessful redeployment I see these:
puma18 ~]# cat  /etc/libvirt/qemu.conf |grep vnc_tls
#vnc_tls = 1
# If the path is not provided, but vnc_tls = 1, then the
#vnc_tls_x509_cert_dir = "/etc/pki/libvirt-vnc"
#vnc_tls_x509_secret_uuid = "00000000-0000-0000-0000-000000000000"
# ca-cert.pem certificate signed by the CA in the vnc_tls_x509_cert_dir
#vnc_tls_x509_verify = 1
[root@puma18 ~]# cat  /etc/libvirt/qemu.conf |grep vnc_tls_x509_cert_dir
#vnc_tls_x509_cert_dir = "/etc/pki/libvirt-vnc"
# ca-cert.pem certificate signed by the CA in the vnc_tls_x509_cert_dir

Please see logs attached to the bug.
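
A plain `grep vnc_tls` also matches the commented defaults that ship in qemu.conf, which makes the output above harder to read. A minimal sketch for listing only active settings is shown below; the function name `active_vnc_tls_settings` is hypothetical.

```shell
#!/bin/sh
# Hypothetical helper: print only uncommented vnc_tls* settings in FILE.
# Lines starting with "#" are skipped because the pattern is anchored
# to the start of the line (optional leading whitespace allowed).
active_vnc_tls_settings() {
    grep -E '^[[:space:]]*vnc_tls' "$1"
}

# Example:
#   active_vnc_tls_settings /etc/libvirt/qemu.conf
```

An empty result (grep exit status 1) means no vnc_tls option is currently active in the file.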

Comment 13 Nikolai Sednev 2020-06-22 12:13:46 UTC
Created attachment 1698287
sosreport from puma18

Comment 14 Nikolai Sednev 2020-06-22 12:15:52 UTC
The fix probably did not work because it is not included in ovirt-hosted-engine-setup-2.4.4-1.el8ev.noarch, which is the version currently available to QA. We will have to wait for ovirt-hosted-engine-setup-2.4.5 to retest.

Comment 15 Nikolai Sednev 2020-06-22 16:46:33 UTC
Works fine with ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch.
Tested on the same components, but with the new ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch; the redeployment completed successfully.

Comment 16 Sandro Bonazzola 2020-07-08 08:27:33 UTC
This bugzilla is included in oVirt 4.4.1 release, published on July 8th 2020.

Since the problem described in this bug report should be resolved in oVirt 4.4.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

