Note: This bug is displayed in read-only format because the product is no longer active in Red Hat Bugzilla.

Bug 1884599

Summary: Hosted-engine deploy is failing with error "Unable to access credentials /etc/pki/vdsm/libvirt-vnc/ca-cert.pem"
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: ovirt-ansible-collection
Assignee: Asaf Rachmani <arachman>
Status: CLOSED ERRATA
QA Contact: Nikolai Sednev <nsednev>
Severity: medium
Docs Contact:
Priority: low
Version: 4.4.1
CC: lsurette, mavital
Target Milestone: ovirt-4.4.3-1
Keywords: Triaged, ZStream
Target Release: 4.4.3
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version: ovirt-ansible-collection-1.2.2
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2020-11-24 13:13:12 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Integration
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description nijin ashok 2020-10-02 12:12:36 UTC
Description of problem:

After ovirt-hosted-engine-cleanup was executed following a failed deployment, every subsequent deployment fails with the error below while installing the HostedEngineLocal VM.

====
[ ERROR ] fatal: [localhost]: FAILED! => {"changed": true, "cmd": ["virt-install", "-n", "HostedEngineLocal", "--os-variant", "rhel8.0", "--virt-type", "kvm", "--memory", "2786", "--vcpus", "4", "--network", "network=default,mac=56:6f:88:58:00:00,model=virtio", "--disk", .....
 "msg": "non-zero return code", "rc": 1, "start": "2020-10-02 12:07:26.297519", "stderr": "ERROR    internal error: process exited while connecting to monitor: 2020-10-02T12:07:39.063524Z qemu-kvm: -object tls-creds-x509,id=vnc-tls-creds0,dir=/etc/pki/vdsm/libvirt-vnc,endpoint=server,verify-peer=no: Unable to access credentials /etc/pki/vdsm/libvirt-vnc/ca-cert.pem: No such file or directory\nDomain installation does not appear to have been successful......... --connect qemu:///system start HostedEngineLocal", "otherwise, please restart your installation."], "stdout": "\nStarting install...", "stdout_lines": ["", "Starting install..."]}
====

The issue occurs because the cleanup script does not remove the entries below from "/etc/libvirt/qemu.conf".

===
tail -4 /etc/libvirt/qemu.conf
## beginning of configuration section for VNC encryption
vnc_tls=1
vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"
## end of configuration section for VNC encryption
===

The above entries are added when the cluster has "VNC Encryption" enabled. The "Default" cluster does not have VNC encryption enabled by default, but any newly created cluster does. So the issue is only reproducible when custom DC and cluster names are provided during deployment.

The cleanup play only removes data between the "beginning" and "end" markers of the vdsm configuration section ("configuration section by vdsm-4.x.y"):

===
  - name: Drop vdsm config statements
    command: >-
      sed -i
      '/## beginning of configuration section by
      vdsm-4.[0-9]\+.[0-9]\+/,/## end of configuration section by vdsm-4.[0-9]\+.[0-9]\+/d' {{ item }}
    environment: "{{ he_cmd_lang }}"
    args:
      warn: false
    with_items:
      - /etc/libvirt/libvirtd.conf
      - /etc/libvirt/qemu.conf
      - /etc/libvirt/qemu-sanlock.conf
      - /etc/sysconfig/libvirtd
===
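One way to fix the cleanup would be to give the sed command a second address range covering the VNC encryption markers. The sketch below demonstrates this against a temp file standing in for /etc/libvirt/qemu.conf; it may not match the exact change shipped in ovirt-ansible-collection-1.2.2.

```shell
#!/bin/sh
# Sketch: extend the cleanup sed with a second range that also drops the
# "configuration section for VNC encryption" block. The temp file stands
# in for /etc/libvirt/qemu.conf.
conf=$(mktemp)
cat > "$conf" <<'EOF'
## beginning of configuration section by vdsm-4.40.0
spice_tls=1
## end of configuration section by vdsm-4.40.0
## beginning of configuration section for VNC encryption
vnc_tls=1
vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"
## end of configuration section for VNC encryption
EOF

# Original expression (vdsm section) plus a second range (VNC section).
sed -i \
  -e '/## beginning of configuration section by vdsm-4.[0-9]\+.[0-9]\+/,/## end of configuration section by vdsm-4.[0-9]\+.[0-9]\+/d' \
  -e '/## beginning of configuration section for VNC encryption/,/## end of configuration section for VNC encryption/d' \
  "$conf"

remaining=$(wc -l < "$conf")
echo "lines left: $remaining"   # prints: lines left: 0
rm -f "$conf"
```

In the Ansible task, this amounts to appending the second range expression to the existing sed command for /etc/libvirt/qemu.conf.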

However, the VNC encryption configuration added to qemu.conf lies after the "end of configuration section by vdsm" marker, so the sed expression never touches it.

===
grep -A 17 "beginning of configuration section by vdsm" /etc/libvirt/qemu.conf
## beginning of configuration section by vdsm-4.40.0
dynamic_ownership=1
group="qemu"
lock_manager="sanlock"
max_core="unlimited"
migrate_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-migrate"
remote_display_port_max=6923
remote_display_port_min=5900
save_image_format="gzip"
spice_tls=1
spice_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-spice"
user="qemu"
## end of configuration section by vdsm-4.40.0
## beginning of configuration section for VNC encryption
vnc_tls=1
vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"
## end of configuration section for VNC encryption
===

The user has to comment out those lines manually for the deployment to proceed.
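The manual workaround can be scripted. The sketch below demonstrates it on a temp copy; on a real host the same sed would be run against /etc/libvirt/qemu.conf (followed by restarting libvirtd), but that is an assumption based on the report, not a documented procedure.

```shell
#!/bin/sh
# Sketch of the manual workaround: comment out the leftover VNC encryption
# settings. Demonstrated here on a temp copy of the relevant lines.
conf=$(mktemp)
cat > "$conf" <<'EOF'
## beginning of configuration section for VNC encryption
vnc_tls=1
vnc_tls_x509_cert_dir="/etc/pki/vdsm/libvirt-vnc"
## end of configuration section for VNC encryption
EOF

# Prefix the two offending settings with '#' ('&' is the matched text).
sed -i -e 's/^vnc_tls=1/#&/' -e 's/^vnc_tls_x509_cert_dir=/#&/' "$conf"

commented=$(grep -c '^#vnc' "$conf")
echo "$commented settings commented out"   # prints: 2 settings commented out
rm -f "$conf"
```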


Version-Release number of selected component (if applicable):

ovirt-hosted-engine-setup-2.4.5-1.el8ev.noarch
vdsm-4.40.22-1.el8ev.x86_64

How reproducible:

100%

Steps to Reproduce:

1. During the hosted-engine deployment, provide custom names for the DC and cluster.
2. Cancel the deployment once the host has been added to the hosted-engine local VM.
3. Run ovirt-hosted-engine-cleanup.
4. Try to deploy the hosted-engine again; it fails with the above-mentioned error.

Actual results:

Hosted-engine deployment fails with the error "Unable to access credentials /etc/pki/vdsm/libvirt-vnc/ca-cert.pem" while starting the engine VM.

Expected results:

The user should be able to redeploy without manually changing configuration files.

Additional info:

Comment 6 Nikolai Sednev 2020-11-17 13:26:58 UTC
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Configure libvirt firewalld zone]
[ INFO  ] changed: [localhost]
[ INFO  ] TASK [ovirt.ovirt.hosted_engine_setup : Add host]
[ INFO  ] changed: [localhost]
[ INFO  ] skipping: [localhost]
^C

/usr/sbin/ovirt-hosted-engine-cleanup

This will de-configure the host to run ovirt-hosted-engine-setup from scratch. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Destroy hosted-engine VM ===- 
error: failed to get domain 'HostedEngine'

  -=== Stop HA services ===- 
  -=== Shutdown sanlock ===- 
shutdown force 1 wait 0
shutdown done 0
  -=== Disconnecting the hosted-engine storage domain ===- 
  -=== De-configure VDSM networks ===- 
ovirtmgmt
ovirtmgmt
ovirtmgmt
 A previously configured management bridge has been found on the system, this will try to de-configure it. Under certain circumstances you can loose network connection. 
Caution, this operation should be used with care.

Are you sure you want to proceed? [y/n]
y
  -=== Stop other services ===- 
Warning: Stopping libvirtd.service, but it can still be activated by:
  libvirtd-ro.socket
  libvirtd-admin.socket
  libvirtd.socket
  -=== De-configure external daemons ===- 
Removing database file /var/lib/vdsm/storage/managedvolume.db
  -=== Removing configuration files ===- 
? /etc/init/libvirtd.conf already missing
- removing /etc/libvirt/nwfilter/vdsm-no-mac-spoofing.xml
? /etc/ovirt-hosted-engine/answers.conf already missing
- removing /etc/ovirt-hosted-engine/hosted-engine.conf
- removing /etc/vdsm/vdsm.conf
- removing /etc/pki/vdsm/certs/cacert.pem
- removing /etc/pki/vdsm/certs/vdsmcert.pem
- removing /etc/pki/vdsm/keys/vdsmkey.pem
- removing /etc/pki/vdsm/libvirt-migrate/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-migrate/server-cert.pem
- removing /etc/pki/vdsm/libvirt-migrate/server-key.pem
- removing /etc/pki/vdsm/libvirt-spice/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-cert.pem
- removing /etc/pki/vdsm/libvirt-spice/server-key.pem
- removing /etc/pki/vdsm/libvirt-vnc/ca-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-cert.pem
- removing /etc/pki/vdsm/libvirt-vnc/server-key.pem
- removing /etc/pki/CA/cacert.pem
- removing /etc/pki/libvirt/clientcert.pem
- removing /etc/pki/libvirt/private/clientkey.pem
? /etc/pki/ovirt-vmconsole/*.pem already missing
- removing /var/cache/libvirt/qemu
? /var/run/ovirt-hosted-engine-ha/* already missing
- removing /var/tmp/localvm5lfi_hfp
  -=== Removing IP Rules ===- 

cat /etc/libvirt/qemu.conf
http://pastebin.test.redhat.com/918769

Then I successfully redeployed on the same host:
[ INFO  ] Stage: Termination
[ INFO  ] Hosted Engine successfully deployed

Tested with:
rhvm-4.4.3.10-0.1.el8ev.noarch
ovirt-ansible-collection-1.2.2-1.el8ev.noarch
ansible-2.9.14-1.el8ae.noarch
ovirt-hosted-engine-setup-2.4.8-1.el8ev.noarch
ovirt-hosted-engine-ha-2.4.5-1.el8ev.noarch
rhvm-appliance-4.4-20201111.0.el8ev.x86_64
Linux 4.18.0-240.4.1.el8_3.x86_64 #1 SMP Wed Nov 11 08:19:41 EST 2020 x86_64 x86_64 x86_64 GNU/Linux
Red Hat Enterprise Linux release 8.3 (Ootpa)

Comment 10 errata-xmlrpc 2020-11-24 13:13:12 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (RHV Engine and Host Common Packages 4.4.z [ovirt-4.4.3] 0-day), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:5216

Comment 11 meital avital 2022-08-04 11:15:45 UTC
Due to QE capacity, we are not going to cover this issue in our automation