Bug 1878724

Summary: vdsm-tool configure is failing with error "dependency job for libvirtd.service failed"
Product: Red Hat Enterprise Virtualization Manager
Reporter: nijin ashok <nashok>
Component: vdsm
Assignee: Marcin Sobczyk <msobczyk>
Status: CLOSED ERRATA
QA Contact: Pavol Brilla <pbrilla>
Severity: medium
Docs Contact:
Priority: medium
Version: 4.4.1
CC: cshao, gdeolive, jortialc, lsurette, mavital, mkalinin, mperina, srevivo, ycui
Target Milestone: ovirt-4.5.0
Keywords: TestOnly, ZStream
Target Release: 4.5.0
Hardware: All
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2022-05-26 17:22:44 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Infra
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 1889363
Bug Blocks:

Description nijin ashok 2020-09-14 12:01:03 UTC
Description of problem:

The TLS socket for libvirtd (libvirtd-tls.socket) is not enabled by default; it is enabled when the host is added to the Manager. However, if a user starts a service that uses libvirtd (for example, virt-who) before that, a libvirtd process is spawned, as shown below.

===
# systemctl start virt-who

# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Mon 2020-09-14 10:31:30 UTC; 22s ago

# ps aux|grep libvirtd
root        2234  0.7  1.1 1818924 44472 ?       Ssl  10:31   0:00 /usr/sbin/libvirtd --timeout 120

# systemctl is-enabled libvirtd-tls.socket
disabled

# systemctl status libvirtd-tls.socket
● libvirtd-tls.socket - Libvirt TLS IP socket
   Loaded: loaded (/usr/lib/systemd/system/libvirtd-tls.socket; disabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: [::]:16514 (Stream)
===

During the vdsm-tool configuration phase, we stop the libvirtd service, add libvirtd-tls.socket as a required unit of libvirtd.service, and then start the libvirtd service again.
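vdsm-tool does this in Python; the following is only a shell-level approximation of the three steps described above, modelled on the manual reproduction later in this report. The function name `configure_libvirtd_tls`, the `SYSTEMCTL` override (set it to `echo` for a dry run) and the unit-directory argument are hypothetical, not part of vdsm.

```shell
#!/bin/sh
# Hypothetical approximation of the vdsm-tool configure steps; not vdsm code.
# SYSTEMCTL=echo prints the systemctl calls instead of running them.
: "${SYSTEMCTL:=systemctl}"

configure_libvirtd_tls() {
    unit_dir=${1:-/etc/systemd/system}
    req_dir="$unit_dir/libvirtd.service.requires"

    # Step 1: stop the daemon. Its socket units stay active, so a running
    # client (virt-who in this report) can re-spawn libvirtd right away.
    $SYSTEMCTL stop libvirtd.service || return 1

    # Step 2: make libvirtd.service require libvirtd-tls.socket, the same way
    # as the manual reproduction (a symlink in the .requires directory).
    mkdir -p "$req_dir" || return 1
    ln -sf /usr/lib/systemd/system/libvirtd-tls.socket \
        "$req_dir/libvirtd-tls.socket" || return 1
    $SYSTEMCTL daemon-reload || return 1

    # Step 3: start the daemon again. If step 1 was undone by socket
    # activation, libvirtd-tls.socket refuses to listen ("Socket service
    # libvirtd.service already active, refusing") and this start fails.
    $SYSTEMCTL start libvirtd.service
}
```

For example, `SYSTEMCTL=echo configure_libvirtd_tls /tmp/units` prints the three systemctl calls and creates the symlink under /tmp/units without touching the running system.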

This fails when libvirtd tries to start libvirtd-tls.socket.

===
2020-09-14 16:23:40 IST - TASK [ovirt-host-deploy-vdsm : Reconfigure vdsm tool] **************************

"stderr_lines" : [ "Error:  ServiceOperationError: _systemctlStart failed", "b\"A dependency job for libvirtd.service failed. See 'journalctl -xe' for details.\\n\" " ],

# systemctl status libvirtd-tls.socket
● libvirtd-tls.socket - Libvirt TLS IP socket
   Loaded: loaded (/usr/lib/systemd/system/libvirtd-tls.socket; enabled; vendor preset: disabled)
   Active: inactive (dead)
   Listen: [::]:16514 (Stream)

Sep 14 10:54:02 vm249-58.gsslab.pnq2.redhat.com systemd[1]: libvirtd-tls.socket: Socket service libvirtd.service already active, refusing.
Sep 14 10:54:02 vm249-58.gsslab.pnq2.redhat.com systemd[1]: Failed to listen on Libvirt TLS IP socket.
===

Reinstallation works if the user simply runs Reinstall again from the portal.

I was able to reproduce this issue manually by doing what vdsm-tool does. Once vdsm-tool stops libvirtd, socket activation starts it again automatically because virt-who is still running. When we then try to start libvirtd with libvirtd-tls.socket as a requirement, the socket refuses to listen because libvirtd.service is already active, and the start fails with the error above.

- After installing the host, libvirtd.socket is active, but libvirtd itself is inactive.

# systemctl status libvirtd.socket
● libvirtd.socket - Libvirt local socket
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.socket; enabled; vendor preset: disabled)
   Active: active (listening) since Mon 2020-09-14 11:39:38 UTC; 2min 36s ago

# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: inactive (dead) since Mon 2020-09-14 11:41:50 UTC; 34s ago


- Started the virt-who service, which started libvirtd.

# systemctl start virt-who

# ps aux|grep libvirtd
root        3093  2.8  1.1 1818924 46160 ?       Ssl  11:42   0:00 /usr/sbin/libvirtd --timeout 120

- Stopped the libvirtd service, but socket activation started it again.

# systemctl stop libvirtd
Warning: Stopping libvirtd.service, but it can still be activated by:
  libvirtd-ro.socket
  libvirtd.socket
  libvirtd-admin.socket

# systemctl status libvirtd
● libvirtd.service - Virtualization daemon
   Loaded: loaded (/usr/lib/systemd/system/libvirtd.service; enabled; vendor preset: enabled)
  Drop-In: /etc/systemd/system/libvirtd.service.d
           └─unlimited-core.conf
   Active: active (running) since Mon 2020-09-14 11:43:10 UTC; 4s ago


- Added libvirtd-tls.socket as a requirement of libvirtd.service and started libvirtd, which failed with the error above.

# ln -s /usr/lib/systemd/system/libvirtd-tls.socket /etc/systemd/system/libvirtd.service.requires/libvirtd-tls.socket ;systemctl daemon-reload

# systemctl start libvirtd
A dependency job for libvirtd.service failed. See 'journalctl -xe' for details.

Sep 14 11:43:31 vm249-58.gsslab.pnq2.redhat.com systemd[1]: Reloading.
Sep 14 11:44:18 vm249-58.gsslab.pnq2.redhat.com systemd[1]: Reloading.
Sep 14 11:44:45 vm249-58.gsslab.pnq2.redhat.com systemd[1]: libvirtd-tls.socket: Socket service libvirtd.service already active, refusing.
Sep 14 11:44:45 vm249-58.gsslab.pnq2.redhat.com systemd[1]: Failed to listen on Libvirt TLS IP socket.


I think that, for a clean shutdown of libvirtd during the vdsm-tool configuration phase, we should also stop the libvirtd.socket unit.
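A minimal sketch of the proposal above, assuming the fix is simply to stop the socket units before the service. The proposal names only libvirtd.socket; stopping all three sockets listed in the systemctl warning is an extension, and the function name `stop_libvirtd_cleanly` and the `SYSTEMCTL` override are hypothetical. Comment 2 reports that with the linked change applied the first install still failed (that time on libvirtd.socket), and comment 3 explains that virt-who keeps running regardless, so treat this as an illustration of the idea, not a verified fix.

```shell
#!/bin/sh
# Hypothetical sketch of the proposal: stop the activating sockets before
# the service, so nothing can re-spawn libvirtd in between.
# SYSTEMCTL can point at a stub command for testing.
: "${SYSTEMCTL:=systemctl}"

# Socket units named in systemd's "can still be activated by" warning.
LIBVIRTD_SOCKETS="libvirtd.socket libvirtd-ro.socket libvirtd-admin.socket"

stop_libvirtd_cleanly() {
    # Sockets first; intentionally unquoted so each unit is its own argument.
    $SYSTEMCTL stop $LIBVIRTD_SOCKETS || return 1
    $SYSTEMCTL stop libvirtd.service || return 1

    # Fail loudly if something still brought the daemon back.
    if $SYSTEMCTL is-active --quiet libvirtd.service; then
        echo "libvirtd.service is still active" >&2
        return 1
    fi
}
```

The explicit `is-active` check turns a silent re-activation into an immediate, descriptive failure instead of a later "dependency job failed" on start.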


Version-Release number of selected component (if applicable):

vdsm-4.40.22-1.el8ev.x86_64
libvirt-daemon-6.0.0-25.module+el8.2.1+7154+47ffd890.x86_64
Red Hat Virtualization Host 4.4.1 (el8.2)

How reproducible:

100%

Steps to Reproduce:

1. On a freshly deployed host, start the virt-who service before adding the host to the Manager.
2. Add the host to the Manager.

Actual results:

vdsm-tool configure fails with the error "dependency job for libvirtd.service failed".

Expected results:

vdsm-tool configure should work.

Additional info:

Comment 2 Petr Matyáš 2020-10-14 13:57:06 UTC
With vdsm-4.40.33-1.el8ev.x86_64 this still fails the first time I install the host (a reinstall passes, as stated in the description).
The change linked in this bug does appear to be present in the changed file on the host.
I installed RHEL 8.3, then ovirt-host and virt-who, started virt-who and libvirt, and then tried to add the host to the engine, which failed with:

    fatal: [10.37.138.41]: FAILED! =>
      cmd: vdsm-tool configure --force
      changed: true
      rc: 1
      msg: non-zero return code
      start: 2020-10-14 15:18:42.213543
      end: 2020-10-14 15:19:29.123406
      delta: 0:00:46.909863
      stderr:
        Error:  ServiceOperationError: _systemctlStart failed
        b'Job for libvirtd.socket failed.\nSee "systemctl status libvirtd.socket" and "journalctl -xe" for details.\n'
      stdout:
        Checking configuration status...

        WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
        lvm requires configuration
        libvirt is not configured for vdsm yet
        libvirtd.service doesn't have requirement on libvirtd-tls.socket unit
        DB file /var/lib/vdsm/storage/managedvolume.db doesn't exists
        Managed volume database requires configuration
        abrt is not configured for vdsm
        multipath requires configuration

        Running configure...
        Reconfiguration of sanlock is done.
        Reconfiguration of passwd is done.
        Reconfiguration of certificates is done.
        WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration
        Backing up /etc/lvm/lvmlocal.conf to /etc/lvm/lvmlocal.conf.202010141519
        Installing /usr/share/vdsm/lvmlocal.conf at /etc/lvm/lvmlocal.conf
        Reconfiguration of lvm is done.
        Reconfiguration of libvirt is done.
        DB file /var/lib/vdsm/storage/managedvolume.db doesn't exists
        Creating managed volumes database at /var/lib/vdsm/storage/managedvolume.db
        Setting up ownership of database file to vdsm:kvm
        Reconfiguration of managedvolumedb is done.
        Reconfiguration of bond_defaults is done.
        Reconfiguration of abrt is done.
        Reconfiguration of sebool is done.
        Reconfiguration of multipath is done.

Comment 3 Marcin Sobczyk 2020-10-19 10:34:35 UTC
Right, so it turns out that even though virt-who uses 'libvirtd-ro.socket' [1],
it doesn't require it at the systemd unit level [2]. That means that even if we stop 'libvirtd-ro.socket',
'virt-who.service' will still be running, and depending on the implementation anything can happen.
This has to be fixed on the virt-who side first.
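To illustrate what a unit-level dependency could look like, here is a hypothetical systemd drop-in for virt-who.service. The drop-in file name, the function name `write_virt_who_dropin` and the choice of `Requires=`/`After=` are assumptions; this is not necessarily what the virt-who fix mentioned in comment 10 ships. With `Requires=`, systemd also stops virt-who.service when libvirtd-ro.socket is explicitly stopped, which is the behaviour missing in the scenario above.

```shell
#!/bin/sh
# Hypothetical drop-in, not the upstream virt-who fix: declare at the systemd
# unit level that virt-who.service depends on libvirtd-ro.socket.
write_virt_who_dropin() {
    unit_dir=${1:-/etc/systemd/system}
    dropin_dir="$unit_dir/virt-who.service.d"
    mkdir -p "$dropin_dir" || return 1

    # Requires=: explicitly stopping libvirtd-ro.socket also stops virt-who.
    # After=: order virt-who's start after the socket unit.
    cat > "$dropin_dir/libvirtd-ro-socket.conf" <<'EOF'
[Unit]
Requires=libvirtd-ro.socket
After=libvirtd-ro.socket
EOF
}
```

On a real host this would be followed by `systemctl daemon-reload` so that systemd picks up the drop-in.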

Given that, the fact that we also depend dynamically on either 'libvirtd-tcp.socket' or 'libvirtd-tls.socket'
(so we cannot prevent a similar scenario if someone uses one of these),
and the gentle nature of socket activation, I would prefer to revert the patch and leave things as they are.

[1] https://github.com/candlepin/virt-who/blob/4c7fdb032a66e2fe3324cc2d7579101c699e3b00/virtwho/virt/libvirtd/libvirtd.py#L282
[2] https://github.com/candlepin/virt-who/blob/master/virt-who.service

Comment 10 Martin Perina 2022-01-19 12:47:59 UTC
virt-who-1.30.9-1.el8 should contain the fix; no code changes are required on the RHV side.
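To check whether a host already has at least the virt-who build named above, the installed version can be compared against 1.30.9-1.el8. This sketch uses `sort -V`, which only approximates RPM's version ordering (`rpmdev-vercmp` from rpmdevtools is the authoritative comparison); the helper names are hypothetical.

```shell
#!/bin/sh
# Hypothetical helpers, not part of vdsm or virt-who.

# Succeeds if version $1 sorts at or after version $2 under `sort -V`
# (an approximation of RPM's version ordering).
version_at_least() {
    [ "$(printf '%s\n' "$1" "$2" | sort -V | head -n 1)" = "$2" ]
}

# Succeeds if the installed virt-who is at least 1.30.9-1.el8;
# fails if virt-who is not installed.
virt_who_has_fix() {
    installed=$(rpm -q --qf '%{VERSION}-%{RELEASE}' virt-who) || return 1
    version_at_least "$installed" 1.30.9-1.el8
}
```

Usage: `virt_who_has_fix && echo "virt-who is new enough"`.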

Comment 12 Pavol Brilla 2022-05-02 09:46:48 UTC
 "stdout_lines" : [ "", "Checking configuration status...", "", "WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration", "lvm requires configuration", "DB file /var/lib/vdsm/storage/managedvolume.db doesn't exists", "Managed volume database requires configuration", "sanlock user needs groups: qemu, kvm", "multipath requires configuration", "libvirt is not configured for vdsm yet", "libvirtd.service doesn't have requirement on libvirtd-tls.socket unit", "", "Running configure...", "WARNING: LVM local configuration: /etc/lvm/lvmlocal.conf is not based on vdsm configuration", "Previous lvmlocal.conf copied to /etc/lvm/lvmlocal.conf.20220502123424", "Installing /usr/share/vdsm/lvmlocal.conf at /etc/lvm/lvmlocal.conf", "Reconfiguration of lvm is done.", "DB file /var/lib/vdsm/storage/managedvolume.db doesn't exists", "Creating managed volumes database at /var/lib/vdsm/storage/managedvolume.db", "Setting up ownership of database file to vdsm:kvm", "Reconfiguration of managedvolumedb is done.", "Reconfiguration of passwd is done.", "Configuring sanlock user groups", "Configuring sanlock config file", "Previous sanlock.conf copied to /etc/sanlock/sanlock.conf.20220502123424", "Reconfiguration of sanlock is done.", "Reconfiguration of multipath is done.", "Reconfiguration of libvirt is done.", "Reconfiguration of sebool is done.", "Reconfiguration of bond_defaults is done.", "", "Done configuring modules to VDSM." ],
      "stderr_lines" : [ ],


# yum list ovirt-engine
Installed Packages
ovirt-engine.noarch           4.5.0.5-0.7.el8ev

Comment 19 errata-xmlrpc 2022-05-26 17:22:44 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Low: RHV RHEL Host (ovirt-host) [ovirt-4.5.0] security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:4764