Comment 0 carries some information from the dependent bug. Here is the issue observed with this bug.

When the RHVH node is rebooted, sometimes the /var/run/vdsm directory is missing, which leaves the host non-responsive.

The workaround for this problem is to manually create the directories:

# mkdir /var/run/vdsm
# mkdir /var/run/vdsm/trackedInterfaces
# chmod 755 /var/run/vdsm
# chown -R vdsm:kvm /var/run/vdsm
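Note that on these hosts /var/run is a symlink to /run, which is a tmpfs, so directories created by hand disappear on the next reboot. A systemd-native way to recreate them on every boot is a tmpfiles.d entry. This is only a sketch of that general mechanism, with the path, mode and ownership taken from the workaround above; it is not necessarily the fix adopted later in this bug:

# cat /etc/tmpfiles.d/vdsm.conf
d /run/vdsm                   0755 vdsm kvm -
d /run/vdsm/trackedInterfaces 0755 vdsm kvm -
# systemd-tmpfiles --create /etc/tmpfiles.d/vdsm.conf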
(In reply to SATHEESARAN from comment #1)
> Comment 0 carries some information from the dependent bug. Here is the
> issue observed with this bug.
>
> When the RHVH node is rebooted, sometimes the /var/run/vdsm directory is
> missing, which leaves the host non-responsive.

Creating /run/vdsm is the first thing vdsm does when started, see:
https://github.com/oVirt/vdsm/blob/ece859806fb531492e1ac54d11fc78f0b5d33e1c/init/vdsmd_init_common.sh.in#L209

This runs as part of ExecStartPre - see:
https://github.com/oVirt/vdsm/blob/master/static/usr/lib/systemd/system/vdsmd.service.in

ovirt-imageio-daemon.service is *not* enabled - it is started by vdsm via:

Wants=mom-vdsm.service ovirt-imageio-daemon.service abrtd.service \
    dev-hugepages1G.mount libvirt-guests.service kdump.service

# systemctl status ovirt-imageio-daemon
● ovirt-imageio-daemon.service - oVirt ImageIO Daemon
   Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled)
   ...

Is it possible that ovirt-imageio-daemon.service is enabled by mistake on RHHI?
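A quick way to verify this on an affected node is to ask systemd directly. Per the status output above, the expected enablement state is "disabled", and the reverse-dependency list should show the daemon pulled in only via vdsmd.service's Wants=:

# systemctl is-enabled ovirt-imageio-daemon.service
# systemctl list-dependencies --reverse ovirt-imageio-daemon.service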
(In reply to Nir Soffer from comment #2)
> (In reply to SATHEESARAN from comment #1)
> > Comment 0 carries some information from the dependent bug. Here is the
> > issue observed with this bug.
> >
> > When the RHVH node is rebooted, sometimes the /var/run/vdsm directory is
> > missing, which leaves the host non-responsive.
>
> Creating /run/vdsm is the first thing vdsm does when started, see:
> https://github.com/oVirt/vdsm/blob/ece859806fb531492e1ac54d11fc78f0b5d33e1c/init/vdsmd_init_common.sh.in#L209
>
> This runs as part of ExecStartPre - see:
> https://github.com/oVirt/vdsm/blob/master/static/usr/lib/systemd/system/vdsmd.service.in
>
> ovirt-imageio-daemon.service is *not* enabled - it is started by vdsm via:
>
> Wants=mom-vdsm.service ovirt-imageio-daemon.service abrtd.service \
>     dev-hugepages1G.mount libvirt-guests.service kdump.service
>
> # systemctl status ovirt-imageio-daemon
> ● ovirt-imageio-daemon.service - oVirt ImageIO Daemon
>    Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled)
>    ...
>
> Is it possible that ovirt-imageio-daemon.service is enabled by mistake on
> RHHI?

To install RHHI, we install RHV-H, deploy Hosted Engine, and add the nodes to RHV-M. There is no additional step done - unless selecting the ovirt-imageio service during engine-setup enables the daemon on the nodes?
(In reply to Sahina Bose from comment #3)

The daemon should not be enabled by anything. Maybe you replace the certificates during deploy or upgrade? This may try to restart the daemon. But in this flow /run/vdsm must exist.

It would help if you could reproduce the issue without RHHI, with a host connected to a normal engine.
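One way to rule out a stray enablement symlink or drop-in left behind by deploy/upgrade tooling (a sketch; the wants directory may differ depending on the install target):

# ls -l /etc/systemd/system/multi-user.target.wants/ | grep -i imageio
# systemctl cat ovirt-imageio-daemon.service

systemctl cat prints the unit file together with any drop-in overrides, so a deploy-time override that starts the daemon on its own would show up here.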
(In reply to Nir Soffer from comment #4)
> (In reply to Sahina Bose from comment #3)
> The daemon should not be enabled by anything. Maybe you replace the
> certificates during deploy or upgrade? This may try to restart the daemon.
> But in this flow /run/vdsm must exist.

No - we do not.

> It would help if you could reproduce the issue without RHHI, with a host
> connected to a normal engine.

We do not have a non-RHHI setup to reproduce this. Raz, can you help? Has RHV QE encountered this error on HE deployments?
(In reply to Sahina Bose from comment #5)
> (In reply to Nir Soffer from comment #4)
> > (In reply to Sahina Bose from comment #3)
> > The daemon should not be enabled by anything. Maybe you replace the
> > certificates during deploy or upgrade? This may try to restart the daemon.
> > But in this flow /run/vdsm must exist.
>
> No - we do not.
>
> > It would help if you could reproduce the issue without RHHI, with a host
> > connected to a normal engine.
>
> We do not have a non-RHHI setup to reproduce this. Raz, can you help? Has
> RHV QE encountered this error on HE deployments?

This does not seem to be happening on non-RHHI HE deployments. So far, from the replies I received, no one has seen it.
Sahina,

Denis Keefe has come up with a workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1639667#c18. I have tested it and it worked: after reboot, the /var/run/vdsm directory was intact.

Should this be marked as a known issue now?
(In reply to SATHEESARAN from comment #7)
> Sahina,
>
> Denis Keefe has come up with a workaround in
> https://bugzilla.redhat.com/show_bug.cgi?id=1639667#c18. I have tested it
> and it worked: after reboot, the /var/run/vdsm directory was intact.
>
> Should this be marked as a known issue now?

Yes, I've updated the doc_text.
The issue has been addressed in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1654584#c2
The dependent bug is ON_QA
Tested with gdeploy-2.0.2-31.el7rhgs.

The additional mount options (_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service) are added in /etc/fstab for the XFS filesystems (gluster bricks) created on top of VDO volumes:

<snip>
/dev/gluster_vg_sdb/gluster_lv_engine  /gluster_bricks/engine  xfs inode64,noatime,nodiratime 0 0
/dev/gluster_vg_sdc/gluster_lv_data    /gluster_bricks/data    xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
/dev/gluster_vg_sdc/gluster_lv_vmstore /gluster_bricks/vmstore xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
/dev/gluster_vg_sdd/gluster_lv_newvol  /gluster_bricks/newvol  xfs inode64,noatime,nodiratime 0 0
</snip>
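systemd's fstab generator turns each of these entries into a .mount unit, and the x-systemd.requires=vdo.service option becomes Requires= and After= dependencies on vdo.service, so the bricks are only mounted once VDO is up. A quick sanity check (a sketch, using the data brick as an example):

# systemd-escape -p --suffix=mount /gluster_bricks/data
gluster_bricks-data.mount
# systemctl show -p Requires,After gluster_bricks-data.mount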