Bug 1630788
Summary: | Host goes non-responsive post reboot, as /var/run/vdsm directory is missing | |
---|---|---|---
Product: | [Red Hat Storage] Red Hat Gluster Storage | Reporter: | bipin <bshetty>
Component: | rhhi | Assignee: | Sachidananda Urs <surs>
Status: | CLOSED CURRENTRELEASE | QA Contact: | SATHEESARAN <sasundar>
Severity: | high | Docs Contact: |
Priority: | high | |
Version: | rhhiv-1.5 | CC: | amureini, bugs, danken, derez, ebenahar, guillaume.pavese, mflanner, nsoffer, rcyriac, rhs-bugs, sabose, sankarshan, sasundar, tnisan
Target Milestone: | --- | Keywords: | Reopened, ZStream
Target Release: | RHHI-V 1.5.z Async | |
Hardware: | x86_64 | |
OS: | Linux | |
Whiteboard: | | |
Fixed In Version: | gdeploy-2.0.2-31 | Doc Type: | Known Issue
Doc Text: | Cause: When a host is rebooted, file systems on top of VDO volumes are not mounted, because their fstab entries do not order the mount after the vdo service. Consequence: The gluster bricks and the vdsm service cannot be started. Workaround (if any): Update the fstab entries for devices using VDO with "_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service" (see the example entry after this table). Result: The file system is mounted on reboot. | |
Story Points: | --- | |
Clone Of: | 1576479 | |
: | 1654584 (view as bug list) | Environment: |
Last Closed: | 2019-05-20 04:54:29 UTC | Type: | Bug
Regression: | --- | Mount Type: | ---
Documentation: | --- | CRM: |
Verified Versions: | | Category: | ---
oVirt Team: | Storage | RHEL 7.3 requirements from Atomic Host: |
Cloudforms Team: | --- | Target Upstream Version: |
Embargoed: | | |
Bug Depends On: | 1576479, 1654584 | |
Bug Blocks: | 1639667 | |
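As an illustration of the workaround in the Doc Text above, an fstab entry for a gluster brick on top of a VDO volume ends up looking like this (the device path and mount point are examples only; the added mount options are the ones from the workaround):

/dev/gluster_vg_sdc/gluster_lv_data /gluster_bricks/data xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0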
Comment 1
SATHEESARAN
2018-09-24 06:33:49 UTC
Comment0 carries some information from the dependent bug. Here is the issue observed with this bug.

When the RHVH node is rebooted, sometimes the /var/run/vdsm directory is missing, which leaves the host non-responsive.

(In reply to SATHEESARAN from comment #1)

Creating /run/vdsm is the first thing vdsm does when started, see:
https://github.com/oVirt/vdsm/blob/ece859806fb531492e1ac54d11fc78f0b5d33e1c/init/vdsmd_init_common.sh.in#L209

as part of ExecStartPre - see:
https://github.com/oVirt/vdsm/blob/master/static/usr/lib/systemd/system/vdsmd.service.in

ovirt-imageio-daemon.service is *not* enabled - it is started by vdsm using:

Wants=mom-vdsm.service ovirt-imageio-daemon.service abrtd.service \
      dev-hugepages1G.mount libvirt-guests.service kdump.service

# systemctl status ovirt-imageio-daemon
● ovirt-imageio-daemon.service - oVirt ImageIO Daemon
   Loaded: loaded (/usr/lib/systemd/system/ovirt-imageio-daemon.service; disabled; vendor preset: disabled)
...

Is it possible that ovirt-imageio-daemon.service is enabled by mistake on RHHI?

(In reply to Nir Soffer from comment #2)

To install RHHI, we install RHV-H, deploy Hosted Engine, and add the nodes to RHV-M. There's no additional step done - unless selecting the ovirt-image-io service during engine-setup enables the daemon on the nodes?

(In reply to Sahina Bose from comment #3)

The daemon should not be enabled by anything. Maybe you replace the certificates during deploy or upgrade? That may try to restart the daemon, but in this flow /run/vdsm must exist.

It would help if you could reproduce the issue without RHHI, with a host connected to a normal engine.
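One way to check both of the points raised above on an affected host - whether the imageio daemon is enabled by mistake, and whether vdsmd's ExecStartPre recreated the runtime directory - is the minimal check below, assuming the standard unit names shipped by vdsm and ovirt-imageio:

# are the units enabled? (ovirt-imageio-daemon is expected to report "disabled")
systemctl is-enabled ovirt-imageio-daemon.service vdsmd.service

# after vdsmd has (re)started, its ExecStartPre should have recreated the directory
ls -ld /run/vdsm /var/run/vdsm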
(In reply to Nir Soffer from comment #4)
> The daemon should not be enabled by anything. Maybe you replace the
> certificates during deploy or upgrade? That may try to restart the daemon,
> but in this flow /run/vdsm must exist.

No - we do not.

> It would help if you could reproduce the issue without RHHI, with a host
> connected to a normal engine.

We do not have a non-RHHI setup to reproduce. Raz, can you help? Has RHV QE encountered this error on HE deployments?

(In reply to Sahina Bose from comment #5)

Seems like this is not happening on non-RHHI HE deployments. So far, from the replies I got, no one has seen it.

Sahina,

Denis Keefe has come up with the workaround in https://bugzilla.redhat.com/show_bug.cgi?id=1639667#c18. I have tested it and it worked. After reboot, the /var/run/vdsm directory was intact.

Should this be called out as a known issue now?

(In reply to SATHEESARAN from comment #7)
> Should this be called out as a known issue now?

Yes, I've updated the doc_text.

The issue has been addressed in this bug: https://bugzilla.redhat.com/show_bug.cgi?id=1654584#c2

The dependent bug is ON_QA.

Tested with gdeploy-2.0.2-31.el7rhgs

Additional mount options (_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service) are added in /etc/fstab for the XFS filesystems (gluster bricks) created on top of VDO volumes:

<snip>
/dev/gluster_vg_sdb/gluster_lv_engine /gluster_bricks/engine xfs inode64,noatime,nodiratime 0 0
/dev/gluster_vg_sdc/gluster_lv_data /gluster_bricks/data xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
/dev/gluster_vg_sdc/gluster_lv_vmstore /gluster_bricks/vmstore xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0,x-systemd.requires=vdo.service 0 0
/dev/gluster_vg_sdd/gluster_lv_newvol /gluster_bricks/newvol xfs inode64,noatime,nodiratime 0 0
</snip>
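A note on how the added options take effect: systemd's fstab generator turns each of those lines into a mount unit named after the mount point, and x-systemd.requires=vdo.service adds Requires= and After= dependencies on vdo.service to that unit. A minimal check, assuming the /gluster_bricks/data entry shown above:

# regenerate mount units from the updated /etc/fstab
systemctl daemon-reload

# gluster_bricks-data.mount is the unit systemd derives for /gluster_bricks/data
systemctl show gluster_bricks-data.mount -p Requires -p After

With the options in place, vdo.service should appear in both properties, so the brick mounts no longer race the VDO volumes at boot and the gluster bricks and vdsm can start normally.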