Description of problem:
On rebooting a host added to hyperconverged environment, the host is marked Non-responsive in the UI due to failure starting vdsm services.

[root@tendrl26 ~]# service vdsmd status
Redirecting to /bin/systemctl status vdsmd.service
● vdsmd.service - Virtual Desktop Server Manager
   Loaded: loaded (/usr/lib/systemd/system/vdsmd.service; enabled; vendor preset: enabled)
   Active: inactive (dead)

Oct 16 16:07:44 tendrl26.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Se...r.
Oct 16 16:07:44 tendrl26.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resu...'.
Oct 16 16:07:54 tendrl26.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Se...r.
Oct 16 16:07:54 tendrl26.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resu...'.
Oct 16 16:08:05 tendrl26.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Se...r.
Oct 16 16:08:05 tendrl26.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resu...'.
Oct 16 16:08:16 tendrl26.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Se...r.
Oct 16 16:08:16 tendrl26.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resu...'.
Oct 16 16:08:27 tendrl26.lab.eng.blr.redhat.com systemd[1]: Dependency failed for Virtual Desktop Se...r.
Oct 16 16:08:27 tendrl26.lab.eng.blr.redhat.com systemd[1]: Job vdsmd.service/start failed with resu...'.
Hint: Some lines were ellipsized, use -l to show in full.

[root@tendrl26 ~]# vdsm-tool configure --force
Checking configuration status...

abrt is already configured for vdsm
lvm is configured for vdsm
libvirt is already configured for vdsm
SUCCESS: ssl configured to true. No conflicts
Manual override for multipath.conf detected - preserving current configuration
This manual override for multipath.conf was based on downrevved template. You are strongly advised to contact your support representatives

Running configure...
Reconfiguration of abrt is done.
Reconfiguration of passwd is done.
Reconfiguration of libvirt is done.
Traceback (most recent call last):
  File "/usr/bin/vdsm-tool", line 219, in main
    return tool_command[cmd]["command"](*args)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/__init__.py", line 38, in wrapper
    func(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 141, in configure
    _configure(c)
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurator.py", line 88, in _configure
    getattr(module, 'configure', lambda: None)()
  File "/usr/lib/python2.7/site-packages/vdsm/tool/configurators/bond_defaults.py", line 37, in configure
    sysfs_options_mapper.dump_bonding_options()
  File "/usr/lib/python2.7/site-packages/vdsm/network/link/bond/sysfs_options_mapper.py", line 46, in dump_bonding_options
    with open(sysfs_options.BONDING_DEFAULTS, 'w') as f:
IOError: [Errno 2] No such file or directory: '/var/run/vdsm/bonding-defaults.json'

Version-Release number of selected component (if applicable):

How reproducible:
Sometimes

Steps to Reproduce:
1. Reboot a host added to HE (in HC environment)
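The traceback points at the /var/run/vdsm runtime directory being missing after the reboot. A minimal sketch of how to check and work past this on an affected host, assuming vdsm ships a tmpfiles.d entry for its runtime directory (which may vary by version):

[root@tendrl26 ~]# ls -ld /var/run/vdsm         # expected to fail on an affected host
[root@tendrl26 ~]# systemd-tmpfiles --create    # re-run tmpfiles setup to recreate /var/run entries
[root@tendrl26 ~]# vdsm-tool configure --force  # retry the step that hit the IOError

If the directory is still missing afterwards, the ordering-cycle problem discussed in the later comments is the more likely cause.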
FYI - this is a RHEL 7.6 host (not a RHV-H host like the one in the earlier reported Bug 1576479).

[root@tendrl26 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)
I was able to work around the issue by doing a "yum reinstall vdsm".
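For completeness, a sketch of that workaround as a command sequence (assumed, not copied from the host; the reinstall presumably re-runs the package scriptlets that recreate vdsm's runtime directories):

[root@tendrl26 ~]# yum reinstall -y vdsm
[root@tendrl26 ~]# vdsm-tool configure --force
[root@tendrl26 ~]# systemctl start vdsmd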
Can you provide an estimate of how often this happens?
Could you please check why /var/run/vdsm is missing after upgrade?
(In reply to Michael Burman from comment #5)
> Could you please check why /var/run/vdsm is missing after upgrade?

This was not an upgrade. It was a reboot of the server without moving the host to maintenance mode.
(In reply to Yaniv Lavi from comment #4)
> Can you provide an estimate of how often this happens?

I would say ~30% - I have seen it 2 out of 6 times. And this is not specific to RHEL 7.6 - we have hit it with RHV-H and RHEL 7.5 as well (see https://bugzilla.redhat.com/show_bug.cgi?id=1576479#c20).
We never saw it on RHV QE
(In reply to Michael Burman from comment #5)
> Could you please check why /var/run/vdsm is missing after upgrade?

The /var/run directory is populated by systemd-tmpfiles, but systemd-tmpfiles-setup.service couldn't be started due to dependency issues:

Oct 16 15:16:14 tendrl26 systemd: Found ordering cycle on sysinit.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on systemd-tmpfiles-setup.service/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on local-fs.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on gluster_bricks-engine.mount/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on vdo.service/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on basic.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on sockets.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on iscsiuio.socket/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on sysinit.target/start
Oct 16 15:16:14 tendrl26 systemd: Breaking ordering cycle by deleting job systemd-tmpfiles-setup.service/start

But I have no idea which of those dependencies is wrong; we probably need a systemd expert to figure that out.
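For anyone chasing a similar cycle, a sketch of how it can be inspected with standard systemd tooling (unit names taken from the log above):

[root@tendrl26 ~]# systemd-analyze verify default.target    # reports ordering cycles among the loaded units
[root@tendrl26 ~]# systemctl list-dependencies --after systemd-tmpfiles-setup.service
[root@tendrl26 ~]# systemctl show -p Requires -p After vdo.service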
Sahina, one of the dependencies mentioned above is gluster_bricks-engine.mount. As Michael mentioned, this issue was never observed by RHV QE, so could this be the reason for the failure?
(In reply to Martin Perina from comment #10)
> Sahina, one of the dependencies mentioned above is gluster_bricks-engine.mount. As Michael mentioned, this issue was never observed by RHV QE, so could this be the reason for the failure?

Possible. The brick mount has the following entry in /etc/fstab:

/dev/vg_sda3/gluster_lv_engine /gluster_bricks/engine xfs inode64,noatime,nodiratime,x-systemd.requires=vdo.service 0 0

Could this be similar to Bug 1552242?

Setting needinfo on Dennis.
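The mount unit for that fstab line is generated by systemd-fstab-generator, and the x-systemd.requires=vdo.service option should turn into Requires=/After= on vdo.service, which matches the gluster_bricks-engine.mount -> vdo.service edge in the cycle above. A sketch of how to confirm that on the host:

[root@tendrl26 ~]# systemctl cat gluster_bricks-engine.mount    # show the generated mount unit
[root@tendrl26 ~]# systemctl show -p Requires -p After -p WantedBy gluster_bricks-engine.mount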
Removing the blocker flag, as this is not consistently reproducible and is seen only with a VDO volume in the stack.
*** Bug 1576479 has been marked as a duplicate of this bug. ***
The dependent bug 1630788 is already verified, as the problem is resolved by changing the mount options of the filesystem in /etc/fstab.

@Sahina, based on the above reasoning, could you move this bug to ON_QA so that it can be verified?
This bug has not been marked as blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
Tested with cockpit-ovirt-dashboard-0.12.4 and RHV 4.3.2. Each brick/XFS filesystem created on the VDO volume now has an entry in /etc/fstab with the additional mount options "_netdev,x-systemd.device-timeout=0", which solves this problem.
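A sketch of what the resulting /etc/fstab entry would look like, reusing the device and mount point from the entry quoted earlier (the exact option set written by the installer may differ):

/dev/vg_sda3/gluster_lv_engine /gluster_bricks/engine xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0 0 0

With _netdev, systemd treats the brick mount as a remote filesystem and orders it under remote-fs.target instead of local-fs.target, which should remove the mount from the local-fs.target -> systemd-tmpfiles-setup.service ordering cycle that was preventing vdsmd from starting.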
This bug is included in the oVirt 4.3.1 release, published on February 28th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.1, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.