Bug 1639667
Summary: | Sometimes host is non-responsive on reboot when gluster bricks are on vdo volumes, due to missing /var/run/vdsm directory | | |
---|---|---|---|
Product: | [oVirt] vdsm | Reporter: | Sahina Bose <sabose> |
Component: | Gluster | Assignee: | Parth Dhanjal <dparth> |
Status: | CLOSED CURRENTRELEASE | QA Contact: | SATHEESARAN <sasundar> |
Severity: | urgent | Docs Contact: | |
Priority: | high | | |
Version: | 4.30.0 | CC: | bugs, dkeefe, godas, guillaume.pavese, mperina, msobczyk, sabose, sasundar |
Target Milestone: | ovirt-4.3.1 | Flags: | rule-engine: ovirt-4.3+, ylavi: testing_plan_complete? |
Target Release: | --- | | |
Hardware: | Unspecified | | |
OS: | Unspecified | | |
Whiteboard: | | | |
Fixed In Version: | | Doc Type: | If docs needed, set a value |
Doc Text: | | Story Points: | --- |
Clone Of: | | Environment: | |
Last Closed: | 2019-03-13 16:39:33 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | | Category: | --- |
oVirt Team: | Gluster | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: | | | |
Bug Depends On: | 1630788 | | |
Bug Blocks: | | | |
Description
Sahina Bose
2018-10-16 10:46:08 UTC
FYI - this is a RHEL 7.6 host (not a RHV-H host like the earlier reported Bug 1576479).

[root@tendrl26 ~]# cat /etc/redhat-release
Red Hat Enterprise Linux Server release 7.6 (Maipo)

I was able to work around the issue by doing a "yum reinstall vdsm".

Can you provide an estimate of how often this happens?

Could you please check why /var/run/vdsm is missing after the upgrade?

(In reply to Michael Burman from comment #5)
> Could you please check why /var/run/vdsm is missing after the upgrade?

This was not an upgrade. It was a reboot of the server without moving it to maintenance mode.

(In reply to Yaniv Lavi from comment #4)
> Can you provide an estimate of how often this happens?

I would say ~30% - we have seen it 2 out of 6 times. And this is not specific to RHEL 7.6 - we have hit it with RHV-H and RHEL 7.5 as well (see https://bugzilla.redhat.com/show_bug.cgi?id=1576479#c20).

We never saw it in RHV QE.

(In reply to Michael Burman from comment #5)
> Could you please check why /var/run/vdsm is missing after the upgrade?

The /var/run directory is populated by systemd-tmpfiles, but that service could not be started due to dependency issues:

Oct 16 15:16:14 tendrl26 systemd: Found ordering cycle on sysinit.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on systemd-tmpfiles-setup.service/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on local-fs.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on gluster_bricks-engine.mount/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on vdo.service/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on basic.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on sockets.target/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on iscsiuio.socket/start
Oct 16 15:16:14 tendrl26 systemd: Found dependency on sysinit.target/start
Oct 16 15:16:14 tendrl26 systemd: Breaking ordering cycle by deleting job systemd-tmpfiles-setup.service/start

But I have no idea which of those dependencies is wrong; we probably need a systemd expert to figure that out.

Sahina, one of the dependencies mentioned above is gluster_bricks-engine.mount. As Michael mentioned, this issue was never observed by RHV QE, so could this be the reason for the failure?

(In reply to Martin Perina from comment #10)
> Sahina, one of the dependencies mentioned above is gluster_bricks-engine.mount.
> As Michael mentioned, this issue was never observed by RHV QE, so could this
> be the reason for the failure?

Possible. The brick mount has the following entry in /etc/fstab:

/dev/vg_sda3/gluster_lv_engine /gluster_bricks/engine xfs inode64,noatime,nodiratime,x-systemd.requires=vdo.service 0 0

Could this be similar to Bug 1552242? Setting needinfo on Dennis.

Removing blocker, as this is not consistent and is seen only with VDO volumes in the stack.

*** Bug 1576479 has been marked as a duplicate of this bug. ***

The dependent bug 1630788 is already verified, as the problem is resolved by changing the mount options of the filesystem in /etc/fstab.

@Sahina, based on the above reasoning, could you move this bug to ON_QA so that it can be verified?

This bug has not been marked as a blocker for oVirt 4.3.0. Since we are releasing it tomorrow, January 29th, this bug has been re-targeted to 4.3.1.
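For anyone debugging a similar cycle, here is a minimal sketch (not taken from this bug) of how the chain in the log above could be inspected and how the missing runtime directory could be restored by hand. It assumes standard systemd tooling and that /var/run/vdsm is created via a tmpfiles.d entry, as the comment above implies; unit names are taken from the log and the quoted fstab line.

```
# Show what the brick mount unit requires and is ordered after; the
# x-systemd.requires=vdo.service option from the fstab line quoted above is
# what pulls vdo.service (and through it basic.target) into local-fs.target.
systemctl show -p Requires,After gluster_bricks-engine.mount

# systemd-tmpfiles normally recreates /var/run/vdsm at boot; if
# systemd-tmpfiles-setup.service was dropped to break the ordering cycle,
# re-running it restores the missing directories without reinstalling vdsm.
systemd-tmpfiles --create
```

Presumably the "yum reinstall vdsm" workaround mentioned in the description helps for the same reason: reinstalling the package ends up recreating the runtime directory that the skipped tmpfiles setup would otherwise have created.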
Tested with cockpit-ovirt-dashboard-0.12.4 and RHV 4.3.2.

The bricks/XFS filesystems created on the VDO volume now get an entry in /etc/fstab with the special mount options "_netdev,x-systemd.device-timeout=0", which solves this problem.

This bugzilla is included in the oVirt 4.3.1 release, published on February 28th 2019. Since the problem described in this bug report should be resolved in oVirt 4.3.1, it has been closed with a resolution of CURRENT RELEASE. If the solution does not work for you, please open a new bug report.
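For illustration, this is roughly what the corrected /etc/fstab entry would look like with the mount options named in the verification comment above, reusing the device path and mount point quoted earlier in this bug; the exact line generated by cockpit-ovirt-dashboard may differ.

```
# Illustrative only: the options from the verification comment are appended;
# whether x-systemd.requires=vdo.service is kept alongside them is not stated
# in this bug, so treat this line as a sketch rather than the shipped fix.
/dev/vg_sda3/gluster_lv_engine /gluster_bricks/engine xfs inode64,noatime,nodiratime,_netdev,x-systemd.device-timeout=0 0 0
```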