Bug 1568414

| Summary: | missing lvm filter causing "nodectl check" to fail to verify thin-provisioned local LV metadata | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Virtualization Manager | Reporter: | Marian Jankular <mjankula> |
| Component: | imgbased | Assignee: | Ryan Barry <rbarry> |
| Status: | CLOSED ERRATA | QA Contact: | Yaning Wang <yaniwang> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 4.1.6 | CC: | cshao, dfediuck, eheftman, frank.toth, fsun, huzhao, jiaczhan, mgoldboi, mjankula, qiyuan, rbarry, weiwang, yaniwang, ycui, yzhao |
| Target Milestone: | ovirt-4.2.4 | Keywords: | ZStream |
| Target Release: | --- | Flags: | lsvaty: testing_plan_complete- |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | imgbased-1.0.18 | Doc Type: | Bug Fix |
| Doc Text: | Previously, if systems were configured to skip Logical Volume Manager (LVM) clusters, imgbased saw output that was unrelated to the Logical Volumes being queried. As a result, imgbased failed to parse the output, causing Red Hat Virtualization Host updates to fail. With this release, imgbased ignores output from skipped clusters, enabling imgbased LVM commands to return successfully. | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2018-06-27 10:04:47 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | Node | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description (Marian Jankular, 2018-04-17 13:03:36 UTC)
imgbased always filters LVs. See https://gerrit.ovirt.org/#/c/74720/

`nodectl check` is essentially a wrapper around `imgbase check`, except it also checks the service status of vdsm.

I do not have a reproducer for this bug, and have never encountered it. I suspect it may have something to do with the clustered LV. It is not reasonable to specify a global LVM filter for all RHVH installations, and we do not currently modify any files shipped by the platform.

The question is why LVM is returning an error code. Duplicate LV names? Duplicate UUIDs? Can the customer remove the filter and try:

```
vgs -vvvv --noheadings --select lv_tags=imgbased:pool -o lv_full_name; echo $?
```

There will be a lot of output.

----------------------------------------------------------------------

Unrelated to this (but related to comments in the case), 'nodectl init' should only be run at install time. It is not triggered during upgrades. Bad things (TM) will happen if it is executed on a running/configured system.

Essentially, 'nodectl init' looks at a "bare" install (LVM thinpool, no snapshots for RHVH) and creates the required layout/tagging. In this case, it would essentially:

- Tag volumes with imgbased:pool, imgbased:root, imgbased:lv
- Create a new LV based on the NVR of the image
- Copy the contents of imgbased:root into that LV, and configure the bootloader

On upgrades, this is handled by `imgbase update --format liveimg ...`, which can be seen in the RPM %post scripts for new images. (An illustrative sketch of these init steps appears after the traceback comment below.)

----------------------------------------------------------------------

I tested this bug with version RHVH-4.1-20171002.0-RHVH-x86_64-dvd1.iso.

Steps:
1. Install RHVH via Anaconda on an iSCSI machine.
2. Add the RHVH host to RHEV-M and create iSCSI storage.
3. Create a VM on the iSCSI storage successfully.
4. Reboot the host.

However, when I run the command "nodectl check", the status is OK. I cannot reproduce the bug. Are there any other steps for reproducing this bug?

----------------------------------------------------------------------

Hi, I have this issue on the latest image (redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch), which uses ovirt-node-ng-nodectl-4.1.5-0.20170810.0.el7.noarch. The issue happens when I have a VM running on the node and the VM has a disk which contains a clustered LVM volume group.
The 'nodectl info' command throws the following error:

```
# nodectl info
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/nodectl/__main__.py", line 42, in <module>
    CliApplication()
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 200, in CliApplication
    return cmdmap.command(args)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 118, in command
    return self.commands[command](**kwargs)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 76, in info
    Info(self.imgbased, self.machine).write()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 45, in __init__
    self._fetch_information()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 49, in _fetch_information
    self._get_layout()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 66, in _get_layout
    layout = LayoutParser(self.app.imgbase.layout()).parse()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 154, in layout
    return self.naming.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout
    tree = self.tree(lvs)
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 205, in tree
    names = datasource()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 99, in list_our_lv_names
    lvs = LVM.list_lvs(filtr=filtr)
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 63, in list_lvs
    lvs = [cls.LV.from_lvm_name(n) for n in cls._list_lv_full_names(filtr)]
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 56, in _list_lv_full_names
    raw = LVM._lvs(cmd)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 374, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 453, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 368, in call
    stdout = call(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 147, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '-o', 'lv_full_name', '--select', 'lv_tags = imgbased:base || lv_tags = imgbased:layer']' returned non-zero exit status 5
```

Also, 'nodectl check' throws an exception:

```
# nodectl check
Status: FAILED
Bootloader ... OK
Layer boot entries ... OK
Valid boot entries ... OK
Mount points ... OK
Separate /var ... OK
Discard is used ... OK
Basic storage ... OK
Initialized VG ... OK
Initialized Thin Pool ... OK
Initialized LVs ... OK
Thin storage ... FAILED - It looks like the LVM layout is not correct. The reason could be an incorrect installation.
Checking from thin metadata ... ERROR
Exception in '<function <lambda> at 0x7f9eae6d70c8>': AssertionError()
vdsmd ... OK
```

At login, the motd shows the node status as degraded. Once the VM with the volume containing the clustered LVM is migrated somewhere else, all commands work as expected.
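The traceback above bottoms out in `subprocess.check_output()`, which is the crux of the failure: `check_output()` raises `CalledProcessError` on any non-zero exit status, so the LV names that `lvs` did print are never parsed. A minimal sketch of that behaviour, using only the command visible in the traceback:

```python
# Minimal reproduction sketch of the failure mode seen in the traceback:
# check_output() raises on the non-zero exit status from lvs (5 on the
# affected hosts), even though stdout contained the LV names imgbased wanted.
import subprocess

cmd = ["lvs", "--noheadings", "-o", "lv_full_name",
       "--select", "lv_tags = imgbased:base || lv_tags = imgbased:layer"]

try:
    out = subprocess.check_output(cmd, universal_newlines=True)
    print(out)
except subprocess.CalledProcessError as e:
    # The wanted lines are attached to the exception, but imgbased never
    # gets to parse them because the call raised.
    print("lvs exited with status", e.returncode)
    print("output that was never parsed:\n" + (e.output or ""))
```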
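To make the earlier 'nodectl init' description concrete, here is an illustrative sketch of the equivalent lvm2 operations, driven from Python the way imgbased shells out to the LVM binaries. This is not imgbased's actual code: the VG name, pool and root LV names, the size, and the NVR below are all assumed for the example.

```python
# Illustrative sketch only (not imgbased's real code): the conceptual LVM
# steps behind 'nodectl init' on a bare install. All names are hypothetical.
import subprocess

VG = "rhvh"                        # assumed volume group name
POOL = VG + "/pool00"              # assumed thin pool
ROOT = VG + "/root"                # assumed root LV
NVR = "rhvh-4.1-0.20180410.0"      # example NVR of the installed image

def lvm(*args):
    """Run one lvm2 command, echoing it first."""
    print("+ " + " ".join(args))
    subprocess.check_call(args)

# 1. Tag the existing volumes so imgbased can find them later.
lvm("lvchange", "--addtag", "imgbased:pool", POOL)
lvm("lvchange", "--addtag", "imgbased:root", ROOT)

# 2. Create a new thin LV named after the image NVR.
lvm("lvcreate", "--thin", "-V", "6G", "-n", NVR, POOL)

# 3. imgbased then copies the contents of the root LV into the new LV and
#    configures the bootloader (omitted here).
```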
In the presence of any clustered VG, the actual command works properly and the output looks like this:

```
# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
  Skipping clustered volume group vg_cluster
  Skipping clustered volume group vg_cluster
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1
```

But the return code is 5 instead of 0. Otherwise the output is the following:

```
# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0+1
```

Any chance to fix this before 4.2.5? Something simple like this:

```
# lvs --ignoreskippedcluster --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1
# echo $?
0
```

The following patch for /usr/lib/python2.7/site-packages/imgbased/lvm.py fixes the issue and won't cause problems if any clustered LVM volumes are used by the node itself:

```
53c53
< cmd = ["--noheadings", "-o", "lv_full_name"]
---
> cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name"]
55c55
< cmd = ["--noheadings", "-o", "lv_full_name", "--select", filtr]
---
> cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name", "--select", filtr]
108c108
< vgs = LVM._vgs(["--noheadings", "--select",
---
> vgs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "--select",
128c128
< return LVM._vgs(["--noheadings", "-ovg_tags",
---
> return LVM._vgs(["--noheadings", "--ignoreskippedcluster", "-ovg_tags",
143c143
< return LVM._lvs(["--noheadings", "-olv_path", self.lvm_name])
---
> return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_path", self.lvm_name])
147c147
< return LVM._lvs(["--noheadings", "-osize", "--units", "B",
---
> return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-osize", "--units", "B",
177c177
< lvs = LVM._vgs(["--noheadings", "@%s" % tag,
---
> lvs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "@%s" % tag,
204c204
< data = LVM._lvs(["--noheadings", "-ovg_name,lv_name", path])
---
> data = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-ovg_name,lv_name", path])
243c243
< pool_lv = LVM._lvs(["--noheadings", "-opool_lv",
---
> pool_lv = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-opool_lv",
258c258
< return LVM._lvs(["--noheadings", "-olv_tags",
---
> return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_tags",
267a268
> "--ignoreskippedcluster",
307c308
< args = ["--noheadings", "--nosuffix", "--units", "m",
---
> args = ["--noheadings", "--ignoreskippedcluster", "--nosuffix", "--units", "m",
```

Maybe it can be used just as a workaround and requires more attention, but it solves my problem for now.

----------------------------------------------------------------------

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2079