Bug 1568414 - missing lvm filter causing "nodectl check" to fail to verify thinprovisioned local lv metadata
Status: CLOSED ERRATA
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: imgbased
Version: 4.1.6
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ovirt-4.2.4
Target Release: ---
Assigned To: Ryan Barry
QA Contact: Yaning Wang
Keywords: ZStream
Depends On:
Blocks:
Reported: 2018-04-17 09:03 EDT by Marian Jankular
Modified: 2018-10-05 12:52 EDT
CC List: 16 users

See Also:
Fixed In Version: imgbased-1.0.18
Doc Type: Bug Fix
Doc Text:
Previously, if systems were configured to skip Logical Volume Manager (LVM) clusters, imgbased saw output that was unrelated to the logical volumes being queried. As a result, imgbased failed to parse the output, causing Red Hat Virtualization Host updates to fail. In this release, imgbased ignores output from skipped clusters, enabling imgbased LVM commands to return successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 06:04:47 EDT
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---




External Trackers
Tracker ID Priority Status Summary Last Updated
oVirt gerrit 91977 master MERGED lvm: ignore skipped clusters 2018-06-06 08:04 EDT
oVirt gerrit 92004 ovirt-4.2 MERGED lvm: ignore skipped clusters 2018-06-06 08:04 EDT
Red Hat Product Errata RHSA-2018:2079 None None None 2018-06-27 06:05 EDT

Description Marian Jankular 2018-04-17 09:03:36 EDT
Description of problem:
missing lvm filter causing "nodectl check" to fail to verify thinprovisioned local lv metadata

Version-Release number of selected component (if applicable):
imgbased-0.9.47-0.1.el7ev.noarch
ovirt-node-ng-nodectl-4.1.5-0.20170810.0.el7.noarch
redhat-release-virtualization-host-4.1-6.0.el7.x86_64
redhat-release-virtualization-host-content-4.1-6.0.el7.x86_64
redhat-virtualization-host-image-update-placeholder-4.1-6.0.el7.noarch

How reproducible:
Did not try to reproduce yet.

Steps to Reproduce:
1. Install RHVH 4.1.6 and add it to the Manager
2. Reboot the host so guest LVs get activated
3. Run "nodectl check"

Actual results:
Status: FAILED
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... FAILED - It looks like the LVM layout is not correct. The reason could be an incorrect installation.
  Checking from thin metadata ... ERROR
    Exception in '<function <lambda> at 0x136e398>': AssertionError()


Expected results:
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK

Additional info:
Comment 1 Ryan Barry 2018-04-17 09:59:39 EDT
imgbased always filters LVs. See https://gerrit.ovirt.org/#/c/74720/

`nodectl check` is essentially a wrapper around `imgbase check`, except it also checks the service status for vdsm.

I do not have a reproducer for this bug, and have never encountered it. I suspect it may have something to do with the clustered LV.

It is not reasonable to specify a global LVM filter for all RHVH installations, and we do not currently modify any files shipped by platform. The question is why LVM is returning an error code. Duplicate LV names? Duplicate UUIDs?

Can the customer remove the filter and try:

vgs -vvvv --noheadings --select lv_tags=imgbased:pool -o lv_full_name; echo $?

There will be a lot of output.

----------------------------------------------------------------------

Unrelated to this (but related to comments in the case), 'nodectl init' should only be run at install time. It is not triggered during upgrades. Bad things (TM) will happen if it is executed on a running/configured system.

Essentially, 'nodectl init' looks at a "bare" install (LVM thinpool, no snapshots for RHVH) and creates the required layout/tagging. In this case, it would:

Tag volumes with:
imgbased:pool
imgbased:root
imgbased:lv
# Create a new LV based on the NVR of the image
# copy the contents of imgbased:root into that LV, and configure the bootloader

On upgrades, this is handled by `imgbase update --format liveimg ...`, which can be seen in the RPM %post scripts for new images.
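
For reference, the failing call path boils down to something like the sketch below. This is not the actual imgbased code, just a minimal illustration: the tag filter mirrors the queries shown above, and the single-pool assertion is an assumption based on the AssertionError in the description.

import subprocess

def list_lvs_by_tag(select_filter):
    # Roughly what the imgbased LVM wrapper does: run lvs with a
    # --select filter on lv_tags and return one full LV name per line.
    out = subprocess.check_output(
        ["lvs", "--noheadings", "-o", "lv_full_name",
         "--select", select_filter])
    return [line.strip() for line in out.splitlines() if line.strip()]

# The thin storage check expects the tagged LVs to come back cleanly;
# a non-zero lvs exit status or unexpected output surfaces as the
# AssertionError / CalledProcessError reported in this bug.
pools = list_lvs_by_tag("lv_tags = imgbased:pool")
assert len(pools) == 1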
Comment 2 jiachen zhang 2018-04-26 02:54:06 EDT
I tested this bug with version RHVH-4.1-20171002.0-RHVH-x86_64-dvd1.iso.
Steps:
1. Install RHVH via Anaconda on an iSCSI machine
2. Add RHVH to RHEVM, create iSCSI storage
3. Create a VM with iSCSI storage successfully
4. Reboot the host.

However, when I run the command "nodectl check", the status is OK.
I cannot reproduce the bug.
Are there any other steps for reproducing this bug?
Comment 3 Frank Toth 2018-04-26 03:27:19 EDT
Hi,

I have this issue on the latest image (redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch) which uses ovirt-node-ng-nodectl-4.1.5-0.20170810.0.el7.noarch

The issue happens when I have a VM running on the node and the VM has a disk which contains clustered LVM. The 'nodectl info' command throws the following error:

# nodectl info
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/nodectl/__main__.py", line 42, in <module>
    CliApplication()
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 200, in CliApplication
    return cmdmap.command(args)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 118, in command
    return self.commands[command](**kwargs)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 76, in info
    Info(self.imgbased, self.machine).write()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 45, in __init__
    self._fetch_information()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 49, in _fetch_information
    self._get_layout()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 66, in _get_layout
    layout = LayoutParser(self.app.imgbase.layout()).parse()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 154, in layout
    return self.naming.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout
    tree = self.tree(lvs)
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 205, in tree
    names = datasource()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 99, in list_our_lv_names
    lvs = LVM.list_lvs(filtr=filtr)
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 63, in list_lvs
    lvs = [cls.LV.from_lvm_name(n) for n in cls._list_lv_full_names(filtr)]
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 56, in _list_lv_full_names
    raw = LVM._lvs(cmd)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 374, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 453, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 368, in call
    stdout = call(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 147, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '-o', 'lv_full_name', '--select', 'lv_tags = imgbased:base || lv_tags = imgbased:layer']' returned non-zero exit status 5

Also 'nodectl check' throws an exception:

# nodectl check
Status: FAILED
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... FAILED - It looks like the LVM layout is not correct. The reason could be an incorrect installation.
  Checking from thin metadata ... ERROR
    Exception in '<function <lambda> at 0x7f9eae6d70c8>': AssertionError()
vdsmd ... OK

At login, the MOTD shows the node status as degraded.

Once the VM with the volume containing the clustered LVM is migrated somewhere else, all commands work as expected.

In the presence of any clustered VG, the actual command still lists the expected LVs and the output looks like:

# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
  Skipping clustered volume group vg_cluster
  Skipping clustered volume group vg_cluster
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1

But the return code is 5 instead of 0

Otherwise the output is the following:

# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0+1

Any chance to fix this before 4.2.5?

Something simple like this:

# lvs --ignoreskippedcluster --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'

Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
rhvh_rhevh0109/rhvh-4.1-0.20171002.0
rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
rhvh_rhevh0109/rhvh-4.1-0.20180410.0
rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1

# echo $?
0
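
For context on why a return code of 5 turns into a traceback even though the LV names are printed: the output is collected with subprocess.check_output(), which raises CalledProcessError on any non-zero exit status regardless of what is on stdout. A standalone illustration, not the imgbased code itself:

import subprocess

cmd = ["lvs", "--noheadings", "-o", "lv_full_name",
       "--select", "lv_tags = imgbased:base || lv_tags = imgbased:layer"]
try:
    out = subprocess.check_output(cmd)
except subprocess.CalledProcessError as e:
    # With a skipped clustered VG present, lvs exits with status 5 even
    # though e.output already contains the LV names we wanted.
    print("lvs failed with status %d" % e.returncode)
    print(e.output)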
Comment 4 Frank Toth 2018-04-26 04:31:31 EDT
The following patch for /usr/lib/python2.7/site-packages/imgbased/lvm.py fixes the issue and won't cause problems if there is any clustered LVM used by the node itself:

53c53
<         cmd = ["--noheadings", "-o", "lv_full_name"]
---
>         cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name"]
55c55
<             cmd = ["--noheadings", "-o", "lv_full_name", "--select", filtr]
---
>             cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name", "--select", filtr]
108c108
<             vgs = LVM._vgs(["--noheadings", "--select",
---
>             vgs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "--select",
128c128
<             return LVM._vgs(["--noheadings", "-ovg_tags",
---
>             return LVM._vgs(["--noheadings", "--ignoreskippedcluster", "-ovg_tags",
143c143
<             return LVM._lvs(["--noheadings", "-olv_path", self.lvm_name])
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_path", self.lvm_name])
147c147
<             return LVM._lvs(["--noheadings", "-osize", "--units", "B",
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-osize", "--units", "B",
177c177
<             lvs = LVM._vgs(["--noheadings", "@%s" % tag,
---
>             lvs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "@%s" % tag,
204c204
<             data = LVM._lvs(["--noheadings", "-ovg_name,lv_name", path])
---
>             data = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-ovg_name,lv_name", path])
243c243
<             pool_lv = LVM._lvs(["--noheadings", "-opool_lv",
---
>             pool_lv = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-opool_lv",
258c258
<             return LVM._lvs(["--noheadings", "-olv_tags",
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_tags",
267a268
> 		   "--ignoreskippedcluster",
307c308
<             args = ["--noheadings", "--nosuffix", "--units", "m",
---
>             args = ["--noheadings", "--ignoreskippedcluster", "--nosuffix", "--units", "m",

Maybe it can only be used as a workaround and requires more attention, but it solves my problem for now.
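
Rather than patching every call site, the same flag could be added once in a shared LVM command wrapper. A hypothetical sketch of that approach (the function name is invented for illustration; the shipped fix in imgbased-1.0.18 may be implemented differently, e.g. by filtering the skipped-cluster messages out of the output instead):

import subprocess

def lvm_report(binary, args):
    # Prepend --ignoreskippedcluster so clustered VGs on guest disks are
    # skipped silently instead of making the exit status non-zero.
    cmd = [binary, "--ignoreskippedcluster"] + list(args)
    return subprocess.check_output(cmd).strip()

# For example, the query from comment 3:
lvm_report("lvs", ["--noheadings", "-o", "lv_full_name",
                   "--select",
                   "lv_tags = imgbased:base || lv_tags = imgbased:layer"])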
Comment 13 errata-xmlrpc 2018-06-27 06:04:47 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2079
