Bug 1568414

Summary: missing lvm filter causing "nodectl check" to fail to verify thin-provisioned local LV metadata
Product: Red Hat Enterprise Virtualization Manager
Reporter: Marian Jankular <mjankula>
Component: imgbased
Assignee: Ryan Barry <rbarry>
Status: CLOSED ERRATA
QA Contact: Yaning Wang <yaniwang>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.1.6
CC: cshao, dfediuck, eheftman, frank.toth, fsun, huzhao, jiaczhan, mgoldboi, mjankula, qiyuan, rbarry, weiwang, yaniwang, ycui, yzhao
Target Milestone: ovirt-4.2.4
Keywords: ZStream
Target Release: ---
Flags: lsvaty: testing_plan_complete-
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version: imgbased-1.0.18
Doc Type: Bug Fix
Doc Text:
Previously, if systems were configured to skip Logical Volume Manager (LVM) clusters, imgbased saw output that was unrelated to the Logical Volumes being queried. As a result, imgbased failed to parse the output, causing Red Hat Virtualization Host updates to fail. In this release, imgbased ignores output from skipped clusters, enabling imgbased LVM commands to return successfully.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2018-06-27 10:04:47 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: Node
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Description Marian Jankular 2018-04-17 13:03:36 UTC
Description of problem:
missing lvm filter causing "nodectl check" to fail to verify thin-provisioned local LV metadata

Version-Release number of selected component (if applicable):
imgbased-0.9.47-0.1.el7ev.noarch
ovirt-node-ng-nodectl-4.1.5-0.20170810.0.el7.noarch
redhat-release-virtualization-host-4.1-6.0.el7.x86_64
redhat-release-virtualization-host-content-4.1-6.0.el7.x86_64
redhat-virtualization-host-image-update-placeholder-4.1-6.0.el7.noarch

How reproducible:
Have not tried to reproduce it yet.

Steps to Reproduce:
1. Install RHVH 4.1.6 and add it to the Manager
2. Reboot the host so that guest LVs get activated
3. Run "nodectl check"

Actual results:
Status: FAILED
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... FAILED - It looks like the LVM layout is not correct. The reason could be an incorrect installation.
  Checking from thin metadata ... ERROR
    Exception in '<function <lambda> at 0x136e398>': AssertionError()


Expected results:
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK
vdsmd ... OK

Additional info:

Comment 1 Ryan Barry 2018-04-17 13:59:39 UTC
imgbased always filters LVs. See https://gerrit.ovirt.org/#/c/74720/

`nodectl check` is essentially a wrapper around `imgbase check`, except it also checks the service status of vdsmd.

I do not have a reproducer for this bug, and have never encountered it. I suspect it may have something to do with the clustered LV.

It is not reasonable to specify a global LVM filter for all RHVH installations, and we do not currently modify any files shipped by the platform. The question is why LVM is returning an error code. Duplicate LV names? Duplicate UUIDs?

Can the customer remove the filter and try:

vgs -vvvv --noheadings --select lv_tags=imgbased:pool -o lv_full_name; echo $?

There will be a lot of output.

----------------------------------------------------------------------

Unrelated to this (but related to comments in the case), 'nodectl init' should only be run at install time. It is not triggered during upgrades. Bad things (TM) will happen if it is executed on a running/configured system.

Essentially, 'nodectl init' looks at a "bare" install (LVM thinpool, no snapshots for RHVH) and creates the required layout/tagging. In this case, it would:

1. Tag the volumes with imgbased:pool, imgbased:root, and imgbased:lv
2. Create a new LV based on the NVR of the image
3. Copy the contents of imgbased:root into that LV, and configure the bootloader
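
Purely as an illustration of the kind of LVM operations this maps to, here is a minimal Python sketch; this is not imgbased's actual code, and the VG/LV names and the tag-to-volume mapping are hypothetical placeholders:

import subprocess

def lvm(*args):
    # Run one LVM command; raises CalledProcessError if it exits non-zero.
    subprocess.check_call(list(args))

# Hypothetical names: "rhvh" VG, "pool00" thinpool, "root" root LV.
lvm("lvchange", "--addtag", "imgbased:pool", "rhvh/pool00")
lvm("lvchange", "--addtag", "imgbased:root", "rhvh/root")
lvm("lvchange", "--addtag", "imgbased:lv", "rhvh/root")

# New thin LV named after the image NVR; the real tool then copies the
# contents of the imgbased:root LV into it and adds a bootloader entry.
lvm("lvcreate", "--snapshot", "--name", "rhvh-4.1-0.20171002.0", "rhvh/root")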

On upgrades, this is handled by `imgbase update --format liveimg ...`, which can be seen in the RPM %post scripts for new images.

Comment 2 jiachen zhang 2018-04-26 06:54:06 UTC
I tested this bug with version RHVH-4.1-20171002.0-RHVH-x86_64-dvd1.iso.
Steps:
1. Install RHVH via Anaconda on an iSCSI machine
2. Add RHVH to RHV-M and create iSCSI storage
3. Create a VM with iSCSI storage successfully
4. Reboot the host

However, when I run the command "nodectl check", the status is OK.
I cannot reproduce the bug.
Are there any other steps for reproducing this bug?

Comment 3 Frank Toth 2018-04-26 07:27:19 UTC
Hi,

I have this issue on the latest image (redhat-virtualization-host-image-update-4.1-20180410.1.el7_5.noarch) which uses ovirt-node-ng-nodectl-4.1.5-0.20170810.0.el7.noarch

The issue happens when I have a VM running on the node and the VM has a disk which contains clustered LVM. The 'nodectl info' command throws the following error:

# nodectl info
Traceback (most recent call last):
  File "/usr/lib64/python2.7/runpy.py", line 162, in _run_module_as_main
    "__main__", fname, loader, pkg_name)
  File "/usr/lib64/python2.7/runpy.py", line 72, in _run_code
    exec code in run_globals
  File "/usr/lib/python2.7/site-packages/nodectl/__main__.py", line 42, in <module>
    CliApplication()
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 200, in CliApplication
    return cmdmap.command(args)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 118, in command
    return self.commands[command](**kwargs)
  File "/usr/lib/python2.7/site-packages/nodectl/__init__.py", line 76, in info
    Info(self.imgbased, self.machine).write()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 45, in __init__
    self._fetch_information()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 49, in _fetch_information
    self._get_layout()
  File "/usr/lib/python2.7/site-packages/nodectl/info.py", line 66, in _get_layout
    layout = LayoutParser(self.app.imgbase.layout()).parse()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 154, in layout
    return self.naming.layout()
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 109, in layout
    tree = self.tree(lvs)
  File "/usr/lib/python2.7/site-packages/imgbased/naming.py", line 205, in tree
    names = datasource()
  File "/usr/lib/python2.7/site-packages/imgbased/imgbase.py", line 99, in list_our_lv_names
    lvs = LVM.list_lvs(filtr=filtr)
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 63, in list_lvs
    lvs = [cls.LV.from_lvm_name(n) for n in cls._list_lv_full_names(filtr)]
  File "/usr/lib/python2.7/site-packages/imgbased/lvm.py", line 56, in _list_lv_full_names
    raw = LVM._lvs(cmd)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 374, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 453, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 368, in call
    stdout = call(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/imgbased/utils.py", line 147, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python2.7/subprocess.py", line 575, in check_output
    raise CalledProcessError(retcode, cmd, output=output)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '-o', 'lv_full_name', '--select', 'lv_tags = imgbased:base || lv_tags = imgbased:layer']' returned non-zero exit status 5

Also 'nodectl check' throws an exception:

# nodectl check
Status: FAILED
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... FAILED - It looks like the LVM layout is not correct. The reason could be an incorrect installation.
  Checking from thin metadata ... ERROR
    Exception in '<function <lambda> at 0x7f9eae6d70c8>': AssertionError()
vdsmd ... OK

At login, the motd shows that the node status is degraded.

Once the VM whose volume contains the clustered LVM is migrated somewhere else, all commands work as expected.

In the presence of any clustered VG, the actual command still works properly and the output looks like:

# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
  Skipping clustered volume group vg_cluster
  Skipping clustered volume group vg_cluster
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0
  rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0
  rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1

But the return code is 5 instead of 0.

Otherwise, the output is the following:

# lvs --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0
  rhvh_rhevh0102/rhvh-4.1-0.20171002.0+1
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0
  rhvh_rhevh0102/rhvh-4.1-0.20180410.0+1

Any chance to fix this before 4.2.5?

Something simple like this:

# lvs --ignoreskippedcluster --noheadings -o lv_full_name --select 'lv_tags = imgbased:base || lv_tags = imgbased:layer'

Couldn't find device with uuid YBcwSi-NrfX-xSfP-F82r-jA6v-JMGV-pXyDe7.
rhvh_rhevh0109/rhvh-4.1-0.20171002.0
rhvh_rhevh0109/rhvh-4.1-0.20171002.0+1
rhvh_rhevh0109/rhvh-4.1-0.20180410.0
rhvh_rhevh0109/rhvh-4.1-0.20180410.0+1

# echo $?
0
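
For clarity, the failure mode in the tracebacks above comes down to imgbased shelling out to lvs via subprocess.check_output(), which raises for any non-zero exit status even though the wanted LVs are still printed. A minimal, standalone sketch of that behaviour (not imgbased's actual code):

import subprocess

def list_imgbased_lvs():
    # Same query imgbased runs; lvs exits with status 5 when it skips
    # clustered VGs, even though the matching LVs are printed on stdout.
    cmd = ["lvs", "--noheadings", "-o", "lv_full_name",
           "--select", "lv_tags = imgbased:base || lv_tags = imgbased:layer"]
    out = subprocess.check_output(cmd)
    return [line.strip() for line in out.splitlines() if line.strip()]

try:
    print(list_imgbased_lvs())
except subprocess.CalledProcessError as e:
    # This is what bubbles up through 'nodectl info' and 'nodectl check'.
    print("lvs exited with status %d" % e.returncode)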

Comment 4 Frank Toth 2018-04-26 08:31:31 UTC
The following patch for /usr/lib/python2.7/site-packages/imgbased/lvm.py fixes the issue and won't cause problems if there is any clustered LVM used by the node itself:

53c53
<         cmd = ["--noheadings", "-o", "lv_full_name"]
---
>         cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name"]
55c55
<             cmd = ["--noheadings", "-o", "lv_full_name", "--select", filtr]
---
>             cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name", "--select", filtr]
108c108
<             vgs = LVM._vgs(["--noheadings", "--select",
---
>             vgs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "--select",
128c128
<             return LVM._vgs(["--noheadings", "-ovg_tags",
---
>             return LVM._vgs(["--noheadings", "--ignoreskippedcluster", "-ovg_tags",
143c143
<             return LVM._lvs(["--noheadings", "-olv_path", self.lvm_name])
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_path", self.lvm_name])
147c147
<             return LVM._lvs(["--noheadings", "-osize", "--units", "B",
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-osize", "--units", "B",
177c177
<             lvs = LVM._vgs(["--noheadings", "@%s" % tag,
---
>             lvs = LVM._vgs(["--noheadings", "--ignoreskippedcluster", "@%s" % tag,
204c204
<             data = LVM._lvs(["--noheadings", "-ovg_name,lv_name", path])
---
>             data = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-ovg_name,lv_name", path])
243c243
<             pool_lv = LVM._lvs(["--noheadings", "-opool_lv",
---
>             pool_lv = LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-opool_lv",
258c258
<             return LVM._lvs(["--noheadings", "-olv_tags",
---
>             return LVM._lvs(["--noheadings", "--ignoreskippedcluster", "-olv_tags",
267a268
> 		   "--ignoreskippedcluster",
307c308
<             args = ["--noheadings", "--nosuffix", "--units", "m",
---
>             args = ["--noheadings", "--ignoreskippedcluster", "--nosuffix", "--units", "m",

Maybe it can be used just as a workaround and requires more attention, but it solves my problem for now.
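
For readability, here is roughly what the patched list helper in lvm.py ends up looking like; the surrounding code is reconstructed from the traceback and the diff above, so treat it as an approximation rather than the exact file contents:

import subprocess

class LVM(object):
    @classmethod
    def _lvs(cls, args):
        # imgbased shells out to the real lvs binary (stderr is discarded
        # in the real code, per utils.py in the traceback above).
        return subprocess.check_output(["lvs"] + args).strip()

    @classmethod
    def _list_lv_full_names(cls, filtr=None):
        # --ignoreskippedcluster stops lvs from exiting non-zero when it
        # skips clustered VGs, so check_output() no longer raises.
        cmd = ["--noheadings", "--ignoreskippedcluster", "-o", "lv_full_name"]
        if filtr:
            cmd = ["--noheadings", "--ignoreskippedcluster",
                   "-o", "lv_full_name", "--select", filtr]
        raw = cls._lvs(cmd)
        return [line.strip() for line in raw.splitlines() if line.strip()]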

Comment 13 errata-xmlrpc 2018-06-27 10:04:47 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2018:2079

Comment 14 Franta Kust 2019-05-16 13:05:08 UTC
BZ<2>Jira Resync

Comment 15 Daniel Gur 2019-08-28 13:12:37 UTC
sync2jira

Comment 16 Daniel Gur 2019-08-28 13:16:49 UTC
sync2jira