Bug 1859876 - imgbase check failed after register to engine
Summary: imgbase check failed after register to engine
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Virtualization Manager
Classification: Red Hat
Component: vdsm
Version: 4.4.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ovirt-4.4.2
Target Release: 4.4.2
Assignee: Amit Bawer
QA Contact: cshao
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-07-23 08:53 UTC by cshao
Modified: 2020-10-05 13:10 UTC
CC List: 25 users

Fixed In Version: 4.40.26
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-10-05 13:09:40 UTC
oVirt Team: Storage
Target Upstream Version:


Attachments
RHVH log (576.73 KB, application/gzip)
2020-07-23 08:53 UTC, cshao
new-vdsm-test (37.93 KB, text/plain)
2020-07-29 03:58 UTC, cshao


Links
System | ID | Priority | Status | Summary | Last Updated
Red Hat Knowledge Base (Solution) | 5428651 | None | None | None | 2020-09-23 16:36:30 UTC
Red Hat Product Errata | RHSA-2020:4172 | None | None | None | 2020-10-05 13:10:23 UTC
oVirt gerrit | 110512 | master | MERGED | tool: Use multipath force reload option (-r) for config changes | 2020-10-20 06:44:44 UTC
oVirt gerrit | 110583 | master | ABANDONED | tool: Use multipath force reload option (-r) for config changes | 2020-10-20 06:44:44 UTC
oVirt gerrit | 110757 | master | MERGED | tools: Start and reconfigure multipathd | 2020-10-20 06:44:44 UTC

Description cshao 2020-07-23 08:53:18 UTC
Created attachment 1702184
RHVH log

Description of problem:
imgbase check fails after registering the host to the engine.

# imgbase check
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/usr/lib/python3.6/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python3.6/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 183, in post_argparse
    run_check(app)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 220, in run_check
    status = Health(app).status()
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 353, in status
    status.results.append(group().run())
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 380, in check_thin
    pool = self.app.imgbase._thinpool()
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 125, in _thinpool
    return LVM.Thinpool.from_tag(self.thinpool_tag)
  File "/usr/lib/python3.6/site-packages/imgbased/lvm.py", line 231, in from_tag



# imgbase w 
2020-07-23 08:41:22,457 [ERROR] (MainThread) The root volume does not look like an image
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/usr/lib/python3.6/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python3.6/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 164, in post_argparse
    msg = "You are on %s" % app.imgbase.current_layer()
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 409, in current_layer
    return self.image_from_path(lv)
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 166, in image_from_path
    name = LVM.LV.from_path(path).lv_name
  File "/usr/lib/python3.6/site-packages/imgbased/lvm.py", line 251, in from_path
    "-ovg_name,lv_name", path])
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 330, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 421, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 324, in call
    stdout = command.call(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/command.py", line 14, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '--ignoreskippedcluster', '-ovg_name,lv_name', '/dev/mapper/rhvh-rhvh--4.4.1.1--0.20200722.0+1']' returned non-zero exit status 5.


# lvs
# pvs
# vgs
# 



Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.4.1-20200722.0.el8_2
imgbased-1.2.10-1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH-4.4-20200722.1-RHVH-x86_64-dvd1.iso
2. Register RHVH to the engine.
3. Run "imgbase check"

Actual results:
imgbase check fails after registering to the engine.

Expected results:
imgbase check passes after registering to the engine.

Additional info:
The issue does not occur before registering to the engine.

Comment 2 Qin Yuan 2020-07-23 14:26:49 UTC
I checked the host that encountered the issue: the filter in /etc/lvm/lvm.conf is filter = ["a|^/dev/mapper/mpatha2$|", "r|.*|"], but there is no /dev/mapper/mpatha2:

[root@hp-dl385g8-03 ~]# ls -al /dev/mapper/
total 0
drwxr-xr-x.  2 root root     380 Jul 23 08:33 .
drwxr-xr-x. 23 root root    3720 Jul 23 09:10 ..
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5 -> ../dm-0
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5p1 -> ../dm-1
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5p2 -> ../dm-2
lrwxrwxrwx.  1 root root       8 Jul 23 08:33 3600508b1001ccb4a1de53313beabbd82 -> ../dm-15
crw-------.  1 root root 10, 236 Jul 23 08:19 control
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-home -> ../dm-14
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00 -> ../dm-8
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00_tdata -> ../dm-4
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00_tmeta -> ../dm-3
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00-tpool -> ../dm-5
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-rhvh--4.4.1.1--0.20200722.0+1 -> ../dm-6
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-swap -> ../dm-7
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-tmp -> ../dm-13
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var -> ../dm-12
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var_crash -> ../dm-11
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var_log -> ../dm-10
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-var_log_audit -> ../dm-9

After I changed the filter to filter = ["a|^/dev/mapper/36005076300810b3e00000000000002a5p2$|", "a|^/dev/mapper/36005076300810b3e00000000000002a5p1$|", "r|.*|"], the lvs command and imgbase check work again.
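The diagnosis above (a filter whose only accept entry names a device that no longer exists) can be checked mechanically. A minimal sketch, assuming "|" is the delimiter as in the filters quoted here, and that Python regex semantics match LVM's closely enough for simple anchored patterns like these. The function names are hypothetical and not part of lvm2, vdsm or imgbased:

```python
import re

# Accept entries in an lvm.conf filter look like "a|^/dev/mapper/mpatha2$|".
# Assumption: "|" is the delimiter, as in the filters shown in this bug.
ACCEPT_RE = re.compile(r'"a\|(.*?)\|"')


def accept_patterns(filter_line):
    """Return the regexes of all accept ("a|...|") entries in a filter line."""
    return ACCEPT_RE.findall(filter_line)


def stale_accepts(filter_line, device_paths):
    """Return accept patterns that match none of the existing device paths.

    A non-empty result means the filter only accepts devices that are gone,
    so LVM (and therefore imgbase) can no longer see the root PV.
    """
    return [
        pattern
        for pattern in accept_patterns(filter_line)
        if not any(re.search(pattern, path) for path in device_paths)
    ]
```

On a live host, `device_paths` could be built from `os.listdir("/dev/mapper")` with the "/dev/mapper/" prefix added. For the original filter this reports "^/dev/mapper/mpatha2$" as stale; for Qin's corrected filter it reports nothing.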


In /var/log/messages:

Jul 23 08:18:48 hp-dl385g8-03 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Jul 23 08:18:48 hp-dl385g8-03 systemd[1]: Started Device-Mapper Multipath Device Controller.
Jul 23 08:18:49 hp-dl385g8-03 multipathd[742]: mpatha: load table [0 629145600 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 8:16 1]
Jul 23 08:18:49 hp-dl385g8-03 multipathd[742]: sdb [8:16]: path added to devmap mpatha
Jul 23 08:18:57 hp-dl385g8-03 systemd[1]: Stopping Device-Mapper Multipath Device Controller...
Jul 23 08:18:57 hp-dl385g8-03 systemd[1]: Stopped Device-Mapper Multipath Device Controller.
...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting oVirt ImageIO Daemon...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Virtual Desktop Server Manager network IP+link restoration...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Shared Storage Lease Manager...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Watchdog Multiplexing Daemon...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Reached target System Time Synchronized.
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 3600508b1001ccb4a1de53313beabbd82: load table [0 1172058032 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:0 1]
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 36005076300810b3e00000000000002a5: rename mpatha to 36005076300810b3e00000000000002a5
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 36005076300810b3e00000000000002a5: load table [0 629145600 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 service-time 0 1 1 8:32 1]
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Started Device-Mapper Multipath Device Controller.

As you can see, multipathd was started again during registration to the engine, and mpatha was renamed to 36005076300810b3e00000000000002a5.
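Because the rename is logged, the stale filter entry can in principle be translated rather than rewritten by hand. A hedged sketch that extracts "rename X to Y" messages and rewrites a /dev/mapper path, including partitions. The log regex, the partition-separator rule and all names here are assumptions based only on the log lines and device names quoted above:

```python
import re

# multipathd logs lines such as
# "multipathd[26944]: 3600...02a5: rename mpatha to 3600...02a5"
RENAME_RE = re.compile(r"multipathd\[\d+\]: \S+: rename (\S+) to (\S+)")
PREFIX = "/dev/mapper/"


def renames_from_log(lines):
    """Map old multipath map names to new ones from multipathd log lines."""
    renames = {}
    for line in lines:
        match = RENAME_RE.search(line)
        if match:
            renames[match.group(1)] = match.group(2)
    return renames


def partition_name(map_name, part):
    # Assumption: a "p" separator is used only when the map name ends in a
    # digit, matching "mpatha2" vs "36005...02a5p2" observed in this bug.
    sep = "p" if map_name[-1].isdigit() else ""
    return map_name + sep + part


def translate_path(path, renames):
    """Rewrite a /dev/mapper path (whole map or partition) using renames."""
    if not path.startswith(PREFIX):
        return path
    name = path[len(PREFIX):]
    for old, new in renames.items():
        if name == old:
            return PREFIX + new
        rest = name[len(old):] if name.startswith(old) else ""
        if old[-1].isdigit() and rest.startswith("p"):
            rest = rest[1:]
        if rest.isdigit():
            return PREFIX + partition_name(new, rest)
    return path
```

Applied to the original filter, /dev/mapper/mpatha2 maps to /dev/mapper/36005076300810b3e00000000000002a5p2, which is the p2 entry Qin added by hand. A translation like this only covers names already in the filter; it would not add the p1 entry.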

Comment 4 Sandro Bonazzola 2020-07-27 07:25:14 UTC
Not blocking 4.4.1 on this because this seems to be reproducible only on the host used for the test.

Comment 6 cshao 2020-07-27 14:29:26 UTC
This issue also reproduces on other multipath machines. The bug seems to affect only multipath machines. There is no functional impact; creating VMs still succeeds.

Comment 9 Amit Bawer 2020-07-28 11:43:04 UTC
Hi Ben

We are considering changes to the multipathd configuration related to the
device naming mismatch shown here.

Currently our multipathd configurator replaces "/etc/multipath.conf" with
the vdsm one [1].

Then, assuming multipathd is already running on the system, it flushes
all unused multipath maps with

multipath -F

and reloads:
systemctl reload multipathd

We would like to switch to restart and then flush:

systemctl restart multipathd
multipath -F

The purpose is to switch from /dev/mapper/mpath{X} naming to /dev/mapper/{WWN}
and to prevent mix-ups in the LVM filter setup that comes next.

Will that work?


[1] https://github.com/oVirt/vdsm/blob/09b18879aee8c2faa3e4a4d29ec966848c750915/lib/vdsm/tool/configurators/multipath.py#L77

Comment 10 Nir Soffer 2020-07-28 12:00:17 UTC
Ben, adding more context for comment 9.

Before we configure this host, the root file system is using:
/dev/mapper/mpatha2

(based on comment 0: the lvm filter specifies /dev/mapper/mpatha2, which
means /dev/mapper/mpatha2 was mounted when we created the lvm filter)

I don't know why the host is using a multipath device. This may be a local
device that should be blacklisted, or maybe this host is booting from
SAN.

This is probably caused by the default multipath configuration, which uses
user_friendly_names = yes. We disable this since it cannot work with
RHV shared storage.

So after we configure the host, we expect all /dev/mapper/mpath{X} to be
replaced with /dev/mapper/{WWN}. Does it require a reboot of the host, or
is restarting multipathd good enough?
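With user_friendly_names enabled, multipath records which alias belongs to which WWID in its bindings file (by default /etc/multipath/bindings, one "alias wwid" pair per line). Reading it shows which /dev/mapper/{WWN} name each mpath{X} device should end up with once friendly names are disabled. A minimal sketch, assuming that file format and no explicit `alias` entries in multipath.conf; `parse_bindings` is a hypothetical helper, and the sample binding used in the example is illustrative, chosen to match the rename logged in comment 2:

```python
def parse_bindings(text):
    """Map friendly alias -> WWID from the contents of a bindings file.

    Lines starting with "#" are comments; other non-empty lines are
    expected to hold "alias wwid".
    """
    aliases = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0].strip()
        if not line:
            continue
        fields = line.split()
        if len(fields) >= 2:
            aliases[fields[0]] = fields[1]
    return aliases
```

For example, a bindings file containing "mpatha 36005076300810b3e00000000000002a5" predicts that /dev/mapper/mpatha becomes /dev/mapper/36005076300810b3e00000000000002a5 after friendly names are turned off.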

Comment 11 Nir Soffer 2020-07-28 12:14:09 UTC
cshao, can you explain what you mean by "multipath machine"?

We need to understand why this host is using a multipath device
for the root file system. This is valid only if the host is booting
from SAN.

If /dev/mapper/mpatha is a local disk, it must be blacklisted
by adding a multipath drop-in configuration.

The way to configure it is:

$ udevadm info /sys/block/sda | egrep "ID_SERIAL=|WWN="
E: ID_SERIAL=Generic-_SD_MMC_20120501030900000-0:0

In this case this blacklist should work:

$ cat /etc/multipath/conf.d/99-local.conf
blacklist {
    wwid "Generic-_SD_MMC_20120501030900000-0:0"
}

In your case, based on the info in comment 3, I think this would work:

$ cat /etc/multipath/conf.d/99-local.conf
blacklist {
    wwid "3600508b1001ccb4a1de53313beabbd82"
}

This must be done before installing or upgrading RHV.

Does it resolve the issue?
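The two manual steps above (read ID_SERIAL from udevadm, then write a blacklist drop-in) could be scripted. A sketch only, following Nir's example of using the ID_SERIAL value as the wwid; the helper names are hypothetical:

```python
def udev_properties(udevadm_output):
    """Parse "E: KEY=VALUE" lines from `udevadm info` output into a dict."""
    props = {}
    for line in udevadm_output.splitlines():
        if line.startswith("E: ") and "=" in line:
            # Split on the first "=" only; values may contain ":" or "=".
            key, _, value = line[3:].partition("=")
            props[key] = value
    return props


def blacklist_dropin(wwids):
    """Render a multipath drop-in that blacklists the given WWIDs."""
    entries = "".join('    wwid "%s"\n' % wwid for wwid in wwids)
    return "blacklist {\n" + entries + "}\n"
```

The rendered text would be written to a file such as /etc/multipath/conf.d/99-local.conf, and, as noted above, this must happen before installing or upgrading RHV.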

Comment 12 Ben Marzinski 2020-07-28 17:28:08 UTC
(In reply to Nir Soffer from comment #10)
> Ben, adding more context for comment 9.
> 
> Before we configure this host, the root file system is using:
> /dev/mapper/mpatha2
> 
> (based on comment 0: the lvm filter specifies /dev/mapper/mpatha2, which
> means /dev/mapper/mpatha2 was mounted when we created the lvm filter)
>
> I don't know why the host is using a multipath device. This may be a local
> device that should be blacklisted, or maybe this host is booting from
> SAN.
>
> This is probably caused by the default multipath configuration, which uses
> user_friendly_names = yes. We disable this since it cannot work with
> RHV shared storage.
>
> So after we configure the host, we expect all /dev/mapper/mpath{X} to be
> replaced with /dev/mapper/{WWN}. Does it require a reboot of the host, or
> is restarting multipathd good enough?

You shouldn't even need to do that.

# systemctl reload multipathd.service

should update the configuration of the running multipathd instance, and rename the devices, even if they are in use.

Comment 13 cshao 2020-07-29 03:55:59 UTC
(In reply to Nir Soffer from comment #11)
> cshao, can you explain what you mean by "multipath machine"?
>
> We need to understand why this host is using a multipath device
> for the root file system. This is valid only if the host is booting
> from SAN.
>
Hi,
I mean the multipath device is on the SAN, and the host boots from SAN.


> If /dev/mapper/mpatha is a local disk, it must be blacklisted
> by adding a multipath drop-in configuration.

The machine where this reproduces has 2 disks (one is an FC SAN LUN, the other is a local disk). I installed RHVH on the FC LUN and did not use the local disk, so I think /dev/mapper/mpatha is not a local disk.

> The way to configure it is:
> 
> $ udevadm info /sys/block/sda | egrep "ID_SERIAL=|WWN="
> E: ID_SERIAL=Generic-_SD_MMC_20120501030900000-0:0
> 
> In this case this blacklist should work:
> 
> $ cat /etc/multipath/conf.d/99-local.conf
> blacklist {
>     wwid "Generic-_SD_MMC_20120501030900000-0:0"
> }
> 
> In your case, based on the info in comment 3, I think this would work:
> 
> $ cat /etc/multipath/conf.d/99-local.conf
> blacklist {
>     wwid "3600508b1001ccb4a1de53313beabbd82"
> }
> 
> This must be done before installing or upgrading RHV.
> 
> Does it resolve the issue?

Comment 15 cshao 2020-07-29 03:58:22 UTC
Created attachment 1702744
new-vdsm-test

Comment 17 Amit Bawer 2020-07-29 18:29:23 UTC
(In reply to Ben Marzinski from comment #12)
> (In reply to Nir Soffer from comment #10)
> > Ben, adding more context for comment 9.
> > 
> > Before we configure this host, the root file system is using:
> > /dev/mapper/mpatha2
> > 
> > (based on comment 0: the lvm filter specifies /dev/mapper/mpatha2, which
> > means /dev/mapper/mpatha2 was mounted when we created the lvm filter)
> >
> > I don't know why the host is using a multipath device. This may be a local
> > device that should be blacklisted, or maybe this host is booting from
> > SAN.
> >
> > This is probably caused by the default multipath configuration, which uses
> > user_friendly_names = yes. We disable this since it cannot work with
> > RHV shared storage.
> >
> > So after we configure the host, we expect all /dev/mapper/mpath{X} to be
> > replaced with /dev/mapper/{WWN}. Does it require a reboot of the host, or
> > is restarting multipathd good enough?
> 
> You shouldn't even need to do that.
> 
> # systemctl reload multipathd.service
> 
> should update the configuration of the running multipathd instance, and
> rename the devices, even if they are in use.

If we restart multipathd instead of
reloading it after config changes,
is flushing it (multipath -F) still required?

Comment 18 Ben Marzinski 2020-07-30 17:06:26 UTC
(In reply to Amit Bawer from comment #17)

> If we restart multipathd instead of
> reloading it after config changes,
> is flushing it (multipath -F) still required?

Actually, this is an ugly corner case in multipathd that needs fixing. When you start multipathd, if a device is supposed to have a new name, it will get renamed. However, if its name changed and some other part of its configuration also changed, only the name change will take effect. So yes, simply running "service multipathd restart" will rename a device, but it won't immediately pick up other configuration changes if the device got renamed. You could remove the devices before restarting multipathd if you know that they won't be in use. Or you could simply reload them after restarting multipathd, with either

multipath -r

or

service multipathd reload

This issue only happens when starting multipathd. At all other times, when a device is updated, both name and configuration changes will be applied.
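The sequence Ben describes (restart so the maps get renamed, then force a reload so the other configuration changes also apply) might look like the sketch below. It encodes the advice in this comment only and is not the code merged in the linked gerrit changes; `reconfigure_multipathd` is a hypothetical name, and `run` is injectable so the order can be checked without touching a host:

```python
import subprocess

# Order suggested in comment 18: restarting multipathd renames the maps,
# then "multipath -r" forces a reload so the remaining configuration
# changes are applied too.
RECONFIGURE_COMMANDS = (
    ("systemctl", "restart", "multipathd"),
    ("multipath", "-r"),
)


def reconfigure_multipathd(run=subprocess.check_call):
    """Run each command in order; `run` raises on failure by default."""
    for command in RECONFIGURE_COMMANDS:
        run(list(command))
```

Passing a recorder such as `calls.append` as `run` gives a dry run; with the default `subprocess.check_call`, a failing command raises `CalledProcessError` and stops the sequence.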

Comment 27 Amit Bawer 2020-08-06 16:58:41 UTC
The fix was reverted due to OST issues; a re-fix is pending advice regarding the multipath configuration.
It will more likely go into 4.4.3 if 4.4.2 is closing this week.

Comment 28 Amit Bawer 2020-08-09 15:30:43 UTC
Ben,
We need your advice about "multipath -r" usage; I also sent details by mail with the same subject.
Thanks

Comment 47 cshao 2020-09-15 10:37:24 UTC
Test version:
redhat-virtualization-host-4.4.2-20200915.0.el8_2
Engine: 4.4.2-6
vdsm-4.40.26.3-1.el8ev.x86_64

Test steps:
1. Install redhat-virtualization-host-4.4.2-20200915.0.el8_2 on a multipath machine.
2. Register to the engine.
3. Run imgbase check, then check the multipath setup (multipathd status, pvs).

Test result:
imgbase check - pass

# imgbase check
Status: OK
Bootloader ... OK
  Layer boot entries ... OK
  Valid boot entries ... OK
Mount points ... OK
  Separate /var ... OK
  Discard is used ... OK
Basic storage ... OK
  Initialized VG ... OK
  Initialized Thin Pool ... OK
  Initialized LVs ... OK
Thin storage ... OK
  Checking available space in thinpool ... OK
  Checking thinpool auto-extend ... OK

# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:03:57 UTC; 31min ago
 Main PID: 29424 (multipathd)
   Status: "up"
    Tasks: 7
   Memory: 12.5M
   CGroup: /system.slice/multipathd.service
           └─29424 /sbin/multipathd -d -s

Sep 15 10:03:56 hp-dl385g8-03.lab.eng.pek2.redhat.com systemd[1]: Starting Device-Mapper Multipath Device Controller...


# pvs
  PV                                              VG   Fmt  Attr PSize    PFree 
  /dev/mapper/36005076300810b3e00000000000002a5p2 rhvh lvm2 a--  <299.00g 58.01g
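The verification above can be reduced to one check: after registration, no PV should still sit on an mpath{X} friendly name. A minimal sketch, assuming output from `pvs --noheadings -o pv_name` and the default mpath{letters}{digits} naming (custom aliases would not be caught); `friendly_name_pvs` is a hypothetical helper:

```python
import re

# Default friendly names look like "mpatha", "mpathb2"; WWID names do not.
FRIENDLY_RE = re.compile(r"^/dev/mapper/mpath[a-z]+\d*$")


def friendly_name_pvs(pvs_output):
    """Return PVs still on mpath{X} names from `pvs --noheadings -o pv_name`."""
    names = (line.strip() for line in pvs_output.splitlines())
    return [name for name in names if name and FRIENDLY_RE.match(name)]
```

On the fixed host the only PV is /dev/mapper/36005076300810b3e00000000000002a5p2, so the result is an empty list; on the host from comment 0 it would have listed /dev/mapper/mpatha2.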


So the bug is fixed; changing the bug status to VERIFIED.

Comment 49 errata-xmlrpc 2020-10-05 13:09:40 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Important: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:4172

