Created attachment 1702184 [details]
RHVH log

Description of problem:
imgbase check failed after register to engine

# imgbase check
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/usr/lib/python3.6/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python3.6/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 183, in post_argparse
    run_check(app)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 220, in run_check
    status = Health(app).status()
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 353, in status
    status.results.append(group().run())
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 380, in check_thin
    pool = self.app.imgbase._thinpool()
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 125, in _thinpool
    return LVM.Thinpool.from_tag(self.thinpool_tag)
  File "/usr/lib/python3.6/site-packages/imgbased/lvm.py", line 231, in from_tag

# imgbase w
2020-07-23 08:41:22,457 [ERROR] (MainThread) The root volume does not look like an image
Traceback (most recent call last):
  File "/usr/lib64/python3.6/runpy.py", line 193, in _run_module_as_main
    "__main__", mod_spec)
  File "/usr/lib64/python3.6/runpy.py", line 85, in _run_code
    exec(code, run_globals)
  File "/usr/lib/python3.6/site-packages/imgbased/__main__.py", line 53, in <module>
    CliApplication()
  File "/usr/lib/python3.6/site-packages/imgbased/__init__.py", line 82, in CliApplication
    app.hooks.emit("post-arg-parse", args)
  File "/usr/lib/python3.6/site-packages/imgbased/hooks.py", line 120, in emit
    cb(self.context, *args)
  File "/usr/lib/python3.6/site-packages/imgbased/plugins/core.py", line 164, in post_argparse
    msg = "You are on %s" % app.imgbase.current_layer()
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 409, in current_layer
    return self.image_from_path(lv)
  File "/usr/lib/python3.6/site-packages/imgbased/imgbase.py", line 166, in image_from_path
    name = LVM.LV.from_path(path).lv_name
  File "/usr/lib/python3.6/site-packages/imgbased/lvm.py", line 251, in from_path
    "-ovg_name,lv_name", path])
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 330, in lvs
    return self.call(["lvs"] + args, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 421, in call
    return super(LvmBinary, self).call(*args, stderr=DEVNULL, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/utils.py", line 324, in call
    stdout = command.call(*args, **kwargs)
  File "/usr/lib/python3.6/site-packages/imgbased/command.py", line 14, in call
    return subprocess.check_output(*args, **kwargs).strip()
  File "/usr/lib64/python3.6/subprocess.py", line 356, in check_output
    **kwargs).stdout
  File "/usr/lib64/python3.6/subprocess.py", line 438, in run
    output=stdout, stderr=stderr)
subprocess.CalledProcessError: Command '['lvs', '--noheadings', '--ignoreskippedcluster', '-ovg_name,lv_name', '/dev/mapper/rhvh-rhvh--4.4.1.1--0.20200722.0+1']' returned non-zero exit status 5.
# lvs
# pvs
# vgs
(all three commands return no output)

Version-Release number of selected component (if applicable):
redhat-virtualization-host-4.4.1-20200722.0.el8_2
imgbased-1.2.10-1.el8ev.noarch

How reproducible:
100%

Steps to Reproduce:
1. Install RHVH-4.4-20200722.1-RHVH-x86_64-dvd1.iso
2. Register RHVH to engine.
3. Run "imgbase check"

Actual results:
imgbase check fails after registering to the engine.

Expected results:
imgbase check passes after registering to the engine.

Additional info:
No such issue before registering to the engine.
Checked on the host which encountered the issue. The filter in /etc/lvm/lvm.conf is:

filter = ["a|^/dev/mapper/mpatha2$|", "r|.*|"]

but there is no /dev/mapper/mpatha2:

[root@hp-dl385g8-03 ~]# ls -al /dev/mapper/
total 0
drwxr-xr-x.  2 root root     380 Jul 23 08:33 .
drwxr-xr-x. 23 root root    3720 Jul 23 09:10 ..
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5 -> ../dm-0
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5p1 -> ../dm-1
lrwxrwxrwx.  1 root root       7 Jul 23 08:33 36005076300810b3e00000000000002a5p2 -> ../dm-2
lrwxrwxrwx.  1 root root       8 Jul 23 08:33 3600508b1001ccb4a1de53313beabbd82 -> ../dm-15
crw-------.  1 root root 10, 236 Jul 23 08:19 control
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-home -> ../dm-14
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00 -> ../dm-8
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00_tdata -> ../dm-4
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00_tmeta -> ../dm-3
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-pool00-tpool -> ../dm-5
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-rhvh--4.4.1.1--0.20200722.0+1 -> ../dm-6
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-swap -> ../dm-7
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-tmp -> ../dm-13
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var -> ../dm-12
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var_crash -> ../dm-11
lrwxrwxrwx.  1 root root       8 Jul 23 08:19 rhvh-var_log -> ../dm-10
lrwxrwxrwx.  1 root root       7 Jul 23 08:19 rhvh-var_log_audit -> ../dm-9

I changed the filter to:

filter = ["a|^/dev/mapper/36005076300810b3e00000000000002a5p2$|", "a|^/dev/mapper/36005076300810b3e00000000000002a5p1$|", "r|.*|"]

and the lvs command and imgbase check work again.

In /var/log/messages:

Jul 23 08:18:48 hp-dl385g8-03 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Jul 23 08:18:48 hp-dl385g8-03 systemd[1]: Started Device-Mapper Multipath Device Controller.
Jul 23 08:18:49 hp-dl385g8-03 multipathd[742]: mpatha: load table [0 629145600 multipath 1 queue_if_no_path 1 alua 1 1 service-time 0 1 1 8:16 1]
Jul 23 08:18:49 hp-dl385g8-03 multipathd[742]: sdb [8:16]: path added to devmap mpatha
Jul 23 08:18:57 hp-dl385g8-03 systemd[1]: Stopping Device-Mapper Multipath Device Controller...
Jul 23 08:18:57 hp-dl385g8-03 systemd[1]: Stopped Device-Mapper Multipath Device Controller.
...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting oVirt ImageIO Daemon...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Virtual Desktop Server Manager network IP+link restoration...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Shared Storage Lease Manager...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Watchdog Multiplexing Daemon...
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Reached target System Time Synchronized.
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Starting Device-Mapper Multipath Device Controller...
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 3600508b1001ccb4a1de53313beabbd82: load table [0 1172058032 multipath 1 queue_if_no_path 0 1 1 service-time 0 1 1 8:0 1]
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 36005076300810b3e00000000000002a5: rename mpatha to 36005076300810b3e00000000000002a5
Jul 23 08:33:30 hp-dl385g8-03 multipathd[26944]: 36005076300810b3e00000000000002a5: load table [0 629145600 multipath 1 queue_if_no_path 1 alua 2 1 service-time 0 1 1 8:16 1 service-time 0 1 1 8:32 1]
Jul 23 08:33:30 hp-dl385g8-03 systemd[1]: Started Device-Mapper Multipath Device Controller.
As you can see, multipathd was started again while registering to the engine, and mpatha was renamed to 36005076300810b3e00000000000002a5.
Not blocking 4.4.1 on this because this seems to be reproducible only on the host used for the test.
I can reproduce this issue on other multipath machines as well. It seems this bug only affects multipath machines. There is no functional impact; creating VMs still succeeds.
Hi Ben,

We are considering changes to the multipathd configuration related to the device naming mismatch shown here.

Currently our multipathd configurator replaces "/etc/multipath.conf" with the vdsm one [1]. Then it flushes multipath (assuming it is already running on the system) with:

multipath -F

and reloads it:

systemctl reload multipathd

We would like to switch to restarting and then flushing:

systemctl restart multipathd
multipath -F

The purpose is to switch from the /dev/mapper/mpath{X} naming to /dev/mapper/{WWN}, and to prevent mix-ups in the lvm filter setup that comes next. Will that do?

[1] https://github.com/oVirt/vdsm/blob/09b18879aee8c2faa3e4a4d29ec966848c750915/lib/vdsm/tool/configurators/multipath.py#L77
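For reference, the proposed sequence can also be tried manually on an affected host to see whether the maps come up with WWN-based names. This is just the shell equivalent of the steps above, not the actual configurator code in [1]:

systemctl restart multipathd   # maps should come up named by WWN once user_friendly_names is disabled
multipath -F                   # flush any unused maps left over from the old naming
ls /dev/mapper/                # verify that the mpath{X} names are gone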
Ben, adding more context for comment 9.

Before we configure this host, the root file system is using:
/dev/mapper/mpatha2

(Based on comment 0 - if we have an lvm filter specifying /dev/mapper/mpatha2, it means /dev/mapper/mpatha2 was mounted when we created the lvm filter.)

I don't know why the host is using a multipath device. This may be a local device that should be blacklisted, or maybe this host is booting from SAN.

This is probably caused by the default multipath configuration, which uses user_friendly_names = yes. We disable this since it cannot work with RHV shared storage.

So after we configure the host, we expect all /dev/mapper/mpath{X} to be replaced with /dev/mapper/{WWN}. Does this require a reboot of the host, or is restarting multipath good enough?
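For reference, this is roughly how the mismatch shows up on the affected host (the filter and device names below are copied from comment 0 and comment 3, not from a new reproduction):

$ grep "filter = " /etc/lvm/lvm.conf
filter = ["a|^/dev/mapper/mpatha2$|", "r|.*|"]

$ ls /dev/mapper/ | grep 36005076300810b3e00000000000002a5
36005076300810b3e00000000000002a5
36005076300810b3e00000000000002a5p1
36005076300810b3e00000000000002a5p2

The filter only accepts /dev/mapper/mpatha2, which no longer exists after the rename, so lvs/pvs/vgs see no devices and imgbase check fails.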
cshao, can you explain what you mean by "multipath machine"?

We need to understand why this host is using a multipath device for the root file system. This is valid only if the host is booting from SAN.

If /dev/mapper/mpatha is a local disk, it must be blacklisted by adding a multipath drop-in configuration.

The way to configure it is:

$ udevadm info /sys/block/sda | egrep "ID_SERIAL=|WWN="
E: ID_SERIAL=Generic-_SD_MMC_20120501030900000-0:0

In this case this blacklist should work:

$ cat /etc/multipath/conf.d/99-local.conf
blacklist {
    wwid "Generic-_SD_MMC_20120501030900000-0:0"
}

In your case, based on the info in comment 3, I think this would work:

$ cat /etc/multipath/conf.d/99-local.conf
blacklist {
    wwid "3600508b1001ccb4a1de53313beabbd82"
}

This must be done before installing or upgrading RHV.

Does it resolve the issue?
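One way to verify that the blacklist took effect (a suggestion on top of the steps above, not a required step): reload multipathd and check that the WWID no longer appears in the multipath topology. A map that is currently in use will not be removed until it is released or the host is rebooted.

$ systemctl reload multipathd
$ multipath -ll | grep 3600508b1001ccb4a1de53313beabbd82

If nothing is printed, the local disk is no longer handled by multipath.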
(In reply to Nir Soffer from comment #10)
> Ben, adding more context for comment 9.
> 
> Before we configure this host, the root file system is using:
> /dev/mapper/mpatha2
> 
> (Based on comment 0 - if we have an lvm filter specifying /dev/mapper/mpatha2,
> it means /dev/mapper/mpatha2 was mounted when we created the lvm filter.)
> 
> I don't know why the host is using a multipath device. This may be a local
> device that should be blacklisted, or maybe this host is booting from SAN.
> 
> This is probably caused by the default multipath configuration, which uses
> user_friendly_names = yes. We disable this since it cannot work with RHV
> shared storage.
> 
> So after we configure the host, we expect all /dev/mapper/mpath{X} to be
> replaced with /dev/mapper/{WWN}. Does this require a reboot of the host, or
> is restarting multipath good enough?

You shouldn't even need to do that.

# systemctl reload multipathd.service

should update the configuration of the running multipathd instance, and rename the devices, even if they are in use.
(In reply to Nir Soffer from comment #11)
> cshao, can you explain what you mean by "multipath machine"?
> 
> We need to understand why this host is using a multipath device
> for the root file system. This is valid only if the host is booting
> from SAN.

Hi, I mean the multipath device is from SAN, and the host is booting from SAN.

> If /dev/mapper/mpatha is a local disk, it must be blacklisted

The machine that reproduces the issue has 2 disks (one is an FC SAN LUN, the other is a local disk). I installed RHVH on the FC LUN only and didn't use the local disk, so I think /dev/mapper/mpatha is not a local disk.

> by adding a multipath drop-in configuration.
> 
> The way to configure it is:
> 
> $ udevadm info /sys/block/sda | egrep "ID_SERIAL=|WWN="
> E: ID_SERIAL=Generic-_SD_MMC_20120501030900000-0:0
> 
> In this case this blacklist should work:
> 
> $ cat /etc/multipath/conf.d/99-local.conf
> blacklist {
>     wwid "Generic-_SD_MMC_20120501030900000-0:0"
> }
> 
> In your case, based on the info in comment 3, I think this would work:
> 
> $ cat /etc/multipath/conf.d/99-local.conf
> blacklist {
>     wwid "3600508b1001ccb4a1de53313beabbd82"
> }
> 
> This must be done before installing or upgrading RHV.
> 
> Does it resolve the issue?
Created attachment 1702744 [details]
new-vdsm-test
(In reply to Ben Marzinski from comment #12)
> (In reply to Nir Soffer from comment #10)
> > Ben, adding more context for comment 9.
> > 
> > Before we configure this host, the root file system is using:
> > /dev/mapper/mpatha2
> > 
> > (Based on comment 0 - if we have an lvm filter specifying /dev/mapper/mpatha2,
> > it means /dev/mapper/mpatha2 was mounted when we created the lvm filter.)
> > 
> > I don't know why the host is using a multipath device. This may be a local
> > device that should be blacklisted, or maybe this host is booting from SAN.
> > 
> > This is probably caused by the default multipath configuration, which uses
> > user_friendly_names = yes. We disable this since it cannot work with RHV
> > shared storage.
> > 
> > So after we configure the host, we expect all /dev/mapper/mpath{X} to be
> > replaced with /dev/mapper/{WWN}. Does this require a reboot of the host, or
> > is restarting multipath good enough?
> 
> You shouldn't even need to do that.
> 
> # systemctl reload multipathd.service
> 
> should update the configuration of the running multipathd instance, and
> rename the devices, even if they are in use.

In case we restart multipathd instead of reloading it after the config changes, is flushing (multipath -F) still required?
(In reply to Amit Bawer from comment #17)
> In case we restart multipathd instead of reloading it after the config
> changes, is flushing (multipath -F) still required?

Actually, this is an ugly corner case in multipathd that needs fixing. When you start multipathd, if a device is supposed to have a new name, it will get renamed. However, if its name changed and also some other part of its configuration changed, only the name change will take effect.

So yes, simply running "service multipathd restart" will work to rename a device, but it won't immediately pick up other configuration changes if the device got renamed. You could remove the devices before restarting multipathd if you know that they won't be in use. Or you could simply reload them after restarting multipathd, with either:

multipath -r

or

service multipathd reload

This issue only happens when starting multipathd. At all other times, when a device is updated, both name and configuration changes will be applied.
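Putting that together, a sketch of the sequence this suggests for the configurator (the exact order vdsm ends up using may differ):

systemctl restart multipathd   # renames mpath{X} maps to their WWN-based names
multipath -r                   # reload the maps so non-name config changes are applied too
multipath -F                   # flush unused maps, as in the earlier proposal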
The fix was reverted due to OST issues; it is pending a re-fix after advice regarding the multipath config. More likely to land in 4.4.3 if 4.4.2 closes this week.
Ben,

We would need your advice about "multipath -r" usage; I also sent the details by mail with the same subject.

Thanks
Test version:
redhat-virtualization-host-4.4.2-20200915.0.el8_2
Engine: 4.4.2-6
vdsm-4.40.26.3-1.el8ev.x86_64

Test steps:
1. Install redhat-virtualization-host-4.4.2-20200915.0.el8_2 on a multipath machine.
2. Register to engine.
3. Run imgbase check and check the multipath setup.

Test result:
imgbase check - pass

# imgbase check
Status: OK
  Bootloader ... OK
    Layer boot entries ... OK
    Valid boot entries ... OK
  Mount points ... OK
    Separate /var ... OK
    Discard is used ... OK
  Basic storage ... OK
    Initialized VG ... OK
    Initialized Thin Pool ... OK
    Initialized LVs ... OK
  Thin storage ... OK
    Checking available space in thinpool ... OK
    Checking thinpool auto-extend ... OK

# systemctl status multipathd
● multipathd.service - Device-Mapper Multipath Device Controller
   Loaded: loaded (/usr/lib/systemd/system/multipathd.service; disabled; vendor preset: disabled)
   Active: active (running) since Tue 2020-09-15 10:03:57 UTC; 31min ago
 Main PID: 29424 (multipathd)
   Status: "up"
    Tasks: 7
   Memory: 12.5M
   CGroup: /system.slice/multipathd.service
           └─29424 /sbin/multipathd -d -s

Sep 15 10:03:56 hp-dl385g8-03.lab.eng.pek2.redhat.com systemd[1]: Starting Device-Mapper Multipath Device Controller...

# pvs
  PV                                              VG   Fmt  Attr PSize    PFree
  /dev/mapper/36005076300810b3e00000000000002a5p2 rhvh lvm2 a--  <299.00g 58.01g

So the bug is fixed; changing bug status to VERIFIED.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Important: Red Hat Virtualization security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2020:4172
We already have automation for the register-to-engine part, and we check "imgbase check" manually.