Created attachment 1145907 [details]
multipath-device-error

Description of problem:
NGN 4.0 installed successfully, but after the host boots an error message is shown on the login screen. It is not clear whether it impacts any functionality.

Version-Release number of selected component (if applicable):
ovirt-node-ng-installer-master-20160405.iso
squashfs.20160405
ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
imgbased-0.5-0.201604040928gitd6a85f8.el7.centos.noarch
ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install NGN 4.0.
2. Add the NGN host to the engine.
3. After the host boots, look at the login screen.

Actual results:
"device-mapper: multipath: error getting device" is displayed on the login screen.

Expected results:
No such error is displayed on the NGN 4.0 login screen.

Additional info:
Created attachment 1145908 [details] /var/log/*.*
Please re-try with a more recent Node build from Jenkins.

The logs show that the storage setup was wrong:

[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-327.13.1.el7.x86_64 root=/dev/mapper/centos_dhcp--11--123-root ro crashkernel=auto rd.lvm.lv=centos_dhcp-11-123/root rd.lvm.lv=centos_dhcp-11-123/swap biosdevname=0 rhgb quiet LANG=en_US.UTF-8

root= indicates that the host was booted from the regular CentOS LV, and not from a Node LV.
(In reply to Fabian Deutsch from comment #2)
> Please re-try with a more recent Node build from Jenkins.
>
> The logs show that the storage setup was wrong:
>
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-327.13.1.el7.x86_64
> root=/dev/mapper/centos_dhcp--11--123-root ro crashkernel=auto
> rd.lvm.lv=centos_dhcp-11-123/root rd.lvm.lv=centos_dhcp-11-123/swap
> biosdevname=0 rhgb quiet LANG=en_US.UTF-8
>
> root= indicates that the host was booted from the regular CentOS LV, and not
> from a Node LV.

I can still reproduce this issue with the latest Node build from Jenkins.

Test version:
ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64
> I can still reproduce this issue with the latest Node build from Jenkins.
>
> Test version:
> ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
> ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
> imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
> ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
> device-mapper-1.02.107-5.el7_2.1.x86_64

Updated version info:
ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
ovirt-node-ng-image-update-placeholder-3.6.5-0.0.master.20160419091412.gite23be77.el7.noarch
imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
ovirt-release-host-node-3.6.5-0.0.master.20160419091412.gite23be77.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64

I can still reproduce the issue with this build.
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and this bug is not ON_QA.
Can this still be reproduced?
(In reply to Fabian Deutsch from comment #6)
> Can this still be reproduced?

Yes, it can still be reproduced on the rhev-hypervisor7-ng-4.0-20160616.0 build.
Tareq, Pavol, have you seen this issue in your testing? Chen, I also wonder if this appears on all hosts or just some hosts.
(In reply to Fabian Deutsch from comment #8)
> Tareq, Pavol, have you seen this issue in your testing?
>
> Chen, I also wonder if this appears on all hosts or just some hosts.

This appears on all hosts during my testing. The step "add ngn to engine" is necessary.
That's a good note, Chen.

One reason could be that the multipath.conf inside the initrd is different from the one in userspace.
Usually the initrd needs to be regenerated if multipath.conf changes.

Nir, can you tell if vdsm regenerates the initrd after it modifies multipath.conf?
(In reply to Fabian Deutsch from comment #10)
> That's a good note, Chen.
>
> One reason could be that the multipath.conf inside the initrd is different
> from the one in userspace.
> Usually the initrd needs to be regenerated if multipath.conf changes.
>
> Nir, can you tell if vdsm regenerates the initrd after it modifies
> multipath.conf?

No, we considered this in the past, but since this is a very special need, we decided that it should be the administrator's responsibility if she needs this.
Okay, thanks.

I just rebuilt the initrd on an affected machine, but the bug is still there:

# cat /etc/multipath.conf
# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

# Enable when this section is available on all supported platforms.
# Options defined here override device specific options embedded into
# multipathd.
#
# overrides {
#      no_path_retry            fail
# }

[root@slot-6c ~]# dmesg | grep -C 3 device-mapper
…
[   27.695388] device-mapper: table: 253:8: multipath: error getting device
[   27.702872] device-mapper: ioctl: error adding target to table

Ben, I recall that this error appeared when the multipath.conf differed between initrd and user-space. In this case they are the same, and the error is still shown. Do you have an idea why this could be?
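(For reference, the initrd rebuild above was presumably done along these lines; a minimal sketch assuming the stock EL7 dracut tooling, nothing Node- or vdsm-specific:

    # regenerate the initramfs for the running kernel so it picks up the
    # current /etc/multipath.conf via dracut's multipath module
    dracut --force /boot/initramfs-$(uname -r).img $(uname -r)
)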
I was able to reproduce the bug with a clean RHEL host when adding it to the engine (= installation of vdsm during "Add host"):

[ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
[ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
[ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
[ 2005.181266] device-mapper: ioctl: error adding target to table
Nir, looks like this is seen on RHEL too. Could this be a side effect of the vdsm multipath configuration?
(In reply to Fabian Deutsch from comment #14)
> Nir, looks like this is seen on RHEL too.
>
> Could this be a side effect of the vdsm multipath configuration?

In a way, yes, since we do not use the find_multipaths option, which would restrict multipath to devices with multiple paths. We use multipath for all devices, so we can transparently add paths to devices that currently have a single path.

This causes multipath to try to add a mapping for every matching device. If lvm was faster and took a device, multipath will fail to add the map, logging this message.

Disabling lvmetad and lvm auto activation may help to avoid this issue.
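A rough sketch of what disabling lvmetad and lvm auto activation would look like on EL7 (illustrative only; these are not settings vdsm applies today):

    # /etc/lvm/lvm.conf
    global {
        use_lvmetad = 0                    # no lvmetad-driven auto-assembly
    }
    activation {
        auto_activation_volume_list = []   # do not auto-activate any VG/LV
    }

    # and stop the lvmetad daemon itself
    systemctl mask lvm2-lvmetad.service lvm2-lvmetad.socket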
Thanks, Nir. This means it is nothing we can solve from the Node side, thus moving this over to vdsm.

I'd assume that we want to solve this somehow in the long run.
This can be solved with an lvm filter, preventing the race between multipath and lvm.
Moving out all non-blockers/exceptions.
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4.

Pushing out anything unrelated, although if there's a minimal/trivial, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.
(In reply to Pavol Brilla from comment #13)
> I was able to reproduce the bug with a clean RHEL host when adding it to the
> engine (= installation of vdsm during "Add host"):
>
> [ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
> [ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
> [ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
> [ 2005.181266] device-mapper: ioctl: error adding target to table

Ben, can you explain these errors during boot? Is this related to lvm grabbing a device before multipath could use it?
(In reply to Nir Soffer from comment #20)
> (In reply to Pavol Brilla from comment #13)
> > I was able to reproduce the bug with a clean RHEL host when adding it to
> > the engine (= installation of vdsm during "Add host"):
> >
> > [ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
> > [ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
> > [ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
> > [ 2005.181266] device-mapper: ioctl: error adding target to table
>
> Ben, can you explain these errors during boot? Is this related to lvm
> grabbing a device before multipath could use it?

Yeah, probably. This error almost always means that the scsi device is already in use. With find_multipaths off, multipath will attempt to grab all devices. If something else (usually lvm) has already auto-assembled on top of a device, multipath won't be able to grab it.

The general solution is to either set find_multipaths, blacklist single-path devices, or add the wwid to /etc/multipath/wwids (if multipath is supposed to be grabbing this device).
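For illustration only (not how vdsm configures hosts), those three approaches look roughly like this; the WWID and device path below are placeholders:

    # /etc/multipath.conf: only create maps for devices with more than one path
    defaults {
        find_multipaths yes
    }

    # ...or blacklist a specific single-path device by its WWID
    blacklist {
        wwid "<wwid-of-the-local-disk>"
    }

    # ...or, if multipath is supposed to own the device, record its WWID in
    # /etc/multipath/wwids so it is claimed early:
    multipath -a /dev/sda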
This issue is prevented by applying a proper lvm filter that does not allow lvm to use devices which are not required by the host.

We introduced a new vdsm-tool command, "config-lvm-filter", automating the lvm configuration. If you use block storage you should configure the lvm filter properly on all hosts.

See https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/
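For hosts using block storage, the workflow is simply the following (the resulting filter line is only an illustration; the tool detects the actual devices per host):

    # run on each host; review the proposed filter and confirm
    vdsm-tool config-lvm-filter

    # the tool ends up writing something like this into /etc/lvm/lvm.conf:
    # filter = [ "a|^/dev/sda2$|", "r|.*|" ]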
Nir, this is targeted to 4.3 but modified in 4.2. Can you please check / fix target milestone?
Same as https://bugzilla.redhat.com/show_bug.cgi?id=1130527#c26
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra
cshao, can you explain how to reproduce this, or verify this bug with the latest vdsm?

Note that you must set up an lvm filter to avoid this issue, using:

    vdsm-tool config-lvm-filter

See https://www.ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/
Test version:
redhat-virtualization-host-4.2-20180205.0
vdsm-4.20.17-1.el7ev.x86_64

Test steps:
1. Install RHVH 4.2.
2. Set up the lvm filter.
3. Add RHVH to RHVM.
4. After the host boots, look at the login screen.

Test result:
No device-mapper error output on the login screen.

So the bug is fixed; changing bug status to VERIFIED.
This bugzilla is included in the oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be resolved in oVirt 4.2.1, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.