Bug 1325844 - Hide error messages (device-mapper: multipath: error getting device) displayed during ngn 4.0 login (lvm filter?)
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: vdsm
Classification: oVirt
Component: Core
Version: 4.18.10
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ovirt-4.2.1
Target Release: ---
Assignee: Nir Soffer
QA Contact: Kevin Alon Goldblatt
URL:
Whiteboard:
Depends On: 1374545
Blocks: 1450114
 
Reported: 2016-04-11 10:47 UTC by cshao
Modified: 2018-02-12 11:49 UTC
CC: 14 users

Fixed In Version: vdsm-4.20.9
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-12 11:49:19 UTC
oVirt Team: Storage
Embargoed:
amureini: ovirt-4.2?
ykaul: exception?
rule-engine: planning_ack?
fdeutsch: devel_ack+
cshao: testing_ack+


Attachments
multipath-device-error (917.52 KB, image/png)
2016-04-11 10:47 UTC, cshao
no flags
/var/log/*.* (587.00 KB, application/x-gzip)
2016-04-11 10:49 UTC, cshao
no flags


Links
oVirt gerrit 85126 (Last Updated: 2018-01-15 13:41:02 UTC)

Description cshao 2016-04-11 10:47:49 UTC
Created attachment 1145907 [details]
multipath-device-error

Description of problem:
ngn 4.0 installed successfully, but during login to ngn 4.0 error messages are displayed; it is not clear whether they impact functionality.

Version-Release number of selected component (if applicable):
ovirt-node-ng-installer-master-20160405.iso
squashfs.20160405
ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
imgbased-0.5-0.201604040928gitd6a85f8.el7.centos.noarch
ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64

How reproducible:
100%

Steps to Reproduce:
1. Install ngn 4.0.
2. Add ngn to the engine.
3. After the host boots, check the login screen.


Actual results:
The message "device-mapper: multipath: error getting device" is displayed on the login screen.

Expected results:
No such error messages should be displayed on the ngn 4.0 login screen.

Additional info:

Comment 1 cshao 2016-04-11 10:49:36 UTC
Created attachment 1145908 [details]
/var/log/*.*

Comment 2 Fabian Deutsch 2016-04-11 11:36:33 UTC
Please re-try with a more recent Node build from Jenkins

The logs show that the storage setup was wrong:
[    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-327.13.1.el7.x86_64 root=/dev/mapper/centos_dhcp--11--123-root ro crashkernel=auto rd.lvm.lv=centos_dhcp-11-123/root rd.lvm.lv=centos_dhcp-11-123/swap biosdevname=0 rhgb quiet LANG=en_US.UTF-8

root= indicates that the host was booted from the regular CentOS LV, and not a Node LV.

Comment 3 cshao 2016-04-26 06:01:17 UTC
(In reply to Fabian Deutsch from comment #2)
> Please re-try with a more recent Node build from Jenkins
> 
> The logs show that the storage setup was wrong:
> [    0.000000] Command line: BOOT_IMAGE=/vmlinuz-3.10.0-327.13.1.el7.x86_64
> root=/dev/mapper/centos_dhcp--11--123-root ro crashkernel=auto
> rd.lvm.lv=centos_dhcp-11-123/root rd.lvm.lv=centos_dhcp-11-123/swap
> biosdevname=0 rhgb quiet LANG=en_US.UTF-8
> 
> root= indicates that the host was booted from the regular CentOS LV, and not
> a Node LV.

I can still reproduce this issue with the latest Node build from Jenkins.

Test version:
ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64

Comment 4 cshao 2016-04-26 07:32:20 UTC
> 
> I can still reproduce this issue with the latest Node build from Jenkins.
> 
> Test version:
> ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
> ovirt-node-ng-image-update-placeholder-4.0.0-0.2.alpha1.20160405123556.
> gitbd184ec.el7.noarch
> imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
> ovirt-release-host-node-4.0.0-0.2.alpha1.20160405123556.gitbd184ec.el7.noarch
> device-mapper-1.02.107-5.el7_2.1.x86_64


Update version info:
ovirt-node-ng-installer-ovirt-3.6-2016042508.iso
ovirt-node-ng-image-update-placeholder-3.6.5-0.0.master.20160419091412.gite23be77.el7.noarch
imgbased-0.6-0.201604150305git1e3b28f.el7.centos.noarch
ovirt-release-host-node-3.6.5-0.0.master.20160419091412.gite23be77.el7.noarch
device-mapper-1.02.107-5.el7_2.1.x86_64

I can still reproduce this issue with the latest Node build from Jenkins.

Comment 5 Sandro Bonazzola 2016-05-02 10:02:25 UTC
Moving from 4.0 alpha to 4.0 beta since 4.0 alpha has already been released and the bug is not ON_QA.

Comment 6 Fabian Deutsch 2016-06-21 14:05:30 UTC
Can this still be reproduced?

Comment 7 cshao 2016-06-22 01:08:06 UTC
(In reply to Fabian Deutsch from comment #6)
> Can this still be reproduced?

Yes, I can still reproduce it on the rhev-hypervisor7-ng-4.0-20160616.0 build.

Comment 8 Fabian Deutsch 2016-07-21 12:40:39 UTC
Tareq, Pavol, have you seen this issue in your testing?

Chen, I also wonder if this appears on all hosts or just some hosts.

Comment 9 cshao 2016-07-21 12:47:57 UTC
(In reply to Fabian Deutsch from comment #8)
> Tareq, Pavol, have you seen this issue in your testing?
> 
> Chen, I also wonder if this appears on all hosts or just some hosts.

This appears on all hosts during my testing. The step "add ngn to engine" is necessary.

Comment 10 Fabian Deutsch 2016-07-21 13:11:00 UTC
That's a good note, Chen.

One reason could be that the multipath.conf inside the initrd is different
from the one in userspace.
Usually the initrd needs to be regenerated if multipath.conf changes.

Nir, can you tell if vdsm regenerates the initrd after it modifies multipath.conf?
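
For what it's worth, the copy inside the initramfs can be compared with the one on disk; a rough check on EL7 (using the dracut lsinitrd helper against the running kernel's initramfs) would be something like:

# lsinitrd -f etc/multipath.conf | diff - /etc/multipath.conf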

Comment 11 Nir Soffer 2016-07-21 13:20:00 UTC
(In reply to Fabian Deutsch from comment #10)
> That's a good note, Chen.
> 
> One reason could be that the multipath.conf inside the initrd is different
> from the one in userspace.
> Usually the initrd needs to be regenerated if multipath.conf changes.
> 
> Nir, can you tell if vdsm regenerates the initrd after it modifies
> multipath.conf?

No, we considered this in the past, but since this is a very special need, we
decided that it should be the administrator's responsibility if they need it.
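
For reference only, and just as a sketch of the manual step an administrator would take on EL7: the initramfs of the running kernel is regenerated with dracut and picked up on the next boot, roughly:

# dracut --force
# reboot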

Comment 12 Fabian Deutsch 2016-07-22 11:57:11 UTC
Okay, thanks.

I just rebuilt the initrd on an affected machine, but the bug is still there:

# cat /etc/multipath.conf 
# VDSM REVISION 1.3

defaults {
    polling_interval            5
    no_path_retry               fail
    user_friendly_names         no
    flush_on_last_del           yes
    fast_io_fail_tmo            5
    dev_loss_tmo                30
    max_fds                     4096
}

# Remove devices entries when overrides section is available.
devices {
    device {
        # These settings overrides built-in devices settings. It does not apply
        # to devices without built-in settings (these use the settings in the
        # "defaults" section), or to devices defined in the "devices" section.
        # Note: This is not available yet on Fedora 21. For more info see
        # https://bugzilla.redhat.com/1253799
        all_devs                yes
        no_path_retry           fail
    }
}

# Enable when this section is available on all supported platforms.
# Options defined here override device specific options embedded into
# multipathd.
#
# overrides {
#      no_path_retry           fail
# }


[root@slot-6c ~]# dmesg | grep -C 3 device-mapper
…
[   27.695388] device-mapper: table: 253:8: multipath: error getting device
[   27.702872] device-mapper: ioctl: error adding target to table

Ben, I recall that this error appeared when the multipath.conf differed between initrd and user-space.
In this case they are the same, and the error is still shown.
Do you have an idea why this could be?

Comment 13 Pavol Brilla 2016-07-22 13:01:40 UTC
I was able to reproduce the bug with a clean RHEL host added to the engine (vdsm is installed during Add Host):

[ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
[ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
[ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
[ 2005.181266] device-mapper: ioctl: error adding target to table

Comment 14 Fabian Deutsch 2016-08-11 19:14:57 UTC
Nir, looks like this is seen on RHEL too.

Could this be a side effect of the vdsm multipath configuration?

Comment 15 Nir Soffer 2016-08-11 19:24:54 UTC
(In reply to Fabian Deutsch from comment #14)
> Nir, looks like this is seen on RHEL too.
> 
> Could this be a side effect of the vdsm multipath configuration?

In a way, yes: we do not use the find_multipaths option, which would restrict
multipath to devices with multiple paths. We use multipath for all devices,
so we can transparently add paths to devices that currently have a single path.

This causes multipath to try to add a mapping for every matching device. If lvm was
faster and took a device, multipath fails to add the map and logs this message.

Disabling lvmetad and lvm auto-activation may help to avoid this issue.
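
Only as a rough sketch of that idea on EL7 (and not the fix eventually chosen for this bug), disabling lvmetad means setting use_lvmetad = 0 in the global section of /etc/lvm/lvm.conf and stopping its units:

# systemctl stop lvm2-lvmetad.socket lvm2-lvmetad.service
# systemctl disable lvm2-lvmetad.socket lvm2-lvmetad.service

Auto-activation is controlled separately, e.g. via auto_activation_volume_list in the activation section of lvm.conf.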

Comment 16 Fabian Deutsch 2016-08-12 07:21:21 UTC
Thanks Nir.

This means it is nothing we can solve from the Node side, so I am moving it over to vdsm.

I'd assume that we want to solve this somehow in the long run.

Comment 17 Nir Soffer 2017-02-14 11:39:15 UTC
This can be solved with an lvm filter, preventing the race between multipath and lvm.
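
For illustration only (the device path below is hypothetical): on a host whose own
volume group sits on /dev/sda2, such a filter in the devices section of
/etc/lvm/lvm.conf would accept that device and reject everything else, so lvm never
grabs the shared LUNs before multipath does:

devices {
    filter = [ "a|^/dev/sda2$|", "r|.*|" ]
}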

Comment 18 Yaniv Lavi 2017-02-23 11:25:21 UTC
Moving out all non-blockers/exceptions.

Comment 19 Allon Mureinik 2017-07-02 20:38:03 UTC
4.1.4 is planned as a minimal, fast, z-stream version to fix any open issues we may have in supporting the upcoming EL 7.4.

Pushing out anything unrelated, although if there's a minimal/trivial, SAFE fix that's ready on time, we can consider introducing it in 4.1.4.

Comment 20 Nir Soffer 2017-07-02 21:57:40 UTC
(In reply to Pavol Brilla from comment #13)
> I was able to reproduce the bug with a clean RHEL host added to the engine
> (vdsm is installed during Add Host):
> 
> [ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
> [ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
> [ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
> [ 2005.181266] device-mapper: ioctl: error adding target to table

Ben, can you explain these errors during boot? Is this related to lvm grabbing
a device before multipath could use it?

Comment 21 Ben Marzinski 2017-07-10 20:34:27 UTC
(In reply to Nir Soffer from comment #20)
> (In reply to Pavol Brilla from comment #13)
> > I was able to reproduce the bug with a clean RHEL host added to the engine
> > (vdsm is installed during Add Host):
> > 
> > [ 2005.067667] device-mapper: multipath: version 1.9.0 loaded
> > [ 2005.167426] device-mapper: multipath service-time: version 0.2.0 loaded
> > [ 2005.174479] device-mapper: table: 253:4: multipath: error getting device
> > [ 2005.181266] device-mapper: ioctl: error adding target to table
> 
> Ben, can you explain these errors during boot? Is this related to lvm grabbing
> a device before multipath could use it?

Yeah, probably. This error almost always means that the scsi device is already in use. With find_multipaths off, multipath will attempt to grab all devices. If something else (usually lvm) has already auto-assembled on top of a device,
multipath won't be able to grab it. The general solution is to either set find_multipaths, blacklist single-path devices, or add the wwid to /etc/multipath/wwids (if multipath is supposed to be grabbing this device).
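
For illustration, the first two options would look roughly like this in /etc/multipath.conf (the WWID below is a placeholder, not a real device):

defaults {
    find_multipaths yes
}

blacklist {
    # replace with the WWID of the local single-path disk
    wwid "<wwid-of-local-disk>"
}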

Comment 22 Nir Soffer 2017-12-06 18:02:16 UTC
This issue is prevented by applying a proper lvm filter that does not allow lvm
to use devices which are not required by the host.

We introduced a new vdsm-tool command, "config-lvm-filter", automating the lvm
configuration. If you use block storage, you should configure the lvm filter
properly on all hosts.

See https://ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/

Comment 23 Sandro Bonazzola 2017-12-12 15:52:58 UTC
Nir, this is targeted to 4.3 but was modified in 4.2.
Can you please check/fix the target milestone?

Comment 24 Nir Soffer 2017-12-12 15:56:56 UTC
Same as https://bugzilla.redhat.com/show_bug.cgi?id=1130527#c26

Comment 25 RHV bug bot 2018-01-05 16:58:39 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra

Comment 26 RHV bug bot 2018-01-12 14:40:38 UTC
INFO: Bug status wasn't changed from MODIFIED to ON_QA due to the following reason:

[No external trackers attached]

For more info please contact: infra

Comment 27 Nir Soffer 2018-02-07 07:51:47 UTC
cshao, can you explain how to reproduce this, or verify this bug with the latest
vdsm?

Note that you must set up an lvm filter to avoid this issue, using:
vdsm-tool config-lvm-filter

See https://www.ovirt.org/blog/2017/12/lvm-configuration-the-easy-way/

Comment 28 cshao 2018-02-07 08:24:58 UTC
Test version:
redhat-virtualization-host-4.2-20180205.0
vdsm-4.20.17-1.el7ev.x86_64

Test steps:
1. Install RHVH 4.2.
2. Set up the lvm filter.
3. Add RHVH to RHVM.
4. After the host boots, check the login screen.

Test result:
No device-mapper error output on the login screen, so the bug is fixed; changing bug status to VERIFIED.

Comment 29 Sandro Bonazzola 2018-02-12 11:49:19 UTC
This bugzilla is included in the oVirt 4.2.1 release, published on Feb 12th 2018.

Since the problem described in this bug report should be
resolved in the oVirt 4.2.1 release, it has been closed with a resolution of CURRENT RELEASE.

If the solution does not work for you, please open a new bug report.

