Bug 1324028 - Possible timing issue when processing udev events for multipathed PV causing lvmetad and LVM commands to not see the PV
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: lvm2
Version: 7.3
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: ---
Assignee: Peter Rajnoha
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On: 1287106
Blocks:
 
Reported: 2016-04-05 11:35 UTC by Peter Rajnoha
Modified: 2019-11-14 07:43 UTC

Fixed In Version: lvm2-2.02.152-1.el7
Doc Type: Bug Fix
Doc Text:
Due to timing problems, when processing udev events for multipathed Physical Volumes (PVs), lvmetad and LVM commands previously failed to detect the PVs on reboot. This update fixes the aforementioned timing bug, and the system now finds and activates all LVM volume groups while scanning and activating PVs.
Clone Of:
Environment:
Last Closed: 2016-11-04 04:19:54 UTC
Target Upstream Version:


Attachments
Udev monitoring service to collect device events with variables to /run/udev_monitor.log (348 bytes, text/plain)
2016-06-06 12:01 UTC, Peter Rajnoha


Links
System: Red Hat Product Errata
ID: RHBA-2016:1445
Priority: normal
Status: SHIPPED_LIVE
Summary: lvm2 bug fix and enhancement update
Last Updated: 2016-11-03 13:46:41 UTC

Description Peter Rajnoha 2016-04-05 11:35:18 UTC
A timing issue may cause PVs on multipath devices to be left out when these PVs are scanned to update lvmetad. lvmetad is a daemon that is used by default in RHEL7 (configured by the global/use_lvmetad=1 setting in lvm.conf). It caches LVM metadata for use by subsequent LVM commands, which reduces disk scanning when LVM commands are executed. LVM commands therefore rely on the lvmetad content instead of scanning disks on each execution.
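
To check whether a host actually uses lvmetad caching, the effective setting can be read back from the LVM configuration. The helper below is only a sketch: the function name lvmetad_enabled is made up for illustration, and it assumes the "use_lvmetad=<n>" output format printed by "lvm dumpconfig global/use_lvmetad".

```shell
#!/bin/bash
# Hypothetical helper: reads a "use_lvmetad=<n>" line on stdin and
# succeeds only if the value is 1 (lvmetad caching enabled).
lvmetad_enabled() {
    grep -Eq '^[[:space:]]*use_lvmetad[[:space:]]*=[[:space:]]*1[[:space:]]*$'
}

# Typical use on a real host (needs lvm2 installed):
#   lvm dumpconfig global/use_lvmetad | lvmetad_enabled && echo "lvmetad caching is on"
```

The setting alone is not enough; the daemon also has to be running, which "systemctl is-active lvm2-lvmetad.service" reports.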

As a consequence of this timing issue, LVM commands can end up with an error:
  WARNING: Device for PV <PV_UUID> not found or rejected by a filter.

Furthermore, any VGs/LVs on top of PVs that hit this timing issue are not automatically activated. This may lead to further errors when such a VG/LV is supposed to be mounted or when any other part of the system expects it: the system may end up timing out while waiting for the VG/LV.
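
A quick way to see whether a host is affected is to look for the warning above in the output of any LVM reporting command. This is a minimal sketch; the function name missing_pv_uuids is hypothetical and the pattern is taken from the warning text quoted in this report.

```shell
#!/bin/bash
# Hypothetical helper: reads LVM command output (stdout and stderr) on stdin
# and prints the UUID of every PV named in a
# "not found or rejected by a filter" warning, one per line, deduplicated.
missing_pv_uuids() {
    sed -n 's/^[[:space:]]*WARNING: Device for PV \(.*\) not found or rejected by a filter\.[[:space:]]*$/\1/p' | sort -u
}

# Typical use on a real host (needs root and lvm2):
#   pvs 2>&1 | missing_pv_uuids
```

An empty result only means the warning was not printed during that run; it does not prove the installed packages contain the fix.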

The source of the problem is the cooperation between LVM and multipath when processing udev events. Whenever a multipath component appears or disappears, multipath needs to reload the table of the DM device that represents the multipath device and ties all the multipath components together. The multipath device has to be suspended briefly while the table is reloaded. At the same time, LVM in general does not scan suspended DM devices (including multipath devices), because doing so may itself cause timeouts when processing udev events. The timing issue behind the problem described above occurs when LVM tries to scan a multipath device at the very moment it is being reloaded and is therefore in the suspended state; every such device is skipped.
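
The suspended state that makes LVM skip a device is visible in the "State:" line of "dmsetup info". The sketch below lists the DM devices that are suspended at the moment of the call; the function name suspended_dm_devices is hypothetical, and it assumes the usual "Name:" / "State:" layout of the plain "dmsetup info" output.

```shell
#!/bin/bash
# Hypothetical helper: reads plain "dmsetup info" output on stdin and prints
# the name of every device whose State is SUSPENDED.
suspended_dm_devices() {
    awk '
        $1 == "Name:"                       { name = $2 }
        $1 == "State:" && $2 == "SUSPENDED" { print name }
    '
}

# Typical use on a real host (needs root):
#   dmsetup info | suspended_dm_devices
```

A snapshot like this is subject to the same race it illustrates, since a table reload suspends the device only for a short time, so it is meant for understanding the mechanism rather than for reliable detection.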

On the device-mapper-multipath side, multipath now properly waits for udev events to complete before issuing further reloads, if needed. On the lvm2 side, the LVM2 command executed from within udev context (the pvscan --cache --aay call made from udev rules to update lvmetad) no longer checks for suspended devices itself, as that check is already done within the udev rules. This second check for suspended DM devices in pvscan --cache --aay could have caused another race, because pvscan is detached from the udev context as a background process, so any synchronization in multipath that waits for events to complete would not be effective.

This is resolved in RHEL7.1.z (lvm2-2.02.115-3.el7_1.2 and device-mapper-multipath-0.4.9-77.el7_1.3) and in RHEL7.2.z versions of packages (lvm2-2.02.130-5.el7_2.2 and device-mapper-multipath-0.4.9-85.el7_2.2). 

The fixes will also be included in RHEL7.3 and higher versions of the lvm2 and device-mapper-multipath packages.

Comment 1 Peter Rajnoha 2016-04-05 11:38:34 UTC
This is an excerpt and summary of existing private bug #1287106 so this problem and solution can be searched for and referenced in other bugs if needed.

Comment 2 LENHOF 2016-04-06 11:46:28 UTC
Hi,

Is there a roadmap available somewhere about the release date of RHEL 7.3 ?
Version lvm2-2.02.130-5.el7_2.2 seems not available yet in EUS channel... Is it on QA ?

Regards,


Available Packages
Name                : lvm2
Arch                : x86_64
Epoch               : 7
Version             : 2.02.130
Release             : 5.el7_2.1
Size                : 1.0 M
Repo                : rhel-7-server-eus-rpms/x86_64
Summary             : Userland logical volume management tools
URL                 : http://sources.redhat.com/lvm2
License             : GPLv2
Description         : LVM2 includes all of the support for handling read/write operations on
                    : physical volumes (hard disks, RAID-Systems, magneto optical, etc.,
                    : multiple devices (MD), see mdadd(8) or even loop devices, see
                    : losetup(8)), creating volume groups (kind of virtual disks) from one
                    : or more physical volumes and creating one or more logical volumes
                    : (kind of logical partitions) in volume groups.

[root@tstlsys007 yum.repos.d]#

Comment 3 Peter Rajnoha 2016-04-06 12:05:41 UTC
(In reply to LENHOF from comment #2)
> Hi,
> 
> Is there a roadmap available somewhere about the release date of RHEL 7.3 ?
> Version lvm2-2.02.130-5.el7_2.2 seems not available yet in EUS channel... Is
> it on QA ?

I don't think the 7.3 schedule is publicly available yet. The 7.2.z version (lvm2-2.02.130-5.el7_2.2) has now passed QA, so it should appear on the channel in a few days.

Comment 6 LENHOF 2016-04-25 14:04:10 UTC
Hi,

The packages still don't seem to be available, after more than a "few days". Could you provide some feedback on the progress made on this bug report?

Regards,

Comment 7 Peter Rajnoha 2016-04-25 14:25:10 UTC
7.1.EUS is shipped already (lvm2-2.02.115-3.el7_1.2).

7.2.z is in preparation (lvm2-2.02.130-5.el7_2.2). This one passed devel, passed QA, it's verified - all done, just waiting to be released. Add release engineering to needinfo if they can provide exact dates...

Comment 9 Jan Blazek 2016-05-18 09:33:59 UTC
(In reply to Peter Rajnoha from comment #7)
> 7.1.EUS is shipped already (lvm2-2.02.115-3.el7_1.2).
> 
> 7.2.z is in preparation (lvm2-2.02.130-5.el7_2.2). This one passed devel,
> passed QA, it's verified - all done, just waiting to be released. Add
> release engineering to needinfo if they can provide exact dates...

The 7.2.z advisory with lvm2-2.02.130-5.el7_2.2 build was already released.
https://access.redhat.com/errata/RHBA-2016:1028

Comment 10 Peter Rajnoha 2016-06-06 12:01:03 UTC
Created attachment 1165180
Udev monitoring service to collect device events with variables to /run/udev_monitor.log

I'm copying the part about getting proper logs to analyze this issue from bug #1287106 so it's publicly available:

1) place systemd-udev-monitor.service file in /etc/systemd/system
2) call "systemctl daemon-reload"
3) call "systemctl enable systemd-udev-monitor.service"
4) reboot
5) append "systemd.log_level=debug systemd.log_target=kmsg udev.log-priority=debug log_buf_len=8M" to kernel cmd line at boot
6) make sure systemd-udev-monitor.service is running by checking "systemctl status systemd-udev-monitor.service"
7) after booting, grab the journal with "journalctl -b"
8) generate lvmdump by calling "lvmdump -u -l -s" and take the generated dump file
9) stop the systemd-udev-monitor.service with "systemctl stop systemd-udev-monitor.service"
10) take the /run/udev_monitor.log file that got generated
11) do the sosreport, just in case we'd need some extra info not already gathered by commands above
12) call "systemctl disable systemd-udev-monitor.service" and "rm /etc/systemd/system/systemd-udev-monitor.service" for cleanup.
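
The steps above can be sketched as a small script. It is only an illustration: the output directory, the journal.txt file name, the function names and the DRY_RUN switch are made up, and it assumes the attached unit file sits in the current directory. The reboot (step 4) and the kernel command line change (step 5) still have to be done by hand between the two functions.

```shell
#!/bin/bash
# Sketch of the log-collection procedure; set DRY_RUN=1 to print the
# commands instead of running them.
UNIT=systemd-udev-monitor.service
OUT_DIR=${OUT_DIR:-/root/bz1324028-logs}   # hypothetical output location

run() {
    if [ "${DRY_RUN:-0}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Steps 1-3: install and enable the monitor unit. Afterwards reboot and
# append the debug options to the kernel command line manually (steps 4-5).
prepare_monitor() {
    run cp "./$UNIT" "/etc/systemd/system/$UNIT"
    run systemctl daemon-reload
    run systemctl enable "$UNIT"
}

# Steps 6-12: run once after the reboot, all in the same boot.
collect_logs() {
    run mkdir -p "$OUT_DIR"
    run systemctl status "$UNIT" || return 1    # step 6: must be running
    run sh -c "journalctl -b > '$OUT_DIR/journal.txt'"
    run lvmdump -u -l -s                        # keep the dump file it reports
    run systemctl stop "$UNIT"
    run cp /run/udev_monitor.log "$OUT_DIR/"
    run sosreport                               # may ask questions interactively
    run systemctl disable "$UNIT"
    run rm "/etc/systemd/system/$UNIT"
}
```

Running "DRY_RUN=1 collect_logs" first shows what would be executed without touching the system.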

It's important to have all the information and logs in sync for *one single run*, as otherwise it's harder to track devices across logs from different runs (major/minor numbers can change, names can change, etc.). I'm primarily interested in logs gathered for the failed run, of course, but if we also have exactly the same set of logs for a correct boot, that would be even better (for comparison).

Comment 11 Peter Rajnoha 2016-06-06 12:12:46 UTC
Note: as for z-streams, the fix for this issue has been released as:

7.2.z:
======
  lvm2-2.02.130-5.el7_2.2
  device-mapper-multipath-0.4.9-85.el7_2.2

7.1.z:
======
  lvm2-2.02.115-3.el7_1.2
  device-mapper-multipath-0.4.9-77.el7_1.3
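
To compare an installed package against the builds listed above, something like the following can be used. It is a rough sketch: the function names are hypothetical, and "sort -V" only approximates RPM version comparison, although it orders the builds named in this report correctly.

```shell
#!/bin/bash
# Fixed builds per z-stream, copied from the list above.
fixed_builds() {
    case "$1" in
        7.1) echo "lvm2-2.02.115-3.el7_1.2 device-mapper-multipath-0.4.9-77.el7_1.3" ;;
        7.2) echo "lvm2-2.02.130-5.el7_2.2 device-mapper-multipath-0.4.9-85.el7_2.2" ;;
        *)   return 1 ;;
    esac
}

# Succeeds if the installed name-version-release ($1) sorts at or after the
# fixed one ($2). Approximation only: sort -V is not rpm's own comparison.
nvr_at_least() {
    [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# Typical use on a real host:
#   installed=$(rpm -q --qf '%{NAME}-%{VERSION}-%{RELEASE}\n' lvm2)
#   nvr_at_least "$installed" lvm2-2.02.130-5.el7_2.2 && echo "fix included"
```

Both lvm2 and device-mapper-multipath need to be at the fixed level, since the fix has a part on each side.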

Comment 12 Roman Bednář 2016-09-19 12:41:46 UTC
Verified based on https://bugzilla.redhat.com/show_bug.cgi?id=1287106#c56 

No warnings related to missing PVs observed after reboot with multipath setup using latest rpms.


# systemctl is-active lvm2-lvmetad.service
inactive

# pvs -o +devices
  PV                 VG            Fmt  Attr PSize   PFree   Devices              
  /dev/mapper/mpatha vg            lvm2 a--  972.00m 872.00m /dev/mapper/mpatha(0)
  /dev/mapper/mpatha vg            lvm2 a--  972.00m 872.00m                      
  ... 
                     
# reboot
...

# pvs
  PV                 VG            Fmt  Attr PSize   PFree  
  /dev/mapper/mpatha vg            lvm2 a--  972.00m 872.00m
  ...
==============================================================
lvmetad on:

# systemctl is-active lvm2-lvmetad.service 
active

# vgcreate vg /dev/mapper/mpatha
  Volume group "vg" successfully created

# lvcreate -L100M vg
  Logical volume "lvol0" created.

# reboot
...
# pvs
  PV                 VG            Fmt  Attr PSize   PFree  
  /dev/mapper/mpatha vg            lvm2 a--  972.00m 872.00m
  /dev/vda2          rhel_virt-283 lvm2 a--    7.79g  40.00m



3.10.0-505.el7.x86_64

lvm2-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-libs-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
lvm2-cluster-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-libs-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-event-libs-1.02.134-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016
device-mapper-persistent-data-0.6.3-1.el7    BUILT: Fri Jul 22 12:29:13 CEST 2016
cmirror-2.02.165-2.el7    BUILT: Wed Sep 14 16:01:43 CEST 2016

Comment 14 errata-xmlrpc 2016-11-04 04:19:54 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHBA-2016-1445.html

