Bug 636001
| Summary: | [RFE] LVM operations should not scan all devices | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 6 | Reporter: | Itamar Heim <iheim> |
| Component: | lvm2 | Assignee: | Petr Rockai <prockai> |
| Status: | CLOSED ERRATA | QA Contact: | Corey Marthaler <cmarthal> |
| Severity: | high | Priority: | high |
| Version: | 6.1 | CC: | abaron, agk, coughlan, dwysocha, heinzm, jbrassow, nperic, prajnoha, prockai |
| Target Milestone: | rc | Keywords: | FutureFeature |
| Target Release: | --- | Hardware: | All |
| OS: | Linux | Last Closed: | 2013-02-21 08:09:07 UTC |
| Fixed In Version: | lvm2-2.02.98-1.el6 | Doc Type: | Enhancement |
| Bug Blocks: | 655920, 697866, 749672, 756082 | | |

Doc Text:
A new optional metadata caching daemon (lvmetad) is available as part of this update of LVM2, along with udev integration for device scanning. Repeated scans of all block devices in the system with each LVM command are avoided if the daemon is enabled (see lvm.conf for details). The original behaviour can be restored at any time by disabling lvmetad in lvm.conf.
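The Doc Text says lvmetad is enabled via lvm.conf. As a rough sketch of what that looks like on RHEL 6 (setting names are the documented lvm.conf options; the exact placement and values here are illustrative, not taken from this bug):

```
# /etc/lvm/lvm.conf (fragment -- illustrative sketch)
global {
    # Let LVM commands query the lvmetad daemon instead of
    # rescanning every block device on each invocation.
    use_lvmetad = 1
}
devices {
    # A comment later in this bug notes that MD component detection
    # forces the filter to read from each device; disabling it is
    # needed to avoid all device reads (trade-off: MD component
    # devices may then be misdetected as PVs).
    md_component_detection = 0
}
```

With use_lvmetad = 1 the lvm2-lvmetad service must also be running, otherwise commands fall back to scanning.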
Description
Itamar Heim
2010-09-21 09:56:51 UTC
Please note that in all our setups only one PV in the VG holds the MDA. After the initial scan is performed, only the devices containing MDAs should be accessed by all subsequent commands (any changes to the list of devices to be scanned should be evident from the VG metadata read from already-known devices). Also note that, since the scan is performed sequentially, in a setup with 500 LUNs it is sufficient to have just a few LUNs with high latency to cause lvs/pvs/vgs to stall for a long time.

It appears to make no difference whether or not there is an MDA on the PV; all devices are scanned the second time regardless. Also, I see no devel unit test results proving otherwise. Marking this FailsQA and removing the 6.3 flag. This should be pulled out and moved to RHEL 6.4.

```
2.6.32-269.el6.x86_64
lvm2-2.02.95-8.el6                      BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-libs-2.02.95-8.el6                 BUILT: Wed May  9 03:33:32 CDT 2012
lvm2-cluster-2.02.95-8.el6              BUILT: Wed May  9 03:33:32 CDT 2012
udev-147-2.41.el6                       BUILT: Thu Mar  1 13:01:08 CST 2012
device-mapper-1.02.74-8.el6             BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-libs-1.02.74-8.el6        BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-1.02.74-8.el6       BUILT: Wed May  9 03:33:32 CDT 2012
device-mapper-event-libs-1.02.74-8.el6  BUILT: Wed May  9 03:33:32 CDT 2012
cmirror-2.02.95-8.el6                   BUILT: Wed May  9 03:33:32 CDT 2012
```

```
[root@hayes-01 ~]# pvcreate /dev/etherd/e1.1p1 /dev/etherd/e1.1p2
  Writing physical volume data to disk "/dev/etherd/e1.1p1"
  Physical volume "/dev/etherd/e1.1p1" successfully created
  Writing physical volume data to disk "/dev/etherd/e1.1p2"
  Physical volume "/dev/etherd/e1.1p2" successfully created
[root@hayes-01 ~]# pvcreate --pvmetadatacopies 0 /dev/etherd/e1.1p3 /dev/etherd/e1.1p4
  Writing physical volume data to disk "/dev/etherd/e1.1p3"
  Physical volume "/dev/etherd/e1.1p3" successfully created
  Writing physical volume data to disk "/dev/etherd/e1.1p4"
  Physical volume "/dev/etherd/e1.1p4" successfully created
[root@hayes-01 ~]# pvs -a -o +pv_mda_free,pv_mda_size
  PV                  VG  Fmt  Attr PSize   PFree   PMdaFree  PMdaSize
  /dev/etherd/e1.1p1      lvm2 a--  908.23g 908.23g  509.50k  1020.00k
  /dev/etherd/e1.1p2      lvm2 a--  908.23g 908.23g  509.50k  1020.00k
  /dev/etherd/e1.1p3      lvm2 a--  908.23g 908.23g        0         0
  /dev/etherd/e1.1p4      lvm2 a--  908.23g 908.23g        0         0
[root@hayes-01 ~]# vgcreate VG1 /dev/etherd/e1.1p[13]
  Volume group "VG1" successfully created
[root@hayes-01 ~]# vgcreate VG2 /dev/etherd/e1.1p[24]
  Volume group "VG2" successfully created
[root@hayes-01 ~]# vgs -vvvv VG1 > /tmp/vg1.a 2>&1
[root@hayes-01 ~]# vgs -vvvv VG1 > /tmp/vg1.b 2>&1
[root@hayes-01 ~]# diff /tmp/vg1.a /tmp/vg1.b
574c574
< #metadata/vg.c:59         Allocated VG VG1 at 0x1a8bcb0.
---
> #metadata/vg.c:59         Allocated VG VG1 at 0x1f8ccb0.
582c582
< #metadata/vg.c:74         Freeing VG VG1 at 0x1a8fcc0.
---
> #metadata/vg.c:74         Freeing VG VG1 at 0x1f90cc0.
588c588
< #metadata/vg.c:74         Freeing VG VG1 at 0x1a8bcb0.
---
> #metadata/vg.c:74         Freeing VG VG1 at 0x1f8ccb0.
[root@hayes-01 ~]# vgs -vvvv VG2 > /tmp/vg2.a 2>&1
[root@hayes-01 ~]# vgs -vvvv VG2 > /tmp/vg2.b 2>&1
[root@hayes-01 ~]# diff /tmp/vg2.a /tmp/vg2.b
574c574
< #metadata/vg.c:59         Allocated VG VG2 at 0x3289cb0.
---
> #metadata/vg.c:59         Allocated VG VG2 at 0x2ffbcb0.
582c582
< #metadata/vg.c:74         Freeing VG VG2 at 0x328dcc0.
---
> #metadata/vg.c:74         Freeing VG VG2 at 0x2fffcc0.
588c588
< #metadata/vg.c:74         Freeing VG VG2 at 0x3289cb0.
---
> #metadata/vg.c:74         Freeing VG VG2 at 0x2ffbcb0.
```

SHOULDN'T THE NON-MDA DEVICES NOT BE IN THE 2ND SCAN???

```
[root@hayes-01 ~]# grep e1.1p3 /tmp/vg1.b
#device/dev-cache.c:333       /dev/etherd/e1.1p3: Added to device cache
#device/dev-cache.c:330       /dev/block/152:275: Aliased to /dev/etherd/e1.1p3 in device cache
#device/dev-io.c:524          Opened /dev/etherd/e1.1p3 RO O_DIRECT
#device/dev-io.c:271          /dev/etherd/e1.1p3: size is 1904693766 sectors
#device/dev-io.c:577          Closed /dev/etherd/e1.1p3
#device/dev-io.c:271          /dev/etherd/e1.1p3: size is 1904693766 sectors
#device/dev-io.c:524          Opened /dev/etherd/e1.1p3 RO O_DIRECT
#device/dev-io.c:137          /dev/etherd/e1.1p3: block size is 1024 bytes
#device/dev-io.c:577          Closed /dev/etherd/e1.1p3
#filters/filter-composite.c:31  Using /dev/etherd/e1.1p3
#device/dev-io.c:524          Opened /dev/etherd/e1.1p3 RO O_DIRECT
#device/dev-io.c:137          /dev/etherd/e1.1p3: block size is 1024 bytes
#label/label.c:156            /dev/etherd/e1.1p3: lvm2 label detected at sector 1
#cache/lvmcache.c:1337        lvmcache: /dev/etherd/e1.1p3: now in VG #orphans_lvm2 (#orphans_lvm2) with 0 mdas
#device/dev-io.c:577          Closed /dev/etherd/e1.1p3
#label/label.c:266            Using cached label for /dev/etherd/e1.1p3
#cache/lvmcache.c:1337        lvmcache: /dev/etherd/e1.1p3: now in VG VG1 (UP4Q1C0dfn10T6wRU8gEa5iDTu4NB4xW) with 0 mdas
#metadata/pv_manip.c:327      /dev/etherd/e1.1p3 0:      0    232506: NULL(0:0)
```

SHOULDN'T THE NON-MDA DEVICES NOT BE IN THE 2ND SCAN???

```
[root@hayes-01 ~]# grep e1.1p4 /tmp/vg2.b
#device/dev-cache.c:333       /dev/etherd/e1.1p4: Added to device cache
#device/dev-cache.c:330       /dev/block/152:276: Aliased to /dev/etherd/e1.1p4 in device cache
#device/dev-io.c:524          Opened /dev/etherd/e1.1p4 RO O_DIRECT
#device/dev-io.c:271          /dev/etherd/e1.1p4: size is 1904693766 sectors
#device/dev-io.c:577          Closed /dev/etherd/e1.1p4
#device/dev-io.c:271          /dev/etherd/e1.1p4: size is 1904693766 sectors
#device/dev-io.c:524          Opened /dev/etherd/e1.1p4 RO O_DIRECT
#device/dev-io.c:137          /dev/etherd/e1.1p4: block size is 1024 bytes
#device/dev-io.c:577          Closed /dev/etherd/e1.1p4
#filters/filter-composite.c:31  Using /dev/etherd/e1.1p4
#device/dev-io.c:524          Opened /dev/etherd/e1.1p4 RO O_DIRECT
#device/dev-io.c:137          /dev/etherd/e1.1p4: block size is 1024 bytes
#label/label.c:156            /dev/etherd/e1.1p4: lvm2 label detected at sector 1
#cache/lvmcache.c:1337        lvmcache: /dev/etherd/e1.1p4: now in VG #orphans_lvm2 (#orphans_lvm2) with 0 mdas
#device/dev-io.c:577          Closed /dev/etherd/e1.1p4
#label/label.c:266            Using cached label for /dev/etherd/e1.1p4
#cache/lvmcache.c:1337        lvmcache: /dev/etherd/e1.1p4: now in VG VG2 (duQVk7ME9UoItbXvp4VjBNruXcFFscYd) with 0 mdas
#metadata/pv_manip.c:327      /dev/etherd/e1.1p4 0:      0    232506: NULL(0:0)
[root@hayes-01 ~]# pvs -a -o +pv_mda_free,pv_mda_size
  PV                   VG   Fmt  Attr PSize   PFree   PMdaFree  PMdaSize
  /dev/etherd/e1.1p1   VG1  lvm2 a--  908.23g 908.23g  508.50k  1020.00k
  /dev/etherd/e1.1p10            ---        0       0        0         0
  /dev/etherd/e1.1p2   VG2  lvm2 a--  908.23g 908.23g  508.50k  1020.00k
  /dev/etherd/e1.1p3   VG1  lvm2 a--  908.23g 908.23g        0         0
  /dev/etherd/e1.1p4   VG2  lvm2 a--  908.23g 908.23g        0         0
```

I think there is a misunderstanding about the bug here, or maybe about the solution. The proposed fix is to use lvmetad, which avoids the scans altogether (and if it does not, then that is definitely a bug and grounds for FailsQA). That is a strict improvement over what the bug asks for (scanning only some devices).
I don't think there is much interest in optimizing the non-lvmetad code paths to reduce scanning, since the preferred future solution on big systems is to use lvmetad. It would be good to know what the original submitter thinks about this. I am in favour of WONTFIX if the request is for non-lvmetad setups to behave this way. With lvmetad, I think the bug is already fixed. Opinions?

The original description is unambiguous. If a VG has 50 PVs in it, but only one of those has an MDA, only that one disk should be accessed; the others should be skipped. The bug proposed that this would be fixed provided lvmetad was used. This means that the 'lvm' process (the 'client' that is talking to lvmetad) should only be accessing the one device that lvmetad tells it is the one containing the metadata. Investigation is needed to understand why those "Opened" lines still appear in the log messages. I would suggest using strace -e open on an lvm command to determine what files/devices it is opening. Something like:

```
strace -o strace.log -e open pvs
grep /dev strace.log
```

To get rid of device reads, you also need to disable MD component detection in lvm.conf, since it currently forces the device filter to read bits from each device. Moreover, other filters open the devices to get their size (without reading anything, though). So with current upstream, you may want to instead check for reads (strace -e open,read) and verify that the devices are not being read. That is, the QA check should be that when lvmetad is active, lvm commands do not *read* from devices, even though they may open them to obtain their size. Ideally, however, with lvmetad active there would be no device opens at all for read-only commands, and only MDA devices would be opened for read-write commands.
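The open-versus-read distinction described above can be checked mechanically from an strace log: map each open()'s returned fd to a device, drop the mapping on close(), and report a device only when a read() hits one of its live fds. This is a sketch against a synthetic log (the file paths and log contents are made up for illustration, not output from this bug):

```shell
# Build a synthetic strace log (made-up content for illustration only).
cat > /tmp/strace.sample <<'EOF'
open("/dev/sda1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
read(5, "...", 1024) = 1024
open("/dev/sdc1", O_RDONLY) = 6
close(5) = 0
close(6) = 0
EOF

# Correlate fds: a device counts as "read" only if a read() used a
# file descriptor that an earlier open() of that device returned.
awk '
  /^open\("\/dev\// {
    dev = $0; sub(/^open\("/, "", dev); sub(/".*/, "", dev)
    open_dev[$NF] = dev            # $NF is the returned fd
  }
  /^read\(/ {
    fd = $0; sub(/^read\(/, "", fd); sub(/,.*/, "", fd)
    if (fd in open_dev) seen[open_dev[fd]] = 1
  }
  /^close\(/ {
    fd = $0; sub(/^close\(/, "", fd); sub(/\).*/, "", fd)
    delete open_dev[fd]
  }
  END { for (d in seen) print d }
' /tmp/strace.sample > /tmp/devices-read.txt

cat /tmp/devices-read.txt
# → /dev/sda1
```

Here /dev/sda1 is reported because it was actually read, while /dev/sdc1 (opened but never read) is not; with lvmetad active, the expectation stated above is that non-MDA devices behave like /dev/sdc1.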
That can be achieved with the following patch:

```
diff --git a/lib/commands/toolcontext.c b/lib/commands/toolcontext.c
index 5177f41..0ee6ddb 100644
--- a/lib/commands/toolcontext.c
+++ b/lib/commands/toolcontext.c
@@ -774,7 +774,7 @@ static struct dev_filter *_init_filter_components(struct cmd_context *cmd)
 	 * Listed first because it's very efficient at eliminating
 	 * unavailable devices.
 	 */
-	if (find_config_tree_bool(cmd, "devices/sysfs_scan",
+	if (!lvmetad_active() && find_config_tree_bool(cmd, "devices/sysfs_scan",
 				  DEFAULT_SYSFS_SCAN)) {
 		if ((filters[nr_filt] = sysfs_filter_create(cmd->sysfs_dir)))
 			nr_filt++;
@@ -791,27 +791,29 @@ static struct dev_filter *_init_filter_components(struct cmd_context *cmd)
 	} else
 		nr_filt++;

-	/* device type filter. Required. */
-	cn = find_config_tree_node(cmd, "devices/types");
-	if (!(filters[nr_filt] = lvm_type_filter_create(cmd->proc_dir, cn))) {
-		log_error("Failed to create lvm type filter");
-		goto bad;
-	}
-	nr_filt++;
+	if (!lvmetad_active()) {
+		/* device type filter. Required. */
+		cn = find_config_tree_node(cmd, "devices/types");
+		if (!(filters[nr_filt] = lvm_type_filter_create(cmd->proc_dir, cn))) {
+			log_error("Failed to create lvm type filter");
+			goto bad;
+		}
+		nr_filt++;

-	/* md component filter. Optional, non-critical. */
-	if (find_config_tree_bool(cmd, "devices/md_component_detection",
-				  DEFAULT_MD_COMPONENT_DETECTION)) {
-		init_md_filtering(1);
-		if ((filters[nr_filt] = md_filter_create()))
-			nr_filt++;
-	}
+		/* md component filter. Optional, non-critical. */
+		if (find_config_tree_bool(cmd, "devices/md_component_detection",
+					  DEFAULT_MD_COMPONENT_DETECTION)) {
+			init_md_filtering(1);
+			if ((filters[nr_filt] = md_filter_create()))
+				nr_filt++;
+		}

-	/* mpath component filter. Optional, non-critical. */
-	if (find_config_tree_bool(cmd, "devices/multipath_component_detection",
-				  DEFAULT_MULTIPATH_COMPONENT_DETECTION)) {
-		if ((filters[nr_filt] = mpath_filter_create(cmd->sysfs_dir)))
-			nr_filt++;
+		/* mpath component filter. Optional, non-critical. */
+		if (find_config_tree_bool(cmd, "devices/multipath_component_detection",
+					  DEFAULT_MULTIPATH_COMPONENT_DETECTION)) {
+			if ((filters[nr_filt] = mpath_filter_create(cmd->sysfs_dir)))
+				nr_filt++;
+		}
+	}

 	/* Only build a composite filter if we really need it. */
```

Tested by running read-only LVM commands while lvmetad was running and use_lvmetad was set to 1 in the LVM config. Even though devices were opened, only the ones with metadata were read. Since this was stated as a requirement in Comment #24, I am marking this BZ as verified. If lvmetad was off (or otherwise misconfigured, e.g. lvmetad running without use_lvmetad in lvm.conf), the scan included reading all the devices, as expected.

```
  PV         VG       Fmt  Attr PSize   PFree  PMdaFree  PMdaSize
  /dev/sda1  smallvg  lvm2 a--   9.99g   9.99g        0  1020.00k
  /dev/sdb1  smallvg  lvm2 a--   9.99g   9.99g        0  1020.00k
  /dev/sdc1           lvm2 a--  10.00g  10.00g        0         0
  /dev/sdd1           lvm2 a--  10.00g  10.00g        0         0
  /dev/vda2  VolGroup lvm2 a--   9.51g       0        0  1020.00k
```

```
open("/dev/sda1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
open("/dev/sda1", O_RDONLY)             = 6
open("/dev/sda1", O_RDONLY)             = 5
open("/dev/sda1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\253\364/# LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 1024) = 1024
open("/dev/sdb1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
open("/dev/sdb1", O_RDONLY)             = 6
open("/dev/sdb1", O_RDONLY)             = 5
open("/dev/sdb1", O_RDONLY|O_DIRECT|O_NOATIME) = 5
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 1024) = 1024
read(5, "\253\364/# LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 1024) = 1024
read(4, "response=\"OK\"\nname=\"VolGroup\"\nme", 32) = 32
read(4, "tadata {\n\tid=\"1QA3xc-9cdT-Pces-3"..., 1024) = 1024
read(4, "in\"\n\t\t\tcreation_time=1349267710\n"..., 1056) = 189
open("/dev/vda2", O_RDONLY|O_DIRECT|O_NOATIME) = 5
open("/dev/vda2", O_RDONLY)             = 6
open("/dev/vda2", O_RDONLY)             = 5
open("/dev/vda2", O_RDONLY|O_DIRECT|O_NOATIME) = 5
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(5, "\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0\0"..., 4096) = 4096
read(5, "\254YT\374 LVM2 x[5A%r0N*>\1\0\0\0\0\20\0\0\0\0\0\0"..., 4096) = 4096
open("/proc/self/task/31826/attr/current", O_RDONLY) = 5
```

No other devices were opened for reading.

Tested with: lvm2-2.02.98-3.el6

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

http://rhn.redhat.com/errata/RHBA-2013-0501.html