Created attachment 454425 [details] vgscan -vvvv on lvm2-2.02.46-8.el5_4.2

Description of problem:
Some lvm2 commands show slow performance on systems where there is a failed PV.
Commands in which the problem was noticed: vgdisplay, vgs, pvscan, vgscan.

Version-Release number of selected component (if applicable):
lvm2-2.02.74-1.el5

How reproducible:
Always, since lvm2-2.02.56-8.

Steps to Reproduce:
1. Create 3 PVs.
2. Create a VG.
3. Create an LV.
4. Corrupt a PV with `dd if=/dev/zero of=<pv_path> bs=512 count=10`.
5. Run any of the above commands.

Actual results:
lvm2 commands show poor performance compared with older lvm2 versions.

Expected results:
Performance comparable to older lvm2 versions.

Additional info:
The problem was introduced between lvm2-2.02.46-8.el5_4.2 and lvm2-2.02.56-8.

Results from tests with `vgdisplay` on my system:

lvm2-2.02.46-8.el5_4.2
real 0m24.057s
user 0m0.024s
sys  0m0.070s

lvm2-2.02.56-8
real 1m12.868s
user 0m0.037s
sys  0m0.139s

lvm2-2.02.74-1.el5
real 1m13.045s
user 0m0.041s
sys  0m0.184s

The vgdisplay commands above were run with the -vvvv option; the output is attached to the bz.
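The corruption step in 4. only wipes the first ten 512-byte sectors, which is where the LVM label and metadata header live. A minimal sketch of what that dd does, run against an ordinary file standing in for the PV (on the real system the target is a block device):

```shell
# Stand-in for the PV: an ordinary 1 MiB file instead of a block device.
pv=$(mktemp)
dd if=/dev/urandom of="$pv" bs=1024 count=1024 2>/dev/null

# The corruption step from the report: zero the first 10 sectors (5120 bytes).
# conv=notrunc overwrites in place instead of truncating the file, which
# matches the behaviour of writing to a real block device.
dd if=/dev/zero of="$pv" bs=512 count=10 conv=notrunc 2>/dev/null

# The LVM label (normally in sector 1) is now gone; a scan of this device
# would fail just like the failed PV in the report.
head -c 16 "$pv" | od -An -tx1
rm -f "$pv"
```

Note that the rest of the device is untouched, so the PV still "exists" from the kernel's point of view; only lvm's label scan fails on it.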
Created attachment 454427 [details] vgscan -vvvv on lvm2-2.02.56-8
Created attachment 454428 [details] vgscan -vvvv on lvm2-2.02.74-1
Regarding the Description: the command run was `vgscan`, not `vgdisplay`. Sorry for this mistake in the Description.
Can I ask for an attachment of 'strace -ttt' output for this case? It seems to be closely related to some multipath problems. Bug 639037 seems slightly similar. Let's see what Ben thinks about it as well.
I don't think 639037 has anything to do with this. That's a RHEL6 issue. The bug reporter cannot see it on RHEL5. Also, from the problem description, it doesn't sound like any paths are failing from multipath's point of view. Overwriting the data on a PV doesn't bother multipath at all. And if the only change between the vgdisplay runs listed in the problem description is in the lvm2 package, then a multipath bug wouldn't easily account for the difference. Are you issuing that dd to a multipath device? If so, can you disable multipathing, and try this with just the scsi devices? That should take multipath out of the picture completely. If you can't see the slowdown then, that would be a good indication that multipath is at fault. But from the information provided, I don't think it is.
Strace from the failing system is needed.
Created attachment 468412 [details] strace -ttt with a failed PV
Created attachment 468413 [details] strace -ttt with a failed PV but without blacklisting the FDD
Created attachment 468417 [details] Performance report
If I get it right:

trace end time:   1289979808.611520
trace start time: 1289979798.082233

that gives a 10 second difference?

Still, the trace shows quite clearly that a few devices are slow:

1289979800.220838 open("/dev/sdt"
1289979803.404841 open("/dev/sdt"
1289979805.004831 open("/dev/sdt",
1289979805.020780 read(6, "\353<\220MSDOS5.0
1289979805.324685 close(6) = 0
...

So something is slow with the /dev/sdt device, which seems to have an MSDOS partition - every read of it takes a few hundred milliseconds. That is slow, but I cannot see the 1m13s from the initial BZ report.
Hi Zdenek,

I just uploaded the straces you asked for. It looks like most of the time was spent on /dev/sdt, which is an FDD device attached to the customer's box:

14743 1289979805.671783 stat("/dev/sdt", {st_mode=S_IFBLK|0640, st_rdev=makedev(65, 48), ...}) = 0
14745 1289979805.671879 open("/dev/sdt", O_RDONLY|O_NOATIME) = 6
14778 1289979808.524704 close(6) = 0

What concerns the customer is that this delay only happens on lvm2 versions after lvm2-2.02.46-8.el5; this is documented in the performance report (already attached). The customer also said that he is not using multipath in these tests.

Do you have any idea what could make the old version faster than the newer ones at reading FDDs when there is also a corrupted PV?

Please let me know if you need more information.

Cheers,
Carlos Maiolino
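For logs like this, the slow devices can be pulled out mechanically rather than by eye. A small helper (my own sketch, not an lvm tool) that pairs each open() in an 'strace -ttt' log with its close() and prints how long the fd was held open:

```shell
# slow_opens: read 'strace -ttt' output (with pid column) on stdin and print,
# for every open()/close() pair, the path and the time the fd stayed open.
# Assumes the single-threaded layout seen in the attached traces:
#   <pid> <timestamp> open("/dev/sdt", O_RDONLY|O_NOATIME) = 6
#   <pid> <timestamp> close(6) = 0
slow_opens() {
    awk '
        $3 ~ /^open\(/ {
            fd = $NF                     # the fd is the return value
            t[fd] = $2                   # timestamp of the open
            path = $3
            gsub(/^open\("|",?$/, "", path)
            p[fd] = path
        }
        $3 ~ /^close\(/ {
            split($3, a, /[()]/)         # close(6) -> fd 6
            fd = a[2]
            if (fd in t) printf "%s %.3fs\n", p[fd], $2 - t[fd]
        }
    '
}
# Usage: slow_opens < strace.log | sort -k2 -rn | head
```

Against the excerpt above this prints /dev/sdt with roughly a 2.85s hold time, matching the delay the customer reported.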
Original traces had:

#device/dev-io.c:103 /dev/sdb: read failed after 0 of 4096 at 21474770944: Input/output error
#device/dev-io.c:103 /dev/sdi: read failed after 0 of 4096 at 20905984: Input/output error

1) Resolve the external I/O error problems evident there. (Check what is wrong with those devices. Look in the kernel messages for more clues. Can you read them normally at all? What size are they really? Are they even meant to be there?)

2) No trace from 2.02.46 was provided for comparison in the original problem description. We need -vvvv (and perhaps strace) from *both* 2.02.46 and 2.02.74 if we are to make comparisons.
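The checks suggested in 1) can be scripted. A rough sketch (a hypothetical helper, not part of lvm) that reports the size the system sees for a device and verifies that both its head and tail are actually readable - the /dev/sdb failure above is at offset 21474770944, near a 20 GiB boundary, which suggests the reported size and the readable size disagree:

```shell
# check_dev DEV: report the size the system sees for DEV and verify that the
# first and last 4 KiB can be read - the regions lvm touches when scanning.
# Works on a block device (via blockdev) or, for demonstration, a plain file.
check_dev() {
    dev=$1
    size=$(blockdev --getsize64 "$dev" 2>/dev/null || stat -c %s "$dev" 2>/dev/null) \
        || { echo "$dev: cannot determine size"; return 1; }
    dd if="$dev" of=/dev/null bs=4096 count=1 2>/dev/null \
        || { echo "$dev: head unreadable"; return 1; }
    dd if="$dev" of=/dev/null bs=4096 count=1 skip=$(( size / 4096 - 1 )) 2>/dev/null \
        || { echo "$dev: tail unreadable - reported size may be wrong"; return 1; }
    echo "$dev: $size bytes, head and tail readable"
}
# Usage (as root, against the devices from the trace):
#   check_dev /dev/sdb
#   check_dev /dev/sdi
```

A device that passes the head read but fails the tail read is exactly the pattern of the read errors in the trace.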
Created attachment 471648 [details] vgreduce core backtraces

I've made a special testcase and a modified lvm binary that shows each read access to an 'unimportant' device during vgreduce.

Scenario (test script in the next attachment):

1. create PV1, PV2, delayed PV3
2. create a VG from PV1 and PV2
3. disable PV2
4. vgreduce --removemissing --force, with a stack trace on each access to PV3

As we can see, we make 7 read accesses to device PV3. I've also added stack traces for the commands 'pvs' (11 accesses) and 'lvcreate' (4 accesses).
Created attachment 471650 [details] Test case for slow device access

Test case simulating a device with large read latency (i.e. slow like a floppy).
Created attachment 471652 [details] log trace for vgreduce

A vgreduce -vvvv trace extended with my own debug messages and a slightly changed logging routine. The first number is the time difference in microseconds from the previous debug line. The second number is the running time in seconds. The slow device is: /dev/shm/LVMTEST5450.ZC6ytxoDXK/dev/mapper/LVMTEST5450delayed
OK - relative timing like that is helpful. Let's consider (elsewhere, not in this bz) the options for packaging it. So, can we annotate each of the 7 accesses with the reason why it is performed?
(I presume that trace is of the *current* upstream code incl. the recent changes I made?)
> (I presume that trace is of the *current* upstream code incl. the recent
> changes I made?)

Yes - it's the current git tree, though with some patches applied in my local tree (so some line numbers might not match the upstream git tree).

Part of the problem seems to be repeated reads of things we have just scanned, with a 'different' full_scan parameter passed to lvmcache_label_scan.

The next issue seems to be the reread of devices - which of course may change during the command through some other tool - but this seems to have little impact on robustness: if the 'other' tool modifies the scanned data just a bit later, after we have made this reread, we will have inconsistent data on the disk anyway - so it seems to be only a timing issue.
7 accesses to an 'unrelated' device during vgreduce:

1. reads 4b    - run dev_reset_error_count()
2. reads 4b    - reading locked vg with full_scan == 0
3. reads 2048b - reading locked vg with full_scan == 2
4. reads 4b    - again full_scan == 2
5. reads 4b    - full_scan == 2
6. reads 4b    - full_scan == 2
7. reads 4b    - unlock and free vg -> reset error count

So it looks like 1. and 7. are completely unneeded device read accesses (added by Peter's patch for limiting repeated access to broken devices - my wild guess is that the dev access was an 'unexpected' side effect of dev_iter_create()?).

3. is a deeper version of 2. - probably needed.

4., 5., 6. seem to be doing an unneeded rescan - it is unclear why device_from_pvid() should need to rescan all devices when full_scan has already been used. (I'm looking into the reason why 4., 5., 6. try to access the same device with a different dev structure - I assume links?)
The patch https://www.redhat.com/archives/lvm-devel/2011-January/msg00121.html, which reduces the number of device read accesses per lvm command, has been committed upstream. But as this code was added to the repository in Oct 2010 (2.02.75), it is not the reason for the slowdown in version 2.02.56.
As raid detection currently requires 3 disk reads (which are slow on FDD-like devices), a user who is sure there are no md raids in the system could try setting 'devices/md_component_detection = 0' in lvm.conf and check whether there are performance gains.
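In lvm.conf that would look like the fragment below (section name per the stock lvm.conf layout; only safe when no md raid devices exist on the system):

```
devices {
    # Skip the extra reads lvm uses to detect md component devices.
    # Do NOT set this if any md raid devices are present.
    md_component_detection = 0
}
```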
Patch for setting pv_min_size has been posted upstream: https://www.redhat.com/archives/lvm-devel/2011-February/msg00086.html
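With that patch, lvm.conf gains a pv_min_size setting in the devices section. A sketch of how a user would skip floppy-sized devices - the value below is illustrative, the unit is KB:

```
devices {
    # Devices smaller than this many KB are never scanned as potential PVs.
    # A 1440 KB floppy is skipped by any threshold above its size, e.g.:
    pv_min_size = 2048
}
```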
Adding a "SanityOnly" ack for performance bugs.
Fixed in lvm2-2.02.84-2.el5
Technical note added. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn't present anymore.
Technical note updated. If any revisions are required, please edit the "Technical Notes" field accordingly. All revisions will be proofread by the Engineering Content Services team.

Diffed Contents:
@@ -1,13 +1 @@
-This field is the basis of the errata or release note for this bug. It can also be used for change logs.
+lvm disk scanning accesses all block devices and checks whether each disk is a PV or not. If a block device is slow, the user experiences low performance for commands which scan all devices and do not use the cached list. When possible, users may use the lvm.conf filter option to skip access to such a device; in this particular case, however, we improved performance by adding a new filter (pv_min_size) which skips reading devices below a certain predefined size.
-
-The Technical Note template, known as CCFR, is as follows:
-
-Cause
-    What actions or circumstances cause this bug to present.
-Consequence
-    What happens when the bug presents.
-Fix
-    What was done to fix the bug.
-Result
-    What now happens when the actions or circumstances above occur.
-    Note: this is not the same as the bug doesn't present anymore.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html