Bug 644578 - slow performance of lvm2 commands with a failed PV
slow performance of lvm2 commands with a failed PV
Status: CLOSED ERRATA
Product: Red Hat Enterprise Linux 5
Classification: Red Hat
Component: lvm2 (Show other bugs)
5.6
All Linux
medium Severity medium
: rc
: ---
Assigned To: Zdenek Kabelac
Corey Marthaler
:
Depends On:
Blocks:
  Show dependency treegraph
 
Reported: 2010-10-19 15:49 EDT by Carlos Maiolino
Modified: 2011-07-21 08:28 EDT (History)
11 users (show)

See Also:
Fixed In Version: lvm2-2.02.84-2.el5
Doc Type: Bug Fix
Doc Text:
lvm disk scanning is accessing all block devices and checks whether such disk is a PV or not. In the case the block device is slow, user experiences low performance for some commands which are scanning all devices and do not use cached list. When possible, users may use lvm.conf filter option to skip access to such device, however in this particular case we've improved the performance by addition of a new filter (pv_min_size) which is skips reading of a device below a certain predefined value.
Story Points: ---
Clone Of:
Environment:
Last Closed: 2011-07-21 06:51:43 EDT
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments (Terms of Use)
vgscan -vvvv on lvm2-2.02.46-8.el5_4.2 (107.92 KB, text/plain)
2010-10-19 15:49 EDT, Carlos Maiolino
no flags Details
vgscan -vvvv on lvm2-2.02.56-8 (158.30 KB, text/plain)
2010-10-19 15:51 EDT, Carlos Maiolino
no flags Details
vgscan -vvvv on lvm2-2.02.74-1 (216.75 KB, text/plain)
2010-10-19 15:52 EDT, Carlos Maiolino
no flags Details
strace -ttt with a failed PV (1.19 MB, application/octet-stream)
2010-12-13 11:01 EST, Carlos Maiolino
no flags Details
strace -ttt with a failed PV but without blacklist the FDD (1.20 MB, application/octet-stream)
2010-12-13 11:03 EST, Carlos Maiolino
no flags Details
Performance report (56.91 KB, text/html)
2010-12-13 11:14 EST, Carlos Maiolino
no flags Details
vgreduce core backtraces (37.71 KB, text/plain)
2011-01-04 07:28 EST, Zdenek Kabelac
no flags Details
Test case for device for slow access (1.29 KB, text/plain)
2011-01-04 07:31 EST, Zdenek Kabelac
no flags Details
log trace for vgreduce (72.72 KB, text/plain)
2011-01-04 07:44 EST, Zdenek Kabelac
no flags Details

  None (edit)
Description Carlos Maiolino 2010-10-19 15:49:58 EDT
Created attachment 454425 [details]
vgscan -vvvv on lvm2-2.02.46-8.el5_4.2

Description of problem:

some lvm2 commands have been showing a slow performance on systems where there is a failed PV.

Commands which the problem was noticed:

vgdisplay
vgs
pvscan
vgscan


Version-Release number of selected component (if applicable):

lvm2-2.02.74-1.el5

How reproducible:

always, since lvm2-2.02.56-8 version

Steps to Reproduce:
# create 3 pvs
# create a VG
# create a LV
# corrupt a PV issuing `dd if=/dev/zero of=<pv_path> bs=512 count=10`
# run any of the above commands
  
Actual results:

lvm2 commands show a poor performance compared with old lvm2 versions.

Expected results:

a better lvm2 commands performance

Additional info:

The problem was introduced between:

lvm2-2.02.46-8.el5_4.2  and  lvm2-2.02.56-8


These are results from tests with `vgdisplay` on my system:

lvm2-2.02.46-8.el5_4.2

real    0m24.057s
user    0m0.024s
sys     0m0.070s

lvm2-2.02.56-8

real    1m12.868s
user    0m0.037s
sys     0m0.139s


lvm2-2.02.74-1.el5

real    1m13.045s
user    0m0.041s
sys     0m0.184s


The above vgdisplay commands were ran with -vvvv option that are attached on the bz
Comment 1 Carlos Maiolino 2010-10-19 15:51:56 EDT
Created attachment 454427 [details]
vgscan -vvvv on lvm2-2.02.56-8
Comment 2 Carlos Maiolino 2010-10-19 15:52:37 EDT
Created attachment 454428 [details]
vgscan -vvvv on lvm2-2.02.74-1
Comment 3 Carlos Maiolino 2010-10-19 15:54:56 EDT
About #description, the command ran was `vgscan` not `vgdisplay`, sorry for this mistake on the Description.
Comment 4 Zdenek Kabelac 2010-10-27 08:47:35 EDT
Can I ask for attachment of 'strace -ttt' for this case.

It seems to be closely related to some multipath problems.

Maybe Bug 639037 seems to be slightly similar.

Let's see what Ben thinks about it as well.
Comment 5 Ben Marzinski 2010-10-28 13:53:15 EDT
I don't think 639037 has anything to do with this.  That's a RHEL6 issue. The bug reporter cannot see it on RHEL5.  Also, from the problem description, it doesn't sound like any paths are failing from multipath's point of view.  Overwriting the data on a PV doesn't bother multipath at all. And if the only change between the vgdisplay runs listed in the problem description is in the lvm2 package, then a multipath bug wouldn't easily account for the difference.

Are you issuing that dd to a multipath device? If so, can you disable multipathing, and try this with just the scsi devices?  That should take multipath out of the picture completely.  If you can't see the slowdown then, that would be a good indication that multipath is at fault.  But from the information provided, I don't think it is.
Comment 6 Zdenek Kabelac 2010-11-09 07:17:26 EST
Strace from the failing system is needed.
Comment 7 Carlos Maiolino 2010-12-13 11:01:27 EST
Created attachment 468412 [details]
strace -ttt with a failed PV
Comment 8 Carlos Maiolino 2010-12-13 11:03:30 EST
Created attachment 468413 [details]
strace -ttt with a failed PV but without blacklist the FDD
Comment 9 Carlos Maiolino 2010-12-13 11:14:41 EST
Created attachment 468417 [details]
Performance report
Comment 10 Zdenek Kabelac 2010-12-13 11:30:13 EST
    If I get it right -

    trace end time:   1289979808.611520
    trace start time: 1289979798.082233

    this gives 10 second difference ?

    Though in the trace there is nicely visible that few devices are quite slow:

    1289979800.220838 open("/dev/sdt"

    1289979803.404841 open("/dev/sdt"

    1289979805.004831 open("/dev/sdt", 
    1289979805.020780 read(6, "\353<\220MSDOS5.0
    1289979805.324685 close(6)              = 0

    ...


    So it looks like there is something slow with  /dev/sdt device which seems to
    have MSDOS partition - it looks like every MSDOS read takes quite a few
    hundreds of milliseconds - thought it is slow  - I cannot see 1m13s from
    initial BZ report.
Comment 11 Carlos Maiolino 2010-12-13 11:34:07 EST
Hi Zdenek,

I just uploaded the straces you asked for.

Well, looks like that most of the time was spent on /dev/sdt which is a FDD device attached on the customer's box.

14743 1289979805.671783 stat("/dev/sdt", {st_mode=S_IFBLK|0640, st_rdev=makedev(65, 48), ...}) = 0
14745 1289979805.671879 open("/dev/sdt", O_RDONLY|O_NOATIME) = 6
14778 1289979808.524704 close(6)              = 0


What customer is concerned is that this delay just happens on lvm2 versions after lvm2-2.02.46-8.el5. This is written on the performance report (already attached).

Also, the customer said that he is not using multipath on these tests.

Do you have any idea about what could be make old version to be faster than the newer version reading FDDs while there is also a corrupted PV ?

Please, let me know if you need more information.

Cheers,
Carlos Maiolino
Comment 12 Alasdair Kergon 2010-12-13 11:44:46 EST
Original traces had:

#device/dev-io.c:103   /dev/sdb: read failed after 0 of 4096 at 21474770944:
Input/output error

#device/dev-io.c:103   /dev/sdi: read failed after 0 of 4096 at 20905984:
Input/output error


1) Resolve the external I/O error problems evident there.

(Check what is wrong with those devices.  Look in the kernel messages for more
clues.  Can you read them normally at all?  What size are they really?  Are
they even meant to be there?)

2) No trace from 2.02.46 was provided for comparison in the original problem description.  We need -vvvv (and perhaps strace) from *both* 2.02.46 and 2.02.74 if we are to make comparisons.
Comment 13 Zdenek Kabelac 2011-01-04 07:28:24 EST
Created attachment 471648 [details]
vgreduce core backtraces

I've made a special testcase and modified lvm binary to show the device read access to 'unimportant' device during vgreduce.

Scenario: (next attachment test script)

1. create PV1, PV2,  delayed PV3
2. create VG from  PV1 and PV2
3. disable PV2
4. vgreduce --removemissing --force

stack trace each access to  PV3 -

As we can see - we are doing 7 read accesses to device PV3.

I've added also stack traces for commands:  'pvs' (11), 'lvcreate' (4)
Comment 14 Zdenek Kabelac 2011-01-04 07:31:24 EST
Created attachment 471650 [details]
Test case for device for slow access

test case for simulation of large read latency device (i.e. slow like floppy)
Comment 15 Zdenek Kabelac 2011-01-04 07:44:56 EST
Created attachment 471652 [details]
log trace for vgreduce

vgreduce -vvvv trace 
extended with my own debug messages and slightly change logging routine.

First number is time difference in microseconds from the previous debug line.
Second number is 'running' in seconds time.

Slow device is: /dev/shm/LVMTEST5450.ZC6ytxoDXK/dev/mapper/LVMTEST5450delayed
Comment 16 Alasdair Kergon 2011-01-04 08:04:43 EST
OK - relative timing like that is helpful.  Let's consider (elsewhere, not this bz) the options for packaging it.

So can we annotate each of the 7 accesses with the reason why it is performed?
Comment 17 Alasdair Kergon 2011-01-04 08:10:25 EST
(I presume that trace is of the *current* upstream code incl. the recent changes I made?)
Comment 18 Zdenek Kabelac 2011-01-04 08:26:26 EST
> (I presume that trace is of the *current* upstream code incl. the recent
> changes I made?)

yes - it's current git tree - however with some applied patches in my local tree.
(so some line numbers might not match upstream git tree).

It seems the part of the problem is - repeated read of things we just scanned with 'different' full_scan parameter passed to lvmcache_label_scan.

Next issue seems to be reread of devices - which of course may change during the command by some other tool - but it seems to have not that big impact on robustness - if the 'other' tools modifies scanned data just a bit later after we made this reread, we will have inconsistent data on the disk anyway - so it seems to be only timing issue.
Comment 19 Zdenek Kabelac 2011-01-11 09:28:37 EST
7 accesses to  'unrelated' device for vgreduce:


1. reads 4b - run dev_reset_error_count()

2. reads 4b - reading locked  vg with full_scan == 0

3. reads 2048b - reading locked  vg with full_scan == 2

4. reads 4b - again  full_scan == 2

5. reads 4b - full_scan == 2

6. reads 4b - full_scan == 2

7. reads 4b - unlock and free vg -> reset error count


So it looks like  1. and  7. do a completely unneeded device read access.
(added by Peter's patch for limiting repeated access to broken devices -
my wild guess would be - that dev access was 'unexcpected' side effect from dev_iter_create() ?)

3. - deeper version of 2. - probably needed.

4.,5.,6. - seems to be doing unneeded rescan - unclear why  device_from_pvid() should need rescanning all devices - where full_scan has been already used.
(I'm looking at the reason, why 4,5,6 are trying to access same device with different dev structure I assume links?)
Comment 20 Zdenek Kabelac 2011-01-17 10:59:17 EST
Patch  https://www.redhat.com/archives/lvm-devel/2011-January/msg00121.html
which reduces amount of device read accesses for lvm command has been committed upstream.

But as this code has been added to repository in Oct 2010 (2.02.75) it's not the reason for slowdown of the version 2.02.56.
Comment 21 Zdenek Kabelac 2011-01-18 07:09:33 EST
As raid detection currently requires 3 disk reads (which are slow on FDD like devices) - if the user is sure, there are no md raids in the system,
he could try to modify   lvm.conf settings 'device/md_component_detection = 0'.
and check if there are performance gains.
Comment 22 Zdenek Kabelac 2011-03-02 11:15:47 EST
Patch for setting pv_min_size has been posted upstream:

https://www.redhat.com/archives/lvm-devel/2011-February/msg00086.html
Comment 23 Corey Marthaler 2011-03-08 11:58:44 EST
Adding a "SanityOnly" ack for performance bugs.
Comment 24 Milan Broz 2011-03-23 08:19:26 EDT
Fixed in
lvm2-2.02.84-2.el5
Comment 28 Florian Nadge 2011-05-26 10:56:34 EDT
    Technical note added. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    New Contents:
This field is the basis of the errata or release note for this bug. It can also be used for change logs.

The Technical Note template, known as CCFR, is as follows:

Cause
    What actions or circumstances cause this bug to present.
Consequence
    What happens when the bug presents.
Fix
    What was done to fix the bug.
Result
    What now happens when the actions or circumstances above occur.
    Note: this is not the same as the bug doesn’t present anymore.
Comment 29 Zdenek Kabelac 2011-06-17 07:59:12 EDT
    Technical note updated. If any revisions are required, please edit the "Technical Notes" field
    accordingly. All revisions will be proofread by the Engineering Content Services team.
    
    Diffed Contents:
@@ -1,13 +1 @@
-This field is the basis of the errata or release note for this bug. It can also be used for change logs.
+lvm disk scanning is accessing all block devices and checks whether such disk is a PV or not. In the case the block device is slow, user experiences low performance for some commands which are scanning all devices and do not use cached list. When possible, users may use lvm.conf filter option to skip access to such device, however in this particular case we've improved the performance by addition of a new filter (pv_min_size) which is skips reading of a device below a certain predefined value.-
-The Technical Note template, known as CCFR, is as follows:
-
-Cause
-    What actions or circumstances cause this bug to present.
-Consequence
-    What happens when the bug presents.
-Fix
-    What was done to fix the bug.
-Result
-    What now happens when the actions or circumstances above occur.
-    Note: this is not the same as the bug doesn’t present anymore.
Comment 30 errata-xmlrpc 2011-07-21 06:51:43 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html
Comment 31 errata-xmlrpc 2011-07-21 08:28:55 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on therefore solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2011-1071.html

Note You need to log in before you can comment on or make changes to this bug.