Bug 471596 - pvcreate -ff loops indefinitely if PV is corrupted
Summary: pvcreate -ff loops indefinitely if PV is corrupted
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 10
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Peter Rajnoha
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2008-11-14 15:23 UTC by Milan Broz
Modified: 2013-03-01 04:07 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2009-12-18 06:51:15 UTC
Type: ---
Embargoed:



Description Milan Broz 2008-11-14 15:23:50 UTC
This reproducer causes corruption of PVs.
The last pvcreate command loops forever, probably while searching somewhere for an unterminated duplicate PV string.

#!/bin/bash

# uses /dev/sd[bcd]

dd if=/dev/zero of=/dev/sdb bs=1M count=1
dd if=/dev/zero of=/dev/sdc bs=1M count=1
dd if=/dev/zero of=/dev/sdd bs=1M count=1

pvcreate -ff /dev/sd[bcd]

dd if=/dev/zero of=/dev/sdb bs=512 count=1
dd if=/dev/sdb of=/dev/sdc bs=512 count=131072

sync

# Basically it simulates corruption caused by this:
#dmsetup create log --table "0 2 linear /dev/sdb 0"
#dmsetup create mirror --table "0 131072 mirror core 1 1024 2 /dev/sdc 0 /dev/sdd 0"
#sleep 5
#dmsetup remove mirror
#dmsetup remove log

pvcreate -ff /dev/sd[bcd]

------------
[root@saloonio ~]# /reproduce_stupid_hash
+ dd if=/dev/zero of=/dev/sdb bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.0117125 s, 89.5 MB/s
+ dd if=/dev/zero of=/dev/sdc bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00682748 s, 154 MB/s
+ dd if=/dev/zero of=/dev/sdd bs=1M count=1
1+0 records in
1+0 records out
1048576 bytes (1.0 MB) copied, 0.00661354 s, 159 MB/s
+ pvcreate -ff /dev/sdb /dev/sdc /dev/sdd
  Physical volume "/dev/sdb" successfully created
  Physical volume "/dev/sdc" successfully created
  Physical volume "/dev/sdd" successfully created
+ dd if=/dev/zero of=/dev/sdb bs=512 count=1
1+0 records in
1+0 records out
512 bytes (512 B) copied, 0.0185469 s, 27.6 kB/s
+ dd if=/dev/sdb of=/dev/sdc bs=512 count=131072
131072+0 records in
131072+0 records out
67108864 bytes (67 MB) copied, 2.88822 s, 23.2 MB/s
+ sync
+ pvcreate -ff /dev/sdb /dev/sdc /dev/sdd
  Found duplicate PV 4Y3U4U6minJFhJkmNgkFpYQgfHYfJCdv: using /dev/sdc not /dev/sdb
  Found duplicate PV h8TIG8TynLe6nfcaqaqrRF8fwldn0LrB: using /dev/sdb not /dev/sdc
  Physical volume "/dev/sdb" successfully created
^C^C/reproduce_stupid_hash: line 23: 18537 Killed                  pvcreate -ff /dev/sd[bcd]



lvm version lvm2-2.02.39-6.fc10.x86_64 (also upstream cvs snapshot)

Comment 1 Peter Rajnoha 2008-11-21 11:34:00 UTC
The infinite loop is in metadata/metadata.c, in the function _vg_read_orphans, while iterating through vginfo->infos. The iteration never ends, possibly because the vginfo->infos list is corrupted.

Comment 2 Peter Rajnoha 2008-11-21 14:30:28 UTC
This is a simplified version of the reproducer:

#!/bin/bash

dd if=/dev/zero of=/dev/sdb
dd if=/dev/zero of=/dev/sdc

pvcreate -ff /dev/sdb /dev/sdc
dd if=/dev/sdb of=/dev/sdc
pvcreate -ff /dev/sdb /dev/sdc

Comment 3 Bug Zapper 2008-11-26 05:22:57 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 10 development cycle.
Changing version to '10'.

More information and reason for this action is here:
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 4 Alasdair Kergon 2008-11-28 04:39:05 UTC
OK, since nobody has done a full analysis of this yet, I decided to start looking at it myself today.

The root cause of this problem is a bug in the internal lvmcache code.  It maintains an index of devices by pvid.  Given a pvid, a hash lookup returns the corresponding struct lvmcache_info, which in turn gives the struct device (and its pvid, which should match the one looked up).

static int _lvmcache_update_pvid(struct lvmcache_info *info, const char *pvid)
{
        if (!strcmp(info->dev->pvid, pvid))
                return 1;
        ...

When two devices carry the same pvid, both info->dev->pvid fields hold the same value, but the pvid hash holds an entry for only one of them.  After pvcreate changes the first device's pvid, the hash no longer has an entry for the original pvid, but crucially the second device's info->dev->pvid still holds it.  So when the code should be adding that entry, the pvids in the strcmp match and the function just returns without inserting anything.

The fix is to check that the info is the one already stored in the hash:
        if (((dm_hash_lookup(_pvid_hash, pvid)) == info) &&
            !strcmp(info->dev->pvid, pvid))
                return 1;
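
To make the sequence above easier to follow, here is a minimal, self-contained sketch of it. It is not the lvm2 code: the toy_* types, the linear-array index standing in for the pvid hash, the update_pvid() body after the early return, and the pvids "AAAA"/"BBBB" are all assumptions made for illustration only.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define PVID_MAX 32   /* toy limit, not taken from lvm2 */
#define SLOTS 8       /* toy index capacity */

/* Hypothetical stand-ins for struct device and struct lvmcache_info. */
struct toy_device { char pvid[PVID_MAX + 1]; };
struct toy_info { struct toy_device *dev; };

/* Toy replacement for the pvid hash: a small array searched linearly. */
struct toy_slot { char pvid[PVID_MAX + 1]; struct toy_info *info; };
struct toy_index { struct toy_slot slot[SLOTS]; size_t used; };

static void copy_pvid(char *dst, const char *src)
{
    strncpy(dst, src, PVID_MAX);
    dst[PVID_MAX] = '\0';
}

static struct toy_info *index_lookup(const struct toy_index *ix, const char *pvid)
{
    for (size_t i = 0; i < ix->used; i++)
        if (!strcmp(ix->slot[i].pvid, pvid))
            return ix->slot[i].info;
    return NULL;
}

static void index_remove(struct toy_index *ix, const char *pvid)
{
    for (size_t i = 0; i < ix->used; i++) {
        if (!strcmp(ix->slot[i].pvid, pvid)) {
            ix->used--;
            ix->slot[i] = ix->slot[ix->used]; /* fill the hole with the last slot */
            return;
        }
    }
}

static void index_insert(struct toy_index *ix, const char *pvid, struct toy_info *info)
{
    for (size_t i = 0; i < ix->used; i++) {
        if (!strcmp(ix->slot[i].pvid, pvid)) {
            ix->slot[i].info = info; /* one entry per pvid: overwrite */
            return;
        }
    }
    assert(ix->used < SLOTS);
    copy_pvid(ix->slot[ix->used].pvid, pvid);
    ix->slot[ix->used].info = info;
    ix->used++;
}

/* check_index == 0 models the original early return (strcmp only);
 * check_index != 0 models the proposed extra lookup in the index.
 * Everything after the early return is an assumed, simplified update. */
static int update_pvid(struct toy_index *ix, struct toy_info *info,
                       const char *pvid, int check_index)
{
    int same = !strcmp(info->dev->pvid, pvid);

    if (check_index ? (index_lookup(ix, pvid) == info && same) : same)
        return 1;

    if (info->dev->pvid[0])
        index_remove(ix, info->dev->pvid);
    copy_pvid(info->dev->pvid, pvid);
    index_insert(ix, pvid, info);
    return 1;
}

/* Replays the scenario: two devices share pvid "AAAA", only the first is
 * indexed, the first is given a new pvid, then the second is re-registered.
 * Returns 1 if the second device ends up in the index, 0 otherwise. */
int second_device_indexed(int check_index)
{
    struct toy_device sdb, sdc;
    struct toy_info info_b = { &sdb }, info_c = { &sdc };
    struct toy_index ix;

    memset(&ix, 0, sizeof ix);
    copy_pvid(sdb.pvid, "AAAA");
    copy_pvid(sdc.pvid, "AAAA");
    index_insert(&ix, "AAAA", &info_b);

    update_pvid(&ix, &info_b, "BBBB", check_index);
    update_pvid(&ix, &info_c, "AAAA", check_index);

    return index_lookup(&ix, "AAAA") == &info_c;
}
```

In this toy model, second_device_indexed(0) returns 0 because the second device is never put back into the index, while second_device_indexed(1) returns 1. Whether the real cache behaves identically in every detail is not something the sketch can show.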

There's also a misleading 'duplicate PV' error message when the cache is updated which should be suppressed - by testing for matching pvids?

Comment 5 Peter Rajnoha 2008-12-08 12:32:25 UTC
The fix has been uploaded to upstream (version 2.02.44).

Comment 6 Fedora Update System 2009-07-03 09:29:50 UTC
lvm2-2.02.48-1.fc11 has been submitted as an update for Fedora 11.
http://admin.fedoraproject.org/updates/lvm2-2.02.48-1.fc11

Comment 7 Bug Zapper 2009-11-18 08:51:55 UTC
This message is a reminder that Fedora 10 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 10.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '10'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 10's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 10 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Bug Zapper 2009-12-18 06:51:15 UTC
Fedora 10 changed to end-of-life (EOL) status on 2009-12-17. Fedora 10 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.

