Bug 643538 - pvmove --abort fails after vgreduce --removemissing --force (during pvmove)
Status: CLOSED EOL
Alias: None
Product: Fedora
Classification: Fedora
Component: lvm2
Version: 19
Hardware: All
OS: Linux
Priority: low
Severity: high
Target Milestone: ---
Assignee: Petr Rockai
QA Contact: Fedora Extras Quality Assurance
Reported: 2010-10-15 22:41 UTC by Carlos Ferrabone
Modified: 2015-02-18 13:30 UTC

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2015-02-18 13:30:08 UTC
Type: ---
Embargoed:


Attachments
lvmdump (31.40 KB, application/x-compressed-tar)
2010-10-15 22:42 UTC, Carlos Ferrabone
dd if=/dev/sdb2 of=/tmp/backup bs=256K count=1 (256.00 KB, application/octet-stream)
2010-10-15 22:45 UTC, Carlos Ferrabone
dd if=/dev/sda3 of=/tmp/backup bs=256K count=1 (256.00 KB, application/octet-stream)
2010-10-15 22:49 UTC, Carlos Ferrabone
test for lvm2 testsuit (1.48 KB, text/plain)
2010-10-21 14:44 UTC, Zdenek Kabelac

Description Carlos Ferrabone 2010-10-15 22:41:24 UTC
Status before problem:
Three hard disks: 1 TB + 320 GB + 320 GB.
One VG called vg_fedora, with 4 LVs:
lv_root (used as /)
lv_swap (used as swap)
lv_home (used as /home)
lv_archivos (used as mail pool for big files, virtual machines, ISOs, etc.)

One of the 320 GB disks started to fail. I bought a new disk (1.5 TB), plugged it in, booted the PC, and used system-config-lvm to initialize the disk and add it to the VG.
I then clicked the "remove volume from Volume Group" button to remove the failing disk. At this point the failing disk froze for the last time and died.

I tried running vgreduce --removemissing, and the command refused to run. Following advice I found via Google, I added the --force switch. This time the command worked.

At this point I rebooted, but the system failed to boot with the following error (typed from memory):

"init: 2: cannot create /sys/block/sr0"

With the help of people in #lvm on freenode, I was able to recover the LVM config and restore it. That did not fix the issue, though: the same error still appeared at boot.

From a live CD I extracted info about the disks with
dd if=/dev/device  of=/tmp/backup  bs=256K count=1
and lvmdump.
(I am attaching both dd dumps, from the 1 TB disk and from the working 320 GB disk.)
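
A minimal sketch of that capture step, assuming the two surviving devices are /dev/sda3 and /dev/sdb2 (the names used in the attachments). The per-device output path /tmp/backup-<name> and the dump_cmd helper are assumptions added here so that the second dump does not overwrite the first. The script only prints the dd commands; it does not run them.

```shell
#!/bin/sh
# Print one dd command per device, each writing to its own output file.
# dump_cmd and the /tmp/backup-<name> naming are illustrative only.
dump_cmd() {
    dev=$1
    name=$(basename "$dev")
    # Same block size and count as in the report: the first 256 KiB.
    printf 'dd if=%s of=/tmp/backup-%s bs=256K count=1\n' "$dev" "$name"
}

for dev in /dev/sda3 /dev/sdb2; do
    dump_cmd "$dev"
done
```

To actually take the dumps, the printed lines would be run as root from the live CD.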

Then I unplugged all disks, and the system was able to boot again.
From there, I rebooted, plugged in a 320 GB disk, reset its LVM metadata, and added it to the VG.

Finally I rebooted and added the new 1.5 TB disk, and the system failed to boot again.
I had to use a live CD to format the drive's LVM metadata.
It seemed that at boot time the Linux kernel selected info from the 1.5 TB disk.

Version-Release number of selected component (if applicable):
lvm> version
  LVM version:     2.02.72(2)-f14 (2010-08-02)
  Library version: 1.02.53 (2010-07-28)
  Driver version:  4.15.0


How reproducible:
I hope you can reproduce it with the dumps I am sending you.

Additional info:
I am sure I am missing some info, so don't hesitate to ask for what you need.

Comment 1 Carlos Ferrabone 2010-10-15 22:42:51 UTC
Created attachment 453818 [details]
lvmdump

lvmdump with data before, during and after the problem

Comment 2 Carlos Ferrabone 2010-10-15 22:45:56 UTC
Created attachment 453819 [details]
dd if=/dev/sdb2  of=/tmp/backup  bs=256K count=1

dd of the 320 GB disk that still works

Comment 3 Carlos Ferrabone 2010-10-15 22:49:28 UTC
Created attachment 453821 [details]
dd if=/dev/sda3  of=/tmp/backup  bs=256K count=1

dd of the 1 TB disk, sda,
where the MBR is, along with all the LVs that don't span more than one disk

Comment 4 Alasdair Kergon 2010-10-16 00:25:16 UTC
1) To investigate the interaction between vgreduce --removemissing and pvmove.

2) To ensure vg_validate() prevents the writing to disk of metadata that references LVs that aren't present in the VG (pvmove0).

3) Any other recommendations to ease the recoverability of the system after this sort of failure.
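
As a rough illustration of the condition in item 2, a text metadata backup can be scanned for a reference to pvmove0 that has no matching LV definition. This is only a crude text check under stated assumptions: the file is assumed to be in the text format written by vgcfgbackup, a reference is assumed to appear as the quoted name "pvmove0", and a definition as an unquoted `pvmove0 {` block. The helper name has_dangling_pvmove0 is hypothetical, and this is not what vg_validate() does internally.

```shell
#!/bin/sh
# Crude text check on an LVM metadata backup file.
# Exit status 0: the file references "pvmove0" but never defines it.
# Exit status 1: no reference found, or the definition is present.
has_dangling_pvmove0() {
    file=$1
    # A reference is assumed to be the quoted LV name, e.g. in a stripes list.
    grep -q '"pvmove0"' "$file" || return 1
    # A definition is assumed to be the unquoted name opening a block.
    if grep -Eq '^[[:space:]]*pvmove0[[:space:]]*\{' "$file"; then
        return 1
    fi
    return 0
}
```

For example, `has_dangling_pvmove0 /path/to/backup.vg && echo "dangling pvmove0 reference"` would flag a suspicious file, where the path is a placeholder.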

Comment 5 Zdenek Kabelac 2010-10-18 11:40:01 UTC
I'll try to prepare some testcase for this - to see whether it's repeatable.
I assume I'll reassign this bug later to Petr.

Comment 6 Zdenek Kabelac 2010-10-21 14:44:58 UTC
Created attachment 454857 [details]
test for lvm2 testsuit

I'm attaching a test script which tries to 'simulate' the conditions from this bugzilla.
It uses dm-delay and, for now, creates 2 files, /tmp/vgbackup_testD and /tmp/vgbackup_testE, to show the broken resulting metadata.

Comment 7 Bug Zapper 2010-11-03 09:29:26 UTC
This message is a reminder that Fedora 12 is nearing its end of life.
Approximately 30 (thirty) days from now Fedora will stop maintaining
and issuing updates for Fedora 12.  It is Fedora's policy to close all
bug reports from releases that are no longer maintained.  At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '12'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 12's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 12 is end of life.  If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora please change the 'version' of this 
bug to the applicable version.  If you are unable to change the version, 
please add a comment here and someone will do it for you.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events.  Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

The process we are following is described here: 
http://fedoraproject.org/wiki/BugZappers/HouseKeeping

Comment 8 Petr Rockai 2010-11-11 15:53:39 UTC
Guess this isn't really F12 specific. Zdeněk, what is the status on this, is the test script complete and failing as expected? What was the needinfo request for?

Comment 9 Petr Rockai 2010-11-30 22:37:18 UTC
This may be a bug in vgreduce --removemissing --force or in pvmove. Either way, I would be much in favour of deprecating and removing the former. It has a lot of legacy, custom code that deals poorly with various situations. We could change it to simply lvremove anything that depends on the missing PVs without any special handling of mirrors etc. This would make it a lot more dangerous than it already is... to actually make it safer I'd propose to call out to lvconvert --repair for any partial mirrors before commencing with the cleansing. This needs some code restructuring and sounds pretty long-term to me. Likely to build on top of the new APIs we are working on for lvm2app, which would be eventually used by all lvm tools.

Comment 10 Petr Rockai 2010-12-01 10:43:02 UTC
PS: It would be also possible to just drop vgreduce --removemissing --force: equivalent or better results can be achieved with lvconvert --repair, lvremove and vgreduce --removemissing (without --force). This approach is also *much* safer. I am currently inclined to change --removemissing --force to be a noop, printing a hint that lvremove/lvconvert --repair need to be used instead. As the original report shows, people don't think too much before issuing the command even though it could be pretty dangerous. Requirement to remove/repair the offending volumes manually would at least force them to think twice before proceeding. We no longer use --removemissing --force internally for anything, either.
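
The manual sequence described above can be sketched as follows. This is a hedged outline, not a tested recovery procedure: the VG and LV names are placeholders, the run and safe_reduce helpers and the LVM_SKETCH_EXECUTE switch are made up for this example, and the vgcfgbackup step is an extra precaution that is not part of the suggestion. By default the script only prints the commands.

```shell
#!/bin/sh
# Sketch of the manual cleanup, used instead of
# vgreduce --removemissing --force.

# Print each command by default; execute only when LVM_SKETCH_EXECUTE=1.
run() {
    if [ "${LVM_SKETCH_EXECUTE:-0}" = "1" ]; then
        "$@"
    else
        printf '%s\n' "$*"
    fi
}

# safe_reduce VG MIRROR_LV LOST_LV (all three names are placeholders)
safe_reduce() {
    vg=$1
    mirror_lv=$2
    lost_lv=$3

    # Extra precaution (assumption): save the current metadata first.
    run vgcfgbackup -f "/tmp/${vg}-before.vg" "$vg" || return 1
    # Repair a mirror that lost a leg on the missing PV.
    run lvconvert --repair "${vg}/${mirror_lv}" || return 1
    # Explicitly remove an LV that cannot be repaired.
    run lvremove "${vg}/${lost_lv}" || return 1
    # With nothing left on the missing PV, --force is not needed.
    run vgreduce --removemissing "$vg" || return 1
}

safe_reduce vg_example lv_mirror lv_lost
```

Each destructive step names a specific LV, which is the point made above: the admin has to decide per volume whether to repair or remove it, rather than letting --force decide.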

Comment 11 Petr Rockai 2010-12-12 23:07:36 UTC
Feedback please?

Comment 12 Fedora End Of Life 2013-04-03 20:13:15 UTC
This bug appears to have been reported against 'rawhide' during the Fedora 19 development cycle.
Changing version to '19'.

(As we did not run this process for some time, it could also affect pre-Fedora 19 development
cycle bugs. We are very sorry. It will help us with cleanup during Fedora 19 End Of Life. Thank you.)

More information and reason for this action is here:
https://fedoraproject.org/wiki/BugZappers/HouseKeeping/Fedora19

Comment 13 Fedora End Of Life 2015-01-09 21:46:07 UTC
This message is a notice that Fedora 19 is now at end of life. Fedora 
has stopped maintaining and issuing updates for Fedora 19. It is 
Fedora's policy to close all bug reports from releases that are no 
longer maintained. Approximately 4 (four) weeks from now this bug will
be closed as EOL if it remains open with a Fedora 'version' of '19'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 19 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged to change the 'version' to a later Fedora 
version before this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 14 Fedora End Of Life 2015-02-18 13:30:08 UTC
Fedora 19 changed to end-of-life (EOL) status on 2015-01-06. Fedora 19 is
no longer maintained, which means that it will not receive any further
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of
Fedora please feel free to reopen this bug against that version. If you
are unable to reopen this bug, please file a new report against the
current release. If you experience problems, please add a comment to this
bug.

Thank you for reporting this bug and we are sorry it could not be fixed.

