Red Hat Bugzilla – Bug 128840
pvmove locks and becomes unkillable potentially corrupting volumes
Last modified: 2007-11-30 17:10:46 EST
Description of problem:
When trying to move lvm2 data off a drive prior to
removal/replacement, pvmove locks up and potentially corrupts
the volume group.
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create lvm2 vg with more than one PV
2. pvmove one of the PVs
Actual results:
The pvmove process locks. If you try pvmove --abort, that locks too.
Expected results:
The data moves off the PV.
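
The report does not give the exact commands used, so the following is only a sketch of one sequence that would match the steps above. The VG name, LV name, size, and device paths are all hypothetical placeholders. The function only prints the commands rather than running them, because pvcreate is destructive on a real device.

```shell
# Print (do not run) one possible command sequence for the reproduction steps.
# Arguments: VG name, first PV, second PV. All are placeholders, not values
# taken from the original report.
print_repro_commands() {
    vg=$1
    pv_a=$2
    pv_b=$3
    # Step 1: create a VG with more than one PV, with an LV placed on the first PV.
    printf 'pvcreate %s %s\n' "$pv_a" "$pv_b"
    printf 'vgcreate %s %s %s\n' "$vg" "$pv_a" "$pv_b"
    printf 'lvcreate -L 1G -n lvtest %s %s\n' "$vg" "$pv_a"
    # Step 2: move all allocated extents off the first PV.
    printf 'pvmove %s\n' "$pv_a"
}
```

Calling `print_repro_commands vgtest /dev/sdX1 /dev/sdY1` lists the four commands for review before anything is run by hand on scratch devices.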
Although things look quite bad, I have not actually lost any data as
yet. One may need to reboot a couple of times with a "pvmove --abort"
in the first reboot, but all is not lost .. it just looks that way.
(a) I don't know whether that kernel included mirror support (needed
by pvmove): the newest kernels do, and newer lvm2 RPMs check for
this before initiating the pvmove;
(b) you shouldn't try to pvmove a system volume (containing /etc,
/dev, /var etc.), as pvmove itself needs to access those directories
(to make this work you would probably need to run with everything
lvm2 uses in a ramdisk instead).
In answer:
a) How can this be checked for? The last pvmove I did nearly led to
the loss of quite a lot of data (as mentioned), and therefore to a
long restore. I would like to check for this before trying again.
b) The system in question had two VGs. One was for system stuff and
the other purely for data. The pvmove was attempted in the vgdata
volume group.
(a) Checks were added in 2.00.16. pvmove uses a mirror to move the
data, so it won't free up the original data location until the data
has already been synced to its new location.
So (b) doesn't apply in your case.
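
For anyone wanting to check point (a) by hand before retrying, a rough sketch follows. It assumes that `dmsetup targets` prints one line per registered device-mapper target (so a line starting with `mirror` means the target is available) and that `lvm version` prints a line containing `LVM version:` followed by the version number. The function names are made up for illustration and are not part of lvm2.

```shell
# Succeed if the text on stdin (expected: output of `dmsetup targets`)
# contains a line for the mirror target. Only loaded targets are listed,
# so the dm-mirror module may need to be loaded before checking.
has_mirror_target() {
    grep -q '^mirror[[:space:]]'
}

# Extract the dotted version number from the text on stdin
# (expected: output of `lvm version`; the line format is an assumption).
lvm_version_number() {
    sed -n 's/.*LVM version:[[:space:]]*\([0-9.]*\).*/\1/p'
}

# version_at_least HAVE WANT: succeed if dotted version HAVE >= WANT.
# Fields are compared numerically, left to right; missing fields count as 0.
version_at_least() {
    awk -v have="$1" -v want="$2" 'BEGIN {
        n = split(have, h, "[.]")
        m = split(want, w, "[.]")
        max = (n > m) ? n : m
        for (i = 1; i <= max; i++) {
            a = h[i] + 0
            b = w[i] + 0
            if (a > b) exit 0
            if (a < b) exit 1
        }
        exit 0
    }'
}

# Example wiring (commented out; needs root and the lvm2 tools installed):
#   dmsetup targets | has_mirror_target || echo "mirror target not loaded"
#   version_at_least "$(lvm version | lvm_version_number)" 2.00.16 \
#       || echo "lvm2 is older than 2.00.16"
```

Passing both checks only shows that the prerequisites mentioned in this thread are present. It does not guarantee that pvmove will complete.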
I've just had the same problem: pvmove locked hard at about 98% of a
10GB pvmove. dm-mirror seemed happy until I checked dmesg and started
seeing kernel errors that looked like memory pressure. pvmove never
finished. I force-synced the drives and rebooted with SysRq-U,S,B,
since nothing else was working anymore (I could type commands, but
nothing actually happened). On reboot, pvmove appeared to try to
continue but failed again the same way. I rebooted into runlevel 1 and
ran pvmove --abort, but that did 'nothing' as well. I finally moved the
dm-mirror module into /tmp/ to stop it from continuing, and that worked
(I'm running right now), but two of my logical volumes are corrupt; I'm
assuming the mirrored version is correct, somehow. How do I get my