Red Hat Bugzilla – Bug 207204
pvmove hangs when tried from rescue disk context
Last modified: 2007-12-04 16:37:00 EST
Patrick has encountered a bug where pvmove hangs the machine when run from a
rescue environment. I've been unable to reproduce this bug on my own outside
that environment (in normal boot-up).
I will attach a file that is a typescript of what he was trying to do...
Essentially, the user was trying to pvmove the contents of one disk onto another
so that the first disk could eventually be removed. One of the LVs being moved was
fragmented (into 4 segments?). It appears that when pvmove reached the end of one
of those fragments, it would hang. (You can see this in the first two 'dmsetup
table' outputs.)
The user would issue a ^c on the pvmove, then issue the command again, and it
would proceed to the next segment. (You can see this from the 3rd 'dmsetup table')
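The segment-by-segment behaviour described above can be watched from a second
shell. This is only a sketch of how one might do it; the VG name (Volume00) and
device names are taken from the typescript, and the dm device name for the
temporary volume is an assumption:

```shell
# Each segment of a fragmented LV shows up as a separate line in the
# temporary pvmove mirror's table; compare successive outputs to see
# which segment is being copied.
dmsetup table | grep pvmove

# Report sync progress of the temporary mirror; a counter that stops
# advancing at a segment boundary matches the hang described here.
dmsetup status Volume00-pvmove0

# pvmove itself can print progress every N seconds with -i.
pvmove -i 5 -v /dev/sdd1 /dev/sda3
```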
This is my best guess at the problem. Not sure if it is reproducible.
Created attachment 136680 [details]
typescript of events
Booted into the normal install (fully updated RHEL4) after the move of the LV
containing / had finished, then ran:
pvmove -t -v -n /dev/Volume00/LogVol_amanda_hold /dev/sdd1 /dev/sda3
pvmove -v /dev/sdd1 /dev/sda3
typescript and kernel panic will follow.
Created attachment 136682 [details]
typescript of the pvmove when in the normal system
Created attachment 136683 [details]
the kernel oops I captured on the serial console
The saga continues: after the Oops in Comment #2 I hit reset, thinking "ah well,
I'll move the LVs one by one then".
Oops on boot, console log will follow.
Created attachment 136688 [details]
the Oops described in Comment #5
and the saga continues further.
At the stage of Comment #5 I figured "oh well, rescue mode then; I want to go
home, it's past 1am and I still have not had dinner", so I booted into rescue and
chose to mount the partitions r/w when anaconda asked. You guessed it: Oops again.
Created attachment 136689 [details]
the Oops described in Comment #7
And just to complete the saga: booted into rescue but chose not to mount, ran
"lvm pvmove --abort", rebooted back to the normal system, and did the pvmoves one
LV at a time. All seems good now.
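For the record, the workaround amounts to something like the following. This is a
sketch, not the exact commands used; the VG and LV names are taken from the
earlier typescript and the remaining LV names are hypothetical:

```shell
# Tear down the half-finished move and its temporary mirror.
pvmove --abort

# Move one LV at a time with -n instead of moving the whole PV at once.
pvmove -v -n /dev/Volume00/LogVol_amanda_hold /dev/sdd1 /dev/sda3
# ...repeat for each remaining LV that has extents on /dev/sdd1...

# Once the PV is empty, drop it from the VG so the disk can be removed.
vgreduce Volume00 /dev/sdd1
pvremove /dev/sdd1
```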
Looking at the attachment in comment #3...
When the logical volume is not specified, and pvmove is run, I find it strange
that a 'pvmove0' is reported for each of the 3 LVs being moved. I'm not sure
what that means yet...
... and why do I see messages like 'Volume00/pvmove0 already monitored.' pvmove
devices should _NEVER_ be monitored.
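One way to check for (and undo) the bogus monitoring would be something like the
sketch below. The seg_monitor report field and the hidden pvmove0 LV name are
assumptions about the LVM2 version in use; treat this as illustrative only:

```shell
# List all LVs, including hidden internal ones, with their dmeventd
# monitoring state; pvmove0 should never show as monitored.
lvs -a -o lv_name,seg_monitor Volume00

# If a pvmove volume is registered with dmeventd anyway, unregister it.
lvchange --monitor n Volume00/pvmove0
```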
No doubt the panics are happening due to a check-for-sync running off the end of a bitmap.
I've added code to ensure that PVMOVE volumes are never monitored. I'm not 100%
sure that the panic has been fixed, however.
It would be nice if this could be run again with the updated packages...
this bug has been reproduced (pretty sure) and is bug #213887
based on comment #15, I'm moving this back to assigned.