Description of problem:
Although this problem is not reliably reproducible, it has been seen on many
clusters during 6.2 regression testing. Basically, during change operations,
devices appear to be missing and cause the following errors:

  Couldn't find device with uuid H019sC-nSGg-iM1p-vcTw-BSfB-SfeT-bwwLg9.
  Cannot change VG mirror_sanity while PVs are missing.
  Consider vgreduce --removemissing.

Upon further investigation however, all the devices are present and the VG
remains fine.

SCENARIO - [open_fsadm_resize_attempt]
Create mirror, add fs, and then attempt to resize it while it's mounted

grant-03: lvcreate -m 1 -n open_fsadm_resize -L 4G --nosync mirror_sanity
  WARNING: New mirror won't be synchronised. Don't read what you didn't write!
Placing an ext4 on open_fsadm_resize volume
mke2fs 1.41.12 (17-May-2010)

Attempt to resize the open mirrored filesystem multiple times with
lvextend/fsadm on grant-03
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
resize2fs 1.41.12 (17-May-2010)
(lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
  Couldn't find device with uuid H019sC-nSGg-iM1p-vcTw-BSfB-SfeT-bwwLg9.
  Cannot change VG mirror_sanity while PVs are missing.
  Consider vgreduce --removemissing.
couldn't resize mirror and filesystem on grant-03

Oct 21 15:03:09 grant-03 qarshd[29601]: Running cmdline: lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize
Oct 21 15:03:10 grant-03 xinetd[5684]: EXIT: qarsh status=0 pid=29601 duration=1(sec)
Oct 21 15:04:23 grant-03 lvm[1092]: mirror_sanity-open_fsadm_resize is now in-sync.

[root@grant-03 ~]# lvs -a -o +devices
  LV                           Attr   LSize  Log                    Copy%  Devices
  open_fsadm_resize            Mwi-ao 28.00g open_fsadm_resize_mlog 100.00 open_fsadm_resize_mimage_0(0),open_fsadm_resize_mimage_1(0)
  [open_fsadm_resize_mimage_0] iwi-ao 28.00g                               /dev/sdb1(0)
  [open_fsadm_resize_mimage_1] iwi-ao 28.00g                               /dev/sdb2(0)
  [open_fsadm_resize_mlog]     lwi-ao  4.00m                               /dev/sdc6(0)

Version-Release number of selected component (if applicable):
2.6.32-209.el6.x86_64
lvm2-2.02.87-6.el6                       BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-libs-2.02.87-6.el6                  BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-cluster-2.02.87-6.el6               BUILT: Wed Oct 19 06:46:31 CDT 2011
udev-147-2.40.el6                        BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-6.el6              BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-libs-1.02.66-6.el6         BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-1.02.66-6.el6        BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-libs-1.02.66-6.el6   BUILT: Wed Oct 19 06:46:31 CDT 2011
cmirror-2.02.87-6.el6                    BUILT: Wed Oct 19 06:46:31 CDT 2011

How reproducible:
Often during extended regression testing
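For reference, the scenario above boils down to roughly the following command
sequence. This is only a sketch: the mount point and loop count are
illustrative, and it assumes the mirror_sanity VG already exists on the test
PVs.

  # sketch of the test scenario; mount point and loop count are illustrative
  lvcreate -m 1 -n open_fsadm_resize -L 4G --nosync mirror_sanity
  mkfs.ext4 /dev/mirror_sanity/open_fsadm_resize
  mkdir -p /mnt/open_fsadm_resize
  mount /dev/mirror_sanity/open_fsadm_resize /mnt/open_fsadm_resize
  # repeatedly grow the mounted LV and its filesystem in one step
  for i in $(seq 1 10); do
      lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize || break
  done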
I wonder if you can pin down one instance of this occurring with the exact sequence of commands that the script issued. How long has it been doing this? Is it just a few test scripts or many different ones? LVM is supposed to take responsibility for ensuring its own data is updated on disk, visible to all nodes, at the crucial places - not stuck in buffers. We should probably review the code to check that none of the recent changes broke those guarantees and that no other logic bug has crept in. Equally, it's possible the test scripts themselves aren't providing the necessary guarantees in everything they do. So basically, more investigation is needed to try to narrow down the circumstances/versions/variations when it does happen and when it doesn't.
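One way to narrow this down might be to have the test capture LVM and
device-mapper state the instant a resize fails, along these lines (a sketch
only; the VG/LV names are taken from this report):

  # sketch: dump state immediately after a failed lvextend
  if ! lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize; then
      pvs -o +pv_uuid
      vgs mirror_sanity
      lvs -a -o +devices mirror_sanity
      vgck mirror_sanity
      dmsetup status
  fi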
Corey, is this still seen with the latest test build?