Bug 748065

Summary: temporarily "missing" devices cause clvm change operations to fail
Product: Red Hat Enterprise Linux 6 Reporter: Corey Marthaler <cmarthal>
Component: lvm2 Assignee: LVM and device-mapper development team <lvm-team>
Status: CLOSED WORKSFORME QA Contact: Cluster QE <mspqa-list>
Severity: low Docs Contact:
Priority: low    
Version: 6.2 CC: agk, dwysocha, heinzm, jbrassow, mbroz, prajnoha, prockai, thornber, zkabelac
Target Milestone: rc Keywords: Regression
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-03-01 20:30:27 UTC Type: ---
Bug Depends On:    
Bug Blocks: 756082    

Description Corey Marthaler 2011-10-21 21:27:40 UTC
Description of problem:
Although this problem is not reliably reproducible, it has been seen on many clusters during 6.2 regression testing. During change operations, devices appear to be missing and cause the following errors:

   Couldn't find device with uuid H019sC-nSGg-iM1p-vcTw-BSfB-SfeT-bwwLg9.
   Cannot change VG mirror_sanity while PVs are missing.
   Consider vgreduce --removemissing.

Upon further investigation, however, all the devices are present and the VG remains fine.
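
One way to confirm that a PV reported as missing is in fact present is to pull the UUID out of the error text and look it up in the current PV list. The following is only a minimal sketch, assuming the error format shown above. The function names extract_missing_uuid and pv_for_uuid are hypothetical helpers, not part of lvm2, and the pvs call is left as a comment because it needs a live system.

```shell
#!/bin/sh
# Hypothetical helper: print the UUID named in each
# "Couldn't find device with uuid ..." line read from stdin.
extract_missing_uuid() {
    sed -n "s/.*Couldn't find device with uuid \([A-Za-z0-9-]*\)\..*/\1/p"
}

# Hypothetical helper: given a UUID in $1 and lines of "<uuid> <device>"
# on stdin, print the device that carries that UUID (nothing if absent).
pv_for_uuid() {
    awk -v want="$1" '$1 == want { print $2 }'
}

# Intended use on a live node (assumes the failed command's stderr was
# saved to the hypothetical file lvextend.err):
#   uuid=$(extract_missing_uuid < lvextend.err)
#   pvs --noheadings -o pv_uuid,pv_name | pv_for_uuid "$uuid"
```

If the second command prints a device path, the PV was visible at the time of the check, which matches the observation that the failure is transient.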

 SCENARIO - [open_fsadm_resize_attempt]
 Create mirror, add fs, and then attempt to resize it while it's mounted
 grant-03: lvcreate -m 1 -n open_fsadm_resize -L 4G --nosync mirror_sanity
   WARNING: New mirror won't be synchronised. Don't read what you didn't write!
 Placing an ext4 on open_fsadm_resize volume
 mke2fs 1.41.12 (17-May-2010)
 Attempt to resize the open mirrored filesystem multiple times with lvextend/fsadm on grant-03
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
 resize2fs 1.41.12 (17-May-2010)
 (lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize)
   Couldn't find device with uuid H019sC-nSGg-iM1p-vcTw-BSfB-SfeT-bwwLg9.
   Cannot change VG mirror_sanity while PVs are missing.
   Consider vgreduce --removemissing.
 couldn't resize mirror and filesystem on grant-03
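
The repeated resize attempts in this scenario can be approximated outside the test harness with a small loop that stops at the first failure. This is a sketch only: repeat_until_fail is a hypothetical helper, not the actual test script, and the iteration count in the example is an arbitrary assumption.

```shell
#!/bin/sh
# Hypothetical helper: run a command up to $1 times and stop at the
# first failure. Reports the failing 1-based iteration on stderr and
# returns 1; returns 0 if every iteration succeeded.
repeat_until_fail() {
    max=$1
    shift
    i=1
    while [ "$i" -le "$max" ]; do
        if ! "$@"; then
            echo "failed on iteration $i" >&2
            return 1
        fi
        i=$((i + 1))
    done
    return 0
}

# Example mirroring the scenario (needs a real clustered VG; the count
# of 10 is an assumption, not taken from the test):
#   repeat_until_fail 10 lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize
```

Knowing which iteration failed makes it easier to line the failure up with syslog timestamps.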

Oct 21 15:03:09 grant-03 qarshd[29601]: Running cmdline: lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize
Oct 21 15:03:10 grant-03 xinetd[5684]: EXIT: qarsh status=0 pid=29601 duration=1(sec)
Oct 21 15:04:23 grant-03 lvm[1092]: mirror_sanity-open_fsadm_resize is now in-sync.
                                    

[root@grant-03 ~]# lvs -a -o +devices
 LV                           Attr   LSize  Log                    Copy%  Devices
 open_fsadm_resize            Mwi-ao 28.00g open_fsadm_resize_mlog 100.00 open_fsadm_resize_mimage_0(0),open_fsadm_resize_mimage_1(0)
 [open_fsadm_resize_mimage_0] iwi-ao 28.00g                               /dev/sdb1(0)
 [open_fsadm_resize_mimage_1] iwi-ao 28.00g                               /dev/sdb2(0)
 [open_fsadm_resize_mlog]     lwi-ao  4.00m                               /dev/sdc6(0)


Version-Release number of selected component (if applicable):
2.6.32-209.el6.x86_64

lvm2-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-libs-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
lvm2-cluster-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
udev-147-2.40.el6    BUILT: Fri Sep 23 07:51:13 CDT 2011
device-mapper-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-libs-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
device-mapper-event-libs-1.02.66-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011
cmirror-2.02.87-6.el6    BUILT: Wed Oct 19 06:46:31 CDT 2011


How reproducible:
Often during extended regression testing

Comment 2 Alasdair Kergon 2012-01-04 20:21:25 UTC
I wonder if you can pin down one instance of this occurring with the exact sequence of commands that the script issued.  How long has it been doing this?  Is it just a few test scripts or many different ones?

LVM is supposed to take responsibility for ensuring its own data is updated on disk, visible to all nodes, at the crucial places - not stuck in buffers.

We should probably review the code to check that none of the recent changes broke the guarantees and that no other logic bug has crept in.  Equally it's possible the test scripts themselves aren't providing the necessary guarantees in everything they do.

So basically, more investigation is needed to try to narrow down the circumstances/versions/variations when it does happen and when it doesn't.
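
To capture the exact sequence of commands a script issued, each command could be run through a small wrapper that records the command line, a timestamp, and the exit status. The sketch below is an assumption about how that might be done, not what the QE harness does; logged and the CMDLOG variable are hypothetical names.

```shell
#!/bin/sh
# Hypothetical wrapper: run a command and append two timestamped records
# to the file named by $CMDLOG, one before it starts (full command line)
# and one after it exits (command name and exit status).
logged() {
    printf '%s RUN  %s\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$*" >> "$CMDLOG"
    "$@"
    rc=$?
    printf '%s EXIT %s rc=%d\n' "$(date -u +%Y-%m-%dT%H:%M:%SZ)" "$1" "$rc" >> "$CMDLOG"
    return "$rc"
}

# Example (the log path is an assumption):
#   CMDLOG=/tmp/lvm-cmds.log
#   logged lvextend -L +3G -r /dev/mirror_sanity/open_fsadm_resize
```

The wrapper passes the exit status through unchanged, so it can be dropped in front of existing commands without altering the script's control flow.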

Comment 4 Peter Rajnoha 2012-02-16 11:05:47 UTC
Corey, is this still seen with the latest test build?