Description of problem:
This happens on my 6-node cluster and may be somehow related to bz 362691. Every time I run the following cmd, I see the message 'cluster request failed: Invalid argument':

[root@link-08 ~]# vgs mirror_sanity --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument
  /dev/sda1
  /dev/sdb1
  /dev/sdc1
  /dev/sdd1
  /dev/sde1
  /dev/sdf1
  /dev/sdg1

Version-Release number of selected component (if applicable):
2.6.9-65.BRsmp
lvm2-cluster-2.02.27-2.el4
[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-01
   3    1    6   M   link-07
   4    1    6   M   grant-03
   5    1    6   M   link-08
   6    1    6   M   grant-02
Created attachment 246851: this is the clvmd -d output while this cmd runs
It looks like there are several attempts to unlock the VG 'V_mirror_sanity'. The first one succeeded but subsequent ones fail because it is already unlocked.
Just a note that this happens on a plain 2.6.9-65.ELsmp kern and is not related to the patch for bz 290821.
Looks like this bz is much more serious than first thought. Any lvm create/remove op hangs after running this vgs cmd. In fact, it appears to be the cause of bz 362691.

[root@hayes-03 etherd!e1.1]# vgs hayes --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument
  /dev/etherd/e1.1p1
  /dev/etherd/e1.1p2
  /dev/etherd/e1.1p3
[root@hayes-03 etherd!e1.1]# lvcreate -L 10G hayes
Changing component. The bug is in lvm not lvm2-cluster.
The fix for this in lvm2 needs to be escalated and released as a z-stream/day zero errata for 4.6, as it affects all lvm operations:

<visegrips> kanderso, my feature? It's blocking everything... linear creates, mirror creates, (and removes), everything... and it boils down to this one reporting bug.
This bug represents the simplest way to show the problem, but it manifests itself in other ways too, like bug 362691. That bug should probably be marked as a duplicate of this one; currently, it is marked as a dependent.
I agree that this deserves an erratum.

My quick analysis: The bug has been present for a long time, but, fortunately, only a few code paths are affected. The problem is broader than the 'vgs' command mentioned here BTW - the following commands should also be included in testing: pvdisplay, pvresize, vgreduce.

On single-host lvm2, the problem would remain invisible insofar as the affected commands would appear to work correctly, but actually fail to prevent conflicting commands from being run concurrently, leaving open the (albeit remote) possibility of metadata corruption. So it is still important to fix this.

On clustered lvm2, as shown in these bugzillas, clvmd hanging is a more likely failure mode, and so this fix is essential. What has changed is that mirroring code is regularly trying to make use of one of these vulnerable code paths.
Testing:
In every case, run the commands with -vvvv and ensure that the 'Locking X' and 'Unlocking X' lines alternate.
  'Locking X; Locking Y; Unlocking Y; Unlocking X' is OK.
  'Locking X; Unlocking X; Locking X; Unlocking X' is OK.
  'Locking X; Locking X; Unlocking X; Unlocking X' reveals the bug.
where X and Y refer to volume groups and begin with P_ or V_.
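The pairing rule above can be checked mechanically. Here is a rough Python sketch, not the upstream detection patch. It assumes each relevant debug line contains the word 'Locking' or 'Unlocking' followed somewhere by a lock name starting with P_ or V_. The exact -vvvv line layout differs between locking types, so the regex is a guess that may need adjusting. The function name check_lock_pairing and the script name below are made up for illustration.

```python
import re
import sys

# Assumed line shape: "Locking" or "Unlocking", then (possibly after a path
# or other words) a lock name that starts with P_ or V_.
LOCK_RE = re.compile(r"\b(Locking|Unlocking)\b.*?\b([PV]_[^\s/]+)")


def check_lock_pairing(lines):
    """Return a list of (line_number, message) for unpaired lock operations."""
    held = set()       # lock names currently held
    problems = []
    for lineno, line in enumerate(lines, start=1):
        match = LOCK_RE.search(line)
        if not match:
            continue
        action, name = match.groups()
        if action == "Locking":
            if name in held:
                problems.append((lineno, f"{name} locked while already held"))
            held.add(name)
        else:
            if name not in held:
                problems.append((lineno, f"{name} unlocked while not held"))
            held.discard(name)
    # Anything left over was locked but never unlocked.
    for name in sorted(held):
        problems.append((None, f"{name} still held at end of output"))
    return problems


if __name__ == "__main__":
    issues = check_lock_pairing(sys.stdin)
    for lineno, msg in issues:
        print(f"line {lineno}: {msg}" if lineno else msg)
    sys.exit(1 if issues else 0)
```

Saved as, say, check_locks.py, it could be fed the debug output of one of the affected commands, e.g. vgs -vvvv mirror_sanity --noheadings -o pv_name -O pv_size 2>&1 | python check_locks.py. A non-zero exit status means the Locking/Unlocking lines did not pair up.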
Try:
  lvm2-2.02.27-4.el4
  lvm2-cluster-2.02.27-4.el4

Note that the code paths involved affect the following commands:
  pvdisplay
  pvresize
  pvs (including --segments)
  vgs (with -o+ pv fields, like pv_name, pvseg_start)
  vgdisplay
  vgreduce (incl --removemissing)

Test these (as appropriate) with PVs that are inside VGs and PVs that are not.
Also test with PVs (inside VGs) that have been tagged, e.g.
  pvchange --addtag tag1 <PV>
then
  vgs -o +pv_name @tag1
Check with VGs that are clustered and those that are not.
Check the exit status of the commands remains sensible, including the queries run on local VGs during machine boot before the cluster infrastructure has started up.
And as described higher up, run the tests with -vvvv (or equivalent lvm.conf logging enabled) and grep for the Locking/Unlocking messages to check they are paired correctly. (I'll think about doing an upstream patch to detect those errors automatically.)
(The tag example above returns no output now BTW, as vgs looks for a VG tag, not a PV tag, but pvs @tag1 will. I might change that one day; I think vgs ought to find it.)
This bug has been verified fixed in lvm2-2.02.27-4.el4/lvm2-cluster-2.02.27-4.el4.
An advisory has been issued which should help the problem described in this bug report. This report is therefore being closed with a resolution of ERRATA. For more information on the solution and/or where to find the updated files, please follow the link below. You may reopen this bug report if the solution does not work for you.
http://rhn.redhat.com/errata/RHBA-2008-0776.html