Red Hat Bugzilla – Bug 364081
'cluster request failed' during vgs command
Last modified: 2008-07-24 16:07:46 EDT
Description of problem:
This happens on my 6 node cluster and may be somehow related to bz 362691.
Every time I run the following cmd, I see the message 'cluster request
failed: Invalid argument':
[root@link-08 ~]# vgs mirror_sanity --noheadings -o pv_name -O pv_size
cluster request failed: Invalid argument
Version-Release number of selected component (if applicable):
[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-01
   3    1    6   M   link-07
   4    1    6   M   grant-03
   5    1    6   M   link-08
   6    1    6   M   grant-02
Created attachment 246851
this is the clvmd -d output while this cmd runs
It looks like there are several attempts to unlock the VG 'V_mirror_sanity'. The
first one succeeded but subsequent ones fail because it is already unlocked.
Just a note that this happens on a plain 2.6.9-65.ELsmp kern and is not related
to the patch for bz 290821.
Looks like this bz is much more serious than first thought. Any lvm create/remove
op hangs after running this vgs cmd. In fact it appears to be the cause of bz 362691.
[root@hayes-03 etherd!e1.1]# vgs hayes --noheadings -o pv_name -O pv_size
cluster request failed: Invalid argument
[root@hayes-03 etherd!e1.1]# lvcreate -L 10G hayes
Changing component. The bug is in lvm not lvm2-cluster.
The fix for this in lvm2 needs to be escalated and released as a z-stream/day
zero errata for 4.6, as it affects all lvm operations:
<visegrips> kanderso, my feature? It's blocking everything... linear creates,
mirror creates, (and removes), everything... and it boils down to this one
This bug represents the simplest way to show the problem, but it manifests
itself in other ways too, like:
bug 362691: This should probably be marked as a duplicate of this bug -
currently, it is marked as a dependent.
I agree that this deserves an erratum.
My quick analysis:
The bug has been present for a long time, but, fortunately, only a few code
paths are affected. The problem is broader than the 'vgs' command mentioned
here BTW - the following commands should also be included in testing:
pvdisplay, pvresize, vgreduce.
On single-host lvm2, the problem would remain invisible insofar as the
affected commands would appear to work correctly, but actually fail to prevent
conflicting commands from being run concurrently leaving open the (albeit
remote) possibility of metadata corruption. So it is still important to fix
On clustered lvm2, as shown in these bugzillas, clvmd hanging is a more likely
failure mode, and so this fix is essential. What has changed is that
mirroring code is regularly trying to make use of one of these vulnerable code paths.
In every case, run the commands with -vvvv and ensure that the 'Locking X'
and 'Unlocking X' lines alternate.
'Locking X; Locking Y; Unlocking Y; Unlocking X' is OK.
'Locking X; Unlocking X; Locking X; Unlocking X' is OK.
'Locking X; Locking X; Unlocking X; Unlocking X' reveals the bug.
where X and Y refer to volume groups and begin with P_ or V_.
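The pairing rule above can be checked mechanically. The following is a minimal sketch only, not part of lvm2: it assumes each relevant log line contains the word 'Locking' or 'Unlocking' followed somewhere by a resource name beginning with P_ or V_, and the exact -vvvv line layout may differ between lvm2 versions. The function name check_lock_pairing is made up for illustration.

```python
import re

# Assumption: a relevant line contains "Locking" or "Unlocking" and, later on
# the same line, a resource name starting with P_ or V_.
LOCK_RE = re.compile(r"\b(Locking|Unlocking)\b.*?\b([PV]_[\w.+-]+)")


def check_lock_pairing(lines):
    """Return a list of (line_number, message) for badly paired lock messages."""
    held = set()      # resources currently believed to be locked
    problems = []
    for lineno, line in enumerate(lines, start=1):
        match = LOCK_RE.search(line)
        if not match:
            continue
        action, name = match.groups()
        if action == "Locking":
            if name in held:
                # 'Locking X; Locking X' is the pattern that reveals the bug.
                problems.append((lineno, f"{name} locked while already locked"))
            held.add(name)
        else:
            if name not in held:
                problems.append((lineno, f"{name} unlocked while not locked"))
            held.discard(name)
    # Anything still held at the end was never released.
    for name in sorted(held):
        problems.append((None, f"{name} never unlocked"))
    return problems
```

To use it, save the -vvvv output of a command to a file and pass the file's lines to check_lock_pairing; an empty result means every lock was taken and released in a consistent order.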
Note that the code paths involved affect the following commands:
pvs (including --segments)
vgs (with -o+ pv fields, like pv_name, pvseg_start)
vgreduce (incl --removemissing)
Test these (as appropriate) with PVs that are inside VGs and PVs that are not.
Also test with PVs (inside VGs) that have been tagged e.g.
pvchange --addtag tag1 <PV>
vgs -o +pv_name @tag1
Check with VGs that are clustered and those that are not.
Check the exit status of the commands remains sensible, including the queries
run on local VGs during machine boot before the cluster infrastructure has
And as described higher up, run the tests with -vvvv (or equivalent lvm.conf
logging enabled) and grep for the Locking/Unlocking messages to check they are
paired correctly. (I'll think about doing an upstream patch to detect those
(The tag example above now returns no output, BTW, because vgs looks for a VG tag,
not a PV tag, but pvs @tag1 will. I might change that one day; I think it ought to find it.)
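As a rough illustration of the test procedure described in this comment, the sketch below runs a few of the read-only reporting commands with -vvvv, records each exit status, and keeps only the Locking/Unlocking lines for inspection. It is an assumption-laden sketch: the helper names build_query_commands and run_and_collect are hypothetical, the command list is a subset chosen for illustration (vgreduce and pvresize are left out because they modify metadata), and the VG name and tag must be supplied by the tester.

```python
import subprocess


def build_query_commands(vg, tag=None):
    """Read-only report commands from the list above, each run with -vvvv."""
    cmds = [
        ["pvs", "-vvvv"],
        ["pvs", "-vvvv", "--segments"],
        ["pvdisplay", "-vvvv"],
        ["vgs", "-vvvv", "-o", "+pv_name", vg],
    ]
    if tag:
        # Per the note above, pvs (not vgs) matches a PV tag.
        cmds.append(["pvs", "-vvvv", "@" + tag])
    return cmds


def run_and_collect(cmds, runner=subprocess.run):
    """Run each command; return (argv, exit_status, lock_lines) per command."""
    results = []
    for argv in cmds:
        proc = runner(argv, capture_output=True, text=True)
        log = (proc.stdout or "") + (proc.stderr or "")
        # Equivalent of grepping for the Locking/Unlocking messages.
        lock_lines = [
            line for line in log.splitlines()
            if "Locking" in line or "Unlocking" in line
        ]
        results.append((argv, proc.returncode, lock_lines))
    return results
```

For example, run_and_collect(build_query_commands("hayes", tag="tag1")) would run the queries against the VG from the earlier comment. The runner parameter exists only so the function can be exercised without a real LVM installation.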
This bug has been verified fixed in lvm2-2.02.27-4.el4/lvm2-cluster-2.02.27-4.el4.
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.