Bug 364081 - 'cluster request failed' during vgs command
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
Target Release: ---
Assigned To: Alasdair Kergon
QA Contact: Cluster QE
Keywords: ZStream
Depends On:
Blocks: 362691 386831 386841 399361
Reported: 2007-11-02 11:43 EDT by Corey Marthaler
Modified: 2008-07-24 16:07 EDT (History)
Fixed In Version: RHBA-2008-0776
Doc Type: Bug Fix
Last Closed: 2008-07-24 16:07:46 EDT

Attachments
this is the clvmd -d output while this cmd runs (24.66 KB, text/plain)
2007-11-02 11:45 EDT, Corey Marthaler

Description Corey Marthaler 2007-11-02 11:43:00 EDT
Description of problem:
This happens on my 6-node cluster and may be somehow related to bz 362691.
Every time I run the following cmd, I see the message 'cluster request
failed: Invalid argument'.

[root@link-08 ~]# vgs mirror_sanity --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument

Version-Release number of selected component (if applicable):
Comment 1 Corey Marthaler 2007-11-02 11:44:55 EDT
[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-01
   3    1    6   M   link-07
   4    1    6   M   grant-03
   5    1    6   M   link-08
   6    1    6   M   grant-02
Comment 2 Corey Marthaler 2007-11-02 11:45:51 EDT
Created attachment 246851 [details]
this is the clvmd -d output while this cmd runs
Comment 3 Christine Caulfield 2007-11-02 12:11:02 EDT
It looks like there are several attempts to unlock the VG 'V_mirror_sanity'. The
first one succeeded but subsequent ones fail because it is already unlocked.
Comment 4 Corey Marthaler 2007-11-06 10:59:49 EST
Just a note that this happens on a plain 2.6.9-65.ELsmp kern and is not related
to the patch for bz 290821.
Comment 5 Corey Marthaler 2007-11-13 15:40:29 EST
Looks like this bz is much more serious than first thought. Any lvm create/remove
op hangs after running this vgs cmd. In fact, it appears to be the cause of bz 362691.

[root@hayes-03 etherd!e1.1]# vgs hayes --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument
[root@hayes-03 etherd!e1.1]# lvcreate -L 10G hayes
Comment 6 Kiersten (Kerri) Anderson 2007-11-15 10:32:29 EST
Changing component.  The bug is in lvm not lvm2-cluster.
Comment 7 Kiersten (Kerri) Anderson 2007-11-15 10:34:22 EST
The fix for this in lvm2 needs to be escalated and released as a z-stream/day
zero errata for 4.6, as it affects all lvm operations:

<visegrips> kanderso, my feature?  It's blocking everything... linear creates,
mirror creates, (and removes), everything... and it boils down to this one
reporting bug.
Comment 8 Jonathan Earl Brassow 2007-11-15 11:05:54 EST
This bug represents the simplest way to show the problem, but it manifests
itself in other ways too, like:

bug 362691:  This should probably be marked as a duplicate of this bug -
currently, it is marked as a dependent.
Comment 9 Alasdair Kergon 2007-11-15 11:49:21 EST
I agree that this deserves an erratum.

My quick analysis:

The bug has been present for a long time, but, fortunately, only a few code 
paths are affected.  The problem is broader than the 'vgs' command mentioned 
here BTW - the following commands should also be included in testing: 
pvdisplay, pvresize, vgreduce.

On single-host lvm2, the problem would remain invisible insofar as the 
affected commands would appear to work correctly, but actually fail to prevent 
conflicting commands from being run concurrently leaving open the (albeit 
remote) possibility of metadata corruption.  So it is still important to fix 

On clustered lvm2, as shown in these bugzillas, clvmd hanging is a more likely 
failure mode, and so this fix is essential.  What has changed is that 
mirroring code is regularly trying to make use of one of these vulnerable code paths.
Comment 10 Alasdair Kergon 2007-11-15 12:20:49 EST

In every case, run the commands with -vvvv and ensure that the 'Locking X' 
and 'Unlocking X' lines alternate.

'Locking X; Locking Y; Unlocking Y; Unlocking X' is OK.
'Locking X; Unlocking X; Locking X; Unlocking X' is OK.

'Locking X; Locking X; Unlocking X; Unlocking X' reveals the bug.

Comment 11 Alasdair Kergon 2007-11-15 12:22:08 EST
where X and Y refer to volume groups and begin with P_ or V_.
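
The alternation rule above can be checked mechanically. What follows is only a minimal sketch in Python, not an lvm2 tool: the function name check_lock_pairing is hypothetical, and the regular expression assumes each relevant -vvvv line contains the word "Locking" or "Unlocking" followed somewhere by a name starting with P_ or V_. The real debug lines may carry extra prefixes or lock-mode suffixes, so the pattern may need adjusting.

```python
import re

# Assumption: a relevant log line contains "Locking" or "Unlocking" followed,
# somewhere later on the same line, by a lock name beginning with P_ or V_.
LOCK_RE = re.compile(r"\b(Locking|Unlocking)\b.*?\b([PV]_[A-Za-z0-9_.+-]+)")


def check_lock_pairing(lines):
    """Return a list of problems found in the Locking/Unlocking sequence."""
    held = set()      # names currently locked
    problems = []
    for lineno, line in enumerate(lines, start=1):
        match = LOCK_RE.search(line)
        if not match:
            continue  # not a lock message, ignore
        action, name = match.groups()
        if action == "Locking":
            if name in held:
                problems.append(f"line {lineno}: {name} locked while already locked")
            held.add(name)
        else:
            if name not in held:
                problems.append(f"line {lineno}: {name} unlocked while not locked")
            held.discard(name)
    # Anything still held at the end was never unlocked.
    for name in sorted(held):
        problems.append(f"end of log: {name} never unlocked")
    return problems
```

Fed the three sequences from Comment 10, this sketch reports nothing for the two OK orderings and two problems for 'Locking X; Locking X; Unlocking X; Unlocking X': the second lock and the second unlock.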
Comment 12 Alasdair Kergon 2007-11-15 22:11:31 EST

Note that the code paths involved affect the following commands:
  pvs  (including --segments)
  vgs  (with -o+ pv fields, like pv_name, pvseg_start)
  vgreduce  (incl --removemissing)

Test these (as appropriate) with PVs that are inside VGs and PVs that are not.
Also test with PVs (inside VGs) that have been tagged, e.g.
   pvchange --addtag tag1 <PV>
   vgs -o +pv_name @tag1

Check with VGs that are clustered and those that are not.

Check the exit status of the commands remains sensible, including the queries
run on local VGs during machine boot before the cluster infrastructure has
started up.
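
One way to drive the read-only part of this test list and record exit statuses is sketched below. It is an illustration, not an agreed test plan: the helper names build_matrix and run_matrix are hypothetical, and the VG names passed in are whatever clustered and non-clustered VGs exist on the test machine. vgreduce is left out deliberately because it modifies metadata and needs a hand-prepared setup.

```python
import subprocess

# Read-only reporting commands taken from the list above.
REPORT_COMMANDS = [
    ["pvs"],
    ["pvs", "--segments"],
    ["vgs", "-o", "+pv_name"],
    ["vgs", "-o", "+pvseg_start"],
]


def build_matrix(vg_names):
    """Yield one argv per test case, always with -vvvv debug output enabled."""
    for base in REPORT_COMMANDS:
        yield base + ["-vvvv"]              # no argument: report on everything
        if base[0] == "vgs":
            for vg in vg_names:             # then each named VG in turn
                yield base + ["-vvvv", vg]


def run_matrix(vg_names, run=subprocess.run):
    """Run every case; return (argv, exit status, combined output) tuples."""
    results = []
    for argv in build_matrix(vg_names):
        proc = run(argv, stdout=subprocess.PIPE,
                   stderr=subprocess.STDOUT, text=True)
        results.append((argv, proc.returncode, proc.stdout))
    return results
```

The run parameter exists so the matrix can be exercised without touching real devices. The captured output of each case can then be searched for the Locking/Unlocking messages.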
Comment 13 Alasdair Kergon 2007-11-15 22:15:19 EST
And as described higher up, run the tests with -vvvv (or equivalent lvm.conf
logging enabled) and grep for the Locking/Unlocking messages to check they are
paired correctly.  (I'll think about doing an upstream patch to detect those
errors automatically.)
Comment 14 Alasdair Kergon 2007-11-15 22:20:40 EST
(tag example above returns no output now BTW as vgs looks for a VG tag not a PV
tag, but pvs @tag1 will - I might change that one day - it ought to find it I think)
Comment 18 Corey Marthaler 2007-12-04 14:38:22 EST
This bug has been verified fixed in lvm2-2.02.27-4.el4/lvm2-cluster-2.02.27-4.el4.
Comment 20 errata-xmlrpc 2008-07-24 16:07:46 EDT
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

