Bug 364081 - 'cluster request failed' during vgs command
Summary: 'cluster request failed' during vgs command
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.0
Hardware: All
OS: Linux
Priority: urgent
Severity: high
Target Milestone: rc
: ---
Assignee: Alasdair Kergon
QA Contact: Cluster QE
URL:
Whiteboard: GSSApproved
Depends On:
Blocks: 362691 386831 386841 399361
 
Reported: 2007-11-02 15:43 UTC by Corey Marthaler
Modified: 2008-07-24 20:07 UTC (History)

Fixed In Version: RHBA-2008-0776
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2008-07-24 20:07:46 UTC
Target Upstream Version:
Embargoed:


Attachments
this is the clvmd -d output while this cmd runs (24.66 KB, text/plain)
2007-11-02 15:45 UTC, Corey Marthaler


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHBA-2008:0776 0 normal SHIPPED_LIVE lvm2 bug fix and enhancement update 2008-07-23 17:19:45 UTC

Description Corey Marthaler 2007-11-02 15:43:00 UTC
Description of problem:
This happens on my 6 node cluster and may be somehow related to bz 362691.
Every time I run the following cmd, I see the message 'cluster request
failed: Invalid argument'

[root@link-08 ~]# vgs mirror_sanity --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument
  /dev/sda1
  /dev/sdb1
  /dev/sdc1
  /dev/sdd1
  /dev/sde1
  /dev/sdf1
  /dev/sdg1


Version-Release number of selected component (if applicable):
2.6.9-65.BRsmp
lvm2-cluster-2.02.27-2.el4

Comment 1 Corey Marthaler 2007-11-02 15:44:55 UTC
[root@link-08 ~]# cman_tool nodes
Node  Votes Exp Sts  Name
   1    1    6   M   link-02
   2    1    6   M   grant-01
   3    1    6   M   link-07
   4    1    6   M   grant-03
   5    1    6   M   link-08
   6    1    6   M   grant-02

Comment 2 Corey Marthaler 2007-11-02 15:45:51 UTC
Created attachment 246851
this is the clvmd -d output while this cmd runs

Comment 3 Christine Caulfield 2007-11-02 16:11:02 UTC
It looks like there are several attempts to unlock the VG 'V_mirror_sanity'. The
first one succeeded but subsequent ones fail because it is already unlocked.

Comment 4 Corey Marthaler 2007-11-06 15:59:49 UTC
Just a note that this happens on a plain 2.6.9-65.ELsmp kern and is not related
to the patch for bz 290821.

Comment 5 Corey Marthaler 2007-11-13 20:40:29 UTC
Looks like this bz is much more serious than first thought. Any lvm create/remove
op hangs after running this vgs cmd. In fact, it appears to be the cause of bz 362691.

[root@hayes-03 etherd!e1.1]# vgs hayes --noheadings -o pv_name -O pv_size
  cluster request failed: Invalid argument
  /dev/etherd/e1.1p1
  /dev/etherd/e1.1p2
  /dev/etherd/e1.1p3
[root@hayes-03 etherd!e1.1]# lvcreate -L 10G hayes


Comment 6 Kiersten (Kerri) Anderson 2007-11-15 15:32:29 UTC
Changing component.  The bug is in lvm not lvm2-cluster.

Comment 7 Kiersten (Kerri) Anderson 2007-11-15 15:34:22 UTC
The fix for this in lvm2 needs to be escalated and released as a z-stream/day
zero errata for 4.6, as it affects all lvm operations:

<visegrips> kanderso, my feature?  It's blocking everything... linear creates,
mirror creates, (and removes), everything... and it boils down to this one
reporting bug.

Comment 8 Jonathan Earl Brassow 2007-11-15 16:05:54 UTC
This bug represents the simplest way to show the problem, but it manifests
itself in other ways too, like:

bug 362691:  This should probably be marked as a duplicate of this bug -
currently, it is marked as a dependent.


Comment 9 Alasdair Kergon 2007-11-15 16:49:21 UTC
I agree that this deserves an erratum.

My quick analysis:

The bug has been present for a long time, but, fortunately, only a few code 
paths are affected.  The problem is broader than the 'vgs' command mentioned 
here BTW - the following commands should also be included in testing: 
pvdisplay, pvresize, vgreduce.

On single-host lvm2, the problem would remain invisible insofar as the 
affected commands would appear to work correctly, but they would actually fail 
to prevent conflicting commands from being run concurrently, leaving open the 
(albeit remote) possibility of metadata corruption.  So it is still important 
to fix this.

On clustered lvm2, as shown in these bugzillas, clvmd hanging is a more likely 
failure mode, and so this fix is essential.  What has changed is that 
mirroring code is regularly trying to make use of one of these vulnerable code 
paths.

Comment 10 Alasdair Kergon 2007-11-15 17:20:49 UTC
Testing:

In every case, run the commands with -vvvv and ensure that the 'Locking X' 
and 'Unlocking X' lines alternate.

'Locking X; Locking Y; Unlocking Y; Unlocking X' is OK.
'Locking X; Unlocking X; Locking X; Unlocking X' is OK.

'Locking X; Locking X; Unlocking X; Unlocking X' reveals the bug.



Comment 11 Alasdair Kergon 2007-11-15 17:22:08 UTC
where X and Y refer to volume groups and begin with P_ or V_.
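
The pairing rule from Comment 10 can be checked mechanically. Below is a minimal Python sketch, not part of lvm2. It assumes each relevant -vvvv line contains the word 'Locking' or 'Unlocking' followed somewhere on the same line by a lock name beginning with P_ or V_; the exact debug line format is an assumption, and the helper name find_lock_errors is hypothetical.

```python
import re

# Assumption: a relevant -vvvv line contains "Locking" or "Unlocking",
# followed somewhere on the same line by a lock name starting with P_ or V_.
# Adjust the pattern if the real debug output is formatted differently.
LOCK_RE = re.compile(r"\b(Locking|Unlocking)\b.*?\b([PV]_[A-Za-z0-9+_.\-]+)")


def find_lock_errors(lines):
    """Return (line_number, message) tuples for unpaired lock operations.

    Nested locks on different names and repeated lock/unlock cycles on the
    same name are accepted. Locking a name that is already held, unlocking a
    name that is not held, or leaving a name held at the end is reported.
    """
    held = set()
    errors = []
    for lineno, line in enumerate(lines, start=1):
        match = LOCK_RE.search(line)
        if not match:
            continue
        action, name = match.groups()
        if action == "Locking":
            if name in held:
                errors.append((lineno, f"{name} locked while already locked"))
            held.add(name)
        else:
            if name not in held:
                errors.append((lineno, f"{name} unlocked while not locked"))
            held.discard(name)
    # Anything still held was never unlocked; no single line to blame.
    for name in sorted(held):
        errors.append((None, f"{name} never unlocked"))
    return errors
```

To use it, capture the -vvvv output of a command to a file and pass the open file (or its lines) to find_lock_errors; an empty list means the Locking/Unlocking lines are correctly paired under the assumption above.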

Comment 12 Alasdair Kergon 2007-11-16 03:11:31 UTC
Try:
  lvm2-2.02.27-4.el4
  lvm2-cluster-2.02.27-4.el4

Note that the code paths involved affect the following commands:
  pvdisplay
  pvresize
  pvs  (including --segments)
  vgs  (with -o+ pv fields, like pv_name, pvseg_start)
  vgdisplay
  vgreduce  (incl --removemissing)

Test these (as appropriate) with PVs that are inside VGs and PVs that are not.
Also test with PVs (inside VGs) that have been tagged e.g.
   pvchange --addtag tag1 <PV>
then 
   vgs -o +pv_name @tag1

Check with VGs that are clustered and those that are not.

Check the exit status of the commands remains sensible, including the queries
run on local VGs during machine boot before the cluster infrastructure has
started up.
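
One possible way to drive the read-only part of this list and record exit statuses, as a rough Python sketch only. The helper names build_test_commands and run_all are hypothetical, the VG and PV arguments are placeholders to be supplied by the tester, and pvresize and vgreduce are deliberately left out because they modify metadata.

```python
import subprocess


def build_test_commands(vg, pv):
    """Return argv lists for the read-only commands listed above, with -vvvv.

    vg and pv are supplied by the caller (a VG name and a PV device path).
    """
    return [
        ["pvdisplay", "-vvvv", pv],
        ["pvs", "-vvvv"],
        ["pvs", "--segments", "-vvvv"],
        ["vgs", "-vvvv", "-o", "+pv_name,pvseg_start", vg],
        ["vgdisplay", "-vvvv", vg],
    ]


def run_all(commands):
    """Run each command and return (argv, exit_status, combined_output)."""
    results = []
    for argv in commands:
        proc = subprocess.run(argv, capture_output=True, text=True)
        # Keep stdout and stderr together so Locking/Unlocking lines can be
        # searched afterwards regardless of which stream they were written to.
        results.append((argv, proc.returncode, proc.stdout + proc.stderr))
    return results
```

Repeat the run for clustered and non-clustered VGs, and for PVs inside and outside VGs, then compare the exit statuses and grep the captured output for the Locking/Unlocking messages.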


Comment 13 Alasdair Kergon 2007-11-16 03:15:19 UTC
And as described higher up, run the tests with -vvvv (or equivalent lvm.conf
logging enabled) and grep for the Locking/Unlocking messages to check they are
paired correctly.  (I'll think about doing an upstream patch to detect those
errors automatically.)

Comment 14 Alasdair Kergon 2007-11-16 03:20:40 UTC
(The tag example above currently returns no output BTW, because vgs looks for a
VG tag, not a PV tag, but pvs @tag1 will return it. I might change that one day;
I think vgs ought to find it.)

Comment 18 Corey Marthaler 2007-12-04 19:38:22 UTC
This bug has been verified fixed in lvm2-2.02.27-4.el4/lvm2-cluster-2.02.27-4.el4.

Comment 20 errata-xmlrpc 2008-07-24 20:07:46 UTC
An advisory has been issued which should help the problem
described in this bug report. This report is therefore being
closed with a resolution of ERRATA. For more information
on the solution and/or where to find the updated files,
please follow the link below. You may reopen this bug report
if the solution does not work for you.

http://rhn.redhat.com/errata/RHBA-2008-0776.html

