Bug 208993 - need to return proper failed error code when exclusive lock attempt fails
Summary: need to return proper failed error code when exclusive lock attempt fails
Keywords:
Status: CLOSED NEXTRELEASE
Alias: None
Product: Red Hat Enterprise Linux 4
Classification: Red Hat
Component: lvm2
Version: 4.0
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: Alasdair Kergon
QA Contact: Cluster QE
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2006-10-02 21:13 UTC by Corey Marthaler
Modified: 2010-07-07 11:15 UTC
CC List: 7 users

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Clone Of:
Environment:
Last Closed: 2010-07-07 11:15:09 UTC
Target Upstream Version:
Embargoed:


Attachments
Idea for an improvement (3.04 KB, patch)
2007-01-03 16:03 UTC, Christine Caulfield

Description Corey Marthaler 2006-10-02 21:13:56 UTC
Description of problem:
I took out the exclusive activation lock on link-01 and then attempted to also
grab that lock on link-02. That command failed, as it should, but the error
code was still 0.

[root@link-02 lib]# vgchange -aye
  Error locking on node link-02: Resource temporarily unavailable
  0 logical volume(s) in volume group "linear_1_5844" now active
[root@link-02 lib]# echo $?
0

Version-Release number of selected component (if applicable):
lvm2-cluster-2.02.06-7.0.RHEL4

Comment 1 Christine Caulfield 2006-10-04 12:15:41 UTC
Part of the problem here, I think, is that vgchange can affect multiple volume
groups, so some could have been activated and others not.

This isn't specific to clustered groups either. Perhaps we need a particular
return code that indicates that some groups failed to be activated.


Comment 2 Alasdair Kergon 2006-10-04 14:29:43 UTC
Remember that vgchange -ay is shorthand for:

  for each VG on the command line (or all VGs if none given)
    for each LV in that VG
       lvchange -ay VG/LV

Conventionally only the most severe error from any of the constituent commands
(lvchange here) gets reported.
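
For illustration, that expansion and the "most severe error wins" convention
might look roughly like this as a shell sketch (the VG names are placeholders,
and "higher exit code = more severe" is an assumption for the illustration,
not how the lvm2 tools are actually implemented):

  rc=0
  for vg in vg1 vg2; do
      # lvs --noheadings -o lv_name lists the LVs in a VG, one per line
      for lv in $(lvs --noheadings -o lv_name "$vg"); do
          lvchange -ay "$vg/$lv"
          status=$?
          # keep only the worst status seen so far
          [ "$status" -gt "$rc" ] && rc=$status
      done
  done
  exit $rc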

In precisely what circumstances do you want an error?

In a previous case the argument that won was that a command that changes
something should report an error if the *change* was not possible because the
entity was already in the state requested.  That would mean if any referenced LV
is *already* active, the command should return an error because the attempt to
change it into the active state failed.  [My preference was for error codes to
reflect whether or not the final state was reached, regardless of whether or not
anything had to change.]  In a cluster it's even more complicated because of the
way tags control the activation, and the need to query the lock status.


Comment 3 Corey Marthaler 2006-10-04 19:25:26 UTC
First, since the command didn't fall under either of the cases listed in comment
#2, an error should be reported. 

The command attempted to change the state of an entity, which was not currently
in that state (at least not on that node), and it failed to do so. The final
state (being active) was never reached.


But to answer your question, if the volume was already in the exclusive active
state and you issued that command again, I wouldn't expect an error, because the
final state was reached. However, the previous case that you refer to (bz
179473), where the entity was already "in the state requested" (or removed) and I
argued the opposite should happen, doesn't really apply to this case. In that
case I was trying to manipulate an entity (a PV) that no longer existed, so that
would be like attempting to deactivate a nonexistent VG. Should that fail? Well,
technically a nonexistent VG isn't active, so the command did technically work. :)

[root@link-02 tmp]# vgchange -an foo
  Volume group "foo" not found
[root@link-02 tmp]# echo $?
5

It does the right thing and gives the error. :) 


Comment 4 Corey Marthaler 2006-11-01 17:33:01 UTC
Here's what you see with the latest:

[root@link-08 lvm]#  vgchange -ae
  Error locking on node link-08: Volume is busy on another node
  1 logical volume(s) in volume group "vg" now active
[root@link-08 lvm]# echo $?
0

I still argue that since the exclusive lock wasn't obtained, and since an error
message was even printed, a non-zero error code should be returned as well.

[root@link-08 lvm]# rpm -qa | grep lvm2
lvm2-cluster-2.02.13-1
lvm2-2.02.13-1


Comment 5 Christine Caulfield 2006-11-01 17:51:27 UTC
Yes, I know. Only the error message has changed, as per bug 162809.

Comment 6 Christine Caulfield 2007-01-03 16:03:35 UTC
Created attachment 144716 [details]
Idea for an improvement

Here's a patch I did a while ago that does this. The most controversial part of
it (I suspect) will be the shifting up of the return codes; this is needed to
keep INCOMPLETE in its rightful place in the severity stakes.
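
To illustrate the severity ordering being discussed (the names and values
below are assumptions for illustration only, not taken from the attached
patch):

  # assumed exit codes, higher number = more severe, and the worst one wins
  ECMD_PROCESSED=1     # assumed: everything requested was done
  ECMD_INCOMPLETE=4    # assumed: some objects were processed, some were not
  ECMD_FAILED=5        # assumed: the command failed outright
  # slotting INCOMPLETE between "all done" and "failed" is what can force the
  # existing codes above it to shift up to make room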

Comment 7 Christine Caulfield 2007-01-08 09:48:15 UTC
Email response from Alasdair:

> I don't see that a new error code gains us anything there:
>
> If operating on multiple objects and you need to know which ones did or
> didn't succeed, then you simply perform the operations separately.
>
> Only use commands that operate on multiple objects when you aren't
> interested in knowing.
>
> The bugzilla referenced is discussing "What precisely should -ae mean?"
> Should it be different from "-ael" ?  Under precisely what sets of
> circumstances should it return an error, and based on the final state
> or whether a change happened?
>
> The man page says:
>    If clustered locking is enabled, -ae will  activate  exclusively
>    on  one  node and -aly will activate only on the local node.
>
> Note that it does *not* say -ae will activate exclusively on the
> *local* node.
>
> Alasdair

I'm assigning this to him because nothing I do is going to get past this argument.
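
In the meantime, Alasdair's suggestion of operating on the objects separately
could be scripted roughly as below (the VG names are placeholders); each
vgchange invocation then carries its own exit code, though that only helps
once the exit code actually reflects the lock failure, which is what this bug
is asking for:

  for vg in vg1 vg2 vg3; do
      vgchange -aey "$vg"
      status=$?
      if [ "$status" -ne 0 ]; then
          echo "exclusive activation failed for $vg (exit $status)" >&2
      fi
  done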

Comment 11 RHEL Program Management 2008-09-05 17:12:35 UTC
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release.  Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products.  This request is not yet committed for inclusion in an Update
release.

Comment 14 RHEL Program Management 2009-03-12 19:32:38 UTC
Since the RHEL 4.8 External Beta has begun and this bugzilla remains
unresolved, it has been rejected, as it is not proposed as an exception or
a blocker.

Comment 15 Alasdair Kergon 2010-05-17 12:41:40 UTC
Still unresolved?  But since this was originally raised, we have added code
that can check the lock state.

So, with -aey, can it now:

  check the lock state to see whether it is already active exclusively on any node, and return success (with no error message) if so?

  if it is already active but not exclusive, issue the exclusive activation request in such a way that if it is active on exactly one node that lock is changed to exclusive, but if it is active on multiple nodes there is an error?

  if it is not already active on any node, try exclusive activation on the local node first, but if that is filtered out so nothing happens, issue it to all nodes, ignoring errors provided that one node succeeds?
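
As a rough sketch of that decision flow in shell form (the helper functions
are hypothetical placeholders for the lock and state queries that lvm2 would
make through clvmd; this illustrates the proposed logic, not the real
implementation):

  activate_exclusive() {
      lv=$1
      # step 1: already exclusively active somewhere -> nothing to do
      if held_exclusively_somewhere "$lv"; then
          return 0
      fi
      nodes=$(count_nodes_with_lv_active "$lv")
      if [ "$nodes" -gt 1 ]; then
          # active on several nodes -> exclusivity is impossible, report it
          echo "cannot make $lv exclusive: active on $nodes nodes" >&2
          return 5                      # assumed non-zero failure code
      elif [ "$nodes" -eq 1 ]; then
          # active on exactly one node -> upgrade that node's lock
          convert_lock_to_exclusive "$lv"
      else
          # not active anywhere -> try the local node first, then any node
          try_exclusive_locally "$lv" || try_exclusive_on_any_node "$lv"
      fi
  }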

Comment 16 Milan Broz 2010-07-07 11:15:09 UTC
This should be solved by this commit:

Version 2.02.46 - 21st May 2009
===============================
  Detect LVs active on remote nodes by querying locks if supported.

But this version is not planned for RHEL 4; it should already be fixed in RHEL 5 and above.

