Red Hat Bugzilla – Bug 208993
need to return proper failed error code when exclusive lock attempt fails
Last modified: 2010-07-07 07:15:09 EDT
Description of problem:
I took out the exclusive activation lock on link-01 and then attempted to also
grab that lock on link-02. That command failed, as it should; however, the exit
code was still 0.
[root@link-02 lib]# vgchange -aye
Error locking on node link-02: Resource temporarily unavailable
0 logical volume(s) in volume group "linear_1_5844" now active
[root@link-02 lib]# echo $?
0
Part of the problem here, I think, is that vgchange can affect multiple volume
groups, so some could have been activated and others not.
This isn't specific to clustered groups either. Perhaps we need a particular
returned error code that indicates whether some groups have failed to activate.
Remember that vgchange -ay is shorthand for:

  for each VG on the command line (or all VGs if none given)
    for each LV in that VG
      lvchange -ay VG/LV
Conventionally only the most severe error from any of the constituent commands
(lvchange here) gets reported.
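That "most severe error wins" convention can be sketched as a small shell helper. The function name and the commented loop are invented for illustration, not taken from the lvm2 sources:

```shell
# worst_rc: fold a list of exit codes into the single code a command like
# vgchange conventionally reports - the highest (most severe) one wins.
# The function name is invented for this sketch.
worst_rc() {
    worst=0
    for rc in "$@"; do
        [ "$rc" -gt "$worst" ] && worst=$rc
    done
    return "$worst"
}

# A vgchange -ay style loop would then look roughly like:
#   codes=""
#   for lv in ...; do lvchange -ay "$vg/$lv"; codes="$codes $?"; done
#   worst_rc $codes
```

Under this convention, one failed lvchange among many successes is enough to make the whole command's exit code non-zero.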
In precisely what circumstances do you want an error?
In a previous case the argument that won was that a command that changes
something should report an error if the *change* was not possible because the
entity was already in the state requested. That would mean if any referenced LV
is *already* active, the command should return an error because the attempt to
change it into the active state failed. [My preference was for error codes to
reflect whether or not the final state was reached, regardless of whether or not
anything had to change.] In a cluster it's even more complicated because of the
way tags control the activation, and the need to query the lock status.
First, since the command didn't fall under either of the cases listed in comment
#2, an error should be reported.
The command attempted to change the state of an entity, which was not currently
in that state (at least not on that node), and it failed to do so. The final
state (being active) was never reached.
But to answer your question, if the volume was already in the exclusive active
state, and you issue that command again, I wouldn't expect an error because the
final state was reached. However, the previous case that you refer to (bz
179473) where the entity was already "in the state requested" (or removed) and I
argued the opposite should happen, doesn't really apply to this case. In that
case I was trying to manipulate an entity (PV) that no longer existed, so that
would be like attempting to deactivate a nonexistent vg. Should that fail? Well,
technically a nonexistent vg isn't active, so the command did technically work. :)
[root@link-02 tmp]# vgchange -an foo
Volume group "foo" not found
[root@link-02 tmp]# echo $?
It does the right thing and gives the error. :)
Here's what you see with the latest:
[root@link-08 lvm]# vgchange -ae
Error locking on node link-08: Volume is busy on another node
1 logical volume(s) in volume group "vg" now active
[root@link-08 lvm]# echo $?
0
I still argue that since the exclusive lock wasn't obtained, and since an error
message was even given, a non-zero exit code should be returned as well.
[root@link-08 lvm]# rpm -qa | grep lvm2
Yes, I know. Only the error message has changed, as per bug 162809.
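The reason the exit code matters for callers can be sketched with a hypothetical wrapper (the function name and usage are illustrative): a failover script has nothing but the exit status to go on, so a zero code on failure silently breaks it.

```shell
# activate_exclusive VG: minimal wrapper a cluster failover script might
# use. It propagates vgchange's exit status unchanged, because that status
# is the script's only way to detect that the exclusive lock was refused.
activate_exclusive() {
    vgchange -aey "$1"
}
```

If vgchange exits 0 even when the lock attempt fails, the caller proceeds as if activation succeeded.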
Created attachment 144716 [details]
Idea for an improvement
Here's a patch I did a while ago that does this. The most controversial part of
it (I suspect) will be the shifting up of the return codes; this is needed to
keep INCOMPLETE in its rightful place in the severity stakes.
Email response from Alasdair:
> I don't see that a new error code gains us anything there:
> If operating on multiple objects and you need to know which ones did or
> didn't succeed, then you simply perform the operations separately.
> Only use commands that operate on multiple objects when you aren't
> interested in knowing.
> The bugzilla referenced is discussing "What precisely should -ae" mean?
> Should it be different from "-ael" ? Under precisely what sets of
> circumstances should it return an error, and based on the final state
> or whether a change happened?
> The man page says:
> If clustered locking is enabled, -ae will activate exclusively
> on one node and -aly will activate only on the local node.
> Note that it does *not* say -ae will activate exclusively on the
> *local* node.
I'm assigning this to him because nothing I do is going to get past this argument.
This request was evaluated by Red Hat Product Management for inclusion in a Red
Hat Enterprise Linux maintenance release. Product Management has requested
further review of this request by Red Hat Engineering, for potential
inclusion in a Red Hat Enterprise Linux Update release for currently deployed
products. This request is not yet committed for inclusion in an Update release.
Since RHEL 4.8 External Beta has begun, and this bugzilla remains
unresolved, it has been rejected as it is not proposed as an exception or
blocker.
Still unresolved? But since this was originally raised, we have added code that can check the lock state.
So can it now, with -aey:

- check the lock state to see if it is already active exclusively on any node,
  and return success if so - no error message?
- if it is already active but not exclusively, issue the exclusive activation
  request in such a way that if it is active on exactly one node that lock will
  be changed to exclusive, but if it is active on multiple nodes, there'll be
  an error?
- if it is not already active on any node, try exclusive activation on the
  local node first; but if it's filtered so nothing happens, issue it to all
  nodes, ignoring errors provided that one node succeeds?
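The three cases above can be sketched as a decision helper. The function name and the state strings are hypothetical, invented for this sketch; the real implementation would query the cluster lock manager rather than take a string:

```shell
# decide_aey STATE: print what a hypothetical -aey implementation could do
# for a given cluster-wide activation state. State names are invented.
decide_aey() {
    case "$1" in
        exclusive)   echo "success" ;;          # already exclusive somewhere: no error
        active-one)  echo "upgrade-lock" ;;     # convert the single shared lock to exclusive
        active-many) echo "error" ;;            # active on several nodes: cannot be exclusive
        inactive)    echo "activate-local" ;;   # try locally first, then other nodes
        *)           echo "unknown"; return 1 ;;
    esac
}
```

The point of the sketch is that each branch has a well-defined success or failure outcome, which in turn determines the exit code the command should return.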
This should be solved by this commit:
Version 2.02.46 - 21st May 2009
Detect LVs active on remote nodes by querying locks if supported.
But this version is not planned for RHEL4; it should already be fixed in RHEL5 and above.