Red Hat Bugzilla – Bug 814779
Handle lvmetad/udev in a cluster
Last modified: 2015-09-29 17:39:23 EDT
As currently coded, in a cluster, lvmetad would run independently on each node.
Certain events happening on one node would cause the cache contents on another node to be wrong.
Deal with this.
I've heard a suggestion that we could possibly do without propagating events to other nodes. We have to be very careful here, as the events do not originate only in the kernel as a result of processing certain dm ioctls. Any user can call "udevadm trigger" at any time, or add the WATCH udev rule (though this rule is removed for dm devices in current RHEL versions); we can't prohibit that in general. This generates (artificial) events, and all the rules are reapplied (together with the "pvscan --cache" call that updates lvmetad), which means lvmetad could now see something different: a new state.
We have to be careful if we go the "no event propagation" way. We need to make sure that the states of all lvmetad instances on all nodes are always in sync!
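The divergence risk can be shown with a toy model in which each node keeps its own cache and refreshes it only when it sees an event locally. This is a sketch, not lvmetad code: the `Node` class, `rescan()`, `in_sync()` and the idea of tracking a VG as a bare metadata sequence number are all hypothetical simplifications.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """A cluster node running its own, independent lvmetad-style cache (toy model)."""
    name: str
    cache: dict = field(default_factory=dict)

    def rescan(self, disk: dict) -> None:
        # Models a udev event on this node only (kernel-originated or produced by
        # "udevadm trigger"): the rules rerun and refresh the local cache from disk.
        self.cache = dict(disk)


def in_sync(nodes: list) -> bool:
    """True only if every node caches exactly the same state."""
    return all(n.cache == nodes[0].cache for n in nodes)


# Shared storage, modelled as VG name -> metadata sequence number (an assumption).
shared_disk = {"vg1": 1}
a, b = Node("a"), Node("b")
a.rescan(shared_disk)
b.rescan(shared_disk)
print(in_sync([a, b]))  # True

shared_disk["vg1"] = 2  # VG metadata changes on shared storage
a.rescan(shared_disk)   # ...but only node a sees an event
print(in_sync([a, b]))  # False: node b still caches seqno 1
```

Without some form of propagation, nothing in this model ever brings node b back in line with node a.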
(In reply to comment #1)
> I've heard that there was a suggestion that we could possibly do without event
> propagation to other nodes. We have to be very careful here as the events are
> not originated in kernel itself as a result of processing certain dm ioctl
(...or by updating the state based on the current "cluster locking" scheme)
This request was not resolved in time for the current release.
Red Hat invites you to ask your support representative to
propose this request, if still desired, for consideration in
the next release of Red Hat Enterprise Linux.
This request was erroneously removed from consideration in Red Hat Enterprise Linux 6.4, which is currently under development. This request will be evaluated for inclusion in Red Hat Enterprise Linux 6.4.
*** Bug 799859 has been marked as a duplicate of this bug. ***
I have been thinking about this, and this is what comes to mind:
1) when clvmd is running, it "takes over" the lvmetad socket; there are a few ways to achieve this:
- clients know that clvmd is running and use an alternate socket to talk to lvmetad
- clvmd is started in place of lvmetad, opens its usual socket and lvmetad is started on an alternate socket that only clvmd will use
2) all requests on the lvmetad socket are intercepted by clvmd and passed on unchanged to lvmetad, which does its normal processing and replies to clvmd. A new field is added to lvmetad responses, something like "needs_propagating" (0 or 1). If it is 1, clvmd broadcasts the original request, exactly as intercepted, to all other clvmd instances, and each of them passes it on to its local lvmetad instance. In either case, the reply is forwarded back to the original client.
This means that clvmd implements all the transport logic and acts as a relatively dumb transport, while lvmetad retains knowledge of its own protocol and of what it caches and how. On the other hand, lvmetad needs to know which state is local (devices) and which is global (VGs).
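The proxy scheme from 2) can be sketched in a few lines. This is an illustration under assumptions, not the real protocol: the `Lvmetad` and `Clvmd` classes, the request names "vg_update" and "pv_found", and the choice that VG updates are global while device sightings are local are all hypothetical stand-ins for whatever lvmetad would actually decide.

```python
from dataclasses import dataclass, field


@dataclass
class Lvmetad:
    """Stand-in for one node's lvmetad; it alone decides what is global."""
    vgs: dict = field(default_factory=dict)
    devices: set = field(default_factory=set)

    def handle(self, request: dict) -> dict:
        if request["cmd"] == "vg_update":
            self.vgs[request["vgname"]] = request["metadata"]
            # VG metadata is global state: ask the transport to propagate.
            return {"response": "OK", "needs_propagating": 1}
        if request["cmd"] == "pv_found":
            self.devices.add(request["device"])
            # Device visibility is local state: keep it on this node.
            return {"response": "OK", "needs_propagating": 0}
        return {"response": "failed", "needs_propagating": 0}


class Clvmd:
    """Dumb transport: passes every request to the local lvmetad unchanged."""

    def __init__(self, lvmetad: Lvmetad):
        self.lvmetad = lvmetad
        self.peers = []  # the other nodes' Clvmd instances

    def deliver(self, request: dict) -> None:
        # Called by a peer: replay the broadcast request on the local lvmetad.
        self.lvmetad.handle(request)

    def client_request(self, request: dict) -> dict:
        reply = self.lvmetad.handle(request)
        if reply.get("needs_propagating") == 1:
            for peer in self.peers:
                peer.deliver(request)
        return reply  # forwarded back to the original client


nodes = [Clvmd(Lvmetad()) for _ in range(3)]
for n in nodes:
    n.peers = [p for p in nodes if p is not n]

nodes[0].client_request({"cmd": "vg_update", "vgname": "vg1", "metadata": "seqno=2"})
nodes[0].client_request({"cmd": "pv_found", "device": "/dev/sdb"})
print([n.lvmetad.vgs for n in nodes])      # the VG update reached every node
print([n.lvmetad.devices for n in nodes])  # the device stays local to node 0
```

Note that `Clvmd` never inspects the request contents; all protocol knowledge stays in `Lvmetad.handle()`, which is the trade-off described above.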
An alternative to 2) would be to include more detailed information about "what changed" in the response, such as a list of the affected VGs and how each was affected (maybe by extending the current status = complete mechanism). This might find other uses, and it would require less cluster knowledge in lvmetad (which shouldn't need to know much about the cluster).
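A minimal sketch of this alternative, assuming a hypothetical "changed_vgs" field in the reply: lvmetad only reports which VGs changed and how, and the cluster policy lives entirely in clvmd. The handler, the field name and the "invalidate the peers' copies" policy in `propagate()` are illustrative assumptions; the proposal does not specify what clvmd would do with the list.

```python
def handle(vgs: dict, request: dict) -> dict:
    """Toy lvmetad handler: reports which VGs changed and how, nothing cluster-specific."""
    changed = []
    name = request["vgname"]
    if request["cmd"] == "vg_update":
        how = "updated" if name in vgs else "added"
        vgs[name] = request["metadata"]
        changed.append({"vgname": name, "change": how})
    elif request["cmd"] == "vg_remove" and vgs.pop(name, None) is not None:
        changed.append({"vgname": name, "change": "removed"})
    return {"response": "OK", "changed_vgs": changed}


def propagate(reply: dict, peer_caches: list) -> None:
    # clvmd side (assumed policy): drop each changed VG from every peer's cache
    # so the peer re-reads it from disk on its next lookup.
    for change in reply["changed_vgs"]:
        for cache in peer_caches:
            cache.pop(change["vgname"], None)


local = {"vg1": "seqno=1"}
peers = [{"vg1": "seqno=1"}, {"vg1": "seqno=1", "vg2": "seqno=5"}]
reply = handle(local, {"cmd": "vg_update", "vgname": "vg1", "metadata": "seqno=2"})
print(reply["changed_vgs"])  # [{'vgname': 'vg1', 'change': 'updated'}]
propagate(reply, peers)
print(peers)                 # [{}, {'vg2': 'seqno=5'}]
```

Because the change list is generic, the same field could serve non-cluster consumers too, which is the "other uses" point above.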
This request was evaluated by Red Hat Product Management for
inclusion in the current release of Red Hat Enterprise Linux.
Since we are unable to provide this feature at this time,
it has been proposed for the next release of
Red Hat Enterprise Linux.
In 6.4, this is *not* supported. I have made the code disable lvmetad when locking_type is set to 3 and warn the user about it (log_warn). The patch implementing that change is b248ba0..2fdd084.
The warning message we issue:
"WARNING: configuration setting use_lvmetad overriden to 0 due to locking_type 3. Clustered environment not supported by lvmetad yet."
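The override described above amounts to a simple rule on the effective configuration. The real change lives in LVM2's C code (b248ba0..2fdd084); the Python below only mirrors the described behaviour, and `effective_use_lvmetad()` and the dict-shaped config are hypothetical. The defaults used (use_lvmetad 0, locking_type 1) are assumptions for the sketch, and the message text is copied verbatim, including its spelling.

```python
import sys

WARNING = (
    "WARNING: configuration setting use_lvmetad overriden to 0 due to "
    "locking_type 3. Clustered environment not supported by lvmetad yet."
)


def effective_use_lvmetad(config: dict) -> int:
    """Hypothetical helper: the use_lvmetad value actually in effect."""
    use_lvmetad = config.get("use_lvmetad", 0)
    if use_lvmetad and config.get("locking_type", 1) == 3:
        # Clustered locking: lvmetad is not cluster-aware yet, so turn it off
        # and tell the user (the real code uses log_warn).
        print(WARNING, file=sys.stderr)
        return 0
    return use_lvmetad


print(effective_use_lvmetad({"use_lvmetad": 1, "locking_type": 3}))  # 0, plus warning
print(effective_use_lvmetad({"use_lvmetad": 1, "locking_type": 1}))  # 1
```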
This issue should be mentioned in the Known Issues section of the RHEL 6.4 documentation.
We don't have a concrete design yet, setting devel NAK.
Moving to 6.6 for consideration.
Moving to 6.7 for consideration.