Bug 1854675 - [RFE] a pacemaker agent that detects if a volume is lost or becomes inaccessible
Summary: [RFE] a pacemaker agent that detects if a volume is lost or becomes inaccessible
Keywords:
Status: CLOSED DUPLICATE of bug 2055299
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: resource-agents
Version: 8.2
Hardware: All
OS: Linux
Priority: medium
Severity: medium
Target Milestone: rc
Target Release: 8.4
Assignee: Oyvind Albrigtsen
QA Contact: cluster-qe@redhat.com
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2020-07-08 00:38 UTC by Seunghwan Jung
Modified: 2023-08-02 13:55 UTC (History)
CC: 7 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2023-08-02 13:54:43 UTC
Type: Feature Request
Target Upstream Version:
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Knowledge Base (Solution) 3495091 0 None None None 2020-07-29 17:39:36 UTC

Description Seunghwan Jung 2020-07-08 00:38:35 UTC
Description of problem:

A customer requests a new feature that lets a Pacemaker cluster detect when a mounted volume, such as "/", is lost or becomes inaccessible. It could be implemented as a resource agent like 'ethmonitor' or in some other way.

Version-Release number of selected component (if applicable):

The customer is running RHEL 8.1 E4S


How reproducible:
n/a

Steps to Reproduce:
n/a

Actual results:
When a node loses its root volume, the cluster does not detect the loss; the node keeps running.

Expected results:

An agent or cluster configuration exists with which the cluster can be configured to detect when a volume is lost or becomes inaccessible.

Additional info:

Comment 2 Reid Wahl 2020-07-08 05:02:05 UTC
A couple of considerations:
  1) It sounds like we want to monitor a filesystem rather than monitoring a disk. This distinction is important because a disk could be used without a filesystem (e.g., a raw disk for an Oracle database).
  2) If we want to monitor filesystems (e.g., the root filesystem), then do we even need a new resource agent? It's worth considering whether we could add a new "monitor_only" attribute to the Filesystem resource agent. Then a user can set:

        # pcs resource create <rsc_name> ocf:heartbeat:Filesystem monitor_only=true op monitor interval=<interval> timeout=<timeout> OCF_CHECK_LEVEL=20 on-fail=fence

     With `monitor_only=true`, the resource agent would not mount the filesystem during the start operation but would perform the monitor operation.
     With `op monitor ... OCF_CHECK_LEVEL=20 on-fail=fence` (existing options), the monitor operation would perform a read/write test. The node would self-fence if the filesystem is not mounted or if I/O fails.


kgaillot said in parent BZ 1854340:

> It is expected behavior that loss of the root volume (or any other disk volume) 
> will not be detected automatically. If that is desired, a cluster resource must 
> be configured to monitor it. There is no agent currently available for that 
> purpose, so it would have to be written (you could create an RFE BZ for that if 
> desired). Such an agent would work like the ocf:heartbeat:ethmonitor agent -- 
> it would not mount and unmount the root volume (as ocf:heartbeat:Filesystem 
> would), but would only run a recurring monitor and set a node attribute 
> accordingly. That node attribute could then be used in location constraints if 
> you want to move resources away from a node that loses the monitored disk 
> volume.
> 
> Without such a resource, Pacemaker will detect a lost disk volume only if it 
> needs to write to that volume for its own purposes, or if a resource monitor 
> that depends on the volume fails. This should eventually happen if 
> /var/lib/pacemaker is on the volume. The DC node will eventually attempt to 
> write the current CIB to disk, which will fail, then Pacemaker on the DC should 
> immediately exit without restarting, and the other nodes should take over and 
> fence the former DC.

There is a difference between kgaillot's suggestion and the approach that I suggested above.
  - My suggested approach would cause a node to self-fence if the monitor fails (when `on-fail=fence` is set), which would initiate recovery.
  - kgaillot's approach would change the value of a Pacemaker node attribute if the monitor fails. Then the cluster would initiate recovery in whatever way the user configured it to do so.

The attribute approach seems more flexible, but if we can solve the problem with an existing resource agent, then maybe that's desirable.
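
For reference, the attribute-based pattern kgaillot describes would look roughly like this in pcs terms. The agent name `diskmonitor` and attribute name `diskmon-vg0` are invented for illustration; the constraint rule mirrors the documented ethmonitor usage (ethmonitor sets an `ethmonitor-<iface>` node attribute that constraints can test):

```shell
# Hypothetical monitor-only agent, cloned so it runs on every node
# (agent and attribute names are invented for this sketch):
pcs resource create disk-mon ocf:heartbeat:diskmonitor volume=/dev/vg0/lv0 \
    op monitor interval=10s clone

# Move resources off any node where the attribute is not 1, the same
# pattern used with ethmonitor's node attribute:
pcs constraint location my-app rule score=-INFINITY diskmon-vg0 ne 1
```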

Comment 3 Seunghwan Jung 2020-07-08 05:24:12 UTC
Hi Reid,

Yes, it is the filesystem that needs to be monitored. Thank you for clarifying that.
I think it is a good idea to use an existing resource with a small modification.

Comment 4 Reid Wahl 2020-07-08 06:37:10 UTC
Discussed with Hwanii in IRC. He pointed out that when the root LV is suspended, resource monitors won't work. That makes sense. The resource agents reside on the root filesystem, and Pacemaker presumably needs to write to the root filesystem when updating the CIB.

I tested it out with a new monitor_only attribute and confirmed. The resource failure is not detected until the LV is resumed (un-suspended) on the DC.

I think a resource agent-based approach isn't going to work for monitoring the root filesystem. Even if we made a resource agent that updates an attribute with a date and we try to check "how recently was the attribute updated?", I don't think we can make the OTHER node initiate fencing based on the lack of an attribute update. The local node can't decide to fence itself without access to the root FS.

Comment 5 Ken Gaillot 2020-07-08 14:43:15 UTC
Hi all,

Those are all good points.

It occurs to me that for Reid's approach on a non-root filesystem, the current Dummy agent is sufficient -- just set the "state" parameter to a writeable location on the target filesystem, and the dummy resource will be unable to perform any action successfully if the filesystem is unavailable (or unwriteable).
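
In pcs terms, that Dummy-based approach for a non-root filesystem might look like the following (resource name, mount point, and operation values are examples, not from this BZ):

```shell
# Dummy keeps its state file on the monitored filesystem; if that
# filesystem becomes unavailable or unwriteable, every Dummy operation
# fails, and on-fail=fence recovers the node:
pcs resource create data-fs-check ocf:heartbeat:Dummy \
    state=/data/.cluster-fs-check \
    op monitor interval=30s on-fail=fence
```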

For the root filesystem, it does seem that the most practical solution is to rely on pacemaker's internal CIB write failure handling, so Bug 1854340 should be the focus.

However if there is an independent filesystem available, I do think it would be possible to do this via an agent. It would have to be an LSB-style (not OCF) agent, written in a compiled language (such as C, so it doesn't need a script interpreter from the root filesystem) and stored on the separate filesystem. Pacemaker could launch the agent from the separate filesystem even if the root filesystem were unavailable -- unless of course both filesystems had a single point of failure, such as a shared I/O controller. (It would have to be LSB because that is the only standard that Pacemaker allows you to specify the full path for. A compiled OCF agent would be fine if Pacemaker supported specifying a path to it, or if /usr/lib/ocf were kept on an independent filesystem.)
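
To illustrate the check such an agent would perform, here is the LSB "status" logic sketched in shell. This is for readability only: per the comment above, a real agent would have to be a compiled binary stored on the independent filesystem, since a shell script needs an interpreter loaded from the root filesystem. The directory variable is a placeholder.

```shell
# Illustrative only: the logic of an LSB "status" action that probes a
# filesystem by writing to it. A real agent would be compiled C.
MONITORED_DIR=${MONITORED_DIR:-/}   # filesystem to probe (example)

lsb_status() {
    probe="$MONITORED_DIR/.lsb_fs_probe.$$"
    if echo ok > "$probe" 2>/dev/null && rm -f "$probe"; then
        return 0    # LSB status: running (filesystem healthy)
    else
        return 7    # LSB status: not running (filesystem failed)
    fi
}
```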

Comment 13 Chris Feist 2023-08-02 13:54:43 UTC
Closing this bz as it sounds like storagemon (currently in Tech Preview) is what the customer is interested in - https://bugzilla.redhat.com/show_bug.cgi?id=2055299

*** This bug has been marked as a duplicate of bug 2055299 ***

