Description of problem:

We have three Percona servers running in master/master mode, with a floating VIP in front. This percona_vip is managed by pacemaker & corosync for HA purposes, using the RA "IPaddr2", provider "heartbeat", standard "ocf".

How can we make this RA run a user-defined monitor script? A `monitor_script` parameter for the monitor operation exists in a derived/older version of the IPaddr2 script, which lets us define the path of a monitor script. However, it does not work well with the `start-delay` parameter: in our tests, the monitor script was executed immediately after the resource was restarted, even with `start-delay` on the monitor operation set to 400 seconds.

Version-Release number of selected component (if applicable):
pacemaker-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-cluster-libs-1.1.13-10.el7_2.4.x86_64
pacemaker-cli-1.1.13-10.el7_2.4.x86_64
pacemaker-1.1.13-10.el7_2.4.x86_64

How reproducible:
easy to reproduce

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
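For illustration, a configuration along these lines reproduces the setup described above. This is a sketch only: the IP address, netmask, and script path are assumptions, and `monitor_script` is not part of the stock IPaddr2 agent.

```shell
# Hypothetical sketch of the reported setup (IP, netmask, and script
# path are assumptions; monitor_script only exists in the derived,
# non-stock IPaddr2 agent mentioned in the report).
pcs resource create percona_vip ocf:heartbeat:IPaddr2 \
    ip=192.0.2.10 cidr_netmask=24 \
    op monitor interval=30s start-delay=400s
# With the derived IPaddr2, the custom check was wired in roughly as:
#   monitor_script=/usr/local/bin/check_percona.sh
# The reported bug: the script ran immediately after a restart,
# ignoring the 400s start-delay on the monitor operation.
```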
I'm not sure there's a way currently to do what you want. The monitor_scripts parameter, as far as I know, is only implemented for VirtualDomain resources. While it would be possible to implement something similar in any resource agent, it's probably not the best place to do that sort of thing.

Also, most resource agents execute a monitor as part of the start and stop operations. Setting a start-delay in pacemaker won't affect what the resource agent does internally.

Pacemaker does offer a generic capability similar to monitor_scripts that can work with any resource agent. I've never used it myself, but here's my understanding:

There is a resource meta-attribute "container" that was originally designed for use with nagios checks. Pacemaker supports the "nagios" class of resources (as opposed to "ocf", "systemd", etc.) to execute nagios checks. Nagios checks are essentially just a monitor; pacemaker implements start and stop basically as null operations.

When a resource has the "container" meta-attribute, the resource is started, stopped, and monitored normally, but if the monitor fails, the resource specified by "container" is recovered (rather than the resource itself). Also, the resource will be colocated with the container resource and ordered relative to it.

The intended use case was to have a resource that creates a virtual guest (such as VirtualDomain or Xen) and a nagios check for a service that runs inside that virtual guest. The nagios check would be configured with "container" set to the guest resource. The cluster would start the guest, then "start" the nagios resource. The nagios check would run as a normal recurring monitor, and if it failed, the guest resource would be recovered.

You could potentially use that capability here. You could write either a custom nagios check, or a custom OCF resource, implementing your extended monitor, and set its "container" meta-attribute to the percona_vip resource.
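To make the suggested workaround concrete, a configuration might look roughly like the sketch below. The check resource's name and agent (`ocf:custom:percona_check`) are placeholders for whatever custom nagios check or OCF agent you write; only the `container` meta-attribute mechanism itself is taken from the explanation above.

```shell
# Hedged sketch of the "container" approach. "percona_check" and the
# agent "ocf:custom:percona_check" are hypothetical placeholders for
# your custom monitor resource; percona_vip is the existing VIP resource.
pcs resource create percona_check ocf:custom:percona_check \
    op monitor interval=60s \
    meta container=percona_vip
# With container=percona_vip set, pacemaker colocates and orders
# percona_check relative to percona_vip, and if percona_check's
# monitor fails, percona_vip (not percona_check) is recovered.
```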
Due to limited development resources and the existing workaround of using the "container" meta-attribute, we will not implement anything new for this. If you have any questions or encounter any problems when trying the "container" approach, let me know.