Bug 2209436

Summary: ocf:heartbeat:Delay RA fails monitor and stop operations when using default settings
Product: Red Hat Enterprise Linux 8
Component: resource-agents
Status: CLOSED MIGRATED
Severity: unspecified
Priority: unspecified
Version: 8.7
Target Milestone: rc
Target Release: ---
Hardware: Unspecified
OS: Unspecified
Reporter: Joshua Baker <jobaker>
Assignee: Oyvind Albrigtsen <oalbrigt>
QA Contact: cluster-qe <cluster-qe>
Docs Contact:
CC: agk, cluster-maint, fdinitto, oalbrigt, sbradley
Keywords: MigratedToJIRA
Flags: pm-rhel: mirror+
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 2209433
Environment:
Last Closed: 2023-09-22 20:15:55 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2209433
Bug Blocks:

Description Joshua Baker 2023-05-23 21:36:07 UTC
+++ This bug was initially created as a clone of Bug #2209433 +++

Description of problem:
Monitor and stop operations for the "ocf:heartbeat:Delay" resource fail with the default settings (out-of-the-box configuration). This is because the default "mondelay" and "stopdelay" values (30 seconds each) are exactly the same as the timeout period for monitor and stop operations in the cluster, so the agent is usually still sleeping when the operation times out.

Version-Release number of selected component (if applicable):

# rpm -q resource-agents kernel
resource-agents-4.9.0-29.el8_7.3.x86_64
kernel-4.18.0-425.3.1.el8.x86_64

How reproducible:
Monitor operations appear to fail 100% of the time. I have had a couple of successful stop operations, but most fail with the default configuration:

Steps to Reproduce:

1. Created the resource with default settings (no additional options) and disabled it in order to run the "debug-<operation>" tests:
~~~
[root@rhel8-node1 ~]# pcs resource create test-delay Delay
Assumed agent name 'ocf:heartbeat:Delay' (deduced from 'Delay')

[root@rhel8-node1 ~]# pcs resource disable test-delay
~~~

2. Start operation is successful with default settings:
~~~
[root@rhel8-node2 ~]# pcs resource debug-start test-delay
Operation force-start for test-delay (ocf:heartbeat:Delay) returned 0 (ok)
~~~

3. Monitor operation times out with default settings:
~~~
[root@rhel8-node1 ~]# pcs resource debug-monitor test-delay
Operation force-check for test-delay (ocf:heartbeat:Delay) could not be executed (Timed Out: Resource agent did not exit within specified timeout)
crm_resource: Error performing operation: Error occurred
~~~

4. Stop operations time out with default settings:
~~~
# Can only be run after a "debug-start" has started the resource. Otherwise the resource reports as already down:
[root@rhel8-node2 ~]# pcs resource debug-stop test-delay
Operation force-stop for test-delay (ocf:heartbeat:Delay) could not be executed (Timed Out: Resource agent did not exit within specified timeout)
crm_resource: Error performing operation: Error occurred
~~~

- The current default monitor and stop delays in the RA match the default timeout periods for the "monitor" and "stop" operations:
~~~
[root@rhel8-node1 ~]# rpm -q resource-agents kernel
resource-agents-4.9.0-29.el8_7.3.x86_64
~~~

~~~
$ vim /usr/lib/ocf/resource.d/heartbeat/Delay
----------------------->8--------------------------
OCF_RESKEY_startdelay_default="20"
OCF_RESKEY_stopdelay_default="30"
OCF_RESKEY_mondelay_default="30"

: ${OCF_RESKEY_startdelay=${OCF_RESKEY_startdelay_default}}
: ${OCF_RESKEY_stopdelay=${OCF_RESKEY_stopdelay_default}}
: ${OCF_RESKEY_mondelay=${OCF_RESKEY_mondelay_default}}
~~~
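The defaults shown in the excerpt above can also be read without opening the file in an editor. The following is only a rough sketch: the helper name `agent_default` is hypothetical (it is not part of resource-agents), and it assumes the defaults are written as `OCF_RESKEY_<name>_default="<number>"` exactly as in the excerpt.

```shell
# Hypothetical helper (not part of resource-agents): print the default
# value of one Delay parameter by reading its
# OCF_RESKEY_<name>_default="<number>" assignment from the agent script.
#   $1 = parameter name (startdelay, stopdelay, mondelay)
#   $2 = path to the agent script
agent_default() {
    sed -n "s/^OCF_RESKEY_${1}_default=\"\([0-9][0-9]*\)\"\$/\1/p" "$2" | head -n 1
}

# Path taken from the report; skipped when the agent is not installed.
agent=/usr/lib/ocf/resource.d/heartbeat/Delay
if [ -r "$agent" ]; then
    for param in startdelay stopdelay mondelay; do
        echo "$param default: $(agent_default "$param" "$agent")s"
    done
fi
```

If the agent writes its defaults in a different form, the pattern would need adjusting.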

~~~
[root@rhel8-node2 ~]# pcs config show
----------------------->8--------------------------
  Resource: test-delay (class=ocf provider=heartbeat type=Delay)
    Attributes: test-delay-instance_attributes
      mondelay=10
    Operations:
      monitor: test-delay-monitor-interval-10s
        interval=10s
        timeout=30s <---
      start: test-delay-start-interval-0s
        interval=0s
        timeout=30s <---
      stop: test-delay-stop-interval-0s
        interval=0s
        timeout=30s <---
~~~

So we should probably reduce the default delays in the resource agent for both of these operations. Otherwise they will fail out of the box.
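To make the comparison explicit, here is a small sketch that checks each delay against the operation timeout using the numbers from this report. The helper name `delay_fits_timeout` is hypothetical, and treating "delay equal to timeout" as a failure is an assumption based on the fact that the agent needs at least a little time beyond the sleep itself to start and exit.

```shell
# Hypothetical helper: succeed only if an operation that sleeps for
# $1 seconds can exit before a $2-second operation timeout expires.
# A delay equal to the timeout is treated as a failure (assumption:
# the agent needs some time beyond the sleep itself).
delay_fits_timeout() {
    [ "$1" -lt "$2" ]
}

# Values from the report: agent default delays and the 30s op timeouts.
op_timeout=30
for pair in start:20 stop:30 monitor:30; do
    op=${pair%%:*}
    delay=${pair##*:}
    if delay_fits_timeout "$delay" "$op_timeout"; then
        echo "$op: delay ${delay}s < timeout ${op_timeout}s, should complete"
    else
        echo "$op: delay ${delay}s >= timeout ${op_timeout}s, likely to time out"
    fi
done
```

Until the defaults change, one possible workaround (untested here, values chosen only for illustration) would be to set shorter delays on the resource, for example `pcs resource update test-delay stopdelay=10 mondelay=10`.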

Actual results:
Start operations are successful.
Monitor operations timed out.
Stop operations timed out.

Expected results:
All operations (start, stop, monitor) should be successful with a default configuration.

Additional info:
- Shane Bradley++ has pointed out that the resource description is also incorrect: the stop and monitor delays do not default to the start delay. Not sure whether this should be updated here or in another Bugzilla:

~~~
[root@rhel8-node1 ~]# pcs resource describe Delay
Assumed agent name 'ocf:heartbeat:Delay' (deduced from 'Delay')
ocf:heartbeat:Delay - Waits for a defined timespan

This script is a test resource for introducing delay.

Resource options:
  startdelay: How long in seconds to delay on start operation.
  stopdelay: How long in seconds to delay on stop operation. Defaults to "startdelay" if unspecified. <---
  mondelay: How long in seconds to delay on monitor operation. Defaults to "startdelay" if unspecified. <---
~~~

- Both the description discrepancy and this issue were likely introduced in this change, which set the default delays for stopdelay and mondelay to 30s:

https://github.com/ClusterLabs/resource-agents/commit/baa4cdf6afb9df801d40895f2a9ffcf7d2c8fdae

Comment 2 RHEL Program Management 2023-09-22 20:15:20 UTC
Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug.

Comment 3 RHEL Program Management 2023-09-22 20:15:55 UTC
This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there.

Due to differences in account names between systems, some fields were not replicated.  Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information.

To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer.  You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like:

"Bugzilla Bug" = 1234567

In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information.