Bug 1747553
| Summary: | Dynamically calculate Pacemaker's cluster recheck interval | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 8 | Reporter: | Ken Gaillot <kgaillot> |
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> |
| Status: | CLOSED ERRATA | QA Contact: | michal novacek <mnovacek> |
| Severity: | medium | Docs Contact: | |
| Priority: | high | | |
| Version: | 8.1 | CC: | cluster-maint, mnovacek, obenes, phagara, pzimek |
| Target Milestone: | pre-dev-freeze | Keywords: | FutureFeature |
| Target Release: | 8.2 | Flags: | pm-rhel: mirror+ |
| Hardware: | All | | |
| OS: | All | | |
| Whiteboard: | | | |
| Fixed In Version: | pacemaker-2.0.3-1.el8 | Doc Type: | Enhancement |
| Story Points: | --- | | |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-04-28 15:38:28 UTC | Type: | Enhancement |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | 1752538 | | |
| Bug Blocks: | | | |
| Attachments: | | | |

Doc Text:
Feature: Pacemaker now calculates the cluster recheck interval dynamically.
Reason: Previously, Pacemaker's failure-timeout resource meta-attribute, ocf:pacemaker:remote reconnect_interval instance attribute, and time-based configuration rules were not guaranteed to be checked any more often than the value of the cluster-recheck-interval cluster property. When using those features, users would have to know to set a low cluster-recheck-interval, which generates unnecessary system load during normal operation.
Result: Users can configure these features without having to know anything about the cluster-recheck-interval property, and the cluster reacts in a timely manner without adding system load for cluster calculations when not needed.
Description Ken Gaillot 2019-08-30 19:38:52 UTC
The planned implementation is to have Pacemaker's scheduler calculate the time when these features (if used) would change state, and add the minimum such time to the transition graph. Pacemaker's controller would read this value and set the recheck timer to the minimum of the cluster-recheck-interval and the time remaining until the specified time.

Fixed upstream as of https://github.com/ClusterLabs/pacemaker/pull/1890

Note to QA: the dynamic recheck applies to resource failure-timeout, ocf:pacemaker:remote reconnect_interval, and rules with date_expression *except* date expressions with "operation" set to "date_spec". date_spec still relies on cluster-recheck-interval for evaluation.

Example pcs commands:

# pcs resource update rsc1 meta failure-timeout=200s
Set a failure-timeout for rsc1.

# pcs node add-remote remote-host-1 reconnect_interval=60s op monitor interval="20s"
Create an ocf:pacemaker:remote resource connecting to host "remote-host-1", and attempt reconnections every 60s if a connection attempt fails.

# pcs constraint location rsc2 rule '#uname' eq 'node2' and date in_range "2019-09-24 11:00:00 -05:00" to "2019-09-24 11:05:00 -05:00"
Set a time-based location constraint so that rsc2 prefers a certain node for 5 minutes.

I have verified that the failcount is reset after failure-timeout, comparing pacemaker-2.0.2-3.el8_1.2.x86_64 (before the fix) with pacemaker-2.0.3-4.el8.x86_64 (after the fix).

---
Test method:

failure-timeout is set to 120s, meaning that the failcount of the resource is reset to 0 after two minutes. migration-threshold is set to 1, meaning the resource is moved to the other node after the first failure. At the same time a location constraint is set for the resource to prefer node1. The resource is initially running on node1. When the resource is killed on node1 it is moved to node2 due to migration-threshold=1. The failure-timeout should then reset the failcount after 120s and the resource should be moved back to node1. Before the fix, the failure-timeout is not honored promptly and the resource stays on node2 until some other event triggers a new transition.

Common:
------
Initially the resource is running on virt-137 with no failcount (full config: [1]).

> [root@virt-137 ~]# pcs resource
 ...
 webserver      (ocf::heartbeat:apache):        Started virt-137

> [root@virt-137 ~]# pcs resource failcount show webserver
No failcounts for resource 'webserver'

> [root@virt-137 ~]# export PS1='[\D{+%T} \u@\h \W]\$ '

Before the fix (pacemaker-2.0.2-3.el8_1.2.x86_64)
-------------------------------------------------

> [+13:32:06 root@virt-137 ~]# date && killall -9 httpd && sleep 30 && crm_resource --wait && pcs resource; pcs resource failcount show webserver
 Clone Set: locking-clone [locking]
     Started: [ virt-136 virt-137 ]
 webserver      (ocf::heartbeat:apache):        Started virt-136
Failcounts for resource 'webserver'
  virt-137: 1

> [+13:32:42 root@virt-137 ~]# sleep 120 && pcs resource
 Clone Set: locking-clone [locking]
     Started: [ virt-136 virt-137 ]
 webserver      (ocf::heartbeat:apache):        Started virt-136

After failure-timeout seconds the resource did not move back to the original node. It moved only after cluster-recheck-interval had passed.
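To put numbers on the timer change described at the top of this comment (a sketch of the calculation; it assumes the default cluster-recheck-interval of 15 minutes, since the property is not set in config [1]): before the fix the controller only re-evaluates the cluster when the recheck timer pops, i.e. after the full cluster-recheck-interval of 900s, which is why the resource stayed on virt-136 well past the failure-timeout. With the fix, the scheduler records the earliest pending change (the failcount expiring 120s after the failure) in the transition graph, and the controller arms its timer with min(cluster-recheck-interval, time until that change) = min(900s, 120s) = 120s, so the expired failcount is noticed roughly two minutes after the failure, as the "after the fix" transcript below shows.

# pcs property list --defaults | grep cluster-recheck-interval
Confirm the cluster-recheck-interval default (15 minutes), since the property is not set explicitly in config [1].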
After the fix (pacemaker-2.0.3-4.el8.x86_64)
--------------------------------------------

> [+14:12:20 root@virt-137 ~]# date && killall -9 httpd && sleep 30 && crm_resource --wait && pcs resource; pcs resource failcount show webserver
Wed Feb  5 14:13:07 CET 2020
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-136 virt-137 ]
  * webserver   (ocf::heartbeat:apache):        Started virt-136
Failcounts for resource 'webserver'
  virt-137: 1

> [+14:13:39 root@virt-137 ~]# date && sleep 120 && pcs resource
Wed Feb  5 14:13:41 CET 2020
  * Clone Set: locking-clone [locking]:
    * Started: [ virt-136 virt-137 ]
  * webserver   (ocf::heartbeat:apache):        Started virt-137

> [+14:15:42 root@virt-137 ~]# date && pcs resource failcount show webserver
Wed Feb  5 14:17:24 CET 2020
No failcounts for resource 'webserver'

The failcount has been reset correctly after 120s and the resource has been moved back to the preferred node.

--
[1]: pcs config

[root@virt-137 ~]# pcs config
Cluster Name: STSRHTS5570

Corosync Nodes:
 virt-136 virt-137

Pacemaker Nodes:
 virt-136 virt-137

Resources:
 Clone: locking-clone
  Group: locking
   Resource: dlm (class=ocf provider=pacemaker type=controld)
    Operations: monitor interval=30s (dlm-monitor-interval-30s)
                start interval=0s timeout=90s (dlm-start-interval-0s)
                stop interval=0s timeout=100s (dlm-stop-interval-0s)
   Resource: lvmlockd (class=ocf provider=heartbeat type=lvmlockd)
    Attributes: with_cmirrord=1
    Operations: monitor interval=30s (lvmlockd-monitor-interval-30s)
                start interval=0s timeout=90s (lvmlockd-start-interval-0s)
                stop interval=0s timeout=90s (lvmlockd-stop-interval-0s)
 Resource: webserver (class=ocf provider=heartbeat type=apache)
  Attributes: configfile=/etc/httpd/conf/httpd.conf httpd=/usr/sbin/httpd statusurl=http://127.0.0.1/server-status
  Meta Attrs: failure-timeout=120s migration-threshold=1
  Operations: monitor interval=30s (webserver-monitor-interval-30s)
              start interval=0s timeout=40s (webserver-start-interval-0s)
              stop interval=0s timeout=60s (webserver-stop-interval-0s)

Stonith Devices:
 Resource: fence-virt-136 (class=stonith type=fence_xvm)
  Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-136 pcmk_host_map=virt-136:virt-136.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-136-monitor-interval-60s)
 Resource: fence-virt-137 (class=stonith type=fence_xvm)
  Attributes: pcmk_host_check=static-list pcmk_host_list=virt-137 pcmk_host_map=virt-137:virt-137.cluster-qe.lab.eng.brq.redhat.com
  Operations: monitor interval=60s (fence-virt-137-monitor-interval-60s)
Fencing Levels:

Location Constraints:
  Resource: webserver
    Enabled on:
      Node: virt-137 (score:INFINITY) (id:location-webserver-virt-137-INFINITY)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:

Alerts:
 No alerts defined

Resources Defaults:
 No defaults set
Operations Defaults:
 No defaults set

Cluster Properties:
 cluster-infrastructure: corosync
 cluster-name: STSRHTS5570
 dc-version: 2.0.2-3.el8_1.2-744a30d655
 have-watchdog: false
 no-quorum-policy: freeze

Quorum:
  Options:

I have checked that a time-based constraint rule is re-evaluated at the latest when the rule expires, as opposed to being re-evaluated only after cluster-recheck-interval, which was the behavior with pacemaker-2.0.2-3.el8_1.2.x86_64.
---
Test method:
failure-timeout remains set to 120s on the resource (see config [1] below), but the point of this test is that the time-based rule is checked for expiration (and disabled once it has expired) when its five-minute window ends, instead of the same check happening only after cluster-recheck-interval (15min).
To prove it works we do the following (a scripted sketch follows this list):
- start with no time-based constraint
- set the resource to prefer node1 with score=100
- set a time-based constraint that lasts five minutes, making the resource prefer node2
- the resource is moved to node2 based on the time-based constraint
- after five minutes at the latest, the resource moves back to node1 as the time-based constraint expires and the location constraint takes over
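The steps above can be scripted with standard pcs commands (a sketch; it assumes the webserver resource and the two-node cluster from config [1] below, with node1 = virt-137 and node2 = virt-136, and the constraint id is only illustrative, matching the transcripts further down):

# pcs constraint location webserver prefers virt-137=100
Baseline preference for node1 (virt-137).

# pcs constraint location webserver rule constraint-id=time-based-prefer-virt-136 '#uname' eq 'virt-136' and date in_range "$(date +'%F %T')" to "$(date --date='5 minutes' +'%F %T')"
Five-minute rule preferring node2 (virt-136); the rule's default score (INFINITY) outweighs the score-100 constraint, so the resource moves to virt-136.

# crm_resource --wait && pcs resource
The resource should now report Started virt-136.

# sleep 300 && pcs resource
Once the five-minute window ends, the dynamic recheck should move the resource back to virt-137 without waiting for cluster-recheck-interval; before the fix it moved back only after that interval fired.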
Common:
------
Initially the resource is running on virt-137 (full config: [1]).
> [root@virt-136 ~]# pcs resource
webserver (ocf::heartbeat:apache): Started virt-137
> [root@virt-136 ~]# pcs constraint location
Location Constraints:
Resource: webserver
Enabled on:
Node: virt-137 (score:100)
Before the fix (pacemaker-2.0.2-3.el8_1.2.x86_64)
-------------------------------------------------
> [root@virt-136 ~]# pcs constraint location webserver \
rule \
constraint-id=time-based-prefer-virt-136 \
'#uname' eq 'virt-136' \
and date in_range "$(date +'%F %T')" to "$(date --date='5 minutes' +'%F %T')" \
&& crm_resource --wait \
&& date
Mon Feb 10 12:09:48 CET 2020
> [root@virt-136 ~]# date && pcs resource && sleep 300 && pcs resource
Mon Feb 10 12:09:53 CET 2020
webserver (ocf::heartbeat:apache): Started virt-136
webserver (ocf::heartbeat:apache): Started virt-136
# The resource is not moved back to its original location when the time-based
# rule expires, but only after cluster-recheck-interval fires.
> [root@virt-136 ~]# cat /var/log/pacemaker/pacemaker.log
...
Feb 10 12:24:47 virt-136 pacemaker-controld [4710] (do_state_transition) notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Feb 10 12:24:47 virt-136 pacemaker-controld [4710] (do_state_transition) notice: State transition S_IDLE -> S_POLICY_ENGINE | input=I_PE_CALC cause=C_TIMER_POPPED origin=crm_timer_popped
Feb 10 12:24:47 virt-136 pacemaker-controld [4710] (do_state_transition) info: Progressed to state S_POLICY_ENGINE after C_TIMER_POPPED
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (determine_online_status_fencing) info: Node virt-136 is active
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (determine_online_status) info: Node virt-136 is online
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (determine_online_status_fencing) info: Node virt-137 is active
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (determine_online_status) info: Node virt-137 is online
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (unpack_node_loop) info: Node 1 is already processed
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (unpack_node_loop) info: Node 2 is already processed
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (unpack_node_loop) info: Node 1 is already processed
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (unpack_node_loop) info: Node 2 is already processed
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (common_print) info: fence-virt-136 (stonith:fence_xvm): Started virt-136
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (common_print) info: fence-virt-137 (stonith:fence_xvm): Started virt-137
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (common_print) info: webserver (ocf::heartbeat:apache): Started virt-136
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (RecurringOp) info: Start recurring monitor (30s) for webserver on virt-137
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (LogActions) info: Leave fence-virt-136 (Started virt-136)
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (LogActions) info: Leave fence-virt-137 (Started virt-137)
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (LogAction) notice: * Move webserver ( virt-136 -> virt-137 )
Feb 10 12:24:47 virt-136 pacemaker-schedulerd[4709] (pcmk__log_transition_summary) notice: Calculated transition 11, saving inputs in /var/lib/pacemaker/pengine/pe-input-1147.bz2
Feb 10 12:24:47 virt-136 pacemaker-controld [4710] (do_state_transition) info: State transition S_POLICY_ENGINE -> S_TRANSITION_ENGINE | input=I_PE_SUCCESS cause=C_IPC_MESSAGE origin=handle_response
...
Feb 10 12:24:47 apache(webserver)[18928]: INFO: Attempting graceful stop of apache PID 12493
Feb 10 12:24:49 apache(webserver)[18928]: INFO: apache stopped.
Feb 10 12:24:49 virt-136 pacemaker-execd [4707] (log_finished) info: finished - rsc:webserver action:stop call_id:31 pid:18928 exit-code:0 exec-time:1176ms queue-time:0ms
Feb 10 12:24:49 virt-136 pacemaker-controld [4710] (process_lrm_event) notice: Result of stop operation for webserver on virt-136: 0 (ok) | call=31 key=webserver_stop_0 confirmed=true cib-update=80
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_process_request) info: Forwarding cib_modify operation for section status to all (origin=local/crmd/80)
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: --- 0.15.7 2
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: +++ 0.15.8 (null)
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib: @num_updates=8
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib/status/node_state[@id='1']/lrm[@id='1']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_last_0']: @transition-magic=0:0;8:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @call-id=31, @rc-code=0, @op-status=0, @exec-time=1176
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-136/crmd/80, version=0.15.8)
Feb 10 12:24:49 virt-136 pacemaker-controld [4710] (match_graph_event) info: Action webserver_stop_0 (8) confirmed on virt-136 (rc=0)
Feb 10 12:24:49 virt-136 pacemaker-controld [4710] (te_rsc_command) notice: Initiating start operation webserver_start_0 on virt-137 | action 9
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: --- 0.15.8 2
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: +++ 0.15.9 (null)
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib: @num_updates=9
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_last_0']: @operation_key=webserver_start_0, @operation=start, @transition-key=9:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @transition-magic=-1:193;9:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @call-id=-1, @rc-code=193, @op-status=-1, @last-run=1581333889, @last-rc-change=158133388
Feb 10 12:24:49 virt-136 pacemaker-based [4705] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-137/crmd/40, version=0.15.9)
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: --- 0.15.9 2
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: +++ 0.15.10 (null)
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib: @num_updates=10
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_last_0']: @transition-magic=0:0;9:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @call-id=28, @rc-code=0, @op-status=0, @exec-time=1320
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-137/crmd/41, version=0.15.10)
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (match_graph_event) info: Action webserver_start_0 (9) confirmed on virt-137 (rc=0)
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (te_rsc_command) notice: Initiating monitor operation webserver_monitor_30000 on virt-137 | action 10
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: --- 0.15.10 2
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: +++ 0.15.11 (null)
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib: @num_updates=11
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_monitor_30000']: @transition-key=10:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @transition-magic=-1:193;10:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @call-id=-1, @rc-code=193, @op-status=-1, @last-rc-change=1581333890, @exec-time=0
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-137/crmd/42, version=0.15.11)
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: --- 0.15.11 2
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: Diff: +++ 0.15.12 (null)
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib: @num_updates=12
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_perform_op) info: + /cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='webserver']/lrm_rsc_op[@id='webserver_monitor_30000']: @transition-magic=0:0;10:11:0:e8fe8897-a242-49c3-a717-35f87cc5ac47, @call-id=29, @rc-code=0, @op-status=0, @exec-time=100
Feb 10 12:24:50 virt-136 pacemaker-based [4705] (cib_process_request) info: Completed cib_modify operation for section status: OK (rc=0, origin=virt-137/crmd/43, version=0.15.12)
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (match_graph_event) info: Action webserver_monitor_30000 (10) confirmed on virt-137 (rc=0)
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (run_graph) notice: Transition 11 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1147.bz2): Complete
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (do_log) info: Input I_TE_SUCCESS received in state S_TRANSITION_ENGINE from notify_crmd
Feb 10 12:24:50 virt-136 pacemaker-controld [4710] (do_state_transition) notice: State transition S_TRANSITION_ENGINE -> S_IDLE | input=I_TE_SUCCESS cause=C_FSA_INTERNAL origin=not
After the fix (pacemaker-2.0.3-4.el8.x86_64)
--------------------------------------------
> [root@virt-136 ~]# pcs constraint location webserver \
rule \
constraint-id=time-based-prefer-virt-136 \
'#uname' eq 'virt-136' \
and date in_range "$(date +'%F %T')" to "$(date --date='5 minutes' +'%F %T')" \
&& crm_resource --wait \
&& date
Mon Feb 10 11:30:23 CET 2020
> [root@virt-136 ~]# date && pcs resource && sleep 300 && pcs resource
Mon Feb 10 11:30:34 CET 2020
webserver (ocf::heartbeat:apache): Started virt-136
webserver (ocf::heartbeat:apache): Started virt-137
> [root@virt-136 ~]# cat /var/log/pacemaker/pacemaker.log
...
> # time-based rule comes into effect
Feb 10 11:30:23 virt-136 pacemaker-controld[7157]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Feb 10 11:30:23 virt-136 pacemaker-schedulerd[7156]: notice: * Move webserver ( virt-137 -> virt-136 )
Feb 10 11:30:23 virt-136 pacemaker-schedulerd[7156]: notice: Calculated transition 495, saving inputs in /var/lib/pacemaker/pengine/pe-input-1130.bz2
Feb 10 11:30:23 virt-136 pacemaker-controld[7157]: notice: Initiating stop operation webserver_stop_0 on virt-137
Feb 10 11:30:24 virt-136 pacemaker-controld[7157]: notice: Initiating start operation webserver_start_0 locally on virt-136
Feb 10 11:30:25 virt-136 httpd[11637]: Server configured, listening on: port 80
Feb 10 11:30:25 virt-136 pacemaker-controld[7157]: notice: Result of start operation for webserver on virt-136: 0 (ok)
Feb 10 11:30:25 virt-136 pacemaker-controld[7157]: notice: Initiating monitor operation webserver_monitor_30000 locally on virt-136
Feb 10 11:30:25 virt-136 pacemaker-controld[7157]: notice: Result of monitor operation for webserver on virt-136: 0 (ok)
Feb 10 11:30:25 virt-136 pacemaker-controld[7157]: notice: Transition 495 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1130.bz2): Complete
Feb 10 11:30:25 virt-136 pacemaker-controld[7157]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
...
> # time-based rule expires.
Feb 10 11:35:23 virt-136 pacemaker-controld[7157]: notice: State transition S_IDLE -> S_POLICY_ENGINE
Feb 10 11:35:23 virt-136 pacemaker-schedulerd[7156]: notice: * Move webserver ( virt-136 -> virt-137 )
Feb 10 11:35:23 virt-136 pacemaker-schedulerd[7156]: notice: Calculated transition 496, saving inputs in /var/lib/pacemaker/pengine/pe-input-1131.bz2
Feb 10 11:35:23 virt-136 pacemaker-controld[7157]: notice: Initiating stop operation webserver_stop_0 locally on virt-136
Feb 10 11:35:23 virt-136 apache(webserver)[14418]: INFO: Attempting graceful stop of apache PID 11637
Feb 10 11:35:25 virt-136 apache(webserver)[14418]: INFO: apache stopped.
Feb 10 11:35:25 virt-136 pacemaker-controld[7157]: notice: Result of stop operation for webserver on virt-136: 0 (ok)
Feb 10 11:35:25 virt-136 pacemaker-controld[7157]: notice: Initiating start operation webserver_start_0 on virt-137
Feb 10 11:35:26 virt-136 pacemaker-controld[7157]: notice: Initiating monitor operation webserver_monitor_30000 on virt-137
Feb 10 11:35:26 virt-136 pacemaker-controld[7157]: notice: Transition 496 (Complete=3, Pending=0, Fired=0, Skipped=0, Incomplete=0, Source=/var/lib/pacemaker/pengine/pe-input-1131.bz2): Complete
Feb 10 11:35:26 virt-136 pacemaker-controld[7157]: notice: State transition S_TRANSITION_ENGINE -> S_IDLE
...
----
[1]: pcs config
[root@virt-136 ~]# pcs config
Cluster Name: STSRHTS5570
Corosync Nodes:
virt-136 virt-137
Pacemaker Nodes:
virt-136 virt-137
Resources:
Resource: webserver (class=ocf provider=heartbeat type=apache)
Attributes: configfile=/etc/httpd/conf/httpd.conf httpd=/usr/sbin/httpd statusurl=http://127.0.0.1/server-status
Meta Attrs: failure-timeout=120s
Operations: monitor interval=30s (webserver-monitor-interval-30s)
start interval=0s timeout=40s (webserver-start-interval-0s)
stop interval=0s timeout=60s (webserver-stop-interval-0s)
Stonith Devices:
Resource: fence-virt-136 (class=stonith type=fence_xvm)
Attributes: delay=5 pcmk_host_check=static-list pcmk_host_list=virt-136 pcmk_host_map=virt-136:virt-136.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-136-monitor-interval-60s)
Resource: fence-virt-137 (class=stonith type=fence_xvm)
Attributes: pcmk_host_check=static-list pcmk_host_list=virt-137 pcmk_host_map=virt-137:virt-137.cluster-qe.lab.eng.brq.redhat.com
Operations: monitor interval=60s (fence-virt-137-monitor-interval-60s)
Fencing Levels:
Location Constraints:
Resource: webserver
Enabled on:
Node: virt-137 (score:100) (id:location-webserver-virt-137-100)
Ordering Constraints:
Colocation Constraints:
Ticket Constraints:
Alerts:
No alerts defined
Resources Defaults:
No defaults set
Operations Defaults:
No defaults set
Cluster Properties:
cluster-infrastructure: corosync
cluster-name: STSRHTS5570
dc-version: 2.0.3-4.el8-4b1f869f0f
have-watchdog: false
last-lrm-refresh: 1580905822
no-quorum-policy: freeze
Quorum:
Options:
Created attachment 1662954 [details]
pcs cluster report
Cluster report taken after the remote resource was not moved back to its original node.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2020:1609