Bug 1291108 - Pacemaker cannot make progress if a remote node has never been up
Status: NEW
Product: Red Hat Enterprise Linux 7
Classification: Red Hat
Component: pacemaker
Version: 7.2
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: high
Target Milestone: rc
Target Release: 7.6
Assigned To: Ken Gaillot
QA Contact: cluster-qe@redhat.com
Reported: 2015-12-13 20:07 EST by Andrew Beekhof
Modified: 2017-12-01 21:05 EST
CC List: 5 users

Type: Bug


Attachments: None
Description Andrew Beekhof 2015-12-13 20:07:39 EST
Description of problem:

Pacemaker gets into a loop where it tries to start the remote node connection over and over:

Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 122: stop overcloud-novacompute-2_stop_0 on overcloud-controller-0
Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 798: notify redis_pre_notify_stop_0 on overcloud-controller-2
Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 799: notify redis_pre_notify_stop_0 on overcloud-controller-0
Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 627: start overcloud-novacompute-2_start_0 on overcloud-controller-1 (local)
Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='overcloud-novacompute-2_last_failure_0']: Resource operation removal (cib=0.337.1070, source=te_update_diff:452, path=/cib/status/node_state[@id='2']/lrm[@id='2']/lrm_resources/lrm_resource[@id='overcloud-novacompute-2']/lrm_rsc_op[@id='overcloud-novacompute-2_last_failure_0'], 0)
Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:    error: Operation overcloud-novacompute-2_start_0: Timed Out (node=overcloud-controller-1, call=55, timeout=60000ms)
Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 627 (overcloud-novacompute-2_start_0) on overcloud-controller-1 failed (target: 0 vs. rc: 1): Error
Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 627 (overcloud-novacompute-2_start_0) on overcloud-controller-1 failed (target: 0 vs. rc: 1): Error
Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition 195 (Complete=9, Pending=0, Fired=0, Skipped=13, Incomplete=60, Source=/var/lib/pacemaker/pengine/pe-input-1374.bz2): Stopped
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 40: stop overcloud-novacompute-2_stop_0 on overcloud-controller-1 (local)
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Operation overcloud-novacompute-2_stop_0: ok (node=overcloud-controller-1, call=56, rc=0, cib-update=651, confirmed=true)
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 798: notify redis_pre_notify_stop_0 on overcloud-controller-2
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 799: notify redis_pre_notify_stop_0 on overcloud-controller-0
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition aborted by deletion of lrm_rsc_op[@id='overcloud-novacompute-2_last_failure_0']: Resource operation removal (cib=0.337.1074, source=te_update_diff:452, path=/cib/status/node_state[@id='3']/lrm[@id='3']/lrm_resources/lrm_resource[@id='overcloud-novacompute-2']/lrm_rsc_op[@id='overcloud-novacompute-2_last_failure_0'], 0)
Dec 13 19:38:56 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition 197 (Complete=8, Pending=0, Fired=0, Skipped=1, Incomplete=74, Source=/var/lib/pacemaker/pengine/pe-input-1376.bz2): Stopped
Dec 13 19:38:57 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 797: notify redis_pre_notify_stop_0 on overcloud-controller-2
Dec 13 19:38:57 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 798: notify redis_pre_notify_stop_0 on overcloud-controller-0
Dec 13 19:38:57 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 626: start overcloud-novacompute-2_start_0 on overcloud-controller-2
Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 626 (overcloud-novacompute-2_start_0) on overcloud-controller-2 failed (target: 0 vs. rc: 1): Error
Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition aborted by overcloud-novacompute-2_start_0 'modify' on overcloud-controller-2: Event failed (magic=2:1;626:198:0:da5e2dce-0a3d-4a59-8006-32a19d0d3ecc, cib=0.337.1078, source=match_graph_event:381, 0)
Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition aborted by rsc_op.140: Node failure (cib=0.0.0, source=fail_incompletable_actions:101, path=/rsc_op[@id='140'], 0)
Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 626 (overcloud-novacompute-2_start_0) on overcloud-controller-2 failed (target: 0 vs. rc: 1): Error
Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:   notice: Transition 198 (Complete=8, Pending=0, Fired=0, Skipped=13, Incomplete=60, Source=/var/lib/pacemaker/pengine/pe-input-1377.bz2): Stopped

Good: being able to find the remote node again when it comes back
Not good: blocking other recovery in a tight loop
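
To make the loop in the excerpt above easier to spot, here is a minimal sketch that counts start attempts and distinct start failures per resource. It is not part of Pacemaker: summarize_start_loop, START_RE, FAIL_RE and SAMPLE are hypothetical names, and the regular expressions are inferred only from the crmd message formats quoted in this report, so they may not match other Pacemaker versions or log formats.

```python
import re
from collections import Counter

# Assumed format: "Initiating action N: start <rsc>_start_0 on <node>"
START_RE = re.compile(r"Initiating action \d+: start (\S+)_start_0 on (\S+)")
# Assumed format: "Action N (<rsc>_start_0) on <node> failed"
FAIL_RE = re.compile(r"Action (\d+) \((\S+)_start_0\) on (\S+) failed")


def summarize_start_loop(lines):
    """Return {resource: (start attempts, distinct failed start actions)}."""
    starts = Counter()
    failed = {}  # resource -> set of (action id, node)
    for line in lines:
        m = START_RE.search(line)
        if m:
            starts[m.group(1)] += 1
            continue
        m = FAIL_RE.search(line)
        if m:
            # crmd logs the same failure warning twice; the set deduplicates it.
            failed.setdefault(m.group(2), set()).add((m.group(1), m.group(3)))
    resources = set(starts) | set(failed)
    return {r: (starts[r], len(failed.get(r, ()))) for r in resources}


# Lines copied verbatim from the excerpt in this report.
SAMPLE = [
    "Dec 13 19:37:58 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 627: start overcloud-novacompute-2_start_0 on overcloud-controller-1 (local)",
    "Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 627 (overcloud-novacompute-2_start_0) on overcloud-controller-1 failed (target: 0 vs. rc: 1): Error",
    "Dec 13 19:38:55 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 627 (overcloud-novacompute-2_start_0) on overcloud-controller-1 failed (target: 0 vs. rc: 1): Error",
    "Dec 13 19:38:57 overcloud-controller-1.localdomain crmd[2856]:   notice: Initiating action 626: start overcloud-novacompute-2_start_0 on overcloud-controller-2",
    "Dec 13 19:39:54 overcloud-controller-1.localdomain crmd[2856]:  warning: Action 626 (overcloud-novacompute-2_start_0) on overcloud-controller-2 failed (target: 0 vs. rc: 1): Error",
]

if __name__ == "__main__":
    for rsc, (attempts, failures) in summarize_start_loop(SAMPLE).items():
        print(f"{rsc}: {attempts} start attempts, {failures} failed")
```

On the sample lines this reports two start attempts and two distinct failures for overcloud-novacompute-2, one on each controller that tried to host the connection. A resource whose start attempts and failures keep growing together over a longer log is a likely sign of the same loop.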

Version-Release number of selected component (if applicable):

Pacemaker 1.1.13-10.el7

How reproducible:

Unclear
Comment 5 Ken Gaillot 2016-06-06 13:49:20 EDT
This will not be addressed in the 7.3 timeframe; marking for 7.4.
Comment 6 Ken Gaillot 2017-01-10 16:47:41 EST
This will not be addressed in the 7.4 timeframe.
Comment 7 Ken Gaillot 2017-10-09 13:16:24 EDT
Due to time constraints, this will not make 7.5.
