Bug 1638580

Summary: Cluster needlessly tears down remote connections when container status is unknown
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Beekhof <abeekhof>
Component: pacemakerAssignee: Ken Gaillot <kgaillot>
Status: CLOSED DUPLICATE QA Contact: cluster-qe <cluster-qe>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 7.6CC: abeekhof, cluster-maint
Target Milestone: rc   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-10-12 14:19:49 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Andrew Beekhof 2018-10-12 01:38:27 UTC
Description of problem:

A cleanup of a bundle's container causes the remote connection to be torn down before the cluster establishes the containers current state.


Version-Release number of selected component (if applicable):


How reproducible:

100%


Steps to Reproduce:
1. create a bundle
2. erase the container's operation history

Actual results:

1. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:   notice: te_rsc_command:    Initiating monitor operation rabbitmq-bundle-docker-0_monitor_0 on atl1-ctl01 | action 35
2. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:   notice: te_rsc_command:    Initiating stop operation rabbitmq-bundle-0_stop_0 on atl1-ctl01 | action 52
3. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:  warning: status_from_rc:    Action 35 (rabbitmq-bundle-docker-0_monitor_0) on atl1-ctl01 failed (target: 7 vs. rc: 0): Error
4. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:     info: abort_transition_graph:    Transition aborted by operation rabbitmq-bundle-docker-0_monitor_0 'modify' on atl1-ctl01: Event failed | magic=0:0;35:488:7:b1378773-47a7-495a-94d7-99c8b87e85d3 cib=0.251.32 source=match_graph_event:310 complete=false
5. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:     info: match_graph_event:    Action rabbitmq-bundle-docker-0_monitor_0 (35) confirmed on atl1-ctl01 (rc=0)

Expected results:

2 should be ordered after 5, and in this case never run because of 4

1. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:   notice: te_rsc_command:    Initiating monitor operation rabbitmq-bundle-docker-0_monitor_0 on atl1-ctl01 | action 35
3. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:  warning: status_from_rc:    Action 35 (rabbitmq-bundle-docker-0_monitor_0) on atl1-ctl01 failed (target: 7 vs. rc: 0): Error
4. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:     info: abort_transition_graph:    Transition aborted by operation rabbitmq-bundle-docker-0_monitor_0 'modify' on atl1-ctl01: Event failed | magic=0:0;35:488:7:b1378773-47a7-495a-94d7-99c8b87e85d3 cib=0.251.32 source=match_graph_event:310 complete=false
5. Sep 24 12:37:04 [360510] atl1-ctl02.mgmt.geix.cloud.ge.com       crmd:     info: match_graph_event:    Action rabbitmq-bundle-docker-0_monitor_0 (35) confirmed on atl1-ctl01 (rc=0)

Additional info:

See attached scheduler input for test case
Upstream patch: https://github.com/ClusterLabs/pacemaker/commit/407f524

Comment 3 Ken Gaillot 2018-10-12 14:19:49 UTC

*** This bug has been marked as a duplicate of bug 1448467 ***