Bug 1508373

Summary: Correct handling of bundles
Product: Red Hat Enterprise Linux 7 Reporter: Andrew Beekhof <abeekhof>
Component: pacemaker Assignee: Ken Gaillot <kgaillot>
Status: CLOSED ERRATA QA Contact: Ofer Blaut <oblaut>
Severity: urgent Docs Contact:
Priority: urgent    
Version: 7.4 CC: abeekhof, aherr, cluster-maint, mkrcmari, mnovacek
Target Milestone: rc Keywords: ZStream
Target Release: 7.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.18-4.el7 Doc Type: No Doc Update
Doc Text:
Previously, multiple problems occurred when pacemaker handled the new "bundle" feature. As a consequence, scheduling actions related to bundles led to unpredictable outcomes. With this update, these problems have been fixed, and scheduling with bundles now works as expected.
Story Points: ---
Clone Of:
: 1509871 Environment:
Last Closed: 2018-04-10 15:32:51 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1509871    

Description Andrew Beekhof 2017-11-01 10:26:51 UTC
Description of problem:

A number of correctness issues were encountered and patched while using bundles to simulate OSP installs with large numbers of IHA computes.

- [pacemaker] allow resources to stop before their state is known everywhere
- [pacemaker] ensure remote nodes get probed
- [pacemaker] Do not probe connection resources until the container is active
- [pacemaker] Exclude resources and nodes from the symmetric_default constraint in some circumstances
- [pacemaker] Do not always expire failed operations of nested remotes 
- [bundles] turn on stderr logging so 'docker logs' works
- [bundles] correctly populate allowed_nodes
- [bundles] Only wait for other containers on the same node to be probed 
- [bundles] There is no need for port mapping directives when net=host is specified 

These will all need to be patched in time for OSP12 GA.
Even the ones that look like optimizations prevent the cluster from getting into 'stuck' states where it cannot make progress towards service recovery.
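
To illustrate the "net=host" item in the list above: with host networking the container shares the host's network stack, so Docker port publishing has no effect. The following is only a minimal sketch of that rule, not Pacemaker's actual code (which is written in C); the function name docker_run_args and its parameters are hypothetical.

```python
def docker_run_args(image, network=None, ports=()):
    """Build an illustrative 'docker run' argument list for a bundle container.

    Hypothetical helper: it only demonstrates the rule that port mapping
    directives are skipped when the container uses host networking.
    """
    args = ["docker", "run", "-d"]
    if network:
        args.append(f"--net={network}")
    # With net=host the container uses the host's network stack directly,
    # so '-p' port mappings are redundant and are left out.
    if network != "host":
        for port in ports:
            args += ["-p", f"{port}:{port}"]
    args.append(image)
    return args
```

For example, docker_run_args("img", network="host", ports=[80]) yields no "-p" arguments, while the same call without network="host" includes "-p 80:80".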

Comment 2 Andrew Beekhof 2017-11-01 10:36:24 UTC
http://github.com/beekhof/pacemaker
 '~' represents a related but optional patch


+ 7bc58b4: Fix: PE: Have bundles log to stderr so that 'docker logs' works 
~ bc4228d: Test: PE: Resources are allowed to stop before their state is known everywhere 
+ b322110: Fix: PE: Resources are allowed to stop before their state is known everywhere 
~ 058d45e: Fix: PE: Use the node we already have and know isnt NULL 
~ 02defd0: PE: Flag resources that are acting as remote nodes 
~ bec14be: Test: PE: Bare metal remotes _can_ run resources now and must be probed 
+ cafc6b1: Fix: PE: Bare metal remotes _can_ run resources now and must be probed 
~ bee6a66: Test: PE: Bundles only need to wait for other containers on the same node to be probed 
+ 03d40c0: Fix: PE: Bundles only need to wait for other containers on the same node to be probed 
~ 929ed9b: Test: PE: There is no need for port mapping directives when net=host is specified 
+ b4321a7: Fix: PE: There is no need for port mapping directives when net=host is specified 
+ 40aaa36: Fix: PE: Do not always expire failed operations of nested remotes 
+ e1e81ae: Fix: PE: Consolidate REMOTE_CONTAINER_HACK logic 
~ f764f36: Test: PE: Exclude resources and nodes from the symmetric_default constraint in some circumstances 
+ cacbac0: Fix: PE: Exclude resources and nodes from the symmetric_default constraint in some circumstances 
~ b2ca8d5: Test: PE: Do not probe connection resources until the container is active 
+ c3d4ec0: Fix: PE: Do not probe connection resources until the container is active 
+ 2bf3f0b: Log: PE: Detailed resource information should include connection resource state 
~ d1d5643: Log: PE: Trace logging for generated bundle resource xml 
~ ccc8944: Log: PE: Remove overly noisey developmental logging 
~ bb5a731: Log: PE: Improved comment 
~ c0180b4: Test: PE: Only pass requests for promote/demote flags onto the bundle's child 
+ 4ee68a2: Fix: PE: Only pass requests for promote/demote flags onto the bundle's child
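
For anyone backporting from the list above, the markers can be used to pull out the non-optional commits. This is a rough sketch only: it assumes that '+' marks a required patch (the legend only defines '~'), and the function name required_patches is hypothetical.

```python
def required_patches(lines):
    """Return (commit, subject) pairs for lines marked '+'.

    Assumption: '+' means "required"; lines marked '~' (optional per the
    legend) and any other lines are skipped.
    """
    result = []
    for line in lines:
        marker, _, rest = line.strip().partition(" ")
        if marker == "+":
            commit, _, subject = rest.partition(": ")
            result.append((commit, subject.strip()))
    return result
```

Feeding it the patch list as a list of strings returns entries such as ("7bc58b4", "Fix: PE: Have bundles log to stderr so that 'docker logs' works").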

Comment 3 Andrew Beekhof 2017-11-01 10:46:35 UTC
Oh, and:

+ b0ca9a1: Fix: Tools: Allow crm_resource to operate on anonymous clones in unknown states

Comment 4 Andrew Beekhof 2017-11-02 05:18:11 UTC
Also:

~ 1fa28f0: Test: PE: Improved logging of reasons for stop/restart actions 
~ 837adae: Log: PE: Improved logging of reasons for stop/restart actions 
+ 3a34fed: Fix: PE: Allow all resources to stop prior to probes completing 
+ 15208f7: Fix: PE: Correctly defer processing of resources inside containers

Comment 11 errata-xmlrpc 2018-04-10 15:32:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860