Bug 1481140
| Summary: | [GANESHA] pcs status shows all nodes in started state for ~15 mins even when hit "partition WITHOUT quorum" with IO's still resuming [rhel-7.4.z] | ||
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Oneata Mircea Teodor <toneata> | 
| Component: | pacemaker | Assignee: | Ken Gaillot <kgaillot> | 
| Status: | CLOSED ERRATA | QA Contact: | cluster-qe <cluster-qe> | 
| Severity: | urgent | Docs Contact: | |
| Priority: | urgent | ||
| Version: | 7.4 | CC: | abeekhof, aherr, cfeist, cluster-maint, fwestpha, hsowa, jruemker, jthottan, kgaillot, kkeithle, mjuricek, mnovacek, msaini, nbarcet, rhs-bugs, rkhan, skoduri, storage-qa-internal | 
| Target Milestone: | rc | Keywords: | ZStream | 
| Target Release: | 7.4 | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | pacemaker-1.1.16-12.el7_4.1 | Doc Type: | Bug Fix | 
| Doc Text: | Previously, quorum loss did not trigger Pacemaker to recheck resource placement. As a consequence, in certain situations Pacemaker required a long time, up to the cluster recheck interval, before stopping resources after quorum loss. This happened only when several conditions were met: a node that was correctly shutting down dropped the cluster below the quorum; that node was not running any resources at the time; and a cluster transition was already in progress. With this update, Pacemaker always cancels the current transition when quorum is lost and recalculates resource placement immediately. As a result, the long delay no longer occurs. | Story Points: | --- |
| Clone Of: | 1464068 | Environment: | |
| Last Closed: | 2017-09-05 11:31:54 UTC | Type: | --- | 
| Regression: | --- | Mount Type: | --- | 
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
| Bug Depends On: | 1464068 | ||
| Bug Blocks: | |||
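The Doc Text field above refers to the cluster recheck interval; this is the cluster-recheck-interval Pacemaker property, whose 15-minute default matches the roughly 15-minute delay in the summary. A minimal sketch of how one might inspect or tighten it with pcs follows; the 5-minute value is illustrative only and not taken from this report, and tuning the interval merely bounds the worst case rather than replacing the fix listed in Fixed In Version.

```sh
# Show the current value; if unset, Pacemaker falls back to its built-in
# 15-minute default, which matches the ~15 min delay in the bug summary.
pcs property list --all | grep cluster-recheck-interval

# Illustrative only: lowering the interval shortens the worst-case delay,
# but the actual fix is the pacemaker-1.1.16-12.el7_4.1 update, which
# recalculates resource placement immediately on quorum loss.
pcs property set cluster-recheck-interval=5min
```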
 
        
Description    Oneata Mircea Teodor    2017-08-14 08:25:26 UTC
Testing procedure (from parent bug):

1. Configure a cluster of at least three nodes, one dummy resource that takes a long time to stop, and at least one other resource.
2. Stop enough nodes so that the cluster is one node away from losing quorum.
3. Put one of the remaining nodes in standby, and wait until it has no resources running on it.
4. Disable the dummy resource so that it initiates a stop, and before it completes the stop, shut down the standby node.

Before the change, the cluster will not stop the remaining resource(s) on the active node(s) until the next cluster-recheck-interval. After the change, the cluster will immediately stop all remaining resources. A command-level sketch of these steps appears at the end of this comment.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:2587
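Below is a minimal sketch of the testing procedure above, assuming a hypothetical three-node cluster (node1, node2, node3) and using ocf:heartbeat:Delay as the slow-stopping "dummy" resource; the parent bug does not name a specific agent, node set, or timeouts, so those details are illustrative assumptions.

```sh
# Assumed nodes: node1, node2, node3 (hypothetical names).

# 1. A resource that takes a long time to stop (ocf:heartbeat:Delay sleeps
#    for 'stopdelay' seconds during stop), plus one ordinary resource.
pcs resource create slow-stop ocf:heartbeat:Delay stopdelay=120 \
    op stop timeout=180s
pcs resource create other-rsc ocf:pacemaker:Dummy

# 2. With three nodes, stopping one leaves the cluster one node away from
#    losing quorum.
pcs cluster stop node3

# 3. Put another node in standby and wait until it runs no resources.
pcs cluster standby node2
pcs status resources

# 4. Start the long stop, then shut down the standby node before the stop
#    completes; this drops the cluster below quorum mid-transition.
pcs resource disable slow-stop
pcs cluster stop node2

# Before the fix: other-rsc stays Started on node1 until the next
# cluster-recheck-interval. After the fix: it is stopped immediately.
pcs status
```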