Description of problem:

Pacemaker consists of multiple daemons, including the controller and the attribute manager. Each of these elects one node to a special role: the Designated Controller (DC) for the controller, and the attribute writer for the attribute manager.

When a node needs to be shut down, a "shutdown" transient node attribute is created for it. Transient node attributes are stored both in the CIB and in attribute manager memory.

When the DC leaves the cluster, all other nodes remove its transient node attributes from the CIB, including "shutdown". When any node's attribute manager leaves the cluster, all other nodes' attribute managers remove that node's transient node attributes from memory. When a node wins the attribute writer election, it writes out all its transient node attributes to the CIB.

This creates a race condition when different nodes are the DC and the attribute writer, and both are shutting down while other nodes remain up. When the DC's controller exits, the remaining nodes erase the DC's attributes from the CIB. However, the DC's attribute manager may still be up at this point. If the attribute writer leaves at this time, the DC's attribute manager may win the election for a new attribute writer and write its attributes back out to the CIB. Because the "shutdown" attribute is written back out, the former DC will be shut down immediately the next time it joins the cluster.

Version-Release number of selected component (if applicable):

How reproducible:

Difficult

Steps to Reproduce:

1. Configure a cluster of at least 5 nodes (so that quorum can be retained after shutting down 2).
2. Ensure that different nodes are DC and attribute writer. The DC can be determined with "crmadmin -D". The attribute writer can be determined by searching /var/log/pacemaker/pacemaker.log on all nodes for the most recent "Recorded local node as attribute writer" message. If the same node holds both roles, restart the existing winner to force a new election, and repeat until the roles are on different nodes.
3. Shut down the DC and the attribute writer at the same time.
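The ordering described under "Description of problem" can be sketched as a toy model. This is only an illustration of the sequence of events, not Pacemaker code: the class ToyCluster, the function stale_shutdown_left, the node names, and the placeholder attribute value "1" are all hypothetical, and attribute manager memory is simplified to a single shared view.

```python
class ToyCluster:
    """Toy model of the CIB and attribute manager memory (not Pacemaker code)."""

    def __init__(self, nodes):
        self.cib = {n: {} for n in nodes}        # transient attributes stored in the CIB
        self.attrd_mem = {n: {} for n in nodes}  # same attributes held in attribute manager memory
        self.attrd_up = set(nodes)               # nodes whose attribute manager is still running

    def request_shutdown(self, node):
        # A "shutdown" attribute is stored in both places ("1" is a placeholder value).
        self.cib[node]["shutdown"] = "1"
        self.attrd_mem[node]["shutdown"] = "1"

    def controller_exits(self, node):
        # Remaining nodes erase the departed node's transient attributes from the CIB.
        self.cib[node].clear()

    def attrd_exits(self, node):
        # Remaining attribute managers drop the departed node's attributes from memory.
        self.attrd_up.discard(node)
        self.attrd_mem[node].clear()

    def writer_elected(self, winner):
        # The new attribute writer writes the attributes it holds in memory to the CIB.
        assert winner in self.attrd_up, "only a running attribute manager can win"
        for node, attrs in self.attrd_mem.items():
            self.cib[node].update(attrs)


def stale_shutdown_left(race):
    """Return True if the former DC still has "shutdown" in the CIB afterwards."""
    cluster = ToyCluster(["dc", "writer", "n3", "n4", "n5"])
    cluster.request_shutdown("dc")
    cluster.request_shutdown("writer")
    cluster.controller_exits("dc")      # CIB copy of the DC's attributes is erased
    if race:
        cluster.attrd_exits("writer")   # writer leaves while the DC's attribute manager is up
        cluster.writer_elected("dc")    # the DC's attribute manager wins and writes back
        cluster.attrd_exits("dc")
    else:
        cluster.attrd_exits("dc")       # the DC's attribute manager is gone before the election
        cluster.attrd_exits("writer")
        cluster.writer_elected("n3")
    return "shutdown" in cluster.cib["dc"]
```

In this model, stale_shutdown_left(True) follows the racy ordering and leaves the attribute behind, while stale_shutdown_left(False) does not.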

Actual results:

Sometimes, the CIB will still have a "shutdown" node attribute for the former DC. This can be checked by running "pcs cluster cib" and looking under "transient_attributes" in the "node_state" section for the node.

Expected results:

The "shutdown" node attribute for the former DC is never present after it leaves the cluster.

Additional info:

If this can't be reproduced, it can be sanity-checked only.
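
The manual check under "Actual results" could be scripted along the following lines. This is a minimal sketch, assuming the CIB XML (for example, the output of "pcs cluster cib") is available as a string. The function name nodes_with_shutdown and the SAMPLE_CIB fragment, including its node names and values, are hypothetical and only illustrate the expected nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical, heavily trimmed CIB fragment used only for illustration.
SAMPLE_CIB = """
<cib>
  <status>
    <node_state id="1" uname="node1">
      <transient_attributes id="1">
        <instance_attributes id="status-1">
          <nvpair id="status-1-shutdown" name="shutdown" value="1690000000"/>
        </instance_attributes>
      </transient_attributes>
    </node_state>
    <node_state id="2" uname="node2">
      <transient_attributes id="2">
        <instance_attributes id="status-2"/>
      </transient_attributes>
    </node_state>
  </status>
</cib>
"""


def nodes_with_shutdown(cib_xml):
    """Return the uname of every node_state that has a transient "shutdown" attribute."""
    root = ET.fromstring(cib_xml)
    found = []
    for state in root.iter("node_state"):
        # Only look under transient_attributes, as described in the report.
        for transient in state.findall("transient_attributes"):
            names = [nv.get("name") for nv in transient.iter("nvpair")]
            if "shutdown" in names:
                found.append(state.get("uname"))
                break
    return found


if __name__ == "__main__":
    print(nodes_with_shutdown(SAMPLE_CIB))
```

If the former DC's name appears in the returned list after it has left the cluster, the stale attribute is present.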
Fixed upstream as of commit f5263c94
*** Bug 2230133 has been marked as a duplicate of this bug. ***