Bug 1499217

Summary: Cleanup of bundle resource is incomplete
Product: Red Hat Enterprise Linux 7
Reporter: Damien Ciabrini <dciabrin>
Component: pacemaker
Assignee: Andrew Beekhof <abeekhof>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Docs Contact:
Priority: urgent
Version: 7.4
CC: abeekhof, aherr, ahrechan, chjones, cluster-maint, kgaillot, mkrcmari, ohochman, ushkalim
Target Milestone: rc
Keywords: Triaged, ZStream
Target Release: 7.5   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: pacemaker-1.1.18-4.el7
Doc Type: No Doc Update
Doc Text:
Previously, the "pcs resource cleanup" command ignored stopped child clone resources of a bundle. Consequently, it was not possible to erase the state of the resources. With this update, Pacemaker now recognizes stopped clone resources. As a result, the pcs tool now works correctly with bundles when cleaning up.
Story Points: ---
Clone Of:
: 1509874 1514520
Environment:
Last Closed: 2018-04-10 15:32:51 UTC
Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1494455, 1509874, 1514520    
Attachments:
Description                               Flags
CIB before pcs cleanup resource galera    none
CIB after pcs resource cleanup galera     none
output of pcs resource cleanup galera     none
galera configuration                      none
crm_report of the unexpected restart      none

Description Damien Ciabrini 2017-10-06 11:48:57 UTC
Description of problem:
I'm running a galera resource in a bundle with "op promote on-fail=block".
I'm forcing the galera server to fail during promotion, which correctly gives me a resource in FAILED state on one node, and "blocked" from restarting.

Now when doing "pcs resource cleanup galera", I can see that the failcount on the resource is correctly cleaned up:

[root@centos2 ~]# pcs resource cleanup galera                                                                                                                                                                             
Cleaning up galera:0 on galera-bundle-0, removing fail-count-galera                                                                                                                                                       
Cleaning up galera:1 on galera-bundle-1, removing fail-count-galera
Cleaning up galera:2 on galera-bundle-2, removing fail-count-galera

but the resource still shows up as "FAILED (blocked)" in pcs status.

Attached are two dumps of the CIB before the cleanup, and after the cleanup.

Running diff -u on the two files shows the following:
       <transient_attributes id="galera-bundle-2">                                                                                                                                                                        
         <instance_attributes id="status-galera-bundle-2">                                                                                                                                                                
-          <nvpair id="status-galera-bundle-2-fail-count-galera" name="fail-count-galera" value="1"/>                                                                                                                     
           <nvpair id="status-galera-bundle-2-last-failure-galera" name="last-failure-galera" value="1507289301"/>                                                                                                        
         </instance_attributes>                                                                                                                                                                                          
       </transient_attributes>                                                                                                                                                                                            

This shows that fail-count is cleaned up, but apparently last-failure is not.
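
The same comparison can be done mechanically. Below is a minimal sketch in Python, assuming the two CIB dumps have been reduced to the transient_attributes fragment quoted in the diff. The BEFORE and AFTER strings and the helper name transient_attrs are illustrative only and are not part of pacemaker or pcs.

```python
import xml.etree.ElementTree as ET

# Fragments modeled on the diff above; a real CIB dump contains far more.
BEFORE = """
<transient_attributes id="galera-bundle-2">
  <instance_attributes id="status-galera-bundle-2">
    <nvpair id="status-galera-bundle-2-fail-count-galera" name="fail-count-galera" value="1"/>
    <nvpair id="status-galera-bundle-2-last-failure-galera" name="last-failure-galera" value="1507289301"/>
  </instance_attributes>
</transient_attributes>
"""

AFTER = """
<transient_attributes id="galera-bundle-2">
  <instance_attributes id="status-galera-bundle-2">
    <nvpair id="status-galera-bundle-2-last-failure-galera" name="last-failure-galera" value="1507289301"/>
  </instance_attributes>
</transient_attributes>
"""


def transient_attrs(xml_text):
    """Return {attribute name: value} for every nvpair in the fragment."""
    root = ET.fromstring(xml_text)
    return {nv.get("name"): nv.get("value") for nv in root.iter("nvpair")}


before = transient_attrs(BEFORE)
after = transient_attrs(AFTER)

# Attributes that the cleanup removed.
removed = sorted(set(before) - set(after))

# Failure-related attributes that survived the cleanup.
leftover = sorted(
    name for name in after
    if name.startswith(("fail-count-", "last-failure-"))
)

print("removed by cleanup:", removed)
print("failure attributes still present:", leftover)
```

On the fragments above this reports fail-count-galera as removed and last-failure-galera as still present, matching the diff.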


Version-Release number of selected component (if applicable):
pacemaker-1.1.16-12.el7_4.3

How reproducible:
Always

Steps to Reproduce:
There might be a simpler reproducer, but here's the procedure with equivalent packages from RDO and the galera resource.

On a three-node cluster (centos1, centos2, centos3):

1. pull the container image on all nodes
  docker pull docker.io/tripleomaster/centos-binary-mariadb:passed-ci-test

2. prepare the hosts
  touch /foo # create an empty file on the host
  # install the attached galera.cnf in /etc/my.cnf.d/galera.cnf and adapt the host names 

3. create a bundle
  pcs resource bundle create galera-bundle container docker image=docker.io/tripleomaster/centos-binary-mariadb:passed-ci-test replicas=3 masters=3 network=host options="--user=root --log-driver=journald" run-command="/usr/sbin/pacemaker_remoted" network control-port=3123 storage-map id=map1 source-dir=/foo target-dir=/etc/libqb/force-filesystem-sockets options=ro storage-map id=map2 source-dir=/etc/my.cnf.d/galera.cnf target-dir=/etc/my.cnf.d/galera.cnf options=ro storage-map id=map3 source-dir=/var/lib/mysql target-dir=/var/lib/mysql options=rw --disabled

4. create the galera resource inside the bundle (adapt the host names)
  pcs resource create galera galera enable_creation=true wsrep_cluster_address='gcomm://centos1,centos2,centos3' cluster_host_map='centos1:centos1;centos2:centos2;centos3:centos3' op promote on-fail=block meta container-attribute-target=host bundle galera-bundle

5. start the bundle a first time to bootstrap the galera cluster
  pcs resource enable galera-bundle

6. once all nodes are in master, stop the bundle to stop the galera cluster
  pcs resource disable galera-bundle

7. on the third node, break galera internals to force a failure at the next restart
  dd if=/dev/null of=/var/lib/mysql/gvwstate.dat

8. restart the bundle and wait for the galera resource on centos3 to FAIL
  pcs resource enable galera-bundle
  

Actual results:
The resource is still blocked and out of pacemaker's control after cleanup.

Expected results:
The resource should be managed by pacemaker again: it should be in "Slave" state after the cleanup, and pacemaker should resume scheduling it.

Additional info:

Comment 2 Damien Ciabrini 2017-10-06 11:51:42 UTC
Created attachment 1335243 [details]
CIB before pcs cleanup resource galera

Comment 3 Damien Ciabrini 2017-10-06 11:52:24 UTC
Created attachment 1335244 [details]
CIB after pcs resource cleanup galera

Comment 4 Damien Ciabrini 2017-10-06 11:52:56 UTC
Created attachment 1335245 [details]
output of pcs resource cleanup galera

Comment 5 Damien Ciabrini 2017-10-06 12:01:37 UTC
Created attachment 1335281 [details]
galera configuration

Comment 6 Ken Gaillot 2017-10-06 14:20:47 UTC
As with clones, the upstream recommendation is to always operate on the bundle resource, never its primitive. I think pcs automatically translates it for clones, and it might be a good idea to do that with bundles, too. But I agree this is an odd outcome worth looking into.

Comment 7 Andrew Beekhof 2017-10-09 10:40:52 UTC
(In reply to Ken Gaillot from comment #6)
> As with clones, the upstream recommendation is to always operate on the
> bundle resource, never its primitive. I think pcs automatically translates
> it for clones, and it might be a good idea to do that with bundles, too. But
> I agree this is an odd outcome worth looking into.

I don't buy this.
crm_resource/pcs automatically escalates the request from the primitive to the clone.
The only difference here is that it doesn't go all the way up to the bundle.
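
To illustrate the point, here is a toy model in Python of that escalation. It is a sketch only and not how crm_resource or pcs is implemented: the Resource class, the escalate helper, and the name "galera-bundle-master" for the implicit clone are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Resource:
    """Hypothetical stand-in for a configured resource and its parent."""
    name: str
    kind: str  # "primitive", "clone" or "bundle"
    parent: Optional["Resource"] = None


def escalate(rsc, stop_at_clone=False):
    """Walk up the parent chain and return the resource a request targets.

    stop_at_clone=True models the behaviour described in this comment:
    the request is escalated to the clone but not on to the bundle.
    """
    while rsc.parent is not None:
        if stop_at_clone and rsc.kind == "clone":
            break
        rsc = rsc.parent
    return rsc


# Assumed hierarchy: bundle -> implicit clone -> primitive.
bundle = Resource("galera-bundle", "bundle")
clone = Resource("galera-bundle-master", "clone", parent=bundle)
primitive = Resource("galera", "primitive", parent=clone)

print(escalate(primitive, stop_at_clone=True).name)  # stops at the clone
print(escalate(primitive).name)                      # reaches the bundle
```

In this model, a cleanup requested on "galera" only covers the whole bundle when the walk continues past the clone.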

Comment 11 Ken Gaillot 2017-11-06 17:11:03 UTC
*** Bug 1505909 has been marked as a duplicate of this bug. ***

Comment 12 Damien Ciabrini 2017-11-07 20:12:16 UTC
As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1505909, comment #7, I tested a scratch build with the provided patch and I can now clean errors by doing "pcs resource cleanup galera-bundle". I can also reprobe the state of an unmanaged resource.

However, I now face another issue: when I run "pcs resource manage galera-bundle" after the cleanup, a restart operation is triggered, which is unexpected and breaks the idiomatic way of "reprobing the current state of a resource before giving back control to pacemaker".

Comment 13 Damien Ciabrini 2017-11-07 20:18:08 UTC
Created attachment 1349106 [details]
crm_report of the unexpected restart

Attached crm_report of the unexpected restart:
Nov 07 21:01:15 ra1 crmd[5111]:   notice: State transition S_IDLE -> S_POLICY_ENGINE
Nov 07 21:01:15 ra1 pengine[5110]:   notice:  * Restart    galera:2                   (          Master galera-bundle-2 )
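
For anyone scanning a larger log for the same symptom, here is a minimal Python sketch that pulls scheduled actions out of pengine lines. The regular expression is fitted to the single line quoted above and is an assumption about the format; other pacemaker versions may log these actions differently.

```python
import re

# The two log lines quoted above.
LOG_LINES = [
    "Nov 07 21:01:15 ra1 crmd[5111]:   notice: State transition S_IDLE -> S_POLICY_ENGINE",
    "Nov 07 21:01:15 ra1 pengine[5110]:   notice:  * Restart    galera:2                   (          Master galera-bundle-2 )",
]

# Matches "pengine[PID]: notice: * <Action> <resource> ( <detail> )".
ACTION_RE = re.compile(
    r"pengine\[\d+\]:\s+notice:\s+\*\s+"
    r"(?P<action>\w+)\s+(?P<resource>\S+)\s+\(\s*(?P<detail>.*?)\s*\)"
)


def scheduled_actions(lines):
    """Return (action, resource, detail) for each matching pengine line."""
    found = []
    for line in lines:
        match = ACTION_RE.search(line)
        if match:
            found.append(
                (match.group("action"), match.group("resource"), match.group("detail"))
            )
    return found


actions = scheduled_actions(LOG_LINES)
for action, resource, detail in actions:
    print(f"{action}: {resource} ({detail})")
```

Filtering the result for "Restart" entries right after a "pcs resource manage" call is one way to spot the unexpected restart described here.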

Comment 14 Ken Gaillot 2017-11-07 21:37:23 UTC
(In reply to Damien Ciabrini from comment #12)
> As noted in https://bugzilla.redhat.com/show_bug.cgi?id=1505909, comment #7,
> I tested a scratch build with the provided patch and I can now clean errors
> by doing "pcs resource cleanup galera-bundle". I can also reprobe the state
> of unmanaged resource.
> 
> However, I now face another issue, in that when I "pcs resource manage
> galera-bundle" after the cleanup, a restart operation is triggered, which is
> unexpected and breaks the idiomatic way of "reprobing the current state of a
> resource before giving back control to pacemaker".

To clarify, the scratch build is for the z-stream Bug 1509874. Will comment there.

Comment 15 Artem Hrechanychenko 2017-11-17 16:11:06 UTC
Move to POST because in latest puddle - 
http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/2017-11-16.4/

pacemaker-1.1.16-12.el7_4.4.x86_64

Comment 16 Omri Hochman 2017-11-17 16:25:45 UTC
(In reply to Artem Hrechanychenko from comment #15)
> Move to POST because in latest puddle - 
> http://download.lab.bos.redhat.com/rcm-guest/puddles/OpenStack/12.0-RHEL-7/
> 2017-11-16.4/
> 
> pacemaker-1.1.16-12.el7_4.4.x86_64

Switching back to ON_QA as this is a RHEL BZ.
I'm cloning this bug to be verified on OSP12, as it blocks the replace controller scenario.

Comment 17 pkomarov 2018-01-11 13:02:14 UTC
Resolved: the cluster retains active control after the galera node resumes its active status.

After the steps from the Description:

The galera resource is active on all nodes.

Full list of resources:

   galera-bundle-0	(ocf::heartbeat:galera):	Master controller-0
   galera-bundle-1	(ocf::heartbeat:galera):	Master controller-1
   galera-bundle-2	(ocf::heartbeat:galera):	Master controller-2

Daemon Status:
  corosync: active/enabled
  pacemaker: active/enabled
  pcsd: active/enabled


and as indicated by the logs:

process_lrm_event:  Result of monitor operation for galera-bundle-docker-0 on controller-0: 0 (ok)
remote_node_up:     Announcing pacemaker_remote node galera-bundle-0
erase_status_tag:   Deleting lrm status entries for galera-bundle-0 | xpath=//node_state[@uname='galera-bundle-0']/lrm
erase_status_tag:   Deleting transient_attributes status entries for galera-bundle-0 | xpath=//node_state[@uname='galera-bundle-0']/transient_attributes
crm_update_peer_state_iter: Node galera-bundle-0 state is now member | nodeid=0 previous=lost source=remote_node_up
peer_update_callback:       Remote node galera-bundle-0 is now member (was lost)
send_remote_state_message:  Notifying DC controller-2 of pacemaker_remote node galera-bundle-0 coming up

Comment 20 errata-xmlrpc 2018-04-10 15:32:51 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:0860