Bug 1657138

Summary: rabbitmq-cluster: regression when restarting inside a bundle [rhel-7.6.z]
Product: Red Hat Enterprise Linux 7
Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Priority: urgent
Version: 7.6
CC: abeekhof, agk, aherr, cfeist, chjones, cluster-maint, dciabrin, fdinitto, jeckersb, michele, mlisik, msuchane, oalbrigt, pkomarov, plemenko, salmy, sasha, sbradley
Target Milestone: rc
Keywords: Triaged, ZStream
Hardware: All
OS: Linux
Fixed In Version: resource-agents-4.1.1-12.el7_6.7
Doc Text:
When a containerized RabbitMQ cluster was stopped entirely, but the containers were not stopped, the RabbitMQ resource agent failed to update the Pacemaker view of the RabbitMQ cluster. Consequently, RabbitMQ servers failed to restart the cluster. With this update, the RabbitMQ resource agent cleans up cluster attributes on RabbitMQ shutdown, and, as a result, the described problem no longer occurs.
Clone Of: 1656368
Last Closed: 2018-12-17 17:08:25 UTC
Bug Depends On: 1656368    
Bug Blocks: 1655764    
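The Doc Text above says the fix makes the resource agent clean up its cluster attributes when RabbitMQ shuts down, even though the container keeps running. A minimal sketch of that idea follows. It assumes the agent records each running RabbitMQ node in a transient (`--lifetime reboot`) Pacemaker node attribute; the attribute name `rmq-node-attr-<resource>` and both function names are hypothetical, and this is not the agent's actual code.

```shell
#!/bin/sh
# ASSUMPTION: one transient node attribute per resource; the name is hypothetical.
RMQ_ATTR_NAME="rmq-node-attr-${OCF_RESOURCE_INSTANCE:-rabbitmq}"

# Delete this node's transient attribute so that Pacemaker no longer counts
# the local RabbitMQ server as a running member of the RabbitMQ cluster.
rmq_forget_local_node() {
    node="$1"
    crm_attribute --node "$node" --lifetime reboot --name "$RMQ_ATTR_NAME" --delete
}

# Hypothetical stop path: stop the RabbitMQ application, then clean up the
# attribute unconditionally. Inside a bundle the container (and the guest
# node Pacemaker sees) may keep running after RabbitMQ itself is stopped,
# so the attribute would otherwise go stale.
rmq_stop_sketch() {
    node="$1"
    rabbitmqctl stop_app >/dev/null 2>&1
    rmq_forget_local_node "$node"
}
```

In a real agent the node name would come from Pacemaker itself, for example from `crm_node -n`.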

Description RAD team bot copy to z-stream 2018-12-07 08:21:57 UTC
This bug has been copied from bug #1656368 and has been proposed to be backported to 7.6 z-stream (EUS).

Comment 3 pkomarov 2018-12-10 10:08:49 UTC
Verified:

(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'docker exec `docker ps -f name=rabbitmq-bundle -q`  sh -c "hostname -f;rpm -q resource-agents"'

controller-2 | SUCCESS | rc=0 >>
controller-2.localdomain
resource-agents-4.1.1-12.el7_6.7.x86_64

controller-1 | SUCCESS | rc=0 >>
controller-1.localdomain
resource-agents-4.1.1-12.el7_6.7.x86_64
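The check above confirms that each container carries the build listed under Fixed In Version. A minimal sketch for scripting that comparison follows. It assumes GNU coreutils `sort -V` is an acceptable stand-in for RPM's own version comparison (close, but not identical, so treat borderline results with care); `has_fix` is a hypothetical helper name.

```shell
#!/bin/sh
# "Fixed In Version" from this bug report.
FIXED_NVR="resource-agents-4.1.1-12.el7_6.7"

# Succeed if the installed NVR (as printed by `rpm -q resource-agents`) sorts
# at or after FIXED_NVR. GNU `sort -V` only approximates rpm's rules.
has_fix() {
    installed="${1%.x86_64}"   # drop the arch suffix shown in the output above
    oldest=$(printf '%s\n%s\n' "$FIXED_NVR" "$installed" | sort -V | head -n 1)
    [ "$oldest" = "$FIXED_NVR" ]
}

# Example, run inside a rabbitmq-bundle container:
#   has_fix "$(rpm -q resource-agents)" && echo "fix present"
```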

(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'cat /etc/redhat-release'

controller-2 | SUCCESS | rc=0 >>
Red Hat Enterprise Linux Server release 7.6 (Maipo)

controller-1 | SUCCESS | rc=0 >>
Red Hat Enterprise Linux Server release 7.6 (Maipo)

# Inject a CIB change (set_policy ha-mode changed from "all" to "exactly" with ha-params 2) to trigger a restart:
[root@controller-1 ~]# diff NEW_cibadmin.xml ORG_cibadmin.xml 
67c67
<             <nvpair id="rabbitmq-instance_attributes-set_policy" name="set_policy" value="ha-all ^(?!amq\.).* {&quot;ha-mode&quot;:&quot;exactly&quot;,&quot;ha-params&quot;:2}"/>
---
>             <nvpair id="rabbitmq-instance_attributes-set_policy" name="set_policy" value="ha-all ^(?!amq\.).* {&quot;ha-mode&quot;:&quot;all&quot;}"/>

[root@controller-1 ~]# cibadmin --replace --xml-file NEW_cibadmin.xml
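NEW_cibadmin.xml differs from the original only in the `set_policy` value. One way to produce it is sketched below. The helper name is hypothetical, and the `sed` expression rewrites the first `ha-mode "all"` match on each line without checking which nvpair it belongs to, so review the `diff` before running `cibadmin --replace`.

```shell
#!/bin/sh
# Hypothetical helper: read CIB XML on stdin and switch the ha-all policy from
# {"ha-mode":"all"} to {"ha-mode":"exactly","ha-params":2}. The JSON quotes are
# stored as &quot; entities in the CIB; in a sed replacement '&' means "the
# whole match", so each literal '&' there is escaped as '\&'.
set_policy_exactly_2() {
    sed -e 's/&quot;ha-mode&quot;:&quot;all&quot;}/\&quot;ha-mode\&quot;:\&quot;exactly\&quot;,\&quot;ha-params\&quot;:2}/'
}

# Example workflow (the cibadmin calls used in this report, plus --query):
#   cibadmin --query > ORG_cibadmin.xml
#   set_policy_exactly_2 < ORG_cibadmin.xml > NEW_cibadmin.xml
#   diff NEW_cibadmin.xml ORG_cibadmin.xml
#   cibadmin --replace --xml-file NEW_cibadmin.xml
```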
#crm_mon:
...
 Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):	Stopping controller-2
...

   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):	Starting controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):	Stopped controller-2
...

   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):	Starting controller-2
...
   rabbitmq-bundle-1    (ocf::heartbeat:rabbitmq-cluster):	Started controller-1
   rabbitmq-bundle-2    (ocf::heartbeat:rabbitmq-cluster):	Started controller-2
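The crm_mon snapshots above show each replica moving from Stopping to Stopped to Starting and finally to Started. The final state can be checked mechanically. A minimal sketch, assuming `crm_mon -1` prints bundle replicas in the same `name (agent): State node` layout as the excerpt above (the layout may differ in other Pacemaker versions); the function name is hypothetical.

```shell
#!/bin/sh
# Read crm_mon text on stdin; succeed only if at least one rabbitmq-cluster
# replica line is present and every such line reports the state "Started".
all_rabbitmq_replicas_started() {
    awk '
        $2 == "(ocf::heartbeat:rabbitmq-cluster):" {
            seen++
            if ($3 != "Started") bad++
        }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Example:
#   crm_mon -1 | all_rabbitmq_replicas_started && echo "all replicas started"
```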


# A view inside a rabbitmq-bundle container:
()[root@controller-2 /]# ps -ef|more
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Nov27 ?        00:00:47 pcmk-init
root          13       1  0 Nov27 ?        00:24:45 /usr/sbin/pacemaker_remoted
rabbitmq     150       1  0 Nov27 ?        00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root       34579       1  0 Nov27 ?        00:00:00 sh -c /usr/sbin/rabbitmq-server > /var/log/rabbitmq/startup_log 2> /var/log/rabbitmq/startup_err
root       34582   34579  0 Nov27 ?        00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root       34602   34582  0 Nov27 ?        00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server 
rabbitmq   34608   34602  0 Nov27 ?        00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq   34841   34608 30 Nov27 ?        3-21:03:38 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -W w -A 256 -K true -P 1048576 -K true -B i -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -s rabbit boot -sname rabbit@controller-2 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@controller-2-sasl.log"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2"
rabbitmq   35217   34841  0 Nov27 ?        00:00:08 inet_gethost 4
rabbitmq   35218   35217  0 Nov27 ?        00:00:16 inet_gethost 4
root      220659       0  0 09:52 ?        00:00:00 bash
root      221781      13 12 09:52 ?        00:00:01 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root      222150  221781  0 09:52 ?        00:00:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root      222151  222150  0 09:52 ?        00:00:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root      222152  222151  0 09:52 ?        00:00:00 /bin/sh /usr/sbin/rabbitmqctl status
root      222153  222151  0 09:52 ?        00:00:00 sed -n -e s/^.*[S|s]tatus of node \(.*\)\s.*$/\1/p
root      222154  222151  0 09:52 ?        00:00:00 tr -d '
root      222165  222152  0 09:52 ?        00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl  'status'
rabbitmq  222166  222165 72 09:52 ?        00:00:01 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -B -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -hidden -boot start_clean -sasl errlog_type error -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2" -s rabbit_control_main -nodename rabbit@controller-2 -extra status
rabbitmq  222279  222166 56 09:52 ?        00:00:00 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -sname epmd-starter-420005555 -proto_dist "inet_tcp" -noshell -eval halt().
root      222318  220659  0 09:52 ?        00:00:00 ps -ef
root      222319  220659  0 09:52 ?        00:00:00 bash
()[root@controller-2 /]# ps -ef|more
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Nov27 ?        00:00:47 pcmk-init
root          13       1  0 Nov27 ?        00:24:46 /usr/sbin/pacemaker_remoted
rabbitmq     150       1  0 Nov27 ?        00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root      220659       0  0 09:52 ?        00:00:00 bash
root      238233  220659  0 10:03 ?        00:00:00 ps -ef
root      238234  220659  0 10:03 ?        00:00:00 more

()[root@controller-2 /]# pstree
pacemaker_remot-+-epmd
                `-pacemaker_remot

()[root@controller-2 /]# pstree -apln 
pacemaker_remot,1                  
  |-pacemaker_remot,13
  `-epmd,150 -daemon

()[root@controller-2 /]# ps -ef|more
UID          PID    PPID  C STIME TTY          TIME CMD
root           1       0  0 Nov27 ?        00:00:47 pcmk-init
root          13       1  0 Nov27 ?        00:24:46 /usr/sbin/pacemaker_remoted
rabbitmq     150       1  0 Nov27 ?        00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root      220659       0  0 09:52 ?        00:00:00 bash
root      238245      13 56 10:04 ?        00:00:01 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster start
root      238249  238245  0 10:04 ?        00:00:00 /bin/sh /usr/sbin/rabbitmqctl eval rabbit_mnesia:cluster_status_from_mnesia().
root      238250  238245  0 10:04 ?        00:00:00 grep -q ^{ok
root      238264  238249  0 10:04 ?        00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl  'eval' 'rabbit_mnesia:cluster_status_from_mnesia().'
rabbitmq  238265  238264 99 10:04 ?        00:00:01 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -B -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -hidden -boot start_clean -sasl errlog_type error -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2" -s rabbit_control_main -nodename rabbit@controller-2 -extra eval rabbit_mnesia:cluster_status_from_mnesia().
root      238377  220659  0 10:04 ?        00:00:00 ps -ef
root      238378  220659  0 10:04 ?        00:00:00 more
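During `start`, the listing shows the agent running `rabbitmqctl eval rabbit_mnesia:cluster_status_from_mnesia().` piped into `grep -q ^{ok` (PIDs 238249/238250). A minimal sketch of that check follows. As far as we know, the call returns `{ok, ...}` when Mnesia can report cluster membership and `{error, ...}` otherwise; treat that as an assumption. The function name and the `RABBITMQCTL` override variable are hypothetical, and the logic the real agent wraps around this pipeline is not shown here.

```shell
#!/bin/sh
# Mirror the pipeline seen in the process listing: ask the local node for its
# Mnesia cluster status and succeed only if the reply starts with "{ok".
# RABBITMQCTL is a hypothetical override (defaults to the real rabbitmqctl).
rmq_local_cluster_status_ok() {
    "${RABBITMQCTL:-rabbitmqctl}" eval 'rabbit_mnesia:cluster_status_from_mnesia().' 2>/dev/null \
        | grep -q '^{ok'
}

# Example, run inside a rabbitmq-bundle container:
#   rmq_local_cluster_status_ok && echo "mnesia cluster status available"
```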

Comment 6 Marek Suchánek 2018-12-12 18:39:04 UTC
Damien, thanks very much for the draft. I've edited the doc text a bit to make it shorter. If I accidentally mixed up something in it, please let me know.

Comment 7 errata-xmlrpc 2018-12-17 17:08:25 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3832