Bug 1657138
Summary: rabbitmq-cluster: regression when restarting inside a bundle [rhel-7.6.z]
Product: Red Hat Enterprise Linux 7
Reporter: RAD team bot copy to z-stream <autobot-eus-copy>
Component: resource-agents
Assignee: Oyvind Albrigtsen <oalbrigt>
Status: CLOSED ERRATA
QA Contact: pkomarov
Severity: urgent
Priority: urgent
Version: 7.6
CC: abeekhof, agk, aherr, cfeist, chjones, cluster-maint, dciabrin, fdinitto, jeckersb, michele, mlisik, msuchane, oalbrigt, pkomarov, plemenko, salmy, sasha, sbradley
Target Milestone: rc
Keywords: Triaged, ZStream
Hardware: All
OS: Linux
Fixed In Version: resource-agents-4.1.1-12.el7_6.7
Doc Type: If docs needed, set a value
Doc Text:
When a containerized RabbitMQ cluster was stopped entirely while its containers kept running, the RabbitMQ resource agent failed to update the Pacemaker view of the RabbitMQ cluster. Consequently, the RabbitMQ servers failed to restart the cluster. With this update, the RabbitMQ resource agent cleans up cluster attributes on RabbitMQ shutdown, and as a result, the described problem no longer occurs.
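The Doc Text describes the fix only as cleaning up cluster attributes when RabbitMQ shuts down. As an illustration of that idea, and not the agent's actual code, the sketch below deletes a per-node transient attribute with Pacemaker's `crm_attribute` after stopping RabbitMQ. The attribute name `example-rmq-node-attr` and the functions `stop_rabbitmq_app` and `rmq_stop_sketch` are hypothetical, and `--lifetime reboot` assumes the attribute is a transient one.

```shell
#!/bin/sh
# Hypothetical attribute name; the real agent's attribute may differ.
ATTR_NAME="example-rmq-node-attr"

# Placeholder shutdown step (assumption): stop the local RabbitMQ node.
stop_rabbitmq_app() {
    rabbitmqctl stop
}

# Stop RabbitMQ, then delete this node's transient attribute so Pacemaker
# no longer counts the node as a running RabbitMQ member, even though the
# container itself keeps running.
rmq_stop_sketch() {
    node="$1"
    stop_rabbitmq_app || return 1
    crm_attribute --node "$node" --lifetime reboot --name "$ATTR_NAME" --delete
}
```

The attribute is removed only after a successful stop, so a failed shutdown leaves the node's recorded state untouched.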
Clone Of: 1656368
Last Closed: 2018-12-17 17:08:25 UTC
Bug Depends On: 1656368
Bug Blocks: 1655764
Description
RAD team bot copy to z-stream
2018-12-07 08:21:57 UTC
Verified:
(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'docker exec `docker ps -f name=rabbitmq-bundle -q` sh -c "hostname -f;rpm -q resource-agents"'
controller-2 | SUCCESS | rc=0 >>
controller-2.localdomain
resource-agents-4.1.1-12.el7_6.7.x86_64
controller-1 | SUCCESS | rc=0 >>
controller-1.localdomain
resource-agents-4.1.1-12.el7_6.7.x86_64
(undercloud) [stack@undercloud-0 ~]$ ansible controller -b -mshell -a'cat /etc/redhat-release'
controller-2 | SUCCESS | rc=0 >>
Red Hat Enterprise Linux Server release 7.6 (Maipo)
controller-1 | SUCCESS | rc=0 >>
Red Hat Enterprise Linux Server release 7.6 (Maipo)
# Injecting a CIB change to trigger a restart:
[root@controller-1 ~]# diff NEW_cibadmin.xml ORG_cibadmin.xml
67c67
< <nvpair id="rabbitmq-instance_attributes-set_policy" name="set_policy" value="ha-all ^(?!amq\.).* {"ha-mode":"exactly","ha-params":2}"/>
---
> <nvpair id="rabbitmq-instance_attributes-set_policy" name="set_policy" value="ha-all ^(?!amq\.).* {"ha-mode":"all"}"/>
[root@controller-1 ~]# cibadmin --replace --xml-file NEW_cibadmin.xml
# crm_mon:
...
Docker container set: rabbitmq-bundle [192.168.24.1:8787/rhosp13/openstack-rabbitmq:pcmklatest]
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopping controller-2
...
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Starting controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Stopped controller-2
...
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Starting controller-2
...
rabbitmq-bundle-1 (ocf::heartbeat:rabbitmq-cluster): Started controller-1
rabbitmq-bundle-2 (ocf::heartbeat:rabbitmq-cluster): Started controller-2
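Instead of watching `crm_mon` by eye, one could poll its one-shot output (`crm_mon -1`) until every `rabbitmq-bundle-N` replica reports `Started`. This is a minimal sketch that assumes the replica line format shown above; the function names `bundle_all_started` and `wait_for_bundle` and the limit of 60 tries at 5-second intervals are illustrative choices.

```shell
#!/bin/sh
# Succeeds only if the crm_mon text on stdin lists at least one
# rabbitmq-bundle-N replica and every such replica is "Started".
bundle_all_started() {
    awk '
        $1 ~ /^rabbitmq-bundle-[0-9]+$/ {
            seen++
            if ($3 != "Started") bad++
        }
        END { if (seen > 0 && bad == 0) exit 0; exit 1 }
    '
}

# Poll the live cluster (assumes crm_mon is in PATH); give up after 60 tries.
wait_for_bundle() {
    tries=0
    until crm_mon -1 | bundle_all_started; do
        tries=$((tries + 1))
        [ "$tries" -ge 60 ] && return 1
        sleep 5
    done
}
```

For example, after `cibadmin --replace`, `wait_for_bundle || echo "rabbitmq-bundle did not settle"` returns once both replicas show `Started`. Transitional states such as `Stopping` or `Starting` keep the loop waiting.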
# A view inside a rabbitmq-bundle container:
()[root@controller-2 /]# ps -ef|more
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov27 ? 00:00:47 pcmk-init
root 13 1 0 Nov27 ? 00:24:45 /usr/sbin/pacemaker_remoted
rabbitmq 150 1 0 Nov27 ? 00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root 34579 1 0 Nov27 ? 00:00:00 sh -c /usr/sbin/rabbitmq-server > /var/log/rabbitmq/startup_log 2> /var/log/rabbitmq/startup_err
root 34582 34579 0 Nov27 ? 00:00:00 /bin/sh /usr/sbin/rabbitmq-server
root 34602 34582 0 Nov27 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 34608 34602 0 Nov27 ? 00:00:00 /bin/sh /usr/lib/rabbitmq/bin/rabbitmq-server
rabbitmq 34841 34608 30 Nov27 ? 3-21:03:38 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -W w -A 256 -K true -P 1048576 -K true -B i -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -s rabbit boot -sname rabbit@controller-2 -boot start_sasl -config /etc/rabbitmq/rabbitmq -kernel inet_default_connect_options [{nodelay,true}] -kernel inet_default_connect_options [{nodelay,true}] -sasl errlog_type error -sasl sasl_error_logger false -rabbit error_logger {file,"/var/log/rabbitmq/rabbit"} -rabbit sasl_error_logger {file,"/var/log/rabbitmq/rabbit@controller-2-sasl.log"} -rabbit enabled_plugins_file "/etc/rabbitmq/enabled_plugins" -rabbit plugins_dir "/usr/lib/rabbitmq/plugins:/usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/plugins" -rabbit plugins_expand_dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2-plugins-expand" -os_mon start_cpu_sup false -os_mon start_disksup false -os_mon start_memsup false -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2"
rabbitmq 35217 34841 0 Nov27 ? 00:00:08 inet_gethost 4
rabbitmq 35218 35217 0 Nov27 ? 00:00:16 inet_gethost 4
root 220659 0 0 09:52 ? 00:00:00 bash
root 221781 13 12 09:52 ? 00:00:01 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root 222150 221781 0 09:52 ? 00:00:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root 222151 222150 0 09:52 ? 00:00:00 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster monitor
root 222152 222151 0 09:52 ? 00:00:00 /bin/sh /usr/sbin/rabbitmqctl status
root 222153 222151 0 09:52 ? 00:00:00 sed -n -e s/^.*[S|s]tatus of node \(.*\)\s.*$/\1/p
root 222154 222151 0 09:52 ? 00:00:00 tr -d '
root 222165 222152 0 09:52 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl 'status'
rabbitmq 222166 222165 72 09:52 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -B -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -hidden -boot start_clean -sasl errlog_type error -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2" -s rabbit_control_main -nodename rabbit@controller-2 -extra status
rabbitmq 222279 222166 56 09:52 ? 00:00:00 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -sname epmd-starter-420005555 -proto_dist "inet_tcp" -noshell -eval halt().
root 222318 220659 0 09:52 ? 00:00:00 ps -ef
root 222319 220659 0 09:52 ? 00:00:00 bash
()[root@controller-2 /]#
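The monitor processes in the listing above show the agent piping `rabbitmqctl status` through `sed` and `tr` to recover the local node name. The sketch below reproduces that pipeline as a standalone function. The name `rmq_local_node` is hypothetical, `\s` is written as the POSIX class `[[:space:]]`, and because the listing cuts the `tr -d` argument off at the quote, deleting single quotes is an assumption.

```shell
#!/bin/sh
# Take the "Status of node ..." line from rabbitmqctl status, keep the
# text between "node " and the last whitespace, then strip single quotes.
rmq_local_node() {
    rabbitmqctl status 2>&1 \
        | sed -n -e 's/^.*[S|s]tatus of node \(.*\)[[:space:]].*$/\1/p' \
        | tr -d "'"
}
```

Assuming the status output begins with a line like `Status of node 'rabbit@controller-2' ...`, the function prints `rabbit@controller-2`. Other lines of the status report are suppressed by `sed -n`.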
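# RabbitMQ stopped while the container is still running: no rabbitmq-server or beam.smp processes remain, only pcmk-init, pacemaker_remoted and epmd:
()[root@controller-2 /]# ps -ef|more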
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov27 ? 00:00:47 pcmk-init
root 13 1 0 Nov27 ? 00:24:46 /usr/sbin/pacemaker_remoted
rabbitmq 150 1 0 Nov27 ? 00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root 220659 0 0 09:52 ? 00:00:00 bash
root 238233 220659 0 10:03 ? 00:00:00 ps -ef
root 238234 220659 0 10:03 ? 00:00:00 more
()[root@controller-2 /]# pstree
pacemaker_remot-+-epmd
`-pacemaker_remot
()[root@controller-2 /]# pstree -apln
pacemaker_remot,1
|-pacemaker_remot,13
`-epmd,150 -daemon
# pacemaker_remoted (PID 13) runs the rabbitmq-cluster start action in the same container:
()[root@controller-2 /]# ps -ef|more
UID PID PPID C STIME TTY TIME CMD
root 1 0 0 Nov27 ? 00:00:47 pcmk-init
root 13 1 0 Nov27 ? 00:24:46 /usr/sbin/pacemaker_remoted
rabbitmq 150 1 0 Nov27 ? 00:01:42 /usr/lib64/erlang/erts-7.3.1.4/bin/epmd -daemon
root 220659 0 0 09:52 ? 00:00:00 bash
root 238245 13 56 10:04 ? 00:00:01 /bin/sh /usr/lib/ocf/resource.d/heartbeat/rabbitmq-cluster start
root 238249 238245 0 10:04 ? 00:00:00 /bin/sh /usr/sbin/rabbitmqctl eval rabbit_mnesia:cluster_status_from_mnesia().
root 238250 238245 0 10:04 ? 00:00:00 grep -q ^{ok
root 238264 238249 0 10:04 ? 00:00:00 su rabbitmq -s /bin/sh -c /usr/lib/rabbitmq/bin/rabbitmqctl 'eval' 'rabbit_mnesia:cluster_status_from_mnesia().'
rabbitmq 238265 238264 99 10:04 ? 00:00:01 /usr/lib64/erlang/erts-7.3.1.4/bin/beam.smp -B -- -root /usr/lib64/erlang -progname erl -- -home /var/lib/rabbitmq -- -pa /usr/lib/rabbitmq/lib/rabbitmq_server-3.6.15/ebin -noshell -noinput -hidden -boot start_clean -sasl errlog_type error -mnesia dir "/var/lib/rabbitmq/mnesia/rabbit@controller-2" -s rabbit_control_main -nodename rabbit@controller-2 -extra eval rabbit_mnesia:cluster_status_from_mnesia().
root 238377 220659 0 10:04 ? 00:00:00 ps -ef
root 238378 220659 0 10:04 ? 00:00:00 more
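During start, the listing shows the agent running `rabbitmqctl eval rabbit_mnesia:cluster_status_from_mnesia().` and testing the result with `grep -q ^{ok`. The sketch below wraps that check in a function. The name `rmq_has_mnesia_cluster_status` is hypothetical, and what the real agent does with the result is not visible in this log.

```shell
#!/bin/sh
# Succeeds when the local Mnesia database reports a cluster status,
# i.e. when the eval output starts with "{ok".
rmq_has_mnesia_cluster_status() {
    rabbitmqctl eval 'rabbit_mnesia:cluster_status_from_mnesia().' 2>/dev/null \
        | grep -q '^{ok'
}
```

An `{error,...}` reply, or no output at all, makes the function fail.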
Damien, thanks very much for the draft. I've edited the doc text a bit to make it shorter. If I accidentally got something wrong, please let me know.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:3832