Verified.

#check puppet-tripleo version:
(undercloud) $ ansible controller -b -mshell -a'rpm -qa|grep puppet-tripleo-8.2.0-0.20180122224520'
controller-1 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-2 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-0 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

#check that ceilometer is configured in HA mode - using a pacemaker VIP:
(undercloud) $ ansible controller -b -mshell -a'grep redis /var/log/containers/ceilometer/agent-notification.log'
controller-2 | SUCCESS | rc=0 >>
2018-02-14 21:25:35.801 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:25:35.842 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-1 | SUCCESS | rc=0 >>
2018-02-14 21:26:00.909 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:00.948 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-0 | SUCCESS | rc=0 >>
2018-02-14 21:26:05.596 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:05.638 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

#Before redis master switch:
[root@controller-0 ~]# pcs status
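The grep above confirms each controller's ceilometer agent points its coordination.backend_url at redis behind the pacemaker-managed internal-api VIP (172.17.0.18, the ip-172.17.0.18 resource in the pcs status output). As a minimal sketch of how such a URL breaks down, using only the Python stdlib; the URL below is a placeholder in the same shape as the logged value ("secret" is a stand-in password, not the real one):

```python
from urllib.parse import urlparse

def parse_backend_url(url):
    """Split a tooz-style coordination backend_url into (scheme, host, port)."""
    parsed = urlparse(url)
    return parsed.scheme, parsed.hostname, parsed.port

# Placeholder: "secret" is a stand-in password; 172.17.0.18 is the
# internal-api VIP held by pacemaker in the pcs status output below.
scheme, host, port = parse_backend_url("redis://:secret@172.17.0.18:6379/")
print(scheme, host, port)  # redis 172.17.0.18 6379
```

If the host printed here were a controller's own address instead of the VIP, the deployment would not be using the HA redis endpoint.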
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum

Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
  redis-bundle-0 (ocf::heartbeat:redis): Master controller-0
  redis-bundle-1 (ocf::heartbeat:redis): Slave controller-1
  redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
ip-172.17.0.18 (ocf::heartbeat:IPaddr2): Started controller-2

[root@controller-0 ~]# pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...

#After redis master switch:
[root@controller-1 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
Online: [ controller-1 controller-2 ]
OFFLINE: [ controller-0 ]

Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
  redis-bundle-0 (ocf::heartbeat:redis): Stopped
  redis-bundle-1 (ocf::heartbeat:redis): Master controller-1
  redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2
ip-172.17.0.18 (ocf::heartbeat:IPaddr2): Started controller-2

#after redis Master switch we see a correct reconnection notification from the ceilometer agent, in /var/log/containers/ceilometer/agent-notification.log:
2018-02-15 16:34:18.079 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: None: error: [Errno 104] Connection reset by peer
2018-02-15 16:34:19.103 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] Connection refused. Trying again in 1 seconds.
Client port: None: error: [Errno 111] Connection refused
2018-02-15 16:34:20.637 25 INFO oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] Reconnected to AMQP server on controller-2.internalapi.localdomain:5672 via [amqp] client with port 58898.

#check for established tcp connections between the redis Master (controller-1) and the internal-api VIP holder (controller-2) on the redis port:
$ ansible 'controller:!controller-0' -b -mshell -a'netstat -tapl|grep redis'
[WARNING]: Found both group and host with same name: undercloud
controller-2 | SUCCESS | rc=0 >>
tcp 0 0 controller-2.local:6379 0.0.0.0:* LISTEN 620626/redis-server
tcp 0 0 controller-2.loca:41310 controller-1.local:6379 ESTABLISHED 620626/redis-server
tcp 0 0 controller-2.local:6379 controller-1.loca:35200 ESTABLISHED 620626/redis-server

controller-1 | SUCCESS | rc=0 >>
tcp 0 0 controller-1.local:6379 0.0.0.0:* LISTEN 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56030 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53204 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40292 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:54798 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40224 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:47438 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:45372 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:54144 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40306 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55964 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53208 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56044 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40296 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53218 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40326 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:59172 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40200 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:57170 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55842 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56156 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56214 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:49346 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40290 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55694 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53216 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40192 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40208 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53128 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55434 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53220 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40202 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40322 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40324 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53192 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53224 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40152 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53222 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53172 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55468 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:45380 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56674 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53190 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40252 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53236 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53156 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53168 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53200 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56092 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40194 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40210 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53198 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53132 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:51890 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53212 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:57744 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53238 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40222 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:35422 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56336 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53228 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53240 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:41474 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55804 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:56332 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53196 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40320 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53226 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:40254 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53210 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55786 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55780 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55902 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53152 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53164 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55590 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:42068 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:34690 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53214 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:53230 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:41656 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:41310 ESTABLISHED 254914/redis-server
tcp 0 0 controller-1.local:6379 controller-2.loca:55652 ESTABLISHED 254914/redis-server
Re-opening this bug because the issue is back, so this looks like a regression. I now see the following errors in the gnocchi API log file. I talked to Mehdi Abaakouk (sileht) and he pointed me to this bug, saying it is a pacemaker issue: gnocchi is connecting to the wrong redis node, which means pacemaker failed to close the redis connections before switching the master node. I am working with puppet-tripleo-8.3.2-6.el7ost.noarch.
------------------------------------------------------------
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:35,748 [26] ERROR gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:43,207 [23] ERROR gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
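The ToozError above is redis refusing a write on a replica: the tooz redis driver's heartbeat writes a liveness key, which only the master accepts, so a client that still reaches a not-yet-promoted slave through the VIP fails exactly as logged. A toy simulation of that failure mode and of retrying until the node behind the VIP is promoted (FakeRedisNode and heartbeat_with_retry are hypothetical stand-ins, not gnocchi or tooz code):

```python
import time

class ReadOnlyError(Exception):
    """Stand-in for the error redis returns for writes against a replica."""

class FakeRedisNode:
    """Hypothetical stand-in for the redis node reached through the VIP."""
    def __init__(self):
        self.is_master = False  # starts as a read-only replica

    def set(self, key, value):
        if not self.is_master:
            raise ReadOnlyError("You can't write against a read only slave.")
        return True

def heartbeat_with_retry(node, attempts=5, delay=0.0):
    """Retry the heartbeat write until the node is writable or attempts run out."""
    for attempt in range(1, attempts + 1):
        try:
            return node.set("coordinator-heartbeat", "STILL-ALIVE")
        except ReadOnlyError:
            if attempt == attempts:
                raise
            time.sleep(delay)
            # Simulate pacemaker promoting this node between attempts.
            node.is_master = attempt >= 2

node = FakeRedisNode()
print(heartbeat_with_retry(node))  # True once the simulated promotion happens
```

The regression reported here is that the real clients kept their old TCP connections to the demoted node instead of reconnecting through the VIP, so no amount of retrying on the same connection ever reached the new master.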
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086