Bug 1414967 - Lower haproxy health check timeout for redis
Summary: Lower haproxy health check timeout for redis
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: puppet-tripleo
Version: 10.0 (Newton)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Michele Baldessari
QA Contact: Marian Krcmarik
URL:
Whiteboard:
Depends On: 1305549
Blocks: 1369066 1400103
 
Reported: 2017-01-19 20:27 UTC by Marian Krcmarik
Modified: 2018-06-27 13:31 UTC (History)
CC List: 36 users

Fixed In Version: puppet-tripleo-8.2.0-0.20180122224520.el7ost
Doc Type: If docs needed, set a value
Doc Text:
Clone Of: 1305549
Environment:
Last Closed: 2018-06-27 13:29:16 UTC
Target Upstream Version:
Embargoed:


Attachments: none


Links
System ID Private Priority Status Summary Last Updated
Launchpad 1742086 0 None None None 2018-01-16 07:56:20 UTC
OpenStack gerrit 529107 0 None MERGED Use on-marked-down shutdown-sessions for redis haproxy conf 2020-06-09 04:10:50 UTC
Red Hat Product Errata RHEA-2018:2086 0 None None None 2018-06-27 13:31:13 UTC
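
For reference, the gerrit change listed above ("Use on-marked-down shutdown-sessions for redis haproxy conf") makes puppet-tripleo render the redis backend in haproxy.cfg with the on-marked-down shutdown-sessions server option, so haproxy closes client sessions to a demoted master as soon as the health check marks it down. A minimal sketch for inspecting the rendered configuration on a deployed controller (the section name "listen redis" and the containerized config path are assumptions, not taken from this bug):

(undercloud) $ ansible controller -b -mshell -a'grep -A 20 "^listen redis" /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg'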

Comment 24 pkomarov 2018-02-18 10:21:37 UTC
Verified.

#check puppet-tripleo version: 

(undercloud) $ ansible controller -b -mshell -a'rpm -qa|grep puppet-tripleo-8.2.0-0.20180122224520'

controller-1 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-2 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-0 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch


#check that ceilometer is configured in HA mode - using a pacemaker VIP: 

(undercloud) $ ansible controller -b -mshell -a'grep redis /var/log/containers/ceilometer/agent-notification.log'

controller-2 | SUCCESS | rc=0 >>
2018-02-14 21:25:35.801 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:25:35.842 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-1 | SUCCESS | rc=0 >>
2018-02-14 21:26:00.909 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:00.948 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-0 | SUCCESS | rc=0 >>
2018-02-14 21:26:05.596 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:05.638 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

#Before redis master switch: 

[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
 
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Master controller-0
   redis-bundle-1	(ocf::heartbeat:redis):	Slave controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-2
 
ip-172.17.0.18	(ocf::heartbeat:IPaddr2):	Started controller-2


[root@controller-0 ~]# pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
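
As a sketch, the promotion of a new redis master can be followed live from one of the surviving controllers ("redis-bundle" being the resource name shown in the pcs status output above):

[root@controller-1 ~]# watch -n 2 'pcs status | grep -A 3 redis-bundle'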

#After redis master switch:

[root@controller-1 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum

Online: [ controller-1 controller-2 ]
OFFLINE: [ controller-0 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Stopped
   redis-bundle-1	(ocf::heartbeat:redis):	Master controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-2
 
ip-172.17.0.18	(ocf::heartbeat:IPaddr2):	Started controller-2


#After the redis master switch we see the expected reconnection notifications from the ceilometer agent, from /var/log/containers/ceilometer/agent-notification.log:

2018-02-15 16:34:18.079 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: None: error: [Errno 104] Connection reset by peer
2018-02-15 16:34:19.103 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] Connection refused. Trying again in 1 seconds. Client port: None: error: [Errno 111] Connection refused
2018-02-15 16:34:20.637 25 INFO oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] Reconnected to AMQP server on controller-2.internalapi.localdomain:5672 via [amqp] client with port 58898.
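
As a sketch, the same log can also be checked for coordination-layer (tooz/redis) errors after the failover; "read only slave" is the error string redis returns when a client is still pointed at a demoted master, and '|| true' keeps ansible from treating an empty match as a failure:

(undercloud) $ ansible controller -b -mshell -a'grep -iE "tooz|read only" /var/log/containers/ceilometer/agent-notification.log || true'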


#Check for established TCP connections between the redis master (controller-1) and the internal_api VIP holder (controller-2) on the redis port:
$ ansible 'controller:!controller-0' -b -mshell -a'netstat -tapl|grep redis'
 [WARNING]: Found both group and host with same name: undercloud

controller-2 | SUCCESS | rc=0 >>
tcp        0      0 controller-2.local:6379 0.0.0.0:*               LISTEN      620626/redis-server 
tcp        0      0 controller-2.loca:41310 controller-1.local:6379 ESTABLISHED 620626/redis-server 
tcp        0      0 controller-2.local:6379 controller-1.loca:35200 ESTABLISHED 620626/redis-server 

controller-1 | SUCCESS | rc=0 >>
tcp        0      0 controller-1.local:6379 0.0.0.0:*               LISTEN      254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56030 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53204 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40292 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:54798 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40224 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:47438 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:45372 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:54144 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40306 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55964 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53208 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56044 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40296 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53218 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40326 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:59172 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40200 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:57170 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55842 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56156 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56214 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:49346 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40290 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55694 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53216 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40192 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40208 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53128 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55434 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53220 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40202 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40322 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40324 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53192 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53224 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40152 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53222 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53172 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55468 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:45380 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56674 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53190 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40252 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53236 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53156 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53168 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53200 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56092 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40194 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40210 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53198 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53132 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:51890 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53212 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:57744 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53238 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40222 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:35422 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56336 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53228 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53240 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41474 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55804 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56332 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53196 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40320 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53226 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40254 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53210 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55786 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55780 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55902 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53152 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53164 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55590 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:42068 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:34690 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53214 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53230 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41656 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41310 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55652 ESTABLISHED 254914/redis-server
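
As a complementary sketch, the VIP holder should show no remaining established sessions towards the demoted master's redis port ('|| true' avoids ansible marking an empty match as failed):

(undercloud) $ ansible controller-2 -b -mshell -a'netstat -tap | grep ":6379" | grep controller-0 || true'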

Comment 27 Leonid Natapov 2018-06-11 08:22:30 UTC
Re-opening this bug because the issue is back, so it looks like a regression.

I now see the following errors in the gnocchi API log file:

I talked to Mehdi Abaakouk (sileht) and he pointed me to this bug, saying that it is a pacemaker issue: gnocchi is connecting to the wrong redis node, which means pacemaker fails to close the redis connections before switching the master node.

I am working with puppet-tripleo-8.3.2-6.el7ost.noarch.
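
As a sketch, which node the VIP actually serves as redis master can be checked directly with redis-cli; the VIP is the one used in the coordination URL earlier in this bug and the password is a placeholder:

$ redis-cli -h 172.17.0.18 -a '<redis_password>' info replication | grep -E 'role|master_host'

If this reports role:slave, writes through the VIP fail with the same "You can't write against a read only slave" error shown in the traceback below.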
------------------------------------------------------------
File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:35,748 [26] ERROR    gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:43,207 [23] ERROR    gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures

Comment 30 errata-xmlrpc 2018-06-27 13:29:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

