Bug 1414967

Summary: Lower haproxy health check timeout for redis
Product: Red Hat OpenStack Reporter: Marian Krcmarik <mkrcmari>
Component: puppet-tripleo Assignee: Michele Baldessari <michele>
Status: CLOSED ERRATA QA Contact: Marian Krcmarik <mkrcmari>
Severity: high Docs Contact:
Priority: high    
Version: 10.0 (Newton) CC: agk, ahirshbe, apevec, athomas, bperkins, cfeist, chjones, cluster-maint, dbecker, ebarrera, fdinitto, hfukumot, jcoufal, jjoyce, jschluet, k-akatsuka, knoha, lhh, lnatapov, mburns, michele, mkrcmari, mnovacek, morazi, oalbrigt, oblaut, pkilambi, pkomarov, rhel-osp-director-maint, rohara, royoung, slinaber, tvignaud, ushkalim, ykawada, yprokule
Target Milestone: beta Keywords: Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: puppet-tripleo-8.2.0-0.20180122224520.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1305549 Environment:
Last Closed: 2018-06-27 13:29:16 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1305549    
Bug Blocks: 1369066, 1400103    
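
#context: the fix tracked by this bug tightens the haproxy health-check timing on the redis listener so that haproxy stops directing clients to a demoted redis master promptly after a failover. A hedged way to inspect the deployed check settings on the controllers (assuming the usual OSP 13 containerized haproxy config path) is:

(undercloud) $ ansible controller -b -mshell -a'grep -A15 "^listen redis" /var/lib/config-data/puppet-generated/haproxy/etc/haproxy/haproxy.cfg'

#the lines of interest are "timeout check" and the per-server "check inter/fall/rise" parameters; exact values depend on the installed puppet-tripleo version.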

Comment 24 pkomarov 2018-02-18 10:21:37 UTC
Verified.

#check puppet-tripleo version: 

(undercloud) $ ansible controller -b -mshell -a'rpm -qa|grep puppet-tripleo-8.2.0-0.20180122224520'

controller-1 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-2 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch

controller-0 | SUCCESS | rc=0 >>
puppet-tripleo-8.2.0-0.20180122224520.el7ost.noarch


#check that ceilometer is configured in HA mode - using a pacemaker VIP: 

(undercloud) $ ansible controller -b -mshell -a'grep redis /var/log/containers/ceilometer/agent-notification.log'

controller-2 | SUCCESS | rc=0 >>
2018-02-14 21:25:35.801 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:25:35.842 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-1 | SUCCESS | rc=0 >>
2018-02-14 21:26:00.909 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:00.948 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

controller-0 | SUCCESS | rc=0 >>
2018-02-14 21:26:05.596 1 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902
2018-02-14 21:26:05.638 25 DEBUG cotyledon.oslo_config_glue [-] coordination.backend_url       = redis://:Z2hch4D8b4EY7R46xXHQY47Uf.0.18:6379/ log_opt_values /usr/lib/python2.7/site-packages/oslo_config/cfg.py:2902

#Before redis master switch: 

[root@controller-0 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-0 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum
 
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Master controller-0
   redis-bundle-1	(ocf::heartbeat:redis):	Slave controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-2
 
ip-172.17.0.18	(ocf::heartbeat:IPaddr2):	Started controller-2


[root@controller-0 ~]# pcs cluster stop
Stopping Cluster (pacemaker)...
Stopping Cluster (corosync)...
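
#note (hedged): stopping the whole cluster stack on the master node is one way to force the redis master switch; an alternative that only removes the node from pacemaker scheduling, assuming the node names above, would be:

[root@controller-0 ~]# pcs node standby controller-0

#followed by "pcs node unstandby controller-0" afterwards to bring the node back.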

#After redis master switch: 

[root@controller-1 ~]# pcs status
Cluster name: tripleo_cluster
Stack: corosync
Current DC: controller-1 (version 1.1.16-12.el7_4.7-94ff4df) - partition with quorum

Online: [ controller-1 controller-2 ]
OFFLINE: [ controller-0 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0	(ocf::heartbeat:redis):	Stopped
   redis-bundle-1	(ocf::heartbeat:redis):	Master controller-1
   redis-bundle-2	(ocf::heartbeat:redis):	Slave controller-2
 
ip-172.17.0.18	(ocf::heartbeat:IPaddr2):	Started controller-2


#after the redis master switch we see the expected reconnection notifications from the ceilometer agent, from /var/log/containers/ceilometer/agent-notification.log: 

2018-02-15 16:34:18.079 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 104] Connection reset by peer. Trying again in 1 seconds. Client port: None: error: [Errno 104] Connection reset by peer
2018-02-15 16:34:19.103 25 ERROR oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] AMQP server on controller-0.internalapi.localdomain:5672 is unreachable: [Errno 111] Connection refused. Trying again in 1 seconds. Client port: None: error: [Errno 111] Connection refused
2018-02-15 16:34:20.637 25 INFO oslo.messaging._drivers.impl_rabbit [-] [818f4ec8-ec39-4a03-bd6e-435c5d239eef] Reconnected to AMQP server on controller-2.internalapi.localdomain:5672 via [amqp] client with port 58898.
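
#the excerpt above can be collected with (assuming the same containerized log path used earlier):

(undercloud) $ ansible controller -b -mshell -a'grep "Reconnected to AMQP" /var/log/containers/ceilometer/agent-notification.log'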


#check for established TCP connections between the redis master (controller-1) and the internal-api VIP holder (controller-2) on the redis port:
$ ansible 'controller:!controller-0' -b -mshell -a'netstat -tapl|grep redis'
 [WARNING]: Found both group and host with same name: undercloud

controller-2 | SUCCESS | rc=0 >>
tcp        0      0 controller-2.local:6379 0.0.0.0:*               LISTEN      620626/redis-server 
tcp        0      0 controller-2.loca:41310 controller-1.local:6379 ESTABLISHED 620626/redis-server 
tcp        0      0 controller-2.local:6379 controller-1.loca:35200 ESTABLISHED 620626/redis-server 

controller-1 | SUCCESS | rc=0 >>
tcp        0      0 controller-1.local:6379 0.0.0.0:*               LISTEN      254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56030 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53204 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40292 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:54798 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40224 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:47438 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:45372 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:54144 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40306 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55964 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53208 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56044 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40296 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53218 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40326 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:59172 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40200 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:57170 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55842 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56156 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56214 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:49346 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40290 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55694 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53216 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40192 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40208 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53128 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55434 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53220 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40202 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40322 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40324 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53192 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53224 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40152 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53222 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53172 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55468 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:45380 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56674 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53190 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40252 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53236 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53156 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53168 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53200 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56092 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40194 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40210 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53198 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53132 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:51890 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53212 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:57744 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53238 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40222 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:35422 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56336 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53228 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53240 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41474 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55804 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:56332 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53196 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40320 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53226 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:40254 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53210 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55786 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55780 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55902 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53152 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53164 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55590 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:42068 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:34690 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53214 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:53230 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41656 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:41310 ESTABLISHED 254914/redis-server 
tcp        0      0 controller-1.local:6379 controller-2.loca:55652 ESTABLISHED 254914/redis-server

Comment 27 Leonid Natapov 2018-06-11 08:22:30 UTC
Re-opening this bug because the issue is back, so it looks like a regression.

I now see the following errors in the gnocchi API log file:

I have talked to Mehdi Abaakouk (sileht) and he pointed me to this bug, saying that it is a pacemaker issue: gnocchi is connecting to the wrong redis node, which means pacemaker fails to close the redis connections before switching the master node.

I am working with puppet-tripleo-8.3.2-6.el7ost.noarch.
------------------------------------------------------------
File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:35,748 [26] ERROR    gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures
    cause=e)
  File "/usr/lib/python2.7/site-packages/tooz/utils.py", line 225, in raise_with_cause
    excutils.raise_with_cause(exc_cls, message, *args, **kwargs)
  File "/usr/lib/python2.7/site-packages/oslo_utils/excutils.py", line 143, in raise_with_cause
    six.raise_from(exc_cls(message, *args, **kwargs), kwargs.get('cause'))
  File "/usr/lib/python2.7/site-packages/six.py", line 718, in raise_from
    raise value
ToozError: You can't write against a read only slave.
2018-06-10 10:40:43,207 [23] ERROR    gnocchi.utils: Unable to initialize coordination driver
Traceback (most recent call last):
  File "/usr/lib/python2.7/site-packages/tenacity/__init__.py", line 298, in call
    result = fn(*args, **kwargs)
  File "/usr/lib/python2.7/site-packages/gnocchi/cli/metricd.py", line 45, in get_coordinator_and_start
    coord.start(start_heart=True)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 687, in start
    super(CoordinationDriverWithExecutor, self).start(start_heart)
  File "/usr/lib/python2.7/site-packages/tooz/coordination.py", line 423, in start
    self._start()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 479, in _start
    self.heartbeat()
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 513, in heartbeat
    value=self.STILL_ALIVE)
  File "/usr/lib64/python2.7/contextlib.py", line 35, in __exit__
    self.gen.throw(type, value, traceback)
  File "/usr/lib/python2.7/site-packages/tooz/drivers/redis.py", line 55, in _translate_failures

Comment 30 errata-xmlrpc 2018-06-27 13:29:16 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086