Bug 1518126 - REDIS replication with TLS everywhere does not work [NEEDINFO]
Summary: REDIS replication with TLS everywhere does not work
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Assignee: Chris Jones
QA Contact: Udi Shkalim
URL:
Whiteboard:
Duplicates: 1538301
Depends On:
Blocks:
 
Reported: 2017-11-28 09:25 UTC by Michele Baldessari
Modified: 2018-06-27 13:40 UTC (History)
CC List: 15 users

Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost
Doc Type: Known Issue
Doc Text:
Redis is unable to correctly replicate data across nodes in a HA deployment with TLS enabled. Redis follower nodes will not contain any data from the leader node. It is recommended to disable TLS for Redis deployments.
Clone Of:
Environment:
Last Closed: 2018-06-27 13:39:31 UTC
Target Upstream Version:
kbasil: needinfo? (aherr)


Links
System ID Priority Status Summary Last Updated
Launchpad 1735259 None None None 2017-11-29 21:06:43 UTC
OpenStack gerrit 527694 None MERGED Fix Redis TLS setup and its HA deployment 2020-02-02 17:01:09 UTC
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 13:40:23 UTC

Description Michele Baldessari 2017-11-28 09:25:25 UTC
Description of problem:
It seems that the “slave” redis servers do not connect to the redis “master” started by pacemaker when you deploy with TLS Everywhere.

On an unencrypted Redis cluster, we can see slaves connected to the master:
# /usr/bin/redis-cli -a UdEnQrH6Jfdy6A4W2Rnja7UuZ -s '/var/run/redis/redis.sock' info

Check the Replication section of the output for connected slaves. Here is a working example:
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:30984
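
The check above can be scripted. The following is a minimal sketch, assuming the text returned by `redis-cli ... info` has already been captured into a string. The helper names (`parse_replication_info`, `connected_slave_count`, `slave_entries`) are hypothetical and not part of redis or the resource agent.

```python
import re

# Sample text copied from the working example in this report.
SAMPLE_INFO = """\
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
"""


def parse_replication_info(info_text):
    """Return the key/value pairs of the '# Replication' section as a dict."""
    fields = {}
    in_section = False
    for raw in info_text.splitlines():
        line = raw.strip()
        if line.startswith("#"):
            # Section headers look like '# Replication'; track whether we are inside it.
            in_section = line.lower() == "# replication"
            continue
        if not in_section or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def connected_slave_count(info_text):
    """Number of slaves the master reports (0 if the field is absent)."""
    return int(parse_replication_info(info_text).get("connected_slaves", "0"))


def slave_entries(info_text):
    """Parse each 'slaveN:ip=...,port=...' line into a dict of its attributes."""
    fields = parse_replication_info(info_text)
    slaves = []
    for key in sorted(fields):
        if re.fullmatch(r"slave\d+", key):
            slaves.append(dict(item.split("=", 1) for item in fields[key].split(",")))
    return slaves


if __name__ == "__main__":
    print(connected_slave_count(SAMPLE_INFO))
    for slave in slave_entries(SAMPLE_INFO):
        print(slave["ip"], slave["state"])
```

On the master, a count of 0 from `connected_slave_count` would correspond to the broken behaviour described in this report.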

We don’t see any slave connection when TLS everywhere is enabled.
On initial deployment, all replicas of the redis resource start correctly in pacemaker and show 1 Master and 2 Slaves (no errors). But that is only because no replication has taken place at all.

However, when a Slave is restarted, the start operation does not succeed, because the redis resource agent tries to connect to the redis master and fails:
Failed Actions:
* redis_start_0 on redis-bundle-2 'unknown error' (1): call=8, status=Timed Out, exitreason='none',
    last-rc-change='Mon Nov 27 15:02:01 2017', queued=1ms, exec=200001ms

With the following logs from the slave redis server:
96:S 27 Nov 14:24:15.116 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:16.116 * Connecting to MASTER overcloud-controller-1:6379
96:S 27 Nov 14:24:16.117 * MASTER <-> SLAVE sync started
96:S 27 Nov 14:24:16.117 * Non blocking connect for SYNC fired the event.
96:S 27 Nov 14:24:16.117 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:17.120 * Connecting to MASTER overcloud-controller-1:6379

This is because an stunnel process is listening on port 6379 of the remote host and expects SSL traffic, whereas redis sends unencrypted traffic to it.
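
To illustrate the mismatch, here is a minimal sketch of how the first bytes on the wire differ. A TLS connection opens with a handshake record whose first byte is 0x16 and whose second byte (the major version) is 0x03. A plaintext redis command does not start that way, so a TLS endpoint cannot interpret it. The function name and the sample byte strings are assumptions made for illustration; the exact bytes a given redis version sends when it starts replication are not taken from this report.

```python
def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """True if the bytes begin like a TLS handshake record.

    A TLS record header starts with the content type (0x16 for handshake)
    followed by the protocol major version (0x03).
    """
    return len(first_bytes) >= 2 and first_bytes[0] == 0x16 and first_bytes[1] == 0x03


# Illustrative payloads (assumptions, not captured traffic):
INLINE_PING = b"PING\r\n"               # redis inline command form
RESP_PING = b"*1\r\n$4\r\nPING\r\n"     # redis RESP array form
TLS_RECORD_START = bytes([0x16, 0x03, 0x01, 0x00, 0x2F])  # start of a handshake record

if __name__ == "__main__":
    for name, payload in [
        ("inline PING", INLINE_PING),
        ("RESP PING", RESP_PING),
        ("TLS record", TLS_RECORD_START),
    ]:
        print(name, looks_like_tls_handshake(payload))
```

Neither plaintext form passes the check, which is consistent with the "Connection reset by peer" errors in the slave log: the listener on port 6379 is waiting for a TLS handshake that never arrives.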

Comment 7 Chris Jones 2017-12-07 17:04:57 UTC
Added a suggested Known Issue for the OSP12 release

Comment 22 Keith Basil 2018-04-09 19:17:37 UTC
*** Bug 1538301 has been marked as a duplicate of this bug. ***

Comment 23 Marian Krcmarik 2018-04-29 20:09:42 UTC
Verified
$ sudo pcs status | grep redis
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave controller-2
$ sudo pcs resource restart redis-bundle controller-0
redis-bundle successfully restarted

$ sudo /usr/bin/redis-cli -a NFzhyzTZMUVvNv9BayXxMNKr6 -s '/var/run/redis/redis.sock' info
...skipped
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11975965
repl_backlog_histlen:1048576

Comment 25 errata-xmlrpc 2018-06-27 13:39:31 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

