Bug 1518126 - REDIS replication with TLS everywhere does not work [NEEDINFO]
Status: CLOSED ERRATA
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 12.0 (Pike)
Hardware: All
OS: Linux
Priority: high
Severity: high
Target Milestone: beta
Target Release: 13.0 (Queens)
Assigned To: Chris Jones
QA Contact: Udi Shkalim
Keywords: Triaged
Duplicates: 1538301
Depends On:
Blocks:
Reported: 2017-11-28 04:25 EST by Michele Baldessari
Modified: 2018-06-27 09:40 EDT (History)
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost
Doc Type: Known Issue
Doc Text:
Redis is unable to correctly replicate data across nodes in an HA deployment with TLS enabled. Redis follower nodes will not contain any data from the leader node. It is recommended to disable TLS for Redis deployments.
Last Closed: 2018-06-27 09:39:31 EDT
Type: Bug
kbasil: needinfo? (aherr)


External Trackers
Tracker ID Priority Status Summary Last Updated
Launchpad 1735259 None None None 2017-11-29 16:06 EST
OpenStack gerrit 527694 None None None 2018-04-02 10:52 EDT
Red Hat Product Errata RHEA-2018:2086 None None None 2018-06-27 09:40 EDT

Description Michele Baldessari 2017-11-28 04:25:25 EST
Description of problem:
It seems that the “slave” redis servers do not connect to the redis “master” started by pacemaker when you deploy with TLS Everywhere.

On an unencrypted Redis cluster, we can see slaves connected to the master:
# /usr/bin/redis-cli -a UdEnQrH6Jfdy6A4W2Rnja7UuZ -s '/var/run/redis/redis.sock' info

Check the Replication section of the output for connected slaves. Here is a working example:
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:30984
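
The check above can be scripted. The following is a minimal sketch, assuming the Replication section has already been captured as text from the redis-cli info command; the helper name parse_replication_info is hypothetical and is not part of Redis or the pacemaker resource agent.

```python
def parse_replication_info(text):
    """Parse 'key:value' lines from the Replication section of redis INFO output."""
    info = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Replication".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            info[key] = value
    return info


# Sample text copied from the working (unencrypted) example above.
sample = """\
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
"""

info = parse_replication_info(sample)
# Replication is considered healthy here if the master reports at least one slave.
replicating = info.get("role") == "master" and int(info.get("connected_slaves", 0)) > 0
print(info["role"], info["connected_slaves"], replicating)
```

On a master affected by this bug, connected_slaves would be expected to read 0, so the same check would evaluate to False.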

We don’t see any slave connection when TLS everywhere is enabled.
On initial deployment, all replicas of the redis resource start correctly in pacemaker and report 1 Master and 2 Slaves with no error. But that is only because no replication has taken place at all.

However, when restarting a Slave, the start operation does not succeed because the redis resource agent tries to connect to the redis master and fails to do so:
Failed Actions:
* redis_start_0 on redis-bundle-2 'unknown error' (1): call=8, status=Timed Out, exitreason='none',
    last-rc-change='Mon Nov 27 15:02:01 2017', queued=1ms, exec=200001ms

With the following logs from the slave redis server:
96:S 27 Nov 14:24:15.116 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:16.116 * Connecting to MASTER overcloud-controller-1:6379
96:S 27 Nov 14:24:16.117 * MASTER <-> SLAVE sync started
96:S 27 Nov 14:24:16.117 * Non blocking connect for SYNC fired the event.
96:S 27 Nov 14:24:16.117 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:17.120 * Connecting to MASTER overcloud-controller-1:6379

This is because an stunnel process is listening on port 6379 of the remote host and expects SSL traffic, whereas redis sends unencrypted traffic to it.
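
To illustrate the mismatch, here is a minimal sketch of how the first bytes of a TLS connection differ from those of a plaintext Redis connection. It relies on two general facts: a TLS connection opens with a handshake record whose first byte is 0x16 followed by a 0x03 major-version byte, and Redis clients speak the text-based RESP protocol. The function name looks_like_tls_handshake and the sample byte strings are illustrative assumptions; they are not taken from the stunnel or Redis source code.

```python
TLS_HANDSHAKE_CONTENT_TYPE = 0x16  # TLS record type "handshake"
TLS_MAJOR_VERSION = 0x03           # major version byte in SSL 3.0 and TLS 1.x record headers


def looks_like_tls_handshake(first_bytes: bytes) -> bool:
    """Return True if the bytes start like a TLS handshake record."""
    return (
        len(first_bytes) >= 2
        and first_bytes[0] == TLS_HANDSHAKE_CONTENT_TYPE
        and first_bytes[1] == TLS_MAJOR_VERSION
    )


# Illustrative start of a TLS ClientHello record: type, version (2 bytes), length (2 bytes).
tls_client_hello_prefix = bytes([0x16, 0x03, 0x01, 0x00, 0xC8])
# A plaintext RESP-encoded PING, the kind of data an unencrypted Redis peer sends.
plaintext_redis_ping = b"*1\r\n$4\r\nPING\r\n"

print(looks_like_tls_handshake(tls_client_hello_prefix))
print(looks_like_tls_handshake(plaintext_redis_ping))
```

A TLS listener that receives the second kind of data cannot complete a handshake and drops the connection, which is consistent with the "Connection reset by peer" errors in the slave log.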
Comment 7 Chris Jones 2017-12-07 12:04:57 EST
Added a suggested Known Issue for the OSP12 release
Comment 22 Keith Basil 2018-04-09 15:17:37 EDT
*** Bug 1538301 has been marked as a duplicate of this bug. ***
Comment 23 Marian Krcmarik 2018-04-29 16:09:42 EDT
Verified
$ sudo pcs status | grep redis
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave controller-2
$ sudo pcs resource restart redis-bundle controller-0
redis-bundle successfully restarted

$ sudo /usr/bin/redis-cli -a NFzhyzTZMUVvNv9BayXxMNKr6 -s '/var/run/redis/redis.sock' info
...skipped
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11975965
repl_backlog_histlen:1048576
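
As a rough follow-up check on the output above, the gap between master_repl_offset and each slave's offset indicates how far each slave trails the master. This is a minimal sketch, assuming the INFO text has been captured as a string; the helper name slave_offset_gaps is hypothetical.

```python
def slave_offset_gaps(info_text):
    """Return {slave_name: master_repl_offset - slave offset} from INFO replication text."""
    master_offset = None
    slave_offsets = {}
    for line in info_text.splitlines():
        key, sep, value = line.strip().partition(":")
        if not sep:
            continue  # headers and other lines without "key:value"
        if key == "master_repl_offset":
            master_offset = int(value)
        elif key.startswith("slave") and key[5:].isdigit():
            # Slave lines look like "ip=...,port=...,state=...,offset=...,lag=...".
            fields = dict(item.split("=", 1) for item in value.split(","))
            slave_offsets[key] = int(fields["offset"])
    if master_offset is None:
        raise ValueError("master_repl_offset not found")
    return {name: master_offset - off for name, off in slave_offsets.items()}


# Sample text copied from the verification output above.
verified = """\
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
"""

print(slave_offset_gaps(verified))
```

Non-zero slave offsets that stay close to the master offset suggest that data is actually flowing to the slaves, unlike in the failing TLS deployment.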
Comment 25 errata-xmlrpc 2018-06-27 09:39:31 EDT
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086
