1518126 – REDIS replication with TLS everywhere does not work

Summary: REDIS replication with TLS everywhere does not work

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	Red Hat OpenStack
Classification:	Red Hat
Component:	openstack-tripleo-heat-templates
Sub Component:
Version:	12.0 (Pike)
Hardware:	All
OS:	Linux
Priority:	high
Severity:	high
Target Milestone:	beta
Target Release:	13.0 (Queens)
Assignee:	Chris Jones
QA Contact:	Udi Shkalim
Docs Contact:
URL:
Whiteboard:
Duplicates (1):	1538301
Depends On:
Blocks:

Reported:	2017-11-28 09:25 UTC by Michele Baldessari
Modified:	2023-09-14 04:16 UTC
CC List:	15 users
Fixed In Version:	openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost
Doc Type:	Known Issue
Doc Text:	Redis is unable to correctly replicate data across nodes in an HA deployment with TLS enabled. Redis follower nodes do not contain any data from the leader node. It is recommended to disable TLS for Redis deployments.
Clone Of:
Environment:
Last Closed:	2018-06-27 13:39:31 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Priority	Status	Summary	Last Updated
Launchpad	1735259	None	None	None	2017-11-29 21:06:43 UTC
OpenStack gerrit	527694	None	MERGED	Fix Redis TLS setup and its HA deployment	2020-02-02 17:01:09 UTC
Red Hat Issue Tracker	OSP-28657	None	None	None	2023-09-14 04:16:14 UTC
Red Hat Product Errata	RHEA-2018:2086	None	None	None	2018-06-27 13:40:23 UTC

Description Michele Baldessari 2017-11-28 09:25:25 UTC

Description of problem:
When deploying with TLS Everywhere, the Redis “slave” servers do not appear to be connected to the Redis “master” that Pacemaker starts.

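On an unencrypted Redis cluster, the slaves are connected to the master. To verify whether any slaves are connected, run the following on the master and check the Replication section of the output:
# /usr/bin/redis-cli -a UdEnQrH6Jfdy6A4W2Rnja7UuZ -s '/var/run/redis/redis.sock' info

Here is the relevant output from a working deployment: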
# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:30984
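The check above can be scripted. This is a minimal sketch, assuming a POSIX shell with awk; the function name `replication_summary` is hypothetical (not part of Redis or TripleO). It reads INFO output on stdin and prints the role, the reported `connected_slaves` count, and how many `slaveN` entries are in state `online`:

```shell
# Hypothetical helper (not part of Redis or TripleO): summarize the
# "# Replication" section of `redis-cli ... info` output read from stdin.
# Prints "<role> <connected_slaves> <number of slaveN entries with state=online>".
replication_summary() {
    awk -F: '
        { sub(/\r$/, "") }                     # INFO lines may end in CR; strip it
        /^role:/             { role = $2 }
        /^connected_slaves:/ { connected = $2 }
        /^slave[0-9]+:/ && /state=online/ { online++ }
        END { printf "%s %d %d\n", role, connected, online }
    '
}
```

For example, `redis-cli -a <password> -s /var/run/redis/redis.sock info replication | replication_summary` prints `master 2 2` for output like the working example above, whereas a master affected by this bug would be expected to report `master 0 0`.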

With TLS Everywhere enabled, no slave connections are listed.
On initial deployment, all replicas of the redis resource start correctly in Pacemaker and report 1 Master and 2 Slaves with no errors, but only because no replication has taken place at all.

However, when a Slave is restarted, its start operation does not succeed: the redis resource agent tries to connect to the Redis master, fails, and the operation times out:
Failed Actions:
* redis_start_0 on redis-bundle-2 'unknown error' (1): call=8, status=Timed Out, exitreason='none',
    last-rc-change='Mon Nov 27 15:02:01 2017', queued=1ms, exec=200001ms

The slave Redis server logs the following:
96:S 27 Nov 14:24:15.116 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:16.116 * Connecting to MASTER overcloud-controller-1:6379
96:S 27 Nov 14:24:16.117 * MASTER <-> SLAVE sync started
96:S 27 Nov 14:24:16.117 * Non blocking connect for SYNC fired the event.
96:S 27 Nov 14:24:16.117 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:17.120 * Connecting to MASTER overcloud-controller-1:6379

This is because port 6379 on the remote host is served by an stunnel process that expects SSL/TLS traffic, whereas Redis sends it unencrypted traffic, so each SYNC connection is reset.
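To confirm this symptom from a slave's log, a minimal sketch like the following tallies reconnection attempts and resets. It assumes a POSIX shell with awk; the function name `sync_failure_summary` is hypothetical, and the match patterns are copied from the log lines above. It prints the number of connection attempts, the number of resets, and the last master address tried:

```shell
# Hypothetical helper: read a Redis slave log on stdin and print
# "<MASTER connection attempts> <SYNC resets> <last master address tried>".
sync_failure_summary() {
    awk '
        { sub(/\r$/, "") }
        /\* Connecting to MASTER / { attempts++; master = $NF }
        /Error condition on socket for SYNC: Connection reset by peer/ { resets++ }
        END { printf "%d %d %s\n", attempts, resets, master }
    '
}
```

Fed the six log lines above, it prints `2 2 overcloud-controller-1:6379`. When every attempt is followed by a reset, the slave never completes a SYNC. This fits a plaintext client talking to a TLS-only listener.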

Comment 7 Chris Jones 2017-12-07 17:04:57 UTC

Added a suggested Known Issue for the OSP12 release

Comment 22 Keith Basil 2018-04-09 19:17:37 UTC

*** Bug 1538301 has been marked as a duplicate of this bug. ***

Comment 23 Marian Krcmarik 2018-04-29 20:09:42 UTC

Verified
$ sudo pcs status | grep redis
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
 Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0       (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1       (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2       (ocf::heartbeat:redis): Slave controller-2
$ sudo pcs resource restart redis-bundle controller-0
redis-bundle successfully restarted

$ sudo /usr/bin/redis-cli -a NFzhyzTZMUVvNv9BayXxMNKr6 -s '/var/run/redis/redis.sock' info
...skipped
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11975965
repl_backlog_histlen:1048576
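Unlike the unencrypted example, the slaves now appear as 127.0.0.1 on ports 6660 and 6662 rather than on the controllers' addresses on port 6379. This suggests that replication now goes through local endpoints (presumably TLS tunnels, although this report does not describe the fix in detail).

The following sketch shows how far each replica trails the master. It assumes a POSIX shell with awk; the function name `replica_offset_lag` is hypothetical. It subtracts each slave's `offset` from `master_repl_offset`:

```shell
# Hypothetical helper: read INFO Replication output on stdin and print one
# "<ip>:<port> <bytes behind>" line per slaveN entry, where bytes behind is
# master_repl_offset minus the offset reported for that slave.
replica_offset_lag() {
    awk '
        { sub(/\r$/, "") }
        /^master_repl_offset:/ { split($0, pair, ":"); master = pair[2] }
        /^slave[0-9]+:/ {
            sub(/^slave[0-9]+:/, "")            # keep only the key=value list
            n++
            cnt = split($0, fields, ",")
            for (i = 1; i <= cnt; i++) {
                split(fields[i], pair, "=")
                val[pair[1]] = pair[2]
            }
            addr[n] = val["ip"] ":" val["port"]
            off[n] = val["offset"]
        }
        # master_repl_offset follows the slave lines, so compute lags at the end
        END { for (i = 1; i <= n; i++) printf "%s %d\n", addr[i], master - off[i] }
    '
}
```

For the output above, it prints `127.0.0.1:6662 8679` and `127.0.0.1:6660 56187`. Because INFO is a single snapshot of a busy master, some byte lag is expected. What matters is that both slaves are `online` and that their offsets keep advancing across repeated calls.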

Comment 25 errata-xmlrpc 2018-06-27 13:39:31 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086

Comment 26 Red Hat Bugzilla 2023-09-14 04:12:41 UTC

The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days
