Description of problem:
When you deploy with TLS Everywhere, the “slave” redis servers do not appear to connect to the redis “master” that pacemaker starts.

On an unencrypted Redis cluster, we can see slaves connected to the master. Run:

# /usr/bin/redis-cli -a UdEnQrH6Jfdy6A4W2Rnja7UuZ -s '/var/run/redis/redis.sock' info

and check the Replication section for connected slaves. Here is a working example:

# Replication
role:master
connected_slaves:2
slave0:ip=172.16.2.10,port=6379,state=online,offset=30985,lag=1
slave1:ip=172.16.2.16,port=6379,state=online,offset=30985,lag=1
master_repl_offset:30985
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:30984

We don’t see any slave connection when TLS Everywhere is enabled.

On initial deployment, all replicas of the redis resource start correctly in pacemaker and show 1 Master and 2 Slaves (no error), but only because no replication has taken place at all. When a Slave is restarted, however, the start operation does not succeed, because the redis resource agent tries to connect to the redis master and fails to do so:

Failed Actions:
* redis_start_0 on redis-bundle-2 'unknown error' (1): call=8, status=Timed Out, exitreason='none',
    last-rc-change='Mon Nov 27 15:02:01 2017', queued=1ms, exec=200001ms

The slave redis server logs the following:

96:S 27 Nov 14:24:15.116 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:16.116 * Connecting to MASTER overcloud-controller-1:6379
96:S 27 Nov 14:24:16.117 * MASTER <-> SLAVE sync started
96:S 27 Nov 14:24:16.117 * Non blocking connect for SYNC fired the event.
96:S 27 Nov 14:24:16.117 # Error condition on socket for SYNC: Connection reset by peer
96:S 27 Nov 14:24:17.120 * Connecting to MASTER overcloud-controller-1:6379

This happens because an stunnel process is listening on port 6379 of the remote host and expects SSL traffic, whereas redis sends unencrypted traffic to it.
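The diagnosis above (a plaintext client talking to a listener that expects SSL) could be checked with a small probe. The following is only a sketch: the function name `probe_plaintext` and its result labels are hypothetical, and the assumption that the listener answers plaintext with a reset or a close is taken from the "Connection reset by peer" lines in the slave log, not from stunnel documentation.

```python
import socket


def probe_plaintext(host, port, timeout=2.0):
    """Send an unencrypted inline PING and classify what comes back.

    Labels (hypothetical, for illustration only):
      "redis"       - a plaintext Redis reply such as +PONG or -NOAUTH
      "reset"       - the peer reset the connection, as in the slave log
      "closed"      - the peer closed the connection without replying
      "timeout"     - nothing came back within the timeout
      "unreachable" - the connection could not be established
      "other"       - some non-Redis reply
    """
    try:
        with socket.create_connection((host, port), timeout=timeout) as sock:
            sock.sendall(b"PING\r\n")
            data = sock.recv(64)
    except (ConnectionResetError, BrokenPipeError):
        return "reset"
    except socket.timeout:
        return "timeout"
    except OSError:
        return "unreachable"
    if not data:
        return "closed"
    # Redis simple strings start with "+", errors (e.g. -NOAUTH) with "-".
    if data[:1] in (b"+", b"-"):
        return "redis"
    return "other"


# Example call against the master named in the log above (not run here):
# print(probe_plaintext("overcloud-controller-1", 6379))
```

A "reset" or "closed" result on port 6379 would be consistent with a TLS listener sitting in front of redis, while "redis" would mean the port accepts plaintext.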
Added a suggested Known Issue for the OSP12 release
*** Bug 1538301 has been marked as a duplicate of this bug. ***
Verified

$ sudo pcs status | grep redis
GuestOnline: [ galera-bundle-0@controller-0 galera-bundle-1@controller-1 galera-bundle-2@controller-2 rabbitmq-bundle-0@controller-0 rabbitmq-bundle-1@controller-1 rabbitmq-bundle-2@controller-2 redis-bundle-0@controller-0 redis-bundle-1@controller-1 redis-bundle-2@controller-2 ]
Docker container set: redis-bundle [192.168.24.1:8787/rhosp13/openstack-redis:pcmklatest]
   redis-bundle-0 (ocf::heartbeat:redis): Slave controller-0
   redis-bundle-1 (ocf::heartbeat:redis): Master controller-1
   redis-bundle-2 (ocf::heartbeat:redis): Slave controller-2

$ sudo pcs resource restart redis-bundle controller-0
redis-bundle successfully restarted

$ sudo /usr/bin/redis-cli -a NFzhyzTZMUVvNv9BayXxMNKr6 -s '/var/run/redis/redis.sock' info
...skipped
# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11975965
repl_backlog_histlen:1048576
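The manual check in this verification (role is master, two slaves listed as online) could be scripted roughly as follows. This is a minimal sketch rather than part of the fix: `parse_replication` and `replication_healthy` are hypothetical helper names, and the sample text is copied from the `info` output above.

```python
def parse_replication(info_text):
    """Parse `redis-cli info` text into (fields, replicas).

    fields   - dict of every key:value line
    replicas - list of dicts, one per slaveN line
    """
    fields = {}
    for line in info_text.splitlines():
        line = line.strip()
        # Skip blanks, section headers ("# Replication") and non key:value lines.
        if not line or line.startswith("#") or ":" not in line:
            continue
        key, _, value = line.partition(":")
        fields[key] = value

    replicas = []
    for key, value in fields.items():
        if key.startswith("slave") and key[5:].isdigit():
            # "ip=...,port=...,state=online,..." -> {"ip": ..., "state": ...}
            replicas.append(dict(item.split("=", 1) for item in value.split(",")))
    return fields, replicas


def replication_healthy(info_text, expected_replicas):
    """True if this node is master and the expected number of slaves are online."""
    fields, replicas = parse_replication(info_text)
    online = [r for r in replicas if r.get("state") == "online"]
    return fields.get("role") == "master" and len(online) == expected_replicas


# Sample taken from the verification output in this comment.
SAMPLE = """# Replication
role:master
connected_slaves:2
slave0:ip=127.0.0.1,port=6662,state=online,offset=13015861,lag=0
slave1:ip=127.0.0.1,port=6660,state=online,offset=12968353,lag=1
master_repl_offset:13024540
"""

if __name__ == "__main__":
    print(replication_healthy(SAMPLE, expected_replicas=2))
```

Feeding it the output captured on the master after a `pcs resource restart` gives a quick pass/fail signal instead of reading the Replication section by eye.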
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHEA-2018:2086
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 1000 days