Bug 1236407 - Redis replication breaks after network partitioning Redis master
Summary: Redis replication breaks after network partitioning Redis master
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat OpenStack
Classification: Red Hat
Component: openstack-tripleo-heat-templates
Version: 7.0 (Kilo)
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ga
Sub Component: Director
Assignee: Giulio Fidente
QA Contact: Marius Cornea
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2015-06-28 16:44 UTC by Marius Cornea
Modified: 2015-08-05 13:57 UTC
14 users

Fixed In Version: openstack-tripleo-heat-templates-0.8.6-33.el7ost
Doc Type: Bug Fix
Doc Text:
On Overclouds with network isolation enabled, Pacemaker advertised the redis master under a hostname that resolved to a network on which the master was unreachable, so redis nodes failed to join the cluster. This fix resolves Pacemaker hostnames against the internal_api addresses when deploying with network isolation.
Clone Of:
Environment:
Last Closed: 2015-08-05 13:57:29 UTC
Target Upstream Version:


Attachments
redis logs (26.32 KB, text/plain)
2015-06-28 16:44 UTC, Marius Cornea


Links
System ID Priority Status Summary Last Updated
OpenStack gerrit 192036 None None None Never
OpenStack gerrit 198294 None None None Never
Red Hat Product Errata RHEA-2015:1549 normal SHIPPED_LIVE Red Hat Enterprise Linux OpenStack Platform director Release 2015-08-05 17:49:10 UTC

Description Marius Cornea 2015-06-28 16:44:57 UTC
Created attachment 1044072 [details]
redis logs

Description of problem:
I'm running a 3 controller node deployment. After taking down the NIC of the node where the Redis master runs for 30 seconds, Redis replication breaks. Initially, after bringing the NIC back up, the cluster reaches a split-brain condition with 2 masters. After approx. 1 minute the new master switches its role to slave, but with master_host set to no-such-master. The remaining slave has master_host set to the other slave in the cluster.
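The role flips described above can be spot-checked without redis-cli. A minimal sketch that extracts the reported role from "INFO replication" output, matching the nc pipeline used throughout this report (the helper name is illustrative):

```shell
# Sketch: pull the "role:" field out of "INFO replication" output.
# Redis replies with CRLF line endings, so strip the carriage return.
redis_role() {
    awk -F: '$1 == "role" { gsub("\r", ""); print $2 }'
}

# Against a live node this would look like (address as in this report):
#   cat <(echo info replication) - | nc 192.0.2.8 6379 | redis_role
```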

Version-Release number of selected component (if applicable):
openstack-puppet-modules-2015.1.7-5.el7ost.noarch
openstack-tripleo-puppet-elements-0.0.1-2.el7ost.noarch

How reproducible:
100%

Steps to Reproduce:
1. Deploy 3 controller overcloud in virt env
2. Identify the controller where the Redis master runs and get its MAC address.
3. On the physical host, identify the vnet interface that has the MAC address found in step 2.
4. Run 'ip l set dev vnet$i down; sleep 30; ip l set dev vnet$i up'
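Steps 2-4 can be scripted on the physical host. A sketch, assuming libvirt guests; the domain name, MAC address, and helper name are illustrative:

```shell
# Sketch: map a guest MAC address to its vnet interface by parsing
# `virsh domiflist <domain>` output (columns: Interface Type Source Model MAC).
find_vnet_by_mac() {
    awk -v mac="$1" 'tolower($5) == tolower(mac) { print $1 }'
}

# On the physical host one would then run something like:
#   vnet=$(virsh domiflist overcloud-controller-0 | find_vnet_by_mac 52:54:00:aa:bb:cc)
#   ip link set dev "$vnet" down; sleep 30; ip link set dev "$vnet" up
```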

Actual results:
[stack@instack ~]$ nova list
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                   | Status | Task State | Power State | Networks            |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+
| 9c96bea8-4458-46c8-af07-8050e6e0c8ed | overcloud-compute-0    | ACTIVE | -          | Running     | ctlplane=192.0.2.24 |
| 7b811bc2-b1a9-4f0a-83d3-dd7f4d574749 | overcloud-compute-1    | ACTIVE | -          | Running     | ctlplane=192.0.2.6  |
| 1d281977-32f5-42e1-b077-da2a515fbb01 | overcloud-controller-0 | ACTIVE | -          | Running     | ctlplane=192.0.2.8  |
| 3cc7a774-5a98-4d93-98f0-1d717c2f5ebe | overcloud-controller-1 | ACTIVE | -          | Running     | ctlplane=192.0.2.7  |
| 3f270f6c-8e7b-4b12-9da7-237fb0808135 | overcloud-controller-2 | ACTIVE | -          | Running     | ctlplane=192.0.2.9  |
+--------------------------------------+------------------------+--------+------------+-------------+---------------------+

################# Before network partition #############################
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.8 6379
$316
# Replication
role:master
connected_slaves:2
slave0:ip=192.0.2.9,port=6379,state=online,offset=196142,lag=0
slave1:ip=192.0.2.7,port=6379,state=online,offset=196239,lag=0
master_repl_offset:196336
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:196335

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.7 6379
$376
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:198276
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.9 6379
$376
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:199551
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

############# After network partition ##############################
[stack@instack ~]$ 
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.8 6379
$318
# Replication
role:master
connected_slaves:2
slave0:ip=192.0.2.9,port=6379,state=online,offset=201990,lag=37
slave1:ip=192.0.2.7,port=6379,state=online,offset=202184,lag=37
master_repl_offset:204467
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:204466

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.7 6379
$376
# Replication
role:slave
master_host:overcloud-controller-2
master_port:6379
master_link_status:up
master_last_io_seconds_ago:3
master_sync_in_progress:0
slave_repl_offset:202551
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.9 6379
$254
# Replication
role:master
connected_slaves:1
slave0:ip=192.0.2.7,port=6379,state=online,offset=202551,lag=1
master_repl_offset:202551
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:202283
repl_backlog_histlen:269

^C

################ Wait a couple of seconds ###############################
[stack@instack ~]$ 
[stack@instack ~]$ 
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.8 6379
$188
# Replication
role:master
connected_slaves:0
master_repl_offset:225703
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:2
repl_backlog_histlen:225702

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.7 6379
$409
# Replication
role:slave
master_host:overcloud-controller-2
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:46
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

^C
[stack@instack ~]$ cat <(echo info replication) - | nc 192.0.2.9 6379
$409
# Replication
role:slave
master_host:no-such-master
master_port:6379
master_link_status:down
master_last_io_seconds_ago:-1
master_sync_in_progress:0
slave_repl_offset:1
master_link_down_since_seconds:106
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:202283
repl_backlog_histlen:311


Expected results:
The Redis cluster should not reach a split-brain condition.

Additional info:
Redis logs attached.
I think we should use Redis Sentinel as described in the comments of this post:
http://blog.haproxy.com/2014/01/02/haproxy-advanced-redis-health-check/
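For reference, a Sentinel-based setup (not what the eventual fix did; the errata instead fixed hostname resolution) would look roughly like this on each controller. Addresses and the master name are illustrative; quorum 2 assumes three sentinels:

```
# sentinel.conf sketch -- addresses and names illustrative
port 26379
sentinel monitor overcloud-redis 192.0.2.8 6379 2
sentinel down-after-milliseconds overcloud-redis 5000
sentinel failover-timeout overcloud-redis 60000
sentinel parallel-syncs overcloud-redis 1
```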

Comment 3 Giulio Fidente 2015-06-29 10:56:25 UTC
Fabio, Andrew, FWIW, currently we do not set 'slaveof' in redis.conf and do not configure redis-sentinel either, leaving control of master setting and promotion/demotion to the resource agent.

Comment 4 David Vossel 2015-06-29 15:50:12 UTC
The redis agent requires fencing to produce consistent and safe results with regard to split partitions. We determined that fencing was not in use, which will produce nondeterministic results.

My advice is to re-test with fencing enabled. If you're using libvirt guests, setting up fence_virsh just for testing is a simple option.

After fencing is enabled, if we still hit this issue please create a crm_report covering the time frame in which the issue occurred. This will help me understand exactly what pacemaker did, in hopes of better understanding why the redis agent behaved the way it did. I also wouldn't be surprised to see this issue disappear completely after enabling fencing.

Looking at the testing procedure, this is a great test. I'm really glad this sort of scenario is being validated. Other scenarios that are important involve simple things like 'put the pacemaker node that contains a redis master into standby, verify a new master is promoted and all slave instances point to the new master' or 'kill active master redis daemon, verify state of both slaves and master instances after recovery'
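A configuration sketch of the fence_virsh setup and the extra scenarios suggested above, for a libvirt test rig. The host name, credentials, and node names are illustrative:

```
# Sketch: fence_virsh stonith device (credentials and names illustrative)
pcs stonith create fence-ctrl0 fence_virsh \
    ipaddr=phys-host.example.com login=root \
    identity_file=/root/.ssh/id_rsa \
    port=overcloud-controller-0 \
    pcmk_host_list=overcloud-controller-0
pcs property set stonith-enabled=true

# The additional scenarios, roughly:
pcs cluster standby overcloud-controller-0   # verify a new master is promoted
pkill -f redis-server                        # on the master; verify recovery
```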

Comment 5 chris alfonso 2015-06-29 17:14:09 UTC
Please let us know if the fencing setup does indeed resolve this issue.

Comment 6 Leonid Natapov 2015-06-30 14:49:15 UTC
Another issue:
After rebooting a slave node, Redis didn't start once the controller came back up.
It's probably trying to reconnect to the master on an IP where the master is not bound.

[4149] 30 Jun 10:09:55.677 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:56.679 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:56.679 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:56.680 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:57.680 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:57.680 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:57.680 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:58.682 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:58.682 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:58.683 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:59.684 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:59.684 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:59.684 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:00.687 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:00.687 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:00.688 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:01.688 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:01.688 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:01.688 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:02.690 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:02.691 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:02.691 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:03.693 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:03.693 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:03.693 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:04.695 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:04.695 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:04.695 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:05.696 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:05.696 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:05.696 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:06.698 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:06.698 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:06.699 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:07.701 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:07.702 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:07.702 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:10:08.705 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:10:08.707 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:10:08.707 # Error condition on socket for SYNC: Connection refused
[4149 | signal handler] (1435673409) Received SIGTERM scheduling shutdown...
[4149] 30 Jun 10:10:09.108 # User requested shutdown...
[4149] 30 Jun 10:10:09.109 * Saving the final RDB snapshot before exiting.
[4149] 30 Jun 10:10:09.121 * DB saved on disk
[4149] 30 Jun 10:10:09.121 * Removing the pid file.
[4149] 30 Jun 10:10:09.121 * Removing the unix socket file.
[4149] 30 Jun 10:10:09.121 # Redis is now ready to exit, bye bye...

Comment 7 Giulio Fidente 2015-07-01 13:59:09 UTC
It looks like we provide as master a hostname which resolves to a network where redis is not listening.

Comment 8 David Vossel 2015-07-01 17:23:15 UTC
(In reply to Giulio Fidente from comment #7)
> Looks like we provide as master a hostname which resolves to a network where
> redis is not listening.

The redis agent expects pacemaker node names to be network resolvable. When a redis instance is promoted to master, all the slave redis instances are told to point at the new master instance which is represented by the pacemaker node name.
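Given that requirement, a quick sanity check is to confirm that each pacemaker node name actually resolves on every controller. A minimal sketch (helper name illustrative, node names as in this report):

```shell
# Sketch: check whether a pacemaker node name is network resolvable,
# since the redis agent hands that name verbatim to the slaves.
resolves() {
    getent hosts "$1" > /dev/null
}

# On each controller one would run something like:
#   for n in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
#       resolves "$n" || echo "unresolvable pacemaker node name: $n"
#   done
```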

Comment 10 Marius Cornea 2015-07-22 09:28:28 UTC
Tested this on a bare-metal environment with fencing enabled and the issue is no longer present.

[stack@bldr16cc09 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| c162f9fe-efba-4351-8403-45223d715fd9 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=10.3.58.10 |
| 82b237cd-a9ac-409d-a750-e9d012c704d0 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=10.3.58.11 |
| 0c658847-9faf-4209-bdbf-8acf0f55834f | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=10.3.58.12 |
| ae2cc01b-378a-476c-a3a4-20e73cfcc62a | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=10.3.58.14 |
| f373d947-cf5b-42d9-beab-d6ce2ba7c916 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=10.3.58.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.12 6379
$263
# Replication
role:master
connected_slaves:1
slave0:ip=10.3.58.14,port=6379,state=online,offset=11240430,lag=1
master_repl_offset:11240430
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11228839
repl_backlog_histlen:11592

^C
[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.14 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:11241220
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

^C
[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.13 6379
^C
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ ping 10.3.58.13
PING 10.3.58.13 (10.3.58.13) 56(84) bytes of data.
From 10.3.58.1 icmp_seq=1 Destination Host Unreachable
From 10.3.58.1 icmp_seq=2 Destination Host Unreachable
From 10.3.58.1 icmp_seq=3 Destination Host Unreachable
From 10.3.58.1 icmp_seq=4 Destination Host Unreachable
64 bytes from 10.3.58.13: icmp_seq=5 ttl=64 time=1444 ms
64 bytes from 10.3.58.13: icmp_seq=6 ttl=64 time=444 ms
64 bytes from 10.3.58.13: icmp_seq=7 ttl=64 time=0.260 ms
^C
--- 10.3.58.13 ping statistics ---
33 packets transmitted, 29 received, +4 errors, 12% packet loss, time 32001ms
rtt min/avg/max/mdev = 0.150/65.336/1444.062/272.848 ms, pipe 4
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ 
[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.12 6379
$330
# Replication
role:master
connected_slaves:2
slave0:ip=10.3.58.14,port=6379,state=online,offset=11268018,lag=1
slave1:ip=10.3.58.13,port=6379,state=online,offset=11268018,lag=1
master_repl_offset:11268212
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11228839
repl_backlog_histlen:39374

^C
[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.13 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:11268794
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.14 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:11269196
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Comment 12 errata-xmlrpc 2015-08-05 13:57:29 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549

