Bug 1236407
Summary: Redis replication breaks after network partitioning Redis master
Product: Red Hat OpenStack
Component: openstack-tripleo-heat-templates
Status: CLOSED ERRATA
Severity: high
Priority: high
Version: 7.0 (Kilo)
Target Milestone: ga
Target Release: Director
Hardware: Unspecified
OS: Unspecified
Reporter: Marius Cornea <mcornea>
Assignee: Giulio Fidente <gfidente>
QA Contact: Marius Cornea <mcornea>
CC: abeekhof, calfonso, chdent, dmacpher, dvossel, fdinitto, jason.dobies, kbasil, lnatapov, mburns, mcornea, rhel-osp-director-maint, yeylon
Keywords: Triaged
Fixed In Version: openstack-tripleo-heat-templates-0.8.6-33.el7ost
Doc Type: Bug Fix
Doc Text:
On Overclouds with network isolation enabled, Pacemaker advertised the Redis master under a hostname that resolved to a network on which the master was unreachable, so Redis nodes failed to join the cluster. This fix resolves Pacemaker hostnames to the internal_api addresses when deploying with network isolation.
Last Closed: 2015-08-05 13:57:29 UTC
Type: Bug
Description
Marius Cornea
2015-06-28 16:44:57 UTC
Fabio, Andrew, FWIW: currently we do not set 'slaveof' in redis.conf and do not configure redis-sentinel either, leaving control of master selection and promotion/demotion to the resource agent.

The redis agent requires fencing to produce consistent and safe results with regard to split partitions. We determined that fencing was not in use, which will produce nondeterministic results. My advice is to re-test with fencing enabled. If you're using libvirt guests, setting up fence_virsh just for testing is a simple option. After fencing is enabled, if we still hit this issue, please create a crm_report covering the time frame in which the issue occurred. This will help me understand exactly what pacemaker did, in hopes of better understanding why the redis agent behaved a certain way. I also wouldn't be surprised to see this issue completely disappear after enabling fencing.

Looking at the testing procedure, this is a great test. I'm really glad this sort of scenario is being validated. Other important scenarios involve simple things like 'put the pacemaker node that contains a redis master into standby, verify a new master is promoted and all slave instances point to the new master' or 'kill the active master redis daemon, verify the state of both slave and master instances after recovery'. Please let us know if the fencing setup does indeed resolve this issue.

Another issue: after rebooting a slave node, redis didn't start after the controller came up. It's probably trying to reconnect to the master on an IP where the master is not binding:
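The fence_virsh setup suggested above for libvirt-hosted test environments could be sketched roughly as follows. This is a configuration sketch only: the hypervisor address, login, and key path are placeholders, not details from this report, and the libvirt domain name passed as `port` must match `virsh list` on the host.

```shell
# Hypothetical fence_virsh device for one controller (repeat per node).
# hypervisor.example.com, root, and the key path are placeholders.
pcs stonith create fence-virsh-ctrl0 fence_virsh \
  ipaddr=hypervisor.example.com login=root \
  identity_file=/root/.ssh/id_rsa \
  port=overcloud-controller-0 \
  pcmk_host_list=overcloud-controller-0
# Enable fencing cluster-wide once all devices exist.
pcs property set stonith-enabled=true
```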
[4149] 30 Jun 10:09:55.677 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:56.679 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:56.679 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:56.680 # Error condition on socket for SYNC: Connection refused
[4149] 30 Jun 10:09:57.680 * Connecting to MASTER overcloud-controller-0:6379
[4149] 30 Jun 10:09:57.680 * MASTER <-> SLAVE sync started
[4149] 30 Jun 10:09:57.680 # Error condition on socket for SYNC: Connection refused
[... the same Connecting / sync started / Connection refused cycle repeats once per second through 10:10:08 ...]
[4149 | signal handler] (1435673409) Received SIGTERM scheduling shutdown...
[4149] 30 Jun 10:10:09.108 # User requested shutdown...
[4149] 30 Jun 10:10:09.109 * Saving the final RDB snapshot before exiting.
[4149] 30 Jun 10:10:09.121 * DB saved on disk
[4149] 30 Jun 10:10:09.121 * Removing the pid file.
[4149] 30 Jun 10:10:09.121 * Removing the unix socket file.
[4149] 30 Jun 10:10:09.121 # Redis is now ready to exit, bye bye...

Looks like we provide as master a hostname which resolves to a network where redis is not listening.

(In reply to Giulio Fidente from comment #7)
> Looks like we provide as master a hostname which resolves to a network where
> redis is not listening.

The redis agent expects pacemaker node names to be network resolvable. When a redis instance is promoted to master, all the slave redis instances are told to point at the new master instance, which is represented by the pacemaker node name.

Tested this on a baremetal environment with fencing enabled and the issue is not present anymore:
[stack@bldr16cc09 ~]$ nova list
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| ID                                   | Name                    | Status | Task State | Power State | Networks            |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+
| c162f9fe-efba-4351-8403-45223d715fd9 | overcloud-cephstorage-0 | ACTIVE | -          | Running     | ctlplane=10.3.58.10 |
| 82b237cd-a9ac-409d-a750-e9d012c704d0 | overcloud-compute-0     | ACTIVE | -          | Running     | ctlplane=10.3.58.11 |
| 0c658847-9faf-4209-bdbf-8acf0f55834f | overcloud-controller-0  | ACTIVE | -          | Running     | ctlplane=10.3.58.12 |
| ae2cc01b-378a-476c-a3a4-20e73cfcc62a | overcloud-controller-1  | ACTIVE | -          | Running     | ctlplane=10.3.58.14 |
| f373d947-cf5b-42d9-beab-d6ce2ba7c916 | overcloud-controller-2  | ACTIVE | -          | Running     | ctlplane=10.3.58.13 |
+--------------------------------------+-------------------------+--------+------------+-------------+---------------------+

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.12 6379
$263
# Replication
role:master
connected_slaves:1
slave0:ip=10.3.58.14,port=6379,state=online,offset=11240430,lag=1
master_repl_offset:11240430
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11228839
repl_backlog_histlen:11592
^C

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.14 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:11241220
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0
^C

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.13 6379
^C

[stack@bldr16cc09 ~]$ ping 10.3.58.13
PING 10.3.58.13 (10.3.58.13) 56(84) bytes of data.
From 10.3.58.1 icmp_seq=1 Destination Host Unreachable
From 10.3.58.1 icmp_seq=2 Destination Host Unreachable
From 10.3.58.1 icmp_seq=3 Destination Host Unreachable
From 10.3.58.1 icmp_seq=4 Destination Host Unreachable
64 bytes from 10.3.58.13: icmp_seq=5 ttl=64 time=1444 ms
64 bytes from 10.3.58.13: icmp_seq=6 ttl=64 time=444 ms
64 bytes from 10.3.58.13: icmp_seq=7 ttl=64 time=0.260 ms
^C
--- 10.3.58.13 ping statistics ---
33 packets transmitted, 29 received, +4 errors, 12% packet loss, time 32001ms
rtt min/avg/max/mdev = 0.150/65.336/1444.062/272.848 ms, pipe 4

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.12 6379
$330
# Replication
role:master
connected_slaves:2
slave0:ip=10.3.58.14,port=6379,state=online,offset=11268018,lag=1
slave1:ip=10.3.58.13,port=6379,state=online,offset=11268018,lag=1
master_repl_offset:11268212
repl_backlog_active:1
repl_backlog_size:1048576
repl_backlog_first_byte_offset:11228839
repl_backlog_histlen:39374
^C

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.13 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:0
master_sync_in_progress:0
slave_repl_offset:11268794
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

[stack@bldr16cc09 ~]$ cat <(echo info replication) - | nc 10.3.58.14 6379
$378
# Replication
role:slave
master_host:overcloud-controller-0
master_port:6379
master_link_status:up
master_last_io_seconds_ago:1
master_sync_in_progress:0
slave_repl_offset:11269196
slave_priority:100
slave_read_only:1
connected_slaves:0
master_repl_offset:0
repl_backlog_active:0
repl_backlog_size:1048576
repl_backlog_first_byte_offset:0
repl_backlog_histlen:0

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2015:1549
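The underlying requirement discussed in this report is that every pacemaker node name resolve, on every controller, to an address on the network where redis actually listens. A quick spot-check along these lines could be run on each node (the node names are the ones from this report's environment; adjust them for other deployments):

```shell
# Check whether each pacemaker node name resolves on this host.
# This only verifies name resolution, not which network the address is on;
# with network isolation, the result should be the internal_api address.
report=""
for node in overcloud-controller-0 overcloud-controller-1 overcloud-controller-2; do
  if getent hosts "$node" >/dev/null 2>&1; then
    report="${report}${node}: resolves\n"
  else
    report="${report}${node}: DOES NOT resolve\n"
  fi
done
printf '%b' "$report"
```

One line is printed per node regardless of outcome, so the output is easy to compare across controllers.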