Description of problem:
etcd database is over 700M
It is possible to start a one-member cluster. When a second member is added, the following messages appear and no synchronization is performed.
2016-04-19 10:30:35.044306 D | raft: 5429af89680001 [firstindex: 57198439, commit: 57204158] sent snapshot[index: 57198438, term: 1389] to a7e9771a3b6aca97 [next = 1, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2016-04-19 10:30:35.044337 D | raft: 5429af89680001 paused sending replication messages to a7e9771a3b6aca97 [next = 1, match = 0, state = ProgressStateSnapshot, waiting = true, pendingSnapshot = 57198438]
2016-04-19 10:30:40.455394 E | rafthttp: failed to write a7e9771a3b6aca97 on pipeline (read tcp 192.168.100.131:2380: i/o timeout)
2016-04-19 10:30:40.455435 D | raft: 5429af89680001 failed to send message to a7e9771a3b6aca97 because it is unreachable [next = 1, match = 0, state = ProgressStateSnapshot, waiting = true, pendingSnapshot = 57198438]
2016-04-19 10:30:40.455450 D | raft: 5429af89680001 snapshot failed, resumed sending replication messages to a7e9771a3b6aca97 [next = 1, match = 0, state = ProgressStateProbe, waiting = false, pendingSnapshot = 0]
2016-04-19 10:28:59.184800 E | etcdserver: publish error: etcdserver: request timed out
2016-04-19 10:29:01.519021 E | rafthttp: failed to read raft message (unexpected EOF)
2016-04-19 10:29:07.073515 E | rafthttp: failed to read raft message (unexpected EOF)
With an empty database, or with a backup made on the testing environment, everything runs fine (on the same servers).
So there are no issues with network connectivity or with the configuration of the servers. Adding a member fails only with a particular database backup, which is more than 700M in size.
Version-Release number of selected component (if applicable):

Steps to Reproduce:
1. Take a backup and try to build a 2-member cluster.
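A hedged sketch of the reproduction. The first node's IP is taken from the log lines above; the second node's IP, the member names, and the paths are assumed placeholders, so adjust them to your environment (etcd v2 syntax throughout).

```shell
# 1. On the first node, restore the >700M backup into the data dir and
#    start etcd as a one-member cluster:
#      # systemctl start etcd
#
# 2. From the first node, register the second member. This is the exact
#    command (printed here; run it against the live first member):
printf 'etcdctl member add %s http://%s:2380\n' etcd2 192.168.100.132
#
# 3. On the second node, remove any stale state and start etcd with
#    --initial-cluster-state existing:
#      # rm -rf /var/lib/etcd/member
#      # systemctl enable etcd
#      # systemctl start etcd
```

With the >700M backup in place, step 3 is where the snapshot send to the new member times out as shown in the logs above.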
The use case here is slightly concerning.
It sounds like they have an existing cluster and want to add a new member, but there are several steps involved (key creation, etc.) that need to be done first.
I'm not certain that adding etcd cluster members is a vetted path in the installer.
(In reply to Timothy St. Clair from comment #4)
The issue being discussed here seems to be with the following procedure: https://docs.openshift.com/enterprise/latest/install_config/downgrade.html#downgrade-bringing-openshift-services-back-online
That is, the process of re-importing the data. However, the errors above don't make sense unless https://docs.openshift.com/enterprise/latest/install_config/downgrade.html#downgrade-adding-addtl-etcd-members (e) is being followed after this step.
Can you provide more details on the reproducer?
(In reply to Eric Rich from comment #5)
> (In reply to Timothy St. Clair from comment #4)
> > xref:
> > https://docs.openshift.com/enterprise/latest/install_config/downgrade.html#downgrading-restoring-etcd
> The issues being discussed here seems to be with the following procedure:
> This process of re-importing the data. However the errors above don't make
> sense unless
> https://docs.openshift.com/enterprise/latest/install_config/downgrade.html#downgrade-adding-addtl-etcd-members (e) is being followed after this step.
Can we confirm that the following has been run:
>> Now start etcd on the new member:
>> # rm -rf /var/lib/etcd/member
>> # systemctl enable etcd
>> # systemctl start etcd
I ask as it's not explicitly mentioned in any of the reproducer notes.
I can reproduce this. And when I built etcd v2.3.1 by hand (and fixed a small bug in that version as well), I was able to add a new cluster member whereas with v2.2.2 and v2.2.5 it fails.
A correction to my previous comment. It turns out that when I built both v2.2.5 and v2.3.1 with go 1.6, adding a new cluster member works. And when I built v2.2.5 with go 1.4.2 (which is what we use to build the etcd RPM), adding a new member failed.
Jakub, any idea what changed between go 1.4.2 and 1.6 (yes, I know, it's a huge delta presumably) that would cause i/o timeouts with https connections in 1.4.2 and no issues in 1.6?
I've narrowed down that the fix is somewhere after go1.5.4 and before go1.6rc1.
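For reference, the narrowing described above could in principle be continued with git bisect over a local checkout of golang/go; each bisect step would rebuild the Go toolchain, rebuild etcd v2.2.5 with it, and retry the member add. The checkout path and the `rebuild-and-test-etcd.sh` script are purely hypothetical.

```shell
# Illustrative only -- bisecting a *fix* (go1.5.4 broken, go1.6rc1 fixed)
# uses git bisect's custom terms:
#   cd /path/to/golang/go
#   git bisect start --term-old=broken --term-new=fixed
#   git bisect broken go1.5.4
#   git bisect fixed  go1.6rc1
#   git bisect run ./rebuild-and-test-etcd.sh   # hypothetical test script
echo "suspect range: go1.5.4 (broken) .. go1.6rc1 (fixed)"
```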
I am investigating how to restore a node from a backup (which probably comes from the founder node during disaster recovery).
In general this is described here for failed nodes (in contrast to a whole cluster which must be recovered):
It is discouraged to use a backup for disaster recovery, though.
At https://github.com/coreos/etcd/blame/master/contrib/systemd/etcd2-backup-coreos/README.md#L249 it is sketched that it is possible to re-use a copy of the founder's data-dir. This does not work out of the box, though, because the WAL in the founder's copy has the founder's node-id hard-coded. Each non-founder would re-use that id, leading to conflicts.
It is not hard to add a feature to 'etcdctl backup' to set a specific node-id in the WAL. This allows a node to come up with the founder's snapshot, but with its own node-id.
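For context, the stock etcd v2 backup command already copies a data dir and resets the node id and cluster id in the copy; the proposal above would extend it to write a caller-chosen node-id instead. The paths are examples, and the `--node-id` flag is hypothetical (not part of any released etcdctl).

```shell
# Existing behavior (real command; example paths):
#   etcdctl backup --data-dir /var/lib/etcd --backup-dir /var/lib/etcd-copy
#
# Proposed extension (hypothetical flag), so a non-founder can start from
# the founder's snapshot under its own identity:
#   etcdctl backup --data-dir /var/lib/etcd \
#                  --backup-dir /var/lib/etcd-copy \
#                  --node-id "$NEW_MEMBER_ID"    # hypothetical flag
echo "copied WAL carries the new member's node-id instead of the founder's"
```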
This is implemented experimentally here:
Next step: test this with the customer data.
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.