Bug 1402771 - The database backed by etcd-3.x can't be used by etcd-2.x
Summary: The database backed by etcd-3.x can't be used by etcd-2.x
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.4.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: ---
Assignee: Scott Dodson
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-12-08 10:15 UTC by Anping Li
Modified: 2017-08-16 19:51 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
If the etcd backup was produced by etcd 3.x, the backup can only be loaded by etcd 3.x. This occurs in a containerized install when the version of the etcd RPM installed on the host differs from the version running inside the container. The backup playbooks have been updated to use the version of etcd from within the container, which ensures that a matching version of etcd is used.
Clone Of:
Environment:
Last Closed: 2017-08-10 05:17:28 UTC
Target Upstream Version:
Embargoed:



Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHEA-2017:1716 0 normal SHIPPED_LIVE Red Hat OpenShift Container Platform 3.6 RPM Release Advisory 2017-08-10 09:02:50 UTC

Description Anping Li 2016-12-08 10:15:54 UTC
Description of problem:
If the etcd data was backed up by etcd-3.x (which is done by upgrade_etcd.yml), rolling back to etcd-2.x using the backup data fails.

The error is similar to the one in the etcd backup issue:

"Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: panic: runtime error: makeslice: len out of range"

Version-Release number of selected component (if applicable):
openshift-ansible-playbooks-3.4.35-1.git.0.2e13650.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install OCP v3.3 with standalone etcd.
2. Upgrade etcd:
ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/upgrade_etcd.yml
3. systemctl stop etcd_container
4. Copy the backup etcd data files:
# ETCD_DIR=/var/lib/etcd/
# mv $ETCD_DIR /var/lib/etcd.orig
# cp -Rp /var/lib/origin/etcd-backup-<timestamp>/ $ETCD_DIR
# chcon -R --reference /var/lib/etcd.orig/ $ETCD_DIR
# chown -R etcd:etcd $ETCD_DIR

5. Modify /etc/systemd/system/etcd_container.service:
  5.1)  set the correct image version
  5.2)  add --force-new-cluster
  For example:
   ExecStart=/usr/bin/docker run --name etcd_container --rm -v /var/lib/etcd:/var/lib/etcd:z -v /etc/etcd:/etc/etcd:z --env-file=/etc/etcd/etcd.conf --net=host --entrypoint=/usr/bin/etcd registry.access.redhat.com/rhel7/etcd3:3.0.14
   ExecStart=/usr/bin/docker run --name etcd_container --rm -v /var/lib/etcd:/var/lib/etcd:z -v /etc/etcd:/etc/etcd:z --env-file=/etc/etcd/etcd.conf --net=host --entrypoint=/usr/bin/etcd registry.access.redhat.com/rhel7/etcd --force-new-cluster
6. systemctl daemon-reload ; systemctl start etcd_container
7. Check the etcd_container service status.

Actual results:
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.148588 I | etcdmain: listening for client requests on https://192.168.1.172:2379
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150030 I | etcdserver: name = default
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150041 I | etcdserver: force new cluster
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150047 I | etcdserver: data dir = /var/lib/etcd/
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150056 I | etcdserver: member dir = /var/lib/etcd/member
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150063 I | etcdserver: heartbeat = 500ms
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150068 I | etcdserver: election = 2500ms
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150074 I | etcdserver: snapshot count = 10000
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: 2016-12-08 09:27:39.150087 I | etcdserver: advertise client URLs = https://192.168.1.172:2379
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: panic: runtime error: makeslice: len out of range
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: goroutine 1 [running]:
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: panic(0xdbd840, 0xc8201dfa90)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /usr/lib/golang/src/runtime/panic.go:481 +0x3e6
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/wal.(*decoder).decode(0xc82021a990, 0xc820187e58, 0x0, 0x0)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/wal/decoder.go:55 +0x142
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/wal.(*WAL).ReadAll(0xc8201e2410, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, 0x0, ...)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/wal/wal.go:237 +0x214
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/etcdserver.readWAL(0xc8202228c0, 0x18, 0x0, 0x0, 0x0, 0x0, 0x0, 0xc8201e2410, 0xc820010200, 0x0, ...)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/etcdserver/storage.go:87 +0x228
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/etcdserver.restartAsStandaloneNode(0xc820075e00, 0x0, 0x7f654e095028, 0xc82015e1c0, 0x0, 0x0, 0x0, 0x0)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/etcdserver/raft.go:371 +0x11a
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/etcdserver.NewServer(0xc820075e00, 0xc820075eb8, 0x0, 0x0)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/etcdserver/server.go:335 +0x3ca6
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/etcdmain.startEtcd(0xc820077400, 0x0, 0x0, 0x0)
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/etcdmain/etcd.go:302 +0x1b40
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: github.com/coreos/etcd/etcdmain.Main()
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/etcdmain/etcd.go:118 +0x2142
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: main.main()
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25602]: /builddir/build/BUILD/etcd-2.3.7/src/github.com/coreos/etcd/main.go:37 +0xe3
Dec 08 04:27:39 ha1-ose-1-4.novalocal systemd[1]: etcd_container.service: main process exited, code=exited, status=2/INVALIDARGUMENT
Dec 08 04:27:39 ha1-ose-1-4.novalocal etcd_container[25679]: Failed to stop container (etcd_container): Error response from daemon: No such container: etcd_container
Dec 08 04:27:39 ha1-ose-1-4.novalocal systemd[1]: etcd_container.service: control process exited, code=exited status=1
Dec 08 04:27:39 ha1-ose-1-4.novalocal systemd[1]: Unit etcd_container.service entered failed state.
Dec 08 04:27:39 ha1-ose-1-4.novalocal systemd[1]: etcd_container.service failed.
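
The build paths in the trace (etcd-2.3.7) suggest that an etcd 2.x binary panicked while decoding a WAL written by a newer etcd. A pre-restore version check could be sketched as below, assuming the etcd version that produced the backup is known. The function names major_of and can_load_backup are hypothetical. The rule that a server can load backups from its own or an older major version is an assumption that generalizes the 3.x/2.x case in this report; it is not taken from etcd documentation.

```shell
#!/bin/sh
# Sketch only: helper names are hypothetical, not part of etcd or openshift-ansible.

# major_of VERSION
# Print the major component of a dotted version string (3.0.14 -> 3, v2.3.7 -> 2).
major_of() {
    v=${1#v}            # drop an optional leading "v"
    echo "${v%%.*}"     # keep everything before the first dot
}

# can_load_backup BACKUP_VERSION SERVER_VERSION
# Return 0 when the server's major version is at least the backup's major version.
# Assumption: this encodes "a 3.x backup can only be loaded by etcd 3.x".
can_load_backup() {
    [ "$(major_of "$2")" -ge "$(major_of "$1")" ]
}
```

With the versions that appear in this report, `can_load_backup 3.0.14 2.3.7` returns non-zero, which is consistent with the failure shown above.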

Expected results:


Additional info:

Comment 8 Scott Dodson 2017-06-09 04:04:40 UTC
This should be fixed in the current playbooks because we now perform the backup from within the container. Please confirm.

Comment 9 Anping Li 2017-06-09 11:26:39 UTC
The database was backed up by etcdctl in the container; the issue does not occur in v3.6.
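
A backup taken with the etcdctl binary inside the container might look roughly like the sketch below. This is not the playbook's actual task. The container name etcd_container comes from the unit file in this report; the function name backup_in_container, the DRY_RUN switch, and the backup directory are hypothetical. The backup directory has to be on a volume that is mounted into the container.

```shell
#!/bin/sh
# Sketch only: run the etcd v2 "etcdctl backup" inside the container so that the
# tool version matches the etcd server version running there.

# backup_in_container CONTAINER DATA_DIR BACKUP_DIR
# With DRY_RUN=1 the command line is printed instead of executed.
backup_in_container() {
    container=$1
    data_dir=$2
    backup_dir=$3
    # Rebuild the positional parameters as the full command line.
    set -- docker exec "$container" etcdctl backup \
        --data-dir "$data_dir" --backup-dir "$backup_dir"
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}
```

For example, `DRY_RUN=1 backup_in_container etcd_container /var/lib/etcd /var/lib/etcd/backup-example` prints the command for review without touching the container.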

Comment 11 errata-xmlrpc 2017-08-10 05:17:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:1716

