Description of problem:
Running restore.sh failed with 'Error: data-dir "/var/lib/etcd" exists' when following the doc https://docs.google.com/document/d/1Z7xow84WdLUkgFiOaeY-QXmH1H8wnTg2vP1pQiuj22o/edit

It looks like the script's `cp -rap ${ETCD_DATA_DIR} $ASSET_DIR/backup/` should use `mv` instead: `mv ${ETCD_DATA_DIR} $ASSET_DIR/backup/`? Copying leaves /var/lib/etcd in place, so the later restore from snapshot refuses to write into it.

Version-Release number of selected component (if applicable):
quay.io/openshift-release-dev/ocp-release:4.1.0-rc.3

How reproducible:
Always

Steps to Reproduce:
1. Prepare a 4.1 cluster where only one master remains alive:
   - Create an IPI 4.1 cluster with 3 masters and 2 workers.
   - Check that the cluster status is normal (e.g. `oc get no`, `oc get co`, `oc get po -A | grep -vE "(Running|Completed)"`, etc.).
   - Stop one master, xxia-0513-destructive-6ksqs-master-0, and wait a moment (a new master is recreated by the machine API). The cluster is still alive.
   - Stop another master, xxia-0513-destructive-6ksqs-master-1. The cluster goes down because 2 of 3 etcd pods are down (i.e. etcd quorum loss), due to https://bugzilla.redhat.com/show_bug.cgi?id=1667557
2. Create a bastion, then follow the doc https://docs.google.com/document/d/1Z7xow84WdLUkgFiOaeY-QXmH1H8wnTg2vP1pQiuj22o/edit

Actual results:
2. When running the step `sh restore.sh`, it fails with the error below:

    [root@ip-10-0-129-162 xxia-test-master-replacement]# sh restore.sh
    Creating asset directory ./assets
    Downloading etcdctl binary..
    etcdctl version: 3.3.10
    API version: 3.3
    Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
    Backing up etcd data-dir..
    Stopping etcd..
    Waiting for etcd-member to stop
    Waiting for etcd-member to stop
    Waiting for etcd-member to stop
    Waiting for etcd-member to stop
    Restoring etcd member etcd-member-ip-10-0-129-162.ap-south-1.compute.internal from snapshot..
    2019-05-13 08:32:04.753779 I | pkg/netutil: resolving etcd-2.xxia-0513-destructive.qe.devcluster.openshift.com:2380 to 10.0.129.162:2380
    Error: data-dir "/var/lib/etcd" exists

Expected results:
The error should not happen.

Additional info:

(In reply to Xingxing Xia from comment #0)
> Looks like the script `cp -rap ${ETCD_DATA_DIR} $ASSET_DIR/backup/` should
> use `mv`: mv ${ETCD_DATA_DIR} $ASSET_DIR/backup/?

After retrying with `mv`, the error disappeared. So the script given to users should be updated.
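The corrected backup step can be sketched roughly as below. This is a minimal sketch under stated assumptions, not the actual restore.sh: the function name `backup_etcd_data_dir` is hypothetical, and only the variable names ETCD_DATA_DIR and ASSET_DIR come from the script as quoted in this bug. It moves the data-dir out of the way so that `etcdctl snapshot restore`, which refuses to write into an existing data-dir, can recreate it. The check against an existing backup is an added assumption, meant to avoid overwriting an earlier backup on a re-run.

```shell
#!/usr/bin/env bash
# Hypothetical sketch of the "Backing up etcd data-dir.." step.
# ETCD_DATA_DIR / ASSET_DIR mirror the variables quoted in this bug;
# the function name backup_etcd_data_dir is made up for illustration.
set -euo pipefail

backup_etcd_data_dir() {
  local data_dir="$1" asset_dir="$2"
  local backup_dir="${asset_dir}/backup"
  local target="${backup_dir}/$(basename "${data_dir}")"

  mkdir -p "${backup_dir}"

  # Assumption: refuse to clobber a backup left by an earlier run.
  if [ -e "${target}" ]; then
    echo "Backup ${target} already exists, not overwriting" >&2
    return 1
  fi

  # mv instead of cp -rap: afterwards data_dir no longer exists, so
  # `etcdctl snapshot restore` can create it from the snapshot.
  mv "${data_dir}" "${backup_dir}/"
}

# Usage, mirroring the variables in restore.sh:
#   backup_etcd_data_dir "${ETCD_DATA_DIR}" "${ASSET_DIR}"
```

One consequence of this choice: unlike `cp -rap`, `mv` leaves nothing at the original path, so if the restore itself later fails, the old data-dir has to be moved back from $ASSET_DIR/backup/ by hand.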

Verified: the new script works without error, tested as part of https://bugzilla.redhat.com/show_bug.cgi?id=1709802