Description of problem:
Unable to backup etcd data per instructions at https://docs.openshift.com/container-platform/4.1/disaster_recovery/backing-up-etcd.html#backing-up-etcd-data_backup-etcd

Version-Release number of selected component (if applicable):
4.1.8

How reproducible:
Consistently

Steps to Reproduce:
1. rsh in to master node
2. Attempt to backup etcd member using instructions
3.

Actual results:
sh-4.4# /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-1/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-2/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-3/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-4/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-5/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-8/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/secrets/etcd-client does not contain etcd client certs, trying next source ..
etcd-member.yaml found in ./assets/backup/
Error: open ./assets/backup/etcd-client.crt: no such file or directory

Expected results:
Snapshot should be taken

Additional info:
It does appear the certs being searched for exist, but etcd-member isn't being included in the search path. Contents of /etc/kubernetes/static-pod-resources:

sh-4.4# ls -l /etc/kubernetes/static-pod-resources/
total 4
drwxr-xr-x. 2 root root 4096 Jul 19 17:28 etcd-member
drwxr-xr-x. 4 root root 39 Jul 19 17:28 kube-apiserver-certs
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-apiserver-pod-24
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-apiserver-pod-27
drwxr-xr-x. 4 root root 70 Jul 20 00:55 kube-apiserver-pod-28
drwxr-xr-x. 4 root root 70 Aug 1 17:57 kube-apiserver-pod-29
drwxr-xr-x. 4 root root 70 Aug 2 12:56 kube-apiserver-pod-30
drwxr-xr-x. 4 root root 70 Aug 3 05:40 kube-apiserver-pod-31
drwxr-xr-x. 4 root root 70 Aug 4 00:54 kube-apiserver-pod-32
drwxr-xr-x. 3 root root 24 Jul 19 17:28 kube-controller-manager-certs
drwxr-xr-x. 4 root root 79 Jul 19 17:28 kube-controller-manager-pod-18
drwxr-xr-x. 4 root root 79 Jul 19 17:28 kube-controller-manager-pod-20
drwxr-xr-x. 4 root root 79 Jul 19 17:28 kube-controller-manager-pod-21
drwxr-xr-x. 4 root root 79 Jul 19 17:28 kube-controller-manager-pod-22
drwxr-xr-x. 4 root root 79 Jul 20 00:55 kube-controller-manager-pod-23
drwxr-xr-x. 4 root root 79 Aug 3 05:40 kube-controller-manager-pod-25
drwxr-xr-x. 4 root root 79 Aug 4 00:54 kube-controller-manager-pod-26
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-16
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-17
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-18
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-19
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-20
drwxr-xr-x. 4 root root 70 Aug 3 05:40 kube-scheduler-pod-21
drwxr-xr-x. 4 root root 70 Jul 19 17:28 kube-scheduler-pod-5
Actually, after reviewing the script, it appears that only kube-apiserver-pod{1..10} are being searched.

backup_etcd_client_certs() {
  echo "Trying to backup etcd client certs.."
  if [ -f "$ASSET_DIR/backup/etcd-ca-bundle.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.key" ]; then
    echo "etcd client certs already backed up and available $ASSET_DIR/backup/"
  else
    for i in {1..10}; do
      SECRET_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/secrets/etcd-client"
      CONFIGMAP_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/configmaps/etcd-serving-ca"
      if [ -f "$CONFIGMAP_DIR/ca-bundle.crt" ] && [ -f "$SECRET_DIR/tls.crt" ] && [ -f "$SECRET_DIR/tls.key" ]; then
        cp $CONFIGMAP_DIR/ca-bundle.crt $ASSET_DIR/backup/etcd-ca-bundle.crt
        cp $SECRET_DIR/tls.crt $ASSET_DIR/backup/etcd-client.crt
        cp $SECRET_DIR/tls.key $ASSET_DIR/backup/etcd-client.key
        break
      else
        echo "$SECRET_DIR does not contain etcd client certs, trying next source .."
      fi
    done
  fi
}

The certs are in kube-apiserver-pod-24 and above in my cluster.

./kube-apiserver-pod-24/configmaps/etcd-serving-ca
./kube-apiserver-pod-24/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-27/configmaps/etcd-serving-ca
./kube-apiserver-pod-27/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-28/configmaps/etcd-serving-ca
./kube-apiserver-pod-28/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-29/configmaps/etcd-serving-ca
./kube-apiserver-pod-29/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-30/configmaps/etcd-serving-ca
./kube-apiserver-pod-30/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-31/configmaps/etcd-serving-ca
./kube-apiserver-pod-31/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-32/configmaps/etcd-serving-ca
./kube-apiserver-pod-32/configmaps/etcd-serving-ca/ca-bundle.crt
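
One possible way to avoid the hard-coded {1..10} range is to glob for whatever kube-apiserver-pod-N directories exist and pick the highest revision that holds all three files. This is only a sketch, not the fix that shipped: the function name find_etcd_client_cert_dir is hypothetical, and it assumes the directory layout shown in the listings above.

```shell
# Hypothetical helper, not part of etcd-snapshot-backup.sh.
# Prints the highest-numbered kube-apiserver-pod-N directory under $1 that
# contains ca-bundle.crt, tls.crt and tls.key; returns non-zero if none does.
# The body runs in a subshell, so its variables do not leak to the caller.
find_etcd_client_cert_dir() (
  base=$1
  best=$(
    for d in "$base"/kube-apiserver-pod-*/; do
      d=${d%/}
      if [ -f "$d/configmaps/etcd-serving-ca/ca-bundle.crt" ] &&
         [ -f "$d/secrets/etcd-client/tls.crt" ] &&
         [ -f "$d/secrets/etcd-client/tls.key" ]; then
        # Emit "<revision> <path>" so the revision can be sorted numerically.
        printf '%s %s\n' "${d##*-}" "$d"
      fi
    done | sort -rn | head -n 1 | cut -d' ' -f2-
  )
  [ -n "$best" ] && printf '%s\n' "$best"
)
```

A caller could then run something like dir=$(find_etcd_client_cert_dir "${CONFIG_FILE_DIR}/static-pod-resources") and copy the three files from "$dir" as the existing loop does. Sorting numerically matters here: a plain lexical sort would rank pod-9 above pod-24.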
Hi Sam, the question is what causes kube-apiserver-pod-x (x >= 10) to appear? The default state after a fresh installation is below:

# cd /etc/kubernetes/static-pod-resources/
[*****static-pod-resources]# ls
etcd-member           kube-apiserver-pod-3  kube-apiserver-pod-6           kube-controller-manager-pod-3  kube-scheduler-pod-4
kube-apiserver-certs  kube-apiserver-pod-4  kube-controller-manager-certs  kube-controller-manager-pod-4  kube-scheduler-pod-5
kube-apiserver-pod-2  kube-apiserver-pod-5  kube-controller-manager-pod-2  kube-controller-manager-pod-5

I have tested this many times and have never seen a kube-apiserver-pod-x with x greater than 10, so this is a valuable test case.
> Hi Sam, the question is what causes kube-apiserver-pod-x (x >= 10) to appear?

My understanding is that the apiserver has a pruner for static pod assets to keep this from happening. While we should cover this in the script, it also seems like a possible bug in the cluster-kube-apiserver-operator pruner.
Verified in 4.2.0-0.nightly-2019-08-13-183722

# oc edit kubeapiserver
kubeapiserver.operator.openshift.io/cluster edited

Add Debug to logLevel; this triggers a new kube-apiserver-pod-x, with x incremented by 1.

spec:
  logLevel: ""

$ sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Creating asset directory ./assets
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10 does not contain etcd client certs, trying next ..
etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-11 backing up to ./assets/backup/
Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
Snapshot saved at ./assets/backup/snapshot.db
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory, and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922