Bug 1737575

Summary: Unable to backup etcd data
Product: OpenShift Container Platform
Reporter: rvanderp
Component: Etcd
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED ERRATA
QA Contact: ge liu <geliu>
Severity: high
Docs Contact:
Priority: high
Version: 4.1.0
CC: laparici, sbatsche
Target Milestone: ---
Target Release: 4.2.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1741479
Environment: OpenShift Container Platform v4.1.8
Last Closed: 2019-10-16 06:34:48 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1740147, 1741479

Description rvanderp 2019-08-05 16:55:54 UTC
Description of problem:
Unable to back up etcd data by following the instructions at https://docs.openshift.com/container-platform/4.1/disaster_recovery/backing-up-etcd.html#backing-up-etcd-data_backup-etcd

Version-Release number of selected component (if applicable):
4.1.8

How reproducible:
Consistently

Steps to Reproduce:
1. rsh into a master node
2. Attempt to back up the etcd member using the documented instructions

Actual results:
sh-4.4# /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-1/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-2/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-3/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-4/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-5/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-8/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/secrets/etcd-client does not contain etcd client certs, trying next source ..
etcd-member.yaml found in ./assets/backup/
Error: open ./assets/backup/etcd-client.crt: no such file or directory


Expected results:
Snapshot should be taken

Additional info:
It does appear the certs being searched for exist, but etcd-member isn't being included in the search path.

Contents of /etc/kubernetes/static-pod-resources:
sh-4.4# ls -l /etc/kubernetes/static-pod-resources/
total 4
drwxr-xr-x. 2 root root 4096 Jul 19 17:28 etcd-member
drwxr-xr-x. 4 root root   39 Jul 19 17:28 kube-apiserver-certs
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-apiserver-pod-24
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-apiserver-pod-27
drwxr-xr-x. 4 root root   70 Jul 20 00:55 kube-apiserver-pod-28
drwxr-xr-x. 4 root root   70 Aug  1 17:57 kube-apiserver-pod-29
drwxr-xr-x. 4 root root   70 Aug  2 12:56 kube-apiserver-pod-30
drwxr-xr-x. 4 root root   70 Aug  3 05:40 kube-apiserver-pod-31
drwxr-xr-x. 4 root root   70 Aug  4 00:54 kube-apiserver-pod-32
drwxr-xr-x. 3 root root   24 Jul 19 17:28 kube-controller-manager-certs
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-18
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-20
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-21
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-22
drwxr-xr-x. 4 root root   79 Jul 20 00:55 kube-controller-manager-pod-23
drwxr-xr-x. 4 root root   79 Aug  3 05:40 kube-controller-manager-pod-25
drwxr-xr-x. 4 root root   79 Aug  4 00:54 kube-controller-manager-pod-26
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-16
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-17
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-18
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-19
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-20
drwxr-xr-x. 4 root root   70 Aug  3 05:40 kube-scheduler-pod-21
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-5
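
To confirm which of the kube-apiserver-pod-N directories listed above actually hold the client certs, something like the following sketch could be run on the master. It assumes the certs live under secrets/etcd-client (the subdirectory named in the script output) and that the cert file is called tls.crt (the name the backup script checks for). The function name list_cert_revisions is hypothetical, and sort -V assumes GNU coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: print each kube-apiserver-pod-N directory under the given base
# directory that contains secrets/etcd-client/tls.crt, in numeric order.
# list_cert_revisions is a hypothetical helper, not part of any shipped script.
list_cert_revisions() {
  local base_dir="${1:-/etc/kubernetes/static-pod-resources}"
  local dir
  for dir in "$base_dir"/kube-apiserver-pod-*; do
    # An unmatched glob stays literal and simply fails this test.
    if [ -f "$dir/secrets/etcd-client/tls.crt" ]; then
      basename "$dir"
    fi
  done | sort -V   # version sort, so pod-9 is listed before pod-24
}
```

Calling list_cert_revisions with no argument checks the default path from this report. Passing another directory as the first argument makes it easy to try against a copy.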

Comment 1 rvanderp 2019-08-05 17:05:11 UTC
Actually, after reviewing the script, it appears that only kube-apiserver-pod-1 through kube-apiserver-pod-10 are being searched.

backup_etcd_client_certs() {
  echo "Trying to backup etcd client certs.."
  if [ -f "$ASSET_DIR/backup/etcd-ca-bundle.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.key" ]; then
    echo "etcd client certs already backed up and available $ASSET_DIR/backup/"
  else
    for i in {1..10}; do
      SECRET_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/secrets/etcd-client"
      CONFIGMAP_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/configmaps/etcd-serving-ca"
      if [ -f "$CONFIGMAP_DIR/ca-bundle.crt" ] && [ -f "$SECRET_DIR/tls.crt" ] && [ -f "$SECRET_DIR/tls.key" ]; then
        cp $CONFIGMAP_DIR/ca-bundle.crt $ASSET_DIR/backup/etcd-ca-bundle.crt
        cp $SECRET_DIR/tls.crt $ASSET_DIR/backup/etcd-client.crt
        cp $SECRET_DIR/tls.key $ASSET_DIR/backup/etcd-client.key
        break
      else
        echo "$SECRET_DIR does not contain etcd client certs, trying next source .."
      fi
    done
  fi
}

The certs are in kube-apiserver-pod-24 and above in my cluster.


./kube-apiserver-pod-24/configmaps/etcd-serving-ca
./kube-apiserver-pod-24/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-27/configmaps/etcd-serving-ca
./kube-apiserver-pod-27/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-28/configmaps/etcd-serving-ca
./kube-apiserver-pod-28/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-29/configmaps/etcd-serving-ca
./kube-apiserver-pod-29/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-30/configmaps/etcd-serving-ca
./kube-apiserver-pod-30/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-31/configmaps/etcd-serving-ca
./kube-apiserver-pod-31/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-32/configmaps/etcd-serving-ca
./kube-apiserver-pod-32/configmaps/etcd-serving-ca/ca-bundle.crt
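
One way the search could avoid the hardcoded {1..10} range is to enumerate whatever kube-apiserver-pod-N directories exist and take the highest-numbered one that has all three files. The sketch below illustrates that idea only; it is not the fix that shipped. The function name find_latest_cert_dir is hypothetical, and find -maxdepth and sort -V assume GNU findutils and coreutils.

```shell
#!/usr/bin/env bash
# Sketch only: print the highest-numbered kube-apiserver-pod-N directory that
# contains ca-bundle.crt, tls.crt and tls.key, or return 1 if none does.
# find_latest_cert_dir is a hypothetical helper, not the shipped fix.
find_latest_cert_dir() {
  local base_dir="${1:-/etc/kubernetes/static-pod-resources}"
  local candidates dir
  # Reverse version sort puts pod-32 ahead of pod-9.
  candidates=$(find "$base_dir" -maxdepth 1 -type d -name 'kube-apiserver-pod-*' | sort -rV)
  while IFS= read -r dir; do
    [ -n "$dir" ] || continue
    if [ -f "$dir/configmaps/etcd-serving-ca/ca-bundle.crt" ] &&
       [ -f "$dir/secrets/etcd-client/tls.crt" ] &&
       [ -f "$dir/secrets/etcd-client/tls.key" ]; then
      printf '%s\n' "$dir"
      return 0
    fi
  done <<EOF
$candidates
EOF
  return 1
}
```

Because the quoted function skips the search when etcd-ca-bundle.crt, etcd-client.crt and etcd-client.key already exist in $ASSET_DIR/backup, copying the three files there by hand from the directory this prints might also serve as a workaround on affected versions. That has not been verified in this report.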

Comment 2 ge liu 2019-08-06 03:31:22 UTC
Hi Sam, the question is: what causes kube-apiserver-pod-x (x >= 10) to appear?

The default state after a fresh installation is shown below:

# cd /etc/kubernetes/static-pod-resources/
[*****static-pod-resources]# ls
etcd-member           kube-apiserver-pod-3  kube-apiserver-pod-6           kube-controller-manager-pod-3  kube-scheduler-pod-4
kube-apiserver-certs  kube-apiserver-pod-4  kube-controller-manager-certs  kube-controller-manager-pod-4  kube-scheduler-pod-5
kube-apiserver-pod-2  kube-apiserver-pod-5  kube-controller-manager-pod-2  kube-controller-manager-pod-5

I have tested this many times and have never seen a kube-apiserver-pod-x with x greater than 10, so this is a valuable test case.

Comment 3 Sam Batschelet 2019-08-06 13:12:54 UTC
> Hi Sam, the question is: what causes kube-apiserver-pod-x (x >= 10) to appear?

My understanding is that the apiserver has a pruner for static pod assets to keep this from happening. While we should cover this in the script, it also seems like a possible bug in the cluster-kube-apiserver-operator pruner?

Comment 8 ge liu 2019-08-15 08:36:17 UTC
Verified in 4.2.0-0.nightly-2019-08-13-183722
# oc edit kubeapiserver
kubeapiserver.operator.openshift.io/cluster edited

Add Debug to logLevel; this triggers a new kube-apiserver-pod-x, with x incremented by 1:
spec:
  logLevel: ""

$ sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Creating asset directory ./assets
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10 does not contain etcd client certs, trying next ..
etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-11 backing up to ./assets/backup/
Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
Snapshot saved at ./assets/backup/snapshot.db
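
After a run like the one above, a quick sanity check of the backup directory might look like the sketch below. The file names are taken from this report (snapshot.db, etcd-member.yaml, and the three cert names used by the 4.1 script). That the fixed 4.2 script writes the same cert file names is an assumption, and verify_backup_assets is a hypothetical helper.

```shell
#!/usr/bin/env bash
# Sketch only: return 0 if every expected backup file exists and is non-empty,
# otherwise report each missing file on stderr and return 1.
# The cert file names are assumed to match those used by the 4.1 script.
verify_backup_assets() {
  local backup_dir="${1:-./assets/backup}"
  local missing=0
  local name
  for name in snapshot.db etcd-ca-bundle.crt etcd-client.crt etcd-client.key etcd-member.yaml; do
    if [ ! -s "$backup_dir/$name" ]; then
      echo "missing or empty: $backup_dir/$name" >&2
      missing=1
    fi
  done
  return "$missing"
}
```

Running verify_backup_assets ./assets/backup right after the backup script gives a simple pass or fail exit status that can be used in automation.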

Comment 9 errata-xmlrpc 2019-10-16 06:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922