Bug 1737575 - Unable to backup etcd data
Summary: Unable to backup etcd data
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd
Version: 4.1.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.2.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks: 1740147 1741479
Reported: 2019-08-05 16:55 UTC by rvanderp
Modified: 2019-10-16 06:35 UTC
CC List: 2 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Clones: 1741479
Environment:
OpenShift Container Platform v4.1.8
Last Closed: 2019-10-16 06:34:48 UTC
Target Upstream Version:



Links
System                  ID                                           Last Updated
Github                  openshift machine-config-operator pull 1052  2019-08-08 18:02:07 UTC
Red Hat Product Errata  RHBA-2019:2922                               2019-10-16 06:35:05 UTC

Description rvanderp 2019-08-05 16:55:54 UTC
Description of problem:
Unable to back up etcd data by following the instructions at https://docs.openshift.com/container-platform/4.1/disaster_recovery/backing-up-etcd.html#backing-up-etcd-data_backup-etcd

Version-Release number of selected component (if applicable):
4.1.8

How reproducible:
Consistently

Steps to Reproduce:
1. rsh into a master node
2. Attempt to back up the etcd member using the documented instructions

Actual results:
sh-4.4# /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-1/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-2/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-3/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-4/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-5/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-6/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-7/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-8/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-9/secrets/etcd-client does not contain etcd client certs, trying next source ..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10/secrets/etcd-client does not contain etcd client certs, trying next source ..
etcd-member.yaml found in ./assets/backup/
Error: open ./assets/backup/etcd-client.crt: no such file or directory


Expected results:
A snapshot should be taken.

Additional info:
The certs being searched for do appear to exist, but the etcd-member directory is not included in the search path.

Contents of /etc/kubernetes/static-pod-resources:
sh-4.4# ls -l /etc/kubernetes/static-pod-resources/
total 4
drwxr-xr-x. 2 root root 4096 Jul 19 17:28 etcd-member
drwxr-xr-x. 4 root root   39 Jul 19 17:28 kube-apiserver-certs
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-apiserver-pod-24
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-apiserver-pod-27
drwxr-xr-x. 4 root root   70 Jul 20 00:55 kube-apiserver-pod-28
drwxr-xr-x. 4 root root   70 Aug  1 17:57 kube-apiserver-pod-29
drwxr-xr-x. 4 root root   70 Aug  2 12:56 kube-apiserver-pod-30
drwxr-xr-x. 4 root root   70 Aug  3 05:40 kube-apiserver-pod-31
drwxr-xr-x. 4 root root   70 Aug  4 00:54 kube-apiserver-pod-32
drwxr-xr-x. 3 root root   24 Jul 19 17:28 kube-controller-manager-certs
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-18
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-20
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-21
drwxr-xr-x. 4 root root   79 Jul 19 17:28 kube-controller-manager-pod-22
drwxr-xr-x. 4 root root   79 Jul 20 00:55 kube-controller-manager-pod-23
drwxr-xr-x. 4 root root   79 Aug  3 05:40 kube-controller-manager-pod-25
drwxr-xr-x. 4 root root   79 Aug  4 00:54 kube-controller-manager-pod-26
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-16
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-17
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-18
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-19
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-20
drwxr-xr-x. 4 root root   70 Aug  3 05:40 kube-scheduler-pod-21
drwxr-xr-x. 4 root root   70 Jul 19 17:28 kube-scheduler-pod-5
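In the output above, the cert search falls through after kube-apiserver-pod-10 without stopping, and the failure only surfaces later as `Error: open ./assets/backup/etcd-client.crt: no such file or directory`, which hides the real cause. A minimal fail-fast guard could look like the sketch below, assuming the script keeps the `$ASSET_DIR/backup/` file names it already checks for; `require_etcd_client_certs` is a hypothetical helper, not part of etcd-snapshot-backup.sh.

```shell
#!/bin/sh
# Hypothetical guard for etcd-snapshot-backup.sh (a sketch, not the shipped fix).
# $1 is the asset dir; the file names match those the script copies into
# "$ASSET_DIR/backup/". Prints each missing file to stderr and returns 1 if
# any are missing, 0 otherwise.
require_etcd_client_certs() {
  backup_dir="$1/backup"
  missing=0
  for f in etcd-ca-bundle.crt etcd-client.crt etcd-client.key; do
    if [ ! -f "$backup_dir/$f" ]; then
      echo "missing $backup_dir/$f" >&2
      missing=1
    fi
  done
  return "$missing"
}

# Intended call site, right after backup_etcd_client_certs:
#   require_etcd_client_certs "$ASSET_DIR" ||
#     { echo "no etcd client certs found; aborting before snapshot" >&2; exit 1; }
```

With such a guard the script would stop with an explicit "missing ..." message instead of reaching the snapshot step without certificates.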

Comment 1 rvanderp 2019-08-05 17:05:11 UTC
Actually, after reviewing the script, it appears that only kube-apiserver-pod-{1..10} are searched:

backup_etcd_client_certs() {
  echo "Trying to backup etcd client certs.."
  if [ -f "$ASSET_DIR/backup/etcd-ca-bundle.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.crt" ] && [ -f "$ASSET_DIR/backup/etcd-client.key" ]; then
    echo "etcd client certs already backed up and available $ASSET_DIR/backup/"
  else
    # Only revisions 1-10 are probed: kube-apiserver-pod-11 and later are never checked.
    # If no revision matches, the loop simply ends and the function still succeeds.
    for i in {1..10}; do
      SECRET_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/secrets/etcd-client"
      CONFIGMAP_DIR="${CONFIG_FILE_DIR}/static-pod-resources/kube-apiserver-pod-${i}/configmaps/etcd-serving-ca"
      if [ -f "$CONFIGMAP_DIR/ca-bundle.crt" ] && [ -f "$SECRET_DIR/tls.crt" ] && [ -f "$SECRET_DIR/tls.key" ]; then
        cp $CONFIGMAP_DIR/ca-bundle.crt $ASSET_DIR/backup/etcd-ca-bundle.crt
        cp $SECRET_DIR/tls.crt $ASSET_DIR/backup/etcd-client.crt
        cp $SECRET_DIR/tls.key $ASSET_DIR/backup/etcd-client.key
        break
      else
        echo "$SECRET_DIR does not contain etcd client certs, trying next source .."
      fi
    done
  fi
}

In my cluster, the certs are in kube-apiserver-pod-24 and above:

./kube-apiserver-pod-24/configmaps/etcd-serving-ca
./kube-apiserver-pod-24/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-27/configmaps/etcd-serving-ca
./kube-apiserver-pod-27/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-28/configmaps/etcd-serving-ca
./kube-apiserver-pod-28/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-29/configmaps/etcd-serving-ca
./kube-apiserver-pod-29/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-30/configmaps/etcd-serving-ca
./kube-apiserver-pod-30/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-31/configmaps/etcd-serving-ca
./kube-apiserver-pod-31/configmaps/etcd-serving-ca/ca-bundle.crt
./kube-apiserver-pod-32/configmaps/etcd-serving-ca
./kube-apiserver-pod-32/configmaps/etcd-serving-ca/ca-bundle.crt
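The listing above shows why a fixed `{1..10}` range breaks once the static-pod revisions climb past 10. The sketch below shows a revision-agnostic search in POSIX sh that probes every `kube-apiserver-pod-N` directory in descending numeric order. It assumes that the highest revision holding all three files carries the current certs; `list_apiserver_revisions` and `find_etcd_client_cert_dir` are hypothetical names, and this is not necessarily how the fix in machine-config-operator pull 1052 works.

```shell
#!/bin/sh
# Sketch only. $1 is the config dir (CONFIG_FILE_DIR in the quoted script).

# Print every kube-apiserver-pod-N directory, highest N first.
list_apiserver_revisions() {
  for dir in "$1"/static-pod-resources/kube-apiserver-pod-*; do
    [ -d "$dir" ] || continue                # an unmatched glob stays literal
    printf '%s\t%s\n' "${dir##*-}" "$dir"    # "N<TAB>path" so sort can key on N
  done | sort -k1,1nr | cut -f2-
}

# Print the first (newest) revision dir holding the CA bundle, cert and key.
# Prints nothing if no revision qualifies.
find_etcd_client_cert_dir() {
  list_apiserver_revisions "$1" | while IFS= read -r dir; do
    if [ -f "$dir/configmaps/etcd-serving-ca/ca-bundle.crt" ] &&
       [ -f "$dir/secrets/etcd-client/tls.crt" ] &&
       [ -f "$dir/secrets/etcd-client/tls.key" ]; then
      printf '%s\n' "$dir"
      break
    fi
  done
}

# Usage, mirroring the cp calls in backup_etcd_client_certs:
#   src=$(find_etcd_client_cert_dir "$CONFIG_FILE_DIR")
#   [ -n "$src" ] || { echo "no etcd client certs found" >&2; exit 1; }
#   cp "$src/configmaps/etcd-serving-ca/ca-bundle.crt" "$ASSET_DIR/backup/etcd-ca-bundle.crt"
#   cp "$src/secrets/etcd-client/tls.crt" "$ASSET_DIR/backup/etcd-client.crt"
#   cp "$src/secrets/etcd-client/tls.key" "$ASSET_DIR/backup/etcd-client.key"
```

Sorting numerically on the extracted suffix matters: a plain lexical sort would place kube-apiserver-pod-9 ahead of kube-apiserver-pod-32.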

Comment 2 ge liu 2019-08-06 03:31:22 UTC
Hi Sam, the question is: what causes kube-apiserver-pod-x directories with x > 10 to appear?

By default, after a fresh installation, the directory looks like this:

# cd /etc/kubernetes/static-pod-resources/
[*****static-pod-resources]# ls
etcd-member           kube-apiserver-pod-3  kube-apiserver-pod-6           kube-controller-manager-pod-3  kube-scheduler-pod-4
kube-apiserver-certs  kube-apiserver-pod-4  kube-controller-manager-certs  kube-controller-manager-pod-4  kube-scheduler-pod-5
kube-apiserver-pod-2  kube-apiserver-pod-5  kube-controller-manager-pod-2  kube-controller-manager-pod-5

I have tested this many times and have never seen a kube-apiserver-pod-x revision higher than 10, so this is a valuable test case.

Comment 3 Sam Batschelet 2019-08-06 13:12:54 UTC
> Hi Sam, the question is: what causes kube-apiserver-pod-x directories with x > 10 to appear?

My understanding is that the apiserver has a pruner for static pod assets that should keep this from happening. While we should handle this case in the script, could this also be a bug in the cluster-kube-apiserver-operator pruner?
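To check whether a given master is affected, one can compare the highest kube-apiserver revision on disk against the `{1..10}` range the original script probes. The sketch below is written in POSIX sh; `highest_revision` is a hypothetical helper that only reads directory names, so it says nothing about whether the pruner is behaving as intended. For the listing in the description it would report 32.

```shell
#!/bin/sh
# Print the highest numeric revision N among "$1/$2-N" directories.
# $1 = static-pod-resources dir, $2 = component prefix, e.g. kube-apiserver-pod.
# Prints nothing if no such directory exists.
highest_revision() {
  for d in "$1/$2"-*; do
    [ -d "$d" ] || continue
    n=${d##*-}
    case $n in
      ''|*[!0-9]*) continue ;;   # ignore non-numeric suffixes
    esac
    printf '%s\n' "$n"
  done | sort -n | tail -n 1
}

# Example on a master node:
#   rev=$(highest_revision /etc/kubernetes/static-pod-resources kube-apiserver-pod)
#   [ "${rev:-0}" -gt 10 ] && echo "kube-apiserver-pod-$rev is outside the 1..10 range"
```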

Comment 8 ge liu 2019-08-15 08:36:17 UTC
Verified in 4.2.0-0.nightly-2019-08-13-183722
# oc edit kubeapiserver
kubeapiserver.operator.openshift.io/cluster edited

Set logLevel to Debug in the spec below (it defaults to ""). Each change triggers a new kube-apiserver-pod-x revision, incrementing x by 1:
spec:
  logLevel: ""
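The same revision bump can be scripted instead of going through `oc edit`. The sketch below assumes `oc` is logged in with cluster-admin rights; `loglevel_patch` is a hypothetical helper that only builds the JSON merge patch for the `kubeapiserver/cluster` resource edited above. That each level change yields one new revision is taken from this comment, not verified here.

```shell
#!/bin/sh
# Build a JSON merge patch that sets spec.logLevel on kubeapiserver/cluster.
# $1 is the level, e.g. Debug (other operator log levels include Normal and Trace).
loglevel_patch() {
  printf '{"spec":{"logLevel":"%s"}}' "$1"
}

# On a live cluster (not run here):
#   oc patch kubeapiserver cluster --type=merge -p "$(loglevel_patch Debug)"
# Switching to another level afterwards (for example Trace) is assumed to
# roll out a further kube-apiserver-pod-N revision.
```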

$ sudo /usr/local/bin/etcd-snapshot-backup.sh ./assets/backup/snapshot.db
Creating asset directory ./assets
Downloading etcdctl binary..
etcdctl version: 3.3.10
API version: 3.3
Trying to backup etcd client certs..
/etc/kubernetes/static-pod-resources/kube-apiserver-pod-10 does not contain etcd client certs, trying next ..
etcd client certs found in /etc/kubernetes/static-pod-resources/kube-apiserver-pod-11 backing up to ./assets/backup/
Backing up /etc/kubernetes/manifests/etcd-member.yaml to ./assets/backup/
Snapshot saved at ./assets/backup/snapshot.db

Comment 9 errata-xmlrpc 2019-10-16 06:34:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2019:2922

