Have you tried the defrag procedure for etcd? The procedure is roughly explained in this GitHub comment: https://github.com/kubernetes/kops/issues/4005#issuecomment-349048006
Yes, the following steps were run:

~~~
# export ETCDCTL_API=3
# source /etc/etcd/etcd.conf
# rev=$(etcdctl3 --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS endpoint status --write-out="json" | egrep -o '"revision":[0-9]*' | egrep -o '[0-9]*' -m1)
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS compact $rev
# etcdctl --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS defrag
# etcdctl3 --cert=$ETCD_PEER_CERT_FILE --key=$ETCD_PEER_KEY_FILE --cacert=$ETCD_TRUSTED_CA_FILE --endpoints=$ETCD_LISTEN_CLIENT_URLS alarm disarm
~~~

I think the following issue is being hit:

https://github.com/kubernetes/kubernetes/issues/45037
https://github.com/coreos/etcd/issues/8009
https://github.com/coreos/etcd/pull/8210
Ultimately this was diagnosed as a configuration issue: one etcd host was not configured with the recommended 4 GB quota limit and was defaulting to the 2 GB quota limit. If this is still an issue, please file a new BZ capturing the issue.
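For anyone checking for the same misconfiguration, here is a minimal sketch of how the quota setting could be compared across etcd hosts. It assumes the quota is set through `ETCD_QUOTA_BACKEND_BYTES` in `/etc/etcd/etcd.conf` (the file sourced in the steps above); the helper name `etcd_quota_bytes` is hypothetical and not part of any etcd tooling.

~~~shell
#!/bin/sh
# Hypothetical helper: print the backend quota configured in an etcd
# environment file. Prints "unset" when ETCD_QUOTA_BACKEND_BYTES is
# missing or commented out, in which case etcd uses its built-in
# default quota (2 GB) instead of the recommended 4 GB.
etcd_quota_bytes() {
    conf="${1:-/etc/etcd/etcd.conf}"
    # Take the last uncommented assignment and strip optional quotes.
    value=$(sed -n 's/^[[:space:]]*ETCD_QUOTA_BACKEND_BYTES=//p' "$conf" \
        | tail -n 1 | tr -d '"' | tr -d "'")
    if [ -n "$value" ]; then
        printf '%s\n' "$value"
    else
        printf 'unset\n'
    fi
}

# Usage, run on each etcd host:
#   etcd_quota_bytes /etc/etcd/etcd.conf
~~~

A 4 GB quota corresponds to 4294967296 bytes (4 * 1024^3), so a host that reports `unset` or a smaller number is the likely outlier.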
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days