Description of problem:
I deleted the etcd container from one of the masters in my HA cluster using "crictl rm". I expected it to start back up since it runs as a static pod, but it is stuck in the CrashLoopBackOff state.
I see the following error in the container log:
2018-05-09 18:57:20.326665 W | etcdmain: found invalid file/dir test under data dir /var/lib/etcd/ (Ignore this if you are upgrading etcd)
2018-05-09 18:57:20.326680 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-05-09 18:57:20.326699 I | embed: peerTLS: cert = /etc/etcd/peer.crt, key = /etc/etcd/peer.key, ca = , trusted-ca = /etc/etcd/ca.crt, client-cert-auth = true
2018-05-09 18:57:20.327403 I | embed: listening for peers on https://172.31.59.66:2380
2018-05-09 18:57:20.327491 I | embed: listening for client requests on 172.31.59.66:2379
2018-05-09 18:57:20.327697 C | etcdmain: cannot access data directory: open /var/lib/etcd/.touch: permission denied
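The fatal line shows etcd failing to create a `.touch` probe file in its data directory. The following is a minimal Python sketch of that kind of writability probe. It assumes etcd's check amounts to creating and then deleting `<data-dir>/.touch`, and the function names are hypothetical, not etcd's API. Run as the container's user against /var/lib/etcd, it can help confirm the permission problem outside of etcd:

```python
import os


def probe_dir_writable(data_dir: str) -> None:
    """Create and delete a '.touch' file in data_dir; raises OSError on failure.

    Hypothetical helper modeled on the '.touch' check seen in the etcd log.
    """
    probe = os.path.join(data_dir, ".touch")
    with open(probe, "w"):
        pass  # creating the (empty) file is the actual permission test
    os.remove(probe)


def check_data_dir(data_dir: str) -> str:
    """Return 'ok', or an error string shaped like etcd's fatal log message."""
    try:
        probe_dir_writable(data_dir)
    except OSError as exc:
        return f"cannot access data directory: {exc}"
    return "ok"
```

For example, `check_data_dir("/var/lib/etcd")` should return a "Permission denied" message under the same conditions that crash etcd. Python's error text differs from Go's, but the underlying `open` failure is the same.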
Version-Release number of selected component (if applicable):
Steps to Reproduce:
1. Create an HA cluster with 3 masters, each with a co-located etcd member.
2. SSH to one of the masters and run "crictl rm <etcd-container>".
3. The etcd container never starts back up.
Actual results:
The etcd container never starts back up.
Expected results:
The etcd container should start again.
Note: I was able to make it start after rebooting the instance.
It seems like the volume etcd used for /var/lib/etcd is being reused, but with permissions that are wrong for the user that runs the container?
This might be a storage or CRI-O bug (not related to etcd); assigning to the containers team for triage.
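The hypothesis is that the reused volume's ownership or mode no longer matches the UID the etcd container runs as. The following Python sketch shows the classic POSIX check that governs creating a file in a directory. That check requires write and search permission in the first matching class (owner, then group, then other). The sketch deliberately ignores ACLs, capabilities other than root's DAC override, and SELinux, any of which can also deny access. `can_write` and `can_write_path` are hypothetical helpers:

```python
from __future__ import annotations

import os
import stat


def can_write(st_mode: int, st_uid: int, st_gid: int, uid: int, gids: set[int]) -> bool:
    """Can (uid, gids) create files in a directory with this mode/owner?

    Only the first matching class (owner, then group, then other) is used,
    as in POSIX; ACLs and SELinux are not modeled.
    """
    if uid == 0:
        return True  # assumes root keeps CAP_DAC_OVERRIDE; SELinux may still deny
    if uid == st_uid:
        need = stat.S_IWUSR | stat.S_IXUSR
    elif st_gid in gids:
        need = stat.S_IWGRP | stat.S_IXGRP
    else:
        need = stat.S_IWOTH | stat.S_IXOTH
    return (st_mode & need) == need


def can_write_path(path: str, uid: int, gids: set[int]) -> bool:
    """Apply can_write to the real ownership and mode of path."""
    st = os.stat(path)
    return can_write(st.st_mode, st.st_uid, st.st_gid, uid, gids)
```

For instance, a directory with mode 0700 owned by UID 0 is not writable by UID 1000. A mismatch like that after the volume is reused would produce the same "permission denied" error even though nothing in etcd changed.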
Can you share the k8s configuration for etcd?
Also, what are the permissions, including the SELinux label, of the /var/lib/etcd directory?
Can you run a quick smoke test with SELinux enabled and disabled, just to make sure everything "works" both with and without SELinux?
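One way to collect the requested details in a single place is sketched below in Python. It assumes a Linux host where the SELinux label is stored in the `security.selinux` extended attribute. `describe_path` is a hypothetical helper, and it reports `None` for the label when the attribute is absent or unsupported:

```python
import os
import stat


def describe_path(path: str) -> dict:
    """Return mode string, owner UID/GID, and SELinux label (or None) for path."""
    st = os.stat(path)
    label = None
    getxattr = getattr(os, "getxattr", None)  # os.getxattr exists only on Linux
    if getxattr is not None:
        try:
            # The kernel returns the label NUL-terminated; strip it before decoding.
            label = getxattr(path, "security.selinux").rstrip(b"\x00").decode()
        except OSError:
            label = None  # no label set, or filesystem/kernel without SELinux xattrs
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. 'drwx------'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "selinux": label,
    }
```

Calling `describe_path("/var/lib/etcd")` on the affected master, and again after a reboot, would show whether the ownership or label differs between the failing and working states.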
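When comparing smoke-test runs, it helps to record which SELinux mode each run used. The following minimal Python sketch reads the current mode. It assumes selinuxfs is mounted at /sys/fs/selinux, where the `enforce` file contains "1" (enforcing) or "0" (permissive), and it treats a missing file as SELinux being disabled. `selinux_mode` is a hypothetical helper, and the mount point is a parameter so the function can be pointed elsewhere:

```python
import os


def selinux_mode(selinuxfs: str = "/sys/fs/selinux") -> str:
    """Return 'enforcing', 'permissive', or 'disabled' based on selinuxfs."""
    enforce = os.path.join(selinuxfs, "enforce")
    try:
        with open(enforce) as f:
            value = f.read().strip()
    except FileNotFoundError:
        return "disabled"  # selinuxfs not mounted: SELinux is off on this host
    return "enforcing" if value == "1" else "permissive"
```

This only reports the mode. Switching between modes for the test is done with the host's own SELinux tooling.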
I am not able to reproduce this issue in the following build, so I am closing it.