Bug 1576564 - CRIO - Etcd keeps restarting in CrashLoopBackOff after container is deleted
Summary: CRIO - Etcd keeps restarting in CrashLoopBackOff after container is deleted
Keywords:
Status: CLOSED WORKSFORME
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Containers
Version: 3.10.0
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 3.10.0
Assignee: Mrunal Patel
QA Contact: DeShuai Ma
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-05-09 19:02 UTC by Vikas Laad
Modified: 2018-05-21 20:20 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-05-21 20:20:02 UTC
Target Upstream Version:



Description Vikas Laad 2018-05-09 19:02:05 UTC
Description of problem:
I deleted the etcd container from one of the masters in my HA cluster using "crictl rm". I expected it to start back up since it's a static container, but it is stuck in the CrashLoopBackOff state.

I see the following error in the container log:
2018-05-09 18:57:20.326665 W | etcdmain: found invalid file/dir test under data dir /var/lib/etcd/ (Ignore this if you are upgrading etcd)
2018-05-09 18:57:20.326680 N | etcdmain: the server is already initialized as member before, starting as etcd member...
2018-05-09 18:57:20.326699 I | embed: peerTLS: cert = /etc/etcd/peer.crt, key = /etc/etcd/peer.key, ca = , trusted-ca = /etc/etcd/ca.crt, client-cert-auth = true
2018-05-09 18:57:20.327403 I | embed: listening for peers on https://172.31.59.66:2380
2018-05-09 18:57:20.327491 I | embed: listening for client requests on 172.31.59.66:2379
2018-05-09 18:57:20.327697 C | etcdmain: cannot access data directory: open /var/lib/etcd/.touch: permission denied

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.32.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

Steps to Reproduce:
1. Create an HA cluster with 3 etcd members co-located with the masters.
2. SSH to one of the masters and run "crictl rm <etcd-container>".
3. The etcd container never starts back up.

Actual results:
The etcd container never starts back up.

Expected results:
The etcd container should start again.

Additional info:

Comment 1 Vikas Laad 2018-05-09 19:04:42 UTC
Note: I was able to make it start after rebooting the instance.

Comment 2 Michal Fojtik 2018-05-10 09:25:16 UTC
It seems like the volume that etcd used for /var/lib/etcd is getting re-used but the permissions are wrong for the user that runs the container?

This might be a storage or CRI-O bug (not related to etcd), so I am assigning it to the containers team for triage.
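
One way to test the permissions hypothesis is to repeat the step that fails in the log, which appears to be an attempt to create a ".touch" file in the data directory. The sketch below is an assumption about what that check amounts to, not etcd's actual code, and the function name probe_dir_writable is made up for illustration. The result is only meaningful when run as the same user (and under the same SELinux context) as the etcd container.

```python
import os


def probe_dir_writable(data_dir):
    """Try to create and remove a '.touch' file inside data_dir.

    Returns None on success, or an error string on failure.
    Hypothetical helper: it only mimics the failing step seen in the log.
    """
    probe = os.path.join(data_dir, ".touch")
    try:
        # Creating the file is the step that fails with "permission denied".
        with open(probe, "w"):
            pass
        os.remove(probe)
    except OSError as exc:
        return "cannot access data directory: {}".format(exc.strerror)
    return None
```

Calling probe_dir_writable("/var/lib/etcd") on the affected master would show whether the calling user can write there; a success as root on the host does not rule out a denial inside the container.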

Comment 3 Mrunal Patel 2018-05-10 18:20:59 UTC
Can you share the k8s configuration for etcd?
Also, what are the permissions including SELinux label for the /var/lib/etcd directory?
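
The information requested here (mode, owner, and SELinux label of the directory) could be collected with a short script. This is a minimal sketch: the helper name describe_dir is hypothetical, and it assumes a Linux host where the label is stored in the "security.selinux" extended attribute. If that attribute cannot be read, the label is reported as None.

```python
import os
import stat


def describe_dir(path):
    """Collect mode, owner IDs and SELinux label (if readable) for path."""
    st = os.stat(path)
    try:
        # On SELinux-enabled Linux hosts the label lives in this xattr.
        raw = os.getxattr(path, "security.selinux")
        label = raw.decode().rstrip("\x00")
    except (OSError, AttributeError):
        # No label, xattrs unsupported, or a non-Linux platform.
        label = None
    return {
        "mode": stat.filemode(st.st_mode),  # e.g. a string like 'drwx------'
        "uid": st.st_uid,
        "gid": st.st_gid,
        "selinux": label,
    }
```

Running describe_dir("/var/lib/etcd") before and after deleting the container would show whether the ownership or label changes between the two states.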

Comment 4 Antonio Murdaca 2018-05-21 09:37:56 UTC
Can you do a smoke test enabling/disabling selinux just to make sure everything "works" with and w/o selinux?

Comment 5 Vikas Laad 2018-05-21 20:20:02 UTC
I am not able to reproduce this issue in the following build, so I am closing it.

openshift v3.10.0-0.47.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

