Bug 1578087

Summary:

Stopping/Restarting etcd leader causes master api and controllers pods to restart multiple times

Product:

OpenShift Container Platform

Reporter:

Vikas Laad <vlaad>

Component:

Master

Assignee:

Jordan Liggitt <jliggitt>

Status:

CLOSED ERRATA

QA Contact:

Vikas Laad <vlaad>

Severity:

high

Priority:

unspecified

Version:

3.10.0

CC:

aos-bugs, deads, jokerman, mfojtik, mmccomas, vlaad

Target Milestone:

---

Target Release:

3.10.0

Hardware:

Unspecified

OS:

Unspecified

Doc Type:

No Doc Update

Last Closed:

2018-07-30 19:15:23 UTC

Type:

Bug

Attachments:

Description	Flags
master journal log	none
exited api container log	none
master api pod log	none
controllers exited container log	none
controller manager exited container log	none
api exited container log	none

Description Vikas Laad 2018-05-14 19:00:46 UTC

Description of problem:
Restarting or stopping the etcd leader causes the master api and controllers pods to restart multiple times.

NAME                                                            READY     STATUS    RESTARTS   AGE
master-api-ip-172-31-49-98.us-west-2.compute.internal           1/1       Running   13         1h
master-controllers-ip-172-31-49-98.us-west-2.compute.internal   1/1       Running   4          1h

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.41.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Create OCP cluster with 3 etcd (not co-located), 1 master, 1 infra and 2 compute nodes
2. Create a few pods, imagestreams, builds, etc.
3. Stop the etcd leader node from the AWS console
4. Watch the master api and controllers pods
5. oc commands sometimes fail while the api pod is restarting
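
A minimal sketch of how step 4 could be checked from saved `oc get pods` output, rather than by eye. It only parses text in the column layout shown in the description above; the function names `parse_restarts` and `pods_with_restarts` are hypothetical helpers, not part of any OpenShift tooling, and the sample text is the table from this report.

```python
from typing import Dict, List

# Table copied from the description of this bug report.
REPORTED_OUTPUT = """\
NAME                                                            READY     STATUS    RESTARTS   AGE
master-api-ip-172-31-49-98.us-west-2.compute.internal           1/1       Running   13         1h
master-controllers-ip-172-31-49-98.us-west-2.compute.internal   1/1       Running   4          1h
"""


def parse_restarts(table: str) -> Dict[str, int]:
    """Map pod name to restart count, assuming whitespace-separated columns
    with a header row that contains NAME and RESTARTS."""
    lines = [ln for ln in table.splitlines() if ln.strip()]
    header = lines[0].split()
    name_idx = header.index("NAME")
    restarts_idx = header.index("RESTARTS")
    counts = {}
    for ln in lines[1:]:
        cols = ln.split()
        counts[cols[name_idx]] = int(cols[restarts_idx])
    return counts


def pods_with_restarts(table: str, max_restarts: int = 0) -> List[str]:
    """Return the sorted names of pods whose restart count exceeds max_restarts."""
    counts = parse_restarts(table)
    return sorted(name for name, n in counts.items() if n > max_restarts)


if __name__ == "__main__":
    for name in pods_with_restarts(REPORTED_OUTPUT):
        print(name, parse_restarts(REPORTED_OUTPUT)[name])
```

Taking one snapshot before stopping the etcd leader and one after, then comparing the two dictionaries, would show how many restarts the stop itself triggered. With the expected result in this report, the counts would not change.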

Actual results:
Many restarts of master api and controllers pods

Expected results:
No restarts of master api and controllers pods

Additional info:
Attaching journal logs from the master node, master api and controller pod logs, and exited container logs.

Comment 1 Vikas Laad 2018-05-14 19:03:51 UTC

Created attachment 1436508
master journal log

Comment 2 Vikas Laad 2018-05-14 19:04:11 UTC

Created attachment 1436509
exited api container log

Comment 3 Vikas Laad 2018-05-14 19:04:31 UTC

Created attachment 1436510
master api pod log

Comment 5 Vikas Laad 2018-05-15 13:27:37 UTC

I will attach controller manager logs today.

Comment 6 Vikas Laad 2018-05-15 14:24:19 UTC

Created attachment 1436801
controllers exited container log

Comment 7 Vikas Laad 2018-05-15 14:24:42 UTC

Created attachment 1436802
controller manager exited container log

Comment 8 Vikas Laad 2018-05-15 14:25:03 UTC

Created attachment 1436803
api exited container log

Comment 17 Jordan Liggitt 2018-05-21 11:49:55 UTC

This seems like it might be related to the issue fixed by https://github.com/openshift/origin/pull/19638

Comment 18 Michal Fojtik 2018-05-21 12:15:54 UTC

Definitely, moving to ON_QA to test that fix.

Vikas, can you try with the latest build?

Comment 20 Vikas Laad 2018-05-22 15:39:11 UTC

I did not see this problem in the following version. I tried restarting the etcd leader multiple times.

openshift v3.10.0-0.50.0
kubernetes v1.10.0+b81c8f8

Comment 22 errata-xmlrpc 2018-07-30 19:15:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816