Bug 1578087

Summary: Stopping/Restarting etcd leader causes master api and controllers pods to restart multiple times
Product: OpenShift Container Platform
Reporter: Vikas Laad <vlaad>
Component: Master
Assignee: Jordan Liggitt <jliggitt>
Status: CLOSED ERRATA
QA Contact: Vikas Laad <vlaad>
Severity: high
Priority: unspecified
Version: 3.10.0
CC: aos-bugs, deads, jokerman, mfojtik, mmccomas, vlaad
Target Milestone: ---   
Target Release: 3.10.0   
Hardware: Unspecified   
OS: Unspecified   
Doc Type: No Doc Update
Last Closed: 2018-07-30 19:15:23 UTC
Type: Bug
Attachments:
Description                               Flags
master journal log                        none
exited api container log                  none
master api pod log                        none
controllers exited container log          none
controller manager exited container log   none
api exited container log                  none

Description Vikas Laad 2018-05-14 19:00:46 UTC
Description of problem:
Restarting or stopping etcd causes the master api pod to restart multiple times.

NAME                                                            READY     STATUS    RESTARTS   AGE
master-api-ip-172-31-49-98.us-west-2.compute.internal           1/1       Running   13         1h
master-controllers-ip-172-31-49-98.us-west-2.compute.internal   1/1       Running   4          1h

Version-Release number of selected component (if applicable):
openshift v3.10.0-0.41.0
kubernetes v1.10.0+b81c8f8
etcd 3.2.16

How reproducible:
Always

Steps to Reproduce:
1. Create an OCP cluster with 3 etcd nodes (not co-located), 1 master, 1 infra, and 2 compute nodes
2. Create a few pods, imagestreams, builds, etc.
3. Stop the etcd leader node from the AWS console
4. Watch the master api and controllers pods
5. oc commands sometimes fail while the api pod is restarting
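
Step 4 can be scripted rather than watched by eye. The following is a minimal sketch, not part of the original report: it parses the tabular output of `oc get pods` (the same layout as the listing under "Description of problem") and reports which pods gained restarts between two samples. The function names `parse_restarts`, `new_restarts` and `poll` are hypothetical, and the `kube-system` namespace is an assumption about where the master pods run; adjust it for your cluster.

```python
import subprocess


def parse_restarts(table):
    """Map pod name -> restart count from `oc get pods` tabular output.

    Assumes whitespace-separated columns with a header row that
    contains a RESTARTS column, as in the listing in this report.
    """
    lines = [ln for ln in table.splitlines() if ln.strip()]
    if not lines:
        return {}
    col = lines[0].split().index("RESTARTS")
    restarts = {}
    for ln in lines[1:]:
        fields = ln.split()
        restarts[fields[0]] = int(fields[col])
    return restarts


def new_restarts(before, after):
    """Return {pod: added restarts} for pods whose count increased."""
    return {
        name: count - before.get(name, 0)
        for name, count in after.items()
        if count > before.get(name, 0)
    }


def poll(namespace="kube-system"):
    """Take one sample from the cluster (namespace is an assumption)."""
    out = subprocess.run(
        ["oc", "get", "pods", "-n", namespace],
        capture_output=True, text=True, check=True,
    ).stdout
    return parse_restarts(out)
```

A possible use: call `poll()` once before stopping the etcd leader and again a few minutes after, then pass both results to `new_restarts`. An empty result corresponds to the expected behaviour (no restarts).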

Actual results:
Many restarts of master api and controllers pods

Expected results:
No restarts of master api and controllers pods

Additional info:
Attaching journal logs from the master node, master api and controller pod logs, and exited container logs.

Comment 1 Vikas Laad 2018-05-14 19:03:51 UTC
Created attachment 1436508
master journal log

Comment 2 Vikas Laad 2018-05-14 19:04:11 UTC
Created attachment 1436509
exited api container log

Comment 3 Vikas Laad 2018-05-14 19:04:31 UTC
Created attachment 1436510
master api pod log

Comment 5 Vikas Laad 2018-05-15 13:27:37 UTC
I will attach controller manager logs today.

Comment 6 Vikas Laad 2018-05-15 14:24:19 UTC
Created attachment 1436801
controllers exited container log

Comment 7 Vikas Laad 2018-05-15 14:24:42 UTC
Created attachment 1436802
controller manager exited container log

Comment 8 Vikas Laad 2018-05-15 14:25:03 UTC
Created attachment 1436803
api exited container log

Comment 17 Jordan Liggitt 2018-05-21 11:49:55 UTC
This seems like it might be related to the issue fixed by https://github.com/openshift/origin/pull/19638

Comment 18 Michal Fojtik 2018-05-21 12:15:54 UTC
Definitely, moving to ON_QA to test that fix.

Vikas, can you try with the latest build?

Comment 20 Vikas Laad 2018-05-22 15:39:11 UTC
I did not see this problem in the following version; I tried restarting the etcd leader multiple times.

openshift v3.10.0-0.50.0
kubernetes v1.10.0+b81c8f8

Comment 22 errata-xmlrpc 2018-07-30 19:15:23 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2018:1816