Bug 1863763

Summary: Statefulset creates and deletes pod repeatedly (StatefulSet creates multiple controllerrevisions)
Product: OpenShift Container Platform
Reporter: Christian Koep <ckoep>
Component: kube-controller-manager
Assignee: Tomáš Nožička <tnozicka>
Status: CLOSED ERRATA
QA Contact: zhou ying <yinzhou>
Severity: medium
Priority: medium
Version: 3.11.0
CC: aos-bugs, knarra, mfojtik, tnozicka
Target Release: 3.11.z
Hardware: Unspecified
OS: Unspecified
Doc Type: No Doc Update
Last Closed: 2020-09-16 07:46:50 UTC
Type: Bug

Description Christian Koep 2020-08-03 17:19:53 UTC
Description of problem:

- A customer reportedly hit the issue described in the following upstream issues and pull request on Red Hat OpenShift Container Platform 3.11.219:

  - https://github.com/openshift/origin/issues/17435
  - https://github.com/openshift/origin/pull/17513/files
  - https://github.com/kubernetes/kubernetes/issues/56355
  - https://github.com/kubernetes/kubernetes/issues/58347

Version-Release number of selected component (if applicable):

- Red Hat OpenShift Container Platform 3.11.219


How reproducible:

- Very hard to reproduce; no clear pattern has been identified yet.


Actual results:

- The StatefulSet creates multiple controllerrevisions, which leads to continuous re-creation of its pods.
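To check a namespace for this symptom, one option is to list the ControllerRevision objects with `oc get controllerrevisions -o json` and group them by the StatefulSet that owns them. The Python sketch below is a hypothetical helper (`revisions_per_statefulset` is not part of OpenShift or the `oc` CLI); it assumes the standard List JSON shape, where each entry in `items` has `metadata.namespace`, `metadata.ownerReferences`, and an integer `revision`. Because old revisions are pruned according to the StatefulSet's `revisionHistoryLimit`, the revision numbers are likely a better signal than the raw count: a highest revision that keeps climbing between runs without any change to the StatefulSet spec would be consistent with the behavior reported here.

```python
import json
from collections import defaultdict
from typing import Dict, List


def revisions_per_statefulset(list_json: str) -> Dict[str, List[int]]:
    """Group ControllerRevision objects by the StatefulSet that owns them.

    Hypothetical helper. `list_json` is assumed to be the text printed by
    `oc get controllerrevisions -o json` (a List with an "items" array).
    Returns {"namespace/statefulset": sorted revision numbers}.
    """
    data = json.loads(list_json)
    grouped: Dict[str, List[int]] = defaultdict(list)
    for item in data.get("items", []):
        meta = item.get("metadata", {})
        namespace = meta.get("namespace", "")
        for owner in meta.get("ownerReferences", []):
            # DaemonSets also create ControllerRevisions; keep only the
            # ones owned by a StatefulSet.
            if owner.get("kind") == "StatefulSet":
                key = f"{namespace}/{owner['name']}"
                grouped[key].append(int(item.get("revision", 0)))
    # Sort so the newest (highest) revision is last in each list.
    return {key: sorted(revs) for key, revs in grouped.items()}
```

For example, save the output with `oc get controllerrevisions -n <namespace> -o json > revs.json`, then call `revisions_per_statefulset(open("revs.json").read())` at intervals and compare the highest revision per StatefulSet. Keying by `namespace/name` keeps StatefulSets with the same name apart when the JSON was collected with `--all-namespaces`.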


Expected results:

- The StatefulSet does not create extra controllerrevisions, and its pods are not repeatedly re-created.


Additional info:

- I'll attach additional information to this Bugzilla shortly (as a private attachment).

Comment 23 RamaKasturi 2020-08-31 13:37:47 UTC
Tried to verify the bug with the payload below by following the steps in comment 12. After creating multiple controllerrevisions, I no longer see the issue.

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc version
oc v3.11.273
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://knarra-311zmaster-etcd-nfs-1:8443
openshift v3.11.273
kubernetes v1.11.0+d4cacc0

The pods have been running for more than 2 hours:

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get pods
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   0          6h
alertmanager-main-1                            3/3       Running   0          6h
alertmanager-main-2                            3/3       Running   0          6h
cluster-monitoring-operator-576c6b8b55-qphms   1/1       Running   0          6h
grafana-6dc585b845-wr64m                       2/2       Running   0          6h
kube-state-metrics-585c47c777-jtdvc            3/3       Running   0          6h
node-exporter-28jdn                            2/2       Running   0          6h
node-exporter-4mg8g                            2/2       Running   0          6h
node-exporter-chh6s                            2/2       Running   0          6h
prometheus-k8s-0                               4/4       Running   1          2h
prometheus-k8s-1                               4/4       Running   1          2h
prometheus-operator-754d586f64-789mf           1/1       Running   0          6h

I also do not see any events related to the pods being killed:
[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get events --sort-by='{.metadata.creationTimestamp}' --all-namespaces
NAMESPACE   LAST SEEN   FIRST SEEN   COUNT     NAME                                       KIND                   SUBOBJECT   TYPE      REASON           SOURCE                               MESSAGE
default     2m          6h           50        ansible-service-broker.163048c43fdd30df    ClusterServiceBroker               Normal    FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.
default     13s         6h           39        template-service-broker.163048e7897fe43a   ClusterServiceBroker               Normal    FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.

When the second controllerrevision was created, the pods restarted once; after that, they were not restarted again. Based on the above, moving the bug to VERIFIED.

Comment 25 errata-xmlrpc 2020-09-16 07:46:50 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.286 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3695