1863763 – Statefulset creates and deletes pod repeatedly (StatefulSet creates multiple controllerrevisions)

Keywords:
Status:	CLOSED ERRATA
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	kube-controller-manager
Sub Component:
Version:	3.11.0
Hardware:	Unspecified
OS:	Unspecified
Priority:	medium
Severity:	medium
Target Milestone:	---
Target Release:	3.11.z
Assignee:	Tomáš Nožička
QA Contact:	zhou ying
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-08-03 17:19 UTC by Christian Koep
Modified:	2023-12-15 18:41 UTC
CC List:	4 users
Fixed In Version:
Doc Type:	No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-09-16 07:46:50 UTC
Target Upstream Version:
Embargoed:

Links
System	ID	Private	Priority	Status	Summary	Last Updated
Github	openshift origin pull 25441	0	None	closed	Bug 1863763: Fix statefulset update loop	2021-02-02 20:23:22 UTC
Red Hat Product Errata	RHBA-2020:3695	0	None	None	None	2020-09-16 07:46:59 UTC

Description Christian Koep 2020-08-03 17:19:53 UTC

Description of problem:

- Customer reportedly hit the issue described in the following upstream issues in Red Hat OpenShift Container Platform 3.11.219.

  - https://github.com/openshift/origin/issues/17435
  - https://github.com/openshift/origin/pull/17513/files
  - https://github.com/kubernetes/kubernetes/issues/56355
  - https://github.com/kubernetes/kubernetes/issues/58347

Version-Release number of selected component (if applicable):

- Red Hat OpenShift Container Platform 3.11.219


How reproducible:

- Very hard to reproduce, no clear pattern visible yet.


Actual results:

- StatefulSet creates multiple controllerrevisions, which leads to continuous re-creation of pods.
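
One way to spot the symptom above is to count the ControllerRevision objects owned by each StatefulSet and watch whether the highest revision number keeps rising. The following is a minimal sketch, not part of the original report. It assumes the input is the parsed JSON from something like `oc get controllerrevisions -n <namespace> -o json`, and the function names `summarize_revisions` and `still_climbing` are hypothetical helpers introduced only for illustration.

```python
from collections import defaultdict


def summarize_revisions(revision_list):
    """Summarize ControllerRevisions per owning StatefulSet.

    `revision_list` is assumed to be the parsed JSON of a List object,
    e.g. json.load() of `oc get controllerrevisions -o json` output.
    Returns {statefulset_name: {"count": int, "max_revision": int}}.
    """
    grouped = defaultdict(list)
    for item in revision_list.get("items", []):
        for owner in item.get("metadata", {}).get("ownerReferences", []):
            # Only revisions owned by a StatefulSet are of interest here;
            # DaemonSets also own ControllerRevisions.
            if owner.get("kind") == "StatefulSet":
                grouped[owner["name"]].append(item.get("revision", 0))
    return {
        name: {"count": len(revs), "max_revision": max(revs)}
        for name, revs in grouped.items()
    }


def still_climbing(before, after):
    """Return StatefulSets whose highest revision grew between two snapshots."""
    return sorted(
        name
        for name, info in after.items()
        if name in before and info["max_revision"] > before[name]["max_revision"]
    )
```

Taking two snapshots a few minutes apart and passing both summaries to `still_climbing` gives a list of StatefulSets worth inspecting. A rising revision number with no change to the StatefulSet spec in between would be consistent with the update loop described here, although a legitimate rollout produces the same signal.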


Expected results:

- The StatefulSet does not create multiple controllerrevisions, and its pods are not continuously re-created.


Additional info:

- I'll attach additional information to the Bugzilla momentarily (privately)

Comment 23 RamaKasturi 2020-08-31 13:37:47 UTC

Verified the bug with the payload below by following the steps in comment 12. After multiple controllerrevisions were created, the issue no longer occurs.

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc version
oc v3.11.273
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://knarra-311zmaster-etcd-nfs-1:8443
openshift v3.11.273
kubernetes v1.11.0+d4cacc0

Pods have been running for more than 2 hours:

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get pods
NAME                                           READY     STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3       Running   0          6h
alertmanager-main-1                            3/3       Running   0          6h
alertmanager-main-2                            3/3       Running   0          6h
cluster-monitoring-operator-576c6b8b55-qphms   1/1       Running   0          6h
grafana-6dc585b845-wr64m                       2/2       Running   0          6h
kube-state-metrics-585c47c777-jtdvc            3/3       Running   0          6h
node-exporter-28jdn                            2/2       Running   0          6h
node-exporter-4mg8g                            2/2       Running   0          6h
node-exporter-chh6s                            2/2       Running   0          6h
prometheus-k8s-0                               4/4       Running   1          2h
prometheus-k8s-1                               4/4       Running   1          2h
prometheus-operator-754d586f64-789mf           1/1       Running   0          6h

No events related to pods being killed appear here:
======================================================================
[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get events --sort-by='{.metadata.creationTimestamp}' --all-namespaces
NAMESPACE   LAST SEEN   FIRST SEEN   COUNT     NAME                                       KIND                   SUBOBJECT   TYPE      REASON           SOURCE                               MESSAGE
default     2m          6h           50        ansible-service-broker.163048c43fdd30df    ClusterServiceBroker               Normal    FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.
default     13s         6h           39        template-service-broker.163048e7897fe43a   ClusterServiceBroker               Normal    FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.

When the second controllerrevision was created, the pods restarted once; after that, no further restarts were observed. Based on the above, moving the bug to verified state.
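
A complementary check for this kind of verification is to compare each pod's `controller-revision-hash` label with the StatefulSet's `status.updateRevision`. The sketch below is an illustration only and was not part of the verification performed here. It assumes the inputs are parsed JSON from `oc get statefulset <name> -o json` and `oc get pods -o json`, and `pods_not_on_update_revision` is a hypothetical helper name.

```python
def pods_not_on_update_revision(statefulset, pod_list):
    """List pods of a StatefulSet that are not on its update revision.

    `statefulset` is assumed to be the parsed JSON of one StatefulSet and
    `pod_list` the parsed JSON of a pod List from the same namespace.
    """
    sts_name = statefulset["metadata"]["name"]
    target = statefulset.get("status", {}).get("updateRevision")
    stale = []
    for pod in pod_list.get("items", []):
        meta = pod.get("metadata", {})
        # Skip pods that are not owned by this StatefulSet.
        owned = any(
            o.get("kind") == "StatefulSet" and o.get("name") == sts_name
            for o in meta.get("ownerReferences", [])
        )
        if not owned:
            continue
        # StatefulSet pods carry the name of their ControllerRevision
        # in the controller-revision-hash label.
        if meta.get("labels", {}).get("controller-revision-hash") != target:
            stale.append(meta.get("name"))
    return sorted(stale)
```

An empty result that stays empty across repeated runs suggests the rollout has settled. Pods that repeatedly show up as stale without any spec change would point back to the update loop.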

Comment 25 errata-xmlrpc 2020-09-16 07:46:50 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.286 bug fix and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3695
