Description of problem:
- Customer reportedly hit the issue described in the following upstream issues and pull request, on Red Hat OpenShift Container Platform 3.11.219:
  - https://github.com/openshift/origin/issues/17435
  - https://github.com/openshift/origin/pull/17513/files
  - https://github.com/kubernetes/kubernetes/issues/56355
  - https://github.com/kubernetes/kubernetes/issues/58347

Version-Release number of selected component (if applicable):
- Red Hat OpenShift Container Platform 3.11.219

How reproducible:
- Very hard to reproduce; no clear pattern is visible yet.

Actual results:
- The StatefulSet creates multiple controllerrevisions, which leads to continuous re-creation of pods.

Expected results:
- The behavior described under "Actual results" does not occur.

Additional info:
- I'll attach additional information to the Bugzilla momentarily (privately).
Verified the bug with the payload below by following the steps in comment 12. After multiple controllerrevisions were created, the issue no longer occurs.

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc version
oc v3.11.273
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://knarra-311zmaster-etcd-nfs-1:8443
openshift v3.11.273
kubernetes v1.11.0+d4cacc0

The pods have been running for more than 2 hours:

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get pods
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3     Running   0          6h
alertmanager-main-1                            3/3     Running   0          6h
alertmanager-main-2                            3/3     Running   0          6h
cluster-monitoring-operator-576c6b8b55-qphms   1/1     Running   0          6h
grafana-6dc585b845-wr64m                       2/2     Running   0          6h
kube-state-metrics-585c47c777-jtdvc            3/3     Running   0          6h
node-exporter-28jdn                            2/2     Running   0          6h
node-exporter-4mg8g                            2/2     Running   0          6h
node-exporter-chh6s                            2/2     Running   0          6h
prometheus-k8s-0                               4/4     Running   1          2h
prometheus-k8s-1                               4/4     Running   1          2h
prometheus-operator-754d586f64-789mf           1/1     Running   0          6h

There are no events related to pods getting killed:

[root@knarra-311zmaster-etcd-nfs-1 ~]# oc get events --sort-by='{.metadata.creationTimestamp}' --all-namespaces
NAMESPACE   LAST SEEN   FIRST SEEN   COUNT   NAME                                      KIND                   SUBOBJECT   TYPE     REASON           SOURCE                               MESSAGE
default     2m          6h           50      ansible-service-broker.163048c43fdd30df   ClusterServiceBroker               Normal   FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.
default     13s         6h           39      template-service-broker.163048e7897fe43a  ClusterServiceBroker               Normal   FetchedCatalog   service-catalog-controller-manager   Successfully fetched catalog entries from broker.

When the second controllerrevision was created, the pods restarted once; after that, no further restarts were observed.

Based on the above, moving the bug to verified state.
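For anyone repeating this check, the restart-count inspection can be scripted rather than read by eye. The following is only a minimal sketch, not part of the original verification: it parses the `oc get pods` listing from the comment above and reports which pods restarted. The function names `parse_pod_table` and `pods_over_restart_limit` and the threshold of one restart are hypothetical, chosen to match the single restart observed when the second controllerrevision was created. It assumes the default column layout of `oc get pods` on 3.11, where no column value contains spaces.

```python
from typing import Dict, List

# Pod listing copied from the verification comment above.
POD_TABLE = """\
NAME                                           READY   STATUS    RESTARTS   AGE
alertmanager-main-0                            3/3     Running   0          6h
alertmanager-main-1                            3/3     Running   0          6h
alertmanager-main-2                            3/3     Running   0          6h
cluster-monitoring-operator-576c6b8b55-qphms   1/1     Running   0          6h
grafana-6dc585b845-wr64m                       2/2     Running   0          6h
kube-state-metrics-585c47c777-jtdvc            3/3     Running   0          6h
node-exporter-28jdn                            2/2     Running   0          6h
node-exporter-4mg8g                            2/2     Running   0          6h
node-exporter-chh6s                            2/2     Running   0          6h
prometheus-k8s-0                               4/4     Running   1          2h
prometheus-k8s-1                               4/4     Running   1          2h
prometheus-operator-754d586f64-789mf           1/1     Running   0          6h
"""


def parse_pod_table(text: str) -> List[Dict[str, str]]:
    """Split a whitespace-aligned pod table into one dict per pod row.

    Hypothetical helper; assumes no column value contains spaces.
    """
    lines = [line for line in text.splitlines() if line.strip()]
    header = lines[0].split()
    return [dict(zip(header, line.split())) for line in lines[1:]]


def pods_over_restart_limit(rows: List[Dict[str, str]], limit: int = 1) -> List[str]:
    """Return names of pods whose RESTARTS count exceeds `limit`.

    The default limit of 1 is an assumption taken from the single restart
    seen after the second controllerrevision was created.
    """
    return [row["NAME"] for row in rows if int(row["RESTARTS"]) > limit]


rows = parse_pod_table(POD_TABLE)
# Pods that restarted at least once.
restarted = {row["NAME"]: int(row["RESTARTS"]) for row in rows if int(row["RESTARTS"]) > 0}
print(restarted)
# Pods that restarted more often than the assumed limit (a sign of the re-creation loop).
print(pods_over_restart_limit(rows, limit=1))
```

A non-empty second list on a later run would suggest the pods are still being re-created and the fix should be re-examined.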
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Container Platform 3.11.286 bug fix and enhancement update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3695