Description of problem:

A problematic Deployment with an invalid `ImageStream` reference can cause an infinite number of ReplicaSets to be created. If ReplicaSets are not limited via a Quota, this can quickly exhaust the etcd quota limit and bring down the entire OpenShift Container Platform cluster.

Version-Release number of selected component (if applicable):

- OpenShift Container Platform 4.7.13

How reproducible:

- Always

Steps to Reproduce:

1. See details in Comment #1

Actual results:

An infinite number of ReplicaSets is created, until at some point the etcd quota of 8 GB is reached and OpenShift Container Platform 4 eventually goes down.

Expected results:

The creation of infinite ReplicaSets is prevented and the problem with the invalid `ImageStream` is reported.

Additional info:

Using `count/replicasets.apps` in a Quota to limit the number of ReplicaSets created will prevent severe issues, but that might not be applied in all cases, and the overall behavior is considered a bug.
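For reference, a quota of that kind could look roughly like the sketch below. The name, namespace, and the limit of 100 are illustrative values, not taken from an affected cluster:

apiVersion: v1
kind: ResourceQuota
metadata:
  name: replicaset-count        # illustrative name
  namespace: example-namespace  # replace with the namespace running the Deployment
spec:
  hard:
    # Object-count quota: caps how many ReplicaSets may exist in the namespace,
    # so a runaway Deployment cannot flood etcd with new ReplicaSets.
    count/replicasets.apps: "100"

Once the cap is reached, the apiserver rejects further ReplicaSet creations, which keeps the blast radius inside the namespace instead of affecting etcd cluster-wide.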
FYI, I filed this moments ago as https://bugzilla.redhat.com/show_bug.cgi?id=1976774
*** Bug 1976774 has been marked as a duplicate of this bug. ***
I think it's potentially similar to https://bugzilla.redhat.com/show_bug.cgi?id=1925180. Filip, can you check whether your fix will solve this problem as well?
I am not aware of such an option, unless you want to tinker with the images. Anyway, I will try to reproduce this and let you know if the fix works or if anything else can be done.
The issue is similar in some aspects, and the fix in https://bugzilla.redhat.com/show_bug.cgi?id=1925180 indeed works. Basically, the deployment image stays unresolved because the imagestream was not available at admission (and the Deployment is not updated afterwards because there is no image.openshift.io/triggers annotation). Once the image stream becomes available, each new ReplicaSet is updated with a new image, and this causes a tug of war between the deployment controller and apiserver admission.
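For context, the Deployment is only rewritten automatically when it carries an image.openshift.io/triggers annotation. A rough sketch of what that annotation looks like, using the names from the reproducer below rather than any customer objects, and assuming a "latest" tag:

apiVersion: apps/v1
kind: Deployment
metadata:
  name: i-spawn-replicas
  annotations:
    # Illustrative trigger: asks OpenShift to rewrite the container image whenever
    # the referenced ImageStreamTag changes, instead of leaving it unresolved.
    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"imagestreamname:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"container1\")].image"}]'

Without such an annotation (and with the ImageStream missing at admission time), the pod template keeps the bare name "imagestreamname", which is what sets up the loop described above.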
> Are we planning to back-port this to OpenShift Container Platform 4.7 since Bug https://bugzilla.redhat.com/show_bug.cgi?id=1925180 is set for 4.8 and I don't see any cherry-pick for 4.7 or similar.
(In reply to Filip Krepinsky from comment #12)
> > Are we planning to back-port this to OpenShift Container Platform 4.7 since Bug https://bugzilla.redhat.com/show_bug.cgi?id=1925180 is set for 4.8 and I don't see any cherry-pick for 4.7 or similar.

Yeah, I think it's reasonable to backport this all the way back to 4.6 even.
reproduce with ocp4.8:

[root@localhost ~]# oc version
Client Version: 4.8.0-rc.3
Server Version: 4.8.0-0.nightly-2021-07-09-181248
Kubernetes Version: v1.21.1+f36aa36

[root@localhost ~]# oc create -f /tmp/depl.yaml
deployment.apps/i-spawn-replicas created

[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           7s

[root@localhost ~]# oc get rs
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       14s

[root@localhost ~]# oc get deployment i-spawn-replicas -o json | jq '.spec'
{
  "progressDeadlineSeconds": 600,
  "replicas": 0,
  "revisionHistoryLimit": 10,
  "selector": {
    "matchLabels": {
      "app": "i-spawn-replicas"
    }
  },
  "strategy": {
    "rollingUpdate": {
      "maxSurge": "25%",
      "maxUnavailable": "25%"
    },
    "type": "RollingUpdate"
  },
  "template": {
    "metadata": {
      "creationTimestamp": null,
      "labels": {
        "app": "i-spawn-replicas"
      },
      "name": "i-spawn-replicas"
    },
    "spec": {
      "containers": [
        {
          "image": "imagestreamname",
          "imagePullPolicy": "Always",
          "name": "container1",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File"
        }
      ],
      "dnsPolicy": "ClusterFirst",
      "restartPolicy": "Always",
      "schedulerName": "default-scheduler",
      "securityContext": {},
      "terminationGracePeriodSeconds": 30
    }
  }
}

[root@localhost ~]# oc get rs i-spawn-replicas-84778cc586 -o json | jq '.spec'
{
  "replicas": 0,
  "selector": {
    "matchLabels": {
      "app": "i-spawn-replicas",
      "pod-template-hash": "84778cc586"
    }
  },
  "template": {
    "metadata": {
      "creationTimestamp": null,
      "labels": {
        "app": "i-spawn-replicas",
        "pod-template-hash": "84778cc586"
      },
      "name": "i-spawn-replicas"
    },
    "spec": {
      "containers": [
        {
          "image": "imagestreamname",
          "imagePullPolicy": "Always",
          "name": "container1",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File"
        }
      ],
      "dnsPolicy": "ClusterFirst",
      "restartPolicy": "Always",
      "schedulerName": "default-scheduler",
      "securityContext": {},
      "terminationGracePeriodSeconds": 30
    }
  }
}

[root@localhost ~]# oc create -f /tmp/is.yaml
imagestream.image.openshift.io/imagestreamname created

[root@localhost ~]# oc get imagestream.image.openshift.io/imagestreamname -o json | jq '.spec'
{
  "lookupPolicy": {
    "local": true
  }
}

[root@localhost ~]# oc patch deployment i-spawn-replicas --type merge --patch "$(cat /tmp/patch.yaml)"
deployment.apps/i-spawn-replicas patched

[root@localhost ~]# oc get rs | wc -l
143
[root@localhost ~]# oc get rs | wc -l
250
[root@localhost ~]# oc get rs | wc -l
292
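For reference, the manifests used above can be reconstructed from the jq output in the transcript. A sketch of /tmp/depl.yaml and /tmp/is.yaml follows (the content of /tmp/patch.yaml is not shown in the transcript and is therefore omitted here):

# Sketch of /tmp/depl.yaml, reconstructed from the deployment spec printed above.
# The container image references the ImageStream by bare name, which stays
# unresolved because the ImageStream does not yet exist at admission time.
apiVersion: apps/v1
kind: Deployment
metadata:
  name: i-spawn-replicas
spec:
  replicas: 0
  selector:
    matchLabels:
      app: i-spawn-replicas
  template:
    metadata:
      name: i-spawn-replicas
      labels:
        app: i-spawn-replicas
    spec:
      containers:
      - name: container1
        image: imagestreamname
        imagePullPolicy: Always
---
# Sketch of /tmp/is.yaml: an empty ImageStream with local lookup enabled,
# matching the spec shown by jq above.
apiVersion: image.openshift.io/v1
kind: ImageStream
metadata:
  name: imagestreamname
spec:
  lookupPolicy:
    local: true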
can't reproduce with ocp4.9:

[root@localhost ~]# oc create -f /tmp/depl.yaml
deployment.apps/i-spawn-replicas created

[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           5s

[root@localhost ~]# oc get rs
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       8s

[root@localhost ~]# oc get rs -o yaml
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: ReplicaSet
  metadata:
    annotations:
      deployment.kubernetes.io/desired-replicas: "0"
      deployment.kubernetes.io/max-replicas: "0"
      deployment.kubernetes.io/revision: "1"
    creationTimestamp: "2021-07-13T03:30:51Z"
    generation: 1
    labels:
      app: i-spawn-replicas
      pod-template-hash: 84778cc586
    name: i-spawn-replicas-84778cc586
    namespace: zhouyt
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: Deployment
      name: i-spawn-replicas
      uid: e4c6dbe0-eac2-4dec-a6a6-59cb2474c10e
    resourceVersion: "81811"
    uid: f145b74f-47ed-4c19-a2e1-688d2aaff0c9
  spec:
    replicas: 0
    selector:
      matchLabels:
        app: i-spawn-replicas
        pod-template-hash: 84778cc586
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: i-spawn-replicas
          pod-template-hash: 84778cc586
        name: i-spawn-replicas
      spec:
        containers:
        - image: imagestreamname
          imagePullPolicy: Always
          name: container1
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    observedGeneration: 1
    replicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@localhost ~]# vi /tmp/is.yaml
[root@localhost ~]# oc create -f /tmp/is.yaml
imagestream.image.openshift.io/imagestreamname created

[root@localhost ~]# oc get is
NAME              IMAGE REPOSITORY                                                           TAGS   UPDATED
imagestreamname   image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname

[root@localhost ~]# oc get is -o yaml
apiVersion: v1
items:
- apiVersion: image.openshift.io/v1
  kind: ImageStream
  metadata:
    creationTimestamp: "2021-07-13T03:32:22Z"
    generation: 1
    name: imagestreamname
    namespace: zhouyt
    resourceVersion: "82442"
    uid: 308a2e4a-ee1c-4504-b934-f66effb73665
  spec:
    lookupPolicy:
      local: true
  status:
    dockerImageRepository: image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""

[root@localhost ~]# oc get is -o json | jq '.spec'
null

[root@localhost ~]# oc get is
NAME              IMAGE REPOSITORY                                                           TAGS   UPDATED
imagestreamname   image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname

[root@localhost ~]# oc get is imagestreamname -o json | jq '.spec'
{
  "lookupPolicy": {
    "local": true
  }
}

[root@localhost ~]# oc get rs
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       3m12s

[root@localhost ~]# vi /tmp/patch.yaml
[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           3m55s

[root@localhost ~]# oc patch deployment i-spawn-replicas --type merge --patch "$(cat /tmp/patch.yaml)"
deployment.apps/i-spawn-replicas patched

[root@localhost ~]# oc get rs
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-5677bfc44d   0         0         0       10s
i-spawn-replicas-84778cc586   0         0         0       4m34s

[root@localhost ~]# oc get rs | wc -l
3
[root@localhost ~]# oc get rs | wc -l
3
[root@localhost ~]# oc get rs | wc -l
3
[root@localhost ~]# oc get rs | wc -l
3
[root@localhost ~]# oc get rs | wc -l
3
[root@localhost ~]# oc get rs | wc -l
3

[root@localhost ~]# oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-12-143404   True        False         4h24m   Cluster version is 4.9.0-0.nightly-2021-07-12-143404
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
*** Bug 1925180 has been marked as a duplicate of this bug. ***