Bug 1976775 - Problematic Deployment creates infinite number Replicasets causing etcd to reach quota limit
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: kube-apiserver
Version: 4.7
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.9.0
Assignee: Filip Krepinsky
QA Contact: zhou ying
URL:
Whiteboard:
Duplicates: 1925180 1976774 (view as bug list)
Depends On:
Blocks: 1981770
Reported: 2021-06-28 08:19 UTC by Simon Reber
Modified: 2024-12-20 20:21 UTC (History)

Fixed In Version:
Doc Type: Bug Fix
Doc Text:
Cause: A deployment is created with an unresolved image (no image stream exists) and without the image.openshift.io/triggers annotation. When the image stream is created afterwards and the deployment is patched, image resolution occurs on the new replica sets. This leaves the deployment controller and the apiserver's imagepolicy plugin in an inconsistent state.
Consequence: The deployment controller creates replica sets in an infinite loop.
Fix: The responsibilities of the apiserver's imagepolicy plugin were reduced.
Result: Inconsistent image resolution should no longer occur in deployments, so it should no longer cause the creation of an infinite number of replica sets.
Clone Of:
Clones: 1981770 1981775 (view as bug list)
Environment:
Last Closed: 2021-10-18 17:36:53 UTC
Target Upstream Version:
Embargoed:


Links
System ID Private Priority Status Summary Last Updated
Github openshift apiserver-library-go pull 50 0 None closed Bug 1925180: fix Deployment creates a huge number of ReplicaSets 2021-07-06 14:18:00 UTC
Github openshift kubernetes pull 846 0 None closed UPSTREAM: <drop>: bump(apiserver-library-go) 2021-07-08 15:39:02 UTC
Red Hat Product Errata RHSA-2021:3759 0 None None None 2021-10-18 17:37:10 UTC

Description Simon Reber 2021-06-28 08:19:46 UTC
Description of problem:

A problematic Deployment with an invalid `ImageStream` reference can cause an infinite number of ReplicaSets to be created. If ReplicaSets are not limited via Quota, this can quickly reach the etcd quota limit, bringing down the entire OpenShift Container Platform cluster.

Version-Release number of selected component (if applicable):

 - OpenShift Container Platform 4.7.13


How reproducible:

 - Always

Steps to Reproduce:
1. See details in Comment #1

Actual results:

An infinite number of ReplicaSets is created. At some point the etcd quota of 8 GB is reached and OpenShift Container Platform 4 eventually goes down.

Expected results:

Prevent the creation of the infinite ReplicaSets and report the problem about the invalid `ImageStream`.

Additional info:

Using `count/replicasets.apps` in a Quota to limit the number of ReplicaSets created will prevent severe issues. However, that might not be applied in all cases, and the overall behavior is considered a bug.
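
As an illustration of the quota mitigation mentioned above, here is a minimal sketch that prints a ResourceQuota manifest capping ReplicaSets in a namespace. The helper name `rs_quota_manifest`, the quota name `rs-cap`, the limit `20`, and the project `my-project` are all assumptions made for this example and do not come from the report; only the `count/replicasets.apps` key does.

```shell
# Hypothetical helper (not part of the original report): print a
# ResourceQuota manifest that caps the number of ReplicaSets in a namespace.
# $1 = quota name, $2 = maximum number of ReplicaSets.
rs_quota_manifest() {
  quota_name="$1"
  rs_limit="$2"
  cat <<EOF
apiVersion: v1
kind: ResourceQuota
metadata:
  name: ${quota_name}
spec:
  hard:
    count/replicasets.apps: "${rs_limit}"
EOF
}

# Example usage (assumes a logged-in session and an existing project
# named "my-project"; both are placeholders):
#   rs_quota_manifest rs-cap 20 | oc apply -n my-project -f -
```

Such a quota only bounds the damage: once the limit is reached, further ReplicaSet creation is rejected, but the underlying loop is not fixed. Note that the limit should stay above the Deployment's `revisionHistoryLimit` so that normal rollouts are not blocked.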

Comment 3 Dave Baker 2021-06-28 08:32:30 UTC
FYI, I filed this moments ago as https://bugzilla.redhat.com/show_bug.cgi?id=1976774

Comment 6 Dave Baker 2021-06-28 09:17:10 UTC
*** Bug 1976774 has been marked as a duplicate of this bug. ***

Comment 7 Maciej Szulik 2021-06-28 13:54:07 UTC
I think it's potentially similar to https://bugzilla.redhat.com/show_bug.cgi?id=1925180. Filip, can you check if your fix will solve this problem as well?

Comment 9 Filip Krepinsky 2021-06-28 15:18:39 UTC
I am not aware of such option, unless you want to tinker with the images. Anyway, I will try to reproduce this and will let you know if the fix works or if anything else can be done.

Comment 10 Filip Krepinsky 2021-06-29 14:50:29 UTC
The issue is similar in some aspects and the fix in https://bugzilla.redhat.com/show_bug.cgi?id=1925180 indeed works.

Basically, the deployment image stays unresolved because the image stream was not available at admission (and it is not updated because there is no image.openshift.io/triggers annotation). Once the image stream is available, each new ReplicaSet is updated with a new image, and this causes a tug of war between the deployment controller and the apiserver admission.

Comment 12 Filip Krepinsky 2021-07-02 08:47:20 UTC
> Are we planning to back-port this to OpenShift Container Platform 4.7 since Bug https://bugzilla.redhat.com/show_bug.cgi?id=1925180 is set for 4.8 and I don't see any cherry-pick for 4.7 or similar.

Comment 13 Maciej Szulik 2021-07-02 14:50:10 UTC
(In reply to Filip Krepinsky from comment #12)
> > Are we planning to back-port this to OpenShift Container Platform 4.7 since Bug https://bugzilla.redhat.com/show_bug.cgi?id=1925180 is set for 4.8 and I don't see any cherry-pick for 4.7 or similar.

Yeah, I think it's reasonable to backport this all the way back to 4.6 even.

Comment 15 zhou ying 2021-07-13 05:55:06 UTC
Reproduced with OCP 4.8:
[root@localhost ~]# oc version 
Client Version: 4.8.0-rc.3
Server Version: 4.8.0-0.nightly-2021-07-09-181248
Kubernetes Version: v1.21.1+f36aa36


[root@localhost ~]# oc create -f /tmp/depl.yaml 
deployment.apps/i-spawn-replicas created
[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           7s
[root@localhost ~]# oc get rs
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       14s


[root@localhost ~]# oc get deployment i-spawn-replicas -o json | jq '.spec'
{
  "progressDeadlineSeconds": 600,
  "replicas": 0,
  "revisionHistoryLimit": 10,
  "selector": {
    "matchLabels": {
      "app": "i-spawn-replicas"
    }
  },
  "strategy": {
    "rollingUpdate": {
      "maxSurge": "25%",
      "maxUnavailable": "25%"
    },
    "type": "RollingUpdate"
  },
  "template": {
    "metadata": {
      "creationTimestamp": null,
      "labels": {
        "app": "i-spawn-replicas"
      },
      "name": "i-spawn-replicas"
    },
    "spec": {
      "containers": [
        {
          "image": "imagestreamname",
          "imagePullPolicy": "Always",
          "name": "container1",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File"
        }
      ],
      "dnsPolicy": "ClusterFirst",
      "restartPolicy": "Always",
      "schedulerName": "default-scheduler",
      "securityContext": {},
      "terminationGracePeriodSeconds": 30
    }
  }
}
[root@localhost ~]# oc get rs i-spawn-replicas-84778cc586 -o json | jq '.spec'
{
  "replicas": 0,
  "selector": {
    "matchLabels": {
      "app": "i-spawn-replicas",
      "pod-template-hash": "84778cc586"
    }
  },
  "template": {
    "metadata": {
      "creationTimestamp": null,
      "labels": {
        "app": "i-spawn-replicas",
        "pod-template-hash": "84778cc586"
      },
      "name": "i-spawn-replicas"
    },
    "spec": {
      "containers": [
        {
          "image": "imagestreamname",
          "imagePullPolicy": "Always",
          "name": "container1",
          "resources": {},
          "terminationMessagePath": "/dev/termination-log",
          "terminationMessagePolicy": "File"
        }
      ],
      "dnsPolicy": "ClusterFirst",
      "restartPolicy": "Always",
      "schedulerName": "default-scheduler",
      "securityContext": {},
      "terminationGracePeriodSeconds": 30
    }
  }
}
[root@localhost ~]# oc create -f /tmp/is.yaml
imagestream.image.openshift.io/imagestreamname created
[root@localhost ~]# oc get imagestream.image.openshift.io/imagestreamname -o json | jq '.spec'
{
  "lookupPolicy": {
    "local": true
  }
}
[root@localhost ~]# oc patch deployment i-spawn-replicas --type merge --patch "$(cat /tmp/patch.yaml)"
deployment.apps/i-spawn-replicas patched
[root@localhost ~]# oc get rs | wc -l
143
[root@localhost ~]# oc get rs | wc -l
250
[root@localhost ~]# oc get rs | wc -l
292
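
Note that `oc get rs | wc -l` also counts the header line and does not distinguish between Deployments. A minimal sketch of a per-Deployment count is shown below. It assumes that ReplicaSet names follow the `<deployment>-<pod-template-hash>` pattern seen in the output above; the function name `count_rs_per_deployment` is hypothetical.

```shell
# Hypothetical helper (not part of the original report): count ReplicaSets
# per Deployment. Reads `oc get rs --no-headers` style lines on stdin and
# assumes each name has the form <deployment>-<pod-template-hash>.
count_rs_per_deployment() {
  awk '{
         name = $1
         sub(/-[^-]+$/, "", name)   # drop the trailing "-<pod-template-hash>"
         count[name]++
       }
       END { for (n in count) print count[n], n }' | sort -rn
}

# Example usage (assumes a logged-in session):
#   oc get rs --no-headers | count_rs_per_deployment
```

A count that keeps growing for a single Deployment across repeated runs would match the behavior reported in this bug.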

Comment 16 zhou ying 2021-07-13 05:57:12 UTC
Cannot reproduce with OCP 4.9:


[root@localhost ~]# oc create -f /tmp/depl.yaml 
deployment.apps/i-spawn-replicas created
[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           5s
[root@localhost ~]# oc get rs 
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       8s
[root@localhost ~]# oc get rs -o yaml 
apiVersion: v1
items:
- apiVersion: apps/v1
  kind: ReplicaSet
  metadata:
    annotations:
      deployment.kubernetes.io/desired-replicas: "0"
      deployment.kubernetes.io/max-replicas: "0"
      deployment.kubernetes.io/revision: "1"
    creationTimestamp: "2021-07-13T03:30:51Z"
    generation: 1
    labels:
      app: i-spawn-replicas
      pod-template-hash: 84778cc586
    name: i-spawn-replicas-84778cc586
    namespace: zhouyt
    ownerReferences:
    - apiVersion: apps/v1
      blockOwnerDeletion: true
      controller: true
      kind: Deployment
      name: i-spawn-replicas
      uid: e4c6dbe0-eac2-4dec-a6a6-59cb2474c10e
    resourceVersion: "81811"
    uid: f145b74f-47ed-4c19-a2e1-688d2aaff0c9
  spec:
    replicas: 0
    selector:
      matchLabels:
        app: i-spawn-replicas
        pod-template-hash: 84778cc586
    template:
      metadata:
        creationTimestamp: null
        labels:
          app: i-spawn-replicas
          pod-template-hash: 84778cc586
        name: i-spawn-replicas
      spec:
        containers:
        - image: imagestreamname
          imagePullPolicy: Always
          name: container1
          resources: {}
          terminationMessagePath: /dev/termination-log
          terminationMessagePolicy: File
        dnsPolicy: ClusterFirst
        restartPolicy: Always
        schedulerName: default-scheduler
        securityContext: {}
        terminationGracePeriodSeconds: 30
  status:
    observedGeneration: 1
    replicas: 0
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@localhost ~]# vi /tmp/is.yaml
[root@localhost ~]# oc create -f /tmp/is.yaml 
imagestream.image.openshift.io/imagestreamname created
[root@localhost ~]# oc get is 
NAME              IMAGE REPOSITORY                                                          TAGS   UPDATED
imagestreamname   image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname          
[root@localhost ~]# oc get is -o yaml 
apiVersion: v1
items:
- apiVersion: image.openshift.io/v1
  kind: ImageStream
  metadata:
    creationTimestamp: "2021-07-13T03:32:22Z"
    generation: 1
    name: imagestreamname
    namespace: zhouyt
    resourceVersion: "82442"
    uid: 308a2e4a-ee1c-4504-b934-f66effb73665
  spec:
    lookupPolicy:
      local: true
  status:
    dockerImageRepository: image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname
kind: List
metadata:
  resourceVersion: ""
  selfLink: ""
[root@localhost ~]# oc get is -o json |jq '.spec'
null
[root@localhost ~]# oc get is 
NAME              IMAGE REPOSITORY                                                          TAGS   UPDATED
imagestreamname   image-registry.openshift-image-registry.svc:5000/zhouyt/imagestreamname          
[root@localhost ~]# oc get is imagestreamname -o json |jq '.spec'
{
  "lookupPolicy": {
    "local": true
  }
}
[root@localhost ~]# oc get rs 
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-84778cc586   0         0         0       3m12s
[root@localhost ~]# vi /tmp/patch.yaml
[root@localhost ~]# oc get deploy
NAME               READY   UP-TO-DATE   AVAILABLE   AGE
i-spawn-replicas   0/0     0            0           3m55s
[root@localhost ~]# oc patch deployment i-spawn-replicas  --type merge --patch "$(cat /tmp/patch.yaml)"
deployment.apps/i-spawn-replicas patched
[root@localhost ~]# oc get rs 
NAME                          DESIRED   CURRENT   READY   AGE
i-spawn-replicas-5677bfc44d   0         0         0       10s
i-spawn-replicas-84778cc586   0         0         0       4m34s
[root@localhost ~]# oc get rs |wc -l
3
[root@localhost ~]# oc get rs |wc -l
3
[root@localhost ~]# oc get rs |wc -l
3
[root@localhost ~]# oc get rs |wc -l
3
[root@localhost ~]# oc get rs |wc -l
3
[root@localhost ~]# oc get rs |wc -l
3


[root@localhost ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.9.0-0.nightly-2021-07-12-143404   True        False         4h24m   Cluster version is 4.9.0-0.nightly-2021-07-12-143404

Comment 19 errata-xmlrpc 2021-10-18 17:36:53 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

Comment 20 Scott Dodson 2021-10-21 17:48:13 UTC
*** Bug 1925180 has been marked as a duplicate of this bug. ***

