Bug 1925180
| Field | Value | Field | Value |
| --- | --- | --- | --- |
| Summary | Deployment creates a huge number of ReplicaSets - image-lookup bits | | |
| Product | OpenShift Container Platform | Reporter | Maciej Szulik <maszulik> |
| Component | kube-apiserver | Assignee | Filip Krepinsky <fkrepins> |
| Status | CLOSED DUPLICATE | QA Contact | zhou ying <yinzhou> |
| Severity | medium | Priority | medium |
| Version | 4.6 | Target Release | 4.9.0 |
| Target Milestone | --- | Doc Type | Bug Fix |
| Hardware | Unspecified | OS | Unspecified |
| Clone Of | 1921717 | | 1982717 (view as bug list) |
| Last Closed | 2021-10-18 17:29:03 UTC | | |
| CC | alkazako, aos-bugs, cjerolim, cvogt, fkrepins, mfojtik, mgugino, mjobanek, nmukherj, wking, xxia, yinzhou | | |

Doc Text:

- Cause: When a Deployment and an ImageStream are created at the same time, a race condition can occur during the Deployment's image resolution.
- Consequence: The Deployment controller creates ReplicaSets in an infinite loop.
- Fix: The responsibilities of the API server's imagepolicy admission plugin were reduced.
- Result: Concurrent creation of a Deployment and an ImageStream no longer leads to an unbounded number of ReplicaSets.
Description
Maciej Szulik
2021-02-04 14:35:45 UTC
This is to look into the annotation settings:

    "alpha.image.policy.openshift.io/resolve-names": "*",
    "image.openshift.io/triggers": "[{\"from\":{\"kind\":\"ImageStreamTag\",\"name\":\"golang-sample:latest\",\"namespace\":\"mjobanek-dev\"},\"fieldPath\":\"spec.template.spec.containers[?(@.name==\\\"golang-sample\\\")].image\",\"pause\":\"false\"}]",

in comparison with:

    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"django-ex:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"django-ex\")].image"}]'

to ensure we don't shoot ourselves in the foot again in the future. (A minimal oc sketch of attaching these annotations appears after this comment block.)

This bug hasn't had any activity in the last 30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason, or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale" and decreasing the severity/priority. If you have further information on the current state of the bug, please update it; otherwise this bug can be closed in about 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant. Additionally, you can add LifecycleFrozen to Keywords if you think this bug should never be marked as stale. Please consult with the bug assignee before you do that.

Hey @maszulik, I've been investigating this issue a little bit and you can find some notes in the origin issue, see https://bugzilla.redhat.com/show_bug.cgi?id=1921717#c17

You can find my scripts to reproduce this issue in the origin issue as well, or on GitHub: https://github.com/jerolimov/openshift/tree/master/issues/bz-1921717. They fail with two different issues.

On the sandbox cluster they fail for me in about 2/3 of the cases (when creating the resources in parallel):

- 7x successful
- 7x failed with "Forbidden: this image is prohibited by policy: this image is prohibited by policy (changed after admission)"
- 6x failed with too many ReplicaSets

On a local CRC cluster the numbers are a little better, but they fail sometimes as well:

- 10x successful
- 1x failed with "Forbidden: this image is prohibited by policy: this image is prohibited by policy (changed after admission)"
- 1x failed with too many ReplicaSets (created over 1000 ReplicaSets within minutes!)

I'm sure you can drop some of the resources (the Secrets, Route, and Service), but I wanted a script that reproduces the complete console API calls. If you have any questions, feel free to contact me here or on Slack.

The LifecycleStale keyword was removed because the bug got commented on recently. The bug assignee was notified.
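For reference, here is a minimal sketch of attaching the second, simpler form of these annotations to an existing Deployment with oc. The name django-ex is taken from the annotation quoted in the description; the namespace and the existence of such a Deployment are assumptions for illustration, not part of the bug's reproducer.

```bash
# Sketch only (assumed Deployment django-ex in the current project).

# Let the image policy admission plugin resolve image stream references in this Deployment.
oc annotate deployment/django-ex \
  alpha.image.policy.openshift.io/resolve-names='*' --overwrite

# Image trigger in the simpler form quoted in the description: when the ImageStreamTag
# django-ex:latest changes, rewrite the matching container's image field.
oc annotate deployment/django-ex \
  image.openshift.io/triggers='[{"from":{"kind":"ImageStreamTag","name":"django-ex:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"django-ex\")].image"}]' \
  --overwrite
```

The same two annotations can equally be set declaratively under metadata.annotations in the Deployment manifest.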
Sample deployment that triggered this problem: https://github.com/openshift/hypershift/pull/207/files#diff-cb5a6beecbd454946c5c62ee50feffee25a9680a6909477c5f3999d10cf1ebd0

Created attachment 1788551 [details]
reproduce.sh
I uploaded a minimal reproducible example (based on @cjerolim's repo). It didn't reproduce for me in the default namespace, but it did in a test namespace. The reproducibility varies; it is between 10-70%. (See the reproducer sketch further below.)

The results observed by @cjerolim are caused by Scenario 1 and Scenario 2 below. The bug itself originates in a kube-apiserver admission plugin, but ultimately it is facilitated by three components (kube-apiserver, kube-controller-manager and openshift-apiserver) and the following edge-case scenarios:

Scenario 1

1. Deployment D admitted by apiserver (has the original image)
2. Deployment D verified by apiserver (has the original image)
3. Deployment D created
4. ImageStream I created
5. Deployment D starts syncing
6. Deployment D creates ReplicaSet R (both have the original image tag)
7. ReplicaSet R admitted by apiserver (changes the image to a newer version according to ImageStream I)
8. ReplicaSet R verified by apiserver (has the new image)
9. ReplicaSet R created
10. Deployment D finishes syncing and fails due to receiving a new ReplicaSet
11. Deployment D starts syncing again
12. Deployment D compares ReplicaSet R with its pod spec and they are not equal
13. Deployment D decides to create a new ReplicaSet R2 (both have the original image tag)
14. ReplicaSet R2 admitted by apiserver (changes the image to a newer version according to ImageStream I)
15. ReplicaSet R2 verified by apiserver (has the new image)
16. Deployment D finishes syncing and fails due to receiving a new ReplicaSet
17. Deployment D starts syncing again
18. ...

Additionally, Deployment D will never have the correct image tag according to ImageStream I, because the image.openshift.io/triggers annotation on Deployment D is in a wrong format and will never resolve in openshift-apiserver. So kube-controller-manager and kube-apiserver will keep reconciling these incompatible states forever.

Scenario 2

1. Deployment A admitted by apiserver (has the original image tag)
2. ImageStream I created
3. Deployment A is being verified by apiserver (wants to change the image according to ImageStream I during verify)
4. apiserver -> Forbidden: this image is prohibited by policy: this image is prohibited by policy (changed after admission)

Scenario 3

1. ImageStream I created
2. Deployment D admitted (changes the image to a newer version according to ImageStream I)
3. all the logic works fine...

Solution

- We can prevent touching references that are owned by controllers when admitting them, and only set the correct images on the parents. This gives the controllers time to react to the changes, and eventually the correct images will be set in the children as well. Fixes Scenario 1.
- We can skip verify when the race occurs as in Scenario 2. This makes the admission atomic and is similar to a flow where ImageStream I gets created right after verify.
- We could also check the format of image.openshift.io/triggers (JSON validation) to prevent admitting resources with an incorrectly set annotation, although I am not sure whether this is the correct approach and whether annotations should affect admission. This would go into another BZ.

Moving to kube-apiserver - I posted a fix for both scenarios to apiserver-library-go.

    for i in {100..1100}; do oc new-project test$i; /tmp/broken.sh ; oc delete project test$i ; done

Ran the loop; can't reproduce this issue; will move to verified status.
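To make Scenario 1 above more concrete, here is a rough, hypothetical reproducer in the spirit of the attached reproduce.sh. All names, the source image, and the timing are assumptions for illustration; the real script also creates Secrets, a Route, and a Service and lives in attachment 1788551.

```bash
#!/usr/bin/env bash
# Sketch of the Scenario 1 race: populate an ImageStreamTag and create an annotated
# Deployment at roughly the same time, then watch the ReplicaSet count.
set -euo pipefail

NS=test-race
oc new-project "$NS"

# Populate the ImageStreamTag and create the Deployment in parallel to widen the race window.
# The source image is illustrative; any importable image works.
oc import-image golang-sample:latest \
  --from=registry.access.redhat.com/ubi8/ubi:latest --confirm -n "$NS" &
oc create -f - -n "$NS" <<'EOF' &
apiVersion: apps/v1
kind: Deployment
metadata:
  name: golang-sample
  annotations:
    alpha.image.policy.openshift.io/resolve-names: "*"
    image.openshift.io/triggers: '[{"from":{"kind":"ImageStreamTag","name":"golang-sample:latest"},"fieldPath":"spec.template.spec.containers[?(@.name==\"golang-sample\")].image"}]'
spec:
  replicas: 1
  selector:
    matchLabels:
      app: golang-sample
  template:
    metadata:
      labels:
        app: golang-sample
    spec:
      containers:
      - name: golang-sample
        # Bare ImageStream-style reference so the image policy plugin rewrites it
        # once golang-sample:latest resolves (the "image-lookup bits" in the summary).
        image: golang-sample:latest
EOF
wait

# On an affected cluster the ReplicaSet count keeps growing instead of settling at 1.
sleep 60
oc get rs -n "$NS" --no-headers | wc -l
```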
    [root@localhost ~]# oc get clusterversion
    NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
    version   4.9.0-0.nightly-2021-07-12-203753   True        False         25h     Cluster version is 4.9.0-0.nightly-2021-07-12-203753

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2021:3759

*** This bug has been marked as a duplicate of bug 1976775 ***