Bug 1686838
| Summary: | clarify behaviour of --force and --cascade in oc replace | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | daniel <dmoessne> |
| Component: | oc | Assignee: | Maciej Szulik <maszulik> |
| Status: | CLOSED ERRATA | QA Contact: | zhou ying <yinzhou> |
| Severity: | medium | Priority: | low |
| Version: | 3.11.0 | CC: | aos-bugs, dmoessne, jokerman, mfojtik, mmccomas, xxia, yinzhou |
| Target Milestone: | --- | Target Release: | 3.11.z |
| Hardware: | Unspecified | OS: | Linux |
| Doc Type: | Bug Fix | Last Closed: | 2020-06-17 20:21:25 UTC |
| Type: | Bug | | |

Doc Text:

Cause: The DeploymentConfig controller had a broken adoption mechanism, responsible for identifying the replication controllers it owns.
Consequence: `oc replace` without `--force` misbehaved (dependent objects were not cleaned up).
Fix: Fix the adoption mechanism.
Result: `oc replace` now properly removes dependent objects.
Description
daniel, 2019-03-08 13:16:06 UTC
Maciej Szulik (comment #1):

In 3.11 the deletion of dependents happens on the server; it is the garbage collector's responsibility. The client marks the object for removal, including the decision on how to deal with its dependents (pods and replication controllers, in the case of deployments). The GC controller then periodically looks for those objects and removes them. If the dependents tree is small, the time between `oc delete` and the resources being gone should be relatively short, but with bigger trees it might take some time. Can you verify whether the objects are actually removed? Also, since you are invoking `oc replace`, you might not see the moment when the old objects are removed and the new ones are created in their place. Finally, there is the question of whether GC is working as it should, which can easily be verified through `oc delete` with the exact same set of flags as you use for `oc replace`, since they share the code responsible for removing objects.

(In reply to Maciej Szulik from comment #1)
> Can you verify if the objects will actually be removed?

Well, I ran `oc replace -f deployment-example.yaml` last Friday afternoon, and checking some minutes ago still shows everything, i.e. all dc and rc versions, while the expectation was that `--cascade=true` (the default per the man page) would remove them. Waiting roughly more than 50 hours should be sufficient, or am I missing something?

> Also since you're invoking oc replace you might not see the moment when old objects are removed and new created in place of the old ones.
Well, but again: the man page says `--cascade=true` is set by default, which should delete all dependent (dc/rc/pod) resources. Pods are newly started, that is fine, but my understanding is that the old dc and rc versions should be deleted as well. This works if I run `oc replace --force=true -f deployment-example.yaml`: all dc and rc versions are gone. But as I read the man page, `--force` should not be necessary to get those removed. Or perhaps I misunderstood the man page?

~~~~
[quicklab@master-0 dc-test]$ oc replace -f deployment-example.yaml
deploymentconfig.apps.openshift.io/deployment-example replaced
[quicklab@master-0 dc-test]$ oc get all
NAME                              READY   STATUS    RESTARTS   AGE
pod/deployment-example-11-mgwtc   1/1     Running   0          5m

NAME                                          DESIRED   CURRENT   READY   AGE
replicationcontroller/deployment-example-1    0         0         0       11m
replicationcontroller/deployment-example-10   0         0         0       6m
replicationcontroller/deployment-example-11   1         1         1       5m
replicationcontroller/deployment-example-2    0         0         0       11m
replicationcontroller/deployment-example-3    0         0         0       10m
replicationcontroller/deployment-example-4    0         0         0       10m
replicationcontroller/deployment-example-5    0         0         0       9m
replicationcontroller/deployment-example-6    0         0         0       8m
replicationcontroller/deployment-example-7    0         0         0       7m
replicationcontroller/deployment-example-8    0         0         0       7m
replicationcontroller/deployment-example-9    0         0         0       7m

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.139.165   <none>        8080/TCP   11m

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/deployment-example   11         1         1         config,image(deployment-example:latest)

NAME                                                DOCKER REPO                                                TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   docker-registry.default.svc:5000/test/deployment-example   latest   11 minutes ago
-----
[quicklab@master-0 dc-test]$ oc replace --force=true -f deployment-example.yaml
deploymentconfig.apps.openshift.io "deployment-example" deleted
deploymentconfig.apps.openshift.io/deployment-example replaced
[quicklab@master-0 dc-test]$ oc get all
NAME                              READY   STATUS              RESTARTS   AGE
pod/deployment-example-1-deploy   0/1     ContainerCreating   0          <invalid>
pod/deployment-example-11-mgwtc   1/1     Terminating         0          6m

NAME                                         DESIRED   CURRENT   READY   AGE
replicationcontroller/deployment-example-1   0         0         0       <invalid>

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.139.165   <none>        8080/TCP   13m

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/deployment-example   1          1         0         config,image(deployment-example:latest)

NAME                                                DOCKER REPO                                                TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   docker-registry.default.svc:5000/test/deployment-example   latest   13 minutes ago
~~~~

So the expectation for the above would be the same result in both cases, possibly a bit faster with `--force`, but still with all dc and rc versions cleaned up.

> Finally, there's a question if GC is working as it should, which can be easily verified through oc delete with the exact same set of flags as you use for oc replace, since they share the code responsible for removing objects.
Well, it seems to behave as intended:

~~~
[quicklab@master-0 dc-test]$ oc get all
NAME                              READY   STATUS    RESTARTS   AGE
pod/deployment-example-10-p9jch   1/1     Running   0          1m

NAME                                          DESIRED   CURRENT   READY   AGE
replicationcontroller/deployment-example-1    0         0         0       5m
replicationcontroller/deployment-example-10   1         1         1       1m
replicationcontroller/deployment-example-2    0         0         0       4m
replicationcontroller/deployment-example-3    0         0         0       4m
replicationcontroller/deployment-example-4    0         0         0       3m
replicationcontroller/deployment-example-5    0         0         0       3m
replicationcontroller/deployment-example-6    0         0         0       3m
replicationcontroller/deployment-example-7    0         0         0       2m
replicationcontroller/deployment-example-8    0         0         0       2m
replicationcontroller/deployment-example-9    0         0         0       1m

NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.165.227   <none>        8080/TCP   5m

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/deployment-example   10         1         1         config,image(deployment-example:latest)

NAME                                                DOCKER REPO                                                TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   docker-registry.default.svc:5000/test/deployment-example   latest   5 minutes ago

[quicklab@master-0 dc-test]$ oc get -o yaml dc/deployment-example --export > deployment-example.yaml
[quicklab@master-0 dc-test]$ oc delete -f deployment-example.yaml
deploymentconfig.apps.openshift.io "deployment-example" deleted
[quicklab@master-0 dc-test]$ oc get all
NAME                         TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.165.227   <none>        8080/TCP   6m

NAME                                                DOCKER REPO                                                TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   docker-registry.default.svc:5000/test/deployment-example   latest   6 minutes ago
~~~

See here: all dc and rc versions are removed.

*** Bug 1687902 has been marked as a duplicate of this bug. ***

So it looks like there was a bug in the DC adoption mechanism which might have caused this issue. See https://bugzilla.redhat.com/show_bug.cgi?id=1620608 and https://github.com/openshift/origin/pull/22324 for the fix. Moving to QA since that has already merged.

(In reply to Xingxing Xia from comment #6)
> 2. I read carefully this bug and bug 1687902, the reported behavior is expected IMO (see next comment).

When `--force=true` is used, the command deletes the DC; if `--cascade=true` is also set (the default), the DC's managed resources are deleted too, and a new DC is created. If `--cascade=false` is set, the deleted DC's managed resources remain, and a new DC is created. When `--force=false` is used, the command behaves the same regardless of whether `--cascade` is true or false. Whether "we get an additional" RC (as mentioned in the reference below) depends on whether deployment-example.yaml modified the pod template (the part under `.spec.template` in the DC yaml). If it did, the DC's ConfigChange trigger fires and an additional RC is created; if not, the DC is only updated, without a new RC.

(In reply to daniel from comment https://bugzilla.redhat.com/show_bug.cgi?id=1687902#c0)
> B) for 1-3,5-6): when replacing w/o --force we get an additional dc (#11)

`--force` only applies when you also specify `--grace-period=0`. Did you get "warning: --force is ignored because --grace-period is not 0" when applying `--force` without `--grace-period`?

No, I didn't see the warning.

$ oc new-project xxia1-proj1
$ oc new-app openshift/deployment-example   # Then wait for the app pod to be Running
$ oc set env dc deployment-example --env=REVISION=2   # Get another deployment.
# Then wait for the new app pod to be Running
$ oc label --dry-run -o yaml dc deployment-example added-modification=anyvalue-1 > deployment-example.yaml   # Modify the DC (non-pod-template part)
$ oc replace -f deployment-example.yaml --force=true
deploymentconfig.apps.openshift.io "deployment-example" deleted
deploymentconfig.apps.openshift.io/deployment-example replaced

This bug hasn't had any engineering activity in the last ~30 days. Maybe the problem got resolved, was a duplicate of something else, or became less pressing for some reason; or maybe it's still relevant but just hasn't been looked at yet. As such, we're marking this bug as "LifecycleStale". If you have further information on the current state of the bug, please update it and remove the "LifecycleStale" keyword; otherwise this bug will be automatically closed in 7 days. The information can be, for example, that the problem still occurs, that you still want the feature, that more information is needed, or that the bug is (for whatever reason) no longer relevant.

I think we have diverged with this bug to a different topic. Can we state that the customer's original problem is now solved?
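The `--force`/`--cascade` semantics discussed in the comments above can be summarized as a small decision table. The sketch below is an illustrative model only, not oc source code; the helper name `replace_mode` is invented for this example:

```shell
#!/bin/sh
# Toy model of how the two flags combine in `oc replace`, as described in
# this bug: --cascade only matters when --force actually deletes the object.
replace_mode() {
  force=$1    # value of --force
  cascade=$2  # value of --cascade (defaults to true)
  if [ "$force" = "true" ]; then
    if [ "$cascade" = "true" ]; then
      echo "delete object and its dependents, then re-create"
    else
      echo "delete object only (dependents orphaned), then re-create"
    fi
  else
    # Without --force the object is updated in place, so nothing is
    # deleted and --cascade never comes into play.
    echo "update in place (--cascade has no effect)"
  fi
}

replace_mode true true
replace_mode true false
replace_mode false true
```

Note that, per the triage above, whether a new RC appears after an in-place update depends only on whether the pod template changed, not on these flags.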
Confirmed with 4.5.0-0.nightly-2020-05-19-041951: without `--force` the DC is not deleted.

1) oc new-project zhouy
2) oc new-app openshift/deployment-example
3) oc set env dc deployment-example --env=REVISION=2
4) oc label --dry-run -o yaml dc deployment-example added-modification=anyvalue-1 >/tmp/deployment-example.yaml
5) oc get all:

~~~
[root@dhcp-140-138 ~]# oc get all
NAME                              READY   STATUS      RESTARTS   AGE
pod/deployment-example-1-deploy   0/1     Completed   0          2m11s
pod/deployment-example-2-deploy   0/1     Completed   0          68s
pod/deployment-example-2-fbbpr    1/1     Running     0          64s

NAME                                         DESIRED   CURRENT   READY   AGE
replicationcontroller/deployment-example-1   0         0         0       2m12s
replicationcontroller/deployment-example-2   1         1         1       69s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.128.99   <none>        8080/TCP   2m13s

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/deployment-example   2          1         1         config,image(deployment-example:latest)

NAME                                                IMAGE REPOSITORY                                                            TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   image-registry.openshift-image-registry.svc:5000/zhouy/deployment-example   latest   2 minutes ago
~~~

6)

~~~
[root@dhcp-140-138 ~]# oc replace -f /tmp/deployment-example.yaml
deploymentconfig.apps.openshift.io/deployment-example replaced
[root@dhcp-140-138 ~]# oc get all
NAME                              READY   STATUS      RESTARTS   AGE
pod/deployment-example-1-deploy   0/1     Completed   0          3m58s
pod/deployment-example-2-deploy   0/1     Completed   0          2m55s
pod/deployment-example-2-fbbpr    1/1     Running     0          2m51s

NAME                                         DESIRED   CURRENT   READY   AGE
replicationcontroller/deployment-example-1   0         0         0       3m58s
replicationcontroller/deployment-example-2   1         1         1       2m55s

NAME                         TYPE        CLUSTER-IP      EXTERNAL-IP   PORT(S)    AGE
service/deployment-example   ClusterIP   172.30.128.99   <none>        8080/TCP   4m

NAME                                                    REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/deployment-example   2          1         1         config,image(deployment-example:latest)

NAME                                                IMAGE REPOSITORY                                                            TAGS     UPDATED
imagestream.image.openshift.io/deployment-example   image-registry.openshift-image-registry.svc:5000/zhouy/deployment-example   latest   4 minutes ago
~~~

The default `--cascade=true` does not take effect. Only using `--force` will delete the DC, and no warning is printed:

~~~
[root@dhcp-140-138 ~]# oc get all
NAME                           READY   STATUS      RESTARTS   AGE
pod/hello-openshift-1-deploy   0/1     Completed   0          5m27s
pod/hello-openshift-2-deploy   0/1     Completed   0          2m5s
pod/hello-openshift-2-g5wj8    1/1     Running     0          2m1s

NAME                                      DESIRED   CURRENT   READY   AGE
replicationcontroller/hello-openshift-1   0         0         0       5m28s
replicationcontroller/hello-openshift-2   1         1         1       2m6s

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
service/hello-openshift   ClusterIP   172.30.93.76   <none>        8080/TCP,8888/TCP   5m29s

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/hello-openshift   2          1         1         config,image(hello-openshift:latest)

NAME                                             IMAGE REPOSITORY                                                          TAGS     UPDATED
imagestream.image.openshift.io/hello-openshift   image-registry.openshift-image-registry.svc:5000/dftest/hello-openshift   latest   5 minutes ago
[root@dhcp-140-138 ~]# oc label --dry-run=client -o yaml dc/hello-openshift added-modification=anyvalue-1 >/tmp/hello-openshift.yaml
[root@dhcp-140-138 ~]# oc replace -f /tmp/hello-openshift.yaml --force=true
deploymentconfig.apps.openshift.io "hello-openshift" deleted
deploymentconfig.apps.openshift.io/hello-openshift replaced
[root@dhcp-140-138 ~]# oc get all
NAME                           READY   STATUS      RESTARTS   AGE
pod/hello-openshift-1-deploy   0/1     Completed   0          14s
pod/hello-openshift-1-qgwj4    1/1     Running     0          12s

NAME                                      DESIRED   CURRENT   READY   AGE
replicationcontroller/hello-openshift-1   1         1         1       14s

NAME                      TYPE        CLUSTER-IP     EXTERNAL-IP   PORT(S)             AGE
service/hello-openshift   ClusterIP   172.30.93.76   <none>        8080/TCP,8888/TCP   7m11s

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/hello-openshift   1          1         1         config,image(hello-openshift:latest)

NAME                                             IMAGE REPOSITORY                                                          TAGS     UPDATED
imagestream.image.openshift.io/hello-openshift   image-registry.openshift-image-registry.svc:5000/dftest/hello-openshift   latest   7 minutes ago
~~~

(In reply to zhou ying from comment #16)
> Only use --force will delete the DC, but not get any warning:

That is fine; the warning is only informational. `--force` is like a hammer: it will always pass.

Without `--force=true` the DC is just replaced, not deleted, so there is no difference between `--cascade=true` and `--cascade=false`:

~~~
[root@dhcp-140-138 ~]# oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/hello-openshift-2-8sjbk   1/1     Running   0          1m

NAME                                      DESIRED   CURRENT   READY   AGE
replicationcontroller/hello-openshift-1   0         0         0       3m
replicationcontroller/hello-openshift-2   1         1         1       1m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/hello-openshift   ClusterIP   172.30.216.222   <none>        8080/TCP,8888/TCP   3m

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/hello-openshift   2          1         1         config,image(hello-openshift:latest)

NAME                                             DOCKER REPO                                              TAGS     UPDATED
imagestream.image.openshift.io/hello-openshift   docker-registry.default.svc:5000/zhouy/hello-openshift   latest   3 minutes ago
[root@dhcp-140-138 ~]# oc get po
NAME                      READY   STATUS    RESTARTS   AGE
hello-openshift-2-8sjbk   1/1     Running   0          1m
[root@dhcp-140-138 ~]# oc replace -f /tmp/hello-openshift.yaml
deploymentconfig.apps.openshift.io/hello-openshift replaced
~~~

When `--force=true` is used, the DC is deleted and a new DC is created:

~~~
[root@dhcp-140-138 ~]# oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/hello-openshift-2-8sjbk   1/1     Running   0          4m

NAME                                      DESIRED   CURRENT   READY   AGE
replicationcontroller/hello-openshift-1   0         0         0       6m
replicationcontroller/hello-openshift-2   1         1         1       4m

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/hello-openshift   ClusterIP   172.30.216.222   <none>        8080/TCP,8888/TCP   6m

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/hello-openshift   2          1         1         config,image(hello-openshift:latest)

NAME                                             DOCKER REPO                                              TAGS     UPDATED
imagestream.image.openshift.io/hello-openshift   docker-registry.default.svc:5000/zhouy/hello-openshift   latest   6 minutes ago
[root@dhcp-140-138 ~]# oc label --dry-run -o yaml dc hello-openshift added-modification2=anyvalue-2 >/tmp/hello-openshift2.yaml
[root@dhcp-140-138 ~]# oc replace -f /tmp/hello-openshift2.yaml --force=true
deploymentconfig.apps.openshift.io "hello-openshift" deleted
deploymentconfig.apps.openshift.io/hello-openshift replaced
[root@dhcp-140-138 ~]# oc get all
NAME                          READY   STATUS    RESTARTS   AGE
pod/hello-openshift-1-jgtmr   1/1     Running   0          15s

NAME                                      DESIRED   CURRENT   READY   AGE
replicationcontroller/hello-openshift-1   1         1         1       18s

NAME                      TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
service/hello-openshift   ClusterIP   172.30.216.222   <none>        8080/TCP,8888/TCP   8m

NAME                                                 REVISION   DESIRED   CURRENT   TRIGGERED BY
deploymentconfig.apps.openshift.io/hello-openshift   1          1         1         config,image(hello-openshift:latest)

NAME                                             DOCKER REPO                                              TAGS     UPDATED
imagestream.image.openshift.io/hello-openshift   docker-registry.default.svc:5000/zhouy/hello-openshift   latest   8 minutes ago
~~~

Confirmed with oc version:

~~~
[root@dhcp-140-138 ~]# oc version
oc v3.11.232
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO
~~~

Since the problem described in this bug report should be resolved by a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:2477
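As background to the server-side cascade referenced throughout this bug (comment #1 and the adoption fix), the following is a toy model of owner-based garbage collection: the client only deletes the owner, and a later GC sweep removes dependents whose owner no longer exists. The names `DEPENDENTS`, `OWNERS`, `delete_owner`, and `gc_sweep` are invented for illustration; this is not Kubernetes source code.

```shell
#!/bin/sh
# dependent:owner pairs, standing in for RC -> DC ownerReferences
DEPENDENTS="deployment-example-1:deployment-example deployment-example-2:deployment-example"
OWNERS="deployment-example"

# The client-side step: only the owner is removed from the system.
delete_owner() {
  OWNERS=$(printf '%s\n' $OWNERS | grep -vx "$1" || true)
}

# The GC sweep: print (i.e. keep) only dependents whose owner still exists.
gc_sweep() {
  for pair in $DEPENDENTS; do
    owner=${pair#*:}
    if printf '%s\n' $OWNERS | grep -qx "$owner"; then
      printf '%s\n' "${pair%%:*}"
    fi
  done
}

gc_sweep                          # both RCs survive while their DC exists
delete_owner deployment-example
gc_sweep                          # prints nothing: both RCs are collected
```

The broken adoption mechanism fixed by this bug meant the owner linkage was not established correctly, so a sweep like this never saw the dependents as orphaned.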