Bug 1620608
| Summary: | Restoring deployment config with history leads to weird state | | |
|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Joel Rosental R. <jrosenta> |
| Component: | openshift-controller-manager | Assignee: | Maciej Szulik <maszulik> |
| openshift-controller-manager sub component: | apps | QA Contact: | zhou ying <yinzhou> |
| Status: | CLOSED ERRATA | Docs Contact: | |
| Severity: | medium | | |
| Priority: | medium | CC: | adam.kaplan, agawand, aos-bugs, dmoessne, jokerman, maszulik, mdame, mfojtik, mmccomas, pamoedom, sttts, xxia |
| Version: | 4.5 | | |
| Target Milestone: | --- | | |
| Target Release: | 4.7.0 | | |
| Hardware: | Unspecified | | |
| OS: | Linux | | |
| Whiteboard: | workloads | | |
| Fixed In Version: | | Doc Type: | No Doc Update |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2021-02-24 15:10:48 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
Description
Joel Rosental R.
2018-08-23 10:00:24 UTC
You need to strip the ownerRefs manually or with a tool before re-creating it from the dump. (Or create a separate BZ targeted at the CLI to help you do that with e.g. `oc create --strip-ownerrefs`. This is a general issue applicable to upstream Kubernetes as well.) The "import" will then be fine and the RCs get adopted. We have an issue there making you do dummy rollouts, which we are going to fix here.

Hi Joel, yes, having a tool to strip ownerReferences from object dumps is an RFE.

The issue can still be reproduced with the latest OCP 3.11:

[zhouying@dhcp-140-138 ~]$ oc version
oc v3.11.105
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://vm-10-0-77-161.hosted.upshift.rdu2.redhat.com:8443
openshift v3.11.104
kubernetes v1.11.0+d4cacc0

[zhouying@dhcp-140-138 ~]$ oc create -f bakkkk.yaml
deploymentconfig.apps.openshift.io/hello-openshift created
Error from server (Forbidden): replicationcontrollers "hello-openshift-1" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>
Error from server (Forbidden): replicationcontrollers "hello-openshift-2" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>

[zhouying@dhcp-140-138 ~]$ oc get rc
No resources found.
[zhouying@dhcp-140-138 ~]$ oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[zhouying@dhcp-140-138 ~]$ oc get po
No resources found.

And the dc can't be rolled out because no related imagestream was created:

[zhouying@dhcp-140-138 ~]$ oc rollout latest dc/hello-openshift
Error from server (BadRequest): cannot trigger a deployment for "hello-openshift" because it contains unresolved images

Hi all, any update here?

> Error from server (Forbidden): replicationcontrollers "hello-openshift-1" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched,
You need to delete the owner references manually before recreating the RCs (the UIDs would differ anyway).
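For illustration, one way to strip the ownerReferences from a dump before re-creating it could look like the following rough sketch (assumes jq is available; backup.json and backup-stripped.json are hypothetical file names):

oc get rc,dc -o json > backup.json
# remove ownerReferences from every object in the dump; they point at the old (deleted) owners' UIDs
jq 'del(.items[].metadata.ownerReferences)' backup.json > backup-stripped.json
oc create -f backup-stripped.json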
Hi Tomáš: When I delete the owner references, the create succeeds, but the DC loses the RCs:

[yinzhou@192 ~]$ oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[yinzhou@192 ~]$ oc get rc
NAME                DESIRED   CURRENT   READY   AGE
hello-openshift-1   0         0         0       30m
hello-openshift-2   1         1         1       30m

Is this by design?

[root@dhcp-140-138 oc-client]# oc get rc
NAME DESIRED CURRENT READY AGE
hello-openshift-1 0 0 0 4h
hello-openshift-2 1 1 1 4h
[root@dhcp-140-138 oc-client]# oc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
hello-openshift 0 1 0 config,image(hello-openshift:latest)
[root@dhcp-140-138 oc-client]# oc get rc hello-openshift-2 -o template --template='{{.metadata.ownerReferences}}'
[map[apiVersion:apps.openshift.io/v1 blockOwnerDeletion:true controller:true kind:DeploymentConfig name:hello-openshift uid:915b4cb8-b98d-11e9-afaa-fa163e39231b]]
When I recreate the DC and RC, the DC fails to adopt the RC.
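One way to check whether adoption happened is to compare the UID of the recreated DC with the UID recorded in the RC's ownerReference (a rough jsonpath sketch, not part of the original report's steps); if the two UIDs do not match, the RC still points at the old, deleted DC and has not been adopted:

oc get dc hello-openshift -o jsonpath='{.metadata.uid}{"\n"}'
oc get rc hello-openshift-2 -o jsonpath='{.metadata.ownerReferences[0].uid}{"\n"}'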
Created attachment 1601729 [details]
dump of the dc and rc after recreate
dc.status.latestVersion shouldn't be 0. I'll spin up a cluster and look into it.

I have just tried it by building ose v3.11.104-1+95ffd35 and the adoption worked fine for me. It almost looks like the cluster didn't have the new patch. I wonder why there is oc version v3.11.105 and openshift api v3.11.104 - not that oc would matter.

git tag --contains cfd91671c9c96552bd0c52e3bb7ccd8e86e3246f
v3.11.101-1 v3.11.102-1 v3.11.103-1 v3.11.104-1 v3.11.105-1 v3.11.106-1 v3.11.107-1 v3.11.108-1 v3.11.109-1
v3.11.110-1 v3.11.111-1 v3.11.112-1 v3.11.113-1 v3.11.114-1 v3.11.115-1 v3.11.116-1 v3.11.117-1 v3.11.118-1
v3.11.119-1 v3.11.120-1 v3.11.121-1 v3.11.122-1 v3.11.123-1 v3.11.124-1 v3.11.125-1 v3.11.126-1 v3.11.127-1
v3.11.128-1 v3.11.129-1 v3.11.130-1 v3.11.131-1 v3.11.132-1 v3.11.133-1 v3.11.134-1 v3.11.135-1 v3.11.136-1

Also there is an e2e in https://github.com/openshift/origin/blob/a3dcfc0040cd5c6b1bda6e7d0d93192a39b5d473/test/extended/deployments/deployments.go#L1549 which should hopefully cover it, if you want to look at the differences.

Can you try the approach shown in https://bugzilla.redhat.com/show_bug.cgi?id=1741133#c0 ? Otherwise I'd need to see the master controllers logs, or possibly leave the QA cluster alive so I can investigate there.

Double confirmed with:

[zhouying@dhcp-140-138 test-bugs]$ oc version
oc v3.11.136
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ci-vm-10-0-151-207.hosted.upshift.rdu2.redhat.com:8443
openshift v3.11.136
kubernetes v1.11.0+d4cacc0

When I follow the steps:
1. oc new-project test
2. oc new-app <URL>
3. oc rollout latest test
4. oc get rc (this should show two rc's)
5. oc get rc,dc -o yaml > backup.yaml
6. oc delete all --all
7. Delete the owner references from the backup yaml file
8. oc create -f backup.yaml

the result is that the dc loses adoption of the RCs:

[zhouying@dhcp-140-138 test-bugs]$ oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[zhouying@dhcp-140-138 test-bugs]$ oc get rc
NAME                DESIRED   CURRENT   READY   AGE
hello-openshift-1   0         0         0       59s
hello-openshift-2   1         1         1       59s

When I follow the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1741133#c0 , dc.status.latestVersion is still 2, not the expected 3.

3.11 is closed for non-critical fixes; this might have been fixed since then. Moving to QA to test against our current code base.

[root@dhcp-140-138 roottest]# oc version -o yaml
clientVersion:
  buildDate: "2020-05-23T15:25:26Z"
  compiler: gc
  gitCommit: 44354e2c9621e62b46d1854fd2d868f46fcdffff
  gitTreeState: clean
  gitVersion: 4.5.0-202005231517-44354e2
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
1) oc create deploymentconfig dctest --image=openshift/hello-openshift
2) oc rollout latest dc/dctest
3) oc get rc,dc -o yaml > /tmp/backup.yaml
4) oc delete all --all
5) delete the owner references from the backup yaml for the rc;
6) oc create -f /tmp/backup.yaml
replicationcontroller/dctest-1 created
replicationcontroller/dctest-2 created
deploymentconfig.apps.openshift.io/dctest created
7) [root@dhcp-140-138 roottest]# oc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
dctest 1 1 0 config
[root@dhcp-140-138 roottest]# oc describe dc/dctest
Name: dctest
Namespace: zhouydc
Created: About a minute ago
Labels: <none>
Annotations: <none>
Latest Version: 1
Selector: deployment-config.name=dctest
Replicas: 1
Triggers: Config
Strategy: Rolling
Template:
Pod Template:
Labels: deployment-config.name=dctest
Containers:
default-container:
Image: openshift/hello-openshift
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Latest Deployment: <none>
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Warning DeploymentCreationFailed 28s (x14 over 69s) deploymentconfig-controller Couldn't deploy version 1: replicationcontrollers "dctest-1" already exists
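One way to see the mismatch behind the DeploymentCreationFailed event above is to compare the DC's recorded latest version with the version annotation on the restored RCs (a rough sketch; assumes the openshift.io/deployment-config.latest-version annotation that the deployment process normally sets on its RCs):

oc get dc dctest -o jsonpath='{.status.latestVersion}{"\n"}'
oc get rc -o jsonpath='{range .items[*]}{.metadata.name}{"\t"}{.metadata.annotations.openshift\.io/deployment-config\.latest-version}{"\n"}{end}'

If the DC reports latest version 1 while dctest-1 and dctest-2 already exist, the controller keeps trying to create dctest-1 and fails with "already exists".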
The DC's .status.latestVersion is 1, not the expected 3.

Zhou Ying, thanks for the re-verification on our newest release; it looks like this needs to be investigated. I'll try to schedule some time to look into it.

Adding UpcomingSprint as I was fully occupied with bugs having higher priority.

I'm adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

This should be fixed by now in 4.7, since with k8s 1.20 we've got improved GC which matches resources by their UIDs.

Maciej Szulik:
Checked with the latest build:
[root@dhcp-140-138 ~]# oc version
Client Version: 4.7.0-202102032256.p0-c66c03f
Server Version: 4.7.0-0.nightly-2021-02-03-165316
Kubernetes Version: v1.20.0+e761892
[root@dhcp-140-138 ~]# oc get clusterversion
NAME VERSION AVAILABLE PROGRESSING SINCE STATUS
version 4.7.0-0.nightly-2021-02-03-165316 True False 83m Cluster version is 4.7.0-0.nightly-2021-02-03-165316
see the steps:
1) oc create deploymentconfig dctest --image=openshift/hello-openshift
2) oc rollout latest dc/dctest
3) oc get rc,dc -o yaml > /tmp/backup.yaml
4) oc delete all --all
5) delete the owner references from the backup yaml for the rc;
6) oc create -f /tmp/backup.yaml
replicationcontroller/dctest-1 created
replicationcontroller/dctest-2 created
deploymentconfig.apps.openshift.io/dctest created
7) [root@dhcp-140-138 ~]# oc get dc
NAME REVISION DESIRED CURRENT TRIGGERED BY
dctest 2 1 1 config
[root@dhcp-140-138 ~]# oc describe dc/dctest
Name: dctest
Namespace: zhouyt
Created: 24 seconds ago
Labels: <none>
Annotations: <none>
Latest Version: 2
Selector: deployment-config.name=dctest
Replicas: 1
Triggers: Config
Strategy: Rolling
Template:
Pod Template:
Labels: deployment-config.name=dctest
Containers:
default-container:
Image: openshift/hello-openshift
Port: <none>
Host Port: <none>
Environment: <none>
Mounts: <none>
Volumes: <none>
Deployment #2 (latest):
Name: dctest-2
Created: 26 seconds ago
Status: Complete
Replicas: 1 current / 1 desired
Selector: deployment-config.name=dctest,deployment=dctest-2,deploymentconfig=dctest
Labels: openshift.io/deployment-config.name=dctest
Pods Status: 1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Deployment #1:
Created: 26 seconds ago
Status: Complete
Replicas: 0 current / 0 desired
Events: <none>
But dc.status.latestVersion is still 2; is this expected?
Maciej Szulik: Please ignore my last question, no issue now, will move to verified status.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633