Bug 1620608 - Restoring deployment config with history leads to weird state
Summary: Restoring deployment config with history leads to weird state
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: openshift-controller-manager
Version: 4.5
Hardware: Unspecified
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 4.7.0
Assignee: Maciej Szulik
QA Contact: zhou ying
URL:
Whiteboard: workloads
Depends On:
Blocks:
 
Reported: 2018-08-23 10:00 UTC by Joel Rosental R.
Modified: 2023-09-07 19:19 UTC
CC List: 12 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-02-24 15:10:48 UTC
Target Upstream Version:
Embargoed:


Attachments
dump the dc and rc after recreate (11.94 KB, text/plain), 2019-08-08 08:29 UTC, zhou ying


Links:
Red Hat Product Errata RHSA-2020:5633 (last updated 2021-02-24 15:11:52 UTC)

Description Joel Rosental R. 2018-08-23 10:00:24 UTC
Description of problem:
When backing up a DeploymentConfig with more than one revision within a project (e.g. `oc get dc,rc -o yaml > backup.yaml`) and then trying to restore it, two problems occur:

1. The RCs are not created, because they contain an ownerReferences section that refers to the original (now non-existent) DeploymentConfig, whose UID does not match the newly created DeploymentConfig object.

2. If the latest revision was X, where X > 1 (e.g. 3), and the ownerReferences field is removed from the RC object definitions, the RC objects are created (they no longer refer to a non-existent DC), but the DC's revision is set to 1 instead of X (3 in this case). That means that when `oc rollout latest <dc>` is executed, it reports a successful rollout but nothing happens (only the DC revision is bumped) until the command has been called three times; the fourth invocation actually triggers a new rollout.
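
For illustration, a quick way to observe both symptoms after a restore (a sketch only; <dc-name> and <rc-name> are placeholders, and the commands just read standard DC/RC fields in the same template style used in the comments below):

# The stale UID recorded in a dumped RC's ownerReferences vs. the UID of the newly created DC:
oc get rc <rc-name> -o template --template='{{.metadata.ownerReferences}}'
oc get dc <dc-name> -o template --template='{{.metadata.uid}}'

# The revision the DC believes is latest after the restore (reset to 1 instead of X):
oc get dc <dc-name> -o template --template='{{.status.latestVersion}}'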


Version-Release number of selected component (if applicable):
It has been reproduced in 3.7 and 3.11

How reproducible:
Always

Steps to Reproduce:
1. oc new-project test
2. oc new-app <URL>
3. oc rollout latest test
4. oc get rc (this should show two rc's)
5. oc get rc,dc -o yaml > backup.yaml
6. oc delete all --all
7. oc create -f backup.yaml

Actual results:
First, the ReplicationController objects are not created, because they contain an ownerReferences field that refers, by UID, to a non-existent DC object. Second, if this field is removed from the ReplicationController object definitions so that the RCs are created, the DeploymentConfig's revision number is set to 1 instead of X (where X was the latest revision when the dump was taken).

Expected results:
It should be possible to restore from a dump file (e.g. backup.yaml) and have all objects created with the proper revision numbers in place.

Additional info:

Upstream issue: https://github.com/openshift/origin/issues/20729

Comment 1 Tomáš Nožička 2018-08-23 10:40:57 UTC
You need to strip the ownerRefs manually, or with a tool, before re-creating the objects from the dump. (Or create a separate BZ targeted at the CLI to help you do that, with e.g. `oc create --strip-ownerrefs`. This is a general issue applicable to upstream Kubernetes as well.)

The "import" will then be fine and the RCs get adopted. We have an issue there that makes you do dummy rollouts, which we are going to fix here.
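
For example, one way to strip the ownerReferences before re-creating the objects (a minimal sketch, not part of the original report; it assumes jq is available and that the dump is taken as JSON rather than YAML):

# Dump the objects without their ownerReferences, then restore:
oc get rc,dc -o json | jq 'del(.items[].metadata.ownerReferences)' > backup.json
oc create -f backup.json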

Comment 3 Tomáš Nožička 2018-09-24 11:12:00 UTC
Hi Joel,

yes, having a tool to strip down ownerReferences from object dumps is an RFE.

Comment 8 Tomáš Nožička 2019-03-15 16:05:53 UTC
https://github.com/openshift/origin/pull/22324

Comment 10 zhou ying 2019-04-12 07:09:16 UTC
The issue can still be reproduced on the latest OCP 3.11:
[zhouying@dhcp-140-138 ~]$ oc version 
oc v3.11.105
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://vm-10-0-77-161.hosted.upshift.rdu2.redhat.com:8443
openshift v3.11.104
kubernetes v1.11.0+d4cacc0


[zhouying@dhcp-140-138 ~]$ oc create -f bakkkk.yaml 
deploymentconfig.apps.openshift.io/hello-openshift created
Error from server (Forbidden): replicationcontrollers "hello-openshift-1" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>
Error from server (Forbidden): replicationcontrollers "hello-openshift-2" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, <nil>
[zhouying@dhcp-140-138 ~]$ oc get rc
No resources found.
[zhouying@dhcp-140-138 ~]$ oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[zhouying@dhcp-140-138 ~]$ oc get po 
No resources found.


And the DC can't be rolled out, because no related imagestream was created.
[zhouying@dhcp-140-138 ~]$ oc rollout latest dc/hello-openshift
Error from server (BadRequest): cannot trigger a deployment for "hello-openshift" because it contains unresolved images

Comment 11 Pedro Amoedo 2019-08-01 09:07:37 UTC
Hi all, any update here?

Comment 12 Tomáš Nožička 2019-08-06 09:12:36 UTC
> Error from server (Forbidden): replicationcontrollers "hello-openshift-1" is forbidden: cannot set blockOwnerDeletion if an ownerReference refers to a resource you can't set finalizers on: no RBAC policy matched, 

You need to delete the owner references manually before recreating the RCs (the UIDs would differ anyway).

Comment 13 zhou ying 2019-08-07 10:04:26 UTC
Hi Tomáš :

  When I delete the owner references, creation succeeds, but the DC loses the RCs:
[yinzhou@192 ~]$  oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[yinzhou@192 ~]$ oc get rc
NAME                DESIRED   CURRENT   READY     AGE
hello-openshift-1   0         0         0         30m
hello-openshift-2   1         1         1         30m


Is this by design?

Comment 14 zhou ying 2019-08-08 07:59:04 UTC
[root@dhcp-140-138 oc-client]# oc get rc
NAME                DESIRED   CURRENT   READY   AGE
hello-openshift-1   0         0         0       4h
hello-openshift-2   1         1         1       4h
[root@dhcp-140-138 oc-client]# oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)

[root@dhcp-140-138 oc-client]# oc get rc hello-openshift-2  -o template --template='{{.metadata.ownerReferences}}'
[map[apiVersion:apps.openshift.io/v1 blockOwnerDeletion:true controller:true kind:DeploymentConfig name:hello-openshift uid:915b4cb8-b98d-11e9-afaa-fa163e39231b]]

When I recreate the DC and RCs, the DC fails to adopt the RCs.

Comment 15 zhou ying 2019-08-08 08:29:58 UTC
Created attachment 1601729 [details]
dump the dc and rc after recreate

Comment 16 Tomáš Nožička 2019-08-08 08:40:03 UTC
dc.status.latestVersion shouldn't be 0

I'll spin up a cluster and look into it
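
One way to check this from the QA side (a sketch, assuming the DC controller stamps the usual openshift.io/deployment-config.latest-version annotation on its RCs; resource names follow the reproduction above):

# What the DC thinks the latest version is:
oc get dc hello-openshift -o template --template='{{.status.latestVersion}}'

# What the newest restored RC was actually created for:
oc get rc hello-openshift-2 -o template --template='{{index .metadata.annotations "openshift.io/deployment-config.latest-version"}}'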

Comment 17 Tomáš Nožička 2019-08-14 14:35:58 UTC
I have just tried it by building ose v3.11.104-1+95ffd35 and the adoption worked fine for me. It almost looks like the cluster didn't have the new patch.

I wonder why oc reports v3.11.105 while the OpenShift API is at v3.11.104 - not that oc would matter here.

git tag --contains cfd91671c9c96552bd0c52e3bb7ccd8e86e3246f
v3.11.101-1
v3.11.102-1
v3.11.103-1
v3.11.104-1
v3.11.105-1
v3.11.106-1
v3.11.107-1
v3.11.108-1
v3.11.109-1
v3.11.110-1
v3.11.111-1
v3.11.112-1
v3.11.113-1
v3.11.114-1
v3.11.115-1
v3.11.116-1
v3.11.117-1
v3.11.118-1
v3.11.119-1
v3.11.120-1
v3.11.121-1
v3.11.122-1
v3.11.123-1
v3.11.124-1
v3.11.125-1
v3.11.126-1
v3.11.127-1
v3.11.128-1
v3.11.129-1
v3.11.130-1
v3.11.131-1
v3.11.132-1
v3.11.133-1
v3.11.134-1
v3.11.135-1
v3.11.136-1


Also there is an e2e in 
  https://github.com/openshift/origin/blob/a3dcfc0040cd5c6b1bda6e7d0d93192a39b5d473/test/extended/deployments/deployments.go#L1549
which should hopefully cover it if you want to look at differences.


Can you try the approach shown in https://bugzilla.redhat.com/show_bug.cgi?id=1741133#c0 ?

Otherwise I'd need to see the master controller logs, or possibly have the QA cluster left alive so I can investigate there.

Comment 18 zhou ying 2019-08-16 08:21:08 UTC
Double confirmed with:
[zhouying@dhcp-140-138 test-bugs]$ oc version
oc v3.11.136
kubernetes v1.11.0+d4cacc0
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://ci-vm-10-0-151-207.hosted.upshift.rdu2.redhat.com:8443
openshift v3.11.136
kubernetes v1.11.0+d4cacc0


When I follow the steps:
1. oc new-project test
2. oc new-app <URL>
3. oc rollout latest test
4. oc get rc (this should show two rc's)
5. oc get rc,dc -o yaml > backup.yaml
6. oc delete all --all
7. Delete the owner references from the backup yaml file
8. oc create -f backup.yaml

Then the result is that the DC loses adoption of the RCs:
[zhouying@dhcp-140-138 test-bugs]$ oc get dc
NAME              REVISION   DESIRED   CURRENT   TRIGGERED BY
hello-openshift   0          1         0         config,image(hello-openshift:latest)
[zhouying@dhcp-140-138 test-bugs]$ oc get rc
NAME                DESIRED   CURRENT   READY     AGE
hello-openshift-1   0         0         0         59s
hello-openshift-2   1         1         1         59s


When I follow the steps from https://bugzilla.redhat.com/show_bug.cgi?id=1741133#c0 , dc.status.latestVersion is still 2, not the expected 3.

Comment 20 Tomáš Nožička 2020-05-20 09:02:14 UTC
3.11 is closed for non-critical fixes, and this might have been fixed since then. Moving to QA to test against our current code base.

Comment 23 zhou ying 2020-05-25 07:31:42 UTC
[root@dhcp-140-138 roottest]#  oc version -o yaml
clientVersion:
  buildDate: "2020-05-23T15:25:26Z"
  compiler: gc
  gitCommit: 44354e2c9621e62b46d1854fd2d868f46fcdffff
  gitTreeState: clean
  gitVersion: 4.5.0-202005231517-44354e2
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64


1) oc create deploymentconfig dctest  --image=openshift/hello-openshift
2) oc rollout latest dc/dctest 
3) oc get rc,dc -o yaml > /tmp/backup.yaml
4) oc delete all --all
5) delete the owner references from the backup yaml for the rc;
6) oc create -f /tmp/backup.yaml 
replicationcontroller/dctest-1 created
replicationcontroller/dctest-2 created
deploymentconfig.apps.openshift.io/dctest created

7) [root@dhcp-140-138 roottest]# oc get dc
NAME     REVISION   DESIRED   CURRENT   TRIGGERED BY
dctest   1          1         0         config
[root@dhcp-140-138 roottest]# oc describe dc/dctest 
Name:		dctest
Namespace:	zhouydc
Created:	About a minute ago
Labels:		<none>
Annotations:	<none>
Latest Version:	1
Selector:	deployment-config.name=dctest
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	deployment-config.name=dctest
  Containers:
   default-container:
    Image:		openshift/hello-openshift
    Port:		<none>
    Host Port:		<none>
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Latest Deployment:	<none>

Events:
  Type		Reason				Age			From				Message
  ----		------				----			----				-------
  Warning	DeploymentCreationFailed	28s (x14 over 69s)	deploymentconfig-controller	Couldn't deploy version 1: replicationcontrollers "dctest-1" already exists

Comment 24 zhou ying 2020-05-25 07:33:20 UTC
The DC's .status.latestVersion is 1, not the expected 3.

Comment 25 Tomáš Nožička 2020-05-25 13:35:03 UTC
Zhou Ying, thanks for the re-verification on our newest release; it looks like this needs to be investigated.

I'll try to schedule some time to look into it. Adding UpcomingSprint as I was fully occupied with bugs having higher priority.

Comment 26 Tomáš Nožička 2020-06-18 09:10:30 UTC
I’m adding UpcomingSprint, because I was occupied by fixing bugs with higher priority/severity, developing new features with higher priority, or developing new features to improve stability at a macro level. I will revisit this bug next sprint.

Comment 31 Maciej Szulik 2021-02-02 17:31:41 UTC
This should be fixed by now in 4.7, since with Kubernetes 1.20 we have an improved GC which matches resources by their UIDs.
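
A quick way to verify the adoption after a restore (a sketch; resource names follow the reproduction in the next comment, and the commands only read standard metadata fields): the re-created RCs should end up with an ownerReference whose uid matches the newly created DC.

# UID of the freshly created DC:
oc get dc dctest -o template --template='{{.metadata.uid}}'

# ownerReferences on an adopted RC; the uid here should match the DC above:
oc get rc dctest-2 -o template --template='{{.metadata.ownerReferences}}'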

Comment 33 zhou ying 2021-02-04 02:59:45 UTC
Maciej Szulik:

Checked with the latest build:
[root@dhcp-140-138 ~]# oc version 
Client Version: 4.7.0-202102032256.p0-c66c03f
Server Version: 4.7.0-0.nightly-2021-02-03-165316
Kubernetes Version: v1.20.0+e761892
[root@dhcp-140-138 ~]# oc get clusterversion 
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-02-03-165316   True        False         83m     Cluster version is 4.7.0-0.nightly-2021-02-03-165316

see the steps:
1) oc create deploymentconfig dctest  --image=openshift/hello-openshift
2) oc rollout latest dc/dctest 
3) oc get rc,dc -o yaml > /tmp/backup.yaml
4) oc delete all --all
5) delete the owner references from the backup yaml for the rc;
6) oc create -f /tmp/backup.yaml 
replicationcontroller/dctest-1 created
replicationcontroller/dctest-2 created
deploymentconfig.apps.openshift.io/dctest created

7) [root@dhcp-140-138 ~]# oc get dc
NAME     REVISION   DESIRED   CURRENT   TRIGGERED BY
dctest   2          1         1         config
[root@dhcp-140-138 ~]# oc describe dc/dctest 
Name:		dctest
Namespace:	zhouyt
Created:	24 seconds ago
Labels:		<none>
Annotations:	<none>
Latest Version:	2
Selector:	deployment-config.name=dctest
Replicas:	1
Triggers:	Config
Strategy:	Rolling
Template:
Pod Template:
  Labels:	deployment-config.name=dctest
  Containers:
   default-container:
    Image:		openshift/hello-openshift
    Port:		<none>
    Host Port:		<none>
    Environment:	<none>
    Mounts:		<none>
  Volumes:		<none>

Deployment #2 (latest):
	Name:		dctest-2
	Created:	26 seconds ago
	Status:		Complete
	Replicas:	1 current / 1 desired
	Selector:	deployment-config.name=dctest,deployment=dctest-2,deploymentconfig=dctest
	Labels:		openshift.io/deployment-config.name=dctest
	Pods Status:	1 Running / 0 Waiting / 0 Succeeded / 0 Failed
Deployment #1:
	Created:	26 seconds ago
	Status:		Complete
	Replicas:	0 current / 0 desired

Events:	<none>

But dc.status.latestVersion is still 2; is this expected?

Comment 34 zhou ying 2021-02-04 03:11:35 UTC
Maciej Szulik:

Please ignore my last question, no issue now; I will move this to verified status.

Comment 39 errata-xmlrpc 2021-02-24 15:10:48 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.7.0 security, bug fix, and enhancement update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2020:5633

