Description of problem:
`oc apply --force` will delete the resource when it fails to apply.

Version-Release number of selected component (if applicable):
v3.9.0-0.31.0

How reproducible:
Always

Steps to Reproduce:
1. User creates a resource, e.g. a DC:
$ oc run hello --image=openshift/hello-openshift -l run=hello,version=3.1

2. Once the pod is deployed and running, user saves the DC yaml:
$ oc get dc hello -o yaml > mydc.yaml
Then user edits mydc.yaml:
$ vi mydc.yaml
# under spec.template.metadata.labels, user updates '3.1' to '3.2', but forgets to update '3.1' in other places
.....
  template:
    metadata:
      labels:
        run: hello
        version: "3.2"
.....

3. $ oc apply -f mydc.yaml --overwrite=true --force=true

Actual results:
3. The output includes the error:
The DeploymentConfig "hello" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"run":"hello", "version":"3.2"}: `selector` does not match template `labels`
But `oc get dc` shows the DC no longer exists.

Expected results:
3. When the apply fails, it should give the corresponding error message, but it should not delete the resource.

Additional info:
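For context, the validation error in step 3 comes from spec.selector still carrying the old label value while spec.template.metadata.labels was updated. A rough sketch of the relevant part of mydc.yaml (the exact selector contents are inferred from the labels passed to `oc run` in step 1, so treat this as illustrative):

.....
spec:
  selector:
    run: hello
    version: "3.1"   # left at the old value, so it no longer matches the template labels below
  template:
    metadata:
      labels:
        run: hello
        version: "3.2"
.....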
This is actually how --force is supposed to work:

    --force=false: Delete and re-create the specified resource, when PATCH encounters conflict and has retried for 5 times

However, the resource is also supposed to be recreated. Is it not being recreated after it has been deleted?
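For reference, a quick way to check the delete-and-recreate half of --force (using the resource and file from comment 0; the observed outcome is what the reporter describes on v3.9.0-0.31.0):

$ oc apply -f mydc.yaml --overwrite=true --force=true
# fails with the `selector` does not match template `labels` error
$ oc get dc hello
# expected with --force: the DC exists again, re-created from mydc.yaml
# observed: the DC is gone, i.e. it was deleted but never re-created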
https://github.com/kubernetes/kubernetes/issues/58984
Alright, I think I got to the bottom of this.

Because you are saving the object by piping the entirety of the existing one into a file (from step 2)...

> 2. Once the pod is deployed and running, user saves the DC yaml:
> $ oc get dc hello -o yaml > mydc.yaml

...the resourceVersion of the object is also being saved in "mydc.yaml".

When you perform the apply command using what's stored in "mydc.yaml", you are patching a bunch of fields on the existing object, including the metadata.resourceVersion field that was also saved in step 2. Since you are sending patch bytes to the server containing a different resourceVersion than what now exists on the object in the server, you are causing a conflict error here [1]. Normally this would just result in the command failing with no side effects. However, the apply command sees this conflict error as its cue to keep retrying the patch five more times [2]. When this fails, it sees that the error did not end up as nil after those 5 attempts, and goes on to delete the object and create a new one from the contents of "mydc.yaml" [3]. It is then that the object is deleted but not recreated: aside from the conflict error caused by leaving in the "resourceVersion" field, the change you made to "mydc.yaml" was also invalid, so the create of the replacement object fails as well.

Normally the invalid change would be caught here [1] when sending the patch bytes to the server. But because the object also had a different "resourceVersion" than expected (this field should not have been present in "mydc.yaml" in the first place), the conflict error was returned instead, causing us to end up back at [3].

There is a second problem: even if you removed the "resourceVersion" from "mydc.yaml", you would be left with a non-conflict error. But because the error's kind is not checked before deleting and creating the new object [4], you would end up in a similar situation to the scenario discussed above.

I have opened a patch that attempts to address both of these situations [5]. If `--force` is provided, its effects only take place if the final error returned by the server after trying to patch is actually a conflict error, and not just any error. Additionally, if a user accidentally leaves in a resourceVersion field (or decides to include it for whatever reason), causing an actual conflict error AND the provided patch also ends up being invalid, [5] would attempt to restore the original object when the new one inevitably fails to be created.

1. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L660
2. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L667
3. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L678
4. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L677
5. https://github.com/openshift/origin/pull/18337
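Independent of the fix in [5], the user-side workaround implied by the analysis above is to strip the server-populated fields before applying and to avoid --force for a plain update. A minimal sketch, assuming the file from comment 0 (the exact set of fields to remove is an assumption about what `oc get -o yaml` dumps):

$ vi mydc.yaml
# delete server-populated fields such as:
#   metadata.resourceVersion
#   metadata.uid
#   metadata.creationTimestamp
#   metadata.selfLink
#   the whole status: section
$ oc apply -f mydc.yaml
# with resourceVersion removed and --force omitted, an invalid change now fails
# with the validation error only, and nothing gets deleted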
Dan, I'm wondering if it is worth keeping the part of the PR [1] in place that accounts for when a resource is both invalid and causes a conflict error. Should we expect users not to include a resourceVersion field? Also, could such a case be hit in any other way?

1. https://github.com/openshift/origin/pull/18337/files#diff-cb112dedd27138a42778ba4c699ff68fR702
(In reply to Juan Vallejo from comment #1)
> This is actually how --force is supposed to work:
>
>     --force=false: Delete and re-create the specified resource, when PATCH encounters conflict and has retried for 5 times
>
> However, the resource is also supposed to be recreated. Is it not being recreated after it has been deleted?

Yeah, I checked the help info of '--force' when reporting and found the resource is not recreated after deletion, which is why I reported it :)
Upstream PR [1] has merged. Waiting on Origin PR [2].

1. https://github.com/kubernetes/kubernetes/pull/58991/files
2. https://github.com/openshift/origin/pull/18337
Origin PR has merged. Moving to ON_QA
The PR just merged. Will verify when a new OCP 3.9 build is available.
Tested v3.9.0-0.42.0 following the comment 0 steps. Step 3 gives the same error message as before, but now it does not delete the resource. Issue is fixed.
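For the record, verification amounts to re-running the comment 0 scenario and checking that the DC survives the failed apply (resource names as in comment 0):

$ oc apply -f mydc.yaml --overwrite=true --force=true
# still rejected: `selector` does not match template `labels`
$ oc get dc hello
# on v3.9.0-0.42.0 the DC is still listed, so the failed apply no longer deletes it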