Bug 1539529 - `oc apply --force` will delete resource when failing to apply
Summary: `oc apply --force` will delete resource when failing to apply
Keywords:
Status: CLOSED CURRENTRELEASE
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: oc
Version: 3.9.0
Hardware: Unspecified
OS: Unspecified
Priority: medium
Severity: medium
Target Milestone: ---
Target Release: 3.9.0
Assignee: Juan Vallejo
QA Contact: Xingxing Xia
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2018-01-29 06:36 UTC by Xingxing Xia
Modified: 2018-06-18 18:17 UTC
CC List: 8 users

Fixed In Version: v3.9.0-0.42.0
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-06-18 18:17:45 UTC
Target Upstream Version:
Embargoed:
Flags: xxia: needinfo-




Links
Red Hat Product Errata RHSA-2018:2013 (SHIPPED_LIVE) - Important: OpenShift Container Platform 3.9 security, bug fix, and enhancement update - 2018-06-27 22:01:43 UTC

Description Xingxing Xia 2018-01-29 06:36:15 UTC
Description of problem:
`oc apply --force` will delete resource when failing to apply

Version-Release number of selected component (if applicable):
v3.9.0-0.31.0

How reproducible:
Always

Steps to Reproduce:
1. User creates resource, e.g. DC
$ oc run hello --image=openshift/hello-openshift -l run=hello,version=3.1

2. When the pod is deployed and running, the user gets the dc yaml
$ oc get dc hello -o yaml > mydc.yaml

Then user updates mydc.yaml
$ vi mydc.yaml # under spec.template.metadata.labels, user updates '3.1' to '3.2', but forgets to update '3.1' elsewhere (e.g. in spec.selector)
.....
  template:
    metadata:
      labels:
        run: hello
        version: "3.2"
.....

3. $ oc apply -f mydc.yaml --overwrite=true --force=true

Actual results:
3. The output includes this error:
The DeploymentConfig "hello" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"run":"hello", "version":"3.2"}: `selector` does not match template `labels`

But `oc get dc` shows that the DC no longer exists

Expected results:
3. When the apply fails, it may print the corresponding error, but it should not delete the resource

Additional info:

Comment 1 Juan Vallejo 2018-01-29 19:11:01 UTC
This is actually how --force is supposed to work:

--force=false: Delete and re-create the specified resource, when PATCH encounters conflict and has retried for 5 times

However, the resource is also supposed to be recreated. Is it not being recreated after it has been deleted?

Comment 3 Juan Vallejo 2018-01-29 22:21:14 UTC
Alright I think I got to the bottom of this.

Because you are saving the object by piping the entirety of the existing one into a file (from step 2)...

> 2. When the pod is deployed and running, the user gets the dc yaml
> $ oc get dc hello -o yaml > mydc.yaml

...the resourceVersion of the object is also being saved in "mydc.yaml".

When you perform the apply command, using what's stored in "mydc.yaml", you are patching a bunch of fields on the existing object, including the metadata.resourceVersion field which was also saved from step 2.

Since you are sending patch bytes to the server containing a different resourceVersion than what now exists on the object in the server, you are causing a conflict error here [1].
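
To make the conflict concrete, here is a minimal, self-contained Go illustration (hypothetical values; this is not the actual apply code) of the patch body that ends up on the wire. Because the stale resourceVersion from "mydc.yaml" rides along in the patch, the server treats it as an optimistic-concurrency precondition and answers with a Conflict before it ever validates the labels:

package main

import (
	"encoding/json"
	"fmt"
)

func main() {
	// Hypothetical patch derived from "mydc.yaml"; "12345" is a made-up
	// stale resourceVersion captured at `oc get` time.
	patch := map[string]interface{}{
		"metadata": map[string]interface{}{
			"resourceVersion": "12345",
		},
		"spec": map[string]interface{}{
			"template": map[string]interface{}{
				"metadata": map[string]interface{}{
					"labels": map[string]string{"run": "hello", "version": "3.2"},
				},
			},
		},
	}
	b, _ := json.Marshal(patch)
	fmt.Printf("PATCH body: %s\n", b)
	// If the live object's resourceVersion is no longer "12345", the API
	// server returns a Conflict error, masking the labels/selector
	// validation error the user actually needs to see.
}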

Normally this would just result in the command failing with no side-effects.
However, the apply command sees this conflict error as its cue to keep retrying the patch five more times [2].

When this fails, it sees that the error is still non-nil after those 5 attempts, and goes on to delete the object and create a new one from the contents of "mydc.yaml" [3].
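
Paraphrased as a rough Go sketch (stand-in function names of my own, not the verbatim apply.go code), the pre-fix control flow looks like this:

package applysketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// Pre-fix flow: retry the patch while the error is a conflict; then, if
// ANY error remains, fall through to delete-and-create.
func forceApply(patchFn, deleteFn, createFn func() error) error {
	const maxRetries = 5
	err := patchFn() // initial patch attempt at [1]
	for i := 0; i < maxRetries && apierrors.IsConflict(err); i++ {
		err = patchFn() // retry loop at [2]
	}
	if err != nil { // [4]: the error's kind is NOT re-checked here
		if delErr := deleteFn(); delErr != nil {
			return delErr
		}
		// [3]: create from the file. If the manifest is also invalid,
		// this fails and the original object is already gone; nothing
		// restores it.
		return createFn()
	}
	return nil
}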

It is at that point that the object is deleted, but not recreated: aside from the conflict error caused by leaving the "resourceVersion" field in, the change you made to "mydc.yaml" was also invalid, so creating the new object failed as well.

Normally the invalid change would have been caught at [1] when the patch bytes were sent to the server. But because the object also had a different "resourceVersion" than expected (this field should not have been present in "mydc.yaml" in the first place), the conflict error was returned instead, causing us to end up back at [3].

There is a second bug here: if you were to remove the "resourceVersion" from "mydc.yaml", you would be left with a non-conflict (validation) error. But because the error's kind is not checked before deleting and creating the new object [4], you would end up in a situation similar to the scenario discussed above.

I have opened a patch that attempts to address both of these situations [5].
If `--force` is provided, its effects only take place if the final error returned by the server after trying to patch is actually a conflict error, and not just any error.
Additionally, if a user accidentally leaves in a resourceVersion field (or decides to include it for whatever reason), causing a genuine conflict error, AND the provided patch also turns out to be invalid, [5] attempts to restore the original object when the new one inevitably fails to be created.
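
For illustration, the intended post-fix flow would look roughly like this (again stand-in names and a sketch only; see [5] for the actual change):

package applysketch

import (
	apierrors "k8s.io/apimachinery/pkg/api/errors"
)

// Post-fix flow: the force path requires a genuine conflict, and the saved
// original is restored if creating the new object fails.
func forceApplyFixed(patchFn, deleteFn, createFn, restoreOriginalFn func() error) error {
	const maxRetries = 5
	err := patchFn()
	for i := 0; i < maxRetries && apierrors.IsConflict(err); i++ {
		err = patchFn()
	}
	if err == nil {
		return nil
	}
	if !apierrors.IsConflict(err) {
		return err // e.g. a validation error: fail without deleting
	}
	if delErr := deleteFn(); delErr != nil {
		return delErr
	}
	if createErr := createFn(); createErr != nil {
		// The new manifest was invalid; try to put the original back.
		if restoreErr := restoreOriginalFn(); restoreErr != nil {
			return restoreErr
		}
		return createErr
	}
	return nil
}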

1. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L660

2. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L667

3. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L678

4. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L677

5. https://github.com/openshift/origin/pull/18337

Comment 4 Juan Vallejo 2018-01-29 22:25:53 UTC
Dan, I'm wondering if it is worth keeping the part of the PR [1] in place that accounts for when a resource is both invalid and causes a conflict error. Should we expect users to not include a resourceVersion field? Also could such a case be hit in any other way?

1. https://github.com/openshift/origin/pull/18337/files#diff-cb112dedd27138a42778ba4c699ff68fR702

Comment 5 Xingxing Xia 2018-01-30 01:57:06 UTC
(In reply to Juan Vallejo from comment #1)
> This is actually how --force is supposed to work:
> 
> --force=false: Delete and re-create the specified resource, when PATCH
> encounters conflict and
> has retried for 5 times
> 
> However the resource is supposed to also be recreated. Is it not being
> recreated after it has been deleted?
Yeah, I checked the help info for '--force' when reporting, and found the resource is not recreated after deletion, hence the report :)

Comment 6 Juan Vallejo 2018-02-06 19:07:37 UTC
Upstream PR [1] has merged.
Waiting on Origin PR [2].

1. https://github.com/kubernetes/kubernetes/pull/58991/files
2. https://github.com/openshift/origin/pull/18337

Comment 8 Juan Vallejo 2018-02-08 19:51:42 UTC
Origin PR has merged. Moving to ON_QA

Comment 9 Xingxing Xia 2018-02-09 08:04:34 UTC
The PR just merged. Will verify when the next OCP 3.9 build is available.

Comment 10 Xingxing Xia 2018-02-11 03:35:42 UTC
Tested v3.9.0-0.42.0 per the comment 0 steps. Step 3 gives the error pasted above, but now does not delete the resource. The issue is fixed.

