Bug 1539529

Summary: `oc apply --force` will delete resource when failing to apply
Product: OpenShift Container Platform
Component: oc
Version: 3.9.0
Target Release: 3.9.0
Target Milestone: ---
Hardware: Unspecified
OS: Unspecified
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: medium
Reporter: Xingxing Xia <xxia>
Assignee: Juan Vallejo <jvallejo>
QA Contact: Xingxing Xia <xxia>
CC: aos-bugs, dmace, hasha, jokerman, mfojtik, mmccomas, smunilla, xxia
Flags: xxia: needinfo-
Fixed In Version: v3.9.0-0.42.0
Doc Type: If docs needed, set a value
Type: Bug
Last Closed: 2018-06-18 18:17:45 UTC

Description Xingxing Xia 2018-01-29 06:36:15 UTC
Description of problem:
`oc apply --force` will delete resource when failing to apply

Version-Release number of selected component (if applicable):
v3.9.0-0.31.0

How reproducible:
Always

Steps to Reproduce:
1. User creates resource, e.g. DC
$ oc run hello --image=openshift/hello-openshift -l run=hello,version=3.1

2. When the pod is deployed and running, user gets the DC YAML
$ oc get dc hello -o yaml > mydc.yaml

Then user updates mydc.yaml
$ vi mydc.yaml # under spec.template.metadata.labels, user updates '3.1' to '3.2', but forgets to update the matching '3.1' elsewhere (e.g. in spec.selector)
.....
  template:
    metadata:
      labels:
        run: hello
        version: "3.2"
.....

3. $ oc apply -f mydc.yaml --overwrite=true  --force=true

Actual results:
3. The output includes this error:
The DeploymentConfig "hello" is invalid: spec.template.metadata.labels: Invalid value: map[string]string{"run":"hello", "version":"3.2"}: `selector` does not match template `labels`

But `oc get dc` shows the DC no longer exists

Expected results:
3. When the apply fails, the command may print a corresponding error, but it should not delete the resource

Additional info:

Comment 1 Juan Vallejo 2018-01-29 19:11:01 UTC
This is actually how --force is supposed to work:

--force=false: Delete and re-create the specified resource, when PATCH encounters conflict and
has retried for 5 times

However, the resource is supposed to be re-created as well. Is it not being re-created after it has been deleted?

Comment 3 Juan Vallejo 2018-01-29 22:21:14 UTC
Alright I think I got to the bottom of this.

Because you are saving the object by redirecting the entirety of the existing one into a file (from step 2)...

> 2. When the pod is deployed and running, user gets the DC YAML
> $ oc get dc hello -o yaml > mydc.yaml

...The resource version of the object is also being saved in "mydc.yaml".

When you perform the apply command, using what's stored in "mydc.yaml", you are patching a bunch of fields on the existing object, including the metadata.resourceVersion field which was also saved from step 2.

Since you are sending patch bytes to the server containing a different resourceVersion than what now exists on the object in the server, you are causing a conflict error here [1].
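
To make the conflict mechanics concrete, here is a minimal, standalone Go sketch of the optimistic-concurrency rule the server applies; the function and error names here are illustrative, not the actual apiserver code:

package main

import (
	"errors"
	"fmt"
)

// errConflict stands in for the Conflict error the real apiserver returns;
// the name and type here are illustrative only.
var errConflict = errors.New("conflict: resourceVersion mismatch")

// checkResourceVersion sketches the optimistic-concurrency rule: if the
// client's patch carries a resourceVersion and it differs from the version
// currently stored on the server, the update is rejected with a conflict.
func checkResourceVersion(patchRV, storedRV string) error {
	if patchRV != "" && patchRV != storedRV {
		return errConflict
	}
	return nil
}

func main() {
	// mydc.yaml still carries the resourceVersion saved in step 2 ("1"),
	// but the DC on the server has since moved on (to "7", say).
	if err := checkResourceVersion("1", "7"); err != nil {
		fmt.Println("patch rejected:", err)
	}
}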

Normally this would just result in the command failing with no side-effects.
However, the apply command sees this conflict error as its cue to keep retrying the patch five more times [2].

When this fails, it sees that the error is still non-nil after those 5 attempts, and goes on to delete the object and create a new one from the contents of "mydc.yaml" [3].

At that point the object is deleted, but not re-created: aside from the conflict error caused by leaving the "resourceVersion" field in place, the change you made to "mydc.yaml" was itself invalid.

So normally the invalid change would be caught here [1] when sending the patch bytes to the server. But because the patch also carried a different "resourceVersion" than the server expected (this field should not have been present in "mydc.yaml" in the first place), the conflict error was returned instead, causing us to end up back at [3].

There is a second bug here: even if you removed the "resourceVersion" from "mydc.yaml", you would be left with a non-conflict error. But because the error's kind is not checked before deleting and re-creating the object [4], you would end up in a similar situation to the scenario discussed above, as sketched below.
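
Putting both problems together, the pre-fix control flow looks roughly like the following standalone Go sketch. The function names are illustrative, not the actual kubectl code, and the bracketed numbers map to the code references listed at the end of this comment:

package main

import (
	"errors"
	"fmt"
)

var (
	errConflict = errors.New("conflict")       // stale resourceVersion
	errInvalid  = errors.New("invalid object") // selector does not match labels
)

// buggyForceApply sketches the pre-fix flow: retry the patch on conflict,
// then delete and re-create on *any* remaining error, conflict or not.
func buggyForceApply(patch, del, create func() error) error {
	err := patch() // [1] initial patch attempt
	for i := 0; i < 5 && errors.Is(err, errConflict); i++ {
		err = patch() // [2] retry the patch five more times on conflict
	}
	if err == nil {
		return nil
	}
	// [4] BUG: err's kind is never checked here; any error reaches [3].
	if delErr := del(); delErr != nil { // [3] delete the live object...
		return delErr
	}
	return create() // ...and re-create it; if this fails, the object is gone
}

func main() {
	err := buggyForceApply(
		func() error { return errConflict }, // every patch hits the stale-resourceVersion conflict
		func() error { fmt.Println("deleted dc/hello"); return nil },
		func() error { return errInvalid }, // re-create rejected: invalid template labels
	)
	fmt.Println("apply fails with:", err, "- and dc/hello no longer exists")
}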

I have opened a patch that attempts to address both of these situations [5].
If `--force` is provided, its effects only take place if the final error returned by the server after trying to patch is actually a conflict error, and not just any error.
Additionally, if a user accidentally leaves in a resourceVersion field (or includes it deliberately), causing an actual conflict error, AND the provided patch also turns out to be invalid, [5] attempts to restore the original object when the new one inevitably fails to be created.
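
Under the same assumptions as the sketch above (standalone and illustrative, not the actual patch), the behavior [5] aims for looks roughly like this:

package main

import (
	"errors"
	"fmt"
)

var (
	errConflict = errors.New("conflict")
	errInvalid  = errors.New("invalid object")
)

// fixedForceApply sketches the intended flow: --force only deletes on a
// genuine conflict error, and if re-creation then fails, the original
// object is restored instead of being lost.
func fixedForceApply(patch, del, create, restoreOriginal func() error) error {
	err := patch()
	for i := 0; i < 5 && errors.Is(err, errConflict); i++ {
		err = patch()
	}
	if err == nil {
		return nil
	}
	if !errors.Is(err, errConflict) {
		return err // non-conflict errors no longer trigger delete-and-re-create
	}
	if delErr := del(); delErr != nil {
		return delErr
	}
	if createErr := create(); createErr != nil {
		// The new object is invalid; put the original back rather than
		// leaving the resource deleted.
		if restoreErr := restoreOriginal(); restoreErr != nil {
			return restoreErr
		}
		return createErr
	}
	return nil
}

func main() {
	err := fixedForceApply(
		func() error { return errConflict },
		func() error { fmt.Println("deleted dc/hello"); return nil },
		func() error { return errInvalid },
		func() error { fmt.Println("restored original dc/hello"); return nil },
	)
	fmt.Println("apply still fails with:", err, "- but dc/hello survives")
}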

1. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L660

2. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L667

3. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L678

4. https://github.com/openshift/origin/blob/master/vendor/k8s.io/kubernetes/pkg/kubectl/cmd/apply.go#L677

5. https://github.com/openshift/origin/pull/18337

Comment 4 Juan Vallejo 2018-01-29 22:25:53 UTC
Dan, I'm wondering if it is worth keeping in place the part of the PR [1] that accounts for a resource that is both invalid and causes a conflict error. Should we expect users not to include a resourceVersion field? Also, could such a case be hit in any other way?

1. https://github.com/openshift/origin/pull/18337/files#diff-cb112dedd27138a42778ba4c699ff68fR702

Comment 5 Xingxing Xia 2018-01-30 01:57:06 UTC
(In reply to Juan Vallejo from comment #1)
> This is actually how --force is supposed to work:
> 
> --force=false: Delete and re-create the specified resource, when PATCH
> encounters conflict and
> has retried for 5 times
> 
> However the resource is supposed to also be recreated. Is it not being
> recreated after it has been deleted?
Yeah, I checked the help text for '--force' when reporting, and found the resource is not re-created after deletion, hence this report :)

Comment 6 Juan Vallejo 2018-02-06 19:07:37 UTC
Upstream PR [1] has merged.
Waiting on Origin PR [2].

1. https://github.com/kubernetes/kubernetes/pull/58991/files
2. https://github.com/openshift/origin/pull/18337

Comment 8 Juan Vallejo 2018-02-08 19:51:42 UTC
Origin PR has merged. Moving to ON_QA

Comment 9 Xingxing Xia 2018-02-09 08:04:34 UTC
The PR has just merged. Will verify when the next OCP 3.9 build is available.

Comment 10 Xingxing Xia 2018-02-11 03:35:42 UTC
Tested v3.9.0-0.42.0 per the steps in comment 0: step 3 gives the error pasted above, but no longer deletes the resource. Issue is fixed.