Bug 1508290

Summary: [3.6] Upgrade of masters failed because `migrate storage` failed
Product: OpenShift Container Platform
Reporter: liujia <jiajliu>
Component: apiserver-auth
Assignee: Simo Sorce <ssorce>
Status: CLOSED WONTFIX
QA Contact: Chuan Yu <chuyu>
Severity: high
Priority: high
Version: 3.6.1
CC: aos-bugs, eparis, jialiu, jkaur, jokerman, mkhan, mmccomas, rbost, sdodson, ssorce, vsemushi, wmeng
Target Milestone: ---
Target Release: 3.6.z
Hardware: Unspecified
OS: Unspecified
Last Closed: 2018-02-15 21:35:07 UTC
Type: Bug

Description liujia 2017-11-01 05:17:16 UTC
Description of problem:
Upgrading the masters from v3.5 to v3.6 failed at task [Upgrade all storage].

fatal: [x.x.x.x]: FAILED! => {
    "changed": true, 
    "cmd": [
        "oc", 
        "adm", 
        "--config=/etc/origin/master/admin.kubeconfig", 
        "migrate", 
        "storage", 
        "--include=*", 
        "--confirm"
    ], 
    "delta": "0:00:24.425728", 
    "end": "2017-11-01 00:50:38.722587", 
    "failed": true, 
    "failed_when_result": true, 
    "rc": 1, 
    "start": "2017-11-01 00:50:14.296859"
}

STDOUT:

error:     pods/docker-registry-1-nng13 -n default: Pod "docker-registry-1-nng13" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/registry-console-1-lvrk9 -n default: Pod "registry-console-1-lvrk9" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/router-1-d2k16 -n default: Pod "router-1-d2k16" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/mongodb-1-zqpmp -n install-test: Pod "mongodb-1-zqpmp" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/nodejs-mongodb-example-1-2379k -n install-test: Pod "nodejs-mongodb-example-1-2379k" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/nodejs-mongodb-example-1-build -n install-test: Pod "nodejs-mongodb-example-1-build" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498
info: to rerun only failing resources, add --include=pods
error: 6 resources failed to migrate


STDERR:

error: exit directly


MSG:

non-zero return code
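To pull the failing objects out of a long `migrate storage` run (for example, to decide which namespaces to inspect before re-running with `--include=pods`), a minimal Python sketch like the one below can parse the STDOUT shown above. It assumes the `error:     <resource>/<name> -n <namespace>: <reason>` and `summary: key=value ...` line formats seen in this particular log; these are not a documented `oc` output contract, and cluster-scoped resources printed without `-n` are not matched. The name `parse_migrate_output` is hypothetical.

```python
import re
from typing import Dict, List, NamedTuple

# Line formats assumed from the log in this report, not from oc documentation.
ERROR_RE = re.compile(
    r"^error:\s+(?P<resource>[\w.]+)/(?P<name>\S+) -n (?P<namespace>\S+): (?P<reason>.*)$"
)
SUMMARY_PREFIX = "summary: "


class Failure(NamedTuple):
    resource: str
    namespace: str
    name: str
    reason: str


def parse_migrate_output(text: str):
    """Return (failures, summary) parsed from `oc adm migrate storage` STDOUT."""
    failures: List[Failure] = []
    summary: Dict[str, int] = {}
    for line in text.splitlines():
        match = ERROR_RE.match(line)
        if match:
            failures.append(Failure(match["resource"], match["namespace"],
                                    match["name"], match["reason"]))
        elif line.startswith(SUMMARY_PREFIX):
            # e.g. "summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498"
            for field in line[len(SUMMARY_PREFIX):].split():
                key, _, value = field.partition("=")
                summary[key] = int(value)
    return failures, summary


if __name__ == "__main__":
    sample = (
        'error:     pods/router-1-d2k16 -n default: Pod "router-1-d2k16" is invalid: spec: Forbidden\n'
        "summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498\n"
        "error: 6 resources failed to migrate\n"
    )
    failures, summary = parse_migrate_output(sample)
    for f in failures:
        print(f"{f.namespace}/{f.name} ({f.resource})")
    # In this report the counts add up: 6 + 0 + 174 + 498 == 678.
    print(summary["total"] == sum(v for k, v in summary.items() if k != "total"))
```

The trailing `error: 6 resources failed to migrate` line is deliberately not reported as a failure, because it has no `<resource>/<name>` part.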



Version-Release number of the following components:
atomic-openshift-utils-3.6.173.0.48-1.git.0.1609d30.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Upgrade the masters from v3.5 to v3.6:
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade_control_plane.yml

Actual results:
upgrade failed

Expected results:
Upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 2 Scott Dodson 2017-11-01 12:21:02 UTC
*** Bug 1508294 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2017-11-01 15:06:17 UTC
Routing to Security.  Similar to the one from yesterday https://bugzilla.redhat.com/show_bug.cgi?id=1508027

Comment 5 Simo Sorce 2017-11-01 16:03:57 UTC
Liujia,
does this happen for 3.6 -> 3.7 upgrades too?

Seth,
why do you think this is a Security issue? If the upgrade is performing a forbidden operation, it sounds like an upgrade problem.

Scott,
this is not a duplicate of 1508294 afaict.

Comment 6 Scott Dodson 2017-11-01 17:24:23 UTC
(In reply to Simo Sorce from comment #5)

> Scott,
> this is not a duplicate of 1508294 afaict.

Yup, re-opened 1508294 and sent it to the build team.

The upgrade process runs `oc adm migrate storage --include=*` prior to the upgrade. We've been told that this step is mandatory and that a failure is fatal. I can't say whether this is a pod issue or a security issue, but all the migrate does is read each object and write it back to the API, ensuring that all objects adhere to the current spec, and that's the part that's failing here. So we send these bugs to the owners of the respective objects to decide whether there's some way to resolve the problem automatically or whether we need to document it for manual reconciliation.
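The read-and-write-back behaviour described above can be modelled in a few lines. This is a hypothetical Python sketch, not the origin implementation: `update` stands in for an API PUT, `ValidationError` for a rejected write, and the `migrated`/`unchanged` split in this model simply compares the object the server returns with the one that was read. It illustrates why a single rejected object is enough to make the whole command, and therefore the Ansible task, exit non-zero.

```python
from typing import Callable, Dict, List, Tuple

Key = Tuple[str, str, str]  # (resource, namespace, name)


class ValidationError(Exception):
    """Raised by the hypothetical `update` callback when a write is rejected."""


def migrate_storage(
    store: Dict[Key, dict], update: Callable[[Key, dict], dict]
) -> Tuple[Dict[str, int], List[Tuple[Key, str]]]:
    """Read every object and write it back unchanged via `update` (a stand-in for a PUT)."""
    counts = {"total": 0, "errors": 0, "unchanged": 0, "migrated": 0}
    failures: List[Tuple[Key, str]] = []
    for key in sorted(store):
        original = store[key]                      # read
        counts["total"] += 1
        try:
            written = update(key, dict(original))  # write the same content back
        except ValidationError as exc:
            counts["errors"] += 1
            failures.append((key, str(exc)))
            continue
        if written == original:
            counts["unchanged"] += 1
        else:
            # In this model: the server stored something different from what was read.
            counts["migrated"] += 1
            store[key] = written
    return counts, failures


def exit_code(failures: List[Tuple[Key, str]]) -> int:
    # Any single failure makes the whole command exit non-zero.
    return 1 if failures else 0
```

With an `update` that raises `ValidationError` for one pod and succeeds for everything else, the sketch reports `errors=1` and `exit_code` returns 1, mirroring the `rc: 1` in the task output.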

Comment 7 Mo 2017-11-01 20:57:11 UTC
This most likely has the same root cause as bug 1383707: SCC admission tries to mutate the pod spec on update, which causes validation to reject the update, which in turn makes the migrate command fail.
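A hypothetical Python sketch of that failure mode. The validation rule is modelled on the error message in the STDOUT above (an update may only change `containers[*].image` and `spec.activeDeadlineSeconds`), and `hypothetical_scc_admission` is an illustrative stand-in for SCC admission filling in a different `runAsUser` on the write-back than it did at creation. The UID values are made up; this is not the Kubernetes or origin code.

```python
import copy

FORBIDDEN = ("spec: Forbidden: pod updates may not change fields other than "
             "`containers[*].image` or `spec.activeDeadlineSeconds`")


def validate_pod_update(old_spec: dict, new_spec: dict) -> None:
    """Raise ValueError if new_spec changes anything besides the two mutable fields."""
    masked = copy.deepcopy(new_spec)
    old_containers = old_spec.get("containers", [])
    new_containers = masked.get("containers", [])
    if len(old_containers) == len(new_containers):
        # Copy the old images over the new ones so image changes are ignored.
        for old_c, new_c in zip(old_containers, new_containers):
            if "image" in old_c:
                new_c["image"] = old_c["image"]
    # Likewise ignore activeDeadlineSeconds.
    if "activeDeadlineSeconds" in old_spec:
        masked["activeDeadlineSeconds"] = old_spec["activeDeadlineSeconds"]
    else:
        masked.pop("activeDeadlineSeconds", None)
    if masked != old_spec:
        raise ValueError(FORBIDDEN)


def hypothetical_scc_admission(spec: dict, run_as_user: int) -> dict:
    """Illustrative stand-in for SCC admission defaulting a security field."""
    mutated = copy.deepcopy(spec)
    for container in mutated.get("containers", []):
        container.setdefault("securityContext", {})["runAsUser"] = run_as_user
    return mutated


if __name__ == "__main__":
    stored = hypothetical_scc_admission(
        {"containers": [{"name": "registry", "image": "registry:v3.5"}]}, 1000000000)
    # Image-only change: allowed.
    validate_pod_update(stored, {**stored, "containers": [
        {**stored["containers"][0], "image": "registry:v3.6"}]})
    # Unmodified write-back that admission re-defaults with a different UID: rejected.
    try:
        validate_pod_update(stored, hypothetical_scc_admission(stored, 1000010000))
    except ValueError as err:
        print("rejected:", err)
```

The point of the model is that the client sends back exactly what it read; the forbidden difference is introduced by admission on the server side, so the caller cannot avoid it by changing what it writes.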

The PR that fixes this, https://github.com/openshift/origin/pull/16934, will not land until 3.8.

I will let Simo decide how to update this BZ.

Comment 8 liujia 2017-11-03 03:29:48 UTC
(In reply to Simo Sorce from comment #5)
> Liujia,
> does this happen for 3.6 -> 3.7 upgrades too ?

Not yet.

Comment 9 Simo Sorce 2017-12-15 15:02:25 UTC
Did you move it to 3.6.z because you expect a backport?
Slava,
is this backportable?

Comment 10 Slava Semushin 2017-12-15 16:04:30 UTC
> Slava, is this backportable ?

It's possible but hard, and it will most likely require backporting some other changes as well.

Comment 11 Robert Bost 2018-02-14 15:31:44 UTC
If the backport to 3.6.z is too difficult/risky, would it be possible to identify a workaround?

Comment 12 Simo Sorce 2018-02-14 17:42:44 UTC
I think the workaround is to create an SCC that will not try to modify the pod spec and give it a higher priority.
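Why raising an SCC's priority changes which SCC admission uses: the OpenShift 3 SCC documentation describes the available SCCs as tried highest priority first (an unset priority counts as 0), then from most to least restrictive, then by name. The Python sketch below models only that ordering. The `restrictiveness` score and all priority values are illustrative assumptions, not the shipped defaults or the real scoring.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class SCC:
    name: str
    priority: Optional[int]   # None (unset) is treated as 0
    restrictiveness: int      # hypothetical score: higher means more restrictive


def admission_order(sccs: List[SCC]) -> List[str]:
    """Order in which admission tries SCCs: highest priority first,
    then most restrictive first, then alphabetically by name."""
    ranked = sorted(
        sccs,
        key=lambda s: (-(s.priority or 0), -s.restrictiveness, s.name),
    )
    return [s.name for s in ranked]


if __name__ == "__main__":
    # Illustrative values only.
    sccs = [SCC("restricted", None, 3), SCC("anyuid", 10, 2), SCC("privileged", None, 0)]
    print(admission_order(sccs))
    sccs[2].priority = 100   # temporarily prefer "privileged"
    print(admission_order(sccs))
```

Assuming admission uses the first SCC in this order that the requester is allowed to use and that accepts the pod, moving a permissive SCC to the front means the existing pod is accepted as-is instead of being re-defaulted by a more restrictive one.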

Slava,
do we have any docs on this?

Comment 13 Slava Semushin 2018-02-14 18:10:35 UTC
> The workaround is to create an SCC

Jordan has suggested, and a couple of users have confirmed, that instead of creating a new SCC, the existing "privileged" SCC can be used:

https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c47
https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c45

And don't forget to revert the priority afterwards.

> do we have any doc on this ?

On what exactly? How to modify a priority field? By using oc edit/oc patch commands. These are the typical operations:
https://docs.openshift.org/latest/admin_guide/manage_scc.html#ensure-that-admission-attempts-to-use-a-specific-scc-first
https://docs.openshift.org/latest/admin_guide/manage_scc.html#updating-security-context-constraints
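The raise-then-revert procedure can be scripted so that the revert happens even if the migration fails. This is a minimal Python sketch, assuming `oc` is on the PATH and already authenticated as a cluster admin (the playbook itself uses `--config=/etc/origin/master/admin.kubeconfig`). The temporary priority of 100 is an arbitrary assumption, and `migrate_with_privileged_first` is a hypothetical helper name. It sends a JSON merge patch, so restoring an originally unset priority sends `null`, which removes the field again.

```python
import json
import subprocess
from typing import Callable, List, Optional

Runner = Callable[[List[str]], str]


def run_oc(args: List[str]) -> str:
    """Run an oc command and return its stdout; raises CalledProcessError on failure."""
    return subprocess.run(["oc", *args], check=True,
                          capture_output=True, text=True).stdout


def priority_patch(priority: Optional[int]) -> str:
    # JSON merge patch; None becomes null, which removes the field.
    return json.dumps({"priority": priority})


def migrate_with_privileged_first(run: Runner = run_oc, temp_priority: int = 100) -> str:
    """Temporarily raise the privileged SCC priority, run the storage migration,
    then restore the original priority even if the migration fails."""
    scc = json.loads(run(["get", "scc", "privileged", "-o", "json"]))
    original = scc.get("priority")  # None when the field is unset
    run(["patch", "scc", "privileged", "--type=merge", "-p", priority_patch(temp_priority)])
    try:
        return run(["adm", "migrate", "storage", "--include=*", "--confirm"])
    finally:
        run(["patch", "scc", "privileged", "--type=merge", "-p", priority_patch(original)])
```

Injecting `run` keeps the ordering logic testable without a cluster; in real use the default `run_oc` is used, and the returned STDOUT can be checked for `errors=0`.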

Comment 14 Simo Sorce 2018-02-14 18:14:17 UTC
Robert,
is this information sufficient?