Bug 1508290 - [3.6]Upgrade masters failed due to migrate storage failed
Summary: [3.6]Upgrade masters failed due to migrate storage failed
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: apiserver-auth
Version: 3.6.1
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 3.6.z
Assignee: Simo Sorce
QA Contact: Chuan Yu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2017-11-01 05:17 UTC by liujia
Modified: 2021-06-10 13:25 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2018-02-15 21:35:07 UTC
Target Upstream Version:
Embargoed:



Description liujia 2017-11-01 05:17:16 UTC
Description of problem:
Upgrading masters from v3.5 to v3.6 failed at task [Upgrade all storage].

fatal: [x.x.x.x]: FAILED! => {
    "changed": true, 
    "cmd": [
        "oc", 
        "adm", 
        "--config=/etc/origin/master/admin.kubeconfig", 
        "migrate", 
        "storage", 
        "--include=*", 
        "--confirm"
    ], 
    "delta": "0:00:24.425728", 
    "end": "2017-11-01 00:50:38.722587", 
    "failed": true, 
    "failed_when_result": true, 
    "rc": 1, 
    "start": "2017-11-01 00:50:14.296859"
}

STDOUT:

error:     pods/docker-registry-1-nng13 -n default: Pod "docker-registry-1-nng13" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/registry-console-1-lvrk9 -n default: Pod "registry-console-1-lvrk9" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/router-1-d2k16 -n default: Pod "router-1-d2k16" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/mongodb-1-zqpmp -n install-test: Pod "mongodb-1-zqpmp" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/nodejs-mongodb-example-1-2379k -n install-test: Pod "nodejs-mongodb-example-1-2379k" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/nodejs-mongodb-example-1-build -n install-test: Pod "nodejs-mongodb-example-1-build" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498
info: to rerun only failing resources, add --include=pods
error: 6 resources failed to migrate


STDERR:

error: exit directly


MSG:

non-zero return code



Version-Release number of the following components:
atomic-openshift-utils-3.6.173.0.48-1.git.0.1609d30.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. upgrade masters from v3.5 to v3.6
# ansible-playbook -i hosts /usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_6/upgrade_control_plane.yml

Actual results:
The upgrade fails at the storage migration step.

Expected results:
The upgrade succeeds.

Additional info:
Please attach logs from ansible-playbook with the -vvv flag

Comment 2 Scott Dodson 2017-11-01 12:21:02 UTC
*** Bug 1508294 has been marked as a duplicate of this bug. ***

Comment 4 Seth Jennings 2017-11-01 15:06:17 UTC
Routing to Security.  Similar to the one from yesterday https://bugzilla.redhat.com/show_bug.cgi?id=1508027

Comment 5 Simo Sorce 2017-11-01 16:03:57 UTC
Liujia,
does this happen for 3.6 -> 3.7 upgrades too?

Seth,
why do you think this is a Security issue? If the upgrade is performing a forbidden operation, it sounds like an upgrade problem.

Scott,
this is not a duplicate of 1508294 afaict.

Comment 6 Scott Dodson 2017-11-01 17:24:23 UTC
(In reply to Simo Sorce from comment #5)

> Scott,
> this is not a duplicate of 1508294 afaict.

Yup, re-opened 1508294 and sent it to the build team.

The upgrade process runs `oc adm migrate storage --include=*` prior to the upgrade. We've been told that this step is mandatory and that a failure is fatal. I can't say whether this is a pod or a security problem, but all the migrate does is read each object and write it back to the API, ensuring that all objects adhere to the current spec, and that's the part that's failing here. So we send these bugs to the owners of the respective objects to decide whether there's some way to resolve the problem automatically or whether we need to document it for manual reconciliation.
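To triage a failed run like the one in the description, the migrate output can be parsed to see which objects failed and in which namespaces. The following is a minimal Python sketch, not part of the upgrade tooling: the function name `parse_migrate_output` and the regular expressions are hypothetical, and they assume the output keeps exactly the line format shown in the STDOUT above (namespaced resources, one `summary:` line).

```python
import re
from collections import Counter

# Per-resource failure line, as seen in the description, e.g.:
#   error:     pods/router-1-d2k16 -n default: Pod "router-1-d2k16" is invalid: ...
# Assumption: only namespaced resources in this exact layout are handled.
FAILURE_RE = re.compile(
    r"^error:\s+(?P<kind>[\w.]+)/(?P<name>[^\s:]+) -n (?P<namespace>[^\s:]+): (?P<reason>.*)$"
)
# Summary line, e.g.: summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498
SUMMARY_RE = re.compile(r"^summary:\s+(?P<fields>(?:\w+=\d+\s*)+)$")


def parse_migrate_output(text):
    """Return (failures, summary) parsed from `oc adm migrate storage` stdout."""
    failures = []
    summary = {}
    for raw in text.splitlines():
        line = raw.strip()
        match = FAILURE_RE.match(line)
        if match:
            failures.append(match.groupdict())
            continue
        match = SUMMARY_RE.match(line)
        if match:
            pairs = (field.split("=") for field in match.group("fields").split())
            summary = {key: int(value) for key, value in pairs}
    return failures, summary


# Sample lines copied from the STDOUT in the description.
SAMPLE = """\
error:     pods/router-1-d2k16 -n default: Pod "router-1-d2k16" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
error:     pods/mongodb-1-zqpmp -n install-test: Pod "mongodb-1-zqpmp" is invalid: spec: Forbidden: pod updates may not change fields other than `containers[*].image` or `spec.activeDeadlineSeconds`
summary: total=678 errors=6 ignored=0 unchanged=174 migrated=498
info: to rerun only failing resources, add --include=pods
error: 6 resources failed to migrate
"""

if __name__ == "__main__":
    failures, summary = parse_migrate_output(SAMPLE)
    per_namespace = Counter(f["namespace"] for f in failures)
    for failure in failures:
        print(failure["kind"], failure["namespace"], failure["name"])
    print("failures per namespace:", dict(per_namespace))
    print("summary:", summary)
```

The trailing `error: 6 resources failed to migrate` line is deliberately not treated as a per-resource failure, since it carries no `kind/name` pair.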

Comment 7 Mo 2017-11-01 20:57:11 UTC
This most likely has the same root cause as bug 1383707 - SCC is trying to mutate the pod spec on update which causes validation to fail the update, which then fails the migrate command.

The PR to fix this, https://github.com/openshift/origin/pull/16934, will not land until 3.8.

I will let Simo decide how to update this BZ.

Comment 8 liujia 2017-11-03 03:29:48 UTC
(In reply to Simo Sorce from comment #5)
> Liujia,
> does this happen for 3.6 -> 3.7 upgrades too?

Not yet.

Comment 9 Simo Sorce 2017-12-15 15:02:25 UTC
Did you move it to 3.6.z because you expect a backport?
Slava,
is this backportable?

Comment 10 Slava Semushin 2017-12-15 16:04:30 UTC
> Slava, is this backportable?

It's possible but hard, and it will most likely require backporting some other changes as well.

Comment 11 Robert Bost 2018-02-14 15:31:44 UTC
If the backport to 3.6.z is too difficult/risky, would it be possible to identify a workaround?

Comment 12 Simo Sorce 2018-02-14 17:42:44 UTC
I think the workaround is to create an SCC that will not try to modify the pod spec and to give it a higher priority.

Slava,
do we have any doc on this?

Comment 13 Slava Semushin 2018-02-14 18:10:35 UTC
> The workaround is to create an SCC

Jordan suggested, and a couple of users confirmed, that instead of creating a new SCC, "privileged" can be used:

https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c20
https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c47
https://bugzilla.redhat.com/show_bug.cgi?id=1383707#c45

And don't forget to revert the priority to its original value afterwards.

> do we have any doc on this ?

On what exactly? If the question is how to modify the priority field: use the oc edit or oc patch commands. These are typical operations:
https://docs.openshift.org/latest/admin_guide/manage_scc.html#ensure-that-admission-attempts-to-use-a-specific-scc-first
https://docs.openshift.org/latest/admin_guide/manage_scc.html#updating-security-context-constraints
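As a rough illustration of the oc patch route, the Python sketch below only builds and prints the command lines; it does not run them. The helper name `scc_priority_patch_cmd` and the value `100` are hypothetical, and the revert step assumes that "privileged" had no priority set beforehand, so check the current value (for example with `oc get scc privileged -o yaml`) before relying on it.

```python
import json


def scc_priority_patch_cmd(scc_name, priority):
    """Build the argv for an `oc patch` call that sets an SCC's priority.

    priority=None serializes to JSON null, which is intended to clear
    the field again when reverting.
    """
    patch = json.dumps({"priority": priority})
    return ["oc", "patch", "scc", scc_name, "-p", patch]


# Hypothetical value: it only needs to be higher than the priority of any
# other SCC that would otherwise be tried first for the affected pods.
RAISED_PRIORITY = 100

raise_cmd = scc_priority_patch_cmd("privileged", RAISED_PRIORITY)
# Assumption: the original priority was unset; substitute the real
# original value if it was not.
revert_cmd = scc_priority_patch_cmd("privileged", None)

if __name__ == "__main__":
    print("before migrating:", " ".join(raise_cmd))
    print("after migrating: ", " ".join(revert_cmd))
```

Printing rather than executing keeps the sketch safe to run anywhere; the migration itself (`oc adm migrate storage --include=pods --confirm`, per the rerun hint in the description) would go between the two commands.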

Comment 14 Simo Sorce 2018-02-14 18:14:17 UTC
Robert,
is this information sufficient?

