Bug 1428229 - Upgrade from OCP 3.4 to 3.5 fails when a PetSet exists in the cluster
Summary: Upgrade from OCP 3.4 to 3.5 fails when a PetSet exists in the cluster
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Cluster Version Operator
Version: 3.5.0
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Assignee: Tim Bielawa
QA Contact: Anping Li
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2017-03-02 05:26 UTC by liujia
Modified: 2017-07-24 14:11 UTC
CC: 7 users

Fixed In Version: openshift-ansible-3.5.30-1
Doc Type: Bug Fix
Doc Text:
Cause: Kubernetes resources that are not supported by Red Hat (PetSets) were present in an OCP cluster. In the upgrade from 3.4 to 3.5, PetSets are deprecated and replaced with a new (also unsupported) resource, StatefulSets. Automatic migration from PetSets to StatefulSets is not possible.
Consequence: The upgrade fails because the unsupported resources cannot be migrated automatically.
Fix: An additional validation step was added to the pre-upgrade validation playbook, which searches the cluster for PetSets.
Result: If any existing PetSets are detected, the installation errors out and quits. The user is given an informational message (including documentation references) describing what went wrong, why, and what the user's choices are for continuing the upgrade without migrating PetSets.
Clone Of:
Environment:
Last Closed: 2017-04-12 19:03:02 UTC
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
The upgrade logs when using a PetSet (170.88 KB, application/x-gzip)
2017-03-09 04:59 UTC, Anping Li


Links
Red Hat Product Errata RHBA-2017:0903 (normal, SHIPPED_LIVE): OpenShift Container Platform atomic-openshift-utils bug fix and enhancement, 2017-04-12 22:45:42 UTC

Description liujia 2017-03-02 05:26:01 UTC
Description of problem:
Upgrading OCP 3.4 to 3.5 fails when a PetSet exists in the cluster, because PetSets are not supported in 3.5 (Kubernetes 1.5). The node drain step fails:

    fatal: [x.x.x.x -> x.x.x.x]: FAILED! => {
        "changed": true,
        "cmd": [
            "/usr/local/bin/oadm",
            "drain",
            "ip-172-18-14-148.ec2.internal",
            "--force",
            "--delete-local-data",
            "--ignore-daemonsets"
        ],
        "delta": "0:00:04.205678",
        "end": "2017-03-01 05:53:54.861837",
        "failed": true,
        "invocation": {
            "module_args": {
                "_raw_params": "/usr/local/bin/oadm drain ip-172-18-14-148.ec2.internal --force --delete-local-data --ignore-daemonsets",
                "_uses_shell": false,
                "chdir": null,
                "creates": null,
                "executable": null,
                "removes": null,
                "warn": true
            },
            "module_name": "command"
        },
        "rc": 1,
        "start": "2017-03-01 05:53:50.656159",
        "warnings": []
    }
     
    STDOUT:
     
    node "ip-172-18-14-148.ec2.internal" already cordoned
     
     
    STDERR:
     
    error: Unknown controller kind "PetSet": hello-petset-1, hello-petset-1



Version-Release number of selected component (if applicable):
atomic-openshift-utils-3.5.17-1.git.0.561702e.el7.noarch

How reproducible:
always

Steps to Reproduce:
1. Install ocp 3.4.
2. Create a PetSet (a minimal example manifest is sketched after these steps).
3. Upgrade ocp3.4 to 3.5.
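
For step 2, a minimal PetSet manifest along the lines of the one used in this report can be reconstructed from the object dump in comment 9 (a sketch, not the exact QE test file):

apiVersion: apps/v1alpha1
kind: PetSet
metadata:
  name: hello-petset
  labels:
    app: hello-pod
spec:
  serviceName: foo
  replicas: 2
  selector:
    matchLabels:
      app: hello-pod
  template:
    metadata:
      labels:
        app: hello-pod
      annotations:
        # Annotation present on the dumped object (alpha PetSet initialization gate).
        pod.alpha.kubernetes.io/initialized: "true"
    spec:
      terminationGracePeriodSeconds: 0
      containers:
      - name: hello-pod
        image: openshift/hello-openshift:latest
        ports:
        - containerPort: 8080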

Actual results:
The upgrade fails at the task [Drain Node for Kubelet upgrade].

Expected results:
The upgrade succeeds even with a PetSet in the cluster.

Additional info:
https://kubernetes.io/docs/tasks/manage-stateful-set/upgrade-pet-set-to-stateful-set/

Comment 1 Scott Dodson 2017-03-06 19:30:55 UTC
Clayton, what should we be doing about pet sets during upgrades where we drain nodes?

Comment 2 Tim Bielawa 2017-03-06 21:47:10 UTC
Asking around on aos-devel to get broader input on this.

Comment 4 Tim Bielawa 2017-03-07 16:03:05 UTC
Consensus from the mailing list:

Migrating petsets to stateful sets is out of scope and will not be supported in the OpenShift-Ansible 3.4->3.5 upgrade playbooks.

These features were never officially supported, and no support statements about them were ever provided. Furthermore, PetSets only ever had ALPHA status in the Kubernetes versions that shipped them. As StatefulSets are still beta resources in Kubernetes 1.5, they may likewise be unsupported in version migrations when 3.6 is released (pending future official support communications, of course).

I am working on a patch now, run during upgrade pre-validation, that will detect existing PetSets. Users will be given a helpful message clarifying that the PetSet feature is unsupported, along with a reference to the official Kubernetes migration docs.
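
For illustration, such a check boils down to two tasks, sketched here in plain Ansible (task and variable names are hypothetical; the merged patch uses this repo's oc_obj module rather than a raw command, as the logs in comment 9 show):

- name: Check if legacy PetSets exist
  command: >
    /usr/local/bin/oc get petsets -o json --all-namespaces
    --config=/etc/origin/master/admin.kubeconfig
  register: petset_check
  changed_when: false

- name: Fail on unsupported PetSets
  fail:
    msg: >
      PetSet objects were detected in your cluster and cannot be migrated
      automatically. Remove them before upgrading. See
      https://kubernetes.io/docs/tasks/manage-stateful-set/upgrade-pet-set-to-stateful-set/
  # The JSON list returned by 'oc get' keeps its objects under 'items';
  # a non-empty list means PetSets exist in the cluster.
  when: (petset_check.stdout | from_json)['items'] | length > 0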

Comment 5 Tim Bielawa 2017-03-07 22:00:40 UTC
Merged into master. Please re-test.

Comment 7 Anping Li 2017-03-09 04:59:58 UTC
Created attachment 1261429 [details]
The upgrade logs when using a PetSet

The new check did not catch the PetSet, and the upgrade still failed at the drain step:

1. oc create -f https://raw.githubusercontent.com/openshift-qe/v3-testfiles/b90a5a05c6af96b8e94085822e723ef7be57fe5b/petset/hello-petset.yaml

2. run upgrade.yml
fatal: [openshift-225.lab.eng.nay.redhat.com -> openshift-223.lab.eng.nay.redhat.com]: FAILED! => {
    "changed": true,
    "cmd": [
        "/usr/local/bin/oadm",
        "drain",
        "openshift-225.lab.eng.nay.redhat.com",
        "--force",
        "--delete-local-data",
        "--ignore-daemonsets"
    ],
    "delta": "0:00:00.350344",
    "end": "2017-03-08 23:32:31.383682",
    "failed": true,
    "invocation": {
        "module_args": {
            "_raw_params": "/usr/local/bin/oadm drain openshift-225.lab.eng.nay.redhat.com --force --delete-local-data --ignore-daemonsets",
            "_uses_shell": false,
            "chdir": null,
            "creates": null,
            "executable": null,
            "removes": null,
            "warn": true
        },
        "module_name": "command"
    },
    "rc": 1,
    "start": "2017-03-08 23:32:31.033338",
    "warnings": []
}

STDOUT:

node "openshift-225.lab.eng.nay.redhat.com" already cordoned


STDERR:

error: Unknown controller kind "PetSet": hello-petset-0, hello-petset-0, hello-petset-1, hello-petset-1

Comment 8 Tim Bielawa 2017-03-10 19:01:23 UTC
I believe I've fixed the issue you ran into; I had an incorrect comparison test in the original PR. This works better:

https://github.com/openshift/openshift-ansible/pull/3623

The other issue you see mentioned in there is unrelated to my changes.

Comment 9 Anping Li 2017-03-13 02:01:05 UTC
Yes, I can see the fixed task was executed, but I am not sure why "skipped" is true for TASK [FAIL ON Resource migration 'PetSets' unsupported]:

TASK [Check if legacy PetSets exist] *******************************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/v3_5/validator.yml:31
Using module file /usr/share/ansible/openshift-ansible/roles/lib_openshift/library/oc_obj.py
<openshift-223.lab.eng.nay.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<openshift-223.lab.eng.nay.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r openshift-223.lab.eng.nay.redhat.com '/bin/sh -c '"'"'( umask 77 && mkdir -p "` echo ~/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639 `" && echo ansible-tmp-1489033498.92-73057137625639="` echo ~/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639 `" ) && sleep 0'"'"''
<openshift-223.lab.eng.nay.redhat.com> PUT /tmp/tmphkwjKv TO /root/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639/oc_obj.py
<openshift-223.lab.eng.nay.redhat.com> SSH: EXEC sftp -b - -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r '[openshift-223.lab.eng.nay.redhat.com]'
<openshift-223.lab.eng.nay.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<openshift-223.lab.eng.nay.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r openshift-223.lab.eng.nay.redhat.com '/bin/sh -c '"'"'chmod u+x /root/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639/ /root/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639/oc_obj.py && sleep 0'"'"''
<openshift-223.lab.eng.nay.redhat.com> ESTABLISH SSH CONNECTION FOR USER: root
<openshift-223.lab.eng.nay.redhat.com> SSH: EXEC ssh -C -o ControlMaster=auto -o ControlPersist=60s -o KbdInteractiveAuthentication=no -o PreferredAuthentications=gssapi-with-mic,gssapi-keyex,hostbased,publickey -o PasswordAuthentication=no -o User=root -o ConnectTimeout=10 -o ControlPath=/root/.ansible/cp/ansible-ssh-%h-%p-%r -tt openshift-223.lab.eng.nay.redhat.com '/bin/sh -c '"'"'/usr/bin/python /root/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639/oc_obj.py; rm -rf "/root/.ansible/tmp/ansible-tmp-1489033498.92-73057137625639/" > /dev/null 2>&1 && sleep 0'"'"''
ok: [openshift-223.lab.eng.nay.redhat.com] => {
    "changed": false, 
    "invocation": {
        "module_args": {
            "all_namespaces": true, 
            "content": null, 
            "debug": false, 
            "delete_after": false, 
            "files": null, 
            "force": false, 
            "kind": "petsets", 
            "kubeconfig": "/etc/origin/master/admin.kubeconfig", 
            "name": null, 
            "namespace": "default", 
            "selector": null, 
            "state": "list"
        }, 
        "module_name": "oc_obj"
    }, 
    "results": {
        "cmd": "/usr/local/bin/oc get petsets -o json --all-namespaces", 
        "results": [
            {
                "apiVersion": "v1", 
                "items": [
                    {
                        "apiVersion": "apps/v1alpha1", 
                        "kind": "PetSet", 
                        "metadata": {
                            "creationTimestamp": "2017-03-09T03:29:45Z", 
                            "generation": 1, 
                            "labels": {
                                "app": "hello-pod"
                            }, 
                            "name": "hello-petset", 
                            "namespace": "default", 
                            "resourceVersion": "5629", 
                            "selfLink": "/apis/apps/v1alpha1/namespaces/default/petsets/hello-petset", 
                            "uid": "a42a35a6-0478-11e7-932f-fa163e30eba3"
                        }, 
                        "spec": {
                            "replicas": 2, 
                            "selector": {
                                "matchLabels": {
                                    "app": "hello-pod"
                                }
                            }, 
                            "serviceName": "foo", 
                            "template": {
                                "metadata": {
                                    "annotations": {
                                        "pod.alpha.kubernetes.io/initialized": "true"
                                    }, 
                                    "creationTimestamp": null, 
                                    "labels": {
                                        "app": "hello-pod"
                                    }
                                }, 
                                "spec": {
                                    "containers": [
                                        {
                                            "image": "openshift/hello-openshift:latest", 
                                            "imagePullPolicy": "IfNotPresent", 
                                            "name": "hello-pod", 
                                            "ports": [
                                                {
                                                    "containerPort": 8080, 
                                                    "protocol": "TCP"
                                                }
                                            ], 
                                            "resources": {}, 
                                            "securityContext": {
                                                "capabilities": {}, 
                                                "privileged": false
                                            }, 
                                            "terminationMessagePath": "/dev/termination-log", 
                                            "volumeMounts": [
                                                {
                                                    "mountPath": "/tmp", 
                                                    "name": "tmp"
                                                }
                                            ]
                                        }
                                    ], 
                                    "dnsPolicy": "ClusterFirst", 
                                    "restartPolicy": "Always", 
                                    "securityContext": {}, 
                                    "terminationGracePeriodSeconds": 0, 
                                    "volumes": [
                                        {
                                            "emptyDir": {}, 
                                            "name": "tmp"
                                        }
                                    ]
                                }
                            }
                        }, 
                        "status": {
                            "replicas": 2
                        }
                    }
                ], 
                "kind": "List", 
                "metadata": {}
            }
        ], 
        "returncode": 0
    }, 
    "state": "list"
}

TASK [FAIL ON Resource migration 'PetSets' unsupported] ************************
task path: /usr/share/ansible/openshift-ansible/playbooks/common/openshift-cluster/upgrades/v3_5/validator.yml:38
skipping: [openshift-223.lab.eng.nay.redhat.com] => {
    "changed": false, 
    "skip_reason": "Conditional check failed", 
    "skipped": true
}
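
Given the oc_obj result structure shown above, the PetSet list sits at results.results[0]['items'], so the FAIL task's conditional needs to test that list's length, roughly like this (register name hypothetical; not the actual playbook source):

- name: FAIL ON Resource migration 'PetSets' unsupported
  fail:
    msg: PetSet objects were detected in your cluster. Remove them before upgrading.
  # oc_obj nests the 'oc get petsets -o json' payload under results.results;
  # an empty 'items' list in that payload means no PetSets exist.
  when: petsets_list.results.results[0]['items'] | length > 0

A comparison against the wrong level of that nesting can evaluate false even when PetSets exist, which would explain the unexpected skip seen here.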

Comment 10 Scott Dodson 2017-03-13 18:09:46 UTC
https://github.com/openshift/openshift-ansible/pull/3638 Backported to release-1.5

The task name referenced in comment 9 no longer exists, so I think we should test with a new build.

Comment 11 Scott Dodson 2017-03-13 18:11:58 UTC
Fixed in openshift-ansible-3.5.30-1.

Comment 12 Anping Li 2017-03-14 03:27:27 UTC
TASK [Fail on unsupported resource migration 'PetSets'] ************************
fatal: [openshift-224.lab.eng.nay.redhat.com]: FAILED! => {
    "changed": false, 
    "failed": true
}

MSG:

PetSet objects were detected in your cluster. These are an Alpha feature in upstream Kubernetes 1.4 and are not supported by Red Hat. In Kubernetes 1.5, they are replaced by the Beta feature StatefulSets. Red Hat currently does not offer support for either PetSets or StatefulSets.
Automatically migrating PetSets to StatefulSets in OpenShift Container Platform (OCP) 3.5 is not supported. See the Kubernetes "Upgrading from PetSets to StatefulSets" documentation for additional information:
https://kubernetes.io/docs/tasks/manage-stateful-set/upgrade-pet-set-to-stateful-set/
PetSets MUST be removed before upgrading to OCP 3.5. Red Hat strongly recommends reading the above referenced documentation in its entirety before taking any destructive actions.
If you want to simply remove all PetSets without manually migrating to StatefulSets, run this command as a user with cluster-admin privileges:
$ oc get petsets --all-namespaces -o yaml | oc delete -f - --cascade=false

	to retry, use: --limit @/usr/share/ansible/openshift-ansible/playbooks/byo/openshift-cluster/upgrades/v3_5/upgrade.retry

PLAY RECAP *********************************************************************
localhost                  : ok=10   changed=0    unreachable=0    failed=0   
openshift-223.lab.eng.nay.redhat.com : ok=103  changed=9    unreachable=0    failed=0   
openshift-224.lab.eng.nay.redhat.com : ok=131  changed=10   unreachable=0    failed=1

Comment 14 errata-xmlrpc 2017-04-12 19:03:02 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2017:0903

