Bug 1812860
| Summary: | [UPI]Failed to upgrade from 4.3.5 to 4.4 due to EtcdMemberIPMigrator_Error | |||
|---|---|---|---|---|
| Product: | OpenShift Container Platform | Reporter: | Ke Wang <kewang> | |
| Component: | Etcd Operator | Assignee: | Sam Batschelet <sbatsche> | |
| Status: | CLOSED DUPLICATE | QA Contact: | ge liu <geliu> | |
| Severity: | high | Docs Contact: | ||
| Priority: | unspecified | |||
| Version: | 4.4 | CC: | aos-bugs, mfojtik, nagrawal, wsun | |
| Target Milestone: | --- | Keywords: | Regression, Reopened | |
| Target Release: | 4.5.0 | |||
| Hardware: | x86_64 | |||
| OS: | Linux | |||
| Whiteboard: | ||||
| Fixed In Version: | Doc Type: | If docs needed, set a value | ||
| Doc Text: | Story Points: | --- | ||
| Clone Of: | ||||
| : | 1813190 | Environment: | ||
| Last Closed: | 2020-03-23 09:56:07 UTC | Type: | Bug | |
| Regression: | --- | Mount Type: | --- | |
| Documentation: | --- | CRM: | ||
| Verified Versions: | Category: | --- | ||
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | ||
| Cloudforms Team: | --- | Target Upstream Version: | ||
| Embargoed: | ||||
| Bug Depends On: | ||||
| Bug Blocks: | 1813190 | |||
Tried 4.3.5 IPI on an AWS env; it can upgrade to the latest 4.4.0-0.nightly-2020-03-12-082023 successfully without the above issue:

$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-12-082023   True        False         7m11s   Cluster version is 4.4.0-0.nightly-2020-03-12-082023

$ oc get co --no-headers | grep -v "4.4.0-0.nightly-2020-03-12-082023.*True.*False.*False"

Reproduced the bug when upgrading from 4.3.5 to a 4.4 nightly build. It blocked the upgrade test of 4.3.5 UPI on a vSphere env.

*** This bug has been marked as a duplicate of bug 1812584 ***

Per https://bugzilla.redhat.com/show_bug.cgi?id=1812584#c6, I have to reopen it to track the issue on vSphere.

In the latest test, the upgrade from 4.3.5 to 4.4.0-rc.0 passed on vSphere, so I am moving it to verified. If it happens again, I will reopen.

*** This bug has been marked as a duplicate of bug 1812584 ***

I confirmed that the passing case above had 3 etcd members, while the previous failing case had 4 etcd members. I will try it again later.
Description of problem:
Failed to upgrade from 4.3.5 to 4.4 due to EtcdMemberIPMigrator_Error
S:H/H
target release: 4.4

Version-Release number of selected component (if applicable):
$ oc version -o yaml
clientVersion:
  buildDate: "2020-03-06T07:29:25Z"
  compiler: gc
  gitCommit: 2576e482bf003e34e67ba3d69edcf5d411cfd6f3
  gitTreeState: clean
  gitVersion: 4.4.0-202003060720-2576e48
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.3.5
serverVersion:
  buildDate: "2020-03-02T08:50:52Z"
  compiler: gc
  gitCommit: b3bfb5a
  gitTreeState: clean
  gitVersion: v1.16.2
  goVersion: go1.12.12
  major: "1"
  minor: "16"
  platform: linux/amd64

How reproducible:
Always

Actual results:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.5     True        True          46m     Unable to apply 4.4.0-rc.0: the cluster operator etcd is degraded

$ oc describe co etcd
...
Status:
  Conditions:
    Last Transition Time:  2020-03-12T07:31:22Z
    Message:               EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists
    Reason:                EtcdMemberIPMigrator_Error
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-03-12T07:31:52Z
    Message:               NodeInstallerProgressing: 3 nodes are at revision 2
...

From the above, we can see that the upgrade was stuck at the IP migrator.

$ oc get kubeapiserver cluster -o yaml
shows the following:
urls:
- https://etcd-0.....openshift.com:2379
- https://etcd-1.....com:2379
- https://etcd-2.....com:2379

Per bug 1812071, IPs are used instead of DNS names after PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/791 merged.
The PR was merged on Mar 11. We checked several Jenkins CI-triggered upgrade jobs from 4.3 to 4.4 and found that they have failed with the same error since Mar 11; before that date, the upgrade jobs passed. It is very likely that this PR is what causes the upgrade to fail.

Expected results:
Upgrade from 4.3 to 4.4 should succeed.

Additional info:
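The check used in the comments, `oc get co --no-headers | grep -v "<version>.*True.*False.*False"`, lists cluster operators that are not yet at the target version with Available=True, Progressing=False and Degraded=False. A minimal Python sketch of the same filter is shown here for scripting the check; the function name `unhealthy_operators` and the sample rows are hypothetical and only illustrate the column layout (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE), they are not output captured from this cluster.

```python
from typing import List


def unhealthy_operators(co_output: str, target_version: str) -> List[str]:
    """Return names of operators that are not at target_version with
    Available=True, Progressing=False, Degraded=False.

    co_output is the text printed by `oc get co --no-headers`.
    """
    bad = []
    for line in co_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or truncated lines
        name, version, available, progressing, degraded = fields[:5]
        healthy = (
            version == target_version
            and (available, progressing, degraded) == ("True", "False", "False")
        )
        if not healthy:
            bad.append(name)
    return bad


# Illustrative rows only (hypothetical, not taken from the bug report).
sample = (
    "dns    4.4.0-rc.0   True   False   False   10m\n"
    "etcd   4.3.5        True   False   True    46m\n"
)
print(unhealthy_operators(sample, "4.4.0-rc.0"))
```

An empty result corresponds to the empty `grep -v` output in the passing IPI run: every operator reached the target version and is healthy.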