Bug 1812860

Summary: [UPI] Failed to upgrade from 4.3.5 to 4.4 due to EtcdMemberIPMigrator_Error
Product: OpenShift Container Platform
Reporter: Ke Wang <kewang>
Component: Etcd Operator
Assignee: Sam Batschelet <sbatsche>
Status: CLOSED DUPLICATE
QA Contact: ge liu <geliu>
Severity: high
Docs Contact:
Priority: unspecified
Version: 4.4
CC: aos-bugs, mfojtik, nagrawal, wsun
Target Milestone: ---
Keywords: Regression, Reopened
Target Release: 4.5.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of:
: 1813190
Environment:
Last Closed: 2020-03-23 09:56:07 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On:
Bug Blocks: 1813190

Description Ke Wang 2020-03-12 11:07:44 UTC
Description of problem:
Failed to upgrade from 4.3.5 to 4.4 due to EtcdMemberIPMigrator_Error

S:H/H
target release: 4.4

Version-Release number of selected component (if applicable):
$ oc version -o yaml
clientVersion:
  buildDate: "2020-03-06T07:29:25Z"
  compiler: gc
  gitCommit: 2576e482bf003e34e67ba3d69edcf5d411cfd6f3
  gitTreeState: clean
  gitVersion: 4.4.0-202003060720-2576e48
  goVersion: go1.13.4
  major: ""
  minor: ""
  platform: linux/amd64
openshiftVersion: 4.3.5
serverVersion:
  buildDate: "2020-03-02T08:50:52Z"
  compiler: gc
  gitCommit: b3bfb5a
  gitTreeState: clean
  gitVersion: v1.16.2
  goVersion: go1.12.12
  major: "1"
  minor: "16"
  platform: linux/amd64

How reproducible:
Always

Actual results:
$ oc get clusterversion
NAME      VERSION   AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.3.5     True        True          46m     Unable to apply 4.4.0-rc.0: the cluster operator etcd is degraded

$ oc describe co etcd
...
Status:
  Conditions:
    Last Transition Time:  2020-03-12T07:31:22Z
    Message:               EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists
    Reason:                EtcdMemberIPMigrator_Error
    Status:                True
    Type:                  Degraded
    Last Transition Time:  2020-03-12T07:31:52Z
    Message:               NodeInstallerProgressing: 3 nodes are at revision 2
...

From the above, we can see that the upgrade is stuck in the EtcdMemberIPMigrator.

The output of "oc get kubeapiserver cluster -o yaml" shows the following:
      urls:
      - https://etcd-0.....openshift.com:2379
      - https://etcd-1.....com:2379
      - https://etcd-2.....com:2379
     
Per bug 1812071, IPs are used instead of DNS names after PR https://github.com/openshift/cluster-kube-apiserver-operator/pull/791 merged. The PR was merged on Mar 11. We checked several Jenkins CI-triggered upgrade jobs from 4.3 to 4.4 and found that they have failed with the same error since Mar 11, whereas upgrade jobs before that date passed. It is very likely that this PR is what causes the upgrade to fail.

     
Expected results:
Upgrade from 4.3 to 4.4 should succeed.

Additional info:

Comment 3 Xingxing Xia 2020-03-12 13:00:38 UTC
Tried a 4.3.5 IPI cluster on AWS; it upgraded to the latest 4.4.0-0.nightly-2020-03-12-082023 successfully without the above issue:
$ oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.4.0-0.nightly-2020-03-12-082023   True        False         7m11s   Cluster version is 4.4.0-0.nightly-2020-03-12-082023

$ oc get co --no-headers | grep -v "4.4.0-0.nightly-2020-03-12-082023.*True.*False.*False"
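
The grep filter above prints nothing when every cluster operator is at the target version with Available=True, Progressing=False, Degraded=False. A minimal Python sketch of the same check follows, assuming the usual "oc get co --no-headers" column order (NAME VERSION AVAILABLE PROGRESSING DEGRADED SINCE). The function name and the sample rows are hypothetical and are not taken from this cluster.

```python
from typing import List


def operators_not_ready(listing: str, target_version: str) -> List[str]:
    """Return the names of operators that are not healthy at target_version.

    `listing` is assumed to be the text printed by `oc get co --no-headers`.
    """
    not_ready = []
    for line in listing.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # skip blank or malformed lines
        name, version, available, progressing, degraded = fields[:5]
        healthy = (
            version == target_version
            and available == "True"
            and progressing == "False"
            and degraded == "False"
        )
        if not healthy:
            not_ready.append(name)
    return not_ready


# Hypothetical sample rows, for illustration only.
sample = """\
etcd             4.3.5                               True   True    True    46m
kube-apiserver   4.4.0-0.nightly-2020-03-12-082023   True   False   False   10m
"""

print(operators_not_ready(sample, "4.4.0-0.nightly-2020-03-12-082023"))
```

An empty list corresponds to the empty grep output shown in this comment. A row whose VERSION column is blank would shift the fields, so this sketch is only a rough aid.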

Comment 4 Wei Sun 2020-03-12 13:10:34 UTC
Per comment 3, removing the testblocker keyword.

Comment 6 Ke Wang 2020-03-13 08:03:11 UTC
Reproduced the bug when upgrading from 4.3.5 to a 4.4 nightly build.

Comment 7 Ke Wang 2020-03-13 08:12:15 UTC
It blocks the upgrade test of 4.3.5 UPI on the vSphere environment.

Comment 9 Sam Batschelet 2020-03-13 20:40:18 UTC

*** This bug has been marked as a duplicate of bug 1812584 ***

Comment 10 Ke Wang 2020-03-17 09:51:20 UTC
Per https://bugzilla.redhat.com/show_bug.cgi?id=1812584#c6, I have to reopen this bug to track the issue on vSphere.

Comment 11 Ke Wang 2020-03-23 09:43:03 UTC
In the latest test, the upgrade from 4.3.5 to 4.4.0-rc.0 passed on vSphere, so I am moving this bug to verified. If the issue occurs again, I will reopen it.

Comment 12 Ke Wang 2020-03-23 09:56:07 UTC

*** This bug has been marked as a duplicate of bug 1812584 ***

Comment 13 Ke Wang 2020-03-23 09:57:41 UTC
I confirmed that the passing case above had 3 etcd members, while the previous failing case had 4 etcd members. I will try it again later.
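
The "Peer URLs already exists" message in the description is what etcd returns when a member's requested peer URL is already advertised by another member. As a rough diagnostic aid, the sketch below checks a member list for such a conflict before a peer URL is rewritten. It is not the operator's code. It assumes JSON shaped like the output of "etcdctl member list -w json" (a "members" array with "ID", "name" and "peerURLs"). The function name, host names, IPs and the fourth member in the sample are all hypothetical and do not describe what happened on this cluster.

```python
import json


def peer_url_conflicts(member_list_json, member_name, new_peer_urls):
    """Return {url: [labels of other members]} for URLs already in use.

    A member is labelled by its name, or by its ID when the name is empty
    (assumed here to be the case for a member that has not started yet).
    """
    data = json.loads(member_list_json)
    conflicts = {}
    for member in data.get("members", []):
        label = member.get("name") or str(member.get("ID"))
        if label == member_name:
            continue  # a member may keep its own URL
        for url in member.get("peerURLs", []):
            if url in new_peer_urls:
                conflicts.setdefault(url, []).append(label)
    return conflicts


# Hypothetical 4-member list, for illustration only.
sample = json.dumps({"members": [
    {"ID": 1, "name": "etcd-0", "peerURLs": ["https://etcd-0.example.com:2380"]},
    {"ID": 2, "name": "etcd-1", "peerURLs": ["https://etcd-1.example.com:2380"]},
    {"ID": 3, "name": "etcd-2", "peerURLs": ["https://etcd-2.example.com:2380"]},
    {"ID": 4, "name": "", "peerURLs": ["https://10.0.0.3:2380"]},
]})

# Would moving etcd-2 to an IP-based peer URL collide with another member?
print(peer_url_conflicts(sample, "etcd-2", ["https://10.0.0.3:2380"]))
```

In this made-up case the extra fourth entry already holds the target URL, so the check reports it. With only the three named members the result is empty.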