Bug 1813057

Summary: The cluster operator etcd is degraded during 4.3.5 upgrade on cluster with 20 nodes.
Product: OpenShift Container Platform
Component: Etcd Operator
Version: 4.4
Target Release: 4.5.0
Hardware: Unspecified
OS: Unspecified
Status: CLOSED DUPLICATE
Severity: medium
Priority: unspecified
Reporter: Simon <skordas>
Assignee: Sam Batschelet <sbatsche>
QA Contact: ge liu <geliu>
Type: Bug
Last Closed: 2020-03-13 20:08:26 UTC

Description Simon 2020-03-12 20:15:24 UTC
Description of problem:
While upgrading a loaded cluster (20 worker nodes) from 4.3.5 to a 4.4 nightly build, the etcd cluster operator becomes degraded.

Version-Release number of selected component (if applicable):
4.3.5 -> 4.4.0-0.nightly-2020-03-11-095741

How reproducible:
Seen in 1 of 1 attempts so far.

Steps to Reproduce:
1. Create an AWS cluster with 20 worker nodes:
oc adm release extract --tools quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64

Untar the extracted files, then create the install config:
./openshift-install create install-config


apiVersion: v1
baseDomain: perf-testing.devcluster.openshift.com
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.xlarge
  replicas: 20
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: yoursupercluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-west-2
publish: External
pullSecret: '{pullSecret here}'
sshKey: |
  ssh-rsa key here


./openshift-install create cluster


2. Load the cluster with 99 projects:

cp $KUBECONFIG ~/.kube/config
git clone https://github.com/openshift/svt.git
cd svt/openshift_scalability/
python cluster-loader.py -f config/pyconfigLoadedProject.yaml 

3. Upgrade the cluster to the latest nightly build:

oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-11-095741 --force --allow-explicit-upgrade

4. Check clusteroperators and clusterversion

oc get clusterversions.config.openshift.io && oc get clusteroperators.config.openshift.io
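
With this many operators, the degraded one is easy to miss in the table from step 4. The helper below is a minimal sketch for filtering that table, not part of the original report. The function name `degraded_operators` is hypothetical. It assumes the usual header row containing a `DEGRADED` column and that every column is populated; an empty VERSION cell would shift the whitespace-split columns.

```shell
# Print the name of every cluster operator whose DEGRADED column is "True".
# Reads the tabular output of `oc get clusteroperators` on stdin.
# Assumption: no cell in the table is empty, so whitespace splitting is safe.
degraded_operators() {
  awk '
    NR == 1 {
      # Locate the DEGRADED column from the header row.
      for (i = 1; i <= NF; i++) if ($i == "DEGRADED") col = i
      next
    }
    # Only report rows once the column has been found.
    col && $col == "True" { print $1 }
  '
}

# Usage against a live cluster (not run here):
#   oc get clusteroperators | degraded_operators
```

An empty result means no operator reported Degraded=True when the command ran. It does not by itself mean the upgrade has finished.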

Actual results:

The upgrade stops because the etcd operator is degraded:
oc get clusteroperator etcd -o yaml

status:
  conditions:
  - lastTransitionTime: "2020-03-12T17:33:20Z"
    message: 'EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists'
    reason: EtcdMemberIPMigrator_Error
    status: "True"
    type: Degraded
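
To pull only the Degraded message out of the status above instead of reading the full YAML, a JSONPath filter can be used. This is a sketch, not something from the original report: the function name `degraded_message` and the `OC` override variable are hypothetical and are there so the client binary can be swapped out.

```shell
# OC names the client binary; override it to use a different path or a stub.
OC="${OC:-oc}"

# Print the message of the Degraded condition for the given cluster operator.
degraded_message() {
  "$OC" get clusteroperator "$1" \
    -o jsonpath='{.status.conditions[?(@.type=="Degraded")].message}'
}

# Usage against a live cluster (not run here):
#   degraded_message etcd
```

On the cluster in this report, this would be expected to return the `EtcdMemberIPMigratorDegraded` message shown in the status above.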


Expected results:

All operators are upgraded

Additional info:

Comment 2 Simon 2020-03-13 13:44:46 UTC
Related bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1813190

Comment 4 Sam Batschelet 2020-03-13 20:08:26 UTC

*** This bug has been marked as a duplicate of bug 1812584 ***