1813057 – The cluster operator etcd is degraded during 4.3.5 upgrade on cluster with 20 nodes.

Keywords:
Status:	CLOSED DUPLICATE of bug 1812584
Alias:	None
Product:	OpenShift Container Platform
Classification:	Red Hat
Component:	Etcd Operator
Sub Component:
Version:	4.4
Hardware:	Unspecified
OS:	Unspecified
Priority:	unspecified
Severity:	medium
Target Milestone:	---
Target Release:	4.5.0
Assignee:	Sam Batschelet
QA Contact:	ge liu
Docs Contact:
URL:
Whiteboard:
Depends On:
Blocks:

Reported:	2020-03-12 20:15 UTC by Simon
Modified:	2020-03-13 20:08 UTC (History)
CC List:	0 users
Fixed In Version:
Doc Type:	If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:	2020-03-13 20:08:26 UTC
Target Upstream Version:
Embargoed:

Description Simon 2020-03-12 20:15:24 UTC

Description of problem:
While upgrading a loaded cluster (20 worker nodes) from 4.3.5 to a 4.4 nightly build, the etcd cluster operator becomes degraded and the upgrade does not complete.

Version-Release number of selected component (if applicable):
4.3.5 -> 4.4.0-0.nightly-2020-03-11-095741

How reproducible:
Seen once so far (1 attempt).

Steps to Reproduce:
1. Create an AWS cluster with 20 worker nodes:
oc adm release extract --tools quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64

Untar the extracted archives, then run:
./openshift-install create install-config


apiVersion: v1
baseDomain: perf-testing.devcluster.openshift.com
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.xlarge
  replicas: 20
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: yoursupercluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-west-2
publish: External
pullSecret: '{pullSecret here}'
sshKey: |
  ssh-rsa key here


./openshift-install create cluster


2. Load the cluster with 99 projects:

cp $KUBECONFIG ~/.kube/config
git clone https://github.com/openshift/svt.git
cd svt/openshift_scalability/
python cluster-loader.py -f config/pyconfigLoadedProject.yaml 

3. Upgrade the cluster to the latest nightly build:

oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-11-095741 --force --allow-explicit-upgrade

4. Check clusteroperators and clusterversion

oc get clusterversions.config.openshift.io && oc get clusteroperators.config.openshift.io
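
The check in step 4 can also be scripted. The following is a minimal sketch, not part of the original reproduction: it assumes `oc` is on the PATH with a logged-in session, and the helper names (`condition_status`, `unhealthy_operators`, `oc_get_json`) are hypothetical. It reads the same `status.conditions` entries that the YAML output of the operator shows.

```python
import json
import subprocess


def condition_status(resource, cond_type):
    """Return the status string of one condition ("True"/"False"/"Unknown"), or None."""
    for cond in resource.get("status", {}).get("conditions", []):
        if cond.get("type") == cond_type:
            return cond.get("status")
    return None


def unhealthy_operators(operator_list):
    """Names of ClusterOperators that are Degraded=True or not Available=True."""
    names = []
    for item in operator_list.get("items", []):
        degraded = condition_status(item, "Degraded") == "True"
        available = condition_status(item, "Available") == "True"
        if degraded or not available:
            names.append(item["metadata"]["name"])
    return names


def oc_get_json(resource):
    """Run `oc get <resource> -o json` and parse the output (needs oc and a session)."""
    result = subprocess.run(
        ["oc", "get", resource, "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return json.loads(result.stdout)
```

Against a live cluster, `unhealthy_operators(oc_get_json("clusteroperators"))` would be expected to include `etcd` in the failure state described in this report.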

Actual results:

The upgrade stops because the etcd operator is degraded:
oc get clusteroperator etcd -o yaml

status:
  conditions:
  - lastTransitionTime: "2020-03-12T17:33:20Z"
    message: 'EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists'
    reason: EtcdMemberIPMigrator_Error
    status: "True"
    type: Degraded
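
For triage, for example when comparing against possible duplicates, it can help to separate the reporting source from the underlying etcd error in the Degraded message. The sketch below assumes the text before the first `: ` names the source of the condition; that reading is an assumption, and `split_degraded_message` is a hypothetical helper.

```python
def split_degraded_message(message):
    """Split 'Source: detail' into (source, detail).

    Assumption: the token before the first ': ' identifies the reporting
    source. If there is no separator, source is None.
    """
    source, sep, detail = message.partition(": ")
    if not sep:
        return None, message
    return source, detail


source, detail = split_degraded_message(
    "EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists"
)
print(source)  # EtcdMemberIPMigratorDegraded
print(detail)  # etcdserver: Peer URLs already exists
```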


Expected results:

All operators are upgraded

Additional info:

Comment 2 Simon 2020-03-13 13:44:46 UTC

related bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1813190

Comment 4 Sam Batschelet 2020-03-13 20:08:26 UTC


*** This bug has been marked as a duplicate of bug 1812584 ***
