Bug 1813057 - The cluster operator etcd is degraded during 4.3.5 upgrade on cluster with 20 nodes.
Keywords:
Status: CLOSED DUPLICATE of bug 1812584
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.4
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: medium
Target Milestone: ---
Target Release: 4.5.0
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2020-03-12 20:15 UTC by Simon
Modified: 2020-03-13 20:08 UTC (History)
0 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-03-13 20:08:26 UTC
Target Upstream Version:
Embargoed:



Description Simon 2020-03-12 20:15:24 UTC
Description of problem:
While upgrading a loaded cluster (20 worker nodes) from 4.3.5 to a 4.4 nightly build, the etcd cluster operator becomes degraded.

Version-Release number of selected component (if applicable):
4.3.5 -> 4.4.0-0.nightly-2020-03-11-095741

How reproducible:
Seen once so far (1 attempt).

Steps to Reproduce:
1. Create an AWS cluster with 20 worker nodes:
oc adm release extract --tools quay.io/openshift-release-dev/ocp-release:4.3.5-x86_64

Untar the extracted files.
./openshift-install create install-config


apiVersion: v1
baseDomain: perf-testing.devcluster.openshift.com
compute:
- hyperthreading: Enabled
  name: worker
  platform:
    aws:
      type: m5.xlarge
  replicas: 20
controlPlane:
  hyperthreading: Enabled
  name: master
  platform:
    aws:
      type: m5.xlarge
  replicas: 3
metadata:
  creationTimestamp: null
  name: yoursupercluster
networking:
  clusterNetwork:
  - cidr: 10.128.0.0/14
    hostPrefix: 23
  machineNetwork:
  - cidr: 10.0.0.0/16
  networkType: OpenShiftSDN
  serviceNetwork:
  - 172.30.0.0/16
platform:
  aws:
    region: us-west-2
publish: External
pullSecret: '{pullSecret here}'
sshKey: |
  ssh-rsa key here


./openshift-install create cluster


2. Load the cluster with 99 projects:

cp $KUBECONFIG ~/.kube/config
git clone https://github.com/openshift/svt.git
cd svt/openshift_scalability/
python cluster-loader.py -f config/pyconfigLoadedProject.yaml 

3. Upgrade the cluster to the latest nightly build:

oc patch clusterversion/version --patch '{"spec":{"upstream":"https://openshift-release.svc.ci.openshift.org/graph"}}' --type=merge
oc adm upgrade --to-image=registry.svc.ci.openshift.org/ocp/release:4.4.0-0.nightly-2020-03-11-095741 --force --allow-explicit-upgrade

4. Check clusteroperators and clusterversion:

oc get clusterversions.config.openshift.io && oc get clusteroperators.config.openshift.io

Actual results:

The upgrade stops because the etcd cluster operator is degraded:
oc get clusteroperator etcd -o yaml

status:
  conditions:
  - lastTransitionTime: "2020-03-12T17:33:20Z"
    message: 'EtcdMemberIPMigratorDegraded: etcdserver: Peer URLs already exists'
    reason: EtcdMemberIPMigrator_Error
    status: "True"
    type: Degraded
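
To spot this condition across all operators without reading each YAML by hand, something like the following might help. It is only a sketch: it assumes the JSON layout that `oc get clusteroperators -o json` normally returns (a list with `items[].status.conditions`), and the helper names `degraded_operators` and `fetch_cluster_operators` are made up for illustration, not part of any OpenShift tooling.

```python
import json
import subprocess


def degraded_operators(cluster_operators_json):
    """Return (name, reason, message) for each operator whose Degraded condition is "True"."""
    doc = json.loads(cluster_operators_json)
    # A list query has an "items" array; a single-operator query is the object itself.
    items = doc.get("items", [doc])
    found = []
    for item in items:
        name = item.get("metadata", {}).get("name", "<unknown>")
        for cond in item.get("status", {}).get("conditions", []):
            if cond.get("type") == "Degraded" and cond.get("status") == "True":
                found.append((name, cond.get("reason", ""), cond.get("message", "")))
    return found


def fetch_cluster_operators():
    """Hypothetical helper: shell out to oc (assumes oc is on PATH and logged in)."""
    result = subprocess.run(
        ["oc", "get", "clusteroperators", "-o", "json"],
        check=True, capture_output=True, text=True,
    )
    return result.stdout
```

On a live cluster, `degraded_operators(fetch_cluster_operators())` would be expected to include an entry for etcd with reason EtcdMemberIPMigrator_Error while the upgrade is stuck as described here.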


Expected results:

All cluster operators are upgraded successfully.

Additional info:

Comment 2 Simon 2020-03-13 13:44:46 UTC
Related bug:
https://bugzilla.redhat.com/show_bug.cgi?id=1813190

Comment 4 Sam Batschelet 2020-03-13 20:08:26 UTC

*** This bug has been marked as a duplicate of bug 1812584 ***

