Bug 1704686 - 2 of 3 etcd-cluster members missing after upgrade
Summary: 2 of 3 etcd-cluster members missing after upgrade
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Etcd Operator
Version: 4.1.0
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Sam Batschelet
QA Contact: ge liu
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2019-04-30 10:57 UTC by ge liu
Modified: 2019-08-09 15:40 UTC (History)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-09 15:40:03 UTC
Target Upstream Version:
Embargoed:


Description ge liu 2019-04-30 10:57:02 UTC
Description of problem:

Upgraded an OCP cluster (UPI, bare metal). Before the upgrade, etcd-operator and an etcd-cluster (3 member pods) were created in single-namespace mode.
After the upgrade, only 1 etcd-cluster member pod remains; the other 2 are missing.

Error messages in the etcd-operator logs:
time="2019-04-30T09:13:06Z" level=info msg="Finish reconciling" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T09:13:06Z" level=error msg="failed to reconcile: lost quorum" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T09:13:14Z" level=info msg="Start reconciling" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T09:13:14Z" level=info msg="running members: example-6pddvc8zt9" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T09:13:14Z" level=info msg="cluster membership: example-w2kt6b8llg,example-6pddvc8zt9" cluster-name=example cluster-namespace=lgproj pkg=cluster
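
For reference, a minimal sketch of why "lost quorum" follows from the two lists above. It assumes the usual etcd majority rule (quorum = floor(n/2) + 1 of the registered members); the function name `has_quorum` is hypothetical and this is not taken from the etcd-operator source.

```python
def has_quorum(membership, running):
    """Return True if a strict majority of the registered members is running.

    Assumption: quorum is floor(n/2) + 1 of the registered membership.
    """
    members = set(membership)
    # Only running pods that are also registered members count toward quorum.
    alive = members & set(running)
    quorum = len(members) // 2 + 1
    return len(alive) >= quorum


# Values copied from the log lines above.
membership = ["example-w2kt6b8llg", "example-6pddvc8zt9"]
running = ["example-6pddvc8zt9"]

print(has_quorum(membership, running))  # prints False: 1 running < quorum of 2
```

With 2 registered members the quorum is 2, so a single running member cannot satisfy it.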



How reproducible:
Not sure; hit it once.

Steps to Reproduce:
1. Install an OCP cluster (UPI, bare metal) with payload 4.1.0-0.nightly-2019-04-25-121505.

2. From OperatorHub in the web console, create etcd-operator and an etcd-cluster with 3 member pods in namespace 'lgproj' (choose single-namespace mode).

3. The etcd-operator and etcd-cluster pods are running normally.

4. Upgrade to 4.1.0-0.nightly-2019-04-28-064010.

5. Check the namespace; only 1 etcd-cluster pod exists:

# oc get pods
NAME                             READY   STATUS    RESTARTS   AGE
etcd-operator-5d77b5448d-mxfdc   3/3     Running   3          79m
example-6pddvc8zt9               1/1     Running   0          78m
# oc get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
etcd-restore-operator   ClusterIP   172.30.1xx.xx    <none>        19999/TCP           3h26m
example                 ClusterIP   None             <none>        2379/TCP,2380/TCP   3h14m
example-client          ClusterIP   172.30.xx.xx     <none>        2379/TCP            3h14m
# oc get svc
NAME                    TYPE        CLUSTER-IP       EXTERNAL-IP   PORT(S)             AGE
etcd-restore-operator   ClusterIP   172.30.1xx.xx    <none>        19999/TCP           3h39m
example                 ClusterIP   None             <none>        2379/TCP,2380/TCP   3h26m
example-client          ClusterIP   172.30.1xx.1xx   <none>        2379/TCP            3h26m


# oc logs etcd-operator-5d77b5448d-mxfdc -c etcd-operator
time="2019-04-30T08:19:27Z" level=info msg="etcd-operator Version: 0.9.4"
time="2019-04-30T08:19:27Z" level=info msg="Git SHA: c8a1c64"
time="2019-04-30T08:19:27Z" level=info msg="Go Version: go1.11.5"
time="2019-04-30T08:19:27Z" level=info msg="Go OS/Arch: linux/amd64"
time="2019-04-30T08:19:27Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"lgproj\", Name:\"etcd-operator\", UID:\"57e1c4c8-6b0f-11e9-8fb0-801844eeec7c\", APIVersion:\"v1\", ResourceVersion:\"256188\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-5d77b5448d-mxfdc became leader"
time="2019-04-30T08:19:28Z" level=info msg="start running..." cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T08:19:36Z" level=info msg="Start reconciling" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T08:19:36Z" level=info msg="running members: example-6pddvc8zt9" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T08:19:36Z" level=info msg="cluster membership: example-6pddvc8zt9,example-w2kt6b8llg" cluster-name=example cluster-namespace=lgproj pkg=cluster
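
To spot the missing member quickly, the "running members" and "cluster membership" lines can be compared mechanically. The following is a rough sketch only: the regular expression and the helper name `member_sets` are assumptions made for illustration, and the sample text is the last two log lines above.

```python
import re

# Matches the two operator log messages of interest and captures the name list.
LINE_RE = re.compile(r'msg="(running members|cluster membership): ([^"]*)"')


def member_sets(log_text):
    """Return (running, membership) sets from the most recent matching lines."""
    latest = {"running members": set(), "cluster membership": set()}
    for line in log_text.splitlines():
        match = LINE_RE.search(line)
        if match:
            kind, names = match.groups()
            # Later lines overwrite earlier ones, so the newest state wins.
            latest[kind] = {n.strip() for n in names.split(",") if n.strip()}
    return latest["running members"], latest["cluster membership"]


sample = '''time="2019-04-30T08:19:36Z" level=info msg="running members: example-6pddvc8zt9" cluster-name=example cluster-namespace=lgproj pkg=cluster
time="2019-04-30T08:19:36Z" level=info msg="cluster membership: example-6pddvc8zt9,example-w2kt6b8llg" cluster-name=example cluster-namespace=lgproj pkg=cluster'''

running, membership = member_sets(sample)
print(sorted(membership - running))  # prints ['example-w2kt6b8llg']
```

The difference `membership - running` lists members that are still registered in the cluster but have no running pod.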


Actual results:
As in the title: only 1 of the 3 etcd-cluster member pods remains after the upgrade.
Expected results:
The etcd-cluster should have the same member pods before and after the upgrade.

Comment 1 Sam Batschelet 2019-08-09 15:40:03 UTC
We are going to move support for etcd-operator into the new etcd-ha-operator which should go live in 4.3.

