Bug 1659875

Summary: etcd-operator fails to manage clusters in all namespaces
Product: OpenShift Container Platform Reporter: ge liu <geliu>
Component: Etcd Assignee: Sam Batschelet <sbatsche>
Status: CLOSED CURRENTRELEASE QA Contact: ge liu <geliu>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.1.0 CC: jiazha
Target Milestone: ---   
Target Release: 4.1.0   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1627690 Environment:
Last Closed: 2019-04-03 20:18:48 UTC Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On: 1627690    
Bug Blocks:    

Description ge liu 2018-12-17 05:52:17 UTC
+++ This bug was initially created as a clone of Bug #1627690 +++

Description of problem:
Tried to enable etcd-operator to manage clusters in all namespaces by adding the following annotation to etcd-cluster.yaml:
annotations:
    etcd.database.coreos.com/scope: clusterwide
After creating the etcd cluster from that file, the etcd cluster pods were not created.

With an etcd-cluster.yaml that omits this annotation, the etcd cluster pods start successfully as expected.


openshift v3.11.0-0.32.0

How reproducible:
Always

Steps to Reproduce:
1. Create etcd subscription in project: operator-lifecycle-manager

2. Create etcd cluster with file:

apiVersion: "etcd.database.coreos.com/v1beta2"
kind: "EtcdCluster"
metadata:
  name: "example-etcd-cluster"
  annotations:
    etcd.database.coreos.com/scope: clusterwide
spec:
  size: 3
  version: "3.2.13"

3. # oc create -f etcd-cluster.yaml
etcdcluster.etcd.database.coreos.com/example-etcd-cluster created

4. Check that no etcd cluster pods have been started:
# oc get pods
NAME                                READY     STATUS    RESTARTS   AGE
alm-operator-798c765f5c-npn56       1/1       Running   0          45m
catalog-operator-548958ff7f-45z7w   1/1       Running   0          44m
etcd-operator-7b49974f5b-cq899      3/3       Running   0          2m

5. Check the etcd operator pods and logs:

# oc describe pods etcd-operator-7b49974f5b-cq899
Name:               etcd-operator-7b49974f5b-cq899
Namespace:          operator-lifecycle-manager
Priority:           0
PriorityClassName:  <none>
Node:               qe-juzhao-311-gce-1-master-etcd-1/10.240.0.12
Start Time:         Tue, 11 Sep 2018 08:20:21 +0000
Labels:             name=etcd-operator-alm-owned
                    pod-template-hash=3605530916
Annotations:        openshift.io/scc=restricted
Status:             Running
IP:                 10.128.0.16
Controlled By:      ReplicaSet/etcd-operator-7b49974f5b
Containers:
  etcd-operator:
    Container ID:  docker://5c10b99b9c6f401462836621763a2c298dfa5d3b620a5c4a85871e5fafaa1d24
    Image:         quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Image ID:      docker-pullable://quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Port:          <none>
    Host Port:     <none>
    Command:
      etcd-operator
      --create-crd=false
    State:          Running
      Started:      Tue, 11 Sep 2018 08:20:24 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      MY_POD_NAMESPACE:  operator-lifecycle-manager (v1:metadata.namespace)
      MY_POD_NAME:       etcd-operator-7b49974f5b-cq899 (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from etcd-operator-token-68b99 (ro)
  etcd-backup-operator:
    Container ID:  docker://9cf63973559c77410d6571a7e914c55e6d4174be87807b32ac47e4b94d560648
    Image:         quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Image ID:      docker-pullable://quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Port:          <none>
    Host Port:     <none>
    Command:
      etcd-backup-operator
      --create-crd=false
    State:          Running
      Started:      Tue, 11 Sep 2018 08:20:24 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      MY_POD_NAMESPACE:  operator-lifecycle-manager (v1:metadata.namespace)
      MY_POD_NAME:       etcd-operator-7b49974f5b-cq899 (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from etcd-operator-token-68b99 (ro)
  etcd-restore-operator:
    Container ID:  docker://82f39a6a7ad91917c3a8bf9a3d80267fd1216e3f06cad44cc89531ab3d55fe2a
    Image:         quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Image ID:      docker-pullable://quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2
    Port:          <none>
    Host Port:     <none>
    Command:
      etcd-restore-operator
      --create-crd=false
    State:          Running
      Started:      Tue, 11 Sep 2018 08:20:24 +0000
    Ready:          True
    Restart Count:  0
    Environment:
      MY_POD_NAMESPACE:  operator-lifecycle-manager (v1:metadata.namespace)
      MY_POD_NAME:       etcd-operator-7b49974f5b-cq899 (v1:metadata.name)
    Mounts:
      /var/run/secrets/kubernetes.io/serviceaccount from etcd-operator-token-68b99 (ro)
Conditions:
  Type              Status
  Initialized       True 
  Ready             True 
  ContainersReady   True 
  PodScheduled      True 
Volumes:
  etcd-operator-token-68b99:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  etcd-operator-token-68b99
    Optional:    false
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     <none>
Events:
  Type    Reason     Age   From                                        Message
  ----    ------     ----  ----                                        -------
  Normal  Scheduled  3m    default-scheduler                           Successfully assigned operator-lifecycle-manager/etcd-operator-7b49974f5b-cq899 to qe-juzhao-311-gce-1-master-etcd-1
  Normal  Pulled     3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Container image "quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2" already present on machine
  Normal  Created    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Created container
  Normal  Started    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Started container
  Normal  Pulled     3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Container image "quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2" already present on machine
  Normal  Created    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Created container
  Normal  Started    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Started container
  Normal  Pulled     3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Container image "quay.io/coreos/etcd-operator@sha256:c0301e4686c3ed4206e370b42de5a3bd2229b9fb4906cf85f3f30650424abec2" already present on machine
  Normal  Created    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Created container
  Normal  Started    3m    kubelet, qe-juzhao-311-gce-1-master-etcd-1  Started container

# oc logs etcd-operator-7b49974f5b-cq899 -c etcd-operator
time="2018-09-11T08:20:24Z" level=info msg="etcd-operator Version: 0.9.2"
time="2018-09-11T08:20:24Z" level=info msg="Git SHA: a0032c1f"
time="2018-09-11T08:20:24Z" level=info msg="Go Version: go1.10"
time="2018-09-11T08:20:24Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-09-11T08:20:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"operator-lifecycle-manager\", Name:\"etcd-operator\", UID:\"3ffff76d-b59b-11e8-b1e8-42010af0000c\", APIVersion:\"v1\", ResourceVersion:\"81741\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-7b49974f5b-cq899 became leader"
[root@qe-juzhao-311-gce-1-master-etcd-1 tmp]# oc logs etcd-operator-7b49974f5b-cq899 -c etcd-operator -f
time="2018-09-11T08:20:24Z" level=info msg="etcd-operator Version: 0.9.2"
time="2018-09-11T08:20:24Z" level=info msg="Git SHA: a0032c1f"
time="2018-09-11T08:20:24Z" level=info msg="Go Version: go1.10"
time="2018-09-11T08:20:24Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-09-11T08:20:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"operator-lifecycle-manager\", Name:\"etcd-operator\", UID:\"3ffff76d-b59b-11e8-b1e8-42010af0000c\", APIVersion:\"v1\", ResourceVersion:\"81741\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-7b49974f5b-cq899 became leader"
^C
[root@qe-juzhao-311-gce-1-master-etcd-1 tmp]# oc logs etcd-operator-7b49974f5b-cq899 -c etcd-operator -f -c etcd-backup-operator
time="2018-09-11T08:20:24Z" level=info msg="Go Version: go1.10"
time="2018-09-11T08:20:24Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-09-11T08:20:24Z" level=info msg="etcd-backup-operator Version: 0.9.2"
time="2018-09-11T08:20:24Z" level=info msg="Git SHA: a0032c1f"
time="2018-09-11T08:20:41Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"operator-lifecycle-manager\", Name:\"etcd-backup-operator\", UID:\"494f0e20-b59a-11e8-b1e8-42010af0000c\", APIVersion:\"v1\", ResourceVersion:\"81744\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-7b49974f5b-cq899 became leader"
time="2018-09-11T08:20:41Z" level=info msg="starting backup controller" pkg=controller
^C
[root@qe-juzhao-311-gce-1-master-etcd-1 tmp]# oc logs etcd-operator-7b49974f5b-cq899 -c etcd-operator -f -c etcd-restore-operator
time="2018-09-11T08:20:24Z" level=info msg="Go Version: go1.10"
time="2018-09-11T08:20:24Z" level=info msg="Go OS/Arch: linux/amd64"
time="2018-09-11T08:20:24Z" level=info msg="etcd-restore-operator Version: 0.9.2"
time="2018-09-11T08:20:24Z" level=info msg="Git SHA: a0032c1f"
time="2018-09-11T08:20:42Z" level=info msg="listening on 0.0.0.0:19999"
time="2018-09-11T08:20:42Z" level=info msg="Event(v1.ObjectReference{Kind:\"Endpoints\", Namespace:\"operator-lifecycle-manager\", Name:\"etcd-restore-operator\", UID:\"4985b50b-b59a-11e8-b1e8-42010af0000c\", APIVersion:\"v1\", ResourceVersion:\"81747\", FieldPath:\"\"}): type: 'Normal' reason: 'LeaderElection' etcd-operator-7b49974f5b-cq899 became leader"
time="2018-09-11T08:20:42Z" level=info msg="starting restore controller" pkg=controller



6. Try again with an etcd-cluster.yaml that has no 'clusterwide' annotation; it works as expected.
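
Note: the checks in steps 4 and 6 can be scripted. The following is a minimal sketch, not part of the original report. It assumes that etcd member pods are named with the cluster name as a prefix (for example `example-etcd-cluster-<suffix>`); the helper name `count_cluster_pods` is hypothetical.

```shell
# Hypothetical helper: count pods whose name starts with "<cluster-name>-".
# Assumption: etcd member pods carry the EtcdCluster name as a prefix.
# $1: cluster name; stdin: table output of `oc get pods`.
count_cluster_pods() {
  awk -v prefix="$1-" '
    NR > 1 && index($1, prefix) == 1 { n++ }   # skip header row, match prefix at start of NAME
    END { print n + 0 }                        # print 0 when nothing matched
  '
}

# Against a live cluster (not run here):
#   oc get pods | count_cluster_pods example-etcd-cluster
```

A result of 0 after the cluster object has been created corresponds to the failure shown in step 4; a non-zero count corresponds to the working case in step 6.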


Actual results:

As title

Expected results:

The etcd cluster pods should start when the 'clusterwide' annotation is set in etcd-cluster.yaml.
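
Triage note: one thing that may be worth checking is whether the operator itself was started in cluster-wide mode. Upstream etcd-operator has a `-cluster-wide` flag that makes it watch clusters in all namespaces, and in the `oc describe` output above the etcd-operator container's command lists only `--create-crd=false`. Whether the missing flag is the actual cause of this bug is an assumption; the report does not confirm it. The sketch below uses a hypothetical helper, `has_cluster_wide_flag`, to test a command line for that flag.

```shell
# Hypothetical helper: succeed if a container command line contains the
# cluster-wide flag in any of its common spellings.
# $1: the container command and args as one space-separated string.
has_cluster_wide_flag() {
  case " $1 " in
    *" -cluster-wide "*|*" --cluster-wide "*) return 0 ;;
    *" -cluster-wide=true "*|*" --cluster-wide=true "*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example with the command shown in the describe output above:
if has_cluster_wide_flag "etcd-operator --create-crd=false"; then
  echo "operator runs in cluster-wide mode"
else
  echo "operator is namespace-scoped"
fi

# Against a live cluster (not run here), the command could be read with:
#   oc get pod etcd-operator-7b49974f5b-cq899 -n operator-lifecycle-manager \
#     -o jsonpath='{.spec.containers[?(@.name=="etcd-operator")].command[*]}'
# Flags may also appear under .args instead of .command.
```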

Comment 1 ge liu 2018-12-17 05:54:00 UTC
$ oc version
oc v4.0.0-alpha.0+3d38233-777
kubernetes v1.11.0+3d38233
features: Basic-Auth GSSAPI Kerberos SPNEGO

Server https://juzhao7-api.origin-ci-int-aws.dev.rhcloud.com:6443
kubernetes v1.11.0+3d38233


bin/openshift-install v0.7.0-master-4-ga4e426ee762c20019bbb90fe35d33c9b26d23393

Comment 3 ge liu 2019-03-04 07:20:13 UTC
Is there any update on this issue? I tried it on the latest 4.0 build and the problem still exists. Thanks.

Comment 6 Jian Zhang 2019-03-21 06:24:30 UTC
> Try with etcd-cluster.yaml without 'clusterwide' setting, it works well as expected

I submitted a PR to fix it: https://github.com/operator-framework/community-operators/pull/206

Comment 7 ge liu 2019-04-02 02:06:06 UTC
This issue is fixed in v0.9.4; the clusterwide setting works well.