Bug 1857301 - kubemacpool - At some point kubemacpool-leader is not defined, kmp service is down
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Container Native Virtualization (CNV)
Classification: Red Hat
Component: Networking
Version: 2.4.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: 2.4.1
Assignee: Petr Horáček
QA Contact: Meni Yakove
URL:
Whiteboard:
Duplicates: 1868403
Depends On:
Blocks:
 
Reported: 2020-07-15 15:52 UTC by Ruth Netser
Modified: 2023-12-15 18:31 UTC
CC List: 4 users

Fixed In Version: kubemacpool-container-v2.4.1-2
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2020-09-03 20:31:08 UTC
Target Upstream Version:
Embargoed:


Attachments
Pod log (1.04 MB, text/plain)
2020-07-15 15:52 UTC, Ruth Netser
Pod log (1.89 KB, text/plain)
2020-07-15 15:52 UTC, Ruth Netser
Pods yaml,log and describe files (18.16 KB, application/x-xz)
2020-08-06 15:29 UTC, Ruth Netser


Links
System ID Private Priority Status Summary Last Updated
Github k8snetworkplumbingwg kubemacpool pull 239 0 None closed [release-0.14] Set number of replicas to 1 and drop leader election 2021-01-12 19:39:32 UTC
Red Hat Issue Tracker CNV-10893 0 None None None 2023-12-15 18:31:41 UTC
Red Hat Product Errata RHBA-2020:3629 0 None None None 2020-09-03 20:31:20 UTC

Description Ruth Netser 2020-07-15 15:52:13 UTC
Created attachment 1701254
Pod log

Description of problem:
On a cluster that was deployed successfully and had been functioning correctly, the kubemacpool-leader label is no longer set on any pod, so the kmp service is down (no endpoint is associated with it).
Both kubemacpool-mac-controller-manager pods are Running.

Deleting both pods resolved the problem.
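The workaround described above can be sketched as the following `oc` session. This is a hedged sketch, not a documented procedure: it assumes a logged-in `oc` session and reuses the `control-plane=mac-controller-manager` selector that appears later in the deployment description.

```shell
# Workaround sketch: delete the kubemacpool pods so the Deployment
# recreates them and leader election runs again.
# The label selector is the one shown in `oc describe deployment`
# for kubemacpool-mac-controller-manager.
oc delete pod -n openshift-cnv -l control-plane=mac-controller-manager

# Then verify that a leader was elected and the service regained
# its endpoint:
oc get pods -n openshift-cnv -l kubemacpool-leader=true
oc get endpoints -n openshift-cnv kubemacpool-service
```

If the second command still returns "No resources found", the pods came back without re-electing a leader and the service will remain endpoint-less.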

Version-Release number of selected component (if applicable):
CNV 2.4.0

How reproducible:
Unknown

Steps to Reproduce:
1. No specific steps.


Actual results:
The kmp service is down; creating a VM fails.

Expected results:
A leader should always be assigned.

Additional info:
=======================
KMP logs attached.

=======================
$ oc describe Service -n openshift-cnv kubemacpool-service
Name:              kubemacpool-service
Namespace:         openshift-cnv
Labels:            networkaddonsoperator.network.kubevirt.io/version=sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
Annotations:       <none>
Selector:          kubemacpool-leader=true
Type:              ClusterIP
IP:                172.30.137.156
Port:              <unset>  443/TCP
TargetPort:        8000/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>


=======================
$ oc get pods -A| grep kubemac 
openshift-cnv                                      kubemacpool-mac-controller-manager-865d98484c-7hkhc               1/1     Running     0          3h58m
openshift-cnv                                      kubemacpool-mac-controller-manager-865d98484c-8vpvp               1/1     Running     0          3h50m


=======================
$ oc get pods -n openshift-cnv -l kubemacpool-leader=true
No resources found in openshift-cnv namespace.


=======================
$ oc get Service -n openshift-cnv kubemacpool-service -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-07-14T10:34:12Z"
  labels:
    networkaddonsoperator.network.kubevirt.io/version: sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:networkaddonsoperator.network.kubevirt.io/version: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"f5b53444-eae9-4810-8252-94739af6cfb8"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":443,"protocol":"TCP"}:
            .: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:publishNotReadyAddresses: {}
        f:selector:
          .: {}
          f:kubemacpool-leader: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: cluster-network-addons-operator
    operation: Update
    time: "2020-07-14T10:34:12Z"
  name: kubemacpool-service
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: networkaddonsoperator.network.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkAddonsConfig
    name: cluster
    uid: f5b53444-eae9-4810-8252-94739af6cfb8
  resourceVersion: "46753"
  selfLink: /api/v1/namespaces/openshift-cnv/services/kubemacpool-service
  uid: 0eb110eb-898d-43fa-b09c-4f669add5c19
spec:
  clusterIP: 172.30.137.156
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8000
  publishNotReadyAddresses: true
  selector:
    kubemacpool-leader: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}

===============

Comment 1 Ruth Netser 2020-07-15 15:52:46 UTC
Created attachment 1701255
Pod log

Comment 3 Petr Horáček 2020-07-16 13:02:00 UTC
Since it only happened on PSI and the reproducer is unclear, I'm targeting this for the next release to keep the bug around. If we see it again, please don't hesitate to raise it. If we don't, we should eventually close it.

Comment 4 Ruth Netser 2020-08-06 15:26:42 UTC
Reproduced during CNV upgrade 2.4.0 -> 2.4.1.
The upgrade does not complete because the deployment is not ready.
Both kubemacpool-mac-controller-manager pods have:

  - lastProbeTime: null
    lastTransitionTime: "2020-08-06T13:42:24Z"
    message: corresponding condition of pod readiness gate "kubemacpool.io/leader-ready"
      does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

(pod yaml, describe and logs are attached)
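For context on the `ReadinessGatesNotReady` condition above: a pod readiness gate is an extra condition type declared in the pod spec, and the pod stays NotReady until some controller posts that condition with status "True" to `pod.status.conditions`. A minimal sketch of what such a gate looks like in a pod spec (the `kubemacpool.io/leader-ready` condition type is taken from the message above; everything else is standard Pod API fields):

```yaml
# Sketch of a pod spec readiness gate. The kubelet will not report the
# pod Ready until a condition of this type exists with status "True" in
# pod.status.conditions. In this bug nothing ever wrote that condition,
# hence ReadinessGatesNotReady.
spec:
  readinessGates:
  - conditionType: kubemacpool.io/leader-ready
```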

======================================================================================
$ oc get csv -A|grep kube
openshift-cnv                          kubevirt-hyperconverged-operator.v2.4.0        OpenShift Virtualization      2.4.0                                                             Replacing
openshift-cnv                          kubevirt-hyperconverged-operator.v2.4.1        OpenShift Virtualization      2.4.1                   kubevirt-hyperconverged-operator.v2.4.0   Installing

======================================================================================
$ oc get pod -n openshift-cnv|grep 'hco\|pool'
hco-operator-5fc59cddd7-qdw5m                         0/1     Running   0          95m
kubemacpool-mac-controller-manager-6cdb6dff69-tls55   1/1     Running   0          93m
kubemacpool-mac-controller-manager-6cdb6dff69-xfshg   1/1     Running   0          93m

======================================================================================
$ oc describe HyperConverged -n openshift-cnv kubevirt-hyperconverged
....
Status:
  Conditions:
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T08:02:31Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               NetworkAddonsConfig is not available: Configuration is in process
    Reason:                NetworkAddonsConfigNotAvailable
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               HCO is now upgrading to version v2.4.1
    Reason:                HCOUpgrading
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:46:31Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               NetworkAddonsConfig is progressing: Deployment "openshift-cnv/kubemacpool-mac-controller-manager" is not available (awaiting 2 nodes)
    Reason:                NetworkAddonsConfigProgressing
    Status:                False
    Type:                  Upgradeable

======================================================================================
$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name:               kubemacpool-mac-controller-manager
Namespace:          openshift-cnv
CreationTimestamp:  Thu, 06 Aug 2020 08:02:31 +0000
Labels:             control-plane=mac-controller-manager
                    controller-tools.k8s.io=1.0
                    networkaddonsoperator.network.kubevirt.io/version=sha256_607b5bce4672e96b2cfe2b84c0d1eab2f1ce26c54c1667284f972dda
Annotations:        deployment.kubernetes.io/revision: 2
Selector:           control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas:           2 desired | 2 updated | 2 total | 0 available | 2 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
      POD_NAME:        (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Available      False   MinimumReplicasUnavailable
  Progressing    False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   kubemacpool-mac-controller-manager-6cdb6dff69 (2/2 replicas created)
Events:
  Type    Reason             Age   From                   Message
  ----    ------             ----  ----                   -------
  Normal  ScalingReplicaSet  95m   deployment-controller  Scaled down replica set kubemacpool-mac-controller-manager-798cc44d88 to 0
  Normal  ScalingReplicaSet  95m   deployment-controller  Scaled up replica set kubemacpool-mac-controller-manager-6cdb6dff69 to 2


======================================================================================
$ oc get replicasets -n openshift-cnv|grep pool
kubemacpool-mac-controller-manager-6cdb6dff69   2         2         0       96m
kubemacpool-mac-controller-manager-798cc44d88   0         0         0       7h16m

======================================================================================
$ oc describe replicasets -n openshift-cnv kubemacpool-mac-controller-manager-6cdb6dff69
Name:           kubemacpool-mac-controller-manager-6cdb6dff69
Namespace:      openshift-cnv
Selector:       control-plane=mac-controller-manager,controller-tools.k8s.io=1.0,pod-template-hash=6cdb6dff69
Labels:         app=kubemacpool
                control-plane=mac-controller-manager
                controller-tools.k8s.io=1.0
                pod-template-hash=6cdb6dff69
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 2
Controlled By:  Deployment/kubemacpool-mac-controller-manager
Replicas:       2 current / 2 desired
Pods Status:    2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
           pod-template-hash=6cdb6dff69
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
      POD_NAME:        (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Events:
  Type    Reason            Age   From                   Message
  ----    ------            ----  ----                   -------
  Normal  SuccessfulCreate  97m   replicaset-controller  Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-xfshg
  Normal  SuccessfulCreate  97m   replicaset-controller  Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-tls55

Comment 5 Petr Horáček 2020-08-06 15:28:54 UTC
Thanks Ruth. Since it happened again and is causing dead-on-arrival CNV, I'm proposing this as a blocker for 2.4.1.

Comment 6 Ruth Netser 2020-08-06 15:29:47 UTC
Created attachment 1710678
Pods yaml,log and describe files

Comment 7 Simone Tiraboschi 2020-08-13 14:53:19 UTC
*** Bug 1868403 has been marked as a duplicate of this bug. ***

Comment 8 yzaindbe 2020-08-19 02:22:07 UTC
Test Environment :
==================

$ oc version
Client Version: 4.5.0-202007240519.p0-b66f2d3
Server Version: 4.5.6
Kubernetes Version: v1.18.3+002a51f

CNV Version:
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.4.1

steps
=====

Bug Summary: CNV was dead on arrival, or was deployed but the kmp service was down because of leader-election issues in the kmp pods.
Fix: only one kmp pod is deployed, with no leader label.
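The shape of the fix (per the linked PR "[release-0.14] Set number of replicas to 1 and drop leader election") can be sketched as a deployment change along these lines. This is a sketch of the idea, not the exact manifest:

```yaml
# Sketch: with a single replica there is no leader election, so the
# service no longer needs a kubemacpool-leader label to pick out the
# leading pod -- the one pod is always the webhook endpoint.
spec:
  replicas: 1
```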

================================================================================================================================================================
check cnv is deployed successfully:

$ oc get csv -A|grep kube
NAMESPACE                              NAME                                           DISPLAY                       VERSION                 REPLACES   PHASE
openshift-cnv                          kubevirt-hyperconverged-operator.v2.4.1        OpenShift Virtualization      2.4.1                              Succeeded
================================================================================================================================================================
check there's only one kmp pod and it is running:

$ oc get pod -n openshift-cnv|grep kubemacpool
NAME                                                 READY   STATUS    RESTARTS   AGE
kubemacpool-mac-controller-manager-7d89555fd-75xbw   1/1     Running   0          41m
================================================================================================================================================================
check deployment conditions are good:

$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name:               kubemacpool-mac-controller-manager
Namespace:          openshift-cnv
CreationTimestamp:  Tue, 18 Aug 2020 10:20:40 +0000
Labels:             control-plane=mac-controller-manager
                    controller-tools.k8s.io=1.0
                    networkaddonsoperator.network.kubevirt.io/version=sha256_496384595539e60f6652ee4283f857e23608cdf4a975b1037a4abefb
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas:           1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:ab09d946469e26ddf52791dff864e24662d9ba898f6082d92b50023f080b4467
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:   (v1:metadata.namespace)
      POD_NAME:        (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Conditions:
  Type           Status  Reason
  ----           ------  ------
  Progressing    True    NewReplicaSetAvailable
  Available      True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   kubemacpool-mac-controller-manager-7d89555fd (1/1 replicas created)
Events:
  Type    Reason             Age                From                   Message
  ----    ------             ----               ----                   -------
  Normal  ScalingReplicaSet  45m                deployment-controller  Scaled down replica set kubemacpool-mac-controller-manager-7d89555fd to 0
  Normal  ScalingReplicaSet  45m (x2 over 15h)  deployment-controller  Scaled up replica set kubemacpool-mac-controller-manager-7d89555fd to 1
================================================================================================================================================================
check replicaset:

$ oc get replicasets -n openshift-cnv|grep pool
NAME                                           DESIRED   CURRENT   READY   AGE
kubemacpool-mac-controller-manager-7d89555fd   1         1         1       15h
================================================================================================================================================================

Comment 13 errata-xmlrpc 2020-09-03 20:31:08 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3629

