Created attachment 1701254 [details]
Pod log

Description of problem:
On a cluster which was deployed successfully and was functioning OK, the
kubemacpool-leader label is not set on any pod and the kmp service is down
(no endpoint is associated with it). Both kubemacpool-mac-controller-manager
pods are running. Deleting both pods solved the problem.

Version-Release number of selected component (if applicable):
CNV 2.4.0

How reproducible:
Unknown

Steps to Reproduce:
1. No specific steps.

Actual results:
kmp service is down; creating a VM fails.

Expected results:
A leader should always be assigned.

Additional info:
=======================
KMP logs attached.
=======================
$ oc describe Service -n openshift-cnv kubemacpool-service
Name:              kubemacpool-service
Namespace:         openshift-cnv
Labels:            networkaddonsoperator.network.kubevirt.io/version=sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
Annotations:       <none>
Selector:          kubemacpool-leader=true
Type:              ClusterIP
IP:                172.30.137.156
Port:              <unset>  443/TCP
TargetPort:        8000/TCP
Endpoints:         <none>
Session Affinity:  None
Events:            <none>
=======================
$ oc get pods -A | grep kubemac
openshift-cnv   kubemacpool-mac-controller-manager-865d98484c-7hkhc   1/1   Running   0   3h58m
openshift-cnv   kubemacpool-mac-controller-manager-865d98484c-8vpvp   1/1   Running   0   3h50m
=======================
$ oc get pods -n openshift-cnv -l kubemacpool-leader=true
No resources found in openshift-cnv namespace.
=======================
$ oc get Service -n openshift-cnv kubemacpool-service -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-07-14T10:34:12Z"
  labels:
    networkaddonsoperator.network.kubevirt.io/version: sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:networkaddonsoperator.network.kubevirt.io/version: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"f5b53444-eae9-4810-8252-94739af6cfb8"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":443,"protocol":"TCP"}:
            .: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:publishNotReadyAddresses: {}
        f:selector:
          .: {}
          f:kubemacpool-leader: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: cluster-network-addons-operator
    operation: Update
    time: "2020-07-14T10:34:12Z"
  name: kubemacpool-service
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: networkaddonsoperator.network.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkAddonsConfig
    name: cluster
    uid: f5b53444-eae9-4810-8252-94739af6cfb8
  resourceVersion: "46753"
  selfLink: /api/v1/namespaces/openshift-cnv/services/kubemacpool-service
  uid: 0eb110eb-898d-43fa-b09c-4f669add5c19
spec:
  clusterIP: 172.30.137.156
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8000
  publishNotReadyAddresses: true
  selector:
    kubemacpool-leader: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
===============
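For anyone else hitting this state, the manual workaround described above ("deleting both pods") can be scripted. This is a hedged sketch, not an official fix: it assumes the namespace and the control-plane=mac-controller-manager label shown in the deployment output elsewhere in this bug, and that the Deployment will recreate the pods and re-run leader election:

# Workaround sketch: delete both kmp pods; the Deployment recreates them
# and leader election runs again on startup.
$ oc delete pod -n openshift-cnv -l control-plane=mac-controller-manager

# Then confirm one pod picked up the leader label and the service
# regained an endpoint:
$ oc get pods -n openshift-cnv -l kubemacpool-leader=true
$ oc get endpoints -n openshift-cnv kubemacpool-service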
Created attachment 1701255 [details]
Pod log
Since it only happened on PSI and the reproducer is unclear, I'm targeting this for the next release to keep the bug around. If we see it again, please don't hesitate to raise it. If we don't, we should eventually close it.
Reproduced during CNV upgrade 2.4.0 -> 2.4.1.
The upgrade does not finish because the deployment is not ready.

Both kubemacpool-mac-controller-manager pods have:

  - lastProbeTime: null
    lastTransitionTime: "2020-08-06T13:42:24Z"
    message: corresponding condition of pod readiness gate "kubemacpool.io/leader-ready" does not exist.
    reason: ReadinessGatesNotReady
    status: "False"
    type: Ready

(pod yaml, describe and logs are attached)
======================================================================================
$ oc get csv -A | grep kube
openshift-cnv   kubevirt-hyperconverged-operator.v2.4.0   OpenShift Virtualization   2.4.0                                             Replacing
openshift-cnv   kubevirt-hyperconverged-operator.v2.4.1   OpenShift Virtualization   2.4.1   kubevirt-hyperconverged-operator.v2.4.0   Installing
======================================================================================
$ oc get pod -n openshift-cnv | grep 'hco\|pool'
hco-operator-5fc59cddd7-qdw5m                         0/1   Running   0   95m
kubemacpool-mac-controller-manager-6cdb6dff69-tls55   1/1   Running   0   93m
kubemacpool-mac-controller-manager-6cdb6dff69-xfshg   1/1   Running   0   93m
======================================================================================
$ oc describe HyperConverged -n openshift-cnv kubevirt-hyperconverged
....
Status:
  Conditions:
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T08:02:31Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                True
    Type:                  ReconcileComplete
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               NetworkAddonsConfig is not available: Configuration is in process
    Reason:                NetworkAddonsConfigNotAvailable
    Status:                False
    Type:                  Available
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               HCO is now upgrading to version v2.4.1
    Reason:                HCOUpgrading
    Status:                True
    Type:                  Progressing
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:46:31Z
    Message:               Reconcile completed successfully
    Reason:                ReconcileCompleted
    Status:                False
    Type:                  Degraded
    Last Heartbeat Time:   2020-08-06T15:16:15Z
    Last Transition Time:  2020-08-06T13:42:14Z
    Message:               NetworkAddonsConfig is progressing: Deployment "openshift-cnv/kubemacpool-mac-controller-manager" is not available (awaiting 2 nodes)
    Reason:                NetworkAddonsConfigProgressing
    Status:                False
    Type:                  Upgradeable
======================================================================================
$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name:               kubemacpool-mac-controller-manager
Namespace:          openshift-cnv
CreationTimestamp:  Thu, 06 Aug 2020 08:02:31 +0000
Labels:             control-plane=mac-controller-manager
                    controller-tools.k8s.io=1.0
                    networkaddonsoperator.network.kubevirt.io/version=sha256_607b5bce4672e96b2cfe2b84c0d1eab2f1ce26c54c1667284f972dda
Annotations:        deployment.kubernetes.io/revision: 2
Selector:           control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas:           2 desired | 2 updated | 2 total | 0 available | 2 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  (v1:metadata.namespace)
      POD_NAME:       (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Available    False   MinimumReplicasUnavailable
  Progressing  False   ProgressDeadlineExceeded
OldReplicaSets:  <none>
NewReplicaSet:   kubemacpool-mac-controller-manager-6cdb6dff69 (2/2 replicas created)
Events:
  Type    Reason             Age  From                   Message
  ----    ------             ---- ----                   -------
  Normal  ScalingReplicaSet  95m  deployment-controller  Scaled down replica set kubemacpool-mac-controller-manager-798cc44d88 to 0
  Normal  ScalingReplicaSet  95m  deployment-controller  Scaled up replica set kubemacpool-mac-controller-manager-6cdb6dff69 to 2
======================================================================================
$ oc get replicasets -n openshift-cnv | grep pool
kubemacpool-mac-controller-manager-6cdb6dff69   2   2   0   96m
kubemacpool-mac-controller-manager-798cc44d88   0   0   0   7h16m
======================================================================================
$ oc describe replicasets -n openshift-cnv kubemacpool-mac-controller-manager-6cdb6dff69
Name:           kubemacpool-mac-controller-manager-6cdb6dff69
Namespace:      openshift-cnv
Selector:       control-plane=mac-controller-manager,controller-tools.k8s.io=1.0,pod-template-hash=6cdb6dff69
Labels:         app=kubemacpool
                control-plane=mac-controller-manager
                controller-tools.k8s.io=1.0
                pod-template-hash=6cdb6dff69
Annotations:    deployment.kubernetes.io/desired-replicas: 2
                deployment.kubernetes.io/max-replicas: 2
                deployment.kubernetes.io/revision: 2
Controlled By:  Deployment/kubemacpool-mac-controller-manager
Replicas:       2 current / 2 desired
Pods Status:    2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
           pod-template-hash=6cdb6dff69
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  (v1:metadata.namespace)
      POD_NAME:       (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Events:
  Type    Reason            Age  From                   Message
  ----    ------            ---- ----                   -------
  Normal  SuccessfulCreate  97m  replicaset-controller  Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-xfshg
  Normal  SuccessfulCreate  97m  replicaset-controller  Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-tls55
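For context on the ReadinessGatesNotReady condition above: Kubernetes readiness gates let a pod spec declare extra condition types that must be set to "True" in pod.status by some controller before the pod counts as Ready. A minimal sketch of the relevant stanza, using the standard Pod API (the exact kubemacpool manifest content here is an assumption on my part):

  # Pod spec fragment (sketch): the pod stays NotReady until a controller
  # patches a status condition of type "kubemacpool.io/leader-ready" to "True".
  spec:
    readinessGates:
    - conditionType: kubemacpool.io/leader-ready

Presumably the elected kmp leader is what posts that condition, so when leader election never completes, the condition never exists, both pods stay NotReady, and the deployment (and hence the upgrade) blocks exactly as shown above.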
Thanks Ruth. Since it happened again and leaves CNV dead on arrival, I'm proposing this as a blocker for 2.4.1.
Created attachment 1710678 [details]
Pods yaml, log and describe files
*** Bug 1868403 has been marked as a duplicate of this bug. ***
Test Environment:
==================
$ oc version
Client Version: 4.5.0-202007240519.p0-b66f2d3
Server Version: 4.5.6
Kubernetes Version: v1.18.3+002a51f

CNV Version:
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.4.1

Steps
=====
Bug Summary: CNV was dead on arrival, or was deployed but the kmp service was down because of leader issues in the kmp pods.
Fix: only one kmp pod is deployed, with no leader label.
======================================================================================
Check CNV is deployed successfully:

$ oc get csv -A | grep kube
NAMESPACE       NAME                                      DISPLAY                    VERSION   REPLACES   PHASE
openshift-cnv   kubevirt-hyperconverged-operator.v2.4.1   OpenShift Virtualization   2.4.1                Succeeded
======================================================================================
Check there's only one kmp pod and it is running:

$ oc get pod -n openshift-cnv | grep kubemacpool
NAME                                                 READY   STATUS    RESTARTS   AGE
kubemacpool-mac-controller-manager-7d89555fd-75xbw   1/1     Running   0          41m
======================================================================================
Check deployment conditions are good:

$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name:               kubemacpool-mac-controller-manager
Namespace:          openshift-cnv
CreationTimestamp:  Tue, 18 Aug 2020 10:20:40 +0000
Labels:             control-plane=mac-controller-manager
                    controller-tools.k8s.io=1.0
                    networkaddonsoperator.network.kubevirt.io/version=sha256_496384595539e60f6652ee4283f857e23608cdf4a975b1037a4abefb
Annotations:        deployment.kubernetes.io/revision: 1
Selector:           control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas:           1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType:       Recreate
MinReadySeconds:    0
Pod Template:
  Labels:  app=kubemacpool
           control-plane=mac-controller-manager
           controller-tools.k8s.io=1.0
  Containers:
   manager:
    Image:      registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:ab09d946469e26ddf52791dff864e24662d9ba898f6082d92b50023f080b4467
    Port:       8000/TCP
    Host Port:  0/TCP
    Command:
      /manager
    Args:
      --v=production
      --wait-time=600
    Limits:
      cpu:     300m
      memory:  600Mi
    Requests:
      cpu:      100m
      memory:   300Mi
    Readiness:  http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
    Environment:
      POD_NAMESPACE:  (v1:metadata.namespace)
      POD_NAME:       (v1:metadata.name)
      RANGE_START:    <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'>  Optional: false
      RANGE_END:      <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'>    Optional: false
    Mounts:
      /etc/webhook/certs from tls-key-pair (ro)
  Volumes:
   tls-key-pair:
    Type:        Secret (a volume populated by a Secret)
    SecretName:  kubemacpool-service
    Optional:    false
Conditions:
  Type         Status  Reason
  ----         ------  ------
  Progressing  True    NewReplicaSetAvailable
  Available    True    MinimumReplicasAvailable
OldReplicaSets:  <none>
NewReplicaSet:   kubemacpool-mac-controller-manager-7d89555fd (1/1 replicas created)
Events:
  Type    Reason             Age               From                   Message
  ----    ------             ----              ----                   -------
  Normal  ScalingReplicaSet  45m               deployment-controller  Scaled down replica set kubemacpool-mac-controller-manager-7d89555fd to 0
  Normal  ScalingReplicaSet  45m (x2 over 15h) deployment-controller  Scaled up replica set kubemacpool-mac-controller-manager-7d89555fd to 1
======================================================================================
Check replicaset:

$ oc get replicasets -n openshift-cnv | grep pool
NAME                                           DESIRED   CURRENT   READY   AGE
kubemacpool-mac-controller-manager-7d89555fd   1         1         1       15h
======================================================================================
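One extra check that may be worth adding to this verification flow, assuming the service kept the kubemacpool-service name from the original report: confirm the service now has an endpoint, since the original symptom was Endpoints: <none>:

# Should now list the single kmp pod's address rather than <none>:
$ oc get endpoints -n openshift-cnv kubemacpool-service

With the leader label removed by the fix, the service selector presumably matches the single pod directly instead of requiring kubemacpool-leader=true.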
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHBA-2020:3629