Bug 1857301
| Summary: | kubemacpool - At some point kubemacpool-leader is not defined, kmp service is down | | |
|---|---|---|---|
| Product: | Container Native Virtualization (CNV) | Reporter: | Ruth Netser <rnetser> |
| Component: | Networking | Assignee: | Petr Horáček <phoracek> |
| Status: | CLOSED ERRATA | QA Contact: | Meni Yakove <myakove> |
| Severity: | high | Docs Contact: | |
| Priority: | unspecified | | |
| Version: | 2.4.1 | CC: | cnv-qe-bugs, ncredi, oramraz, ysegev |
| Target Milestone: | --- | | |
| Target Release: | 2.4.1 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | kubemacpool-container-v2.4.1-2 | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-09-03 20:31:08 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Attachments: | | | |
Created attachment 1701255 [details]
Pod log
Since it only happened on PSI and the reproducer is unclear, I'm targeting this for the next release to keep the bug around. If we see it again, please don't hesitate to raise it. If we don't, we should eventually close it.

Reproduced during CNV upgrade 2.4.0 -> 2.4.1
Upgrade does not end because the deployment is not ready.
Both kubemacpool-mac-controller-manager pods have:
- lastProbeTime: null
  lastTransitionTime: "2020-08-06T13:42:24Z"
  message: corresponding condition of pod readiness gate "kubemacpool.io/leader-ready"
    does not exist.
  reason: ReadinessGatesNotReady
  status: "False"
  type: Ready
(pod yaml, describe and logs are attached)
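The condition above explains how a pod can show 1/1 Running while the deployment reports 0 available: the READY column counts containers, whereas the pod-level Ready condition also requires every readiness gate to have a matching condition with status "True". A minimal sketch of that rule in Python follows; the function name and the dictionary layout are illustrative assumptions, not kubemacpool or Kubernetes code.

```python
def pod_is_ready(containers_ready, readiness_gates, conditions):
    """Return (ready, blocked_gates) following the readiness-gate rule.

    A pod is Ready only if its containers are ready AND every gate listed
    in readiness_gates has a condition of that type with status "True".
    A gate with no condition at all counts as not ready.
    """
    status_by_type = {c["type"]: c["status"] for c in conditions}
    blocked = [g for g in readiness_gates if status_by_type.get(g) != "True"]
    return containers_ready and not blocked, blocked


# Values mirroring the report: the container is up, but the condition for
# the gate "kubemacpool.io/leader-ready" was never written to the pod status.
ready, blocked = pod_is_ready(
    containers_ready=True,
    readiness_gates=["kubemacpool.io/leader-ready"],
    conditions=[{"type": "ContainersReady", "status": "True"}],
)
print(ready, blocked)
```

With both pods stuck in this state, neither ever counts as available, which matches the "awaiting 2 nodes" message reported by the HCO.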
======================================================================================
$ oc get csv -A|grep kube
openshift-cnv kubevirt-hyperconverged-operator.v2.4.0 OpenShift Virtualization 2.4.0 Replacing
openshift-cnv kubevirt-hyperconverged-operator.v2.4.1 OpenShift Virtualization 2.4.1 kubevirt-hyperconverged-operator.v2.4.0 Installing
======================================================================================
$ oc get pod -n openshift-cnv|grep 'hco\|pool'
hco-operator-5fc59cddd7-qdw5m 0/1 Running 0 95m
kubemacpool-mac-controller-manager-6cdb6dff69-tls55 1/1 Running 0 93m
kubemacpool-mac-controller-manager-6cdb6dff69-xfshg 1/1 Running 0 93m
======================================================================================
$ oc describe HyperConverged -n openshift-cnv kubevirt-hyperconverged
....
Status:
Conditions:
Last Heartbeat Time: 2020-08-06T15:16:15Z
Last Transition Time: 2020-08-06T08:02:31Z
Message: Reconcile completed successfully
Reason: ReconcileCompleted
Status: True
Type: ReconcileComplete
Last Heartbeat Time: 2020-08-06T15:16:15Z
Last Transition Time: 2020-08-06T13:42:14Z
Message: NetworkAddonsConfig is not available: Configuration is in process
Reason: NetworkAddonsConfigNotAvailable
Status: False
Type: Available
Last Heartbeat Time: 2020-08-06T15:16:15Z
Last Transition Time: 2020-08-06T13:42:14Z
Message: HCO is now upgrading to version v2.4.1
Reason: HCOUpgrading
Status: True
Type: Progressing
Last Heartbeat Time: 2020-08-06T15:16:15Z
Last Transition Time: 2020-08-06T13:46:31Z
Message: Reconcile completed successfully
Reason: ReconcileCompleted
Status: False
Type: Degraded
Last Heartbeat Time: 2020-08-06T15:16:15Z
Last Transition Time: 2020-08-06T13:42:14Z
Message: NetworkAddonsConfig is progressing: Deployment "openshift-cnv/kubemacpool-mac-controller-manager" is not available (awaiting 2 nodes)
Reason: NetworkAddonsConfigProgressing
Status: False
Type: Upgradeable
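To read the condition list above quickly, it can help to filter out the conditions that are already in their healthy state. The sketch below is in Python and assumes the usual operator convention (Available=True, Progressing=False, Degraded=False, Upgradeable=True); that table and the function name are assumptions for illustration, not taken from the HCO source.

```python
# Assumed "healthy" values, following the common operator-condition
# convention; this table is an illustration, not taken from the HCO source.
HEALTHY = {
    "Available": "True",
    "Progressing": "False",
    "Degraded": "False",
    "Upgradeable": "True",
}


def blocking_conditions(conditions):
    """Return (type, reason) for every condition that differs from HEALTHY."""
    return [
        (c["type"], c["reason"])
        for c in conditions
        if c["type"] in HEALTHY and c["status"] != HEALTHY[c["type"]]
    ]


# Conditions copied from the `oc describe HyperConverged` output above.
reported = [
    {"type": "ReconcileComplete", "status": "True", "reason": "ReconcileCompleted"},
    {"type": "Available", "status": "False", "reason": "NetworkAddonsConfigNotAvailable"},
    {"type": "Progressing", "status": "True", "reason": "HCOUpgrading"},
    {"type": "Degraded", "status": "False", "reason": "ReconcileCompleted"},
    {"type": "Upgradeable", "status": "False", "reason": "NetworkAddonsConfigProgressing"},
]
for cond_type, reason in blocking_conditions(reported):
    print(cond_type, reason)
```

Under that assumption, all non-healthy conditions here trace back to NetworkAddonsConfig or to the upgrade itself still being in progress, which points at the kubemacpool deployment rather than at HCO.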
======================================================================================
$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name: kubemacpool-mac-controller-manager
Namespace: openshift-cnv
CreationTimestamp: Thu, 06 Aug 2020 08:02:31 +0000
Labels: control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
networkaddonsoperator.network.kubevirt.io/version=sha256_607b5bce4672e96b2cfe2b84c0d1eab2f1ce26c54c1667284f972dda
Annotations: deployment.kubernetes.io/revision: 2
Selector: control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas: 2 desired | 2 updated | 2 total | 0 available | 2 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app=kubemacpool
control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
Containers:
manager:
Image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
Port: 8000/TCP
Host Port: 0/TCP
Command:
/manager
Args:
--v=production
--wait-time=600
Limits:
cpu: 300m
memory: 600Mi
Requests:
cpu: 100m
memory: 300Mi
Readiness: http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
RANGE_START: <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'> Optional: false
RANGE_END: <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'> Optional: false
Mounts:
/etc/webhook/certs from tls-key-pair (ro)
Volumes:
tls-key-pair:
Type: Secret (a volume populated by a Secret)
SecretName: kubemacpool-service
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Available False MinimumReplicasUnavailable
Progressing False ProgressDeadlineExceeded
OldReplicaSets: <none>
NewReplicaSet: kubemacpool-mac-controller-manager-6cdb6dff69 (2/2 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 95m deployment-controller Scaled down replica set kubemacpool-mac-controller-manager-798cc44d88 to 0
Normal ScalingReplicaSet 95m deployment-controller Scaled up replica set kubemacpool-mac-controller-manager-6cdb6dff69 to 2
======================================================================================
$ oc get replicasets -n openshift-cnv|grep pool
kubemacpool-mac-controller-manager-6cdb6dff69 2 2 0 96m
kubemacpool-mac-controller-manager-798cc44d88 0 0 0 7h16m
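The same check can be scripted when comparing the failing and fixed environments. This is a small Python sketch that parses rows in the shape shown above and flags replica sets with fewer ready than desired replicas. It assumes the column order NAME DESIRED CURRENT READY AGE (the header printed later in this report); the function name is hypothetical.

```python
def unready_replicasets(rows):
    """Return (name, desired, ready) for rows where READY < DESIRED.

    Assumes whitespace-separated columns: NAME DESIRED CURRENT READY AGE,
    with no header line (as when the output is piped through grep).
    """
    result = []
    for line in rows.strip().splitlines():
        name, desired, _current, ready, _age = line.split()
        if int(ready) < int(desired):
            result.append((name, int(desired), int(ready)))
    return result


# Rows copied from the `oc get replicasets` output above.
rows = """
kubemacpool-mac-controller-manager-6cdb6dff69 2 2 0 96m
kubemacpool-mac-controller-manager-798cc44d88 0 0 0 7h16m
"""
print(unready_replicasets(rows))
```

Only the new replica set is flagged; the old one was scaled to 0 by the Recreate strategy and is expected to have no ready pods.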
======================================================================================
$ oc describe replicasets -n openshift-cnv kubemacpool-mac-controller-manager-6cdb6dff69
Name: kubemacpool-mac-controller-manager-6cdb6dff69
Namespace: openshift-cnv
Selector: control-plane=mac-controller-manager,controller-tools.k8s.io=1.0,pod-template-hash=6cdb6dff69
Labels: app=kubemacpool
control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
pod-template-hash=6cdb6dff69
Annotations: deployment.kubernetes.io/desired-replicas: 2
deployment.kubernetes.io/max-replicas: 2
deployment.kubernetes.io/revision: 2
Controlled By: Deployment/kubemacpool-mac-controller-manager
Replicas: 2 current / 2 desired
Pods Status: 2 Running / 0 Waiting / 0 Succeeded / 0 Failed
Pod Template:
Labels: app=kubemacpool
control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
pod-template-hash=6cdb6dff69
Containers:
manager:
Image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:bd55beb56f4a481c83bdb64e58311146510b6cbe2641c58f059143467052bf1b
Port: 8000/TCP
Host Port: 0/TCP
Command:
/manager
Args:
--v=production
--wait-time=600
Limits:
cpu: 300m
memory: 600Mi
Requests:
cpu: 100m
memory: 300Mi
Readiness: http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
RANGE_START: <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'> Optional: false
RANGE_END: <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'> Optional: false
Mounts:
/etc/webhook/certs from tls-key-pair (ro)
Volumes:
tls-key-pair:
Type: Secret (a volume populated by a Secret)
SecretName: kubemacpool-service
Optional: false
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal SuccessfulCreate 97m replicaset-controller Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-xfshg
Normal SuccessfulCreate 97m replicaset-controller Created pod: kubemacpool-mac-controller-manager-6cdb6dff69-tls55
Thanks Ruth. Since it happened again and is causing dead-on-arrival CNV, I'm proposing this as a blocker for 2.4.1.

Created attachment 1710678 [details]
Pods yaml,log and describe files
*** Bug 1868403 has been marked as a duplicate of this bug. ***

Test Environment:
==================
$ oc version
Client Version: 4.5.0-202007240519.p0-b66f2d3
Server Version: 4.5.6
Kubernetes Version: v1.18.3+002a51f
CNV Version:
$ oc get csv -n openshift-cnv | awk ' { print $4 } ' | tail -n1
2.4.1
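Steps
=====
Bug summary: CNV was dead on arrival, or it was deployed but the kmp service was down, because of leader problems in the kmp pods (no pod carried the leader label, or the leader readiness gate condition was never set).
Fix: only one kmp pod is deployed, and it has no leader label.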
================================================================================================================================================================
Check that CNV is deployed successfully:
$ oc get csv -A|grep kube
NAMESPACE NAME DISPLAY VERSION REPLACES PHASE
openshift-cnv kubevirt-hyperconverged-operator.v2.4.1 OpenShift Virtualization 2.4.1 Succeeded
================================================================================================================================================================
Check that there is only one kmp pod and that it is running:
$ oc get pod -n openshift-cnv|grep kubemacpool
NAME READY STATUS RESTARTS AGE
kubemacpool-mac-controller-manager-7d89555fd-75xbw 1/1 Running 0 41m
================================================================================================================================================================
Check that the deployment conditions are good:
$ oc describe deployment -n openshift-cnv kubemacpool-mac-controller-manager
Name: kubemacpool-mac-controller-manager
Namespace: openshift-cnv
CreationTimestamp: Tue, 18 Aug 2020 10:20:40 +0000
Labels: control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
networkaddonsoperator.network.kubevirt.io/version=sha256_496384595539e60f6652ee4283f857e23608cdf4a975b1037a4abefb
Annotations: deployment.kubernetes.io/revision: 1
Selector: control-plane=mac-controller-manager,controller-tools.k8s.io=1.0
Replicas: 1 desired | 1 updated | 1 total | 1 available | 0 unavailable
StrategyType: Recreate
MinReadySeconds: 0
Pod Template:
Labels: app=kubemacpool
control-plane=mac-controller-manager
controller-tools.k8s.io=1.0
Containers:
manager:
Image: registry-proxy.engineering.redhat.com/rh-osbs/container-native-virtualization-kubemacpool@sha256:ab09d946469e26ddf52791dff864e24662d9ba898f6082d92b50023f080b4467
Port: 8000/TCP
Host Port: 0/TCP
Command:
/manager
Args:
--v=production
--wait-time=600
Limits:
cpu: 300m
memory: 600Mi
Requests:
cpu: 100m
memory: 300Mi
Readiness: http-get https://:webhook-server/readyz delay=10s timeout=1s period=10s #success=1 #failure=3
Environment:
POD_NAMESPACE: (v1:metadata.namespace)
POD_NAME: (v1:metadata.name)
RANGE_START: <set to the key 'RANGE_START' of config map 'kubemacpool-mac-range-config'> Optional: false
RANGE_END: <set to the key 'RANGE_END' of config map 'kubemacpool-mac-range-config'> Optional: false
Mounts:
/etc/webhook/certs from tls-key-pair (ro)
Volumes:
tls-key-pair:
Type: Secret (a volume populated by a Secret)
SecretName: kubemacpool-service
Optional: false
Conditions:
Type Status Reason
---- ------ ------
Progressing True NewReplicaSetAvailable
Available True MinimumReplicasAvailable
OldReplicaSets: <none>
NewReplicaSet: kubemacpool-mac-controller-manager-7d89555fd (1/1 replicas created)
Events:
Type Reason Age From Message
---- ------ ---- ---- -------
Normal ScalingReplicaSet 45m deployment-controller Scaled down replica set kubemacpool-mac-controller-manager-7d89555fd to 0
Normal ScalingReplicaSet 45m (x2 over 15h) deployment-controller Scaled up replica set kubemacpool-mac-controller-manager-7d89555fd to 1
================================================================================================================================================================
Check the replica set:
$ oc get replicasets -n openshift-cnv|grep pool
NAME DESIRED CURRENT READY AGE
kubemacpool-mac-controller-manager-7d89555fd 1 1 1 15h
================================================================================================================================================================
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA.

For information on the advisory (OpenShift Virtualization 2.4.1 images), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2020:3629
Created attachment 1701254 [details]
Pod log

Description of problem:
On a cluster which was deployed successfully and was functioning OK, kubemacpool-leader is not defined, kmp service is down (no endpoint is associated with it).
Both kubemacpool-mac-controller-manager pods are running.
Deleting both pods solved the problem.

Version-Release number of selected component (if applicable):
CNV 2.4.0

How reproducible:
Unknown

Steps to Reproduce:
1. No specific steps.

Actual results:
kmp service is down, fail to create a VM

Expected results:
leader should always be assigned.

Additional info:
=======================
KMP logs attached.
=======================
$ oc describe Service -n openshift-cnv kubemacpool-service
Name: kubemacpool-service
Namespace: openshift-cnv
Labels: networkaddonsoperator.network.kubevirt.io/version=sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
Annotations: <none>
Selector: kubemacpool-leader=true
Type: ClusterIP
IP: 172.30.137.156
Port: <unset> 443/TCP
TargetPort: 8000/TCP
Endpoints: <none>
Session Affinity: None
Events: <none>
=======================
$ oc get pods -A| grep kubemac
openshift-cnv kubemacpool-mac-controller-manager-865d98484c-7hkhc 1/1 Running 0 3h58m
openshift-cnv kubemacpool-mac-controller-manager-865d98484c-8vpvp 1/1 Running 0 3h50m
=======================
$ oc get pods -n openshift-cnv -l kubemacpool-leader=true
No resources found in openshift-cnv namespace.
=======================
$ oc get Service -n openshift-cnv kubemacpool-service -oyaml
apiVersion: v1
kind: Service
metadata:
  creationTimestamp: "2020-07-14T10:34:12Z"
  labels:
    networkaddonsoperator.network.kubevirt.io/version: sha256_ecbcbe6e8ed9015ed23aa3a93440fc3f4728ee79b97c1cfcf9152d05
  managedFields:
  - apiVersion: v1
    fieldsType: FieldsV1
    fieldsV1:
      f:metadata:
        f:labels:
          .: {}
          f:networkaddonsoperator.network.kubevirt.io/version: {}
        f:ownerReferences:
          .: {}
          k:{"uid":"f5b53444-eae9-4810-8252-94739af6cfb8"}:
            .: {}
            f:apiVersion: {}
            f:blockOwnerDeletion: {}
            f:controller: {}
            f:kind: {}
            f:name: {}
            f:uid: {}
      f:spec:
        f:ports:
          .: {}
          k:{"port":443,"protocol":"TCP"}:
            .: {}
            f:port: {}
            f:protocol: {}
            f:targetPort: {}
        f:publishNotReadyAddresses: {}
        f:selector:
          .: {}
          f:kubemacpool-leader: {}
        f:sessionAffinity: {}
        f:type: {}
    manager: cluster-network-addons-operator
    operation: Update
    time: "2020-07-14T10:34:12Z"
  name: kubemacpool-service
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: networkaddonsoperator.network.kubevirt.io/v1alpha1
    blockOwnerDeletion: true
    controller: true
    kind: NetworkAddonsConfig
    name: cluster
    uid: f5b53444-eae9-4810-8252-94739af6cfb8
  resourceVersion: "46753"
  selfLink: /api/v1/namespaces/openshift-cnv/services/kubemacpool-service
  uid: 0eb110eb-898d-43fa-b09c-4f669add5c19
spec:
  clusterIP: 172.30.137.156
  ports:
  - port: 443
    protocol: TCP
    targetPort: 8000
  publishNotReadyAddresses: true
  selector:
    kubemacpool-leader: "true"
  sessionAffinity: None
  type: ClusterIP
status:
  loadBalancer: {}
===============
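The Service output shows why the endpoint list is empty even though both pods are running: the Service selects only pods labelled kubemacpool-leader=true, and no pod carries that label. Because publishNotReadyAddresses is true, pod readiness does not matter here, only the label match. A minimal Python sketch of equality-based selector matching follows; the function name is hypothetical, and the pod labels are assumed to be the ones from the pod template shown elsewhere in this report.

```python
def select_endpoints(selector, pods):
    """Return names of pods whose labels contain every selector key/value."""
    return [
        pod["name"]
        for pod in pods
        if all(pod["labels"].get(key) == value for key, value in selector.items())
    ]


selector = {"kubemacpool-leader": "true"}

# Pod names from the description; labels are assumed to match the pod template.
template_labels = {
    "app": "kubemacpool",
    "control-plane": "mac-controller-manager",
    "controller-tools.k8s.io": "1.0",
}
pods = [
    {"name": "kubemacpool-mac-controller-manager-865d98484c-7hkhc",
     "labels": dict(template_labels)},
    {"name": "kubemacpool-mac-controller-manager-865d98484c-8vpvp",
     "labels": dict(template_labels)},
]

# No pod has the leader label, so the Service has nothing to route to.
print(select_endpoints(selector, pods))

# Once one pod is labelled as leader, it becomes the single endpoint.
pods[0]["labels"]["kubemacpool-leader"] = "true"
print(select_endpoints(selector, pods))
```

This is also consistent with the workaround in the description: deleting both pods forces fresh pods to start, which gives one of them a new chance to be labelled as leader.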