Description of problem:

On BMs (this one was seen on bm03-cnvqe2-rdu2), the hpp-pool pod gets stuck in a CrashLoopBackOff state. It appears to be Ceph related; HPP is backed by OCS.

Version-Release number of selected component (if applicable):

$ oc get csv -A | grep kubevirt
openshift-cnv   kubevirt-hyperconverged-operator.v4.13.0   OpenShift Virtualization   4.13.0   kubevirt-hyperconverged-operator.v4.12.3   Succeeded

$ oc get clusterversion
NAME      VERSION       AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.13.0-rc.5   True        False         11d     Cluster version is 4.13.0-rc.5

How reproducible:

I'm not sure what's triggering this issue. After running the network team's test suite, which creates network components and VMs, the hpp-pool pod gets into this state on some BMs.

Steps to Reproduce:
1.
2.
3.

Actual results:

The hpp-pool pod gets stuck in a CrashLoopBackOff state.

$ oc get pods -n openshift-cnv | grep hpp
openshift-cnv   hpp-pool-29ab9406-755647446d-44jfk   0/1   Terminating        10                43h
openshift-cnv   hpp-pool-29ab9406-755647446d-d6rn7   0/1   CrashLoopBackOff   497 (4m5s ago)    42h
openshift-cnv   hpp-pool-4356e54b-7df67db896-8vq5t   0/1   Terminating        3                 43h
openshift-cnv   hpp-pool-4356e54b-7df67db896-ntqpr   0/1   CrashLoopBackOff   502 (3m22s ago)   42h
openshift-cnv   hpp-pool-7dfd761c-cf499b659-9mdk7    1/1   Running            0                 42h

$ oc get pods hpp-pool-29ab9406-755647446d-d6rn7 -oyaml
apiVersion: v1
kind: Pod
metadata:
  annotations:
    k8s.ovn.org/pod-networks: '{"default":{"ip_addresses":["10.128.2.5/23"],"mac_address":"0a:58:0a:80:02:05","gateway_ips":["10.128.2.1"],"ip_address":"10.128.2.5/23","gateway_ip":"10.128.2.1"}}'
    k8s.v1.cni.cncf.io/network-status: |-
      [{
          "name": "ovn-kubernetes",
          "interface": "eth0",
          "ips": [
              "10.128.2.5"
          ],
          "mac": "0a:58:0a:80:02:05",
          "default": true,
          "dns": {}
      }]
    openshift.io/scc: hostpath-provisioner-csi
  creationTimestamp: "2023-05-13T14:24:31Z"
  generateName: hpp-pool-29ab9406-755647446d-
  labels:
    hpp-pool: hpp-csi-pvc-block-hpp
    pod-template-hash: 755647446d
  name: hpp-pool-29ab9406-755647446d-d6rn7
  namespace: openshift-cnv
  ownerReferences:
  - apiVersion: apps/v1
    blockOwnerDeletion: true
    controller: true
    kind: ReplicaSet
    name: hpp-pool-29ab9406-755647446d
    uid: 6d6089af-1e72-4602-9f67-c212bcb1dac8
  resourceVersion: "22166040"
  uid: a5162c1e-babc-455e-a071-262b81d48c8a
spec:
  affinity:
    nodeAffinity:
      requiredDuringSchedulingIgnoredDuringExecution:
        nodeSelectorTerms:
        - matchExpressions:
          - key: kubernetes.io/hostname
            operator: In
            values:
            - cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
  containers:
  - command:
    - /usr/bin/mounter
    - --storagePoolPath
    - /dev/data
    - --mountPath
    - /var/hpp-csi-pvc-block/csi
    - --hostPath
    - /host
    image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
    imagePullPolicy: IfNotPresent
    name: mounter
    resources:
      requests:
        cpu: 10m
        memory: 100Mi
    securityContext:
      capabilities:
        drop:
        - KILL
        - MKNOD
        - SETGID
        - SETUID
      privileged: true
      runAsUser: 0
    terminationMessagePath: /dev/termination-log
    terminationMessagePolicy: File
    volumeDevices:
    - devicePath: /dev/data
      name: data
    volumeMounts:
    - mountPath: /host
      mountPropagation: Bidirectional
      name: host-root
    - mountPath: /var/run/secrets/kubernetes.io/serviceaccount
      name: kube-api-access-ql72g
      readOnly: true
  dnsPolicy: ClusterFirst
  enableServiceLinks: true
  imagePullSecrets:
  - name: hostpath-provisioner-admin-csi-dockercfg-xn7tq
  nodeName: cnv-qe-infra-19.cnvqe2.lab.eng.rdu2.redhat.com
  preemptionPolicy: PreemptLowerPriority
  priority: 0
  restartPolicy: Always
  schedulerName: default-scheduler
  securityContext: {}
  serviceAccount: hostpath-provisioner-admin-csi
  serviceAccountName: hostpath-provisioner-admin-csi
  terminationGracePeriodSeconds: 30
  tolerations:
  - effect: NoExecute
    key: node.kubernetes.io/not-ready
    operator: Exists
    tolerationSeconds: 300
  - effect: NoExecute
    key: node.kubernetes.io/unreachable
    operator: Exists
    tolerationSeconds: 300
  - effect: NoSchedule
    key: node.kubernetes.io/memory-pressure
    operator: Exists
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: hpp-pool-29ab9406
  - hostPath:
      path: /
      type: Directory
    name: host-root
  - name: kube-api-access-ql72g
    projected:
      defaultMode: 420
      sources:
      - serviceAccountToken:
          expirationSeconds: 3607
          path: token
      - configMap:
          items:
          - key: ca.crt
            path: ca.crt
          name: kube-root-ca.crt
      - downwardAPI:
          items:
          - fieldRef:
              apiVersion: v1
              fieldPath: metadata.namespace
            path: namespace
      - configMap:
          items:
          - key: service-ca.crt
            path: service-ca.crt
          name: openshift-service-ca.crt
status:
  conditions:
  - lastProbeTime: null
    lastTransitionTime: "2023-05-13T14:45:11Z"
    status: "True"
    type: Initialized
  - lastProbeTime: null
    lastTransitionTime: "2023-05-13T14:45:11Z"
    message: 'containers with unready status: [mounter]'
    reason: ContainersNotReady
    status: "False"
    type: Ready
  - lastProbeTime: null
    lastTransitionTime: "2023-05-13T14:45:11Z"
    message: 'containers with unready status: [mounter]'
    reason: ContainersNotReady
    status: "False"
    type: ContainersReady
  - lastProbeTime: null
    lastTransitionTime: "2023-05-13T14:45:09Z"
    status: "True"
    type: PodScheduled
  containerStatuses:
  - containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
    image: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
    imageID: registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:045ad111f8d3fe28b8cf77df49a264922c9fa4cc46759ed98ef044077225a23e
    lastState:
      terminated:
        containerID: cri-o://5c71c577ce6c36921126314719346663f5cf9c072264d408d362bf45857219f9
        exitCode: 2
        finishedAt: "2023-05-15T08:29:59Z"
        reason: Error
        startedAt: "2023-05-15T08:29:59Z"
    name: mounter
    ready: false
    restartCount: 494
    started: false
    state:
      waiting:
        message: back-off 5m0s restarting failed container=mounter pod=hpp-pool-29ab9406-755647446d-d6rn7_openshift-cnv(a5162c1e-babc-455e-a071-262b81d48c8a)
        reason: CrashLoopBackOff
  hostIP: 10.1.156.19
  phase: Running
  podIP: 10.128.2.5
  podIPs:
  - ip: 10.128.2.5
  qosClass: Burstable
  startTime: "2023-05-13T14:45:11Z"

Expected results:


Additional info:

W/A - force delete the PVC + the hpp-pool pods.
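For reference, the workaround amounts to something like the following, using the names from this cluster (the stuck pods and their matching hpp-pool PVC); the hostpath-provisioner-operator should then reconcile fresh ones:

$ oc delete pod -n openshift-cnv hpp-pool-29ab9406-755647446d-44jfk hpp-pool-29ab9406-755647446d-d6rn7 --force --grace-period=0
$ oc delete pvc -n openshift-cnv hpp-pool-29ab9406

Note that --force --grace-period=0 only removes the pod objects from the API server without waiting for the kubelet; it does not clean anything up on the node.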
Additional info from the cluster:

$ oc get pods -n openshift-storage
NAME                                                              READY   STATUS    RESTARTS       AGE
csi-addons-controller-manager-6976d48f69-fmpct                    2/2     Running   9 (42h ago)    42h
csi-cephfsplugin-7gqcc                                            2/2     Running   6              11d
csi-cephfsplugin-pgg6z                                            2/2     Running   4              11d
csi-cephfsplugin-provisioner-cc76c4b9-vmpk6                       5/5     Running   0              42h
csi-cephfsplugin-provisioner-cc76c4b9-xp9rt                       5/5     Running   0              42h
csi-cephfsplugin-q4r8n                                            2/2     Running   4              11d
csi-rbdplugin-j8465                                               3/3     Running   9              11d
csi-rbdplugin-jl4jf                                               3/3     Running   6              11d
csi-rbdplugin-provisioner-8558756f4f-fvtb2                        6/6     Running   0              42h
csi-rbdplugin-provisioner-8558756f4f-kxgpp                        6/6     Running   0              42h
csi-rbdplugin-wgjml                                               3/3     Running   6              11d
noobaa-operator-645c48c4c5-6gx4w                                  1/1     Running   0              42h
ocs-metrics-exporter-774f4b58cc-5ngc5                             1/1     Running   0              42h
ocs-operator-5b5d98d58d-zl7zq                                     1/1     Running   11 (41h ago)   42h
odf-console-78bb5b66-4mnfb                                        1/1     Running   0              42h
odf-operator-controller-manager-7db8d4fd4c-ltzkd                  2/2     Running   0              42h
rook-ceph-crashcollector-03d7e1289c5164e19d0d22d6856ffdae-9b4nt   1/1     Running   0              42h
rook-ceph-crashcollector-374253a427dc62aef82d81f5fc14643e-44bqw   1/1     Running   0              42h
rook-ceph-crashcollector-c903e190df41042ede88f92c4aa10277-n5jbj   1/1     Running   0              42h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-a-666b46d6k42f8   2/2     Running   0              42h
rook-ceph-mds-ocs-storagecluster-cephfilesystem-b-84bb79d6hz5dp   2/2     Running   0              42h
rook-ceph-mgr-a-7fd8968d84-p2sx4                                  2/2     Running   0              42h
rook-ceph-mon-d-54b48b9549-rf69w                                  2/2     Running   0              42h
rook-ceph-mon-e-cc8d486-94tff                                     2/2     Running   0              42h
rook-ceph-mon-g-66d7d99bd7-44gjd                                  2/2     Running   0              42h
rook-ceph-operator-5b595585d7-kpnsd                               1/1     Running   8 (42h ago)    42h
rook-ceph-osd-0-7987b8c66c-89rws                                  2/2     Running   0              42h
rook-ceph-osd-1-7956cc5998-6ghk2                                  2/2     Running   0              42h
rook-ceph-osd-2-6f6cfb658f-kdcmp                                  2/2     Running   0              42h

$ oc get pods -A | grep hostpath
openshift-cnv   hostpath-provisioner-csi-lzvq6                   4/4   Running   4             5d1h
openshift-cnv   hostpath-provisioner-csi-s69jh                   4/4   Running   8             5d1h
openshift-cnv   hostpath-provisioner-csi-td8hj                   4/4   Running   4             5d1h
openshift-cnv   hostpath-provisioner-operator-77f6f799d5-5dtlz   1/1   Running   1 (42h ago)   42h

$ oc get pods -A | grep hpp
openshift-cnv   hpp-pool-29ab9406-755647446d-44jfk   0/1   Terminating        10                43h
openshift-cnv   hpp-pool-29ab9406-755647446d-d6rn7   0/1   CrashLoopBackOff   497 (4m5s ago)    42h
openshift-cnv   hpp-pool-4356e54b-7df67db896-8vq5t   0/1   Terminating        3                 43h
openshift-cnv   hpp-pool-4356e54b-7df67db896-ntqpr   0/1   CrashLoopBackOff   502 (3m22s ago)   42h
openshift-cnv   hpp-pool-7dfd761c-cf499b659-9mdk7    1/1   Running            0                 42h
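All OSDs, mons, and the mgr show Running above, so at first glance the Ceph cluster itself looks healthy. Since the suspicion is Ceph, it may be worth capturing Ceph's own health view next time this reproduces; a hedged example (this assumes the rook-ceph-tools deployment has been enabled, which it is not by default):

$ oc patch OCSInitialization ocsinit -n openshift-storage --type json --patch '[{ "op": "replace", "path": "/spec/enableCephTools", "value": true }]'
$ oc rsh -n openshift-storage deploy/rook-ceph-tools
sh-5.1$ ceph status
sh-5.1$ ceph osd pool stats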
Is this still reproducing? I wonder if it was just an intermittent environmental issue.
Can't be reproduced.
I can reproduce this error with NFS-backed storage.

[root@api-int ~]# oc debug hpp-pool-66a3ae7d-7b586fb698-znstp
Starting pod/hpp-pool-66a3ae7d-7b586fb698-znstp-debug, command was: /usr/bin/mounter --storagePoolPath /source --mountPath /var/hpvolumes/csi --hostPath /host
Pod IP: 10.128.1.19
If you don't see a command prompt, try pressing enter.
sh-5.1# /usr/bin/mounter --storagePoolPath /source --mountPath /var/hpvolumes/csi --hostPath /host
{"level":"info","ts":1695162977.3244886,"logger":"mounter","msg":"Go Version: go1.19.10"}
{"level":"info","ts":1695162977.3245575,"logger":"mounter","msg":"Go OS/Arch: linux/amd64"}
{"level":"info","ts":1695162977.370991,"logger":"mounter","msg":"Found mount info","source path on host":"hostname.domain.net:/mnt/ovirt/openshift/nfs/vols/pvc-68b78950-9bbc-4d55-96c7-b27c5c66bbfb"}
{"level":"info","ts":1695162977.371048,"logger":"mounter","msg":"Target path","path":"/var/hpvolumes/csi"}
{"level":"info","ts":1695162977.3710966,"logger":"mounter","msg":"host path","path":"/host"}
panic: stat hostname.domain.net:/mnt/ovirt/openshift/nfs/vols/pvc-68b78950-9bbc-4d55-96c7-b27c5c66bbfb: no such file or directory

###

sh-5.1# stat /source
  File: /source
  Size: 3               Blocks: 1          IO Block: 131072 directory
Device: 400046h/4194374d        Inode: 34          Links: 2
Access: (0777/drwxrwxrwx)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2023-09-19 19:04:47.270279314 +0000
Modify: 2023-09-19 21:49:05.032855479 +0000
Change: 2023-09-19 21:49:05.032855479 +0000
 Birth: -

###

sh-5.1# mountpoint /source/
/source/ is a mountpoint

###

sh-5.1# echo "write-test" > /source/test
sh-5.1# cat /source/test
write-test

###

[root@api-int ~]# oc describe pod hpp-pool-66a3ae7d-7b586fb698-znstp
Name:             hpp-pool-66a3ae7d-7b586fb698-znstp
Namespace:        openshift-cnv
Priority:         0
Service Account:  hostpath-provisioner-admin-csi
Node:             api-int.os-prd.domain.net.0.168.192.in-addr.arpa/192.168.0.26
Start Time:       Tue, 19 Sep 2023 19:04:55 +0000
Labels:           hpp-pool=local-hpp
                  pod-template-hash=7b586fb698
Annotations:      k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.128.0.160/23"],"mac_address":"0a:58:0a:80:00:a0","gateway_ips":["10.128.0.1"],"ip_address":"10.128.0.160/2...
                  k8s.v1.cni.cncf.io/network-status:
                    [{
                        "name": "ovn-kubernetes",
                        "interface": "eth0",
                        "ips": [
                            "10.128.0.160"
                        ],
                        "mac": "0a:58:0a:80:00:a0",
                        "default": true,
                        "dns": {}
                    }]
                  openshift.io/scc: hostpath-provisioner-csi
Status:           Running
IP:               10.128.0.160
IPs:
  IP:  10.128.0.160
Controlled By:  ReplicaSet/hpp-pool-66a3ae7d-7b586fb698
Containers:
  mounter:
    Container ID:  cri-o://62a93cb7d3465ec9322b40fa6cd028e12f4a36978a3af686c35e99c8d24381cc
    Image:         registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:e5fa0aa2d6a48dd2b5e14b9d3741c144b371845c3dbee0dd3a440a1d5fa6d777
    Image ID:      registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:e5fa0aa2d6a48dd2b5e14b9d3741c144b371845c3dbee0dd3a440a1d5fa6d777
    Port:          <none>
    Host Port:     <none>
    Command:
      /usr/bin/mounter
      --storagePoolPath
      /source
      --mountPath
      /var/hpvolumes/csi
      --hostPath
      /host
    State:          Waiting
      Reason:       CrashLoopBackOff
    Last State:     Terminated
      Reason:       Error
      Exit Code:    2
      Started:      Tue, 19 Sep 2023 23:05:52 +0000
      Finished:     Tue, 19 Sep 2023 23:05:52 +0000
    Ready:          False
    Restart Count:  52
    Requests:
      cpu:     10m
      memory:  100Mi
    Environment:  <none>
    Mounts:
      /host from host-root (rw)
      /source from data (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-tvh7z (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  data:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  hpp-pool-66a3ae7d
    ReadOnly:   false
  host-root:
    Type:          HostPath (bare host directory volume)
    Path:          /
    HostPathType:  Directory
  kube-api-access-tvh7z:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   Burstable
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/memory-pressure:NoSchedule op=Exists
                             node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason   Age                    From     Message
  ----     ------   ----                   ----     -------
  Normal   Pulled   25m (x49 over 4h5m)    kubelet  Container image "registry.redhat.io/container-native-virtualization/hostpath-provisioner-operator-rhel9@sha256:e5fa0aa2d6a48dd2b5e14b9d3741c144b371845c3dbee0dd3a440a1d5fa6d777" already present on machine
  Warning  BackOff  21s (x1118 over 4h5m)  kubelet  Back-off restarting failed container mounter in pod hpp-pool-66a3ae7d-7b586fb698-znstp_openshift-cnv(f18608e2-05c3-4c7b-80b7-08a41bb10e65)
[root@api-int ~]#
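For what it's worth, the panic looks consistent with the mounter stat-ing the mount *source* it finds for the pool mount rather than the mount point itself: for a block-backed pool that source is a device path that exists on the host, but for NFS it is a server:/export spec, which is not a local path, so the stat fails even though /source is mounted and writable (as shown above). A minimal Go sketch of that failure mode, assuming the mounter resolves the source from the mount table; this is not the actual mounter source, and findMountSource is a hypothetical stand-in for however the binary parses the mounts:

package main

import (
	"bufio"
	"fmt"
	"os"
	"strings"
)

// findMountSource scans /proc/mounts for the entry whose mount point is
// mountPath and returns its source field (first column). Stand-in for
// whatever the real mounter does with the mount table.
func findMountSource(mountPath string) (string, error) {
	f, err := os.Open("/proc/mounts")
	if err != nil {
		return "", err
	}
	defer f.Close()
	sc := bufio.NewScanner(f)
	for sc.Scan() {
		fields := strings.Fields(sc.Text())
		if len(fields) >= 2 && fields[1] == mountPath {
			return fields[0], nil
		}
	}
	return "", fmt.Errorf("no mount entry for %s", mountPath)
}

func main() {
	src, err := findMountSource("/source")
	if err != nil {
		panic(err)
	}
	// For a block PV the source is a local device path and the stat succeeds.
	// For an NFS PV it is "server:/export", which is not a local path, so
	// os.Stat fails exactly like the panic in the log above.
	if _, err := os.Stat(src); err != nil {
		panic(err) // panic: stat server:/export: no such file or directory
	}
}

Run inside the pod, os.Stat("/source") would succeed (the manual stat above shows the mount is fine); it is only the stat on the NFS source string that cannot.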
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 120 days