Bug 2185573
| Summary: | [Longevity] rbd pvc mount to a pod failed with error: "rbd: map failed: (108) Cannot send after transport endpoint shutdown" | ||
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Prasad Desala <tdesala> |
| Component: | ceph | Assignee: | Ilya Dryomov <idryomov> |
| ceph sub component: | RBD | QA Contact: | Prasad Desala <tdesala> |
| Status: | NEW --- | Docs Contact: | |
| Severity: | high | ||
| Priority: | unspecified | CC: | bniver, hnallurv, idryomov, muagarwa, odf-bz-bot, sheggodu, sostapov, ypadia |
| Version: | 4.12 | ||
| Target Milestone: | --- | ||
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | Doc Type: | If docs needed, set a value | |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | Type: | Bug | |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | |||
Description of problem (please be as detailed as possible and provide log snippets):
==================================================================================
An RBD PVC failed to mount to a pod with the error below while running the Stage4 test script developed for ODF Longevity testing. This script executes concurrent PVC clone, snapshot and expand operations.

```
rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown
```

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               5m4s                 default-scheduler        Successfully assigned stage-4-cycle-12-concurrent-operation/pod-test-rbd-5fa7cd7c079d45579522c712c82 to compute-5
  Normal   SuccessfulAttachVolume  5m4s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c"
  Warning  FailedMount             58s (x2 over 3m1s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-kh2jq]: timed out waiting for the condition
  Warning  FailedMount             50s (x10 over 5m1s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 172.30.50.148:6789,172.30.78.51:6789,172.30.65.120:6789 --keyfile=***stripped*** map ocs-storagecluster-cephblockpool/csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown

ocs-ci timestamps logs:
========================
15:39:08 - ThreadPoolExecutor-2980_0 - ocs_ci.helpers.helpers - INFO - Creating new Pod pod-test-rbd-5fa7cd7c079d45579522c712c82 for test
15:39:08 - ThreadPoolExecutor-2980_0 - ocs_ci.utility.templating - INFO -
apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-5fa7cd7c079d45579522c712c82
  namespace: stage-4-cycle-12-concurrent-operation
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    name: web-server
    volumeMounts:
    - mountPath: /var/lib/www/html
      name: mypvc
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: clone-pvc-test-d620d47ac72d48-064609de89
      readOnly: false

15:44:13 - ThreadPoolExecutor-2980_0 - ocs_ci.ocs.ocp - WARNING - Description of the resource(s) we were waiting for:
Name:             pod-test-rbd-5fa7cd7c079d45579522c712c82
Namespace:        stage-4-cycle-12-concurrent-operation
Priority:         0
Service Account:  default
Node:             compute-5/10.1.114.73
Start Time:       Sat, 08 Apr 2023 15:39:09 +0300
Labels:           <none>
Annotations:      k8s.ovn.org/pod-networks: {"default":{"ip_addresses":["10.128.4.251/23"],"mac_address":"0a:58:0a:80:04:fb","gateway_ips":["10.128.4.1"],"ip_address":"10.128.4.251/2...
                  openshift.io/scc: privileged
Status:           Pending
IP:
IPs:              <none>
Containers:
  web-server:
    Container ID:
    Image:          quay.io/ocsci/nginx:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kh2jq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  clone-pvc-test-d620d47ac72d48-064609de89
    ReadOnly:   false
  kube-api-access-kh2jq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:       BestEffort
Node-Selectors:  <none>
Tolerations:     node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                 node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               5m4s                 default-scheduler        Successfully assigned stage-4-cycle-12-concurrent-operation/pod-test-rbd-5fa7cd7c079d45579522c712c82 to compute-5
  Normal   SuccessfulAttachVolume  5m4s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c"
  Warning  FailedMount             58s (x2 over 3m1s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-kh2jq]: timed out waiting for the condition
  Warning  FailedMount             50s (x10 over 5m1s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 172.30.50.148:6789,172.30.78.51:6789,172.30.65.120:6789 --keyfile=***stripped*** map ocs-storagecluster-cephblockpool/csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown

15:44:13 - ThreadPoolExecutor-2980_0 - ocs_ci.ocs.ocp - ERROR - Wait for Pod resource pod-test-rbd-5fa7cd7c079d45579522c712c82 at column STATUS to reach desired condition Running failed, last actual status was ContainerCreating

Version of all relevant components (if applicable):

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Reporting at first occurrence

Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
====================
1) Run Stage4 - https://github.com/red-hat-storage/ocs-ci/blob/master/tests/e2e/longevity/test_stage4.py with the run time set to 4 days

Summary of the steps:
1. PVC, POD creation + fill data up to 25% of mount point space
2. Start concurrent PVC operations:
   a) Clone - Creation, Deletion
   b) Snapshot - Creation, Restoration, Deletion
   c) Expansion of original PVCs
3. PVC, POD deletion

Actual results:
================
RBD PVC mount failed with error: "rbd: map failed: (108) Cannot send after transport endpoint shutdown"

Expected results:
=================
RBD PVC should mount to a pod successfully without any issues/errors.

Additional info:
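
For triage across long longevity runs, it may help to pull the errno out of the kubelet FailedMount message automatically. The following is only a minimal sketch, not part of ocs-ci: the names `parse_rbd_map_error`, `RbdMapError` and `_LINUX_ERRNO_NAMES` are hypothetical, and the regular expression assumes the message keeps the exact "rbd: map failed: (N) text" shape quoted in this report. The only errno name included is 108, which is ESHUTDOWN on Linux.

```python
import re
from typing import NamedTuple, Optional

# Matches the last line of the krbd failure as quoted in this report.
# Assumption: the message format is "rbd: map failed: (<errno>) <text>".
_MAP_FAILED = re.compile(r"rbd: map failed: \((?P<code>\d+)\) (?P<message>[^\n]+)")

# Linux errno names of interest for this bug; 108 is ESHUTDOWN on Linux.
# Hard-coded on purpose so the result does not depend on the host platform.
_LINUX_ERRNO_NAMES = {108: "ESHUTDOWN"}


class RbdMapError(NamedTuple):
    code: int
    name: Optional[str]  # None when the errno is not in the table above
    message: str


def parse_rbd_map_error(output: str) -> Optional[RbdMapError]:
    """Return the errno details from rbd map output, or None if absent."""
    match = _MAP_FAILED.search(output)
    if match is None:
        return None
    code = int(match.group("code"))
    return RbdMapError(code, _LINUX_ERRNO_NAMES.get(code), match.group("message").strip())


if __name__ == "__main__":
    sample = (
        "rbd error output: rbd: sysfs write failed\n"
        "rbd: map failed: (108) Cannot send after transport endpoint shutdown\n"
    )
    print(parse_rbd_map_error(sample))
```

The function returns None for output without a map failure, so it can be applied to every event message of a pending pod without extra filtering.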