Bug 2185573 - [Longevity] rbd pvc mount to a pod failed with error: "rbd: map failed: (108) Cannot send after transport endpoint shutdown" [NEEDINFO]
Summary: [Longevity] rbd pvc mount to a pod failed with error: "rbd: map failed: (108)...
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: ceph
Version: 4.12
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
: ---
Assignee: Ilya Dryomov
QA Contact: Prasad Desala
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2023-04-10 10:51 UTC by Prasad Desala
Modified: 2023-08-09 16:37 UTC
CC List: 8 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:
tdesala: needinfo? (idryomov)



Description Prasad Desala 2023-04-10 10:51:34 UTC
Description of problem (please be as detailed as possible and provide log snippets):
==================================================================================
Mounting an RBD PVC to a pod failed with the error below while running the Stage4 test script developed for ODF Longevity testing. This script executes concurrent PVC clone, snapshot and expansion operations.

```
rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown

```
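The number in parentheses is a Linux errno value: 108 is ESHUTDOWN, whose standard message is exactly the text rbd printed, and it presumably comes from the failed sysfs write reported on the line before. A minimal Python sketch for triage scripts, assuming the output format shown above (decode_rbd_map_error is a hypothetical helper, not part of rbd, ceph-csi or ocs-ci):

```python
import errno
import re

# Matches lines such as "rbd: map failed: (108) Cannot send after transport endpoint shutdown".
_RBD_MAP_ERR = re.compile(r"rbd: map failed: \((\d+)\) (.+)")


def decode_rbd_map_error(output: str):
    """Hypothetical helper: return (errno, symbolic name, message) or None if no match."""
    match = _RBD_MAP_ERR.search(output)
    if match is None:
        return None
    code = int(match.group(1))
    # errno.errorcode uses the numbering of the machine running this script;
    # on Linux 108 is ESHUTDOWN, other platforms number errors differently.
    name = errno.errorcode.get(code, "UNKNOWN")
    return code, name, match.group(2).strip()


sample = (
    "rbd: sysfs write failed\n"
    "rbd: map failed: (108) Cannot send after transport endpoint shutdown\n"
)
print(decode_rbd_map_error(sample))
```

Run on a Linux host, this reports the code as ESHUTDOWN, which is useful when grepping many CSI node plugin logs for the same failure.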

Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               5m4s                 default-scheduler        Successfully assigned stage-4-cycle-12-concurrent-operation/pod-test-rbd-5fa7cd7c079d45579522c712c82 to compute-5
  Normal   SuccessfulAttachVolume  5m4s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c"
  Warning  FailedMount             58s (x2 over 3m1s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-kh2jq]: timed out waiting for the condition
  Warning  FailedMount             50s (x10 over 5m1s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 172.30.50.148:6789,172.30.78.51:6789,172.30.65.120:6789 --keyfile=***stripped*** map ocs-storagecluster-cephblockpool/csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown
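The MountDevice event above embeds the full rbd command line, including the client ID, the monitor addresses and the pool/image being mapped. A small sketch, assuming the message keeps the "running rbd args: [...]" shape seen here (parse_rbd_args is hypothetical), that extracts those fields so the affected image can be inspected directly:

```python
import re

# Matches the bracketed argument list that kubelet reports for a failed rbd map.
_ARGS = re.compile(r"running rbd args: \[(?P<args>[^\]]*)\]")


def parse_rbd_args(message: str) -> dict:
    """Hypothetical helper: extract client id, monitors, pool and image."""
    match = _ARGS.search(message)
    if match is None:
        return {}
    tokens = match.group("args").split()
    info = {}
    # Every flag we care about is followed by a value, so stop one token early.
    for i, tok in enumerate(tokens[:-1]):
        nxt = tokens[i + 1]
        if tok == "--id":
            info["client_id"] = nxt
        elif tok == "-m":
            info["monitors"] = nxt.split(",")
        elif tok == "map":
            pool, _, image = nxt.partition("/")
            info["pool"], info["image"] = pool, image
    return info


event = (
    "rbd: map failed with error an error (exit status 108) occurred while running "
    "rbd args: [--id csi-rbd-node -m 172.30.50.148:6789,172.30.78.51:6789,"
    "172.30.65.120:6789 --keyfile=***stripped*** map ocs-storagecluster-cephblockpool/"
    "csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8 --device-type krbd --options noudev], "
    "rbd error output: rbd: sysfs write failed"
)
print(parse_rbd_args(event))
```

The extracted pool/image pair (ocs-storagecluster-cephblockpool/csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8) is what one would pass to Ceph tooling such as rbd status, which lists the image's current watchers.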


ocs-ci logs with timestamps:
=============================

15:39:08 - ThreadPoolExecutor-2980_0 - ocs_ci.helpers.helpers - INFO  - Creating new Pod pod-test-rbd-5fa7cd7c079d45579522c712c82 for test

15:39:08 - ThreadPoolExecutor-2980_0 - ocs_ci.utility.templating - INFO  - apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-5fa7cd7c079d45579522c712c82
  namespace: stage-4-cycle-12-concurrent-operation
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    name: web-server
    volumeMounts:
    - mountPath: /var/lib/www/html
      name: mypvc
  volumes:
  - name: mypvc
    persistentVolumeClaim:
      claimName: clone-pvc-test-d620d47ac72d48-064609de89
      readOnly: false

15:44:13 - ThreadPoolExecutor-2980_0 - ocs_ci.ocs.ocp - WARNING  - Description of the resource(s) we were waiting for:
Name:             pod-test-rbd-5fa7cd7c079d45579522c712c82
Namespace:        stage-4-cycle-12-concurrent-operation
Priority:         0
Service Account:  default
Node:             compute-5/10.1.114.73
Start Time:       Sat, 08 Apr 2023 15:39:09 +0300
Labels:           <none>
Annotations:      k8s.ovn.org/pod-networks:
                    {"default":{"ip_addresses":["10.128.4.251/23"],"mac_address":"0a:58:0a:80:04:fb","gateway_ips":["10.128.4.1"],"ip_address":"10.128.4.251/2...
                  openshift.io/scc: privileged
Status:           Pending
IP:
IPs:              <none>
Containers:
  web-server:
    Container ID:
    Image:          quay.io/ocsci/nginx:latest
    Image ID:
    Port:           <none>
    Host Port:      <none>
    State:          Waiting
      Reason:       ContainerCreating
    Ready:          False
    Restart Count:  0
    Environment:    <none>
    Mounts:
      /var/lib/www/html from mypvc (rw)
      /var/run/secrets/kubernetes.io/serviceaccount from kube-api-access-kh2jq (ro)
Conditions:
  Type              Status
  Initialized       True
  Ready             False
  ContainersReady   False
  PodScheduled      True
Volumes:
  mypvc:
    Type:       PersistentVolumeClaim (a reference to a PersistentVolumeClaim in the same namespace)
    ClaimName:  clone-pvc-test-d620d47ac72d48-064609de89
    ReadOnly:   false
  kube-api-access-kh2jq:
    Type:                    Projected (a volume that contains injected data from multiple sources)
    TokenExpirationSeconds:  3607
    ConfigMapName:           kube-root-ca.crt
    ConfigMapOptional:       <nil>
    DownwardAPI:             true
    ConfigMapName:           openshift-service-ca.crt
    ConfigMapOptional:       <nil>
QoS Class:                   BestEffort
Node-Selectors:              <none>
Tolerations:                 node.kubernetes.io/not-ready:NoExecute op=Exists for 300s
                             node.kubernetes.io/unreachable:NoExecute op=Exists for 300s
Events:
  Type     Reason                  Age                  From                     Message
  ----     ------                  ----                 ----                     -------
  Normal   Scheduled               5m4s                 default-scheduler        Successfully assigned stage-4-cycle-12-concurrent-operation/pod-test-rbd-5fa7cd7c079d45579522c712c82 to compute-5
  Normal   SuccessfulAttachVolume  5m4s                 attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c"
  Warning  FailedMount             58s (x2 over 3m1s)   kubelet                  Unable to attach or mount volumes: unmounted volumes=[mypvc], unattached volumes=[mypvc kube-api-access-kh2jq]: timed out waiting for the condition
  Warning  FailedMount             50s (x10 over 5m1s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-13e9667f-abbb-4652-bd8f-6b8e70c62c5c" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 172.30.50.148:6789,172.30.78.51:6789,172.30.65.120:6789 --keyfile=***stripped*** map ocs-storagecluster-cephblockpool/csi-vol-ef418668-5bad-4741-8cf3-95c03098b9a8 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown

15:44:13 - ThreadPoolExecutor-2980_0 - ocs_ci.ocs.ocp - ERROR  - Wait for Pod resource pod-test-rbd-5fa7cd7c079d45579522c712c82 at column STATUS to reach desired condition Running failed, last actual status was ContainerCreating
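ocs-ci gave up after roughly five minutes (pod created at 15:39:08, failure logged at 15:44:13) because the pod's STATUS column never left ContainerCreating. The sketch below is a generic polling loop illustrating that behavior; it is not the ocs_ci.ocs.ocp implementation, wait_for_status is a hypothetical name, and the 300 s timeout and 5 s interval are assumptions inferred from the timestamps. The clock and sleep functions are injectable so the demo runs instantly:

```python
import time
from typing import Callable


def wait_for_status(
    get_status: Callable[[], str],
    desired: str,
    timeout: float = 300.0,  # assumption: ~5 minutes, inferred from the log timestamps
    interval: float = 5.0,   # assumption: polling interval
    clock: Callable[[], float] = time.monotonic,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll get_status() until it returns `desired`; raise TimeoutError after `timeout` s."""
    deadline = clock() + timeout
    last = get_status()
    while last != desired:
        if clock() >= deadline:
            raise TimeoutError(
                f"status did not reach {desired!r}, last actual status was {last!r}"
            )
        sleep(interval)
        last = get_status()
    return last


# Demo with a fake clock: the pod stays in ContainerCreating, as in this bug.
fake_now = [0.0]


def fake_sleep(seconds: float) -> None:
    fake_now[0] += seconds


timeout_message = None
try:
    wait_for_status(lambda: "ContainerCreating", "Running",
                    clock=lambda: fake_now[0], sleep=fake_sleep)
except TimeoutError as exc:
    timeout_message = str(exc)
print(timeout_message)
```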


Version of all relevant components (if applicable):


Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?


Is there any workaround available to the best of your knowledge?


Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?


Is this issue reproducible?
Reporting at first occurrence.


Can this issue be reproduced from the UI?
NA

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
====================
1) Run Stage4 (https://github.com/red-hat-storage/ocs-ci/blob/master/tests/e2e/longevity/test_stage4.py) with the run time set to 4 days.

Summary of the steps:
1. PVC and pod creation, then fill data up to 25% of the mount point's space
2. Start concurrent PVC operations:
   a) Clone: creation, deletion
   b) Snapshot: creation, restoration, deletion
   c) Expansion of the original PVCs
3. PVC and pod deletion
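The thread names in the ocs-ci log (ThreadPoolExecutor-2980_0) are the default names used by Python's concurrent.futures.ThreadPoolExecutor, so step 2 most likely submits its operations to a thread pool. A rough sketch of that shape, assuming hypothetical placeholder functions (clone_cycle, snapshot_cycle, expand) in place of the real cluster calls in test_stage4.py, plus the arithmetic behind the 25% fill in step 1:

```python
from concurrent.futures import ThreadPoolExecutor, as_completed


def fill_target_bytes(total_bytes: int, fraction: float = 0.25) -> int:
    """Step 1: bytes to write so data occupies `fraction` of the mount point."""
    return int(total_bytes * fraction)


# Hypothetical placeholders: the real test would call the cluster API here.
def clone_cycle(pvc: str) -> str:
    return f"clone created and deleted for {pvc}"


def snapshot_cycle(pvc: str) -> str:
    return f"snapshot created, restored and deleted for {pvc}"


def expand(pvc: str) -> str:
    return f"expanded {pvc}"


def run_concurrent_ops(pvcs, max_workers: int = 8):
    """Step 2: submit clone, snapshot and expansion work for every PVC at once."""
    results = []
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        futures = [
            pool.submit(op, pvc)
            for pvc in pvcs
            for op in (clone_cycle, snapshot_cycle, expand)
        ]
        for fut in as_completed(futures):
            # result() re-raises any exception raised inside the worker thread.
            results.append(fut.result())
    return results


print(fill_target_bytes(10 * 1024**3))  # bytes to write on a 10 GiB volume
print(sorted(run_concurrent_ops(["pvc-a", "pvc-b"])))
```

Running every operation type against the same PVCs at once is what makes this stage stressful: clone, snapshot and expansion requests for a given volume overlap in time rather than running one after another.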

Actual results:
================
Mounting the RBD PVC to the pod failed with error: "rbd: map failed: (108) Cannot send after transport endpoint shutdown"

Expected results:
=================
The RBD PVC should mount to the pod successfully, without errors.

Additional info:

