Bug 2302073 - [External mode] Fio failed on RBD volume followed by error while creating app pod with RBD PVC
Keywords:
Status: NEW
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.16
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Praveen M
QA Contact: krishnaram Karthick
URL:
Whiteboard:
Duplicates: 2318774
Depends On:
Blocks:
 
Reported: 2024-07-31 15:46 UTC by Jilju Joy
Modified: 2024-10-17 09:37 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Embargoed:




Links:
Red Hat Issue Tracker OCSBZM-8784 (Private: no, Priority: None, Status: None, Last Updated: 2024-07-31 15:47:13 UTC)

Description Jilju Joy 2024-07-31 15:46:36 UTC
Description of problem (please be as detailed as possible and provide log
snippets):

The test cases listed below failed due to two different errors in an external mode cluster. Though these test cases are disruptive in nature, the errors occurred before any disruption was started in the cluster.

1. tests/functional/pv/pv_services/test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py::TestResourceDeletionDuringMultipleCreateDeleteOperations::test_resource_deletion_during_pvc_pod_creation_deletion_and_io

The test case failed while running fio on a pod with an RBD Block volume mode PVC.

Test case error:

ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n namespace-test-646e9cdc1ac041a3b822c5d9c rsh pod-test-rbd-ec7be2dff97f4a3fa6038ca1a18 fio --name=fio-rand-readwrite --filename=/dev/rbdblock --readwrite=randrw --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=30 --size=2G --iodepth=4 --invalidate=0 --fsync_on_close=1 --rwmixread=75 --ioengine=libaio --rate=1m --rate_process=poisson --output-format=json.
Error is fio: io_u error on file /dev/rbdblock: I/O error: read offset=686010368, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1549148160, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=886792192, buflen=4096
fio: io_u error on file /dev/rbdblock: I/O error: read offset=1520287744, buflen=4096
command terminated with exit code 1
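
For triage, one quick check is whether the I/O errors also surfaced in the node's kernel log. This is a diagnostic sketch, not part of the automated test; it assumes oc access to the cluster and targets compute-1, the node identified below:

# Open a debug shell on the node and search the kernel log for RBD/Ceph errors
oc debug node/compute-1 -- chroot /host dmesg -T | grep -Ei 'rbd|libceph'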


2. After the fio error in the previous test case, the test cases that followed failed while creating app pods with RBD PVCs (either Block or Filesystem volume mode).
* tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin]

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[cephfsplugin]

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin_provisioner]

Error from the test case test_resource_deletion_during_pvc_clone:

Events:
  Type     Reason                  Age               From                     Message
  ----     ------                  ----              ----                     -------
  Normal   Scheduled               76s               default-scheduler        Successfully assigned namespace-test-a3ed2aa514084ff6840faf7fb/pod-test-rbd-a59ee5fb88884539adb884322ef to compute-1
  Normal   SuccessfulAttachVolume  76s               attachdetach-controller  AttachVolume.Attach succeeded for volume "pvc-47995060-2825-439b-8fa1-a3e3af68833d"
  Warning  FailedMount             6s (x8 over 73s)  kubelet                  MountVolume.MountDevice failed for volume "pvc-47995060-2825-439b-8fa1-a3e3af68833d" : rpc error: code = Internal desc = rbd: map failed with error an error (exit status 108) occurred while running rbd args: [--id csi-rbd-node -m 10.1.160.202:6789,10.1.160.201:6789,10.1.160.198:6789 --keyfile=***stripped*** map rbd/csi-vol-72a53628-18ba-4b5a-adfd-074add00b015 --device-type krbd --options noudev], rbd error output: rbd: sysfs write failed
rbd: map failed: (108) Cannot send after transport endpoint shutdown
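
Exit status 108 is ESHUTDOWN ("Cannot send after transport endpoint shutdown"), which for a krbd map typically means the kernel client on the node has been blocklisted by the Ceph cluster. A minimal sketch for checking this on the external Ceph cluster (assumes admin access to that cluster; the entry to remove is a placeholder):

# List blocklisted client addresses on the external Ceph cluster
ceph osd blocklist ls
# If an address belonging to compute-1 appears, removing it should allow new maps
ceph osd blocklist rm <node-ip:port/nonce>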


The pod creation failures occurred on node compute-1. The pod in which fio failed was also running on compute-1.
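
The co-location can be confirmed by checking pod-to-node placement directly; a sketch using the namespaces quoted above:

# Node placement of the pod where fio failed
oc get pod -n namespace-test-646e9cdc1ac041a3b822c5d9c -o wide
# Node placement of the pod whose volume failed to mount
oc get pod -n namespace-test-a3ed2aa514084ff6840faf7fb -o wide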


Test report with error details - https://ocs4-jenkins-csb-odf-qe.apps.ocp-c1.prod.psi.redhat.com/job/qe-deploy-ocs-cluster/40443/testReport/

Must-gather logs collected after individual test failure - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-ext/jijoy-ext_20240731T011605/logs/failed_testcase_ocs_logs_1722434106/

Must-gather collected at the end of all tests - http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-ext/jijoy-ext_20240731T011605/logs/testcases_1722434106/jijoy-ext/

==============================================================================
Version of all relevant components (if applicable):
Cluster Version: 4.16.0-0.nightly-2024-07-30-181230
ODF Version: 4.16.1-6
Ceph Version: 18.2.1-194.el9cp (04a992766839cd3207877e518a1238cdbac3787e) reef (stable)

===============================================================================

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. I/O fails on RBD volumes, and creating app pods with RBD PVCs fails afterwards.

Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
1

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?


If this is a regression, please provide more details to justify this:
These test cases passed in 4.16.0 and previous versions of ODF in external mode.

Steps to Reproduce:
(describing automated test steps)
1. Create an external mode ODF cluster
2. Create multiple PVCs of RBD Block and Filesystem volume mode with supported access modes. Create CephFS PVCs as well.
3. Attach the PVCs to pods; attach each RWX PVC to more than one pod.
4. Run fio on the pods.
(from the next test)
5. Create a new RBD PVC and attach it to an app pod, scheduling the pod on the node where the fio failure from step 4 occurred (see the pod spec sketch after these steps).
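
For step 5, the app pod can be pinned to the affected node with spec.nodeName. A minimal sketch for a Filesystem volume mode PVC; the namespace, pod name, image, and PVC name are hypothetical placeholders:

# Pin the app pod to compute-1 via spec.nodeName (names are placeholders)
oc apply -n <test-namespace> -f - <<'EOF'
apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd
spec:
  nodeName: compute-1
  containers:
  - name: app
    image: quay.io/centos/centos:stream9
    command: ["sleep", "infinity"]
    volumeMounts:
    - name: data
      mountPath: /mnt/data
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: pvc-test-rbd
EOF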

To replicate the exact procedure, run the following set of test cases in order (an invocation sketch follows the list):
* tests/functional/pv/pv_services/test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py
* tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py::TestResourceDeletionDuringPvcClone::test_resource_deletion_during_pvc_clone

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin]

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[cephfsplugin]

* tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py::TestResourceDeletionDuringPvcExpansion::test_resource_deletion_during_pvc_expansion[rbdplugin_provisioner]
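
A sketch of the invocation, assuming a configured ocs-ci checkout; the cluster name, cluster path, and config file are placeholders:

# Run from an ocs-ci checkout configured against the external mode cluster
run-ci --cluster-name <cluster-name> --cluster-path <cluster-dir> \
    --ocsci-conf <external-mode-conf.yaml> \
    tests/functional/pv/pv_services/test_resource_deletion_during_pvc_pod_creation_deletion_and_io.py \
    tests/functional/pv/pvc_clone/test_resource_deletion_during_pvc_clone.py \
    tests/functional/pv/pvc_resize/test_resource_deletion_during_pvc_expansion.py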

==============================================================================
Actual results:
In step 4, fio failed on an RBD Block volume mode PVC.
In step 5, app pod creation failed.

Expected results:
fio runs and app pod creation should succeed.

Additional info:

