Description of problem (please be as detailed as possible and provide log snippets):

While using an RBD RWX access mode PVC on more than one pod, the data written from one pod is not available on the other pods. On each pod, only the content created from that particular pod is available.

Failed ocs-ci test case:
tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py::TestRWXMountPoint::test_pvc_rwx_writeable_after_pod_deletions[CephBlockPool]

The test case was executed from the PR https://github.com/red-hat-storage/ocs-ci/pull/4674

Test case error:

E ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n namespace-test-555409e4fbe8499daed8b08cf rsh pod-test-rbd-c7339c72260240c5a71ae110912 bash -c "find /var/lib/www/html/pod-test-rbd-52d00d92c21249818ba7ff9e37c".
E Error is find: '/var/lib/www/html/pod-test-rbd-52d00d92c21249818ba7ff9e37c': No such file or directory
E command terminated with exit code 1

ocs_ci/utility/utils.py:510: CommandFailed

Test case logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/ocs-ci-logs-1628165121/tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py/TestRWXMountPoint/test_pvc_rwx_writeable_after_pod_deletions-CephBlockPool/

must-gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/failed_testcase_ocs_logs_1628165121/test_pvc_rwx_writeable_after_pod_deletions%5bCephBlockPool%5d_ocs_logs/

PVC yaml used in the test case:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-75af265fd6dc4efb890f0f55e407459
  namespace: namespace-test-555409e4fbe8499daed8b08cf
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block

Names of the pods created in the test case:
pod-test-rbd-c7339c72260240c5a71ae110912
pod-test-rbd-52d00d92c21249818ba7ff9e37c
pod-test-rbd-1f974f9b45914b039ff9a746596

Pod yaml example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-c7339c72260240c5a71ae110912
  namespace: namespace-test-555409e4fbe8499daed8b08cf
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    imagePullPolicy: IfNotPresent
    name: my-container
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    volumeDevices:
    - devicePath: /dev/rbdblock
      name: my-volume
  nodeName: compute-0
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc-test-75af265fd6dc4efb890f0f55e407459

fio was run from each pod with a different file name:

fio --name=fio-rand-readwrite --filename=/var/lib/www/html/pod-test-rbd-c7339c72260240c5a71ae110912 --readwrite=randrw --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=30 --size=512M --iodepth=4 --invalidate=1 --fsync_on_close=1 --rwmixread=75 --ioengine=libaio --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json

===============================================================================================================

Version of all relevant components (if applicable):
OCS operator    v4.8.0-175.ci
Cluster Version 4.8.0-0.nightly-2021-07-31-065602
Ceph Version    14.2.11-184.el8cp (44441323476fee97be0ff7a92c6065958c77f1b9) nautilus (stable)
rook_csi_ceph   cephcsi@sha256:6077d1254592eab58f5e11a6ad19612bf0387427426718bf523a2299b48334c8

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?
Yes, more than one pod cannot consume the same content when using an RBD RWX PVC.
Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Create an RBD PVC with volume mode Block and access mode RWX.
2. Create 3 pods on three different nodes to consume the PVC.
3. Create an ext4 file system on the volume and mount it on each pod:
   # mkfs.ext4 /dev/rbdblock                        - from one pod only
   # mount -t ext4 /dev/rbdblock /var/lib/www/html  - on all pods
4. Run fio from each of the three pods. Keep the file name different when running from each pod. Add the --fsync_on_close=1 parameter.
5. Wait for fio to finish on all pods.
6. Check the existence of the 3 files from each pod.
(An `oc rsh` sketch of these steps is given at the end of this comment.)

Or run the test case tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py::TestRWXMountPoint::test_pvc_rwx_writeable_after_pod_deletions[CephBlockPool] from the PR https://github.com/red-hat-storage/ocs-ci/pull/4674

==========================================================================================================

Actual results:
On each pod, only the file created from that particular pod is available. The files created from the other two pods are missing.

Expected results:
The data written from one pod should be available on the other pods consuming the same RBD RWX PVC.

Additional info:
The CephFS RWX PVC test case passed.
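For reference, steps 3 and 6 above can be driven from outside the cluster with `oc rsh`. This is only an illustrative sketch, reusing the namespace and pod names from this report; it does not add anything beyond the manual steps:

NS=namespace-test-555409e4fbe8499daed8b08cf
PODS="pod-test-rbd-c7339c72260240c5a71ae110912 pod-test-rbd-52d00d92c21249818ba7ff9e37c pod-test-rbd-1f974f9b45914b039ff9a746596"
FIRST=pod-test-rbd-c7339c72260240c5a71ae110912

# Step 3: create the ext4 filesystem from one pod only, then mount it on every pod.
oc -n $NS rsh $FIRST bash -c "mkfs.ext4 /dev/rbdblock"
for POD in $PODS; do
  oc -n $NS rsh $POD bash -c "mount -t ext4 /dev/rbdblock /var/lib/www/html"
done

# Step 6: after fio has finished on all pods, every file should be visible from every pod.
for POD in $PODS; do
  oc -n $NS rsh $POD bash -c "ls -l /var/lib/www/html/"
done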
These steps are not correct for block-mode volumes:

  1. Create RBD PVC with volume mode Block and access mode RWX.
  2. Create 3 pods on three different nodes to consume the PVC.
  3. Use ext4 file system to mount the volume on each pod.
     # mkfs.ext4 /dev/rbdblock                        - from one pod only
     # mount -t ext4 /dev/rbdblock /var/lib/www/html  - all pods

Creating a local filesystem on the shared block-device will not work. You will need to use an application that can consume the block-device without a filesystem (or use a cluster-aware filesystem like GFS2, but that is out of scope for OCP).

So, in tests, only use the device as specified in the Pod, without creating a filesystem on it:

    volumeDevices:
    - devicePath: /dev/rbdblock

Use direct-io to prevent caching. I do not know if there is a way in fio to specify an offset that should be used for running the test. Each Pod should use its own area on the device, so that they do not overwrite each other's data (a rough sketch of that idea follows at the end of this comment). Or, maybe fio can be informed that the block-device is shared, and it can coordinate parallel access by itself.

Adding the TestCaseRejected keyword, please rework the test. This can probably be closed as NOTABUG.
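If fio's --offset and --size options can be used for this, the per-Pod invocations might look roughly like the sketch below. This is untested and the offsets/sizes are arbitrary; the only point is that the regions written by the Pods must not overlap:

# Pod 1: exercise only the first 1 GiB of the shared device.
fio --name=shared-blk --filename=/dev/rbdblock --offset=0 --size=1G \
    --rw=randwrite --bs=4K --direct=1 --ioengine=libaio --iodepth=4 \
    --numjobs=1 --time_based=1 --runtime=30 --output-format=json

# Pod 2: the next 1 GiB (and so on for further pods), so the regions never overlap.
fio --name=shared-blk --filename=/dev/rbdblock --offset=1G --size=1G \
    --rw=randwrite --bs=4K --direct=1 --ioengine=libaio --iodepth=4 \
    --numjobs=1 --time_based=1 --runtime=30 --output-format=json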
Hi Niels,

Created a new test case which does not use a filesystem to consume the block device. The test case creates 3 pods and runs fio from one pod only.

1. Create RBD PVC with volume mode Block and access mode RWX.

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-db366e74a3b1487585ca7aafa5ae5d9
  namespace: namespace-test-1386ef1881ac4ac8b14d3b101
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block

2. Create 3 pods on three different nodes to consume the PVC. Example pod yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-2b3bbed7be354ab3bf33b80097d
  namespace: namespace-test-1386ef1881ac4ac8b14d3b101
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    imagePullPolicy: IfNotPresent
    name: my-container
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
    volumeDevices:
    - devicePath: /dev/rbdblock
      name: my-volume
  nodeName: compute-0
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc-test-db366e74a3b1487585ca7aafa5ae5d9

3. Verify that the md5sum value is the same on all pods.
   md5sum /dev/rbdblock   (where /dev/rbdblock is the devicePath)

4. Run fio from one pod.
   fio --name=fio-rand-write --filename=/dev/rbdblock --rw=randwrite --bs=4K --direct=0 --numjobs=1 --time_based=1 --runtime=30 --size=1G --ioengine=libaio --iodepth=4 --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json

5. Wait for fio completion.

6. Verify that the md5sum value is the same on all pods.
   md5sum /dev/rbdblock

(The full command sequence is sketched at the end of this comment.)

Actual result:
Step 6 failed. The md5sum value of /dev/rbdblock (devicePath) changed only on the pod from which fio was executed. The md5sum value obtained from the other two pods was the same as the initial value from step 3.

The test case was run from PR https://github.com/red-hat-storage/ocs-ci/pull/4677

Relevant logs from the test case:

10:41:15 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Find initial md5sum value
10:41:15 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-2b3bbed7be354ab3bf33b80097d bash -c "md5sum /dev/rbdblock"
10:41:42 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:41:42 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-551fc92e58564d4295c71f5b450 bash -c "md5sum /dev/rbdblock"
10:42:06 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:42:06 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-6654faf450a8477aac40dff34ca bash -c "md5sum /dev/rbdblock"
10:42:31 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:42:31 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Run IO from one pod
10:44:01 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Verify md5sum has changed after IO. Verify from all pods.
10:44:01 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-2b3bbed7be354ab3bf33b80097d bash -c "md5sum /dev/rbdblock"
10:44:27 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: a5d7cf45af6fa00a436f160b60f8e4ac
10:44:27 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - md5sum obtained from the pod pod-test-rbd-2b3bbed7be354ab3bf33b80097d has changed after IO
10:44:27 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-551fc92e58564d4295c71f5b450 bash -c "md5sum /dev/rbdblock"
10:44:47 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45

Test case error:

E AssertionError: md5sum obtained from the pod pod-test-rbd-551fc92e58564d4295c71f5b450 has not changed after IO. IO was run from pod pod-test-rbd-2b3bbed7be354ab3bf33b80097d
E assert 'ec4bcc8776ea04479b786e063a9ace45' != 'ec4bcc8776ea04479b786e063a9ace45'
E   + where 'ec4bcc8776ea04479b786e063a9ace45' = <ocs_ci.ocs.resources.pod.Pod object at 0x7fa03e348d90>.md5sum_before_io
E   + and 'ec4bcc8776ea04479b786e063a9ace45' = <ocs_ci.ocs.resources.pod.Pod object at 0x7fa03e348d90>.md5sum_after_io

tests/manage/pv_services/test_rbd_rwx_pvc.py:93: AssertionError
--------------------------- Captured stdout teardown ---------------------------
FAILED [100%]

Test case logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/ocs-ci-logs-1628246083/tests/manage/pv_services/test_rbd_rwx_pvc.py/TestRbdBlockPvc/test_rbd_block_rwx_pvc/

must-gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/failed_testcase_ocs_logs_1628246083/test_rbd_block_rwx_pvc_ocs_logs/

Tested in version:
OCS operator    v4.8.0-175.ci
Cluster Version 4.8.0-0.nightly-2021-07-31-065602
Ceph Version    14.2.11-184.el8cp (44441323476fee97be0ff7a92c6065958c77f1b9) nautilus (stable)
rook_csi_ceph   cephcsi@sha256:6077d1254592eab58f5e11a6ad19612bf0387427426718bf523a2299b48334c8
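For completeness, the verification flow from steps 3-6 above boils down to the following sequence, shown here only as a rough `oc rsh` sketch with the namespace and pod names from this comment (it mirrors what the ocs-ci test does, it is not the test code itself):

NS=namespace-test-1386ef1881ac4ac8b14d3b101
PODS="pod-test-rbd-2b3bbed7be354ab3bf33b80097d pod-test-rbd-551fc92e58564d4295c71f5b450 pod-test-rbd-6654faf450a8477aac40dff34ca"
FIO_POD=pod-test-rbd-2b3bbed7be354ab3bf33b80097d

# Step 3: baseline checksum of the raw device from every pod.
for POD in $PODS; do
  oc -n $NS rsh $POD bash -c "md5sum /dev/rbdblock"
done

# Step 4: run fio from one pod only.
oc -n $NS rsh $FIO_POD bash -c "fio --name=fio-rand-write --filename=/dev/rbdblock --rw=randwrite --bs=4K --direct=0 --numjobs=1 --time_based=1 --runtime=30 --size=1G --ioengine=libaio --iodepth=4 --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json"

# Step 6: the checksum is expected to have changed, identically, on all pods.
for POD in $PODS; do
  oc -n $NS rsh $POD bash -c "md5sum /dev/rbdblock"
done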
Hi Jilju, this looks much better already :)

There is just a small issue which might be related to how `md5sum` is used here. When the Linux kernel attaches a block-device, it does some probing for the contents (like partition table). That means, reading from the device might actually read parts from a cache. The 2nd md5sum command (after fio was run) might as well read from the cache again (filled when reading with the 1st md5sum command).

Could you test this by reading the device through `dd`, maybe like `dd iflag=direct if=/dev/rbdblock | md5sum`?
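For example (just a sketch, reusing the namespace and pod names from the previous comment; the bs=4M is only added here to speed up the full read of the device), the per-pod check would then become something like:

NS=namespace-test-1386ef1881ac4ac8b14d3b101
for POD in pod-test-rbd-2b3bbed7be354ab3bf33b80097d \
           pod-test-rbd-551fc92e58564d4295c71f5b450 \
           pod-test-rbd-6654faf450a8477aac40dff34ca; do
  # iflag=direct bypasses the page cache, so each pod reads the actual device contents.
  oc -n $NS rsh $POD bash -c "dd iflag=direct bs=4M if=/dev/rbdblock | md5sum"
done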
(In reply to Niels de Vos from comment #4)
> Hi Jilju, this looks much better already :)
> 
> There is just a small issue which might be related to how `md5sum` is used
> here. When the Linux kernel attaches a block-device, it does some probing
> for the contents (like partition table). That means, reading from the device
> might actually read parts from a cache. The 2nd md5sum command (after fio
> was run) might as well read from the cache again (filled when reading with
> the 1st md5sum command).
> 
> Could you test this by reading the device through `dd`, maybe like `dd
> iflag=direct if=/dev/rbdblock | md5sum`?

Thanks Niels. The test case passed after using the method you suggested above to obtain the md5sum. Closing this as not a bug.