Bug 1990491 - [RBD] While using RBD RWX access mode PVC on more than one pod, the data written from one pod is not available on other pods
Summary: [RBD] While using RBD RWX access mode PVC on more than one pod, the data written from one pod is not available on other pods
Keywords:
Status: CLOSED NOTABUG
Alias: None
Product: Red Hat OpenShift Data Foundation
Classification: Red Hat Storage
Component: csi-driver
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: ---
Assignee: Niels de Vos
QA Contact: Elad
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2021-08-05 14:10 UTC by Jilju Joy
Modified: 2023-08-09 16:37 UTC
CC List: 4 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2021-08-10 12:30:42 UTC
Embargoed:



Description Jilju Joy 2021-08-05 14:10:42 UTC
Description of problem (please be as detailed as possible and provide log snippets):

While using RBD RWX access mode PVC on more than one pod, the data written from one pod is not available on other pods. On each pod, only the content created from that particular pod is available.

Failed ocs-ci test case : tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py::TestRWXMountPoint::test_pvc_rwx_writeable_after_pod_deletions[CephBlockPool]

The test case was executed from the PR https://github.com/red-hat-storage/ocs-ci/pull/4674


Test case error:

E           ocs_ci.ocs.exceptions.CommandFailed: Error during execution of command: oc -n namespace-test-555409e4fbe8499daed8b08cf rsh pod-test-rbd-c7339c72260240c5a71ae110912 bash -c "find /var/lib/www/html/pod-test-rbd-52d00d92c21249818ba7ff9e37c".
E           Error is find: '/var/lib/www/html/pod-test-rbd-52d00d92c21249818ba7ff9e37c': No such file or directory
E           command terminated with exit code 1

ocs_ci/utility/utils.py:510: CommandFailed



Test case logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/ocs-ci-logs-1628165121/tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py/TestRWXMountPoint/test_pvc_rwx_writeable_after_pod_deletions-CephBlockPool/


must-gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/failed_testcase_ocs_logs_1628165121/test_pvc_rwx_writeable_after_pod_deletions%5bCephBlockPool%5d_ocs_logs/


PVC yaml used in the test case:

apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-75af265fd6dc4efb890f0f55e407459
  namespace: namespace-test-555409e4fbe8499daed8b08cf
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 10Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block



Name of the pods created in the test case:
pod-test-rbd-c7339c72260240c5a71ae110912
pod-test-rbd-52d00d92c21249818ba7ff9e37c
pod-test-rbd-1f974f9b45914b039ff9a746596


Pod yaml example:

apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-c7339c72260240c5a71ae110912
  namespace: namespace-test-555409e4fbe8499daed8b08cf
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    imagePullPolicy: IfNotPresent
    name: my-container
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
      privileged: true
    volumeDevices:
    - devicePath: /dev/rbdblock
      name: my-volume
  nodeName: compute-0
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc-test-75af265fd6dc4efb890f0f55e407459


fio was run from each pod with different file name:

fio --name=fio-rand-readwrite --filename=/var/lib/www/html/pod-test-rbd-c7339c72260240c5a71ae110912 --readwrite=randrw --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=30 --size=512M --iodepth=4 --invalidate=1 --fsync_on_close=1 --rwmixread=75 --ioengine=libaio --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json


===============================================================================================================

Version of all relevant components (if applicable):
OCS operator	v4.8.0-175.ci
Cluster Version	4.8.0-0.nightly-2021-07-31-065602
Ceph Version	14.2.11-184.el8cp (44441323476fee97be0ff7a92c6065958c77f1b9) nautilus (stable)

rook_csi_ceph	cephcsi@sha256:6077d1254592eab58f5e11a6ad19612bf0387427426718bf523a2299b48334c8

Does this issue impact your ability to continue to work with the product
(please explain in detail what is the user impact)?
Yes. When an RBD RWX PVC is used by more than one pod, the pods cannot consume the same content.


Is there any workaround available to the best of your knowledge?
No

Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2

Is this issue reproducible?
Yes

Can this issue be reproduced from the UI?
Yes

If this is a regression, please provide more details to justify this:


Steps to Reproduce:
1. Create RBD PVC with volume mode Block and access mode RWX.
2. Create 3 pods on three different nodes to consume the PVC.
3. Create an ext4 file system on the volume and mount it on each pod:
# mkfs.ext4 /dev/rbdblock   - from one pod only
# mount -t ext4 /dev/rbdblock /var/lib/www/html - all pods.

4. Run fio from each of the three pods, using a different file name on each pod. Add the --fsync_on_close=1 parameter.
5. Wait for fio to finish on all pods.
6. Check from each pod that all 3 files exist.

Or run the test case tests/manage/pv_services/test_pvc_rwx_writeable_after_pod_deletions.py::TestRWXMountPoint::test_pvc_rwx_writeable_after_pod_deletions[CephBlockPool]
from the PR https://github.com/red-hat-storage/ocs-ci/pull/4674

==========================================================================================================

Actual results:
On each pod, only the file created from that particular pod is available. Files created from the other two pods are missing.


Expected results:
The data written from one pod should be available on the other pods consuming the same RBD RWX PVC.

Additional info:
CephFS RWX PVC test case passed.

Comment 2 Niels de Vos 2021-08-05 17:17:06 UTC
These steps are not correct for block-mode volumes:

1. Create RBD PVC with volume mode Block and access mode RWX.
2. Create 3 pods on three different nodes to consume the PVC.
3. Use ext4 file system to mount the volume on each pod.
# mkfs.ext4 /dev/rbdblock   - from one pod only
# mount -t ext4 /dev/rbdblock /var/lib/www/html - all pods.

Creating a local filesystem on the shared block-device will not work. You will need to use an application that can consume the block-device without a filesystem (or use a cluster-aware filesystem like GFS2, but that is out of scope for OCP).

So, in tests, only use the device as specified in the Pod, without creating a filesystem on it:

    volumeDevices:
    - devicePath: /dev/rbdblock

Use direct-io to prevent caching.

I do not know if there is a way in fio to specify an offset that should be used for running the test. Each Pod should use its own area on the device, so that they do not overwrite each other's data. Or, maybe fio can be informed that the block-device is shared, and it can coordinate parallel access by itself.
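
fio does in fact accept an --offset option, so one possible rework is to give each pod its own non-overlapping region of the shared device and keep --direct=1 so that the I/O bypasses the page cache. A minimal sketch, with illustrative offsets and sizes that are not taken from the test case:

Pod 1, first 1 GiB of the device:
fio --name=pod1-region --filename=/dev/rbdblock --offset=0 --size=1G --rw=randwrite --bs=4K --direct=1 --ioengine=libaio --iodepth=4 --output-format=json

Pod 2, next 1 GiB, so the pods do not overwrite each other's data:
fio --name=pod2-region --filename=/dev/rbdblock --offset=1G --size=1G --rw=randwrite --bs=4K --direct=1 --ioengine=libaio --iodepth=4 --output-format=json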

Adding the TestCaseRejected keyword, please rework the test. This can probably be closed as NOTABUG.

Comment 3 Jilju Joy 2021-08-06 11:22:24 UTC
Hi Niels,

Created a new test case which does not use a filesystem to consume the block device.
The test case creates 3 pods and runs fio from one pod only.

1. Create RBD PVC with volume mode Block and access mode RWX.
apiVersion: v1
kind: PersistentVolumeClaim
metadata:
  name: pvc-test-db366e74a3b1487585ca7aafa5ae5d9
  namespace: namespace-test-1386ef1881ac4ac8b14d3b101
spec:
  accessModes:
  - ReadWriteMany
  resources:
    requests:
      storage: 5Gi
  storageClassName: ocs-storagecluster-ceph-rbd
  volumeMode: Block

2. Create 3 pods on three different nodes to consume the PVC.

Example pod yaml:

apiVersion: v1
kind: Pod
metadata:
  name: pod-test-rbd-2b3bbed7be354ab3bf33b80097d
  namespace: namespace-test-1386ef1881ac4ac8b14d3b101
spec:
  containers:
  - image: quay.io/ocsci/nginx:latest
    imagePullPolicy: IfNotPresent
    name: my-container
    securityContext:
      capabilities:
        add:
        - SYS_ADMIN
    volumeDevices:
    - devicePath: /dev/rbdblock
      name: my-volume
  nodeName: compute-0
  volumes:
  - name: my-volume
    persistentVolumeClaim:
      claimName: pvc-test-db366e74a3b1487585ca7aafa5ae5d9

3. Verify that the md5sum value is the same on all pods.
md5sum /dev/rbdblock  (where /dev/rbdblock is the devicePath)


4. Run fio from one pod.
fio --name=fio-rand-write --filename=/dev/rbdblock --rw=randwrite --bs=4K --direct=0 --numjobs=1 --time_based=1 --runtime=30 --size=1G --ioengine=libaio --iodepth=4 --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json

5. Wait for fio completion

6. Verify that the md5sum value is the same on all pods.
md5sum /dev/rbdblock


Actual result:
Step 6 failed. The md5sum value of /dev/rbdblock (devicePath) changed only on the pod from which fio was executed. The md5sum value obtained from the other two pods was the same as the initial value from step 3.


Test case was run from PR https://github.com/red-hat-storage/ocs-ci/pull/4677


Relevant logs from the test case:

10:41:15 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Find initial md5sum value
10:41:15 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-2b3bbed7be354ab3bf33b80097d bash -c "md5sum /dev/rbdblock"
10:41:42 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:41:42 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-551fc92e58564d4295c71f5b450 bash -c "md5sum /dev/rbdblock"
10:42:06 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:42:06 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-6654faf450a8477aac40dff34ca bash -c "md5sum /dev/rbdblock"
10:42:31 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45
10:42:31 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Run IO from one pod


10:44:01 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - Verify md5sum has changed after IO. Verify from all pods.
10:44:01 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-2b3bbed7be354ab3bf33b80097d bash -c "md5sum /dev/rbdblock"
10:44:27 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: a5d7cf45af6fa00a436f160b60f8e4ac
10:44:27 - MainThread - tests.manage.pv_services.test_rbd_rwx_pvc - INFO - md5sum obtained from the pod pod-test-rbd-2b3bbed7be354ab3bf33b80097d has changed after IO
10:44:27 - MainThread - ocs_ci.utility.utils - INFO - Executing command: oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-551fc92e58564d4295c71f5b450 bash -c "md5sum /dev/rbdblock"
10:44:47 - MainThread - ocs_ci.ocs.resources.pod - INFO - md5sum of file /dev/rbdblock: ec4bcc8776ea04479b786e063a9ace45


Test case error:

E           AssertionError: md5sum obtained from the pod pod-test-rbd-551fc92e58564d4295c71f5b450 has not changed after IO. IO was run from pod pod-test-rbd-2b3bbed7be354ab3bf33b80097d
E           assert 'ec4bcc8776ea04479b786e063a9ace45' != 'ec4bcc8776ea04479b786e063a9ace45'
E            +  where 'ec4bcc8776ea04479b786e063a9ace45' = <ocs_ci.ocs.resources.pod.Pod object at 0x7fa03e348d90>.md5sum_before_io
E            +  and   'ec4bcc8776ea04479b786e063a9ace45' = <ocs_ci.ocs.resources.pod.Pod object at 0x7fa03e348d90>.md5sum_after_io

tests/manage/pv_services/test_rbd_rwx_pvc.py:93: AssertionError
--------------------------- Captured stdout teardown ---------------------------
FAILED                                                                   [100%]


Test case logs:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/ocs-ci-logs-1628246083/tests/manage/pv_services/test_rbd_rwx_pvc.py/TestRbdBlockPvc/test_rbd_block_rwx_pvc/


must-gather:
http://magna002.ceph.redhat.com/ocsci-jenkins/openshift-clusters/jijoy-aug3/jijoy-aug3_20210803T055626/logs/failed_testcase_ocs_logs_1628246083/test_rbd_block_rwx_pvc_ocs_logs/


Tested in version:
OCS operator	v4.8.0-175.ci
Cluster Version	4.8.0-0.nightly-2021-07-31-065602
Ceph Version	14.2.11-184.el8cp (44441323476fee97be0ff7a92c6065958c77f1b9) nautilus (stable)

rook_csi_ceph	cephcsi@sha256:6077d1254592eab58f5e11a6ad19612bf0387427426718bf523a2299b48334c8

Comment 4 Niels de Vos 2021-08-06 12:10:14 UTC
Hi Jilju, this looks much better already :)

There is just a small issue which might be related to how `md5sum` is used here. When the Linux kernel attaches a block-device, it does some probing for the contents (like partition table). That means, reading from the device might actually read parts from a cache. The 2nd md5sum command (after fio was run) might as well read from the cache again (filled when reading with the 1st md5sum command).

Could you test this by reading the device through `dd`, maybe like `dd iflag=direct if=/dev/rbdblock | md5sum`?
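
For illustration, the same check wrapped in the oc rsh invocation that the test already uses (pod and namespace names reused from the log above, block size chosen arbitrarily) could look like:

oc -n namespace-test-1386ef1881ac4ac8b14d3b101 rsh pod-test-rbd-2b3bbed7be354ab3bf33b80097d bash -c "dd iflag=direct if=/dev/rbdblock bs=4M 2>/dev/null | md5sum"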

Comment 5 Jilju Joy 2021-08-10 12:30:42 UTC
(In reply to Niels de Vos from comment #4)
> Hi Jilju, this looks much better already :)
> 
> There is just a small issue which might be related to how `md5sum` is used
> here. When the Linux kernel attaches a block-device, it does some probing
> for the contents (like partition table). That means, reading from the device
> might actually read parts from a cache. The 2nd md5sum command (after fio
> was run) might as well read from the cache again (filled when reading with
> the 1st md5sum command).
> 
> Could you test this by reading the device through `dd`, maybe like `dd
> iflag=direct if=/dev/rbdblock | md5sum`?

Thanks Niels.
The test case passed after using the method you suggested above to obtain the md5sum. Closing this as not a bug.

