Description of problem (please be as detailed as possible and provide log snippets):

tests/functional/pv/pv_services/test_rbd_rwx_pvc.py::TestRbdBlockPvc::test_rbd_block_rwx_pvc is failing on IBM Power on ODF 4.16. The error is:

        # Verify md5sum has changed after IO
        log.info("Verify md5sum has changed after IO. Verify from all pods.")
        for pod_obj in self.pod_objs:
            # Find md5sum
            pod_obj.md5sum_after_io = pod_obj.exec_sh_cmd_on_pod(
                command=f"dd iflag=direct if={pod_obj.get_storage_path(storage_type='block')} | md5sum"
            )
>           assert pod_obj.md5sum_after_io != md5sum_value_initial, (
                f"md5sum obtained from the pod {pod_obj.name} has not changed after IO. "
                f"IO was run from pod {io_pod.name}"
            )
E           AssertionError: md5sum obtained from the pod pod-test-rbd-21b0670da23f483a8f5619730ea has not changed after IO. IO was run from pod pod-test-rbd-27928049ec6045608dbfe88c953
E           assert 'f1c9645dbc14efddc7d8a322685f26eb -\n' != 'f1c9645dbc14efddc7d8a322685f26eb -\n'
E            + where 'f1c9645dbc14efddc7d8a322685f26eb -\n' = <ocs_ci.ocs.resources.pod.Pod object at 0x7ffe9b2787c0>.md5sum_after_io

tests/functional/pv/pv_services/test_rbd_rwx_pvc.py:92: AssertionError

Version of all relevant components (if applicable):

# oc version
Client Version: 4.16.0-rc.4
Kustomize Version: v5.0.4-0.20230601165947-6ce0bf390ce3
Server Version: 4.16.0-rc.4
Kubernetes Version: v1.29.5+f6419fb

ODF : 4.16.0

This issue is seen starting with ODF version 4.16.

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Is this issue reproducible?
Yes, this issue is consistent and occurs every time the test case is run.

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
Hi,

- What is `test_rbd_block_rwx_pvc`?
- What are the steps to reproduce?
- What is the expected and actual behavior?
Some comments were posted multiple times due to network issues on my end.
Automated test case steps:

1. Create an RBD PVC with volume mode Block and access mode RWX.
2. Create one pod on each worker node to consume the same PVC.
3. Obtain the md5sum from each pod and verify that the value is the same on all pods before starting the I/O.
   Example command to obtain the md5sum, where "/dev/rbdblock" is the devicePath:
   oc -n namespace-test-3379d12f369a4da0b6d20ad40 exec pod-test-rbd-d0f6155489e149e6b33ad93caf0 -- bash -c "dd iflag=direct if=/dev/rbdblock | md5sum"
4. Write data using fio from one of the pods. Example:
   oc -n namespace-test-3379d12f369a4da0b6d20ad40 rsh pod-test-rbd-d0f6155489e149e6b33ad93caf0 fio --name=fio-rand-write --filename=/dev/rbdblock --rw=randwrite --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=5 --size=5M --ioengine=libaio --invalidate=0 --iodepth=4 --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json
5. Verify that the md5sum changed on all pods (command in step 3).

Expected result:
In step 5, the md5sum obtained from every pod should differ from the one obtained in step 3, and the value should be the same on all pods.

Actual result:
The md5sum changed on the pod from which fio was executed. The other pods still report the md5sum obtained before fio ran.
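
For reference, the check in step 5 can be sketched as two small helpers. This is only an illustration and not the ocs-ci implementation: the function names, pod names and checksum strings below are hypothetical placeholders, and the checksums are assumed to have been collected already with the command from step 3.

```python
from typing import Dict, List


def find_stale_pods(md5_before: Dict[str, str], md5_after: Dict[str, str]) -> List[str]:
    """Return the names of pods whose checksum did not change after the write."""
    return sorted(
        pod for pod, checksum in md5_after.items()
        if checksum == md5_before[pod]
    )


def all_pods_agree(md5_after: Dict[str, str]) -> bool:
    """Return True when every pod reports the same checksum for the shared device."""
    return len(set(md5_after.values())) <= 1


# Hypothetical data mirroring the reported behaviour: only the pod that ran
# fio ("pod-io") sees new data; the other pod still reports the old checksum.
before = {"pod-io": "old-checksum", "pod-reader": "old-checksum"}
after = {"pod-io": "new-checksum", "pod-reader": "old-checksum"}

stale = find_stale_pods(before, after)
print("Pods with unchanged md5sum:", stale)
print("All pods agree after IO:", all_pods_agree(after))
```

With the expected behaviour, `find_stale_pods` would return an empty list and `all_pods_agree` would return True. In the failing run described above, the non-writer pods end up in the stale list.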
(In reply to Niels de Vos from comment #12)
> (In reply to Madhu Rajanna from comment #11)
> > >oc -n namespace-test-3379d12f369a4da0b6d20ad40 rsh pod-test-rbd-d0f6155489e149e6b33ad93caf0 fio --name=fio-rand-write --filename=/dev/rbdblock --rw=randwrite --bs=4K --direct=1 --numjobs=1 --time_based=1 --runtime=5 --size=5M --ioengine=libaio --invalidate=0 --iodepth=4 --rate=1m --rate_process=poisson --end_fsync=1 --output-format=json
> >
> > @Niels, do you see any problem with the fio command due to this checksum
> > mismatch can happen?
>
> I do not immediately see an issue with that command, the important options
> seem to be set ("direct" and/or "end_fsync").
>
> @Pooja, can you confirm that the md5sum is different on the pod where fio
> was executed? That should confirm the fio command did write to the block
> device.
>

On the pod from which fio was run, the md5sum changed.

>
> The actual output of the failure reported by the test should ideally be
> improved. When comparing two checksums, both checksums should be logged.
> This output makes it impossible to see what the 2nd checksum is:
>
> E AssertionError: md5sum obtained from the pod
> pod-test-rbd-21b0670da23f483a8f5619730ea has not changed after IO. IO was
> run from pod pod-test-rbd-27928049ec6045608dbfe88c953
> E assert 'f1c9645dbc14efddc7d8a322685f26eb -\n' !=
> 'f1c9645dbc14efddc7d8a322685f26eb -\n' E
> + where 'f1c9645dbc14efddc7d8a322685f26eb -\n' =
> <ocs_ci.ocs.resources.pod.Pod object at 0x7ffe9b2787c0>.md5sum_after_io
> tests/functional/pv/pv_services/test_rbd_rwx_pvc.py:92: AssertionError
>

Yes, the previous (before fio) and current (after fio) checksums are not logged in the message, but they are visible in the error reported for the assert statement:

assert 'f1c9645dbc14efddc7d8a322685f26eb -\n' != 'f1c9645dbc14efddc7d8a322685f26eb

Both values are the same.
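
The suggestion to log both checksums could look roughly like the sketch below. It is not the current test code: the helper names `digest_only` and `md5sum_unchanged_message` are hypothetical, and the sample values are the pod names and md5sum output taken from the failure above.

```python
def digest_only(md5sum_output: str) -> str:
    """Extract the hex digest from md5sum output such as '<digest> -\n'."""
    fields = md5sum_output.split()
    return fields[0] if fields else ""


def md5sum_unchanged_message(pod_name: str, io_pod_name: str,
                             md5_before: str, md5_after: str) -> str:
    """Build an assertion message that names both checksums explicitly."""
    return (
        f"md5sum obtained from the pod {pod_name} has not changed after IO. "
        f"IO was run from pod {io_pod_name}. "
        f"Before IO: {digest_only(md5_before)}, "
        f"after IO: {digest_only(md5_after)}"
    )


# Values copied from the reported failure.
message = md5sum_unchanged_message(
    pod_name="pod-test-rbd-21b0670da23f483a8f5619730ea",
    io_pod_name="pod-test-rbd-27928049ec6045608dbfe88c953",
    md5_before="f1c9645dbc14efddc7d8a322685f26eb -\n",
    md5_after="f1c9645dbc14efddc7d8a322685f26eb -\n",
)
print(message)
```

A message built this way could be passed as the second operand of the existing `assert`, so the before and after values appear in the failure text without relying on the assertion introspection output.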
>
> Or, could this be a bug that should be redirected to the test-case, as
> comparing a checksum against <ocs_ci.ocs.resources.pod.Pod object at
> 0x7ffe9b2787c0>.md5sum_after_io will never be true?

Debugging on the same cluster may be helpful, because this issue is not observed in our regression runs on other platforms.
I think, as @Niels said, it is probably a bug in the test case. Could we please get this verified?