Description of problem (please be as detailed as possible and provide log snippets):

On running Tier1 on ODF 4.15.2, Ceph health goes into an error state:

sh-5.1$ ceph health
HEALTH_ERR 1/654 objects unfound (0.153%); 7 scrub errors; Possible data damage: 1 pg recovery_unfound, 4 pgs inconsistent; Degraded data redundancy: 3/1962 objects degraded (0.153%), 1 pg degraded; 3 slow ops, oldest one blocked for 265101 sec, daemons [osd.1,osd.2] have slow ops.
sh-5.1$

Version of all relevant components (if applicable):
ODF version - 4.15.2
OCP version - 4.15.2

Does this issue impact your ability to continue to work with the product (please explain in detail what is the user impact)?

Is there any workaround available to the best of your knowledge?
We can skip the test case and continue with the rest of the test suite execution.

Rate from 1 - 5 the complexity of the scenario you performed that caused this bug (1 - very simple, 5 - very complex)?

Can this issue be reproduced?

Can this issue be reproduced from the UI?

If this is a regression, please provide more details to justify this:

Steps to Reproduce:
1. Deploy ODF 4.15.2 and execute the Tier1 test suite.
2. During execution of the test_selinux_relabel_for_existing_pvc[5] test case, Ceph health goes into an error state.

Actual results:

Expected results:

Additional info:
Must gather logs - https://drive.google.com/file/d/14R_3VhDIXHHv9ZRyCUBJ3BqvWVmyXZ41/view?usp=sharing
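For anyone triaging from the toolbox, the standard Ceph CLI can narrow down which PGs and objects are behind the HEALTH_ERR. A minimal sketch follows; the commands are stock Ceph, but the pool name `ocs-storagecluster-cephblockpool` is an assumption based on a default ODF deployment, and the example PG id is a placeholder:

```shell
# Run inside the rook-ceph tools pod.
ceph health detail                     # lists the unfound/inconsistent PGs by id
ceph pg dump_stuck unclean             # PGs that are not active+clean

# Pool name assumed for a default ODF install; adjust to the affected pool.
rados list-inconsistent-pg ocs-storagecluster-cephblockpool

# For each PG id reported above (e.g. 2.1a is a placeholder):
# rados list-inconsistent-obj 2.1a --format=json-pretty   # objects that failed scrub
# ceph pg 2.1a query                                      # peering/recovery state detail
```

The read-only commands above are safe on a live cluster; repair actions (e.g. `ceph pg repair`) should only follow after the inconsistent objects have been inspected.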
Hi, what is Tier1? What operations were performed on the cluster before it went to the current state?
Tier1 is the test suite executed as part of zstream 4.15.2 testing, and the test case below is part of that suite:
test_selinux_relabel_for_existing_pvc[5]

After running the test_selinux_relabel_for_existing_pvc[5] test case we see this issue. Link to the test case: https://github.com/red-hat-storage/ocs-ci/blob/6de27377af27d626991b2b0b590f534a91a81400/tests/cross_functional/kcs/test_selinux_relabel_solution.py#L227

This test case creates a PVC, attaches it to a pod, creates multiple directories with files, and applies SELinux relabeling. The issue is seen on multiple clusters with ODF 4.15.2 installed.
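For context, the KCS solution this test exercises is, roughly, avoiding the recursive relabel of every file on the PVC by pinning an SELinux context on the pod so the volume is mounted with a context option instead. A minimal sketch of the kind of pod spec involved (the pod name, PVC name, and MCS level are illustrative, not taken from the test):

```yaml
# Illustrative only: pinning seLinuxOptions lets the kubelet mount the
# volume with -o context=... rather than recursively relabeling each file.
apiVersion: v1
kind: Pod
metadata:
  name: selinux-relabel-demo          # hypothetical name
spec:
  securityContext:
    seLinuxOptions:
      level: "s0:c123,c456"           # example MCS level, not from the test
  containers:
    - name: app
      image: registry.access.redhat.com/ubi9/ubi-minimal
      command: ["sleep", "infinity"]
      volumeMounts:
        - name: data
          mountPath: /mnt/data
  volumes:
    - name: data
      persistentVolumeClaim:
        claimName: test-pvc           # hypothetical PVC name
```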
(In reply to Pooja Soni from comment #5)
> Tier1 is the test suite which is getting executed as part of zstream 4.15.2
> and below test cases is part of the same test suite -
> test_selinux_relabel_for_existing_pvc[5]
>
> after running test_selinux_relabel_for_existing_pvc[5] test case we are
> seeing this issue. Link to test case -
> https://github.com/red-hat-storage/ocs-ci/blob/
> 6de27377af27d626991b2b0b590f534a91a81400/tests/cross_functional/kcs/
> test_selinux_relabel_solution.py#L227
>
> This test case is creating PVC, attach to the pod and creating multiple
> directories with files and applying selinux relabeling. This issue is seen
> in multiple cluster having ODF 4.15.2 installed.

Thanks for the details. Do you have a live cluster that I can take a look at?
Got a live cluster from Aaruni. Didn't see any issues at the Rook level, but the osd.0 pod is crashing.

Hi Radoslaw,
Can you take a look at the OSD pod crashing? Thanks.
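For inspecting the crashing OSD, a few standard commands can be used. A hedged sketch; the `openshift-storage` namespace is an assumption for a default ODF install, and the pod-name suffix is a placeholder to be filled in from the pod listing:

```shell
# Namespace assumed for a default ODF deployment; adjust if needed.
oc -n openshift-storage get pods -l app=rook-ceph-osd          # locate the osd pods
# oc -n openshift-storage logs rook-ceph-osd-0-<suffix> --previous   # log from the last crash

# Inside the rook-ceph tools pod:
ceph crash ls                  # recent daemon crashes recorded by the crash module
# ceph crash info <crash-id>   # backtrace for a specific crash
```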
We tried again on a different setup and it failed again with the error:

Ceph cluster health is not OK. Health: HEALTH_ERR 1/60838 objects unfound (0.002%); Reduced data availability: 37 pgs peering; Possible data damage: 1 pg recovery_unfound; Degraded data redundancy: 41007/182514 objects degraded (22.468%), 75 pgs degraded, 109 pgs undersized; 5 daemons have recently crashed; 1 slow ops, oldest one blocked for 120 sec, daemons [osd.1,osd.2] have slow ops.

Must-gather log for this setup: https://drive.google.com/file/d/1G_zg9vF8xI3c74hZBxmK4Q-Q-3VmZwtk/view?usp=sharing
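As a quick sanity check, the percentages in both HEALTH_ERR summaries are consistent with the raw object counts they quote (Ceph prints these ratios rounded to three decimal places):

```python
# Verify the percentages printed in the two HEALTH_ERR summaries against
# the raw object counts they quote.
def pct(part, whole):
    """Percentage rounded to 3 decimals, as Ceph prints it."""
    return round(part / whole * 100, 3)

# First cluster: "1/654 objects unfound (0.153%)",
# "3/1962 objects degraded (0.153%)"
print(pct(1, 654))         # 0.153
print(pct(3, 1962))        # 0.153

# Second cluster: "1/60838 objects unfound (0.002%)",
# "41007/182514 objects degraded (22.468%)"
print(pct(1, 60838))       # 0.002
print(pct(41007, 182514))  # 22.468
```

So the two reports are internally consistent; the second cluster is simply in a much more degraded state (over a fifth of objects degraded during peering).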
I hit the same issue running Tier1 on ODF 4.14.7: Ceph health went into an error state after execution of the test_selinux_relabel_for_existing_pvc[5] test case.