Description (Sidhant Agrawal, 2023-03-21 08:07:06 UTC)
Description of problem (please be as detailed as possible and provide log snippets):
On an RDR setup, after performing failover and relocate operations and then deleting the DR workload, the RBD images were not deleted from the secondary managed cluster.
Version of all relevant components (if applicable):
OCP: 4.13.0-0.nightly-2023-03-14-053612
ODF: 4.13.0-107
Ceph: 17.2.5-75.el9cp (52c8ab07f1bc5423199eeb6ab5714bc30a930955) quincy (stable)
ACM: 2.7.2
Submariner: 0.14.2
Does this issue impact your ability to continue to work with the product
(please explain in detail what the user impact is)?
RBD images are left behind on one of the managed clusters, and the mirroring status reports health and image_health in the WARNING state.
Is there any workaround available to the best of your knowledge?
Restart the RBD mirror daemon on the managed cluster where images were left behind.
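A minimal sketch of applying the workaround, assuming the default openshift-storage namespace and the Rook-managed RBD mirror deployment name (rook-ceph-rbd-mirror-a); both names are assumptions and may differ on a given cluster:

import subprocess

# Assumed names; adjust to the namespace/deployment actually used on the affected cluster.
NAMESPACE = "openshift-storage"
RBD_MIRROR_DEPLOYMENT = "rook-ceph-rbd-mirror-a"

# 'oc rollout restart' recreates the rbd-mirror pod on the cluster holding the
# leftover images, after which the stale images are expected to be cleaned up.
subprocess.run(
    ["oc", "-n", NAMESPACE, "rollout", "restart",
     f"deployment/{RBD_MIRROR_DEPLOYMENT}"],
    check=True,
)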
Rate from 1 - 5 the complexity of the scenario you performed that caused this
bug (1 - very simple, 5 - very complex)?
2
Is this issue reproducible?
Yes
Can this issue be reproduced from the UI?
If this is a regression, please provide more details to justify this:
Yes, this issue was not observed in 4.12.
Steps to Reproduce:
1. Configure an RDR setup
2. Deploy an application containing 20 PVCs/Pods on C1
3. Run IOs for 10 minutes
4. Scale down the RBD mirror daemon deployment to 0 (see the sketch after these steps)
5. Initiate failover to C2
6. Check that PVC and pod resources are created successfully on C2
7. Scale the RBD mirror daemon deployment back up to 1
8. Check that application and replication resources are deleted from C1
9. Check the mirroring status (see the sketch after these steps)
cluster: sagrawal-c1
{'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {'replaying': 20}}
cluster: sagrawal-c2
{'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {'replaying': 20}}
10. Run IOs for 10 minutes
11. Initiate Relocate to C1
12. Check the mirroring status after the relocate operation
cluster: sagrawal-c1
{'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {'replaying': 20}}
cluster: sagrawal-c2
{'daemon_health': 'OK', 'health': 'OK', 'image_health': 'OK', 'states': {'replaying': 20}}
13. Delete the application
14. Observe the mirroring status
cluster: sagrawal-c1
{"daemon_health":"OK","health":"OK","image_health":"OK","states":{}}
cluster: sagrawal-c2
{"daemon_health":"OK","health":"WARNING","image_health":"WARNING","states":{"unknown":15}}
Automated test:
tests/disaster-recovery/regional-dr/test_failover_and_relocate.py
Actual results:
After deleting the application workload, the mirroring status is in the WARNING state and RBD images are left behind on the secondary managed cluster.
Expected results:
Mirroring status should be OK and all RBD images should be deleted after deleting the application workload.
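A minimal sketch of verifying the expected cleanup by listing the RBD images remaining in the pool, assuming the rook-ceph-tools toolbox deployment is enabled and the default ODF block pool name; both are assumptions.

import subprocess

# Assumed names; the toolbox must be enabled and the pool name may differ.
NAMESPACE = "openshift-storage"
TOOLS_DEPLOYMENT = "rook-ceph-tools"
BLOCK_POOL = "ocs-storagecluster-cephblockpool"

# List the RBD images still present in the pool; after the DR workload is
# deleted, this list should eventually be empty on both managed clusters.
out = subprocess.run(
    ["oc", "-n", NAMESPACE, "exec", f"deploy/{TOOLS_DEPLOYMENT}", "--",
     "rbd", "ls", BLOCK_POOL],
    check=True, capture_output=True, text=True,
).stdout
print([name for name in out.splitlines() if name])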