Bug 2090080
| Summary: | [RDR] Failover of workload does not happen when primary cluster is DOWN | | |
|---|---|---|---|
| Product: | [Red Hat Storage] Red Hat OpenShift Data Foundation | Reporter: | Pratik Surve <prsurve> |
| Component: | odf-dr | Assignee: | Benamar Mekhissi <bmekhiss> |
| odf-dr sub component: | ramen | QA Contact: | Pratik Surve <prsurve> |
| Status: | VERIFIED | Docs Contact: | |
| Severity: | urgent | | |
| Priority: | unspecified | CC: | bmekhiss, kseeger, mmuench, muagarwa, odf-bz-bot, srangana |
| Version: | 4.10 | Keywords: | TestBlocker |
| Target Milestone: | --- | Flags: | prsurve: needinfo? (bmekhiss) |
| Target Release: | ODF 4.11.0 | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | | |
| : | 2090568 | Environment: | |
| Last Closed: | | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 2090568 | | |
Description
Pratik Surve
2022-05-25 05:30:46 UTC
In 4.10, the DRPolicy reconciler validates that the s3store is reachable and accessible in every reconciliation (in 4.11 this validation has moved to the DRCluster reconciler). In this setup, every managed cluster has an s3store that should be accessible from all managed clusters. When one cluster is unreachable, the validation fails and no forward progress is made. We will fix this, because failover is often triggered by a failure of the primary cluster, which may then be unreachable.

PR posted, awaiting required acks to merge: https://github.com/red-hat-storage/ramen/pull/39

If I am not wrong, this is a must-fix for 4.11 (even for TP). Can we have an ETA for the fix?

Yes, needed for 4.11. @benamar we need to forward port https://github.com/red-hat-storage/ramen/pull/39, possibly in a better way, to future-proof it in 4.11. Assigning this to you.

This BZ is fixed by the split of the DRPolicy resource into DRPolicy and DRCluster. Currently, the DRPolicy Validated condition does not depend on any of the DRClusters being valid (other than their existence). Thus, on failover, when a DRCluster reports loss of s3 connectivity, DRPC does not treat it as a condition that blocks failing the workload over.

Marking this ON_QA to test the behavior as required. Please test with any of the latest 4.11 builds.
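The difference between the 4.10 and 4.11 validation described in the comments above can be sketched in Go, the language Ramen is written in. This is a simplified model under stated assumptions, not Ramen's code: the `DRClusterStatus` type and the `validated410`/`validated411` functions are hypothetical, and the real reconcilers track far more state. It only shows why one unreachable s3store blocked failover in 4.10, and why it no longer does once the DRPolicy condition depends solely on DRCluster existence.

```go
package main

import "fmt"

// DRClusterStatus is a hypothetical, simplified view of a managed cluster
// as seen from the hub. It is NOT Ramen's actual API type.
type DRClusterStatus struct {
	Name        string
	Exists      bool // the DRCluster resource is present on the hub
	S3Reachable bool // the cluster's s3store answered a connectivity check
}

// validated410 models the 4.10 behavior: the DRPolicy reconciler requires
// every cluster's s3store to be reachable, so one unreachable cluster
// fails validation and blocks forward progress (including failover).
func validated410(clusters []DRClusterStatus) bool {
	for _, c := range clusters {
		if !c.S3Reachable {
			return false
		}
	}
	return true
}

// validated411 models the 4.11 behavior: DRPolicy validation only checks
// that each referenced DRCluster exists. s3 connectivity is reported by the
// DRCluster itself and is not part of the policy condition.
func validated411(clusters []DRClusterStatus) bool {
	for _, c := range clusters {
		if !c.Exists {
			return false
		}
	}
	return true
}

func main() {
	// Hypothetical scenario: the primary is DOWN. Its DRCluster resource
	// still exists on the hub, but its s3store cannot be reached.
	clusters := []DRClusterStatus{
		{Name: "primary", Exists: true, S3Reachable: false},
		{Name: "secondary", Exists: true, S3Reachable: true},
	}
	fmt.Println("4.10 policy validated:", validated410(clusters))
	fmt.Println("4.11 policy validated:", validated411(clusters))
}
```

Running it prints `false` for the 4.10 model and `true` for the 4.11 model, matching the reported symptom (no failover while the primary is down) and the fix (s3 connectivity loss is no longer a blocking condition).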