Shyam, should this be a blocker for 4.10?
(In reply to Mudit Agarwal from comment #3)
> Shyam, should this be a blocker for 4.10?

Yes, fix is WIP and should land this week. Changing assignee as well to Jolly.
Please backport this to release-4.10
Please provide must-gather logs.
1. Multiple primaries in the log are seen because the failover cleanup was never completed. If you look at the DRPC status condition you will see the following:

```
lastTransitionTime: "2022-04-10T12:24:55Z"
message: Started failover to cluster "amagrawa-c2-8ap"
observedGeneration: 4
reason: NotStarted
status: "False"
type: PeerReady
```

2. Attempting a relocation at that point will not work and actually, it messes things up. The action should be put back to `Failover` until the condition above is set to true.

3. The PVCs stuck in terminating state are separate from the issue in (1) and need to be looked at separately which I'll do next.
The PVCs are stuck in a terminating state because the request to set the VRG on C1 to secondary was never issued, which is a consequence of the issue in (1) above. In other words, that's normal behavior. To recover, change the Action back to Failover and then wait for the DRPC PeerReady condition status to change to True. You might want to open a low priority BZ against DRPC logging at verbose level, which makes it difficult to diagnose issues when we have 100s of PVCs.
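For anyone scripting around this, the check described above can be sketched as follows. This is only an illustration, assuming the DRPC status conditions have already been fetched and parsed into a list of dicts with the `type` and `status` fields shown in the condition quoted earlier in this thread. The function names `peer_ready` and `next_action` are hypothetical and are not part of the DR operator.

```python
from typing import Iterable, Mapping


def peer_ready(conditions: Iterable[Mapping[str, str]]) -> bool:
    """Return True only if a PeerReady condition exists with status "True".

    `conditions` is assumed to be the parsed DRPC status conditions list;
    a missing PeerReady condition is treated as not ready.
    """
    for cond in conditions:
        if cond.get("type") == "PeerReady":
            return str(cond.get("status", "")).lower() == "true"
    return False


def next_action(conditions: Iterable[Mapping[str, str]]) -> str:
    """Hypothetical helper: keep the action at Failover until PeerReady is True."""
    return "Relocate" if peer_ready(conditions) else "Failover"


# Sample data copied from the condition reported in this BZ.
stuck = [
    {
        "lastTransitionTime": "2022-04-10T12:24:55Z",
        "message": 'Started failover to cluster "amagrawa-c2-8ap"',
        "observedGeneration": 4,
        "reason": "NotStarted",
        "status": "False",
        "type": "PeerReady",
    }
]

print(next_action(stuck))  # Failover
```

With the condition from this BZ the helper returns `Failover`, i.e. relocate should not be attempted yet.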
Moving DR BZs out of 4.10
What is the plan for this BZ in 4.11? It has had no update for 20 days.
Not a TP blocker, we have a workaround. Moving it out of 4.11, please revert if my understanding is wrong.
Users need to wait for the failover to finish, which includes the cleanup completing. The DRPC PeerReady condition status must be True before proceeding with relocate. This is the normal behavior now. No code fix can be done for this.