Description of problem:
[OCPonRHV] Cluster should be recovered after power outage

Two days ago I successfully installed a 4.7 cluster (4.7.0-0.nightly-2021-01-12-150634). The cluster was alive for 24 hours; everything was ready and nothing was degraded.
Yesterday we had a major power outage (AC dead). The physical host that was running the engine VM and the master and worker VMs went down. After the host recovered (some hours later), I started the master and worker VMs manually in RHV.
One cluster operator has been degraded since then:

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-12-150634   True        False         41h     Error while reconciling 4.7.0-0.nightly-2021-01-12-150634: the cluster operator image-registry is degraded

image-registry   4.7.0-0.nightly-2021-01-12-150634   True   False   True   18h

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-12-150634

How reproducible:
1/1

Steps to Reproduce:
1. Install a 4.7 cluster on RHV 4.4.4.
2. The cluster installs successfully and is alive.
3. An unexpected power outage kills the bare-metal host on which the HE VM and the master and worker VMs are running.
4. Recover the bare-metal host, then recover the HE VM and the engine. Start the master and worker VMs manually.

Actual results:
All master and worker VMs are running as expected and all got IPs.
One cluster operator did not recover properly: image-registry remained degraded after the power outage recovery.

version   4.7.0-0.nightly-2021-01-12-150634   True   False   41h   Error while reconciling 4.7.0-0.nightly-2021-01-12-150634: the cluster operator image-registry is degraded

Expected results:
The cluster should recover after a power outage and be operational. No cluster operator should be degraded.

Additional info:
Janos from the DEV team has acknowledged this bug and has collected the logs.
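To see at a glance which operators are degraded after a recovery like the one described above, the table output of `oc get clusteroperators` can be filtered. The following is only a minimal sketch: the function name `degraded_operators` is hypothetical, and it assumes the default table output with a `DEGRADED` header column and no empty cells to the left of it.

```shell
#!/bin/sh
# Print the names of cluster operators whose DEGRADED column is "True".
# Reads the default table output of `oc get clusteroperators` on stdin.
# Assumption: the header row contains a DEGRADED column and no cell to
# its left is empty (an empty cell would shift the whitespace-split fields).
degraded_operators() {
    awk '
        NR == 1 {
            # Locate the DEGRADED column from the header row.
            for (i = 1; i <= NF; i++) if ($i == "DEGRADED") col = i
            next
        }
        col && $col == "True" { print $1 }
    '
}
```

On a live cluster this would be used as `oc get clusteroperators | degraded_operators`.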
Can you provide us with logs? As the description doesn't contain messages from the registry operator, I'd suggest to use must-gather to collect all necessary information.
(In reply to Oleg Bulatov from comment #1)
> Can you provide us with logs? As the description doesn't contain messages
> from the registry operator, I'd suggest to use must-gather to collect all
> necessary information.

Hi Oleg,
Janos from our development team has collected all the relevant info. He will add his findings a bit later.
Also, I'm not sure that I opened this against the right component.
Created attachment 1749344
must-gather
This must-gather archive is almost empty: it contains neither cluster-scoped resources nor the openshift-image-registry namespace. Janos, do you have anything else?
@obulatov no, but the cluster is still up. I can run whatever you need me to run or @Michael Burman can give you access if needed.
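For reference, the data reported missing from the archive (cluster-scoped resources and the openshift-image-registry namespace) could be collected again on the running cluster roughly as sketched below. This is an illustration, not a procedure given in this thread: `DEST`, `DRY_RUN`, `run`, and `collect_registry_info` are hypothetical names introduced for the example, and the operator deployment name `cluster-image-registry-operator` is assumed to be the default one.

```shell
#!/bin/sh
# Sketch: gather image-registry diagnostics from a running cluster.
# DEST and DRY_RUN are knobs of this example only, not oc options.
DEST="${DEST:-./registry-debug}"

run() {
    # With DRY_RUN=1, print the command instead of executing it.
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

collect_registry_info() {
    # Default must-gather run, written to a local directory.
    run oc adm must-gather --dest-dir="$DEST/must-gather"
    # Targeted dump of the image-registry cluster operator and related objects.
    run oc adm inspect clusteroperator/image-registry --dest-dir="$DEST/inspect"
    # Registry operator log (assumed default deployment name), written to stdout.
    run oc -n openshift-image-registry logs deployment/cluster-image-registry-operator
}
```

Running `DRY_RUN=1 collect_registry_info` only prints the three commands, which makes it easy to review them before executing anything against the cluster.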