Bug 1918272

Summary: [OCPonRHV] Cluster should be recovered after power outage
Product: OpenShift Container Platform Reporter: Michael Burman <mburman>
Component: Image Registry Assignee: Oleg Bulatov <obulatov>
Status: CLOSED DUPLICATE QA Contact: Wenjing Zheng <wzheng>
Severity: high Docs Contact:
Priority: unspecified    
Version: 4.7 CC: aos-bugs, jpasztor, obulatov
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-25 13:46:28 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Attachments:
Description Flags
must-gather none

Description Michael Burman 2021-01-20 11:20:00 UTC
Description of problem:
[OCPonRHV] Cluster should be recovered after power outage

Two days ago I successfully installed a 4.7 cluster:
4.7.0-0.nightly-2021-01-12-150634
The cluster ran for 24 hours; everything was ready and nothing was degraded.

Yesterday we had a major power outage (AC failure). The physical host running the engine VM and the master and worker VMs went down.
After the host recovered (some hours later), I started the master and worker VMs manually in RHV.
One cluster operator has been degraded since then:

oc get clusterversion
NAME      VERSION                             AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.7.0-0.nightly-2021-01-12-150634   True        False         41h     Error while reconciling 4.7.0-0.nightly-2021-01-12-150634: the cluster operator image-registry is degraded

image-registry                             4.7.0-0.nightly-2021-01-12-150634   True        False         True       18h
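Degraded operators can be picked out of this kind of table output by filtering on the DEGRADED column. The sketch below is a minimal example and not part of the original report: the function name `degraded_operators` is hypothetical, and it assumes the column layout NAME, VERSION, AVAILABLE, PROGRESSING, DEGRADED, SINCE, which matches the image-registry line shown.

```shell
# Print the names of cluster operators whose DEGRADED column is "True".
# Reads `oc get clusteroperators` table output on stdin.
# Assumption: DEGRADED is the fifth whitespace-separated column.
degraded_operators() {
    awk '$5 == "True" { print $1 }'
}

# Usage against a live cluster (requires a logged-in oc client):
#   oc get clusteroperators | degraded_operators
```

The header row is skipped implicitly, because its fifth field is the literal string DEGRADED rather than True.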

Version-Release number of selected component (if applicable):
4.7.0-0.nightly-2021-01-12-150634

How reproducible:
1/1 

Steps to Reproduce:
1. Install a 4.7 cluster on RHV 4.4.4.
2. Confirm the cluster installed successfully and is running.
3. An unexpected power outage kills the bare-metal host on which the HE VM and the master and worker VMs are running.
4. Recover the bare-metal host, then the HE VM and engine. Start the master and worker VMs manually.

Actual results:
All master and worker VMs are running as expected, and all got IPs.
One cluster operator did not recover properly: image-registry remained degraded after the power outage recovery.

version   4.7.0-0.nightly-2021-01-12-150634   True        False         41h     Error while reconciling 4.7.0-0.nightly-2021-01-12-150634: the cluster operator image-registry is degraded


Expected results:
The cluster should recover after a power outage and be operational. No cluster operator should be degraded.

Additional info:
Janos from the DEV team has acknowledged this bug and has collected the logs.

Comment 1 Oleg Bulatov 2021-01-20 13:37:33 UTC
Can you provide us with logs? As the description doesn't contain messages from the registry operator, I'd suggest to use must-gather to collect all necessary information.
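For reference, a must-gather collection might look like the sketch below. It is a minimal example under stated assumptions: the wrapper name `collect_must_gather` and the default destination directory are hypothetical, while `oc adm must-gather --dest-dir` is the standard client command. It needs a logged-in `oc` with cluster-admin rights.

```shell
# Collect a must-gather archive into a chosen directory.
# $1: destination directory (hypothetical default: ./must-gather)
collect_must_gather() {
    dest="${1:-./must-gather}"
    oc adm must-gather --dest-dir="$dest"
}

# Usage:
#   collect_must_gather ./must-gather-bz1918272
#   tar czf must-gather.tar.gz ./must-gather-bz1918272
```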

Comment 2 Michael Burman 2021-01-20 15:20:23 UTC
(In reply to Oleg Bulatov from comment #1)
> Can you provide us with logs? As the description doesn't contain messages
> from the registry operator, I'd suggest to use must-gather to collect all
> necessary information.

Hi Oleg,

Janos from our development team has collected all the relevant info and will add his findings a bit later.
Also, I'm not sure I opened this against the right component.

Comment 3 Janos Bonic 2021-01-21 10:36:19 UTC
Created attachment 1749344 [details]
must-gather

Comment 4 Oleg Bulatov 2021-01-22 15:17:27 UTC
This must-gather archive is almost empty: it has neither cluster-scoped resources nor the openshift-image-registry namespace.

Janos, do you have something else?
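Before attaching an archive, the two pieces mentioned here can be checked for locally. The sketch below is a minimal example, not an official tool: the function name `check_must_gather` is hypothetical, and it assumes the usual unpacked layout in which `cluster-scoped-resources` and `namespaces/<namespace>` directories sit somewhere under the destination directory.

```shell
# Report whether an unpacked must-gather directory contains the
# cluster-scoped resources and the openshift-image-registry namespace.
# $1: path to the unpacked must-gather directory
# Returns 0 if both are present, 1 otherwise.
check_must_gather() {
    dir="$1"
    status=0
    for want in cluster-scoped-resources namespaces/openshift-image-registry; do
        # Look for a directory whose path ends in the wanted suffix.
        hit=$(find "$dir" -type d -path "*/$want" 2>/dev/null | head -n 1)
        if [ -n "$hit" ]; then
            echo "found: $want"
        else
            echo "missing: $want"
            status=1
        fi
    done
    return "$status"
}

# Usage:
#   check_must_gather ./must-gather || echo "archive looks incomplete"
```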

Comment 5 Janos Bonic 2021-01-22 21:00:07 UTC
@obulatov no, but the cluster is still up. I can run whatever you need me to run, or @Michael Burman can give you access if needed.
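Since the cluster is still up, the registry operator's state could also be pulled directly. The sketch below is only a suggestion, not something requested in this thread: the function name `registry_operator_diag` and the `--tail=200` limit are hypothetical choices, and it assumes the operator runs as the `cluster-image-registry-operator` deployment in the `openshift-image-registry` namespace.

```shell
# Print the image-registry operator's status, its pods, and recent
# operator logs. Requires a logged-in oc client.
registry_operator_diag() {
    ns=openshift-image-registry
    # Conditions and messages reported by the cluster operator.
    oc describe clusteroperator image-registry
    # Pods in the registry namespace and the nodes they run on.
    oc -n "$ns" get pods -o wide
    # Recent log lines from the operator itself.
    oc -n "$ns" logs deployment/cluster-image-registry-operator --tail=200
}

# Usage:
#   registry_operator_diag > image-registry-diag.txt 2>&1
```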