Bug 1910070

Summary: KubeVirt VMs used for masters are created with a termination grace period that is too short, which leads to FS corruption
Product: OpenShift Container Platform Reporter: Chen Yosef <cyosef>
Component: Cloud Compute Assignee: Nir Argaman <nargaman>
Cloud Compute sub component: KubeVirt Provider QA Contact: Chen Yosef <cyosef>
Status: CLOSED CURRENTRELEASE Docs Contact:
Severity: urgent    
Priority: urgent CC: aos-bugs
Version: 4.7   
Target Milestone: ---   
Target Release: 4.7.0   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-01-10 08:14:54 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Chen Yosef 2020-12-22 14:37:43 UTC
Description of problem:
The installer creates the master VMs without specifying a termination grace period, so the default of 30 seconds applies. When a VM is restarted, its graceful shutdown takes longer than that, so after 30 seconds the VM is killed in the middle of the shutdown process. In some cases (timing-related) this leads to XFS metadata corruption that can only be recovered manually.
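
To illustrate the missing setting, here is a minimal sketch (not the actual installer or provider code) that adds an explicit grace period to a VirtualMachine manifest held as a plain dictionary. The field path spec.template.spec.terminationGracePeriodSeconds is the KubeVirt VMI spec field; the helper name, the example VM name, and the 600-second value are assumptions made for illustration, since this report does not state which value the fix uses.

```python
import copy

# Assumed value for illustration only; the report does not say which
# grace period the fix actually sets.
ASSUMED_GRACE_PERIOD_SECONDS = 600


def with_grace_period(vm_manifest, seconds=ASSUMED_GRACE_PERIOD_SECONDS):
    """Return a copy of a VirtualMachine manifest with an explicit grace period.

    An existing terminationGracePeriodSeconds value is left untouched.
    """
    manifest = copy.deepcopy(vm_manifest)
    vmi_spec = (
        manifest.setdefault("spec", {})
        .setdefault("template", {})
        .setdefault("spec", {})
    )
    vmi_spec.setdefault("terminationGracePeriodSeconds", seconds)
    return manifest


# Hypothetical master VM manifest with no grace period specified.
master_vm = {
    "kind": "VirtualMachine",
    "metadata": {"name": "example-master-0"},
    "spec": {"template": {"spec": {"domain": {}}}},
}

patched = with_grace_period(master_vm)
print(patched["spec"]["template"]["spec"]["terminationGracePeriodSeconds"])
```

The helper copies the manifest instead of mutating it, so the original (without the field) stays available for comparison.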

Version-Release number of selected component (if applicable):


How reproducible:


Steps to Reproduce:
1. Install a tenant cluster
2. Restart one of the VMs (virtctl restart <vmi>) 
3. Repeat step 2 until the VM no longer boots.
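
Steps 2 and 3 could be automated roughly as follows. This is a sketch under assumptions: the function names and the max_attempts limit are hypothetical, and the report does not say how a failed boot is detected, so the check is left to a caller-supplied function that should wait for the VM to come up (or time out) before answering.

```python
import subprocess


def virtctl_restart(vm_name):
    """Step 2: restart the VM through virtctl."""
    subprocess.run(["virtctl", "restart", vm_name], check=True)


def restart_until_boot_failure(vm_name, booted, restart=virtctl_restart,
                               max_attempts=50):
    """Step 3: repeat the restart until the VM fails to boot.

    `booted` is a caller-supplied function that takes the VM name and
    returns True if the VM came back up after the restart.
    Returns the number of the restart after which the VM did not boot,
    or None if every attempt booted successfully.
    """
    for attempt in range(1, max_attempts + 1):
        restart(vm_name)
        if not booted(vm_name):
            return attempt
    return None
```

Because the corruption is timing-related, the number of restarts needed can vary between runs; max_attempts only bounds the loop.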

Actual results:


Expected results:


Additional info: