Bug 1925597

Summary: kata-operator-daemon-uninstall pods did not exit after deleting kataconfig instance
Product: OpenShift Container Platform Reporter: Cameron Meadors <cmeadors>
Component: sandboxed-containers Assignee: Ariel Adam <aadam>
Status: CLOSED NOTABUG QA Contact: Cameron Meadors <cmeadors>
Severity: medium Docs Contact:
Priority: unspecified    
Version: 4.7 CC: aos-bugs, fidencio, prbanerj
Target Milestone: ---   
Target Release: ---   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2021-09-16 20:10:38 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:

Description Cameron Meadors 2021-02-05 16:16:15 UTC
Description of problem:

When I deleted the kataconfig instance, the uninstall pods never exited, leaving the system in an ambiguous state.

Version-Release number of selected component (if applicable):

Installed kata-operator from the master branch of openshift/kata-operator.

How reproducible:

I cannot reproduce it.

Steps to Reproduce:
1. Have a kataconfig instance created
2. oc delete kataconfig <instance>

Actual results:

The uninstall pods never exited. The Kata runtime did appear to have been uninstalled from the nodes.

Expected results:

The uninstall pods should exit after a short time (about 8 minutes) and the runtime should be removed from the nodes.

Additional info:

I tried to get things to retrigger and clean up by creating another kataconfig instance. The install pods then worked correctly and exited, but the uninstall pods were still around. This also resulted in the kataconfig instance reporting that the runtime was both installed and being uninstalled on all the nodes. From 'oc describe kataconfig example-kataconfig':

  Installation Status:
    Completed:
      Completed Nodes Count:  3
      Completed Nodes List:
        cmead-kata47-7-wcvj7-worker-a-9h7j9.c.openshift-qe.internal
        cmead-kata47-7-wcvj7-worker-b-x9grc.c.openshift-qe.internal
        cmead-kata47-7-wcvj7-worker-c-mqmwb.c.openshift-qe.internal
    Failed:
    In Progress:
  Kata Image:         
  Runtime Class:      kata
  Total Nodes Count:  3
  Un Installation Status:
    Completed:
    Failed:
    In Progress:
      Binaries Uninstall Nodes List:
        cmead-kata47-7-wcvj7-worker-a-9h7j9.c.openshift-qe.internal
        cmead-kata47-7-wcvj7-worker-b-x9grc.c.openshift-qe.internal
        cmead-kata47-7-wcvj7-worker-c-mqmwb.c.openshift-qe.internal
      In Progress Nodes Count:  3
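
A minimal sketch of how the contradictory state above could be flagged automatically, assuming the status is available as the plain text printed by 'oc describe' and that the field names match the output shown here. The function names are hypothetical and not part of kata-operator.

```python
# Hypothetical helper, not part of kata-operator: it scans `oc describe`
# style text and reports nodes listed as installed AND as being uninstalled.
# Assumes the section and list names match the output in this report.

SECTIONS = {"Installation Status:", "Un Installation Status:"}


def node_lists(describe_text):
    """Return {(section, list name): [node, ...]} for every '... List:' block."""
    lists = {}
    section = None
    current, header_indent = None, 0
    for line in describe_text.splitlines():
        stripped = line.strip()
        if not stripped:
            continue
        indent = len(line) - len(line.lstrip())
        if stripped in SECTIONS:
            # Start of the install or uninstall status block.
            section, current = stripped[:-1], None
        elif stripped.endswith("List:"):
            # Header of a node list; entries follow on deeper-indented lines.
            current, header_indent = (section, stripped[:-1]), indent
            lists[current] = []
        elif current is not None and indent > header_indent and ":" not in stripped:
            lists[current].append(stripped)
        else:
            # Any other "key: value" line ends the current list.
            current = None
    return lists


def conflicting_nodes(describe_text):
    """Nodes that appear as install-completed and uninstall-in-progress."""
    lists = node_lists(describe_text)
    installed = set(lists.get(("Installation Status", "Completed Nodes List"), []))
    uninstalling = set(
        lists.get(("Un Installation Status", "Binaries Uninstall Nodes List"), [])
    )
    return sorted(installed & uninstalling)
```

Passing the output above to conflicting_nodes() should return all three worker nodes, whereas a consistent status would give an empty list.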

After a few more cycles of installing and uninstalling, the uninstall pods finally exited and the nodes appeared to be in the correct state. The uninstall pods had reported an error, but unfortunately I did not capture it.

Comment 3 Pradipta Banerjee 2021-09-16 09:36:44 UTC
Hi Cameron, can this bug be closed? It's no longer relevant with the latest operator code, which uses the RHCOS extension.