Bug 2030029

Summary: [4.10][goroutine]Namespace stuck terminating: Failed to delete all resource types, 1 remaining: unexpected items still remain in namespace
Product: OpenShift Container Platform
Reporter: Christoffer Back <cback>
Component: Node
Assignee: Peter Hunt <pehunt>
Sub Component: CRI-O
QA Contact: Sunil Choudhary <schoudha>
Status: CLOSED ERRATA
Docs Contact:
Severity: high
Priority: high
CC: akrzos, aos-bugs, bzhai, djuran, fminafra, harpatil, igreen, joboyer, kkarampo, mavazque, minmli, mmethot, nagrawal, nchhabra, oarribas, openshift-bugs-escalate, pehunt, schoudha
Version: 4.8
Keywords: Reopened
Target Milestone: ---
Target Release: 4.10.0
Hardware: Unspecified
OS: Unspecified
Whiteboard:
Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Story Points: ---
Clone Of: 2021431
: 2040711 2040712 (view as bug list)
Environment:
Last Closed: 2022-03-10 16:32:46 UTC
Type: ---
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:
Bug Depends On: 2003206
Bug Blocks: 2021431, 2021432, 2040711, 2040712

Comment 1 Peter Hunt 2021-12-07 22:10:41 UTC
I've asked for a new bug in https://bugzilla.redhat.com/show_bug.cgi?id=2021431#c8 to investigate the new set of deadlocks on pod stop that we're seeing.

I have created a scratch build of cri-o, and I'm interested in seeing whether the problem still triggers with it:

http://brew-task-repos.usersys.redhat.com/repos/scratch/pehunt/cri-o/1.21.4/5.rhaos4.8.git84fa55d.el8/

Comment 2 Peter Hunt 2021-12-07 22:12:35 UTC
Oh, and if the issue reproduces even with the scratch build, I'll want the info from https://bugzilla.redhat.com/show_bug.cgi?id=2021431#c6 provided again.

Comment 7 Sascha Grunert 2021-12-09 10:58:36 UTC
Chris, the must-gather tells me that they run CRI-O 1.21.4-5.rhaos4.8.git84fa55d.el8, which is not the one Peter provided (1.21.4-6.rhaos4.8.gitc845cf4.el8). Can you ask them to override the rpm via rpm-ostree?
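
For reference, overriding the cri-o RPM on an RHCOS node via rpm-ostree would look roughly like the sketch below. The RPM URL and filename are illustrative only; the actual file should come from the Brew scratch repo for the build Peter provided.

```shell
# From a debug shell on the node (oc debug node/<node>, then chroot /host).
# Fetch the scratch-build RPM, then replace the installed cri-o package.
# The URL/filename below are illustrative; use the actual RPM from the Brew repo.
curl -LO http://brew-task-repos.usersys.redhat.com/repos/scratch/pehunt/cri-o/1.21.4/6.rhaos4.8.gitc845cf4.el8/x86_64/cri-o-1.21.4-6.rhaos4.8.gitc845cf4.el8.x86_64.rpm
rpm-ostree override replace ./cri-o-1.21.4-6.rhaos4.8.gitc845cf4.el8.x86_64.rpm

# rpm-ostree changes take effect on the next boot.
systemctl reboot
```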

Comment 11 Sascha Grunert 2021-12-10 09:45:37 UTC
Hey Chris, I uploaded two modified test binaries for 4.8 and 4.9. Please request another test from the customer. Then I will sync with Peter on Monday about how to approach this issue.

Comment 12 Christoffer Back 2021-12-10 10:52:05 UTC
(In reply to Sascha Grunert from comment #11)
> Hey Chris, I uploaded two modified test binaries for 4.8 and 4.9. Please
> request another test from the customer. Then I will sync with Peter on
> Monday about how to approach this issue.


Hi Sascha, the binaries and instructions have been delivered to the customer. I will link a set of logs once their testing is complete.

Instructions delivered:
########################################################

Create a node debug container:

```
oc debug node/ci-ln-2myl9xb-f76d1-ck27t-master-0
```

Copy the tarball to the container:

```
kubectl cp crio.tar.gz ci-ln-2myl9xb-f76d1-ck27t-master-0-debug:/tmp/crio.tar.gz
```

In the container, move the tarball to the destination and verify that the executable works:

```
mv /tmp/crio.tar.gz /host/tmp
chroot /host
tar xf /tmp/crio.tar.gz -C /usr/local/bin/
/usr/local/bin/crio version
```

```
Version:       1.21.3
GitCommit:     51409e1b2dc9ccfbb7d7f4fd543a094097627ae2
GitTreeState:  dirty
BuildDate:     1980-01-01T00:00:00Z
GoVersion:     go1.15.7
Compiler:      gc
Platform:      linux/amd64
Linkmode:      static
```

Edit the crio unit file:

```
systemctl edit crio
```

Add the following override:

```
[Service]
ExecStart=
ExecStart=-/usr/local/bin/crio
```

Restart crio:

```
systemctl daemon-reload
systemctl restart crio
```
############################################################
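
After restarting, it may be worth confirming that the override actually took effect. The checks below are a suggested verification, not part of the original instructions:

```shell
# Confirm systemd now launches the test binary from /usr/local/bin
systemctl show -p ExecStart crio

# Confirm the running binary reports the expected test version/commit
/usr/local/bin/crio version
```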


Br, 
Chris

Comment 50 Peter Hunt 2022-01-17 17:59:43 UTC
*** Bug 2014083 has been marked as a duplicate of this bug. ***

Comment 51 Sunil Choudhary 2022-01-18 11:27:13 UTC
Tested on 4.10.0-0.nightly-2022-01-17-223655

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-17-223655   True        False         136m    Cluster version is 4.10.0-0.nightly-2022-01-17-223655

$ oc get nodes -o wide
NAME                                                STATUS   ROLES           AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
master-00.sunilc410bm.qe.devcluster.openshift.com   Ready    master,worker   156m   v1.23.0+60f5a1c   147.75.80.115   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201171746-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-102.rhaos4.10.git9c23ef3.el8

Comment 52 Peter Hunt 2022-01-26 19:55:44 UTC
*** Bug 2040485 has been marked as a duplicate of this bug. ***

Comment 53 Peter Hunt 2022-02-01 14:52:24 UTC
*** Bug 2015412 has been marked as a duplicate of this bug. ***

Comment 56 errata-xmlrpc 2022-03-10 16:32:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056