Bug 2030029 - [4.10][goroutine]Namespace stuck terminating: Failed to delete all resource types, 1 remaining: unexpected items still remain in namespace
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: OpenShift Container Platform
Classification: Red Hat
Component: Node
Version: 4.8
Hardware: Unspecified
OS: Unspecified
Priority: high
Severity: high
Target Milestone: ---
Target Release: 4.10.0
Assignee: Peter Hunt
QA Contact: Sunil Choudhary
URL:
Whiteboard:
Duplicates: 2014083 2015412 2040485
Depends On: 2003206
Blocks: 2021431 2021432 2040711 2040712
Reported: 2021-12-07 20:07 UTC by Christoffer Back
Modified: 2023-09-15 01:16 UTC
CC List: 18 users

Fixed In Version:
Doc Type: No Doc Update
Doc Text:
Clone Of: 2021431
Clones: 2040711 2040712
Environment:
Last Closed: 2022-03-10 16:32:46 UTC
Target Upstream Version:
Embargoed:



Links
| System | ID | Private | Priority | Status | Summary | Last Updated |
|---|---|---|---|---|---|---|
| GitHub cri-o/cri-o | pull 5514 | 0 | None | Merged | prepend commands with taskset if InfraCtrCPUSet is configured | 2022-01-17 18:01:35 UTC |
| GitHub cri-o/cri-o | pull 5535 | 0 | None | Merged | Use timeout for conmon cgroup move | 2022-01-11 15:05:08 UTC |
| Red Hat Product Errata | RHSA-2022:0056 | 0 | None | None | None | 2022-03-10 16:33:11 UTC |

Comment 1 Peter Hunt 2021-12-07 22:10:41 UTC
I've asked for a new bug in https://bugzilla.redhat.com/show_bug.cgi?id=2021431#c8 to investigate the new set of deadlocks we're seeing on pod stop.

I have created a scratch build of cri-o and would like to see whether the problem still occurs with it:

http://brew-task-repos.usersys.redhat.com/repos/scratch/pehunt/cri-o/1.21.4/5.rhaos4.8.git84fa55d.el8/

Comment 2 Peter Hunt 2021-12-07 22:12:35 UTC
Also, if the issue reproduces even with the scratch build, I'll need the information requested in https://bugzilla.redhat.com/show_bug.cgi?id=2021431#c6 again.

Comment 7 Sascha Grunert 2021-12-09 10:58:36 UTC
Chris, the must-gather shows that the customer is running CRI-O 1.21.4-5.rhaos4.8.git84fa55d.el8, which is not the build Peter provided (1.21.4-6.rhaos4.8.gitc845cf4.el8). Can you ask them to override the RPM via rpm-ostree?
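The check and override suggested here can be sketched as two shell helpers. This is a minimal sketch, assuming that `rpm -q cri-o` on the node reports the installed build and that the provided RPM has already been copied to the node; the function names are hypothetical, and the node still has to be rebooted after `rpm-ostree override replace` before the new build actually runs.

```shell
#!/bin/sh
# Sketch only: these helper names are hypothetical, not OpenShift tooling.

# crio_build_matches INSTALLED EXPECTED
# INSTALLED: output of `rpm -q cri-o` on the node,
#   e.g. cri-o-1.21.4-5.rhaos4.8.git84fa55d.el8.x86_64
# EXPECTED: version-release of the build that was handed out,
#   e.g. 1.21.4-6.rhaos4.8.gitc845cf4.el8
crio_build_matches() {
    case "$1" in
        *"-$2."*) return 0 ;;   # EXPECTED preceded by "-" and followed by "."
        *) return 1 ;;
    esac
}

# override_crio_rpm RPM_PATH
# Stages a new rpm-ostree deployment in which the base-image cri-o package
# is replaced by the local RPM. Reboot the node (ideally after draining it)
# so that it boots into the new deployment.
override_crio_rpm() {
    rpm-ostree override replace "$1"
}
```

For example, `crio_build_matches "$(rpm -q cri-o)" 1.21.4-6.rhaos4.8.gitc845cf4.el8 || override_crio_rpm /path/to/cri-o.rpm`, where the RPM path is a placeholder.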

Comment 11 Sascha Grunert 2021-12-10 09:45:37 UTC
Hey Chris, I uploaded two modified test binaries for 4.8 and 4.9. Please request another test from the customer. I will then sync with Peter on Monday about how to approach this issue.

Comment 12 Christoffer Back 2021-12-10 10:52:05 UTC
(In reply to Sascha Grunert from comment #11)
> Hey Chris, I uploaded two modified test binaries for 4.8 and 4.9. Please
> request another test from the customer. Then I will sync with Peter on
> monday how to approach this issue.


Hi Sascha, the binaries and instructions have been delivered to the customer. I will link a set of logs once their testing is complete.

Instructions delivered:
########################################################

Create a node debug container:

```
oc debug node/ci-ln-2myl9xb-f76d1-ck27t-master-0
```

Copy the tarball to the container:

```
kubectl cp crio.tar.gz ci-ln-2myl9xb-f76d1-ck27t-master-0-debug:/tmp/crio.tar.gz
```

In the container, move the tarball to the destination and verify that the executable works:

```
mv /tmp/crio.tar.gz /host/tmp
chroot /host
tar xf /tmp/crio.tar.gz -C /usr/local/bin/
/usr/local/bin/crio version
```

```
Version:       1.21.3
GitCommit:     51409e1b2dc9ccfbb7d7f4fd543a094097627ae2
GitTreeState:  dirty
BuildDate:     1980-01-01T00:00:00Z
GoVersion:     go1.15.7
Compiler:      gc
Platform:      linux/amd64
Linkmode:      static
```
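Because the test binary identifies itself by `GitCommit` (and reports `GitTreeState: dirty`), a small filter can pull a single field out of the `crio version` output so it can be compared with the commit of the build that was handed over. A sketch; `crio_field` is a hypothetical helper:

```shell
#!/bin/sh
# crio_field NAME
# Reads `crio version` output on stdin and prints the value of field NAME,
# i.e. the first whitespace-separated token after "NAME:".
crio_field() {
    awk -v key="$1:" '$1 == key { print $2; exit }'
}

# Example on the node (after `chroot /host`):
#   /usr/local/bin/crio version | crio_field GitCommit
```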

Create a drop-in override for the crio unit:

```
systemctl edit crio
```

Add the following override:

```
[Service]
ExecStart=
ExecStart=-/usr/local/bin/crio
```
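`systemctl edit crio` stores these lines as a drop-in, normally `/etc/systemd/system/crio.service.d/override.conf`. The empty `ExecStart=` is needed because a non-oneshot service may have only one start command; assigning the empty string resets the packaged command before the new one is set. A non-interactive sketch that writes the same drop-in (the function name is hypothetical; pass `/` as the root on the node):

```shell
#!/bin/sh
# write_crio_override ROOT
# Non-interactive equivalent of `systemctl edit crio` with the override above.
# ROOT is the filesystem root to write under ("/" on the node).
write_crio_override() {
    dir="$1/etc/systemd/system/crio.service.d"
    mkdir -p "$dir" || return 1
    cat > "$dir/override.conf" <<'EOF'
[Service]
# The empty assignment resets the ExecStart= list from the packaged unit.
ExecStart=
# "-" prefix: a failing exit status is recorded but treated as success.
ExecStart=-/usr/local/bin/crio
EOF
}
# Follow with `systemctl daemon-reload` so systemd picks up the drop-in.
```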

Restart crio:

```
systemctl daemon-reload
systemctl restart crio
```
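After the restart, one way to confirm that systemd is running the replacement rather than the packaged binary is to resolve the executable of the unit's main process. A sketch assuming a Linux `/proc` filesystem and GNU `readlink -f`; `crio_running_from` is a hypothetical helper:

```shell
#!/bin/sh
# crio_running_from EXPECTED_PATH [PID]
# Succeeds if process PID (default: the MainPID of crio.service) was started
# from EXPECTED_PATH. Both sides are canonicalized because /proc/PID/exe
# reports the resolved path (on RHCOS, /usr/local may itself be a symlink).
crio_running_from() {
    pid=${2:-$(systemctl show -p MainPID --value crio)}
    [ -n "$pid" ] && [ "$pid" != 0 ] || return 1
    [ "$(readlink "/proc/$pid/exe")" = "$(readlink -f "$1")" ]
}

# On the node: crio_running_from /usr/local/bin/crio && echo "test build active"
```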
############################################################


Br, 
Chris

Comment 50 Peter Hunt 2022-01-17 17:59:43 UTC
*** Bug 2014083 has been marked as a duplicate of this bug. ***

Comment 51 Sunil Choudhary 2022-01-18 11:27:13 UTC
Tested on 4.10.0-0.nightly-2022-01-17-223655

$ oc get clusterversion
NAME      VERSION                              AVAILABLE   PROGRESSING   SINCE   STATUS
version   4.10.0-0.nightly-2022-01-17-223655   True        False         136m    Cluster version is 4.10.0-0.nightly-2022-01-17-223655

$ oc get nodes -o wide
NAME                                                STATUS   ROLES           AGE    VERSION           INTERNAL-IP     EXTERNAL-IP   OS-IMAGE                                                        KERNEL-VERSION                 CONTAINER-RUNTIME
master-00.sunilc410bm.qe.devcluster.openshift.com   Ready    master,worker   156m   v1.23.0+60f5a1c   147.75.80.115   <none>        Red Hat Enterprise Linux CoreOS 410.84.202201171746-0 (Ootpa)   4.18.0-305.30.1.el8_4.x86_64   cri-o://1.23.0-102.rhaos4.10.git9c23ef3.el8
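To check the runtime version on every node rather than reading the wide table by eye, the last column of `oc get nodes -o wide` can be extracted. A sketch; `node_runtimes` is a hypothetical helper, and the jsonpath query in the comment is an equivalent that does not depend on column layout:

```shell
#!/bin/sh
# node_runtimes
# Reads `oc get nodes -o wide` output on stdin and prints "NAME RUNTIME"
# per node. CONTAINER-RUNTIME is the last column; OS-IMAGE contains spaces,
# so the fields are taken from both ends of each line.
node_runtimes() {
    awk 'NR > 1 { print $1, $NF }'
}

# Usage: oc get nodes -o wide | node_runtimes
# Column-independent alternative:
#   oc get nodes -o jsonpath='{range .items[*]}{.metadata.name}{" "}{.status.nodeInfo.containerRuntimeVersion}{"\n"}{end}'
```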

Comment 52 Peter Hunt 2022-01-26 19:55:44 UTC
*** Bug 2040485 has been marked as a duplicate of this bug. ***

Comment 53 Peter Hunt 2022-02-01 14:52:24 UTC
*** Bug 2015412 has been marked as a duplicate of this bug. ***

Comment 56 errata-xmlrpc 2022-03-10 16:32:46 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Moderate: OpenShift Container Platform 4.10.3 security update), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHSA-2022:0056


Note You need to log in before you can comment on or make changes to this bug.