*** Bug 1967807 has been marked as a duplicate of this bug. ***
Haven't had a chance to look at this yet.
Sorry, still haven't.
*** Bug 1980522 has been marked as a duplicate of this bug. ***
Many of these issues should have been mitigated by the fixes for https://bugzilla.redhat.com/show_bug.cgi?id=1952137. Can we have the pod spec associated with this AMQ image, so we can test that we don't get zombies in 4.6.36?
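Once the pod is running, one way to check for zombies is to scan the process table inside the container for entries in state `Z`. Below is a minimal sketch, not an existing tool. `list_zombies` is a hypothetical name, and the sketch assumes a Linux `/proc` plus a Python 3.9+ interpreter available in the container, for example via `oc exec`. A zombie whose ppid is 1 would point at the container's pid 1 not reaping its children.

```python
import os


def list_zombies(proc_root: str = "/proc") -> list[tuple[int, int, str]]:
    """Return (pid, ppid, comm) for every process in state 'Z' (zombie).

    Hypothetical helper; relies on the Linux-specific /proc/<pid>/stat format.
    """
    zombies = []
    for entry in os.listdir(proc_root):
        if not entry.isdigit():
            continue  # skip non-process entries such as /proc/meminfo
        try:
            with open(os.path.join(proc_root, entry, "stat")) as f:
                stat = f.read()
        except OSError:
            continue  # the process vanished between listdir() and open()
        if ")" not in stat:
            continue  # empty or truncated read; ignore it
        # comm is wrapped in parentheses and may contain spaces or ')',
        # so locate the LAST ')' before reading the fixed-position fields.
        comm = stat[stat.index("(") + 1:stat.rindex(")")]
        fields = stat[stat.rindex(")") + 2:].split()
        state, ppid = fields[0], int(fields[1])
        if state == "Z":
            zombies.append((int(entry), ppid, comm))
    return zombies


if __name__ == "__main__":
    for pid, ppid, comm in list_zombies():
        print(f"zombie pid={pid} ppid={ppid} comm={comm}")
```

Run repeatedly while the readiness probe keeps timing out, a growing list would reproduce the problem, and an empty one would indicate the fix holds.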
Hey folks, I've emerged from deep within the code to report that the fixes in 4.6.36 do improve the situation but don't fully fix it, and the remaining part is the application's fault. Here's why: liveness and readiness exec probes eventually call `runc exec` to spawn a process in the container. `runc exec` joins the namespaces of the existing container. If the container is in a private pid namespace (the default), the exec'd process ends up as a child of the container's initial process (pid 1 in that namespace). If the exec'd process is killed (for example, when the probe times out) and is not reaped, its zombie stays in the process table until either its parent reaps it (calls wait()) or that parent is killed. This is less of an issue for liveness probes, because a timed-out liveness probe eventually causes the container to be killed, which satisfies the second condition. For readiness probes, however, the container is kept alive, and so are the zombies. 4.6.36 finally fixed a bug by cutting conmon out of the middle: for a period in 4.6, conmon itself was at risk of being zombified (as shown in https://bugzilla.redhat.com/show_bug.cgi?id=1967808#c1), whereas with 4.6.36 *only* the exec'd process inside the container is zombified. There is nothing further CRI-O can do about this. It is up to the application author (AMQ in this case) to have pid 1 of the container (the initial container process) reap the exec processes, or to run in the pod pid namespace (where the pod infra container is pid 1 and does the reaping).
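The reparent-then-zombie sequence described in the previous comment can be reproduced outside a container. This sketch is an analogy under stated assumptions: it uses Linux's `PR_SET_CHILD_SUBREAPER` (called through `ctypes`, since Python's `os` module has no `prctl`) so that an ordinary process adopts orphans the way pid 1 does inside a private pid namespace. The short-lived middle process stands in for `runc exec`, and the grandchild stands in for the probe's exec process. All function names are hypothetical, and Linux plus Python 3.9+ are assumed.

```python
import ctypes
import os
import time

PR_SET_CHILD_SUBREAPER = 36  # value from <linux/prctl.h>; Linux 3.4+


def become_subreaper() -> None:
    """Adopt orphaned descendants, as pid 1 does inside its pid namespace."""
    libc = ctypes.CDLL(None, use_errno=True)
    rc = libc.prctl(ctypes.c_int(PR_SET_CHILD_SUBREAPER), ctypes.c_ulong(1),
                    ctypes.c_ulong(0), ctypes.c_ulong(0), ctypes.c_ulong(0))
    if rc != 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))


def spawn_orphan() -> int:
    """Fork a grandchild whose parent exits first; return the grandchild's pid.

    The middle process plays the role of `runc exec`; the grandchild plays
    the probe's exec process, which outlives it and is reparented to us.
    """
    r, w = os.pipe()
    middle = os.fork()
    if middle == 0:
        os.close(r)
        grandchild = os.fork()
        if grandchild == 0:
            os.close(w)
            time.sleep(0.1)  # outlive the middle process, then exit
            os._exit(0)
        os.write(w, grandchild.to_bytes(4, "little"))
        os._exit(0)  # the middle process exits, orphaning the grandchild
    os.close(w)
    grandchild = int.from_bytes(os.read(r, 4), "little")
    os.close(r)
    os.waitpid(middle, 0)  # reap the middle process normally
    return grandchild


def reap_all() -> list[int]:
    """Reap every exited child without blocking; return the reaped pids."""
    reaped = []
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            break  # no children left at all
        if pid == 0:
            break  # children remain, but none have exited yet
        reaped.append(pid)
    return reaped


if __name__ == "__main__":
    become_subreaper()
    orphan = spawn_orphan()
    time.sleep(0.5)  # the orphan has exited by now, but nobody has reaped it
    with open(f"/proc/{orphan}/stat") as f:
        print("state, ppid:", f.read().split(")")[-1].split()[:2])
    print("reaped:", reap_all())
```

Between the orphan's exit and the `reap_all()` call, the orphan sits in state `Z` with this process as its parent. Calling `waitpid` is exactly the job the container's pid 1 has to do. A readiness-probe scenario corresponds to never calling it.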
According to Comment 12, the zombie processes caused by the AMQ image can only be resolved by installing an init process into the image, which runs as pid 1 and reaps the exec processes. Setting status to VERIFIED.
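An init inside the image needs to do three things. It starts the real entrypoint, forwards termination signals to it, and calls `waitpid(-1, ...)` so that every child reparented to pid 1 gets reaped, including orphaned probe processes. Below is a minimal, hypothetical sketch: `run_as_init` is not an existing tool, and Python 3.9+ is assumed for `os.waitstatus_to_exitcode`. In practice, a purpose-built minimal init binary is the more usual choice.

```python
import os
import signal
import sys


def _reap_exited() -> None:
    """Reap every child that has already exited, without blocking."""
    while True:
        try:
            pid, _status = os.waitpid(-1, os.WNOHANG)
        except ChildProcessError:
            return  # no children at all
        if pid == 0:
            return  # children remain, but none have exited yet


def run_as_init(argv: list[str]) -> int:
    """Run argv as the main child and reap all children; return an exit code.

    Hypothetical minimal init meant to run as pid 1 of a container.
    """
    main_pid = os.fork()
    if main_pid == 0:
        try:
            os.execvp(argv[0], argv)
        finally:
            os._exit(127)  # only reached if exec failed

    # pid 1 in a pid namespace ignores signals left at their default
    # disposition, so install handlers that forward them to the main child.
    def forward(signum, _frame):
        try:
            os.kill(main_pid, signum)
        except ProcessLookupError:
            pass  # main child already gone

    for sig in (signal.SIGTERM, signal.SIGINT):
        signal.signal(sig, forward)

    while True:
        # Block until ANY child exits: the main child or an adopted orphan
        # such as an exec'd probe process.
        pid, status = os.waitpid(-1, 0)
        if pid == main_pid:
            _reap_exited()  # collect any zombies that are already waiting
            code = os.waitstatus_to_exitcode(status)
            # A negative code means "killed by signal N"; map it to 128 + N.
            return code if code >= 0 else 128 - code


if __name__ == "__main__":
    sys.exit(run_as_init(sys.argv[1:]))
```

Used as the image's entrypoint, for example `python3 init.py <original entrypoint...>` with `init.py` as a hypothetical file name, the loop makes pid 1 reap whatever exits, so timed-out probe processes do not accumulate as zombies.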
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Moderate: OpenShift Container Platform 4.9.0 bug fix and security update), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://access.redhat.com/errata/RHSA-2021:3759
The needinfo request[s] on this closed bug have been removed as they have been unresolved for 500 days