Bug 1977144
| Summary: | Neutron fails to detect sidecar container ids when a process belongs to an "unexpected" cgroup namespace | ||
|---|---|---|---|
| Product: | Red Hat OpenStack | Reporter: | Takashi Kajinami <tkajinam> |
| Component: | openstack-tripleo-heat-templates | Assignee: | Takashi Kajinami <tkajinam> |
| Status: | CLOSED ERRATA | QA Contact: | Alex Katz <akatz> |
| Severity: | high | Docs Contact: | |
| Priority: | high | ||
| Version: | 16.1 (Train) | CC: | bdobreli, bshephar, ccamposr, ekuris, mburns, pmannidi, skaplons |
| Target Milestone: | --- | Keywords: | Triaged |
| Target Release: | --- | ||
| Hardware: | Unspecified | ||
| OS: | Unspecified | ||
| Whiteboard: | |||
| Fixed In Version: | openstack-tripleo-heat-templates-11.3.2-1.20210722133312.29a02c1.el8ost | Doc Type: | If docs needed, set a value |
| Doc Text: | Story Points: | --- | |
| Clone Of: | Environment: | ||
| Last Closed: | 2022-03-24 10:59:23 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | Category: | --- | |
Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986
Description of problem:

We observed instance creation errors caused by a timeout during VIF creation after running scenario tests with concurrent instance creation and deletion. The neutron log shows that neutron is unable to obtain the ID of sidecar containers.

/var/log/containers/neutron/kill-script.log
~~~
+ exec
+ SIG=9
+ PID=<pid>
++ ip netns identify <pid>
+ NETNS=qdhcp-<network id>
+ '[' xqdhcp-<network id> == x ']'
+ CLI='nsenter --net=/run/netns/qdhcp-<network id> --preserve-credentials -m -t 1 podman'
+ '[' -f /proc/<pid>/cgroup ']'
++ awk 'BEGIN {FS="[-.]"} /name=/{print $3}' /proc/<pid>/cgroup
+ CT_ID=conmon
++ nsenter --net=/run/netns/qdhcp-<network id> --preserve-credentials -m -t 1 podman inspect -f '{{.Name}}' conmon
Error: error getting image "conmon": unable to find a name and tag match for conmon in repotags: no such image
~~~

We checked the cgroups to which the process belongs. In the name=systemd hierarchy, the process belongs to libpod-conmon-<container id>.scope instead of libpod-<container id>.scope, so the awk expression above extracts "conmon" rather than the container ID.

~~~
$ sudo cat /proc/<pid>/cgroup
...
3:pids:/machine.slice/libpod-<container id>.scope
2:net_cls,net_prio:/machine.slice/libpod-<container id>.scope
1:name=systemd:/machine.slice/libpod-conmon-<container id>.scope <=== (*)
~~~

We discussed this issue with RHEL engineering in bz1976734 [1]. According to the feedback we received, with podman we should not rely on the name cgroup but on the pids cgroup, which podman actively manages, so its cgroup name is more reliably consistent.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1976734#c3

Version-Release number of selected component (if applicable):

How reproducible:
Occasionally. A customer has hit this issue several times during their scenario tests.

Steps to Reproduce:
1.
2.
3.

Actual results:

Expected results:

Additional info:
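
The following is a minimal sketch of the approach suggested by RHEL engineering: derive the container ID from the pids line of /proc/<pid>/cgroup instead of the name= line. It is an illustration only and not necessarily the change shipped in the advisory. The function name `container_id_from_cgroup` and the container ID `0123abcd` are hypothetical, and the sample file simply mirrors the cgroup layout quoted in the description.

```shell
#!/bin/bash
# Hypothetical helper (not from the original kill script): print the
# container id found on the "pids" controller line of a cgroup file.
container_id_from_cgroup() {
    local cgroup_file="$1"
    # With "-" and "." as field separators, the line
    #   3:pids:/machine.slice/libpod-<id>.scope
    # splits into "3:pids:/machine", "slice/libpod", "<id>", "scope",
    # so the container id is field 3.
    awk 'BEGIN {FS="[-.]"} /:pids:/ {print $3; exit}' "$cgroup_file"
}

# Sample data mirroring the layout in the description; the id is made up.
sample=$(mktemp)
cat > "$sample" <<'EOF'
3:pids:/machine.slice/libpod-0123abcd.scope
2:net_cls,net_prio:/machine.slice/libpod-0123abcd.scope
1:name=systemd:/machine.slice/libpod-conmon-0123abcd.scope
EOF

# New pattern: prints 0123abcd
container_id_from_cgroup "$sample"

# Original pattern, for comparison: prints conmon
awk 'BEGIN {FS="[-.]"} /name=/{print $3}' "$sample"

rm -f "$sample"
```

Matching on `:pids:` keeps the same field-splitting logic as the original script, so only the line selection changes. This sketch assumes the cgroup v1 layout shown in the description.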