
Bug 1977144

Summary:	Neutron fails to detect sidecar container ids when a process belongs to an "unexpected" cgroup namespace
Product:	Red Hat OpenStack	Reporter:	Takashi Kajinami <tkajinam>
Component:	openstack-tripleo-heat-templates	Assignee:	Takashi Kajinami <tkajinam>
Status:	CLOSED ERRATA	QA Contact:	Alex Katz <akatz>
Severity:	high	Docs Contact:
Priority:	high
Version:	16.1 (Train)	CC:	bdobreli, bshephar, ccamposr, ekuris, mburns, pmannidi, skaplons
Target Milestone:	---	Keywords:	Triaged
Target Release:	---
Hardware:	Unspecified
OS:	Unspecified
Whiteboard:
Fixed In Version:	openstack-tripleo-heat-templates-11.3.2-1.20210722133312.29a02c1.el8ost	Doc Type:	If docs needed, set a value
Doc Text:		Story Points:	---
Clone Of:		Environment:
Last Closed:	2022-03-24 10:59:23 UTC	Type:	Bug
Regression:	---	Mount Type:	---
Documentation:	---	CRM:
Verified Versions:		Category:	---
oVirt Team:	---	RHEL 7.3 requirements from Atomic Host:
Cloudforms Team:	---	Target Upstream Version:
Embargoed:

Description Takashi Kajinami 2021-06-29 04:53:39 UTC

Description of problem:

We observed instance creation errors caused by a timeout during VIF creation after running a scenario test with concurrent instance creation and deletion.

Looking at the neutron logs, it turned out that neutron is not able to obtain the IDs of the sidecar containers.

/var/log/containers/neutron/kill-script.log
~~~
+ exec
+ SIG=9
+ PID=<pid>
++ ip netns identify <pid>
+ NETNS=qdhcp-<network id>
+ '[' xqdhcp-<network id> == x ']'
+ CLI='nsenter --net=/run/netns/qdhcp-<network id> --preserve-credentials -m -t 1 podman'
+ '[' -f /proc/<pid>/cgroup ']'
++ awk 'BEGIN {FS="[-.]"} /name=/{print $3}' /proc/<pid>/cgroup
+ CT_ID=conmon
++ nsenter --net=/run/netns/qdhcp-<network id> --preserve-credentials -m -t 1 podman inspect -f '{{.Name}}' conmon
Error: error getting image "conmon": unable to find a name and tag match for conmon in repotags: no such image
~~~

We checked the cgroups to which the process belongs, and it seems that in the name=systemd hierarchy the process belongs to libpod-conmon-<container id>.scope instead of libpod-<container id>.scope.
~~~
$ sudo cat /proc/<pid>/cgroup
...
3:pids:/machine.slice/libpod-<container id>.scope
2:net_cls,net_prio:/machine.slice/libpod-<container id>.scope
1:name=systemd:/machine.slice/libpod-conmon-<container id>.scope <=== (*)
~~~
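
To illustrate why CT_ID ends up as "conmon", the following Python sketch mimics the field splitting done by the awk command in the kill-script trace. The function name and the placeholder container ID are made up for illustration; this is not the kill script itself, only an emulation of the `FS="[-.]"` / `print $3` logic.

```python
import re


def emulate_kill_script_awk(cgroup_text):
    """Mimic: awk 'BEGIN {FS="[-.]"} /name=/{print $3}' (illustrative only)."""
    out = []
    for line in cgroup_text.splitlines():
        if "name=" in line:
            # awk splits the line on every '-' or '.' and prints the 3rd field
            fields = re.split(r"[-.]", line)
            out.append(fields[2] if len(fields) > 2 else "")
    return "\n".join(out)


CID = "0123abcd"  # hypothetical, shortened container ID

expected_scope = f"1:name=systemd:/machine.slice/libpod-{CID}.scope"
conmon_scope = f"1:name=systemd:/machine.slice/libpod-conmon-{CID}.scope"

print(emulate_kill_script_awk(expected_scope))  # third field is the container ID
print(emulate_kill_script_awk(conmon_scope))    # third field is "conmon"
```

With the libpod-conmon-<container id>.scope path, the extra "conmon-" segment shifts the container ID to the fourth field, so the third field is the literal string "conmon", which matches the failing `podman inspect` call in the log.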

We discussed this issue with RHEL engineering in bz1976734 [1].

According to the feedback we received, with podman we should not rely on the name cgroup but on the pids cgroup, which podman actively manages, so its cgroup name is more reliably consistent.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1976734#c3
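
As a rough sketch of the suggested approach, the Python below reads the container ID from the pids entry instead of the name= entry. It is not the fix shipped in openstack-tripleo-heat-templates; the function name, the regular expression, and the sample ID are assumptions, and it presumes the cgroup v1 layout shown above and a hexadecimal container ID.

```python
import re

# Assumption: the scope is named libpod-<hex container id>.scope
_LIBPOD_SCOPE = re.compile(r"/libpod-([0-9a-f]+)\.scope$")


def container_id_from_pids_cgroup(cgroup_text):
    """Return the container ID found on the 'pids' line, or None."""
    for line in cgroup_text.splitlines():
        # cgroup v1 format: hierarchy-ID:controller-list:path
        parts = line.strip().split(":", 2)
        if len(parts) != 3:
            continue  # skip malformed or elided lines
        _hierarchy, controllers, path = parts
        if "pids" in controllers.split(","):
            match = _LIBPOD_SCOPE.search(path)
            if match:
                return match.group(1)
    return None


sample = """\
3:pids:/machine.slice/libpod-0123abcd.scope
2:net_cls,net_prio:/machine.slice/libpod-0123abcd.scope
1:name=systemd:/machine.slice/libpod-conmon-0123abcd.scope
"""

print(container_id_from_pids_cgroup(sample))
```

Because the pattern requires hexadecimal characters directly after "libpod-", a libpod-conmon-<container id>.scope path does not match and is never mistaken for a container ID.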


Version-Release number of selected component (if applicable):


How reproducible:
Occasionally. A customer has hit this issue several times during their scenario tests

Steps to Reproduce:
1.
2.
3.

Actual results:


Expected results:


Additional info:

Comment 10 errata-xmlrpc 2022-03-24 10:59:23 UTC

Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory (Red Hat OpenStack Platform 16.1.8 bug fix and enhancement advisory), and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHBA-2022:0986