Bug 1544150

Summary: Restarting Neutron containers which make use of network namespaces doesn't work
Product: Red Hat OpenStack Reporter: Daniel Alvarez Sanchez <dalvarez>
Component: openstack-tripleo-heat-templatesAssignee: Brent Eagles <beagles>
Status: CLOSED ERRATA QA Contact: Roee Agiman <ragiman>
Severity: high Docs Contact:
Priority: high    
Version: 13.0 (Queens)CC: aschultz, bcafarel, chjones, jschluet, mburns, michele, nyechiel, rhel-osp-director-maint, rscarazz, tfreger, ushkalim
Target Milestone: betaKeywords: Regression, Triaged
Target Release: 13.0 (Queens)   
Hardware: Unspecified   
OS: Unspecified   
Whiteboard:
Fixed In Version: openstack-tripleo-heat-templates-8.0.2-0.20180327213843.f25e2d8.el7ost Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2018-06-27 13:44:34 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Embargoed:
Bug Depends On:    
Bug Blocks: 1527130    

Description Daniel Alvarez Sanchez 2018-02-10 18:50:01 UTC
When DHCP, L3, Metadata or OVN-Metadata containers are restarted they can't
set the previous namespaces:

[heat-admin@overcloud-novacompute-0 neutron]$ sudo docker restart 8559f5a7fa45
8559f5a7fa45

[heat-admin@overcloud-novacompute-0 neutron]$ tail -f /var/log/containers/neutron/networking-ovn-metadata-agent.log
2018-02-09 08:34:41.059 5 CRITICAL neutron [-] Unhandled error: ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2018-02-09 08:34:41.059 5 ERROR neutron Traceback (most recent call last):
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/bin/networking-ovn-metadata-agent", line 10, in <module>
2018-02-09 08:34:41.059 5 ERROR neutron sys.exit(main())
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/cmd/eventlet/agents/metadata.py", line 17, in main
2018-02-09 08:34:41.059 5 ERROR neutron metadata_agent.main()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata_agent.py", line 38, in main
2018-02-09 08:34:41.059 5 ERROR neutron agt.start()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 147, in start
2018-02-09 08:34:41.059 5 ERROR neutron self.sync()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 56, in wrapped
2018-02-09 08:34:41.059 5 ERROR neutron return f(*args, **kwargs)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 169, in sync
2018-02-09 08:34:41.059 5 ERROR neutron metadata_namespaces = self.ensure_all_networks_provisioned()
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 350, in ensure_all_networks_provisioned
2018-02-09 08:34:41.059 5 ERROR neutron netns = self.provision_datapath(datapath)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/networking_ovn/agent/metadata/agent.py", line 294, in provision_datapath
2018-02-09 08:34:41.059 5 ERROR neutron veth_name[0], veth_name[1], namespace)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 182, in add_veth
2018-02-09 08:34:41.059 5 ERROR neutron self._as_root([], 'link', tuple(args))
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 94, in _as_root
2018-02-09 08:34:41.059 5 ERROR neutron namespace=namespace)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/ip_lib.py", line 102, in _execute
2018-02-09 08:34:41.059 5 ERROR neutron log_fail_as_error=self.log_fail_as_error)
2018-02-09 08:34:41.059 5 ERROR neutron File "/usr/lib/python2.7/site-packages/neutron/agent/linux/utils.py", line 151, in execute
2018-02-09 08:34:41.059 5 ERROR neutron raise ProcessExecutionError(msg, returncode=returncode)
2018-02-09 08:34:41.059 5 ERROR neutron ProcessExecutionError: Exit code: 2; Stdin: ; Stdout: ; Stderr: RTNETLINK answers: Invalid argument
2018-02-09 08:34:41.059 5 ERROR neutron
2018-02-09 08:34:41.059 5 ERROR neutron
2018-02-09 08:34:41.177 21 INFO oslo_service.service [-] Parent process has died unexpectedly, exiting
2018-02-09 08:34:41.178 21 INFO eventlet.wsgi.server [-] (21) wsgi exited, is_accepting=True

An easy way to reproduce the bug:

[heat-admin@overcloud-novacompute-0 ~]$ sudo docker exec -u root -it 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d /bin/bash

[root@overcloud-novacompute-0 /]# ip netns a my_netns
[root@overcloud-novacompute-0 /]# exit

[heat-admin@overcloud-novacompute-0 ~]$ sudo ip netns
[heat-admin@overcloud-novacompute-0 ~]$ sudo docker restart 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d
5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d

[heat-admin@overcloud-novacompute-0 ~]$ sudo docker exec -u root -it 5c5f254a9321bd74b5911f46acb9513574c2cd9a3c59805a85cffd960bcc864d /bin/bash
[root@overcloud-novacompute-0 /]# ip netns
RTNETLINK answers: Invalid argument
RTNETLINK answers: Invalid argument
my_netns

[root@overcloud-novacompute-0 /]# ip netns e my_netns ip a
RTNETLINK answers: Invalid argument
setting the network namespace "my_netns" failed: Invalid argument

Deleting everything under /run/netns/* from kolla_start but this would involve
a full sync of the agents which is not desirable:

[root@overcloud-novacompute-0 /]# rm /run/netns/my_netns
rm: remove regular empty file '/run/netns/my_netns'? y
[root@overcloud-novacompute-0 /]# ip netns
[root@overcloud-novacompute-0 /]# ip netns a my_netns
[root@overcloud-novacompute-0 /]#

Comment 4 Daniel Alvarez Sanchez 2018-02-23 12:41:09 UTC
This BZ is addressed by https://review.openstack.org/#/c/542858/
Still we have this other problem: https://bugzilla.redhat.com/show_bug.cgi?id=1527130 that we should look into (for example, trying to make processes started in the container moved to PID 1 so that they persist after the container is stopped?).

Comment 9 errata-xmlrpc 2018-06-27 13:44:34 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2018:2086