Bug 872689

Summary: Quantum: root cannot access network namespaces created by Quantum service
Product: [Fedora] Fedora
Reporter: Dan Prince <dprince>
Component: openstack-quantum
Assignee: Bob Kukura <rkukura>
Status: CLOSED WONTFIX
QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: unspecified
Priority: unspecified
Version: 17
CC: apevec, breu, chrisw, dennisml, enakai, Jan.van.Eldik, jose.castro.leon, lpeer, markmc, maurizio.antillon, rkukura
Hardware: Unspecified
OS: Unspecified
Doc Type: Bug Fix
Last Closed: 2013-08-01 18:49:07 UTC
Type: Bug

Description Dan Prince 2012-11-02 18:15:54 UTC
Description of problem:

I'm using Quantum Fedora 17 packages for Folsom (or upstream Grizzly) with namespaces enabled:

[root@nova1 ~]# cat /etc/quantum/dhcp_agent.ini | grep ^use_namespaces
use_namespaces=True
[root@nova1 ~]# cat /etc/quantum/l3_agent.ini | grep ^use_namespaces
use_namespaces=True

When I start up the dhcp and l3 agents the following network namespaces get created:

[root@nova1 ~]# ip netns
qdhcp-dd1e3062-8dfe-4251-b326-e9634c344fd4
qrouter-9a8ce59a-09e0-421b-ad38-7c1a8d7ff9d4

However, I am then unable to access these namespaces as root. For example:

[root@nova1 ~]# ip netns exec qrouter-9a8ce59a-09e0-421b-ad38-7c1a8d7ff9d4 bash
seting the network namespace failed: Invalid argument

Running the same command with an strace shows the following:

getsockname(3, {sa_family=AF_NETLINK, pid=3146, groups=00000000}, [12]) = 0
open("/var/run/netns/qrouter-9a8ce59a-09e0-421b-ad38-7c1a8d7ff9d4", O_RDONLY) = 4
setns(4, 1073741824)                    = -1 EINVAL (Invalid argument)
write(2, "seting the network namespace fai"..., 54seting the network namespace failed: Invalid argument
) = 54
exit_group(1)                           = ?
+++ exited with 1 +++

----

The permissions on the netns files in /var/run/netns are curious as well:

[root@nova1 ~]# ll /var/run/netns/
total 0
----------. 1 root root 0 Nov  2 14:05 qdhcp-dd1e3062-8dfe-4251-b326-e9634c344fd4
----------. 1 root root 0 Nov  2 14:05 qrouter-9a8ce59a-09e0-421b-ad38-7c1a8d7ff9d4

----

Ultimately, Quantum does seem to work in this mode, but not being able to access the network namespaces as root is annoying when trying to follow some of the upstream documentation on this service.

As a workaround, if you run Quantum in the foreground as root, things seem to work just fine. So something about running the Quantum dhcp and l3 agents as proper systemd services (as the quantum user) is causing this issue.
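
A possible way to check what root actually sees, offered as a sketch rather than a confirmed diagnosis: "ip netns add" creates an empty placeholder file and bind-mounts the namespace onto it, so an entry under /var/run/netns that is not listed as a mount point in /proc/self/mountinfo is only a plain file in the current mount namespace, and setns() on it would be expected to fail with EINVAL. The function name unmounted_netns and its arguments are hypothetical helpers, not part of iproute2 or Quantum.

```shell
# unmounted_netns [DIR] [MOUNTINFO]
# Print every name in DIR that is NOT a mount point according to MOUNTINFO.
# Hypothetical helper; defaults assume the iproute2 layout seen in this report.
unmounted_netns() (
    dir=${1:-/var/run/netns}
    mountinfo=${2:-/proc/self/mountinfo}
    # Resolve symlinks such as /var/run -> /run; mountinfo records the real path.
    real=$(cd "$dir" 2>/dev/null && pwd -P) || exit 1
    for f in "$real"/*; do
        [ -e "$f" ] || continue
        # Field 5 of each mountinfo line is the mount point.
        if ! awk -v p="$f" '$5 == p { found = 1 } END { exit !found }' "$mountinfo"; then
            basename "$f"
        fi
    done
)
```

Running "unmounted_netns" with no arguments in the shell where "ip netns exec" fails lists the namespaces that have no bind mount visible there; an empty result means every entry is mounted.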

Comment 1 Dennis Jacobfeuerborn 2012-11-09 01:58:32 UTC
I'm having issues with this too. Specifically, the l3 agent doesn't seem to be able to enable IP forwarding.

In the l3-agent.log I see this:
...
RuntimeError:
Command: ['sudo', 'quantum-rootwrap', '/etc/quantum/rootwrap.conf', 'ip',
'netns', 'exec', 'qrouter-39e52a8a-9ffe-4ba5-a625-bb7513035cbf', 'sysctl',
'-w', 'net.ipv4.ip_forward=1']
Exit code: 1
Stdout: ''
Stderr: 'mount of /sys failed: Device or resource busy\n'

Comment 2 Etsuji Nakai 2012-12-01 01:23:49 UTC
Two things to note:

1. Dennis's problem (Comment 1) is not related to the original problem. See https://bugzilla.redhat.com/show_bug.cgi?id=881733 

2. I think the original problem is caused by systemd's PrivateTmp setting. I'm not sure of the exact mechanism, but setting "PrivateTmp=false" in the following files resolved the issue.

/usr/lib/systemd/system/quantum-dhcp-agent.service
/usr/lib/systemd/system/quantum-l3-agent.service
/usr/lib/systemd/system/quantum-openvswitch-agent.service
/usr/lib/systemd/system/quantum-server.service

Thanks.
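
One way to apply this setting without editing the packaged files in place is to put a modified copy of each unit under /etc/systemd/system, which takes precedence over /usr/lib/systemd/system. The following is a minimal sketch under that assumption; override_private_tmp is a hypothetical helper name, and which units actually need the change is not settled by this comment.

```shell
# override_private_tmp SRC_DIR DST_DIR UNIT...
# Copy each UNIT.service from SRC_DIR to DST_DIR, forcing PrivateTmp=false.
# Hypothetical helper; a unit without a PrivateTmp line is copied unchanged.
override_private_tmp() (
    src=$1
    dst=$2
    shift 2
    for unit in "$@"; do
        sed 's/^PrivateTmp=.*/PrivateTmp=false/' "$src/$unit.service" \
            > "$dst/$unit.service" || exit 1
    done
)
```

For example, as root: "override_private_tmp /usr/lib/systemd/system /etc/systemd/system quantum-dhcp-agent quantum-l3-agent", followed by "systemctl daemon-reload" and a restart of the affected services.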

Comment 3 Fedora Update System 2012-12-03 21:52:07 UTC
openstack-quantum-2012.2.1-1.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/openstack-quantum-2012.2.1-1.fc18

Comment 4 Bob Kukura 2012-12-03 21:58:59 UTC
The above update turns off PrivateTmp for l3_agent and dhcp_agent.

Comment 5 Alan Pevec 2012-12-03 23:01:42 UTC
(In reply to comment #2)
> 2. I think the original problem is caused by the systemd's PrivateTmp
> setting.

There was a systemd bug (851970) in the PrivateTmp namespace handling, which was fixed only in F18.
Could you try with PrivateTmp=true, but on F18?

Comment 6 Etsuji Nakai 2012-12-04 00:13:31 UTC
Alan,

This problem also happens on F18 with PrivateTmp=true, so it is a different problem from 851970. I'm using the following packages:

# rpm -qa | grep systemd
systemd-195-8.fc18.x86_64
systemd-libs-195-8.fc18.x86_64
systemd-sysv-195-8.fc18.x86_64

# rpm -qa | grep quantum
python-quantum-2012.2-1.fc18.noarch
openstack-quantum-openvswitch-2012.2-1.fc18.noarch
python-quantumclient-2.1.1-0.fc18.noarch
openstack-quantum-2012.2-1.fc18.noarch

Comment 7 Etsuji Nakai 2012-12-04 01:17:04 UTC
Alan, 

I investigated how this happens. I think the following explains it.

1. With "PrivateTmp=true", systemd remounts the root filesystem of the process as "MS_SLAVE". This means that any filesystem newly mounted within this process's filesystem (mount) namespace does not appear in the parent namespace.

systemd-195/src/core/namespace.c
-------
        /* Remount / as SLAVE so that nothing now mounted in the namespace
           shows up in the parent */
        if (mount(NULL, "/", NULL, MS_SLAVE|MS_REC, NULL) < 0) {
                r = -errno;
                goto fail;
        }
-------

2. When a new network namespace is created with "ip netns add", ip bind-mounts /proc/self/ns/net onto /var/run/netns/<name>.

iproute2-3.6.0/ip/ipnetns.c
-------
        /* Bind the netns last so I can watch for it */
        if (mount("/proc/self/ns/net", netns_path, "none", MS_BIND, NULL) < 0) {
                fprintf(stderr, "Bind /proc/self/ns/net -> %s failed: %s\n",
                        netns_path, strerror(errno));
                goto out_delete;
        }
-------

However, because of "MS_SLAVE", this bind mount is not reflected in the parent fs-namespace. Hence, other processes in the parent fs-namespace cannot use that network namespace.
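
The propagation state described here can be read from the optional fields of /proc/<pid>/mountinfo, where a "shared:N" tag marks a shared mount and a "master:N" tag marks a slave. The following is a minimal sketch for inspecting the "/" mount of a given process; root_propagation is a hypothetical helper name, and the expectation that a PrivateTmp=true service reports "slave" is an assumption based on the systemd excerpt, not something verified in this report.

```shell
# root_propagation [MOUNTINFO]
# Print the propagation type of the "/" mount listed in MOUNTINFO:
# shared, slave, shared,slave or private. Hypothetical helper.
root_propagation() (
    mountinfo=${1:-/proc/self/mountinfo}
    awk '$5 == "/" {
        shared = 0; slave = 0
        # Optional fields start at field 7 and end at the "-" separator.
        for (i = 7; i <= NF && $i != "-"; i++) {
            if ($i ~ /^shared:/) shared = 1
            if ($i ~ /^master:/) slave = 1
        }
        # If "/" is listed more than once, the last entry wins.
        state = shared ? (slave ? "shared,slave" : "shared") \
                       : (slave ? "slave" : "private")
    }
    END { if (state != "") print state; else exit 1 }' "$mountinfo"
)
```

Comparing "root_propagation /proc/<pid>/mountinfo" for the agent process with "root_propagation" in a root shell shows whether the two see "/" with different propagation settings.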

I think there are several design choices for resolving this:

1) Add a systemd service config option that prevents the root filesystem from being remounted as "MS_SLAVE".
2) Simply advise users not to use PrivateTmp=true for services that mount new filesystems and need to share them with other processes.

Thanks.

Comment 8 Fedora Update System 2013-01-12 00:01:55 UTC
openstack-quantum-2012.2.1-1.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 9 Fedora End Of Life 2013-07-04 07:05:31 UTC
This message is a reminder that Fedora 17 is nearing its end of life.
Approximately 4 (four) weeks from now Fedora will stop maintaining
and issuing updates for Fedora 17. It is Fedora's policy to close all
bug reports from releases that are no longer maintained. At that time
this bug will be closed as WONTFIX if it remains open with a Fedora 
'version' of '17'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version prior to Fedora 17's end of life.

Bug Reporter: Thank you for reporting this issue and we are sorry that 
we may not be able to fix it before Fedora 17 is end of life. If you 
would still like to see this bug fixed and are able to reproduce it 
against a later version of Fedora, you are encouraged to change the 
'version' to a later Fedora version prior to Fedora 17's end of life.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.

Comment 10 Fedora End Of Life 2013-08-01 18:49:14 UTC
Fedora 17 changed to end-of-life (EOL) status on 2013-07-30. Fedora 17 is 
no longer maintained, which means that it will not receive any further 
security or bug fix updates. As a result we are closing this bug.

If you can reproduce this bug against a currently maintained version of 
Fedora please feel free to reopen this bug against that version.

Thank you for reporting this bug and we are sorry it could not be fixed.