Bug 882047

Summary: "ip netns exec" destroys the /sys mounting and causes systemd problem
Product: [Fedora] Fedora
Component: iproute
Version: 18
Hardware: Unspecified
OS: Unspecified
Status: CLOSED ERRATA
Severity: unspecified
Priority: unspecified
Reporter: Etsuji Nakai <enakai>
Assignee: Petr Šabata <psabata>
QA Contact: Fedora Extras Quality Assurance <extras-qa>
CC: apevec, jpopelka, mrunge, psabata, rvokal, sbaker, twoerner
Target Milestone: ---
Target Release: ---
Doc Type: Bug Fix
Type: Bug
Last Closed: 2013-02-20 21:16:20 UTC

Description Etsuji Nakai 2012-11-30 01:57:09 UTC
Description of problem:
Running "ip netns exec" destroys the existing mounts under /sys, after which systemctl can no longer connect to systemd.

Version-Release number of selected component (if applicable):
# rpm -q iproute
iproute-3.6.0-2.fc18.x86_64
# rpm -q systemd
systemd-195-8.fc18.x86_64


How reproducible:
Steps to Reproduce:
1. Check that systemctl works and note the current mounts under /sys.

# systemctl
(No error)
# mount | grep /sys
sysfs on /sys type sysfs (rw,nosuid,nodev,noexec,relatime,seclabel)
securityfs on /sys/kernel/security type securityfs (rw,nosuid,nodev,noexec,relatime)
selinuxfs on /sys/fs/selinux type selinuxfs (rw,relatime)
tmpfs on /sys/fs/cgroup type tmpfs (rw,nosuid,nodev,noexec,seclabel,mode=755)
cgroup on /sys/fs/cgroup/systemd type cgroup (rw,nosuid,nodev,noexec,relatime,release_agent=/usr/lib/systemd/systemd-cgroups-agent,name=systemd)
cgroup on /sys/fs/cgroup/cpuset type cgroup (rw,nosuid,nodev,noexec,relatime,cpuset)
cgroup on /sys/fs/cgroup/cpu,cpuacct type cgroup (rw,nosuid,nodev,noexec,relatime,cpuacct,cpu)
cgroup on /sys/fs/cgroup/memory type cgroup (rw,nosuid,nodev,noexec,relatime,memory)
cgroup on /sys/fs/cgroup/devices type cgroup (rw,nosuid,nodev,noexec,relatime,devices)
cgroup on /sys/fs/cgroup/freezer type cgroup (rw,nosuid,nodev,noexec,relatime,freezer)
cgroup on /sys/fs/cgroup/net_cls type cgroup (rw,nosuid,nodev,noexec,relatime,net_cls)
cgroup on /sys/fs/cgroup/blkio type cgroup (rw,nosuid,nodev,noexec,relatime,blkio)
cgroup on /sys/fs/cgroup/perf_event type cgroup (rw,nosuid,nodev,noexec,relatime,perf_event)
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
debugfs on /sys/kernel/debug type debugfs (rw,relatime)
configfs on /sys/kernel/config type configfs (rw,relatime)

2. Execute "ip netns exec":
# ip netns add test; ip netns exec test ip link
5: lo: <LOOPBACK> mtu 16436 qdisc noop state DOWN mode DEFAULT 
    link/loopback 00:00:00:00:00:00 brd 00:00:00:00:00:00

3. The mounts under /sys are now broken. Most of the entries listed in step 1 are gone.
# mount | grep /sys
systemd-1 on /proc/sys/fs/binfmt_misc type autofs (rw,relatime,fd=29,pgrp=1,timeout=300,minproto=5,maxproto=5,direct)
test on /sys type sysfs (rw,relatime,seclabel)

4. As a result, systemctl fails with the following error:
# systemctl 
Failed to get D-Bus connection: No connection to service manager.

Comment 1 Etsuji Nakai 2012-12-01 01:50:35 UTC
By the way, I'm not sure why, but running systemd and then stopping it, as shown below, makes systemctl work again.
----
$ systemd
Failed to open private bus connection: Unable to autolaunch a dbus-daemon without a $DISPLAY for X11
(Stop it with Ctrl+C)
----

Comment 2 Etsuji Nakai 2012-12-04 04:45:20 UTC
I looked into iproute2's source code.

I'm afraid that the "netns exec" part of "ip" is broken.

iproute2-3.6.0/ip/ipnetns.c.orig
============
static int netns_exec(int argc, char **argv)
{
...
        /* Mount a version of /sys that describes the network namespace */
        if (umount2("/sys", MNT_DETACH) < 0) {
                fprintf(stderr, "umount of /sys failed: %s\n", strerror(errno));
                return -1;
        }
        if (mount(name, "/sys", "sysfs", 0, NULL) < 0) {
                fprintf(stderr, "mount of /sys failed: %s\n",strerror(errno));
                return -1;
        }

        /* Setup bind mounts for config files in /etc */
        bind_etc(name);

        if (execvp(cmd, argv + 1)  < 0)
                fprintf(stderr, "exec of %s failed: %s\n",
                        cmd, strerror(errno));
        exit(-1);
}
============

1. It unmounts and remounts /sys unconditionally, which breaks the original mount tree under /sys.
2. It leaves behind a remounted /sys that describes the child network namespace.

The second point can be confirmed as below:
--------------
Add a "parent" device in the parent namespace.
[root@localhost ~]# ls /sys/devices/virtual/net/
lo
[root@localhost ~]# ip link add parent type dummy
[root@localhost ~]# ls /sys/devices/virtual/net/
lo  parent

Add a "child" device in the child (test) namespace.
[root@localhost ~]# ip netns add test
[root@localhost ~]# ip netns exec test ip link add child type dummy

In the parent namespace, /sys still shows the child's view. The "parent" device is not visible there.
[root@localhost ~]# ls /sys/devices/virtual/net/
child  lo
--------------

IMHO, it would be better not to remount /sys in netns_exec and to let the executed command take care of it. "ip netns exec" is just a convenient way of working with a namespace; if you need more consistent namespace management, you would be better off using the LXC container tools.

Comment 3 Petr Šabata 2012-12-04 16:06:11 UTC
Removing the remount fixes the issue (obviously).

Given the ip-netns(8) man page, I'd say the current behaviour is intentional. However, I agree that this should probably be handled elsewhere, as 'netns exec' users probably don't want to break their systemd mounts every time...

Comment 4 Petr Šabata 2013-01-10 12:51:01 UTC
*** Bug 892927 has been marked as a duplicate of this bug. ***

Comment 5 Petr Šabata 2013-01-10 12:52:21 UTC
I'm just checking whether removing this bit might affect anything else.

Comment 6 Petr Šabata 2013-01-16 16:09:18 UTC
This was caused by mount changes in the new namespace being propagated to the parent, since the /sys mounts were explicitly marked as shared.
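
Whether /sys is a shared mount in the current mount namespace can be checked from /proc/self/mountinfo, where shared mounts carry a "shared:N" tag among the optional fields. The following is a minimal C sketch, assuming the documented mountinfo field layout (mount point in field 5, optional fields from field 7 up to a lone "-"). The helper names line_is_shared_mount and mount_is_shared are illustrative and not part of iproute2.

```c
#include <stdio.h>
#include <string.h>

/*
 * Illustrative helper (not from iproute2): returns 1 if one mountinfo
 * line describes mount_point and carries a "shared:N" optional field,
 * 0 otherwise.
 */
static int line_is_shared_mount(const char *line, const char *mount_point)
{
	const char *p = line;
	int field = 0;

	while (*p != '\0') {
		size_t len;

		p += strspn(p, " \n");     /* skip separators */
		len = strcspn(p, " \n");   /* length of the current field */
		if (len == 0)
			break;
		field++;

		/* Field 5 is the mount point; ignore lines for other mounts. */
		if (field == 5 &&
		    (strlen(mount_point) != len ||
		     strncmp(p, mount_point, len) != 0))
			return 0;

		/* Optional fields start at field 7 and end at a lone "-". */
		if (field > 6) {
			if (len == 1 && p[0] == '-')
				return 0;
			if (len > 7 && strncmp(p, "shared:", 7) == 0)
				return 1;
		}
		p += len;
	}
	return 0;
}

/*
 * Scans /proc/self/mountinfo. Returns 1 if mount_point is shared,
 * 0 if it is not shared or not mounted, -1 if the file cannot be read.
 */
static int mount_is_shared(const char *mount_point)
{
	char line[4096];
	int shared = 0;
	FILE *f = fopen("/proc/self/mountinfo", "r");

	if (f == NULL)
		return -1;
	while (fgets(line, sizeof(line), f) != NULL) {
		if (line_is_shared_mount(line, mount_point)) {
			shared = 1;
			break;
		}
	}
	fclose(f);
	return shared;
}
```

Calling mount_is_shared("/sys") before running "ip netns exec" should show whether a system is exposed to the propagation described here.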

A fix remounting the whole cloned tree as private should be upstream soon.
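
The approach described above could look roughly like the following C sketch. It is not the upstream patch. The helper name isolate_and_remount_sys is hypothetical, the use of MS_PRIVATE follows the wording of this comment, and the sketch assumes the caller has root privileges and has already entered the target network namespace.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE            /* for unshare() and CLONE_NEWNS */
#endif
#include <errno.h>
#include <sched.h>
#include <stdio.h>
#include <string.h>
#include <sys/mount.h>

/*
 * Hypothetical helper, not the upstream code: give the calling process
 * its own copy of the mount tree, stop mount events from propagating
 * out of that copy, and only then replace /sys.
 * Returns 0 on success, -1 on failure.
 */
static int isolate_and_remount_sys(const char *name)
{
	/* Clone the mount tree so later changes apply to this process only. */
	if (unshare(CLONE_NEWNS) < 0) {
		fprintf(stderr, "unshare failed: %s\n", strerror(errno));
		return -1;
	}

	/* Mark the whole cloned tree private, recursively, so the umount
	 * and mount below are not propagated back to the parent. */
	if (mount("none", "/", NULL, MS_REC | MS_PRIVATE, NULL) < 0) {
		fprintf(stderr, "making mounts private failed: %s\n",
			strerror(errno));
		return -1;
	}

	/* Same remount as in netns_exec(), now confined to the clone. */
	if (umount2("/sys", MNT_DETACH) < 0) {
		fprintf(stderr, "umount of /sys failed: %s\n", strerror(errno));
		return -1;
	}
	if (mount(name, "/sys", "sysfs", 0, NULL) < 0) {
		fprintf(stderr, "mount of /sys failed: %s\n", strerror(errno));
		return -1;
	}
	return 0;
}
```

The ordering is the point of the sketch: the propagation type is changed before /sys is touched, so the parent's /sys tree, including the cgroup mounts that systemd relies on, should stay intact.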

Comment 7 Steve Baker 2013-02-07 23:17:32 UTC
I'm blocked on other tasks due to this bug. Is there anything I can help test to move things along?

Comment 8 Petr Šabata 2013-02-08 08:14:25 UTC
The patch was accepted upstream; I'll submit an update today.

Comment 9 Fedora Update System 2013-02-08 14:05:26 UTC
iproute-3.6.0-6.fc18 has been submitted as an update for Fedora 18.
https://admin.fedoraproject.org/updates/iproute-3.6.0-6.fc18

Comment 10 Etsuji Nakai 2013-02-10 06:09:54 UTC
iproute-3.6.0-6.fc18 worked for me. Thanks.

Comment 11 Fedora Update System 2013-02-12 05:07:00 UTC
iproute-3.6.0-6.fc18 has been pushed to the Fedora 18 stable repository.  If problems still persist, please make note of it in this bug report.

Comment 12 Steve Baker 2013-02-13 20:45:05 UTC
This has worked for me too, thanks very much.

Comment 13 Alan Pevec 2013-02-20 21:16:20 UTC
(In reply to comment #11)
> iproute-3.6.0-6.fc18 has been pushed to the Fedora 18 stable repository.  If
> problems still persist, please make note of it in this bug report.

Not sure why Bodhi hasn't changed the status to CLOSED ERRATA?

Comment 14 Fedora Update System 2013-03-06 14:20:51 UTC
iproute-3.3.0-6.fc17 has been submitted as an update for Fedora 17.
https://admin.fedoraproject.org/updates/iproute-3.3.0-6.fc17

Comment 15 Fedora Update System 2013-03-22 00:36:32 UTC
iproute-3.3.0-6.fc17 has been pushed to the Fedora 17 stable repository.  If problems still persist, please make note of it in this bug report.