Bug 1466848

Summary: Restart of atomic-openshift-node container terminates pod glusterfs mount
Product: OpenShift Container Platform
Reporter: Jan Safranek <jsafrane>
Component: Storage
Assignee: Jan Safranek <jsafrane>
Status: CLOSED DUPLICATE
QA Contact: Jianwei Hou <jhou>
Severity: medium
Docs Contact:
Priority: unspecified
Version: 3.5.0
CC: aos-bugs, atumball, bchilds, bleanhar, bmchugh, csaba, ekuric, eparis, erich, hchiramm, jhou, jkaur, jnordell, jsafrane, mrobson, rcyriac, rhs-bugs, sdodson, tlarsson
Target Milestone: ---
Keywords: NeedsTestCase, Reopened
Target Release: 3.7.0
Hardware: x86_64
OS: Linux
Whiteboard:
Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: 1423640
Environment:
Last Closed: 2017-09-11 08:48:15 UTC
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---
Target Upstream Version:
Embargoed:

Comment 1 Jan Safranek 2017-06-30 14:27:10 UTC
I lowered the severity relative to the original bug; AFAIK no customer has complained so far.

Comment 2 Jan Safranek 2017-06-30 14:29:42 UTC
Pruned bug dependencies

Comment 3 Jan Safranek 2017-06-30 14:43:34 UTC
I am talking to the local systemd developers about properly escaping a docker container, so that the fuse daemon really runs on the host and a restart of the docker container won't kill it.

Option 1:
Newer systemd (v233?) ships systemd-mount, which creates a transient unit that performs the mount. The fuse daemon would probably run in that unit's context. In the container we would probably run 'nsenter --mount=/rootfs/proc/1/ns/mnt -- /bin/systemd-mount -t glusterfs -o <opts> <what> <where>' (testing needed).

Unfortunately, systemd in RHEL7 is too old to include systemd-mount, and no rebase is planned. A backport might be possible, though.

Option 2:
systemd in RHEL7 has the systemd-run command, which creates a transient service and executes a command in it. kubelet would run 'nsenter --mount=/rootfs/proc/1/ns/mnt -- /bin/systemd-run /bin/mount -t glusterfs -o <opts> <what> <where>'. Again, testing is needed: I am not sure whether systemd would kill the service once /bin/mount finishes and only the glusterfs fuse daemon is left running.

I'm investigating these options.

Obviously, both options make the openshift-node container dependent on the host running systemd. So far that was not a hard requirement.

Any other smart ideas on how to escape a container are welcome.

Comment 4 Jan Safranek 2017-06-30 15:17:08 UTC
Tested option 2; it appears to work:

nsenter --mount=/rootfs/proc/1/ns/mnt -- systemd-run --scope /bin/mount -t glusterfs 172.17.0.2:test_vol /var/lib/origin/openshift.local.volumes/xyz

(and nsenter --mount=/rootfs/proc/1/ns/mnt -- umount /var/lib/origin/openshift.local.volumes/xyz)

- the glusterfs fuse daemon runs in its own systemd scope (=cgroup) with a random name (run-11615.scope)
- it is not killed when /bin/mount finishes
- it is killed by unmount
- the scope is automatically deleted when the last process dies, i.e. after unmount
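
One way to check which scope a process landed in is to read its cgroup file. The helper below is only a sketch: scope_from_cgroup is a hypothetical name, and it assumes the cgroup path ends in the transient unit name (such as run-11615.scope above).

```shell
# Sketch only: scope_from_cgroup is a hypothetical helper, not part of any tool.
# It reads the content of /proc/<pid>/cgroup on stdin and prints the name of
# the first *.scope unit that terminates a cgroup path, if there is one.
scope_from_cgroup() {
    sed -n 's|.*/\([^/]*\.scope\)$|\1|p' | head -n 1
}
```

On the host it could be fed the cgroup file of the fuse daemon, e.g. 'scope_from_cgroup < /proc/<pid of glusterfs>/cgroup'; after unmount the process, and with it the scope, should be gone.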


That brings us to a hard dependency on systemd on the host... In OpenShift that's probably OK; I am not sure about upstream.

Comment 5 Jan Safranek 2017-07-03 13:19:31 UTC
Created https://github.com/kubernetes/kubernetes/pull/48430. The systemd-run call above is used when it's available on the host; otherwise a plain 'nsenter --mount=/rootfs/proc/1/ns/mnt -- mount' is used.
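
The fallback described here can be sketched roughly as below. This is not the code from the pull request: build_mount_cmd, host_has_systemd_run and HOST_MNT_NS are hypothetical names, and probing with 'systemd-run --scope -- true' is an assumption about how availability could be detected. The sketch only prints the command line instead of running it.

```shell
# Sketch only; all names here are hypothetical and not taken from the PR.
HOST_MNT_NS=/rootfs/proc/1/ns/mnt

# Assumed probe: try to start a trivial transient scope in the host's mount namespace.
host_has_systemd_run() {
    nsenter --mount="$HOST_MNT_NS" -- systemd-run --scope -- true >/dev/null 2>&1
}

# build_mount_cmd <yes|no> <mount arguments...>
# Prints the command line that would be executed.
build_mount_cmd() {
    have_systemd_run=$1
    shift
    if [ "$have_systemd_run" = "yes" ]; then
        # Run mount in a transient scope so the fuse daemon gets its own
        # cgroup on the host and survives a restart of the node container.
        echo "nsenter --mount=$HOST_MNT_NS -- systemd-run --scope -- /bin/mount $*"
    else
        # No systemd-run on the host: plain mount in the host's mount namespace.
        echo "nsenter --mount=$HOST_MNT_NS -- mount $*"
    fi
}
```

A caller would first decide the flag, e.g. 'if host_has_systemd_run; then have=yes; else have=no; fi', and then call 'build_mount_cmd "$have" -t glusterfs <what> <where>'.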

Comment 6 Jan Safranek 2017-09-11 08:48:15 UTC

*** This bug has been marked as a duplicate of bug 1472370 ***