Bug 1427820
| Summary: | systemd container umounts local mounts (of host) when /dev exported. | | |
|---|---|---|---|
| Product: | Red Hat Enterprise Linux 7 | Reporter: | Mohamed Ashiq <mliyazud> |
| Component: | systemd | Assignee: | systemd-maint |
| Status: | CLOSED NOTABUG | QA Contact: | qe-baseos-daemons |
| Severity: | high | Docs Contact: | |
| Priority: | high | | |
| Version: | 7.3 | CC: | dwalsh, gscrivan, hchiramm, msekleta, pprakash, rcyriac, sheggodu, systemd-maint-list |
| Target Milestone: | rc | | |
| Target Release: | --- | | |
| Hardware: | Unspecified | | |
| OS: | Unspecified | | |
| Whiteboard: | | | |
| Fixed In Version: | | Doc Type: | If docs needed, set a value |
| Doc Text: | | Story Points: | --- |
| Clone Of: | | Environment: | |
| Last Closed: | 2020-06-16 14:10:30 UTC | Type: | Bug |
| Regression: | --- | Mount Type: | --- |
| Documentation: | --- | CRM: | |
| Verified Versions: | | Category: | --- |
| oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- | Target Upstream Version: | |
| Embargoed: | | | |
| Bug Depends On: | | | |
| Bug Blocks: | 1427823 | | |
Description
Mohamed Ashiq
2017-03-01 10:21:51 UTC
My setup has /var, /var/log, /home, /boot, swap, and / (root) as separate filesystems. When I tried a plain systemd container I hit the agetty issue we had faced before: it consumed all the memory, so I lost the machine and could not retrieve anything from it. I then created an image with systemd and masked getty. Things I noticed:

* When the pod starts, the host's local mounts start unmounting.
* The issue is reproducible with free space left in all the filesystems.
* No service is failing.
* On another setup with no separate /var or /var/log, I mounted a small LV on /home; starting the container there did not cause any issue.

Findings: while mounting takes place in the container, systemd sends SIGTERM to systemd-journald. That is the exact point at which the node starts unmounting. There is also a service that runs only in these situations and looks suspicious: systemd-tmpfiles-setup-dev.service ("Started Create Static Device Nodes in /dev"). In some cases I saw SIGTERM (15) sent from systemd to all the services on the host. On the usual test setup, which has no local mounts for /var and /var/log, no SIGTERM is sent to systemd-journald. I hope this is helpful.

This is part of the Container Native Storage (CNS) initiative for OpenShift. For any kind of storage (e.g. Gluster or Ceph) which has udev dependencies, resolving this issue is important.

(In reply to Ju Lim from comment #5)
> This is part of the Container Native Storage (CNS) initiative for OpenShift.
> For any kind of storage (e.g. Gluster or Ceph) which has udev dependencies,
> resolving this issue is important.

Please point us to the docker images you are using in your Kubernetes cluster. If we are to do something about this problem, we need to reproduce it locally first.

Note that I found this repo that looks related:

https://github.com/gluster/gluster-kubernetes

Can I follow the "Quickstart" section from there to set this up in a vagrant VM?
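Since the bug only triggers when host paths such as /var sit on their own filesystems, it helps to verify that before attempting a reproduction. A minimal sketch, checking paths against /proc/self/mounts (the `is_separate_fs` helper name is ours, not from the report):

```shell
#!/bin/sh
# A path is a separate filesystem iff it appears as a mount point
# (second field) in /proc/self/mounts.
is_separate_fs() {
    awk -v p="$1" '$2 == p { found = 1 } END { exit !found }' /proc/self/mounts
}

for p in /var /var/log /home /boot /; do
    if is_separate_fs "$p"; then
        echo "$p: separate filesystem"
    else
        echo "$p: on parent filesystem"
    fi
done
```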
(In reply to Michal Sekletar from comment #6)
> Please point us to the docker images you are using in your Kubernetes cluster.
> If we are to do something about this problem, we need to reproduce it locally first.

Hi,

The issue is reproducible on plain docker. What has to be done is:

# docker run -d --privileged --net=host -v /dev:/dev -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /var/lib/glusterd:/var/lib/glusterd:z brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7:3.1.3-16

Image name: brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7:3.1.3-16

This is the last image without the workaround/hack. Basically, with any image containing systemd, and with the host having a separate filesystem on /var, the issue is reproducible. I tested with the above.

If you want our complete CNS setup, then yes, you are on the right path; just use the downstream image.

The same downstream bits are available here:
http://download-node-02.eng.bos.redhat.com/rel-eng/RHGS/3.1-u3-cns-RHEL-7/latest/Server-RH-Gluster-3-Server/x86_64/os/Packages/

They are also available live; you can follow the documentation here:
https://access.redhat.com/documentation/en-us/red_hat_gluster_storage/3.1/html/container-native_storage_for_openshift_container_platform/

Thanks for taking this up. Let me know if you need anything.

(In reply to Mohamed Ashiq from comment #8)
> The issue is reproducible on plain docker. [...]

Make sure you have port 24007 open on the host. I would prefer using a dockerfile with this content:

#######################
FROM rhel
ENV container docker
RUN systemctl mask getty.target
CMD ["/usr/sbin/init"]
#######################

# docker run -d -v /dev:/dev --privileged --net=host -v /sys/fs/cgroup:/sys/fs/cgroup:ro <IMAGENAME>

and the host's /var should be on a separately mounted filesystem from /.
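The minimal reproduction recipe above can be scripted as follows. A sketch only: the image tag `systemd-repro` is our own choice, and the docker steps are left as comments since they must run on a RHEL host whose /var is a separate filesystem:

```shell
#!/bin/sh
# Write the minimal reproducer Dockerfile described in the comment above.
cat > Dockerfile <<'EOF'
FROM rhel
ENV container docker
RUN systemctl mask getty.target
CMD ["/usr/sbin/init"]
EOF

# Then, on a host whose /var is a separately mounted filesystem:
#   docker build -t systemd-repro .
#   docker run -d -v /dev:/dev --privileged --net=host \
#       -v /sys/fs/cgroup:/sys/fs/cgroup:ro systemd-repro
```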
This should do it. The systemd version in use was systemd-219-19.el7_2.13.x86_64 when we hit the issue.

Can we run this container as a system container rather than as a container running under docker? If we did, could we do the whole thing as a chroot rather than using any kind of container technology, making sure the udev rules are shared with the host? The code would still need to differentiate between / and /host, but it would share everything else with the host and would not have to deal with the container runtime or things like the mount namespace. The only problem is that this could not be orchestrated via k8s; it would need to be installed using something like Ansible on each container node.

The container should not be able to mount/umount from /dev. Can you check what propagation flags are used for /dev? From the host:

# cat /proc/self/mountinfo | grep " /dev "
# cat /proc/$ID_OF_THE_init_PROCESS_IN_THE_CONTAINER/mountinfo | grep " /dev "

Also, and hopefully this will suffice to fix your problem: --privileged is too wide, and you probably don't want it. I recommend that you start without caps and instead add only the caps (--cap-add) that your container requires. For instance, systemd-tmpfiles-setup-dev.service, which is the service causing this issue, depends on CAP_SYS_MODULE. Does this container need to load kernel modules? You probably don't want that, so you can try adding --cap-drop=CAP_SYS_MODULE to your docker run.
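To interpret the mountinfo output the commands above produce, look at the optional fields between the mount options and the " - " separator: a `shared:N` tag means umount events on that mount propagate to its peer group (which is how the container can tear down host mounts), while the absence of any tag means the mount is private. A small sketch, assuming the mountinfo field layout documented in proc(5) (`propagation_of` is our own helper name, and the sample line is fabricated data, not from the report):

```shell
#!/bin/sh
# Extract the propagation tag (shared:N, master:N, ...) from a single
# /proc/<pid>/mountinfo line; optional fields start at field 7 and end
# at the literal "-" separator (see proc(5)).
propagation_of() {
    printf '%s\n' "$1" | awk '{
        for (i = 7; i <= NF && $i != "-"; i++)
            if ($i ~ /^(shared|master|propagate_from):/) { print $i; exit }
        print "private"
    }'
}

# Example line resembling a host /dev entry (sample data, not live):
line='25 1 0:6 / /dev rw,nosuid shared:2 - devtmpfs devtmpfs rw'
propagation_of "$line"    # prints: shared:2
```

If /dev shows up as `shared` in the container, umounts inside it can leak back to the host, which matches the reported symptom.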