Bug 1398235 - Local mounts from openshift nodes get unmounted while deploying glusterfs container
Summary: Local mounts from openshift nodes get unmounted while deploying glusterfs c...
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: CNS-deployment
Version: rhgs-3.1
Hardware: x86_64
OS: Linux
Priority: urgent
Severity: urgent
Target Milestone: ---
Target Release: CNS 3.4
Assignee: Raghavendra Talur
QA Contact: Prasanth
URL:
Whiteboard:
Duplicates: 1427823 (view as bug list)
Depends On:
Blocks: 1385247
 
Reported: 2016-11-24 10:29 UTC by Bipin Kunal
Modified: 2020-05-14 15:25 UTC (History)
CC: 19 users

Fixed In Version: rhgs-server-docker-3.1.3-17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-01-18 14:59:59 UTC
Embargoed:




Links
System ID Private Priority Status Summary Last Updated
Red Hat Bugzilla 1294776 0 unspecified CLOSED LVs for bricks get unmounted from atomic host automatically on starting RHGS container 2021-02-22 00:41:40 UTC
Red Hat Bugzilla 1359375 0 unspecified CLOSED [Scale Testing] nsenter: Unable to fork: Cannot allocate memory 2021-02-22 00:41:40 UTC
Red Hat Product Errata RHEA-2017:0149 0 normal SHIPPED_LIVE rhgs-server-docker bug fix and enhancement update 2017-01-18 20:08:41 UTC

Internal Links: 1294776 1359375

Description Bipin Kunal 2016-11-24 10:29:02 UTC
Description of problem:
Local mounts from openshift nodes get unmounted while deploying glusterfs container


Version-Release number of selected component (if applicable):

# oc version
oc v3.3.0.32
kubernetes v1.3.0+52492b4
features: Basic-Auth GSSAPI Kerberos SPNEGO

rhgs - 3.1.3

Actual results:
# oc logs -f glusterfs-dc-syy09946-1-deploy
Error from server: Get https://syy09946.<server>:10250/containerLogs/storage-project/glusterfs-dc-syy09946-1-deploy/deployment?follow=true: dial tcp <IP>:10250: getsockopt: connection refused



Running this leads to local filesystems being unmounted. The journal log contains:
Nov 22 10:56:09 syy09946 umount[58560]: umount: /home: target is busy.
Nov 22 10:56:09 syy09946 umount[58560]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58560]: the device is found by lsof(8) or fuser(1))
Nov 22 10:56:09 syy09946 umount[58565]: umount: /var/log: target is busy.
Nov 22 10:56:09 syy09946 umount[58565]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58565]: the device is found by lsof(8) or fuser(1))
Nov 22 10:56:09 syy09946 umount[58563]: umount: /usr: target is busy.
Nov 22 10:56:09 syy09946 umount[58563]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58563]: the device is found by lsof(8) or fuser(1))
Nov 22 10:56:09 syy09946 systemd[1]: Failed unmounting /tmp.
Nov 22 10:56:09 syy09946 umount[58566]: umount: /tmp: target is busy.
Nov 22 10:56:09 syy09946 umount[58566]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58566]: the device is found by lsof(8) or fuser(1))
Nov 22 10:56:09 syy09946 systemd[1]: Failed unmounting /var/log.
Nov 22 10:56:09 syy09946 umount[58569]: umount: /var/log: target is busy.
Nov 22 10:56:09 syy09946 umount[58569]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58569]: the device is found by lsof(8) or fuser(1))
Nov 22 10:56:09 syy09946 systemd[1]: Failed unmounting /var/itlm.
Nov 22 10:56:09 syy09946 umount[58564]: umount: /var/itlm: target is busy.
Nov 22 10:56:09 syy09946 umount[58564]: (In some cases useful info about processes that use
Nov 22 10:56:09 syy09946 umount[58564]: the device is found by lsof(8) or fuser(1))

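For reference, the mount points that systemd failed to unmount can be pulled out of a saved journal excerpt with a short sed command. This is a sketch assuming the log format shown above; /tmp/journal-excerpt.txt is a hypothetical saved copy of the journal, not a file from the affected node.

```shell
# Simulate a saved journal excerpt in the format shown above.
cat > /tmp/journal-excerpt.txt <<'EOF'
Nov 22 10:56:09 syy09946 systemd[1]: Failed unmounting /tmp.
Nov 22 10:56:09 syy09946 umount[58566]: umount: /tmp: target is busy.
Nov 22 10:56:09 syy09946 systemd[1]: Failed unmounting /var/log.
EOF
# Keep only the path from "Failed unmounting <path>." lines, deduplicated.
sed -n 's/.*Failed unmounting \(.*\)\./\1/p' /tmp/journal-excerpt.txt | sort -u
```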
Comment 27 Mohamed Ashiq 2016-12-02 12:39:09 UTC
Hi, 

The issue is reproducible when /var, or any subdirectory of /var, is a separate filesystem. The reason is that when we bind mount /var/log/glusterfs, docker actually mounts the filesystem backing /var (or the /var subdirectory) into the container (check the df -h output from the container below). From [1], the SIGTERM is normal: after switch-root the journal is restarted on the real log device (this information is taken from [2]). This SIGTERM causes the host to unmount all of its filesystems. The fix is as described in [3]: we need systemd to work inside the container and to clean up the services that are not really required. That way the container depends only on the things the rhgs container needs, avoiding the unnecessary dependencies.

# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/dm-11                            10G  305M  9.7G   3% /
/dev/mapper/rhel_dhcp43--16-var       15G  307M   15G   2% /run
devtmpfs                             3.9G     0  3.9G   0% /dev
shm                                   64M     0   64M   0% /dev/shm
/dev/mapper/rhel_dhcp43--16-root      38G  1.7G   37G   5% /etc/glusterfs
tmpfs                                3.9G  696K  3.9G   1% /run/lvm
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/rhel_dhcp43--16-var_log   10G  1.3G  8.8G  13% /var/log/glusterfs
tmpfs                                3.9G   16K  3.9G   1% /run/secrets/kubernetes.io/serviceaccount
tmpfs                                4.0E     0  4.0E   0% /tmp
tmpfs                                783M     0  783M   0% /run/user/0

[1] systemd-journald[89]: Received SIGTERM from PID 1 (systemd).
[2] https://lists.opensuse.org/opensuse-bugs/2015-02/msg00149.html
[3] http://developerblog.redhat.com/2014/05/05/running-systemd-within-docker-container/
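A minimal sketch of the cleanup approach from [3]. The unit names below are illustrative assumptions, not the actual rhgs-server-docker change; `systemctl mask <unit>` is equivalent to symlinking the unit file to /dev/null, which the sketch simulates against a scratch root instead of a real container image build.

```shell
# Sketch of the systemd cleanup from [3]. Unit names are assumptions;
# the real image may mask a different set. Masking a unit is equivalent
# to symlinking it to /dev/null under /etc/systemd/system, done here
# against a scratch root for illustration.
root=$(mktemp -d)
mkdir -p "$root/etc/systemd/system"
for unit in tmp.mount systemd-udevd.service getty.target; do
    ln -s /dev/null "$root/etc/systemd/system/$unit"
done
ls -l "$root/etc/systemd/system"
```

In an actual Dockerfile this would be a RUN step (or `systemctl mask` calls) so that systemd inside the container never tries to manage mounts it does not own.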

--Ashiq

Comment 39 Bipin Kunal 2016-12-13 04:30:26 UTC
Thanks Ashiq. I have provided the update to the customer and am waiting for their feedback.

Comment 45 Prasanth 2017-01-04 09:46:53 UTC
The reported umount issue when deploying gluster containers is no longer seen in the latest image: rhgs-server-docker-3.1.3-17.

Marking it as verified.

Comment 50 errata-xmlrpc 2017-01-18 14:59:59 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0149

Comment 51 Niels de Vos 2019-02-05 10:26:00 UTC
*** Bug 1427823 has been marked as a duplicate of this bug. ***

