Bug 1294776 - LVs for bricks get unmounted from atomic host automatically on starting RHGS container
Status: CLOSED ERRATA
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: rhgs-server-container
Version: 3.1
Hardware: Unspecified
OS: Unspecified
Priority: unspecified
Severity: high
Target Milestone: ---
Target Release: CNS 3.4
Assigned To: Mohamed Ashiq
QA Contact: Anoop
Keywords: ZStream
Depends On:
Blocks: 1268895 1385246
Reported: 2015-12-30 04:47 EST by Shruti Sampat
Modified: 2017-01-18 09:58 EST (History)
10 users

See Also:
Fixed In Version: rhgs-server-docker-3.1.3-17
Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of:
Environment:
Last Closed: 2017-01-18 09:58:59 EST
Type: Bug
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Category: ---
oVirt Team: ---
RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: ---


Attachments
Logs from /var/log/dmesg* (80.00 KB, application/x-tar)
2015-12-30 04:55 EST, Shruti Sampat
Image working details (4.06 KB, text/plain)
2016-01-11 23:46 EST, Mohamed Ashiq

Description Shruti Sampat 2015-12-30 04:47:14 EST
Description of problem:
-----------------------

LVs on the atomic host are formatted with XFS and mounted at /mnt/brick* so that they can be bind-mounted into RHGS containers and used as bricks. On starting the container, the LVs get automatically unmounted on the atomic host.
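(For reference, brick preparation on the host looks roughly like this; the VG/LV names mirror those seen in the transcripts below, but the exact commands are an illustration, not taken from this report:)

# lvcreate -L 10G -n vol1 RHGS_VG1
# mkfs.xfs -i size=512 /dev/RHGS_VG1/vol1
# mkdir -p /mnt/brick1
# mount /dev/RHGS_VG1/vol1 /mnt/brick1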

Version-Release number of selected component (if applicable):
-------------------------------------------------------------
rhgs-server-rhel7:3.1.2-3

How reproducible:
-----------------
Frequently

Steps to Reproduce:
-------------------
1. Install RHEL Atomic Host 7.2 and pull the RHGS container image.
2. Create the necessary directories for glusterfs (/etc/glusterfs, /var/lib/glusterd, /var/log/glusterfs). Prepare the LVs for use as glusterfs bricks and mount them on /mnt/brick* (see the sketch in the description above).
3. Load the dm_snapshot kernel module.
4. Start the container:

# docker run --privileged=True --net=host -d --name gnode -v /etc/glusterfs:/etc/glusterfs -v /var/lib/glusterd:/var/lib/glusterd -v /var/log/glusterfs:/var/log/glusterfs -v /mnt/brick1:/bricks/brick0 -v /mnt/brick2:/bricks/brick1 -v /mnt/brick3:/bricks/brick2 -v /mnt/brick4:/bricks/brick3 -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /dev:/dev -it rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-3 /sbin/init
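
A quick way to observe the failure (a suggested check, not part of the original report, using the brick mounts from step 2):

# findmnt /mnt/brick1   # mount is present before the container starts
# docker run ...        # the command above
# findmnt /mnt/brick1   # prints nothing once the LV has been unmounted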

Actual results:
----------------
On starting the container, the LVs mounted for use as bricks by the RHGS container get automatically unmounted from the atomic host.

Expected results:
-----------------
LVs on the atomic host should not get unmounted as a result of starting the container.
Comment 1 Shruti Sampat 2015-12-30 04:55 EST
Created attachment 1110468 [details]
Logs from /var/log/dmesg*
Comment 2 Humble Chirammal 2015-12-31 04:28:02 EST
Are you facing this issue when:

*) you export only one brick to the container?
*) you run this image http://docker-registry.usersys.redhat.com/#q=gluster/rhgs-3.1.0-3 instead of the RCM-built image?

I hope the above will help us isolate this issue.
Comment 5 Prasanth 2016-01-08 05:02:39 EST
This issue is seen in the latest image even when /dev is NOT exported to the container. See below:

######
-bash-4.2# cat /etc/redhat-release 
Red Hat Enterprise Linux Atomic Host release 7.2

-bash-4.2# docker images
REPOSITORY                                                               TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   latest              9e81beb5deac        2 days ago          255.3 MB

-bash-4.2# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp37--54-root  3.0G  933M  2.1G  31% /
devtmpfs                            3.9G     0  3.9G   0% /dev
tmpfs                               3.9G     0  3.9G   0% /dev/shm
tmpfs                               3.9G  8.5M  3.9G   1% /run
tmpfs                               3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                           297M   87M  211M  30% /boot
tmpfs                               783M     0  783M   0% /run/user/0
/dev/mapper/RHGS_VG1-vol1            10G   33M   10G   1% /var/mnt/brick1
/dev/mapper/RHGS_VG2-vol2            10G   33M   10G   1% /var/mnt/brick2

-bash-4.2# docker -D run -d --privileged=true --net=host --name snapnode3 -v /etc/glusterfs/:/etc/glusterfs/ -v /var/lib/glusterd/:/var/lib/glusterd/ -v /var/log/glusterfs/:/var/log/glusterfs/ -v /mnt/brick1:/b1 -v /mnt/brick2:/b2 -v /sys/fs/cgroup:/sys/fs/cgroup:ro rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7
fe0794f6d068a4045572e3a991ea1bdb10b951ede935d994f6edaa462cda4066
DEBU[0000] End of CmdRun(), Waiting for hijack to finish. 

-bash-4.2# docker ps
CONTAINER ID        IMAGE                                                                    COMMAND             CREATED             STATUS              PORTS               NAMES
fe0794f6d068        rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   "/usr/sbin/init"    2 seconds ago       Up 1 seconds                            snapnode3

-bash-4.2# df -h
Filesystem                          Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp37--54-root  3.0G  981M  2.1G  33% /
devtmpfs                            3.9G     0  3.9G   0% /dev
tmpfs                               3.9G     0  3.9G   0% /dev/shm
tmpfs                               3.9G  8.5M  3.9G   1% /run
tmpfs                               3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                           297M   78M  219M  27% /boot
tmpfs                               783M     0  783M   0% /run/user/0

-bash-4.2#  docker exec -ti fe0794f6d068 /bin/bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory

[root@dhcp37-54 /]# df -h
Filesystem                                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-12592722-fe0794f6d068a4045572e3a991ea1bdb10b951ede935d994f6edaa462cda4066  100G  292M  100G   1% /
tmpfs                                                                                               3.9G     0  3.9G   0% /dev
shm                                                                                                  64M     0   64M   0% /dev/shm
/dev/mapper/RHGS_VG2-vol2                                                                            10G   33M   10G   1% /b2
/dev/mapper/RHGS_VG1-vol1                                                                            10G   33M   10G   1% /b1
/dev/mapper/rhelah_dhcp37--54-root                                                                  3.0G  933M  2.1G  31% /etc/hosts
tmpfs                                                                                               3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                                                                               3.9G  8.3M  3.9G   1% /run
######

So any data written on the mount points inside the container (b1, b2) is not available or visible in the bricks (/mnt/brick1, /mnt/brick2) mounted on the atomic host. The user has to manually mount the LVs again to get at the data.

Is this expected behaviour that has to be documented, or is it really a bug in the recent systemd+docker builds in 7.2?

Also, let me know if you think we need to track this issue separately, as it happens even without /dev exported.
Comment 6 Mohamed Ashiq 2016-01-11 10:18:59 EST
I am planning to try the systemd fix mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=1294459#c7. I will get back soon with an update.
Comment 7 Mohamed Ashiq 2016-01-11 14:07:44 EST
As mentioned in the previous comment, I applied the fix [1] and it worked fine for me. I will build an image with the fix soon and wait for QE to verify it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1285863#c2
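
(Background, as an assumption rather than anything stated in the linked comments: this class of bug is commonly tied to mount propagation between the host and a privileged container running systemd, where unmount events in one mount namespace propagate to the other. The propagation state of a brick mount can be inspected on the host with:)

# grep /mnt/brick1 /proc/self/mountinfo   # "shared:N" in the optional fields means umount events propagate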
Comment 8 Mohamed Ashiq 2016-01-11 23:46 EST
Created attachment 1113851 [details]
Image working details
Comment 13 Shruti Sampat 2016-01-21 08:02:20 EST
Tested with rhgs-server-rhel7:3.1.2-6 image. This issue is still reproducible.

I started the container and the LVs that were mounted on the atomic host to be used as bricks got unmounted automatically. Moving BZ to ASSIGNED.
Comment 14 Mohamed Ashiq 2016-01-21 08:55:53 EST
(In reply to Shruti Sampat from comment #13)
> Tested with rhgs-server-rhel7:3.1.2-6 image. This issue is still
> reproducible.
> 
> I started the container and the LVs that were mounted on the atomic host to
> be used as bricks got unmounted automatically. Moving BZ to ASSIGNED.

In c#10, you mentioned that everything works perfectly with the image "docker pull docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:4" even across multiple iterations, yet with the new image the issue still exists. Can you please confirm that you are not facing any issues with the rhgs-3.1.2:4 image on the problematic setup? Also, how frequently are you hitting this issue with the new image? Are all the mount points unmounted from the atomic host? Can you provide the timestamp and logs for this issue? Also, when the issue occurs, are the bricks (which got unmounted from the atomic host) mounted inside the container? What happens if you try to stop the container?
Comment 16 Shruti Sampat 2016-01-22 04:30:12 EST
(In reply to Mohamed Ashiq from comment #14)
> (In reply to Shruti Sampat from comment #13)
> > Tested with rhgs-server-rhel7:3.1.2-6 image. This issue is still
> > reproducible.
> > 
> > I started the container and the LVs that were mounted on the atomic host to
> > be used as bricks got unmounted automatically. Moving BZ to ASSIGNED.
> 
> In c#10, you mentioned that everything works perfectly with the image
> "docker pull docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:4" even
> across multiple iterations, yet with the new image the issue still exists.
> Can you please confirm that you are not facing any issues with the
> rhgs-3.1.2:4 image on the problematic setup?

Tried that. I am not facing any issues with docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:4

> Also, how frequently are you hitting this issue with the new image?

I tried twice and saw it both times.

> Are all the mount points unmounted from the atomic host?

Yes, all LVs were unmounted except for the root LV. However, the LVs bind-mounted into the container were successfully mounted inside it.

> Can you provide the timestamp and logs for this issue?

Can you tell me what logs you are looking for?

> Also, when the issue occurs, are the bricks (which got unmounted from the
> atomic host) mounted inside the container? What happens if you try to stop
> the container?

The `docker stop' command has now been hung for more than 12 hours. After interrupting it and restarting the docker service, I see that the container is stopped.
Comment 17 Prasanth 2016-02-16 08:48:52 EST
I'm still seeing this issue on a couple of machines, even with the latest RHELAH 7.2.2 and rhgs-server-docker-3.1.2-7 builds.

######
-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--184-root  3.0G  1.8G  1.3G  59% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  464K  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/RHGS_VG4-vol4             10G   33M   10G   1% /var/mount/brick4
/dev/mapper/RHGS_VG1-vol1             10G   33M   10G   1% /var/mount/brick1
/dev/mapper/RHGS_VG3-vol3             10G   33M   10G   1% /var/mount/brick3
/dev/mapper/RHGS_VG2-vol2             10G   33M   10G   1% /var/mount/brick2
/dev/sda1                            297M  144M  154M  49% /boot
tmpfs                                783M     0  783M   0% /run/user/0


-bash-4.2# docker images
REPOSITORY                                                               TAG                 IMAGE ID            CREATED             VIRTUAL SIZE
rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   3.1.2-7             66e18f43649a        5 days ago          257.2 MB


-bash-4.2# rpm-ostree status
  TIMESTAMP (UTC)         VERSION   ID             OSNAME               REFSPEC                                                        
* 2016-02-12 16:43:35     7.2.2     a903629278     rhel-atomic-host     rhel7.2.2:rhel-atomic-host/7/x86_64/standard                   
  2015-12-03 19:40:36     7.2.1     aaf67b91fa     rhel-atomic-host     rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard  


-bash-4.2# sudo docker -D run -d --privileged=true --net=host --name node4 -v /etc/glusterfs/:/etc/glusterfs/ -v /var/lib/glusterd/:/var/lib/glusterd/ -v /var/log/glusterfs/:/var/log/glusterfs/ -v /var/mount/brick1:/rhgs/b1 -v /var/mount/brick2:/rhgs/b2 -v /var/mount/brick3:/rhgs/b3 -v /var/mount/brick4:/rhgs/b4 -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /dev:/dev rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-7
7a4d51e6a90843ac6e86d7a8f951a7e8c52aa054342fe94fe917096b8e47eeab
DEBU[0015] End of CmdRun(), Waiting for hijack to finish. 


-bash-4.2# docker ps
CONTAINER ID        IMAGE                                                                            COMMAND             CREATED              STATUS              PORTS               NAMES
7a4d51e6a908        rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-7   "/usr/sbin/init"    About a minute ago   Up 53 seconds                           node4


-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--184-root  3.0G  1.8G  1.3G  59% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  504K  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            297M  144M  154M  49% /boot
tmpfs                                783M     0  783M   0% /run/user/0
-bash-4.2# 
-bash-4.2# 
-bash-4.2# docker exec -ti 7a4d51e6a908 /bin/bash


[root@dhcp42-184 /]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/dm-25                           100G  294M  100G   1% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
/dev/mapper/RHGS_VG2-vol2             10G   33M   10G   1% /rhgs/b2
/dev/mapper/RHGS_VG4-vol4             10G   33M   10G   1% /rhgs/b4
/dev/mapper/RHGS_VG3-vol3             10G   33M   10G   1% /rhgs/b3
/dev/mapper/rhelah_dhcp42--184-root  3.0G  1.8G  1.3G  59% /etc/hosts
/dev/mapper/RHGS_VG1-vol1             10G   33M   10G   1% /rhgs/b1
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                3.9G  8.4M  3.9G   1% /run
########

Can you please check and confirm whether the fix for the reported issue is actually available in the provided RHEL Atomic Host 7.2.2 OSTree compose: http://download.eng.bos.redhat.com/rel-eng/Atomic-7.2-tree-20160212.0/ostree/repo/
Comment 19 Prasanth 2016-02-17 00:20:05 EST
Moving back to Assigned as this issue is seen in multiple setups.
Comment 20 Lukáš Nykrýn 2016-02-17 02:17:54 EST
Can you boot the machine (the host) with the word "debug" on the kernel cmdline, reproduce the issue, and post the whole output of journalctl -b here?
Comment 23 Prasanth 2016-02-17 03:54:04 EST
(In reply to Lukáš Nykrýn from comment #20)
> Can you boot the machine (the host) with the word "debug" on the kernel
> cmdline, reproduce the issue, and post the whole output of journalctl -b here?

OK, I'll do as suggested and get back to you with the results once I reproduce the issue.
Comment 24 Mohamed Ashiq 2016-02-17 06:29:09 EST
Issue:

When we start the container with a few brick mounts (LV mounts for bricks) bind-mounted from the Atomic Host, they sometimes get automatically unmounted on the Atomic Host, which causes problems when re-spawning the container.

Workaround:

After starting the RHGS container, check whether the brick mounts still exist on the Atomic Host. If the mounts are not found, remount the mount points on the Atomic Host, as sketched below.
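
A minimal sketch of that check-and-remount step, reusing the VG/LV names from the transcripts above (the loop is an illustration, not part of the original workaround):

# for i in 1 2 3 4; do mountpoint -q /mnt/brick$i || mount /dev/RHGS_VG$i/vol$i /mnt/brick$i; done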
Comment 25 Lukáš Nykrýn 2016-02-17 07:11:22 EST
Also, journalctl -m might be useful.
Comment 32 Mohamed Ashiq 2016-02-19 02:26:09 EST
Thanks Laura, looks good.
Comment 41 krishnaram Karthick 2016-12-29 07:22:37 EST
The issue reported in this bug is no longer seen with the latest gluster container image.

-bash-4.2# atomic --version
1.13.8

-bash-4.2# docker images
REPOSITORY                                                                     TAG                 IMAGE ID            CREATED             SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7   latest              1ada64c346f7        2 weeks ago         245.8 MB

-bash-4.2# docker ps
CONTAINER ID        IMAGE                                                                          COMMAND             CREATED             STATUS              PORTS               NAMES
a4ac7bcd35c3        brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7   "/usr/sbin/init"    8 minutes ago       Up 8 minutes                            glusternode1
-bash-4.2# df -h
Filesystem                      Size  Used Avail Use% Mounted on
/dev/mapper/rhelah-root         3.0G  1.3G  1.8G  42% /
devtmpfs                        7.8G     0  7.8G   0% /dev
tmpfs                           7.8G     0  7.8G   0% /dev/shm
tmpfs                           7.8G  596K  7.8G   1% /run
tmpfs                           7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda1                       297M   92M  206M  31% /boot
tmpfs                           1.6G     0  1.6G   0% /run/user/0
/dev/mapper/RHS_vg0-RHS_vg0_lv  9.0G   33M  9.0G   1% /var/mnt/brick1
-bash-4.2# docker exec -it a4ac7bcd35c3 /bin/bash
[root@dhcp47-130 /]# df -h
Filesystem                                                                                         Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-5137702-fc15fd45ae050d764a92ed0d07dc7fc4548f889cb70e1a97191ffa99ea73b2da   10G  300M  9.7G   3% /
tmpfs                                                                                              7.8G     0  7.8G   0% /dev
/dev/mapper/rhelah-root                                                                            3.0G  1.3G  1.8G  42% /run
/dev/mapper/RHS_vg0-RHS_vg0_lv                                                                     9.0G   33M  9.0G   1% /mnt/container_brick1
shm                                                                                                 64M     0   64M   0% /dev/shm
tmpfs                                                                                              7.8G     0  7.8G   0% /sys/fs/cgroup
tmpfs                                                                                              4.0E     0  4.0E   0% /tmp


Able to configure containerized gluster clusters, create and mount volumes, and run I/O.
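
(The verification was of this general shape; the volume name and mount point below are illustrative, not the exact commands run:)

# gluster volume create testvol dhcp47-130:/mnt/container_brick1/b1
# gluster volume start testvol
# mount -t glusterfs dhcp47-130:/testvol /mnt/testvol

Run I/O on /mnt/testvol and confirm the brick stays mounted on the host.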

Moving the bug to verified based on the above result.
Comment 42 krishnaram Karthick 2017-01-02 00:58:58 EST
Removing the needinfo flag based on comment 41.
Comment 44 errata-xmlrpc 2017-01-18 09:58:59 EST
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0149
