Bug 1294776
| Summary: | LVs for bricks get unmounted from atomic host automatically on starting RHGS container |
|---|---|
| Product: | [Red Hat Storage] Red Hat Gluster Storage |
| Component: | rhgs-server-container |
| Version: | rhgs-3.1 |
| Status: | CLOSED ERRATA |
| Severity: | high |
| Priority: | unspecified |
| Reporter: | Shruti Sampat <ssampat> |
| Assignee: | Mohamed Ashiq <mliyazud> |
| QA Contact: | Anoop <annair> |
| CC: | annair, hchiramm, kramdoss, lnykryn, madam, mliyazud, pprakash, rcyriac, sankarshan, ssampat |
| Keywords: | ZStream |
| Target Milestone: | --- |
| Target Release: | CNS 3.4 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Fixed In Version: | rhgs-server-docker-3.1.3-17 |
| Doc Type: | Bug Fix |
| Type: | Bug |
| Last Closed: | 2017-01-18 14:58:59 UTC |
| Bug Blocks: | 1268895, 1385246 |
Description
Shruti Sampat
2015-12-30 09:47:14 UTC
Created attachment 1110468 [details]
Logs from /var/log/dmesg*
Are you facing this issue when:

*) you export only one brick to the container?
*) you run this image http://docker-registry.usersys.redhat.com/#q=gluster/rhgs-3.1.0-3 instead of the RCM-built image?

I hope the above will help us isolate this issue.

This issue is seen in the latest image even when /dev is NOT exported to the container. See below:

```
-bash-4.2# cat /etc/redhat-release
Red Hat Enterprise Linux Atomic Host release 7.2

-bash-4.2# docker images
REPOSITORY                                                               TAG      IMAGE ID       CREATED      VIRTUAL SIZE
rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   latest   9e81beb5deac   2 days ago   255.3 MB

-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp37--54-root   3.0G  933M  2.1G  31% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  8.5M  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            297M   87M  211M  30% /boot
tmpfs                                783M     0  783M   0% /run/user/0
/dev/mapper/RHGS_VG1-vol1             10G   33M   10G   1% /var/mnt/brick1
/dev/mapper/RHGS_VG2-vol2             10G   33M   10G   1% /var/mnt/brick2

-bash-4.2# docker -D run -d --privileged=true --net=host --name snapnode3 \
    -v /etc/glusterfs/:/etc/glusterfs/ -v /var/lib/glusterd/:/var/lib/glusterd/ \
    -v /var/log/glusterfs/:/var/log/glusterfs/ \
    -v /mnt/brick1:/b1 -v /mnt/brick2:/b2 \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7
fe0794f6d068a4045572e3a991ea1bdb10b951ede935d994f6edaa462cda4066
DEBU[0000] End of CmdRun(), Waiting for hijack to finish.

-bash-4.2# docker ps
CONTAINER ID   IMAGE                                                                     COMMAND            CREATED         STATUS         PORTS   NAMES
fe0794f6d068   rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   "/usr/sbin/init"   2 seconds ago   Up 1 seconds           snapnode3

-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp37--54-root   3.0G  981M  2.1G  33% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  8.5M  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            297M   78M  219M  27% /boot
tmpfs                                783M     0  783M   0% /run/user/0

-bash-4.2# docker exec -ti fe0794f6d068 /bin/bash
bash: warning: setlocale: LC_CTYPE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_COLLATE: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_MESSAGES: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_NUMERIC: cannot change locale (en_US.UTF-8): No such file or directory
bash: warning: setlocale: LC_TIME: cannot change locale (en_US.UTF-8): No such file or directory

[root@dhcp37-54 /]# df -h
Filesystem                                                                                           Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-12592722-fe0794f6d068a4045572e3a991ea1bdb10b951ede935d994f6edaa462cda4066   100G  292M  100G   1% /
tmpfs                                                                                                3.9G     0  3.9G   0% /dev
shm                                                                                                   64M     0   64M   0% /dev/shm
/dev/mapper/RHGS_VG2-vol2                                                                             10G   33M   10G   1% /b2
/dev/mapper/RHGS_VG1-vol1                                                                             10G   33M   10G   1% /b1
/dev/mapper/rhelah_dhcp37--54-root                                                                   3.0G  933M  2.1G  31% /etc/hosts
tmpfs                                                                                                3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                                                                                3.9G  8.3M  3.9G   1% /run
```

So any data written on the mount points inside the container (b1, b2) is not available or visible in the bricks (/mnt/brick1, /mnt/brick2) mounted on the atomic host. The user has to manually mount the LVs again to get the data. Is that the expected behaviour, which would have to be documented, or is it really a bug in the recent systemd+docker builds in 7.2? Also let me know if you think we need to track this issue separately, as it happens even without /dev exported.
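For quick regression checks of this behaviour, the before/after comparison of the host's brick mounts can be scripted. This is an illustrative sketch only; the container name, image, and brick paths follow the transcript above and are not part of the original report:

```bash
#!/bin/bash
# Snapshot the brick mount points on the Atomic Host, start the
# container, and report any mounts that disappeared from the host.
before=$(awk '$2 ~ "^/var/mnt/brick" {print $2}' /proc/mounts | sort)

docker run -d --privileged=true --net=host --name snapnode3 \
    -v /mnt/brick1:/b1 -v /mnt/brick2:/b2 \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7

sleep 15  # give systemd inside the container time to come up

after=$(awk '$2 ~ "^/var/mnt/brick" {print $2}' /proc/mounts | sort)
if [ "$before" != "$after" ]; then
    echo "bug reproduced: brick mounts changed on the host" >&2
    diff <(printf '%s\n' "$before") <(printf '%s\n' "$after")
else
    echo "brick mounts intact on the host"
fi
```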
I am planning to try the systemd fix mentioned at https://bugzilla.redhat.com/show_bug.cgi?id=1294459#c7. I will get back soon with an update.

As mentioned in the previous comment, I applied the fix [1] and it worked fine for me. I will build an image with the fix soon and wait for QE to verify it.

[1] https://bugzilla.redhat.com/show_bug.cgi?id=1285863#c2

Created attachment 1113851 [details]
Image working details
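The systemd fix referenced above presumably relates to how mounts propagate between the host and the container's mount namespace, where systemd inside a privileged container can tear down mounts shared back to the host. One diagnostic (a sketch added here for illustration, not from the original report) is to read the propagation flags of the brick mounts directly from /proc/self/mountinfo on the host:

```bash
# On the Atomic Host: the optional fields show "shared:N" for mounts
# that propagate, "master:N" for slave mounts, and nothing for private.
grep ' /var/mnt/brick' /proc/self/mountinfo
```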
Tested with the rhgs-server-rhel7:3.1.2-6 image. This issue is still reproducible. I started the container and the LVs that were mounted on the atomic host to be used as bricks got unmounted automatically. Moving BZ to ASSIGNED.

(In reply to Shruti Sampat from comment #13)
> Tested with rhgs-server-rhel7:3.1.2-6 image. This issue is still reproducible.
>
> I started the container and the LVs that were mounted on the atomic host to be used as bricks got unmounted automatically. Moving BZ to ASSIGNED.

In c#10, you mentioned everything is working perfectly with the image "docker pull docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:4", even across multiple iterations, yet with the new image the issue still exists. Can you please confirm you are not facing any issues with the rhgs-3.1.2:4 image on the problematic setup? Also, how frequently are you hitting this issue with the new image? Are all the mount points unmounted from the atomic host? Can you provide the timestamp and log of this issue? Also, when this issue occurs, are the bricks (which got unmounted from the atomic host) mounted inside the container? What happens if you try to stop the container?

(In reply to Mohamed Ashiq from comment #14)
> Can you please confirm you are not facing any issues with rhgs-3.1.2:4 image in the problematic setup?

Tried that. I am not facing any issues with docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:4.

> Also, how frequent you are hitting this issue with the new image?

I tried twice and saw it both times.

> Are all the mount points unmounted from the atomic host?

Yes, all LVs were unmounted except for the root LV. However, the LVs bind-mounted on the container were successfully mounted in the container.

> can you provide the timestamp and log of this issue?

Can you tell me what logs you are looking for?

> Also, when this issue occurs are the bricks (which got unmounted from atomic host) mounted inside the container? What happens if you try to stop the container?

The `docker stop` command has now been hung for more than 12 hours. After interrupting it and restarting the docker service, I see that the container is stopped.

I'm still seeing this issue on a couple of machines even with the latest RHELAH 7.2.2 and rhgs-server-docker-3.1.2-7 builds.
```
-bash-4.2# df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--184-root   3.0G  1.8G  1.3G  59% /
devtmpfs                              3.9G     0  3.9G   0% /dev
tmpfs                                 3.9G     0  3.9G   0% /dev/shm
tmpfs                                 3.9G  464K  3.9G   1% /run
tmpfs                                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/mapper/RHGS_VG4-vol4              10G   33M   10G   1% /var/mount/brick4
/dev/mapper/RHGS_VG1-vol1              10G   33M   10G   1% /var/mount/brick1
/dev/mapper/RHGS_VG3-vol3              10G   33M   10G   1% /var/mount/brick3
/dev/mapper/RHGS_VG2-vol2              10G   33M   10G   1% /var/mount/brick2
/dev/sda1                             297M  144M  154M  49% /boot
tmpfs                                 783M     0  783M   0% /run/user/0

-bash-4.2# docker images
REPOSITORY                                                               TAG       IMAGE ID       CREATED      VIRTUAL SIZE
rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7   3.1.2-7   66e18f43649a   5 days ago   257.2 MB

-bash-4.2# rpm-ostree status
TIMESTAMP (UTC)         VERSION   ID           OSNAME             REFSPEC
* 2016-02-12 16:43:35   7.2.2     a903629278   rhel-atomic-host   rhel7.2.2:rhel-atomic-host/7/x86_64/standard
  2015-12-03 19:40:36   7.2.1     aaf67b91fa   rhel-atomic-host   rhel-atomic-host-ostree:rhel-atomic-host/7/x86_64/standard

-bash-4.2# sudo docker -D run -d --privileged=true --net=host --name node4 \
    -v /etc/glusterfs/:/etc/glusterfs/ -v /var/lib/glusterd/:/var/lib/glusterd/ \
    -v /var/log/glusterfs/:/var/log/glusterfs/ \
    -v /var/mount/brick1:/rhgs/b1 -v /var/mount/brick2:/rhgs/b2 \
    -v /var/mount/brick3:/rhgs/b3 -v /var/mount/brick4:/rhgs/b4 \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /dev:/dev \
    rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-7
7a4d51e6a90843ac6e86d7a8f951a7e8c52aa054342fe94fe917096b8e47eeab
DEBU[0015] End of CmdRun(), Waiting for hijack to finish.

-bash-4.2# docker ps
CONTAINER ID   IMAGE                                                                             COMMAND            CREATED              STATUS          PORTS   NAMES
7a4d51e6a908   rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-7   "/usr/sbin/init"   About a minute ago   Up 53 seconds           node4

-bash-4.2# df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--184-root   3.0G  1.8G  1.3G  59% /
devtmpfs                              3.9G     0  3.9G   0% /dev
tmpfs                                 3.9G     0  3.9G   0% /dev/shm
tmpfs                                 3.9G  504K  3.9G   1% /run
tmpfs                                 3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                             297M  144M  154M  49% /boot
tmpfs                                 783M     0  783M   0% /run/user/0

-bash-4.2# docker exec -ti 7a4d51e6a908 /bin/bash
[root@dhcp42-184 /]# df -h
Filesystem                            Size  Used Avail Use% Mounted on
/dev/dm-25                            100G  294M  100G   1% /
devtmpfs                              3.9G     0  3.9G   0% /dev
tmpfs                                 3.9G     0  3.9G   0% /dev/shm
/dev/mapper/RHGS_VG2-vol2              10G   33M   10G   1% /rhgs/b2
/dev/mapper/RHGS_VG4-vol4              10G   33M   10G   1% /rhgs/b4
/dev/mapper/RHGS_VG3-vol3              10G   33M   10G   1% /rhgs/b3
/dev/mapper/rhelah_dhcp42--184-root   3.0G  1.8G  1.3G  59% /etc/hosts
/dev/mapper/RHGS_VG1-vol1              10G   33M   10G   1% /rhgs/b1
tmpfs                                 3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                 3.9G  8.4M  3.9G   1% /run
```

Can you please check and confirm whether the fix for the reported issue is actually available in the provided RHEL Atomic Host 7.2.2 OSTree compose: http://download.eng.bos.redhat.com/rel-eng/Atomic-7.2-tree-20160212.0/ostree/repo/

Moving back to ASSIGNED, as this issue is seen on multiple setups.

Can you boot the machine (the host) with the word "debug" on the kernel cmdline, reproduce the issue, and post here the whole output of journalctl -b?

(In reply to Lukáš Nykrýn from comment #20)
> Can you boot the machine (the host), with the word "debug" on the kernel cmdline, reproduce the issue and post here the whole output of journalctl -b?

OK, I'll do as suggested and get back to you with the results once I reproduce the issue.

Issue: when we start the container with a few brick mounts (LV mounts for bricks) on the Atomic Host as bind mounts, they sometimes get unmounted from the Atomic Host automatically, which causes problems on re-spawning the container.

Workaround: after starting the RHGS container, check whether the bricks' mounts still exist on the Atomic Host. If any mounts are not found, remount the mount points on the Atomic Host, as in the sketch below.
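A minimal sketch of that workaround check; the device names and mount points are taken from the transcripts above and would differ per setup:

```bash
#!/bin/bash
# Remount any brick LVs that were unmounted from the Atomic Host
# after the RHGS container started.
declare -A bricks=(
    [/dev/mapper/RHGS_VG1-vol1]=/var/mount/brick1
    [/dev/mapper/RHGS_VG2-vol2]=/var/mount/brick2
    [/dev/mapper/RHGS_VG3-vol3]=/var/mount/brick3
    [/dev/mapper/RHGS_VG4-vol4]=/var/mount/brick4
)
for dev in "${!bricks[@]}"; do
    mnt=${bricks[$dev]}
    if ! mountpoint -q "$mnt"; then
        echo "remounting $dev on $mnt"
        mount "$dev" "$mnt"
    fi
done
```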
Also, journalctl -m might be useful.

Thanks Laura, looks good.

The issue reported in this bug is no longer seen with the latest gluster container image.

```
-bash-4.2# atomic --version
1.13.8

-bash-4.2# docker images
REPOSITORY                                                                      TAG      IMAGE ID       CREATED       SIZE
brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7    latest   1ada64c346f7   2 weeks ago   245.8 MB

-bash-4.2# docker ps
CONTAINER ID   IMAGE                                                                           COMMAND            CREATED         STATUS         PORTS   NAMES
a4ac7bcd35c3   brew-pulp-docker01.web.prod.ext.phx2.redhat.com:8888/rhgs3/rhgs-server-rhel7   "/usr/sbin/init"   8 minutes ago   Up 8 minutes           glusternode1

-bash-4.2# df -h
Filesystem                       Size  Used Avail Use% Mounted on
/dev/mapper/rhelah-root          3.0G  1.3G  1.8G  42% /
devtmpfs                         7.8G     0  7.8G   0% /dev
tmpfs                            7.8G     0  7.8G   0% /dev/shm
tmpfs                            7.8G  596K  7.8G   1% /run
tmpfs                            7.8G     0  7.8G   0% /sys/fs/cgroup
/dev/sda1                        297M   92M  206M  31% /boot
tmpfs                            1.6G     0  1.6G   0% /run/user/0
/dev/mapper/RHS_vg0-RHS_vg0_lv   9.0G   33M  9.0G   1% /var/mnt/brick1

-bash-4.2# docker exec -it a4ac7bcd35c3 /bin/bash
[root@dhcp47-130 /]# df -h
Filesystem                                                                                          Size  Used Avail Use% Mounted on
/dev/mapper/docker-253:0-5137702-fc15fd45ae050d764a92ed0d07dc7fc4548f889cb70e1a97191ffa99ea73b2da    10G  300M  9.7G   3% /
tmpfs                                                                                               7.8G     0  7.8G   0% /dev
/dev/mapper/rhelah-root                                                                             3.0G  1.3G  1.8G  42% /run
/dev/mapper/RHS_vg0-RHS_vg0_lv                                                                      9.0G   33M  9.0G   1% /mnt/container_brick1
shm                                                                                                  64M     0   64M   0% /dev/shm
tmpfs                                                                                               7.8G     0  7.8G   0% /sys/fs/cgroup
tmpfs                                                                                               4.0E     0  4.0E   0% /tmp
```

I was able to configure containerized gluster clusters, create and mount the volume, and run I/Os. Moving the bug to VERIFIED based on the above result.

Removing the needinfo flag based on comment 41.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report.

https://access.redhat.com/errata/RHEA-2017:0149