| Summary: | [rhgs docker image] Bind-mounting of gluster bricks is not always successful |
|---|---|
| Product: | Red Hat Gluster Storage |
| Reporter: | Prasanth <pprakash> |
| Component: | rhgs-server-container |
| Assignee: | Mohamed Ashiq <mliyazud> |
| Status: | CLOSED CURRENTRELEASE |
| QA Contact: | Prasanth <pprakash> |
| Severity: | high |
| Docs Contact: | |
| Priority: | medium |
| Version: | rhgs-3.1 |
| CC: | asrivast, hchiramm, mliyazud, pprakash, rcyriac, sankarshan, ssampat |
| Target Milestone: | --- |
| Keywords: | TestBlocker, ZStream |
| Target Release: | RHGS 3.1.2 |
| Hardware: | Unspecified |
| OS: | Unspecified |
| Whiteboard: | |
| Fixed In Version: | rhgs-server-docker-3.1.2-6 |
| Doc Type: | Bug Fix |
| Doc Text: | Cause: a systemd issue (https://bugzilla.redhat.com/show_bug.cgi?id=1285863). Consequence: bind mounts from the Atomic host into the RHGS container did not always succeed. Fix: systemd-sysv-219-19.el7_2.4.x86_64, systemd-219-19.el7_2.4.x86_64, systemd-libs-219-19.el7_2.4.x86_64. Result: bind mounts from the Atomic host into the container now succeed consistently. |
| Story Points: | --- |
| Clone Of: | |
| : | 1294459 (view as bug list) |
| Environment: | |
| Last Closed: | 2016-12-19 17:31:05 UTC |
| Type: | Bug |
| Regression: | --- |
| Mount Type: | --- |
| Documentation: | --- |
| CRM: | |
| Verified Versions: | |
| Category: | --- |
| oVirt Team: | --- |
| RHEL 7.3 requirements from Atomic Host: | |
| Cloudforms Team: | --- |
| Target Upstream Version: | |
| Bug Depends On: | 1294459 |
| Bug Blocks: | |
| Attachments: | 1113850: Image working details |
**Description** (Prasanth, 2015-12-22 16:07:20 UTC)
>> *) In the "docker run" command (the same command mentioned on page 27, but with the extra switch "-v /dev:/dev"), we also have to export the "/dev/" tree as a volume to get snapshots working inside the RHGS containers.
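For illustration, a run command of that shape might look like the sketch below; the container name and image reference are placeholders, and the mounts mirror the ones used later in this report, so this is not the exact command from page 27:
######
# Sketch of the documented run command plus the "-v /dev:/dev" work-around.
# "snapnode1" and the image name are placeholders, not values from the guide.
docker run -d --privileged=true --net=host --name snapnode1 \
    -v /etc/glusterfs/:/etc/glusterfs/ \
    -v /var/lib/glusterd/:/var/lib/glusterd/ \
    -v /var/log/glusterfs/:/var/log/glusterfs/ \
    -v /var/mount/brick1:/b1 \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /dev:/dev \
    rhgs3/rhgs-server-rhel7
######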
Without the above work-around, bind-mounting of the bricks succeeds every time. However, if I then try to create a snapshot, it fails:
######
[root@dhcp37-150 /]# gluster snapshot create snap1 dock-vol1 no-timestamp
snapshot create: failed: Commit failed on localhost. Please check log file for details.
Snapshot command failed
######
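For context, gluster volume snapshots are backed by LVM thin snapshots of the brick LVs, which is why the container needs the host's /dev tree. Roughly, the backend steps look like the sketch below; the LV and mount names are taken from the log messages that follow, and the exact glusterd invocation differs:
######
# Illustrative sketch of what glusterd does on snapshot commit, assuming the
# brick LV is thinly provisioned (a prerequisite for gluster snapshots).
# Names come from the logs below; this is not the exact glusterd command set.
lvcreate -s RHGS_VG1/vol1 -n 70ce8446fac8462199ea46fe002231b5_0
mkdir -p /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2
# glusterd also relabels the snapshot filesystem, then mounts it; it is this
# mount of the snapshot device that fails with "Bad file descriptor" below.
mount /dev/RHGS_VG1/70ce8446fac8462199ea46fe002231b5_0 \
      /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2
######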
From the logs:
############
[2015-12-23 02:21:51.452399] I [MSGID: 106057] [glusterd-snapshot.c:6218:glusterd_do_snap_cleanup] 0-management: Snapshot (snap1) does not exist [Invalid argument]
[2015-12-23 05:02:17.425202] I [MSGID: 106057] [glusterd-snapshot.c:6218:glusterd_do_snap_cleanup] 0-management: Snapshot (snap1) does not exist [Invalid argument]
[2015-12-23 05:02:23.621146] E [MSGID: 106058] [glusterd-snapshot.c:5050:glusterd_update_fs_label] 0-management: Failed to change filesystem label of /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2/brick1 brick to 9152e4b2895a
[2015-12-23 05:02:23.621185] E [MSGID: 106058] [glusterd-snapshot.c:5110:glusterd_take_brick_snapshot] 0-management: Failed to update file-system label for /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2/brick1 brick
[2015-12-23 05:02:23.639089] E [MSGID: 106098] [glusterd-snapshot-utils.c:2710:glusterd_mount_lvm_snapshot] 0-management: mounting the snapshot logical device /dev/RHGS_VG1/70ce8446fac8462199ea46fe002231b5_0 failed (error: Bad file descriptor)
[2015-12-23 05:02:23.639137] E [MSGID: 106059] [glusterd-snapshot.c:4767:glusterd_snap_brick_create] 0-management: Failed to mount lvm snapshot.
[2015-12-23 05:02:23.639146] W [MSGID: 106055] [glusterd-snapshot.c:4798:glusterd_snap_brick_create] 0-management: unmounting the snap brick mount /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2
[2015-12-23 05:02:23.647179] E [MSGID: 106095] [glusterd-snapshot-utils.c:3362:glusterd_umount] 0-management: umounting /run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2 failed (Bad file descriptor) [Bad file descriptor]
[2015-12-23 05:02:23.647225] E [MSGID: 106050] [glusterd-snapshot.c:5124:glusterd_take_brick_snapshot] 0-management: not able to create the brick for the snap snap1, volume 70ce8446fac8462199ea46fe002231b5
[2015-12-23 05:02:23.647235] E [MSGID: 106030] [glusterd-snapshot.c:6322:glusterd_take_brick_snapshot_task] 0-management: Failed to take backend snapshot for brick 10.70.37.114:/run/gluster/snaps/70ce8446fac8462199ea46fe002231b5/brick2/brick1 volume(70ce8446fac8462199ea46fe002231b5)
[2015-12-23 05:02:23.647293] E [MSGID: 106030] [glusterd-snapshot.c:6464:glusterd_schedule_brick_snapshot] 0-management: Failed to create snapshot
[2015-12-23 05:02:23.647306] E [MSGID: 106030] [glusterd-snapshot.c:6790:glusterd_snapshot_create_commit] 0-management: Failed to take backend snapshot snap1
[2015-12-23 05:02:23.648241] E [MSGID: 106030] [glusterd-snapshot.c:8132:glusterd_snapshot] 0-management: Failed to create snapshot
[2015-12-23 05:02:23.648259] W [MSGID: 106123] [glusterd-mgmt.c:272:gd_mgmt_v3_commit_fn] 0-management: Snapshot Commit Failed
[2015-12-23 05:02:23.648270] E [MSGID: 106123] [glusterd-mgmt.c:1414:glusterd_mgmt_v3_commit] 0-management: Commit failed for operation Snapshot on local node
[2015-12-23 05:02:23.648278] E [MSGID: 106123] [glusterd-mgmt.c:2285:glusterd_mgmt_v3_initiate_snap_phases] 0-management: Commit Op Failed
[2015-12-23 05:02:24.201886] I [MSGID: 106057] [glusterd-snapshot.c:6218:glusterd_do_snap_cleanup] 0-management: Snapshot (snap1) does not exist [Invalid argument]
############
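The repeated "Bad file descriptor" errors suggest the snapshot's device node under /dev was not usable inside the container at mount time. A quick hedged check after a failed attempt (device path taken from the logs above) would be:
######
# Run inside the affected container right after a failed snapshot attempt.
ls -l /dev/RHGS_VG1/70ce8446fac8462199ea46fe002231b5_0   # snapshot device node visible?
dmsetup ls | grep 70ce8446fac8462199ea46fe002231b5       # device-mapper target created?
######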
**Comment 3** (Humble Chirammal)

(In reply to Prasanth from comment #2)

We need to export /dev (-v /dev:/dev) to the container because, at snapshot creation time, it creates an LV that needs access to the "/dev/" tree. So let us set comment #2 aside. Coming back to the report: this looks like an issue that pops up inconsistently on *some* systems. Is that correct? What is the frequency of the issue? We tried to reproduce it on a couple of systems without luck.

@Ashiq, can you please share the system details here?

Also, as an isolation step, would it be possible for you to run a couple of systemd-based containers in your setup (exporting a thin volume as a mount point and exporting the /dev tree) and report the result? This does not look like an issue (if there is one) with the RHGS containers; rather, it looks like a race condition that shows up with some combination of docker bits.

**Comment** (Humble Chirammal)

@Prasanth, apart from the isolation steps mentioned in comment #3, can you please reproduce the issue with docker's debug option enabled? Meanwhile, we will try to reproduce it on a few more systems.

**Comment** (Prasanth)

(In reply to Humble Chirammal from comment #3)

> We need to export /dev ( -v /dev:/dev) to the container, because at time of snapshot creation, it creates an LV which need the "/dev/" access. so, lets move aside the bz comment#2.

That isolation step was only to confirm that the race condition we hit so often is related to exporting /dev. Since we really do need to bind-mount /dev for snapshots to work, we need to find a way to make it work, or a temporary work-around.

> Coming back to the bugzilla report, this looks like an issue pop up in an inconsistent way in *some* systems. Is that correct? Whats the frequency of this issue?

You are correct. I have seen this issue on at least 2-3 machines in every 4-node setup I have created so far; bind-mounting all the bricks on *all* four nodes has *never* succeeded on any of my setups. But I would not call it 100% reproducible, since on some machines it works either on the first attempt or after multiple retries.

> @Ashiq can you please share the system details here ?

In Ashiq's setup I could see only one brick. Please use more than one brick per node and set up a 4-node cluster; the probability of hitting this bug is higher in that case, and that is when I started seeing the issue.

> Also, as an isolation step, would it be possible for you to run couple of containers (systemd based) in your setup (by exporting a thin volume as mount points and by exporting /dev tree) and find the result ?

I can try this as well and update the BZ with the results.
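A hedged sketch of those isolation steps; the VG/LV names, mount point, and image name are placeholders, and any image that runs systemd (/usr/sbin/init) as PID 1 should do:
######
# 1. Create a thin pool and a thin LV to serve as a brick mount point (placeholder names):
lvcreate -L 20G -T RHGS_VG1/thinpool
lvcreate -V 10G -T RHGS_VG1/thinpool -n vol1
mkfs.xfs /dev/RHGS_VG1/vol1
mkdir -p /var/mount/brick1
mount /dev/RHGS_VG1/vol1 /var/mount/brick1

# 2. Run a systemd-based container exporting the thin volume and the /dev tree:
docker run -d --privileged=true --name isolate1 \
    -v /var/mount/brick1:/b1 \
    -v /sys/fs/cgroup:/sys/fs/cgroup:ro \
    -v /dev:/dev \
    some-systemd-image /usr/sbin/init   # placeholder image

# 3. Check whether the bind mount made it into the container:
docker exec isolate1 sh -c 'mount | grep /b1'
######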
**Comment**

Created attachment 1113850 [details]: Image working details
**Comment** (Prasanth)

(In reply to Mohamed Ashiq from comment #17)

> The fix is available in
> docker pull docker-registry.usersys.redhat.com/gluster/rhgs-3.1.2:5

Bind-mounting of gluster bricks appears to work fine with the latest image. However, I am trying the same on multiple setups to re-confirm before I move this BZ to VERIFIED.

**Comment** (Prasanth)

Verified as fixed in rhgs-server-docker-3.1.2-6. Bind-mounting of gluster bricks now works as expected on all the setups I tried.

##########
-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--208-root  3.0G  940M  2.1G  31% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  420K  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            297M   87M  211M  30% /boot
tmpfs                                783M     0  783M   0% /run/user/0
/dev/mapper/RHGS_VG1-vol1             10G   33M   10G   1% /var/mount/brick1
/dev/mapper/RHGS_VG2-vol2             10G   33M   10G   1% /var/mount/brick2
/dev/mapper/RHGS_VG3-vol3             10G   33M   10G   1% /var/mount/brick3
/dev/mapper/RHGS_VG4-vol4             10G   33M   10G   1% /var/mount/brick4

-bash-4.2# sudo docker -D run -d --privileged=true --net=host --name snapnode1 -v /etc/glusterfs/:/etc/glusterfs/ -v /var/lib/glusterd/:/var/lib/glusterd/ -v /var/log/glusterfs/:/var/log/glusterfs/ -v /var/mount/brick1:/b1 -v /var/mount/brick2:/b2 -v /var/mount/brick3:/b3 -v /var/mount/brick4:/b4 -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /dev:/dev rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-6
2ff93a40d54c018b48e3d745ef81285b3d115e472d4b20d398122bb986ef969e
DEBU[0000] End of CmdRun(), Waiting for hijack to finish.

-bash-4.2# docker ps
CONTAINER ID   IMAGE                                                                             COMMAND            CREATED         STATUS         PORTS   NAMES
2ff93a40d54c   rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-6   "/usr/sbin/init"   4 seconds ago   Up 3 seconds           snapnode1

-bash-4.2# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/mapper/rhelah_dhcp42--208-root  3.0G  940M  2.1G  31% /
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
tmpfs                                3.9G  432K  3.9G   1% /run
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
/dev/sda1                            297M   87M  211M  30% /boot
tmpfs                                783M     0  783M   0% /run/user/0

-bash-4.2# docker exec -ti 2ff93a40d54c /bin/bash
[root@dhcp42-208 /]# df -h
Filesystem                           Size  Used Avail Use% Mounted on
/dev/dm-25                           100G  293M  100G   1% /
/dev/mapper/RHGS_VG4-vol4             10G   33M   10G   1% /b4
/dev/mapper/RHGS_VG2-vol2             10G   33M   10G   1% /b2
devtmpfs                             3.9G     0  3.9G   0% /dev
tmpfs                                3.9G     0  3.9G   0% /dev/shm
/dev/mapper/RHGS_VG1-vol1             10G   33M   10G   1% /b1
/dev/mapper/RHGS_VG3-vol3             10G   33M   10G   1% /b3
/dev/mapper/rhelah_dhcp42--208-root  3.0G  979M  2.1G  32% /etc/hosts
tmpfs                                3.9G     0  3.9G   0% /sys/fs/cgroup
tmpfs                                3.9G  8.4M  3.9G   1% /run

[root@dhcp42-208 /]# mount |grep RHGS
/dev/mapper/RHGS_VG4-vol4 on /b4 type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
/dev/mapper/RHGS_VG2-vol2 on /b2 type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
/dev/mapper/RHGS_VG1-vol1 on /b1 type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)
/dev/mapper/RHGS_VG3-vol3 on /b3 type xfs (rw,relatime,seclabel,attr2,inode64,logbsize=64k,sunit=128,swidth=128,noquota)

[root@dhcp42-208 /]# cat /etc/redhat-storage-release
Red Hat Gluster Storage Server 3.1 Update 2 ( Container)

[root@dhcp42-208 /]# rpm -qa |grep glusterfs
glusterfs-libs-3.7.5-16.el7rhgs.x86_64
glusterfs-client-xlators-3.7.5-16.el7rhgs.x86_64
glusterfs-fuse-3.7.5-16.el7rhgs.x86_64
glusterfs-geo-replication-3.7.5-16.el7rhgs.x86_64
glusterfs-3.7.5-16.el7rhgs.x86_64
glusterfs-api-3.7.5-16.el7rhgs.x86_64
glusterfs-cli-3.7.5-16.el7rhgs.x86_64
glusterfs-server-3.7.5-16.el7rhgs.x86_64
#########
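Given the intermittent nature of the original failure, a small retry loop is a convenient way to re-confirm the fix. This is a hedged helper, not part of the original verification; the container name, image, and brick paths mirror the run above:
######
# Repeatedly restart the container and count how many of the four bricks
# show up bind-mounted inside it; names mirror the verification run above.
for i in $(seq 1 10); do
    docker rm -f snapnode1 >/dev/null 2>&1
    docker run -d --privileged=true --net=host --name snapnode1 \
        -v /var/mount/brick1:/b1 -v /var/mount/brick2:/b2 \
        -v /var/mount/brick3:/b3 -v /var/mount/brick4:/b4 \
        -v /sys/fs/cgroup:/sys/fs/cgroup:ro -v /dev:/dev \
        rcm-img-docker01.build.eng.bos.redhat.com:5001/rhgs3/rhgs-server-rhel7:3.1.2-6 >/dev/null
    sleep 5
    n=$(docker exec snapnode1 sh -c 'mount | grep -c RHGS_VG')
    echo "attempt $i: $n of 4 bricks mounted"
done
######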