Bug 761736 (GLUSTER-4)

Summary: mount --bind fails if run immediately after mounting GlusterFS
Product: [Community] GlusterFS
Component: core
Version: mainline
Status: CLOSED CURRENTRELEASE
Severity: medium
Priority: low
Hardware: All
OS: Linux
Reporter: Vikas Gorur <vikas>
Assignee: Vijay Bellur <vbellur>
CC: gluster-bugs, gowda, vijay
Doc Type: Bug Fix
Regression: RTP Mount Type: ---

Description Vikas Gorur 2009-06-08 06:23:36 UTC
Reported by Gordan Bobic <gordan> on the mailing list:


Fast First Access Bug
=====================
To reproduce, use a script that mounts a glusterfs cluster/replicate
share from the local node with only the local node being up, and then
immediately tries to bind mount a subdirectory from that share into
another directory, e.g.

8K-----8K-----8K-----8K-----8K-----8K-----8K-----
#!/bin/bash
mount -t glusterfs \
-o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null\
,log-level=NONE /etc/glusterfs/root.vol /mnt/newroot
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8K-----8K-----8K-----8K-----8K-----8K-----8K-----

The bind mount will reliably fail. I'm not sure whether the amount of
content in the directory being mounted makes any difference, but in case
it does, the path that root.vol points at should contain something
resembling a Linux root file system (i.e. not that many directories in
the root).

Here is the root.vol I'm using:
8K-----8K-----8K-----8K-----8K-----8K-----8K-----
volume root1
         type protocol/client
         option transport-type socket
         option address-family inet
         option remote-host 10.1.0.10
         option remote-subvolume root1
end-volume

volume root-store
         type storage/posix
         option directory /mnt/tmproot/gluster/root/x86_64
end-volume

volume root2
         type features/posix-locks
         subvolumes root-store
end-volume

volume server
         type protocol/server
         option transport-type socket
         option address-family inet
         subvolumes root2
         option auth.addr.root2.allow 127.0.0.1,10.*
end-volume

volume root
         type cluster/replicate
         subvolumes root1 root2
         option read-subvolume root2
end-volume
8K-----8K-----8K-----8K-----8K-----8K-----8K-----

Note that the 10.1.0.10 node isn't up; only the local node is up. I haven't
tested with the 2nd node up since I haven't built the 2nd node yet.

If I modify the mounting script to do something like this instead:

8K-----8K-----8K-----8K-----8K-----8K-----8K-----
#!/bin/bash
mount -t glusterfs \
-o defaults,noatime,nodiratime,direct-io-mode=off,log-file=/dev/null\
,log-level=NONE /etc/glusterfs/root.vol /mnt/newroot
# Note - added sleep and ls
sleep 2
ls -la /mnt/newroot > /dev/null
sleep 2
ls -laR /mnt/newroot/cluster > /dev/null
sleep 2
mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
8K-----8K-----8K-----8K-----8K-----8K-----8K-----

then it works.
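
Instead of fixed sleeps, the script could poll until the mount has actually appeared. The following is a minimal sketch, not something taken from the original report: wait_for_mount is a hypothetical helper name, and it assumes the mountpoint utility from util-linux is installed.

```shell
#!/bin/bash
# Hypothetical helper (not part of GlusterFS): block until DIR is a mount
# point, or give up after TIMEOUT seconds (default 10).
# Assumes the util-linux "mountpoint" command is available.
wait_for_mount() {
    local dir=$1 timeout=${2:-10} waited=0
    until mountpoint -q "$dir"; do
        if [ "$waited" -ge "$timeout" ]; then
            return 1    # timed out: DIR never became a mount point
        fi
        sleep 1
        waited=$((waited + 1))
    done
    return 0
}

# Possible use in the reproduction script, after the "mount -t glusterfs" line:
#   wait_for_mount /mnt/newroot 10 &&
#       mount --bind /mnt/newroot/cluster/cdsl/2 /mnt/newroot/cdsl.local
```

This only works around the race from the caller's side; it does not change when the mount command returns.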

Comment 1 Basavanagowda Kanur 2009-07-04 14:18:51 UTC
I could reproduce this bug. The findings were:

The glusterfs mount point is /mnt/glusterfs/1, and it is bind-mounted onto /mnt/glusterfs/2.

before trying to mount & bind:

# stat /mnt/glusterfs/1
  File: `/mnt/glusterfs/1'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d      Inode: 271578      Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2009-05-13 04:40:05.000000000 +0530
Modify: 2009-04-14 04:40:10.000000000 +0530
Change: 2009-04-14 04:40:10.000000000 +0530

# stat /mnt/glusterfs/2
  File: `/mnt/glusterfs/2'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d      Inode: 271579      Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2009-05-13 04:40:05.000000000 +0530
Modify: 2009-04-14 04:40:10.000000000 +0530
Change: 2009-04-14 04:40:10.000000000 +0530

after mount & bind:

# stat /mnt/glusterfs/2
  File: `/mnt/glusterfs/2'
  Size: 4096            Blocks: 8          IO Block: 4096   directory
Device: 801h/2049d      Inode: 271578      Links: 2
Access: (0755/drwxr-xr-x)  Uid: (    0/    root)   Gid: (    0/    root)
Access: 2009-05-13 04:40:05.000000000 +0530
Modify: 2009-04-14 04:40:10.000000000 +0530
Change: 2009-04-14 04:40:10.000000000 +0530

After the bind, /mnt/glusterfs/2 reports the same device and inode (801h, 271578) that the plain /mnt/glusterfs/1 directory had before the mount. It is evident that /mnt/glusterfs/2 was bound to the underlying /mnt/glusterfs/1 directory before the glusterfs mount had taken effect.

I verified the above explanation further by creating files/directories under /mnt/glusterfs/2. As expected, these files were visible under /mnt/glusterfs/1 after unmounting glusterfs.

This behaviour is also observed when using a simple posix volume in the volume spec file.
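
The device-number comparison above can be scripted. This is a rough sketch under assumptions: device_of and is_separate_fs are hypothetical helper names, and GNU stat (with the -c option) is assumed. It relies on a mounted filesystem reporting a different device number than the directory that contains the mount point.

```shell
#!/bin/bash
# Print the device number of the filesystem holding the given path.
# Assumes GNU stat, which supports "-c %d".
device_of() {
    stat -c %d "$1"
}

# Hypothetical check: succeed if DIR is on a different device than its
# parent directory, i.e. something has actually been mounted on DIR.
# Caveat: "/" is reported as not separate, because "/.." is "/" itself.
is_separate_fs() {
    local dir=$1
    [ "$(device_of "$dir")" != "$(device_of "$dir/..")" ]
}

# Possible use with the paths from this comment:
#   is_separate_fs /mnt/glusterfs/1 || echo "glusterfs is not mounted yet"
```

A bind mount of a directory from the same filesystem keeps the same device number, so this check is only meaningful for the glusterfs mount point itself, not for the bind target.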

--
Gowda

Comment 2 Vikas Gorur 2009-07-06 06:35:19 UTC
The issue arises because fuse_mount is called after GlusterFS becomes a daemon. This is necessary because FUSE expects the process that called fuse_mount and the process that does further I/O on the fuse channel to be the same.

As soon as GlusterFS becomes a daemon, the shell command returns. There is then a race between the next shell command and the call to fuse_mount.

The way to fix this is to stop relying on the daemonize() function and write the daemonizing logic ourselves.
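
The general idea of such hand-written daemonizing logic is that the foreground process does not return until the background process reports that its initialization has finished. The sketch below is only a shell analogue of that idea, not the GlusterFS C code: start_and_wait_ready is a hypothetical name, and the command passed to it stands in for the initialization step (here, the mount).

```shell
#!/bin/bash
# Hypothetical sketch: run an initialization command in a background child
# and return only after the child has reported the result through a FIFO.
start_and_wait_ready() {
    local dir fifo status
    dir=$(mktemp -d) || return 1
    fifo=$dir/ready
    mkfifo "$fifo" || { rm -rf "$dir"; return 1; }
    (
        # Background child: perform the slow initialization first ...
        "$@"
        # ... then report its exit status to the waiting parent.
        echo "$?" > "$fifo"
        # A real daemon would continue serving requests from here on.
    ) &
    # Parent: block until the child has written its status.
    read -r status < "$fifo"
    rm -rf "$dir"
    [ "$status" = "0" ]
}

# Illustrative use: the next command runs only after the initialization is done.
#   start_and_wait_ready sleep 1 && echo "initialization finished"
```

With this ordering, a command issued right after the foreground process returns can no longer overtake the initialization step.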

Comment 3 Vikas Gorur 2009-07-06 06:47:00 UTC
(In reply to comment #2)
> The issue is because the fuse_mount is called after GlusterFS becomes a daemon.
> This is necessary because FUSE expects that the process which called fuse_mount
> and the process which does further I/O on the fuse channel to be same. 

This comment also has a bug. It is not FUSE that expects this, but the ib-verbs library. The process that did ib-verbs init must also be the process that does send/recv. Thus we cannot initialize the translator graph before daemonizing.

Comment 4 Anand Avati 2009-09-23 10:27:25 UTC
PATCH: http://patches.gluster.com/patch/1464 in master (Changes for custom daemon function.)

Comment 5 Anand Avati 2009-09-23 10:27:40 UTC
PATCH: http://patches.gluster.com/patch/1465 in release-2.0 (Changes for custom daemon function.)

Comment 6 Anand Avati 2009-09-24 09:19:33 UTC
PATCH: http://patches.gluster.com/patch/1488 in master (glusterfsd/main: Do a sem_post only if running in daemon mode.)

Comment 7 Anand Avati 2009-09-24 09:19:37 UTC
PATCH: http://patches.gluster.com/patch/1487 in release-2.0 (glusterfsd/main: Do a sem_post only if running in daemon mode.)

Comment 8 Anand Avati 2009-10-22 02:52:45 UTC
PATCH: http://patches.gluster.com/patch/1970 in master (glusterfsd.c: Unnecessary writing of strerror of errorno on pipe)

Comment 9 Anand Avati 2009-10-22 02:52:50 UTC
PATCH: http://patches.gluster.com/patch/1971 in release-2.0 (glusterfsd.c: Unnecessary writing of strerror of errorno on pipe)