Description of problem:
`rm` of a file on a mirrored glusterfs filesystem sometimes blocks (about 1 in 1000 times). The file continues to exist in both copies of the filesystem. A manual `rm` of the same file removes it, but the original `rm` continues to block.

Version-Release number of selected component (if applicable):
3.5.2-2+deb8u2 (debian jessie stock release)

How reproducible:
Mmm. Complex Kaldi training script. The file is probably used as a semaphore between parallel jobs, but is only removed once the jobs have finished.

Steps to Reproduce:
1. install two nodes with large filesystems on /mnt/gluster/data
2. configure as mirror: `gluster volume create data replica 2 transport tcp host-1:/mnt/gluster/data/brick host-2:/mnt/gluster/data/brick`
3. mount as /data, use this as shared filesystem.

Actual results:
Every so often (1 in 1000 times or less) `rm file-on-glusterfs-fs` blocks and nothing happens, seemingly indefinitely. I have to kill the `rm` process for the calling script to continue.

Expected results:
`rm file-on-glusterfs-fs` always returns within a few seconds

Additional info:
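Since the report says the blocked `rm` has to be killed by hand before the calling script can continue, one possible stopgap is to bound the call with coreutils `timeout`. This is only a sketch of a workaround, not a fix for the underlying hang: the helper name `rm_with_timeout` and the 30-second default are hypothetical and not part of the original report, and a process stuck in uninterruptible I/O may not respond even to the `KILL` signal.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): remove a file, but give up
# if rm has not returned within a time limit, so the caller can continue.
rm_with_timeout() {
    file=$1
    limit=${2:-30}   # seconds to wait; 30 is an arbitrary assumed default

    # timeout sends TERM after $limit seconds, then KILL 5 seconds later
    # if rm is still running.
    timeout -k 5 "$limit" rm -f -- "$file"
    status=$?

    # 124: timed out after TERM; 137: had to be ended with KILL (128+9).
    if [ "$status" -eq 124 ] || [ "$status" -eq 137 ]; then
        echo "rm of $file did not finish within ${limit}s" >&2
    fi
    return "$status"
}
```

A calling script could then use `rm_with_timeout /data/some-lock-file 30 || true` and log the timeout message, which also records how often the hang actually occurs.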
(In reply to David van Leeuwen from comment #0)

Hi,

Please provide the following information:
1. Mount logs for when the problem is seen
2. Did the bricks already contain files when they were used to create the volume?
3. gluster volume info
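Items 1 and 3 of the request above could be gathered roughly as follows. This is a sketch under stated assumptions: the volume name `data` is taken from the create command in the report, while the function name `collect_gluster_info` and the client log path `/var/log/glusterfs/data.log` (the usual default for a mount at /data) are assumptions and should be adjusted to the actual setup.

```shell
#!/bin/sh
# Hypothetical helper (not from the report): copy the requested
# diagnostics into one directory so they can be attached to the bug.
collect_gluster_info() {
    outdir=$1
    volume=${2:-data}                            # volume name from the report
    mountlog=${3:-/var/log/glusterfs/data.log}   # assumed client log path

    mkdir -p "$outdir" || return 1

    # Item 3: output of "gluster volume info" for the volume.
    if command -v gluster >/dev/null 2>&1; then
        gluster volume info "$volume" > "$outdir/volume-info.txt" 2>&1
    else
        echo "gluster CLI not found on this host" > "$outdir/volume-info.txt"
    fi

    # Item 1: the mount (client) log, if it is readable at the assumed path.
    if [ -r "$mountlog" ]; then
        cp "$mountlog" "$outdir/mount.log"
    else
        echo "mount log not readable: $mountlog" > "$outdir/mount.log"
    fi
}
```

Running `collect_gluster_info /tmp/gluster-report` on the client that saw the hang would leave `volume-info.txt` and `mount.log` in that directory. Item 2 (whether the bricks already held files) still has to be answered from memory.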
1. and 3. I removed glusterfs from the production system, as it gave too many problems. Well, it may have; it is hard to tell. I sometimes see the machine load go to 514, resulting in an unresponsive system that requires a reboot. I still haven't found the subsystem/process causing this.

2. I don't believe so.

Sorry I can't be more specific at the moment. I am using the two machines more or less in production, so I can't do too much experimenting. I now have a classical NFS mount from one to the other, probably not ideal, but it works for now.

Moving the files from /data to the non-brick part of /mnt/gluster/data was a lengthy process. It would be nice if this could have happened inside the host filesystem, i.e., moving the data from /mnt/gluster/data/brick to /mnt/gluster/data. But I suppose it is too hard for glusterfs to have data moved away under its feet. The opposite would also be nice: initializing a glusterfs volume by moving files from the local filesystem into the brick.

---david
Closing against 3.6.0, which is EOL. Please reopen against a current version or file a new bug if you are still experiencing this problem.