Bug 1404693 - `rm` of file on mirrored glusterfs fs sometimes blocks indefinitely
Summary: `rm` of file on mirrored glusterfs fs sometimes blocks indefinitely
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.6.0
Hardware: x86_64
OS: Linux
Priority: medium
Severity: medium
Target Milestone: ---
Assignee: bugs@gluster.org
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
 
Reported: 2016-12-14 12:44 UTC by David van Leeuwen
Modified: 2017-04-03 14:40 UTC
CC: 3 users

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-03-08 16:49:03 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:



Description David van Leeuwen 2016-12-14 12:44:12 UTC
Description of problem:

`rm` of a file on a mirrored glusterfs filesystem sometimes blocks indefinitely (roughly 1 in 1000 times).  The file continues to exist on both replicas.  A manual `rm` of the same file removes it, but the original `rm` remains blocked.


Version-Release number of selected component (if applicable):
3.5.2-2+deb8u2 (debian jessie stock release)


How reproducible:
Hard to reproduce on demand; it happens during a complex Kaldi training script.  The file is probably used as a semaphore between parallel jobs, and is only removed once the jobs have finished.


Steps to Reproduce:
1. install two nodes with large filesystems on /mnt/gluster/data
2. configure as mirror: `gluster volume create data replica 2 transport tcp host-1:/mnt/gluster/data/brick host-2:/mnt/gluster/data/brick`
3. mount as /data and use this as the shared filesystem.
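The steps above can be sketched as follows (host names and paths are the reporter's examples; the mount command assumes the standard glusterfs FUSE client, and a plain `replica 2` volume without an arbiter is known to be susceptible to split-brain):

```shell
# Create a 2-way replicated volume across the two nodes
gluster volume create data replica 2 transport tcp \
    host-1:/mnt/gluster/data/brick host-2:/mnt/gluster/data/brick
gluster volume start data

# On each client, mount the volume at /data via the FUSE client
mount -t glusterfs host-1:/data /data
```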

Actual results:
Every so often (1 in 1000 times or fewer) `rm file-on-glusterfs-fs` blocks: nothing happens, seemingly indefinitely.  I have to kill the `rm` process for the calling script to continue.

Expected results:
`rm file-on-glusterfs-fs` always returns within a few seconds



Additional info:
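As a defensive workaround for the calling script, the `rm` can be wrapped in a deadline so a hung unlink cannot block the run forever.  This is only a sketch, not part of the original report; the 30-second deadline is illustrative, and it assumes GNU coreutils `timeout(1)` is available:

```shell
#!/bin/sh
# Remove a file with a deadline, so a hung unlink on the gluster
# mount cannot block the calling script indefinitely.
FILE="$1"
if timeout 30 rm -- "$FILE"; then
    echo "removed"
else
    # timeout(1) exits with status 124 when it killed the command
    # at the deadline; rm itself may also have failed for other reasons.
    echo "rm hung or failed; continuing without it" >&2
fi
```

The hung `rm` is killed at the deadline, the script logs the event and moves on, matching the manual kill the reporter had to perform by hand.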

Comment 1 Nithya Balachandran 2017-01-30 06:48:01 UTC
(In reply to David van Leeuwen from comment #0)

Hi,


Please provide the following information:

1. Mount Logs for when the problem is seen
2. Did the bricks already contain files when they were used to create the volume?
3. gluster volume info
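The requested information could be gathered roughly as follows (a sketch; the log path is the default location for a FUSE mount at /data, and the exact file name derives from the mount point):

```shell
# Volume layout, options, and brick status
gluster volume info data
gluster volume status data

# Client (FUSE) mount log for the /data mount point
tail -n 200 /var/log/glusterfs/data.log

# While an rm is hung, a statedump can show pending locks/frames
gluster volume statedump data
```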

Comment 2 David van Leeuwen 2017-02-20 12:56:36 UTC
1. and 3.  I removed glusterfs from the production system, as it gave too many problems.  Well, it may have; it is hard to tell, since I sometimes see the machine load go to 514, resulting in an unresponsive system that requires a reboot.  I still haven't found the subsystem/process causing this.

2. I don't believe so.  

Sorry I can't be more specific at the moment.  I am using the two machines more or less in production, so I can't do too much experimenting.  I now have a classical NFS mount from one to the other; probably not ideal, but it works for now.

Moving the files from /data to the non-brick part of /mnt/gluster/data was a lengthy process.  It would be nice if this could have happened inside the host filesystem, i.e., moving the data from /mnt/gluster/data/brick to /mnt/gluster/data directly.  But I suppose it is too hard for glusterfs to have data moved away under its feet.

The opposite would also be nice: initializing a glusterfs volume by moving the contents of a local filesystem into the brick.

---david

Comment 3 Kaleb KEITHLEY 2017-03-08 16:49:03 UTC
Closing against 3.6.0, which is EOL.

Please reopen against a current version, or file a new bug, if you are still experiencing this problem.

