Is there any reason to think this problem has anything to do with the fact that cluster/distribute had a single subvolume? Presumably if cluster/distribute had many subvolumes and they were all almost full, the same problem would occur.

I was the customer seeing the problems Vikas described, but my recollection is that touch (creating a file) failed with "No such file or directory" while mkdir succeeded.

Regarding Vikas' three suggestions, my $.02:

1. If a file can't be created due to space limitations, either because of min-free-disk or because the underlying filesystem is truly full, then "No space left on device" is certainly a more useful error message than "No such file or directory."

2. I think a warning message should be logged when distribute intentionally refuses to create a file due to min-free-disk. I saw that the disk was at 92%, and we are working on enlarging the disk, but I figured "92% is not 100%", so it did not occur to me that the problem was disk-space related. A warning message would fix that. If the underlying bricks were at 100%, I would not need a warning message, because then the problem is clearly that the disk is full. But "disk full" at 92% is not expected, so an explicit warning is warranted.

3. I think distribute's min-free-disk should remain a hard limit. If you want that limit to be 0%, you can set it that way. But making min-free-disk actually be "warn-at-free-disk" just turns glusterfs into a monitoring tool. In general, glusterfs should not try to take over the general job of monitoring disk usage; tools like Nagios are better for that.
The following was observed in a setup where distribute had a single subvolume that was 92% full. Since this crossed the min-free-disk threshold, distribute did not allow creation of any new files or directories. However, the error messages seen by the user were not helpful:

1) touch failed with "No such file or directory".
2) mkdir failed with "Invalid argument".

The client log file had the warning that the subvolume was 92% full. However, when the create failed, distribute did not print any message pointing the user to the root of the problem.

Suggestions to handle this case better:

1) create, mkdir, etc. operations should return "No space left on device".
2) A warning log message should be printed indicating that none of distribute's subvolumes have enough space left.
3) It might be useful to change distribute's behavior such that, if all its subvolumes are past the threshold, it still allows creation of files using the remaining 10% on the subvolumes while printing warning messages.
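To make suggestions 1) and 2) concrete, here is a minimal sketch in Python of the proposed behavior. It is not GlusterFS code: the function name `pick_subvolume`, the logger name, and the 10% default threshold are assumptions made for illustration (the default is only inferred from the "remaining 10%" wording above). It shows a create request being refused with ENOSPC, plus a warning log, when no subvolume has enough free space.

```python
import errno
import logging
import os

# Hypothetical logger name, used only for this illustration.
log = logging.getLogger("distribute-sketch")


def pick_subvolume(free_percent_by_subvol, min_free_disk=10.0):
    """Pick the subvolume with the most free space above the threshold.

    free_percent_by_subvol maps a subvolume name to its free space in percent.
    min_free_disk is the hard limit in percent (assumed default: 10.0).
    Raises OSError(ENOSPC) if every subvolume is below the threshold.
    """
    eligible = {
        name: free
        for name, free in free_percent_by_subvol.items()
        if free >= min_free_disk
    }
    if not eligible:
        # Suggestion 2: say explicitly why the create is being refused.
        log.warning(
            "no subvolume has at least %.1f%% free space; refusing create",
            min_free_disk,
        )
        # Suggestion 1: report ENOSPC rather than ENOENT or EINVAL.
        raise OSError(errno.ENOSPC, os.strerror(errno.ENOSPC))
    return max(eligible, key=eligible.get)
```

With a single subvolume that is 92% full (8% free), `pick_subvolume({"brick0": 8.0})` raises an `OSError` whose `errno` is `ENOSPC`, which tools such as touch and mkdir would report as the platform's "no space" message instead of a misleading one.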
Please update the status of this bug, as it has been more than 6 months since it was filed (bug id < 2000). Please resolve it with the proper resolution if it is no longer valid. If it is still valid and not critical, move it to 'enhancement' severity.
This situation arose from having distribute with just one subvolume. Also, this behavior is not reproducible in the latest master branch. Hence closing the bug.