A problem with the SSD on mcloud4 caused a directory (not the entire drive) to hang. Gluster hung on the directory. Since the drive was not reported as failed at the OS level, we never stopped trying. The result was a complete hang on the volume. We would like to see a timeout function: if Gluster detects a hanging directory, it should shut down that particular brick, as long as it's replicated.
This is not valid as per the design. We don't want to make that decision automatically. The admin can use 'gluster volume remove-brick' to do this intentionally if needed.
This bug wasn't about removing a brick, but rather about glusterfsd exiting when its posix translator fails. I believe that this bug should be re-evaluated on that basis.
This interpretation of the request was flawed. Please reopen this. A problem exists that can block the entire volume from use. Louis and I have both also had occasions where the brick's filesystem or drive has failed. glusterfsd tries to access that drive and hangs indefinitely. This should be detected, and the glusterfsd process should time out and exit gracefully. Currently, filesystem blocks like this can lead to a zombie process that can only be cleared by rebooting the server. This is not acceptable behavior. The priority and severity of this problem should be considered high.
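To make the request concrete, here is a rough sketch of the kind of timeout check being asked for. It is not Gluster code (glusterfsd is written in C) and it is not a proposed patch. The function name `directory_is_healthy`, the probe script, and the 5 second default are all made up for illustration. The idea is to run the filesystem access in a separate child process, so that if the access blocks in the kernel only the child hangs, and the caller can give up after a timeout and decide what to do with the brick.

```python
import subprocess
import sys

# Hypothetical probe: list the directory, then create, fsync and remove a
# small file. Any of these calls may block if the underlying storage hangs.
_PROBE_SCRIPT = """
import os, sys, tempfile
d = sys.argv[1]
os.listdir(d)
fd, path = tempfile.mkstemp(dir=d, prefix=".health-")
try:
    os.write(fd, b"ok")
    os.fsync(fd)
finally:
    os.close(fd)
    os.unlink(path)
"""


def directory_is_healthy(directory, timeout_seconds=5.0):
    """Return True if the probe finishes cleanly within the timeout.

    The probe runs in a child process so that a blocked filesystem call
    cannot hang the caller. The 5 second default is an arbitrary assumption.
    """
    child = subprocess.Popen(
        [sys.executable, "-c", _PROBE_SCRIPT, directory],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )
    try:
        # Exit code 0 means every probe step succeeded.
        return child.wait(timeout=timeout_seconds) == 0
    except subprocess.TimeoutExpired:
        # Ask the kernel to kill the child, but do not wait for it: a
        # process stuck in uninterruptible I/O may never actually exit.
        child.kill()
        return False
```

A caller would run this periodically against the brick directory and, on a `False` result, stop serving from that brick rather than continuing to retry. Note the sketch deliberately does not wait for the child after a timeout, since waiting on a process blocked in the kernel would reintroduce the hang.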