Created attachment 1193658 [details]
Two scripts used to reproduce error

Description of problem:
While a brick restart is occurring (by running service glusterd restart), there is a brief window under which file locking fails. We have attached two scripts that should be run simultaneously to simulate file locking and run brick restarts to reproduce the error.

Version-Release number of selected component (if applicable):
Seen on 3.8.1 and 3.8.2

How reproducible:
Can reproduce consistently on different gluster installations

Steps to Reproduce:
1. Connect to a site running gluster
2. Run the kill-brick-process.sh script
3. Run the createAndLock.sh script while the previous script is running

Actual results:
Get a file not found error for the file generated in createAndLock.sh

Expected results:
File is locked and no error is output

Additional info:
Originally found the issue through mysql, which showed a file not found error when attempting to lock files.

mysqld: 2016-08-08T17:26:58.383273Z 9 [ERROR] InnoDB: Unable to lock ... error: 2
mysqld: 2016-08-08T17:26:58.384050Z 9 [ERROR] InnoDB: Operating system error number 2 in a file operation.
mysqld: 2016-08-08T17:26:58.384068Z 9 [ERROR] InnoDB: The error means the system cannot find the path specified.
mysqld: 2016-08-08T17:26:58.384075Z 9 [ERROR] InnoDB: Cannot create file ...
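For readers without access to the attachment, the create-and-lock step can be approximated as below. This is only a sketch under stated assumptions, not the attached createAndLock.sh: the function names, the `locktest-N` file names, and the attempt count are all hypothetical, and the directory passed in is assumed to be a path on the gluster mount.

```python
import errno
import fcntl
import os
import sys
import tempfile


def create_and_lock(path):
    """Create `path`, take an exclusive flock on it, then release it.

    Returns None on success, or the errno of the first call that failed.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    except OSError as exc:
        return exc.errno
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)
        fcntl.flock(fd, fcntl.LOCK_UN)
        return None
    except OSError as exc:
        return exc.errno
    finally:
        os.close(fd)


def run_attempts(directory, attempts=5):
    """Repeat the create-and-lock step; return a list of (name, errno name)."""
    failures = []
    for i in range(attempts):
        name = f"locktest-{i}"  # hypothetical file name
        err = create_and_lock(os.path.join(directory, name))
        if err is not None:
            failures.append((name, errno.errorcode.get(err, str(err))))
    return failures


if __name__ == "__main__":
    # Pass a directory on the gluster mount; without an argument the sketch
    # only exercises itself against a local temporary directory.
    if len(sys.argv) > 1:
        print(run_attempts(sys.argv[1]))
    else:
        with tempfile.TemporaryDirectory() as tmp:
            print(run_attempts(tmp))
```

Running this against the mount while the bricks are being restarted would record which errno each failed attempt returned, which is the detail the report hinges on.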
This doesn't seem like a glusterd issue as the error is basically from the client. Could you attach the client, brick and glusterd logs for further analysis?
Created attachment 1193675 [details] Zip with requested client, brick and daemon logs
Logs attached. Let me know if you need additional information or clarification.
I went through the scripts and log files. From the scripts, it looks like you are running a negative test: killing the brick processes and restarting glusterd every 2 seconds while issuing flock from the mount point. If the brick processes are restarting at that moment, the fops are bound to fail, because the client cannot communicate with the bricks. Ideally, though, the failures should have been "transport endpoint not connected" rather than "file not found". I am moving this BZ to the distribute component for further analysis.
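The distinction drawn here maps onto two errno values: "file not found" is ENOENT (the "error number 2" in the InnoDB log), while "transport endpoint not connected" is ENOTCONN. A small sketch of how a test harness might separate the two; the function name and the result strings are hypothetical and not part of gluster.

```python
import errno


def classify_lock_failure(err):
    """Label an errno seen while locking a file during a brick restart."""
    if err == errno.ENOTCONN:
        # The client cannot reach the bricks: the failure one would expect.
        return "expected: transport endpoint not connected"
    if err == errno.ENOENT:
        # The behaviour reported in this bug.
        return "unexpected: file not found"
    return "other: " + errno.errorcode.get(err, "errno %d" % err)


if __name__ == "__main__":
    for code in (errno.ENOTCONN, errno.ENOENT, errno.EAGAIN):
        print(code, classify_lock_failure(code))
```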
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.