Bug 1369839

Summary: File locking error while brick restarting
Product: [Community] GlusterFS
Reporter: Matt Kornfield <mckornfield>
Component: distribute
Assignee: Nithya Balachandran <nbalacha>
Status: CLOSED EOL
Severity: medium
Priority: unspecified
Version: 3.8
CC: amukherj, bugs, mckornfield
Target Milestone: ---
Keywords: Triaged
Target Release: ---
Hardware: x86_64
OS: Linux
Last Closed: 2017-11-07 10:36:18 UTC
Type: Bug
Attachments:
Description | Flags
Two scripts used to reproduce error | none
Zip with requested client, brick and daemon logs | none

Description Matt Kornfield 2016-08-24 14:13:00 UTC
Created attachment 1193658
Two scripts used to reproduce error

Description of problem:

While a brick is restarting (triggered by running service glusterd restart), there is a brief window during which file locking fails.
We have attached two scripts that should be run simultaneously to reproduce the error: one repeatedly restarts the bricks, the other creates and locks files.

Version-Release number of selected component (if applicable):
Seen on 3.8.1 and 3.8.2

How reproducible:
Can reproduce consistently on different gluster installations

Steps to Reproduce:
1. Connect to a site running gluster
2. Run the kill-brick-process.sh script
3. Run the createAndLock.sh script while the previous script is running
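
The attached scripts are not reproduced in this report. As a rough illustration only, the locking side of the reproduction could look like the sketch below. It is not the attached createAndLock.sh: the function name create_and_lock, its arguments and the locktest.N file names are assumptions, and it relies on the flock utility from util-linux.

```sh
#!/bin/sh
# Hypothetical sketch, NOT the attached createAndLock.sh.
# Creates COUNT files in DIR and takes an exclusive, non-blocking lock on
# each one. Prints one line per failure and returns non-zero if any
# create or lock attempt failed.
create_and_lock() {
    dir=$1
    count=$2
    failures=0
    i=1
    while [ "$i" -le "$count" ]; do
        file="$dir/locktest.$i"
        if ! touch "$file" 2>/dev/null; then
            echo "create failed: $file"
            failures=$((failures + 1))
        # Hold the lock only for the duration of a no-op command.
        elif ! err=$(flock -x -n "$file" -c true 2>&1); then
            echo "lock failed: $file: $err"
            failures=$((failures + 1))
        fi
        i=$((i + 1))
    done
    [ "$failures" -eq 0 ]
}
```

Pointing it at a directory on the GlusterFS mount while the bricks are being restarted, for example create_and_lock /path/to/mount 100 with a placeholder path, would print any lock failures along with the message flock reported.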

Actual results:
Locking fails with a "file not found" error for the file that createAndLock.sh has just created.

Expected results:
File is locked and no error is output

Additional info:
We originally hit the issue with MySQL, which reported a "file not found" error (operating system error 2) when attempting to lock files:
mysqld: 2016-08-08T17:26:58.383273Z 9 [ERROR] InnoDB: Unable to lock ... error: 2
mysqld: 2016-08-08T17:26:58.384050Z 9 [ERROR] InnoDB: Operating system error number 2 in a file operation.
mysqld: 2016-08-08T17:26:58.384068Z 9 [ERROR] InnoDB: The error means the system cannot find the path specified.
mysqld: 2016-08-08T17:26:58.384075Z 9 [ERROR] InnoDB: Cannot create file ...
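
To check whether other failures in a mysqld log carry the same operating system error, the error numbers can be pulled out of lines shaped like the ones above. This is only a small sketch: the function name innodb_os_errors is made up for illustration, and it assumes the log uses the exact "Operating system error number N" wording shown here.

```sh
# Hypothetical helper: read a mysqld log on standard input and print each
# distinct OS error number reported by InnoDB, followed by how often it occurs.
innodb_os_errors() {
    sed -n 's/.*InnoDB: Operating system error number \([0-9][0-9]*\) .*/\1/p' |
        sort -n | uniq -c | awk '{ print $2 " " $1 }'
}
```

Each output line has the form "errno count". On Linux, errno 2 is ENOENT ("No such file or directory").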

Comment 1 Atin Mukherjee 2016-08-24 14:46:33 UTC
This doesn't seem like a glusterd issue as the error is basically from the client. Could you attach the client, brick and glusterd logs for further analysis?

Comment 2 Matt Kornfield 2016-08-24 15:26:25 UTC
Created attachment 1193675
Zip with requested client, brick and daemon logs

Comment 3 Matt Kornfield 2016-08-24 15:27:12 UTC
Logs attached. Let me know if you need additional information or clarification.

Comment 4 Atin Mukherjee 2016-08-26 05:20:54 UTC
I went through the scripts and log files. From the scripts, it looks like you are running a negative test scenario: every 2 seconds the brick processes are killed and glusterd is restarted, while flock is issued from the mount point at the same time. If the brick processes are restarting at that moment, the fops are bound to fail because the client cannot communicate with the bricks. Ideally, though, the failures should have been "Transport endpoint is not connected" (ENOTCONN) rather than "file not found" (ENOENT).
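
The distinction drawn here can be checked from a test script by inspecting the message attached to a failed lock attempt. The sketch below is illustrative only: classify_lock_error is a hypothetical name, and the matching assumes the standard English (C locale) Linux error texts for ENOENT and ENOTCONN.

```sh
# Hypothetical helper: map the error message from a failed lock attempt to
# the error that is expected while a brick is down (ENOTCONN), the
# unexpected one reported in this bug (ENOENT), or anything else.
classify_lock_error() {
    case $1 in
        *"Transport endpoint is not connected"*) echo "expected: ENOTCONN" ;;
        *"No such file or directory"*)           echo "unexpected: ENOENT" ;;
        *)                                       echo "other" ;;
    esac
}
```

Passing it the stderr text captured from a failed flock call separates failures that merely reflect an unreachable brick from those that show the misreported "file not found" error.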

I am moving this BZ to distribute component for further analysis.

Comment 5 Niels de Vos 2016-09-12 05:37:10 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 6 Niels de Vos 2017-11-07 10:36:18 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.