Bug 1369839 - File locking error while brick restarting
Summary: File locking error while brick restarting
Keywords:
Status: CLOSED EOL
Alias: None
Product: GlusterFS
Classification: Community
Component: distribute
Version: 3.8
Hardware: x86_64
OS: Linux
Priority: unspecified
Severity: medium
Target Milestone: ---
Assignee: Nithya Balachandran
QA Contact:
URL:
Whiteboard:
Depends On:
Blocks:
Reported: 2016-08-24 14:13 UTC by Matt Kornfield
Modified: 2017-11-07 10:36 UTC

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2017-11-07 10:36:18 UTC
Regression: ---
Mount Type: ---
Documentation: ---
CRM:
Verified Versions:
Embargoed:


Attachments
Two scripts used to reproduce error (865 bytes, application/zip)
2016-08-24 14:13 UTC, Matt Kornfield
Zip with requested client, brick and daemon logs (257.53 KB, application/zip)
2016-08-24 15:26 UTC, Matt Kornfield

Description Matt Kornfield 2016-08-24 14:13:00 UTC
Created attachment 1193658
Two scripts used to reproduce error

Description of problem:

While a brick restart is occurring (triggered by running "service glusterd restart"), there is a brief window during which file locking fails.
We have attached two scripts that reproduce the error when run simultaneously: one repeatedly restarts the bricks, and the other creates and locks files.

Version-Release number of selected component (if applicable):
Seen on 3.8.1 and 3.8.2

How reproducible:
Can reproduce consistently on different gluster installations

Steps to Reproduce:
1. Connect to a site running gluster
2. Run the kill-brick-process.sh script
3. Run the createAndLock.sh script while the previous script is running
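
For readers without access to the attachment, the create-and-lock side of the reproduction might look roughly like the sketch below. This is not the attached createAndLock.sh; it is a Python approximation under the assumption that the script creates a file on the mount and takes an flock on it. The function name create_and_lock and the file names are hypothetical.

```python
import errno
import fcntl
import os
import tempfile


def create_and_lock(path):
    """Create `path`, take an exclusive flock on it, then release it.

    Returns None on success, or the errno name (e.g. "ENOENT") on failure.
    """
    try:
        fd = os.open(path, os.O_CREAT | os.O_RDWR, 0o644)
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    try:
        fcntl.flock(fd, fcntl.LOCK_EX)   # blocks until the lock is granted
        fcntl.flock(fd, fcntl.LOCK_UN)
        return None
    except OSError as exc:
        return errno.errorcode.get(exc.errno, str(exc.errno))
    finally:
        os.close(fd)


if __name__ == "__main__":
    # Assumption: a temporary directory stands in for the gluster mount point.
    target_dir = tempfile.mkdtemp()
    for i in range(5):
        result = create_and_lock(os.path.join(target_dir, f"lockfile.{i}"))
        print(i, "ok" if result is None else result)
```

Pointing target_dir at a directory on the gluster mount while the bricks are being restarted should show which errno the client returns on each failed attempt.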

Actual results:
A file not found error is reported for the file created by createAndLock.sh

Expected results:
File is locked and no error is output

Additional info:
Originally found the issue through MySQL, which reported a file not found error (operating system error 2) when attempting to lock files:
mysqld: 2016-08-08T17:26:58.383273Z 9 [ERROR] InnoDB: Unable to lock ... error: 2
mysqld: 2016-08-08T17:26:58.384050Z 9 [ERROR] InnoDB: Operating system error number 2 in a file operation.
mysqld: 2016-08-08T17:26:58.384068Z 9 [ERROR] InnoDB: The error means the system cannot find the path specified.
mysqld: 2016-08-08T17:26:58.384075Z 9 [ERROR] InnoDB: Cannot create file ...
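
The "error number 2" in the log lines above can be decoded with a small helper like the following sketch, which also prints the "transport endpoint not connected" code for comparison. The describe function is hypothetical, and the message text returned by os.strerror depends on the platform.

```python
import errno
import os


def describe(code):
    """Return the symbolic name, number and OS message for an errno value."""
    name = errno.errorcode.get(code, "UNKNOWN")
    return f"{name} ({code}): {os.strerror(code)}"


# Error 2 is what InnoDB logged in the output above.
print(describe(2))
# ENOTCONN is the errno behind "transport endpoint not connected".
print(describe(errno.ENOTCONN))
```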

Comment 1 Atin Mukherjee 2016-08-24 14:46:33 UTC
This doesn't seem like a glusterd issue as the error is basically from the client. Could you attach the client, brick and glusterd logs for further analysis?

Comment 2 Matt Kornfield 2016-08-24 15:26:25 UTC
Created attachment 1193675
Zip with requested client, brick and daemon logs

Comment 3 Matt Kornfield 2016-08-24 15:27:12 UTC
Logs attached. Let me know if you need additional information or clarification.

Comment 4 Atin Mukherjee 2016-08-26 05:20:54 UTC
I went through the scripts and log files. From the scripts, it looks like you are running a negative test scenario: killing the brick processes and restarting glusterd every 2 seconds while issuing flock from the mount point at the same time. If the brick processes are restarting at that moment, the fops are bound to fail, as the client won't be able to communicate with the bricks. Ideally, though, the failures should have been "transport endpoint not connected" rather than "file not found".

I am moving this BZ to distribute component for further analysis.

Comment 5 Niels de Vos 2016-09-12 05:37:10 UTC
All 3.8.x bugs are now reported against version 3.8 (without .x). For more information, see http://www.gluster.org/pipermail/gluster-devel/2016-September/050859.html

Comment 6 Niels de Vos 2017-11-07 10:36:18 UTC
This bug is getting closed because the 3.8 version is marked End-Of-Life. There will be no further updates to this version. Please open a new bug against a version that still receives bugfixes if you are still facing this issue in a more current release.

