Bug 765426 (GLUSTER-3694)

Summary: What do those "bailing out frame type" messages mean?
Product: [Community] GlusterFS Reporter: KentaroNishizawa <kentaro.nishizawa>
Component: glusterd    Assignee: krishnan parthasarathi <kparthas>
Status: CLOSED NOTABUG QA Contact:
Severity: medium Docs Contact:
Priority: medium    
Version: pre-release    CC: gluster-bugs, nsathyan, vijay
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: Bug Fix
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2012-02-06 08:38:36 EST Type: ---
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:

Description KentaroNishizawa 2011-10-05 02:43:12 EDT
Hi 

I'm using glusterfs-3.2.3 in the following environment, and I have a couple of questions.

glusterfs server: CentOS-5.5 x86_64
glusterfs client: Debian-5.0 Lenny, reading data via FUSE

I see many of the following messages in my GlusterFS client logs.
When these messages appear, the CPU load on some of my client servers
gets very high, while on others it stays normal.

What does this error message mean?
When, and under what conditions, does it appear?

--------------------------------------------------------------------
Error Message
--------------------------------------------------------------------
I have 20 GlusterFS client servers, and the error messages are the same on all of them.

[2011-10-05 15:12:14.840485] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53471x sent = 2011-10-05 14:42:13.652077. timeout = 1800
[2011-10-05 15:12:14.840547] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53470x sent = 2011-10-05 14:42:13.652037. timeout = 1800
[2011-10-05 15:12:14.840611] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53469x sent = 2011-10-05 14:42:13.651900. timeout = 1800
[2011-10-05 15:12:14.840670] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53468x sent = 2011-10-05 14:42:13.651862. timeout = 1800
[2011-10-05 15:12:14.840733] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53467x sent = 2011-10-05 14:42:13.651821. timeout = 1800
[2011-10-05 15:12:24.841077] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53493x sent = 2011-10-05 14:42:23.652976. timeout = 1800
[2011-10-05 15:12:34.841460] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53567x sent = 2011-10-05 14:42:33.655952. timeout = 1800
[2011-10-05 15:12:34.841567] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53566x sent = 2011-10-05 14:42:33.655919. timeout = 1800
[2011-10-05 15:12:34.841638] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53565x sent = 2011-10-05 14:42:33.655886. timeout = 1800
[2011-10-05 15:12:34.841694] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53564x sent = 2011-10-05 14:42:33.655853. timeout = 1800
[2011-10-05 15:12:34.841744] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53563x sent = 2011-10-05 14:42:33.655821. timeout = 1800
[2011-10-05 15:12:34.841795] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53562x sent = 2011-10-05 14:42:33.655787. timeout = 1800
[2011-10-05 15:12:34.841848] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53561x sent = 2011-10-05 14:42:33.655753. timeout = 1800
[2011-10-05 15:12:34.841912] E [rpc-clnt.c:197:call_bail] 0-disk2: bailing out frame type(GlusterFS 3.1) op(FINODELK(30)) xid = 0x53560x sent = 2011-10-05 14:42:33.655719. timeout = 1800
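Each of these lines records when the client gave up (the leading timestamp), the operation (FINODELK), the request id (xid), when the request was sent, and the timeout in seconds; the `0-disk2` prefix appears to name the client-side translator. As a minimal sketch, assuming the exact glusterfs-3.2.3 message layout shown above, the following Python parses such lines, computes how long each request had been outstanding, and tallies bail-outs per translator and operation, which may help compare the 20 clients. `parse_bail` and `summarize` are hypothetical helpers, not part of GlusterFS.

```python
import re
from collections import Counter
from datetime import datetime

# Matches the call_bail lines in this report (glusterfs-3.2.3 wording; other
# versions may format the message differently).
BAIL_RE = re.compile(
    r"^\[(?P<logged>[\d-]+ [\d:.]+)\] E \[rpc-clnt\.c:\d+:call_bail\] "
    r"(?P<xlator>\S+): bailing out frame type\((?P<prog>[^)]*)\) "
    r"op\((?P<op>\w+)\((?P<opnum>\d+)\)\) xid = (?P<xid>\S+) "
    r"sent = (?P<sent>[\d-]+ [\d:.]+)\. timeout = (?P<timeout>\d+)"
)
TS_FMT = "%Y-%m-%d %H:%M:%S.%f"


def parse_bail(line):
    """Return the fields of one bail-out line as a dict, or None if it does not match."""
    m = BAIL_RE.match(line.strip())
    if m is None:
        return None
    logged = datetime.strptime(m["logged"], TS_FMT)  # when the client gave up
    sent = datetime.strptime(m["sent"], TS_FMT)      # when the request was sent
    return {
        "xlator": m["xlator"],
        "op": m["op"],
        "xid": m["xid"],
        "timeout": int(m["timeout"]),
        # Seconds the request went without a reply before being bailed out.
        "waited": (logged - sent).total_seconds(),
    }


def summarize(lines):
    """Count bailed-out frames per (translator, operation)."""
    counts = Counter()
    for line in lines:
        rec = parse_bail(line)
        if rec is not None:
            counts[(rec["xlator"], rec["op"])] += 1
    return counts
```

For the first line above, `parse_bail` reports a wait of about 1801.19 seconds (14:42:13.652077 to 15:12:14.840485), just past the 1800-second timeout. Passing an open client log file to `summarize` gives per-operation counts for that client.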
Comment 1 krishnan parthasarathi 2011-10-13 10:45:00 EDT
Kentaro,

The "bailing out..." messages mean that an operation on a file got no response from the server within the last 1800 seconds (the 'timeout = 1800' in each line). In this case the operation is FINODELK.
We cannot keep this bug open to answer queries about GlusterFS. The best place for those is http://community.gluster.org/
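The rule described above (no reply within the timeout means the frame is bailed out) can be modeled in a few lines. This is a simplified sketch, not the actual logic in `rpc-clnt.c`; `PendingCall` and `bail_expired` are hypothetical names, and the 1800-second default comes from the `timeout = 1800` field in the log. The bail timestamps in the report (15:12:14, 15:12:24, 15:12:34) look consistent with a periodic scan over outstanding requests, which is what the sketch assumes.

```python
from dataclasses import dataclass


@dataclass
class PendingCall:
    xid: int
    op: str
    sent: float  # time the request was sent, in seconds


def bail_expired(pending, now, timeout=1800):
    """Split outstanding calls into (still_waiting, bailed) at time `now`.

    In this model a call is bailed out once it has gone more than `timeout`
    seconds without a reply; everything else keeps waiting.
    """
    still_waiting, bailed = [], []
    for call in pending:
        if now - call.sent > timeout:
            bailed.append(call)
        else:
            still_waiting.append(call)
    return still_waiting, bailed


# A periodic scan would repeatedly replace `pending` with `still_waiting`
# and report each call in `bailed` as failed.
```

A call sent at t = 0 is still waiting at t = 1800 and is bailed out by a scan at t = 1801, so with a 10-second scan interval a frame is reported between 1800 and roughly 1810 seconds after it was sent.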

It would help us investigate the issue if you could provide the following information:

- What is the volume configuration? (output of 'gluster volume info <volname>')
- What kind of 'workload' was seen on the glusterfs client?
- Did any of the servers 'go down', or were there any network outages,
  when you saw these messages?
- Attach the log files of the client and server(s).
- When you observe a hang, send signal USR1 to the glusterfs server process(es):
  'kill -s USR1 <pid>'
  This writes a state dump of the process to '/tmp/glusterdump.<pid>'.
  Attach the resulting glusterdump.<pid> file(s).
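The state-dump step can also be scripted. This is a minimal Python sketch, assuming the dump lands at `/tmp/glusterdump.<pid>` as described above (the location may differ in other GlusterFS versions, hence the `dump_dir` parameter). `request_statedump` is a hypothetical helper that does the same as running `kill -s USR1 <pid>` and then waiting for the file to appear.

```python
import os
import signal
import time


def _mtime_ns(path):
    """Modification time of `path` in nanoseconds, or None if it does not exist."""
    try:
        return os.stat(path).st_mtime_ns
    except FileNotFoundError:
        return None


def request_statedump(pid, dump_dir="/tmp", wait=10.0, poll=0.1):
    """Send SIGUSR1 to `pid` and wait for <dump_dir>/glusterdump.<pid>.

    Returns the dump path, or None if no new or updated file showed up
    within `wait` seconds. The default `dump_dir` is an assumption.
    """
    path = os.path.join(dump_dir, f"glusterdump.{pid}")
    before = _mtime_ns(path)          # detect an older dump left behind
    os.kill(pid, signal.SIGUSR1)      # same as: kill -s USR1 <pid>
    deadline = time.monotonic() + wait
    while time.monotonic() < deadline:
        current = _mtime_ns(path)
        if current is not None and current != before:
            return path
        time.sleep(poll)
    return None
```

The function only waits for the file to appear or change; the process may still be writing it, so check that the dump looks complete before attaching it.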