Bug 1381957 - Seeing errors in fuse client log like forced unwinding frame type(GlusterFS 3.3) when bricks are brought down while appends are happening
Summary: Seeing errors in fuse client log like forced unwinding frame type(GlusterFS ...
Keywords:
Status: CLOSED WONTFIX
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat
Component: replicate
Version: rhgs-3.2
Hardware: Unspecified
OS: Unspecified
unspecified
high
Target Milestone: ---
: ---
Assignee: Pranith Kumar K
QA Contact: Nag Pavan Chilakam
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2016-10-05 12:32 UTC by Nag Pavan Chilakam
Modified: 2017-02-01 14:21 UTC (History)
2 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2016-10-14 09:10:23 UTC
Target Upstream Version:


Attachments (Terms of Use)

Description Nag Pavan Chilakam 2016-10-05 12:32:25 UTC
Description of problem:
=======================
In my systemic setup, https://docs.google.com/spreadsheets/d/1iP5Mi1TewBFVh8HTmlcBm9072Bgsbgkr3CLcGmawDys/edit#gid=632186609

amongst many clients, I had two rhel 6 clients from where I was doing append to the same file from both the clients parallely as below:
 for i in {1..100000};do for j in {1..100000};do echo "hi this line is from $HOSTNAME" >> deadlockfile ;done; echo "########################" >> deadlockfile ; echo " LOOP $i from $HOSTNAME" >> deadlockfile ;done



It  ran for about a week, and while this was happening , I brought down one brick each from each replica pair as below:
[root@dhcp35-191 bricks]# gluster  v  status
Status of volume: distrepvol
Gluster process                             TCP Port  RDMA Port  Online  Pid
------------------------------------------------------------------------------
Brick 10.70.35.191:/rhs/brick1/distrepvol   N/A       N/A        N       N/A  
Brick 10.70.37.187:/rhs/brick1/distrepvol   49154     0          Y       3867 
Brick 10.70.35.3:/rhs/brick1/distrepvol     N/A       N/A        N       N/A  
Brick 10.70.37.150:/rhs/brick1/distrepvol   49154     0          Y       3918 
Brick 10.70.35.191:/rhs/brick2/distrepvol   49155     0          Y       7568 
Brick 10.70.37.187:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
Brick 10.70.35.3:/rhs/brick2/distrepvol     49155     0          Y       5341 
Brick 10.70.37.150:/rhs/brick2/distrepvol   N/A       N/A        N       N/A  
Snapshot Daemon on localhost                49152     0          Y       8358 
Self-heal Daemon on localhost               N/A       N/A        Y       7588 
Quota Daemon on localhost                   N/A       N/A        Y       8211 
Snapshot Daemon on 10.70.37.150             49152     0          Y       4477 
Self-heal Daemon on 10.70.37.150            N/A       N/A        Y       3957 
Quota Daemon on 10.70.37.150                N/A       N/A        Y       4380 
Snapshot Daemon on 10.70.37.187             49152     0          Y       4428 
Self-heal Daemon on 10.70.37.187            N/A       N/A        Y       3907 
Quota Daemon on 10.70.37.187                N/A       N/A        Y       4330 
Snapshot Daemon on 10.70.35.3               49152     0          Y       5858 
Self-heal Daemon on 10.70.35.3              N/A       N/A        Y       5361 
Quota Daemon on 10.70.35.3                  N/A       N/A        Y       5762 
 
Task Status of Volume distrepvol
------------------------------------------------------------------------------
There are no active volume tasks

When the bricks were brought down I noticed the following error logs on fuse client:
[2016-10-05 11:18:25.461727] W [socket.c:590:__socket_rwv] 0-distrepvol-client-0: readv on 10.70.35.191:49154 failed (Connection reset by peer)
[2016-10-05 11:18:25.462270] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1f2)[0x7fc88e04a442] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fc88de12747] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fc88de1285e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x99)[0x7fc88de12919] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1cf)[0x7fc88de12b3f] ))))) 0-distrepvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2016-10-05 11:18:25.347100 (xid=0x50cde43f)
[2016-10-05 11:18:25.462293] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-distrepvol-client-0: remote operation failed [Transport endpoint is not connected]
[2016-10-05 11:18:25.462487] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1f2)[0x7fc88e04a442] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fc88de12747] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fc88de1285e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x99)[0x7fc88de12919] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1cf)[0x7fc88de12b3f] ))))) 0-distrepvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(RELEASE(41)) called at 2016-10-05 11:18:25.346857 (xid=0x50cde43d)
[2016-10-05 11:18:25.462631] E [rpc-clnt.c:365:saved_frames_unwind] (--> /usr/lib64/libglusterfs.so.0(_gf_log_callingfn+0x1f2)[0x7fc88e04a442] (--> /usr/lib64/libgfrpc.so.0(saved_frames_unwind+0x1e7)[0x7fc88de12747] (--> /usr/lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7fc88de1285e] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x99)[0x7fc88de12919] (--> /usr/lib64/libgfrpc.so.0(rpc_clnt_notify+0x1cf)[0x7fc88de12b3f] ))))) 0-distrepvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FLUSH(15)) called at 2016-10-05 11:18:25.346926 (xid=0x50cde43e)
[2016-10-05 11:18:25.462650] E [MSGID: 114031] [client-rpc-fops.c:919:client3_3_flush_cbk] 0-distrepvol-client-0: remote operation failed [Transport endpoint is not connected]
[2016-10-05 11:18:25.462696] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-distrepvol-client-0: disconnected from distrepvol-client-0. Client process will keep trying to connect to glusterd until brick's port is available
[2016-10-05 11:18:35.826378] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:35.829761] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:39.829510] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:39.832450] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:43.833228] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:43.836276] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:47.836093] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:47.838921] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:51.839247] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:51.842059] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:52.474766] W [socket.c:590:__socket_rwv] 0-distrepvol-client-5: readv on 10.70.37.187:49155 failed (No data available)
[2016-10-05 11:18:52.474821] I [MSGID: 114018] [client.c:2280:client_rpc_notify] 0-distrepvol-client-5: disconnected from distrepvol-client-5. Client process will keep trying to connect to glusterd until brick's port is available
[2016-10-05 11:18:55.841724] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:55.844622] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:59.845327] I [rpc-clnt.c:1947:rpc_clnt_reconfig] 0-distrepvol-client-0: changing port to 49154 (from 0)
[2016-10-05 11:18:59.848412] E [socket.c:2309:socket_connect_finish] 0-distrepvol-client-0: connection to 10.70.35.191:49154 failed (Connection refused)
[2016-10-05 11:18:25.462824] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-distrepvol-client-0: remote operation failed [Transport endpoint is not connected]





Version-Release number of selected component (if applicable):
==============
client side
glusterfs-3.8.4-1.el6.x86_64
glusterfs-cli-3.8.4-1.el6.x86_64
glusterfs-libs-3.8.4-1.el6.x86_64
glusterfs-client-xlators-3.8.4-1.el6.x86_64
glusterfs-api-3.8.4-1.el6.x86_64
glusterfs-debuginfo-3.8.4-1.el6.x86_64
glusterfs-fuse-3.8.4-1.el6.x86_64
[root@dhcp35-29 glusterfs]#

Comment 2 Pranith Kumar K 2016-10-14 09:10:23 UTC
These logs are important for debugging ping-timeout related bugs. So it is not a good idea to remove these from appearing. When a disconnect happens after the operations are sent to bricks, these logs appear. It prints the time at which operation was sent etc, which is crucial in debugging ping-timeouts.


Note You need to log in before you can comment on or make changes to this bug.