Bug 1425740 - Disconnects in nfs mount leads to IO hang and mount inaccessible
Summary: Disconnects in nfs mount leads to IO hang and mount inaccessible
Keywords:
Status: CLOSED ERRATA
Alias: None
Product: Red Hat Gluster Storage
Classification: Red Hat Storage
Component: rpc
Version: rhgs-3.2
Hardware: All
OS: Linux
unspecified
urgent
Target Milestone: ---
: RHGS 3.2.0
Assignee: Raghavendra G
QA Contact: Karan Sandha
URL:
Whiteboard:
Depends On: 1427012 1428670
Blocks: 1351528 1409135
 
Reported: 2017-02-22 09:40 UTC by Karan Sandha
Modified: 2017-03-23 06:05 UTC (History)
10 users (show)

Fixed In Version: glusterfs-3.8.4-17
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
: 1427012 (view as bug list)
Environment:
Last Closed: 2017-03-23 06:05:28 UTC
Embargoed:


Attachments (Terms of Use)
logs of gdb session attached to gNFS server (296.10 KB, text/plain)
2017-03-03 08:53 UTC, Raghavendra G
no flags Details


Links
System ID Private Priority Status Summary Last Updated
Red Hat Product Errata RHSA-2017:0486 0 normal SHIPPED_LIVE Moderate: Red Hat Gluster Storage 3.2.0 security, bug fix, and enhancement update 2017-03-23 09:18:45 UTC

Description Karan Sandha 2017-02-22 09:40:57 UTC
Description of problem:
Multiple disconnects on the NFS mount lead to an IO hang and the mount point becoming inaccessible. The IO pattern is sequential writes.

Version-Release number of selected component (if applicable):
3.8.4-15
Logs are placed at 
rhsqe-repo.lab.eng.blr.redhat.com:/var/www/html/sosreports/<bug>

How reproducible:
100%

Steps to Reproduce:
1. 4 servers and 4 clients, mounted 1:1 over gNFS.
2. Daemonize FIO on the 4 clients and start sequential writes with the direct flag set to 0.
3. Soon after the IOs start, the tool hangs.
4. Multiple errors and warnings appear in nfs.log.

<SNIP>
[2017-02-21 12:12:51.320184] I [MSGID: 114035] [client-handshake.c:201:client_set_lk_version_cbk] 0-testvol-client-3: Server lk version = 1
[2017-02-21 12:12:51.321283] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-testvol-replicate-0: selecting local read_child testvol-client-0
[2017-02-21 12:49:49.701983] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 12:49:50.605486] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 12:57:37.436646] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 12:57:41.974013] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:06:02.813228] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:06:13.913704] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:09:46.068394] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:09:55.187138] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:13:29.838686] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:13:40.894034] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:17:13.926639] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:17:23.788660] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:20:59.255507] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:21:08.029787] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:30:14.654217] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:30:20.096036] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:36:51.797248] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:37:03.714977] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
[2017-02-21 13:40:31.413063] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 13:40:37.713545] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now
</SNIP>

<SNIP>
[2017-02-22 00:42:07.459998] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4941 Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (testvol-client-0)
[2017-02-22 00:42:07.460362] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5535a82672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f553584888e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f553584899e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f553584a104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f553584a994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-22 00:40:01.845368 (xid=0xb491e)
[2017-02-22 00:42:07.460487] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5535a82672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f553584888e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f553584899e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f553584a104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f553584a994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-22 00:40:26.777714 (xid=0xb4920)
</SNIP>


Actual results:
The IO tool hung, and multiple errors and warnings appeared in the logs.

Expected results:
NO IO hang should be observed

Additional info:

[root@gqas004 ~]# gluster v info
 
Volume Name: testvol
Type: Distributed-Replicate
Volume ID: e2212c28-f04a-4f08-9f17-b0fb74434bbf
Status: Started
Snapshot Count: 0
Number of Bricks: 2 x 2 = 4
Transport-type: tcp
Bricks:
Brick1: gqas003.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick0
Brick2: gqas004.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick1
Brick3: gqas007.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick2
Brick4: gqas012.sbu.lab.eng.bos.redhat.com:/bricks/testvol_brick3
Options Reconfigured:
cluster.use-compound-fops: off
network.remote-dio: off
performance.strict-o-direct: on
network.inode-lru-limit: 90000
performance.md-cache-timeout: 600
performance.cache-invalidation: on
features.cache-invalidation-timeout: 600
features.cache-invalidation: on
server.allow-insecure: on
performance.stat-prefetch: on
transport.address-family: inet
performance.readdir-ahead: on
nfs.disable: off

Comment 2 rjoseph 2017-02-22 09:57:33 UTC
Additional info:

The following log entry is seen in nfs.log, as mentioned in the bug description:
[2017-02-21 12:49:49.701983] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)

192.168.97.119 (gqac006.sbu.lab.eng.bos.redhat.com) is the NFS client machine, and the following errors are seen in /var/log/messages on this machine:
Feb 21 07:36:05 gqac006 kernel: nfs: server gqas001.sbu.lab.eng.bos.redhat.com not responding, still trying
Feb 21 07:36:31 gqac006 kernel: nfs: server gqas001.sbu.lab.eng.bos.redhat.com not responding, still trying
Feb 21 07:36:31 gqac006 kernel: nfs: server gqas001.sbu.lab.eng.bos.redhat.com not responding, still trying

Comment 5 Soumya Koduri 2017-02-22 17:48:32 UTC
[2017-02-21 12:12:51.321283] I [MSGID: 108031] [afr-common.c:2255:afr_local_discovery_cbk] 0-testvol-replicate-0: selecting local read_child testvol-client-0
[2017-02-21 12:49:49.701983] W [socket.c:595:__socket_rwv] 0-socket.nfs-server: writev on 192.168.97.119:843 failed (Connection reset by peer)
[2017-02-21 12:49:50.605486] I [socket.c:2408:socket_event_handler] 0-transport: disconnecting now

There has been a disconnect between the gNFS server and the NFS client; it is not clear why. After that, there appear to be disconnects with the brick processes as well:

[2017-02-22 00:41:25.933830] C [rpc-clnt-ping.c:160:rpc_clnt_ping_timer_expired] 0-testvol-client-0: server 192.168.97.115:49152 has not responded in the last 42 seconds, disconnecting.
[2017-02-22 00:41:34.457572] I [socket.c:3446:socket_submit_request] 0-testvol-client-0: not connected (priv->connected = -1)
[2017-02-22 00:41:34.457588] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4932 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (testvol-client-0)
[2017-02-22 00:41:34.457599] E [MSGID: 114031] [client-rpc-fops.c:1601:client3_3_finodelk_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]
[2017-02-22 00:41:34.465945] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4933 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (testvol-client-0)
[2017-02-22 00:41:34.466368] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4934 Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (testvol-client-0)
[2017-02-22 00:41:34.466419] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4935 Program: GlusterFS 3.3, ProgVers: 330, Proc: 30) to rpc-transport (testvol-client-0)
[2017-02-22 00:41:34.466416] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-testvol-client-0: remote operation failed [Transport endpoint is not connected]
[2017-02-22 00:41:34.466444] W [MSGID: 114029] [client-rpc-fops.c:4384:client3_3_writev] 0-testvol-client-0: failed to send the fop

...
...
resulting in saved_frames_destroy 

[2017-02-22 00:42:07.459998] W [rpc-clnt.c:1657:rpc_clnt_submit] 0-testvol-client-0: failed to submit rpc-request (XID: 0xb4941 Program: GlusterFS 3.3, ProgVers: 330, Proc: 13) to rpc-transport (testvol-client-0)
[2017-02-22 00:42:07.460362] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5535a82672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f553584888e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f553584899e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f553584a104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f553584a994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-22 00:40:01.845368 (xid=0xb491e)
[2017-02-22 00:42:07.460487] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f5535a82672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f553584888e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f553584899e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f553584a104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f553584a994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-22 00:40:26.777714 (xid=0xb4920)

Comment 6 Soumya Koduri 2017-02-22 17:53:40 UTC
Ideally, after unwinding the saved_frames the nfs-server should come back to a normal state and respond to any new requests. But it looks like even "showmount -e localhost" on the server node is hanging.

The reason is that, of the 2 poller threads, one thread is busy flushing saved_frames through the write-behind (wb) xlator and the other poller thread is waiting for the lock held by the first one to be released.
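For illustration, here is a minimal sketch of that interaction (simplified stand-in types and names, not the actual rpc-clnt code): the poller thread handling the DISCONNECT event holds the per-connection notify lock for the entire, very slow frame unwind, so the second poller thread blocks on the same lock as soon as it tries to deliver the CONNECT event. This is the pattern the two backtraces below show.

#include <pthread.h>
#include <unistd.h>

/* Simplified stand-in for the rpc-clnt connection object. */
struct rpc_conn {
        pthread_mutex_t notify_lock;
        int             n_saved_frames;
};

/* Stand-in for unwinding one saved frame; in the hung process each
 * unwind walks write-behind's request lists and takes many seconds. */
static void unwind_one_frame(struct rpc_conn *conn)
{
        sleep(1);                         /* placeholder for the slow work */
        conn->n_saved_frames--;
}

/* Poller thread 1: DISCONNECT handling (compare Thread 2 below). */
static void *handle_disconnect(void *arg)
{
        struct rpc_conn *conn = arg;

        pthread_mutex_lock(&conn->notify_lock);
        while (conn->n_saved_frames > 0)
                unwind_one_frame(conn);   /* lock held the whole time */
        pthread_mutex_unlock(&conn->notify_lock);
        return NULL;
}

/* Poller thread 2: CONNECT handling (compare Thread 4 below). */
static void *handle_connect(void *arg)
{
        struct rpc_conn *conn = arg;

        pthread_mutex_lock(&conn->notify_lock);   /* blocks here for hours */
        /* ... re-establish connection state ... */
        pthread_mutex_unlock(&conn->notify_lock);
        return NULL;
}

int main(void)
{
        struct rpc_conn conn = { PTHREAD_MUTEX_INITIALIZER, 100000 };
        pthread_t t1, t2;

        pthread_create(&t1, NULL, handle_disconnect, &conn);
        pthread_create(&t2, NULL, handle_connect, &conn);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
}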


[root@gqas001 tmp]# date
Wed Feb 22 08:08:34 EST 2017
[root@gqas001 tmp]# 



Thread 4 (Thread 0x7f5528a8f700 (LWP 31634)):
#0  0x00007f55348e81bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f55348e3d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f55348e3c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f553584ab12 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f55240dc7a8, event=<optimized out>, data=0x7f55240ec4a0)
    at rpc-clnt.c:994
#4  0x00007f55358469b3 in rpc_transport_notify (this=this@entry=0x7f55240ec4a0, event=event@entry=RPC_TRANSPORT_CONNECT, 
    data=data@entry=0x7f55240ec4a0) at rpc-transport.c:538
#5  0x00007f552a336177 in socket_connect_finish (this=this@entry=0x7f55240ec4a0) at socket.c:2348
#6  0x00007f552a33a778 in socket_event_handler (fd=<optimized out>, idx=13, data=0x7f55240ec4a0, poll_in=0, poll_out=4, poll_err=0) at socket.c:2391
#7  0x00007f5535ada770 in event_dispatch_epoll_handler (event=0x7f5528a8ee80, event_pool=0x7f5537978730) at event-epoll.c:571
#8  event_dispatch_epoll_worker (data=0x7f55379cb0e0) at event-epoll.c:674
#9  0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f553422673d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5520b9f700 (LWP 31672)):
#0  0x00007f553421bdfd in poll () from /lib64/libc.so.6
#1  0x00007f553425a8e8 in svc_run () from /lib64/libc.so.6
#2  0x00007f55234a51f4 in nsm_thread (argv=<optimized out>) at nlmcbk_svc.c:121
#3  0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f553422673d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f55168a8700 (LWP 31673)):
#0  wb_requests_conflict (lie=lie@entry=0x7f5461c17d80, req=req@entry=0x7f5525de8a70) at write-behind.c:290
#1  0x00007f55238f5bab in wb_liability_has_conflict (wb_inode=wb_inode@entry=0x7f5524072c80, req=req@entry=0x7f5525de8a70) at write-behind.c:309
#2  0x00007f55238f7dfb in __wb_pick_winds (wb_inode=wb_inode@entry=0x7f5524072c80, tasks=tasks@entry=0x7f55168a7400, 
    liabilities=liabilities@entry=0x7f55168a7420) at write-behind.c:1462
#3  0x00007f55238f803b in wb_process_queue (wb_inode=wb_inode@entry=0x7f5524072c80) at write-behind.c:1549
#4  0x00007f55238f8117 in wb_fulfill_cbk (frame=frame@entry=0x7f545a72128c, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=1, 
    op_errno=op_errno@entry=107, prebuf=prebuf@entry=0x7f545a72a17c, postbuf=postbuf@entry=0x7f545a72a1ec, xdata=xdata@entry=0x7f54759ff62c)
    at write-behind.c:1013
#5  0x00007f5523b5adf0 in dht_writev_cbk (frame=0x7f545a7213bc, cookie=<optimized out>, this=<optimized out>, op_ret=1, op_errno=107, 
    prebuf=0x7f545a72a17c, postbuf=0x7f545a72a1ec, xdata=0x7f54759ff62c) at dht-inode-write.c:106
#6  0x00007f5523da7d15 in afr_writev_unwind (frame=0x7f545a72154c, this=<optimized out>) at afr-inode-write.c:247
#7  0x00007f5523da7e5d in afr_transaction_writev_unwind (frame=0x7f545a72834c, this=0x7f552400cb50) at afr-inode-write.c:264
#8  0x00007f5523db3f31 in __afr_txn_write_done (frame=0x7f545a72834c, this=0x7f552400cb50) at afr-transaction.c:195
#9  0x00007f5523db90ab in afr_unlock_common_cbk (frame=frame@entry=0x7f545a72834c, this=this@entry=0x7f552400cb50, xdata=<optimized out>, 
    op_errno=107, op_ret=<optimized out>, cookie=0x0) at afr-lk-common.c:633
#10 0x00007f5523dba140 in afr_unlock_inodelk_cbk (frame=0x7f545a72834c, cookie=0x0, this=0x7f552400cb50, op_ret=<optimized out>, op_errno=107, 
    xdata=<optimized out>) at afr-lk-common.c:682
#11 0x00007f552805353c in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f54431bd3fc)
    at client-rpc-fops.c:1606
#12 0x00007f55358488b5 in saved_frames_unwind (saved_frames=saved_frames@entry=0x7f54990ad510) at rpc-clnt.c:369
#13 0x00007f553584899e in saved_frames_destroy (frames=0x7f54990ad510) at rpc-clnt.c:386
#14 0x00007f553584a104 in rpc_clnt_connection_cleanup (conn=conn@entry=0x7f55240dc7a8) at rpc-clnt.c:556
#15 0x00007f553584a994 in rpc_clnt_handle_disconnect (conn=0x7f55240dc7a8, clnt=0x7f55240dc750) at rpc-clnt.c:881
#16 rpc_clnt_notify (trans=<optimized out>, mydata=0x7f55240dc7a8, event=RPC_TRANSPORT_DISCONNECT, data=0x7f55240ec4a0) at rpc-clnt.c:937
#17 0x00007f55358469b3 in rpc_transport_notify (this=this@entry=0x7f55240ec4a0, event=event@entry=RPC_TRANSPORT_DISCONNECT, 
    data=data@entry=0x7f55240ec4a0) at rpc-transport.c:538
#18 0x00007f552a33a754 in socket_event_poll_err (this=0x7f55240ec4a0) at socket.c:1184
#19 socket_event_handler (fd=<optimized out>, idx=10, data=0x7f55240ec4a0, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2409
#20 0x00007f5535ada770 in event_dispatch_epoll_handler (event=0x7f55168a7e80, event_pool=0x7f5537978730) at event-epoll.c:571
#21 event_dispatch_epoll_worker (data=0x7f552404d120) at event-epoll.c:674
#22 0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#23 0x00007f553422673d in clone () from /lib64/libc.so.6


A few hours later:

[root@gqas001 ~]# date
Wed Feb 22 12:44:42 EST 2017
[root@gqas001 ~]# 

Thread 8 (Thread 0x7f552c54d700 (LWP 31625)):
#0  wb_requests_conflict (lie=lie@entry=0x7f54ba9a15e0, req=req@entry=0x7f5476a725f0) at write-behind.c:284
#1  0x00007f55238f5bab in wb_liability_has_conflict (wb_inode=wb_inode@entry=0x7f5524072c80, req=req@entry=0x7f5476a725f0) at write-behind.c:309
#2  0x00007f55238f7dfb in __wb_pick_winds (wb_inode=wb_inode@entry=0x7f5524072c80, tasks=tasks@entry=0x7f552c54c5e0, 
    liabilities=liabilities@entry=0x7f552c54c600) at write-behind.c:1462
#3  0x00007f55238f803b in wb_process_queue (wb_inode=wb_inode@entry=0x7f5524072c80) at write-behind.c:1549
#4  0x00007f55238f8117 in wb_fulfill_cbk (frame=frame@entry=0x7f546a6bd47c, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=-1, 
    op_errno=op_errno@entry=107, prebuf=prebuf@entry=0x7f5450001d8c, postbuf=postbuf@entry=0x7f5450001dfc, xdata=xdata@entry=0x0)
    at write-behind.c:1013
#5  0x00007f5523b5adf0 in dht_writev_cbk (frame=0x7f545038addc, cookie=<optimized out>, this=<optimized out>, op_ret=-1, op_errno=107, 
    prebuf=0x7f5450001d8c, postbuf=0x7f5450001dfc, xdata=0x0) at dht-inode-write.c:106
#6  0x00007f5523da7d15 in afr_writev_unwind (frame=0x7f545bd58d7c, this=<optimized out>) at afr-inode-write.c:247
#7  0x00007f5523da7e5d in afr_transaction_writev_unwind (frame=0x7f546adab0dc, this=0x7f552400cb50) at afr-inode-write.c:264
#8  0x00007f5523db8817 in afr_write_txn_refresh_done (frame=0x7f546adab0dc, this=0x7f552400cb50, err=<optimized out>) at afr-transaction.c:2613
#9  0x00007f5523dd831b in afr_txn_refresh_done (frame=frame@entry=0x7f546adab0dc, this=this@entry=0x7f552400cb50, err=err@entry=-107)
    at afr-common.c:1139
#10 0x00007f5523dd8b1d in afr_inode_refresh_done (frame=frame@entry=0x7f546adab0dc, this=this@entry=0x7f552400cb50) at afr-common.c:1181
#11 0x00007f5523de2bf5 in afr_inode_refresh_subvol_cbk (frame=0x7f546adab0dc, cookie=<optimized out>, this=0x7f552400cb50, op_ret=<optimized out>, 
    op_errno=<optimized out>, buf=<optimized out>, xdata=xdata@entry=0x0, par=par@entry=0x0) at afr-common.c:1218
#12 0x00007f5523de2c9b in afr_inode_refresh_subvol_with_fstat_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, 
    op_ret=<optimized out>, op_errno=<optimized out>, buf=<optimized out>, xdata=0x0) at afr-common.c:1267
#13 0x00007f55280527de in client3_3_fstat_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f546a800f6c)
    at client-rpc-fops.c:1497
#14 0x00007f5535849cac in call_bail (data=0x7f55240dc750) at rpc-clnt.c:204
#15 0x00007f5535a8dc81 in gf_timer_proc (data=0x7f5537987030) at timer.c:167
#16 0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#17 0x00007f553422673d in clone () from /lib64/libc.so.6



Thread 4 (Thread 0x7f5528a8f700 (LWP 31634)):
#0  0x00007f55348e81bd in __lll_lock_wait () from /lib64/libpthread.so.0
#1  0x00007f55348e3d02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007f55348e3c08 in pthread_mutex_lock () from /lib64/libpthread.so.0
#3  0x00007f553584ab12 in rpc_clnt_notify (trans=<optimized out>, mydata=0x7f55240dc7a8, event=<optimized out>, data=0x7f55240ec4a0)
    at rpc-clnt.c:994
#4  0x00007f55358469b3 in rpc_transport_notify (this=this@entry=0x7f55240ec4a0, event=event@entry=RPC_TRANSPORT_CONNECT, 
    data=data@entry=0x7f55240ec4a0) at rpc-transport.c:538
#5  0x00007f552a336177 in socket_connect_finish (this=this@entry=0x7f55240ec4a0) at socket.c:2348
#6  0x00007f552a33a778 in socket_event_handler (fd=<optimized out>, idx=13, data=0x7f55240ec4a0, poll_in=0, poll_out=4, poll_err=0) at socket.c:2391
#7  0x00007f5535ada770 in event_dispatch_epoll_handler (event=0x7f5528a8ee80, event_pool=0x7f5537978730) at event-epoll.c:571
#8  event_dispatch_epoll_worker (data=0x7f55379cb0e0) at event-epoll.c:674
#9  0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#10 0x00007f553422673d in clone () from /lib64/libc.so.6

Thread 3 (Thread 0x7f5520b9f700 (LWP 31672)):
#0  0x00007f553421bdfd in poll () from /lib64/libc.so.6
#1  0x00007f553425a8e8 in svc_run () from /lib64/libc.so.6
#2  0x00007f55234a51f4 in nsm_thread (argv=<optimized out>) at nlmcbk_svc.c:121
#3  0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#4  0x00007f553422673d in clone () from /lib64/libc.so.6

Thread 2 (Thread 0x7f55168a8700 (LWP 31673)):
#0  0x00007f55348e6227 in pthread_spin_lock () from /lib64/libpthread.so.0
#1  0x00007f55238f8025 in wb_process_queue (wb_inode=wb_inode@entry=0x7f5524072c80) at write-behind.c:1545
#2  0x00007f55238f8117 in wb_fulfill_cbk (frame=frame@entry=0x7f546716050c, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=1, 
    op_errno=op_errno@entry=107, prebuf=prebuf@entry=0x7f54671693fc, postbuf=postbuf@entry=0x7f546716946c, xdata=xdata@entry=0x7f5461b7930c)
    at write-behind.c:1013
#3  0x00007f5523b5adf0 in dht_writev_cbk (frame=0x7f546716063c, cookie=<optimized out>, this=<optimized out>, op_ret=1, op_errno=107, 
    prebuf=0x7f54671693fc, postbuf=0x7f546716946c, xdata=0x7f5461b7930c) at dht-inode-write.c:106
#4  0x00007f5523da7d15 in afr_writev_unwind (frame=0x7f54671607cc, this=<optimized out>) at afr-inode-write.c:247
#5  0x00007f5523da7e5d in afr_transaction_writev_unwind (frame=0x7f54671675cc, this=0x7f552400cb50) at afr-inode-write.c:264
#6  0x00007f5523db3f31 in __afr_txn_write_done (frame=0x7f54671675cc, this=0x7f552400cb50) at afr-transaction.c:195
#7  0x00007f5523db90ab in afr_unlock_common_cbk (frame=frame@entry=0x7f54671675cc, this=this@entry=0x7f552400cb50, xdata=<optimized out>, 
    op_errno=107, op_ret=<optimized out>, cookie=0x0) at afr-lk-common.c:633
#8  0x00007f5523dba140 in afr_unlock_inodelk_cbk (frame=0x7f54671675cc, cookie=0x0, this=0x7f552400cb50, op_ret=<optimized out>, op_errno=107, 
    xdata=<optimized out>) at afr-lk-common.c:682
#9  0x00007f552805353c in client3_3_finodelk_cbk (req=<optimized out>, iov=<optimized out>, count=<optimized out>, myframe=0x7f54434e781c)
    at client-rpc-fops.c:1606
#10 0x00007f55358488b5 in saved_frames_unwind (saved_frames=saved_frames@entry=0x7f54990ad510) at rpc-clnt.c:369
#11 0x00007f553584899e in saved_frames_destroy (frames=0x7f54990ad510) at rpc-clnt.c:386
#12 0x00007f553584a104 in rpc_clnt_connection_cleanup (conn=conn@entry=0x7f55240dc7a8) at rpc-clnt.c:556
#13 0x00007f553584a994 in rpc_clnt_handle_disconnect (conn=0x7f55240dc7a8, clnt=0x7f55240dc750) at rpc-clnt.c:881
#14 rpc_clnt_notify (trans=<optimized out>, mydata=0x7f55240dc7a8, event=RPC_TRANSPORT_DISCONNECT, data=0x7f55240ec4a0) at rpc-clnt.c:937
#15 0x00007f55358469b3 in rpc_transport_notify (this=this@entry=0x7f55240ec4a0, event=event@entry=RPC_TRANSPORT_DISCONNECT, 
    data=data@entry=0x7f55240ec4a0) at rpc-transport.c:538
#16 0x00007f552a33a754 in socket_event_poll_err (this=0x7f55240ec4a0) at socket.c:1184
#17 socket_event_handler (fd=<optimized out>, idx=10, data=0x7f55240ec4a0, poll_in=1, poll_out=0, poll_err=<optimized out>) at socket.c:2409
#18 0x00007f5535ada770 in event_dispatch_epoll_handler (event=0x7f55168a7e80, event_pool=0x7f5537978730) at event-epoll.c:571
#19 event_dispatch_epoll_worker (data=0x7f552404d120) at event-epoll.c:674
#20 0x00007f55348e1dc5 in start_thread () from /lib64/libpthread.so.0
#21 0x00007f553422673d in clone () from /lib64/libc.so.6


All these hours the wb xlator seems to be working on a single inode entry "wb_inode=wb_inode@entry=0x7f5524072c80". Is it stuck in some kind of infinite loop? Requesting Du to take a look.

Comment 7 Raghavendra G 2017-02-23 04:39:12 UTC
If the setup is still available, is it possible to take a statedump of the NFS server? I need the state of all inodes in the itable too (enable the dump of inodes as well).

Comment 8 Soumya Koduri 2017-02-23 05:33:00 UTC
yes. The system is in the same state -

gqas001.sbu.lab.eng.bos.redhat.com (root/redhat)
# gdb -p 31624 (nfs-pid)

I even generated a core in case the gluster-NFS server restarts.
core-location: /root/core.31624

Unfortunately, I am unable to take statedumps -

[root@gqas001 ~]# gluster v status
Locking failed on gqas012.sbu.lab.eng.bos.redhat.com. Please check log file for details.
 
[root@gqas001 ~]# 

From glusterd logs -

failed on gqas012.sbu.lab.eng.bos.redhat.com. Please check log file for details.
[2017-02-23 05:27:11.606681] E [MSGID: 106151] [glusterd-syncop.c:1889:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2017-02-23 05:27:11.607519] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on gqas012.sbu.lab.eng.bos.redhat.com. Please check log file for details.
[2017-02-23 05:27:11.938853] E [MSGID: 106152] [glusterd-syncop.c:1590:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)
[2017-02-23 05:27:18.850688] I [MSGID: 106499] [glusterd-handler.c:4362:__glusterd_handle_status_volume] 0-management: Received status volume req for volume testvol
[2017-02-23 05:27:19.173293] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Locking failed on gqas012.sbu.lab.eng.bos.redhat.com. Please check log file for details.
[2017-02-23 05:27:19.472432] E [MSGID: 106151] [glusterd-syncop.c:1889:gd_sync_task_begin] 0-management: Locking Peers Failed.
[2017-02-23 05:27:19.473098] E [MSGID: 106116] [glusterd-mgmt.c:135:gd_mgmt_v3_collate_errors] 0-management: Unlocking failed on gqas012.sbu.lab.eng.bos.redhat.com. Please check log file for details.
[2017-02-23 05:27:19.793317] E [MSGID: 106152] [glusterd-syncop.c:1590:gd_unlock_op_phase] 0-management: Failed to unlock on some peer(s)

There are no other glusterd operations happening, so it must be a stale lock. I request Atin to take a look. @Karan, I think this needs to be filed as a separate bug for the glusterd failures.

Comment 9 Atin Mukherjee 2017-02-23 05:39:51 UTC
Soumya - please restart glusterd on the node where the stale lock is seen, that should unblock you.

Comment 10 Raghavendra G 2017-02-24 04:58:14 UTC
Thanks to Soumya for assisting in debugging the issue. Following are the observations we made:

* Both poller threads are blocked. One is blocked unwinding saved frames. The other is waiting on rpc->notifylock, which is held by the first thread. This explains the I/O hang: without poller threads, the gNFS server cannot read responses.
* Unwinding each frame takes approximately 30 seconds (note the time difference in the log).

[2017-02-22 01:07:51.315392] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f4df29fc672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4df27c288e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4df27c299e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f4df27c4104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f4df27c4994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-21 14:10:29.546097 (xid=0xc460b)
[2017-02-22 01:08:24.299827] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f4df29fc672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4df27c288e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4df27c299e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f4df27c4104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f4df27c4994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-21 14:10:29.546140 (xid=0xc460c)

So far one of the servers has unwound around 14k frames:
[root@gqas003 ~]# grep saved_frames_unwind  /var/log/glusterfs/nfs.log  | wc -l
14195

So, 14195 frames * ~30s per frame is roughly 425,850 seconds, or about 118 hours of unwinding. So there is no deadlock; it is just that the nfs server is very slow in unwinding the frames.

Note that the large number of saved frames is due to heavy write-back caching and a large number of small write requests (1 byte in length). Also, since these were random writes, write-behind could not collapse them into larger buffers.
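For intuition only, a sketch of the contiguity condition under which write-behind can collapse queued writes (a simplification, not the actual aggregation code): random 1-byte writes never satisfy it, so every request stays a separate pending frame.

#include <stdbool.h>
#include <sys/types.h>

/* Simplified illustration: two queued writes on the same fd can be merged
 * into one larger buffer only when the second starts exactly where the
 * first ends. */
static bool wb_can_collapse(off_t off1, size_t len1, off_t off2)
{
        return off1 + (off_t)len1 == off2;
}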

* To debug further why unwinding a single frame takes so much time, I added some logging through gdb. Note that today the unwinding time for a single frame has come down to approximately 9 seconds:

[2017-02-24 04:32:26.875780] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f4df29fc672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4df27c288e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4df27c299e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f4df27c4104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f4df27c4994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-21 14:15:30.379342 (xid=0xc7ec4)
[2017-02-24 04:32:35.385690] E [rpc-clnt.c:365:saved_frames_unwind] (--> /lib64/libglusterfs.so.0(_gf_log_callingfn+0x192)[0x7f4df29fc672] (--> /lib64/libgfrpc.so.0(saved_frames_unwind+0x1de)[0x7f4df27c288e] (--> /lib64/libgfrpc.so.0(saved_frames_destroy+0xe)[0x7f4df27c299e] (--> /lib64/libgfrpc.so.0(rpc_clnt_connection_cleanup+0x84)[0x7f4df27c4104] (--> /lib64/libgfrpc.so.0(rpc_clnt_notify+0x94)[0x7f4df27c4994] ))))) 0-testvol-client-0: forced unwinding frame type(GlusterFS 3.3) op(FINODELK(30)) called at 2017-02-21 14:15:30.379385 (xid=0xc7ec5)

and the time spent in wb_process_queue doing list processing is also approximately 9 seconds:

Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:08 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:17 EST 2017
Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:17 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:25 EST 2017
Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:25 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:34 EST 2017
Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:34 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:42 EST 2017
Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:42 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:50 EST 2017
Breakpoint 1, 0x00007f4de48a8aa0 in __wb_preprocess_winds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
preprocess_winds Thu Feb 23 23:35:51 EST 2017
Breakpoint 2, 0x00007f4de48a8330 in wb_do_unwinds () from /usr/lib64/glusterfs/3.8.4/xlator/performance/write-behind.so
do_unwinds Thu Feb 23 23:35:59 EST 2017

So I assume most of the time required to unwind a frame is spent traversing the various request lists in write-behind.
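As a rough illustration of where that time goes (a sketch with assumed, simplified data structures modelled on wb_liability_has_conflict and __wb_pick_winds from the backtraces above, not the real write-behind code): every unwound reply triggers a queue pass that checks each pending request against the whole in-flight "liability" list, so with tens of thousands of queued 1-byte writes each pass costs roughly todo-length * liability-length comparisons.

#include <stdbool.h>
#include <stddef.h>
#include <sys/types.h>

struct wb_req {
        struct wb_req *next;
        off_t          offset;
        size_t         len;
};

/* overlapping byte ranges conflict */
static bool requests_conflict(const struct wb_req *lie, const struct wb_req *req)
{
        return !(req->offset + (off_t)req->len <= lie->offset ||
                 lie->offset + (off_t)lie->len <= req->offset);
}

/* true if req conflicts with any in-flight (liability) request */
static bool liability_has_conflict(const struct wb_req *liabilities,
                                   const struct wb_req *req)
{
        for (const struct wb_req *lie = liabilities; lie; lie = lie->next)
                if (requests_conflict(lie, req))
                        return true;
        return false;
}

/* Runs on every pass over the queue: each todo entry is tested against the
 * entire liability list before it can be wound. */
static size_t pick_winds(const struct wb_req *todo, const struct wb_req *liabilities)
{
        size_t picked = 0;
        for (const struct wb_req *req = todo; req; req = req->next)
                if (!liability_has_conflict(liabilities, req))
                        picked++;   /* the real code moves req to a wind list */
        return picked;
}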

* As for why the time required to unwind a single frame has come down from 30 to 9 seconds over a couple of days: unwinds shrink the liability queue, since requests are deleted after a write_cbk from the child xlators. So the decreased time can be attributed to the reduced liability queue length in write-behind.

* An interesting observation is that the length of the todo request queue in write-behind has remained the same over the last couple of days. The reason is that the one or two requests picked for winding are failed by the child xlators (as connectivity is lost) and get added back to the todo queue.

* Note that things got worse because of the lock in the rpc notify codepath [1]. Until saved_frames_unwind is done, the lock acquired while handling the DISCONNECT event is not released. Until this lock is released, we cannot process the CONNECT event for the same brick. In other words, the connection will not be re-established until all saved frames are unwound, which is an unnecessary dependency. It also costs us one poller thread, which sits waiting on the notify lock to process the CONNECT event. We should either remove the locking or move saved_frames_unwind out of the notify lock's critical section (a sketch of the latter follows). That way we at least avoid delaying reconnection and needlessly blocking another poller thread.
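A minimal sketch of that second option (illustrative only, with simplified stand-in types; this is not the actual rpc-clnt change): detach the saved-frames list while holding the notify lock, but perform the slow per-frame unwinding only after the lock is dropped, so the other poller thread can deliver the CONNECT event immediately.

#include <pthread.h>
#include <stdlib.h>

/* Illustrative stand-ins only. */
struct saved_frame {
        struct saved_frame *next;
};

struct rpc_conn {
        pthread_mutex_t     notify_lock;
        struct saved_frame *saved_frames;
};

/* Slow: in the real code each unwind walks write-behind's request lists. */
static void unwind_and_destroy(struct saved_frame *frames)
{
        while (frames) {
                struct saved_frame *next = frames->next;
                /* ... force-unwind the frame with ENOTCONN ... */
                free(frames);
                frames = next;
        }
}

static void handle_disconnect(struct rpc_conn *conn)
{
        struct saved_frame *frames;

        pthread_mutex_lock(&conn->notify_lock);
        frames = conn->saved_frames;      /* detach under the lock ... */
        conn->saved_frames = NULL;
        pthread_mutex_unlock(&conn->notify_lock);

        unwind_and_destroy(frames);       /* ... unwind without holding it,
                                           * so CONNECT is not blocked */
}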

* I am not sure whether we can reduce the list processing time in write-behind. I'll have to think it over.

[1] http://review.gluster.org/15916

regards,
Raghavendra

Comment 11 Raghavendra G 2017-02-24 04:59:20 UTC
Restoring the needinfo on Atin, which got cleared during my response.

Comment 12 Atin Mukherjee 2017-02-24 06:41:36 UTC
(In reply to Soumya Koduri from comment #8)
> ...
> There are no other glusterd operations happening, so it must be a stale lock.
> I request Atin to take a look. @Karan, I think this needs to be filed as a
> separate bug for the glusterd failures.

See https://bugzilla.redhat.com/show_bug.cgi?id=1425681#c4

Comment 21 Raghavendra G 2017-03-03 08:53:03 UTC
Created attachment 1259429 [details]
logs of gdb session attached to gNFS server

Please look at the logs at the end of the file to observe the change in iov_len.

Comment 22 Raghavendra G 2017-03-03 08:53:57 UTC
I added some logs in gdb. Here is what I saw:

Breakpoint 1, nfs3_write (req=req@entry=0x7f2cdfb678cc, fh=fh@entry=0x7f2bfa7fbb00, offset=4294148095, count=1, stable=UNSTABLE, payload=..., iobref=0x7f2be51325e0) at nfs3.c:2264
2264    {
xid=$158 = 2840998646
$159 = {iov_base = 0x7f2cd07e6a80, iov_len = 4}
cs=$160 = <optimized out>

Breakpoint 4, __nfs3_write_resume (cs=cs@entry=0x7f2cdda6d9cc) at nfs3.c:2213
2213            ret = nfs_write (cs->nfsx, cs->vol, &nfu, cs->fd, cs->iobref,
xid=$161 = 2840998646
$162 = {iov_base = 0x7f2cd07e6a80, iov_len = 4}
fd=$163 = 0x7f2cf2b5806c
offset=$164 = 4294148095
cs=$165 = 0x7f2cdda6d9cc

Breakpoint 6, wb_writev (frame=0x7f2be513e5bc, this=0x7f2ce0010630, fd=0x7f2cf2b5806c, vector=0x7f2cdda6ddac, count=1, offset=4294148095, flags=0, iobref=0x7f2be51325e0, xdata=0x0) at write-behind.c:1624
1624    {
fd=$166 = 0x7f2cf2b5806c
$167 = {iov_base = 0x7f2cd07e6a80, iov_len = 1}

Breakpoint 5, nfs3svc_write_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, prebuf=0x7f2bfa7fa800, postbuf=0x7f2bfa7fa800, xdata=0x0) at nfs3.c:2179
2179            nfs3_log_write_res (rpcsvc_request_xid (cs->req),
xid=$168 = 2840998646
cs=$169 = 0x7f2cdda6d9cc
$170 = {iov_base = 0x7f2cd07e6a80, iov_len = 1}
offset=$171 = 4294148095

Breakpoint 1, nfs3_write (req=req@entry=0x7f2cdfb678cc, fh=fh@entry=0x7f2bfa7fbb00, offset=4294152191, count=1, stable=UNSTABLE, payload=..., iobref=0x7f2be513d960) at nfs3.c:2264
2264    {
xid=$172 = 2857775862
$173 = {iov_base = 0x7f2cd07e6800, iov_len = 4}
cs=$174 = <optimized out>

Breakpoint 4, __nfs3_write_resume (cs=cs@entry=0x7f2cdda6d9cc) at nfs3.c:2213
2213            ret = nfs_write (cs->nfsx, cs->vol, &nfu, cs->fd, cs->iobref,
xid=$175 = 2857775862
$176 = {iov_base = 0x7f2cd07e6800, iov_len = 4}
fd=$177 = 0x7f2cf2b5806c
offset=$178 = 4294152191
cs=$179 = 0x7f2cdda6d9cc

Breakpoint 6, wb_writev (frame=0x7f2be514993c, this=0x7f2ce0010630, fd=0x7f2cf2b5806c, vector=0x7f2cdda6ddac, count=1, offset=4294152191, flags=0, iobref=0x7f2be513d960, xdata=0x0) at write-behind.c:1624
1624    {
fd=$180 = 0x7f2cf2b5806c
$181 = {iov_base = 0x7f2cd07e6800, iov_len = 1}

Breakpoint 5, nfs3svc_write_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, prebuf=0x7f2bfa7fa800, postbuf=0x7f2bfa7fa800, xdata=0x0) at nfs3.c:2179
2179            nfs3_log_write_res (rpcsvc_request_xid (cs->req),
xid=$182 = 2857775862
cs=$183 = 0x7f2cdda6d9cc
$184 = {iov_base = 0x7f2cd07e6800, iov_len = 1}
offset=$185 = 4294152191

Breakpoint 1, nfs3_write (req=req@entry=0x7f2cdfb678cc, fh=fh@entry=0x7f2bfa7fbb00, offset=4294156287, count=1, stable=UNSTABLE, payload=..., iobref=0x7f2be5148ce0) at nfs3.c:2264
2264    {
xid=$186 = 2874553078
$187 = {iov_base = 0x7f2cd07e6b00, iov_len = 4}
cs=$188 = <optimized out>

Breakpoint 4, __nfs3_write_resume (cs=cs@entry=0x7f2cdda6d9cc) at nfs3.c:2213
2213            ret = nfs_write (cs->nfsx, cs->vol, &nfu, cs->fd, cs->iobref,
xid=$189 = 2874553078
$190 = {iov_base = 0x7f2cd07e6b00, iov_len = 4}
fd=$191 = 0x7f2cf2b5806c

offset=$192 = 4294156287
cs=$193 = 0x7f2cdda6d9cc

Breakpoint 6, wb_writev (frame=0x7f2be5154cbc, this=0x7f2ce0010630, fd=0x7f2cf2b5806c, vector=0x7f2cdda6ddac, count=1, offset=4294156287, flags=0, iobref=0x7f2be5148ce0, xdata=0x0) at write-behind.c:1624
1624    {
fd=$194 = 0x7f2cf2b5806c
$195 = {iov_base = 0x7f2cd07e6b00, iov_len = 1}

Breakpoint 5, nfs3svc_write_cbk (frame=<optimized out>, cookie=<optimized out>, this=<optimized out>, op_ret=<optimized out>, op_errno=<optimized out>, prebuf=0x7f2bfa7fa800, postbuf=0x7f2bfa7fa800, xdata=0x0) at nfs3.c:2179
2179            nfs3_log_write_res (rpcsvc_request_xid (cs->req),
xid=$196 = 2874553078
cs=$197 = 0x7f2cdda6d9cc
$198 = {iov_base = 0x7f2cd07e6b00, iov_len = 1}
offset=$199 = 4294156287

Some observations:
* The iov_len in the vector has changed from 4 to 1 by the time control is passed to wb_writev from __nfs3_write_resume. Note that I am printing xids to prove that it is the same write request. Note also that iov_len remains 1 for the rest of the life of the request. Is someone overwriting the vector? Is the object "cs" freed before a reply is written back to the client?
* The offsets _seem_ random, i.e. (current-offset + length) != next-offset. Hence, this _will_ be interpreted by write-behind as random I/O (even if iov_len were not changed to 1). However, the offsets of consecutive requests are spaced exactly 4096 apart. Is it a coincidence that iov_len at nfs3_write says 4? Should it be interpreted as 4k?

However, looking at the vectors handed over to nfs from the socket layer, iov_len is 4, _not_ 4096.

(gdb) p pollin
$4 = (rpc_transport_pollin_t *) 0x7f794e925ee0
(gdb) p *pollin
$5 = {vector = {{iov_base = 0x7f794808d000, iov_len = 176}, {iov_base = 0x7f7927ea5f00, iov_len = 4}, {iov_base = 0x0, iov_len = 0} <repeats 14 times>}, count = 2, 
  vectored = 1 '\001', private = 0x0, iobref = 0x7f794f22c300, is_reply = 0 '\000'}

So, two questions:
1. Why did we get 4 as the iov_len even though the offsets are spaced 4096 apart? Is it possible to find out from a tcpdump what the length of the write payload was over the network?
2. Why did the iov_len of 4 get changed to 1 by the time control was passed to write-behind?

I've attached the log of the gdb session. Please look at the logs at the end of the file.

Comment 31 errata-xmlrpc 2017-03-23 06:05:28 UTC
Since the problem described in this bug report should be
resolved in a recent advisory, it has been closed with a
resolution of ERRATA.

For information on the advisory, and where to find the updated
files, follow the link below.

If the solution does not work for you, open a new bug report.

https://rhn.redhat.com/errata/RHSA-2017-0486.html

