Bug 1408576
| Field | Value |
|---|---|
| Summary | [Ganesha+SSL] : Bonnie++ hangs during rewrites. |
| Product | [Red Hat Storage] Red Hat Gluster Storage |
| Component | core |
| Version | rhgs-3.2 |
| Status | CLOSED ERRATA |
| Severity | high |
| Priority | unspecified |
| Reporter | Ambarish <asoman> |
| Assignee | Mohit Agrawal <moagrawa> |
| QA Contact | Ambarish <asoman> |
| CC | amukherj, asoman, bturner, dang, ffilz, jdarcy, jthottan, kaushal, mbenjamin, moagrawa, rcyriac, rgowdapp, rhinduja, rhs-bugs, rjoseph, rkavunga, rtalur, skoduri, srangana, storage-qa-internal |
| Target Release | RHGS 3.2.0 |
| Hardware | x86_64 |
| OS | Linux |
| Fixed In Version | glusterfs-3.8.4-14 |
| Type | Bug |
| Clones | 1418213 1418539 1418541 (view as bug list) |
| Bug Blocks | 1351528, 1418213, 1418539, 1418541 |
| Last Closed | 2017-03-23 05:59:45 UTC |
Description (Ambarish, 2016-12-25 09:16:31 UTC)
A few other things we know:

- The test passes in a non-SSL environment on 4 Ganesha v4 mounts.
- The test passes on FUSE and SMB mounts with SSL enabled.

Proposing this as a blocker since the application side is impacted.

From the gfapi logs on gqas008 and gqas014 there are a lot of timeout/disconnect messages:

```
[2016-12-20 13:30:18.882704] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x75400 sent = 2016-12-20 13:00:14.948068. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.882937] W [MSGID: 114031] [client-rpc-fops.c:855:client3_3_writev_cbk] 0-testvolD-client-0: remote operation failed [Transport endpoint is not connected]
[2016-12-20 13:30:18.909913] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753ff sent = 2016-12-20 13:00:14.941327. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.910054] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e5 sent = 2016-12-20 13:00:14.930079. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.910334] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e4 sent = 2016-12-20 13:00:14.923329. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.910495] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e3 sent = 2016-12-20 13:00:14.916624. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.910680] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e2 sent = 2016-12-20 13:00:14.909907. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.910940] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e1 sent = 2016-12-20 13:00:14.903172. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.911102] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753e0 sent = 2016-12-20 13:00:14.896423. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.911300] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753df sent = 2016-12-20 13:00:14.889711. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.911467] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753de sent = 2016-12-20 13:00:14.882998. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 13:30:18.911867] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(WRITE(13)) xid = 0x753dd sent = 2016-12-20 13:00:14.876168. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 14:00:19.143825] W [MSGID: 108035] [afr-transaction.c:2166:afr_changelog_fsync_cbk] 0-testvolD-replicate-0: fsync(b77fbb20-c131-4886-bfd4-a5bd003bbe2f) failed on subvolume testvolD-client-0. Transaction was WRITE [Transport endpoint is not connected]
The message "W [MSGID: 114031] [client-rpc-fops.c:974:client3_3_fsync_cbk] 0-testvolD-client-0: remote operation failed [Transport endpoint is not connected]" repeated 206 times between [2016-12-20 14:00:19.143743] and [2016-12-20 14:00:19.177858]
The message "W [MSGID: 108035] [afr-transaction.c:2166:afr_changelog_fsync_cbk] 0-testvolD-replicate-0: fsync(b77fbb20-c131-4886-bfd4-a5bd003bbe2f) failed on subvolume testvolD-client-0. Transaction was WRITE [Transport endpoint is not connected]" repeated 206 times between [2016-12-20 14:00:19.143825] and [2016-12-20 14:00:19.177891]
[2016-12-20 14:30:29.406483] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x7553b sent = 2016-12-20 14:00:19.144275. timeout = 1800 for 192.168.79.140:49152
[2016-12-20 14:30:29.406634] E [rpc-clnt.c:200:call_bail] 0-testvolD-client-0: bailing out frame type(GlusterFS 3.3) op(FXATTROP(34)) xid = 0x7553a sent = 2016-12-20 14:00:19.143995. timeout = 1800 for 192.168.79.140:49152
The message "W [MSGID: 114031] [client-rpc-fops.c:1830:client3_3_fxattrop_cbk] 0-testvolD-client-0: remote operation failed" repeated 206 times between [2016-12-20 14:30:29.379019] and [2016-12-20 14:30:29.406657]
```

I tried to look into the core, but it complains while extracting (tried with tar, unzip and gzip; the core should give a clear idea of where it was stuck). On the bricks there was a disconnect around the same time, but they reconnected after a while. Requesting Talur/Du/Rafi to provide their input on this bug.

From the pstack on the server gqas008, there are 201 threads waiting in syncop_writev and the remaining 55 threads are waiting in mdcache_getattrs(). The statedumps on the brick side look okay.

The core given in the location is not usable. Ambarish and I are trying to reproduce the issue on my setup.

Tried to reproduce the issue with Ambarish's help in a two-server/two-client scenario using a 2x2 volume. In the first run, on one client the rewrite passed and it then errored out with:

```
Delete files in sequential order...done.
Create files in random order...Can't create file jkLcF20000000001
Cleaning up test directory after error.
```

But on the second client:

```
Writing intelligently...Can't write block.: Input/output error
Can't write block 1463302
```

I guess these errors are thrown because one of the subvolumes got completely filled. When I ran the test from a single client, it passed. I then retried the two-client scenario; in this run one client succeeded and the other errored out in a similar way. I also tried iozone -a from two different clients (a benchmark similar to Bonnie that uses smaller file sizes) and it passed. Since the issue is not reproduced on the dev setup, requesting QA to reproduce the issue again.

Got a reproducer.
Setup details shared with Dev as of now for further RCA.

Jiffin, reproduced again on the same setup. This time Bonnie was hung on rewrites on one client. On the other three clients it looks hung on the writes themselves for the past 12 hours. Can you please take a look when you get a chance?

I found the following in the bricks.

At brick0 on gqas13:

```
Thread 4 (Thread 0x7effaf266700 (LWP 22296)):
#0  0x00007effafc53dfd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007effa5d6ec43 in ssl_do.isra.9 () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#2  0x00007effa5d6f126 in __socket_rwv () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#3  0x00007effa5d6f882 in __socket_ioq_churn_entry () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#4  0x00007effa5d6fd2c in socket_submit_reply () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#5  0x00007effb1279e62 in rpcsvc_transport_submit () from /lib64/libgfrpc.so.0
#6  0x00007effb127b9a0 in rpcsvc_submit_generic () from /lib64/libgfrpc.so.0
#7  0x00007eff9bdc0f19 in server_submit_reply () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#8  0x00007eff9bdd78d7 in server_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#9  0x00007effa043002d in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so
#10 0x00007effb1534ea2 in default_readv_cbk () from /lib64/libglusterfs.so.0
#11 0x00007effa10b9630 in up_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so
#12 0x00007effa12cf9f2 in leases_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/leases.so
#13 0x00007effa190358f in pl_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/locks.so
#14 0x00007effa284ef4d in ctr_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/changetimerecorder.so
#15 0x00007effa329c970 in posix_readv () from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#16 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#17 0x00007effa284ace0 in ctr_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/changetimerecorder.so
#18 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#19 0x00007effa1d372ca in br_stub_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/bitrot-stub.so
#20 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#21 0x00007effa1906546 in pl_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/locks.so
#22 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#23 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#24 0x00007effa12d310e in leases_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/leases.so
#25 0x00007effa10c1edb in up_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so
#26 0x00007effb1549d94 in default_readv_resume () from /lib64/libglusterfs.so.0
#27 0x00007effb14db1f2 in call_resume_wind () from /lib64/libglusterfs.so.0
#28 0x00007effb14db64d in call_resume () from /lib64/libglusterfs.so.0
#29 0x00007effa0eac957 in iot_worker () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so
#30 0x00007effb0319dc5 in start_thread (arg=0x7effaf266700) at pthread_create.c:308
#31 0x00007effafc5e73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113

Thread 3 (Thread 0x7eff641f6700 (LWP 22299)):
#0  __lll_lock_wait () at ../nptl/sysdeps/unix/sysv/linux/x86_64/lowlevellock.S:135
#1  0x00007effb031bd02 in _L_lock_791 () from /lib64/libpthread.so.0
#2  0x00007effb031bc08 in __GI___pthread_mutex_lock (mutex=0x7eff900066d8) at pthread_mutex_lock.c:64
#3  0x00007effa5d6fbe2 in socket_submit_reply () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#4  0x00007effb1279e62 in rpcsvc_transport_submit () from /lib64/libgfrpc.so.0
#5  0x00007effb127b9a0 in rpcsvc_submit_generic () from /lib64/libgfrpc.so.0
#6  0x00007eff9bdc0f19 in server_submit_reply () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#7  0x00007eff9bdd78d7 in server_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#8  0x00007effa043002d in io_stats_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so
#9  0x00007effb1534ea2 in default_readv_cbk () from /lib64/libglusterfs.so.0
#10 0x00007effa10b9630 in up_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so
#11 0x00007effa12cf9f2 in leases_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/leases.so
#12 0x00007effa190358f in pl_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/locks.so
#13 0x00007effa284ef4d in ctr_readv_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/changetimerecorder.so
#14 0x00007effa329c970 in posix_readv () from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#15 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#16 0x00007effa284ace0 in ctr_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/changetimerecorder.so
#17 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#18 0x00007effa1d372ca in br_stub_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/bitrot-stub.so
#19 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#20 0x00007effa1906546 in pl_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/locks.so
#21 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#22 0x00007effb1531317 in default_readv () from /lib64/libglusterfs.so.0
#23 0x00007effa12d310e in leases_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/leases.so
#24 0x00007effa10c1edb in up_readv () from /usr/lib64/glusterfs/3.8.4/xlator/features/upcall.so
#25 0x00007effb1549d94 in default_readv_resume () from /lib64/libglusterfs.so.0
#26 0x00007effb14db1f2 in call_resume_wind () from /lib64/libglusterfs.so.0
#27 0x00007effb14db64d in call_resume () from /lib64/libglusterfs.so.0
#28 0x00007effa0eac957 in iot_worker () from /usr/lib64/glusterfs/3.8.4/xlator/performance/io-threads.so
#29 0x00007effb0319dc5 in start_thread (arg=0x7eff641f6700) at pthread_create.c:308
#30 0x00007effafc5e73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

At brick3 on gqas011:

```
Thread 27 (Thread 0x7f84df7fe700 (LWP 15855)):
#0  0x00007f850d080dfd in poll () at ../sysdeps/unix/syscall-template.S:81
#1  0x00007f850319bc43 in ssl_do.isra.9 () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#2  0x00007f850319c126 in __socket_rwv () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#3  0x00007f850319c882 in __socket_ioq_churn_entry () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#4  0x00007f850319cd2c in socket_submit_reply () from /usr/lib64/glusterfs/3.8.4/rpc-transport/socket.so
#5  0x00007f850e6a6e62 in rpcsvc_transport_submit () from /lib64/libgfrpc.so.0
#6  0x00007f850e6a89a0 in rpcsvc_submit_generic () from /lib64/libgfrpc.so.0
#7  0x00007f84f934af19 in server_submit_reply () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#8  0x00007f84f9360f72 in server_fsync_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/protocol/server.so
#9  0x00007f84f97b4e2b in io_stats_fsync_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/debug/io-stats.so
#10 0x00007f84f9e0414e in barrier_fsync_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/barrier.so
#11 0x00007f850e967286 in default_fsync_cbk () from /lib64/libglusterfs.so.0
#12 0x00007f84fa654da6 in leases_fsync_cbk () from /usr/lib64/glusterfs/3.8.4/xlator/features/leases.so
#13 0x00007f850e8fc977 in call_resume_unwind () from /lib64/libglusterfs.so.0
#14 0x00007f850e9086fd in call_unwind_error () from /lib64/libglusterfs.so.0
#15 0x00007f85006e2cec in posix_fsyncer_process () from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#16 0x00007f85006e2eb3 in posix_fsyncer () from /usr/lib64/glusterfs/3.8.4/xlator/storage/posix.so
#17 0x00007f850d746dc5 in start_thread (arg=0x7f84df7fe700) at pthread_create.c:308
#18 0x00007f850d08b73d in clone () at ../sysdeps/unix/sysv/linux/x86_64/clone.S:113
```

There are inodelk waits on this server
as well. The other two bricks are fine. The setup is still in the same state. Requesting Rajesh to have a look.

IMHO this is what might have happened: two of the requests, a readv (brick0, gqas13) and an fsync (brick3, gqas011), got stuck at ssl_do()->poll(). All other requests in the rpc layer are then waiting on these requests to finish, which resulted in a hang.

From the initial analysis it seems that the SSL_write function failed, probably due to an underlying socket failure. After the failure, instead of returning an error, we wait in poll until the socket is available for write. I am not very familiar with this code; maybe Jeff can take a look and comment. The failure is seen in the function ssl_do, where SSL_get_error returns SSL_ERROR_WANT_WRITE. Kaushal, could you help here if possible?

Rajesh explained this to me last week and also proposed a possible solution. I'm not really familiar with the actual SSL bits of RPC; I am more familiar with the integration of RPC and SSL. As Rajesh mentioned above, we hang when we get an SSL_ERROR_WANT_WRITE error while expecting to read from or write to the socket. If we get an SSL_ERROR_WANT_READ error in the same situation, we return an error instead of waiting for a poll event. An error should be returned for SSL_ERROR_WANT_WRITE as well, to allow the upper layers to retry. I don't know whether this cause is right, or whether the solution is complete or correct; people more familiar with the SSL routines and errors should weigh in, ideally Jeff, as he knows more about it.

Resetting the needinfo back to Jeff, as Kaushal accidentally cleared it.

(In reply to Kaushal from comment #16) First, I just have to ask: was this tried with own-thread instead of epoll? If so, what was the result? As explained in the comment for the WANT_READ case, the reason we return an error there but not for WANT_WRITE is tied to the upper layers' error/retry behavior, which was correct when both SSL and own-thread were originally being developed. Since the later epoll code did not adhere to the same error/retry model, it might well require different behavior from ssl_do. It certainly seems worth trying the fix Kaushal has suggested. It would probably break own-thread, but that is becoming deprecated anyway.

Some thoughts on the bug: Returning from ssl_do when the error is SSL_ERROR_WANT_WRITE and the request was an SSL_write is fine (of course, r should be -1 and errno EAGAIN). NOTE: from comment #12 the stacks seem to be attempting a write (socket_submit_reply), so the above change may help. Getting SSL_ERROR_WANT_WRITE while attempting an SSL_read is due to SSL renegotiation that could be happening. Returning from ssl_do when the error is SSL_ERROR_WANT_WRITE and the request was an SSL_read is not OK on its face, as that code path in socket.c may not enable POLL_OUT in the events; if there is data to be read from the socket, the handler will just keep getting triggered for read until the SSL renegotiation write is possible. IOW, it will keep getting triggered until the socket can accept writes, which should happen at some point in time (when the SendQ for that socket has space, for example).
Coming to specifics in this hang:

- If the client end is closing the socket after some period of inactivity (due to the brick process getting stuck in the WANT_WRITE->poll), then poll on the brick should break with POLLHUP/ERR/NVAL. Why is this not happening?
- The poll is there to check whether we can write to the socket, and it will not return while the SendQ is full. So in this case, why is the SendQ not getting drained? (Also, can we check the netstat queue sizes for these sockets on the client and on the server?)
- Further, the SendQ may not get drained by the client due to the RPC throttling in our code; is that happening here? Can we also test without RPC throttling, if possible?
- The throttle code does not get triggered with own-thread, so testing with own-thread may add credence to the theory that the SendQ is getting full or is not being drained periodically.

Finally, in the code here we may want to return r=-1 and errno=EAGAIN very specifically. At the moment we return r as received from the SSL call, and only in some cases set errno appropriately.

(In reply to Shyam's points above) On the first point, I assume this is not the case; as you mentioned, we should have got some kind of error on poll. On the SendQ and throttling points, I suspect this might be the real cause: on one NFS setup I have seen a lot of TCP Zero Window messages when testing with the FIO test suite. Jiffin, can you please verify whether this is the real cause?

(In reply to rjoseph from comment #20) When I got the setup, there were no request packets being sent from the Ganesha server (the glusterfs client) to the bricks; I took a packet trace for half an hour. So I am not sure whether this happened.

Hi,

I have tried to reproduce the issue in a 2x2 environment by running Bonnie, and I am able to reproduce it. I am getting the same stack traces mentioned in the earlier comments. The traces show that threads (in socket_submit_reply) are waiting to take the lock, but the lock is held by thread 19931, which is blocked in a poll call because no POLLOUT event arrives on the socket.

Reproducer steps:
1) Set up a 2x2 environment (SSL + Ganesha)
2) Run Bonnie from 3 clients in parallel

After the test case reached "Rewriting" in Bonnie, I observed that no response packets were coming to the client.
All the threads waiting on the lock show the following in their stacks:

```
#3  0x00007fa71af73bc2 in socket_submit_reply (this=0x7fa70c003df0, reply=0x7fa6edef6720) at socket.c:3503
3503        pthread_mutex_lock (&priv->lock);
(gdb) p this->peerinfo->identifier
$1 = "10.65.6.244:49143", '\000' <repeats 90 times>
(gdb) p this->myinfo->identifier
$2 = "10.65.6.244:49152", '\000' <repeats 90 times>
(gdb) p priv->sock
$3 = 30
(gdb) p priv->lock
$4 = {__data = {__lock = 2, __count = 0, __owner = 19931, __nusers = 1, __kind = 0, __spins = 0, __list = {__prev = 0x0, __next = 0x0}}, __size = "\002\000\000\000\000\000\000\000\333M\000\000\001", '\000' <repeats 26 times>, __align = 2}
```

The lock owner is thread 19931:

```
(gdb) thread 25
[Switching to thread 25 (Thread 0x7fa6f88f4700 (LWP 19931))]
#0  0x00007fa724e57c3d in poll () from /lib64/libc.so.6
(gdb) f 3
#3  0x00007fa71af73106 in __socket_rwv (this=this@entry=0x7fa70c003df0, vector=<optimized out>, count=<optimized out>, pending_vector=pending_vector@entry=0x7fa728231e00, pending_count=pending_count@entry=0x7fa728231e08, bytes=bytes@entry=0x0, write=write@entry=1) at socket.c:542
542         ret = ssl_write_one (this, opvector->iov_base,
(gdb) p ((socket_private_t*) this->private)->sock
$46 = 30
```

Below is the full stack trace of thread 19931:

```
#0  0x00007fa724e57c3d in poll () from /lib64/libc.so.6
#1  0x00007fa71af72c23 in poll (__timeout=-1, __nfds=1, __fds=0x7fa6f88f2530) at /usr/include/bits/poll2.h:46
#2  ssl_do (buf=0x7fa6efc40000, len=131072, func=0x7fa71ad41d50 <SSL_write>, this=0x7fa70c003df0, this=0x7fa70c003df0) at socket.c:268
#3  0x00007fa71af73106 in __socket_rwv (this=this@entry=0x7fa70c003df0, vector=<optimized out>, count=<optimized out>, pending_vector=pending_vector@entry=0x7fa728231e00, pending_count=pending_count@entry=0x7fa728231e08, bytes=bytes@entry=0x0, write=write@entry=1) at socket.c:542
#4  0x00007fa71af73862 in __socket_writev (pending_count=<optimized out>, pending_vector=<optimized out>, count=<optimized out>, vector=<optimized out>, this=0x7fa70c003df0) at socket.c:661
#5  __socket_ioq_churn_entry (this=this@entry=0x7fa70c003df0, entry=entry@entry=0x7fa728231ce0, direct=direct@entry=1) at socket.c:1097
#6  0x00007fa71af73d0c in socket_submit_reply (this=0x7fa70c003df0, reply=<optimized out>) at socket.c:3521
#7  0x00007fa72647ceb2 in rpcsvc_transport_submit (trans=trans@entry=0x7fa70c003df0, rpchdr=rpchdr@entry=0x7fa6f88f27f0, rpchdrcount=rpchdrcount@entry=1, proghdr=proghdr@entry=0x7fa6f88f28a0, proghdrcount=proghdrcount@entry=1, progpayload=progpayload@entry=0x7fa6f88f2e80, progpayloadcount=progpayloadcount@entry=1, iobref=iobref@entry=0x7fa7282a0880, priv=0x0) at rpcsvc.c:1125
#8  0x00007fa72647e9f0 in rpcsvc_submit_generic (req=req@entry=0x7fa710e76ffc, proghdr=proghdr@entry=0x7fa6f88f28a0, hdrcount=hdrcount@entry=1, payload=payload@entry=0x7fa6f88f2e80, payloadcount=payloadcount@entry=1, iobref=iobref@entry=0x7fa7282a0880) at rpcsvc.c:1309
#9  0x00007fa711349ff9 in server_submit_reply (frame=frame@entry=0x7fa7241ba110, req=0x7fa710e76ffc, arg=arg@entry=0x7fa6f88f2970, payload=payload@entry=0x7fa6f88f2e80, payloadcount=payloadcount@entry=1, iobref=iobref@entry=0x7fa7282a0880, xdrproc=0x7fa725e4c980 <xdr_gfs3_read_rsp>) at server.c:186
#10 0x00007fa711361004 in server_readv_cbk (frame=0x7fa7241ba110, cookie=<optimized out>, this=0x7fa71402e400, op_ret=131072, op_errno=0, vector=0x7fa6f88f2e80, count=1, stbuf=0x7fa6f88f2e90, iobref=0x7fa7282a0880, xdata=0x0) at server-rpc-fops.c:1451
#11 0x00007fa7117b4f96 in io_stats_readv_cbk (frame=0x7fa7241a2420, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=0x7fa6f88f2e80, count=1, buf=0x7fa6f88f2e90, iobref=0x7fa7282a0880, xdata=0x0) at io-stats.c:2049
#12 0x00007fa726738ebb in default_readv_cbk (frame=0x7fa7241cb0b0, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=0x7fa6f88f2e80, count=1, stbuf=0x7fa6f88f2e90, iobref=0x7fa7282a0880, xdata=0x0) at defaults.c:1105
#13 0x00007fa712441ce1 in up_readv_cbk (frame=0x7fa7241aff94, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=0x7fa6f88f2e80, count=1, stbuf=0x7fa6f88f2e90, iobref=0x7fa7282a0880, xdata=0x0) at upcall.c:167
#14 0x00007fa712658a8b in leases_readv_cbk (frame=frame@entry=0x7fa7241bbabc, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=131072, op_errno=op_errno@entry=0, vector=vector@entry=0x7fa6f88f2e80, count=count@entry=1, stbuf=stbuf@entry=0x7fa6f88f2e90, iobref=iobref@entry=0x7fa7282a0880, xdata=xdata@entry=0x0) at leases.c:145
#15 0x00007fa712c911fe in pl_readv_cbk (frame=0x7fa7241c7018, cookie=<optimized out>, this=<optimized out>, op_ret=131072, op_errno=0, vector=0x7fa6f88f2e80, count=1, stbuf=0x7fa6f88f2e90, iobref=0x7fa7282a0880, xdata=<optimized out>) at posix.c:1759
#16 0x00007fa713be94a4 in ctr_readv_cbk (frame=frame@entry=0x7fa7241c2314, cookie=<optimized out>, this=<optimized out>, op_ret=op_ret@entry=131072, op_errno=0, vector=vector@entry=0x7fa6f88f2e80, count=count@entry=1, stbuf=stbuf@entry=0x7fa6f88f2e90, iobref=iobref@entry=0x7fa7282a0880, xdata=xdata@entry=0x0) at changetimerecorder.c:1630
#17 0x00007fa7184a4f64 in posix_readv (frame=0x7fa7241b7ff0, this=<optimized out>, fd=<optimized out>, size=<optimized out>, offset=<optimized out>, flags=<optimized out>, xdata=0x0) at posix.c:3167
#18 0x00007fa713be4fba in ctr_readv (frame=frame@entry=0x7fa7241c2314, this=this@entry=0x7fa714005f60, fd=fd@entry=0x7fa7269cf328, size=size@entry=131072, off=off@entry=19660800, flags=flags@entry=0, xdata=xdata@entry=0x0) at changetimerecorder.c:1661
#19 0x00007fa7267351d1 in default_readv (frame=frame@entry=0x7fa7241c2314, this=this@entry=0x7fa714018cf0, fd=fd@entry=0x7fa7269cf328, size=size@entry=131072, offset=offset@entry=19660800, flags=flags@entry=0, xdata=xdata@entry=0x0) at defaults.c:2345
#20 0x00007fa7130cf6b4 in br_stub_readv (frame=frame@entry=0x7fa7241c2314, this=this@entry=0x7fa71401ad20, fd=fd@entry=0x7fa7269cf328, size=size@entry=131072, offset=offset@entry=19660800, flags=flags@entry=0, xdata=xdata@entry=0x0) at bit-rot-stub.c:1752
#21 0x00007fa7267351d1 in default_readv (frame=0x7fa7241c2314, this=<optimized out>, fd=0x7fa7269cf328, size=131072, offset=19660800, flags=0, xdata=0x0) at defaults.c:2345
#22 0x00007fa712c9a010 in pl_readv (frame=frame@entry=0x7fa7241c7018, this=this@entry=0x7fa71401d7f0, fd=fd@entry=0x7fa7269cf328, size=size@entry=131072, offset=offset@entry=19660800, flags=flags@entry=0, xdata=xdata@entry=0x0) at posix.c:1920
#23 0x00007fa7267351d1 in default_readv (frame=frame@entry=0x7fa7241c7018, this=this@entry=0x7fa71401ec30, fd=fd@entry=0x7fa7269cf328, size=size@entry=131072, offset=offset@entry=19660800, flags=flags@entry=0, xdata=xdata@entry=0x0) at defaults.c:2345
#24 0x00007fa7267351d1 in default_readv (frame=0x7fa7241c7018, this=<optimized out>, fd=0x7fa7269cf328, size=131072, offset=19660800, flags=0, xdata=0x0) at defaults.c:2345
#25 0x00007fa71265c5a8 in leases_readv (frame=0x7fa7241bbabc, this=0x7fa7140217a0, fd=0x7fa7269cf328, size=131072, offset=19660800, flags=0, xdata=0x0) at leases.c:180
#26 0x00007fa71244ac45 in up_readv (frame=0x7fa7241aff94, this=0x7fa714022cd0, fd=0x7fa7269cf328, size=131072, offset=19660800, flags=0, xdata=0x0) at upcall.c:190
#27 0x00007fa72674f9be in default_readv_resume (frame=0x7fa7241cb0b0, this=0x7fa714024270, fd=0x7fa7269cf328, size=131072, offset=19660800, flags=0, xdata=0x0) at defaults.c:1687
#28 0x00007fa7266de872 in call_resume_wind (stub=0x7fa723a83dac) at call-stub.c:2039
#29 0x00007fa7266decc5 in call_resume (stub=0x7fa723a83dac) at call-stub.c:2508
#30 0x00007fa712234dd4 in iot_worker (data=0x7fa71407f800) at io-threads.c:222
#31 0x00007fa72551cdc5 in start_thread (arg=0x7fa6f88f4700) at pthread_create.c:308
#32 0x00007fa724e6228d in clone () from /lib64/libc.so.6
```

Below is the current code of ssl_do. In the current code, on SSL_ERROR_WANT_WRITE we wait (indefinitely) in poll for a POLLOUT event on the socket:

```c
case SSL_ERROR_WANT_WRITE:
        pfd.fd = priv->sock;
        pfd.events = POLLOUT;
        if (poll(&pfd,1,-1) < 0) {
                gf_log(this->name,GF_LOG_ERROR,"poll error %d",
                       errno);
        }
```

I think we need to update the code here: when the calling function is SSL_write (or SSL_read, as Shyam suggested), we can return with an error code (EAGAIN) instead of waiting in poll, just as we already do for SSL_ERROR_WANT_READ; the error is already handled in the upper functions. I am testing my patch and will update with the result soon.

Regards,
Mohit Agrawal

Hi,

I have tested the patch by running the Bonnie tools; it is working fine:

```
bonnie++ -u root
Using uid:0, gid:0.
Writing with putc()...done
Writing intelligently...done
Rewriting...done
Reading with getc()...done
Reading intelligently...done
start 'em...done...done...done...
Create files in sequential order...done.
Stat files in sequential order...done.
Delete files in sequential order...
```

Regards,
Mohit Agrawal

upstream patch: https://review.gluster.org/#/c/16501/
downstream patch: https://code.engineering.redhat.com/gerrit/#/c/96684/

Clean run of Bonnie++ on my Ganesha+SSL setup with 3.8.4-14/2.4.1-7. Verified.

Since the problem described in this bug report should be resolved in a recent advisory, it has been closed with a resolution of ERRATA. For information on the advisory, and where to find the updated files, follow the link below. If the solution does not work for you, open a new bug report. https://rhn.redhat.com/errata/RHSA-2017-0486.html