Bug 1730109
Summary: | NSS fails to handshake on s390x | ||||||
---|---|---|---|---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Alex Scheel <ascheel> | ||||
Component: | jss | Assignee: | RHCS Maintainers <rhcs-maint> | ||||
Status: | CLOSED UPSTREAM | QA Contact: | PKI QE <bugzilla-pkiqe> | ||||
Severity: | high | Docs Contact: | |||||
Priority: | high | ||||||
Version: | 8.1 | CC: | ascheel, hkario, nss-nspr-maint, rrelyea | ||||
Target Milestone: | rc | Keywords: | Triaged | ||||
Target Release: | 8.2 | ||||||
Hardware: | s390 | ||||||
OS: | Unspecified | ||||||
Whiteboard: | |||||||
Fixed In Version: | Doc Type: | If docs needed, set a value | |||||
Doc Text: | Story Points: | --- | |||||
Clone Of: | Environment: | ||||||
Last Closed: | 2019-08-15 17:29:46 UTC | Type: | Bug | ||||
Regression: | --- | Mount Type: | --- | ||||
Documentation: | --- | CRM: | |||||
Verified Versions: | Category: | --- | |||||
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |||||
Cloudforms Team: | --- | Target Upstream Version: | |||||
Embargoed: | |||||||
Attachments: |
|
Description
Alex Scheel
2019-07-15 21:03:21 UTC
This problem occurs on Fedora as well. It is likely related to async io. Created attachment 1602241 [details]
ltrace on relevant test case, s390x
bash -c 'LD_LIBRARY_PATH="$(pwd)/lib:" ltrace -l "*ss*" ./bin/TestBufferPRFD "/root/sandbox/jss/build/results/nssdb" "m1oZilla" "Server_RSA" 2>&1' 2>&1 > ltrace.txt
(Grabs stuff from ssl3 and nss3 libraries; not entirely sure it is helpful).
Ok, so I think this is a weird edge case on s390x and its handling of async io. On s390x, the async io adapter gets: > Found cert with nickname: Server_RSA > Trying handshake... > Client Handshake: > <buffer:client_write_buffer:0xbbcc7c90> jb_write of 191 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0xbbcc7c90> jb_write of 0 bytes | write_cap of 1857 bytes | read_cap of 191 bytes > > > Server Handshake: > <buffer:client_write_buffer:0xbbcc7c90> jb_read of 5 bytes | write_cap of 1857 bytes | read_cap of 191 bytes > <buffer:client_write_buffer:0xbbcc7c90> jb_read of 0 bytes | write_cap of 1862 bytes | read_cap of 186 bytes > <buffer:client_write_buffer:0xbbcc7c90> jb_read of 186 bytes | write_cap of 1862 bytes | read_cap of 186 bytes > <buffer:client_write_buffer:0xbbcc7c90> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0xbbccc400> jb_write of 3342 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0xbbccc400> jb_write of 1294 bytes | write_cap of 0 bytes | read_cap of 2048 bytes > > > Client Handshake: > <buffer:client_read_buffer:0xbbccc400> jb_read of 5 bytes | write_cap of 0 bytes | read_cap of 2048 bytes > <buffer:client_read_buffer:0xbbccc400> jb_read of 0 bytes | write_cap of 5 bytes | read_cap of 2043 bytes > <buffer:client_read_buffer:0xbbccc400> jb_read of 3337 bytes | write_cap of 5 bytes | read_cap of 2043 bytes > <buffer:client_read_buffer:0xbbccc400> jb_read of 1294 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > > > Server Handshake: > > > Client Handshake: > > > Server Handshake: > > > Client Handshake:(i.e., > > It tries to write 3342 bytes into a buffer of size 2048. 1294 bytes remain, and it never tries to write those remaining bytes again. I think it should under semantics of write(...). On x86_64 and other platforms: > Found cert with nickname: Server_RSA > Trying handshake... > Client Handshake: > <buffer:client_write_buffer:0x1fe0430> jb_write of 177 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 0 bytes | write_cap of 1871 bytes | read_cap of 177 bytes > > > Server Handshake: > <buffer:client_write_buffer:0x1fe0430> jb_read of 5 bytes | write_cap of 1871 bytes | read_cap of 177 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 1876 bytes | read_cap of 172 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 172 bytes | write_cap of 1876 bytes | read_cap of 172 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 3277 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 1229 bytes | write_cap of 0 bytes | read_cap of 2048 bytes > > > Client Handshake: > <buffer:client_read_buffer:0x1fe04e0> jb_read of 5 bytes | write_cap of 0 bytes | read_cap of 2048 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 5 bytes | read_cap of 2043 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 3272 bytes | write_cap of 5 bytes | read_cap of 2043 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 1229 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > > > Server Handshake: > <buffer:client_read_buffer:0x1fe04e0> jb_write of 1229 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 0 bytes | write_cap of 819 bytes | read_cap of 1229 bytes > > > Client Handshake: > <buffer:client_read_buffer:0x1fe04e0> jb_read of 1229 bytes | write_cap of 819 bytes | read_cap of 1229 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 93 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 0 bytes | write_cap of 1955 bytes | read_cap of 93 bytes > > > Server Handshake: > <buffer:client_write_buffer:0x1fe0430> jb_read of 5 bytes | write_cap of 1955 bytes | read_cap of 93 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 1960 bytes | read_cap of 88 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 37 bytes | write_cap of 1960 bytes | read_cap of 88 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 1997 bytes | read_cap of 51 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 5 bytes | write_cap of 1997 bytes | read_cap of 51 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2002 bytes | read_cap of 46 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 1 bytes | write_cap of 2002 bytes | read_cap of 46 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 5 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 40 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 51 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 0 bytes | write_cap of 1997 bytes | read_cap of 51 bytes > > > Send a message from the client to the server... > <buffer:client_read_buffer:0x1fe04e0> jb_read of 5 bytes | write_cap of 1997 bytes | read_cap of 51 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2002 bytes | read_cap of 46 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 1 bytes | write_cap of 2002 bytes | read_cap of 46 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 5 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 40 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 40 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 5 bytes | write_cap of 2008 bytes | read_cap of 40 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2013 bytes | read_cap of 35 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 35 bytes | write_cap of 2013 bytes | read_cap of 35 bytes > <buffer:client_write_buffer:0x1fe0430> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > Received message from client: Cooking MCs [len: 11] > > > Send a message from the server to the client... > <buffer:client_read_buffer:0x1fe04e0> jb_write of 50 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 0 bytes | write_cap of 1998 bytes | read_cap of 50 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 5 bytes | write_cap of 1998 bytes | read_cap of 50 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 45 bytes | write_cap of 2003 bytes | read_cap of 45 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > Received message from client: like a pound of bacon [len: 21] > <buffer:client_write_buffer:0x1fe0430> jb_write of 31 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_write_buffer:0x1fe0430> jb_write of 0 bytes | write_cap of 2017 bytes | read_cap of 31 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 31 bytes | write_cap of 2048 bytes | read_cap of 0 bytes > <buffer:client_read_buffer:0x1fe04e0> jb_write of 0 bytes | write_cap of 2017 bytes | read_cap of 31 bytes It overflows the buffer size of 2048, (with a slightly larger write, I'm not sure why or what the differences are). And next time SSL_ForceHandshake gets called on the server socket, it tries to write the remaining bytes and succeeds. I think this is consistent with a race condition. I suspect we are racing on the ss->pendingBug.len. OK, The issue looks like it's in nspr: opt.option = PR_SockOpt_Nonblocking; opt.value.non_blocking = PR_FALSE; status = PR_GetSocketOption(fd, &opt); is returning PR_FALSE for non-blocking instead of PR_TRUE (since the socket is non-blocking). This mean SSL_ForceHandshake() won't write out the the buffered write data (since there should be none on a blocking socket). I don't know if the base problem is NSPR or the kernel. Moving component to NSPR. (doesn't change that I'm still actively looking at the underlying problem). The bug is in: ./org/mozilla/jss/ssl/javax/BufferPRFD.c The following function is incorrect: // Fake responses to getSocketOption requests static PRStatus PRBufferGetSocketOption(PRFileDesc *fd, PRSocketOptionData *data) { /* getSocketOption takes a PRFileDesc and modifies the PRSocketOptionData * with the options on this. We set a couple of sane defaults here: * * non_blocking = true * reuse_addr = true * keep_alive = false * no_delay = true * * However the list above is far fom extensive. Note that responses are * "fake" in that calls to setSocketOption fail to reflect here. */ if (data) { PRFilePrivate *internal = fd->secret; data->value.non_blocking = PR_TRUE; data->value.reuse_addr = PR_TRUE; data->value.keep_alive = PR_FALSE; data->value.mcast_loopback = PR_FALSE; data->value.no_delay = PR_TRUE; data->value.max_segment = jb_capacity(internal->read_buffer); data->value.recv_buffer_size = jb_capacity(internal->read_buffer); data->value.send_buffer_size = jb_capacity(internal->write_buffer); return PR_SUCCESS; } return PR_FAILURE; } PRSocketOptionData.value is a union. You need to switch on data->option to select the correct value. On all the other platforms this sort of worked because either sizeof data->value.send_buffer_size == sizeof (PRBool), or the platform was little endian so value->non_blocking was non-zero. on s390, sizeof (data->value.send_buffer_size) was greater than sizeof (PRBool) and the platform was big endian. Upshot PRBufferGetSocketOption needs to be coded as a switch statement and only set the requested value; A good testcase for this that would fail on all platforms: Use PR_GetSocketOption() to fetch each option and make sure it's correct (compare bools to PR_TRUE or PR_FALSE). Example: opt.option = PR_SockOpt_Nonblocking; opt.value.non_blocking = PR_FALSE; status = PR_GetSocketOption(fd, &opt); if (status != PR_SUCCESS) return FAIL; if (opt.value.non_blocking != PR_TRUE) return FAIL; return SUCCESS Do this for each value PRBufferGetSocketOption intends to return. Also test on a socket with different write and read buffer lengths. Ahh... That does make a lot more sense why I only see the problem on s390x. I think I had thought at the time it was weird that PRBufferGetSocketOption didn't take a specific option to query like getsockopt(...) does. I definitely need to figure out where NSPR has put its other PRFileDesc related tests and exercise mine as well. OTOH, we now know that NSS and NSPR both do non-blocking better than I can! :P My bad, and many thanks for pointing me in the right direction! |