RHEL Engineering is moving the tracking of its product development work on RHEL 6 through RHEL 9 to Red Hat Jira (issues.redhat.com). If you're a Red Hat customer, please continue to file support cases via the Red Hat customer portal. If you're not, please head to the "RHEL project" in Red Hat Jira and file new tickets here. Individual Bugzilla bugs in the statuses "NEW", "ASSIGNED", and "POST" are being migrated throughout September 2023. Bugs of Red Hat partners with an assigned Engineering Partner Manager (EPM) are migrated in late September as per pre-agreed dates. Bugs against components "kernel", "kernel-rt", and "kpatch" are only migrated if still in "NEW" or "ASSIGNED". If you cannot log in to RH Jira, please consult article #7032570. That failing, please send an e-mail to the RH Jira admins at rh-issues@redhat.com to troubleshoot your issue as a user management inquiry. The email creates a ServiceNow ticket with Red Hat. Individual Bugzilla bugs that are migrated will be moved to status "CLOSED", resolution "MIGRATED", and set with "MigratedToJIRA" in "Keywords". The link to the successor Jira issue will be found under "Links", have a little "two-footprint" icon next to it, and direct you to the "RHEL project" in Red Hat Jira (issue links are of type "https://issues.redhat.com/browse/RHEL-XXXX", where "X" is a digit). This same link will be available in a blue banner at the top of the page informing you that that bug has been migrated.
Bug 1730109 - NSS fails to handshake on s390x
Summary: NSS fails to handshake on s390x
Keywords:
Status: CLOSED UPSTREAM
Alias: None
Product: Red Hat Enterprise Linux 8
Classification: Red Hat
Component: jss
Version: 8.1
Hardware: s390
OS: Unspecified
high
high
Target Milestone: rc
: 8.2
Assignee: RHCS Maintainers
QA Contact: PKI QE
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-07-15 21:03 UTC by Alex Scheel
Modified: 2019-08-15 17:29 UTC (History)
4 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed: 2019-08-15 17:29:46 UTC
Type: Bug
Target Upstream Version:
Embargoed:


Attachments (Terms of Use)
ltrace on relevant test case, s390x (2.40 MB, text/plain)
2019-08-09 15:31 UTC, Alex Scheel
no flags Details

Description Alex Scheel 2019-07-15 21:03:21 UTC
Description of problem:


During the RHEL 8.1 release cycle, I had to disable a SSL handshake (client + server) test case in JSS. I've confirmed that this is a NSS issue. The following test case will not work on s390x with RSA certificates, but works fine with s390x and ECDSA certificates or on non-s390x architectures with either type of certificate.


https://github.com/dogtagpki/jss/blob/master/org/mozilla/jss/tests/TestBufferPRFD.c#L315-L338


Version-Release number of selected component (if applicable):

# rpm -qa | grep '^nss'
nss-softokn-freebl-3.41.0-5.el8.s390x
nss-util-3.41.0-5.el8.s390x
nss-util-devel-3.41.0-5.el8.s390x
nss-softokn-3.41.0-5.el8.s390x
nss-sysinit-3.41.0-5.el8.s390x
nss-softokn-devel-3.41.0-5.el8.s390x
nss-devel-3.41.0-5.el8.s390x
nss-3.41.0-5.el8.s390x
nss-softokn-freebl-devel-3.41.0-5.el8.s390x
nss-tools-3.41.0-5.el8.s390x


How reproducible:

Very



Steps to Reproduce:

1. Clone jss from upstream: git clone https://github.com/dogtagpki/jss
2. Install required dependencies: sudo dnf module enable -y javapackages-tools pki-deps && sudo dnf install apache-commons-codec apache-commons-lang gcc-c++ java-devel jpackage-utils slf4j zlib-devel glassfish-jaxb-api nss-tools nss-devel cmake junit tar zip unzip time git -y
3. Try building: cd build && cmake .. && make all && make test 



Actual results:

Two test cases will fail:

The following tests FAILED:
	 17 - JSS_Test_BufferPRFD (Failed)
	 36 - TestBufferPRFD_RSA (Failed)



Expected results:

All test cases succeed, like they do on every other architecture.

Comment 1 Alex Scheel 2019-08-08 16:50:24 UTC
This problem occurs on Fedora as well. It is likely related to async io.

Comment 2 Alex Scheel 2019-08-09 15:31:35 UTC
Created attachment 1602241 [details]
ltrace on relevant test case, s390x

bash -c 'LD_LIBRARY_PATH="$(pwd)/lib:" ltrace -l "*ss*" ./bin/TestBufferPRFD "/root/sandbox/jss/build/results/nssdb" "m1oZilla" "Server_RSA" 2>&1' 2>&1  > ltrace.txt

(Grabs stuff from ssl3 and nss3 libraries; not entirely sure it is helpful).

Comment 3 Alex Scheel 2019-08-09 17:10:19 UTC
Ok, so I think this is a weird edge case on s390x and its handling of async io. 


On s390x, the async io adapter gets:

> Found cert with nickname: Server_RSA
> Trying handshake...
> Client Handshake:
> <buffer:client_write_buffer:0xbbcc7c90>  jb_write of 191 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0xbbcc7c90>  jb_write of 0 bytes | write_cap of 1857 bytes | read_cap of 191 bytes
> 
> 
> Server Handshake:
> <buffer:client_write_buffer:0xbbcc7c90>  jb_read of 5 bytes | write_cap of 1857 bytes | read_cap of 191 bytes
> <buffer:client_write_buffer:0xbbcc7c90>  jb_read of 0 bytes | write_cap of 1862 bytes | read_cap of 186 bytes
> <buffer:client_write_buffer:0xbbcc7c90>  jb_read of 186 bytes | write_cap of 1862 bytes | read_cap of 186 bytes
> <buffer:client_write_buffer:0xbbcc7c90>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0xbbccc400>  jb_write of 3342 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0xbbccc400>  jb_write of 1294 bytes | write_cap of 0 bytes | read_cap of 2048 bytes
> 
> 
> Client Handshake:
> <buffer:client_read_buffer:0xbbccc400>  jb_read of 5 bytes | write_cap of 0 bytes | read_cap of 2048 bytes
> <buffer:client_read_buffer:0xbbccc400>  jb_read of 0 bytes | write_cap of 5 bytes | read_cap of 2043 bytes
> <buffer:client_read_buffer:0xbbccc400>  jb_read of 3337 bytes | write_cap of 5 bytes | read_cap of 2043 bytes
> <buffer:client_read_buffer:0xbbccc400>  jb_read of 1294 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> 
> 
> Server Handshake:
> 
> 
> Client Handshake:
> 
> 
> Server Handshake:
> 
> 
> Client Handshake:(i.e., 
> 
> 

It tries to write 3342 bytes into a buffer of size 2048. 1294 bytes remain, and it never tries to write those remaining bytes again. I think it should under semantics of write(...).


On x86_64 and other platforms: 

> Found cert with nickname: Server_RSA
> Trying handshake...
> Client Handshake:
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 177 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 0 bytes | write_cap of 1871 bytes | read_cap of 177 bytes
> 
> 
> Server Handshake:
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 5 bytes | write_cap of 1871 bytes | read_cap of 177 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 1876 bytes | read_cap of 172 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 172 bytes | write_cap of 1876 bytes | read_cap of 172 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 3277 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 1229 bytes | write_cap of 0 bytes | read_cap of 2048 bytes
> 
> 
> Client Handshake:
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 5 bytes | write_cap of 0 bytes | read_cap of 2048 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 5 bytes | read_cap of 2043 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 3272 bytes | write_cap of 5 bytes | read_cap of 2043 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 1229 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> 
> 
> Server Handshake:
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 1229 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 0 bytes | write_cap of 819 bytes | read_cap of 1229 bytes
> 
> 
> Client Handshake:
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 1229 bytes | write_cap of 819 bytes | read_cap of 1229 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 93 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 0 bytes | write_cap of 1955 bytes | read_cap of 93 bytes
> 
> 
> Server Handshake:
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 5 bytes | write_cap of 1955 bytes | read_cap of 93 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 1960 bytes | read_cap of 88 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 37 bytes | write_cap of 1960 bytes | read_cap of 88 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 1997 bytes | read_cap of 51 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 5 bytes | write_cap of 1997 bytes | read_cap of 51 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2002 bytes | read_cap of 46 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 1 bytes | write_cap of 2002 bytes | read_cap of 46 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 5 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 40 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 51 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 0 bytes | write_cap of 1997 bytes | read_cap of 51 bytes
> 
> 
> Send a message from the client to the server...
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 5 bytes | write_cap of 1997 bytes | read_cap of 51 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2002 bytes | read_cap of 46 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 1 bytes | write_cap of 2002 bytes | read_cap of 46 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 5 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 40 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 40 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 0 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 5 bytes | write_cap of 2008 bytes | read_cap of 40 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2013 bytes | read_cap of 35 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 35 bytes | write_cap of 2013 bytes | read_cap of 35 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> Received message from client: Cooking MCs [len: 11]
> 
> 
> Send a message from the server to the client...
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 50 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 0 bytes | write_cap of 1998 bytes | read_cap of 50 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 5 bytes | write_cap of 1998 bytes | read_cap of 50 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 45 bytes | write_cap of 2003 bytes | read_cap of 45 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_read of 0 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> Received message from client: like a pound of bacon [len: 21]
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 31 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_write_buffer:0x1fe0430>  jb_write of 0 bytes | write_cap of 2017 bytes | read_cap of 31 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 31 bytes | write_cap of 2048 bytes | read_cap of 0 bytes
> <buffer:client_read_buffer:0x1fe04e0>  jb_write of 0 bytes | write_cap of 2017 bytes | read_cap of 31 bytes

It overflows the buffer size of 2048, (with a slightly larger write, I'm not sure why or what the differences are). And next time SSL_ForceHandshake gets called on the server socket, it tries to write the remaining bytes and succeeds.

Comment 5 Bob Relyea 2019-08-14 17:54:15 UTC
I think this is consistent with a race condition. I suspect we are racing on the ss->pendingBug.len.

Comment 6 Bob Relyea 2019-08-15 16:15:54 UTC
OK, The issue looks like it's in nspr: 

opt.option = PR_SockOpt_Nonblocking;
opt.value.non_blocking = PR_FALSE;
status = PR_GetSocketOption(fd, &opt);

is returning PR_FALSE for non-blocking instead of PR_TRUE (since the socket is non-blocking). This mean SSL_ForceHandshake() won't write out the the buffered write data (since there should be none on a blocking socket).

I don't know if the base problem is NSPR or the kernel. Moving component to NSPR. (doesn't change that I'm still actively looking at the underlying problem).

Comment 7 Bob Relyea 2019-08-15 16:56:18 UTC
The bug is in: ./org/mozilla/jss/ssl/javax/BufferPRFD.c


The following function is incorrect:

// Fake responses to getSocketOption requests
static PRStatus PRBufferGetSocketOption(PRFileDesc *fd, PRSocketOptionData *data)
{   
    /* getSocketOption takes a PRFileDesc and modifies the PRSocketOptionData
     * with the options on this. We set a couple of sane defaults here:
     *
     *   non_blocking = true
     *   reuse_addr = true
     *   keep_alive = false
     *   no_delay = true
     *
     * However the list above is far fom extensive. Note that responses are
     * "fake" in that calls to setSocketOption fail to reflect here.
     */

    if (data) {
        PRFilePrivate *internal = fd->secret;

        data->value.non_blocking = PR_TRUE;
        data->value.reuse_addr = PR_TRUE;
        data->value.keep_alive = PR_FALSE;
        data->value.mcast_loopback = PR_FALSE;
        data->value.no_delay = PR_TRUE;
        data->value.max_segment = jb_capacity(internal->read_buffer);
        data->value.recv_buffer_size = jb_capacity(internal->read_buffer);
        data->value.send_buffer_size = jb_capacity(internal->write_buffer);

        return PR_SUCCESS;
    }

    return PR_FAILURE;
}

PRSocketOptionData.value is a union. You need to switch on data->option to select the correct value. On all the other platforms this sort of worked because either sizeof data->value.send_buffer_size  == sizeof (PRBool), or the platform was little endian so value->non_blocking was non-zero. on s390, sizeof (data->value.send_buffer_size) was greater than sizeof (PRBool) and the platform was big endian.

Upshot PRBufferGetSocketOption needs to be coded as a switch statement and only set the requested value;

Comment 8 Bob Relyea 2019-08-15 17:17:09 UTC
A good testcase for this that would fail on all platforms:

Use PR_GetSocketOption() to fetch each option and make sure it's correct (compare bools to PR_TRUE or PR_FALSE).

Example:

opt.option = PR_SockOpt_Nonblocking;
opt.value.non_blocking = PR_FALSE;
status = PR_GetSocketOption(fd, &opt);
if (status != PR_SUCCESS)
    return FAIL;
if (opt.value.non_blocking != PR_TRUE)
    return FAIL;
return SUCCESS

Do this for each value PRBufferGetSocketOption intends to return. Also test on a socket with different write and read buffer lengths.

Comment 9 Alex Scheel 2019-08-15 17:29:46 UTC
Ahh... That does make a lot more sense why I only see the problem on s390x. 

I think I had thought at the time it was weird that PRBufferGetSocketOption didn't take a specific option to query like getsockopt(...) does. I definitely need to figure out where NSPR has put its other PRFileDesc related tests and exercise mine as well.

OTOH, we now know that NSS and NSPR both do non-blocking better than I can! :P

My bad, and many thanks for pointing me in the right direction!


Note You need to log in before you can comment on or make changes to this bug.