+++ This bug was initially created as a clone of Bug #534115 +++ Version-Release number of selected component (if applicable): nss-3.12.4-3.fc11.i586 How reproducible: occasionally, depends also on count of threads Actual results: SIGSEGV, backtrace available Expected results: no SIGSEGV Additional info: Formerly reported at https://bugzilla.redhat.com/504257#c7 --- Additional comment from kdudka on 2009-11-10 12:59:06 EDT --- Here is a scratch build of NSS with the mentioned patch partially reverted. It should not trigger the invalid writes: http://koji.fedoraproject.org/koji/taskinfo?taskID=1798840 Unfortunately it brings back some memory leaks fixed before. So it looks like we need to free the memory somewhere else. For now I am not sure if the invalid writes are actually related to the SEGFAULT or not. --- Additional comment from curb_pks on 2009-11-10 14:11:17 EDT --- I ran this build under valgrind, the invalid writes seem to be gone. However I still saw the core dump and a valgrind invalid read error: ==10352== Thread 60: ==10352== Invalid read of size 4 ==10352== at 0x47F8BA1: ssl_DefRecv (ssldef.c:91) ==10352== by 0x47F38C5: ssl3_GatherCompleteHandshake (ssl3gthr.c:90) ==10352== by 0x47F64BA: ssl_GatherRecord1stHandshake (sslcon.c:1258) ==10352== by 0x47FC6F4: ssl_Do1stHandshake (sslsecur.c:151) ==10352== by 0x47FDDA6: SSL_ForceHandshake (sslsecur.c:407) ==10352== by 0x47FDE76: SSL_ForceHandshakeWithTimeout (sslsecur.c:428) ==10352== by 0x437BBC7: Curl_nss_connect (nss.c:1214) ==10352== by 0x4372181: Curl_ssl_connect (sslgen.c:185) ==10352== by 0x43514F8: Curl_http_connect (http.c:1792) ==10352== by 0x43589AD: Curl_protocol_connect (url.c:3056) ==10352== by 0x435DC7E: Curl_connect (url.c:4690) ==10352== by 0x4366A03: Curl_perform (transfer.c:2481) ==10352== Address 0x8 is not stack'd, malloc'd or (recently) free'd ==10352== ==10352== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==10352== Access not within mapped region at address 0x8 ==10352== at 0x47F8BA1: ssl_DefRecv (ssldef.c:91) ==10352== by 0x47F38C5: ssl3_GatherCompleteHandshake (ssl3gthr.c:90) ==10352== by 0x47F64BA: ssl_GatherRecord1stHandshake (sslcon.c:1258) ==10352== by 0x47FC6F4: ssl_Do1stHandshake (sslsecur.c:151) ==10352== by 0x47FDDA6: SSL_ForceHandshake (sslsecur.c:407) ==10352== by 0x47FDE76: SSL_ForceHandshakeWithTimeout (sslsecur.c:428) ==10352== by 0x437BBC7: Curl_nss_connect (nss.c:1214) ==10352== by 0x4372181: Curl_ssl_connect (sslgen.c:185) ==10352== by 0x43514F8: Curl_http_connect (http.c:1792) ==10352== by 0x43589AD: Curl_protocol_connect (url.c:3056) ==10352== by 0x435DC7E: Curl_connect (url.c:4690) ==10352== by 0x4366A03: Curl_perform (transfer.c:2481)
Kevin has confirmed me off-list he can see the same core dump with a hand built libcurl without the PEM support. It means this crash is unrelated to bug 534115 (invalid writes within the PEM module).
When the core dump occurs in ssl_DefRecv: (gdb) p ss $1 = (sslSocket *) 0x8326338 (gdb) p ss->fd $2 = (PRFileDesc *) 0x0 When I run under the debugger before the core dump happens I see this happen: Program received signal SIGPIPE, Broken pipe. [Switching to Thread 0xb3ffbb70 (LWP 2379)] 0x00e08422 in __kernel_vsyscall () Missing separate debuginfos, use: debuginfo-install sqlite-3.6.12-3.fc11.i586 (gdb) bt #0 0x00e08422 in __kernel_vsyscall () #1 0x00a3d458 in send () from /lib/libpthread.so.0 #2 0x00ca3f43 in ?? () from /lib/libnspr4.so #3 0x00882d6e in ssl_DefSend (ss=0x85b0d88, buf=0x85c09d0 "\25\3\1", len=23, flags=0) at ssldef.c:127 #4 0x00873824 in ssl3_SendRecord (ss=0x85b0d88, type=<value optimized out>, pIn=0xb3ffae4e "\1", nIn=2, flags=0) at ssl3con.c:2061 #5 0x00873bf1 in SSL3_SendAlert (ss=0x85b0d88, level=alert_warning, desc=close_notify) at ssl3con.c:2316 #6 0x00886bf2 in ssl_SecureClose (ss=0x85b0d88) at sslsecur.c:1039 #7 0x0088b287 in ssl_Close (fd=0xb68fbde0) at sslsock.c:1406 #8 0x00c89181 in PR_Close () from /lib/libnspr4.so #9 0x002b7291 in Curl_nss_close (conn=0x84e71e8, sockindex=0) at nss.c:908 #10 0x002ae1e4 in Curl_ssl_close (conn=0x84e71e8, sockindex=0) at sslgen.c:368 #11 0x002964ab in Curl_disconnect (conn=0x84e71e8) at url.c:2414 #12 0x002966cb in Curl_done (connp=0xb3ffaff8, status=CURLE_SEND_ERROR, premature=false) at url.c:4897 #13 0x002a3617 in Curl_perform (data=0x845b030) at transfer.c:2625 #14 0x002a3fc3 in curl_easy_perform (curl=0x845b030) at easy.c:557 This stack calls: ssl_DefClose Which does this: ss->fd = NULL; Could this be causing the issue?
(In reply to comment #2) > This stack calls: > > ssl_DefClose > > Which does this: > > ss->fd = NULL; > > Could this be causing the issue? No unless the already closed socket is later used for reading/writing...
ssl_Close() should lock the socket for reading/writing, thus wait for any unfinished I/O operation before closing it. See the big comment within that function... It drives me to another idea. Which library are you using to create that threads in you application?
I'm using pthreads.
(In reply to comment #5) > I'm using pthreads. This should be OK. For now I am only reading the code, looking at the backtrace in comment #0. The "recv buf lock" should be held by ssl_GatherRecord1stHandshake(). This implies nobody closes the socket using ssl_Close() until the lock is released. Or am I wrong?
(In reply to comment #6) It's actually not true. I've just mixed up ss->recvLock and ss->recvBufLock in the reasoning...
I stopped the debugger within ssl_DefRecv() on the way from SSL_ForceHandshake() and got this: ss->recvLock->locked = 0 ss->recvBufLock->lock.locked = 1 Then I stopped it at the same place on the way from ssl_Recv() and got this: ss->recvLock->locked = 1 ss->recvBufLock->lock.locked = 1 Is there actually anything else which can hold the ssl_Close() until the receive operation is completed? It looks like ForceHandshake() is less thread-safe than other functions. Is it a bug or feature?
Created attachment 368969 [details] attempt to lock/unlock ss->recvLock in SSL_ForceHandshake() I am just build a scratch build with the patch applied for testing purposes: http://koji.fedoraproject.org/koji/taskinfo?taskID=1799721
Kevin, please test the patch or scratch build and leave a note here if anything changes. I hope it will not just cause a deadlock instead ... I have actually nothing to test it with except the singly-threaded curl tool and gdb :-)
Thanks Kamil!! I tried the patch, and the core dump went away. But my code now seems to have a bunch of threads stuck in this stack: Thread 18356 (Thread 0x910c3b70 (LWP 6750)): #0 0x0041b422 in __kernel_vsyscall () #1 0x005f3e43 in __lll_lock_wait_private () from /lib/libc.so.6 #2 0x0057cb94 in _L_lock_9571 () from /lib/libc.so.6 #3 0x0057abf4 in malloc () from /lib/libc.so.6 #4 0x00278018 in PR_Malloc () from /lib/libnspr4.so #5 0x002e6853 in PORT_Alloc_Util (bytes=18432) at secport.c:113 #6 0x00251b06 in sslBuffer_Grow (b=0xb9c392c, newLen=18432) at sslsecur.c:454 #7 0x00247a02 in ssl3_GatherData (flags=<value optimized out>, gs=<value optimized out>, ss=<value optimized out>) at ssl3gthr.c:149 #8 ssl3_GatherCompleteHandshake (flags=<value optimized out>, gs=<value optimized out>, ss=<value optimized out>) at ssl3gthr.c:195 #9 0x0024a4bb in ssl_GatherRecord1stHandshake (ss=0xb9c36b8) at sslcon.c:1258 #10 0x002506f5 in ssl_Do1stHandshake (ss=0xb9c36b8) at sslsecur.c:151 #11 0x00251df7 in SSL_ForceHandshake (fd=0xb75f3cb8) at sslsecur.c:410 #12 0x00251ee7 in SSL_ForceHandshakeWithTimeout (fd=0xb75f3cb8, timeout=30000) at sslsecur.c:434 #13 0x0014dbc8 in Curl_nss_connect () from /usr/lib/libcurl.so.4 #14 0x00144182 in Curl_ssl_connect () from /usr/lib/libcurl.so.4 #15 0x001234f9 in Curl_http_connect () from /usr/lib/libcurl.so.4 #16 0x0012a9ae in Curl_protocol_connect () from /usr/lib/libcurl.so.4 #17 0x0012fc7f in Curl_connect () from /usr/lib/libcurl.so.4 #18 0x00138a04 in Curl_perform () from /usr/lib/libcurl.so.4 #19 0x00139793 in curl_easy_perform () from /usr/lib/libcurl.so.4
I reran the code under valgrind, and eventually it cored. I saw a 4 of these: ==21045== Invalid read of size 4 ==21045== at 0x4975B02: PR_GetIdentitiesLayer (in /lib/libnspr4.so) ==21045== by 0x48015F6: ssl_FindSocket (sslsock.c:248) ==21045== by 0x4803418: SSL_GetChannelInfo (sslinfo.c:54) ==21045== by 0x437AC91: Curl_nss_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4371181: Curl_ssl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43504F8: Curl_http_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43579AD: Curl_protocol_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x435CC7E: Curl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4365A03: Curl_perform (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4366792: curl_easy_perform (in /usr/lib/libcurl.so.4.1.1) One of these: ==21045== Invalid read of size 4 ==21045== at 0x488AA77: CERT_NameToAsciiInvertible (alg1485.c:1100) ==21045== by 0x488AB94: CERT_NameToAscii (alg1485.c:1149) ==21045== by 0x4379779: (within /usr/lib/libcurl.so.4.1.1) ==21045== by 0x437ACD5: Curl_nss_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4371181: Curl_ssl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43504F8: Curl_http_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43579AD: Curl_protocol_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x435CC7E: Curl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4365A03: Curl_perform (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4366792: curl_easy_perform (in /usr/lib/libcurl.so.4.1.1) And the actual core itself: ==21045== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==21045== Access not within mapped region at address 0xD0 ==21045== at 0x488AA77: CERT_NameToAsciiInvertible (alg1485.c:1100) ==21045== by 0x488AB94: CERT_NameToAscii (alg1485.c:1149) ==21045== by 0x4379779: (within /usr/lib/libcurl.so.4.1.1) ==21045== by 0x437ACD5: Curl_nss_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4371181: Curl_ssl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43504F8: Curl_http_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x43579AD: Curl_protocol_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x435CC7E: Curl_connect (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4365A03: Curl_perform (in /usr/lib/libcurl.so.4.1.1) ==21045== by 0x4366792: curl_easy_perform (in /usr/lib/libcurl.so.4.1.1)
Last stack trace had no debug symbols for curl: #0 CERT_NameToAsciiInvertible (name=0xcc, strict=CERT_N2A_READABLE) at alg1485.c:1100 #1 0x0488ab95 in CERT_NameToAscii (name=0xcc) at alg1485.c:1149 #2 0x0437977a in display_cert_info (data=0x7c6e630, cert=0x0) at nss.c:669 #3 0x0437acd6 in display_conn_info (sock=<value optimized out>, conn=<value optimized out>) at nss.c:707 #4 Curl_nss_connect (sock=<value optimized out>, conn=<value optimized out>) at nss.c:1226 #5 0x04371182 in Curl_ssl_connect (conn=0x2f4a8370, sockindex=0) at sslgen.c:185 #6 0x043504f9 in Curl_http_connect (conn=0x2f4a8370, done=0x15a74e7e) at http.c:1792 #7 0x043579ae in Curl_protocol_connect (conn=0x2f4a8370, protocol_done=0x15a74e7e) at url.c:3056 #8 0x0435cc7f in setup_conn (protocol_done=<value optimized out>, hostaddr=<value optimized out>, conn=<value optimized out>) at url.c:4690 #9 Curl_connect (protocol_done=<value optimized out>, hostaddr=<value optimized out>, conn=<value optimized out>) at url.c:4766 #10 0x04365a04 in connect_host (conn=<value optimized out>, data=<value optimized out>) at transfer.c:2481 #11 Curl_perform (conn=<value optimized out>, data=<value optimized out>) at transfer.c:2562 #12 0x04366793 in curl_easy_perform (curl=0x7c6e630) at easy.c:557
==24503== Process terminating with default action of signal 11 (SIGSEGV): dumping core ==24503== Access not within mapped region at address 0xC ==24503== at 0x47F7C0F: ssl_DefClose (ssldef.c:215) ==24503== by 0x47FBBFF: ssl_SecureClose (sslsecur.c:1047) ==24503== by 0x48002F6: ssl_Close (sslsock.c:1406) ==24503== by 0x49741A0: PR_Close (in /lib/libnspr4.so) ==24503== by 0x437A255: Curl_nss_connect (nss.c:1268) ==24503== by 0x4371181: Curl_ssl_connect (sslgen.c:185) ==24503== by 0x43504F8: Curl_http_connect (http.c:1792) ==24503== by 0x43579AD: Curl_protocol_connect (url.c:3056) ==24503== by 0x435CC7E: Curl_connect (url.c:4690) ==24503== by 0x4365A03: Curl_perform (transfer.c:2481) ==24503== by 0x4366792: curl_easy_perform (easy.c:557) I also just got this core just now. Seems like the same issue.
Created attachment 369117 [details] Patch
Hi All, I spent some time debugging the curl code while Kamil and I were debugging this issue and I think I found out what is going on. Someplace in the code it looks like a data structure was released twice. So I looked in curl_nss_connect and saw that the model structure will be freed twice if the handshake fails. At line 939: PRFileDesc *model = NULL; At line 1201: PR_Close(model); /* We don't need this any more */ Then at line 1214 if SSL_ForceHandshakeWithTimeout fails, goto error is called and at line 1267: if(model) PR_Close(model); I created a patch that sets model to NULL around line 1201 and that seems to fix my crash. See below: *** curl-7.19.6/lib/nss.c Fri Jul 31 20:15:35 2009 --- /tmp/nss.c Wed Nov 11 17:52:27 2009 *************** *** 1184,1190 **** --- 1184,1192 ---- connssl->handle = SSL_ImportFD(model, connssl->handle); if(!connssl->handle) goto error; + PR_Close(model); /* We don't need this any more */ + model = NULL; /* This is the password associated with the cert that we're using */ if (data->set.str[STRING_KEY_PASSWD]) { It looks at some point the handshake failed and between the first and the second close was called someone else was using the NSS socket stucture and the fd element was released. Patch it attached. Thanks, Kevin
Kevin, thanks for digging this! Really nice catch! Could you please test it once again with the NSS distributed along with Fedora? Has it solved all problems from the original report? Have the invalid writes from bug 534115 also disappeared?
Applied upstream, the builds for Fedora will follow.
Note the bug itself is not specific to multi-threaded applications. The double close occurs in a single thread as well. Only the impact is worse in case of multi-threading probably.
curl-7.19.6-2.fc10 has been submitted as an update for Fedora 10. http://admin.fedoraproject.org/updates/curl-7.19.6-2.fc10
curl-7.19.7-1.fc12 has been submitted as an update for Fedora 12. http://admin.fedoraproject.org/updates/curl-7.19.7-1.fc12
curl-7.19.7-2.fc11 has been submitted as an update for Fedora 11. http://admin.fedoraproject.org/updates/curl-7.19.7-2.fc11
curl-7.19.7-2.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-11454
curl-7.19.6-2.fc10 has been pushed to the Fedora 10 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F10/FEDORA-2009-11457
curl-7.19.7-1.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2009-11462
Looks good. Was not able to test last night due to network issues, but it seems to be working.
curl-7.19.7-3.fc11 has been pushed to the Fedora 11 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F11/FEDORA-2009-12245
curl-7.19.7-2.fc12 has been pushed to the Fedora 12 testing repository. If problems still persist, please make note of it in this bug report. If you want to test the update, you can install it with su -c 'yum --enablerepo=updates-testing update curl'. You can provide feedback for this update here: http://admin.fedoraproject.org/updates/F12/FEDORA-2009-12235
curl-7.19.6-2.fc10 has been pushed to the Fedora 10 stable repository. If problems still persist, please make note of it in this bug report.
curl-7.19.7-3.fc11 has been pushed to the Fedora 11 stable repository. If problems still persist, please make note of it in this bug report.
curl-7.19.7-2.fc12 has been pushed to the Fedora 12 stable repository. If problems still persist, please make note of it in this bug report.