Bug 2188152
Summary: | Thunderbird 102.7.1+ breaks AD LDAP address book | ||
---|---|---|---|
Product: | Red Hat Enterprise Linux 8 | Reporter: | Orion Poplawski <orion> |
Component: | thunderbird | Assignee: | Jan Horak <jhorak> |
Status: | CLOSED MIGRATED | QA Contact: | Desktop QE <desktop-qa-list> |
Severity: | medium | Docs Contact: | |
Priority: | unspecified | ||
Version: | 8.7 | CC: | abobrov, rrelyea, tpopela |
Target Milestone: | rc | Keywords: | MigratedToJIRA |
Target Release: | --- | ||
Hardware: | Unspecified | ||
OS: | Unspecified | ||
Whiteboard: | |||
Fixed In Version: | Doc Type: | If docs needed, set a value | |
Doc Text: | Story Points: | --- | |
Clone Of: | Environment: | ||
Last Closed: | 2023-09-15 19:54:38 UTC | Type: | Bug |
Regression: | --- | Mount Type: | --- |
Documentation: | --- | CRM: | |
Verified Versions: | Category: | --- | |
oVirt Team: | --- | RHEL 7.3 requirements from Atomic Host: | |
Cloudforms Team: | --- | Target Upstream Version: | |
Embargoed: |
Description
Orion Poplawski
2023-04-19 22:13:56 UTC
Thanks for the report! Please set the mailnews.ldap.loglevel to 'Debug' in Config Editor and open the browser console (thundebird -jsconsole or Ctrl-Shift-J in the app) and look for the mailnews.ldap messages while trying to search the address. Please attach if you find something suspicious. Also make sure the 'Directory Server' in the settings points to the right ldap server (I had 'None' chosen from dropdown menu for some reason during trying to reproduce). Also double check that in the account settings in the composition & addressing there's correct settings in the Addressing section. You could also check the query in the Address book explicitly picking the ldap server from the list of address books in the left pane. It's weird - nothing seems amiss, but it just hangs with "Searching...". mailnews.ldap: Connecting to ldaps://SERVER:636 LDAPClient.jsm:45:18 mailnews.ldap: Connected LDAPClient.jsm:153:18 mailnews.ldap: Binding USER@DOMAIN LDAPClient.jsm:66:18 mailnews.ldap: C: [1] BindRequest LDAPClient.jsm:267:18 mailnews.ldap: S: [1] BindResponse resultCode=0 message="" LDAPClient.jsm:200:20 mailnews.ldap: Searching dn="BASEDN" filter="FILTER" LDAPClient.jsm:133:18 mailnews.ldap: C: [2] SearchRequest LDAPClient.jsm:267:18 If I run ldapsearch against the server with the same binddn, basedn, and filter it responds with the requested entries. If I disable SSL the search works fine - but we can't be having passwords sent in clear text on the network. With this query we get some more info: mailnews.ldap: C: [2] SearchRequest LDAPClient.jsm:267:18 mailnews.ldap: S: [2] SearchResultEntry <empty string> LDAPClient.jsm:200:20 mailnews.ldap: S: [2] SearchResultDone resultCode=0 message="" LDAPClient.jsm:200:20 mailnews.ldap: C: [3] UnbindRequest LDAPClient.jsm:267:18 mailnews.ldap: error { target: TCPSocket, isTrusted: true, name: "NetworkError", message: "Network", errorCode: 2152398868, srcElement: TCPSocket, currentTarget: TCPSocket, eventPhase: 2, bubbles: false, cancelable: false, … } _onError resource:///modules/LDAPClient.jsm:252:18 Hm, so ldaps, looks like some problem with the secure connection. It works for me for our server, but you most likely have different setup. I found how to get the ssl info about the server from: https://rakhesh.com/windows/how-to-check-ldaps-certificate-and-tls-version/ so please could you check the output: openssl s_client -connect LDAP_SERVER:636 There might be some algorithms disabled in the system nss which certificates from chain use. Do you by any chance run in the FIPS mode or custom nss config (it should be in /etc/crypto-policies/back-ends/nss.config, please attach if possible)? openssl connects fine: Requested Signature Algorithms: RSA+SHA256:RSA+SHA384:RSA+SHA1:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA1:DSA+SHA1:RSA+SHA512:ECDSA+SHA512 Shared Requested Signature Algorithms: RSA+SHA256:RSA+SHA384:RSA+SHA1:ECDSA+SHA256:ECDSA+SHA384:ECDSA+SHA1:RSA+SHA512:ECDSA+SHA512 Peer signing digest: SHA256 Peer signature type: RSA Server Temp Key: X25519, 253 bits --- SSL handshake has read 2372 bytes and written 427 bytes Verification: OK --- New, TLSv1.2, Cipher is ECDHE-RSA-AES256-GCM-SHA384 Server public key is 2048 bit Secure Renegotiation IS supported Compression: NONE Expansion: NONE No ALPN negotiated SSL-Session: Protocol : TLSv1.2 Cipher : ECDHE-RSA-AES256-GCM-SHA384 Session-ID: 4D1F00007AF3426618E45064EA89B1CA48992984E70BF482B55FFDD8D0C9F183 Session-ID-ctx: Master-Key: E9452B7857EC3A40AD4544E4876F3663D16A19DCF391A4B8683E82C8B75EF5EA26AD5106F64A96C75AC59E6BB00FD541 PSK identity: None PSK identity hint: None SRP username: None Start Time: 1683929543 Timeout : 7200 (sec) Verify return code: 0 (ok) Extended master secret: yes $ cat /etc/crypto-policies/back-ends/nss.config library= name=Policy NSS=flags=policyOnly,moduleDB config="disallow=ALL allow=HMAC-SHA256:HMAC-SHA1:HMAC-SHA384:HMAC-SHA512:HMAC-MD5:CURVE25519:SECP256R1:SECP384R1:SECP521R1:aes256-gcm:chacha20-poly1305:aes256-cbc:aes128-gcm:aes128-cbc:rc4:SHA256:SHA384:SHA512:SHA224:SHA1:ECDHE-RSA:ECDHE-ECDSA:RSA:DHE-RSA:ECDSA:RSA-PSS:RSA-PKCS:tls-version-min=tls1.0:dtls-version-min=dtls1.2:DH-MIN=2048:DSA-MIN=2048:RSA-MIN=2048" name=p11-kit-proxy library=p11-kit-proxy.so I am running a non-default crypto policy: # update-crypto-policies --show WIN2003 @Bob Do you see any reason why the ssl connection fails while unencrypted does not (talking about nss)? There are lots of reasons ssl connections could fail and unencrypted does not. Connections could be failing because of crypto policy. Connections could be failing because cert chain processing, etc. I'd have to see the error code NSS is returning to give you a better idea. On thing you can try is setting the crypto-policy to LEGACY and see if it still fails. That could eliminate policy as a reason. bob update-crypto-policies --set LEGACY did not appear to have any effect. according to the output in the Comment #2 the TLS session gets established just fine because there is outgoing LDAP BIND operation request and subsequent successful LDAP result for that BIND op. It is the LDAP SEARCH op that appears to be sent but no response arrives from AD for some reason. So the problem is either on AD side processing the request (unlikely since allegedly the same request over PLAIN connection works fine) or it is some sort of I/O problem somewhere (could be due to some TLS protocol or cipher incompatibilities and such). It would be helpful to check the event log on AD side to see whether or not it receives/decodes/recognizes that SEARCH request at all and if there is nothing there then running via ssltap might shed some light on this eg by running this on Thunderbird machine ssltap -hsx -p 1636 ad-hostname:636 and then configuring Thunderbird abook to point to 'localhost:1636' and trying the test case again to see where its at when it hangs, TLS session wise. So this is very strange. If I connect through ssltap, the directory search in TB works, but ssltap exits immediately after with: --> [ 0: 17 03 03 00 1f 00 00 00 00 00 00 00 03 71 12 73 | .............q.s 10: f8 1f dd 53 0c cd 75 2d 46 a5 a0 54 d8 7a 8d 37 | ...S..u-F..T.z.7 20: 65 6d 77 0d | emw. (36 bytes of 31) SSLRecord { [Fri Jun 2 12:27:10 2023] 0: 17 03 03 00 1f | ..... type = 23 (application_data) version = { 3,3 } length = 31 (0x1f) < encrypted > } ] ssltap: Error -5961: TCP connection reset by peer: error on server-side socket. Connection 1 Complete [Fri Jun 2 12:27:10 2023] I turned on some detailed ldap logging on the AD server and when I connect though ssltap I see the LDAP search get logged. But when I have TB connect directly no search is logged. @Orion Poplawski very strange indeed. The ssltap has the '-l' option to operate continuously in a loop so it might be worth trying to see if the above error is related to a single connection closure once the first LDAP lookup is done. I'm not sure that ssltap is going to gives us useful information at this point since TB "works" going through it. I think we need to know more details about what TB is trying to do when ssltap is not involved. @Orion Poplawski if both TB and ssltap use the same NSS and NSPR libs all should be equal ie all ssltap really does is it acts as SSL/TLS proxy so there should be no difference but clearly there is. ATM i'm not sure what it can be attributed to, it could be things like record and I/O sizes, so seeing if more than one LDAP search session can be sustained via ssltap in continuous mode or if it gets stuck eventually as well might help here. Other than that, since we don't have an in-house reproducer at the minute the only thing i can think of to investigate this further is running TB with a drop-in debug build of NSS+NSPR which can produce some diagnostics similar to ssltap plus give some extra state info from respective libraries. I rebuilt the nss package with DEBUG and TRACE enabled. I believe this is the relevant output with SSLDEBUG=1 SSLTRACE=3: 184623: SSL[181151872]: connect failed, errno=-5934 184623: SSL[181151872]: sending client-hello 184623: SSL3[181151872]: send initial ClientHello handshake 184623: TLS13[181151872]: send client key share xtn 184623: TLS13[181151872]: client send supported_versions extension 184623: TLS13[181151872]: send psk key exchange modes extension 184623: TLS13[181151872]: send post_handshake_auth extension 184623: SSL3[181151872] SendRecord type: handshake (22) nIn=512 184623: TLS13[181151872]: spec=361079232 epoch=0 (cleartext) protect 0x0 len=512 184623: SSL3[181151872]: recv error -5998 184623: SSL[181151872]: SecureSend: sending 55 bytes 184623: SSL3[181151872]: recv error -5998 184623: SSL[181151872]: SecureSend: returning -1 count, error -5998 184623: SSL[181151872]: SecureSend: sending 55 bytes 184623: SSL3[181151872]: handle server_hello handshake 184623: SSL[181151872]: Negotiated extended master secret extension. 184623: SSL3[181151872]: Set XXX Pending Cipher Suite to 0xc030 184623: SSL3[181151872]: handle certificate handshake 184623: SSL3[181151872]: handle server_key_exchange handshake 184623: SSL3[181151872]: handle certificate_request handshake 184623: SSL3[181151872]: handle server_hello_done handshake 184623: SSL3[7f820acc2880]: deferring ssl3_SendClientSecondRound because certificate authentication is still pending. 184623: SSL[181151872]: SecureSend: returning -1 count, error - seems to hang there. When I close TB it seems to continue: 5998 184623: SSL3[181151872]: send certificate handshake 184623: SSL3[181151872]: send client_key_exchange handshake 184623: SSL3[181151872]: DONE sending client_key_exchange 184623: SSL3[181151872]: send certificate_verify handshake 184623: SSL3[181151872]: send change_cipher_spec record 184623: SSL3[181151872] SendRecord type: handshake (22) nIn=2973 184623: SSL3[181151872] SendRecord type: change_cipher_spec (20) nIn=1 184623: SSL3[181151872] Set Current Write Cipher Suite to Pending 184623: SSL[181151872]: false start callback returned TRUE 184623: SSL3[181151872]: send finished handshake 184623: SSL3[181151872] SendRecord type: handshake (22) nIn=16 184623: SSL[181151872]: SecureSend: sending 55 bytes 184623: SSL3[181151872] SendRecord type: application_data (23) nIn=55 184623: SSL[181151872]: SecureSend: returning 55 count 184623: SSL3[181151872]: handle change_cipher_spec record 184623: SSL3[181151872] Set Current Read Cipher Suite to Pending 184623: SSL3[181151872]: handle finished handshake 184623: SSL[181151872]: handshake is completed 184623: SSL3[181151872]: recv error -5998 184623: SSL[181151872]: recving -1 bytes securely (errno=-5998) 184623: SSL[181151872]: recving 22 bytes securely (errno=-5998) 184623: SSL3[181151872]: recv error -5998 184623: SSL[181151872]: recving -1 bytes securely (errno=-5998) 184623: SSL[181151872]: SecureSend: sending 345 bytes 184623: SSL3[181151872] SendRecord type: application_data (23) nIn=345 184623: SSL[181151872]: SecureSend: returning 345 count so this is really strange: 184623: SSL[181151872]: SecureSend: returning -1 count, error -5998 the 5998 is EWOULDBLOCK on non blocking I/O socket. if you increase SSLTRACE to 30 it should show polling on that socket which i'm guessing never gets to ready state in this case when it hangs. i have tried reproducing this locally with Windows Server 2019 AD and TB 102.7.1 to no avail, it just works as it should. i wonder what specific differences could play a role here. is there anything on your network between TB and AD ? like VPN, Proxy, Firewall ie anything that can actively affect network traffic ? is IPv6 enabled ? also do you have any user certificates (plus their certificate chains) setup in TB PSM ? (In reply to Anton Bobrov from comment #15) > also do you have any user certificates (plus their certificate chains) setup > in TB PSM ? Oh, well done! I have two user certificates (one expired) issued by Entrust used for S/MIME encrypted email. Removing those allows the LDAP address book to work. So should this be closed as a NOTABUG or some improvements in TB could be done? I think that reporting the possible improvements to upstream might be enough. @Tomas Popela to address this properly we need to try to reproduce it with some client certs (cert chains) as i'm not quite sure of the root cause here as the certs themselves should not be a problem, rather they seem to affect the send buffer dynamics somehow (which was one of my suspicions as to the issue origins). it is also not clear whether or not this is a real regression or the later TB versions simply present more favorable conditions for this issue to surface. it is also unclear how the AD SSL/TLS implementation plays into all this (it could be AD bug too) so for me the bottom line is i cant draw any solid conclusions from this without a debug-able reproducer. Today I had a chance to try reproducing this with some test certs in PSM and I have been actually able to reproduce the problem with 102.13.0, although it seems I can make a problem go away (intermittently) if I force TB AB to close LDAP connection and open a new one then it works for a single LDAP search then its back to no results until the connection is reset again, rinse and repeat. So I have tried to reproduce the same with the latest trunk build so I can actually debug it as well as get more detailed and useful logging out of it (socket layer NSPR NSS) and I could not reproduce it at all. There been a few changes in the related code tho I could not attribute this behavior to any single change I see in the Hg log and without a reproducer that I can debug it is kinda difficult to guess what exactly could have gone wrong with earlier revisions. Another thing I have tried is running with a different versions of NSS, so instead using dist version that comes with the latest TB trunk I have switched TB trunk build to my system's installed NSS, my reasoning here was to use the same NSS as with 102.13.0 I could reproduce the problem with (in case its some sort of NSS/SSL problem) but no luck, even with system NSS the trunk build does not run into this problem. With the limited debug facilities I have for the standard 102.13.0 build (plus max LDAP event logging on AD side) all I can see is that when the problem occurs TB sends LDAP simple bind and receives success response from AD then TB claims (in the console) to send a search but that search never arrives at AD side and if left alone AD eventually kills the connection as idle. This leads me to suspect that perhaps some of the code related to async processing might be responsible here but it is just a guess or at least a good place to start looking at. Long story short without being able to reproduce the problem with the latest trunk build I'm not sure it is worth the effort to investigate this further (this would require making 102.13.0 debug build plus debugging effort on it to root cause) or even report it upstream (as they are unlikely to be able to reproduce it either) so I'd say if this is still a problem for the reporter or anyone else the next thing to try would be the latest nightly TB build, eg https://ftp.mozilla.org/pub/thunderbird/nightly/2023/07/2023-07-27-03-56-57-comm-central/ or later, to see if the problem can still be reproduced or not. Issue migration from Bugzilla to Jira is in process at this time. This will be the last message in Jira copied from the Bugzilla bug. This BZ has been automatically migrated to the issues.redhat.com Red Hat Issue Tracker. All future work related to this report will be managed there. Due to differences in account names between systems, some fields were not replicated. Be sure to add yourself to Jira issue's "Watchers" field to continue receiving updates and add others to the "Need Info From" field to continue requesting information. To find the migrated issue, look in the "Links" section for a direct link to the new issue location. The issue key will have an icon of 2 footprints next to it, and begin with "RHEL-" followed by an integer. You can also find this issue by visiting https://issues.redhat.com/issues/?jql= and searching the "Bugzilla Bug" field for this BZ's number, e.g. a search like: "Bugzilla Bug" = 1234567 In the event you have trouble locating or viewing this issue, you can file an issue by sending mail to rh-issues. You can also visit https://access.redhat.com/articles/7032570 for general account information. |