Bug 1766451

Summary: Intermittent null pointer exception in JSS during FreeIPA deployments
Product: [Fedora] Fedora Reporter: Adam Williamson <awilliam>
Component: jssAssignee: Endi Sukma Dewata <edewata>
Status: CLOSED CURRENTRELEASE QA Contact: Fedora Extras Quality Assurance <extras-qa>
Severity: high Docs Contact:
Priority: unspecified    
Version: 30CC: ascheel, cfu, dmoluguw, edewata, elio.maldonado.batiz, kwright, mharmsen, nkinder, patdung100, rcritten, tomek
Target Milestone: ---   
Target Release: ---   
Hardware: x86_64   
OS: Linux   
Whiteboard:
Fixed In Version: Doc Type: If docs needed, set a value
Doc Text:
Story Points: ---
Clone Of: Environment:
Last Closed: 2020-04-30 20:27:39 UTC Type: Bug
Regression: --- Mount Type: ---
Documentation: --- CRM:
Verified Versions: Category: ---
oVirt Team: --- RHEL 7.3 requirements from Atomic Host:
Cloudforms Team: --- Target Upstream Version:
Attachments:
Description Flags
/var/log tarball from one of the failures none

Description Adam Williamson 2019-10-29 05:46:05 UTC
So, I'm still feeling this bug out. But, in the last couple of days, several FreeIPA-related tests have failed in openQA in similar ways. https://openqa.fedoraproject.org/tests/477587 is one example. https://openqa.fedoraproject.org/tests/476911 is another. https://openqa.fedoraproject.org/tests/476877 is another. In all those cases, we get an error during FreeIPA server deployment when we try to connect to dogtag running on port 8443. In all those cases, looking at the journal (you can find a tarball of /var/log on the "Logs & Assets" tab for each test), we see a java.Lang.NullPointerException. One doesn't provide a traceback for some reason, but the other two provide one like this:

Oct 26 12:31:42 ipa001.domain.local server[6457]: SEVERE: Error running socket processor
Oct 26 12:31:42 ipa001.domain.local server[6457]: java.lang.Error: Error during hash calculation
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:272)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeMessage$Finished.getFinished(HandshakeMessage.java:1951)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeMessage$Finished.verify(HandshakeMessage.java:1920)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.ServerHandshaker.clientFinished(ServerHandshaker.java:1788)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:317)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.Handshaker.process_record(Handshaker.java:965)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1015)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SecureNioChannel.handshakeUnwrap(SecureNioChannel.java:499)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SecureNioChannel.handshake(SecureNioChannel.java:238)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.lang.Thread.run(Thread.java:748)
Oct 26 12:31:42 ipa001.domain.local server[6457]: Caused by: java.lang.NullPointerException
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.util.NativeProxy.equals(NativeProxy.java:64)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashMap.putVal(HashMap.java:635)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashMap.put(HashMap.java:612)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashSet.add(HashSet.java:220)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.util.NativeProxy.<init>(NativeProxy.java:43)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.CipherContextProxy.<init>(CipherContextProxy.java:11)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.initDigest(Native Method)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.reset(PK11MessageDigest.java:96)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.digest(PK11MessageDigest.java:87)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.crypto.JSSMessageDigest.digest(JSSMessageDigest.java:65)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.provider.java.security.JSSMessageDigestSpi.engineDigest(JSSMessageDigestSpi.java:56)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:592)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.security.MessageDigest.digest(MessageDigest.java:365)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:270)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         ... 18 more

I also note that a JSS update recently went stable across F29, F30 and F31:

https://bodhi.fedoraproject.org/updates/FEDORA-2019-4d33c62860 (F29)
https://bodhi.fedoraproject.org/updates/FEDORA-2019-68c2fbcf82 (F30)
https://bodhi.fedoraproject.org/updates/FEDORA-2019-24a0a2f24e (F31)

at the point it went stable it would start being used in these openQA tests, and that's approximately when these errors started showing up. So between that and the backtrace...jss 4.6.2 is my suspect for now. This bug doesn't seem to happen on *every* attempt; on some updates the tests are passing, and when I re-ran all the failures I saw today, most of them passed at the second attempt.

The tests being run here are deploying a FreeIPA server in a pretty standard way, please ask if more details are needed.

Comment 1 Adam Williamson 2019-10-29 05:47:40 UTC
Created attachment 1629987 [details]
/var/log tarball from one of the failures

Comment 2 Alex Scheel 2019-10-29 13:43:40 UTC
Assigned; patch here -- will get it into a Fedora update for people to test today.

https://github.com/dogtagpki/jss/pull/299

Comment 3 Alex Scheel 2019-10-29 16:23:34 UTC
Built in Rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=38646690
Adam did a scratch build for F31 and is running tests here: https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=31&build=Kojitask-38647088-NOREPORT&groupid=2

Once that is done and I get the thumbs up, I'll backport to F31...F29. 

Thanks!

Comment 4 Adam Williamson 2019-10-29 21:01:57 UTC
I ran through the FreeIPA tests four times and they all passed, so that looks good. Please send out updates for F29..F31. Thanks!

Note, POST for Fedora bugs means "fix is available upstream (as a PR or a commit) but has not yet reached an official Fedora package", i.e. it's the appropriate status for this bug.

Comment 5 Fedora Update System 2019-10-30 14:31:15 UTC
FEDORA-2019-2613ca7581 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2613ca7581

Comment 6 Fedora Update System 2019-10-30 14:31:16 UTC
FEDORA-2019-4129cdf50b has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-4129cdf50b

Comment 7 Fedora Update System 2019-10-30 14:31:20 UTC
FEDORA-2019-7685c49b92 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7685c49b92

Comment 8 Fedora Update System 2019-10-31 01:52:43 UTC
jss-4.6.2-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 9 Patrick Dung 2019-11-03 12:17:15 UTC
I had two IPA servers that are in pairs.
After installing the jss from updates-testing channel.
One of them complete the ipa-server-upgrade process but the tomcat/pki/ca is not function.
Another server cannot complete the ipa-server-upgrade process.

I had capture some logs and posted in the forum:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org/thread/4HJSWIJQQYZQXUCEDUNZGPEKMOE7Z3MD/

Thanks.

Comment 10 Patrick Dung 2019-11-03 12:17:51 UTC
For #9, I am using FC31.

Comment 11 Fedora Update System 2019-11-05 03:56:58 UTC
jss-4.6.2-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2019-11-05 04:15:09 UTC
jss-4.6.2-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.

Comment 13 Ben Cotton 2020-04-30 20:23:26 UTC
This message is a reminder that Fedora 30 is nearing its end of life.
Fedora will stop maintaining and issuing updates for Fedora 30 on 2020-05-26.
It is Fedora's policy to close all bug reports from releases that are no longer
maintained. At that time this bug will be closed as EOL if it remains open with a
Fedora 'version' of '30'.

Package Maintainer: If you wish for this bug to remain open because you
plan to fix it in a currently maintained version, simply change the 'version' 
to a later Fedora version.

Thank you for reporting this issue and we are sorry that we were not 
able to fix it before Fedora 30 is end of life. If you would still like 
to see this bug fixed and are able to reproduce it against a later version 
of Fedora, you are encouraged  change the 'version' to a later Fedora 
version prior this bug is closed as described in the policy above.

Although we aim to fix as many bugs as possible during every release's 
lifetime, sometimes those efforts are overtaken by events. Often a 
more recent Fedora release includes newer upstream software that fixes 
bugs or makes them obsolete.