Bug 1766451 - Intermittent null pointer exception in JSS during FreeIPA deployments
Summary: Intermittent null pointer exception in JSS during FreeIPA deployments
Keywords:
Status: MODIFIED
Alias: None
Product: Fedora
Classification: Fedora
Component: jss
Version: 30
Hardware: x86_64
OS: Linux
unspecified
high
Target Milestone: ---
Assignee: Endi Sukma Dewata
QA Contact: Fedora Extras Quality Assurance
URL:
Whiteboard:
Depends On:
Blocks:
TreeView+ depends on / blocked
 
Reported: 2019-10-29 05:46 UTC by Adam Williamson
Modified: 2020-03-03 18:17 UTC (History)
11 users (show)

Fixed In Version:
Doc Type: If docs needed, set a value
Doc Text:
Clone Of:
Environment:
Last Closed:
Type: Bug


Attachments (Terms of Use)
/var/log tarball from one of the failures (1.65 MB, application/octet-stream)
2019-10-29 05:47 UTC, Adam Williamson
no flags Details

Description Adam Williamson 2019-10-29 05:46:05 UTC
So, I'm still feeling this bug out. But, in the last couple of days, several FreeIPA-related tests have failed in openQA in similar ways. https://openqa.fedoraproject.org/tests/477587 is one example. https://openqa.fedoraproject.org/tests/476911 is another. https://openqa.fedoraproject.org/tests/476877 is another. In all those cases, we get an error during FreeIPA server deployment when we try to connect to dogtag running on port 8443. In all those cases, looking at the journal (you can find a tarball of /var/log on the "Logs & Assets" tab for each test), we see a java.Lang.NullPointerException. One doesn't provide a traceback for some reason, but the other two provide one like this:

Oct 26 12:31:42 ipa001.domain.local server[6457]: SEVERE: Error running socket processor
Oct 26 12:31:42 ipa001.domain.local server[6457]: java.lang.Error: Error during hash calculation
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:272)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeMessage$Finished.getFinished(HandshakeMessage.java:1951)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeMessage$Finished.verify(HandshakeMessage.java:1920)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.ServerHandshaker.clientFinished(ServerHandshaker.java:1788)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.ServerHandshaker.processMessage(ServerHandshaker.java:317)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.Handshaker.processLoop(Handshaker.java:1037)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.Handshaker.process_record(Handshaker.java:965)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.readRecord(SSLEngineImpl.java:1015)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.readNetRecord(SSLEngineImpl.java:896)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.SSLEngineImpl.unwrap(SSLEngineImpl.java:766)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at javax.net.ssl.SSLEngine.unwrap(SSLEngine.java:624)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SecureNioChannel.handshakeUnwrap(SecureNioChannel.java:499)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SecureNioChannel.handshake(SecureNioChannel.java:238)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1566)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.lang.Thread.run(Thread.java:748)
Oct 26 12:31:42 ipa001.domain.local server[6457]: Caused by: java.lang.NullPointerException
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.util.NativeProxy.equals(NativeProxy.java:64)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashMap.putVal(HashMap.java:635)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashMap.put(HashMap.java:612)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.util.HashSet.add(HashSet.java:220)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.util.NativeProxy.<init>(NativeProxy.java:43)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.CipherContextProxy.<init>(CipherContextProxy.java:11)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.initDigest(Native Method)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.reset(PK11MessageDigest.java:96)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.pkcs11.PK11MessageDigest.digest(PK11MessageDigest.java:87)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.crypto.JSSMessageDigest.digest(JSSMessageDigest.java:65)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at org.mozilla.jss.provider.java.security.JSSMessageDigestSpi.engineDigest(JSSMessageDigestSpi.java:56)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.security.MessageDigest$Delegate.engineDigest(MessageDigest.java:592)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at java.security.MessageDigest.digest(MessageDigest.java:365)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         at sun.security.ssl.HandshakeHash.getFinishedHash(HandshakeHash.java:270)
Oct 26 12:31:42 ipa001.domain.local server[6457]:         ... 18 more

I also note that a JSS update recently went stable across F29, F30 and F31:

https://bodhi.fedoraproject.org/updates/FEDORA-2019-4d33c62860 (F29)
https://bodhi.fedoraproject.org/updates/FEDORA-2019-68c2fbcf82 (F30)
https://bodhi.fedoraproject.org/updates/FEDORA-2019-24a0a2f24e (F31)

at the point it went stable it would start being used in these openQA tests, and that's approximately when these errors started showing up. So between that and the backtrace...jss 4.6.2 is my suspect for now. This bug doesn't seem to happen on *every* attempt; on some updates the tests are passing, and when I re-ran all the failures I saw today, most of them passed at the second attempt.

The tests being run here are deploying a FreeIPA server in a pretty standard way, please ask if more details are needed.

Comment 1 Adam Williamson 2019-10-29 05:47:40 UTC
Created attachment 1629987 [details]
/var/log tarball from one of the failures

Comment 2 Alex Scheel 2019-10-29 13:43:40 UTC
Assigned; patch here -- will get it into a Fedora update for people to test today.

https://github.com/dogtagpki/jss/pull/299

Comment 3 Alex Scheel 2019-10-29 16:23:34 UTC
Built in Rawhide: https://koji.fedoraproject.org/koji/taskinfo?taskID=38646690
Adam did a scratch build for F31 and is running tests here: https://openqa.stg.fedoraproject.org/tests/overview?distri=fedora&version=31&build=Kojitask-38647088-NOREPORT&groupid=2

Once that is done and I get the thumbs up, I'll backport to F31...F29. 

Thanks!

Comment 4 Adam Williamson 2019-10-29 21:01:57 UTC
I ran through the FreeIPA tests four times and they all passed, so that looks good. Please send out updates for F29..F31. Thanks!

Note, POST for Fedora bugs means "fix is available upstream (as a PR or a commit) but has not yet reached an official Fedora package", i.e. it's the appropriate status for this bug.

Comment 5 Fedora Update System 2019-10-30 14:31:15 UTC
FEDORA-2019-2613ca7581 has been submitted as an update to Fedora 31. https://bodhi.fedoraproject.org/updates/FEDORA-2019-2613ca7581

Comment 6 Fedora Update System 2019-10-30 14:31:16 UTC
FEDORA-2019-4129cdf50b has been submitted as an update to Fedora 30. https://bodhi.fedoraproject.org/updates/FEDORA-2019-4129cdf50b

Comment 7 Fedora Update System 2019-10-30 14:31:20 UTC
FEDORA-2019-7685c49b92 has been submitted as an update to Fedora 29. https://bodhi.fedoraproject.org/updates/FEDORA-2019-7685c49b92

Comment 8 Fedora Update System 2019-10-31 01:52:43 UTC
jss-4.6.2-2.fc29 has been pushed to the Fedora 29 stable repository. If problems still persist, please make note of it in this bug report.

Comment 9 Patrick Dung 2019-11-03 12:17:15 UTC
I had two IPA servers that are in pairs.
After installing the jss from updates-testing channel.
One of them complete the ipa-server-upgrade process but the tomcat/pki/ca is not function.
Another server cannot complete the ipa-server-upgrade process.

I had capture some logs and posted in the forum:
https://lists.fedorahosted.org/archives/list/freeipa-users@lists.fedorahosted.org/thread/4HJSWIJQQYZQXUCEDUNZGPEKMOE7Z3MD/

Thanks.

Comment 10 Patrick Dung 2019-11-03 12:17:51 UTC
For #9, I am using FC31.

Comment 11 Fedora Update System 2019-11-05 03:56:58 UTC
jss-4.6.2-2.fc30 has been pushed to the Fedora 30 stable repository. If problems still persist, please make note of it in this bug report.

Comment 12 Fedora Update System 2019-11-05 04:15:09 UTC
jss-4.6.2-2.fc31 has been pushed to the Fedora 31 stable repository. If problems still persist, please make note of it in this bug report.


Note You need to log in before you can comment on or make changes to this bug.